CFM Checklist For WebPAC Redirection

March 30th, 1998
Jeff Graubart

A CFM developer needs to consider several issues when fine-tuning CFM redirection.

1.     What elements to redirect on
 

    The redirection list of elements is usually much smaller than the display list.
    Unfortunately, this list can be server specific, even though there are standards in
    the library industry. A library analyst should be consulted to ensure that the default
    field in the fld_usmarc... cfm file follows accepted practice. For instance, suppose
    the standard indexing of an author name uses the subfields [a-name] and
    [d-date of birth]. The fld_usmarc_eng_t.cfm file should then contain the following
    entry (display portion omitted):
    [T_AUTHOR,
        [+
            [$
                [$
                    Display portion .......
                ]
                [$ [* 1, %NL,]
                        "{R}R_AUTHOR "
                        [^ S100a, ] [^ " " S100d,]
                        "{/R}"
                ]
            ]
        ]
    ]
 
    However, suppose that some vendor XYZ with a significant market share indexes
    only on the ‘a’ subfield. Then there must be a field in fld_usmarc_eng_t.cfm
    which handles redirection for vendor XYZ.

    [T_AUTHOR_XYZ,
            ....
            [$ [* 1, %NL,]
                    "{R}R_AUTHOR "
                    [^ S100a, ]
                    "{/R}"
                ]
            ]
        ]
    ]
 

    If the vendor does not have a significant market share, then the field should be
    placed in the fld_usmarc_eng_t_custom.cfm file instead.
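
    The difference between the default field and the vendor XYZ field can be
    sketched outside CFM. The Python below is only an illustration, not BibEngine
    code: build_redirect_term and the sample subfield values are hypothetical, and
    it assumes the template simply joins the present subfields with a space.

```python
def build_redirect_term(qualifier, subfields, codes):
    """Join the chosen subfield values into a {R}...{/R} redirection string.

    Hypothetical helper: mimics "{R}R_AUTHOR " [^ S100a, ] [^ " " S100d,] "{/R}"
    under the assumption that absent subfields are simply skipped.
    """
    parts = [subfields[code] for code in codes if subfields.get(code)]
    return "{R}" + qualifier + " " + " ".join(parts) + "{/R}"


# Sample 100 field values (illustrative only).
author = {"a": "Twain, Mark", "d": "1835-1910"}

default_term = build_redirect_term("R_AUTHOR", author, ["a", "d"])
xyz_term = build_redirect_term("R_AUTHOR", author, ["a"])

print(default_term)
print(xyz_term)
```

    A server that indexes only on subfield ‘a’ would be sent the second, shorter
    term, which is why a separate field definition is needed for it.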

2.     Punctuation
    There are three classes of punctuation to consider when designing redirectable
    fields: required, optional, and illegal. Unfortunately, what is sent in the MARC
    record often bears little resemblance to how the field is indexed. When designing
    the standard redirectable field, library standards should be adhered to.
    In general, all illegal and optional punctuation should be removed from the
    instance. This is done with a regular expression in the "RegularExpressions"
    section of the fld... cfm file and a ‘rex semantic’ which applies the regular
    expression to the instance of data. For instance, many MARC records contain
    an author name in the form "Twain, Mark.". However, these names are usually
    indexed in the form "Twain Mark". While some servers apply a process called
    "normalization", which eliminates commas and periods from the search string,
    others will find no hits when searching on "Twain, Mark.". To solve this, we
    define a regular expression in the fld...cfm file.
    {RegularExpressions
        [REMOVE_COMMA_AND_PERIOD, "[,.]" "" 0,]
    }
 
    This entry says to replace all occurrences of comma and period with nothing. (For
    more details, see the documentation on regular expressions.) To invoke this regular
    expression on each instance of a given semantic, the semantic name is enclosed in
    a ‘rex semantic’ grouping.
    ...
    [$ [* 1, %NL,]
            "{R}R_AUTHOR "
            [^ [@ REMOVE_COMMA_AND_PERIOD, S100a, ]]
            [^ " " [@ REMOVE_DASHES, S100d,]]
            "{/R}"
    ]
    ...
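
    The effect of REMOVE_COMMA_AND_PERIOD on an instance can be checked with any
    regular expression engine. The Python below is a minimal sketch that assumes
    the CFM pattern "[,.]" behaves like an ordinary character class; the function
    name is illustrative only.

```python
import re

# Same pattern as the REMOVE_COMMA_AND_PERIOD entry; the replacement is empty.
REMOVE_COMMA_AND_PERIOD = re.compile(r"[,.]")


def remove_comma_and_period(instance):
    """Delete every comma and period from one instance of data."""
    return REMOVE_COMMA_AND_PERIOD.sub("", instance)


print(remove_comma_and_period("Twain, Mark."))  # prints: Twain Mark
```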
    Unless we are certain that all major vendors return required punctuation in the
    USMARC record, it is best to remove it from the instance and reinsert it when
    formatting the field. For instance, most vendors do not send the dashes in the x, y,
    and z subfields of subject; however, these are required for proper searching of
    the index. Just in case they are sent, they should be removed to prevent them
    from being included twice. (Note: A library analyst should be consulted
    for the definitive word on whether dashes are ever sent in the MARC record.)
    ...
    [$ [* 1, %NL,]
            "{R}R_SUBJECT "
            [^ S600a, ]
            [^ " -- " [@ REMOVE_DASHES, S600x,]]
            [^ " -- " [@ REMOVE_DASHES, S600y,]]
            [^ " -- " [@ REMOVE_DASHES, S600z,]]
            "{/R}"
    ]
    ...
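
    The remove-then-reinsert approach guarantees exactly one " -- " separator
    whether or not the record already carried dashes. The Python below sketches
    that idea; the definition of REMOVE_DASHES is not given here, so the pattern
    used (strip leading and trailing dashes and spaces) is an assumption, and
    build_subject_term is a hypothetical helper.

```python
import re


def remove_dashes(instance):
    # Assumed behaviour of REMOVE_DASHES: drop leading/trailing dashes and spaces.
    return re.sub(r"^[\s-]+|[\s-]+$", "", instance)


def build_subject_term(name, subdivisions):
    """Format a subject redirection string with one ' -- ' before each subdivision."""
    term = name
    for subdivision in subdivisions:
        cleaned = remove_dashes(subdivision)
        if cleaned:
            term += " -- " + cleaned
    return "{R}R_SUBJECT " + term + "{/R}"


# The same search term results whether or not the vendor sent the dashes.
without_dashes = build_subject_term("Twain, Mark", ["Criticism and interpretation"])
with_dashes = build_subject_term("Twain, Mark", ["-- Criticism and interpretation"])
print(without_dashes)
print(with_dashes)
```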
    Once again, if a vendor requires exceptional punctuation in the search string, a
    special field should be added to the fld...cfm for major vendors or added to the
    custom CFM file for minor vendors.

3.     Redirection Qualifier
    The generic qualifiers R_AUTHOR, R_SUBJECT, etc. are mapped to specific
    qualifiers in the connection CFM file. The qualifier refers to a SearchAttribute,
    which contains a list of attributes to be sent to the server along with the
    redirected term. The ‘use’ attribute usually refers to the index that will be
    searched. Redirection terms are often sent to the server as phrases "[4, 1,]"
    and not truncated "[5, 100,]"; however, this can be adjusted in the connection
    CFM file if need be. There is no reason why the redirection searchAttributes need
    to be from the same set of attributes displayed to the user. By setting the category
    code in the searchAttribute to 0, the search attribute will not be displayed but can
    still be used for redirection.
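
    The mapping from a generic qualifier to an attribute list can be pictured as
    a lookup table. The Python below is a hypothetical model, not the connection
    CFM syntax: the table layout and function names are invented for illustration.
    The pairs are (attribute type, value), using Z39.50 Bib-1 numbering, where
    type 1 is Use (1003 = Author, 21 = Subject heading), type 4 is Structure
    (1 = phrase), and type 5 is Truncation (100 = do not truncate).

```python
# Hypothetical stand-in for SearchAttribute entries in a connection CFM file.
# A category code of 0 hides the attribute from the user, as described above.
SEARCH_ATTRIBUTES = {
    "R_AUTHOR": {"category": 0, "attributes": [(1, 1003), (4, 1), (5, 100)]},
    "R_SUBJECT": {"category": 0, "attributes": [(1, 21), (4, 1), (5, 100)]},
}


def resolve_qualifier(qualifier):
    """Return the attribute list sent to the server with a redirected term."""
    return SEARCH_ATTRIBUTES[qualifier]["attributes"]


def displayed_qualifiers():
    """Qualifiers the user would see: those with a non-zero category code."""
    return [name for name, entry in SEARCH_ATTRIBUTES.items() if entry["category"] != 0]


print(resolve_qualifier("R_AUTHOR"))
print(displayed_qualifiers())
```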

4.     Character Set Translation
 

    This area has major architectural flaws in version 8.0 of the BibEngine; these
    problems are resolved in version 8.1. In version 8.0, all instances are
    translated to the display character set, which is usually HTML Latin1. The server,
    however, usually indexes on the ALA character set, which is a form of ISO 2022.
    This is no problem for the 7-bit ASCII characters (the first 128 characters).
    For characters in the upper 128, however, no hits will be found, because different
    codes and sequences are used. A translator is currently being developed that
    can translate the display character sets back to the ALA character set. This
    translator will be called on redirection strings sent from 8.0 WebPAC. (Note: The
    translator will also be used in 8.1 WebPAC for keyboarded search strings, so the
    effort is not wasted.)
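
    The mismatch in the upper 128 characters can be seen by comparing byte
    sequences. The Python below is an illustrative sketch only. It assumes the
    ALA (ANSEL) convention in which the acute accent is the combining byte 0xE2
    placed before its base letter; the ALA bytes are written by hand because
    Python has no built-in codec for that character set, and the sample name is
    arbitrary.

```python
# "Béla" as the display side would hold it (Latin-1: one byte, 0xE9, for é).
latin1_bytes = "B\u00e9la".encode("latin-1")

# The same name as an ALA-indexing server is assumed to hold it:
# combining acute (0xE2) followed by the base letter "e".
ala_bytes = b"B\xe2ela"

# A pure 7-bit ASCII term is identical in both encodings.
ascii_term_latin1 = "Twain".encode("latin-1")
ascii_term_ala = b"Twain"

print(latin1_bytes == ala_bytes)            # the accented term does not match
print(ascii_term_latin1 == ascii_term_ala)  # the 7-bit term does
```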

    In version 8.1, translation is specified explicitly in the CFM file. Since
    no translation is usually required for redirection strings (they are already in the
    ALA character set), nothing will change in the redirection sections of the CFM
    file.

    In some rare cases, the server sends out the ALA character set, yet indexes on
    7-bit ASCII. This technique is used for databases which are "internationalized"
    using pre-Unicode methods. To translate the redirection string to 7-bit ASCII, the
    following can be used (8.1 only).

    ...
    [$ [* 1, %NL,]
            "{R}R_AUTHOR " %TRANSDEL, "0" %TRANSDEL,
            [^ [@ REMOVE_COMMA_AND_PERIOD, S100a, ]]
            [^ " " [@ REMOVE_DASHES, S100d,]]
            %TRANSDEL, "{/R}"
    ]
    ...
    The ‘0’ between the first two translation delimiters denotes a special datatype
    which indicates that the TO_SERVER_TRANSLATOR is to be used.
    The TO_SERVER_TRANSLATOR is defined in the ‘Maps’ section of the
    connection CFM.
    {Maps
        [TO_SERVER_TRANSLATOR,
            PATH, ASCII7,
        ]
    }
    Normally, the TO_SERVER_TRANSLATOR is defined as:
    {Maps
        [TO_SERVER_TRANSLATOR,
            PATH, ALACHARSET,
        ]
    }
    with no %TRANSDEL delimiters around the redirected text. In that case, the
    translator is used only for keyboarded input. Since the internationalized databases
    will have their own fld...cfm files, this will not be an issue.

    In general, no action can be taken in 8.0 CFM files, and none should be taken in
    8.1 CFM files, for character set issues around redirected text. For more information
    on character set translation in 8.1 CFM files, see the documentation on character
    set translation.

5.     Debugging Hints
 
    When debugging failed redirections, there are two things to consider. First,
    a redirection is nothing more than a search with a search term. If you enter the
    phrase you think you are redirecting on in an ordinary search (not a scan), you
    should get back the same number of hits. Thus you can use ordinary searches
    to experiment with punctuation, fields to redirect on, attributes to use, and so on.

    Second, the actual term is sent to the server in a Z39.50 Search Request PDU.
    Check the z3950.log file for anything suspicious in the logged request.
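
    When experimenting with punctuation through ordinary searches, it can help
    to list the candidate forms of a term up front. The Python below is a
    hypothetical helper for that purpose, not part of WebPAC; the three variants
    it produces are only a starting set.

```python
import re


def punctuation_variants(term):
    """Return distinct punctuation variants of a term, in the order to try them."""
    candidates = [
        term,                       # exactly as it appears in the record
        term.rstrip(" ."),          # without a trailing period
        re.sub(r"[,.]", "", term),  # without any commas or periods
    ]
    variants = []
    for candidate in candidates:
        cleaned = " ".join(candidate.split())  # collapse runs of whitespace
        if cleaned and cleaned not in variants:
            variants.append(cleaned)
    return variants


for variant in punctuation_variants("Twain, Mark."):
    print(variant)
```

    Entering each variant as an ordinary search and comparing hit counts shows
    which form the server's index expects.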
     


Last Updated: March 31, 1998