Recommendations of the
UIAS Interim Task Force for Database Standards and Management

In response to the charge from the CSU UIAS Management Team, the Interim Task Force for Database Standards and Management (Celia Bakke, San Jose State University; Gina Roth, Cal Poly Pomona; Tamara Frost Trujillo (chair), CSU Sacramento), henceforth referred to in this report as "the Task Force," submits these recommendations.

Charge 1: Review the strategies for building a union catalog for the UIAS and recommend to the UIAS Management Team the CSU strategy for building the UIAS union catalog.

At the UIAS Kick-off meeting held at WestEd on August 5, 1997, Ameritech presented three models for structuring the UIAS union catalog:

  • Model 1. Deduplicated, merged master record model;
  • Model 2. Master Record Linked Bibs Model;
  • Model 3. Multiple Database Union View Index Model.
In addition, on January 13, 1998, a member of the UIAS Management Team suggested a fourth model, an "enriched master record model." The Task Force considered these four models for the UIAS structure, paying particular attention to the benefits and disadvantages of each with respect to maintaining the union catalog and authority control.

Since models 1 and 2 are both based on a master record, the Task Force compared them to determine whether retaining the entire bibliographic record in model 2 would make long-term maintenance of the union catalog easier and might result in a higher-quality database. Although model 2 requires retaining full bibliographic records and linking them to the master record representing the same bibliographic entity, it appears to the Task Force that model 2 does not make maintenance of the union catalog easier, does not produce a better-quality database, and is not substantially different from model 1 (see appendix).

With model 3, each library's database would be maintained independently within the union catalog; that is, maintenance of one library's record would not set any algorithms in motion as it would with model 1 or 2. In this respect, model 3 would involve simpler machine manipulation of files.

With respect to authority control under these three models, in model 1 authority control would be exercised over the set of access points in each master record that CSU specifies to be under authority control. In model 2 there are two options for authority control. Model 2, option A, is the same as model 1: only the access points in the master record specified by CSU would come under authority control. This means that the full bibliographic records retained on the UIAS server and linked to the master record would not be under authority control. Both options of model 2 allow a user to view the individual records linked to the master record. By retaining all bibliographic records from each CSU campus associated with each master record, this model keeps very sizeable quantities of bibliographic records online for possible future uses not yet defined. This model also provides an additional backup of each local system as part of the UIAS backup.

Model 2 could also have an option B, which would require the Ameritech software to authorize all the headings in the master record (as in model 1 and model 2, option A) AND to authorize any unique headings (those not already in the master record) in the linked records. This variation of the master record concept would provide access through access points found only in the linked records. This might occasionally be helpful, but it would also introduce additional errors and maintenance problems.

In model 3, authority control would be exercised over each library’s database independently. The success of the union catalog view would depend on how consistently authority control has made the headings agree across the independent files. For example, if ten CSU libraries hold the same title but the form of a heading varies among the ten libraries, whether the record is retrieved depends on how successfully authority control can bring the forms of the heading into agreement using the authority record for that heading and its cross-reference structure. In this case a patron’s search may or may not retrieve all ten records or reflect the holdings of all ten libraries. Indiana University (IU) Libraries use this model, but in a direct input mode, in which all the IU libraries perform direct input and maintenance in that model.

An additional concern about model 3 was raised by Ameritech at the August 5 meeting, where the company’s representative indicated that library users "using a Z39.50 client who were accessing the UIAS union catalog from outside the CSU might not have broadcast searching functionality and so would miss the union view provided to a generic web browser."

The fourth, or "enriched master record," model would consist of a master record enhanced with fields that are lacking in the master record but present in one or more of the non-master records. For example, if the master record has an 856 field but lacks a 505 (table of contents) field, the 505 field would be added to the master record, enriching it and making it more complete. The Task Force discussed this fourth model at length and does not recommend it, for several reasons. The master record algorithm recommended below (see Recommendation 5) should already result in most of the desired fields being present in the designated master record. In addition, and most significantly, this fourth model would require a substantial amount of additional machine manipulation, including the ability to evaluate, link, replace, and delete fields from multiple records, and the resulting enrichment would be too minimal to warrant this added complexity. Instead, the Task Force recommends that the UIAS Management Team pursue enhancements to the whole database, e.g. purchase of table of contents data for the UIAS union catalog.

Recommendation 1: Because the goal is to build the best union catalog with the least human intervention, the Task Force recommends Model 1 as the UIAS structure. For the reasons described above, Model 2, option B, and Model 3 would add to the inconsistencies in the database. Model 2 (both options) also retains a very large quantity of additional records on the server without a particular goal in mind; each library should instead retain a current full backup of its own local system. Model 4 would not achieve sufficient results to warrant its substantial additional complexity.

Recommendation 2: The UIAS Task Force for Database Standards and Management should be asked to help identify bibliographic and authority records for inclusion in the UIAS test file and to help review the UIAS test file results.

Recommendation 3: Upon completion of the union catalog test, if the UIAS union catalog is built in Horizon then the first four campuses that should be loaded in the Horizon system are:

  1. San Diego State
  2. San Jose State
  3. San Francisco State
  4. CSU Northridge.
Since these are the CSU libraries with the largest holdings, this would provide an immediate large UIAS database. However, if we use RetroLink, this process will result in the merging and deduplication of the bibliographic records of all 22 CSU libraries in one batch process. The resulting product from this RetroLink process would be a single set of master records that would then be loaded and indexed in the Horizon system.

Recommendation 4: The following matching algorithm should be used to identify and match duplicate records during the creation of the UIAS as well as on an ongoing basis:

Match point 1: An exact match of the OCLC number (fields 001 and 019)

If there is an exact match on the OCLC number, continue matching algorithm as follows in the order specified:

  • match point 2: An exact match of the ISBN (field 020 subfield a)
  • match point 3: An exact match of the ISSN (field 022 subfield a)
  • match point 4: An exact match of the Government Documents Number (field 086 subfield a)
  • match point 5: An exact match of the Library of Congress control number (LCCN) (field 010 subfield a)
  • match point 6: An exact match of the first five words of field 245 subfield a
For bibliographic records to be considered matches, they must match on match point 1 and on at least one of match points 2 through 6.

If bibliographic records do not match on match point 1, still consider records a match if records match at least two of the match points 2 through 6 above, e.g. a match on ISBN (match point 2) and LCCN (match point 5).

We recommend that the matching algorithm be closely monitored during the pilot database creation phase. It may be necessary to identify additional match elements, primarily to prevent false matches rather than missed matches.

If there is not an exact match based on the above parameters, do not consider the records to be matched and load the records separately into the union catalog.
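The matching rules above can be expressed compactly. The following Python sketch is illustrative only: the BibRecord structure and its field names are hypothetical, each record's OCLC set is assumed to hold its 001 number plus any 019 numbers, identifiers are assumed to be already normalized, and case-folding the 245 subfield a words is an assumption rather than part of the recommendation.

```python
from __future__ import annotations
from dataclasses import dataclass, field


@dataclass
class BibRecord:
    """Hypothetical, simplified view of a MARC bibliographic record."""
    oclc: set[str] = field(default_factory=set)     # 001 plus any 019 values
    isbn: set[str] = field(default_factory=set)     # 020 $a
    issn: set[str] = field(default_factory=set)     # 022 $a
    gov_doc: set[str] = field(default_factory=set)  # 086 $a
    lccn: set[str] = field(default_factory=set)     # 010 $a
    title: str = ""                                 # 245 $a


def title_key(title: str) -> tuple[str, ...]:
    # Match point 6: first five words of 245 $a (case-folding is an assumption).
    return tuple(title.casefold().split()[:5])


def secondary_matches(a: BibRecord, b: BibRecord) -> int:
    """Count how many of match points 2-6 agree exactly."""
    count = 0
    for left, right in ((a.isbn, b.isbn), (a.issn, b.issn),
                        (a.gov_doc, b.gov_doc), (a.lccn, b.lccn)):
        if left & right:  # any shared identifier counts as a match
            count += 1
    # Two records that both lack a title are not counted as a title match.
    if a.title and b.title and title_key(a.title) == title_key(b.title):
        count += 1
    return count


def is_match(a: BibRecord, b: BibRecord) -> bool:
    secondary = secondary_matches(a, b)
    if a.oclc & b.oclc:
        # Match point 1 plus at least one of match points 2-6.
        return secondary >= 1
    # No OCLC match: at least two of match points 2-6 are required.
    return secondary >= 2
```

Records for which is_match returns False would be loaded separately into the union catalog, as stated above.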

Recommendation 5: Use the following master record algorithm to choose the master record from among those records identified as matched as a result of the above matching algorithm.

The choices (in order of preference) for the master record are:

  1. First choice: Cataloging source is DLC and input by DLC and level blank
  2. Second choice: Cataloging source is GPO and input by GPO and level blank
  3. Third choice: DLC in field 040 either in subfield a or c and level blank
  4. Fourth choice: Cataloging source other than DLC or GPO and input source is other than DLC or GPO and level I with 042 field present
  5. Fifth choice: Cataloging source other than DLC or GPO and input source is other than DLC or GPO and level I without 042 field present
  6. Sixth choice: Cataloging source other than DLC or GPO and input source is other than DLC or GPO and level K
If none of the six choices above are relevant to the duplicate records at hand, e.g. non-OCLC records, proceed to select master record using the steps described below.
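As an illustration only, the six choice categories could be assigned as in the following Python sketch. It assumes that "cataloging source" corresponds to field 040 subfield a, that "input by" corresponds to field 040 subfield c, and that the encoding level is passed as a single character with a blank meaning full level; the function name and parameters are hypothetical.

```python
from __future__ import annotations
from typing import Optional

NATIONAL = {"DLC", "GPO"}


def choice_category(source: str, input_by: str, level: str,
                    has_042: bool) -> Optional[int]:
    """Return the master record choice (1 = best) or None if no choice applies.

    source   -- assumed to be 040 $a (cataloging source)
    input_by -- assumed to be 040 $c (input/transcribing agency)
    level    -- encoding level; " " (blank) means full level
    """
    if source == "DLC" and input_by == "DLC" and level == " ":
        return 1
    if source == "GPO" and input_by == "GPO" and level == " ":
        return 2
    if "DLC" in (source, input_by) and level == " ":
        return 3
    if source not in NATIONAL and input_by not in NATIONAL:
        if level == "I":
            return 4 if has_042 else 5
        if level == "K":
            return 6
    return None
```

Checking the choices in order means a DLC/DLC record with a blank level is placed in choice 1 even though it would also satisfy choice 3. A return value of None corresponds to the case in which none of the six choices applies, e.g. non-OCLC records.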

If there are two or more duplicate records from the same choice category, e.g. three DLC/DLC records but two have been enhanced by the local libraries, select the master record based on the number of elements present in that record. The bibliographic record elements in order of most important to least important are:

  1. Electronic location and access (field 856)
  2. Table of Contents (field 505) [Can be dropped if Table of Contents information is loaded for the entire UIAS union catalog as per recommendation below]
  3. Juvenile subject headings (fields 6xx with a second indicator of 1)
  4. Index term-Genre/form (field 655)
  5. Summary information (field 520)
  6. MeSH subject headings (fields 6xx with second indicator of 2)
  7. Record with the most recent cataloging/update date
For example, suppose there are three choice 4 records that are considered matches:

Record #1 is an I level record with an 856 field and LC subject headings.

Record #2 is an I level record with a 655 field and LC subject headings.

Record #3 is an I level record with MeSH subject headings.

Record #1 would then be designated the master record, because the 856 field ranks highest among the elements listed above.

If at any point in the algorithm two or more records could equally be the master record because they have the same elements, the bibliographic record with the latest cataloging/update date should be selected as the master record.
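Taken together, the choice categories, the element ranking, and the date tie-break amount to choosing the candidate with the best composite sort key. The Python sketch below is one possible reading, not a specification: it assumes each record's choice category has already been determined, it compares elements in priority order (so a single 856 outranks any combination of lower-ranked elements, as in the example above), and it uses hypothetical labels such as "6xx_ind2_1" for juvenile subject headings.

```python
from __future__ import annotations
from dataclasses import dataclass, field
from datetime import date
from typing import Optional

# Elements from Recommendation 5, most to least important (labels are hypothetical).
ELEMENT_PRIORITY = ["856", "505", "6xx_ind2_1", "655", "520", "6xx_ind2_2"]


@dataclass
class Candidate:
    record_id: str
    choice: Optional[int]                  # 1-6 from the choice list; None if none applies
    elements: set[str] = field(default_factory=set)
    updated: date = date.min               # latest cataloging/update date


def sort_key(c: Candidate) -> tuple:
    # A lower choice number is better; None ranks after every numbered choice.
    choice_rank = c.choice if c.choice is not None else 99
    # False sorts before True, so "element missing" is encoded as True.
    presence = tuple(e not in c.elements for e in ELEMENT_PRIORITY)
    # The most recent date wins the final tie, so negate it.
    return (choice_rank, presence, -c.updated.toordinal())


def select_master(candidates: list[Candidate]) -> Candidate:
    return min(candidates, key=sort_key)
```

Applied to the three choice 4 records in the example, this selects Record #1. If the 505 element is dropped because table of contents data is loaded for the whole catalog, it would simply be removed from ELEMENT_PRIORITY.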

Recommendation 6: The UIAS master record should not be identified in the public display as the bibliographic record of a particular library but generically as the UIAS master record. However, the source of the master record should be available in the staff mode.

Recommendation 7: The CSU Union List of Serials (ULS) data should be loaded neither as a part of the UIAS union catalog nor as a separate file in the UIAS union catalog.

The UIAS union catalog will contain the majority of serials, including periodicals, and will thereby largely supersede the ULS data. The Task Force is concerned that providing the ULS data as a separate file would be confusing: users would have two separate files to consult for information about CSU periodicals, and the information in the two files would likely not be identical in all cases.

However, several issues pertaining to the ULS data need to be addressed:

  • First, identify the extent of ULS bibliographic records and corresponding holdings information not represented in local OPACs and therefore not included in the UIAS union catalog. This is particularly problematic when the ULS bibliographic record represents a title unique within the CSU.
  • Second, survey each CSU library to determine whether local summary holdings information is available in the local OPAC and where it is stored (bibliographic record? check-in record? which field tags?).
  • Third, determine what holdings information is retrievable from the union catalog using Z39.50.
The Task Force, working with the UIAS Project Manager, searched one periodical title, American Libraries, in each CSU library's local OPAC using the Resource Sharing System (RSS). RSS uses a Z39.50 Version 3 client that will function similarly to the Z39.50 client in WebPAC for NT.

The search results for this one title showed no consistency in how summary holdings information was presented in the local OPACs. The Task Force also found that summary holdings for American Libraries were stored in the check-in record, the bibliographic record, or both.

Four libraries used the check-in record for the summary holdings statement for American Libraries. The remaining CSU libraries (with one exception, where the MARC record was not retrievable) used the bibliographic record to store the summary holdings statement, but in a variety of fields: 590, 850, 852, 866, and 949. With two exceptions, the location and call number information was stored in the 852 field. As indicated above, this issue requires further investigation.

Questions that remain are:

  • Since the current version of Z39.50 does not allow for retrieval of data stored in check-in records, what are the alternatives for the libraries that may have stored holdings data in part or totally in check-in records?
  • Has each CSU library consistently used the same MARC field to store summary holdings information?
  • Are there limitations in WebPAC that would prevent the mapping of holdings data stored in the local catalogs when this data is retrieved and viewed, for example when a library has placed summary holdings in more than one MARC field?
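The second question could be partly answered by script once local records for a sample of serial titles are retrieved. The sketch below assumes each retrieved MARC record has already been reduced to a list of (tag, value) pairs; retrieving and parsing the records over Z39.50 is not shown, and the function names are hypothetical.

```python
from __future__ import annotations
from collections import Counter

# Fields in which the Task Force found summary holdings for American Libraries.
CANDIDATE_TAGS = ("590", "850", "852", "866", "949")


def holdings_tags(record: list[tuple[str, str]]) -> list[str]:
    """Return the candidate tags present in one record, in tag order."""
    present = {tag for tag, _ in record}
    return [t for t in CANDIDATE_TAGS if t in present]


def survey(records_by_library: dict[str, list[tuple[str, str]]]) -> Counter:
    """Count how many libraries use each candidate tag for the same title."""
    counts: Counter = Counter()
    for record in records_by_library.values():
        counts.update(holdings_tags(record))
    return counts
```

Because 852 carried location and call number information in most of the records examined, a hit on 852 alone does not show that the field holds summary holdings; such records would still need to be inspected.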
In conjunction with Recommendation 7, we are not, however, prepared at this time to make any further recommendation on whether the OCLC CSU Union List of Serials (CSU ULS), which also updates other resources such as Melvyl and CARL UnCover, should continue to be maintained. Information supplied to the Task Force by the CSU Chancellor's Office indicates that the CSU ULS is currently updated and that new fiche copies are sent to all CSU libraries twice a year. CSU's participation in CALS, a periodical union list for academic libraries, and in CULP, another ULS fiche product through CLASS, has ceased.

Recommendation 8: Each CSU library should be given the opportunity to designate which bibliographic records should not be included in the UIAS union catalog. However, the Task Force supports as inclusive a representation of local holdings in the UIAS union catalog as possible, including records for materials restricted to local on-site use.

Recommendation 9: It should be determined whether any CSU library has purchased table of contents information for its local OPAC. If so, it should be verified that there are no restrictions on also loading this data into the union catalog.

Recommendation 10: CSU Bakersfield (GEAC Advance), Cal Poly Pomona (INNOPAC), CSU Chico (Horizon), Sonoma State (DRA) and Cal Poly San Luis Obispo (INNOPAC) have been designated by the UIAS Management Team as the pilot test sites for the UIAS union catalog. Records from each of the pilot test sites should be included using the algorithms proposed above, the CSU profile specifications, the Library of Congress authority files, and the other guidelines in these recommendations. A specified number of randomly selected records should also be chosen.
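Drawing the random sample with a fixed seed would let the same records be reloaded and compared after the algorithms or profile are adjusted. A minimal Python sketch, assuming each site's record identifiers are available as a list; the function name, sample size, and seed are placeholders, not values the Task Force has set.

```python
from __future__ import annotations
import random


def sample_records(ids_by_site: dict[str, list[str]], per_site: int,
                   seed: int = 1998) -> dict[str, list[str]]:
    """Draw up to per_site record IDs from each pilot site, reproducibly."""
    rng = random.Random(seed)  # same seed, same input -> same sample
    return {site: rng.sample(ids, min(per_site, len(ids)))
            for site, ids in ids_by_site.items()}
```

Capping the sample at the number of available records keeps a small test file from raising an error when a site has fewer records than requested.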

Specific types of records need to be hand-picked from these catalogs in order to test the following:

  • algorithms herein,
  • the display of holdings including those of periodicals and other serials,
  • the authority control functionality including geographic headings used as corporate headings,
  • the loading of new and revised authority records,
  • and the adding of new, incoming bibliographic records to the database (duplicates and non-duplicates).
The handpicked records (or records created especially for testing) should include duplicate records across campuses in which some of the duplicates are:
  • records with 856 fields
  • records with juvenile subject headings
  • records with MeSH subject headings
  • records with table of contents information
  • records with genre headings
  • records identical but with different cataloging dates
  • records with periodical/serial summary holdings in different places in the local records (e.g. in 850 field, 866 field)
  • analytics for serials and monographic sets
  • the same title in different formats, e.g. printed book and book on tape
  • bibliographic records representing different formats, e.g. serials and non-book materials, including Internet-only resources
  • any other records potentially problematic in creating the union catalog, e.g. those reflecting local practices
Charge 2: Review what authority control strategies, if any, to use on the union catalog.

Recommendation 11: The current version of Library of Congress name and subject authority files should be purchased and loaded onto the UIAS server and used in conjunction with the creation of the UIAS database. The Task Force further recommends that a subscription to the LC name and subject authority files also be purchased and be loaded in a timely fashion during the period that the union catalog is being created and on an ongoing basis.

It is important that the LC authority file data be available when the union catalog is created. Otherwise, many unnecessary brief authority records will be created, and these will be difficult to remove or overlay in an automated fashion. Cleaning them up will require human intervention.

Recommendation 12: As indicated earlier, with recommended model 1, access points in the master record will be under authority control as specified in the CSU UIAS profile.

Recommendation 13: The entire Library of Congress name and subject authority files should be available on the UIAS server. Incoming bibliographic records should be passed against the Library of Congress authority files, with see and see also cross-references not activated (made visible in the public catalog display) until the authority heading is linked to a UIAS bibliographic record. In the staff mode, however, the entire authority files should be searchable and viewable, including Library of Congress authority records not yet linked to an access point in the union catalog.
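The display rule for cross-references can be sketched as a filter over the loaded authority file. In this Python illustration the Authority structure and its linked_bibs counter are hypothetical stand-ins for whatever linking the Horizon software actually maintains: 4xx variants become see references and 5xx headings become see also references only when the authorized heading is linked to at least one bibliographic record. The staff view, which shows every loaded record, is not modeled.

```python
from __future__ import annotations
from dataclasses import dataclass, field


@dataclass
class Authority:
    """Hypothetical, simplified LC authority record as loaded on the UIAS server."""
    heading: str                                       # 1xx authorized heading
    see_from: list[str] = field(default_factory=list)  # 4xx variant forms
    see_also: list[str] = field(default_factory=list)  # 5xx related headings
    linked_bibs: int = 0                               # UIAS bib records using the heading


def public_references(authorities: list[Authority]) -> list[tuple[str, str, str]]:
    """Return (from, to, kind) cross-references to show in the public catalog."""
    refs: list[tuple[str, str, str]] = []
    for auth in authorities:
        if auth.linked_bibs == 0:
            # Loaded and visible in staff mode, but not yet active for the public.
            continue
        refs.extend((variant, auth.heading, "see") for variant in auth.see_from)
        refs.extend((auth.heading, related, "see also") for related in auth.see_also)
    return refs
```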

Recommendation 14: The Library of Congress subject heading access points in the bibliographic records need to be run against both the LC name and subject authority files.

Recommendation 15: As the access points from the UIAS master records are passed against the LC authority files, the vendor should treat an access point from the bibliographic record and either the authorized heading (1xx field) or a former heading in the authority record as the same only if all of their elements are identical. If a bibliographic access point is identical to a former heading or to the 1xx field in an authority record, the 1xx authorized heading in that authority record will be linked to that bibliographic record.

The one exception, in which the two headings need not be identical in all elements, is personal names. The vendor should treat the personal name in the bibliographic record and the authorized heading in the authority record as the same even if the bibliographic record lacks the death date present in the authority record, or the reverse. Apart from the missing death date, the two headings must be identical.
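A sketch of the comparison described in Recommendation 15, in Python. It assumes headings are represented as ordered lists of (subfield code, value) pairs, that dates are carried in subfield d in the form "birth-death," and that the death-date exception applies only when the caller marks the heading as a personal name; these representations and names are assumptions, not part of the vendor's software.

```python
from __future__ import annotations

Heading = list[tuple[str, str]]  # (subfield code, value), e.g. ("a", "Smith, John,")


def _split_dates(value: str) -> tuple[str, str]:
    """Split a $d value such as '1900-1980.' into ('1900', '1980')."""
    birth, _, death = value.rstrip(". ").partition("-")
    return birth.strip(), death.strip()


def headings_match(bib: Heading, auth: Heading, personal_name: bool) -> bool:
    if len(bib) != len(auth):
        return False
    for (code_b, val_b), (code_a, val_a) in zip(bib, auth):
        if code_b != code_a:
            return False
        if val_b == val_a:
            continue
        if personal_name and code_b == "d":
            birth_b, death_b = _split_dates(val_b)
            birth_a, death_a = _split_dates(val_a)
            # The only allowed difference: a death date present in one heading only.
            if birth_b == birth_a and (death_b == "" or death_a == ""):
                continue
        return False
    return True
```

Apart from the death date (and the trailing period that normally accompanies a closed date), any difference between the headings, including punctuation, causes the comparison to fail, consistent with the requirement that the headings otherwise be identical.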

Recommendation 16: Library of Congress update files/tapes will be issued while the union catalog is being created. The vendor should load them as they become available. See also Charge 4 below.

 

Charge 3: Recommend to the UIAS Management Team values for Systems, Cataloging and PAC parameters which will be used to build the union catalog and search the union catalog.

See the separate profile document reviewed and coded by the Task Force. The Task Force carefully reviewed and prepared the Horizon profile in accordance with the training it received from the vendor and with the intent of the RFP.

 

Charge 4: Review and recommend options for maintaining authority control records, if any.

Recommendation 17: A subscription to the Library of Congress authority records should be purchased and the updates and the new authority records loaded in the UIAS in a timely fashion.

Recommendation 18: As new Library of Congress authority records are loaded into the UIAS under the subscription recommended in Recommendation 17, they should replace brief UIAS authority records if all elements in the 1xx, 4xx, or 5xx fields are the same, with the exception of a death date present in either the brief authority record or the full Library of Congress authority record but not in the other. In all other respects, the brief authority record and the full LC authority record must be identical.

Recommendation 19: In the case of Library of Congress updates to previous LC authority records, the updated LC authority record should overlay the previous LC authority record if the authority record number is the same.
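The load rules in Recommendations 18 and 19 could be applied to each incoming LC authority record in the order sketched below in Python. The AuthRecord structure is hypothetical and collapses the 1xx, 4xx, and 5xx fields into a single heading string; the same_heading parameter defaults to exact equality and could be replaced with a comparison that allows the death-date exception. Relinking bibliographic records from a replaced brief record is not shown.

```python
from __future__ import annotations
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class AuthRecord:
    heading: str                  # simplified stand-in for the 1xx/4xx/5xx fields
    lc_control_no: Optional[str]  # None for brief records generated in the UIAS


def load_lc_record(incoming: AuthRecord,
                   lc_file: dict[str, AuthRecord],
                   brief_file: list[AuthRecord],
                   same_heading: Callable[[str, str], bool] = lambda a, b: a == b) -> str:
    """Apply Recommendations 18 and 19 to one incoming LC authority record."""
    if incoming.lc_control_no is None:
        raise ValueError("LC authority records must carry a control number")
    # Rec. 19: an update overlays the earlier LC record with the same control number.
    if incoming.lc_control_no in lc_file:
        lc_file[incoming.lc_control_no] = incoming
        return "overlaid LC record"
    lc_file[incoming.lc_control_no] = incoming
    # Rec. 18: a new LC record replaces a brief record whose headings match.
    for i, brief in enumerate(brief_file):
        if same_heading(brief.heading, incoming.heading):
            del brief_file[i]  # bib links would move to the LC record (not shown)
            return "replaced brief record"
    return "added new LC record"
```

Because the overlay keys only on the control number, this sketch would silently overlay a record whose heading had been split, which is the compound subject heading problem discussed in this section and one reason human review remains necessary.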

We note that updating compound subject headings will be problematic: when such a heading is split into two or more separate headings (e.g. when "Nurses and Nursing" was split into the separate headings "Nurses" and "Nursing"), the Library of Congress often, but not always, reuses the same Library of Congress control number for one of the revised headings. This maintenance issue cannot be resolved without human intervention.

Charge 5: Make recommendations on maintaining the UIAS union catalog.

Recommendation 20: UIAS Management should designate one person as having primary responsibility for the UIAS union catalog database to ensure a current and accurate resource for CSU. In order for this individual to fulfill his/her responsibility, UIAS Management should also plan to invest in some outsourcing of union catalog maintenance and/or investigate with COLD the feasibility of handling some limited ongoing maintenance of the UIAS through some shared approach among the CSU libraries.

The Task Force believes it is unrealistic to expect that the UIAS union catalog can be maintained entirely by machine manipulation. Examples of maintenance requiring human intervention include brief authority records that cannot be overlaid by an automated algorithm and the splitting of compound subject headings. Note also that subject subdivisions might not be altered when primary subject headings are replaced through an automated overlay, which may be another maintenance issue.

Recommendation 21: Any maintenance done in the local system by the master record library should overlay the UIAS master record.

Recommendation 22: Any maintenance or new cataloging done by a non-master record library may result in the non-master record being judged a more preferred record by the algorithm thus resulting in a new UIAS master record.

Recommendation 23: The maintenance and new cataloging done in the local systems should be uploaded to/extracted for inclusion in the UIAS without any separate action by the local library staff.

Recommendation 24: The uploading of new cataloging/maintenance from the local systems to the UIAS should occur in real time without degradation to the local systems.

Recommendation 25: Table of contents information should be made available in the UIAS union catalog. Even if some table of contents data can be obtained by loading data from one or more of the local OPACs, the Task Force recommends investigating what additional table of contents data is available from vendors, what it would cost, and how many records it would be expected to enhance.

Recommendation 26: If table of contents information (i.e. field 505) is purchased and added to the union catalog, this data should be updated annually through a data load or FTP from the vendor. The 505 field does not need to be a protected field.

Recommendation 27: The ongoing review of the performance of the system should be informed by actual search results and statistics.

Recommendation 28: UIAS staff should keep the information in 856 fields across the union catalog current through regular use of a URL checker.
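A minimal URL checker, sketched in Python with only the standard library. It assumes the 856 subfield u URLs have already been extracted per record, and it treats any HTTP status of 400 or above, a malformed URL, or a network error as a failure. The function names are hypothetical, and the checker is passed in as a parameter so the report logic can be exercised without network access.

```python
from __future__ import annotations
import urllib.request
from typing import Callable


def check_url(url: str, timeout: float = 10.0) -> bool:
    """Return True if a HEAD request succeeds with a status below 400."""
    try:
        request = urllib.request.Request(url, method="HEAD")  # ValueError if malformed
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status < 400
    except (OSError, ValueError):
        # URLError, HTTPError, and timeouts are all subclasses of OSError.
        return False


def broken_links(urls_by_record: dict[str, list[str]],
                 checker: Callable[[str], bool] = check_url) -> dict[str, list[str]]:
    """Map each record ID to the 856 $u URLs that failed the check."""
    report: dict[str, list[str]] = {}
    for record_id, urls in urls_by_record.items():
        failed = [u for u in urls if not checker(u)]
        if failed:
            report[record_id] = failed
    return report
```

Some servers reject HEAD requests, so a failing URL should be rechecked (for example with an ordinary GET request) before staff edit the record.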

 

 

Appendix:

Maintenance of union catalog: Comparison between Models 1 and 2

With model 1: UIAS master record with holding libraries listed

1. Record is uploaded from master record library

Results in: master record being replaced (see Recommendation 21).

2. Record is uploaded from non-master record library -- uploaded record can be either a new cataloging record OR maintenance to an existing holding library record

Results in: algorithm being run; if the incoming record, or the existing record with changes, is "better" than the current master record, the master record is replaced. If it is not "better," the incoming record results in the holding library being listed as owning the item if it is not already listed (see Recommendation 22).

3. Withdrawals made by master record library

Results in: listing for the master library being deleted, but the master record remains in the database until a "better" record is uploaded (see step 2 above), because no new record is being submitted for evaluation (i.e. other libraries are represented by holding mnemonic only). If there are no other holdings attached, the master record is deleted. Requires no additional intervention by local staff.

4. Withdrawals made by non-master record library

Results in: listing for the non-master library being deleted. If this is the last copy in the system, the master record is also deleted. Requires no additional intervention by local staff.

With model 2: master record with full records of non-master libraries on UIAS server

1. Record is uploaded from master record library

Results in: same as in model 1

2. Record is uploaded from non-master record library -- uploaded record could be either from a new holding library OR maintenance to an existing holding library record

Results in: the same as with model 1, except that the non-master record is preserved on the server. The uploaded non-master record overlays that library's existing record, if one exists.

3. Withdrawals made by master record library

Results in: same as with model 1

OR

Algorithm selects a new master record from remaining attached bibliographic record(s) & old master record is deleted

4. Withdrawals made by non-master record library

Results in: same as with model 1, except that the full non-master record is deleted
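The four model 1 maintenance cases above can be summarized in a short Python sketch. The UnionEntry structure is hypothetical, and the is_better parameter stands in for the master record algorithm of Recommendation 5; the sketch only shows how uploads and withdrawals change the master record and the holdings list.

```python
from __future__ import annotations
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class UnionEntry:
    master: dict          # hypothetical stand-in for the master bibliographic record
    master_library: str   # library whose record currently serves as master
    holdings: set[str] = field(default_factory=set)


def upload(entry: UnionEntry, library: str, record: dict,
           is_better: Callable[[dict, dict], bool]) -> None:
    if library == entry.master_library:
        entry.master = record                                 # step 1: overlay master
    elif is_better(record, entry.master):
        entry.master, entry.master_library = record, library  # step 2: new master
    entry.holdings.add(library)  # list the holding if not already listed


def withdraw(entry: UnionEntry, library: str) -> Optional[UnionEntry]:
    """Steps 3 and 4: drop the holding; delete the entry when no holdings remain."""
    entry.holdings.discard(library)
    # The master record is kept even if its source library withdrew (step 3).
    return entry if entry.holdings else None
```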
 