Recommendations of the UIAS Interim Task Force for
Database Standards and Management

DRAFT — January 12, 1998

In response to the charge from the CSU UIAS Management Team, the Interim Task Force for Database Standards and Management (Celia Bakke, San Jose State University; Gina Roth, Cal Poly Pomona; Tamara Frost Trujillo (chair), CSU Sacramento), henceforth referred to in this report as "the Task Force," submits these recommendations.

Charge 1: Review the strategies for building a union catalog for the UIAS and recommend to the UIAS Management Team the CSU strategy for building the UIAS union catalog.

At the UIAS Kick-off meeting held at WestEd on August 5, 1997, Ameritech presented three models for structuring the UIAS union catalog: model 1, the Deduplicated, Merged Master Record Model; model 2, the Master Record Linked Bibs Model; and model 3, the Multiple Database Union View Index Model. The Task Force considered these three models for the UIAS structure, and particularly the benefits and disadvantages of each as they relate to maintaining the union catalog and to authority control.

Since models 1 and 2 are both based on a master record, the Task Force compared them to determine whether retaining the entire bibliographic record in model 2 would make long-term maintenance of the union catalog easier and might result in a higher quality database. Although model 2 requires the retention and linking of full bibliographic records to the master record representing the same bibliographic entity, it appears to the Task Force that model 2 would make maintenance of the union catalog neither easier nor substantially different than with model 1, and would not yield a database of better quality (see appendix).

With model 3, each separate database would be maintained independently, i.e. maintenance of one library's record would not need to set any algorithms in motion, as it would with model 1 or 2. In this regard model 3 would involve simpler machine manipulation of files.

With respect to authority control and these three models: in model 1, authority control would be exercised over the set of access points in each master record that CSU specifies to be under authority control. In model 2 there are two options with regard to authority control. Model 2, option A is the same as model 1, i.e. only the access points in the master record specified by CSU would come under authority control. This means that the full bibliographic records retained on the UIAS server and linked to the master record would not be under authority control. Both options of model 2 allow for the possibility of a user viewing the individual linked records from the master record. By retaining all bibliographic records from each CSU campus associated with each master record, this model keeps a very sizeable quantity of bibliographic records online for possible future uses not yet defined. This model also provides another set of backups of each local system as part of the UIAS backup.

With respect to authority control and model 2 there could also be an option B which would require the Ameritech software to authorize all those headings in the master record (same as in model 1 and in model 2 option A) AND authorize any unique headings (not already in the master record) in the linked records. This variation of the master record concept would provide access to those access points only found in the linked records. This might be helpful at times but would also give us additional errors or maintenance problems.

In model 3 authority control would be exercised over each library's database independently. The success of the union catalog view would be dependent on how consistently authority control has been able to replicate the headings across the independent files. For example, if ten CSU libraries hold the same title but the form of heading varies across the ten libraries, the record would be retrieved depending on how successfully authority control can make the forms of heading the same using the authority record for that heading and its cross reference structure. A patron's search may or may not call up all ten records or reflect the holdings of the ten libraries in this case. This model is being used by Indiana University Libraries, but in a direct input mode where all the IU libraries are doing direct input and maintenance to that model.

An additional concern about model 3 was stated by Ameritech at the August 5th meeting where the company's representative indicated that library users "using a Z39.50 client who were accessing the UIAS union catalog from outside the CSU might not have broadcast searching functionality and so would miss the union view provided to a generic web browser."

Recommendation 1: Because the goal is to build the best union catalog with the least human intervention, the Task Force recommends model 1 as the UIAS structure. Model 2 option B and model 3, for the reasons described above, would add to the inconsistencies in the database. Model 2 (both options) also retains a very large quantity of additional records on the server without a particular goal in mind. Each library should be retaining a current full backup of its local system.

Recommendation 2: The UIAS Bibliographic Task Force should be asked to review the UIAS test file and make suggestions and propose corrections.

Recommendation 3: Upon completion of the union catalog test, the first three campuses loaded into the UIAS should be (1) San Diego State, (2) San Jose State, and (3) San Francisco State. Since these are the CSU libraries with the largest holdings, this would provide an immediate large UIAS database of good quality bibliographic records.

Recommendation 4: The following matching algorithm should be used to identify and match duplicate records during the creation of the UIAS as well as on an ongoing basis:

match point 1: An exact match of the OCLC number (fields 001 and 019)

If there is an exact match on the OCLC number, continue matching algorithm as follows in the order specified:

  • match point 2: An exact match of the ISBN (field 020 subfield a)
  • match point 3: An exact match of the ISSN (field 022 subfield a)
  • match point 4: An exact match of the Government Documents Number (field 086 subfield a)
  • match point 5: Library of Congress control number (LCCN) (field 010 subfield a)
  • match point 6: the first five words of field 245 subfield a

For bibliographic records to be considered matches, they must match on match point 1 and one of the following: match point 2, match point 3, match point 4, match point 5, or match point 6.

If bibliographic records do not match on match point 1, still consider records a match if records match at least two of the match points 2 through 6 above, e.g. a match on ISBN (match point 2) and LCCN (match point 5).

We recommend that the matching algorithm be closely monitored during the pilot database creation phase. It may be necessary to identify additional match elements primarily to prevent false matches rather than non-matching.

If there is not an exact match based on the above parameters, do not consider the records to be matched and load the records separately into the union catalog.
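The matching rules in Recommendation 4 can be sketched in Python as follows. This is an illustrative sketch only, not the vendor's implementation. The BibRecord class and its attribute names are hypothetical; the OCLC match is assumed to mean that any number taken from fields 001/019 of one record equals any number from the other; and title words are compared exactly as given, with no normalization of case or punctuation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BibRecord:
    """Hypothetical, simplified stand-in for a MARC bibliographic record."""
    oclc_numbers: frozenset = frozenset()  # match point 1: fields 001 and 019
    isbn: Optional[str] = None             # match point 2: 020 $a
    issn: Optional[str] = None             # match point 3: 022 $a
    govdoc: Optional[str] = None           # match point 4: 086 $a
    lccn: Optional[str] = None             # match point 5: 010 $a
    title: str = ""                        # match point 6: 245 $a


def first_five_words(title):
    # Assumption: words are split on whitespace and compared as given.
    return tuple(title.split()[:5])


def secondary_match_count(a, b):
    """Count how many of match points 2 through 6 agree."""
    count = 0
    for attr in ("isbn", "issn", "govdoc", "lccn"):
        x, y = getattr(a, attr), getattr(b, attr)
        if x is not None and x == y:
            count += 1
    words_a, words_b = first_five_words(a.title), first_five_words(b.title)
    if words_a and words_a == words_b:
        count += 1
    return count


def is_duplicate(a, b):
    """Match point 1 plus one other point, or two of points 2-6 without it."""
    oclc_match = bool(a.oclc_numbers & b.oclc_numbers)
    needed = 1 if oclc_match else 2
    return secondary_match_count(a, b) >= needed
```

Records for which is_duplicate returns False would be loaded separately into the union catalog. Because the Task Force expects the match elements to be tuned during the pilot phase, keeping the match points in one small function like secondary_match_count would make such adjustments easier to test.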

Recommendation 5: Use the following master record algorithm to choose the master record from among those records identified as matched as a result of the above matching algorithm.

The choices (in order of preference) for the master record are:

  • First choice: Cataloging source is DLC and input by DLC and level blank
  • Second choice: Cataloging source is GPO and input by GPO and level blank
  • Third choice: DLC in field 040 either in subfield a or c and level blank
  • Fourth choice: Cataloging source other than DLC or GPO and input source is other than DLC or GPO and level I with 042 field present
  • Fifth choice: Cataloging source other than DLC or GPO and input source is other than DLC or GPO and level I without 042 field present
  • Sixth choice: Cataloging source other than DLC or GPO and input source is other than DLC or GPO and level K

If none of the six choices above are relevant to the duplicate records at hand, e.g. non-OCLC records, proceed to select master record using the steps described below.

If there are two or more duplicate records from the same choice category, e.g. three DLC/DLC records but two have been enhanced by the local libraries, select the master record based on the number of elements present in that record. The bibliographic record elements in order of most important to least important are:

  • Electronic location and access (field 856)
  • Table of Contents (field 505)
  • Juvenile subject headings (fields 6xx with a second indicator of 1)
  • Index term-Genre/form (field 655)
  • MeSH subject headings (fields 6xx with second indicator of 2)
  • Record with the most recent cataloging/update date

For example, if there are three choice 4 records that are considered matches and if

Record #1 is an I level record with a 505 field and LC subject headings,

Record #2 is an I level record with a 655 field and LC subject headings,

Record #3 is an I level record with MeSH subject headings,

Then,

Record #1 would be designated the master record

If at any time during the algorithm two or more records can potentially be the master record because they have the same elements based on the above algorithm, the bibliographic record with the latest cataloging/update date should be selected as the master record.
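The master record selection in Recommendation 5 can be sketched as a sort key. This is a sketch under stated assumptions, not a definitive implementation. The Candidate class and its attributes are hypothetical, and how "cataloging source" and "input source" map to MARC fields is left open. The element ranking is read as a strict priority order (a record with a higher-priority element beats one without it), which is consistent with the worked example above but is an interpretation of the text.

```python
from dataclasses import dataclass
from datetime import date

# Elements in order of most important to least important (labels are hypothetical).
ELEMENT_PRIORITY = ("856", "505", "juvenile", "655", "mesh")


@dataclass
class Candidate:
    name: str
    cat_source: str    # cataloging source, e.g. "DLC", "GPO", or another code
    input_source: str  # inputting agency
    level: str         # encoding level: " " (blank), "I", "K"
    has_042: bool = False
    elements: frozenset = frozenset()  # subset of ELEMENT_PRIORITY
    updated: date = date.min           # latest cataloging/update date


def choice_category(r):
    """Return 1-6 for the six choices, or 7 when none applies."""
    blank = r.level == " "
    if blank and r.cat_source == "DLC" and r.input_source == "DLC":
        return 1
    if blank and r.cat_source == "GPO" and r.input_source == "GPO":
        return 2
    if blank and "DLC" in (r.cat_source, r.input_source):
        return 3
    other = (r.cat_source not in ("DLC", "GPO")
             and r.input_source not in ("DLC", "GPO"))
    if other and r.level == "I":
        return 4 if r.has_042 else 5
    if other and r.level == "K":
        return 6
    return 7


def sort_key(r):
    # False sorts before True, so a record holding a higher-priority
    # element sorts ahead of one that lacks it.
    missing = tuple(e not in r.elements for e in ELEMENT_PRIORITY)
    # Negated ordinal: the most recent date sorts first on a tie.
    return (choice_category(r), missing, -r.updated.toordinal())


def select_master(records):
    return min(records, key=sort_key)


# The worked example from the text: three choice 4 records.
r1 = Candidate("Record #1", "XYZ", "XYZ", "I", True, frozenset({"505"}))
r2 = Candidate("Record #2", "XYZ", "XYZ", "I", True, frozenset({"655"}))
r3 = Candidate("Record #3", "XYZ", "XYZ", "I", True, frozenset({"mesh"}))
print(select_master([r3, r2, r1]).name)  # prints "Record #1"
```

Category 7 stands in for records to which none of the six choices applies (e.g. non-OCLC records); such records then compete on the element ranking and date alone.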

Recommendation 6: The UIAS master record should not be identified as the bibliographic record of a particular library but generically as the UIAS master record.

Recommendation 7: The CSU Union List of Serials (ULS) data should not be loaded either as a part of the UIAS union catalog or as a separate file in the UIAS unconnected to the union catalog.

The UIAS union catalog will include the large majority of serials, including periodicals, thereby in large part superseding the ULS data. The Task Force is concerned that loading the ULS data even as a separate file apart from the union catalog will be confusing, because users will have two separate places to find information about CSU periodicals and the information in these two places will likely not be identical in all (many?) cases. However, several issues pertaining to the ULS data need to be addressed. First, we need to determine the extent to which ULS bibliographic records and associated holdings information are not represented in local OPACs and thereby not included in the UIAS union catalog. This is particularly problematic when the ULS bibliographic record represents a unique title within the CSU libraries. Secondly, we need to ascertain the retrieval capability of local summary holdings information using Z39.50 Horizon software. Each CSU library needs to be surveyed to determine what local summary holdings information is available in local OPACs and what is retrievable from the union catalog environment using Z39.50. Summary holdings information in local systems may be in one of several locations, e.g. in the 850 field of the bibliographic record or in the check-in record.

Given our recommendation above, we are not prepared at this time to recommend whether the OCLC CSU union list of serials, which also updates other resources (e.g. Melvyl), should continue to be maintained.

Recommendation 8: CSU Bakersfield (GEAC Advance), Cal Poly Pomona (INNOPAC), CSU Chico (Horizon), Sonoma State (DRA) and Cal Poly San Luis Obispo (INNOPAC) have been designated by the UIAS Management Team as the pilot test sites for the UIAS union catalog. Records from each of the pilot test sites should be included using the algorithms proposed above, CSU profile specifications, Library of Congress authority control files, and other guidelines from these recommendations. A certain number of random records should be chosen. However, specific types of records also need to be handpicked to test the following: the algorithms herein; the display of holdings, including those of periodicals and other serials; the authority control functionality, including geographic headings used as corporate headings and the loading of new and revised authority records; and the adding of new, incoming bibliographic records to the database (duplicates and non-duplicates). The handpicked records (or records created especially for the purpose of testing) should include duplicate records across the campuses where some among them are:

  • records with 856 fields
  • records with juvenile subject headings
  • records with MeSH subject headings
  • records with table of contents information
  • records with genre headings
  • records identical but with different cataloging dates
  • records with periodical/serial summary holdings in different places in the local records (i.e. in 850 field versus in check-in record)
  • analytics for serials and monographic sets
  • same title in different formats, e.g. printed book and book on tape
  • bibliographic records representing different formats, e.g. serials, non-book including Internet only

Charge 2: Review what authority control strategies, if any, to use on the union catalog.

Recommendation 9: The current version of Library of Congress name and subject authority files should be purchased and loaded onto the UIAS server and used in conjunction with the creation of the UIAS database. The Task Force further recommends that a subscription to the LC name and subject authority files also be purchased and be loaded in a timely fashion during the period that the union catalog is being created. A subscription for the LC Name and Subject Authorities should be in place for maintenance of the union catalog.

It is important that the LC authority file data be available at the time of union catalog creation; otherwise many unnecessary brief authority records will be created, many of which will be difficult to remove or overlay in an automated fashion and will require human intervention, resulting in a cleanup problem.

Recommendation 10: As indicated earlier, with recommended model 1, access points in the master record will be under authority control as specified in the CSU UIAS profile.

Recommendation 11: The entire Library of Congress name and subject authority files should be available on the UIAS server. Incoming bibliographic records should be passed against the Library of Congress authority files, with see and see also cross references not becoming activated (made visible in the catalog display) until the authority heading is linked to a UIAS bibliographic record.
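The activation rule in Recommendation 11 might be sketched as follows. This is only an illustration: the dictionary keys ("id", "heading", "see", "see_also") and the idea of a set of linked authority record identifiers are assumptions made for the example, and the headings shown are invented.

```python
def visible_cross_references(authority_records, linked_ids):
    """Return {reference: authorized heading} for linked authority records only.

    authority_records: list of dicts with hypothetical keys
        'id', 'heading', 'see', 'see_also'.
    linked_ids: ids of authority records linked to at least one
        UIAS bibliographic record.
    """
    visible = {}
    for rec in authority_records:
        if rec["id"] not in linked_ids:
            continue  # full LC file is loaded, but this heading is unused so far
        for ref in rec["see"] + rec["see_also"]:
            visible[ref] = rec["heading"]
    return visible


authority_file = [
    {"id": "n1", "heading": "Doe, Jane", "see": ["Doe, J."], "see_also": []},
    {"id": "n2", "heading": "Roe, Richard", "see": ["Roe, R."], "see_also": []},
]
# Only n1 is linked to a bibliographic record, so only its reference displays.
print(visible_cross_references(authority_file, {"n1"}))
```

Under this reading, the entire authority file resides on the server, while the public display shows only references that lead to headings actually used in the union catalog.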

Recommendation 12: The Library of Congress subject heading access points need to be run against both the LC name and subject authority files.

Recommendation 13: As the access points from the UIAS master records are passed against the LC authority files, vendor should assume that the access point from the bibliographic record and the authorized heading (1xx field) or the former heading from the authority record are the same only if all elements between the two are identical. If a bibliographic access point checks as identical to a former heading or to the 1xx field in the authority record, the 1xx authorized heading in that authority record will be linked to that bibliographic record.

The one exception, when all elements between the two headings do not have to be identical, is in the case of personal names. Vendor should assume that the personal name from the bibliographic record and the authorized heading from the authority record are the same even if the bibliographic record lacks the death date present in the authority record, or the reverse. With the exception of the death date being absent, these two headings must otherwise be identical.
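The comparison described in Recommendation 13, including the death date exception, can be sketched as follows. This is a sketch under assumptions, not the vendor's logic: headings are modeled as tuples of (subfield code, value) pairs with punctuation already stripped, dates are assumed to sit in subfield d in the form "1900-1980" or "1900-", and the authority record is a dictionary with hypothetical keys "heading" (the 1xx field) and "former" (former headings). The names used are invented.

```python
import re


def drop_death_date(dates):
    """'1900-1980' -> '1900-'; any other value is returned unchanged."""
    m = re.fullmatch(r"(\d{4}-)\d{4}", dates)
    return m.group(1) if m else dates


def dates_compatible(x, y):
    # Equal, or one value is the other with only its death date removed.
    return x == y or drop_death_date(x) == y or drop_death_date(y) == x


def headings_match(bib, auth, personal_name=False):
    """True if all elements are identical, allowing the death date exception."""
    if bib == auth:
        return True
    if not personal_name or len(bib) != len(auth):
        return False
    for (code_b, val_b), (code_a, val_a) in zip(bib, auth):
        if code_b != code_a:
            return False
        if val_b != val_a and not (code_b == "d" and dates_compatible(val_b, val_a)):
            return False
    return True


def authorized_heading(bib, authority, personal_name=False):
    """Return the 1xx heading to link to, or None if nothing matches."""
    for candidate in [authority["heading"]] + authority["former"]:
        if headings_match(bib, candidate, personal_name):
            return authority["heading"]
    return None


bib = (("a", "Doe, Jane"), ("d", "1900-"))
auth = {"heading": (("a", "Doe, Jane"), ("d", "1900-1980")), "former": []}
print(authorized_heading(bib, auth, personal_name=True) == auth["heading"])  # prints True
```

Note that two headings with different death dates (e.g. 1900-1980 and 1900-1985) are not treated as the same under this sketch; only the absence of the death date on one side is tolerated.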

Recommendation 14: Library of Congress update files/tapes will be issued during the period the union catalog is being created. Vendor should load these as they become available. See also charge 4 below.

Charge 3: Recommend to the UIAS Management Team values for Systems, Cataloging and PAC parameters which will be used to build the union catalog and search the union catalog.

See separate profile document reviewed and coded by Task Force.

Charge 4: Review and recommend options for maintaining authority control records, if any.

Recommendation 15: A subscription to the Library of Congress authority records should be purchased and updates to new authority records loaded in the UIAS in a timely fashion.

Recommendation 16: As Library of Congress updates to new authority records are loaded into the UIAS as a result of the subscription recommended in #15, new Library of Congress authority records should replace brief UIAS authority records if all elements in the 1xx field are the same, with the exception of a death date being present in either the brief authority record or the full Library of Congress authority record but not the other. In all other respects the brief authority record and the full LC authority record must be identical.

Recommendation 17: In the case of Library of Congress updates to previous LC authority records, the updated LC authority record should overlay the previous LC authority record if the authority record number is the same.
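Recommendations 16 and 17 together describe what happens when an LC authority record arrives in an update. A minimal sketch follows, assuming the authority file is a list of dictionaries with hypothetical keys "record_number" (None for a brief UIAS record) and "heading" (the 1xx field). The same_heading parameter is an assumed hook: it defaults to strict equality, and a comparison that tolerates a missing death date could be supplied in its place.

```python
def load_lc_update(authority_file, lc_record, same_heading=lambda a, b: a == b):
    """Apply one incoming LC authority record; return what was done."""
    # Recommendation 17: an update overlays the previous LC record
    # when the authority record number is the same.
    for i, rec in enumerate(authority_file):
        if rec["record_number"] is not None and rec["record_number"] == lc_record["record_number"]:
            authority_file[i] = lc_record
            return "overlaid"
    # Recommendation 16: a new LC record replaces a brief UIAS record
    # whose 1xx heading is the same.
    for i, rec in enumerate(authority_file):
        if rec["record_number"] is None and same_heading(rec["heading"], lc_record["heading"]):
            authority_file[i] = lc_record
            return "replaced_brief"
    # Otherwise the record is simply new to the file.
    authority_file.append(lc_record)
    return "added"


afile = [
    {"record_number": None, "heading": "Doe, Jane"},        # brief UIAS record
    {"record_number": "n 001", "heading": "Roe, Richard"},  # existing LC record
]
print(load_lc_update(afile, {"record_number": "n 002", "heading": "Doe, Jane"}))
# prints "replaced_brief"
```

The sketch checks the record number before the heading, which is one possible ordering; the recommendations do not state which test should come first.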

We want to note that the updating of compound subject headings will be problematic because when such subject headings split into two or more separate headings (e.g. when Nurses and Nursing split into two separate subject headings, Nurses and Nursing), the Library of Congress often but not always re-uses the same Library of Congress control number for one of the revised headings. This is a maintenance issue that cannot be resolved without human intervention.

Recommendation 18: Former headings in Library of Congress authority records should be run against the UIAS database and the new authorized 1xx substituted if all elements in the former heading from the authority record and the UIAS bibliographic record are identical, except for death dates as described in #16 above.

Charge 5: Make recommendations on maintaining the UIAS union catalog.

Recommendation 19: UIAS Management should designate one person as having primary responsibility for the UIAS union catalog database to ensure a current and accurate resource for CSU. In order for this individual to fulfill his/her responsibility, UIAS Management should also plan to invest in some outsourcing of union catalog maintenance and/or investigate with COLD the feasibility of handling some limited ongoing maintenance of the UIAS through some shared approach among the CSU libraries.

The Task Force feels it is unrealistic to think that the UIAS union catalog will be able to be maintained entirely by machine manipulation. Examples of maintenance that will require human intervention include the brief authority records that will not be able to be overlaid by an automated algorithm and the splitting of compound subject headings. It should be noted that subject subdivisions might not be altered when primary subject headings are replaced through an automated overlay. This may be another maintenance issue.

Recommendation 20: Any maintenance done by the master record library in the local system should overlay on the UIAS master record.

Recommendation 21: Any maintenance or new cataloging done by a non-master record library may result in the non-master record being judged a more preferred record by the algorithm thus resulting in a new UIAS master record.

Recommendation 22: The maintenance and new cataloging done in the local systems should be uploaded to/extracted for inclusion in the UIAS without any separate action by the local library staff.

Recommendation 23: The uploading of new cataloging/maintenance from the local systems to the UIAS should occur in real time without degradation to the local systems.

Recommendation 24: Table of Contents information should be made available in the UIAS union catalog.

Appendix:

Maintenance comparison between Models 1 & 2

With model 1:

UIAS master record with holding libraries listed.

  1. Record is uploaded from master record library. Results in: master record being replaced (see recommendation 20)
  2. Record is uploaded from non-master record library. Uploaded record can be either a new cataloging record OR maintenance to an existing holding library record. Results in: algorithm being run and if incoming record or existing record with changes is "better" than current master record, master record is replaced. If not "better", incoming record results in holding library being listed as owning if not already listed as having (see recommendation 21).
  3. Withdrawals made by master record library. Results in: Listing for the master library being deleted, but master record remains in database until "better" record is uploaded (see step 2 above) because no new record is being submitted for evaluation (i.e. other libraries are represented by holding mnemonic only). If there are no other holdings attached, master record is deleted.
  4. Withdrawals made by non-master record library. Results in: listing for the non-master library is deleted. If this is the last copy in the system, then the master record will also be deleted.
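The four model 1 maintenance steps above can be sketched as two operations on a union catalog entry. This is an illustrative sketch only. The UnionEntry class is hypothetical, and is_better is an assumed callback standing in for the master record algorithm of Recommendation 5. Tracking the master record library is internal bookkeeping for this example and is not meant to contradict Recommendation 6 on how the master record is labeled.

```python
from dataclasses import dataclass, field


@dataclass
class UnionEntry:
    master: dict          # the current UIAS master record
    master_library: str   # library whose record was chosen as master
    holdings: set = field(default_factory=set)


def upload(entry, library, record, is_better):
    """Steps 1 and 2: a new or changed record arrives from a library."""
    entry.holdings.add(library)  # list the library as holding, if not already
    if library == entry.master_library or is_better(record, entry.master):
        entry.master = record
        entry.master_library = library


def withdraw(entry, library):
    """Steps 3 and 4: return the entry, or None once no holdings remain."""
    entry.holdings.discard(library)
    # The master record itself stays in place until a better record is
    # uploaded, even when the withdrawing library supplied it.
    return entry if entry.holdings else None


def more_fields(new, current):
    # Toy comparison for the example only.
    return len(new) > len(current)


entry = UnionEntry({"245": "Title"}, "SJSU", {"SJSU"})
upload(entry, "SDSU", {"245": "Title", "505": "Contents"}, more_fields)
print(entry.master_library)  # prints "SDSU"
```

Under model 2 the same two operations would also have to store, overlay, or delete the full linked record of each non-master library.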

With model 2:

Master Record with full records of non-master libraries on UIAS server

  • Record is uploaded from master record library. Results in: same as in model 1.
  • Record is uploaded from non-master record library. Uploaded record could be either a new holding library OR maintenance to an existing holding library record. Results in: The same as with model 1 except that non-master record is preserved on server. Non-Master record that was uploaded overlays the existing record from that library if previous record exists.
  • Withdrawals made by master record library. Results in: same as with model 1, OR the algorithm selects a new master record from the remaining attached bibliographic record(s) and the old master record is deleted.
  • Withdrawals made by non-master record library. Results in: Same as with model 1 except the full non-master record is deleted.