Building the Union Catalog for Pharos™
Directions for Extraction of GAP Bibliographic Records from CSU Library Systems

Initial Build

The first stage of the two-stage process of building the Pharos Union Catalog has been completed. Using the 11+ million records extracted from the 22 CSU libraries in late 1998, the initial build of the union catalog was completed in July 1999. The Task Force for Database Standards and Management and Ying Liu, the Pharos Database Manager, have been reviewing the 3.5+ million records loaded into the Horizon system that hosts the union catalog.

The Horizon database used for the union catalog was preloaded with the Library of Congress Name and Subject Authority records. These records have been updated to reflect the changes that have been made through July 2000 by LC and will continue to be updated on an ongoing basis with loads of new and updated authority records from LC.

The English-language monograph and serial records in the union catalog database with a publication date later than 1991 have been extracted and sent to Blackwell's, where they were enriched with table of contents (ToC) and summary information. The 1996 through 1998 enriched records have now been loaded back into the union catalog database. As new records are added to the union catalog, they will be sent to Blackwell's for ToC and summary enrichment processing.

Maintenance

Numerous test databases were created using a maintenance program that implements the master record algorithm. This is the same algorithm used by the program that merged the initial 11.5+ million bibliographic records extracted in late 1998 into the single file of 3.5+ million records. The maintenance program was also tested in late 1999 and early 2000 by loading gap records extracted from San Francisco State's catalog into the live union catalog database.

The maintenance program initially matches similar bibliographic records. Once a match has been made, the maintenance program then determines if the incoming record should become the master record, again using the rules described in the algorithm. Another important step is the creation of a new field in this record that is used by the Pharos software to create a hypertext link to the local catalog. A user can click on this link to obtain the local call number and circulation status of the holdings in that library.

The maintenance program is designed to accommodate new records, updated records, and deleted records. Depending on the status of the existing record in the union catalog, a new record can be created, or an existing record can be modified or deleted from the database. The maintenance program has been created and extensively reviewed to ensure that the union catalog will continue to keep pace with each CSU online catalog.
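The three outcomes described above can be pictured with a short Python sketch. It is only an illustration, not the Pharos maintenance program: the real matching step and the master record algorithm are not reproduced, and the names GapRecord, apply_gap_record, match_key, and the action labels are hypothetical. The sketch also assumes that a union catalog record is removed only when no library still holds the title.

```python
from dataclasses import dataclass


@dataclass
class GapRecord:
    match_key: str   # hypothetical stand-in for the real matching step
    campus: str      # submitting library, e.g. "chico"
    action: str      # "add", "update", or "delete" (labels are assumptions)
    title: str = ""


def apply_gap_record(catalog: dict, rec: GapRecord) -> str:
    """Apply one gap record to `catalog` and report what happened."""
    existing = catalog.get(rec.match_key)

    if rec.action == "delete":
        if existing is None:
            return "ignored"              # nothing in the catalog to delete
        existing["holders"].discard(rec.campus)
        if existing["holders"]:
            return "modified"             # another library still holds the title
        del catalog[rec.match_key]
        return "deleted"

    if existing is None:
        catalog[rec.match_key] = {"title": rec.title, "holders": {rec.campus}}
        return "created"

    existing["holders"].add(rec.campus)
    if rec.action == "update":
        # Assumption: the update simply wins. The real program applies the
        # master record rules before replacing anything.
        existing["title"] = rec.title
    return "modified"
```

Keying the catalog on a single match value keeps the sketch small; in practice the outcome also depends on which record the algorithm selects as the master.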

We are now ready to receive a file of "gap" records extracted from your library's local system.

Gap File: Defined

A gap file represents those records that update, add to, or delete records from each library's local online catalog database. For the purposes of the Pharos union catalog, these gap file procedures pertain to bibliographic records only. Updating of authority records will be done by the Pharos project office. Updating of item records is not relevant to the updating of the master records in the union catalog. For example, a record changed only because a library added a second copy of the same edition of a title should not be submitted, although there is no harm done to the union catalog database if such a record is submitted.

Real Time Updating

A real-time, seamless interface that could be used to update the union catalog with records from your library's catalog is not yet available. A plan to implement this type of interface is under development. Until the real-time interface is available, the currency of the union catalog will depend on the submission of gap files of bibliographic records. Our goal is to keep the union catalog current with your local catalog, but first we have over twelve months of cataloging activity to digest before we can narrow the gap. We are unsure of the number of records that we will obtain in this first of what will probably be many gap files, but knowing that library budgets were uncommonly flush in the last year, we expect the number to be significant.

A possible set of gap files would include:

  1. Bibliographic records added to your system since the date of the initial extraction of records for the Pharos union catalog.  Please check with Ying Liu if you are unsure of the date of the initial extraction of records for the union catalog. We expect that your library has added many new bibliographic records to your catalog due to the additional funding that was available to CSU libraries for purchasing new materials for library collections in FY 1999 & 2000. The users of the union catalog will certainly want to find records for these important new acquisitions.
  2. Bibliographic records that have been modified since the date of the initial extraction of records for the Pharos union catalog. A record counts as modified if any maintenance has been performed on the bibliographic record since the extraction date. Updates and new records may be submitted as an integrated file or as separate files based on local preference or capability. Again, the extraction date for the update and new record file(s) should be noted by the local library as well as submitted to the Database Manager.
  3. Bibliographic records that have been deleted from your local catalog since the date of the initial extraction of records for the Pharos union catalog. Since the initial extraction of bibliographic records for the creation of the union catalog, campuses have either been maintaining a paper copy of records deleted from the local catalog database or marking/flagging the deleted bibliographic record in the local database.
    1. Paper deletes
      For campuses maintaining a paper file of deleted records: Please send these print-outs to:

                  Ying Liu
                  Pharos Database Manager
                  Office of the Chancellor
                  California State University
                  401 Golden Shore, 2nd Floor
                  Long Beach, California  90802-4210

      To avoid the record keeping associated with printing out deleted records, the Pharos project office and the Task Force for Database Standards and Management will be working with each library to devise a method of submitting deleted records electronically.

    2. Electronic deletes
      Electronic deletes fall into two categories: (1) deletes from systems in which flagging/marking of deletes is not possible, which requires such records to be submitted as a separate electronic file identified as deleted records; and (2) deletes from systems in which flagging/marking of deletes is possible, which makes it possible to submit deletes in an integrated file together with updates and new records. With the second category of records, libraries must identify the flag (code) used and the location of the code in the record. (The flag to mark deleted records cannot have multiple uses.)
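
The three kinds of gap records listed above can be told apart with a small Python sketch. It is an illustration only: actual extraction is done with each library system's own tools, and the field names (control_number, created, modified, flag), the flag value "d", and the function name classify_gap are all hypothetical. The sketch assumes that every record exported from the local system carries a creation date, a last-modified date, and, on systems that support it, a delete flag.

```python
from datetime import date


def classify_gap(records, initial_extraction, delete_flag="d"):
    """Sort local records into new, updated, and deleted gap groups."""
    gap = {"new": [], "updated": [], "deleted": []}
    for rec in records:
        # Assumption: records created on or before the initial extraction
        # date were included in the initial build of the union catalog.
        in_initial_build = rec["created"] <= initial_extraction
        if rec.get("flag") == delete_flag:
            # Only records that were part of the initial extraction count as
            # deletes; a record added and removed afterwards was never sent.
            if in_initial_build:
                gap["deleted"].append(rec["control_number"])
        elif not in_initial_build:
            gap["new"].append(rec["control_number"])
        elif rec["modified"] > initial_extraction:
            gap["updated"].append(rec["control_number"])
        # Anything else is unchanged and is not part of the gap.
    return gap
```

The three groups can be written out as separate files or as one integrated file, depending on local preference or capability.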

When the records have been extracted, they can be sent to the Pharos Project Office via SFTP [sftp.calstate.edu Username: MyCampus Password: unique]. Contact Ying Liu at 562 951-4261 to get your actual username and password. Ying Liu will begin processing your records as soon as they are received. Please contact Ying Liu to discuss the gap record extraction timetable that would best suit your library's situation. Because we will be processing these files one at a time, we are not setting a deadline for completion. Rather, we would like some libraries to volunteer to extract as soon as possible, while other libraries can choose a later date. As libraries commit to a time frame, this information will be posted on a UIAS web site.

The gap file extraction process will vary according to your library's automation system software and equipment configuration. Common to all, however, are the following issues that need to be considered when planning and carrying out the bibliographic extraction:

  1. No changes should be made to bibliographic records already in your database while you are in the process of extracting the gap records from your database. This is necessary in order to ensure the integrity of each library's records. New records, however, can still be added to the bibliographic database during the extraction process. We expect that your gap file record extraction process will take far less time than the initial extraction and that the period during which database edits are restricted will also be brief. We want this gap extraction process to cause little or no disruption to the schedules of any of your library personnel. Please contact Ying Liu at 562 951-4261 to review the requirements for this extraction and to schedule a date to start your gap file extraction.
  2. The library system's internal bibliographic control number must be included in all extracted records. We would like this number to be located in the same field in which the original extraction placed the internal control number. If you suspect that a change to your system has altered this, we strongly recommend that you extract one test record and send it to Ying Liu for analysis. A web page has been established where you can check what was recorded for the initial extraction. This number is crucial to the operation of the union catalog because it is the "hook" back to the local system, which enables the union catalog to display call number(s) and circulation status for each title. Please indicate which MARC field will be used to store the internal control number and supply this information to the Pharos Database Manager. Note: The program implementing the master record algorithm to merge records will use the 035 field. Placing the internal control number in the 035 could cause this program to merge two bibliographic records that should not merge. For this reason it is recommended that the 035 field NOT be used to store the internal control number. (This "Hook-to-Holdings" capability can be viewed at http://Pharos.calstate.edu:5080/webpac/pharosstart.html using the "Union Catalog" search option.)
  3. Classes of records that are present in the local system but are not suitable for inclusion in the union catalog should be identified and, preferably, removed or excluded from the file of records extracted from your local library system. Such records could include faculty-owned copies of materials on Reserve, items such as room keys, titles withdrawn from the library, payment records, records used for check-in purposes, and other items that would not normally be considered part of a library's collection. Records like the ones described above may already be "suppressed" from public view in your OPAC. If this is the case, it is also likely that your library automation vendor could assist you in mapping this characteristic to an otherwise unused MARC field. Having this information in the records that you extract will enable the programs used to build the union catalog to exclude these records during the load or suppress them from public view.
  4. The date that the gap file extraction is completed should be noted and this information sent to the Pharos Database Manager. Keeping track of the date and last record number extracted will be important when the next file of gap records is requested later in the year. Please send a copy of this information to yliu@calstate.edu. Do not extract any additional records until notified by the Pharos Database Manager.
  5. Send any paper copies that your library has saved of bibliographic records deleted from your local system since the initial extraction of records to Ying Liu, Pharos Database Manager, Office of the Chancellor, California State University, 401 Golden Shore, Long Beach, CA 90802-4210. These printouts will be used by the Union Catalog Database Manager to edit the union catalog to reflect changes at the campus level between the time the initial extraction occurred and the present.
  6. Send computer files containing only deleted records to the Pharos project office via SFTP [sftp.calstate.edu Username: MyCampus Password: unique]. Contact Ying Liu at 562 951-4261 to obtain your username and password.
  7. For the purposes of the Pharos Union Catalog, a deleted record is a bibliographic record that was in your system's database at the time of the initial extraction and that has now been removed because the library no longer owns a copy of the material represented by that bibliographic record. Thus, a library that deletes a record from its local system because it is discovered to be a duplicate entry would not need to make a printout of that deletion, since the title would still remain in the database.
  8. Please contact the Pharos Database Manager at 562 951-4261 to schedule a date to extract your first gap file. As we determine the number of bibliographic records contained in all of the 22 CSU libraries' gap files and the rate at which these records can be processed by the new maintenance program, we will be able to anticipate when we will begin a regular schedule of extracting records.
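
Items 2 and 3 above lend themselves to a quick check before a file is sent. The following Python sketch is one way such a check might look, not a tool supplied by the Pharos project office. Records are simplified to a dictionary that maps a MARC tag to a list of values, and the function name check_gap_file and the idea of a dedicated suppression tag are hypothetical. Only the rule that the control number should not be stored in the 035 field comes from these directions.

```python
def check_gap_file(records, control_tag, suppress_tag=None):
    """Return (records_to_send, problems) for a list of simplified records."""
    if control_tag == "035":
        # Item 2: the merge program uses the 035 field for matching.
        raise ValueError("035 is used by the merge program; choose another field")

    to_send, problems = [], []
    for position, rec in enumerate(records, start=1):
        # Item 3: leave out records the library has marked as unsuitable.
        if suppress_tag and rec.get(suppress_tag):
            continue
        # Item 2: every extracted record needs the internal control number.
        if not any(value.strip() for value in rec.get(control_tag, [])):
            problems.append(f"record {position}: no control number in {control_tag}")
            continue
        to_send.append(rec)
    return to_send, problems
```

Running a check like this on a single test record first is a cheap way to confirm that the control number still lands in the field reported for the initial extraction.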

The gap record extraction process described here is similar to, but smaller in size than, the extraction process that you followed in late 1998. We expect that this extraction process will be less burdensome for your staff because of the smaller size of the extraction. Nevertheless, we expect that questions and problems will arise. Please call Ying Liu at 562 951-4261 to discuss ways to make this process as easy as possible or to work out problems that you encounter. Alternatively, send your comments to Ying Liu at yliu@calstate.edu.

Ongoing Submittal of Gap Files

The first gap file will be larger than any subsequent gap file because of the time that was required to create the union catalog. Subsequent gap files should be submitted on a regular basis. When the first round of gap files has been processed, we will have a better idea of when we can ask libraries to begin routinely submitting gap files. At that time libraries will be able to choose to submit gap files on either a daily or a weekly basis. All update gap files must be submitted and processed in the order in which they are created by the library. We will be working with each library to ensure that gap file names include information that identifies the source and date of extraction. An example of this would be: CCH04232000NEW. Translated, this name would indicate that CSU Chico extracted these new records on April 23, 2000.
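
The naming example and the ordering requirement can be illustrated with a short Python sketch. It generalizes from the single example CCH04232000NEW, so the pattern (a three-letter source code, an eight-digit MMDDYYYY date, and a letter suffix) is an assumption rather than an agreed standard, and the function names are hypothetical.

```python
import re
from datetime import datetime

# Assumed layout, inferred from the one example name: SSS + MMDDYYYY + suffix.
NAME_PATTERN = re.compile(r"^([A-Z]{3})(\d{8})([A-Z]+)$")


def parse_gap_name(name):
    """Split a gap file name into (source, extraction_date, kind)."""
    match = NAME_PATTERN.match(name)
    if match is None:
        raise ValueError(f"unrecognized gap file name: {name}")
    source, stamp, kind = match.groups()
    extracted = datetime.strptime(stamp, "%m%d%Y").date()
    return source, extracted, kind


def in_processing_order(names):
    """Order file names by extraction date, oldest first."""
    return sorted(names, key=lambda name: parse_gap_name(name)[1])
```

Because the date is written month first, the names do not sort chronologically as plain text, which is why the sketch parses the date before ordering the files.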

You can search the CSU union catalog at http://Pharos.calstate.edu:5080/webpac/pharosstart.html using the "Union Catalog" link. Even though the indexing of these 5 million records is now complete, fine-tuning of the searching capabilities offered through the Pharos Web interface continues.

Directions for the SFTP transfer of records to the Pharos Union Catalog:

SFTP: sftp.calstate.edu

Username: MyCampus

Password: unique

You can transfer records directly to your campus's root directory when using your campus's special username. There is no need to change directories.

bakersfield
chico
channelislands
dominguezhills
fresno
fullerton
hayward
humboldt
longbeach
losangeles
maritimeacademy
montereybay
northridge
pomona
sacramento
sanbernardino
sandiego
sanfrancisco
sanjose
sanluisobispo
sanmarcos
sonoma
stanislaus

Directions for

Innovative systems
Geac Systems
Horizon Systems
DRA Systems
Endeavor Systems


Last Updated: January 28, 2008