Database Maintenance and Standards Task Force
Pharos Union Catalog: Maintenance Issues - July 1999

Issue: DBMS Committee should assist in reviewing the ongoing building of the union catalog. Possible strategies:

The 11.7 million CSU bibliographic records extracted in late 1998 and early 1999 have been processed at Ameritech by a program that implemented the Master Record Algorithm. This program merged these records into one file of 3,525,760 records. These 3.5+ million records were then loaded and indexed [completion date: July 24, 1999] by the Horizon system being used to host the CSU Union Catalog. Reports will now be created that can be used to identify the cleanup tasks and the magnitude of these tasks.

The Horizon Client software, recently sent out to members of the Database Task Force on a CD-ROM, will need to be installed so that the Task Force members can look at the full MARC records in the Union Catalog. The Pharos gateway server also provides access to the union catalog and displays MARC records, but not all fields are displayed.

We know that some cleanup is required, but I would like the Task Force to determine if the catalog is still usable by the public. We will be anxious to extract the gap records from the 22 CSU libraries so that we can update the catalog for use during the fall term. Prior to that, however, the Task Force will need to review a new database that has been created using the Horizon system that also implements the Master Record Algorithm. This database is named Pharos and needs to be compared to Pharos5, the name of the database that was created by Steve Yates' program.

Issue: Foreign initial articles. On Feb. 18 in minutes we indicated that an exception list would have to be created in order to deal with each instance where a foreign article is not really a foreign article -- Los Alamos, Los Angeles. How does this relate to the indexing currently going on? By when is this exception list needed?
We still need this exception list. The list needs to include the initial article and the phrase in which we want the exception to apply. For example, Los is one initial article exception, and we then need to list each instance in which we want the exception to be applied: Los Angeles, Los Alamitos, Los Gatos, Los Alamos, etc. I have made an entry for Los Angeles using the Horizon union catalog table editor. The change took effect immediately; no re-indexing was needed. If you use Pharos to browse for Los Angeles as a title, the browse search drops you into the Los section of the title index. Prior to this change a similar search dropped you into the Angeles section of the index.

Issue: Hook to holdings for serials -- status and what needs to be done

The mechanism is already in place for this. The CSU libraries just need to make sure that they are providing a summary holdings statement in the MARC record. The full record display in the Journal Access Core Collection database displays hooks to all CSU library online catalogs and the MELVYL PE database. This hook is made with an ISSN. It works often but not always.

Issue: ULS -- As far as we know this is on hold until the union catalog is operational and CSU has an opportunity to evaluate need for continuing ULS. Is this correct?

There has not been any new discussion about loading the CSU ULS into the union catalog database or a separate database. The MELVYL PE database does contain the CSU ULS information and this database can be searched in Pharos in two ways. It can be selected for a direct search or it can be searched from any number of abstracting and indexing databases in a cross database search. See the example above.

Issue: Indexing - What remaining issues are there?

Now that the initial load and indexing of the bibliographic records is complete, we can look at the list of desired indexes and see if they match our expectations.
Remember that the indexes available via the Staff PAC client, using the Horizon 5.1 client software, will be different than the indexes available in the Pharos web interface. We can add any indexes to the Pharos web interface that are available in the Staff PAC client interface. At some point this Fall, probably after the first group of gap records has been loaded and the IBM Server has been relocated to WestEd, we will want to implement the ProIndexing software so that we can add more and faster indexes to the union catalog. This step would also require the implementation of some new Web client software, currently being developed by Ameritech.

Issue: The Jan. 26th authorities recommendations (version 3) -- Was this shared with Ameritech? Response?

This was handled back in February and the issues were resolved in a conference call.

Issue: TOC -- Where are the contract negotiations? How will we incorporate the TOC given that it has not happened as planned with regard to the timetable?

A contract with Blackwell’s has been signed for the enrichment of the union catalog records with the Blackwell’s file of TOC and summary information. This contract provides an opportunity for the individual CSU libraries to create a custom profile that Blackwell’s could use to enrich the individual CSU catalogs. The negotiated price for this aspect of the contract is $0.90 per record. I have asked Ameritech's Paul Johnson to extract the bib records from the Horizon system that have a publication date later than 1991. These records will be sent to Blackwell’s via FTP. Blackwell’s has indicated that it will take them two weeks to process these records. When they complete the initial enrichment process, Blackwell’s will FTP the enriched bib records back to Ameritech, where we will reload the records into the Horizon system.

Issue: 590 fields -- Will this field be displayed in the public mode? Is our understanding correct that we will be able to delete this field or any field from public display with ease if we find that local information is more confusing than helpful after we see this?

We can display or not display any field in the MARC record. The 590 is no different.
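
As a rough illustration of what suppressing a field from public display amounts to, here is a minimal Python sketch. The record layout (a list of tag and value pairs), the sample data, and the names SUPPRESSED_TAGS and public_fields are assumptions made for illustration only; they do not describe how Horizon or the Pharos gateway is actually configured.

```python
# Hypothetical sketch: hide selected MARC tags from the public display.
# The record layout and all names below are illustrative assumptions,
# not Horizon's actual display configuration.

SUPPRESSED_TAGS = {"590"}  # local notes; add or remove tags as needed


def public_fields(record, suppressed=SUPPRESSED_TAGS):
    """Return only the fields that should appear in the public display."""
    return [(tag, value) for tag, value in record if tag not in suppressed]


# Made-up sample record as (tag, value) pairs.
record = [
    ("245", "Los Angeles : a guide"),
    ("590", "Library copy signed by author"),
    ("650", "California -- Guidebooks"),
]

for tag, value in public_fields(record):
    print(tag, value)
```

Under these assumptions, putting the 590 back into the public display is a matter of removing it from the suppressed set; the stored record itself is not changed.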
This is a good idea. This sounds like a rewrite of the Master Record Algorithm in plain English.
We will have to deal with these issues through the Remote Patron Authentication Server setup, as well as by working with vendors that are providing a licensed service via HTTP. This will be a campus-by-campus solution. This issue should probably be included in a discussion about cataloging electronic resources.
Recommendations for Maintenance of the Pharos Union Catalog

Maintenance after creation of the union catalog: After the creation of the union catalog, the DBMS Task Force should review the database and indicate what corrections need to be made and in what priority order before the catalog is made available to the public.

The Management Team will probably want to use the union catalog while the clean-up work is being done, unless the Database Task Force finds the catalog contains seriously misleading information.

In general, priority should be given to reducing any unnecessary multiple authority records which result in multiple variations for the same access point.

I agree, but I would put as the first priority the identification and deletion of records loaded into the union catalog that actually do not describe bibliographic materials.

Top priority should also be given to other corrections that impede public use of the union catalog.
This is already available, in a fashion, from the Pharos Prototype Server. Records can be sent via e-mail; all that is needed is the e-mail address for the database manager. Comments can be added to the message describing the problem. Directions need to be made available to the persons who would be likely to send these messages so that they understand how to send in the problem records. This information could be available through the Pharos User interface.

UIAS DBMS reserves the right to request any additional automatic maintenance reports after having some experience with the union catalog and maintaining it.

The Task Force can request reports at any time for any purpose they deem necessary to improve the quality of the database. The Project Manager will work with Ameritech Library Services to make these reports available in a timely fashion.

Lists to be automatically generated:

a. Duplicate local bibliographic control numbers that occur in more than one master record (may occur as a result of a library uploading an updated bibliographic record)

b. Master records that match on only one of the following match points to be used to identify possible duplicate bibliographic records: OCLC number, ISSN, ISBN, 028 (match on title only provides too many records to check which are not really duplicates)

We need more detailed descriptions of each of these reports. The more explicit we are in requesting reports, the faster we will get them, and the more useful they will be for identifying records that need immediate attention.

c. Any master record without attached listings, i.e. without any local bibliographic control number(s). This would be a report of records that do not include 935 fields.

d. Any authority records with same content but different heading use codes.

We need a more detailed description of this report or we will get a very long report that does not provide an accurate picture of the records that should be merged.

e. Any new LC full authority record that has the same content as an existing system generated authority record but did not overlay.

LC Authority records will overlay previously loaded LC Authority records, but will not overlay system generated authority records. The latter would need to be merged manually, a trivial process made easier by the generation of a report identifying candidates for merging.

2. How do we need to enhance the existing algorithm to accommodate the ongoing maintenance of the union catalog?

RECOMMENDATION: The DBMS Task Force endorses the following automated maintenance process proposed in the UIAS Interim Task Force Response and Recommendations (Feb. 2, 1998, p. 10):

The algorithm is applied to any update uploaded by the master or the non-master record library (Update = upload of a bibliographic record with a local bibliographic control number already in Pharos). Vendor cannot use the local bibliographic control number as a match point for purposes of overlaying an existing record because the bibliographic records may not be the same even when the local bibliographic control numbers are the same.

The Master Record algorithm used to build the database is the same algorithm that will be used to update the database. The Master Record Algorithm program written by Steve Yates and used to merge and deduplicate the 11.7 million bibliographic records will not be used again. Paul Johnson has taken the same algorithm and written a program using the Horizon client software. This new program will process the gap file records in the same manner as the program Steve Yates developed for the initial merge. This is the maintenance program that Sam Foster referred to in June 1998. The local control number was not used as a match point in this algorithm.
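
To make the definition of an update concrete, the following Python sketch classifies an incoming record as an update or a new record. The data layout, the library codes, and the function name classify_upload are assumptions made for illustration; this is not the Horizon maintenance program. In this sketch the local control number is used only to recognize that an upload is an update, not as a match point for deciding which record to overlay.

```python
# Hypothetical sketch of the "update" definition quoted above.
# The catalog layout and all names here are illustrative assumptions.


def classify_upload(listings, library, local_control_number):
    """Return "update" if this library's local control number is already
    attached to a master record in the catalog, otherwise "new"."""
    if (library, local_control_number) in listings:
        return "update"
    return "new"


# listings maps (library code, local control number) -> master record id.
# The sample data is made up.
listings = {
    ("LIB_A", "b1000001"): "master-1",
    ("LIB_B", "b2000002"): "master-2",
}

print(classify_upload(listings, "LIB_A", "b1000001"))  # update
print(classify_upload(listings, "LIB_C", "b3000003"))  # new
```

Either way, the incoming record would still go through the Master Record Algorithm, which decides on its own match points whether it matches and whether it is "better".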
If a record is uploaded from a non-master record library and it is an update record (the library is already listed on the master record as having this title but is not the master record library), the algorithm is applied and if the incoming update record is a "better" record, per the algorithm, the incoming update record will replace the existing master record originally submitted by another library.

This is how it will work.

If a record is uploaded and it is a new bibliographic record (not update record), for the uploading library, the algorithm is run and if the incoming new record matches an existing master record and is "better", per the algorithm, the incoming new record replaces the existing master record. If the incoming new record does not match an existing record, the incoming record becomes a new master record by itself.

The master record algorithm must process all records, including new, modified & deleted records. The algorithm will always choose the "better" record. [Sam Foster wrote May 12, 1998: If GEAC can output the deleted MARC records in a separate file we could handle them as deleted records without the leader status as long as the file is not mixed records.]

Withdrawals made by master record library result only in the listing for the master record library being deleted from that UIAS record, but master record remains in database until a "better" record is uploaded or until the last listing on the master record for that title is deleted.

This is the way it will work.

Withdrawals made by a non-master record library result in the listing for the non-master record library being deleted from the master record.

This is the way it will work.

If the last local bibliographic control number attached to a master record is withdrawn, then the master record is automatically deleted from the union catalog.

This is the way it could work. Alternatively, the records could be suppressed to the public, but remain in the database for staff use.

3. What should be the ongoing mechanism for handling deletions and updates of bibliographic records from the union catalog?

RECOMMENDATIONS:

a. Determine and clarify to CSU libraries what is required to have the capability of real-time updating including deleting records between local catalogs and the union catalog.

b. Make real-time updating a development priority and set a timetable for accomplishing this objective.

c. During the interim period before real time updating and deleting are available, UIAS DBMS should take the lead to determine what is the easiest automated way to accomplish updating and deleting bibliographic records between each local automated integrated library system (Horizon, Endeavor, INNOPAC, Advance, DRA) and the union catalog. This information should be posted on the Listserv.

d. Flexibility should be provided for submitting local deletes and gap files so as to accommodate local library practices.

e. Project Manager should post on the UIAS Web site a "minimum" schedule for local libraries' submittal of gap files and deletes. Libraries may elect to submit gap files and deletes more frequently than minimum calendar requires.

Yes, this needs to be done.

f. Local library should clearly label gap files and deletes when sending them to UIAS Project Manager Office.

g. Project Manager should post weekly status reports on match/merge process.

The investigation of real-time updating is ongoing. Realistically, a solution will not become available before 2000.

4. Are there any maintenance issues related to the TOC?

RECOMMENDATIONS: Have Project Manager seek clarification with Blackwell’s regarding the following questions:

a. How does TOC deal with multi-volume sets not published together or published out of sequence? Are these updates that BNA offers?
I asked Blackwell’s this question and this is the response that I received: Blackwell’s editing staff reject books with obvious set characteristics such as volume numbering, set title with volume title, etc. If a volume 1 does slip through and the MARC record is coded as multi-volume, the subsequent TOCs will never be added as we have no process by which we can deliver additional 505s for added volumes.

b. Per our earlier recommendation, TOC data is protected data. Does this affect only the overlay process, and can we nevertheless correct typos that are deemed important to correct?

Typos can be corrected. The TOC data is only protected when a new record is loaded.

5. Are there any maintenance issues related to authorities?

There are no plans at this time to apply authority control to TOC authors. The information about the authors in the TOC has not previously been normalized and Horizon does not apply authority control to the 9xx fields.

RECOMMENDATION:

a. See also the separate recommendations by the DBMS Task Force on authorities.

b. Have capability for automatic list that provides all headings that are the same but tags and/or sub-fields are different (if relevant, recommendation in authorities memo cannot be realized by vendor)

We can create virtually any report that we want, but I think that we have learned that to be useful, reports have to be cleverly thought out so that patterns of problems clearly stand out. This recommendation probably needs more detail.

c. Only display those headings to the public that are used in a bibliographic record (i.e. linked to a Pharos bibliographic record). Blind Authorities should not be displayed to users in Pharos.

d. In the case of the LC authorities updates to previous LC authority records, the updated LC authority record should overlay the previous LC authority record if the authority record number is the same.

This is the way that it works.

e. If the 1xx or the 4xx in a new incoming LC authority record matches the text of a system generated, brief authority record, (with the exception of the death date being present either in the LC or brief authority record and not in the other record) the LC authority record should replace the brief, system generated authority record.

This is currently not possible in an automated process. These records can be identified in a report and quickly merged by qualified and authorized personnel.

f. Problems with existing authorities/indexing not mentioned in separate authorities memo:
Provide more records in initial display

I need a clarification of this request, which would probably be referred to the User Services Task Force.

Provide number of hits for all displays.

Result set numbers are now being provided in all Pharos results screens: Browse, Brief results and Full View of Record.

6. Are there ongoing maintenance priorities we want to suggest for the UIAS database manager?

RECOMMENDATIONS:
RECOMMENDATIONS:
c. Overlay of LC full authority record with an updated LC authority record

This has been validated in the UIAS database.

d. Deletion of an authority record

e. Change of subject authority record to name authority record

This has been validated in the UIAS database. The change is also reflected in the indexes.

f. Additions to algorithm pertaining to maintenance described above.

We need to discuss this further.

g. Generation of automatic lists mentioned above, i.e. for duplicate OCLC numbers, deleted master records, duplicate local bibliographic numbers, records that match on one match point excluding title match point

This is a list of reports. Reports are created by querying the union catalog database. This query produces a result set. The result set can then be ordered and formatted in a way that makes it more useful to a person who is going to use the report to look up records in the database. We will create these reports and, when they have been fine-tuned, we will create a simple procedure so that a report can be re-run when needed. We need to carefully think through each report, identifying exactly what we want in the report, how the information should be formatted, etc. The goal should be to generate reports that quickly provide pointers to records that need attention.
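
As an example of how explicitly a report can be specified, the Python sketch below works through two of the lists requested earlier: local bibliographic control numbers attached to more than one master record, and master records with no attached listings. The in-memory record layout, the sample data, and the function names are assumptions made for illustration; the real reports would be queries against the Horizon database. Keying a local control number by library is also an assumption.

```python
from collections import defaultdict

# Hypothetical in-memory stand-in for the union catalog. Each master record
# has an id and a list of (library, local control number) listings, which
# the memo associates with 935 fields. All names and data are illustrative.
master_records = [
    {"id": "m1", "listings": [("LIB_A", "b100"), ("LIB_B", "b200")]},
    {"id": "m2", "listings": [("LIB_A", "b100")]},
    {"id": "m3", "listings": []},
]


def duplicate_local_numbers(records):
    """List (a): local control numbers attached to more than one master."""
    seen = defaultdict(set)
    for rec in records:
        for listing in rec["listings"]:
            seen[listing].add(rec["id"])
    # Keep only listings that point at two or more master records,
    # sorted so the report is easy to work through.
    return {k: sorted(v) for k, v in sorted(seen.items()) if len(v) > 1}


def masters_without_listings(records):
    """List (c): master records with no attached listings."""
    return [rec["id"] for rec in records if not rec["listings"]]


print(duplicate_local_numbers(master_records))
print(masters_without_listings(master_records))
```

Each function returns record identifiers rather than full records, in line with the goal of giving staff quick pointers to the records that need attention.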