Pharos Union Catalog Master Record Algorithm

Definitions of matching, master record class, and eclass are listed below.
  • The record in a match with the lower "master record class" number will be the master record.
  • If the "master record class" numbers of a match are the same, or neither record of a match has a "master record class", the master record will be chosen by "eclass".
  • The record with the lower "eclass" number will be the master record.
  • If both records have an "eclass" number of 3, the 7XX and 246 tags are counted, and the record with the higher count of 7XX/246 tags will be the master record.
  • If the "eclass" number is the same for both records (other than eclass number 3), the "eclasses" of each record are counted, and the record with the higher count of "eclasses" will be the master record.
  • If the master record has not been selected by the preceding logic, the record with the most recent date will be the master record.
    • This is determined by the 005 date.
    • If only one record has a 005, that record is kept.
    • If neither record has a 005, the date is determined by the first six digits of the 008.
    • If the dates are the same, the first record loaded into the database is the master record.
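
The selection steps above can be sketched as a single comparison function. This is a minimal illustration, not the Pharos implementation: the `Candidate` class, its field names, and `choose_master` are hypothetical. It assumes that a record without a class ranks below any record that has one (consistent with the worked example near the end of this document), and that dates can be compared as plain strings, which ignores the two-digit year in the 008.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Candidate:
    """Hypothetical summary of one record; all field names are illustrative."""
    load_order: int                   # order in which the record entered the database
    mr_class: Optional[int] = None    # master record class, None if no class applies
    eclass: Optional[int] = None      # lowest matching eclass, None if none apply
    eclass_count: int = 0             # number of eclasses the record satisfies
    added_entry_count: int = 0        # count of 7XX and 246 tags
    date_005: Optional[str] = None    # full 005 value
    date_008: Optional[str] = None    # first six characters of the 008


def _rank(value: Optional[int]) -> float:
    # Assumption: a missing class is less desirable than any assigned class.
    return value if value is not None else float("inf")


def choose_master(a: Candidate, b: Candidate) -> Tuple[Candidate, str]:
    """Return the record kept and the name of the method that decided it."""
    if _rank(a.mr_class) != _rank(b.mr_class):
        return min((a, b), key=lambda r: _rank(r.mr_class)), "master record class"
    if _rank(a.eclass) != _rank(b.eclass):
        return min((a, b), key=lambda r: _rank(r.eclass)), "eclass"
    if a.eclass == 3:
        # Both records are eclass 3: compare the 7XX/246 counts instead.
        if a.added_entry_count != b.added_entry_count:
            return max((a, b), key=lambda r: r.added_entry_count), "7XX/246"
    elif a.eclass_count != b.eclass_count:
        return max((a, b), key=lambda r: r.eclass_count), "count"
    if a.date_005 or b.date_005:
        if not (a.date_005 and b.date_005):
            # Only one record has a 005: that record is kept.
            return (a if a.date_005 else b), "date"
        date_a, date_b = a.date_005, b.date_005
    else:
        date_a, date_b = a.date_008 or "", b.date_008 or ""
    if date_a != date_b:
        return (a if date_a > date_b else b), "date"
    return min((a, b), key=lambda r: r.load_order), "load order"
```

The returned method names mirror the values described for 999$e later in this document.
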
Matching

These are the fields used to find a match between two records.
If multiple occurrences of a tag exist, then each occurrence of the tag will be evaluated as a match point.

  1. OCLC Number: tag 001, 010$o, 019$a (all subfield a’s within each 019 tag), and 035$a. The OCLC number will be normalized: preceding alpha characters and zeros and any suffixes will be ignored. 035 tags that have a subfield ‘b’ will not be used as an OCLC number unless they have (OCoLC) in subfield a. All unique OCLC numbers from all matching records will be merged into the record kept.
  2. 020$a
  3. 022$a
  4. 028$a : this field will be normalized.
  5. 086$a
  6. 010$a : only the first eleven characters will be used; this field is normalized and suffixes are ignored.
  7. 245$a : only the first five words will be used. If there are fewer than five words then match on all words. If the titles are fewer than five words and differ in length they won't match, even if the first few words are the same for both titles.
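
As an illustration of match points 1 and 7, the sketch below normalizes an OCLC number and builds a title key. The function names are hypothetical. The document does not say how title words are compared, so lowercasing and stripping surrounding punctuation are assumptions made here.

```python
import re
import string
from typing import Optional, Tuple

# Optional (OCoLC) prefix, leading letters, leading zeros, then the digits kept.
# Anything after the digits is treated as a suffix and ignored.
_OCLC_RE = re.compile(r"^(?:\(OCoLC\))?\s*[A-Za-z]*0*(\d+)", re.IGNORECASE)


def normalize_oclc(raw: str) -> Optional[str]:
    """Return the bare OCLC number, or None if no digits are found."""
    found = _OCLC_RE.match(raw.strip())
    return found.group(1) if found else None


def title_key(title_245a: str) -> Tuple[str, ...]:
    """Return the first five words of 245$a, or all words if there are fewer."""
    words = [w.strip(string.punctuation).lower() for w in title_245a.split()]
    words = [w for w in words if w]  # drop tokens that were only punctuation
    return tuple(words[:5])


print(normalize_oclc("ocm00012345"))            # 12345
print(normalize_oclc("(OCoLC)ocn123456789"))    # 123456789
print(title_key("War and peace /"))             # ('war', 'and', 'peace')
```

Comparing the keys for equality reproduces the rule that short titles of different lengths do not match: a three-word key never equals a four-word key, even when the first three words agree.
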
  • To be considered a match, two records must match on:
    • match point 1 (OCLC number) and one of match points 2 through 7,
    • or at least two of match points 2 through 7.

  • If the records are serials, and they both have OCLC numbers, they must match on the OCLC number.
  • Do not select a serial record with 247 fields as the master record unless the title is uniquely held by only one library.
  • Serials are determined by leader 07 (Bibliographic Level).
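
The match rules above could be sketched as follows. The data layout is hypothetical: each record is a dictionary from match point number (1 through 7) to the set of normalized values found in that record, so repeated tags are each evaluated. How the serial rule combines with the general rule is not stated, so this sketch assumes it is an additional requirement.

```python
from typing import Dict, Set

MatchPoints = Dict[int, Set[str]]  # match point number -> normalized values


def shared_points(a: MatchPoints, b: MatchPoints) -> Set[int]:
    """Match points on which the two records share at least one value."""
    return {p for p in range(1, 8) if a.get(p, set()) & b.get(p, set())}


def is_match(a: MatchPoints, b: MatchPoints, both_serials: bool = False) -> bool:
    shared = shared_points(a, b)
    others = shared - {1}  # match points 2 through 7 only
    # Serials that both carry OCLC numbers must agree on the OCLC number.
    if both_serials and a.get(1) and b.get(1) and 1 not in shared:
        return False
    if 1 in shared and len(others) >= 1:
        return True
    return len(others) >= 2
```

For example, two records that share an ISBN (match point 2) and a title key (match point 7) match without any OCLC number, while two records that share only an OCLC number do not.
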
    Master Record Class

    The master record class list is in priority of preference with master record class #1 being the most desirable.

    All leader and 008 matches are case-insensitive; for example, 000/17 I will match 000/17 i.

    Input

    000/17 Encoding level

    008/39 Cataloging source

    040/ac Original cataloging agency/Transcribing agency

    042 exists Authentication code

    1. 000/17 is blank and 008/39 is blank and 040$ac are both "DLC"
    2. 008/39 is blank and 040$a is "DLC"
    3. 000/17 is blank and 008/39 is a, b, or c
    4. 040$ac are both "GPO"
    5. 000/17 is I [alpha] and 042 tag exists
    6. 000/17 is blank and 008/39 is d and 042 tag exists
    7. 000/17 is 1 [numeric] and 008/39 is blank
    8. 000/17 is I
    9. 000/17 is 2 and 008/39 is blank
    10. 000/17 is 5 or 7 and 008/39 is blank
    11. 000/17 is K, L, or M
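
One way to read the list above is as an ordered set of tests in which the first one that succeeds gives the class number. The sketch below follows that reading. The function and parameter names are hypothetical, a blank is assumed to be stored as a single space, and 040 values are compared exactly because the document states case insensitivity only for the leader and 008.

```python
from typing import Optional

BLANK = " "  # assumption: blank positions are stored as a space


def master_record_class(enc_level: str, cat_source: str,
                        f040a: Optional[str], f040c: Optional[str],
                        has_042: bool) -> Optional[int]:
    """Return the master record class (1 is most desirable), or None."""
    enc = enc_level.upper()    # 000/17, compared case-insensitively
    src = cat_source.lower()   # 008/39, compared case-insensitively
    rules = [
        enc == BLANK and src == BLANK and f040a == "DLC" and f040c == "DLC",
        src == BLANK and f040a == "DLC",
        enc == BLANK and src in ("a", "b", "c"),
        f040a == "GPO" and f040c == "GPO",
        enc == "I" and has_042,
        enc == BLANK and src == "d" and has_042,
        enc == "1" and src == BLANK,
        enc == "I",
        enc == "2" and src == BLANK,
        enc in ("5", "7") and src == BLANK,
        enc in ("K", "L", "M"),
    ]
    for number, satisfied in enumerate(rules, start=1):
        if satisfied:
            return number
    return None  # the record has no master record class
```
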
    Eclass

    The eclass list is in priority of preference with eclass #1 being the most desirable.

    1. 505 tag exists
    2. 780 or 785 tag exists
    3. count of 7XX and 246 tags
    4. 856 tag exists
    5. 6XX 2nd indicator is 1
    6. 655 tag exists
    7. 520 tag exists
    8. 6XX 2nd indicator is 2
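
The eclass number and the count of matching eclasses could be derived together, as in the sketch below. The input layout is hypothetical: a record is given as a list of (tag, second indicator) pairs. The sketch assumes eclass 3 is satisfied when the record has at least one 7XX or 246 tag, which the document does not state explicitly.

```python
from typing import List, Optional, Tuple

Field = Tuple[str, str]  # (tag, second indicator)


def eclass_profile(fields: List[Field]) -> Tuple[Optional[int], int, int]:
    """Return (eclass number, count of matching eclasses, 7XX/246 tag count)."""
    tags = [tag for tag, _ in fields]
    added = sum(1 for tag in tags if tag.startswith("7") or tag == "246")
    checks = [
        "505" in tags,
        "780" in tags or "785" in tags,
        added > 0,  # assumption: any 7XX or 246 tag satisfies eclass 3
        "856" in tags,
        any(tag.startswith("6") and ind2 == "1" for tag, ind2 in fields),
        "655" in tags,
        "520" in tags,
        any(tag.startswith("6") and ind2 == "2" for tag, ind2 in fields),
    ]
    matched = [n for n, satisfied in enumerate(checks, start=1) if satisfied]
    return (matched[0] if matched else None), len(matched), added


record = [("245", "0"), ("650", "0"), ("700", " "), ("856", "0")]
print(eclass_profile(record))  # (3, 2, 1)
```

Under this reading, the first value corresponds to 999$c and the second to 999$d as described in the next section.
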
    9XX Tags

    935: This field repeats for each library that has a title for the master record. The subfield a of each 935 is the OCLC symbol for that library. The subfield b is that library's local system control number. Pharos uses the 935 field to build a Hook-to-Holdings hypertext link, which the user or system can use to display the local library's item record information (i.e., location, local call number, and circulation status).

    997: A list of the unique OCLC numbers after they have been normalized.

    998: The subfield a contains the library that the master record belongs to.

    999: There is a 999 field for each library that holds the title represented by the master record.

    999$a: The first $a is the Campus Location. The second $a is the control number (tag 001 by USMARC definition) from the original record of that library. This is not necessarily the local system's control number (tag 035 by USMARC definition, though this varies with each system's implementation; the 001 could itself be the local system's control number). If the OCLC number happens to be in the 001, this number is both the OCLC number and the control number. This is used to easily find the original record for testing purposes (at least on RetroLink’s end).

    999$b: Master Record Class number (the lower the number, the higher the priority)

    999$c: Eclass number (the lower the number, the higher the priority)

    999$d: Count of all matching eclasses

    999$e: (method of resolving which record to keep)

      1. If the method is ‘master record class’, the record kept has a lower master record class than the other and no further processing was needed.
      2. If the method is ‘eclass’, the master record classes are the same or neither record has a master record class, and the record kept has a lower eclass than the other.
      3. If the method is ‘7XX/246’, the master record classes are the same or neither record has a master record class, and the record kept has a higher count of 7XX/246 tags.
      4. If the method is ‘count’, the master record classes are the same or neither record has a master record class, and the eclasses are the same, but the record kept has more eclasses than the other record.
      5. If the method is ‘date’, everything was the same, but the record kept has a more recent date. This is determined by the 005 date. If only one record has a 005, that record is kept. If neither record has a 005, it is determined by the first six digits of the 008.
      6. If the method is ‘load order’, there wasn't anything to distinguish which record to keep, so the first record into the database was kept.
      7. If the method is ‘serial 247’, the record is a serial record and it is the master record because it doesn't have any 247 tags, and the record that matched it did have a 247 tag.
    If a 999 tag doesn't have a $b, $c, or $d, that record either didn't have that class, or it resolved which record to keep before it got to that point.

    The first 999 tag of each record won't have a $e because it is the originating record and doesn't match against itself. The other 999$e’s show the method used to resolve which record was kept.

    Note: If two or more records matched the master record, each $e shows the method for the two records that were actually compared, which might not be the method that would apply between the first record and the final record.

    For instance, say records 1, 2, and 3 all matched. Records 1 and 2 matched first, keeping record 2 by the eclass (neither record has a master record class).

    The $e of the 999 tag for record 1 would be ‘eclass’, and that 999 tag is merged into record 2.

    At this point, the 999 for record 2 will not have $e.

    Then records 2 and 3 match, keeping record 3 by the master record class (record 3 has a master record class, record 2 doesn't).

    Now the 999$e for record 2 will be ‘master record class’, and all the 999 tags from record 2 are merged into record 3 (which includes the 999 from record 1).

    The 999$e for record 1 will still be ‘eclass’ because that was the method for determining which record to keep between records 1 and 2.

    Looking at the final data though, one might think the 999$e for record 1 should have been ‘master record class’, because if you compare records 1 and 3, that would be what the $e of record 1 would be.

    But records 1 and 3 were never compared, so the $e for record 1 remains ‘eclass’ from the comparison of records 1 and 2.
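
The three-record example above can be traced with a small simulation. This is only a sketch of the bookkeeping: each 999 tag is modeled as a dictionary with a hypothetical `rec` key for the originating record and an `e` key for $e, and `merge` is an illustrative helper rather than part of Pharos.

```python
from typing import Dict, List, Optional

Tag999 = Dict[str, Optional[object]]


def merge(keeper: List[Tag999], loser: List[Tag999], method: str) -> List[Tag999]:
    """Move the loser's 999 tags into the keeper, recording how the loser lost.

    Only the loser's own 999 still lacks $e; tags it inherited from earlier
    merges keep the method recorded at the time of their own comparison.
    """
    tagged = [{**t, "e": method} if t["e"] is None else t for t in loser]
    return keeper + tagged


record1 = [{"rec": 1, "e": None}]
record2 = [{"rec": 2, "e": None}]
record3 = [{"rec": 3, "e": None}]

record2 = merge(record2, record1, "eclass")               # records 1 and 2 match
record3 = merge(record3, record2, "master record class")  # records 2 and 3 match

for tag in record3:
    print(tag)
# {'rec': 3, 'e': None}
# {'rec': 2, 'e': 'master record class'}
# {'rec': 1, 'e': 'eclass'}
```
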
     

    UIAS Task Force for Systems Documents

     

    Last Updated: November 30, 1998