Pharos Union Catalog Master Record Algorithm
Definitions of matching, master record class, and
eclass are listed below.
-
Of two matching records, the one with the lower "master record class"
number will be the master record.
-
If both records of a match have the same "master record class" number,
or neither record has a "master record class", the master record is
chosen by "eclass".
-
The record with the lower "eclass" number will
be the master record.
-
If both records have an "eclass" number of
3, the 7XX and 246 tags in each record are counted, and the record with
the higher count of 7XX/246 tags will be the master record.
-
If both records have the same "eclass" number (other than 3), the
number of "eclasses" each record meets is counted, and the record that
meets more "eclasses" will be the master record.
-
If the master record has not been selected
by the preceding logic, the record with the most recent
date will be the master record.
-
This is determined by the 005 date.
-
If only one record has a 005, that record
is kept.
-
If neither record has a 005, the date is determined
by the first six digits of the 008.
-
If the dates are the same, the record loaded into the
database first is the master record.
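The selection rules above can be sketched as a pairwise comparison. This is a minimal Python sketch, not the Pharos implementation: the `Rec` fields and the `pick_master` function are hypothetical, a missing master record class or eclass is assumed to rank below any real one, and a tie on the 7XX/246 count between two eclass-3 records is assumed to fall through to the date rule. The returned method names are the ones recorded in 999$e.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class Rec:
    """Hypothetical summary of the inputs the tie-break rules look at."""
    load_seq: int                   # load order into the database (lower = earlier)
    mrc: Optional[int] = None       # master record class 1-11, None if no class applies
    eclass: Optional[int] = None    # lowest eclass number 1-8, None if no eclass applies
    eclass_count: int = 0           # how many eclasses the record meets
    count_7xx_246: int = 0          # number of 7XX and 246 tags
    f005: Optional[str] = None      # 005 value, e.g. "19990312101500.0"
    f008: str = ""                  # 008 value; positions 00-05 hold a yymmdd date

MISSING = float("inf")  # assumption: a missing class ranks below every real class number

def pick_master(a: Rec, b: Rec) -> Tuple[Rec, str]:
    """Return (master record, resolution method) for one matched pair."""
    ma = MISSING if a.mrc is None else a.mrc
    mb = MISSING if b.mrc is None else b.mrc
    if ma != mb:                                  # lower class number wins
        return (a if ma < mb else b), "master record class"
    ea = MISSING if a.eclass is None else a.eclass
    eb = MISSING if b.eclass is None else b.eclass
    if ea != eb:                                  # lower eclass number wins
        return (a if ea < eb else b), "eclass"
    if ea == 3:                                   # both eclass 3: more 7XX/246 tags wins
        if a.count_7xx_246 != b.count_7xx_246:
            return (a if a.count_7xx_246 > b.count_7xx_246 else b), "7XX/246"
    elif a.eclass_count != b.eclass_count:        # same eclass: more eclasses wins
        return (a if a.eclass_count > b.eclass_count else b), "count"
    # Date rule: compare 005 when both have one; a lone 005 wins outright;
    # otherwise fall back to the first six characters of the 008.
    if a.f005 and b.f005:
        da, db = a.f005, b.f005
    elif a.f005 or b.f005:
        return (a if a.f005 else b), "date"
    else:
        da, db = a.f008[:6], b.f008[:6]
    if da != db:                                  # plain string comparison of dates
        return (a if da > db else b), "date"
    return (a if a.load_seq <= b.load_seq else b), "load order"
```

The 008 fallback compares yymmdd strings directly, so it would misorder two dates that straddle a century boundary; a real implementation would need an explicit rule for two-digit years.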
Matching
These are the fields used to find a match between two records.
If multiple occurrences of a tag exist, then each occurrence of the
tag will be evaluated as a match point.
-
OCLC number: tag 001, 010$o, 019$a (all subfield a’s within each 019 tag),
and 035$a. The OCLC number will be normalized: leading alphabetic characters,
leading zeros, and any suffixes are ignored. 035 tags that have a subfield
‘b’ will not be used as an OCLC number unless they have (OCoLC)
in subfield a. All unique OCLC numbers from all matching records will be
merged into the record that is kept.
-
020$a
-
022$a
-
028$a: this field is normalized.
-
086$a
-
010$a: only the first eleven characters are used; this field is normalized
and suffixes are ignored.
-
245$a: only the first five words are used. If a title has fewer than
five words, all of its words are used. As a result, two titles of fewer
than five words that differ in length will not match, even if their first
few words are the same.
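The OCLC number normalization, the 035 exception, and the 245$a title key might be sketched as follows. The function names are hypothetical, and the sketch makes assumptions the text does not state: any leading non-digit characters (including a "(OCoLC)" prefix) are dropped, the first run of digits is taken as the number, and title words are compared lowercased after splitting on whitespace. The 010 and 028 normalizations are not specified here, so they are not sketched.

```python
import re
from typing import Optional, Tuple

# Leading non-digits (e.g. "(OCoLC)ocm"), then leading zeros, then the digits kept.
_OCLC_RE = re.compile(r"\D*0*(\d+)")

def normalize_oclc(raw: str) -> Optional[str]:
    """Drop leading non-digits and zeros; keep the first digit run, ignore any suffix."""
    m = _OCLC_RE.match(raw.strip())
    return m.group(1) if m else None

def oclc_from_035(sub_a: str, has_sub_b: bool) -> Optional[str]:
    """An 035 with a $b counts only if its $a carries the (OCoLC) prefix."""
    if has_sub_b and "(OCoLC)" not in sub_a:
        return None
    return normalize_oclc(sub_a)

def title_key(sub_a_245: str) -> Tuple[str, ...]:
    """First five words of 245$a (all words if fewer), lowercased (assumption)."""
    return tuple(sub_a_245.lower().split()[:5])
```

Because `title_key` keeps every word of a short title, "Moby Dick" and "Moby Dick or the whale" produce different keys, which is the length behavior described above.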
To consider two records a match, they must match on either:
-
match point 1 (OCLC number) and at least one of match points 2 through 7
(020$a, 022$a, 028$a, 086$a, 010$a, and 245$a, in the order listed above), or
-
at least two of match points 2 through 7.
If the records are serials and both have OCLC numbers, they must
match on the OCLC number.
A serial record with 247 fields is not selected as the master record
unless the title is held by only one library.
Serials are identified by leader position 07 (Bibliographic level).
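The match decision can be expressed over sets of normalized values per match point. This is a sketch under stated assumptions: `MatchKeys`, `shared_points`, and `is_match` are hypothetical names, match points are numbered 1 through 7 in the order listed above, the `serials` flag means both records are serials, and leader/07 code "s" is assumed as the serial test. The 247 restriction on master record selection is not covered.

```python
from typing import Dict, Set

# Hypothetical layout: match point number (1 = OCLC number, 2-7 = 020$a,
# 022$a, 028$a, 086$a, 010$a, 245$a) -> set of normalized values.
MatchKeys = Dict[int, Set[str]]

def is_serial(leader: str) -> bool:
    """Leader/07 (Bibliographic level); assumes code 's' marks a serial."""
    return len(leader) > 7 and leader[7].lower() == "s"

def shared_points(a: MatchKeys, b: MatchKeys) -> Set[int]:
    """Match points on which the two records share at least one value."""
    return {p for p in range(1, 8) if a.get(p, set()) & b.get(p, set())}

def is_match(a: MatchKeys, b: MatchKeys, serials: bool = False) -> bool:
    hits = shared_points(a, b)
    others = len(hits - {1})           # hits among match points 2-7
    if serials and a.get(1) and b.get(1) and 1 not in hits:
        return False                   # serials with OCLC numbers must share one
    return (1 in hits and others >= 1) or others >= 2
```

Since each occurrence of a tag is a separate match point value, storing sets lets any shared occurrence count as a hit.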
Master Record Class
The master record classes are listed in order of preference, with master
record class #1 being the most desirable.
All leader and 008 comparisons are case-insensitive; e.g., 000/17 "I" will
match 000/17 "i".
Input
000/17       Encoding level
008/39       Cataloging source
040$ac       Original cataloging agency / Transcribing agency
042 exists   Authentication code
-
000/17 is blank and 008/39 is blank and 040$ac are both "DLC"
-
008/39 is blank and 040$a is "DLC"
-
000/17 is blank and 008/39 is a, b, or c
-
040$ac are both "GPO"
-
000/17 is I [alpha] and 042 tag exists
-
000/17 is blank and 008/39 is d and 042 tag exists
-
000/17 is 1 [numeric] and 008/39 is blank
-
000/17 is I
-
000/17 is 2 and 008/39 is blank
-
000/17 is 5 or 7 and 008/39 is blank
-
000/17 is K, L, or M
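A minimal sketch of assigning the master record class: the conditions are tested in the listed order, and the first one that holds gives the class number. The function name and parameters are hypothetical; "blank" is assumed to mean a space (or an empty value), leader and 008 values are compared case-insensitively as stated above, and 040 values are compared exactly.

```python
from typing import Optional

def master_record_class(ldr17: str, f008_39: str, f040a: str = "",
                        f040c: str = "", has_042: bool = False) -> Optional[int]:
    """Return the first master record class (1-11) whose condition holds, else None."""
    enc = ldr17.lower()        # 000/17 Encoding level, compared case-insensitively
    src = f008_39.lower()      # 008/39 Cataloging source, compared case-insensitively
    a040, c040 = f040a.strip(), f040c.strip()
    enc_blank = enc.strip() == ""
    src_blank = src.strip() == ""
    conditions = [
        enc_blank and src_blank and a040 == "DLC" and c040 == "DLC",  # 1
        src_blank and a040 == "DLC",                                  # 2
        enc_blank and src in ("a", "b", "c"),                         # 3
        a040 == "GPO" and c040 == "GPO",                              # 4
        enc == "i" and has_042,                                       # 5
        enc_blank and src == "d" and has_042,                         # 6
        enc == "1" and src_blank,                                     # 7
        enc == "i",                                                   # 8
        enc == "2" and src_blank,                                     # 9
        enc in ("5", "7") and src_blank,                              # 10
        enc in ("k", "l", "m"),                                       # 11
    ]
    for number, holds in enumerate(conditions, start=1):
        if holds:
            return number
    return None
```

Testing in order matters: a DLC record with blank 000/17 and 008/39 also satisfies class 2, but it stops at class 1.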
Eclass
The eclasses are listed in order of preference, with eclass #1 being the
most desirable.
-
505 tag exists
-
780 or 785 tag exists
-
count of 7XX and 246 tags
-
856 tag exists
-
6XX 2nd indicator is 1
-
655 tag exists
-
520 tag exists
-
6XX 2nd indicator is 2
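Similarly, a record's lowest eclass, the number of eclasses it meets (the value later written to 999$d), and its 7XX/246 count could be computed in one pass. This sketch takes a hypothetical list of (tag, second indicator) pairs. It treats eclass 3 as met when at least one 7XX or 246 tag is present and counts 780/785 among the 7XX tags; both are readings of the list above rather than stated rules.

```python
from typing import List, Optional, Tuple

def eclass_profile(fields: List[Tuple[str, str]]) -> Tuple[Optional[int], int, int]:
    """Return (lowest eclass, number of eclasses met, count of 7XX/246 tags).

    `fields` is a hypothetical list of (tag, second indicator) pairs.
    """
    tags = [tag for tag, _ in fields]
    n_7xx_246 = sum(1 for t in tags if t.startswith("7") or t == "246")

    def subject_ind2(value: str) -> bool:
        # any 6XX tag whose second indicator equals `value`
        return any(t.startswith("6") and ind2 == value for t, ind2 in fields)

    tests = [
        "505" in tags,                    # 1
        "780" in tags or "785" in tags,   # 2
        n_7xx_246 > 0,                    # 3
        "856" in tags,                    # 4
        subject_ind2("1"),                # 5
        "655" in tags,                    # 6
        "520" in tags,                    # 7
        subject_ind2("2"),                # 8
    ]
    met = [n for n, ok in enumerate(tests, start=1) if ok]
    return (met[0] if met else None), len(met), n_7xx_246
```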
9XX Tags
935: This field is repeated for each library that holds the title represented
by the master record. Subfield a of each 935 is the OCLC symbol for
that library; subfield b is that library's local system control
number. Pharos uses the 935 field to build a Hook-to-Holdings hypertext
link that the user or system can use to display the local library's item
record information, i.e., location, local call number, and circulation
status.
997: A list of the unique OCLC numbers after they have been normalized.
998: Subfield a identifies the library to which the master record
belongs.
999: There is a 999 field for each library that holds the title represented
by the master record.
999$a: The first $a is the campus location. The second $a is the control
number (tag 001 by USMARC definition) from that library's original record.
This is not necessarily the local system's control number, which USMARC
defines as tag 035 but which varies with each system's implementation; the
001 could also be the local system's control number. If the OCLC number
happens to be in the 001, that number is both the OCLC number and the
control number. This subfield makes it easy to find the original record for
testing purposes (at least on RetroLink’s end).
999$b: Master Record Class number (the lower the number, the higher
the priority)
999$c: Eclass number (the lower the number, the higher the priority)
999$d: Count of all matching eclasses
999$e: Method used to resolve which record to keep
-
If the method is ‘master record class’, the record kept has a lower master
record class than the other and no further processing was needed.
-
If the method is ‘eclass’, the master record classes are the same (or
neither record has one), and the record kept has a lower eclass than the
other.
-
If the method is ‘7XX/246’, the master record classes are the same (or
neither record has one), both records have eclass 3, and the record kept
has a higher count of 7XX/246 tags.
-
If the method is ‘count’, the master record classes are the same (or
neither record has one) and the eclasses are the same, but the record kept
has more eclasses than the other record.
-
If the method is ‘date’, everything else was the same, but the record kept
has a more recent date. This is determined by the 005 date. If only one
record has a 005, that record is kept. If neither record has a 005, the date
is determined by the first six digits of the 008.
-
If the method is ‘load order’, nothing distinguished the two records, so
the record loaded into the database first was kept.
-
If the method is ‘serial 247’, the record is a serial record that became
the master record because it has no 247 tags, while the record that
matched it had a 247 tag.
If a 999 tag doesn't have a $b, $c, or $d, either that record didn't have
that class, or the comparison was resolved before it reached that step.
The first 999 tag of each record won't have a $e because it describes the
originating record, which doesn't match against itself. The other 999$e’s
show the method used to resolve which record was kept.
Note: If two or more records matched the master record, each $e shows the
method for the two records that were actually compared, which might not be
the same as the method that would apply between the first record and the
final record.
For instance, say records 1, 2, and 3 all matched. Records 1 and 2 matched
first, keeping record 2 by the eclass (neither record has a master record
class).
The 999$e for record 1 would be ‘eclass’, and record 1’s 999 tag is merged
into record 2.
At this point, the 999 for record 2 will not have $e.
Then records 2 and 3 match, keeping record 3 by the master record class
(record 3 has a master record class, record 2 doesn't).
Now the 999$e for record 2 will be ‘master record class’, and all the
999 tags from record 2 are merged into record 3 (which includes the 999
from record 1).
The 999$e for record 1 will still be ‘eclass’ because that was the method
for determining which record to keep between records 1 and 2.
Looking at the final data though, one might think the 999$e for record
1 should have been ‘master record class’, because if you compare records
1 and 3, that would be what the $e of record 1 would be.
But records 1 and 3 were never compared, so the $e for record 1 remains
‘eclass’ from the comparison of records 1 and 2.
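The pairwise merging in this example can be reproduced with a small simulation. `Field999`, `new_record`, and `merge` are hypothetical names; the sketch tracks only the record ID and $e of each 999, and assumes the merged 999 tags are appended after the kept record's own 999.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Field999:
    record_id: int      # which original record this 999 describes
    method: str = ""    # $e; blank until the record loses a comparison

def new_record(record_id: int) -> List[Field999]:
    """A freshly loaded record carries only its own (originating) 999."""
    return [Field999(record_id)]

def merge(kept: List[Field999], dropped: List[Field999], method: str) -> List[Field999]:
    """Set $e on the dropped record's own 999, then move all of its 999s to the kept record."""
    dropped[0].method = method
    kept.extend(dropped)
    return kept

r1, r2, r3 = new_record(1), new_record(2), new_record(3)
m = merge(r2, r1, "eclass")                # records 1 and 2 compared: keep 2
m = merge(r3, m, "master record class")    # records 2 and 3 compared: keep 3
print([(f.record_id, f.method) for f in m])
# prints [(3, ''), (2, 'master record class'), (1, 'eclass')]
```

Only the dropped record's own 999 receives a new $e, which is why record 1 keeps ‘eclass’ even after it ends up in record 3.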
UIAS
Task Force for Systems Documents