Cutter Code Comparison

Author name (family name first):

Total Cutter code length (max): 4

Simple Cutter:
Banerjee Code:
Educated Cutter:
Cube Code:
Hash Code:

About

What is this? Some experiments comparing different Cutter Code variants by an MLIS student.

The "Simple Cutter" Code is generated by following the procedure in Library of Congress instruction sheet G 63, as described on this webpage. The Banerjee Code comes from the Cataloging Calculator by Kyle Banerjee. (I lightly adapted Banerjee's original code so that it works with this site.)
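As a rough illustration of the G 63 procedure, here is a minimal Python sketch of the LC Cutter Table. The digit anchors are the ones published in that table. The rule of rounding down to the nearest listed letter, the handling of non-letters, and the names `pick_digit` and `simple_cutter` are assumptions of this sketch, not the site's actual code. The special rules for names beginning with "Q" are left out.

```python
# LC Cutter Table anchors as (prefix, digit) pairs, in alphabetical order.
AFTER_VOWEL = [("b", 2), ("d", 3), ("l", 4), ("n", 5), ("p", 6), ("r", 7), ("s", 8), ("u", 9)]
AFTER_S = [("a", 2), ("ch", 3), ("e", 4), ("h", 5), ("m", 6), ("t", 7), ("u", 8), ("w", 9)]
AFTER_CONSONANT = [("a", 3), ("e", 4), ("i", 5), ("o", 6), ("r", 7), ("u", 8), ("y", 9)]
EXPANSION = [("a", 3), ("e", 4), ("i", 5), ("m", 6), ("p", 7), ("t", 8), ("w", 9)]


def pick_digit(table, text):
    """Return the digit of the last anchor that sorts at or before `text`.

    Rounding down to the nearest listed anchor is an assumption of this
    sketch; the table leaves in-between letters to the cataloger.
    """
    digit = table[0][1]  # fallback when `text` sorts before every anchor
    for prefix, value in table:
        if prefix <= text:
            digit = value
    return digit


def simple_cutter(name, max_length=4):
    """Build a code of one letter plus up to `max_length - 1` digits."""
    letters = "".join(ch for ch in name.lower() if "a" <= ch <= "z")
    if not letters:
        raise ValueError("name has no ASCII letters")
    first, rest = letters[0], letters[1:]
    if first == "q":
        raise ValueError("the Q/Qu rules are omitted from this sketch")
    if first in "aeiou":
        table = AFTER_VOWEL
    elif first == "s":
        table = AFTER_S
    else:
        table = AFTER_CONSONANT
    code = first.upper()
    if rest:
        # The second letter (or "ch" after S) picks the first digit.
        code += str(pick_digit(table, rest))
    for ch in rest[1:max_length - 1]:
        # Each later letter adds one digit from the expansion table.
        code += str(pick_digit(EXPANSION, ch))
    return code


print(simple_cutter("Smith", max_length=3))     # S65
print(simple_cutter("Banerjee", max_length=3))  # B36
```

Because the sketch always rounds down, a name such as "Acton" falls back to the lowest vowel digit. That is exactly the kind of unspecified case the "Educated Cutter" Code tries to handle more deliberately.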

The "Educated Cutter" Code is my attempt at setting ranges for the first-two-letter combinations that G 63 does not specify (e.g. names starting with "AC"). My goal was to give every "bucket" for each digit an equal chance of being used (thereby increasing room for growth, or hospitality) while staying compatible with G 63.

The Cube Code is my best attempt at an improved version of the Cutter Code when allowed to break from G 63. Cube Code assigns a code made of a letter plus a two-digit number, based on the first three letters of a given name. Crucially, the two-digit number is based on the letter frequencies observed in real author names.

Hash Code uses MurmurHash3 (via this JavaScript library) to encode the part of the author's name after the 2nd letter. It is arguably the most radical variant here, as it does not maintain alphabetical order beyond the 2nd letter of the author's name.
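To make the idea concrete, here is a minimal sketch in Python. It includes a pure-Python version of the 32-bit MurmurHash3 (x86_32) function, since the standard library does not ship one. Which MurmurHash3 variant and seed the JavaScript library uses is not stated here, and the way `hash_code` keeps the first two letters and turns the hash into digits is purely hypothetical.

```python
MASK = 0xFFFFFFFF


def _rotl32(x, r):
    """Rotate a 32-bit value left by r bits."""
    return ((x << r) | (x >> (32 - r))) & MASK


def murmur3_32(data: bytes, seed: int = 0) -> int:
    """32-bit MurmurHash3 (x86_32 variant) of `data`."""
    c1, c2 = 0xCC9E2D51, 0x1B873593
    h = seed & MASK
    n = len(data)
    body_end = n - (n % 4)
    for i in range(0, body_end, 4):
        k = int.from_bytes(data[i:i + 4], "little")
        k = (_rotl32((k * c1) & MASK, 15) * c2) & MASK
        h = _rotl32(h ^ k, 13)
        h = (h * 5 + 0xE6546B64) & MASK
    tail = data[body_end:]
    if tail:
        # The 1 to 3 leftover bytes are mixed in without the final rotation.
        k = int.from_bytes(tail, "little")
        k = (_rotl32((k * c1) & MASK, 15) * c2) & MASK
        h ^= k
    # Finalization mix.
    h ^= n
    h ^= h >> 16
    h = (h * 0x85EBCA6B) & MASK
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & MASK
    h ^= h >> 16
    return h


def hash_code(name: str, digits: int = 2) -> str:
    """Hypothetical layout: first two letters, then digits from the hash."""
    letters = "".join(ch for ch in name.upper() if "A" <= ch <= "Z")
    prefix, rest = letters[:2], letters[2:]
    value = murmur3_32(rest.encode("ascii")) % (10 ** digits)
    return f"{prefix}{value:0{digits}d}"


for name in ["Beagle", "Beal", "Beam"]:
    print(name, hash_code(name))
```

The three sample names share the prefix "BE", but their digits come from a hash, so there is no reason for the resulting codes to sort in the same order as the names. That is the trade-off described above.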

How I used letter-frequency analysis to make Educated Cutter Codes and Cube Codes

To create both the "Educated Cutter" Code and the Cube Code, I needed realistic letter-frequency data, preferably from real author names. I got 2 million names from the Library of Congress. I then fed these names into a Rust program I wrote to analyze letter frequencies.
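The counting step can be sketched in a few lines of Python. This is only an illustration of the idea, not the Rust program itself. The `sample` list is made up, and the normalization (uppercasing, dropping anything that is not A to Z, skipping names with fewer than three letters) is an assumption.

```python
from collections import Counter


def normalize(name):
    """Keep only the letters A to Z, uppercased (an assumed normalization)."""
    return "".join(ch for ch in name.upper() if "A" <= ch <= "Z")


def count_triples(names):
    """Count how often each starting three-letter combination occurs."""
    counts = Counter()
    for name in names:
        letters = normalize(name)
        if len(letters) >= 3:
            counts[letters[:3]] += 1
    return counts


# Made-up sample input, family name first.
sample = [
    "Banerjee, Kyle",
    "Baldwin, James",
    "Balzac, Honoré de",
    "Beagle, Peter S.",
    "Berry, Wendell",
]
counts = count_triples(sample)
print(counts.most_common(3))
```

With the real data set, the same kind of counter would show which starting triples are crowded and which are nearly empty.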

For Cube Code, the program creates 26 "tables" (one for each possible first letter of a name). Each of these tables has 676 values ranging from 1 to 99, one for each combination of second and third letters in the given name. A single number, like 25, is reused for a whole range of second-and-third-letter combinations if they appear infrequently in the data.
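One simple way to build such a table is equal-frequency bucketing: walk the 676 letter pairs in alphabetical order and give each pair the code that matches the share of names sorting before it. The sketch below is an assumption about how this could work, not a copy of the Rust program. The name `build_table` is made up, and skipping numbers that end in 0 is based only on the observation that the example table for "B" never uses one.

```python
from itertools import product
from string import ascii_uppercase

# 1 to 99 without multiples of 10, which leaves 90 usable codes.
USABLE_CODES = [n for n in range(1, 100) if n % 10 != 0]


def build_table(pair_counts):
    """Map each of the 676 letter pairs to a code from USABLE_CODES.

    `pair_counts` maps pairs such as "EA" to how often names with one
    fixed first letter continue with that pair.
    """
    pairs = ["".join(p) for p in product(ascii_uppercase, repeat=2)]
    total = sum(pair_counts.get(p, 0) for p in pairs)
    if total == 0:
        raise ValueError("no counts to build a table from")
    table = {}
    seen = 0  # names that sort before the current pair
    for pair in pairs:
        slot = min(len(USABLE_CODES) - 1, seen * len(USABLE_CODES) // total)
        table[pair] = USABLE_CODES[slot]
        seen += pair_counts.get(pair, 0)
    return table


# Toy counts: only two pairs ever occur, each half of the time.
table = build_table({"AL": 50, "AR": 50})
print(table["AK"], table["AM"], table["AS"])
```

Pairs that never occur do not advance the running total, so long runs of them share one number, much like the repeated 25 in the "B" table. This particular rule does not guarantee that a frequent triple gets a number to itself, so the real program presumably differs in its details.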

Here's the Cube Code table or "map" for names that begin with the letter "B":

      B_A B_B B_C B_D B_E                                                                             B_Y B_Z
BA_   01, 01, 01, 02, 03, 03, 03, 04, 04, 05, 05, 06, 07, 08, 09, 11, 12, 12, 13, 14, 15, 16, 17, 18, 19, 21, 
BB_   22, 23, 24, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 
BC_   25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 
BD_   25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 
BE_   25, 26, 26, 27, 28, 28, 28, 29, 29, 29, 29, 31, 32, 33, 34, 35, 36, 36, 37, 38, 39, 41, 42, 43, 43, 43, 
      44, 44, 44, 44, 44, 44, 44, 44, 44, 44, 44, 44, 44, 44, 44, 44, 44, 44, 44, 44, 44, 44, 44, 44, 44, 44, 
      44, 44, 44, 44, 44, 44, 44, 44, 44, 44, 44, 44, 44, 44, 44, 44, 44, 44, 44, 44, 44, 44, 44, 44, 44, 44, 
      44, 45, 45, 45, 45, 45, 45, 45, 45, 45, 45, 45, 45, 45, 45, 45, 45, 45, 45, 45, 45, 45, 45, 45, 45, 45, 
      45, 45, 45, 46, 46, 46, 47, 47, 47, 47, 47, 47, 48, 48, 48, 48, 48, 48, 49, 51, 51, 51, 51, 51, 51, 51, 
      51, 51, 51, 51, 51, 52, 52, 52, 52, 52, 52, 52, 52, 52, 52, 52, 52, 52, 52, 52, 52, 52, 52, 52, 52, 52, 
      52, 52, 52, 52, 52, 52, 52, 52, 52, 52, 52, 52, 52, 52, 52, 52, 52, 52, 52, 52, 52, 52, 52, 52, 52, 52, 
      52, 53, 54, 54, 54, 55, 55, 55, 55, 55, 55, 55, 55, 55, 55, 56, 56, 56, 56, 56, 56, 56, 56, 56, 56, 56, 
      56, 56, 56, 56, 56, 56, 56, 56, 56, 56, 56, 56, 56, 56, 56, 56, 56, 56, 56, 56, 56, 56, 56, 56, 56, 56, 
      56, 56, 56, 56, 56, 56, 56, 56, 56, 56, 56, 56, 56, 56, 56, 56, 56, 56, 56, 56, 56, 56, 56, 56, 56, 56, 
      56, 56, 57, 57, 58, 58, 58, 59, 59, 61, 61, 61, 62, 62, 63, 64, 64, 64, 65, 66, 67, 68, 69, 71, 71, 72, 
      72, 72, 72, 72, 72, 72, 72, 72, 72, 72, 72, 72, 72, 72, 72, 72, 72, 72, 72, 72, 72, 72, 72, 72, 72, 72, 
      72, 72, 72, 72, 72, 72, 72, 72, 72, 72, 72, 72, 72, 72, 72, 72, 72, 72, 72, 72, 72, 72, 72, 72, 72, 72, 
      72, 73, 74, 75, 76, 77, 78, 78, 78, 79, 81, 81, 81, 81, 81, 82, 83, 84, 85, 85, 85, 86, 87, 87, 87, 87, 
      87, 87, 87, 87, 87, 87, 87, 87, 87, 87, 87, 87, 87, 87, 87, 87, 87, 87, 87, 87, 87, 87, 87, 87, 87, 87, 
      87, 87, 87, 87, 87, 87, 87, 87, 87, 87, 87, 87, 87, 87, 87, 87, 87, 87, 87, 87, 87, 87, 87, 87, 87, 87, 
      87, 87, 87, 88, 89, 89, 89, 91, 91, 91, 91, 91, 92, 92, 92, 92, 92, 93, 94, 95, 96, 97, 98, 98, 98, 98, 
      98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 
      98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 
      98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 
BY_   98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 
BZ_   98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 99, 99, 99, 99, 99, 99, 99, 99, 99, 99, 99,

We can see that so few author names start with "BB", "BC", or "BD" that the number 25 covers the entire range from "BBD" through "BEA". Thus, all names in that range, like "Beagle", become "B25". (In contrast, "BAD" is such a common starting triple that it gets its own number: 02.)
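The lookup itself is just a row-and-column index. Here is a minimal Python sketch that uses three rows transcribed from the table above (BA_, BB_, and BE_) and assumes the columns run from A to Z, left to right. The names `B_ROWS` and `cube_prefix` are made up for this illustration.

```python
from string import ascii_uppercase

# Rows copied from the "B" table above; the key is the second letter.
B_ROWS = {
    "A": [1, 1, 1, 2, 3, 3, 3, 4, 4, 5, 5, 6, 7, 8, 9,
          11, 12, 12, 13, 14, 15, 16, 17, 18, 19, 21],
    "B": [22, 23, 24] + [25] * 23,
    "E": [25, 26, 26, 27, 28, 28, 28, 29, 29, 29, 29, 31, 32, 33, 34,
          35, 36, 36, 37, 38, 39, 41, 42, 43, 43, 43],
}


def cube_prefix(name):
    """Return the letter plus two digits for the first three letters."""
    letters = [ch for ch in name.upper() if ch in ascii_uppercase]
    if len(letters) < 3 or letters[0] != "B" or letters[1] not in B_ROWS:
        raise ValueError("this sketch only covers names starting with BA, BB, or BE")
    row = B_ROWS[letters[1]]
    column = ascii_uppercase.index(letters[2])
    return f"B{row[column]:02d}"


print(cube_prefix("Beagle"))  # B25
print(cube_prefix("Badger"))  # B02
```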

Additional letters, beginning with the fourth and fifth ("GL" in "BEAGLE"), are similarly processed in pairs where possible, to better balance bucket sizes against the observed letter frequencies.

Still To Do

I'd love to add the "Cutter-Sanborn Three-Figure Author Tables (Swanson-Swift 1969 revision)" to this tool.

With Cube Code, I'm not convinced I made the best assumptions in creating the process. If the goal is to have similar author names generate unique codes with the fewest code figures possible, there might be a better approach to assigning ranges based on letter-frequency data.

Links to code, contact info

The code for this website and the JavaScript that powers the form above are on GitHub, as is the program I wrote to create the underlying letter-frequency data.

If you have any thoughts, please feel free to email me at sschlink [at] pratt dot edu.