| Simple Cutter: | |
| Banerjee Code: | |
| Educated Cutter: | |
| Cube Code: |
What is this? Some experiments comparing different Cutter Code variants. The "Simple Cutter" Code follows the procedure on Library of Congress instruction sheet G 63, as described on this webpage. The Banerjee Code comes from the Cataloging Calculator by Kyle Banerjee. (I lightly adapted Banerjee's original code to work with this site.)
The "Educated Cutter" Code is my attempt at setting ranges for the first-two-letter combinations that G 63 does not specify (e.g. names starting with "AC"). My goal was to give each digit's "bucket" a roughly equal chance of being used (thereby increasing room for growth, or hospitality) while staying compatible with G 63.
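To make the equal-chance goal concrete, here is a minimal Python sketch that compares two ways of placing "AC" next to the G 63 anchors for names starting with "A" (assumed here to be b = 2, d = 3, l = 4). The counts, the two plans, and the function names are all hypothetical; this is not the assignment the Educated Cutter Code actually uses.

```python
from collections import defaultdict

def digit_shares(letter_counts, plan):
    """Return each digit's share of names under a letter-to-digit plan."""
    totals = defaultdict(int)
    for letter, count in letter_counts.items():
        totals[plan[letter]] += count
    grand = sum(totals.values())
    return {digit: totals[digit] / grand for digit in sorted(totals)}

def spread(shares):
    """Gap between the fullest and emptiest bucket; smaller means more even."""
    return max(shares.values()) - min(shares.values())

# Hypothetical counts of second letters in names that start with "A".
counts = {"B": 10, "C": 30, "D": 20, "L": 40}

# Both plans keep the assumed anchors (B=2, D=3, L=4); they differ only on "C".
plan_a = {"B": 2, "C": 2, "D": 3, "L": 4}
plan_b = {"B": 2, "C": 3, "D": 3, "L": 4}

for label, plan in (("C with B", plan_a), ("C with D", plan_b)):
    shares = digit_shares(counts, plan)
    print(label, shares, round(spread(shares), 2))
```

With these made-up counts, grouping "AC" with "AB" gives the more even buckets, which is the kind of comparison the frequency data makes possible.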
The Cube Code is my best attempt at an improved version of the Cutter Code, free to break from G 63. Cube Code assigns a code of one letter plus a two-digit number based on the first three letters of a given name. Crucially, the two-digit number is derived from observed letter frequencies.
To create both the "Educated Cutter" Code and the Cube Code, I needed realistic letter-frequency data, preferably from real author names. I got 2 million names from the Library of Congress. I then fed these names into a Rust program I wrote to analyze letter frequencies.
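The counting step can be sketched in a few lines of Python. This is an illustration only, not the Rust program mentioned above. The function name, the sample names, and the normalization (uppercase, drop everything outside A-Z, skip names with fewer than three letters) are assumptions.

```python
from collections import Counter
import re

def leading_triples(names):
    """Count the first three A-Z letters of each name."""
    counts = Counter()
    for name in names:
        # Assumed normalization: uppercase, then strip non-letters.
        letters = re.sub(r"[^A-Z]", "", name.upper())
        if len(letters) >= 3:
            counts[letters[:3]] += 1
    return counts

# Hypothetical sample; the real input was about 2 million names.
sample = ["Beagle, Ann", "Beaton, Cecil", "Badger, Tom", "O'Brien, Pat", "Li"]
triple_counts = leading_triples(sample)
print(triple_counts.most_common(1))  # [('BEA', 2)]
```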
For Cube Code, the program creates 26 "tables" (one for each possible first letter of a name). Each table has 676 values ranging from 1 to 99, one for each combination of second and third letters in the given name. A single number, like 25, is reused across a range of second-and-third-letter combinations if they appear infrequently in the data.
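One plausible way to build such a table is to walk the 676 pairs in alphabetical order and give each pair a code according to the share of names that sort before it. The Python sketch below does that; it is a guess at the general technique, not the program's actual algorithm. Skipping codes that end in zero is an observation from the "B" table on this page, not a documented rule, and the function name and input shape are hypothetical.

```python
from itertools import product
from string import ascii_uppercase

# 01-99 without multiples of ten (observed in the "B" table; an assumption).
CODES = [n for n in range(1, 100) if n % 10 != 0]  # 90 usable codes

def build_table(pair_counts):
    """Map each (second, third) letter pair to a code by cumulative share."""
    pairs = ["".join(p) for p in product(ascii_uppercase, repeat=2)]
    total = sum(pair_counts.get(p, 0) for p in pairs) or 1
    table, seen = {}, 0
    for pair in pairs:
        # 'seen' is the number of names that sort strictly before this pair.
        index = min(seen * len(CODES) // total, len(CODES) - 1)
        table[pair] = CODES[index]
        seen += pair_counts.get(pair, 0)
    return table

# Hypothetical counts: half the names continue with "AA", half with "EA".
table = build_table({"AA": 50, "EA": 50})
print(table["AA"], table["AB"], table["EA"], table["ZZ"])  # 1 51 51 99
```

Pairs with no names never advance the running total, so a long run of rare pairs shares one code, which matches the repeated 25s in the table.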
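Here's the Cube Code table or "map" for names that begin with the letter "B". Each row is a second letter and each column is a third letter; the unlabeled rows run from BF_ through BX_, and the column headers between B_E and B_Y are omitted: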
B_A B_B B_C B_D B_E B_Y B_Z
BA_ 01, 01, 01, 02, 03, 03, 03, 04, 04, 05, 05, 06, 07, 08, 09, 11, 12, 12, 13, 14, 15, 16, 17, 18, 19, 21,
BB_ 22, 23, 24, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25,
BC_ 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25,
BD_ 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25,
BE_ 25, 26, 26, 27, 28, 28, 28, 29, 29, 29, 29, 31, 32, 33, 34, 35, 36, 36, 37, 38, 39, 41, 42, 43, 43, 43,
44, 44, 44, 44, 44, 44, 44, 44, 44, 44, 44, 44, 44, 44, 44, 44, 44, 44, 44, 44, 44, 44, 44, 44, 44, 44,
44, 44, 44, 44, 44, 44, 44, 44, 44, 44, 44, 44, 44, 44, 44, 44, 44, 44, 44, 44, 44, 44, 44, 44, 44, 44,
44, 45, 45, 45, 45, 45, 45, 45, 45, 45, 45, 45, 45, 45, 45, 45, 45, 45, 45, 45, 45, 45, 45, 45, 45, 45,
45, 45, 45, 46, 46, 46, 47, 47, 47, 47, 47, 47, 48, 48, 48, 48, 48, 48, 49, 51, 51, 51, 51, 51, 51, 51,
51, 51, 51, 51, 51, 52, 52, 52, 52, 52, 52, 52, 52, 52, 52, 52, 52, 52, 52, 52, 52, 52, 52, 52, 52, 52,
52, 52, 52, 52, 52, 52, 52, 52, 52, 52, 52, 52, 52, 52, 52, 52, 52, 52, 52, 52, 52, 52, 52, 52, 52, 52,
52, 53, 54, 54, 54, 55, 55, 55, 55, 55, 55, 55, 55, 55, 55, 56, 56, 56, 56, 56, 56, 56, 56, 56, 56, 56,
56, 56, 56, 56, 56, 56, 56, 56, 56, 56, 56, 56, 56, 56, 56, 56, 56, 56, 56, 56, 56, 56, 56, 56, 56, 56,
56, 56, 56, 56, 56, 56, 56, 56, 56, 56, 56, 56, 56, 56, 56, 56, 56, 56, 56, 56, 56, 56, 56, 56, 56, 56,
56, 56, 57, 57, 58, 58, 58, 59, 59, 61, 61, 61, 62, 62, 63, 64, 64, 64, 65, 66, 67, 68, 69, 71, 71, 72,
72, 72, 72, 72, 72, 72, 72, 72, 72, 72, 72, 72, 72, 72, 72, 72, 72, 72, 72, 72, 72, 72, 72, 72, 72, 72,
72, 72, 72, 72, 72, 72, 72, 72, 72, 72, 72, 72, 72, 72, 72, 72, 72, 72, 72, 72, 72, 72, 72, 72, 72, 72,
72, 73, 74, 75, 76, 77, 78, 78, 78, 79, 81, 81, 81, 81, 81, 82, 83, 84, 85, 85, 85, 86, 87, 87, 87, 87,
87, 87, 87, 87, 87, 87, 87, 87, 87, 87, 87, 87, 87, 87, 87, 87, 87, 87, 87, 87, 87, 87, 87, 87, 87, 87,
87, 87, 87, 87, 87, 87, 87, 87, 87, 87, 87, 87, 87, 87, 87, 87, 87, 87, 87, 87, 87, 87, 87, 87, 87, 87,
87, 87, 87, 88, 89, 89, 89, 91, 91, 91, 91, 91, 92, 92, 92, 92, 92, 93, 94, 95, 96, 97, 98, 98, 98, 98,
98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98,
98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98,
98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98,
BY_ 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98,
BZ_ 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 98, 99, 99, 99, 99, 99, 99, 99, 99, 99, 99, 99,
We can see that so few author names start with "BB", "BC", or "BD" that the number 25 covers almost the entire range from "BBD" to "BEA". Thus, all names in that range, like "Beagle", become "B25". (In contrast, "BAD" is such a common starting triple that it gets its own number: 02.)
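The lookups in this paragraph can be checked with a short Python sketch. The rows are copied from the "B" table above (only BA_ through BE_ are included), while the function name and error handling are hypothetical. It covers only the first three letters, not the later letter pairs.

```python
# Rows from the "B" table: index 0 is third letter A, index 25 is Z.
B_ROWS = {
    "A": [1, 1, 1, 2, 3, 3, 3, 4, 4, 5, 5, 6, 7, 8, 9,
          11, 12, 12, 13, 14, 15, 16, 17, 18, 19, 21],
    "B": [22, 23, 24] + [25] * 23,
    "C": [25] * 26,
    "D": [25] * 26,
    "E": [25, 26, 26, 27, 28, 28, 28, 29, 29, 29, 29, 31, 32, 33, 34,
          35, 36, 36, 37, 38, 39, 41, 42, 43, 43, 43],
}

def cube_prefix(name):
    """Return the letter plus two-digit code for a name's first three letters."""
    letters = [c for c in name.upper() if "A" <= c <= "Z"]
    if len(letters) < 3:
        raise ValueError("need at least three letters")
    first, second, third = letters[:3]
    if first != "B" or second not in B_ROWS:
        raise KeyError("only rows BA_ through BE_ are included in this sketch")
    code = B_ROWS[second][ord(third) - ord("A")]
    return f"{first}{code:02d}"

print(cube_prefix("Beagle"))  # B25
print(cube_prefix("Badger"))  # B02
```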
Additional letters, beginning with the fourth and fifth ("GL" in "BEAGLE"), are similarly processed in pairs, if possible, to better optimize bucket sizes based on letter frequencies observed.
I'd love to add the "Cutter-Sanborn Three-Figure Author Tables (Swanson-Swift 1969 revision)" to this tool.
With Cube Code, I'm not convinced I made the best assumptions in creating the process. If the goal is to have similar author names generate unique codes with the fewest code figures possible, there might be a better approach to assigning ranges based on letter-frequency data.
The code for this website and the JavaScript that powers the form above are on GitHub, as is the program I wrote to create the underlying letter-frequency data.
If you have any thoughts, please feel free to email me at sschlink [at] pratt dot edu.