What the Death of the Card Catalog Taught Us about Data Migration

Jul 30th, 2017 6:00am
Feature image via Pixabay.

When we transition to a new technology, do we lose valuable metadata? Could errors be introduced in the process of transfer of data from one medium to another? It’s an issue society has already had to confront — when the world’s libraries transitioned to digital and online catalogs from their older systems based on paper cards.

Maybe there are some lessons to be learned by looking at this essential point of transition from physical to digital form.

This April, Chronicle Books released The Card Catalog: Books, Cards, and Literary Treasures, a look back at the storage media of yesteryear. The Washington Post calls it “a heady antidote to the technophilia threatening our culture,” noting that by 1969 the Library of Congress was printing 79 million of the paper cards every year. The New Yorker once calculated that by 1968 the Library of Congress was distributing “about a thousand cards a minute, for around five cents a card.”

But as the generations rolled on, things changed.

For centuries, card catalogs were the essential way that libraries kept track of their books and the content within those books. Each book got its own card, sometimes handwritten, which detailed the author, the publication date and other pertinent information. The cards were then arranged alphabetically by author, title, subject matter and additional indexes.

In 1986, another New Yorker article remembered that “bleak” winter day when the New York Public Library closed its card catalog room so it could replace the 8,000 oak card-catalog drawers with a sleek set of 32 computer terminals. And a new watershed was reached less than two years ago when the Online Computer Library Center (OCLC) printed its very last catalog card. It had been producing them since 1971, shipping out up to eight tons of catalog cards each week — and over the years it had printed more than 1.9 billion cards. But with libraries moving to online catalogs, there just wasn’t enough demand anymore for printed paper cards.

But the OCLC had also played a big role in bringing about that transition, according to a 1994 New Yorker article by author Nicholson Baker: for a small fee, the nonprofit had been transferring old library catalog cards into “machine-readable form.” Baker found 60 people busily working on card-catalog collections from 40 different libraries, including the public libraries in Los Angeles, San Francisco, and Cincinnati.


“Libraries are entrusting us with the history of their library collection,” said the OCLC’s Maureen Finn, who ran the program. It had been running since the 1970s, and as of 1994 Finn could say, “In seventeen years, we’ve never lost a card.”

But what happened to all those old cards, with all of their additional notes and “See also” references directing patrons to other parts of a library’s collections? Finn told the New Yorker that “most library managers that I talk to will say, ‘We are storing them because it makes the staff feel good, and we will be getting rid of them.'”

By 1994, millions of cards were being destroyed each year. Some libraries even converted them into scratch paper. The subject catalogs had already vanished at Dartmouth, Kent State, and Boston University. The article also reported that “A recycling firm called Earthworm, Inc., carted off the bulk of MIT’s cards in 1989.” Historian Helen Rand Parish complained that it was all comparable to burning the Library of Alexandria.

 Oakland Public Library - 2017 - image by David Cassel

There was obviously a use case for the transition to digital. Online catalogs were cheaper and could be accessed remotely, and, as the New Yorker pointed out, they didn’t grow mold, unlike the water-damaged card catalog at one engineering library at the University of Toronto. Online catalogs were also more easily accessible to people in wheelchairs, and, perhaps just as importantly, there was government funding for the conversion.

Yes, one study found that preteens had more trouble using the online catalogs. But Baker saw bigger problems…

“The unfortunate truth is that, in practice, existing frozen card catalogs… are typically being replaced by local databases that are full of new errors, are much harder to browse efficiently, are less rich in cross-references and subject headings, lack local character, do not group related titles and authors together particularly well, and are in many cases stripped of whole classes of specific historical information (e.g., the original price of the book, its acquisition date, its original cataloguing date, its accession number, the original cataloguer’s own initials, the record of any copies that have been withdrawn, and whether it was a gift or a purchase) that existed free, using up no disk space or computer-room electricity, requiring no pricey software updates or daily backups or hardware service calls…”

The OCLC, the nonprofit producing many of these digitized catalogs, had offered bounties and special discounts to librarians who submitted new entries for books that weren’t yet in its master database. The New Yorker saw this as incentivizing the world’s librarians to be “engaged in the creation of a kind of virtual community long before there were such things as Usenet and listservs, to pump up the burgeoning database… a highly democratic, omnidirectional collaboration among hundreds of thousands of once-isolated documentalists.”

By 1994, OCLC’s database held 30 million records, about 25 percent from the Library of Congress, but with “the majority being the work of nearly seven thousand member libraries.” The quality wasn’t always perfect, though. The database was bedeviled by superfluous listings for different ways of indicating, for example, Tennyson or Alexander the Great. And in the early days, the cumbersome process for correcting errors actually required a stamped letter sent through the postal service.

OCLC then implemented an automated cleanup through its “Duplicate Detection and Resolution” software, according to Martin Dillon, director of the OCLC’s Library Resources Management Division. “When databases get as large as ours the contribution of individual humans is severely limited. The task is so large that no practical number of humans could handle it.” At one point the software found over 600,000 duplicated titles.
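To make the idea of duplicate detection concrete, here is a minimal Python sketch. It is not OCLC’s actual algorithm, which the article doesn’t describe. The function names, the normalization rules and the sample records are all assumptions made up for illustration. It simply reduces each author heading and title to a crude comparison key, so that two spellings of the same entry land in the same group.

```python
import re
import unicodedata
from collections import defaultdict


def normalize(text, sort_words=False):
    """Reduce a heading to a crude comparison key (hypothetical rules)."""
    # Strip accents, lowercase, and replace punctuation with spaces.
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    words = re.sub(r"[^a-z0-9\s]", " ", text.lower()).split()
    # For names, sort the words so "Tennyson, Alfred" and
    # "Alfred Tennyson" produce the same key.
    if sort_words:
        words = sorted(words)
    return " ".join(words)


def find_duplicates(records):
    """Return lists of record IDs whose author/title keys collide."""
    groups = defaultdict(list)
    for rec in records:
        key = (normalize(rec["author"], sort_words=True), normalize(rec["title"]))
        groups[key].append(rec["id"])
    return [ids for ids in groups.values() if len(ids) > 1]


# Invented sample records, for illustration only.
records = [
    {"id": 1, "author": "Tennyson, Alfred", "title": "In Memoriam"},
    {"id": 2, "author": "Alfred Tennyson", "title": "In memoriam."},
    {"id": 3, "author": "Example, Author", "title": "Example Title"},
]

print(find_duplicates(records))  # [[1, 2]]
```

A real system would need far more than this. Different editions can share an author and title, and a key this crude will not connect “Alfred Tennyson” with “Lord Tennyson.” That gap is part of why cleanup at this scale was a job for dedicated software.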

Soon other companies had created error-correcting programs for digital library catalogs, leading to a state which Baker bemoaned as libraries purchasing “remedial software meant to correct the hash that earlier technologies have made of information once safely stored on paper.”

The article sees a world suffering from the “random loss of thousands of books as a result of clerical errors committed in disassembling each card catalog, sorting and boxing and labeling its cards, and converting them en masse to machine-readable form — a kind of incidental book burning that is without flames or crowds and, strangest of all, without motive… every cataloger and technical-services person I asked admitted that there are now books in their library that, owing to inevitable slip-ups of one sort or another, aren’t in the online catalog that is supposed to help you find them.”
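Both kinds of loss Baker describes, records that never make it into the new catalog and fields that get stripped along the way, are things a migration can check for. The Python sketch below is one hypothetical way to do that. It assumes each card and each migrated record shares an identifier (something like an accession number), and the function name, field names and sample data are invented for illustration.

```python
def audit_migration(cards, migrated):
    """Compare source cards with migrated records, both keyed by a shared ID.

    Returns the IDs missing from the new catalog, plus a map from each
    surviving ID to the fields that were dropped in the conversion.
    """
    missing = sorted(set(cards) - set(migrated))
    dropped = {}
    for card_id, card in cards.items():
        if card_id in migrated:
            lost = sorted(set(card) - set(migrated[card_id]))
            if lost:
                dropped[card_id] = lost
    return missing, dropped


# Invented sample data: two paper cards, only one of which was converted.
cards = {
    "c1": {
        "author": "Example, Author",
        "title": "Example Title",
        "acquisition_date": "1952-03-01",
        "original_price": "2.50",
    },
    "c2": {"author": "Another, Author", "title": "Another Title"},
}
migrated = {
    "c1": {"author": "Example, Author", "title": "Example Title"},
}

missing, dropped = audit_migration(cards, migrated)
print(missing)  # ['c2']
print(dropped)  # {'c1': ['acquisition_date', 'original_price']}
```

The catch is that an audit like this only works while the source still exists. Once the cards have been recycled, there is nothing left to compare against.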

Libraries and Makers

It’s bracing to read that article today, but it’s left me thinking that maybe these are the kinds of issues that always crop up on the road to more information that’s more easily accessible. My local public library now has an online catalog which can tell me not only a book’s call number, but also whether the book is actually on the shelves. And if it’s not, the online listing can also tell me which other branches may have a copy. I can even request that they hold the book for me, all online and all from the comfort of my own home.

And of course, there’s an even bigger picture. I remember a 2013 talk from the director of the San Rafael Public Library at one of San Francisco’s “Nerd Nite” meetups. Sarah Houghton — who writes the popular blog “Librarian in Black” — had titled her presentation “Where’d the Card Catalog Go? Today’s Ass-Kicking Libraries and Librarians.”

Houghton reminded the audience that today’s libraries are about more than books: “Libraries are one of the first institutions to really wholeheartedly embrace the maker movement.” They offer 3D printers and video production equipment, and many offer electronics classes. Some let you check out everything from tools to seeds, musical instruments, toys, and even artwork. Libraries also host events for the community — the San Francisco Public Library actually held a “literary speed-dating” event.

“If that’s not sexy enough, you can kill a pig and butcher it at your local library,” she says — putting up a photo of a butchering workshop that’s been conducted at libraries around the U.S. “If you like to kill things, maybe you can get your library to kill stuff with you.”

Perhaps the ultimate takeaway is that data exists only as a tool for a larger mission.

“Our currency is information,” Houghton told her audience. “So if there’s a technology that makes it easier to purvey information, that makes our job better.

“And you are the beneficiaries of that because you get information faster and easier.”

