Harvard's Googled Library

In agreements announced December 14, the world's largest search engine, Google, undertook to build an on-line reading room to house digital, searchable versions of millions of books belonging to Harvard, Stanford, the University of Michigan, the New York Public Library, and Oxford University, and to make them freely available to anyone anywhere on the Internet. The collaboration has the potential to create what Harvard librarians call "a revolutionary new information-location tool" and "an important public good."

Harvard began with a pilot project: digitizing 40,000 of the five million books at the Harvard Depository in Southborough, Massachusetts. Google staff began to work at the depository in mid-January, using their own proprietary apparatus (which the company will neither describe nor allow to be photographed), and should finish by summer.

Harvard wants to see how the production process goes in the pilot project before agreeing to let Google go further, explains Sidney Verba, Pforzheimer University Professor and director of the Harvard University Library. Although he and his colleagues do not believe that this is risky business, they want to make certain that Google doesn't damage books, or lose them, or keep them out of circulation too long. For its part, Google needs a better sense of the price of wholesale digitization, says Verba. That newly public, cash-flush company will pay all the bills, apart from some support expenses borne by Harvard. Google isn't talking, but others speculate that digitization might cost $10 per book; with earlier technology, costs were far higher. Finally, the laws that apply to digitized books still under copyright are uncertain and changing, and even though many publishers have already entered into agreements with Google about how their books may be used, it is not certain, says Verba, that the grand plan won't be challenged.

Verba has seen Google staffers scan a book and judges the process kind, gentle, and efficient. "We are fairly well convinced the pilot project is going to work," he says. If so, then Google will get going on all 15 million Harvard books, a job that will likely take between five and 10 years.

When the pilot project is done, the first 40,000 books will be virtually browsable. At present, because they are stored off campus, their content is difficult to assess, even for Harvard users. Once they are digitized, anyone will be able to go to Google and read the full text of out-of-copyright materials. Probably no text of books under copyright will be displayed at first, but the library anticipates that if the full project goes forward, snippets, or paragraphs, or perhaps even a page or two of copyrighted books will be shown by Google -- enough text to enable a researcher to determine whether to go to a library to see the whole book or to buy a copy.

In the coming era, a student will go to Google to search for books on Ralph Waldo Emerson, let's say, and read the books or sample them on-line. If the student wants to know what Emerson had for breakfast, a keyword search will zip through the books and uncover any reference to his tastes. (Verba is not sure in what order books will be arranged in search results. Perhaps the most popular ones will be at the top of the list. Perhaps some yet-to-be-introduced software, informed by artificial intelligence, will direct a user more effectively to what that user, in particular, really wants.)

Verba says that eventually the Google page shown to users at Harvard will have a link to HOLLIS, the University's on-line library catalog, so that a researcher can easily see what library at Harvard has a wanted book and whether it is available.

A student could approach an Emerson research project in this way, by going first to Google, but Verba recommends starting with HOLLIS. That catalog will show all books about Emerson at Harvard and tell which have been Googleized, providing a link. Moreover, HOLLIS catalogs books too fragile to be scanned, as well as materials in numerous other media -- photographs, for instance -- and so would also reveal that Harvard has Emerson's papers.

Some students today may incline to the heresy that the Internet can safely be their sole source of research information: if you can't learn on the Internet that Emerson liked his eggs sunny side up, the information isn't worth having. The Googleization of Harvard's miles and miles of books, however, may usefully marry the Internet and the library in the mind of any child of the times with a tendency to feel they are divorced.

