Tuesday, December 14, 2004

A Digital Library (thanks to Google)

Since I read Malcolm Gladwell's The Tipping Point* a few months ago, I've wondered what the tipping point might be for a transition from print to electronic books. It's clear that major research libraries have invested heavily in building digital collections, but few have managed to make digital collections an integrated part of the library's collection development policies and procedures. Fewer still among smaller libraries have been able to even imagine such a change. Digitization efforts have continued to be viewed as pilot projects rather than as a part of the routine operation of the library.

Having participated in a couple of digitization projects, I have found that building a sustainable model for digitization is a significant challenge. Apart from external funding for special "projects," operational budgets haven't been increased to provide additional resources to digitize collections. As well, the number and cost of print publications continue to increase.

And then, along comes Google. It's been clear for months that Google has a very aggressive plan. Google Scholar was only the beginning. I don't think we yet know the full extent of the Google plan. With the resources of their very successful IPO, they are able to take on a very ambitious digitization project. I suspect they see the potential for multiple revenue streams to support this effort, streams that grow out of more than just the advertising that we see on their website. Agreements with online book vendors and on-demand printing are just a couple that come immediately to mind. Yet, while this ambitious plan has a measure of risk, I believe it is only through some sort of commercial/not-for-profit collaboration that a sustainable model can be developed for such a digitization project.

If they succeed, this might be the tipping point. With the Million Book Project, which originated at Carnegie Mellon University, as well as e-book projects being archived at the Internet Archive (http://www.archive.org/texts/texts.php), and the efforts just announced by the Library of Congress to digitize 70,000 books by next April in cooperation with a group of international libraries from the United States, Canada, Egypt, China and the Netherlands, there begins to be a substantive body of print books available in digital form. Perhaps this will be the tipping point at which it becomes expected that one will search Google to locate the book one is seeking.

This raises several significant issues. First, for theological libraries, the collection being digitized is not necessarily well balanced. It is not clear whether books from Harvard Divinity School's library will be included in the initial 40,000-book pilot project at Harvard. Even if all goes well, and Harvard decides to proceed beyond the pilot project, how much of the Andover-Harvard Library's collection will be digitized remains a question. Hopefully much of it will be, but even if all of it is digitized, it remains a collection designed to support the curriculum and research at Harvard, not necessarily the curriculum and research of other theological schools. The risk is that searchers will assume that everything is digitized when only a portion may be.

A great deal of anecdotal evidence and some more systematic studies indicate that undergraduates, at least, already select Google as the search engine of choice when they seek information. Adding bibliographic records and full text to the Google database will only solidify that habit. If it can't be found on Google, it won't be found (or at least used). Harvard plans to provide bibliographic records for the books digitized from their collection, and I suspect the other libraries will as well. In any event, Google's new relationship with OCLC will undoubtedly provide good catalog records. But I doubt the Google search interface is going to evolve to resemble a library's catalog, where one can develop a structured search that utilizes the various fields that are carefully created by catalogers. What will be the future of the bibliographic record? Bibliographies? Library catalogs?

This raises a number of questions for me about the future of library efforts to provide web sites and web resources, and perhaps even online catalogs as we know them today. Will it make more sense, for example, to find a way of providing circulation status through Google? And what will be the role of what we have known as public services in an environment where not only the search engine but also the content is served up by a commercial entity outside the library?

Much of the traditional role of libraries has been to develop collections. These cohesive collections have reflected the mission and nature of the communities they have served. Users have selected and read books from these collections in the context of those collections. As we move to a massive online digital collection, that context is lost. Indeed, even the context of the book may be lost as one may read only small excerpts from a book, missing the linear development of an argument that is obtained only by reading the entire book. Not only the way we search and access information, but the very way we perceive information is about to change.

*Gladwell, Malcolm. The Tipping Point: How Little Things Can Make a Big Difference. 1st ed. Boston: Little, Brown, 2000.