Sunday, August 28, 2005

Google searching

Will Google’s Keyword Searching Eliminate the Need for LC Cataloging and Classification?
by Thomas Mann

abstract: Google Print does not "change everything" regarding the need for professional cataloging and classification of books; its limitations make cataloging and classification even more important to researchers. Google’s keyword search mechanism, backed by the display of results in "relevance ranked" order, is expressly designed and optimized for quick information seeking rather than scholarship. Internet keyword searching does not provide scholars with the structured menus of research options, such as those in OPAC browse displays, that they need for overview perspectives on the book literature of their topics. Keyword searching fails to map the taxonomies that alert researchers to unanticipated aspects of their subjects. It fails to retrieve literature that uses keywords other than those the researcher can specify; it misses not only synonyms and variant phrases but also all relevant works in foreign languages. Searching by keywords is not the same as searching by conceptual categories. Google software fails especially to retrieve desired keywords in contexts segregated from the appearance of the same words in irrelevant contexts. As a consequence of the design limitations of the Google search interface, researchers cannot use Google to systematically recognize relevant books whose exact terminology they cannot specify in advance. Cataloging and classification, in contrast, do provide the recognition mechanisms that scholarship requires for systematic literature retrieval in book collections.

Thomas Mann articulates the many problems most librarians recognize in relying on keyword searching with Google or a Google-like search engine. As much as I agree with him, I think he overlooks several significant issues:

1. Like it or not, teens and young adults have grown up in a digital culture. The modes of searching and retrieving information that rely on cataloging and classified shelving systems are rooted in a print culture. This is not likely to change and will become a more significant issue with each new generation. Mann declares that relying on keyword searching and Internet sources to the exclusion of traditional library sources is "bad scholarship." Indeed, it is bad scholarship judged according to the standards of the 20th century. What is not clear is whether those standards will endure in the 21st century.

2. Clayton Christensen talks about "disruptive technologies" in his book The Innovator's Dilemma: When New Technologies Cause Great Firms to Fail. Disruptive technologies generally appear less adequate for the task than established technologies. They often develop rapidly, however, and along different trajectories, until they are far superior to the established technologies. I don't know that Google's approach to digitization and information discovery is the disruptive technology that will surpass traditional library approaches, but I don't think that possibility can be easily dismissed.

3. Google is obviously able to move ahead at a rapid pace, but even with its capacity, it will take a number of years to complete the project it has launched. It is questionable to assume that the information discovery technology it currently provides will not advance along with the growth of the digital content.

4. I wonder if the very nature of what Google is creating can really be compared with a library of print books. It's probably too soon to say, but the physical constraints of print publishing begin to disappear with digital content. The assumption that the content (and consequently our discovery and use of the content) is not affected to some extent by its medium may not be valid.

Mann's critique of the Google Print project is significant, but perhaps not the whole story.