Thursday, June 23, 2005

Google Translator: The Universal Language

Google Blogscoped provides an overview of one of Google's projects, a machine translation system. Machine translation has always seemed really problematic. The approach described below may throw enough computing resources at the problem to solve it. How applicable the United Nations documents are to translating theological literature is an interesting question. When I was at Carnegie Mellon University a few years ago, I talked with one of the researchers about a translation project he was working on. He described the Bible as an interesting sort of Rosetta Stone, given the number of translations that are available. That brings its own set of problems, but it is an interesting approach...

excerpt: This is the Rosetta Stone approach of translation. Let’s take a simple example: if a book is titled “Thus Spoke Zarathustra” in English, and the German title is “Also sprach Zarathustra”, the system can begin to understand that “thus spoke” can be translated with “also sprach”. (This approach would even work for metaphors; surely, Google researchers will take the longest available phrase which has high statistical matches across different works.) All it needs is someone to feed the system the two books and to teach it the two are translations from language A to language B, and the translator can create what Franz Och called a “language model.” I suspect it’s crucial that the body of text is immensely large, or else the system in its task of translating would stumble upon too many unlearned phrases. Google used the United Nations Documents to train their machine, and all in all fed 200 billion words. This is brute force AI, if you want: it works on statistical learning theory only and has not much real “understanding” of anything but patterns.
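To make the Rosetta Stone idea concrete, here is a toy sketch of how phrase correspondences can fall out of nothing but aligned text and counting. It is not Google's or Franz Och's actual method, which the excerpt doesn't detail; it swaps in a simple co-occurrence score (the Dice coefficient) over sentence pairs, and the function names and sample sentences are hypothetical. Following the excerpt's guess, ties are broken in favour of the longest phrase.

```python
from collections import Counter
from itertools import product


def ngrams(tokens, n_max=2):
    """Return the set of all n-grams (as tuples) of length 1..n_max."""
    return {tuple(tokens[i:i + n])
            for n in range(1, n_max + 1)
            for i in range(len(tokens) - n + 1)}


def dice_scores(pairs, n_max=2):
    """Score (source phrase, target phrase) pairs from aligned sentences.

    Dice(e, f) = 2 * count(e, f) / (count(e) + count(f)), where each count
    is the number of sentence pairs containing the phrase(s).
    """
    src_counts, tgt_counts, joint = Counter(), Counter(), Counter()
    for src, tgt in pairs:
        src_grams = ngrams(src.lower().split(), n_max)
        tgt_grams = ngrams(tgt.lower().split(), n_max)
        src_counts.update(src_grams)
        tgt_counts.update(tgt_grams)
        # Every source phrase co-occurs with every target phrase in this pair.
        joint.update(product(src_grams, tgt_grams))
    return {(e, f): 2 * c / (src_counts[e] + tgt_counts[f])
            for (e, f), c in joint.items()}


def best_translation(scores, phrase):
    """Return the best-scoring target phrase, preferring longer ones on ties."""
    key = tuple(phrase.lower().split())
    candidates = [(s, f) for (e, f), s in scores.items() if e == key]
    if not candidates:
        return None  # an "unlearned phrase": the corpus never saw it
    _, best = max(candidates, key=lambda sf: (sf[0], len(sf[1])))
    return " ".join(best)


if __name__ == "__main__":
    # Hypothetical miniature parallel corpus (English, German).
    corpus = [
        ("thus spoke zarathustra", "also sprach zarathustra"),
        ("thus spoke the prophet", "also sprach der prophet"),
        ("the prophet", "der prophet"),
    ]
    scores = dice_scores(corpus)
    print(best_translation(scores, "thus spoke"))  # prints "also sprach"
```

With only three sentence pairs the scores are fragile (for example, "zarathustra" ties with "sprach zarathustra"), which illustrates the excerpt's point that the body of text needs to be immensely large before the statistics become reliable.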