Digitizing Books One Word at a Time

Written by dekard
Monday, 07 January 2008

One of the biggest projects going on right now is the digitization of millions of older books, newspapers and other documents. Most significant documents are already available electronically, but what about the millions of smaller and less significant books? These still contain valuable information, history and other perspectives. Preserving that resource and making it more widely available is a tremendous way to deepen the knowledge that humanity possesses.

The best way to accomplish this is to scan the texts and let a computer automatically recognize the characters and words. Unfortunately, this is extremely difficult to do accurately on some older texts because the paper has degraded, the ink has bled, or the font is simply too hard for a modern OCR system to read. How can this be fixed?

A project led by Carnegie Mellon University has worked with one of its professors, Luis von Ahn, to solve this problem. Luis von Ahn helped develop the CAPTCHA engine, which you have already used if you are a member of this site. It's a simple tool that displays random numbers or letters on a colored background. You are supposed to decipher that code and enter it into the website, which helps prevent spam by keeping automated programs out. It's a tremendously useful tool and is highly effective.

But Luis von Ahn looked at all the effort being spent on those random letters and numbers and decided that it could be put to more useful work. He and his team have developed a system that extracts those impossible-to-recognize words from older texts and displays them to users instead. In turn, those users are helping to fight spam AND process the backlog of millions of books and other texts that need to be recognized and made available online. You can see how the millions of daily users of the CAPTCHA system can now become a boost to the online development and display of these valuable texts. I believe in this project, so this site has been converted to use the new and improved reCAPTCHA system, effective immediately.
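
For the curious, here is a minimal Python sketch of how answers from many users could be combined into a transcription. This post does not spell out the mechanics, so treat the details as assumptions: the sketch pairs each unreadable word with a control word whose answer is already known, and only counts a user's guess for the unreadable word when the control word was typed correctly. The function names (`record_answer`, `consensus`) and the thresholds (`min_votes`, `min_share`) are made up for illustration and are not the real reCAPTCHA code or API.

```python
from collections import Counter


def normalize(answer):
    """Trim and lower-case an answer so trivial differences don't split votes."""
    return answer.strip().lower()


def record_answer(votes, control_word, control_answer, unknown_answer):
    """Store the guess for the unreadable word only if the control word was right.

    Returns True when the user passed the challenge, False otherwise.
    """
    if normalize(control_answer) != normalize(control_word):
        return False  # failed the known word: likely a bot or a typo, ignore the guess
    votes.append(normalize(unknown_answer))
    return True


def consensus(votes, min_votes=3, min_share=0.75):
    """Return the agreed transcription, or None if there is no clear winner yet.

    min_votes and min_share are hypothetical thresholds chosen for this sketch.
    """
    if len(votes) < min_votes:
        return None
    word, count = Counter(votes).most_common(1)[0]
    return word if count / len(votes) >= min_share else None


# Example: four visitors see the control word "morning" next to one unreadable word.
votes = []
record_answer(votes, "morning", "morning", "upon")
record_answer(votes, "morning", "Morning ", "Upon")
record_answer(votes, "morning", "mroning", "spam")  # rejected: control word wrong
record_answer(votes, "morning", "morning", "upon")
print(consensus(votes))  # prints "upon"
```

The point of the design is that a single answer does two jobs: the control word keeps bots out, and the agreement between several humans stands in for the OCR result that the computer could not produce on its own.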
There are a few links here that you are welcome to explore if you'd like more information. If you are not yet a member of this site, just register and you'll be able to help the book project and join a great community at the same time.