Google’s Open Source OCR
OCR (optical character recognition) is software designed to translate images of handwritten or typewritten text (usually captured by a scanner) into machine-editable text. In other words, it turns pictures of characters into a standard encoding that represents them (e.g. ASCII or Unicode).
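To make the "picture of a character in, encoded character out" idea concrete, here is a deliberately tiny sketch. It is a toy and not how Tesseract or any real engine works: the 3x5 bitmaps, the `TEMPLATES` table and the `distance` and `recognize` helpers are all made up for illustration. It picks the stored template closest to the input glyph and then shows the character's code point and its ASCII bytes.

```python
# Toy illustration only: real OCR engines are far more sophisticated.
# Each hypothetical template is a 3x5 bitmap, '#' = ink, '.' = blank.
TEMPLATES = {
    "I": ["###", ".#.", ".#.", ".#.", "###"],
    "L": ["#..", "#..", "#..", "#..", "###"],
    "T": ["###", ".#.", ".#.", ".#.", ".#."],
}


def distance(a, b):
    """Count the pixels that differ between two bitmaps of equal size."""
    return sum(
        pa != pb
        for row_a, row_b in zip(a, b)
        for pa, pb in zip(row_a, row_b)
    )


def recognize(glyph):
    """Return the template character whose bitmap is closest to the glyph."""
    return min(TEMPLATES, key=lambda ch: distance(glyph, TEMPLATES[ch]))


# A slightly noisy "T": one pixel is missing from the top bar.
noisy_t = ["##.", ".#.", ".#.", ".#.", ".#."]

char = recognize(noisy_t)
# Show the recognized character, its Unicode code point and its ASCII bytes.
print(char, ord(char), char.encode("ascii"))  # prints: T 84 b'T'
```

The point is only the last step: once a glyph has been matched to a character, the result is ordinary encoded text that any editor or search index can handle.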
Why am I blogging about OCR? Well, because Google has its finger in this open source pie.
OCR History
Tesseract began as an OCR engine developed at HP Labs between 1985 and 1995. HP then decided to abandon OCR research, and the software’s development was frozen for ten years. In 2005, HP released Tesseract as open source (Apache License), and Google, together with a research institute, has continued its development.
Why is it important for Google to be involved in OCR?
OCR is useful for Google Book Search, and it could also be useful for Picasa or Image Search, alongside an object recognition engine. And if Google improves the software, it could become a successful alternative to commercial applications.
Watch this space…


























