Update (2011): This post describes using a little-known feature of Microsoft Office which does a good job with OCR. It was written 5 years ago, but still gets a lot of traffic. Unfortunately, Office 2010 removes that feature (Microsoft Office Document Imaging). You can still install Microsoft Office Document Imaging for free using the instructions here: There are a number of free OCR options listed here:

OCR (Optical Character Recognition) can really come in handy. For example, I previously wrote about how I use Timesnapper as a black box to recover work which would otherwise be lost. Since most of my work is text based (C#, SQL, HTML, documentation, communications, etc.), the obvious next step is to grab the code from a screenshot. Of course I can retype it, but OCR would be better.

There are some great commercial OCR packages out there. My company recently used OmniPage Pro in a project which loaded data from hundreds of PowerPoint slides into SQL Server for reporting and analysis. OmniPage is great software, but it costs $149 for the basic version, which doesn't really make sense if you're just using it to avoid retyping a little text from a screenshot every now and then.

I looked around for free OCR software, and was a little bit surprised that there wasn't much out there. Here's a rundown of what I found, wrapping up with a program that wasn't technically free, but I already had it. There's a good chance you've got it, too.

GOCR (also known as JOCR). The easiest way to try it out is the GOCR Win Frontend, which installs GOCR as well. To be clear, gocr is not ready, to say the least. Personally I'd even say the result of trying to OCR a page is so crappy it is not even worth installing the gocr engine (it seems the total rewrite in 0.40 did not help much). And I am talking about ASCII black text on a white page, without other elements. Gocr seems to go all the way down here - errors in 98% of recognized characters, randomly added spaces, etc. For example: content is C unrir in gocr, sounds like drunken elvish to me.

Tesseract. Yeah, there's been some chatter in the blogospheres and internets about Tesseract since Google assisted in re-releasing it as an open source project. I have no doubt that the press alone (not to mention Google's involvement) will propel Tesseract towards OCR fame and fortune, but it sounds like it's not usable at this point: It is only configured to build under MSVC++6 for Windows. It only accepts uncompressed bitonal TIFFs. It performed abysmally on the provided testimage.tif, but it did build.

By accident, I stumbled across Microsoft Office Document Imaging.
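A side note on the Tesseract limitation mentioned in this post: "bitonal" means one bit per pixel, so a typical screenshot has to be reduced to pure black and white before that build can read it. The Python below is only a rough sketch of that reduction step and is not from the original post; the function name `to_bitonal_rows` and the default threshold of 128 are assumptions made for illustration. Writing the actual TIFF container is left to whatever imaging tool you already use.

```python
def to_bitonal_rows(gray_rows, threshold=128):
    """Pack rows of 0-255 grayscale values into 1-bit-per-pixel bytes.

    Pixels at or above `threshold` become 1 (white under the TIFF
    "BlackIsZero" interpretation); everything else becomes 0 (black).
    Each row is padded to a whole number of bytes, most significant
    bit first, which matches the default row layout of an
    uncompressed bilevel TIFF.
    """
    packed = []
    for row in gray_rows:
        out = bytearray((len(row) + 7) // 8)  # round up to full bytes
        for x, value in enumerate(row):
            if value >= threshold:
                out[x // 8] |= 0x80 >> (x % 8)  # set bit for a white pixel
        packed.append(bytes(out))
    return packed


if __name__ == "__main__":
    # Hypothetical 2x10 grayscale sample: dark "text" pixels on a light page.
    sample = [
        [255, 255, 0, 0, 255, 255, 255, 30, 200, 255],
        [250, 10, 10, 240, 240, 90, 130, 255, 255, 0],
    ]
    for packed_row in to_bitonal_rows(sample):
        print(packed_row.hex())
```

A fixed threshold is the simplest possible choice and works reasonably for crisp screenshots of black text on a white background; anti-aliased or colored text would likely need a smarter cutoff.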