OCR Tips: Difference between revisions

87 bytes added, 2 October 2012 (minor edit)
Line 51:


Tesseract makes characteristic errors.  Some of these, such as "\/\/" or "\X/" substituted for "W", can be globally replaced, as it is highly unlikely that they would occur on their own on a label.  Others, such as "O" substituted for "0", "1" or "!" substituted for "l", or "Z" substituted for "2" (or vice versa), can be replaced in a context-dependent manner in dates, latitudes and longitudes, etc.  For instance, a string containing multiple errors such as "0ct. !Z, ZOlZ" can be programmatically located with a regular expression and changed to "Oct. 12, 2012" or even "12-October-2012" so that it can be entered into a database.
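
The two kinds of correction described above might be sketched in Python roughly as follows. This is only an illustration under stated assumptions: the function name `fix_label`, the substitution tables and the date pattern are all hypothetical, and a real label set would likely need a broader pattern and more confusion pairs.

```python
import re

# Hypothetical table of sequences that are safe to replace everywhere.
GLOBAL_FIXES = {r"\/\/": "W", r"\X/": "W"}

# Assumed confusion pairs: characters misread where a digit was expected.
TO_DIGIT = str.maketrans({"O": "0", "o": "0", "l": "1", "I": "1", "!": "1", "Z": "2"})
# Assumed confusion pairs: characters misread where a letter was expected.
TO_LETTER = str.maketrans({"0": "O", "1": "l", "!": "l"})

MONTHS = {"jan", "feb", "mar", "apr", "may", "jun",
          "jul", "aug", "sep", "oct", "nov", "dec"}

# Matches "Mon. D, YYYY" while tolerating the confusable characters above.
DATE_RE = re.compile(
    r"(?<!\w)([A-Za-z01!]{3})\.?\s+([0-9OolI!Z]{1,2}),\s*([0-9OolI!Z]{4})(?!\w)"
)


def _fix_date(match):
    month = match.group(1).translate(TO_LETTER)
    if month.lower() not in MONTHS:
        return match.group(0)  # not a month abbreviation, leave untouched
    day = match.group(2).translate(TO_DIGIT)
    year = match.group(3).translate(TO_DIGIT)
    return f"{month}. {day}, {year}"


def fix_label(text):
    # Global replacements first, then context-dependent date repair.
    for bad, good in GLOBAL_FIXES.items():
        text = text.replace(bad, good)
    return DATE_RE.sub(_fix_date, text)


print(fix_label("0ct. !Z, ZOlZ"))
```

Restricting the digit/letter swaps to strings that already look like a date keeps ordinary words containing "O", "l" or "Z" from being altered; latitudes and longitudes would need their own patterns in the same spirit.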


<br>Misc notes:


Tesseract will often recognize vertical text<br> Image input can be TIFF, JPEG, or GIF