OCR Tips: Difference between revisions



Tesseract makes characteristic errors.  Some of these, such as "\/\/" or "\X/" substituted for "W", can be globally replaced.  Others, such as "O" substituted for "0", "1" or "!" substituted for "l", or "Z" substituted for "2" (or vice versa), can be replaced in a context-dependent manner in dates, latitudes and longitudes, etc.  For instance, "0ct. !Z, ZOlZ" can be located with a regular expression and changed to "Oct. 12, 2012" so that it can be entered into a database.
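
The two kinds of correction described above might be sketched in Python roughly as follows. This is only an illustration under stated assumptions: the function name `correct_ocr`, the confusion tables, and the "Mon. DD, YYYY" date pattern are all hypothetical choices made for this example, not part of Tesseract, and a real project would need patterns tuned to its own labels.

```python
import re

# Sequences assumed never to be legitimate text, so safe to replace everywhere.
GLOBAL_FIXES = {r"\/\/": "W", r"\X/": "W"}

# Hypothetical confusion tables, applied only inside a matched date.
TO_DIGIT = str.maketrans({"O": "0", "o": "0", "l": "1", "I": "1", "!": "1", "Z": "2"})
TO_LETTER = str.maketrans({"0": "O", "1": "l", "!": "l"})

MONTHS = {"Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"}

# Assumed date layout: three-character month, period, 1-2 digit day, comma, 4-digit year.
# Digit positions also accept the characters that are commonly confused with digits.
DATE_RE = re.compile(
    r"(?<![A-Za-z0-9])"          # not in the middle of a longer token
    r"([A-Za-z0!1]{3})\.\s*"     # month abbreviation, possibly misread
    r"([0-9OolI!Z]{1,2}),\s*"    # day
    r"([0-9OolI!Z]{4})"          # year
    r"(?![A-Za-z0-9])"
)


def _fix_date(match):
    """Rewrite one matched date; leave it alone if the month is not recognized."""
    month_raw, day_raw, year_raw = match.groups()
    month = month_raw.translate(TO_LETTER).capitalize()
    if month not in MONTHS:
        return match.group(0)
    day = day_raw.translate(TO_DIGIT)
    year = year_raw.translate(TO_DIGIT)
    return f"{month}. {day}, {year}"


def correct_ocr(text):
    """Apply the global replacements, then the context-dependent date repair."""
    for wrong, right in GLOBAL_FIXES.items():
        text = text.replace(wrong, right)
    return DATE_RE.sub(_fix_date, text)


print(correct_ocr("0ct. !Z, ZOlZ"))
```

Restricting the digit and letter substitutions to text that already looks like a date keeps ordinary words containing "O", "l" or "Z" untouched; the same approach could be repeated with a separate pattern for latitudes and longitudes.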


<br>Misc notes:


Tesseract will often recognize vertical text.<br> Image input can be TIFF, JPEG, or GIF.