OCR SaaS
- SaaS: Software as a Service. See the Wikipedia article on SaaS for more background.
Needs:
- Accept an incoming request and return a refId for the job.
- Run OCR on the image with the available OCR engines
- Support ZBar for barcode detection
- Support language detection
- Support calling endpoints (for example, the optional endpoint passed to imageAdd)
- Support round-robin scheduling so the service is shared evenly
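The round-robin item above could be sketched roughly as follows. This is only an illustration, not a settled design: makeRoundRobin and the engine names other than Tesseract are hypothetical, and the notes do not say whether the rotation is over engines, workers, or clients. Here it is assumed to rotate over OCR engines.

```typescript
// Build a picker that hands out items in rotation, wrapping at the end.
// makeRoundRobin is a hypothetical helper name, not part of any plan above.
function makeRoundRobin<T>(items: T[]): () => T {
  if (items.length === 0) {
    throw new Error("round-robin needs at least one item");
  }
  let next = 0;
  return () => {
    const item = items[next];
    next = (next + 1) % items.length;
    return item;
  };
}

// Assumption: three engines are available; "engine-b" and "engine-c" are placeholders.
const pickEngine = makeRoundRobin(["tesseract", "engine-b", "engine-c"]);

// Each incoming job takes the next engine in the rotation.
const assignments = ["job-1", "job-2", "job-3", "job-4"].map((job) => ({
  job,
  engine: pickEngine(),
}));
```

With three engines, the fourth job wraps around to the first engine again, so no engine gets more than one job ahead of the others.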
Future needs:
- Socket.io support
- Cleaning techniques
- Techniques to determine whether the source text is handwritten or typed
- Support ImageMagick adjustment techniques for better results
- Support custom training files
- Support ABBYY and OmniPage
- Support Evernote
- Support OpenCV image detection features
- Support OCR outputs that maintain word/character locations (such as hOCR generated by Tesseract - http://code.google.com/p/hocr-tools/)
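To make the word-location item above concrete: hOCR stores each word's bounding box in the title attribute of a span. The sketch below pulls those boxes out with a regular expression. It is a rough illustration only: parseHocrWords and WordBox are hypothetical names, the sample markup is hand-written, and the pattern assumes the class attribute comes before title and that words contain no nested tags. A real implementation would more likely use an HTML parser.

```typescript
// One recognised word and its pixel bounding box (left, top, right, bottom).
interface WordBox {
  text: string;
  x0: number;
  y0: number;
  x1: number;
  y1: number;
}

// Extract ocrx_word spans of the form
//   <span class='ocrx_word' ... title='bbox x0 y0 x1 y1; ...'>text</span>
function parseHocrWords(hocr: string): WordBox[] {
  const wordRe =
    /<span[^>]*class=['"]ocrx_word['"][^>]*title=['"]bbox (\d+) (\d+) (\d+) (\d+)[^'"]*['"][^>]*>([^<]*)<\/span>/g;
  const words: WordBox[] = [];
  let m: RegExpExecArray | null;
  while ((m = wordRe.exec(hocr)) !== null) {
    words.push({
      x0: Number(m[1]),
      y0: Number(m[2]),
      x1: Number(m[3]),
      y1: Number(m[4]),
      text: m[5].trim(),
    });
  }
  return words;
}

// Hand-written sample in the hOCR style; not actual engine output.
const sampleHocr =
  "<span class='ocrx_word' id='word_1_1' title='bbox 36 92 96 116; x_wconf 95'>Hello</span> " +
  "<span class='ocrx_word' id='word_1_2' title='bbox 109 92 173 116; x_wconf 93'>world</span>";

const wordBoxes = parseHocrWords(sampleHocr);
```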
Current Plans:
- Use NodeJS along with some
Possible Routes
- imageAdd
- imageRemove
- imageStatus
- queueInfo
- ping
imageAdd( uri, [id], [endpoint] )
returns:
{ success: bool , refId: uuid }
imageRemove( refId )
returns:
{ success: bool }
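A minimal in-memory sketch of the two calls above, to show the intended shapes. Everything beyond the signatures and return values listed here is an assumption: the Job record, the "queued" status, the newRefId helper, rejecting an empty uri, and omitting refId on failure are all illustrative choices, and a real service would persist jobs rather than keep them in a Map.

```typescript
// Hypothetical job record; only uri, id, endpoint, and refId come from the notes.
interface Job {
  refId: string;
  uri: string;
  id: string | null;
  endpoint: string | null;
  status: "queued";
}

const jobs = new Map<string, Job>();

// Version-4-style UUID built from Math.random. Fine for a sketch, but not
// cryptographically strong; a real service should use a proper UUID source.
function newRefId(): string {
  return "xxxxxxxx-xxxx-4xxx-yxxx-xxxxxxxxxxxx".replace(/[xy]/g, (c) => {
    const r = Math.floor(Math.random() * 16);
    const v = c === "x" ? r : (r & 0x3) | 0x8;
    return v.toString(16);
  });
}

// imageAdd( uri, [id], [endpoint] ) returns { success, refId }.
function imageAdd(
  uri: string,
  id?: string,
  endpoint?: string
): { success: boolean; refId?: string } {
  if (uri.trim() === "") {
    return { success: false }; // assumption: an empty uri is rejected
  }
  const refId = newRefId();
  jobs.set(refId, {
    refId,
    uri,
    id: id ?? null,
    endpoint: endpoint ?? null,
    status: "queued",
  });
  return { success: true, refId };
}

// imageRemove( refId ) returns { success }; false when the refId is unknown.
function imageRemove(refId: string): { success: boolean } {
  return { success: jobs.delete(refId) };
}
```

A caller would keep the returned refId and pass it back to imageRemove (or, eventually, imageStatus) to refer to the same job.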
Questions:
Q1: How will the OCR server get the images to process? One-at-a-time image uploads aren't that hard and can be scripted, but is there any need for submitting larger datasets via FTP or other mechanisms?