OCR SaaS: Difference between revisions

Latest revision as of 15:48, 17 April 2014

OCR SaaS

SaaS
Software as a Service. Read more about what SaaS means at Wikipedia (http://en.wikipedia.org/wiki/Software_as_a_service).

Needs:

  • Accept an incoming request and return a refId for the job
  • Run OCR on the image with the available OCR engines
  • Support ZBar for barcode detection
  • Support language detection
  • Support calling endpoints
  • Support round-robin distribution of jobs so the service is shared evenly
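
The round-robin item above could be sketched roughly as follows. This is only an illustration: the helper name createRoundRobin and the engine names are hypothetical, since the notes do not say which engines are rotated or how they are registered.

```javascript
// Minimal round-robin picker: each call returns the next item in the list
// and wraps around at the end. Engine names below are placeholders.
function createRoundRobin(items) {
  if (items.length === 0) {
    throw new Error("round-robin needs at least one item");
  }
  let next = 0;
  return function pick() {
    const item = items[next];
    next = (next + 1) % items.length; // wrap back to the first item
    return item;
  };
}

const pickEngine = createRoundRobin(["engineA", "engineB", "engineC"]);
const assigned = [1, 2, 3, 4].map(() => pickEngine());
console.log(assigned); // [ 'engineA', 'engineB', 'engineC', 'engineA' ]
```

A plain rotation like this ignores how long each job takes; a real scheduler might also need to weigh engine load.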

Future needs:

  • Socket.io support
  • Cleaning techniques
  • Techniques to determine whether the text is handwritten or typed
  • Support ImageMagick adjustment techniques for better results
  • Support custom training files
  • Support ABBYY and OmniPage
  • Support Evernote
  • Support OpenCV image detection features
  • Support OCR outputs that maintain word/character locations (such as hOCR generated by Tesseract - http://code.google.com/p/hocr-tools/)
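
To illustrate what keeping word locations means in practice: hOCR stores each word's bounding box in the title attribute of an ocrx_word span, as "bbox x0 y0 x1 y1". The sketch below pulls those boxes out with a regular expression. The function name extractWords and the sample markup are made up for illustration, and the regex assumes the class attribute comes before title, as in the sample; a real service would more likely use an HTML parser.

```javascript
// Extract word text and bounding boxes from hOCR markup.
// Sketch only: assumes class precedes title on each ocrx_word span.
function extractWords(hocr) {
  const wordSpan =
    /<span[^>]*class=['"]ocrx_word['"][^>]*title=['"]([^'"]*)['"][^>]*>([^<]*)<\/span>/g;
  const words = [];
  for (const match of hocr.matchAll(wordSpan)) {
    const bbox = /bbox (\d+) (\d+) (\d+) (\d+)/.exec(match[1]);
    if (!bbox) continue; // skip spans without a bounding box
    const [x0, y0, x1, y1] = bbox.slice(1).map(Number);
    words.push({ text: match[2].trim(), x0, y0, x1, y1 });
  }
  return words;
}

// Hand-written sample in the shape of hOCR output.
const sample =
  "<span class='ocr_line' title='bbox 36 92 190 116'>" +
  "<span class='ocrx_word' id='word_1_1' title='bbox 36 92 96 116; x_wconf 90'>Hello</span> " +
  "<span class='ocrx_word' id='word_1_2' title='bbox 110 92 190 116; x_wconf 88'>world</span>" +
  "</span>";
console.log(extractWords(sample));
```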

Current Plans:

  • Use NodeJS along with some

Possible Routes

  • imageAdd
  • imageRemove
  • imageStatus
  • queueInfo
  • ping
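
One possible way to wire up the routes listed above is a lookup table from route name to handler. The sketch fills in only ping, queueInfo and imageStatus; the in-memory jobs map, the dispatch helper, and every response field other than success are assumptions, not part of the plan.

```javascript
// In-memory stand-in for the job queue (how jobs are stored is an assumption).
const jobs = new Map([["job-1", { status: "queued" }]]);

// Route table keyed by route name. Fields beyond `success` are hypothetical.
const routes = {
  ping: () => ({ success: true }),
  queueInfo: () => ({ success: true, length: jobs.size }),
  imageStatus: ({ refId }) => {
    const job = jobs.get(refId);
    return job ? { success: true, status: job.status } : { success: false };
  },
};

// Look up the handler by name; unknown routes get a failure response.
function dispatch(route, params = {}) {
  const known = Object.prototype.hasOwnProperty.call(routes, route);
  return known
    ? routes[route](params)
    : { success: false, error: "unknown route" };
}

console.log(dispatch("ping"));
console.log(dispatch("imageStatus", { refId: "job-1" }));
```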

imageAdd( uri, [id], [endpoint] )

returns:

 {
  success: bool,
  refId: uuid
 }
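
A minimal sketch of imageAdd in NodeJS, assuming jobs are held in an in-memory map and that the refId is a random UUID from Node's built-in crypto module. The queue variable, the stored job fields, and the choice to return only success: false for a missing uri are illustrative assumptions.

```javascript
const { randomUUID } = require("node:crypto");

// refId -> job record. In-memory storage is an assumption for this sketch.
const queue = new Map();

// imageAdd(uri, [id], [endpoint]): queue an image and return its refId.
function imageAdd(uri, id, endpoint) {
  if (typeof uri !== "string" || uri.length === 0) {
    return { success: false }; // assumed failure shape: no refId
  }
  const refId = randomUUID();
  queue.set(refId, {
    uri,
    id: id ?? null,             // optional caller-supplied id
    endpoint: endpoint ?? null, // optional endpoint to call later
    status: "queued",
  });
  return { success: true, refId };
}

console.log(imageAdd("http://example.org/scan.png"));
```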

imageRemove( refId )

returns:

 {
  success: bool
 }
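
imageRemove could be sketched the same way. Here the pending map and its sample entry are hypothetical, and treating an unknown refId as success: false is an assumption; the notes do not say what should happen in that case.

```javascript
// In-memory queue with one hypothetical job already present.
const pending = new Map([
  ["example-ref-id", { uri: "http://example.org/scan.png" }],
]);

// imageRemove(refId): drop the job if it exists.
function imageRemove(refId) {
  // Map.prototype.delete returns true only when the key was present.
  return { success: pending.delete(refId) };
}

console.log(imageRemove("example-ref-id")); // { success: true }
console.log(imageRemove("example-ref-id")); // { success: false }
```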


Questions:

Q1: How will the OCR server get the images to process? One-at-a-time image uploads aren't that hard and can be scripted, but is there any need for submitting larger datasets via FTP or other mechanisms?