OCR Resources: Difference between revisions

From iDigBio
*List of other OCR software: http://en.wikipedia.org/wiki/List_of_optical_character_recognition_software


== Biodiversity Informatics Tools Incorporating OCR Technology ==


*[http://www.apiaryproject.org Apiary Project] - High-throughput workflow for computer-assisted human parsing of biological specimen label data


*'''HerbIS''' (Erudite Recorded Botanical Information Synthesizer) - Software algorithms that process and present herbarium label data in a machine-understandable format through the use of natural language processing (NLP). Created at the Yale Peabody Museum of Natural History.


*[http://symbiota.org Symbiota] - Specimen-based virtual flora/fauna software with a built in module for specimen digitization that incorporates OCR technology


*[http://daryllafferty.com/salix SALIX] - The Semi-automatic Label Information eXtraction system, designed to capture herbarium specimen label data with the use of optical character recognition technologies and transfer those data into a database.

Revision as of 19:02, 6 August 2012

OCR Software used by ADBC projects

  • ABBYY FineReader - High-performing proprietary OCR software provided by the ABBYY software company. The Professional and Corporate Editions are designed specifically for Microsoft Windows operating systems.
  • OCRopus - free document analysis and optical character recognition (OCR) system released under the Apache License, Version 2.0 with a very modular design through the use of plugins.
  • Tesseract - Open source optical character recognition engine available under the Apache License, Version 2.0. The software runs on various operating systems and is considered one of the more accurate OCR engines available under a free software license.
  • Zerox OCR engine -

Biodiversity Informatics Tools Incorporating OCR Technology

  • Apiary Project - High-throughput workflow for computer-assisted human parsing of biological specimen label data
  • HerbIS (Erudite Recorded Botanical Information Synthesizer) - Software algorithms that process and present herbarium label data in a machine-understandable format through the use of natural language processing (NLP). Created at the Yale Peabody Museum of Natural History.
  • Symbiota - Specimen-based virtual flora/fauna software with a built-in module for specimen digitization that incorporates OCR technology
  • SALIX - The Semi-automatic Label Information eXtraction system, designed to capture herbarium specimen label data with the use of optical character recognition technologies and transfer those data into a database.
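
Several of the tools above take raw OCR text from a specimen label and turn it into structured fields for a database. As a rough illustration of that idea only, the Python sketch below pulls a few fields out of OCR text with regular expressions. It is not how Apiary, HerbIS, Symbiota, or SALIX actually work; the field names, the patterns, the `parse_label` function, and the sample label are all hypothetical, and real labels vary far more than these patterns allow.

```python
import re

# Hypothetical patterns, for illustration only. Real label layouts differ
# widely and would need many more rules (or an NLP approach).
FIELD_PATTERNS = {
    "collector": re.compile(
        r"(?:Coll\.|Collector:?)\s*(?P<value>[^\n,;]+)", re.IGNORECASE
    ),
    "date": re.compile(r"(?P<value>\d{1,2}\s+[A-Z][a-z]+\.?\s+\d{4})"),
    "elevation_m": re.compile(
        r"(?:Elev\.|Elevation:?)\s*(?P<value>\d+)\s*m\b", re.IGNORECASE
    ),
}


def parse_label(ocr_text):
    """Return a dict of field name -> extracted string, or None if not found."""
    record = {}
    for field, pattern in FIELD_PATTERNS.items():
        match = pattern.search(ocr_text)
        record[field] = match.group("value").strip() if match else None
    return record


# Made-up label text standing in for the output of an OCR engine.
sample = (
    "Quercus alba L.\n"
    "Connecticut, New Haven Co.\n"
    "Elev. 120 m\n"
    "Coll. J. Smith\n"
    "6 August 2012"
)
record = parse_label(sample)
```

Fields that no pattern matches come back as `None`, which leaves room for a human to fill them in during review, in the spirit of the computer-assisted workflows described above.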