Transcription Hackathon: Difference between revisions


** HandwritingDetection (https://github.com/idigbio-aocr): an algorithm that sorts images into three sets (no handwriting; little handwriting, i.e. mostly typed or printed text; lots of handwriting) based on the noise generated by the OCR software. [http://manuscripttranscription.blogspot.com/2013/02/detecting-handwriting-in-ocr-text.html Read more at Ben's blog]. This could be used to rank which images are most in need of human transcription.

** Dictionaries to improve crowdsourcing consensus (e.g., names of collectors, scientific names): link to be provided by aOCR?

*** (Some [http://webprojects.huh.harvard.edu/authority_files/ botanists]: RDF and tab-delimited.)
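
To make the HandwritingDetection idea above concrete, here is a rough Python sketch of how OCR noise could be turned into a ranking. It is not the aOCR code: the token patterns, the two thresholds, and the function names (noise_score, classify, rank_for_transcription) are assumptions made for illustration only. The premise is simply that OCR run over handwriting tends to produce many tokens that look like neither words nor numbers.

```python
import re

# Hypothetical thresholds, chosen for illustration; not taken from aOCR.
LITTLE_THRESHOLD = 0.2
LOTS_THRESHOLD = 0.5

# A token counts as "clean" if it looks like a word (two or more letters)
# or a number, optionally followed by one punctuation mark.
WORD_RE = re.compile(r"^[A-Za-z]{2,}[.,;:]?$")
NUMBER_RE = re.compile(r"^\d+([.,/-]\d+)*[.,;:]?$")


def noise_score(ocr_text):
    """Return the fraction of whitespace-separated tokens that look like noise."""
    tokens = ocr_text.split()
    if not tokens:
        return 0.0
    noisy = sum(
        1 for t in tokens if not (WORD_RE.match(t) or NUMBER_RE.match(t))
    )
    return noisy / len(tokens)


def classify(ocr_text):
    """Place OCR output into one of three sets: 'none', 'little', 'lots'."""
    score = noise_score(ocr_text)
    if score < LITTLE_THRESHOLD:
        return "none"
    if score < LOTS_THRESHOLD:
        return "little"
    return "lots"


def rank_for_transcription(ocr_by_image):
    """Return image names ordered from noisiest to cleanest OCR output."""
    return sorted(
        ocr_by_image,
        key=lambda name: noise_score(ocr_by_image[name]),
        reverse=True,
    )
```

Calling rank_for_transcription on a dictionary that maps image names to their OCR text returns the names with the noisiest output first, which is one possible order in which to offer images to human transcribers. Real thresholds would need tuning against labelled specimen images.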
