Transcription Hackathon: Difference between revisions

Jump to navigation Jump to search
Line 42: Line 42:
** HandwritingDetection (https://github.com/idigbio-aocr): an algorithm that separates images into sets with no handwriting, little handwriting (mostly text typed or printed), lots of handwriting, based on the noise generated by the OCR software. [http://manuscripttranscription.blogspot.com/2013/02/detecting-handwriting-in-ocr-text.html Read more at Ben's blog]. This could be used to rank which images are in more need for human transcription.
** HandwritingDetection (https://github.com/idigbio-aocr): an algorithm that separates images into sets with no handwriting, little handwriting (mostly text typed or printed), lots of handwriting, based on the noise generated by the OCR software. [http://manuscripttranscription.blogspot.com/2013/02/detecting-handwriting-in-ocr-text.html Read more at Ben's blog]. This could be used to rank which images are in more need for human transcription.
** Dictionaries to improve crowdsourcing consensus (e.g., names of collectors, scientific names): link to be provided by aOCR?
** Dictionaries to improve crowdsourcing consensus (e.g., names of collectors, scientific names): link to be provided by aOCR?
*** (Some [http://webprojects.huh.harvard.edu/authority_files/ botantists]: RDF and tab-delimited.)