Transcription Hackathon: Difference between revisions
** HandwritingDetection (https://github.com/idigbio-aocr): an algorithm that sorts images into three sets (no handwriting; little handwriting, i.e. mostly typed or printed text; lots of handwriting) based on the noise generated by the OCR software. [http://manuscripttranscription.blogspot.com/2013/02/detecting-handwriting-in-ocr-text.html Read more at Ben's blog]. This could be used to rank which images are most in need of human transcription.
** Dictionaries to improve crowdsourcing consensus (e.g., names of collectors, scientific names): link to be provided by aOCR?
*** (Some [http://webprojects.huh.harvard.edu/authority_files/ botanists]: RDF and tab-delimited.)
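
The HandwritingDetection idea above (use OCR noise as a proxy for handwriting) can be illustrated with a rough sketch. This is not the aOCR code: the token patterns, the two thresholds, and all function names below are assumptions made for illustration, and the thresholds would need tuning against labeled images.

```python
import re

# Hypothetical cut-offs (assumptions, not values from the aOCR project).
LITTLE_THRESHOLD = 0.15   # below this: treat the image as having no handwriting
LOTS_THRESHOLD = 0.40     # at or above this: treat it as having lots of handwriting

# A token counts as "clean" if it looks like a word or a number.
# Single letters are treated as noise, which is a simplification.
WORD_RE = re.compile(r"^[A-Za-z]{2,}[.,;:]?$")
NUMBER_RE = re.compile(r"^\d+([.,/-]\d+)*[.,;:]?$")


def noise_ratio(ocr_text):
    """Return the fraction of whitespace-separated tokens that look like OCR noise."""
    tokens = ocr_text.split()
    if not tokens:
        return 0.0
    noisy = sum(
        1 for t in tokens if not (WORD_RE.match(t) or NUMBER_RE.match(t))
    )
    return noisy / len(tokens)


def classify(ocr_text):
    """Bucket OCR output into 'none', 'little', or 'lots' of handwriting."""
    ratio = noise_ratio(ocr_text)
    if ratio < LITTLE_THRESHOLD:
        return "none"
    if ratio < LOTS_THRESHOLD:
        return "little"
    return "lots"


def rank_for_transcription(ocr_by_image):
    """Order image names so the noisiest OCR output comes first."""
    return sorted(
        ocr_by_image,
        key=lambda name: noise_ratio(ocr_by_image[name]),
        reverse=True,
    )
```

Calling <code>rank_for_transcription</code> on a mapping of image names to OCR text returns the names with the noisiest output first, which is one way to decide which images most need human transcription.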