Transcription Hackathon: Difference between revisions

*Yonggang Liu, ACIS iDigBio: [https://www.idigbio.org/sites/default/files/workshop-presentations/citscribe/Yonggang_image_ingestion_appliance.pdf iDigBio Image Ingestion Appliance]
*Paul Kimbereley, Smithsonian: [https://www.idigbio.org/sites/default/files/workshop-presentations/citscribe/SI_Center.pdf Smithsonian Transcription Center]
*William Ulate, Missouri Botanical Garden: [https://www.idigbio.org/wiki/images/f/fb/Purposeful_Gaming_BHL_Dec_2013.pdf Purposeful Gaming and BHL]


== Development Resources  ==
* Gold Images from aOCR Hackathon:
** CSV files with URLs for the images on the iDigBio beta server (uploaded by the Image Ingestion Appliance): [http://www.acis.ufl.edu/~yonggang/idigbio/recordset/gold/ent.csv ent], [http://www.acis.ufl.edu/~yonggang/idigbio/recordset/gold/herb.csv herb], [http://www.acis.ufl.edu/~yonggang/idigbio/recordset/gold/lichens.csv lichens].
* Code from the aOCR Hackathon:
** HandwritingDetection (https://github.com/idigbio-aocr): an algorithm that sorts images into three sets (no handwriting; little handwriting, where the text is mostly typed or printed; and lots of handwriting) based on the noise generated by the OCR software. [http://manuscripttranscription.blogspot.com/2013/02/detecting-handwriting-in-ocr-text.html Read more at Ben's blog]. This could be used to rank images by how much they need human transcription.
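
The idea behind HandwritingDetection can be illustrated with a minimal sketch. This is not the code from the repository above: the token pattern, the two thresholds, and the function names <code>noise_score</code>, <code>classify</code> and <code>rank_for_transcription</code> are all assumptions made for illustration. It treats any OCR token that does not look like a clean word or number as noise, buckets each image by its noise fraction, and ranks images so the noisiest come first.

```python
import re

# Hypothetical thresholds, chosen for illustration only; they are not
# taken from the HandwritingDetection code and would need tuning on real data.
LITTLE_THRESHOLD = 0.15
LOTS_THRESHOLD = 0.40

# A "clean" token is assumed to be a word of 2+ letters or a run of digits,
# optionally followed by one punctuation mark.
CLEAN_TOKEN = re.compile(r"^(?:[A-Za-z]{2,}|\d+)[.,;:]?$")


def noise_score(ocr_text):
    """Return the fraction of whitespace-separated tokens that look like OCR noise."""
    tokens = ocr_text.split()
    if not tokens:
        # No OCR output at all: this sketch scores it as 0.0 (no evidence of noise).
        return 0.0
    noisy = sum(1 for token in tokens if not CLEAN_TOKEN.match(token))
    return noisy / len(tokens)


def classify(ocr_text):
    """Bucket an image's OCR text as 'none', 'little' or 'lots' of handwriting."""
    score = noise_score(ocr_text)
    if score < LITTLE_THRESHOLD:
        return "none"
    if score < LOTS_THRESHOLD:
        return "little"
    return "lots"


def rank_for_transcription(ocr_by_image):
    """Return image names ordered from noisiest OCR text to cleanest."""
    return sorted(ocr_by_image, key=lambda name: noise_score(ocr_by_image[name]), reverse=True)
```

In this sketch, <code>ocr_by_image</code> is a dictionary mapping an image name to the raw text the OCR software produced for it; images at the front of the returned list would be the first candidates for human transcription.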