Presentations & Reports: Difference between revisions

567 bytes added, 18 March 2013

@@ Line 5: / Line 5: @@
 ::;Hackathon Metrics - Alex Thompson
-::;Parsing Dataset 1 - Daryl Lafferty
+::;[https://www.idigbio.org/workshop-presentations/aocr-hackathon/SALIX2.ppt Parsing Dataset 1 using SALIX 2] - Daryl Lafferty: SALIX (“Semi-Automatic Label Information eXtraction”) is a parsing system developed and used extensively at Arizona State University. Its purpose is to parse OCR'd label data into the respective data fields (e.g. collector, collection number). The original SALIX required user intervention on each label for formatting and proofreading. SALIX 2 tries to remove the “Semi” and make the process fully automatic. It is written in C++ on Windows. Development focused on lichen labels.
 ::;[http://manuscripttranscription.blogspot.com/2013/02/improving-ocr-inputs-from-ocr-outputs.html Improving OCR Inputs from OCR Outputs] - Ben Brumfield: Efforts to improve the quality of OCR by pre-processing images based on the output of 'naive' OCR execution.  Topics included handwriting detection within Dataset 1 ([http://manuscripttranscription.blogspot.com/2013/02/detecting-handwriting-in-ocr-text.html final report]) and label extraction from Dataset 3 ([http://manuscripttranscription.blogspot.com/2013/02/results-of-ocrocrop-approach-to.html final report]).