Dataset Errata: Difference between revisions



Gold Parsed WIS-L-0011732_lg.csv (and many other lichen gold parsed labels) removes a space from verbatimLatitude and from verbatimLongitude, changing this: 60° 33.579'N into this: 60°33.579'N. The space removal is inconsistent: it happens on some labels but not on others.  (Bryan: Agreed. Should be fixed to match the label. I think if the OCR had been perfect the space would not be in the OCR file, so it is a tough call.)  
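One way to find the affected labels is to compare each verbatim coordinate as it appears on the label with the value in the gold parsed csv, and flag pairs that differ only by whitespace after the degree sign. The sketch below is illustrative only: the function names are hypothetical and are not part of any existing project tooling, and it assumes the two strings have already been read from the label transcription and the csv.

```python
import re

# Matches a degree sign followed by one or more whitespace characters.
_DEGREE_SPACE = re.compile(r"°\s+")


def strip_degree_space(coord: str) -> str:
    """Remove whitespace that directly follows a degree sign."""
    return _DEGREE_SPACE.sub("°", coord)


def differs_only_by_degree_space(label_value: str, parsed_value: str) -> bool:
    """True when the strings differ, but only by whitespace after '°'."""
    return (
        label_value != parsed_value
        and strip_degree_space(label_value) == strip_degree_space(parsed_value)
    )


# Example with the values quoted in the note above.
print(differs_only_by_degree_space("60° 33.579'N", "60°33.579'N"))
```

A pair flagged by this check is a candidate for restoring the space in the csv so that it matches the label.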
----


Gold Parsed NY01075791_lg.csv converts the "u" in "Mull" to "ü", yielding "Müll". This actually reflects the original label, but not the Gold OCR NY01075791_lg.txt file, which has "Mull". Same for NY01075792_lg.csv, and several others in the series. (Bryan: The OCR messed up. Gold should fix OCR errors so the umlaut should stay.)  
 
:::FIXED for Lichen set in gold csv of /home/aocr/webroot/datasets/lichens/gold/parsed and gold txt of /home/aocr/webroot/datasets/lichens/gold/ocr
:::made the csv and txt files consistent with the label image -- if umlaut is present in the image, I put it in the csv and txt files. --[[User:Dpaul|Dpaul]] 17:40, 1 July 2013 (EDT)
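To look for remaining cases where a gold csv value and the matching gold txt value differ only by an umlaut or other accent, the two strings can be compared after stripping combining marks. This is a minimal sketch using Python's standard unicodedata module; the function names are hypothetical, and which side is correct still has to be decided against the label image.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Decompose characters and drop combining marks (e.g. 'ü' becomes 'u')."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def differs_only_by_diacritics(a: str, b: str) -> bool:
    """True when the strings differ, but only in their diacritics."""
    # Compare in NFC first so that composed and decomposed forms of the
    # same text are not reported as a difference.
    same_text = unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)
    return not same_text and strip_diacritics(a) == strip_diacritics(b)


# Example with the values quoted in the note above.
print(differs_only_by_diacritics("Müll", "Mull"))
```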
----

