Gold Parsed WIS-L-0011732_lg.csv (and many other lichen gold parsed labels) removes a space from verbatimLatitude and from verbatimLongitude, changing this: 60° 33.579'N into this: 60°33.579'N. The space removal is inconsistent: it occurs on some labels but not on others. (Bryan: Agreed. Should be fixed to match the label. I think if the OCR had been perfect the space would not be in the OCR file, so it is a tough call.)
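One way to find the labels affected by the space removal is to compare each parsed value against the OCR text and flag values that match only once whitespace is ignored. The following is a minimal sketch under that assumption; the function name `whitespace_only_mismatch` is hypothetical and is not part of any existing tooling for this dataset.

```python
import re


def _strip_ws(text: str) -> str:
    """Remove all whitespace so strings can be compared ignoring spacing."""
    return re.sub(r"\s+", "", text)


def whitespace_only_mismatch(parsed_value: str, ocr_text: str) -> bool:
    """Return True when parsed_value is not found verbatim in ocr_text
    but is found once whitespace is ignored on both sides.

    Hypothetical helper: parsed_value would be a field such as
    verbatimLatitude from the gold csv, ocr_text the gold txt content.
    """
    if parsed_value in ocr_text:
        return False  # exact match, spacing is consistent
    return _strip_ws(parsed_value) in _strip_ws(ocr_text)
```

For example, `whitespace_only_mismatch("60°33.579'N", "Lat. 60° 33.579'N")` flags the case described above, while the value with the space intact is not flagged.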
----
Gold Parsed NY01075791_lg.csv converts the "u" in "Mull" to an umlaut, yielding "Müll". This actually reflects the original label, but not the Gold OCR NY01075791_lg.txt file, which has "Mull". Same for NY01075792_lg.csv, and several others in the series. (Bryan: The OCR messed up. Gold should fix OCR errors so the umlaut should stay.)
:::FIXED for Lichen set in gold csv of /home/aocr/webroot/datasets/lichens/gold/parsed and gold txt of /home/aocr/webroot/datasets/lichens/gold/ocr
:::made the csv and txt files consistent with the label image -- if umlaut is present in the image, I put it in the csv and txt files. --[[User:Dpaul|Dpaul]] 17:40, 1 July 2013 (EDT)
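A similar check could locate remaining csv/txt pairs that differ only by a diacritic such as the umlaut in "Müll". This is a rough sketch, assuming the two strings being compared are the same token taken from the gold csv and the gold txt; the function names are hypothetical.

```python
import unicodedata


def _fold_diacritics(text: str) -> str:
    """Decompose characters and drop combining marks (e.g. 'ü' -> 'u')."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if unicodedata.combining(ch) == 0)


def diacritic_only_mismatch(a: str, b: str) -> bool:
    """Return True when a and b differ, but only in their diacritics.

    Both inputs are normalized to NFC first so that precomposed and
    decomposed spellings of the same character are not reported.
    """
    a_nfc = unicodedata.normalize("NFC", a)
    b_nfc = unicodedata.normalize("NFC", b)
    if a_nfc == b_nfc:
        return False
    return _fold_diacritics(a_nfc) == _fold_diacritics(b_nfc)
```

Pairs flagged this way would still need to be checked against the label image, since the image decides which spelling is correct.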
----