Dataset Errata: Difference between revisions

== Errors noted in various files ==


New Errors 2/27/13, D. Lafferty
Label NY01075759_lg.txt has authority (part of verbatimScientificName) as: "Kocourková & F. Berger". Gold Parsed NY01075759_lg.csv has "Kocourkova & F. Berger", without the accent on the "a". (Or should we convert foreign characters to English characters?) (Bryan: All "special characters" should be preserved by using UTF-8.)
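A minimal sketch of how the dropped accent above could be detected automatically. The function names are hypothetical and it assumes both values have already been read as UTF-8 text; it is not part of any existing tool for this dataset.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Return text with combining marks removed (for example 'á' becomes 'a')."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def lost_diacritics(label_value: str, parsed_value: str) -> bool:
    """True when the two values differ only in their diacritics."""
    a = unicodedata.normalize("NFC", label_value)
    b = unicodedata.normalize("NFC", parsed_value)
    return a != b and strip_diacritics(a) == strip_diacritics(b)


# The two authority strings quoted in the erratum above.
label = "Kocourková & F. Berger"
parsed = "Kocourkova & F. Berger"
print(lost_diacritics(label, parsed))
```

A check like this only flags candidates; the fix itself is still to copy the label's spelling into the Gold file.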


'''Gold Parsing Errors'''  


Many of the Lichen Gold labels have verbatimLatitude and verbatimLongitude, but the Gold Parsed files do not have the calculated decimalLatitude and decimalLongitude. This seems especially true for the New York labels. (Daryl) (Bryan: I think the decimal values were a "bonus", but I could be wrong. If we choose to do this later it might be easier to pre-fill as many fields as we can using your algorithm.)
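For reference, a rough sketch of how a decimal value could be derived from a verbatim coordinate. This is not Daryl's algorithm; the function name is hypothetical, and it assumes a degrees/minutes/seconds string followed by a hemisphere letter (for example 44°15'30"N), which may not match every label.

```python
import re
from typing import Optional

# Assumed format: degrees, optional minutes, optional seconds, hemisphere letter.
_DMS = re.compile(
    r"""(?P<deg>\d+(?:\.\d+)?)\s*°\s*
        (?:(?P<min>\d+(?:\.\d+)?)\s*['′]\s*)?
        (?:(?P<sec>\d+(?:\.\d+)?)\s*["″]\s*)?
        (?P<hem>[NSEW])""",
    re.VERBOSE | re.IGNORECASE,
)


def dms_to_decimal(verbatim: str) -> Optional[float]:
    """Convert one verbatim coordinate to decimal degrees, or None if unparsable."""
    match = _DMS.search(verbatim)
    if match is None:
        return None
    degrees = float(match["deg"])
    minutes = float(match["min"] or 0)
    seconds = float(match["sec"] or 0)
    value = degrees + minutes / 60 + seconds / 3600
    # South and west are negative in decimalLatitude / decimalLongitude.
    if match["hem"].upper() in "SW":
        value = -value
    return round(value, 6)


print(dms_to_decimal("44°15'30\"N"))
print(dms_to_decimal("73°59'W"))
```

Values that come back as None would be left blank rather than guessed.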


This is open to debate, but I think Elevation should be a pure numeric field, assumed to be in meters. Therefore, it should not be expressed as "750 m", but rather as "750". verbatimElevation, of course, should retain the "m" if it was present on the label. (Note that Darwin Core apparently does not have a field called "elevation", but rather minimumElevationInMeters and maximumElevationInMeters, both numeric fields.) Not sure if this is something to change on the labels, but worth being aware of. I think parsing programs should generate the Darwin Core fields. (Daryl) (Bryan: Odd to not have "elevation". I agree with the use of verbatimElevation. If "elevation" is filled it is numeric.)
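As an illustration of the numeric-elevation idea, here is a small sketch that pulls a number out of verbatimElevation. The function name is hypothetical, and it assumes the value is either a bare number or a number followed by a meters unit; anything else (feet, ranges, misspelled units) is returned as None for manual review.

```python
import re
from typing import Optional

# Assumed inputs: "750", "750 m", "750 m.", "750 meters", "750 metres".
_ELEVATION = re.compile(
    r"^\s*(\d+(?:\.\d+)?)\s*(m|meters|metres)?\.?\s*$", re.IGNORECASE
)


def elevation_in_meters(verbatim_elevation: str) -> Optional[float]:
    """Return the numeric elevation in meters, or None if the value is not recognised."""
    match = _ELEVATION.match(verbatim_elevation)
    if match is None:
        return None
    return float(match.group(1))


print(elevation_in_meters("750 m"))
print(elevation_in_meters("2400 ft"))
```

verbatimElevation itself stays untouched; only the derived numeric field would be filled from the result.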


Inconsistency in the Gold Parsed labels for Country. If a US State is listed as the state, the label doesn't always say the name of the country, though it is obviously the USA. Some Gold Parsed results leave it blank, some fill it in with "USA" or "United States", though neither of these is on the label. I think it is valid to fill it in, but it should be consistent. (Daryl) (Bryan: I think for Gold the field should not be filled in if it is not on the label.)
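Bryan's rule can be checked mechanically. The sketch below is only an illustration: the function name and the sample label text are made up, and it assumes the label .txt content and the Gold country value are already loaded as strings.

```python
def country_matches_label(label_text: str, gold_country: str) -> bool:
    """A filled-in Gold country must appear verbatim somewhere on the label."""
    if not gold_country:
        # A blank value cannot contradict the label. Whether the label
        # actually shows a country still needs a human check.
        return True
    return gold_country in label_text


# Hypothetical label text, invented for illustration only.
label_text = "Lichens of Tennessee, U.S.A."
print(country_matches_label(label_text, "USA"))
print(country_matches_label(label_text, "U.S.A."))
```

A plain substring test is deliberately strict, so "USA" does not pass when the label reads "U.S.A.".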


Many Gold Parse Tennessee lichen labels have country errors. Examples:  


-- Gold Parsed TENN-L-0000001_lg.csv lists country as "USA", but on the .txt label, it is "U.S.A." (with periods). Same with Gold Parsed TENN-L-0000035_lg.csv and others. (Daryl) (Bryan: Agreed. Should be fixed to match the label.)


-- Gold Parsed TENN-L-0000005_lg.csv leaves country blank, but the label shows it as "USA". Again, maybe this is OK, but it should be consistent. (Daryl) (Bryan: Agreed. Should be fixed.)

<br> Inconsistency and errors in TENN Lichen Gold Parsed dateIdentified. Examples:

"Silver Parsed CSV Files" There were some errors in the Silver CSV dataset. (Steven C.)

NY01075760_lg character encoding in verbatimScientificName
typos in verbatimCoordinates


NY01075761_lg misspelling in verbatimScientificName


NY01075762_lg misspelling in habitat
misspelling in verbatimLocality


NY01075764_lg misspelling in units for verbatimElevation


NY01075765_lg character encoding in verbatimScientificName
removed extra period in verbatimEventDate


NY01075768_lg separated verbatimLocality data into two columns


NY01075769_lg misspelling in habitat


NY01075770_lg character encoding in verbatimScientificName
character encoding in habitat


NY01075773_lg misspelling in verbatimScientificName
misspelling in verbatimLocality


NY01075774_lg character encoding in verbatimScientificName


NY01075775_lg misspelling in country


NY01075776_lg character encoding in verbatimLocality


NY01075777_lg character encoding in country


NY01075779_lg character encoding in verbatimCoordinates

NY01075780_lg misspelling in verbatimInstitution
misspelling in verbatimLocality
removed coordinates in verbatimLocality


NY01075781_lg character encoding in verbatimElevation


NY01075782_lg separated verbatimLocality data into two columns
removed coordinates in verbatimLocality
character encoding in habitat


NY01075786_lg misspelling in verbatimScientificName


NY01075787_lg misspelling in verbatimLocality
removed coordinates in verbatimLocality
misspelling in verbatimCoordinates
misspelling in habitat


NY01075788_lg misspelling in verbatimLocality
removed coordinates in verbatimLocality
character encoding in verbatimCoordinates


NY01075789_lg misspelling in verbatimLocality
removed coordinates in verbatimLocality
character encoding in verbatimCoordinates


NY01075790_lg misspelling in habitat
separated verbatimLocality data into three columns
removed coordinates in verbatimLocality


NY01075791_lg character encoding in verbatimScientificName


NY01075792_lg misspelling in verbatimLocality


NY01075794_lg misspelling in verbatimLocality


NY01075795_lg misspelling in verbatimLocality


NY01075802_lg character encoding in verbatimScientificName


NY01075803_lg created new identifiedBy column
created new verbatimScientificName column
moved verbatimScientificName data from third row to new column


NY01075805_lg created new verbatimScientificName column
moved verbatimScientificName data from third row to new column


NY01075806_lg character encoding in verbatimScientificName


NY01075813_lg misspelling in verbatimLocality


NY01075814_lg misspelling in county
misspelling in verbatimLocality
removed coordinates in verbatimLocality
misspelling in habitat


NY01075817_lg moved verbatimScientificName data to scientificName
entered verbatimScientificName


NY01075818_lg misspelling in habitat


NY01075819_lg misspelling in recordedBy


NY01075821_lg misspelling in verbatimLocality
removed coordinates in verbatimLocality
added coordinates to verbatimCoordinates


NY01075822_lg removed coordinates in verbatimLocality


NY01075823_lg moved identifiedBy and dateIdentified data up one row
created new verbatimScientificName column
moved verbatimScientificName data from third row to new column


NY01075827_lg misspelling in county
misspelling in verbatimLocality


NY01075828_lg misspelling in verbatimLocality
removed coordinates in verbatimLocality


NY01075829_lg misspelling in habitat


NY01075831_lg misspelling in verbatimLocality
removed coordinates in verbatimLocality


NY01075837_lg misspelling in county


TENN-L-0000001_lg character encoding in occurrenceRemarks
misspelling in habitat
character encoding in verbatimLocality


TENN-L-0000002_lg character encoding in verbatimScientificName
misspelling in habitat


TENN-L-0000004_lg misspelling in habitat
misspelling in verbatimInstitution


TENN-L-0000005_lg misspelling in datasetName
misspelling in occurrenceRemarks
character encoding in verbatimLocality


TENN-L-0000006_lg misspelling in verbatimElevation
edited verbatimEventDate


TENN-L-0000007_lg separated verbatimLocality into two columns
misspellings in both verbatimLocality columns


TENN-L-0000009_lg character encoding in habitat
character encoding in catalogNumber


TENN-L-0000010_lg separated verbatimLocality into two columns


TENN-L-0000012_lg character encoding in datasetName
character encoding in occurrenceRemarks


TENN-L-0000013_lg misspelling in occurrenceRemarks
misspelling in verbatimLocality


TENN-L-0000014_lg misspelling in datasetName
misspelling in fieldNotes
character encoding in verbatimLocality
separated recordedBy into two columns


TENN-L-0000022_lg character encoding in recordedBy


TENN-L-0000027_lg character encoding in verbatimScientificName


TENN-L-0000028_lg character encoding in verbatimScientificName


TENN-L-0000029_lg misspelling in recordedBy


TENN-L-0000032_lg character encoding in verbatimScientificName


TENN-L-0000033_lg separated dataSetName into two columns
separated fieldNotes into two columns


TENN-L-0000041_lg character encoding in datasetName
misspelling in verbatimLocality


TENN-L-0000044_lg character encoding in datasetName


TENN-L-0000045_lg separated verbatimLocality into two columns


TENN-L-0000046_lg character encoding in datasetName


TENN-L-0000047_lg character encoding in datasetName


TENN-L-0000048_lg misspelling in verbatimLocality


TENN-L-0000049_lg separated verbatimLocality into two columns


TENN-L-0000051_lg character encoding in verbatimLocality


TENN-L-0000052_lg character encoding in verbatimScientificName
character encoding in datasetName
character encoding in habitat
character encoding in verbatimLocality
character encoding in recordedBy


TENN-L-0000053_lg character encoding in recordNumber


TENN-L-0000054_lg character encoding in datasetName


TENN-L-0000056_lg edited recordedBy


TENN-L-0000057_lg character encoding in verbatimLocality
misspelling in verbatimInstitution


TENN-L-0000058_lg separated dataSetName into two columns
character encoding in verbatimInstitution


TENN-L-0000059_lg character encoding in stateProvince
character encoding in verbatimScientificName
character encoding in verbatimCoordinates
misspelling in recordedBy


TENN-L-0000061_lg edited verbatimLocality
misspelling in recordedBy


TENN-L-0000063_lg separated dataSetName into two columns
character encoding in identificationRemarks


TENN-L-0000064_lg character encoding in verbatimScientificName


TENN-L-0000065_lg character encoding in verbatimScientificName


TENN-L-0000068_lg edited habitat
character encoding in verbatimInstitution


TENN-L-0000072_lg separated verbatimLocality into two columns
misspelling in country
edited verbatimScientificName
character encoding in verbatimInstitution


TENN-L-0000073_lg misspelling in verbatimLocality
character encoding in verbatimCoordinates
misspelling in recordedBy


TENN-L-0000074_lg character encoding in recordedBy
character encoding in verbatimScientificName
character encoding in verbatimLocality


TENN-L-0000075_lg character encoding in datasetName
misspelling in verbatimScientificName
separated verbatimLocality into two columns
character encoding in both verbatimLocality columns
character encoding in verbatimCoordinates


TENN-L-0000076_lg misspelling in datasetName
character encoding in verbatimScientificName
separated verbatimLocality into two columns
character encoding in both verbatimLocality columns
character encoding in recordedBy


TENN-L-0000077_lg character encoding in county
character encoding in verbatimLocality
character encoding in catalogNumber


TENN-L-0000079_lg character encoding in verbatimInstitution


TENN-L-0000080_lg character encoding in catalogNumber


TENN-L-0000083_lg character encoding in verbatimScientificName


TENN-L-0000084_lg character encoding in datasetName
character encoding in verbatimScientificName
character encoding in verbatimLocality


TENN-L-0000087_lg character encoding in recordNumber
character encoding in habitat
character encoding in verbatimLocality
character encoding in verbatimInstitution


TENN-L-0000089_lg misspelling in country
separated verbatimLocality into two columns
misspelling in verbatimLocality
misspelling in verbatimInstitution
misspelling in datasetName


TENN-L-0000090_lg character encoding in verbatimInstitution


TENN-L-0000091_lg character encoding in datasetName
character encoding in verbatimScientificName
character encoding in catalogNumber


TENN-L-0000093_lg edited verbatimLocality
character encoding in catalogNumber


TENN-L-0000095_lg character encoding in verbatimScientificName
edited country
character encoding in verbatimLocality


TENN-L-0000097_lg character encoding in verbatimScientificName


TENN-L-0000098_lg character encoding in verbatimScientificName
character encoding in verbatimLatitude
character encoding in verbatimLongitude
character encoding in verbatimEventDate
character encoding in verbatimCoordinates


TENN-L-0000099_lg separated dataSetName into two columns
character encoding in stateProvince
misspelling in verbatimScientificName
character encoding in verbatimLocality
character encoding in verbatimLatitude
character encoding in catalogNumber


WIS-L-0011726_lg character encoding in verbatimScientificName
character encoding in verbatimLatitude
character encoding in verbatimLongitude
character encoding in verbatimCoordinates
misspelling in verbatimElevation
character encoding in recordedBy


WIS-L-0011727_lg character encoding in verbatimScientificName
separated verbatimLocality into two columns
misspelling in verbatimLocality
character encoding in verbatimLatitude
character encoding in verbatimLongitude
character encoding in verbatimCoordinates


WIS-L-0011728_lg character encoding in verbatimScientificName
character encoding in verbatimLatitude
character encoding in verbatimLongitude
character encoding in verbatimCoordinates
character encoding in habitat


WIS-L-0011729_lg separated verbatimLocality into two columns
character encoding in verbatimLatitude
character encoding in verbatimLongitude
character encoding in verbatimCoordinates


WIS-L-0011730_lg character encoding in verbatimScientificName
character encoding in verbatimLatitude
character encoding in verbatimLongitude
character encoding in verbatimCoordinates
misspelling in habitat


WIS-L-0011731_lg character encoding in verbatimScientificName
character encoding in identifiedBy
separated verbatimLocality into two columns
misspelling in associatedTaxa
misspelling in verbatimElevation


WIS-L-0011732_lg separated verbatimLocality into two columns
character encoding in verbatimLatitude
character encoding in verbatimLongitude
character encoding in verbatimCoordinates


WIS-L-0011733_lg character encoding in verbatimLocality
character encoding in habitat
character encoding in verbatimCoordinates


WIS-L-0011734_lg character encoding in verbatimScientificName
character encoding in verbatimCoordinates
character encoding in habitat
character encoding in recordNumber
separated verbatimLocality into two columns


WIS-L-0011736_lg character encoding in verbatimLatitude
character encoding in verbatimLongitude
character encoding in verbatimCoordinates


WIS-L-0012025_lg character encoding in verbatimScientificName
separated verbatimLocality into two columns
character encoding in verbatimLatitude
character encoding in verbatimLongitude
character encoding in verbatimCoordinates


WIS-L-0012026_lg character encoding in verbatimScientificName


WIS-L-0012027_lg character encoding in verbatimScientificName
character encoding in verbatimLatitude
character encoding in verbatimLongitude
character encoding in verbatimCoordinates


WIS-L-0012028_lg character encoding in verbatimScientificName
character encoding in verbatimLatitude
character encoding in verbatimLongitude
character encoding in verbatimCoordinates
character encoding in habitat
misspelling in verbatimElevation


WIS-L-0012029_lg character encoding in verbatimScientificName
character encoding in verbatimLatitude
character encoding in verbatimLongitude
character encoding in verbatimCoordinates
character encoding in habitat


WIS-L-0012030_lg character encoding in verbatimLatitude
character encoding in verbatimLongitude
character encoding in verbatimCoordinates
character encoding in habitat


WIS-L-0012031_lg character encoding in verbatimScientificName
separated verbatimLocality into two columns
character encoding in verbatimLatitude
character encoding in verbatimLongitude
character encoding in verbatimCoordinates
character encoding in habitat


WIS-L-0012032_lg character encoding in verbatimScientificName
separated verbatimLocality into two columns
character encoding in verbatimLatitude
character encoding in verbatimLongitude
character encoding in verbatimCoordinates
character encoding in habitat


WIS-L-0012033_lg character encoding in verbatimScientificName
separated verbatimLocality into two columns
character encoding in verbatimLatitude
character encoding in verbatimLongitude
character encoding in verbatimCoordinates
misspelling in verbatimElevation


WIS-L-0012034_lg character encoding in verbatimScientificName
character encoding in verbatimLatitude
character encoding in verbatimLongitude
character encoding in verbatimCoordinates


WIS-L-0012035_lg character encoding in verbatimScientificName
separated verbatimLocality into two columns
misspelling in verbatimLocality
character encoding in verbatimLatitude
character encoding in verbatimLongitude
character encoding in verbatimCoordinates


WIS-L-0012036_lg character encoding in verbatimScientificName
character encoding in verbatimLatitude
character encoding in verbatimLongitude
character encoding in verbatimCoordinates


WIS-L-0012037_lg character encoding in verbatimScientificName
character encoding in verbatimLatitude
character encoding in verbatimLongitude
character encoding in verbatimCoordinates


WIS-L-0012039_lg character encoding in verbatimScientificName
character encoding in verbatimLocality
character encoding in verbatimLatitude
character encoding in verbatimLongitude
character encoding in verbatimCoordinates


WIS-L-0012040_lg character encoding in verbatimScientificName
separated verbatimLocality into two columns
character encoding in verbatimLatitude
character encoding in verbatimLongitude
character encoding in verbatimCoordinates
character encoding in habitat


WIS-L-0012041_lg character encoding in verbatimScientificName
character encoding in verbatimLatitude
character encoding in verbatimLongitude
character encoding in verbatimCoordinates
character encoding in habitat


WIS-L-0012042_lg character encoding in datasetName
character encoding in verbatimScientificName
character encoding in verbatimLatitude
character encoding in verbatimLongitude
character encoding in verbatimCoordinates
character encoding in habitat


WIS-L-0012043_lg character encoding in verbatimScientificName
character encoding in verbatimLatitude
character encoding in verbatimLongitude
character encoding in verbatimCoordinates
character encoding in habitat
separated verbatimLocality into two columns


WIS-L-0012044_lg character encoding in verbatimScientificName
separated verbatimLocality into two columns


WIS-L-0012045_lg character encoding in verbatimScientificName
separated verbatimLocality into two columns


WIS-L-0012046_lg character encoding in verbatimScientificName
separated verbatimLocality into two columns


WIS-L-0012047_lg character encoding in verbatimScientificName
character encoding in verbatimLatitude
character encoding in verbatimLongitude
character encoding in verbatimCoordinates
character encoding in habitat


WIS-L-0012048_lg character encoding in verbatimScientificName
separated verbatimLocality into two columns
character encoding in verbatimLatitude
character encoding in verbatimLongitude
character encoding in verbatimCoordinates
character encoding in habitat


WIS-L-0012049_lg character encoding in verbatimScientificName
separated verbatimLocality into two columns


WIS-L-0012050_lg character encoding in verbatimScientificName
separated verbatimLocality into two columns
character encoding in verbatimLatitude
character encoding in verbatimLongitude
character encoding in verbatimCoordinates


WIS-L-0012051_lg character encoding in verbatimScientificName
separated verbatimLocality into two columns


WIS-L-0012052_lg character encoding in verbatimScientificName
separated verbatimLocality into two columns
character encoding in verbatimLatitude
character encoding in verbatimLongitude
character encoding in verbatimCoordinates


WIS-L-0012053_lg character encoding in verbatimScientificName
separated verbatimLocality into two columns
character encoding in verbatimLatitude
character encoding in verbatimLongitude
character encoding in verbatimCoordinates


WIS-L-0012054_lg character encoding in verbatimScientificName
separated verbatimLocality into two columns
character encoding in verbatimLatitude
character encoding in verbatimLongitude
character encoding in verbatimCoordinates


WIS-L-0012055_lg character encoding in verbatimScientificName
separated verbatimLocality into two columns
character encoding in verbatimLatitude
character encoding in verbatimLongitude
character encoding in verbatimCoordinates


WIS-L-0012056_lg separated verbatimLocality into two columns
character encoding in habitat


WIS-L-0012057_lg character encoding in verbatimScientificName
character encoding in verbatimLatitude
character encoding in verbatimCoordinates


WIS-L-0012058_lg separated verbatimLocality into two columns
character encoding in verbatimLongitude
character encoding in verbatimCoordinates


WIS-L-0012059_lg character encoding in verbatimScientificName
separated verbatimLocality into two columns


WIS-L-0012060_lg character encoding in verbatimScientificName
character encoding in verbatimScientificName
character encoding in verbatimLatitude
character encoding in verbatimLongitude
character encoding in verbatimCoordinates
character encoding in habitat


WIS-L-0012061_lg character encoding in verbatimScientificName
separated verbatimLocality into two columns
removed coordinates in verbatimLocality
character encoding in associatedTaxa


WIS-L-0012062_lg character encoding in verbatimScientificName
character encoding in verbatimLatitude
character encoding in verbatimLongitude
character encoding in verbatimCoordinates
character encoding in habitat
character encoding in verbatimInstitution


WIS-L-0012063_lg character encoding in verbatimScientificName
character encoding in verbatimLocality
removed coordinates in verbatimLocality
character encoding in associatedTaxa


WIS-L-0012064_lg character encoding in verbatimScientificName
separated verbatimLocality into two columns


WIS-L-0012065_lg character encoding in verbatimScientificName
separated verbatimLocality into two columns
character encoding in habitat


WIS-L-0012067_lg character encoding in verbatimScientificName
separated verbatimLocality into two columns
character encoding in verbatimLatitude
character encoding in verbatimLongitude
character encoding in verbatimCoordinates


WIS-L-0012068_lg character encoding in verbatimScientificName
separated verbatimLocality into two columns


WIS-L-0012069_lg character encoding in verbatimScientificName
separated verbatimLocality into two columns


WIS-L-0012070_lg character encoding in verbatimScientificName
separated verbatimLocality into two columns


WIS-L-0012071_lg character encoding in verbatimScientificName
separated verbatimLocality into two columns
removed coordinates in verbatimLocality
character encoding in associatedTaxa


WIS-L-0012073_lg character encoding in verbatimCoordinates
character encoding in verbatimLatitude
character encoding in verbatimLongitude


WIS-L-0012074_lg character encoding in verbatimCoordinates
character encoding in habitat
misspelling in verbatimLocality


WIS-L-0012075_lg character encoding in verbatimScientificName
character encoding in verbatimCoordinates
character encoding in verbatimLatitude
character encoding in verbatimLongitude


WIS-L-0012076_lg character encoding in verbatimScientificName
separated verbatimLocality into two columns
character encoding in verbatimCoordinates
character encoding in verbatimLatitude
character encoding in verbatimLongitude


WIS-L-0012077_lg character encoding in verbatimLocality
character encoding in habitat


WIS-L-0012078_lg character encoding in verbatimScientificName
separated verbatimLocality into two columns


WIS-L-0012082_lg character encoding in verbatimScientificName
character encoding in verbatimCoordinates
character encoding in verbatimLatitude
character encoding in verbatimEventDate
character encoding in recordNumber


WIS-L-0012084_lg character encoding in verbatimScientificName
separated verbatimLocality into two columns


WIS-L-0012085_lg character encoding in verbatimScientificName
separated verbatimLocality into two columns
character encoding in verbatimLongitude
character encoding in verbatimCoordinates


WIS-L-0012086_lg character encoding in verbatimScientificName
separated verbatimLocality into two columns


'''End New Errors'''
----


== Errors noted below are fixed ==

::Gold label NY01075763_lg.txt has Pyrenidium actinellurn, should be Pyrenidium actinellum. Gold Parsed copies the error verbatim (as it should) and needs to be corrected if the .txt file is corrected.

::::/home/aocr/datasets/lichens/gold/outputs/human/NY01075763_lg.txt fixed --[[User:Dpaul|Dpaul]] 17:28, 26 February 2013 (EST)
::::/home/aocr/datasets/lichens/gold/parsed/human/NY01075763_lg.csv fixed --[[User:Dpaul|Dpaul]] 17:28, 26 February 2013 (EST)
::::/webroot/datasets/lichens/gold/ocr/NY01075763_lg.txt fixed --[[User:Dpaul|Dpaul]] 16:33, 27 February 2013 (EST)
::::/webroot/datasets/lichens/gold/parsed/NY01075763_lg.csv fixed --[[User:Dpaul|Dpaul]] 16:33, 27 February 2013 (EST)


::datasets/lichens/gold/ocr/WIS-L-0012040_lg.txt: Longitude recorded as L49 (capitalized for clarity) instead of 149

::::/webroot/datasets/lichens/gold/ocr/WIS-L-0012040_lg.txt fixed --[[User:Dpaul|Dpaul]] 16:39, 27 February 2013 (EST)
::::/webroot/datasets/lichens/gold/parsed/WIS-L-0012040_lg.csv fixed --[[User:Dpaul|Dpaul]] 16:39, 27 February 2013 (EST)


== Unicode Reserved character (single quote) ==
The following files use Unicode character 'PRIVATE USE TWO' (U+0092) as a single quote mark. U+0092 is a C1 control character, not a quotation mark; it typically appears when the Windows-1252 byte 0x92 (the right single quotation mark, U+2019) is decoded as ISO-8859-1.

*NY_00617142.txt
*NY_01334334.txt

::/webroot/datasets/herb/gold/ocr/NY_00617142.txt fixed --[[User:Dpaul|Dpaul]] 16:59, 27 February 2013 (EST)
::/webroot/datasets/herb/gold/ocr/NY_01334334.txt fixed --[[User:Dpaul|Dpaul]] 16:59, 27 February 2013 (EST)


== Right single Quote ==
The following files contain the Unicode character U+2019, Right Single Quotation Mark

*datasets/lichens/gold/ocr/NY01075760_lg.txt
*datasets/lichens/gold/ocr/NY01075761_lg.txt
*datasets/lichens/gold/ocr/NY01075762_lg.txt
*datasets/lichens/gold/ocr/NY01075764_lg.txt
*datasets/lichens/gold/ocr/NY01075768_lg.txt
*datasets/lichens/gold/ocr/NY01075770_lg.txt
*datasets/lichens/gold/ocr/NY01075771_lg.txt
*datasets/lichens/gold/ocr/NY01075776_lg.txt
*datasets/lichens/gold/ocr/NY01075777_lg.txt
*datasets/lichens/gold/ocr/NY01075779_lg.txt
*datasets/lichens/gold/ocr/NY01075781_lg.txt
*datasets/lichens/gold/ocr/NY01075785_lg.txt
*datasets/lichens/gold/ocr/NY01075786_lg.txt
*datasets/lichens/gold/ocr/NY01075787_lg.txt
*datasets/lichens/gold/ocr/NY01075788_lg.txt
*datasets/lichens/gold/ocr/NY01075789_lg.txt
*datasets/lichens/gold/ocr/NY01075797_lg.txt
*datasets/lichens/gold/ocr/NY01075798_lg.txt
*datasets/lichens/gold/ocr/NY01075812_lg.txt
*datasets/lichens/gold/ocr/NY01075817_lg.txt
*datasets/lichens/gold/ocr/NY01075818_lg.txt
*datasets/lichens/gold/ocr/NY01075819_lg.txt
*datasets/lichens/gold/ocr/NY01075820_lg.txt
*datasets/lichens/gold/ocr/NY01075821_lg.txt
*datasets/lichens/gold/ocr/NY01075822_lg.txt
*datasets/lichens/gold/ocr/NY01075828_lg.txt
*datasets/lichens/gold/ocr/NY01075829_lg.txt
*datasets/lichens/gold/ocr/NY01075830_lg.txt
*datasets/lichens/gold/ocr/NY01075831_lg.txt
*datasets/lichens/gold/ocr/TENN-L-0000059_lg.txt
*datasets/lichens/gold/ocr/TENN-L-0000073_lg.txt
*datasets/lichens/gold/ocr/WIS-L-0011728_lg.txt
*datasets/lichens/gold/ocr/WIS-L-0011730_lg.txt
*datasets/lichens/gold/ocr/WIS-L-0011736_lg.txt
*datasets/lichens/gold/ocr/WIS-L-0012033_lg.txt
*datasets/lichens/gold/ocr/WIS-L-0012035_lg.txt
*datasets/lichens/gold/ocr/WIS-L-0012039_lg.txt
*datasets/lichens/gold/ocr/WIS-L-0012082_lg.txt

::/webroot/datasets/lichens/gold/ocr above files in this directory all fixed --[[User:Dpaul|Dpaul]] 17:22, 27 February 2013 (EST)


== Right Double Quote ==
The following files contain the Unicode character U+201D, Right Double Quotation Mark

*datasets/lichens/gold/ocr/WIS-L-0012053_lg.txt
**fixed --[[User:Dpaul|Dpaul]] 15:51, 27 February 2013 (EST)
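
The three characters flagged in the quote sections (U+0092, U+2019, U+201D) can be located with a short scan. The following is only a sketch: the function names and the directory argument are hypothetical, and it assumes the gold OCR files are UTF-8 text. It reports positions only; how each hit should be corrected is left to whoever maintains the files.

```python
from pathlib import Path

# Code points flagged on this page, with their Unicode names.
SUSPECT_CHARS = {
    "\u0092": "PRIVATE USE TWO",
    "\u2019": "RIGHT SINGLE QUOTATION MARK",
    "\u201d": "RIGHT DOUBLE QUOTATION MARK",
}


def find_suspect_quotes(text):
    """Return (line_number, column, code_point) for each flagged character."""
    hits = []
    for line_no, line in enumerate(text.split("\n"), start=1):
        for col, ch in enumerate(line, start=1):
            if ch in SUSPECT_CHARS:
                hits.append((line_no, col, f"U+{ord(ch):04X}"))
    return hits


def scan_directory(directory):
    """Map each .txt file name in `directory` (hypothetical path) to its hits."""
    report = {}
    for path in sorted(Path(directory).glob("*.txt")):
        # Assumption: files are UTF-8; undecodable bytes are replaced, not fatal.
        text = path.read_text(encoding="utf-8", errors="replace")
        hits = find_suspect_quotes(text)
        if hits:
            report[path.name] = hits
    return report
```

Calling <code>scan_directory</code> on a local copy of the gold OCR directory would list only the files that still contain one of these characters.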


== Parse file errors ==
::Inconsistency in Gold Parsed decimalLatitude and decimalLongitude in many labels. Both fields are omitted from all NYBG lichens and Tennessee lichens. Gold Parsed WIS-L-0011728_lg.csv has decimalLatitude and decimalLongitude rounded to 3 decimal digits (e.g. 60.467). WIS-L-0011729_lg.csv has decimalLatitude rounded to 2 decimal digits (60.15) and decimalLongitude rounded to 1 decimal digit (-152.6). This is typical of the variations found throughout the files. It's possible that trailing zeros were just stripped off, but this inconsistency makes it impossible to match all the labels with a parsing program.

::::'''Alex will change the metrics to avoid counting off for stripped trailing zeroes'''. --[[User:Dpaul|Dpaul]] 15:36, 27 February 2013 (EST)
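
One way a metric could ignore stripped trailing zeros is to compare the two fields as decimal numbers rather than as strings. This is a minimal sketch, not the actual scoring code: the function name is hypothetical, and falling back to a plain string comparison for non-numeric values is an assumption.

```python
from decimal import Decimal, InvalidOperation


def same_decimal_value(gold, parsed):
    """Compare two coordinate strings numerically, so '60.15' equals '60.150'."""
    try:
        return Decimal(gold.strip()) == Decimal(parsed.strip())
    except InvalidOperation:
        # Assumption: non-numeric or empty values fall back to exact string match.
        return gold.strip() == parsed.strip()
```

This only forgives missing trailing zeros. Values that were rounded to different precision (for example 60.467 against 60.47) still compare as different.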


::Inconsistency in capitalization of verbatim fields in many Gold Parsed lichens. Example: NY01075763_lg.csv. In the label and OCR text the county is capitalized as ST. FRANCOIS, but in NY01075763_lg.csv it is title case: St. Francois. The state MISSOURI is capitalized in both the .txt and the .csv file. The scoring program is case sensitive, so any difference between the gold .csv and the program generated .csv will be marked wrong.

::::'''Alex will change the metrics to be case-insensitive'''. --[[User:Dpaul|Dpaul]] 17:28, 26 February 2013 (EST)
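
A case-insensitive comparison could look like the sketch below. The function name is hypothetical and this is not the scoring program's code; it uses <code>str.casefold</code>, which also handles non-ASCII letters such as those in author names.

```python
def same_ignoring_case(gold, parsed):
    """Return True when two field values differ only in letter case."""
    return gold.casefold() == parsed.casefold()
```

With this check, ST. FRANCOIS in the OCR text and St. Francois in the gold .csv would count as a match.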


::Gold Parsed NY01075759_lg.csv: verbatimEventDate is 1998-04-19, should be 19 April 1998.

::::/home/aocr/datasets/lichens/gold/parsed/human/NY01075759_lg.csv fixed --[[User:Dpaul|Dpaul]] 18:06, 26 February 2013 (EST)
::::/home/aocr/datasets/lichens/silver/parsed/human/NY01075759_lg.csv fixed --[[User:Dpaul|Dpaul]] 18:06, 26 February 2013 (EST)
::::/webroot/datasets/lichens/gold/parsed/NY01075759_lg.csv fixed --[[User:Dpaul|Dpaul]] 17:35, 27 February 2013 (EST)


::Gold Parsed NY01075759_lg.csv: eventDate is 4/19/1998, should be 1998-04-19 according to Darwin Core (http://rs.tdwg.org/dwc/terms/#eventDate).

::::/home/aocr/datasets/lichens/gold/parsed/human/NY01075759_lg.csv fixed --[[User:Dpaul|Dpaul]] 18:06, 26 February 2013 (EST)
::::/home/aocr/datasets/lichens/silver/parsed/human/NY01075759_lg.csv fixed --[[User:Dpaul|Dpaul]] 18:06, 26 February 2013 (EST)
::::/webroot/datasets/lichens/gold/parsed/NY01075759_lg.csv fixed --[[User:Dpaul|Dpaul]] 17:35, 27 February 2013 (EST)
::::/webroot/datasets/lichens/silver/parsed/NY01075759_lg.csv okay --[[User:Dpaul|Dpaul]] 17:35, 27 February 2013 (EST)
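
The eventDate correction above (4/19/1998 to 1998-04-19) can be reproduced with Python's standard library. This is a sketch under the assumption that the incoming value is always month/day/year; the function name is hypothetical, and verbatimEventDate should be left exactly as it appears on the label.

```python
from datetime import datetime


def to_iso_event_date(us_date):
    """Convert a month/day/year string such as '4/19/1998' to 'YYYY-MM-DD'."""
    # Assumption: the input is in US month/day/year order.
    return datetime.strptime(us_date, "%m/%d/%Y").date().isoformat()
```

A value that does not match the assumed pattern raises <code>ValueError</code>, which is preferable to silently writing a wrong date.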
::Gold Parsed NY01075770_lg.csv omits collector number, but should be 852.

::::/home/aocr/datasets/lichens/gold/parsed/human/NY01075770_lg.csv fixed --[[User:Dpaul|Dpaul]] 18:18, 26 February 2013 (EST)
::::/webroot/datasets/lichens/gold/parsed/NY01075770_lg.csv fixed --[[User:Dpaul|Dpaul]] 17:38, 27 February 2013 (EST)


::Gold OCR NY01075786_lg.txt has "(Ach.) Mil'll. Arg.", but on the image label it is "(Ach.) Müll. Arg." This error is carried to the Gold Parsed .csv file (which should be corrected if the .txt file is corrected).

::::/home/aocr/datasets/lichens/gold/outputs/human/NY01075786_lg.txt fixed --[[User:Dpaul|Dpaul]] 18:28, 26 February 2013 (EST)
::::/home/aocr/datasets/lichens/gold/parsed/human/NY01075786_lg.csv fixed --[[User:Dpaul|Dpaul]] 18:28, 26 February 2013 (EST)
::::/webroot/datasets/lichens/gold/parsed/NY01075786_lg.csv fixed --[[User:Dpaul|Dpaul]] 17:52, 27 February 2013 (EST)
::::/webroot/datasets/lichens/gold/ocr/NY01075786_lg.txt fixed --[[User:Dpaul|Dpaul]] 17:52, 27 February 2013 (EST)


::Label image NY01075760_lg.jpg had a speck of dirt next to "F. Berger", introducing an apostrophe as "Kocourkova & 'F. Berger" in the Gold OCR. Gold Parsed NY01075760_lg.csv corrected "Kocourkova & 'F. Berger" back to "Kocourkova & F. Berger", omitting the apostrophe. Probably a valid correction, but not in a verbatim field.

::::/home/aocr/webroot/datasets/lichens/gold/parsed/NY01075760_lg.csv changed gold parsed aocr:verbatimScientificName to include the apostrophe to be consistent for verbatim field. fixed --[[User:Dpaul|Dpaul]] 16:06, 27 February 2013 (EST)


Back to the [https://www.idigbio.org/wiki/index.php/2013_AOCR_Hackathon_Wiki Hackathon Wiki]