Specify Data Quality Toolkit
Overview
This toolkit contains Specify-specific resources for the Data Quality Toolkit 2024.
Catalog Numbers and Other Identifiers
Duplicate Catalog Numbers
Problem: The same catalog number is used multiple times within your dataset. (This may or may not be intentional, depending on your collection's policies. It is generally best to avoid duplicate catalog numbers where possible.)
How to FIND this Problem in Your Dataset:
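As a rough illustration outside of Specify itself, duplicates can be spotted by counting catalog numbers in an exported copy of the data. The sketch below assumes each record is a dictionary with a catalogNumber key (an assumed column name) and uses made-up sample rows; it is not a description of any built-in Specify feature.

```python
from collections import Counter

def find_duplicate_catalog_numbers(records, field="catalogNumber"):
    """Return {catalog_number: count} for values used more than once."""
    # Strip whitespace so "1001 " and "1001" count as the same value.
    values = [(r.get(field) or "").strip() for r in records]
    counts = Counter(v for v in values if v)  # blank catalog numbers are skipped
    return {value: n for value, n in counts.items() if n > 1}

# Hypothetical sample rows standing in for an exported dataset.
sample = [
    {"catalogNumber": "1001"},
    {"catalogNumber": "1002"},
    {"catalogNumber": "1001 "},
    {"catalogNumber": ""},
]
duplicates = find_duplicate_catalog_numbers(sample)
print(duplicates)
```

Whether trailing whitespace should count as a difference is a policy choice; drop the strip() call if your collection treats such values as distinct.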
How to FIX this Problem in Your Dataset:
Dates
Identified Date Earlier than Collected Date
Problem: The date the specimen was identified (dateIdentified field) is earlier than the date the specimen was collected (eventDate).
How to FIND this Problem in Your Dataset:
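One possible check on exported data is to compare the two dates record by record. The sketch below assumes both fields hold complete ISO dates such as 2020-05-01; partial dates, ranges, and blanks are skipped instead of guessed at. The catalogNumber key and the sample rows are hypothetical.

```python
from datetime import date

def parse_iso_day(text):
    """Parse a complete ISO calendar date; return None if it cannot be parsed."""
    try:
        return date.fromisoformat((text or "").strip())
    except ValueError:
        return None  # blank, partial (e.g. "2020-05"), or range values

def identified_before_collected(records):
    """Return records whose dateIdentified is earlier than their eventDate."""
    flagged = []
    for r in records:
        identified = parse_iso_day(r.get("dateIdentified"))
        collected = parse_iso_day(r.get("eventDate"))
        if identified and collected and identified < collected:
            flagged.append(r)
    return flagged

# Hypothetical sample rows standing in for an exported dataset.
sample = [
    {"catalogNumber": "A1", "eventDate": "2020-05-01", "dateIdentified": "2019-12-31"},
    {"catalogNumber": "A2", "eventDate": "2020-05-01", "dateIdentified": "2020-06-15"},
    {"catalogNumber": "A3", "eventDate": "2020-05", "dateIdentified": "2020-01-01"},
]
flagged_ids = [r["catalogNumber"] for r in identified_before_collected(sample)]
print(flagged_ids)
```

Records with unparseable dates are not reported here, so they may deserve a separate review.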
How to FIX this Problem in Your Dataset:
Geography
Improperly Negated Latitudes/Longitudes
Problem: The sign of the latitude (decimalLatitude) or longitude (decimalLongitude) does not match the sign/hemisphere of the given country. For example, longitudes in the contiguous U.S. should be negative.
How to FIND this Problem in Your Dataset:
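A minimal sketch of a sign check on exported data follows. It relies on a small lookup table of expected signs per country, which is illustrative only and would need to be extended and verified for your own collection. The country and catalogNumber keys and the sample rows are assumptions.

```python
# Illustrative expected signs: 1 = positive, -1 = negative,
# None = the country spans both hemispheres, so no check is made.
EXPECTED_SIGNS = {
    "Canada": {"decimalLatitude": 1, "decimalLongitude": -1},
    "Australia": {"decimalLatitude": -1, "decimalLongitude": 1},
    "Brazil": {"decimalLatitude": None, "decimalLongitude": -1},
}

def sign_mismatches(record):
    """Return the coordinate fields whose sign contradicts the record's country."""
    expected = EXPECTED_SIGNS.get(record.get("country"), {})
    problems = []
    for field, sign in expected.items():
        try:
            value = float(record.get(field, ""))
        except (TypeError, ValueError):
            continue  # blank or non-numeric coordinate: nothing to compare
        if sign is not None and value * sign < 0:
            problems.append(field)
    return problems

# Hypothetical sample rows standing in for an exported dataset.
sample = [
    {"catalogNumber": "B1", "country": "Canada",
     "decimalLatitude": "45.4", "decimalLongitude": "75.7"},
    {"catalogNumber": "B2", "country": "Australia",
     "decimalLatitude": "-33.9", "decimalLongitude": "151.2"},
    {"catalogNumber": "B3", "country": "Brazil",
     "decimalLatitude": "-15.8", "decimalLongitude": "-47.9"},
]
report = {r["catalogNumber"]: sign_mismatches(r) for r in sample}
print(report)
```

Countries missing from the lookup table are silently passed over, so an empty result only means that no listed country had a mismatch.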
How to FIX this Problem in Your Dataset:
Missing Latitudes/Longitudes
Problem: A record has a latitude value but no longitude value, or a longitude value but no latitude value.
How to FIND this Problem in Your Dataset:
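On exported data, orphaned coordinates can be found by flagging any record where exactly one of the two fields is filled in. The sketch below assumes the fields are named decimalLatitude and decimalLongitude; the catalogNumber key and the sample rows are hypothetical.

```python
def is_blank(value):
    """Treat None, empty strings, and whitespace-only strings as blank."""
    return value is None or str(value).strip() == ""

def orphaned_coordinates(records):
    """Return records where exactly one of latitude/longitude is filled in."""
    return [
        r for r in records
        if is_blank(r.get("decimalLatitude")) != is_blank(r.get("decimalLongitude"))
    ]

# Hypothetical sample rows standing in for an exported dataset.
sample = [
    {"catalogNumber": "C1", "decimalLatitude": "12.5", "decimalLongitude": ""},
    {"catalogNumber": "C2", "decimalLatitude": "12.5", "decimalLongitude": "-70.1"},
    {"catalogNumber": "C3", "decimalLatitude": "", "decimalLongitude": ""},
    {"catalogNumber": "C4", "decimalLatitude": None, "decimalLongitude": "33.0"},
]
orphan_ids = [r["catalogNumber"] for r in orphaned_coordinates(sample)]
print(orphan_ids)
```

Records with both fields blank are left alone, since they have no coordinate to orphan.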
How to FIX this Problem in Your Dataset:
Batch fixing is not possible. Review each affected record and either add the missing latitude or longitude value or remove the orphaned value.
Misspelled Geographic Unit Names
Problem: The geographic units (e.g., country, state, county) are misspelled, resulting in poor matching of geographic unit names to existing geographic lists.
How to FIND this Problem in Your Dataset:
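One way to surface likely misspellings in exported data is to compare each distinct name against a trusted reference list and suggest the closest entries. The sketch below uses approximate string matching from Python's standard difflib module. The reference list, the 0.8 similarity cutoff, and the sample names are assumptions chosen for illustration.

```python
import difflib

def suggest_corrections(names, reference, cutoff=0.8):
    """Map each name absent from the reference list to its closest reference entries."""
    known = {name.lower(): name for name in reference}
    suggestions = {}
    for name in sorted(set(names)):
        key = name.strip().lower()
        if not key or key in known:
            continue  # blank, or already an exact (case-insensitive) match
        close = difflib.get_close_matches(key, list(known), n=3, cutoff=cutoff)
        suggestions[name] = [known[c] for c in close]
    return suggestions

# Hypothetical reference list and observed values.
reference_countries = ["Canada", "Mexico", "United States"]
observed = ["Canada", "Cananda", "Mexcio", "Atlantis"]
suggestions = suggest_corrections(observed, reference_countries)
for name, options in suggestions.items():
    print(name, "->", options or "no close match")
```

An empty suggestion list means the name is unknown but not close to anything in the reference list, so it needs manual review.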
How to FIX this Problem in Your Dataset:
Taxonomy
Misspelled Taxonomic Names
Problem: Scientific names are misspelled, resulting in poor matching of taxonomic names to taxonomic databases.
How to FIND this Problem in Your Dataset:
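When no external taxonomic list is at hand, a dataset can be screened against itself: a rarely used name that is nearly identical to a more frequently used one is a candidate misspelling. The sketch below illustrates that heuristic with Python's standard difflib module. The 0.9 cutoff and the sample names are assumptions, and the results are only leads to verify against a taxonomic database.

```python
import difflib
from collections import Counter

def likely_misspellings(names, cutoff=0.9):
    """Pair each rarer name with a more frequent, near-identical name in the same list."""
    counts = Counter(n.strip() for n in names if n and n.strip())
    pairs = []
    for name, count in sorted(counts.items()):
        # Candidates are names that occur strictly more often than this one.
        more_common = [other for other, c in counts.items() if c > count]
        match = difflib.get_close_matches(name, more_common, n=1, cutoff=cutoff)
        if match:
            pairs.append((name, match[0]))
    return pairs

# Hypothetical scientific names standing in for an exported column.
sample_names = [
    "Quercus alba", "Quercus alba", "Quercus alba",
    "Quercus abla",
    "Quercus rubra", "Quercus rubra",
]
pairs = likely_misspellings(sample_names)
print(pairs)
```

This heuristic can misfire on valid names that differ by a letter or two, which is why each pair should be checked by a person before any change is made.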
How to FIX this Problem in Your Dataset: