Excel Data Quality Toolkit: Difference between revisions

m
 
(9 intermediate revisions by 3 users not shown)
Line 86: Line 86:


'''Solution:'''
'''Solution:'''
Excel does not have an easy solution for this problem.


=== Georeference Metadata with no Associated Georeference ===
=== Georeference Metadata with no Associated Georeference ===
'''Problem:''' Metadata fields regarding coordinates, such as [https://dwc.tdwg.org/terms/#dwc:coordinateUncertaintyInMeters coordinateUncertaintyInMeters], [https://dwc.tdwg.org/terms/#dwc:georeferenceProtocol georeferenceProtocol], [https://dwc.tdwg.org/terms/#dwc:georeferenceSources georeferenceSources], [https://dwc.tdwg.org/terms/#dwc:georeferencedBy georeferencedBy], [https://dwc.tdwg.org/terms/#dwc:georeferenceRemarks georeferenceRemarks], and [https://dwc.tdwg.org/terms/#dwc:geodeticDatum geodeticDatum] are provided, but no coordinates are present. This is sometimes intentional, particularly when georeferencedBy and georeferencedRemarks are used to indicate whether a record was purposefully not georeferenced. However, it is rare that the other metadata fields can be used without associated coordinates (i.e., [https://dwc.tdwg.org/terms/#dwc:decimalLatitude decimalLatitude], [ https://dwc.tdwg.org/terms/#dwc:decimalLongitude decimalLongitude], or [https://dwc.tdwg.org/terms/#dwc:verbatimCoordinates verbatimCoordinates]).
'''Problem:''' Metadata fields regarding coordinates, such as [https://dwc.tdwg.org/terms/#dwc:coordinateUncertaintyInMeters coordinateUncertaintyInMeters], [https://dwc.tdwg.org/terms/#dwc:georeferenceProtocol georeferenceProtocol], [https://dwc.tdwg.org/terms/#dwc:georeferenceSources georeferenceSources], [https://dwc.tdwg.org/terms/#dwc:georeferencedBy georeferencedBy], [https://dwc.tdwg.org/terms/#dwc:georeferenceRemarks georeferenceRemarks], and [https://dwc.tdwg.org/terms/#dwc:geodeticDatum geodeticDatum] are provided, but no coordinates are present. This is sometimes intentional, particularly when georeferencedBy and georeferencedRemarks are used to indicate whether a record was purposefully not georeferenced. However, it is rare that the other metadata fields can be used without associated coordinates (i.e., [https://dwc.tdwg.org/terms/#dwc:decimalLatitude decimalLatitude], [ https://dwc.tdwg.org/terms/#dwc:decimalLongitude decimalLongitude], or [https://dwc.tdwg.org/terms/#dwc:verbatimCoordinates verbatimCoordinates]).


'''Solution:'''
'''Solution:''' Excel does not have an easy solution for this problem.


=== Elevation is Unlikely ===
=== Elevation is Unlikely ===
Line 105: Line 106:
'''Problem:''' The sign of the latitude ([https://dwc.tdwg.org/terms/#dwc:decimalLatitude decimalLatitude]) or longitude ([https://dwc.tdwg.org/terms/#dwc:decimalLongitude decimalLongitude]) does not match the sign/hemisphere of the given country. For example, all longitudes in the U.S. should be negative.
'''Problem:''' The sign of the latitude ([https://dwc.tdwg.org/terms/#dwc:decimalLatitude decimalLatitude]) or longitude ([https://dwc.tdwg.org/terms/#dwc:decimalLongitude decimalLongitude]) does not match the sign/hemisphere of the given country. For example, all longitudes in the U.S. should be negative.


'''Solution:'''
'''Solution:''' Excel does not have an easy solution for this problem.


=== Invalid Coordinates ===
=== Invalid Coordinates ===
'''Problem:''' Coordinates deviate from accepted ranges or formats, like decimalLatitude and decimalLongitude exceeding -90 to 90 and -180 to 180, respectively. verbatimCoordinates have to be valid values for coordinates in decimal degrees, degrees decimal minutes, degrees minutes second.  
'''Problem:''' Coordinates deviate from accepted ranges or formats, like decimalLatitude and decimalLongitude exceeding -90 to 90 and -180 to 180, respectively. verbatimCoordinates have to be valid values for coordinates in decimal degrees, degrees decimal minutes, degrees minutes second.  


'''Solution:'''
'''Solution:''' Excel does not have an easy solution for this problem.


=== Lower Geography Values are Provided, but No Higher Geography ===
=== Lower Geography Values are Provided, but No Higher Geography ===
Line 133: Line 134:
|-  
|-  
| || || Kerr || missing state and country
| || || Kerr || missing state and country
|-
| USA || Texas || Kerr || OK
|}
|}


Line 149: Line 152:
'''Problem:''' The provided value for [https://dwc.tdwg.org/terms/#dwc:country country] and [https://dwc.tdwg.org/terms/#dwc:countryCode countryCode] do not match.
'''Problem:''' The provided value for [https://dwc.tdwg.org/terms/#dwc:country country] and [https://dwc.tdwg.org/terms/#dwc:countryCode countryCode] do not match.


'''Solution:'''
'''Solution:''' Excel does not have an easy solution for this problem.


=== Mismatched Geographic Terms ===
=== Mismatched Geographic Terms ===
'''Problem:''' A record has lower geographic terms (e.g., state/province, county) that do not exist under the provided higher geographic term(s). For example, country = Canada and stateProvince = Sussex. There is no Sussex province in Canada.
'''Problem:''' A record has lower geographic terms (e.g., state/province, county) that do not exist under the provided higher geographic term(s). For example, country = Canada and stateProvince = Sussex. There is no Sussex province in Canada.


'''Solution:'''
'''Solution:''' Excel does not have an easy solution for this problem.


=== Missing Geodetic Datum ===
=== Missing Geodetic Datum ===
Line 232: Line 235:


'''Solution:'''
'''Solution:'''
# Open your dataset in Excel and highlight Row 1.
# From the Data toolbar, click "Filter".
# Navigate to the column containing your higher taxonomy values (e.g. dwc:order) and select the arrow icon next to the column header indicating that this column can now be filtered on.
# At the bottom of the pop-up window, deselect "Select All", scroll down, and then only check the "Blanks" box (you may need to scroll down in the pop-up window to locate it).
# Resolve missing higher taxonomy as you are able.


== Other Issues ==
== Other Issues ==
Line 246: Line 255:
'''Problem:''' When transferring text files between Unix/Linux and DOS/Windows systems, line endings can become inconsistent. Unix/Linux systems typically use line feed (LF) characters, while DOS/Windows systems use carriage return (CR) and line feed (LF) combinations. This mismatch can result in extra characters appearing in the data, causing visual artifacts and processing errors.
'''Problem:''' When transferring text files between Unix/Linux and DOS/Windows systems, line endings can become inconsistent. Unix/Linux systems typically use line feed (LF) characters, while DOS/Windows systems use carriage return (CR) and line feed (LF) combinations. This mismatch can result in extra characters appearing in the data, causing visual artifacts and processing errors.


'''Solution:'''
'''Solution:''' Excel does not have an easy solution for this problem.


=== Invalid Individual Count ===
=== Invalid Individual Count ===
Line 338: Line 347:
| MaterialCitation || MaterialCitation
| MaterialCitation || MaterialCitation
|}
|}
[https://github.com/Jegelewicz/data_quality/blob/main/Excel_examples/2024%20Excel%20data%20quality%20checks.xlsx Sample file]
3

edits