3
edits
Jegelewicz (talk | contribs) |
|||
(19 intermediate revisions by 3 users not shown) | |||
Line 32: | Line 32: | ||
=IF(A1>TODAY(),"ERROR","OK") | =IF(A1>TODAY(),"ERROR","OK") | ||
[https://github.com/Jegelewicz/data_quality/blob/main/Excel_examples/2024%20Excel%20data%20quality%20checks.xlsx Sample file] | |||
=== Date is Suspiciously Old === | === Date is Suspiciously Old === | ||
Line 51: | Line 53: | ||
=IF(A1<B1,"ERROR","OK") | =IF(A1<B1,"ERROR","OK") | ||
[https://github.com/Jegelewicz/data_quality/blob/main/Excel_examples/2024%20Excel%20data%20quality%20checks.xlsx Sample file] | |||
=== Year, Month, and Day Values Do Not Match Date === | === Year, Month, and Day Values Do Not Match Date === | ||
Line 62: | Line 66: | ||
=DAY(A1) | =DAY(A1) | ||
[https://github.com/Jegelewicz/data_quality/blob/main/Excel_examples/2024%20Excel%20data%20quality%20checks.xlsx Sample file] | |||
== Geography == | == Geography == | ||
Line 80: | Line 86: | ||
'''Solution:''' | '''Solution:''' | ||
Excel does not have an easy solution for this problem. | |||
=== Georeference Metadata with no Associated Georeference === | === Georeference Metadata with no Associated Georeference === | ||
'''Problem:''' Metadata fields regarding coordinates, such as [https://dwc.tdwg.org/terms/#dwc:coordinateUncertaintyInMeters coordinateUncertaintyInMeters], [https://dwc.tdwg.org/terms/#dwc:georeferenceProtocol georeferenceProtocol], [https://dwc.tdwg.org/terms/#dwc:georeferenceSources georeferenceSources], [https://dwc.tdwg.org/terms/#dwc:georeferencedBy georeferencedBy], [https://dwc.tdwg.org/terms/#dwc:georeferenceRemarks georeferenceRemarks], and [https://dwc.tdwg.org/terms/#dwc:geodeticDatum geodeticDatum] are provided, but no coordinates are present. This is sometimes intentional, particularly when georeferencedBy and georeferencedRemarks are used to indicate whether a record was purposefully not georeferenced. However, it is rare that the other metadata fields can be used without associated coordinates (i.e., [https://dwc.tdwg.org/terms/#dwc:decimalLatitude decimalLatitude], [ https://dwc.tdwg.org/terms/#dwc:decimalLongitude decimalLongitude], or [https://dwc.tdwg.org/terms/#dwc:verbatimCoordinates verbatimCoordinates]). | '''Problem:''' Metadata fields regarding coordinates, such as [https://dwc.tdwg.org/terms/#dwc:coordinateUncertaintyInMeters coordinateUncertaintyInMeters], [https://dwc.tdwg.org/terms/#dwc:georeferenceProtocol georeferenceProtocol], [https://dwc.tdwg.org/terms/#dwc:georeferenceSources georeferenceSources], [https://dwc.tdwg.org/terms/#dwc:georeferencedBy georeferencedBy], [https://dwc.tdwg.org/terms/#dwc:georeferenceRemarks georeferenceRemarks], and [https://dwc.tdwg.org/terms/#dwc:geodeticDatum geodeticDatum] are provided, but no coordinates are present. This is sometimes intentional, particularly when georeferencedBy and georeferencedRemarks are used to indicate whether a record was purposefully not georeferenced. However, it is rare that the other metadata fields can be used without associated coordinates (i.e., [https://dwc.tdwg.org/terms/#dwc:decimalLatitude decimalLatitude], [ https://dwc.tdwg.org/terms/#dwc:decimalLongitude decimalLongitude], or [https://dwc.tdwg.org/terms/#dwc:verbatimCoordinates verbatimCoordinates]). | ||
'''Solution:''' | '''Solution:''' Excel does not have an easy solution for this problem. | ||
=== Elevation is Unlikely === | === Elevation is Unlikely === | ||
Line 99: | Line 106: | ||
'''Problem:''' The sign of the latitude ([https://dwc.tdwg.org/terms/#dwc:decimalLatitude decimalLatitude]) or longitude ([https://dwc.tdwg.org/terms/#dwc:decimalLongitude decimalLongitude]) does not match the sign/hemisphere of the given country. For example, all longitudes in the U.S. should be negative. | '''Problem:''' The sign of the latitude ([https://dwc.tdwg.org/terms/#dwc:decimalLatitude decimalLatitude]) or longitude ([https://dwc.tdwg.org/terms/#dwc:decimalLongitude decimalLongitude]) does not match the sign/hemisphere of the given country. For example, all longitudes in the U.S. should be negative. | ||
'''Solution:''' | '''Solution:''' Excel does not have an easy solution for this problem. | ||
=== Invalid Coordinates === | === Invalid Coordinates === | ||
'''Problem:''' Coordinates deviate from accepted ranges or formats, like decimalLatitude and decimalLongitude exceeding -90 to 90 and -180 to 180, respectively. verbatimCoordinates have to be valid values for coordinates in decimal degrees, degrees decimal minutes, degrees minutes second. | '''Problem:''' Coordinates deviate from accepted ranges or formats, like decimalLatitude and decimalLongitude exceeding -90 to 90 and -180 to 180, respectively. verbatimCoordinates have to be valid values for coordinates in decimal degrees, degrees decimal minutes, degrees minutes second. | ||
'''Solution:''' | '''Solution:''' Excel does not have an easy solution for this problem. | ||
=== Lower Geography Values are Provided, but No Higher Geography === | === Lower Geography Values are Provided, but No Higher Geography === | ||
'''Problem:''' Lower geography (e.g., county, state/province) values exist, but no higher geography values (e.g., country) are provided. | '''Problem:''' Lower geography (e.g., county, state/province) values exist, but no higher geography values (e.g., country) are provided. | ||
'''Solution:''' | '''Solution:''' Check for missing higher levels of geography with this formula where A1 is the country, B1 is the stateProvince, and C1 is the county. | ||
=IF(ISBLANK(C1),IF(ISBLANK(B1),IF(ISBLANK(A1),"no geography","OK"),IF(ISBLANK(A1),"missing country","OK")),IF(ISBLANK(B1),IF(ISBLANK(A1),"missing state and country","missing state"),IF(ISBLANK(A1),"no country","OK"))) | =IF(ISBLANK(C1),IF(ISBLANK(B1),IF(ISBLANK(A1),"no geography","OK"),IF(ISBLANK(A1),"missing country","OK")),IF(ISBLANK(B1),IF(ISBLANK(A1),"missing state and country","missing state"),IF(ISBLANK(A1),"no country","OK"))) | ||
{|class="wikitable" | |||
|+ Sample Results | |||
|- | |||
! country !! stateProvince !! county !! check | |||
|- | |||
| || Texas || Kerr|| no country | |||
|- | |||
| USA || || Kerr|| missing state | |||
|- | |||
| || Texas || || missing country | |||
|- | |||
| USA || || || OK | |||
|- | |||
| || || Kerr || missing state and country | |||
|- | |||
| USA || Texas || Kerr || OK | |||
|} | |||
[https://github.com/Jegelewicz/data_quality/blob/main/Excel_examples/2024%20Excel%20data%20quality%20checks.xlsx Sample file] | |||
=== Minimum and Maximum Elevation Values Mismatched === | === Minimum and Maximum Elevation Values Mismatched === | ||
Line 119: | Line 146: | ||
=IF(A1>B1,"ERROR","OK") | =IF(A1>B1,"ERROR","OK") | ||
[https://github.com/Jegelewicz/data_quality/blob/main/Excel_examples/2024%20Excel%20data%20quality%20checks.xlsx Sample file] | |||
=== Mismatched Country and CountryCode Values === | === Mismatched Country and CountryCode Values === | ||
'''Problem:''' The provided value for [https://dwc.tdwg.org/terms/#dwc:country country] and [https://dwc.tdwg.org/terms/#dwc:countryCode countryCode] do not match. | '''Problem:''' The provided value for [https://dwc.tdwg.org/terms/#dwc:country country] and [https://dwc.tdwg.org/terms/#dwc:countryCode countryCode] do not match. | ||
'''Solution:''' | '''Solution:''' Excel does not have an easy solution for this problem. | ||
=== Mismatched Geographic Terms === | === Mismatched Geographic Terms === | ||
'''Problem:''' A record has lower geographic terms (e.g., state/province, county) that do not exist under the provided higher geographic term(s). For example, country = Canada and stateProvince = Sussex. There is no Sussex province in Canada. | '''Problem:''' A record has lower geographic terms (e.g., state/province, county) that do not exist under the provided higher geographic term(s). For example, country = Canada and stateProvince = Sussex. There is no Sussex province in Canada. | ||
'''Solution:''' | '''Solution:''' Excel does not have an easy solution for this problem. | ||
=== Missing Geodetic Datum === | === Missing Geodetic Datum === | ||
Line 206: | Line 235: | ||
'''Solution:''' | '''Solution:''' | ||
# Open your dataset in Excel and highlight Row 1. | |||
# From the Data toolbar, click "Filter". | |||
# Navigate to the column containing your higher taxonomy values (e.g. dwc:order) and select the arrow icon next to the column header indicating that this column can now be filtered on. | |||
# At the bottom of the pop-up window, deselect "Select All", scroll down, and then only check the "Blanks" box (you may need to scroll down in the pop-up window to locate it). | |||
# Resolve missing higher taxonomy as you are able. | |||
== Other Issues == | == Other Issues == | ||
Line 220: | Line 255: | ||
'''Problem:''' When transferring text files between Unix/Linux and DOS/Windows systems, line endings can become inconsistent. Unix/Linux systems typically use line feed (LF) characters, while DOS/Windows systems use carriage return (CR) and line feed (LF) combinations. This mismatch can result in extra characters appearing in the data, causing visual artifacts and processing errors. | '''Problem:''' When transferring text files between Unix/Linux and DOS/Windows systems, line endings can become inconsistent. Unix/Linux systems typically use line feed (LF) characters, while DOS/Windows systems use carriage return (CR) and line feed (LF) combinations. This mismatch can result in extra characters appearing in the data, causing visual artifacts and processing errors. | ||
'''Solution:''' | '''Solution:''' Excel does not have an easy solution for this problem. | ||
=== Invalid Individual Count === | === Invalid Individual Count === | ||
Line 312: | Line 347: | ||
| MaterialCitation || MaterialCitation | | MaterialCitation || MaterialCitation | ||
|} | |} | ||
[https://github.com/Jegelewicz/data_quality/blob/main/Excel_examples/2024%20Excel%20data%20quality%20checks.xlsx Sample file] |
edits