Data Problems: Difference between revisions
Line 25:
*Download format and term definitions
**The columns in the download are not in a logical order. All identifier columns should be clustered together, as should locality information, collecting event information, and so on. Within a cluster the data elements can be in a loose order, but they should stay together.
**Several terms in the download represent the same information but are named only slightly differently (e.g. VerbatimEventDate, verbatimEventDate). These should be merged in the download file, or at least returned next to each other. (fixed in next release of portal)
**There is no document that defines the terms. One should be provided. Further, those definitions should have URI identifiers so that individuals can reuse them with confidence (for example, by including them in a meta.xml).
*Portal behavior
**When searching the portal, certain fields should not require an exact match. These include the Collector and Locality fields. There are others, but these were the most limiting.
**Higher taxonomy should be included to improve the search, with family name being the most important. If it is not in the dataset from the provider, it should be added automatically upon ingestion to iDigBio. Without the higher taxonomy, a user will miss specimen records they are likely looking for. (improvements coming in next release of portal - using GBIF Nub taxonomy)
*Minor issues
**Terms should be evaluated for consistency. The term “row number” contains a space.
**Ideally, a tsv download would be available as well as a csv download. (support for tsv export format is coming in next release of portal)
|valign="top"| K. Seltmann, R. Rabeler, TTD TCN
|}
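Until the portal changes noted above are released, a user could work around several of the download issues locally. The following is a minimal sketch, not an iDigBio tool: it merges columns whose names differ only in case or spacing (such as VerbatimEventDate and verbatimEventDate), replaces spaces in column names, and writes the result as tsv. The function names and the rule of preferring the spelling that starts with a lowercase letter are assumptions made for illustration.

```python
import csv
import io


def canonical_names(fieldnames):
    """Map every column name to one spelling per case-insensitive group."""
    groups = {}
    for name in fieldnames:
        # Names that differ only in case or spaces land in the same group.
        key = name.replace(" ", "").lower()
        groups.setdefault(key, []).append(name)
    mapping = {}
    for variants in groups.values():
        # Assumption: prefer a spelling that starts lowercase, else the first seen.
        preferred = next((v for v in variants if v[:1].islower()), variants[0])
        preferred = preferred.replace(" ", "_")  # e.g. "row number" -> "row_number"
        for variant in variants:
            mapping[variant] = preferred
    return mapping


def merge_rows(rows, mapping):
    """Collapse variant columns, keeping the first non-empty value per row."""
    merged = []
    for row in rows:
        out = {}
        for name, value in row.items():
            target = mapping[name]
            if not out.get(target):
                out[target] = value
        merged.append(out)
    return merged


def csv_to_tsv(csv_text):
    """Read csv text, merge variant columns, and return tsv text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    mapping = canonical_names(reader.fieldnames)
    # Keep the original column order, dropping merged duplicates.
    fieldnames = list(dict.fromkeys(mapping[n] for n in reader.fieldnames))
    rows = merge_rows(reader, mapping)
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=fieldnames, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

For example, `csv_to_tsv(open("occurrence.csv", encoding="utf-8").read())` would return the cleaned table as a tab-separated string; the file name here is hypothetical. Keeping the first non-empty value is a deliberate simplification: if two variant columns hold conflicting values in the same row, the later one is silently dropped, so a real cleanup should check for such conflicts first.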
Revision as of 12:51, 9 February 2015
The following are anecdotes contributed by users of iDigBio's data. They aim to be helpful in several ways:
- Anyone submitting data should read them and make adjustments and improvements in their own data to avoid these issues
- They can be a springboard for interested parties to address overall data quality issues
Anecdotes
Anecdote | Contact |
---|---|
|
K. Schultz, EOL |
|
C. Johnson, AEC |
|
K. Seltmann, R. Rabeler, TTD TCN |