Data Problems: Difference between revisions

(Created page with "The following are anecdotes contributed by users of iDigBio's data. They aim to be helpful in several ways: #Anyone submitting data should read them and make adjustments and i...")
 


*Download format and term definitions
**The columns in the downloaded file are not in a logical order. Related columns should be grouped: all identifier columns together, all locality columns together, all collecting-event columns together, and so on. Within each group the columns can follow a loose order, but they should be adjacent.
**The download includes several terms that hold the same information but whose names differ only slightly (e.g., VerbatimEventDate and verbatimEventDate). These should be merged in the download file, or at least placed next to each other.
**There is no document that defines the terms, and one should be provided. Each definition should also have a URI identifier so that users can reuse the terms with confidence (for example, by referencing them in a meta.xml file).
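Until the download merges such columns, a user can merge them locally. The following Python sketch makes three assumptions. First, the download is a CSV file with a header row. Second, columns whose names differ only in letter case (such as VerbatimEventDate and verbatimEventDate) really do hold the same information. Third, when both columns have a value, the first column wins, which may not be the right rule for every dataset. The function name `merge_case_variant_columns` is hypothetical and is not part of any iDigBio tool.

```python
import csv
import io


def merge_case_variant_columns(csv_text: str) -> str:
    """Merge CSV columns whose header names differ only in letter case.

    The first spelling seen is kept as the merged column name. For each row,
    the merged cell takes the first non-blank value among the variant columns.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)

    # Group column indexes by case-folded name, keeping first-seen order
    # (dicts preserve insertion order in Python 3.7+).
    groups: dict[str, list[int]] = {}
    for index, name in enumerate(header):
        groups.setdefault(name.casefold(), []).append(index)

    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow([header[indexes[0]] for indexes in groups.values()])
    for row in reader:
        merged = []
        for indexes in groups.values():
            # Skip blank cells and indexes past the end of short rows.
            values = [row[i] for i in indexes if i < len(row) and row[i].strip()]
            merged.append(values[0] if values else "")
        writer.writerow(merged)
    return out.getvalue()
```

Consider an input with columns `catalogNumber,VerbatimEventDate,verbatimEventDate` and the rows `A1,,12 May 1950` and `A2,1890s,`. The function returns the columns `catalogNumber,VerbatimEventDate` with the rows `A1,12 May 1950` and `A2,1890s`.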


*Portal behavior
**When searching the portal, some fields should not require an exact match, notably the Collector and Locality fields. Other fields have the same limitation, but these two were the most restrictive.
**Higher taxonomy, above all family name, should be included to improve searching. If the provider's dataset lacks it, iDigBio should add it automatically during ingestion. Without higher taxonomy, users will miss specimen records they are likely looking for.
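One way to interpret "not an exact match" for the Collector and Locality fields is to ignore letter case, accents, and punctuation, and to let each query word match the start of a word in the field. The Python sketch below illustrates that idea only. It does not describe how the iDigBio portal actually searches, and the function names `normalize_tokens` and `loose_match` are hypothetical.

```python
import re
import unicodedata


def normalize_tokens(text: str) -> list[str]:
    """Case-fold, strip accents, and split text into word tokens."""
    decomposed = unicodedata.normalize("NFKD", text.casefold())
    # Drop combining marks so that "Río" and "Rio" compare equal.
    plain = "".join(c for c in decomposed if not unicodedata.combining(c))
    # \w+ keeps runs of letters, digits, and underscores,
    # splitting on spaces and punctuation such as ".,;".
    return re.findall(r"\w+", plain)


def loose_match(query: str, field_value: str) -> bool:
    """Return True if every query word is a prefix of some word in the field."""
    field_words = normalize_tokens(field_value)
    return all(
        any(word.startswith(q) for word in field_words)
        for q in normalize_tokens(query)
    )
```

Both `loose_match("smith", "J. Smith; R. Jones")` and `loose_match("rio grande", "Río Grande valley, 5 km N of town")` return True, and a plain string equality test would reject both. In this sketch an empty query matches every record.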


*Minor issues
**Term names should be checked for consistency; for example, the term “row number” contains a space.
**Ideally, a TSV download would be offered in addition to the CSV download. (K. Seltmann, R. Rabeler, TTD TCN)
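Until a TSV download is available, a CSV download can be converted locally. One practical benefit is that locality strings often contain commas, which CSV must quote but TSV does not. The Python sketch below assumes a CSV file with standard double-quote quoting. Plain TSV cannot hold tabs or line breaks inside a field, so the sketch replaces them with a single space. Other converters escape them instead. The function name `csv_to_tsv` is hypothetical.

```python
import csv
import io
import re


def csv_to_tsv(csv_text: str) -> str:
    """Convert CSV text (with standard double-quote quoting) to plain TSV."""
    lines = []
    for row in csv.reader(io.StringIO(csv_text)):
        # Plain TSV has no quoting, so tabs and line breaks inside a field
        # are collapsed to a single space (one choice among several).
        cleaned = [re.sub(r"[\t\r\n]+", " ", field) for field in row]
        lines.append("\t".join(cleaned))
    return ("\n".join(lines) + "\n") if lines else ""
```

For example, the quoted CSV field `"Mill Creek, 2 km E"` becomes the unquoted TSV field `Mill Creek, 2 km E`.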