Data Problems

===Anecdotes===
*Your Darwin Core archives have all the information we need, but filtering out the label images will be a challenge. For collections that contain both label-only images and organism images, we may have to employ a content-based image retrieval algorithm, and this may take a while to develop.
**I was surprised to find a Creative Commons license link in the dcterms:rights field of occurrence files. In the files I looked at (e.g., Recordset 69037495-438d-4dba-bf0f-4878073766f1), there is no dwc:rightsHolder entry in the occurrence file, so it appears that there is a license but the licensor is not named. If these occurrences really have license restrictions, this complicates things for us. Our data model treats the image and its metadata as one media object, and we cannot accommodate different licenses within that object. If the media and occurrence licenses are always the same, it wouldn't be a problem, but in cases where they differ, we could not use the data from the occurrence file. This means descriptions and locality information could not be displayed alongside the image on EOL, and they would not be available through the EOL API, which considerably decreases the value of these images to our users.
**Also, we would not be able to use label data in TraitBank if the occurrences are licensed. While we recognize licenses at the data set level, we do not implement them at the level of individual records. We have discussed this and concluded that, like measurements and facts, occurrence records are unlikely to be protected by copyright, especially when they are presented in a commonly used standard like DwC. Of course, we won't know for sure until somebody files a lawsuit, but we decided to err on the side of openness. Is there any chance this issue could be brought up for discussion at iDigBio?
**We'll have a little more work to do before we're ready to import any of the iDigBio data.  I'll let you know if there is any progress on our end. (K. Schultz, EOL)
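The licensing concern in this anecdote can be illustrated with a minimal sketch. It assumes the occurrence file has been exported as CSV with columns literally named dcterms:rights and dwc:rightsHolder (real archives map terms to columns in meta.xml, so the header names here are hypothetical). It also assumes that an occurrence record with no license value poses no conflict with the media license. The function names are illustrative only and are not part of any EOL or iDigBio tool.

```python
import csv
import io

# Hypothetical column headers; real archives declare the mapping in meta.xml.
RIGHTS = "dcterms:rights"
RIGHTS_HOLDER = "dwc:rightsHolder"


def read_rows(text):
    """Parse CSV text into a list of dicts keyed by column header."""
    return list(csv.DictReader(io.StringIO(text)))


def licensed_without_holder(rows):
    """Return rows that carry a license value but name no rights holder."""
    return [
        row for row in rows
        if (row.get(RIGHTS) or "").strip()
        and not (row.get(RIGHTS_HOLDER) or "").strip()
    ]


def licenses_match(media_rights, occurrence_rights):
    """True when the occurrence has no license or shares the media license.

    Assumption: an empty occurrence license never conflicts with the media.
    Comparison ignores case, surrounding whitespace, and a trailing slash.
    """
    media = (media_rights or "").strip().rstrip("/").lower()
    occurrence = (occurrence_rights or "").strip().rstrip("/").lower()
    return occurrence == "" or occurrence == media


# Made-up sample data for illustration only.
SAMPLE = (
    "id,dcterms:rights,dwc:rightsHolder\n"
    "1,http://creativecommons.org/licenses/by/3.0/,\n"
    "2,,\n"
    "3,http://creativecommons.org/licenses/by/3.0/,Example Herbarium\n"
)

if __name__ == "__main__":
    for row in licensed_without_holder(read_rows(SAMPLE)):
        print("license without rights holder:", row["id"])
```

Under these assumptions, a record whose occurrence license differs from its media license would be the case where the occurrence data could not be combined with the image in a single media object.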

