CRIA and iDigBio meet

 

Date: 23 May, 2013


Today the CRIA and iDigBio teams met via Adobe Connect to exchange ideas and share information.

Attendees:

CRIA: http://www.cria.org.br, http://splink.cria.org.br/index?&setlang=en

Dora Canhos, Director
Sidnei de Souza, IT

iDigBio: https://www.idigbio.org

Larry Page, PI iDigBio (https://www.idigbio.org/content/dr-larry-page-documenting-diversity)
Alex Thompson, IT iDigBio
Matt Collins, IT iDigBio
Greg Traub, IT iDigBio
Reed Beaman, Bioinformatics at FLMNH
Kevin Love, IT iDigBio
Renato Figueiredo, IT iDigBio
Joanna McCaffrey, Bioinformatics at iDigBio
Sarfaraz Soomro, graduate student, IT iDigBio
Kyuho Jeong, graduate student, IT iDigBio

We began by introducing ourselves, and then addressed some questions for CRIA that the iDigBio team had prepared ahead of time. iDigBio was interested in learning from CRIA’s experience, since the two projects are similar and CRIA’s work is many years ahead of iDigBio’s. While some of the parameters of the two projects differ, their overall missions are similar: to provide integrated access to the natural history data of all of the national institutions in their respective countries. One difference is that CRIA’s scope includes observation data, whereas iDigBio’s scope includes neither observation data nor collections from federal institutions. Additionally, CRIA’s scope does not yet include helping data providers with digitization, although they would like to change that.

iDigBio: Are you a GBIF node for Brazil?

CRIA: No. Brazil has recently signed the memorandum of understanding with GBIF (October 2012) and its official node is being structured at LNCC – Laboratório Nacional de Computação Científica (www.lncc.br) and is part of SiBBr (Sistema de Informação sobre a Biodiversidade Brasileira) of the Ministry of Science, Technology & Innovation. We believe that in the near future CRIA will be providing data to GBIF through the speciesLink network that is continuously developed and maintained by CRIA.

iDigBio: Do you have plans to release your source code as open source?

CRIA: No, we are short-staffed, and most of our tools and applications are tied to our database. We haven’t had the necessary resources to document all software for use by third parties, with the exception of openModeller, a framework for ecological niche modeling that is available at SourceForge (http://openmodeller.sourceforge.net).

iDigBio: Is your tool spLinker available for testing?

CRIA: spLinker is an application that enables biological collections to mirror their data on regional servers integrated into the speciesLink network. Data fields are mapped according to Darwin Core, and curators can mark specific fields or records that they do not wish to share publicly. All data sent to the regional servers are harvested into a central database and made freely and openly available. Yes, iDigBio can test spLinker. The person to contact is Alexandre Marino (marino@cria.org.br).
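To make the mapping and withholding idea concrete, here is a minimal Python sketch. It is not spLinker's code, which was not shown at the meeting. The local column names, the FIELD_MAP table and the to_shared_records function are all hypothetical; only the Darwin Core term names on the right-hand side of the mapping are real.

```python
# Hypothetical mapping from one collection's local column names to
# Darwin Core terms. The local names are invented for illustration.
FIELD_MAP = {
    "species": "scientificName",
    "cat_no": "catalogNumber",
    "collector": "recordedBy",
    "place": "locality",
    "lat": "decimalLatitude",
    "lon": "decimalLongitude",
}


def to_shared_records(rows, withheld_fields=frozenset(), withheld_ids=frozenset()):
    """Map local rows to Darwin Core terms, skipping anything marked as withheld.

    withheld_fields: local column names the curator does not want shared.
    withheld_ids: catalog numbers of records the curator does not want shared.
    """
    shared = []
    for row in rows:
        if row.get("cat_no") in withheld_ids:
            continue  # the whole record is withheld
        record = {}
        for local_name, dwc_term in FIELD_MAP.items():
            if local_name in withheld_fields:
                continue  # this field is withheld for every record
            if local_name in row:
                record[dwc_term] = row[local_name]
        shared.append(record)
    return shared


if __name__ == "__main__":
    rows = [
        {"cat_no": "A1", "species": "Apis mellifera", "lat": -22.9, "lon": -47.06},
        {"cat_no": "A2", "species": "Bombus morio"},
    ]
    # Withhold coordinates everywhere, and withhold record A2 entirely.
    print(to_shared_records(rows, withheld_fields={"lat", "lon"}, withheld_ids={"A2"}))
```

In this sketch the withholding is applied before anything leaves the collection, so a downstream harvester would never see the restricted values.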

iDigBio: How do you represent and use non-curated datasets, such as taxonomy lists and authority files?

iDigBio: What is your data architecture?

CRIA: We discussed use of various floras to validate data, as well as data checking. We use the following lists to check the status of a name:

·      Moure's Bee Catalogue (2012, Jul 24) - 9780 records - 5781 accepted names

·      List of Species of the Brazilian Flora (Dec 31, 2012) - 94492 records - 54245 accepted names

·      Catalogue of Life - Annual Checklist (2011) - 2479704 records - 1631792 accepted names

·      DSMZ Bacteria (Nov 2012) - 14375 records - 12341 accepted names
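Checking the status of a name against several reference lists could look roughly like the Python sketch below. It is only an illustration under stated assumptions: the function names, the normalization rule and the sample entries are hypothetical (the names are placeholders, not real taxa), and CRIA's actual matching logic was not described in detail.

```python
def normalize(name):
    """Collapse whitespace and ignore case so trivial variants still match."""
    return " ".join(name.split()).lower()


def build_index(entries):
    """Turn (name, status) pairs from one checklist into a lookup table."""
    return {normalize(name): status for name, status in entries}


def check_name(name, indexes):
    """Return (source, status) from the first checklist that knows the name.

    indexes maps a checklist label to the table made by build_index.
    Returns None when no checklist contains the name.
    """
    key = normalize(name)
    for source, index in indexes.items():
        if key in index:
            return source, index[key]
    return None


if __name__ == "__main__":
    # Placeholder checklists with invented names, for illustration only.
    indexes = {
        "checklist A": build_index([("Exemplum primum", "accepted")]),
        "checklist B": build_index([("Exemplum secundum", "synonym")]),
    }
    for candidate in ("exemplum  PRIMUM", "Exemplum secundum", "Exemplum tertium"):
        print(candidate, "->", check_name(candidate, indexes))
```

The lists are consulted in insertion order here, so the order in which checklists are registered decides which source wins when a name appears in more than one.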

As for our data architecture, we use Darwin Core as the data model and both the DiGIR and TAPIR protocols. For data repatriation, we also study each data provider and use whatever works best for that provider.

iDigBio: What were your biggest surprises in your project?

CRIA: Perhaps the biggest surprise is that the project is still ongoing and growing in importance.

iDigBio: What are your strategies for sustainability?

CRIA: Working as a network of collections (over 300), institutions and researchers is the project’s greatest strength. Minimizing the complexity of participation at the data provider’s end, as far as informatics is concerned, was also a fundamental strategy.

While CRIA’s staff size is small, we feel that the network is sufficiently important to secure at least minimum funding for its maintenance (although funding is a fundamental and perennial issue), and we see that the momentum continues to encourage new partners to make their data available.

iDigBio: Do we have things that you might be interested in?

CRIA: We would be interested in iDigBio’s workshops and working groups, and especially the digitization workflow documentation. Dora suggested that CRIA’s student, Flávia Pezzini (pezzini@cria.org.br), would be interested in looking at those resources in particular, and Joanna offered to give her any assistance desired.

We are also interested in iDigBio’s public participation and citizen scientist initiatives.

iDigBio: Some resources you might be interested in include the working group:

https://www.idigbio.org/wiki/index.php/IDigBio_Working_Groups - Public Participation in Digitization (CitSci)

and the report of a workshop held in September 2012:

https://www.idigbio.org/wiki/index.php/Joint_Public_Participation_in_Digitization_and_Outreach_and_Education_Workshop

Here are some other links for finding information on digitization and on contributing data to iDigBio:

iDigBio Welcome

https://www.idigbio.org/sites/default/files/Welcome_To_iDigBio.pdf

Image Policy

https://www.idigbio.org/sites/default/files/sites/default/files/Image_File_Format_Recommendations_and_Standards.pdf

Working Groups – open to all

https://www.idigbio.org/wiki/index.php/IDigBio_Working_Groups

A large collection of digitization resources:

https://www.idigbio.org/wiki/index.php/Digitization_Resources

 

from Dora Canhos and Joanna McCaffrey