Call for Participation: Hackathon on iDigBio APIs/Services and Interoperability

Goal: Design, develop, implement, test and/or document uses of iDigBio data via its APIs

Location: University of Florida, Gainesville, FL

Dates and times: June 3-5, 2015; 8 am - 5 pm each day

To apply: https://ufl.qualtrics.com/SE/?SID=SV_6Wr1womZuY7O5o1 (deadline: February 28, 2015). Invited applicants will be notified by March 9th.

Synopsis

We encourage interested individuals (programmers and non-programmers) to apply for participation in a hackathon focused on the development and/or integration of applications that use iDigBio APIs to ingest, access, visualize and search biocollections data.

Motivation and Goals

iDigBio (http://idigbio.org) has ingested more than 25 million specimens and 4 million media objects from biodiversity collections with world-wide range. This great resource of biodiversity information has been made accessible not only through the iDigBio portal but also, since 2013, through Application Programming Interfaces (APIs) that applications written in any programming language can use to (a) access specimen, media, media metadata, dataset and publisher information, (b) perform searches, and (c) ingest media and their metadata.

Currently, this great resource is mainly used by the iDigBio portal and iDigBio-developed applications. The goals of this hackathon are to (a) lower the entrance barrier to potential direct uses of the API by disseminating its capabilities more broadly and generating a body of use-case examples that can be reused by others, (b) identify new opportunities for integration with other cyberinfrastructures, and (c) develop collaborative pilot experiments that build on the existing interoperability of other cyberinfrastructures.

Resources and Proposed Activities

All applicants are encouraged to get acquainted with the iDigBio APIs (https://www.idigbio.org/wiki/index.php/IDigBio_API):

Specimen data access APIs (v1)
Media ingestion APIs
Upcoming specimen and media search APIs (v2)

Some sample applications accessing iDigBio data can be viewed at the links shown below:

PhyloJIVE @iDigBio (http://phylojive.acis.ufl.edu/)
R library to use iDigBio (https://www.idigbio.org/wiki/index.php/IDigBio_API#Client_Libraries)
Using Arbor and OpenRefine to access iDigBio (http://blog.opentreeoflife.org/2014/10/07/tree-for-all-hackathon-series-taxon-sampling-part-2/)

The following proposed activities represent an attempt to strike a balance between detailed descriptions of deliverables and leaving room for creative products to emerge from new interactions. Activities are detailed below and will get a crisper focus as participants are selected and share their interests with the organizers. Two example applications are provided under each topic, but applicants should by no means feel limited by these bootstrapping ideas.

1) Applications that search, query, discover or generally mine iDigBio data

New visualizations to study the evolution of biodiversity can be created by mining iDigBio data across space, time and taxa. The search API allows aggregations and statistics to be computed efficiently, and the results could be displayed in graphs, trees, maps, videos or any other visual form.
Patterns in iDigBio data can be discovered by applying machine learning algorithms. Since numerous such studies are possible, it becomes important to make efficient use of filters through search.
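
As a starting point for the filtered searches mentioned above, the following minimal Python sketch builds a search request URL without sending it. The endpoint URL, the "rq" and "limit" parameter names, and the field names used in the filter are assumptions made for illustration; the iDigBio API wiki is the authority on the actual search interface.

```python
import json
from urllib.parse import urlencode

# Assumed base URL of the v2 record search; verify against the iDigBio API wiki.
SEARCH_ENDPOINT = "https://search.idigbio.org/v2/search/records"


def build_search_url(filters, limit=10, endpoint=SEARCH_ENDPOINT):
    """Return a GET URL that carries the filters as a JSON-encoded 'rq' parameter.

    'rq' and 'limit' are assumed parameter names, not confirmed by this call.
    """
    params = {"rq": json.dumps(filters, sort_keys=True), "limit": limit}
    return endpoint + "?" + urlencode(params)


# Hypothetical filter: records of one genus from one state.
url = build_search_url({"genus": "acer", "stateprovince": "florida"}, limit=5)
print(url)
```

Narrowing the filter before downloading records keeps the result set small, which matters when the same query pattern is repeated across many taxa or regions.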

2) Applications intended for provision of data to iDigBio and ingestion of data from sources other than typical providers

Detailed information on host relationships is a classic example of data captured in collections but not fully standardized into vocabularies or ontologies. Applications that can more flexibly accept new sets of definitions can facilitate the exchange of data being digitized.
Media files are often generated before specimen data are digitized, creating the challenge of sharing the media rapidly and later relating them to the specimen data. Applications that support this workflow in a gradual and friendly manner are needed to speed up the digitization process.

3) Applications that combine 1) and 2), possibly annotating, modifying or relating iDigBio data to each other or to other data

As technologies and knowledge evolve, the need to update data is unavoidable. These updates can be performed programmatically, triggered by users who need to follow new vocabularies or specifications, and can be propagated through the network of data providers and aggregators. Applications that can exchange the information and keep changes as annotations or versions are needed.
In certain situations, related data can complement each other. For example, when herbarium sheets are collected as a set, not all sheets get the same level of information in the label, and physical annotations may vary. Methods for searching related data and proposing updates can lead to more complete information.
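
One simple way to propose updates from related records is sketched below in Python. It is an illustration under stated assumptions, not an iDigBio feature: the record layout, the field names and the sample values are all made up, and the rule used (suggest a value only when every related record that has the field agrees on it) is just one possible policy.

```python
def propose_updates(records):
    """Suggest values for empty fields in a set of related records.

    A value is proposed only when all records that fill in the field agree.
    Each record is a dict with a hypothetical 'id' key plus data fields.
    """
    fields = {name for record in records for name in record if name != "id"}
    consensus = {}
    for name in fields:
        values = {record[name] for record in records if record.get(name)}
        if len(values) == 1:
            consensus[name] = values.pop()

    proposals = {}
    for record in records:
        missing = {n: v for n, v in consensus.items() if not record.get(n)}
        if missing:
            proposals[record["id"]] = missing
    return proposals


# Made-up sheets collected as one set, with uneven label information.
sheets = [
    {"id": "sheet-1", "collector": "Collector A", "locality": "Site 1"},
    {"id": "sheet-2", "collector": "Collector A", "locality": ""},
    {"id": "sheet-3", "collector": "", "locality": "Site 1"},
]
print(propose_updates(sheets))
```

The function only returns proposals and does not modify the records, which fits the idea of keeping changes as annotations that a curator can accept or reject.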

4) Applications that add functionality to iDigBio services, possibly by running services elsewhere and integrating iDigBio data and processing tools

Data quality processes have been developed by many projects simultaneously, including services that work as authorities for certain types of data. Combining these services into a single workflow can minimize development time by fostering reuse of existing services.
Many properties of the media are currently not captured in a machine-searchable manner. Applications that can process iDigBio media files to extract features of specimens would facilitate future discoveries by increasing the amount of information available.
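
The idea of combining data quality checks into a single workflow can be sketched in Python as a list of small check functions applied in sequence. The two checks below are local stand-ins written for this illustration; in practice each could wrap a call to an existing external service. The record field names ("lat", "lon", "scientificname") are assumptions.

```python
def check_coordinates(record):
    """Flag missing or out-of-range coordinates (hypothetical 'lat'/'lon' fields)."""
    lat, lon = record.get("lat"), record.get("lon")
    if lat is None or lon is None:
        return ["coordinates missing"]
    issues = []
    if not -90 <= lat <= 90:
        issues.append("latitude out of range")
    if not -180 <= lon <= 180:
        issues.append("longitude out of range")
    return issues


def check_scientific_name(record):
    """Flag an empty scientific name (hypothetical 'scientificname' field)."""
    name = (record.get("scientificname") or "").strip()
    return [] if name else ["scientific name missing"]


def run_checks(record, checks):
    """Run every check in order and collect the issues they report."""
    issues = []
    for check in checks:
        issues.extend(check(record))
    return issues


workflow = [check_coordinates, check_scientific_name]
sample = {"lat": 95.0, "lon": -82.3, "scientificname": ""}  # made-up record
print(run_checks(sample, workflow))
```

Because every check shares the same signature (a record in, a list of issues out), new checks can be added to the workflow list without changing the code that runs them.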

5) Contributions to re-usable libraries in different languages (e.g., R, PHP, Java, Python and JavaScript) and applications that use those libraries.

A basic R library has been developed to use the iDigBio search API. This library can be expanded to make use of other types of iDigBio APIs.
The iDigBio image ingestion appliance has been developed in Python. Repackaging it as a library or replicating its functionality in other languages would allow its reuse by other applications.

Application

Who may apply

Anyone may apply regardless of position or level of experience. Women and underrepresented minorities are especially encouraged to apply. We expect a mixture of programmers, scientists, and programmer-scientists who have a specific interest in leveraging iDigBio resources. Applicants should understand basic concepts of biodiversity informatics. Those who consider themselves non-programmers should be able to talk about code and discuss design ideas with a programmer. All applicants are encouraged to participate in collaborative idea development prior to applying (below).

Idea development

Hackathon projects are planned and executed by teams of 3 to 7 people. The ideas come from you, the participants: rather than being decided in advance, teams and projects emerge by a guided self-organization process on the first day of the hackathon. This process is greatly aided if participants have sifted through ideas and identified potential team-mates in advance. We invite you to sign in to our online repository to post ideas, and offer comments on others’ ideas. Participation in this process is not required, but is strongly encouraged.

Application process

Online applications (https://ufl.qualtrics.com/SE/?SID=SV_6Wr1womZuY7O5o1) will be considered through February 28, 2015, and invited applicants will be notified by March 9th. The application consists of contact information, a statement describing how your training and experience prepare you to participate successfully, and a short description of a potential project idea.

Travel support

Travel support is available, and the logistics for arranging travel and housing will be communicated to accepted applicants.

Open-Source Requirement

All software produced at the hackathon will have an open-source license and will be developed in the open, with code in a public GitHub repository from the beginning of the event. Other non-software material developed during this hackathon will be openly licensed under CC-BY.

For more information

If you have questions about any aspect of this call for participation, feel free to contact:

Andréa Matsunaga (ammatsun@ufl.edu)
José Fortes (fortes@ufl.edu)
Renato Figueiredo (renatof@ufl.edu)