Developing an Ontology for Insect Natural History Data
Overview
What can we say about the natural history of any particular insect species? Is it an herbivore? A parasitoid? How do we know? The answers to these questions depend on data about insect natural history from observations of insects in the field or in the lab (which can coincide with collecting specimens). Currently, such data are typically widely scattered and difficult to discover and analyze. This is due, in part, to the difficulty of aggregating natural history data in centralized databases. Data from natural history observations tend to be highly heterogeneous and can vary drastically in detail, granularity, and content. Aggregating such data without information loss is a challenge. The overarching goal of this workshop is to take the first steps toward developing a robust, usable ontology for insect natural history data (observations). This ontology will provide computable semantics for insect natural history observations, which will, in turn, facilitate rich, automated, and truly reproducible data integration and aggregation.
Note that the intent is to develop a general model for insect natural history data at the relatively low level of observation/data collection. Thus, the ontology should not be designed with a single, narrow technological or scientific application in mind. Insect natural history data are useful in many contexts, including applied entomology, ecological studies, and phylogenetic comparative analyses, and a well-designed ontology should result in data sets that are broadly useful across all these fields.
Workshop Venue, etc.
The workshop will be held May 30, 2018, through June 1, 2018. The calendar entry is here: https://www.idigbio.org/content/developing-ontology-insect-natural-history-data
The workshop will be held in the 105 Classroom Building on the north side of the University of Florida campus. Here is a campus map showing the location of the building: https://campusmap.ufl.edu/#/index/0105
For those of you staying at the Holiday Inn, here are walking directions from the hotel: https://tinyurl.com/yagoab9j
We will meet in Room 310, on the third floor of the building.
Workshop Goals and Outcomes
- Assembling example data for developing and testing the ontology. The example data will be drawn primarily from natural history specimens, with an emphasis on label data available from iDigBio records. Additional example data from literature records and direct observation will also be gathered. Result: A broad set of example data that covers all of the major insect orders (Hemiptera, Coleoptera, Diptera, Lepidoptera, Hymenoptera) and covers the major “kinds” of natural history data commonly gathered for these orders as well as a broad spectrum of informational detail.
- Defining the scope of the ontology and associated data standards. This will initially require identifying, in detail, the categories of natural history information and information acquisition methodologies to include in the ontology. Once these decisions are made, further scoping decisions will center on the additional kinds of information that need to be included, such as environmental context and taxonomy. Result: A document describing 1) the kinds of natural history information to include in the ontology with simple examples; 2) information acquisition methodologies to include in the ontology, again with examples; and 3) relevant context information to include in the ontology, with examples.
- Identifying candidate terms and concepts to include in the ontology. We will use example data, ontology scope information, and natural language processing (NLP) techniques on source texts to identify candidate terms and concepts for inclusion in the ontology. We will then evaluate which terms should be included in the final domain model and assess how the terms are related to one another. Result: A set of candidate ontology terms and concepts and preliminary relationships among them.
- Evaluating existing ontological resources. To ensure interoperability and prevent duplication of effort, we will examine available ontologies for entities that we can reuse directly. We will especially focus on ontologies that are part of the Open Biological and Biomedical Ontologies (OBO) Foundry (http://obofoundry.org/). Result: A list of candidate entities from existing ontologies that could be reused for the insect natural history data ontology.
- Writing competency questions to test the utility of the ontology and identifying users and use cases. We will develop a set of natural language (i.e., informal) competency questions to more precisely define ontology requirements. As a complement to the competency questions, we will also identify likely user groups and use cases for the ontology. Result: A set of unambiguous, precisely stated competency questions that reference likely ontology terms whenever possible, and a list of potential ontology (and natural history data) users and use cases to accompany the competency questions.
- Developing a plan for post-workshop ontology development. B. Stucky will have primary responsibility for most technical development work, but it will be important to have mechanisms and plans for continued engagement of the workshop participants and other interested persons, for reviewing and testing ontology entities, and for disseminating results. We might also discuss ideas for building natural history data resources. Result: An actionable plan for post-workshop ontology development, review, testing, and publication, and possibly, preliminary plans for developing data resources.
- Planning a short workshop report manuscript for publication. I (B. Stucky) would like to prepare a relatively short workshop report manuscript for publication as soon as possible after the workshop. We will not attempt to write the manuscript at the workshop (it would probably be too time-consuming), but we should outline it and discuss possible publication venues. Result: A detailed outline of the workshop report manuscript and a plan for completing it and choosing a publication venue.
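As a rough illustration of the candidate-term goal above, here is a minimal sketch of how simple phrase counting over label text might surface candidate terms. It is not the method the workshop will necessarily use: the label snippets, the stopword list, and the `candidate_terms` function are all invented for this example, and real NLP work would likely rely on more capable tools.

```python
import re
from collections import Counter

# Hypothetical label snippets, invented for illustration only;
# they are not drawn from iDigBio or any other real data source.
LABELS = [
    "feeding on leaves of Quercus alba",
    "reared from larva feeding on leaves of Asclepias",
    "collected at light trap",
    "ex pupa, collected at light",
]

# A tiny stopword list chosen just for this sketch.
STOPWORDS = {"of", "on", "at", "from", "the", "a", "ex"}


def candidate_terms(texts, max_n=3, min_count=2):
    """Count word n-grams (1 to max_n words) that neither start nor end
    with a stopword, and keep those seen at least min_count times."""
    counts = Counter()
    for text in texts:
        tokens = re.findall(r"[a-z]+", text.lower())
        for n in range(1, max_n + 1):
            for i in range(len(tokens) - n + 1):
                gram = tokens[i:i + n]
                if gram[0] in STOPWORDS or gram[-1] in STOPWORDS:
                    continue
                counts[" ".join(gram)] += 1
    return {term: c for term, c in counts.items() if c >= min_count}


terms = candidate_terms(LABELS)
# Print the most frequent phrases first, ties broken alphabetically.
for term, count in sorted(terms.items(), key=lambda kv: (-kv[1], kv[0])):
    print(f"{count}\t{term}")
```

Phrases that recur across labels, such as "feeding on leaves" or "collected at light" in this toy input, are the kind of output a group could then review by hand to decide which terms belong in the domain model.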
(Tentative) Workshop Agenda and Schedule
This is a proposed workshop schedule. It is still somewhat in flux, and I fully expect that we will need to adjust the schedule as we go, because it is quite difficult to predict how much time we’ll need for any one activity.
Wednesday, May 30
Time | Activities |
---|---|
8:30-9:00 AM | Breakfast provided at meeting location (room 310 of the 105 Building) |
9:00-9:30 AM | Welcome, logistic details, participant introductions |
9:30-10:30 AM | Workshop introduction: Workshop scope, goals, and a conceptual introduction to ontologies, knowledge representation, and data reasoning (B. Stucky); description of and instructions for example data assembly task; group assignments |
10:30-10:45 AM | Break (coffee, etc. provided) |
10:45-12:00 PM | Assembling example data (work in groups, by insect order) |
12:00-1:00 PM | Lunch (provided) |
1:00-2:30 PM | Assembling example data (work in groups, by insect order) |
2:30-3:00 PM | Data assembly wrap-up; Description of and instructions for example data analysis and ontology scoping |
3:00-3:15 PM | Break (snacks provided) |
3:15-5:00 PM | Analysis of example data and ontology scoping (continue working in groups) |
6:00 PM | Group dinner (location TBD) |
Thursday, May 31
Time | Activities |
---|---|
8:30-9:00 AM | Breakfast provided at meeting location (room 310 of the 105 Building) |
9:00-10:30 AM | Analysis of example data and ontology scoping (finish small group work) |
10:30-10:45 AM | Break (coffee, etc. provided) |
10:45-11:00 AM | Introduction to Noctua software (J. Balhoff) |
11:15-12:00 PM | Analysis of example data and ontology scoping (group reports and high-level synthesis) |
12:00-1:00 PM | Lunch (provided) |
1:00-1:15 PM | Instructions for identifying ontology terms, concepts, and design patterns; group assignments |
1:15-3:00 PM | Identifying ontology terms, concepts, and design patterns (work in groups) |
3:00-3:15 PM | Break (snacks provided) |
3:15-4:30 PM | Identifying ontology terms, concepts, and design patterns (work in groups) |
Evening | Trip to Sweetwater Wetlands (if interest and weather permitting) |
Friday, June 1
Time | Activities |
---|---|
8:30-9:00 AM | Breakfast provided at meeting location (room 310 of the 105 Building) |
9:00-10:30 AM | Identifying ontology terms, concepts, and design patterns (work in groups) |
10:30-10:45 AM | Break (coffee, etc. provided) |
10:45-12:00 PM | Identifying ontology terms, concepts, and design patterns (group reports and synthesis) |
12:00 PM | Group photo |
12:00-1:00 PM | Lunch (provided) |
1:00-1:30 PM | Introduction to relevant ontologies and related technologies (PATO, BCO, ENVO, SEPIO, etc.) (R. Walls, M. Yoder, B. Stucky) |
1:30-1:40 PM | Brief introduction to ontology competency questions (B. Stucky) |
1:40-3:00 PM | Writing competency questions, identifying users and use cases (work in groups) |
3:00-3:15 PM | Break (snacks provided?) |
3:15-4:00 PM | Writing competency questions, identifying users and use cases (group reports and synthesis) |
4:00-5:00 PM | Wrap-up, outline workshop report paper, and post-workshop development plans |
Workshop Report
https://www.idigbio.org/content/workshop-report-developing-ontology-insect-natural-history-data
Publication
Stucky, B.J.; Balhoff, J.P.; Barve, N.; Barve, V.; Brenskelle, L.; Brush, M.H.; Dahlem, G.A.; Gilbert, J.D.J.; Kawahara, A.Y.; Keller, O.; Lucky, A.; Mayhew, P.J.; Plotkin, D.; Seltmann, K.C.; Talamas, E.; Vaidya, G.; Walls, R.; Yoder, M.; Zhang, G.; & Guralnick, R. Developing a vocabulary and ontology for modeling insect natural history data: example data, use cases, and competency questions. Biodiversity Data Journal, 7, e33303 (2019). DOI:10.3897/BDJ.7.e33303
Insects are possibly the most taxonomically and ecologically diverse class of multicellular organisms on Earth. Consequently, they provide nearly unlimited opportunities to develop and test ecological and evolutionary hypotheses. Currently, however, large-scale studies of insect ecology, behavior, and trait evolution are impeded by the difficulty in obtaining and analyzing data derived from natural history observations of insects. These data are typically highly heterogeneous and widely scattered among many sources, which makes developing robust information systems to aggregate and disseminate them a significant challenge. As a step towards this goal, we report initial results of a new effort to develop a standardized vocabulary and ontology for insect natural history data. In particular, we describe a new database of representative insect natural history data derived from multiple sources (but focused on data from specimens in biological collections), an analysis of the abstract conceptual areas required for a comprehensive ontology of insect natural history data, and a database of use cases and competency questions to guide the development of data systems for insect natural history data. We also discuss data modeling and technology-related challenges that must be overcome to implement robust integration of insect natural history data.