by Deb Paul, with Libby Ellwood and Greg Riccardi as Eds.
Four iDigBio staff members, Greg Riccardi, Pam Soltis, Libby Ellwood, and Deborah Paul (that’s me) headed to Jönköping, Sweden this year for the biodiversity informatics standards meeting of the year, BIS (TDWG) 2014, October 27 - 31. The timely conference theme: Applications and Data Standards for Sustaining Biodiversity suited iDigBio well.
Data Use. Pam attended both the iDigBio Summit and headed to TDWG 2014 later for her presentation: Linking Diverse Data in Studies of Plant Evolution: Case Studies in Progress, in the Phylogenetics Symposium: Biodiversity data synthesis and discovery at a Tree of Life scale convened by Nico Cellinese and Hilmar Lapp. See more talks in their symposium to get a glimpse of how big data and software are making novel biodiversity research possible, and insights into what's needed to support this work into the future.
Collaboration and Identifiers. Greg and I headed over to Stockholm for two ancillary, pre-TDWG 2014 meetings: 1) the
Distributed Open-Source Development of Collection Management Systems (DINA Consortium) meeting, and 2) a
Persistent Identifier Summit convened by Nico Cellinese, John Deck, and Rob Guralnick to foster the development of stakeholder-derived "community guidelines for the adoption of persistent identifiers relevant to biodiversity informatics." Stay tuned for more about this initiative. Jump over to
the Dina Project to learn more about their initiative to build “an open-source Web-based information management system for natural history data.”
TDWG 2014 plenary speaker, Arturo H. Ariño, presented
Shades of Grey: (Yet) Another Role for BIS-TDWG, a compelling story about finding out that you don't necessarily have what you think you have (in your collection), and what you learn about dark and grey data along the journey to digitize specimen and related data. Will we capture the data before it's not digitizable anymore? Arturo compels us to work together and sees a key role for TDWG in this effort.
Digitization. Elspeth Haston (
RBGE), Deb Paul (iDigBio), and Vince Smith (
NHM) collaborated to convene a
TDWG symposium: Access to Digtisation Tools and Methods. Our talks focused on software and strategies open to everyone (open source, open access). We are always looking for ways to encourage international collaboration efforts for digitization. While originally planned with the Nairobi, Kenya TDWG meeting location in mind, talks were timely in Jönköping too. If you could not attend, or would like to hear a talk again or point one out to your colleagues, you can find out more about all the talks, and
access the recordings and
presentations on the
Symposium Wiki. We were thrilled to see the momentum and collaborative development of so many strategies for capturing data and images faster, and more accurately. Projects highlighted not only use of the data, and data visualization, but also important key issues about public participation in our efforts, and the challenges of storage of the digitized materials, the national resources, we are creating.
Sentiment expressed by a digitisation symposium participant: At the national level, countries need to recognize that collections specimens, and collections data are national resources with intrinsic value deserving of sustainable support for the benefit of the nation and its citizens.
And speaking of
Digitarium, they've been busy working on how to speed up transcription and improve the resulting data quality. Discover more about their services and
DigiWeb, their new dynamic web-based environment for involving all transcription participants in the management and cleaning of transcribed data for best data quality. At the same time, they've been experimenting with a conveyor belt ("imaging line") strategy to capture insect (beetles, in case you were wondering) specimen images and data. Want to know their results? (Spoiler alert: 5 to 10 times faster than other current practices!) Check out these brand new papers for details:
High-performance digitization of natural history collections: Automated imaging lines for herbarium and insect specimens. Riitta Tegelberg, Tero Mononen, and Hannu Saarenmaa. Taxon 5 Dec 2014. http://www.ingentaconnect.com/content/iapt/tax/pre-prints/content-1400102 NOTE: this is a preliminary hot-off-the press version that Taxon states: "will no longer be available online once replaced by the final version." So, depending on when you read this, you might have to look up the paper. DOI http://dx.doi.org/10.12705/636.13
DigiWeb - a workflow environment for quality assurance of transcription in digitization of natural history collections. Tero Mononen, Riitta Tegelberg, Mira Sääskilahti, Markku A. Huttunen, Marko Tähtinen, Hannu Saarenmaa. Biodiversity Infomatics Vol 9, 2014, pp. 18-29
https://journals.ku.edu/index.php/jbi/article/view/4748
Our experience in streamlining transcription is that much can be gained by using shared lookup services and big pools of data that are already available...Transcription in isolation is a waste of time, and thus more web services are needed.
If you'd like to continue the digitization conversation, you can find Elspeth, Vince, and myself on twitter @emhaston @vsmithuk @idbdeb
Capturing Inventory level information about collections as a step in object to image to data workflows.
ZooSphere - Development of a software for automated spheric image capturing and interactive 3D visualization of biological collection objects.
Data Discovery and Doer Happiness: (Many) Uses for Optical Character Recognition (OCR) Output.
Managing Digitization Projects with BIOSPEX. Find out how you, a collection or data manager, might easily manage crowd-sourcing projects in the near future.
ENVIRONMENTS-EOL: identification of Environment Ontology terms in text and the annotation of the Encyclopedia of Life.
Some other sessions and workshops at TDWG included Biodiversity Data Quality: issues, methods, and tools, and two meetings that both discussed Biodiversity Informatics Skills needed in the research community. One was a workshop: Effective Biodiversity Data Management Training that gave Deb a chance to talk about Data Carpentry, and the second was a session focused on the formation of a TDWG Biodiversity Informatics (BI) Training interest group. With big data, the skills needed to manage so much data, so many different data types, across many platforms, also change. How do we effectively figure out what to teach, at what level to teach it, and how to impart the importance of remaining agile and open to learning new skills? Lots of groups across the planet are asking these questions now, and developing methods to address data and computational literacy gaps.
Where's next year’s TDWG? – Nairobi, Kenya. Organizers plan a week-long biodiversity informatics training before TDWG 2015. Possible dates being considered are for late September -- but we'll have to wait and see -- they are not set yet.
Of course, lots more went on at BIS (TDWG) 2014! RDF, Bioinformatics, Cyberinfrastructure, Scientific Workflows, Annotations, Darwin Core tutorials and discussions (new terms coming soon, and one or two going away),... You can
check out the complete program, and see the
talks for yourself.
See more photos from the event at the TDWG 2014 Facebook photo album.