Managing Natural History Collections Data for Global Discoverability
Quick Links for Managing Natural History Collections Data for Global Discoverability:
- Managing Natural History Collections Data for Global Discoverability Agenda
- Managing Natural History Collections Data for Global Discoverability Biblio Entries
- Managing Natural History Collections Data for Global Discoverability Report
This wiki supports the Managing Natural History Collections (NHC) Data for Global Discoverability Workshop and is in development. The workshop is sponsored by iDigBio and hosted by the Arizona State University (ASU) School of Life Sciences Natural History Collections, Informatics & Outreach Group in their new Alameda space on September 15-17, 2015. It is the fourth in a series of biodiversity informatics workshops held in fiscal year 2014-2015; the first three were 1) Data Carpentry, 2) Data Sharing, Data Standards, and Demystifying the IPT, and 3) Field to Database (March 9-12, 2015).
General Information
Description and Overview of Workshop. Are you:
- actively digitizing NHC data and looking to do it more efficiently?
- getting ready to start digitizing NHC data and looking to learn some new skills to enhance your workflow?
- digitizing someone else’s specimens (e.g., as part of a research project)?
- finding yourself in the role of the museum database manager (even though it may not be your title or original job)?
- someone with a private research collection who wishes to donate specimens and data to a public collection?
The "Collections Data for Global Discoverability" workshop is ideally suited to natural history collections specialists aiming to increase the "research readiness" of their biodiversity data at a global scale. Have you found yourself in situations where you need to manage larger quantities of collection records, or encounter challenges in carrying out updates or quality checks? Do you mainly use spreadsheets (such as Excel) to clean and manage specimen-level datasets before uploading them into your collections database? The workshop is most appropriate for those who are relatively new to collections data management and are motivated to provide the global research community with accessible, standards- and best practices-compliant biodiversity data.
During the workshop, essential information science and biodiversity data concepts will be introduced (e.g., data tables, data sharing, quality/cleaning, Darwin Core, APIs). Hands-on data cleaning exercises will be performed using spreadsheet programs and free, readily usable software. The workshop is platform independent: rather than focusing on the specifics of any one locally preferred biodiversity database platform, it addresses fundamental themes and solutions that apply to a variety of database applications.
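For readers new to these concepts, the sketch below (our own illustration, not workshop material) shows what a standards-aligned data table can look like: a tiny, made-up specimen table whose column headers are real Darwin Core terms, read with a few lines of Python.

```python
import csv
import io

# Illustrative only: a tiny specimen table whose column headers are real
# Darwin Core terms; the two records themselves are made up.
sample = io.StringIO(
    "occurrenceID,scientificName,eventDate,decimalLatitude,decimalLongitude\n"
    "urn:example:specimen:1,Puma concolor,1998-04-12,33.42,-111.93\n"
    "urn:example:specimen:2,Canis latrans,2001-07-03,33.45,-111.94\n"
)
for row in csv.DictReader(sample):
    print(row["occurrenceID"], row["scientificName"], row["eventDate"])
```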
To Do For You: Pre-reading materials [Darwin Core Data Standard, Best Practices for Data Management,...]
Updates will be posted to this website as they become available.
Planning Team
Collaboratively brought to you by: Katja Seltmann (AMNH - TTD-TCN), Amber Budden (DataONE), Edward Gilbert (ASU - Symbiota), Nico Franz (ASU), Mark Schildhauer (NCEAS), Greg Riccardi (FSU - iDigBio), Reed Beaman (NSF), Cathy Bester (iDigBio), Shari Ellis (iDigBio), Kevin Love (iDigBio), Deborah Paul (FSU - iDigBio)
About
Instructors (iDigBio): Katja Seltmann, Amber Budden, Edward Gilbert, Nico Franz, Mark Schildhauer, Greg Riccardi, Deborah Paul
Skill Level: We are focusing our efforts in this workshop on beginners.
Where and When: Tempe, AZ at the Arizona State University (ASU) School of Life Sciences Natural History Collections, Informatics & Outreach Group in their new Alameda space, September 15-17, 2015
Requirements: Participants must bring a laptop.
Contact (iDigBio Participants): Please email Deb Paul dpaul@fsu.edu for questions and information not covered here.
Twitter:
Tuition for the course is free, but there is an application process and spots are limited. [Apply here]
Software Installation Details
A laptop and a web browser are required for participants.
We use Adobe Connect extensively in this workshop. Please perform the systems test using the link below. You will also need to install the Adobe Connect Add-In to participate in the workshop.
- Adobe Connect Systems Test
- Note: when you follow the link to perform the test, some software will install (though it may not look like anything is happening). To check that it worked, simply re-run the test.
Agenda
- Managing NHC Data Adobe Connect Room (to be linked - stay tuned)
- Monday evening, September 14th: pre-workshop informal get-together at [to be decided], from [time to be decided].
Schedule - subject to change.
Course Overview - Day 1 - Tuesday September 15th | ||
---|---|---|
8:15-8:45 | Check-in, name tags, log in, connect to wireless and Adobe Connect | All |
8:45-9:00 | Welcome, Introductions, Logistics, Intro to the Workshop | Deb Paul, iDigBio |
9:00-9:15 | Why this workshop? | Amber Budden & Deb Paul |
9:15-9:35 | General Concepts and Best Practices: brief introduction to data modeling, the data life cycle, and relational databases | (to be decided), Ed Gilbert and Amber Budden |
9:35-9:55 | Overview of Data Standards: Darwin Core, EML, Audubon Core, GGBN, DwC-A, identifiers (GUIDs vs. local) | Ed Gilbert, Deb Paul |
10:00-10:30 | Hands-on exercise with a specimen data set with known mapping/standardization issues (see the mapping sketch after this schedule) | All |
10:30-10:50 | Break | All |
10:50-11:30 | Data Management Planning: choosing a database, data flow, data backup, field-to-database, metadata | Amber Budden |
11:30-12:00 | Exercise, DataONE Lesson 4: best practices for data entry and data manipulation | Amber Budden |
12:00-1:00 | Lunch | |
1:00-1:30 | Images and Media Issues, a brief intro: choosing a camera, issues across different database platforms, image submissions, linking images to occurrence records, batch processing | Ed Gilbert |
1:30-1:50 | Digitization Workflows and Process: getting started, prioritization, specimen collecting, new database, integrating old data | Deb Paul, Ed Gilbert & Katja Seltmann |
1:50-2:10 | Common Workflows: image to data, specimen to data, skeletal records, crowd-sourcing, OCR/NLP, georeferencing, metadata | Deb Paul, Ed Gilbert & Katja Seltmann |
2:10-2:25 | Optimization, reviewing your own workflow: common bottlenecks, documentation | Katja Seltmann & Deb Paul |
2:25-3:00 | Hands-on exercise (to be decided) | TBD |
3:00-3:20 | Break | |
3:20-3:50 | Georeferencing Data (Georeferencing Workflow): visualization tools, when to georeference, best practices | Ed Gilbert |
3:50-4:10 | GEOLocate Exercise (may be a demo): CoGe, GPS Visualizer, re-integration, QC | Ed Gilbert |
4:40-5:30 | Conversation, overview of the day, preview for tomorrow | All |
(Optional Evening Activity?) | ||
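The 10:00-10:30 hands-on slot centers on mapping local field names onto a standard. A minimal sketch of that idea in Python follows; the local column names are invented for this example, while the mapping targets are real Darwin Core terms.

```python
import csv
import io

# Sketch of field mapping: rename local spreadsheet headers to their
# Darwin Core equivalents. Local names are invented; targets are real terms.
LOCAL_TO_DWC = {
    "Catalog No.": "catalogNumber",
    "Species": "scientificName",
    "Date Collected": "eventDate",
    "Lat": "decimalLatitude",
    "Long": "decimalLongitude",
}
source = io.StringIO(
    "Catalog No.,Species,Date Collected,Lat,Long\n"
    "12345,Puma concolor,1998-04-12,33.42,-111.93\n"
)
for row in csv.DictReader(source):
    dwc_row = {LOCAL_TO_DWC[key]: value for key, value in row.items()}
    print(dwc_row)
```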
Course Overview - Day 2 - Wednesday September 16th | ||
8:30-12:00 | Desert Botanical Garden (DBG) Field Trip | |
12:00-1:00 | Lunch (at the garden) | |
25min | Welcome Back and Intro to Data Quality: inside the data life cycle, cost of data quality, quality vs. completeness | Amber Budden, Ed Gilbert |
15min | Data Cleaning: where, when, and how does it happen? what kind of feedback to expect? | Deb Paul & Katja Seltmann |
20min | Data Cleaning, quick exercise: spot the snafus | Deb Paul & Katja Seltmann |
25min | Data Cleaning, the details: types of common errors and omissions, best-practice strategies, feedback and annotation, error tracking, automation, policies and protocols | Deb Paul & Katja Seltmann |
(25min) | 25 extra minutes here on purpose, for discussion / break-outs / unconference topics or demos | Deb Paul & Katja Seltmann |
20min | Break | |
35min | Data Cleaning Exercise I: better spreadsheet skills | Deb Paul & Katja Seltmann |
25min | Data Cleaning Exercise II: OpenRefine, part I (facets, clustering; see the clustering sketch after this schedule) | Deb Paul & Katja Seltmann |
4:40-5:00 | Conversation, overview of the day, preview for tomorrow | Deb Paul & Katja Seltmann |
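The OpenRefine clustering exercise groups near-duplicate values (e.g., collector-name variants) so they can be reconciled to a single form. Below is a minimal sketch of the idea, using a simplified version of OpenRefine's documented "fingerprint" keying method; the collector names are made up.

```python
import re
import unicodedata
from collections import defaultdict

def fingerprint(value: str) -> str:
    # Simplified OpenRefine fingerprint: strip accents, lowercase,
    # drop punctuation, then sort and de-duplicate the tokens.
    value = unicodedata.normalize("NFKD", value).encode("ascii", "ignore").decode()
    tokens = re.sub(r"[^\w\s]", " ", value.lower()).split()
    return " ".join(sorted(set(tokens)))

# Made-up collector-name variants, as might appear in a legacy spreadsheet.
collectors = ["Smith, J.", "J. Smith", "smith j", "Jones, A."]
clusters = defaultdict(list)
for name in collectors:
    clusters[fingerprint(name)].append(name)
for key, variants in clusters.items():
    print(key, "->", variants)
```

Values sharing a fingerprint ("j smith" here) are likely variants of the same entry and can be merged.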
Course Overview - Day 3 - Thursday September 17th | ||
35min | Data Cleaning Exercise II: OpenRefine, part II (using APIs, taxonomic name resolution services; see the API sketch after this schedule) | Deb Paul & Katja Seltmann |
15min | (move this time to earlier slots above to make more time in the data cleaning sections) | Deb Paul & Katja Seltmann |
25min | Review of Data Cleaning, Data Manipulation, and Visualization Tools (and Lessons): Kurator, GPS Visualizer, GEOLocate, Google Fusion Tables, Notepad++, OpenRefine | Deb Paul & Katja Seltmann |
30min | Identifiers | Greg Riccardi |
20min | Break | |
1hr 20min | Break-out groups: TNRS, ECAT, QGIS, GEOLocate, CoGe, Data Cleaning (what is scripting? what is regex? examples in OpenRefine, possibly in Symbiota), your own data issues / requests | All |
12:00-1:00 | Lunch | |
1:00-1:25 | Data Publishing in the context of the data life cycle: benefits, concerns, aggregators, citation, attribution | TBD |
1:30-2:15 | iDigBio Portal Exercise: using the iDigBio portal to do something with data that can't be done within a local system (e.g., PhyloJive) | TBD |
2:15-2:45 | Copyright / Intellectual Property | TBD |
3:00-3:20 | Break | |
3:20-4:20 | Second round of break-out groups: DwC-A publishing exercise (or demo) using an IPT instance, or a Symbiota DwC-A mapping and publishing exercise | |
4:20-4:40 | Closing Topics: a greater network, the global landscape, next steps | Katja Seltmann & Nico Franz |
4:40-5:10 | Participant 3-minute presentations (1 slide) | |
5:10-5:30 | Review of the data life cycle we've walked through: discussion, survey, next steps, and conclusions | All |
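Day 3's "Using APIs" segment pairs OpenRefine with web services for tasks such as taxonomic name resolution. As a stand-alone sketch of the same idea (our own illustration, not workshop code), this Python snippet checks one verbatim name against the GBIF backbone taxonomy via GBIF's public species match API.

```python
import json
import urllib.parse
import urllib.request

# Minimal sketch of name resolution via a public web service: ask the
# GBIF species match API how a verbatim name maps onto the GBIF backbone.
name = "Puma concolor"
url = ("https://api.gbif.org/v1/species/match?name="
       + urllib.parse.quote(name))
with urllib.request.urlopen(url) as response:
    match = json.load(response)
# matchType is e.g. EXACT or FUZZY; scientificName is the backbone name.
print(match.get("matchType"), match.get("scientificName"))
```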
Logistics
- link to local area activities / restaurants
- logistics for hotel / food / per diem / map
- Workshop Calendar Announcement
- Participant List
Adobe Connect Access
Adobe Connect will be used to provide access for everyone, including remote participants listening to the lectures.
Workshop Documents, Presentations, and Links
- Google Collaborative Notes
- links to any presentations (e.g., PowerPoint slides) here
- Darwin Core Terms
- Participant Presentations
Pre-Workshop Reading List
Links beneficial for review
- Darwin Core Terms Index
- Mapping to Old Versions (for those who might be familiar with older versions of DwC)
- Audubon Media Description standard terms index
- others? (Perhaps Canadensys and VertNet Norms, Canadensys Creative Commons licensing for occurrence data and VertNet Data Licensing Guide)
- iDigBio Data Ingestion Guidance
Workshop Recordings
Day 1
- 8:30am-10:15am
- 10:45am-11:00am
- 11:15am-12pm
- 1:00pm-2:30pm
- 3:00-5:00pm
Day 2
- 8:30am-10:15am
- 10:45am-11:00am
- 11:15am-12pm
- 1:00pm-2:30pm
- 3:00-5:00pm
Day 3
- 8:30am-10:15am
- 10:45am-11:00am
- 11:15am-12pm
- 1:00pm-3:30pm
- 3:30-5:00pm
Resources and Links
- Got a favorite resource (a book? a website?) to share with your classmates?
- Canadensys Introduction to Darwin Core
- Experts Workshop on the GBIF Integrated Publishing Toolkit (IPT) v. 2
- Summary resources available from the IPT workshop held 20-22 June in Copenhagen, Denmark.
- Example of a Data Paper: Yves Bousquet, Patrice Bouchard, Anthony E. Davies, and Derek S. Sikes. 2013. Data associated with Checklist of Beetles (Coleoptera) of Canada and Alaska, Second Edition. Data paper. ZooKeys. http://dx.doi.org/10.5886/998dbs2a
- For more Data Papers: http://biodiversitydatajournal.com/
- Darwin Core extension for germplasm, by Dag Endresen (on SlideShare)
- Data exchange standards, protocols and formats relevant for the collection data domain within the GFBio network
- Check out this link if you'd like to see one example page about the multitude of current standards in use in the natural history collections and culture collections world; this example is from the German Federation for Biological Data (GFBio).
- GBIF Darwin Core Archive How-to Guide (download the PDF).
- GBIF Metadata Profile Reference Guide (download the PDF).
- Darwin Core Quick Reference Guide (download the PDF).
- Guide on how to use the BioVeL portal (includes a section on OpenRefine).
- lynda.com hosts a useful collection of tutorials on various IT and other topics, e.g., relational databases
- For example, see relational database fundamentals
- Do you want to share genetic sequence data for your specimens? Are the sequences in a database like GenBank? You can use the dwc:associatedSequences field to share links to the sequences and metadata about them. Note that soon you will also be able to use the Material Sample core to share more complex genomic data via the GGBN extensions, along with an extension that shares information about the specimen from which the samples were taken.
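As a concrete, hypothetical illustration: dwc:associatedSequences is a real Darwin Core term, but the record below, its identifier, and its GenBank-style URLs are placeholders invented for this sketch.

```python
# Hypothetical record fragment linking a specimen to its sequences via
# dwc:associatedSequences (a real Darwin Core term). Multiple values are
# commonly separated with " | "; the URLs below are placeholders.
record = {
    "occurrenceID": "urn:example:specimen:1",
    "scientificName": "Puma concolor",
    "basisOfRecord": "PreservedSpecimen",
    "associatedSequences": (
        "https://www.ncbi.nlm.nih.gov/nuccore/EXAMPLE1"
        " | https://www.ncbi.nlm.nih.gov/nuccore/EXAMPLE2"
    ),
}
for link in record["associatedSequences"].split(" | "):
    print(link)
```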