[[Category:Transcription Hackathon]][[Category:Workshop]]
'''Notes from Nature/iDigBio Hackathon to Further Enable Public Participation in the Online Transcription of Biodiversity Specimen Labels'''
December 16–20, 2013, at the University of Florida, Gainesville
{| class="wikitable" style="float:right;"
! colspan="2" style="background:#D58B28;width:200px;font-size:10pt" | Digitizing the Past and Present for the Future
|-
| colspan="2" style="text-align:center;font-size:7pt" | <!--YOU CAN INSERT A NEW IMAGE FOR THE LOGO BETWEEN THE COLON AND THE PIPE-->[[Image:IDigBio Logo RGB.png|center|300px|iDigBio Logo RGB.png]]<br />
|-
!colspan="2" style="background:#D58B28;text-align:center;font-size:9pt" | Quick Links for Transcription Hackathon Workshop
|-
|[https://docs.google.com/document/d/1TyluwM1rMcq7O_nidy8CLJFMW4FrOPjsHkrLVho5cVU/edit?usp=sharing Transcription Hackathon Workshop Agenda]
|-
|[https://www.idigbio.org/biblio?f%5bkeyword%5d=274 Transcription Hackathon Workshop Biblio Entries]
|-
|[https://www.idigbio.org/content/citscribe-hackathon Transcription Hackathon Workshop Report]
|}
== Agenda and Logistics ==
*[https://www.idigbio.org/content/hackathon-enable-public-participation-online-transcription-biodiversity-specimen-labels Hackathon Advertisement]
*[[Transcription Hackathon Draft Agenda| Agenda]]
*[[Media:IDigBio_Public_Participation_in_Digitization_Workshop_Logistics_4Dec13.pdf|Logistics Document]]
*[[Media:Transcription_Hackathon_Participant_List_23Dec13.pdf|Participants List]]
*[http://idigbio.adobeconnect.com/citscribe AdobeConnect room for remote connection to the workshop and for collaboration after the hackathon] (Email Austin Mast if you'd like to use the room for additional collaboration after the hackathon.)
== Media ==
*[https://www.facebook.com/media/set/?set=a.645283388848944.1073741833.215120891865198&type=1 Citscribe Hackathon Facebook Album]
*Twitter: @iDigBio, @NfromN, hashtag #CITScribe
== Report ==
*[https://www.idigbio.org/content/citscribe-hackathon Citscribe Hackathon Report]
== Coordination ==
*[[Transcription Hackathon Interoperability Planning| Interoperability Track]]
*[[Transcription Hackathon OCR Integration Planning| OCR Integration Track]]
*[[Transcription Hackathon Reconciliation of Replicates Planning| QA/QC and Reconciliation of Replicates Track]]
*[[Transcription Hackathon User Engagement Planning| User Engagement Track]]
*[https://docs.google.com/document/d/1ns_10ZMBRMOZX1DzfRBdALjhKtr_x8yYaAZLJH6YHyI/edit?usp=sharing Participants' Interest in Tracks]
*Cody Meche, UF: [https://www.idigbio.org/sites/default/files/workshop-presentations/citscribe/Agile.pdf Agile Scrum]
*Julie Allen, INHS: [https://www.idigbio.org/sites/default/files/workshop-presentations/citscribe/Allen.pdf Gamification]
*Edward Gilbert, Symbiota Developer: [https://www.idigbio.org/sites/default/files/workshop-presentations/citscribe/Symbiota_2013-12-16.pdf Symbiota: a specimen-based biodiversity portal platform]
*Deborah Paul, iDigBio Augmenting OCR WG: [https://www.idigbio.org/sites/default/files/workshop-presentations/citscribe/aOCRLightning.pptx What's new in using OCR output in a Citizen Science Workflow]
*Andrea Matsunaga, iDigBio: [https://www.idigbio.org/sites/default/files/workshop-presentations/citscribe/MatsunagaiDigBioCrowdsourcingHackathon2013.pdf Herbarium Labels Transcription Crowdsourcing & OCR]
*Joshua Campbell, iDigBio: [https://www.idigbio.org/sites/default/files/workshop-presentations/citscribe/CampbelliDigBioCrowdsourcingHackathon2013.pdf Herbarium Labels Transcription Crowdsourcing Consensus]
*Yonggang Liu, ACIS iDigBio: [https://www.idigbio.org/sites/default/files/workshop-presentations/citscribe/Yonggang_image_ingestion_appliance.pdf iDigBio Image Ingestion Appliance]
*Paul Kimberly, Smithsonian: [https://www.idigbio.org/sites/default/files/workshop-presentations/citscribe/SI_Center.pdf Smithsonian Transcription Center]
*William Ulate, Missouri Botanical Garden: [[Media:Purposeful_Gaming_BHL_Dec_2013.pdf|Purposeful Gaming and BHL]]
== Development Resources ==
* [https://github.com/idigbio-citsci-hackathon GitHub organization for this Transcription Hackathon]
* Four existing crowdsourcing datasets from Notes From Nature, containing transcriptions of different types of collection labels. Read more [https://docs.google.com/document/d/1UCz5WblnNIvqBErX-XeWgS9mf69qFhycHqntQOGnPp4/edit?usp=sharing here]. Once anonymized, the datasets were shared only with hackathon participants through Dropbox. They will be made public when we get definitive approval from NfN.
** Calbug dataset
** Herbarium labels: filenames containing "USAM_" represent a nearly complete set of recent transcriptions from one collection (the University of South Alabama Herbarium), with four replicates for most specimens (I think).
** Macrofungi labels
** Ornithological dataset
* For those interested in experimenting with the images that have been used for public participation in transcription:
** Vagrant script to build a VM with the Notes From Nature web interface: https://github.com/idigbio-citsci-hackathon/nfn-vagrant
** From the directory containing the Vagrant script, run "vagrant up" at your command prompt to build a VM with Notes From Nature running on localhost:9294.
** API calls
*** https://api.zooniverse.org/projects/notes_from_nature/groups/
*** https://api.zooniverse.org/projects/notes_from_nature/groups/5170103b3ae74027cf000002
* [[CYWG iDigBio Image Ingestion Appliance]]:
* Gold images from the aOCR Hackathon:
** CSV files with URLs for the images on the iDigBio beta server (uploaded by the Image Ingestion Appliance): [http://www.acis.ufl.edu/~yonggang/idigbio/recordset/gold/ent.csv ent], [http://www.acis.ufl.edu/~yonggang/idigbio/recordset/gold/herb.csv herb], [http://www.acis.ufl.edu/~yonggang/idigbio/recordset/gold/lichens.csv lichens].
* Code from the aOCR Hackathon:
** HandwritingDetection (https://github.com/idigbio-aocr): an algorithm that sorts images into three sets (no handwriting; little handwriting, i.e., mostly typed or printed text; lots of handwriting) based on the noise the OCR software generates. [http://manuscripttranscription.blogspot.com/2013/02/detecting-handwriting-in-ocr-text.html Read more at Ben's blog]. This could be used to rank which images most need human transcription.
** Dictionaries to improve crowdsourcing consensus (e.g., names of collectors, scientific names): link to be provided by aOCR?
*** (Some [http://webprojects.huh.harvard.edu/authority_files/ botanists]: RDF and tab-delimited.)
* Hi all (Paul Flemons):
**I have uploaded a number of files:
***[[Media:OpenRefine_procedures_for_EVENTS_1212a.pdf|a description of OpenRefine procedures used for matching BVP fields to EMu EVENTS]]
***[[Media:Preparing_BVP_data_for_import_into_EMu_-_process_1212a.pdf|Detailed process of preparing BVP data for EMu]]
***[[Media:Preparing_BVP_data_for_import_into_EMu_-_overview.pdf|Overview of preparing BVP data for EMu]]
***[[Media:VisioDiagramofProcess.JPG|Diagram of the process of preparing data from BVP for EMu]]
*From Steve Raden: some background on Zooniverse's design
**http://arfon.org/how-the-zooniverse-works-keeping-it-personal
**http://arfon.org/how-the-zooniverse-works-the-domain-model
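The USAM_ herbarium files above carry roughly four replicate transcriptions per specimen, which is the raw material for the QA/QC and Reconciliation of Replicates track. As a rough illustration only (not the method used by NfN or by any of the tracks), here is a minimal Python sketch of field-by-field majority voting after whitespace/case normalization. The record layout (a dict of field name to transcribed string), the helper names, and the `min_agreement` threshold are all hypothetical.

```python
from collections import Counter


def normalize(value):
    # Collapse runs of whitespace and lowercase, so "J.  Smith" and "j. smith" vote together.
    return " ".join(value.split()).lower()


def reconcile(replicates, min_agreement=2):
    """Field-by-field majority vote over replicate transcriptions of one specimen.

    replicates: list of dicts mapping field name -> transcribed string
    (a hypothetical layout, not the Notes from Nature export format).
    Returns {field: normalized consensus value, or None if no value gets
    at least min_agreement matching votes}.
    """
    fields = sorted({field for record in replicates for field in record})
    consensus = {}
    for field in fields:
        # Blank or missing entries are skipped rather than counted as votes.
        votes = Counter(
            normalize(record[field])
            for record in replicates
            if record.get(field, "").strip()
        )
        if not votes:
            consensus[field] = None
            continue
        value, count = votes.most_common(1)[0]
        consensus[field] = value if count >= min_agreement else None
    return consensus


if __name__ == "__main__":
    # Made-up example: four replicates of one label.
    specimen = [
        {"collector": "J. Smith", "locality": "Mobile Co."},
        {"collector": "J.  Smith", "locality": "Mobile County"},
        {"collector": "J. Smlth", "locality": "Mobile Co."},
        {"collector": "j. smith", "locality": ""},
    ]
    print(reconcile(specimen))
    # {'collector': 'j. smith', 'locality': 'mobile co.'}
```

Returning None instead of a best guess flags a field for expert review. Note that the consensus value comes back normalized (lowercased), so the original casing would have to be recovered from one of the agreeing replicates.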
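The HandwritingDetection item above bins images by how noisy their OCR output is. The sketch below is not the aOCR code; it is a hypothetical approximation in Python that treats any whitespace-separated token that isn't plain letters or plain digits as noise, then applies two made-up thresholds (`low`, `high`) that would need tuning against the gold images.

```python
import re

# Hypothetical notion of a "clean" token: only letters or only digits,
# optionally followed by one period or comma. Everything else counts as noise.
CLEAN_TOKEN = re.compile(r"^(?:[A-Za-z]+|\d+)[.,]?$")


def noise_ratio(ocr_text):
    """Fraction of whitespace-separated OCR tokens that look like noise."""
    tokens = ocr_text.split()
    if not tokens:
        return 1.0  # nothing recognized at all: treat as maximally noisy
    noisy = sum(1 for token in tokens if not CLEAN_TOKEN.match(token))
    return noisy / len(tokens)


def handwriting_bin(ocr_text, low=0.15, high=0.5):
    """Assign one of three bins; low and high are made-up thresholds."""
    ratio = noise_ratio(ocr_text)
    if ratio < low:
        return "no handwriting"
    if ratio < high:
        return "little handwriting"
    return "lots of handwriting"


def rank_for_transcription(ocr_by_image):
    """Image ids ordered noisiest first, i.e. most in need of human transcription."""
    return sorted(
        ocr_by_image,
        key=lambda image_id: noise_ratio(ocr_by_image[image_id]),
        reverse=True,
    )
```

Sorting by noise ratio gives the ranking suggested above (noisiest first). A blank label also scores as maximally noisy here, so blanks would need to be filtered out separately.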
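For the dictionaries item above, one simple way to use an authority list (for example, a tab-delimited export of the Harvard botanist names) is to snap each transcribed value to its closest dictionary entry before voting, so near-miss spellings count as the same answer. This Python sketch uses the standard library's `difflib.get_close_matches`; the assumption that names sit in the first column, the helper names, and the 0.85 cutoff are hypothetical.

```python
import csv
import difflib
import io


def load_names(tsv_text, column=0):
    """Read non-empty names from one column of tab-delimited text.

    The column index is an assumption about the authority file's layout.
    """
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    return [
        row[column].strip()
        for row in reader
        if len(row) > column and row[column].strip()
    ]


def snap_to_dictionary(value, names, cutoff=0.85):
    """Return the closest dictionary name, or None if none is similar enough.

    cutoff is a hypothetical similarity threshold between 0 and 1.
    """
    matches = difflib.get_close_matches(value.strip(), names, n=1, cutoff=cutoff)
    return matches[0] if matches else None
```

Matching here is case-sensitive and purely character-based, so "J. Smith" and "Smith, John" would not match at this cutoff; normalizing name order would be a separate step.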
== Hackathon Products ==
*Brainstorming Documents from the Thursday Mix Ups
**Group 1 [https://docs.google.com/document/d/1aMVXG3GzTznYBs9R6lQ13Tny_CyBIcMJ_LPLj1zlz7U/edit Mix Up Discussion Summary] (google doc)
**Group 2
**Group 3 [https://docs.google.com/document/d/1B6kvLFw_Mzhrsx4xPgJm29w5j75TpSihdDXFauyt2YM/edit MixUp google doc]
**Group 4 [https://docs.google.com/document/d/1-Z-oiwjZZiCh-nVHGZHhUBphY5Z-rcJBnZzn6vnJtBs/edit Mix Up Discussion Summary] (google doc)
*Some groups used the Coordination pages above to summarize products
*Group 1 [[Target File Format]]