Transcription Hackathon Reconciliation of Replicates Planning

[[Category:Transcription Hackathon]]
== Coordination Tools  ==
We worked on tools to help with reconciling and interpreting crowd-sourced data. One possible workflow might go like this:


    Start with crowd-sourced transcriptions.
    → '''reconcile''' ( → filter out irreconcilables?)
    if locality:
        → '''place name matching'''
        → geocoding
    if names:
        → '''name splitting'''
        → name list lookup


Reconciliation: There is a range of approaches:
* Have a super-user finalize or approve transcriptions, instead of trying to resolve multiple submissions.
* Or, given multiple transcriptions, pick the one that minimizes some edit distance to the others.
* Or, use sequence alignment tools to find the best transcription of each subregion within a larger string. (The GitHub code does this.)
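
The edit-distance option in the list above can be sketched roughly as follows. This is a minimal illustration, not the code in the GitHub repository: it assumes plain Levenshtein distance as the metric and picks the submission with the smallest total distance to all the others. The function names <code>levenshtein</code> and <code>pick_medoid</code> are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance counting single-character inserts, deletes, and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # delete ca
                               current[j - 1] + 1,       # insert cb
                               previous[j - 1] + cost))  # substitute or match
        previous = current
    return previous[-1]


def pick_medoid(transcriptions: list[str]) -> str:
    """Return the submission with the smallest total distance to all submissions.

    Assumption: ties are broken by order of submission (the first one wins).
    """
    if not transcriptions:
        raise ValueError("need at least one transcription")
    return min(
        transcriptions,
        key=lambda t: sum(levenshtein(t, other) for other in transcriptions),
    )


replicates = [
    "Gainesville, Florida",
    "Gainsville, Florida",
    "Gainesville, Florida",
    "Gainesvile Florda",
]
best = pick_medoid(replicates)
```

One limitation of this approach is that it can only return a string that somebody actually typed; if every submission contains a different error, the alignment-based option may recover a better consensus.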


'''Add your name and interests to the GoogleDoc, if this is a track that interests you!'''
Locality: Again, there is a range of approaches, but you probably want to try to [http://norvig.com/spell-correct.html clean up] the transcribed string before sending it to a geocoding service.
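
As a rough sketch of the clean-up and place name matching step, the snippet below normalizes a transcribed string and fuzzy-matches it against a list of known place names using Python's standard <code>difflib</code> module. The place list, the function names, and the 0.8 cutoff are all assumptions made for illustration; a real workflow would draw on a proper gazetteer, and this is not the spelling-correction method from the linked article.

```python
import difflib
import re

# Hypothetical gazetteer; in practice this would come from an external source.
KNOWN_PLACES = ["Gainesville", "Tallahassee", "Jacksonville", "Pensacola"]


def normalize(text: str) -> str:
    """Collapse runs of whitespace and strip stray punctuation from the ends."""
    text = re.sub(r"\s+", " ", text).strip()
    return text.strip(" .,;:")


def match_place(raw: str, places=KNOWN_PLACES, cutoff: float = 0.8):
    """Return the closest known place name, or None if nothing is similar enough."""
    cleaned = normalize(raw).lower()
    lookup = {p.lower(): p for p in places}  # compare case-insensitively
    hits = difflib.get_close_matches(cleaned, list(lookup), n=1, cutoff=cutoff)
    return lookup[hits[0]] if hits else None


matched = match_place("  Gainsville. ")
unmatched = match_place("Xyzzy")
```

Strings that come back as <code>None</code> could be set aside for manual review rather than sent to the geocoding service.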
 
Names: Processing will depend on the structure of the target database: maybe you just want one string, or maybe you want to try to separate the names. If the names are separated, they could be compared or linked to an outside list of collectors. (That could also be part of a larger QA process: does the collection date make sense, given the life span of the collector?) (The GitHub code tries to do this.)
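
The name splitting and life span check might look something like the sketch below. It is only an illustration under stated assumptions, not the GitHub code: the collector list and its dates are made up, names are split only on ";", "&", and the word "and" (commas are left alone, since they often appear inside a single name), and the lookup requires an exact match.

```python
import re

# Hypothetical collector list: name -> (birth year, death year).
COLLECTORS = {
    "A. Example": (1850, 1920),
    "B. Sample": (1901, 1980),
}


def split_names(raw: str) -> list[str]:
    """Split a collector string on ';', '&', or the standalone word 'and'."""
    parts = re.split(r"\s*(?:;|&|\band\b)\s*", raw)
    return [p.strip() for p in parts if p.strip()]


def date_is_plausible(name: str, year: int, collectors=COLLECTORS):
    """Check a collection year against a collector's life span.

    Returns True or False for a known collector, or None when the name
    is not in the list and nothing can be said either way.
    """
    span = collectors.get(name)
    if span is None:
        return None
    birth, death = span
    return birth <= year <= death


names = split_names("A. Example & B. Sample; C. Other")
checks = {name: date_is_plausible(name, 1910) for name in names}
```

A record whose collection year falls outside the collector's life span would then be flagged for QA rather than corrected automatically.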
 
* [https://docs.google.com/presentation/d/1KqIprcRvAEqbKMmVmEqqEkbyg7DtxLk4RYgMGD15c4M Final presentation]
* [https://github.com/idigbio-citsci-hackathon/StringTools GitHub]
 
== Older documents  ==
 
*[https://docs.google.com/document/d/1AOsU-lcQpzzzibXculxbpGUlLct3VS2coe0v62t2FvY GoogleDoc for Coordination]
* [https://docs.google.com/document/d/1VxGU5sq2n0s9Ox84l7WSDUv4SKILgk7VKQewn3Zb5v0 GoogleDoc for presentation planning]


Back to [[Transcription_Hackathon]]