Text Transcription Issues: Difference between revisions

From iDigBio
(4 intermediate revisions by 2 users not shown)
Line 1: Line 1:
== About Standards for Transcribing Text ==
<br>
*In our last meeting (18 Dec 2012) we discussed some of the challenges of transcribing text with corrections, alterations, strikeouts, ambiguous letters, etc., and I briefly mentioned some transcription projects that have dealt with similar issues. A hackathon participant, Ben Brumfield, has much more experience with this topic, so first I'll point you to some information he has compiled. His blog home page (http://manuscripttranscription.blogspot.com) currently has a transcription of his talk about the variety of formats that various projects are using. A worthwhile read.
*Content here begins with resources put together by Jason Best (thank you Jason) in an email sent to the AOCR wg on 19 December 2012.
 
*In our last meeting (18 Dec 2012) we discussed some of the challenges of transcribing text with corrections, alterations, strikeouts, ambiguous letters, etc., and I [Jason Best] briefly mentioned some transcription projects that have dealt with similar issues. A hackathon participant, Ben Brumfield, has much more experience with this topic, so first I'll point you to some information he has compiled. His blog home page (http://manuscripttranscription.blogspot.com) currently has a transcription of his talk about the variety of formats that various projects are using. A worthwhile read.


*If we decide to try to transcribe or preserve ambiguous or corrected/struck-out characters, then the Text Encoding Initiative (TEI) format might be a good start, though it would require the use of XML elements in angle brackets. A more lightweight approach might be to use some of the wiki markup formats, like:
Line 13: Line 15:
**New York Public Library Menu transcription guidelines - http://menus.nypl.org/help
**National Archives Transcription tips - http://transcribe.archives.gov/tips
**Leiden+ notation used by classicists for marking damage and unclear readings in Greek papyrus standards - http://papyri.info/editor/documentation?docotype=text (In use since the mid-1930s, updated and translated to TEI by the Integrating Digital Papyrology group.)


*Projects that might have additional approaches to transcription
**http://scripto.org http://www.uscript.org
**http://transcriptorium.eu http://t-pen.org
Back to the [[2013 AOCR Hackathon Wiki]]

Latest revision as of 16:31, 17 January 2013

About Standards for Transcribing Text


  • Content here begins with resources put together by Jason Best (thank you Jason) in an email sent to the AOCR wg on 19 December 2012.
  • In our last meeting (18 Dec 2012) we discussed some of the challenges of transcribing text with corrections, alterations, strikeouts, ambiguous letters, etc., and I [Jason Best] briefly mentioned some transcription projects that have dealt with similar issues. A hackathon participant, Ben Brumfield, has much more experience with this topic, so first I'll point you to some information he has compiled. His blog home page (http://manuscripttranscription.blogspot.com) currently has a transcription of his talk about the variety of formats that various projects are using. A worthwhile read.
  • If we decide to try to transcribe or preserve ambiguous or corrected/struck-out characters, then the Text Encoding Initiative (TEI) format might be a good start, though it would require the use of XML elements in angle brackets. A more lightweight approach might be to use some of the wiki markup formats, like:
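
To make the trade-off between TEI and a lighter notation concrete, here is a minimal sketch. It assumes a made-up lightweight notation (`~~text~~` for struck-out text, `++text++` for an insertion, `??text??` for an uncertain reading); these markers are hypothetical and are not taken from any of the projects or guidelines mentioned on this page. The sketch maps them onto the TEI elements `<del>`, `<add>`, and `<unclear>`, so transcribers could type the short form while the stored record uses XML. The function name `lightweight_to_tei` is likewise only illustrative.

```python
import re
from xml.sax.saxutils import escape

# Hypothetical lightweight markers (an assumption for illustration only):
#   ~~text~~  struck-out text    -> <del>text</del>
#   ++text++  inserted text      -> <add>text</add>
#   ??text??  uncertain reading  -> <unclear>text</unclear>
_RULES = [
    (re.compile(r"~~(.+?)~~"), r"<del>\1</del>"),
    (re.compile(r"\+\+(.+?)\+\+"), r"<add>\1</add>"),
    (re.compile(r"\?\?(.+?)\?\?"), r"<unclear>\1</unclear>"),
]


def lightweight_to_tei(text: str) -> str:
    """Convert the hypothetical lightweight markers to TEI-style elements."""
    # Escape &, < and > first so the label text itself cannot inject markup.
    result = escape(text)
    for pattern, replacement in _RULES:
        result = pattern.sub(replacement, result)
    return result


print(lightweight_to_tei("Collected near ~~Austin~~ ++Dallas++, ??Texas??"))
```

A real workflow would also need to handle nested or overlapping markers and illegible gaps, which this sketch deliberately leaves out.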

Back to the 2013 AOCR Hackathon Wiki