Thursday, August 27, 2015

Beyond crowd sourcing

How do you coordinate contributions from a hundred editors and ensure the quality of the resulting archive? That is a challenge we now face, thanks to the success of the past several years of summer seminars at CHS.

The solution we've designed lets scattered teams working in virtual machines follow a shared collaborative workflow and document their progress in publicly visible GitHub repositories. The nuts and bolts of the process are documented in increasing detail (special thanks to project manager Stephanie Lindeborg and the summer 2015 team at Holy Cross for their invaluable contributions). The challenge applies to any collaborative digital project, but the HMT approach seems to stand apart from those of other projects, so I've posted a long overview of the technical design of our validation and verification system on the HMT GitHub site.

The important conclusions: although a single book of the Iliad can easily exceed 10,000 words of text in a manuscript like the Venetus A, the HMT project's validation system ensures that every word can be traced to a region of interest on an image, and that both text and image are linked to a specific page of the manuscript by a syntactically valid URN citing an object that actually exists in the HMT archive. Every word of every text is tested against rigorous criteria specific to its type. Automated validation and computer-assisted human verification put the HMT archive on a solid foundation.
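To make that chain of links concrete, here is a minimal Python sketch of the kind of check described above. It is not the HMT project's actual code. The URN shapes are simplified assumptions: a CITE URN of the form urn:cite:namespace:collection.object with an optional @x,y,w,h region of interest given as fractions of the image, and a CTS URN for the text. The in-memory pages and images collections and the names WordRecord and validate are hypothetical, chosen only for illustration.

```python
import re
from dataclasses import dataclass

# Simplified URN grammars, assumed for illustration only (not the full CITE/CTS specs).
NUM = r"\d+(?:\.\d+)?"  # non-negative decimal number
CITE_URN = re.compile(
    r"^urn:cite:(?P<ns>[A-Za-z0-9]+):(?P<coll>[A-Za-z0-9_]+)\.(?P<obj>[A-Za-z0-9_]+)"
    rf"(?:@(?P<roi>{NUM},{NUM},{NUM},{NUM}))?$"
)
CTS_URN = re.compile(r"^urn:cts:[A-Za-z0-9]+:[A-Za-z0-9_.]+:[^\s:]+$")


@dataclass
class WordRecord:
    """One word of edited text, linked to an image region and a manuscript page (hypothetical)."""
    text_urn: str   # CTS URN citing the word's passage
    image_urn: str  # CITE URN of an image, with a region of interest
    page_urn: str   # CITE URN of the manuscript page


def roi_in_bounds(roi: str) -> bool:
    """Region is x,y,w,h as fractions of the image; x and y are already >= 0 via the regex."""
    x, y, w, h = (float(v) for v in roi.split(","))
    return w > 0 and h > 0 and x + w <= 1 and y + h <= 1


def validate(rec: WordRecord, pages: set[str], images: dict[str, str]) -> list[str]:
    """Return a list of problems; an empty list means the record passes.

    pages:  URNs of pages that exist in the archive.
    images: image URN (without region) -> URN of the page that image shows.
    """
    errors = []
    # 1. The text citation must be syntactically valid.
    if not CTS_URN.match(rec.text_urn):
        errors.append(f"malformed text URN: {rec.text_urn}")

    # 2. The page URN must be well formed (no region) and must exist in the archive.
    page = CITE_URN.match(rec.page_urn)
    if not page or page.group("roi"):
        errors.append(f"malformed page URN: {rec.page_urn}")
    elif rec.page_urn not in pages:
        errors.append(f"page not in archive: {rec.page_urn}")

    # 3. The image URN must carry a region, exist, fit the image, and show the same page.
    img = CITE_URN.match(rec.image_urn)
    if not img or not img.group("roi"):
        errors.append(f"image URN malformed or missing a region: {rec.image_urn}")
    else:
        base = rec.image_urn.split("@", 1)[0]
        if base not in images:
            errors.append(f"image not in archive: {base}")
        else:
            if not roi_in_bounds(img.group("roi")):
                errors.append(f"region outside image: {img.group('roi')}")
            if images[base] != rec.page_urn:
                errors.append(f"image {base} does not show page {rec.page_urn}")
    return errors
```

A record passes only when every URN is well formed, the page and image really exist in the archive, the region lies inside the image, and the image depicts the same page the record cites. Returning a list of problems rather than stopping at the first one lets an editor see everything wrong with a record in a single pass.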