Saturday, July 6, 2013

Taking count of the Homer Multitext

In the seminar on the Homer Multitext project that wraps up today at the Center for Hellenic Studies, five teams editing Iliad 10 in the Venetus A manuscript have been working with an experimental automated validation system.  Teams are verifying that their editions of texts and related records about manuscripts and images pass a variety of consistency tests.  One important implication of this work is that we can now build new versions of the Homer Multitext project's online services automatically from material that passes a defined suite of tests.

An immediate, practical consequence is that we can completely reinstall the project's online services in 5-10 minutes, a significant improvement over the earlier, more tedious process.  As an example of how easily this allows us to survey information across the HMT project, here are a few numbers about the current state of the project's editions of Iliad 1-7 in the Venetus A manuscript.

Feature                                                                          Number
Tokens (“words”) indexed to occurrence in a specific passage                     > 100,000
Distinct forms of indexed tokens                                                 > 25,000
Lexical entities (“dictionary entries”) represented by the 25,000 distinct forms > 8,000
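
To make the relationship between these three figures concrete, here is a minimal sketch in Python of how such counts could be derived from a token index. It is not the project's actual code: the `IndexedToken` record, its field names, the `survey` function, and the sample rows (including the passage labels and lexeme identifiers) are all invented for illustration.

```python
from typing import Iterable, NamedTuple, Tuple


class IndexedToken(NamedTuple):
    """One occurrence of a token (hypothetical record layout)."""
    passage: str  # reference to the passage where the token occurs
    form: str     # the token as written in the edition
    lexeme: str   # identifier of the dictionary entry the form belongs to


def survey(tokens: Iterable[IndexedToken]) -> Tuple[int, int, int]:
    """Return (token occurrences, distinct forms, lexical entities)."""
    tokens = list(tokens)
    distinct_forms = {t.form for t in tokens}
    lexical_entities = {t.lexeme for t in tokens}
    return len(tokens), len(distinct_forms), len(lexical_entities)


# Invented sample: four occurrences, three distinct forms, two lexical entities.
sample = [
    IndexedToken("p1", "μῆνιν", "menis"),
    IndexedToken("p1", "ἄειδε", "aeido"),
    IndexedToken("p2", "μῆνιν", "menis"),
    IndexedToken("p3", "μῆνις", "menis"),
]

occurrences, forms, entities = survey(sample)
print(occurrences, forms, entities)
```

Each step collapses the previous one: repeated occurrences of the same form count once among the distinct forms, and different inflected forms of the same word count once among the lexical entities, which is why the three numbers in the table shrink from row to row.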

Those numbers will change, not only as we add new material, but also as we begin to review our earlier work and assess our editions of Iliad 1-7 against the same tests we are using in our current work on Iliad 8-10.

