Monday, August 30, 2010

Editing manuscripts with text and image services

An earlier post briefly illustrated one way that the Homer Multitext project is using a dynamic image service to help editors inventory the scholia in the Venetus A manuscript of the Iliad. A post last week illustrated one way that the project's Canonical Text Service is helping automate comparison of different texts of the Iliad. Taken together, image services and automated collation of editions are a potent combination for editors.

The initial note on comparing two manuscripts used one chart; I later updated the note to use a second chart. Why the dramatic difference in the report on Iliad 17?

The project's initial edition of the Venetus A was created by a dedicated team of undergraduate Fellows, who worked from the apparatus of T.W. Allen's critical edition to "reverse engineer" a text of the Venetus A before the project was able to digitize the manuscript in 2007. In 2010, as the scholia are being inventoried, this edition is gradually being checked against the direct evidence of the digital images, but book 17 is still based on Allen's printed information. In his apparatus to 17.729 (vol. 3, p. 166), Allen notes tersely "729-761 om. A". The HMT Fellows correctly interpreted this to mean that the Venetus A does not include lines 729-761, the end of book 17, and consequently struck them from our edition. This is the basis for the first chart above.

Now, with the digital images in hand, we see something more interesting. Folio 237 verso is written in the familiar tenth-century hand of most of the manuscript, and includes scholia. Folios 238 recto and verso, however, are a replacement for a lost original, perhaps in the hand of Cardinal Bessarion himself.

Evidently, in this instance at least, Allen decided that "A" was to mean "the tenth-century A only". The HMT edition prefers instead to take "A" as the entire, continuous Iliadic text, since our electronic edition and indices can distinguish the folios added later from the tenth-century originals, and leave open for any particular application the question of whether to work with tenth-century text only, later text only, or the entire text.
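
One way to picture the kind of index that makes this possible is a mapping from each line citation to the folio that carries it, together with a list of the folios written later. The sketch below is purely illustrative: the Python names, and the assignment of particular lines to folios 237v, 238r, and 238v, are hypothetical and do not describe the project's actual data model.

```python
from enum import Enum


class Layer(Enum):
    """Which layer of the manuscript's text an application wants."""
    ORIGINAL = "tenth-century folios only"
    LATER = "later replacement folios only"
    ALL = "entire continuous text"


def select_lines(line_folio: dict[str, str],
                 replacement_folios: set[str],
                 layer: Layer) -> list[str]:
    """Return the citations (in index order) that belong to the requested layer.

    line_folio maps a citation such as "17.729" to the folio that carries it;
    replacement_folios lists the folios written later than the original codex.
    """
    if layer is Layer.ALL:
        return list(line_folio)
    want_later = layer is Layer.LATER
    return [ref for ref, folio in line_folio.items()
            if (folio in replacement_folios) == want_later]


# Hypothetical index: this line-to-folio assignment is illustrative only.
INDEX = {"17.728": "237v", "17.729": "238r", "17.761": "238v"}
LATER_FOLIOS = {"238r", "238v"}

if __name__ == "__main__":
    for layer in Layer:
        print(layer.value, select_lines(INDEX, LATER_FOLIOS, layer))
```

Because the choice of layer is just a parameter, the same edition can serve a study of the tenth-century text, of the later supplement, or of the whole.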

Allen's apparatus has no way to communicate this. The ambiguously compressed note "om." ("omisit") would most naturally suggest that lines 729-761 of book 17 were never part of A (as the HMT Fellows took it to mean). There is no hint that the manuscript, as we have it, carries the text of book 17 through to line 761.

The most important point, however, is not whether we take issue with Allen's phrasing. What is significant is rather that tools like automatic collation can call our attention to passages that stand out or appear unusual; automated associations with our image service then allow us to follow a trail of evidence that has vanished from the apparatus criticus of Allen's "definitive" edition. Editors are now manually collating the text of lines 729-761 of Venetus A's book 17. Among the interesting questions we will be able to consider: what source did Bessarion use to fill out the missing folio? An automated comparison with the Venetus B might be revealing, and is perhaps a subject for a future blog post.

A final methodological observation: all the images in this post are created dynamically. URL references either to Google's Chart service or to the Homer Multitext project's image service return image data that can be used as you like, including embedding in a web page.
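
To make "created dynamically" concrete, here is a minimal sketch that builds a chart URL from two series of per-book counts. It assumes the URL parameters of Google's image-chart service (cht for chart type, chs for size, chd for text-encoded data, chds for scaling, chco for series colors); Google has since deprecated that service, so treat the URL as illustrative. The post does not describe the URL syntax of the HMT image service, so that side is not sketched, and the function name stacked_bar_url is hypothetical.

```python
from urllib.parse import urlencode

# Base URL of Google's (since deprecated) image-chart service.
CHART_BASE = "https://chart.googleapis.com/chart"


def stacked_bar_url(a_only: list[int], b_only: list[int],
                    width: int = 600, height: int = 300) -> str:
    """Build a URL for a vertical stacked bar chart of two equal-length series."""
    if len(a_only) != len(b_only):
        raise ValueError("both series need one value per book")
    # Scale the y axis to the tallest stacked bar (at least 1).
    top = max([a + b for a, b in zip(a_only, b_only)] + [1])
    # Text encoding: values comma-separated, series separated by "|".
    data = "t:" + ",".join(map(str, a_only)) + "|" + ",".join(map(str, b_only))
    params = {
        "cht": "bvs",                # bar chart, vertical, stacked
        "chs": f"{width}x{height}",  # image size in pixels
        "chd": data,                 # the two data series
        "chds": f"0,{top}",          # data range: min,max
        "chco": "00008B,ADD8E6",     # dark blue, then light blue
    }
    return f"{CHART_BASE}?{urlencode(params)}"
```

Since the whole chart is described by its URL, it can be dropped into an img element or regenerated whenever the underlying counts change.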

Thursday, August 26, 2010

Comparing two manuscripts with CTS

A digital multitext can make it easier for readers to compare different versions of the Iliad; it can also enable new kinds of systematic, machine-assisted comparisons. For example: since we now have complete texts of the Venetus A and Venetus B manuscripts of the Iliad in the Homer Multitext project's Canonical Text Service, we can use the service's knowledge about the citation of each version to find the vertical variation between Venetus A and Venetus B (that is, what lines are present or absent in the two manuscripts).

The chart summarizes the variation book by book. "Plus" and "minus" verses are counted for each of the 24 books of the Iliad (the x axis): the dark blue section of each bar represents the number of lines that appear in A but not B, and the light blue section represents the number of lines that appear in B but not A. (Phrased differently, if we take B as the reference text and compare A to it, the dark blue section represents A's "plus verses," and the light blue section its "minus verses.")
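
The comparison behind such a chart reduces to set differences over line citations. A minimal sketch, assuming each manuscript's citations are already in hand as strings of the form "book.line" (e.g. "17.729"); retrieving them from the CTS is not shown, and the function names are hypothetical:

```python
from collections import Counter
from typing import Iterable


def book_of(ref: str) -> int:
    """Book number from a citation of the form "book.line", e.g. "17.729" -> 17."""
    return int(ref.split(".", 1)[0])


def vertical_variation(a_refs: Iterable[str],
                       b_refs: Iterable[str]) -> dict[int, tuple[int, int]]:
    """For each of the 24 books, count lines found only in A and only in B."""
    a, b = set(a_refs), set(b_refs)
    a_only = Counter(book_of(ref) for ref in a - b)  # dark blue segments
    b_only = Counter(book_of(ref) for ref in b - a)  # light blue segments
    # Counter returns 0 for books with no unique lines.
    return {book: (a_only[book], b_only[book]) for book in range(1, 25)}
```

With toy lists in which A's book 17 stops at line 728 while B's runs to line 761, vertical_variation(a, b)[17] is (0, 33): no lines unique to A, and 33 unique to B.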

Even a simple example like this creates a view of two manuscripts that would be prohibitively tedious to construct from print editions — and since there is no complete print edition of either Venetus A or Venetus B, would be impossible in any case. As we think about how to read and compare material in a digital multitext, we will have to go beyond our experience with print editions to rethink what it means to read and compare texts.

I'll reserve the subject of horizontal variation for another blog post.

Update: I've posted a related, slightly geekier discussion in a separate post.

Saturday, August 21, 2010

Updates to text services

This summer, the implementation of Canonical Text Services (or CTS) running on Google AppEngine has benefited from a number of updates, but one of the most important changes was added with a single line of code. An initiative of the World Wide Web Consortium now defines a new mechanism for explicitly granting permission for other sites to use data drawn from services like CTS. (For the technical documentation, see the July 27, 2010, draft of the W3C's "Cross-Origin Resource Sharing" document.) The current version of CTS for AppEngine sends the Access-Control-Allow-Origin HTTP response header to permit programs from any location on the internet, without restriction, to interoperate with a CTS.
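
The post doesn't show the AppEngine code, so here is a hedged sketch of the same idea as Python WSGI middleware. It assumes that "without restriction" corresponds to the wildcard value *; placeholder_cts_app and allow_any_origin are hypothetical names, and the placeholder stands in for a real text service.

```python
from wsgiref.simple_server import make_server


def placeholder_cts_app(environ, start_response):
    """Stand-in for a text service; a real CTS would parse the query string here."""
    body = b"<reply>placeholder</reply>"
    start_response("200 OK", [("Content-Type", "text/xml; charset=utf-8"),
                              ("Content-Length", str(len(body)))])
    return [body]


def allow_any_origin(app):
    """WSGI middleware that adds a CORS header to every response from `app`."""
    def wrapped(environ, start_response):
        def start(status, headers, exc_info=None):
            # The "one line": let scripts loaded from any origin read our replies.
            headers = list(headers) + [("Access-Control-Allow-Origin", "*")]
            return start_response(status, headers, exc_info)
        return app(environ, start)
    return wrapped


app = allow_any_origin(placeholder_cts_app)

if __name__ == "__main__":
    with make_server("localhost", 8000, app) as server:
        server.serve_forever()
```

Browsers are what enforce CORS: the header lets a script loaded from another site read the service's replies, while programs running outside a browser were never blocked by it. A wildcard origin also cannot be combined with credentialed (cookie-bearing) requests, which suits a public, read-only service.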

In plain English, this one-line change means that other programs can automatically talk to the Homer Multitext project's services. In isolation, this may seem a small step, but it nevertheless brings us closer to a point where lower-order scholarly activities (look this passage up, search for this term, compare these two passages) can be formally specified, and automatically carried out and evaluated, freeing scholars to focus instead on what these exercises mean.

How might our scholarship change if we could use software to assess its machine-actionable foundations? One place classicists (and humanists more generally) could profitably look for examples is software development. CTS for AppEngine takes a "test-driven" approach. In test-driven software development, the tests for evaluating a program are specified first; the program is written afterward; and whether it passes the tests can then be evaluated automatically.

The CTS test suite defines a set of queries (ca. 100) about a sample corpus of texts, together with the responses it expects to receive from those queries. Today, we released on SourceForge an updated version of the ctsvalidator, a program that runs the test suite of queries against any CTS with the test data installed, and reports in some detail on how well that installation complies with the requirements of the CTS protocol. Development of CTS for AppEngine also reached a landmark when, for the first time, an installation passed 100% of the tests. (The demo installation of CTS now bats 1.000, with 100 correct replies to 100 queries.)
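
The test-driven idea can be sketched in a few lines: a suite pairs each query with the reply it expects, and a runner reports every mismatch instead of stopping at the first. This is a hypothetical illustration, not the ctsvalidator's code; SuiteCase, run_suite, and the toy service are invented for the example.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class SuiteCase:
    """One query in a test suite, with the reply the suite expects."""
    name: str
    query: dict[str, str]  # hypothetical request parameters
    expected: str


def run_suite(service: Callable[[dict[str, str]], str],
              cases: list[SuiteCase]) -> tuple[int, list[str]]:
    """Run every case against `service`; return (number passed, failure reports)."""
    failures = []
    for case in cases:
        try:
            reply = service(case.query)
        except Exception as err:  # a crash counts as a failed test, not a fatal error
            failures.append(f"{case.name}: raised {err!r}")
            continue
        if reply != case.expected:
            failures.append(f"{case.name}: expected {case.expected!r}, got {reply!r}")
    return len(cases) - len(failures), failures


# Toy service and sample cases (hypothetical stand-ins for a CTS and its test data).
TOY_CORPUS = {"1.1": "first line", "1.2": "second line"}


def toy_service(query: dict[str, str]) -> str:
    return TOY_CORPUS[query["passage"]]


SAMPLE_CASES = [
    SuiteCase("first line", {"passage": "1.1"}, "first line"),
    SuiteCase("second line", {"passage": "1.2"}, "second line"),
    SuiteCase("missing line", {"passage": "1.3"}, "third line"),
]

if __name__ == "__main__":
    passed, failures = run_suite(toy_service, SAMPLE_CASES)
    print(f"{passed}/{len(SAMPLE_CASES)} passed")
    for report in failures:
        print("  FAIL", report)
```

Run as a script, the sketch reports two of the three sample cases passing and describes why the third failed; pointing run_suite at a different service is all it takes to test another installation against the same expectations.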

We're rapidly approaching a stage where scholarly claims about versions of the Iliad can be expressed as assertions to test against the contents of a digital multitext, and so liberate us to consider what these claims might mean.


Wednesday, August 11, 2010

Homeric Canonical Text Services

Updated versions of two services on Google's AppEngine platform are now delivering Homeric texts.
Both sites are running on a new, pre-release version of the Canonical Text Services developed at the Center for Hellenic Studies; we expect to release the new version this month on the project's SourceForge site,