Saturday, August 21, 2010

Updates to text services

This summer, the implementation of Canonical Text Services (or CTS) running on Google AppEngine has benefited from a number of updates, but one of the most important changes was added with a single line of code. An initiative of the World Wide Web Consortium now defines a new mechanism for explicitly granting permission for other sites to use data drawn from services like CTS. (For the technical documentation, see the July 27, 2010, draft of the W3C's "Cross-Origin Resource Sharing" document here.) The current version of CTS for AppEngine uses the Access-Control-Allow-Origin header to permit programs from any location on the internet, without restriction, to interoperate with a CTS.
In plain English, this one-line change means that other programs can automatically talk to the Homer Multitext project's services. In isolation, this may seem a small step, but it nevertheless brings us closer to a point where lower-order scholarly activities — look this passage up, search for this term, compare these two passages — can be formally specified, and automatically carried out and evaluated, freeing scholars to focus instead on what these exercises mean.
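To make the one-line change concrete, here is a minimal sketch in Python of a web handler that sends the Access-Control-Allow-Origin header with every reply. It is not the CTS for AppEngine source: the handler name `cts_app`, the helper `call_app`, and the placeholder reply body are all assumptions made for illustration.

```python
from wsgiref.util import setup_testing_defaults


def cts_app(environ, start_response):
    """Hypothetical stand-in for a CTS endpoint (not the real implementation)."""
    body = b"<reply/>"  # placeholder; a real service would return a CTS reply
    headers = [
        ("Content-Type", "text/xml; charset=utf-8"),
        ("Content-Length", str(len(body))),
        # The one line that matters: "*" lets pages from any origin read the reply.
        ("Access-Control-Allow-Origin", "*"),
    ]
    start_response("200 OK", headers)
    return [body]


def call_app(app, path="/"):
    """Call a WSGI app in-process and return (status, headers, body)."""
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = dict(headers)

    environ = {}
    setup_testing_defaults(environ)  # fills in a minimal, valid WSGI environ
    environ["PATH_INFO"] = path
    body = b"".join(app(environ, start_response))
    return captured["status"], captured["headers"], body
```

Calling `call_app(cts_app)` returns the status, the headers, and the body, so the presence of the header can be checked without starting a server. A browser that receives the header with the value `*` will let a script from any other site read the reply.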
How might our scholarship change if we could use software to assess its machine-actionable foundations? One place classicists (and humanists more generally) could profitably look for examples is software development. CTS for AppEngine takes a "test-driven" approach. In test-driven software development, tests for evaluating a program are specified first; programs are written subsequently; whether they pass the tests or not can then be automatically evaluated.
The CTS test suite defines a set of queries (ca. 100) about a sample corpus of texts, and information about the responses it expects to receive from those queries. Today, we released on SourceForge an updated version of the ctsvalidator, a program that runs the test suite of queries against any CTS with the test data installed, and reports in some detail on how well the particular installation complies with the requirements of the CTS protocol. Development of CTS for AppEngine also reached a landmark when for the first time an installation passed 100% of the tests. (The demo installation of CTS now bats 1.000, with 100 correct replies to 100 queries.)
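The idea of a validator can be sketched in a few lines of Python. This is not the ctsvalidator itself, whose internals are not described here: the names `Check`, `run_suite`, `summarize`, and `fake_fetch` are hypothetical, the query strings and sample replies are made up for illustration, and "the reply contains an expected string" is a much cruder criterion than a real validator would likely use.

```python
from typing import Callable, Iterable, List, NamedTuple, Tuple


class Check(NamedTuple):
    query: str     # request sent to the service
    expected: str  # text the reply must contain for the check to pass


def run_suite(fetch: Callable[[str], str],
              checks: Iterable[Check]) -> List[Tuple[Check, bool]]:
    """Send every query through `fetch` and record whether the reply matches."""
    results = []
    for check in checks:
        try:
            ok = check.expected in fetch(check.query)
        except Exception:
            ok = False  # a failed request counts as a failed check
        results.append((check, ok))
    return results


def summarize(results: List[Tuple[Check, bool]]) -> str:
    """Report the score as a count and as a batting-style average."""
    passed = sum(1 for _, ok in results if ok)
    total = len(results)
    average = passed / total if total else 0.0
    return f"{passed}/{total} passed (average {average:.3f})"


# Made-up replies standing in for a CTS with the test data installed.
SAMPLE_REPLIES = {
    "request=GetCapabilities": "<reply><TextInventory/></reply>",
    "request=GetPassage&urn=sample:1.1": "<reply><passage>line one</passage></reply>",
}


def fake_fetch(query: str) -> str:
    """In-memory stand-in for an HTTP request; raises KeyError if unknown."""
    return SAMPLE_REPLIES[query]


CHECKS = [
    Check("request=GetCapabilities", "<TextInventory"),
    Check("request=GetPassage&urn=sample:1.1", "line one"),
    Check("request=GetPassage&urn=sample:9.9", "no such line"),  # fails on purpose
]
```

With these sample data, `summarize(run_suite(fake_fetch, CHECKS))` returns `2/3 passed (average 0.667)`. Because `fetch` is passed in as a parameter, the same suite could in principle be pointed at any installation by swapping in a function that makes real HTTP requests.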
We're rapidly approaching a stage where scholarly claims about versions of the Iliad can be expressed as assertions to test against the contents of a digital multitext, and so liberate us to consider what these claims might mean.

