Sunday, April 13, 2014

Testing the HMT project’s technical underpinnings

In February, we noted the release of new draft specifications for the CTS URN notation that we use to cite texts, and for the CTS protocol that we use to retrieve texts in the Homer Multitext project. Since the publication of the draft specifications, we have released updates to a suite of test data and to the software that uses those data to assess whether a given CTS service complies with the current version of the protocol.

Together with version 1.6 of this software, the ctsvalidator servlet, we are today releasing version 0.9.0 of our implementation of the CTS protocol, sparqlcts. The new version of sparqlcts passes 100% of the ctsvalidator tests.

To recapitulate what we have released in 2014 in our work on CTS:
  • Formal specifications for the Canonical Text Services protocol and for CTS URNs. The specifications include Relax NG schemas for a CTS Text Inventory (the catalogue of a CTS library), and Relax NG schemas for validating the responses to CTS requests.
  • A test data set, documented in a valid CTS Text Inventory, and available in three formats:
    • valid and well-formed XML
    • tabular data in simple delimited text files
    • RDF triples in .ttl format
  • A set of 68 tests applying CTS requests to the test data set. The tests are defined in an XML file listing the request and parameters to be submitted to a running CTS installation. For each test, a corresponding XML file gives the expected responses to the request.
  • The CTS Validator, a web-app that runs the tests against any online CTS service hosting the corpus of test data.
  • An implementation of the Canonical Text Services protocol, sparqlcts, a Java web-app drawing its data from a SPARQL endpoint. When the SPARQL endpoint is hosting the corpus of test data, sparqlcts passes 68 out of 68 of our defined tests.
This of course does not mean that sparqlcts is necessarily flawless (there may be problems that ctsvalidator does not test for), but it is an important milestone. One of the most profound implications of digital scholarship is that when we can automate the testing of digital work, we should invert the humanist’s traditional order of composition and assessment: specify the automated test first, then work until the test passes. This applies to the software we use, too. When we next update our online services, we can be confident that our text service has passed 100% of a challenging series of tests.
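
The test-first workflow described above, in which each test pairs a CTS request and its parameters with an expected XML response, can be sketched in a few lines of Python. This is only an illustration and not the ctsvalidator code: the names `CtsTest`, `build_query`, `same_xml` and `run_tests` are hypothetical, the `fetch` function stands in for whatever submits a query to a running CTS installation, and comparing canonicalized XML is an assumption about how expected and actual replies might reasonably be matched.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List
from urllib.parse import urlencode
import xml.etree.ElementTree as ET


@dataclass
class CtsTest:
    """One test: a CTS request, its parameters, and the expected XML reply."""
    request: str              # name of the CTS request to submit
    params: Dict[str, str]    # request parameters (names here are illustrative)
    expected_xml: str         # contents of the corresponding expected-response file


def build_query(test: CtsTest) -> str:
    """Build a URL query string from the request name and its parameters."""
    return urlencode({"request": test.request, **test.params})


def same_xml(a: str, b: str) -> bool:
    """Compare two XML documents after C14N canonicalization,
    ignoring leading and trailing whitespace in text nodes."""
    return ET.canonicalize(a, strip_text=True) == ET.canonicalize(b, strip_text=True)


def run_tests(tests: List[CtsTest], fetch: Callable[[str], str]) -> List[CtsTest]:
    """Run every test through `fetch` (query string -> XML reply text)
    and return the tests that failed."""
    failures = []
    for test in tests:
        try:
            passed = same_xml(fetch(build_query(test)), test.expected_xml)
        except ET.ParseError:
            # A reply that is not well-formed XML counts as a failure.
            passed = False
        if not passed:
            failures.append(test)
    return failures
```

With a harness of this shape, a service passes the suite when `run_tests` returns an empty list. Because `fetch` is passed in as an argument, the same tests can be pointed at any CTS service hosting the test corpus.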


Christopher Blackwell and Neel Smith, project architects
