Thursday, February 27, 2014

Publishing the HMT archive

The editorial work of the Homer Multitext project is ongoing, and, as good photography of more manuscripts and papyri becomes available, is open-ended. While we have provided openly licensed access to our source images and editorial work in progress since our first digital photography in 2007, we have not previously offered packaged publications of our archive.

That is changing in 2014. The project’s editors have decided on a publishing cycle of roughly three issues a year (since our work tends to be concentrated around an academic calendar of fall term, spring term, and summer work). Published issues of the project archive must satisfy four requirements.
  1. The issue must be clearly identified. Our releases are labelled with a year and issue number: our first issue is 2014.1.
  2. All content published in a given issue must pass a clearly identified review process. Teams of contributing editors work in individual workspaces. (We use github repositories to track the work history of these teams.) When a block of work passes a series of manual reviews and automated tests, it migrates from “draft” to “provisionally accepted” status and is added to the project’s central archival repository. This is the repository that we are publishing for the first time this week.
  3. All published material must be in appropriate open digital formats. Apart from our binary image data, all the data we create are structured in simple tabular text files or XML files with published schemas.
  4. All published material must be appropriately licensed for scholarly use. All of our work is published under a Creative Commons Attribution-ShareAlike license. (Licenses for some of our image collections additionally include a “non-commercial” clause: in those cases, a license for commercial reuse must be separately negotiated with the copyright holder.)
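
For software that needs to compare or sort issue labels of the form described in the first requirement (a year and an issue number, as in 2014.1), a minimal sketch follows. The names IssueId and parse_issue_id are hypothetical and are not part of any HMT tooling; the four-digit year is an assumption based on the one label given above.

```python
import re
from typing import NamedTuple


class IssueId(NamedTuple):
    """A published issue label such as 2014.1 (hypothetical helper type)."""
    year: int
    number: int

    def __str__(self) -> str:
        return f"{self.year}.{self.number}"


# Assumption: a four-digit year, a dot, and a positive issue number.
_ISSUE_PATTERN = re.compile(r"(\d{4})\.([1-9]\d*)")


def parse_issue_id(label: str) -> IssueId:
    """Split a 'year.issue' label into its two integer components."""
    match = _ISSUE_PATTERN.fullmatch(label.strip())
    if match is None:
        raise ValueError(f"not a year.issue label: {label!r}")
    return IssueId(int(match.group(1)), int(match.group(2)))


if __name__ == "__main__":
    labels = ["2014.2", "2014.1"]
    # Tuples compare by year first, then by issue number.
    for issue in sorted(parse_issue_id(label) for label in labels):
        print(issue)
```

Comparing the parsed pairs rather than the raw strings keeps the ordering correct if an issue number ever reaches two digits.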

Access to the Published Digital Archive

The published packages are available for download as zip files. An accompanying README explains the contents of each zip file.
We are also distributing our published issues as nexus artifacts (previously mentioned briefly here), a system that allows software to identify and retrieve published versions automatically. Whether the issues are downloaded manually or automatically, scholars (and their software) can now work with citable data sets from the constantly changing archive of the HMT project.

Tracking Work in Progress

We will continue to make our work in progress available. For easy access to the current state of “provisionally accepted” material in our archive, we also generate a nightly set of packages. These are available for manual download here, but are not distributed through our nexus server.
They should be considered unpublished: other publications should cite only published issues of the archive.

Like our individual editorial teams, we manage our publication repository through github: Our data archive includes a publicly available issue tracker where you can submit questions or bug reports, and follow our progress.

More technical information

If you’re interested in technical information about how we develop the published archive and use it to build applications, Christopher Blackwell and I have recently published a discussion here.


Saturday, February 8, 2014

Technically speaking ...

For over a decade, the Homer Multitext project has been exploring how to represent a multitext in digital form.  For some of our essential work, we have been able to adopt well understood practices (such as how to use XML markup to structure a diplomatic edition of a text).  In other aspects of our work, we are faced with issues that have not been explored in prior work on digital scholarship, and have had to define new standards.

We have devoted special attention to the fundamental question of how to cite texts in a form that is independent of any specific technology and sufficiently rigorously defined for computers to use.  We have defined the syntax and semantics for a notation for citing texts that is based on the Internet Engineering Task Force's Uniform Resource Name (URN) notation.  We call this notation the Canonical Text Service URN, or CTS URN.
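
To give a sense of what "rigorously defined for computers to use" can mean in practice, here is a minimal sketch of splitting a CTS URN into its parts. It assumes the colon-delimited layout urn:cts:namespace:work:passage and deliberately ignores finer points of the specification such as ranges and subreferences. The class and function names are hypothetical, and the example URN is only an illustrative value; consult the specification for the authoritative grammar.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CtsUrn:
    """Simplified view of a CTS URN (hypothetical helper, not an HMT API)."""
    namespace: str
    work: str               # dot-separated work hierarchy, e.g. "tlg0012.tlg001"
    passage: Optional[str]  # None when the URN cites a whole work


def parse_cts_urn(text: str) -> CtsUrn:
    """Split a CTS URN on colons and check the fixed 'urn:cts' prefix."""
    parts = text.split(":")
    # Assumption: four or five colon-delimited fields; stricter checks
    # required by the specification are omitted in this sketch.
    if len(parts) not in (4, 5) or parts[0] != "urn" or parts[1] != "cts":
        raise ValueError(f"not a CTS URN: {text!r}")
    namespace, work = parts[2], parts[3]
    if not namespace or not work:
        raise ValueError(f"missing namespace or work component: {text!r}")
    passage = parts[4] if len(parts) == 5 and parts[4] else None
    return CtsUrn(namespace, work, passage)


if __name__ == "__main__":
    urn = parse_cts_urn("urn:cts:greekLit:tlg0012.tlg001:1.1")
    print(urn.namespace, urn.work.split("."), urn.passage)
```

Because the citation is an ordinary string with a fixed grammar, nothing in this sketch depends on where or how the cited text is stored.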

We have also defined a protocol for a networked service that understands the CTS URN notation, and can retrieve passages of texts.  Unsurprisingly, we call this the Canonical Text Service protocol, or CTS protocol.
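
As a rough illustration of how a client might ask such a service for a passage, the sketch below builds an HTTP GET URL with a request parameter naming the operation (GetPassage) and a urn parameter carrying the citation. The base URL is a placeholder, not a real HMT endpoint, and the function name is hypothetical; the parameters a given service actually accepts should be checked against the specification.

```python
from urllib.parse import urlencode


def get_passage_url(base_url: str, urn: str) -> str:
    """Build a GetPassage request URL for a CTS service (illustrative only)."""
    # urlencode percent-encodes the colons in the URN so that the value
    # is safe to place in a query string.
    query = urlencode({"request": "GetPassage", "urn": urn})
    return f"{base_url}?{query}"


if __name__ == "__main__":
    # Placeholder endpoint: substitute the address of a real CTS service.
    print(get_passage_url("https://cts.example.org/api",
                          "urn:cts:greekLit:tlg0012.tlg001:1.1"))
```

The same citation string works against any service that implements the protocol, which is the point of keeping the notation independent of a particular server.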

We have worked hard to ensure that the technical design of our notation and service fully satisfies the needs of the Homer Multitext project, but is not limited to or in any way specific to the HMT project's corpus of texts.  Both of us have applied the CTS notation and CTS service protocol to a range of other projects, not limited to Greek or Latin texts.  As our work on these two technical projects has matured, we have found more and more interest in it from scholars working with canonically citable texts.

This week, we were able to complete revisions for a new version of the specification for both the CTS URN notation and CTS protocol.  It was especially gratifying that we were able to complete this work during a visit to Leiden University, where we were graciously hosted by Ineke Sluiter and her colleagues, a new group of collaborators on HMT who first participated in the summer 2013 seminar at the Center for Hellenic Studies.

The specifications:

Christopher Blackwell and Neel Smith, HMT project architects