Saturday, October 20, 2018

Access to HMT Facsimiles

The Homer Multitext is producing complex data. The complexity is irreducible, since it is our mission to publish digital editions mapped to their manuscript folios, with Iliadic texts associated with commentaries.

This complex data is published as a single CEX file, a plain-text serialization of the current state of the HMT. That data is also exposed via a web service, and an integrated web-application. For more straightforward access, we have published a facsimile view of the data.

This post is to announce the Homer Multitext Facsimile Index application, allowing users to access HMT data based on Iliadic citations, e.g. 2.100 (individual passages), or 2.1-2.10 (ranges of passages).

Because traditional citations assumed an audience of (clever, intuitive) human readers, some traditional practices do not translate to a computational environment. For example, "1.1-10" is not a valid (that is, unambiguous) citation. Does it mean "from 1.1 to 1.10" or "from Book 1, Line 1, through all of Book 10"? The unimaginative machine will assume the latter. So with this app, and with CITE data generally, users must be verbose and specific: 1.1-1.10, with [book].[line] on both sides of the hyphen.
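The rule above can be checked mechanically. The following Python sketch is illustrative only and is not part of the HMT or CITE code; the function name `is_explicit_citation` and its regular expression are assumptions that model just the [book].[line] form described here.

```python
import re

# One [book].[line] reference, e.g. "2.100". Illustrative pattern only.
_REF = r"\d+\.\d+"
# A single passage, or a range with a full reference on both sides of the hyphen.
_CITATION = re.compile(rf"{_REF}(-{_REF})?")


def is_explicit_citation(citation: str) -> bool:
    """Return True if [book].[line] is spelled out on both sides of any hyphen."""
    return _CITATION.fullmatch(citation) is not None


for example in ["2.100", "2.1-2.10", "1.1-1.10", "1.1-10"]:
    print(example, is_explicit_citation(example))
```

Under this sketch, "1.1-10" is rejected rather than silently read as "through all of Book 10".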

Homer Multitext Facsimile Index

As with all expressions of HMT data, this application was built with the CITE Architecture code libraries in Scala and Scala-JS.

Thursday, July 19, 2018

The Homer Multitext Microservice

The Homer Multitext produces integrated data on Greek Epic poetry, its language, its evolution over time, the traditions of scholarship surrounding it, and the physical artifacts, manuscripts and papyri, that are our only evidence. For a concise explanation of what the HMT publishes, please see https://github.com/homermultitext/hmt-archive/blob/master/overview.md.

At the same time, we Project Architects of the HMT, Neel Smith and Christopher Blackwell, are interested in making this data as widely accessible as possible. The data is released in CEX Format, a plain-text serialization of data organized according to defined abstract data models. We have developed code libraries in Scala implementing these abstract data models. These libraries provide the greatest flexibility in manipulating, locating, aggregating, and transforming the data of the Homer Multitext.

For users who may not want to write code directly, we have provided an online application offering a graphical user interface for interacting with HMT data using the CITE Architecture’s Scala libraries.

For those who might want to write their own applications that interact with the HMT data, we provide a collection of microservices.

The examples below demonstrate the Scala Cite Services (Akka) application, SCS-Akka, running at beta.hpcc.uh.edu/scs/, and (as of July 19, 2018) serving data from the 2018g Release of the Homer Multitext Data.

The service accepts requests via HTTP, and returns JSON expressions of CITE objects. We have published a library in Scala for de-marshalling those JSON expressions into CITE data objects.
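For readers working outside Scala, a request to the service can be sketched with Python's standard library. This is a minimal sketch, not project code: the helper names `service_url` and `fetch_json` are hypothetical, and because the JSON structure is not described in this post, the result is returned as plain Python dicts and lists.

```python
import json
from urllib.request import urlopen

# Base URL taken from the examples in this post.
BASE_URL = "http://beta.hpcc.uh.edu/scs"


def service_url(path: str) -> str:
    """Join a request path such as 'libraryinfo' to the service's base URL."""
    return f"{BASE_URL}/{path.lstrip('/')}"


def fetch_json(path: str, timeout: float = 30.0):
    """Send a GET request and decode the JSON body into Python objects."""
    with urlopen(service_url(path), timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Example (requires network access and a running service):
# info = fetch_json("libraryinfo")
```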

The CiteApp web-based application for the Homer Multitext gets its data from this service, and indeed the web-application and the microservice were developed jointly.

This collection of microservices is serving current data from the Homer Multitext, edited by Casey Dué and Mary Ebbott, a project of the Center for Hellenic Studies of Harvard University.

For more information on this service, please see https://github.com/cite-architecture/scs-akka.

For information on the CITE Architecture, please see https://cite-architecture.github.io.

Report bugs by filing issues on GitHub.

Texts

About the Service’s Catalog

http://beta.hpcc.uh.edu/scs/libraryinfo

See the Text Catalog

Get the First Valid Reference in a text

http://beta.hpcc.uh.edu/scs/texts/firsturn/urn:cts:greekLit:tlg0012.tlg001.msA:

Get Valid References

All references for a version of a text:

http://beta.hpcc.uh.edu/scs/texts/reff/urn:cts:greekLit:tlg0012.tlg001.msA:
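The text URNs in these requests are colon-delimited, and a trailing colon with nothing after it refers to the whole text. The sketch below only splits a URN on its colons; the class and field names are assumptions chosen for illustration, and real applications would use the CITE libraries instead.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CtsUrnParts:
    namespace: str  # e.g. "greekLit"
    work: str       # dot-separated work hierarchy, e.g. "tlg0012.tlg001.msA"
    passage: str    # e.g. "1.1"; empty when the URN refers to the whole text


def split_cts_urn(urn: str) -> CtsUrnParts:
    """Split a CTS URN into its colon-delimited parts."""
    parts = urn.split(":")
    if len(parts) != 5 or parts[0] != "urn" or parts[1] != "cts":
        raise ValueError(f"not a CTS URN with five colon-delimited parts: {urn!r}")
    return CtsUrnParts(namespace=parts[2], work=parts[3], passage=parts[4])


print(split_cts_urn("urn:cts:greekLit:tlg0012.tlg001.msA:"))
print(split_cts_urn("urn:cts:greekLit:tlg0012.tlg001.msA:1.1"))
```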

Valid references for parts of a text:

Get Passages

Passages for a specific version of a text:

Passages for all versions of a text:

NGrams

NGrams in works present in the library:

http://beta.hpcc.uh.edu/scs/texts/ngram/urn:cts:greekLit:tlg5026.msA.va_dipl:?n=3 (3-grams in the Venetus A Main Scholia)
http://beta.hpcc.uh.edu/scs/texts/ngram/urn:cts:greekLit:tlg5026.msA.va_dipl:?n=3&t=20 (occurring more than t times)
http://beta.hpcc.uh.edu/scs/texts/ngram/urn:cts:greekLit:tlg5026.msA.va_dipl:?n=3&s=Ζηνόδοτος (Filter for string, s=Ζηνόδοτος.)

Find citations to NGrams:

http://beta.hpcc.uh.edu/scs/texts/ngram/urns?ng=ὅτι+Ζηνόδοτος+γράφει (find URNs for a given N-gram in the entire library)
http://beta.hpcc.uh.edu/scs/texts/ngram/urns/urn:cts:greekLit:tlg0012.tlg001.msA:?ng=προσέφη+πόδας+ὠκὺς (find URNs for a given N-gram in one text)

Returning a Corpus of Passages containing an NGram:

http://beta.hpcc.uh.edu/scs/texts/ngram/urns/tocorpus?ng=ὅτι+Ζηνόδοτος+γράφει (return a corpus of passages containing a given N-gram, from the entire library)
http://beta.hpcc.uh.edu/scs/texts/ngram/urns/tocorpus/urn:cts:greekLit:tlg0012.tlg001.msA:?ng=προσέφη+πόδας+ὠκὺς (return a corpus of passages containing a given N-gram, from one text)
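When these N-gram requests are built in code rather than typed into a browser, the Greek parameter values generally need percent-encoding. The Python sketch below shows one way to do that with the standard library; the function names and keyword arguments are hypothetical, and only the paths and the n, t, s, and ng parameters come from the examples above.

```python
from urllib.parse import quote, urlencode

BASE_URL = "http://beta.hpcc.uh.edu/scs"


def ngram_url(text_urn, n, threshold=None, contains=None):
    """URL listing N-grams in one text, optionally filtered by t and s."""
    params = {"n": n}
    if threshold is not None:
        params["t"] = threshold
    if contains is not None:
        params["s"] = contains
    # Leave the URN's colons readable; percent-encode anything else unsafe.
    return f"{BASE_URL}/texts/ngram/{quote(text_urn, safe=':')}?{urlencode(params)}"


def ngram_urns_url(ngram):
    """URL finding citations for an N-gram across the whole library."""
    # urlencode turns spaces into '+', matching the form used in this post.
    return f"{BASE_URL}/texts/ngram/urns?{urlencode({'ng': ngram})}"


print(ngram_url("urn:cts:greekLit:tlg5026.msA.va_dipl:", 3, contains="Ζηνόδοτος"))
print(ngram_urns_url("ὅτι Ζηνόδοτος γράφει"))
```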

String Searches

Token Searches

http://beta.hpcc.uh.edu/scs/texts/token?t=Ἀγαμέμνων
http://beta.hpcc.uh.edu/scs/texts/token/urn:cts:greekLit:tlg0012.tlg001.msA:?t=Ἀγαμέμνων
http://beta.hpcc.uh.edu/scs/texts/tokens?t=πόδας&t=ὠκὺς
http://beta.hpcc.uh.edu/scs/texts/tokens/urn:cts:greekLit:tlg0012.tlg001.msA:?t=πόδας&t=ὠκὺς
http://beta.hpcc.uh.edu/scs/texts/tokens?dist=3&t=ὅτι&t=γράφει (Two tokens within dist=3 of each other)
http://beta.hpcc.uh.edu/scs/texts/tokens?dist=2&t=ὅτι&t=γράφει (Two tokens within dist=2 of each other, should return no passages)
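The multi-token requests repeat the t parameter once per token. A minimal Python sketch of building such a query follows; `tokens_url` is a hypothetical helper, and it passes a list of pairs to `urlencode` so that the repeated key survives.

```python
from urllib.parse import urlencode

BASE_URL = "http://beta.hpcc.uh.edu/scs"


def tokens_url(tokens, dist=None):
    """URL searching the whole library for passages containing every token."""
    params = []
    if dist is not None:
        params.append(("dist", dist))
    # One ("t", token) pair per token keeps the repeated parameter name.
    params.extend(("t", token) for token in tokens)
    return f"{BASE_URL}/texts/tokens?{urlencode(params)}"


print(tokens_url(["ὅτι", "γράφει"], dist=3))
```

A plain dict would not work here, since it can hold only one value per key.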

Collections of Objects

Catalog

http://beta.hpcc.uh.edu/scs/collections/ (all collections)
http://beta.hpcc.uh.edu/scs/collections/urn:cite2:hmt:msA.v1: (filter by URN)
http://beta.hpcc.uh.edu/scs/collections/reff/urn:cite2:hmt:msA.v1: (filter by URN, just URNs)
http://beta.hpcc.uh.edu/scs/collections/hasobject/urn:cite2:hmt:msA.v1:1r (check for an object; should return true)
http://beta.hpcc.uh.edu/scs/collections/hasobject/urn:cite2:hmt:msA.v1:NOTOBJECT (check for an object; should return false)
http://beta.hpcc.uh.edu/collections/labelmap (returns a map of Cite2Urn -> String, the label of each citable object)

Objects

http://beta.hpcc.uh.edu/scs/objects/urn:cite2:hmt:msA.v1: (all objects for version v1 of collection urn:cite2:hmt:msA:)
http://beta.hpcc.uh.edu/scs/objects/prevurn/urn:cite2:hmt:msA.v1:2v (get the URN of the previous object in an ordered collection)
http://beta.hpcc.uh.edu/scs/objects/nexturn/urn:cite2:hmt:msA.v1:1r (get the URN of the next object in an ordered collection)
http://beta.hpcc.uh.edu/scs/objects/urn:cite2:hmt:msA.v1:12r-13v (a range of objects in an ordered versioned collection)
http://beta.hpcc.uh.edu/scs/objects/urn:cite2:hmt:msA.v1:12r-13v?dse=true (a range of objects in an ordered versioned collection, with all DSE records associated with those objects and properties of those objects [see below])
http://beta.hpcc.uh.edu/scs/objects/paged/urn:cite2:hmt:msA.v1:?offset=1&limit=10 (paged viewing of objects in an ordered collection)
http://beta.hpcc.uh.edu/scs/objects/paged/urn:cite2:hmt:msA.v1:?offset=11&limit=10 (paged viewing of objects in an ordered collection)
http://beta.hpcc.uh.edu/scs/objects/paged/urn:cite2:hmt:msA.v1: (paged viewing, with default values, offset=1, limit=10, of objects in an ordered collection)
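The paged requests above step through a collection with offset and limit. As a rough sketch, the successive page URLs could be generated as follows; `page_urls` and its `total` argument are hypothetical, and the caller is assumed to already know how many objects the collection holds.

```python
BASE_URL = "http://beta.hpcc.uh.edu/scs"


def page_urls(collection_urn, total, limit=10):
    """Yield one paged-request URL per page; offsets start at 1 as in the examples."""
    for offset in range(1, total + 1, limit):
        yield f"{BASE_URL}/objects/paged/{collection_urn}?offset={offset}&limit={limit}"


# Three pages for a hypothetical collection size of 25 objects.
for url in page_urls("urn:cite2:hmt:msA.v1:", total=25):
    print(url)
```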

Get objects from multiple collections:

http://beta.hpcc.uh.edu/scs/collections/objects?urn=urn:cite2:cite:datamodels.v1:&urn=urn:cite2:hmt:msA.v1:&urn=urn:cite2:hmt:compimg.v1:

Finding Objects

urn-match

http://beta.hpcc.uh.edu/scs/objects/find/urnmatch?find=urn:cite2:hmt:msA.v1:12r (search all property-values for a URN)
http://beta.hpcc.uh.edu/scs/objects/find/urnmatch?find=urn:cite2:hmt:msA.v1:12r&dse=true (search all property-values for a URN, with DSE records for the returned objects)
http://beta.hpcc.uh.edu/scs/objects/find/urnmatch?find=urn:cite2:hmt:msA.v1:12r&offset=0&limit=3 (search all property-values for a URN; offset=0 start at the first result; limit=3 return only three results; these optional parameters apply to all searching requests and allow paged access to results)
http://beta.hpcc.uh.edu/scs/objects/find/urnmatch/urn:cite2:hmt:va_dse.v1:?find=urn:cite2:hmt:msA.v1:12r (limit to a specified collection)
http://beta.hpcc.uh.edu/scs/objects/find/urnmatch/urn:cite2:hmt:va_dse.v1:?find=urn:cite2:hmt:msA.v1:12r&parameterurn=urn:cite2:hmt:va_dse.v1.surface: (limit to a specified collection and a specified property)

regexmatch

http://beta.hpcc.uh.edu/scs/objects/find/regexmatch?find=[0-9]{2} (use regular expressions to search property values)
http://beta.hpcc.uh.edu/scs/objects/find/regexmatch/urn:cite2:hmt:compimg.v1:?find=[0-9]{2} (use regular expressions to search property values)
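Brackets and braces in a regular expression are not always accepted unescaped by HTTP clients, so a program sending these requests may need to percent-encode the pattern. The sketch below assumes that the service decodes the find parameter in the usual way; `regexmatch_url` is a hypothetical helper name.

```python
from urllib.parse import quote

BASE_URL = "http://beta.hpcc.uh.edu/scs"


def regexmatch_url(pattern, collection_urn=None):
    """URL for a regex search, optionally limited to one collection."""
    path = "objects/find/regexmatch"
    if collection_urn is not None:
        path += "/" + quote(collection_urn, safe=":")
    # safe="" percent-encodes every reserved character in the pattern.
    return f"{BASE_URL}/{path}?find={quote(pattern, safe='')}"


print(regexmatch_url("[0-9]{2}"))
print(regexmatch_url("[0-9]{2}", "urn:cite2:hmt:compimg.v1:"))
```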

stringcontains

valueequals

numeric less-than

numeric less-than-or-equal

numeric equals

numeric greater-than

numeric greater-than-or-equal

numeric within

Data Models

Images

Basic Image Retrieval

http://beta.hpcc.uh.edu/scs/image/urn:cite2:hmt:vaimg.2017a:VA012RN_0013 (resolve to image)
http://beta.hpcc.uh.edu/scs/image/urn:cite2:hmt:vaimg.2017a:VA012RN_0013@0.04506,0.2196,0.1344,0.10093 (resolve to image)
http://beta.hpcc.uh.edu/scs/image/urn:cite2:hmt:vaimg.2017a:VA012RN_0013?resolveImage=false (return URL to image)
http://beta.hpcc.uh.edu/scs/image/urn:cite2:hmt:vaimg.2017a:VA012RN_0013@0.04506,0.2196,0.1344,0.10093?resolveImage=false (return URL to image)
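The suffix after @ in the image URNs above identifies a region of the image. The Python sketch below separates that suffix from the URN and converts it to pixels. It rests on an assumption not stated in this post: that the four numbers are left, top, width, and height, each expressed as a fraction of the full image. The function names are hypothetical.

```python
def split_image_urn(urn):
    """Separate an image URN from its optional '@a,b,c,d' region suffix."""
    base, sep, region = urn.partition("@")
    if not sep:
        return base, None
    values = tuple(float(v) for v in region.split(","))
    if len(values) != 4:
        raise ValueError(f"expected four numbers after '@', got {region!r}")
    return base, values


def roi_to_pixels(roi, image_width, image_height):
    """Scale a fractional (left, top, width, height) region to pixel units.

    ASSUMPTION: the four values are fractions of the image's width and height.
    """
    left, top, width, height = roi
    return (
        round(left * image_width),
        round(top * image_height),
        round(width * image_width),
        round(height * image_height),
    )


urn = "urn:cite2:hmt:vaimg.2017a:VA012RN_0013@0.04506,0.2196,0.1344,0.10093"
base, roi = split_image_urn(urn)
print(base, roi)
```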

Defining a width

Defining MaxWidth and MaxHeight

Embedding

Relations

CITE Relations are associations of URN to URN, with the relationship specified by a Cite2 URN.

http://beta.hpcc.uh.edu/scs/relations/urn:cts:greekLit:tlg0012.tlg001:1.1 Get all relations for a URN.
http://beta.hpcc.uh.edu/scs/relations/urn:cts:greekLit:tlg0012.tlg001:1.1?filter=urn:cite2:cite:verbs.v1:commentsOn Get all relations, filtered by a relation-URN.

Commentary Data Model

If a library includes CiteRelations and implements the Commentary datamodel, comments associated with passages of text can (optionally) be attached to replies for a corpus of texts.

http://beta.hpcc.uh.edu/scs/texts/urn:cts:greekLit:tlg0012.tlg001:1.1?commentary=true

Documented Scholarly Editions (DSE) Data Model

The DSE Data model consists of a CITE Collection of objects, each documenting a three-way relationship between (a) a text-bearing artifact, (b) a documentary image (ideally with a region-of-interest defined), and (c) a citable passage of text.

http://beta.hpcc.uh.edu/scs/dse/recordsforsurface/urn:cite2:hmt:msA.v1:12r Get all DSE Records associated with a Text Bearing Artifact.
http://beta.hpcc.uh.edu/scs/dse/recordsforimage/urn:cite2:hmt:vaimg.2017a:VA012RN_0013 Get all DSE Records associated with a Citable Image.
http://beta.hpcc.uh.edu/scs/dse/recordsfortext/urn:cts:greekLit:tlg0012.tlg001.msA:1.1 Get all DSE Records associated with a passage of text.
http://beta.hpcc.uh.edu/scs/dse/recordsfortext/urn:cts:greekLit:tlg0012.tlg001.msA:1.1-1.3 Get all DSE Records associated with a passage of text expressed by a range-URN.
http://beta.hpcc.uh.edu/scs/objects/urn:cite2:hmt:msA.v1:12r-13v?dse=true (a range of objects in an ordered versioned collection, with all DSE records associated with those objects and properties of those objects [see below])
http://beta.hpcc.uh.edu/scs/objects/find/urnmatch?find=urn:cite2:hmt:msA.v1:12r&dse=true&offset=0&limit=3 Search all property-values for a URN, with DSE records for the returned objects. If offset and limit are not used to constrain the returned results, the list of DSEs might be huge, and cause the request to time out.
http://beta.hpcc.uh.edu/scs/texts/urn:cts:greekLit:tlg0012.tlg001.msA:1.1-1.5?dse=true Load a corpus of text, with DSE records, if any, for each citable node.

(The dse=true parameter is valid for all object-searching, as well as for retrieval of individual objects or ranges of objects.)
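The three-way relationship that a DSE record documents can be pictured as a small data structure. The Python sketch below is only a mental model, not the CITE libraries' implementation: the class, field, and function names are assumptions, and the sample record pairs URNs from this post purely for illustration, without claiming that the HMT data links them this way.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DseRecord:
    surface: str  # (a) text-bearing artifact, e.g. a manuscript folio side
    image: str    # (b) documentary image, ideally with a region-of-interest
    passage: str  # (c) citable passage of text


def records_for_surface(records, surface_urn):
    """All records whose text-bearing artifact is the given surface."""
    return [r for r in records if r.surface == surface_urn]


def records_for_passage(records, passage_urn):
    """All records documenting the given passage of text."""
    return [r for r in records if r.passage == passage_urn]


# Illustrative pairing only; not asserted to be an actual HMT record.
sample = [
    DseRecord(
        surface="urn:cite2:hmt:msA.v1:12r",
        image="urn:cite2:hmt:vaimg.2017a:VA012RN_0013@0.04506,0.2196,0.1344,0.10093",
        passage="urn:cts:greekLit:tlg0012.tlg001.msA:1.1",
    )
]
print(records_for_surface(sample, "urn:cite2:hmt:msA.v1:12r"))
```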

Wednesday, May 30, 2018

Homer Multitext 2018d Data Release

We are pleased to announce the 2018d release of Homer Multitext data. This is the fourth release of 2018. With each release, we try to improve our automated validation and machine-assisted verification, and to improve integration of this data through refinements to the data models.

This is the work of over 170 editors.

A guide to understanding Homer Multitext data is online.

All current data is on the project’s GitHub site. The current release, 2018d, is in the releases-cex subdirectory.

The work of the Homer Multitext is focused on scholarly data. At the same time, we are interested in providing useful access to this data in as many ways as possible. With the 2018d release, we are also pleased to provide these new tools:

An online, integrated web application for exploring HMT data.
A service that can deliver data in JSON format, as responses to HTTP requests.
A downloadable, double-clickable web application preconfigured for reading and exploring the textual content of the HMT 2018d release.

Thursday, January 4, 2018

Publishing the Homer Multitext project archive

The Homer Multitext project (HMT) is changing its publication practice in 2018. All of our work in progress remains available from publicly visible repositories hosted on github, but we are adopting a new format for integrating material from our working archive into publishable units.

Our goals have always been first to specify a model for all HMT data structures independent of any publication format, and then to select a format that fully captures the semantics of the model. In choosing a format for publication, we prefer one that, while completely expressing the model, is as simple as possible. It should be intelligible both to human readers and to software, and readily usable by the widest possible range of digital tools.

Beginning in 2014, we adopted the TTL serialization format of the Resource Description Framework (RDF) to integrate textual editions, data about physical artifacts like manuscripts, and documentary images into a single publishable file. RDF was designed to facilitate dynamic exchange and automated linking of resources on the world wide web, and is widely used for that purpose in the digital humanities community today. As a format for disseminating stable releases of HMT content, it is not ideal, however. RDF can be quite verbose: to represent a single citable node of text in one of our editions, for example, requires more than a half dozen separate RDF statements. It is often not immediately intelligible to human readers, and although the RDF model can be implemented in multiple formats (JSON and XML, in addition to TTL), RDF data can only be practically used with software specifically aimed at RDF processing.

This month, we are releasing our first published data sets in the CITE Exchange format (CEX). To quote the CEX specification, CEX is "a plain-text, line-oriented data format for serializing citable content following the models of the CITE Architecture." CEX makes it possible to represent any of the fundamental models of the HMT archive (texts, citable collections of objects, and the complex relations among these objects that our archival data sets encode) as simple tabular structures in labelled blocks of a plain text file you can inspect with any text editor. All blocks in a CEX file are optional, so we can equally easily publish a single updated body of material (a new set of photographs of a manuscript, or a newly edited section of a text) or an entire compilation of our current archive in a single plain-text file. Because each CEX block is a table represented as lines of delimited text, generic tools from spreadsheets, databases, or ancient command-line utilities like `sed` and `grep` can be directly applied to CEX data, in addition to specialized code libraries we have developed that understand the semantics of citation with URNs. (See https://cite-architecture.github.io/ for more information about the cross-platform code libraries.)
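To give a sense of how little machinery such a file needs, here is a minimal Python sketch that splits CEX text into its labelled blocks. It is not one of the project's libraries. It assumes, based on our reading of the CEX specification, that a block begins with a line starting with `#!` followed by its label, that `//` starts a comment line, and that `#` is the column delimiter; the function name and the placeholder text in the sample are hypothetical.

```python
def parse_cex_blocks(cex_text, delimiter="#"):
    """Group the rows of a CEX file under their block labels.

    ASSUMPTIONS: '#!label' opens a block, '//' marks a comment line,
    and columns are separated by the delimiter.
    """
    blocks = {}
    current = None
    for line in cex_text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("//"):
            continue  # skip blank lines and comments
        if stripped.startswith("#!"):
            current = stripped[2:].strip()
            blocks.setdefault(current, [])
        elif current is not None:
            blocks[current].append(stripped.split(delimiter))
    return blocks


# Placeholder content for illustration; not actual HMT text.
SAMPLE = """\
#!cexversion
3.0

#!ctsdata
// one citable node per line: URN, then text
urn:cts:greekLit:tlg0012.tlg001.msA:1.1#placeholder text of line 1.1
urn:cts:greekLit:tlg0012.tlg001.msA:1.2#placeholder text of line 1.2
"""

for label, rows in parse_cex_blocks(SAMPLE).items():
    print(label, rows)
```

Because blocks are optional, the same function handles a file holding a single block or a full release.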

As a result, over the coming weeks you will see a series of short announcements of releases as we test and release one portion of our archive at a time.

Happy New Year, with complex data in simple formats!

Welcome to the HMT

This blog discusses new developments and on-going research related to the Homer Multitext project (www.homermultitext.org). The HMT seeks to present the textual transmission of the Iliad and Odyssey in a historical framework. Such a framework is needed to account for the full reality of a complex medium of oral performance that underwent many changes over a long period of time. Using technology that takes advantage of the best available practices and open source standards that have been developed for digital publications, the HMT offers free access to a library of texts and images and tools to allow readers to discover and engage with the Homeric tradition.