Wednesday, July 18, 2012

HTML CTS Kit

Abstract

Announcing for download a package of HTML, JavaScript, and CSS files that lets an author embed in an HTML page passages of text served by a Canonical Text Services (CTS) implementation. The author puts a CTS URN in the cite attribute of a <blockquote></blockquote> element whose class attribute is “cts-text”. E.g.:
<blockquote class="cts-text" cite="urn:cts:greekLit:tlg0012.tlg001.msA:1.1">Iliad, 1.1</blockquote>

Background

CTS stands for Canonical Text Services; it is the networked service developed for the Homer Multitext that allows discovery and retrieval of passages of texts using citations in URN format. In short, if an electronic edition or translation of a text is in a CTS service, a user or machine can request that passage using a documented protocol. All of the electronic texts edited for the Homer Multitext are exposed via a CTS service.

The Homer Multitext (HMT) has also developed an image service, which allows citation by URN to images and parts of images.

HTML CTS Kit

Anyone with experience in making web-pages in HTML knows how easy it is to include an image in a page:
<img src="http://url-to-image"/>
A web-browser will interpret that tag as a request to embed the identified image in the page, to show the image to the reader. In other words, the http://url-to-image will be resolved to the image itself.

This is how citation has always worked… an author includes a citation in a piece of writing, and the reader can resolve the citation to the quotation to which it points. In the digital age, we expect that resolution to happen automatically.

Web-browsers have always allowed urls to images to be resolved for readers, even when the images are on different servers from the server hosting the HTML page. It would be nice if text were as easy.

Canonical citation has been the foundation of Classical philology for centuries, and it is the heart and sole linking mechanism of the HMT. In the digital realm we have found this to be a rich, scalable, and flexible method for building a complex and diverse digital library. The HTML CTS Kit is a package of files that allows authors working in HTML to cite texts concisely using canonical CTS URNs, and have those URNs resolve to the passages to which they point.

Here is a demonstration of a page that uses URNs to cite both CTS texts and a region-of-interest on an image. The page that the reader sees has rich content; the underlying source is very concise:
<h1>High Resolution Scholarship</h1>

<p>The first five lines of the <i>Iliad</i> on the Venetus A:</p>

<blockquote class="cts-text" cite="urn:cts:greekLit:tlg0012.tlg001.msA:7.1-7.5">
Iliad 7.1-7.5</blockquote>

<p>The Summary of Book 7 from the Venetus A, in Dactylic Hexameter:</p>

<img class="cite-img" 
   src="urn:cite:hmt:chsimg.VA091RN-0263:0.2412,0.0845,0.4013,0.0295"/>

<blockquote class="cts-text" cite="urn:cts:greekLit:tlg5026.chs01.msA:7">
Book 7 Summary</blockquote>
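
For readers curious about the image URN above, here is a minimal sketch of how its region-of-interest could be read. It is not code from the kit. It assumes (and this is only an assumption) that the four trailing numbers are left, top, width, and height, each given as a fraction of the full image; the function names parseImageRoi and roiToPixels are made up for illustration.

```javascript
// Hypothetical helper, not part of the HTML CTS Kit.
// Splits an image URN with a region-of-interest into the image
// identifier and four numbers.
// ASSUMPTION: the numbers are left, top, width, height, each a
// fraction (0 to 1) of the full image's dimensions.
function parseImageRoi(urn) {
  const parts = urn.split(":");
  if (parts.length !== 5 || parts[0] !== "urn" || parts[1] !== "cite") {
    throw new Error("Not an image URN with a region: " + urn);
  }
  const nums = parts[4].split(",").map(Number);
  if (nums.length !== 4 || nums.some((n) => Number.isNaN(n))) {
    throw new Error("Expected four numbers in the region: " + parts[4]);
  }
  const [left, top, width, height] = nums;
  return { image: parts.slice(0, 4).join(":"), left, top, width, height };
}

// Converts the fractional region to whole pixels for an image of a known size.
function roiToPixels(roi, imageWidth, imageHeight) {
  return {
    left: Math.round(roi.left * imageWidth),
    top: Math.round(roi.top * imageHeight),
    width: Math.round(roi.width * imageWidth),
    height: Math.round(roi.height * imageHeight),
  };
}
```

Under that assumption, the URN in the demo would pick out a wide, short strip of the page, which is what one would expect for a few lines of verse.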

How it Works

An author can discover the CTS URN for a text by browsing the Homer Multitext’s CTS service, or any other implementation of CTS, such as this one from Furman, which contains Biblical texts. The URN for “Homer, Iliad, Edition based on the Venetus A, Book 7, lines 1–15” is:

urn:cts:greekLit:tlg0012.tlg001.msA:7.1-7.15
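
To make the anatomy of that URN concrete, here is a small sketch that splits it into its colon- and period-delimited parts. It is only an illustration, not code from the kit; the function name parseCtsUrn and the field names in the returned object are my own labels.

```javascript
// Hypothetical helper, not part of the HTML CTS Kit.
// Splits a CTS URN into its parts. Field names are illustrative labels.
function parseCtsUrn(urn) {
  const parts = urn.split(":");
  if (parts.length < 4 || parts[0] !== "urn" || parts[1] !== "cts") {
    throw new Error("Not a CTS URN: " + urn);
  }
  // The work component is period-delimited, e.g. "tlg0012.tlg001.msA".
  const [textGroup, work, version] = parts[3].split(".");
  // The passage component is optional; it is either a single
  // reference ("1.1") or a range joined by a hyphen ("7.1-7.15").
  const passage = parts.length > 4 ? parts[4] : null;
  const [start, end] = passage ? passage.split("-") : [null, null];
  return {
    namespace: parts[2],
    textGroup: textGroup,
    work: work || null,
    version: version || null, // a particular edition or translation
    passage: passage,
    start: start || null,
    end: end || start || null, // a single reference starts and ends at the same place
  };
}
```

Applied to the URN above, the sketch reports the namespace greekLit, the work component split into tlg0012, tlg001, and msA, and a passage running from 7.1 to 7.15.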

To cite a passage in an HTML page, an author can use the standard HTML5 blockquote element. This element is defined as allowing an attribute named cite; that attribute will hold the CTS URN. In order for the scripts in the HTML CTS Kit to recognize this blockquote as containing a CTS URN that should be resolved, the blockquote element should also have a class attribute, with a value of “cts-text”. A blockquote element should not be empty, so it is a good idea to put a human-readable citation inside it; if the citation cannot resolve for any reason, that will be what the reader sees. A finished citation, here for Iliad 1.1, will look like this:
<blockquote class="cts-text" cite="urn:cts:greekLit:tlg0012.tlg001.msA:1.1">Iliad, 1.1</blockquote>
Assuming the correct scripts and stylesheets have been included in the HTML page (instructions are here), this is what will happen:
  • When the page loads, the script will find all of these <blockquote>…</blockquote> elements and send a “GetPassagePlus” request via AJAX for each URN.
  • As the results of those requests come in, the scripts will process the XML returned by the CTS service, using XSLT stylesheets to turn the XML into fragments of HTML.
  • Those HTML fragments will be inserted into the page.
  • CSS stylesheets will give some attractive presentation to the newly inserted quotations.
The XSLT and CSS are of course entirely customizable by anyone who wants to change the structure or appearance of the resulting texts; what we provide is simply a default.
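
The steps above can be sketched in a few lines of plain JavaScript. This is not the kit's actual code (which relies on jQuery and Sarissa); it is a rough outline under stated assumptions. The service address is a placeholder, the request and urn query parameters follow my understanding of the CTS request format, and fetchXml and toHtml are hypothetical functions supplied by the caller to stand in for the AJAX call and the XSLT transformation.

```javascript
// Rough sketch only, not the HTML CTS Kit's implementation.

// Builds a GetPassagePlus request URL for one URN.
// ASSUMPTION: the service accepts "request" and "urn" query parameters.
function getPassagePlusUrl(serviceUrl, urn) {
  return serviceUrl + "?request=GetPassagePlus&urn=" + encodeURIComponent(urn);
}

// Finds every <blockquote class="cts-text" cite="..."> under `root`,
// requests its passage, and replaces its contents with the result.
// `fetchXml(url)` must return a Promise of the service's XML reply;
// `toHtml(xml)` must return an HTML string (standing in for the XSLT step).
function resolveCitations(root, serviceUrl, fetchXml, toHtml) {
  const quotes = root.querySelectorAll("blockquote.cts-text[cite]");
  return Promise.all(
    Array.from(quotes, (quote) => {
      const url = getPassagePlusUrl(serviceUrl, quote.getAttribute("cite"));
      return fetchXml(url)
        .then((xml) => {
          quote.innerHTML = toHtml(xml);
        })
        .catch(() => {
          // On any failure, leave the human-readable citation in place.
        });
    })
  );
}

// In a browser this might be called once the page has loaded, e.g.:
//   resolveCitations(document, "http://example.org/cts", myFetchXml, myToHtml);
// where the address and both helper functions are placeholders.
```

Note that a failed request leaves the blockquote untouched, which is why the human-readable citation inside the element matters.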

HTML CTS Kit uses the jQuery JavaScript library for most of its work, and the Sarissa library to process XSLT via JavaScript.

License and Download

Like all code and data in the Homer Multitext, the HTML CTS Kit is available under an open-content license, and we hope people will find it useful. The official guide is here. Download links are here.

Idea for improvement

None of this works inside Blogger. If anyone can make it work with Blogger, we would love to hear about it!

4 comments:

  1. Hi Chris! This is very cool. It seemed like it would be pretty feasible to get it working inside Blogger, so I gave it a shot. Following Tom Elliott's instructions for getting awld-js into Blogger, I copied/pasted the header from the demo page into my template. After converting relative script/CSS URL references to absolute ones, the hangup seemed to be with the inline JS for variable declaration (even wrapping it in an interior HTML comment, Blogger would mangle it on post renders). So I put that up in a separately-hosted JS file and referred to it with an absolute URL. The result seems to work fine!

    What's copied/pasted into the Blogger template <head>: https://gist.github.com/153d3705dafc64fb2285
    My variables.js: https://gist.github.com/0590229227496147b336

    1. Hmmm…perhaps I should have tested more. Seems to work fine for images, but text starts to load then disappears. I think this is because the URL the XSL is at needs CORS enabled - I think it would work for text as well after that (error console shows e.g. "XMLHttpRequest cannot load http://www.homermultitext.org/hmt-doc/guides/ctskit/xsl/chs-gp.xsl. Origin http://rfbaumann.blogspot.com is not allowed by Access-Control-Allow-Origin." for each marked up text element).

    2. You are amazing, Ryan! I'd gotten about 50% this far in my experimentation. It did seem that the XSLT was the problem... I bet if I knew how to set the http headers to allow cross-domain requests, it would work. Any thoughts?

    3. Check out http://enable-cors.org/ - they have instructions for setting up CORS on a number of servers. Note that depending on what else is hosted there, you might want to limit it to a specific directory/location/virtualhost for security considerations (i.e. for any authenticated portion of the site served out of the server, you probably don't want CORS enabled).
