Monday, August 30, 2010

Editing manuscripts with text and image services

An earlier post briefly illustrated one way that the Homer Multitext project is using a dynamic image service to help editors inventory the scholia in the Venetus A manuscript of the Iliad. A post last week illustrated one way that the project's Canonical Text Service is helping automate comparison of different texts of the Iliad. Taken together, image services and automated collation of editions are a potent combination for editors.

The initial note on comparing two manuscripts used the chart linked from this thumbnail. I have since updated the note to use the chart linked from this second thumbnail. Why the dramatic difference in the report on Iliad 17?

The project's initial edition of the Venetus A was created by a dedicated team of undergraduate Fellows, who worked from the apparatus of T.W. Allen's critical edition to "reverse engineer" a text of the Venetus A, before the project was able to digitize the manuscript in 2007. In 2010, as the scholia are being inventoried, this edition is gradually being checked against the direct evidence of the digital images, but book 17 is still based on Allen's printed information. In his apparatus to 17.729 (vol. 3, p. 166), Allen notes tersely "729-761 om. A". The HMT Fellows correctly interpreted this to mean that Venetus A does not include the last 32 lines of book 17, and consequently struck them from our edition. This is the basis for the first chart above.

Now, with the digital images in hand, we see something more interesting. Folio 237 verso is written in the familiar tenth-century hand of most of the manuscript, and includes scholia. (See the zoomable image, including scholia, linked from this thumbnail.) Folios 238 recto and verso, however, are a replacement for a lost original, perhaps in the hand of Cardinal Bessarion himself. Compare the zoomable image linked here.

Evidently, in this instance at least, Allen decided that "A" was to mean "the tenth-century A only". The HMT edition prefers instead to take "A" as the entire, continuous Iliadic text, since our electronic edition and indices can distinguish the folios added later from the tenth-century originals, and leave open for any particular application the question of whether to work with tenth-century text only, later text only, or the entire text.

Allen's apparatus has no way to communicate this. The ambiguously compressed note "omisit" would most naturally suggest that the last 32 lines of book 17 were never part of A (as the HMT Fellows took it to mean). There is no hint that the manuscript, as we have it, completes A through 17.761.

The most important point is not whether or not we take issue with Allen's phrasing, however. What is significant is rather that tools like automatic collation can call our attention to passages that stand out or appear unusual; automated associations with our image service then allow us to unravel a trail of evidence that has vanished from the app.crit. of Allen's "definitive" edition. Editors are now manually collating the text of the last 32 lines of Venetus A's book 17. Among the interesting questions we will be able to consider: what source did Bessarion use for filling out the missing folio? An automated comparison with the Venetus B might be revealing — perhaps a subject for a future blog post.

A final methodological observation: all the images in this post are created dynamically. References to either Google's Chart service or the Homer Multitext project's image service return image data that can be used as you like, including embedding in a web page.
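To make "created dynamically" concrete, here is a minimal sketch of how a stacked bar chart can be requested purely by URL. It is an illustration, not the code behind the charts in this post: the endpoint is assumed to be the Google Chart image API address in use at the time (the service has since been deprecated), the function name `stacked_bar_url` and the colors are hypothetical, and the data values are made up.

```python
from urllib.parse import urlencode

# Assumed historical endpoint of the Google Chart image API.
CHART_BASE = "http://chart.apis.google.com/chart"

def stacked_bar_url(only_a, only_b, size="400x200"):
    """Build a URL for a vertical stacked bar chart with two data series."""
    params = {
        "cht": "bvs",  # chart type: vertical bars, stacked
        "chs": size,   # chart size in pixels, WIDTHxHEIGHT
        # Text-encoded data: series separated by "|", values by ","
        "chd": "t:" + ",".join(map(str, only_a)) + "|" + ",".join(map(str, only_b)),
        "chco": "000080,87CEEB",  # one color per series (dark blue, light blue)
    }
    # Keep ":", "|" and "," readable instead of percent-encoding them.
    return CHART_BASE + "?" + urlencode(params, safe=":|,")

# Made-up counts for two books, purely for illustration.
print(stacked_bar_url([3, 0], [1, 5]))
```

The resulting string can be dropped into the `src` attribute of an image tag; nothing is stored as a file anywhere.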

Thursday, August 26, 2010

Comparing two manuscripts with CTS

A digital multitext can make it easier for readers to compare different versions of the Iliad; it can also enable new kinds of systematic, machine-assisted comparisons. For example: since we now have complete texts of the Venetus A and Venetus B manuscripts of the Iliad in the Homer Multitext project's Canonical Text Service, we can use the service's knowledge about the citation of each version to find the vertical variation between Venetus A and Venetus B (that is, what lines are present or absent in the two manuscripts).

The chart linked to this thumbnail image summarizes the variation book by book. The number of "plus" and "minus" verses are counted for each of the 24 books of the Iliad (the x axis); the dark blue section of a bar represents the number of lines that appear in A but not B; the light blue section represents the number of lines that appear in B but not A. (Phrased differently, if we are taking A as a reference text, and comparing B to it, we could say that the dark blue section represents "plus verses," and the light blue section represents "minus verses.")
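For readers who want to see the counting spelled out, the book-by-book tally can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the project's code: the function name `vertical_variation` is hypothetical, references are assumed to be simple "book.line" strings, and the two short lists are invented sample data rather than real manuscript contents. In practice the lists of references would come from what the text service reports for each version.

```python
from collections import Counter

def vertical_variation(refs_a, refs_b):
    """Count, per book, lines cited in A but not B, and in B but not A.

    Each reference is a "book.line" string such as "17.729".
    Returns {book: (only_in_a, only_in_b)} for books that differ.
    """
    set_a, set_b = set(refs_a), set(refs_b)
    # The book number is whatever precedes the first "." in the reference.
    only_a = Counter(int(r.split(".")[0]) for r in set_a - set_b)
    only_b = Counter(int(r.split(".")[0]) for r in set_b - set_a)
    books = sorted(set(only_a) | set(only_b))
    # A Counter returns 0 for books it has not seen.
    return {book: (only_a[book], only_b[book]) for book in books}

# Invented sample data, not real manuscript contents.
a = ["1.1", "1.2", "1.3", "2.1"]
b = ["1.1", "1.3", "2.1", "2.2", "2.3"]
print(vertical_variation(a, b))  # {1: (1, 0), 2: (0, 2)}
```

The first number in each pair corresponds to the dark blue section of a bar, the second to the light blue section.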

Even a simple example like this creates a view of two manuscripts that would be prohibitively tedious to construct from print editions — and since there is no complete print edition of either Venetus A or Venetus B, would be impossible in any case. As we think about how to read and compare material in a digital multitext, we will have to go beyond our experience with print editions to rethink what it means to read and compare texts.

I'll reserve the subject of horizontal variation for another blog post.

Update: I've posted a slightly geekier but related discussion here.

Saturday, August 21, 2010

Updates to text services

This summer, the implementation of Canonical Text Services (or CTS) running on Google AppEngine has benefited from a number of updates, but one of the most important changes was added with a single line of code. An initiative of the World Wide Web Consortium now defines a new mechanism for explicitly granting permission for other sites to use data drawn from services like CTS. (For the technical documentation, see the July 27, 2010, draft of the W3C's "Cross-Origin Resource Sharing" document here.) The current version of CTS for AppEngine uses the Access-Control-Allow-Origin header to permit programs from any location on the internet, without restriction, to interoperate with a CTS.
In plain English, this one-line change means that other programs can automatically talk to the Homer Multitext project's services. In isolation, this may seem a small step, but it nevertheless brings us closer to a point where lower-order scholarly activities — look this passage up, search for this term, compare these two passages — can be formally specified, and automatically carried out and evaluated, freeing scholars to focus instead on what these exercises mean.
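To show how small the change is, here is a sketch of the same idea in Python's standard WSGI interface. It is not the code of CTS for AppEngine: the names `cts_app` and `allow_any_origin` are hypothetical, and the XML body is a placeholder. The only part taken from the discussion above is the header itself, `Access-Control-Allow-Origin`, set to `*` to admit requests from any origin.

```python
def cts_app(environ, start_response):
    """Stand-in for a text service endpoint; the reply is a placeholder."""
    body = b"<reply>placeholder</reply>"
    headers = [("Content-Type", "text/xml"),
               ("Content-Length", str(len(body)))]
    start_response("200 OK", headers)
    return [body]

def allow_any_origin(app):
    """Wrap a WSGI app so every response grants access to all origins."""
    def wrapped(environ, start_response):
        def start_with_cors(status, headers, exc_info=None):
            # This is the "single line" that matters: one extra header.
            headers = list(headers) + [("Access-Control-Allow-Origin", "*")]
            return start_response(status, headers, exc_info)
        return app(environ, start_with_cors)
    return wrapped

# The wrapped app can be handed to any WSGI server.
application = allow_any_origin(cts_app)
```

With the header present, a script running in a page served from another site is permitted by the browser to read the service's replies.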
How might our scholarship change if we could use software to assess its machine-actionable foundations? One place classicists (and humanists more generally) could profitably look for examples is software development. CTS for AppEngine takes a "test-driven" approach. In test-driven software development, tests for evaluating a program are specified first; programs are written subsequently; whether they pass the tests or not can then be automatically evaluated.
The CTS test suite defines a set of queries (ca. 100) about a sample corpus of texts, and information about the responses it expects to receive from those queries. Today, we released on sourceforge an updated version of the ctsvalidator, a program that runs the test suite of queries against any CTS with the test data installed, and reports in some detail on how well the particular installation complies with the requirements of the CTS protocol. Development of CTS for AppEngine also reached a landmark when for the first time an installation passed 100% of the tests. (The demo of CTS at now bats 1.000, with 100 correct replies to 100 queries.)
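The shape of such a validator is simple enough to sketch. What follows is not the ctsvalidator itself: `run_suite`, `toy_service`, the three test cases, and the error string are all hypothetical stand-ins, and a real run would send each query to an installed service over HTTP and compare XML replies rather than plain strings.

```python
def run_suite(service, cases):
    """Run each (query, expected) pair against `service`.

    Returns the number of passing cases and a list of
    (query, expected, actual) tuples for the failures.
    """
    failures = []
    for query, expected in cases:
        actual = service(query)
        if actual != expected:
            failures.append((query, expected, actual))
    return len(cases) - len(failures), failures

def toy_service(query):
    """Hypothetical stand-in for a text service with two known passages."""
    texts = {"1.1": "first line", "1.2": "second line"}
    return texts.get(query, "ERROR: unknown reference")

# The expectations are written down before the service is consulted.
cases = [
    ("1.1", "first line"),
    ("1.2", "second line"),
    ("9.9", "ERROR: unknown reference"),
]

passed, failures = run_suite(toy_service, cases)
print(f"{passed}/{len(cases)} passed")  # 3/3 passed
```

Note that the third case tests a failure on purpose: a compliant service has to answer bad requests in the expected way, too.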
We're rapidly approaching a stage where scholarly claims about versions of the Iliad can be expressed as assertions to test against the contents of a digital multitext, and so liberate us to consider what these claims might mean.


Wednesday, August 11, 2010

Homeric Canonical Text Services

Updated versions of two services on Google's AppEngine platform are now delivering Homeric texts:
Both sites are running on a new, pre-release version of the Canonical Text Services developed at the Center for Hellenic Studies; we expect to release the new version this month on the project's sourceforge site,

Thursday, July 22, 2010

Inventorying the scholia to the Iliad

Previous print publications, both individually and collectively, offer only a selection of the scholia to be found in manuscripts. As one part of a summer research internship at the College of the Holy Cross, Melissa Browne and Frankie Hartel collaborated with Profs. Mary Ebbott and Neel Smith to create the first complete inventory of scholia in the Venetus A manuscript, for books 3 and 4 of the Iliad. Each scholion has a unique identifier, and is assigned to one of the distinct groups of scholia distinguished by their placement on the folio, by orthographic features, and, perhaps to a greater extent than it has been possible to appreciate from print publication, by their contents.

The scholia are also indexed to the images Browne and Hartel are using to create an edition of the texts. The digital "working notebooks" Browne and Hartel developed are now being published at, where references to regions of images are used to dynamically embed sections of images in the web page, and to create a color-coded overview of the contents of each folio side.

Wednesday, July 14, 2010

Google cites Venetus A as historic example of organizing scholarly content

In recent weeks the official Google blog and Inside Google Books have linked to images of the Venetus A published via the Homer Multitext. Both posts stress Google's commitment to digital humanities research, the potential of text mining and other quantitative research techniques for the Humanities, and their interest in research on the ancient world in particular.

(Image courtesy of the Biblioteca Nazionale Marciana, Creative Commons Non-Commercial Share Alike 3.0 license)

Monday, July 12, 2010

Homeric Papyri and the Homer Multitext

The publication of ancient papyrus texts has always been central to the goals of the Homer Multitext project. The Homeric papyri are, with the exception of some ancient quotations, the oldest surviving witnesses to the text of Homer. The medieval manuscript tradition of Homer begins with the tenth century CE manuscripts of the Iliad known as D (Laurentianus 32.15) and Venetus A (Marcianus Graecus 454). Some papyrus fragments predate the medieval tradition by as many as 1200 years. In a 2001 article [Dué 2001a; on-line version], I argued that the multiformity of the Homeric texts, as evidenced by the earliest quotations of Homer and the Ptolemaic papyri, calls for a new approach to editing the texts of Homer. Building on the work of Gregory Nagy (especially Nagy 1996a), who was himself building on the insights of Parry and Lord into the oral traditional nature of Homeric poetry, I suggested that a web-based, “multitext” edition would be truer to the complexity of the transmission of the Homeric poems, which are oral-derived texts composed in performance. The texts as we now have them are the product of many singers over the course of many generations. What Parry and Lord’s work shows us most essentially is that there is not one original text that we should try to reconstruct. Instead of reconstructing an “original text,” the aim of the Homer Multitext, now at last becoming a reality after a decade of research and planning, is to present a series of complete, historically contextualized texts, together with images, and a variety of tools with which users can compare and analyze these historical documents.

The Homeric papyri are all fragmentary, and range in date from as early as the third century BCE to the seventh century CE. The vast majority of the fragments were discovered in Egypt, and now reside in collections located all over the world. They give us an otherwise irrecoverable picture of the Iliad and Odyssey as they were performed and recorded in ancient times. When taken altogether, Homeric papyri reveal a state of the Homeric texts in antiquity that can be quite surprising. There are numerous verses in the papyri that are seemingly intrusive from the standpoint of the medieval vulgate. These additional verses, the so-called plus verses, are not present in the majority of the medieval manuscripts of the Iliad and Odyssey. Other verses that are canonical in the medieval manuscripts are absent from the papyri—these may be termed minus verses. Also prevalent is variation in the formulaic phrasing within lines. In other words, it seems from this most ancient evidence that the poems were performed and recorded with a considerable amount of fluidity in antiquity. It is not until about 150 BCE that the papyrus texts begin to stabilize and present a relatively more uniform text.

The early Homeric papyri are the vestiges of a once vibrant performance tradition of the Iliad and Odyssey (see especially Nagy 1996a and Dué 2001a). In such a tradition no poem is ever composed, performed, or recorded in exactly the same way twice. In the earliest stages of the Iliad and Odyssey, each performance would have resulted in an entirely new composition. By the time of the first papyrus fragments, the oral composition and performance tradition of Homeric epic poetry had died out. But variation in the ancient textual tradition, the reflexes of this once oral and performative tradition, persisted for several more centuries. These variations, preserved for us in the Homeric papyri, are a unique window into the oral tradition that we have lost.

And yet in Homeric textual criticism, the papyri are not always given the weight that their antiquity should bestow on them. The variations are dismissed by a variety of strategies, including the often-cited assertion that the variations are banal and uninteresting, and the labeling of the Ptolemaic papyri as “wild” or “eccentric” (for counterarguments, see especially Dué 2001a and 2001b and Dué and Ebbott 2009). In several publications I have suggested that the Medieval transmission is given more authority than the papyri precisely because modern editors find the multiformity of the papyri and early quotations disturbing (see especially Dué 2006 as well as Dué 2001a and 2001b, Dué and Ebbott 2009). The seeming fluidity of these earliest witnesses conflicts with a basic desire (among Classicists at least) to find a single text and a single author behind our Iliad and Odyssey. The Medieval transmission, while by no means reducible to a single “vulgate” text, is more uniform, and offers the mirage of a reconstructable original that is just beyond the reach of our sources. This mirage has enticed many an editor to attempt to reconstruct what “Homer” actually composed (see especially Dué 2006).

Neither I nor my co-editors of the Homer Multitext are seeking to privilege the papyri in any special way over the Medieval transmission; rather we seek simply to make them available to scholars and anyone interested in the transmission of the Homeric poems over the course of three millennia or more, and to suggest that they have great historical value in the picture they present of the state of the Homeric texts in the earliest state in which we have them. Modern editions of the Iliad and Odyssey report papyrus readings only very selectively. The nature of a critical apparatus, moreover, necessarily obscures the context from which these readings arise. Not only can it be hard to locate the date or geographical origin of a particular papyrus when it is cited (in a highly abbreviated form) in an apparatus, it is also nearly impossible to reconstruct the character of the papyrus text that is being cited as a whole. In other words, is a particular reading one isolated variant, or is the papyrus as a whole quite multiform from the point of view of the Medieval transmission? Is the text preserved on the papyrus short or long? Is what survives a few letters per verse, whole verses, or something in between? Is the papyrus a deluxe edition of the text, a school text, a commentary? These are just a few of the questions that are almost impossible to answer by studying a critical apparatus alone. The limitations of the printed page of course prohibit including such information in a typical printed edition of the text.

But a web-based edition need not be limited in the same way, and can present complete historical documents side by side, as transcribed texts and as images. While the physical experience of touching the paper or parchment may be difficult to convey in digital form, metadata conveying such information can be easily included in the digital image files and precise scholarly descriptions can be linked. The editors of the Homer Multitext plan to do exactly this with the Homeric papyri. It is our goal to build a library of TEI XML-encoded diplomatic editions of the papyri, and to cooperate with scholars, libraries, and collections to put images, descriptions, and metadata for these papyri on-line. An initial set of editions, now available here, has been created by a group of graduate students. These students are now scholars in their own right. It is our hope that they and other interested scholars will contribute more such editions as the project develops, and help us to develop the standards for such editions. The initial set referenced here is really, we hope, just the beginning of a collaborative effort that will include contributions from many people.

The idea of publishing the variations present in papyrus texts in digital form long predates the Homer Multitext. Homer and the Papyri was a project first created and edited by Professor Dana S. Sutton of the University of California, Irvine, who published it on CD-ROM and later on the web. Homer and the Papyri, as it was established by Professor Sutton, was a website consisting of a) lists of published papyri and related items for the Iliad and the Odyssey, and b) a repertoire of the textual variants presented by this body of material, hypertextually linked to the lists of papyri. In 2001 Professor Sutton handed Homer and the Papyri over to the Center for Hellenic Studies, with a view to its continuation and incorporation into the publications of the Center, including a multitext edition of Homer. (Dana Sutton’s introduction to his original web-based edition may be found on the CHS website.) At that time Casey Dué, Mary Ebbott, and Dimitrios Yatromanolakis were appointed as editors, and a team of advisors selected. In 2005 we asked John Lundon to join our team of editors, and Alexander Loney became a contributing editor. Since then, Bart Huelsenbeck has also been a frequent contributor to the project.

When Professor Sutton first handed over Homer and the Papyri to the CHS team, the Homer Multitext project was in its infancy, and many questions immediately presented themselves. How could the data that Sutton had amassed be sustained over the long term? How could this data become interoperable within the architecture of the Homer Multitext? These somewhat technical questions raised more theoretical questions. Homer and the Papyri was an HTML list of variants, not complete texts of the papyri. How, then, to define a variant? What is the “original” from which the variants deviate? As I have noted, even the term “variant” fundamentally clashes with the findings of Parry and Lord that are the foundation on which the Homer Multitext project has been conceived. Sutton himself used a number of modern printed editions as points of comparison, and acknowledged some of the problems involved in doing so, including a lack of an equivalent for the Odyssey of T. W. Allen’s editio maior of the Iliad (see Sutton’s introduction to Homer and the Papyri).

The new editors quickly realized that a new approach to the project would be necessary, one that required a number of interconnected and labor-intensive action items. First, Sutton’s data needed to be converted to TEI-XML, for its long-term stability and so that it could be interoperable with other projects. Second, new papyri needed to be incorporated and assigned numbers in a systematic way. Not only are new papyri published every year (with new “variants,” however those are defined), often old papyri are joined, and so no longer require separate numbers. New descriptions must be written for the newly published or joined papyri and a bibliography maintained. Third, we decided that we could expand the project’s utility by incorporating the data into a fully searchable relational database. Such a database was created by Michael Jones, with the cooperation and supervision of the Stoa Consortium, at that time edited by Anne Mahoney and Ross Scaife. This database allows the user to search in one of six fields, such as title (Iliad or Odyssey), book number, and line number. There are also fields for variants, witnesses, and a more general description field, in which the user may search for special features (such as material, location, or editor). The database, however, is flawed, for reasons that I will discuss further below. Our more theoretical concerns, moreover, were not solved.

These first three action items were our initial goal, and occupied several years of work on the project. But by this time, Martin West’s (1998-2000) Teubner edition of the Iliad had appeared. Not only did this edition track more papyrus readings than had been done by previous editors, it included a list of all Iliad papyri (including the papyri Sutton called “witnesses” and “Homerica”), and this list contained nearly 800 additional unpublished papyri in the Bodleian library, thereby doubling the previously known number. (This list was also published as chapter 4 of West’s Studies in the Text and Transmission of the Iliad [West 2001a].) The editors of the new Homer and the Papyri faced a new dilemma. If we continued to assign numbers and  incorporate new papyri as they were published, our list would conflict with West’s. Our initial decision was to track the differences in a “comparatio numerorum” table. We have since had cause to reevaluate this decision, and are still debating the best solution.

But we faced a far greater dilemma in our continuation of the practice of reporting variants. Should the publication of West’s new edition affect what variants we report for the Iliad? Even as most papyrologists were beginning to make use of West’s edition for their own supplements when publishing new fragments, we wrestled with the idea of making it our default, notional text. Might not the Venetus A, the oldest complete text of the Iliad, make a better, more historical point of comparison? Yet the Venetus A is itself in its own way just an arbitrary edition. In fact, any one version of the text, whether historical or constructed in modern times, is simply one version. Providing only the “variants,” in isolation from their context (as Sutton’s method had been), is misleading, because it suggests that there is an historical “original” from which the variants are varying. For the Homeric poems, that’s simply not the case.

We realized that we wanted to undertake something quite different from what the founding editor, Dana Sutton, had originally envisioned when the internet was still quite new and few standards existed. Moreover, as we continued to test the new database, its problems became increasingly glaring. As is inevitable with a large amount of data entered manually in an unstructured way (I mean by using HTML, which is a descriptive markup system, rather than XML, which is far more structured), we found numerous errors and contradictions in the data. These errors and a general lack of uniformity, despite the XML structure we attempted to impose on it, to this day prevent the database from working properly. Though it does have some functionality, few users have been able to use it regularly and successfully.

It soon became clear that in order for Homer and the Papyri to become current, useful, and fully integrated within the Multitext, we needed to conceive of the material in a new way. Therefore, just as we had begun to do for the Medieval manuscripts and their scholia, we began to commission new TEI-XML encoded diplomatic editions of the Homeric papyri. These papyri will be published as part of the Homer Multitext by means of the same services and tools that have been developed in conjunction with the manuscripts.

The editors of the Homer Multitext feel that this new vision is true to Dana Sutton’s project, whose aim was to make accessible to interested people and scholars the multiform texts that survive on papyrus. Not only will users be able to access these papyri as complete, diplomatic texts, they will also be able to view them side by side with other historical documents, including other papyri and Medieval manuscripts.

Accomplishing what we envision - a complete library of TEI-XML encoded diplomatic editions of all published Homeric papyri - will require a great deal of work. We very much welcome contributions from other editors, and such contributions will be properly attributed and given recognition. (All contributions must be openly licensed under a Creative Commons license.) We also very much hope to include images from collections who will allow publication under a Creative Commons License, and plan to link to those existing images on-line that have stable URLs. If you are interested in contributing diplomatic editions and/or images to the Homer Multitext please contact Casey Dué (casey at and Mary Ebbott (ebbott at

Works Cited and Further Reading

[Allen 1924] Allen, T. W. Homer: The Origins and Transmission. Oxford, 1924.
[Allen 1931] Homeri Ilias. Oxford, 1931.
[Dué 2001a] Dué, C. “Achilles’ Golden Amphora in Aeschines’ Against Timarchus and the Afterlife of Oral Tradition.” Classical Philology 96 (2001): 33-47.
[Dué 2001b] “Sunt Aliquid Manes: Homer, Plato, and Alexandrian Allusion in Propertius 4.7.” Classical Journal 96 (2001): 401-413.
[Dué 2002] Homeric Variations on a Lament by Briseis. Lanham, Md.: Rowman and Littlefield Press, 2002.
[Dué 2006] “The Invention of Ossian.” Classics@ 3 (2006).
[Dué and Ebbott 2009] Dué, C., and M. Ebbott. “Digital Criticism: Editorial Standards for the Homer Multitext.” Digital Humanities Quarterly 3.1 (2009).
[Haslam 1997] Haslam, Michael. "Homeric Papyri and Transmission of the Text." In I. Morris and B. Powell, eds., A New Companion to Homer. Leiden, 1997.
[Lord 1960] Lord, A. B. The Singer of Tales. Cambridge, Mass., 1960. 2nd rev. edition, 2000.
[Lord 1991] Epic Singers and Oral Tradition. Ithaca, N.Y., 1991.
[Lord 1995] The Singer Resumes the Tale. Ithaca, N.Y., 1995.
[Nagy 1996a] Nagy, G. Poetry as Performance. Cambridge, 1996.
[Nagy 1996b] Nagy, G. Homeric Questions. Austin, TX, 1996.
[Nagy 2000] Nagy, G. Review of Martin L. West (ed.) Homeri Ilias. Bryn Mawr Classical Review 2000.09.12.
[Nagy 2002] Plato’s Rhapsody and Homer’s Music: The Poetics of the Panathenaic Festival in Classical Athens. Cambridge, Mass., 2002.
[Nagy 2004] Homer’s Text and Language. Champaign, IL, 2004.
[M. West 1998–2000] West, M., ed. Homeri Ilias. Recensuit / testimonia congessit. Stuttgart and Leipzig, 1998–2000.
[M. West 2001a] Studies in the Text and Transmission of the Iliad. Munich, 2001.
[M. West 2001b] “West on Nagy and Nardelli on West.” Bryn Mawr Classical Review 2001.09.06.
[M. West 2004] “West on Rengakos (BMCR 2002.11.15) and Nagy (Gnomon 75, 2003, 481–501) on West: Response to 2002.11.15.” Bryn Mawr Classical Review 2004.04.17.
[S. West 1967] West, Stephanie. The Ptolemaic Papyri of Homer. Köln, 1967.

* Papyrus image courtesy of

Sunday, July 11, 2010

Digitizing Homeric Manuscripts at El Escorial

This post will describe, briefly, the technology for digitization of two Iliadic manuscripts in the collection of the Real Monasterio de El Escorial, outside of Madrid, Spain.

Casey Dué has provided some initial notes on these in the previous post. The two manuscripts were created in the 11th century CE. Their catalogue numbers are: Escorialensis ω.I.12 (513 = Allen E4) and Escorialensis Υ.I.1 (294 = Allen E3).

For this digitization work, we are collaborating closely with Dr. Brent Seales of the University of Kentucky’s Center for Visualization and Virtual Environments. Aspects of this work have been funded by the National Science Foundation.

We see this as an exciting opportunity both to advance our humanist scholarship on oral poetry and the history of Homeric texts, and to advance the integration of technologies for multi-modal imaging of cultural heritage objects in the field. For these manuscripts, we hope to capture multi-spectral images and 3-dimensional surface maps, and ultimately to integrate these by means of the networked infrastructure developed by the Homer Multitext.

The manuscript rests on the Conservation Copystand built for the CHS by Manfred Meyer. The camera is a medium-format bellows camera with a digital back. The digital sensor is monochromatic, with 38 megapixels. The resolution is a good thing, and the lack of color is also a good thing. In a normal color digital camera of, say, 24 megapixels, there is a color filter laid over the sensor. To a first approximation, of the 24 million pixels, 8 million will be filtered through red, 8 million through green, and 8 million through blue. So each full-color "pixel" will consume three pixels of resolution. The software in the camera will merge the three pixels into one full-color pixel, at the cost of some softness to the image.

Our black-and-white camera has no color filter in front of the sensor. This does not mean that we won’t have lovely color images of these manuscripts, however.

The lights for this photography consist of banks of LED lights, with each bank of LEDs emitting a specific frequency of light. There are thirteen banks, ranging from ultraviolet, through the visible spectrum (blues, greens, oranges, reds) down to several levels of infrared. The camera and lights are controlled by a computer, which will automatically cycle through the spectra of light, taking a picture for each one.

The result is thirteen monochromatic images, each showing particular features of the page, as different kinds of ink and different kinds of stains or damage reflect differently.

At the end, the thirteen images can be merged to create full-color images that take advantage of the full resolution of the sensor. Other “false color” images can be generated to suit particular kinds of analysis.
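The merging step is easy to picture with a toy example. The sketch below is an assumption-laden simplification, not the team's processing software: `merge_bands` is a hypothetical name, images are plain lists of rows of 0-255 intensities, and only three of the thirteen bands are used, one each for the red, green, and blue channels. Real processing would weight and combine many bands.

```python
def merge_bands(red, green, blue):
    """Combine three same-sized monochrome images into one RGB image.

    Each input is a list of rows of integer intensities (0-255).
    The result has one (r, g, b) tuple per pixel, so every sensor
    pixel contributes to the color image at full resolution.
    """
    if not (len(red) == len(green) == len(blue)):
        raise ValueError("band images must have the same number of rows")
    return [
        list(zip(r_row, g_row, b_row))
        for r_row, g_row, b_row in zip(red, green, blue)
    ]

# A made-up 1x2 "page" photographed under red, green, and blue light.
red_band = [[255, 0]]
green_band = [[0, 255]]
blue_band = [[0, 0]]
print(merge_bands(red_band, green_band, blue_band))
# [[(255, 0, 0), (0, 255, 0)]]
```

A "false color" image is the same operation with a different choice of inputs, for instance passing an infrared band where the red band would normally go.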

In addition to this digital photography, the team is capturing structured light data using a custom-programmed projector tied to the camera. The projector uses a laser, rather than a bulb, which allows it to maintain perfect focus across an uneven surface. By projecting a series of images onto the surface of a page, and by processing the resulting pictures of that page, the team can create a 3-dimensional model of the surface. This model, in turn, can be used to remove distortions from the text, or to make a vividly realistic digital reconstruction of the page and its text.

This project relies heavily on the talents of many people. Brent Seales provided the vision of integrating this technology with humanist inquiry, and raised the funds that made the project possible. Matt Fields, Ryan Bauman, and Dan Staley are our indefatigable experts on the computer-and-imaging systems. David Jacobs protects the books with his expertise as a conservator. Juan Garces provides liaison and his professional skills as a Greek scholar and curator. Chris Collins provides high-tech environmental monitoring equipment. Amy Blackwell oversees the video team and works with David on handling the manuscripts. Casey Dué, Mary Ebbott, Neel Smith, and Christopher Blackwell stand by to see what discoveries these manuscripts may reveal.

The staff of El Escorial, particularly Director José Luis del Valle Merino, and his assistant, Padre Fabian, have been warmly welcoming, enthusiastic, and generous. It is a great privilege to collaborate with these professionals and to work in such an exalted space.

The raw data from this work will be archived, and available for use, at the Homer Multitext’s data archive at the University of Houston. Human interfaces to the data will emerge as we conduct post-processing, indexing, and linking during the late summer and autumn of this year.

Tuesday, July 6, 2010

Some preliminary notes and bibliography for Escorial Iliad manuscripts E3 and E4

(Updated 1/6/2011) As work begins in Madrid, I thought it would be helpful to gather here some preliminary notes and bibliography for the two manuscripts of the Iliad that are being digitized over the next few weeks. Once we have had a chance to study the images, I or others on the team should be able to improve upon these initial notes, which were taken before arriving in Spain.

E3 (= West E, Escorialensis Υ.I.1) is an eleventh-century parchment codex consisting of 336 folios, containing Iliad 1.1–24.717 with accompanying scholia. The first seven folios have been restored by later hands (folio 1 in the fifteenth century, folios 2–7 in the thirteenth century). Individual books are preceded by a one-verse metrical summary (the same one-verse summaries that you find in Venetus B, but occasionally the one from A is also added in a later hand; see, e.g., folio 40r, the beginning of book 3). There are no hypotheses, subscriptions, or critical signs. The text and scholia in this manuscript are closely related to the ones in the Venetus B, which is also from the eleventh century; Maniaci (2006) has argued that Venetus B and E3 are “twins,” in that every folio matches the layout and content of the corresponding folio in the other manuscript. (As Bethe first noted, it is only the oldest, numbered set of scholia from B that is found in E3.) According to the catalogue, the manuscript was purchased in Venice in 1572 by Guzmán de Silva for Philip II, which supports the connection between the two manuscripts, though of course all three were almost certainly produced in Constantinople, not Venice, ca. 400 years before coming to Venice. Venetus A, Venetus B, and E3 all have the same style of binding.

E4 (= West F, Escorialensis Ω.I.12) is another eleventh-century parchment codex, thought by Allen to be later than E3, consisting of 216 folios, containing a complete text of the Iliad, a commentary with lemmata on Iliad 1–2.300, hypotheses, lives of Homer, a summary of the Cypria, an excerpt from the Batrachomyomachia (“Battle of Frogs and Mice”), excerpts from Porphyry, and other scholia with lemmata. The main text of the Iliad begins on folio 7, where a new set of scholia likewise begins. Individual books are preceded by hypotheses and a one-verse metrical summary (the same one-verse summaries that you find in Venetus A), and the right columns consist of a paraphrase. According to Allen (1931:148), E4 is not related to any of the other early minuscule manuscripts. The scholia seem to have been collected from several different sources. There is a set of numbered scholia which corresponds to the numbered scholia in B, E3, and Laurentianus 32.3 (= West C). There is another set of scholia in the same hand, linked to the text through signs, which contains material from the so-called “D scholia” (also known as the scholia minora). This set of scholia is also found in B, but there it is in the second, later hand. The manuscript seems to have been acquired in Venice for the price of 25 ducats, according to a subscription on the last folio (liber mei Benedicti Cornelii quem emi meis pecuniis pretio ducatorum viginti q).

[The image is of folio 124 recto of manuscript E3, showing the beginning of book 10 of the Iliad.]

Bibliography (in order of publication)

Tychsen, T. C. “Beschreibung der Handschriften des Homer in der Escurial.” Bibliothek der alten Litteratur und Kunst VI (1789): 134–144.

Bekker, I., ed. Scholia in Homeri Iliadem. Berlin, 1825–1827.

Miller, E. Catalogue des Manuscrits Grecs de la Bibliothèque de l’Escurial. Paris, 1848.

Dindorf, W., ed. Scholia Graeca in Homeri Iliadem. Oxford, 1875–1888.

Bethe, E. “Zwei Iliashandschriften des Escorial.” Rheinisches Museum für Philologie Neue Folge 48 (1893): 355–379 and 484.

Allen, T. W. Homeri Ilias. Vol. I–III. Oxford, 1931.

Revilla, A., ed. Catálogo de los códices griegos de la biblioteca de el Escorial. Vol. I. Madrid, 1936.

de Andrés, G., ed. Catálogo de los códices griegos de la Real biblioteca de el Escorial. Vol. II–III. Madrid, 1965–1967.

Erbse, H., ed. Scholia Graeca in Homeri Iliadem. Berlin, 1969–1988.

West, M. L., ed. Homeri Ilias. Stuttgart and Leipzig, 1998–2000.

Thursday, July 1, 2010

Digitization of 2 Iliad manuscripts in the Escorial to begin next week

Next week Mary Ebbott, Christopher Blackwell, Neel Smith, and I will travel to Madrid, where we will meet up with David Jacobs and Juan Garcés of the British Library and Brent Seales and a team of researchers from the University of Kentucky's Center for Visualization and Virtual Environments. Our goal is to capture the best possible images of two important Iliad manuscripts in the collection of the Escorial monastery in San Lorenzo. Subsequent posts will give more information about the imaging process, and the significance of the text and scholia in these manuscripts.

Geneva Iliad to provide exciting challenges for HMT

I had the opportunity to visit the Genavensis 44 manuscript of the Iliad last week during my trip to Switzerland for the E-codices workshop. The manuscript is undergoing an extensive restoration and has been completely unbound. I was able to see that the manuscript is indeed in need of extensive restoration, and I learned a great deal about the manuscript by seeing it in person. For example, although I knew that there is an interlinear paraphrase that runs through approximately the first half of the poem, I was not aware that this paraphrase is of the same size and same hand as the main text. It raises the question, for me at least, as to how to characterize this text from the point of view of transcription and identification of text groups. Line numbers are of course modern editorial additions, but when we transcribe this document we will have to make some choices about what to put where. I assume we will separate out the paraphrase from the text of the poem, but having access to the images of the manuscript itself will allow users of this transcription to appreciate that the paraphrase was originally written to be an organic part of the reading of this manuscript. When we separate it out, we lose something of that experience.

Tuesday, June 15, 2010

C.I.T.E - The Infrastructure of the Homer Multitext (Part 1 - Introduction)

The Infrastructure of the Homer Multitext

     C · I · T · E

The Homer Multitext (HMT) is a project of the Center for Hellenic Studies of Harvard University (CHS). It is best described in the words of its editors, Casey Dué and Mary Ebbott:
“The Homer Multitext project, the first of its kind in Homeric studies, seeks to present the textual transmission of the Iliad and Odyssey in a historical framework. Such a framework is needed to account for the full reality of a complex medium of oral performance that underwent many changes over a long period of time. These changes, as reflected in the many texts of Homer, need to be understood in their many different historical contexts. The Homer Multitext provides ways to view these contexts both synchronically and diachronically.” (From the CHS website)
Dué and Ebbott, in collaboration with the Director of the CHS, Gregory Nagy, and the CHS’s Head of Publications, Leonard Muellner, initiated research toward this project with an eye to advancing particular arguments about the nature of Homeric poetry. But anyone interested in epic poetry, Greek poetry in general, or the intellectual history of the Greco-Roman world, the cultures that came into contact with it, and those that succeeded it, stands to profit from the project.


The HMT aims to collect, as comprehensively as possible, all of the sources for our knowledge of the Homeric epics, and to publish these online, freely accessible to any interested reader.

These sources include versions of the Iliad and Odyssey, and the surviving pieces of lesser-known epic poems born in the Greek Bronze Age. These versions may be fragments of papyrus found in the sands of Egypt or manuscripts produced under the Byzantine Emperors of Constantinople. These sources also include texts of later Greek and Roman writers who quote from Homer, writers such as Plato, Aristotle, Herodotus, and Thucydides. A particularly rich body of evidence comes from the writings of the literary scholars who worked in the Libraries of Alexandria and Pergamum; the works of these writers do not survive intact, but thousands of excerpts from them and references to them do survive, as comments written in the margins of manuscripts.

Dué and Ebbott are committed to providing the most useful access possible to these sources. This means offering texts of those sources in the original Greek and translated into modern languages where possible. It also means providing high-quality digital facsimiles of the actual manuscripts wherever possible.

It is impossible to overstate the value of digital facsimiles. The Greek and Latin texts that we can check out of libraries, or find online, are highly processed documents. Editors will compare different manuscripts of a work – which always differ – and produce a uniform text that is identical to no single medieval or ancient “witness” to the work. Responsible editors will provide notes explaining in what ways their edited text differs from particular manuscripts, but these notes – even the most meticulous – fall far short of providing the depth of information that can be gleaned from direct access to good images of the manuscripts themselves.

Scholarship based entirely on edited texts is fundamentally handicapped. However brilliant the scholars working from these texts may be, their insights will be limited by the absent editors of their source-texts, by their assumptions, and by the innumerable details that disappear on the journey from the hand-written manuscript, through generations of editions, to the shelves of the library. 

For the past century, scholars of Greece and Rome have been content for the most part to work from edited texts. There were justifiable reasons for this – practical, technological, and economic reasons. None of those justifications survived the turn of the 21st Century.

In addition to texts and images, other kinds of data might shed light on Homeric poetry: morphological and lexical data, lists of persons, geographic information (Where is “Sandy Pylos” or “Horse-rolling Thessaly”? Is a reference to Thebes pointing to Seven-Gated Thebes, or to Hundred-Gated Thebes in Egypt?), and so forth.

The Challenge

Bringing these disparate materials online in a useful way posed a challenge. The collaborators on the HMT wanted an all-purpose infrastructure that would both support end-user applications for browsing, searching, and reading, and make the raw data available for discovery and retrieval.

Some kind of digital library infrastructure was necessary, but the complexity of the anticipated contents of that library posed another problem. A digital library containing highly diverse data, which is expected to expand indefinitely, must be exposed through protocols that define requests and responses. Those requests and responses should allow discovery of contents, access to objects, retrieval of parts of objects (passages of texts, data elements, parts of images), and querying, manipulation, and other kinds of processing.
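To make the idea of "requests and responses" concrete, here is a minimal sketch of a toy in-memory library that answers three of the request types named above: discovery of contents, access to an object, and retrieval of part of an object. Everything in it is hypothetical: the class `ToyLibrary`, its method names, the identifier `iliad-demo`, and the placeholder passages are assumptions made for illustration, and none of it is the protocol the project actually defines.

```python
from dataclasses import dataclass, field


@dataclass
class ToyLibrary:
    """In-memory stand-in for a library exposed through requests and responses."""

    # Maps a text identifier to an ordered {citation: passage} dictionary.
    texts: dict = field(default_factory=dict)

    def list_contents(self):
        """Discovery: which objects does the library hold?"""
        return sorted(self.texts)

    def get_object(self, text_id):
        """Access: return a copy of a whole object."""
        return dict(self.texts[text_id])

    def get_passage(self, text_id, start, end=None):
        """Retrieval of part of an object: one citation or an inclusive range."""
        passages = self.texts[text_id]
        keys = list(passages)  # dicts keep insertion order
        first = keys.index(start)
        last = keys.index(end) if end is not None else first
        return {key: passages[key] for key in keys[first:last + 1]}


# Placeholder content, not real text of the Iliad.
library = ToyLibrary({
    "iliad-demo": {"1.1": "line one", "1.2": "line two", "1.3": "line three"},
})
contents = library.list_contents()
excerpt = library.get_passage("iliad-demo", "1.2", "1.3")
```

The point of the sketch is only that a small, fixed set of request types can serve many different applications, because each application composes the same few responses in its own way.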

The data is highly varied and its possible uses are potentially infinite. If the protocol became correspondingly complex, the infrastructure would become, essentially, an end-user application: usable only by its creators, fragile and difficult to maintain, and increasingly vulnerable to obsolescence as time goes by.

Almost a decade of thinking and experimentation went into defining a generic, scalable protocol that enables scholarly access to and use of these materials in a networked environment, as simply as possible.
This was mainly the task of the HMT’s Project Architects, Neel Smith and me, Christopher Blackwell.

Our answer is C.I.T.E., that is, Collections, Indices, Texts, and Extensions.

This looks like four things, but it is really only three: texts, collections, and indices. In our conception of the requirements of the Homer Multitext, we have reduced scholarship to these three kinds of digital object, have defined protocols for working with each, and have working code that implements each.
In the next installments of this series of postings, I will describe each element in the C.I.T.E. architecture in some detail. Finally, I will describe how they can be brought together to build rich applications for scholarship.

A Final Note

Any discussion of a “generic infrastructure for scholarship” will inevitably sound like the beginning of an evangelical spiel about how everyone needs to adopt the speaker’s pet approach to data. That is not our intention here. 

Our dear friend, the late Professor Ross Scaife, was once playing advocatus diaboli as I was describing our protocol for texts. “How many other projects need to adopt this protocol for it to be useful?” My colleague Neel had the answer: “One, ours.”

We have developed C.I.T.E. because we needed something like it in order to do what we want to do with the history of Homeric texts. I am describing it here because it is the foundation for much of the ongoing research of the HMT team, which we will also document here, and it might be of interest to other scholars working on similar projects.

All computer code developed for the HMT is free and open-source; all data published by the project is open-content under a Creative Commons or similar license.

Next… Part 2 - Texts

Ongoing Research, Summer 2010

Christopher Blackwell here:

I have begun a series of blog posts aimed at describing and narrating one corner of the constellation of research that surrounds the Homer Multitext. These posts will appear on my blog.

They will focus on the work of this summer, 2010, both the projects in Europe, and what my undergraduate collaborators are doing in Greenville, SC.

I am hoping to use these as tools for recruiting good students to study Classics at Furman University, so they will tend to have a local focus.

However, I also want to give an overarching view of how the Homer Multitext is progressing, what we have done, and what we hope to do in the near term. I will post those pieces here, and link to them from my “Eumaeus, the Noble Swineherd” blog.


UH High Performance Computing hosts Homer Multitext data

The Homer Multitext is a publication of Harvard's Center for Hellenic Studies. The project has been from the beginning, however, a collaborative one between colleagues with various strengths and abilities and from a variety of different kinds of institutions all over the United States and Europe. My own particular research focus has always been the Homeric epics and the oral tradition in which they were composed, but our team includes computer scientists, conservators, and photographers, philologists, art historians, codicologists, papyrologists, and historians. I would like to record my appreciation here for the constant assistance and support of the University of Houston's Research Computing Center, its director Keith Crabb, and especially staff member Alan Pfeiffer-Traum. All image data for the Homer Multitext Project is also hosted by the UH RCC, and can be found at

Friday, June 4, 2010

The Homer Multitext and undergraduate research

The Homer Multitext is a large, collaborative research project, and will require the contributions of many researchers to achieve its goals. We have therefore developed ways for undergraduate researchers to be involved in producing original research, published and credited as their own but contributing to the larger endeavor. This summer five undergraduates will be contributing to the project. At Furman University, three undergraduates are working on digital diplomatic editions of Homeric papyri, some of our oldest witnesses to the Homeric epics. At the College of the Holy Cross we have two students working with the high-resolution digital photographs of the Venetus A manuscript that we acquired in 2007 (see the images via the Manuscript browser here) to create digital texts of its text of the Iliad, the scholia (marginal commentary) and all other features of each page of the manuscript. The texts will all be linked through structured mark-up to the images themselves. The goal for the summer project is to complete this task for two books of the Iliad. In future posts I will give updates on their progress.

Thursday, May 13, 2010

Homer Multitext to collaborate with E-codices of Switzerland

As part of a Mellon-funded project, the Homer Multitext will collaborate with E-codices of Switzerland to publish the thirteenth-century manuscript of the Iliad known as the Genavensis. For several years Neel Smith has been working on creating electronic editions of ten sets of scholia in which individual scholarly notes are coordinated so that we can, for the first time, systematically analyze their distribution across the manuscripts where they are attested. Cluster analysis identifies clearly distinct groups of comments that appear together in particular sets of scholia, and suggests that, far from slavishly reproducing a single archetype, scribes could combine material from multiple, distinct sources. An initial, methodologically independent measurement of vocabulary similarity isolates precisely the same clusters of scholia: the obvious historical conclusion is that scribes drew on different sources for their differing scholarly content.[1] This completely overturns traditional attempts to construct a single stemma, or “family tree,” of manuscript scholia, and means that we urgently need to reassess the relation of the important scholia in the Codex Genavensis 44 to the other scholia we have already studied.
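For readers unfamiliar with this kind of analysis, the sketch below shows one simple way to ask which sets of scholia share comments: measure the overlap between each pair of sets (Jaccard similarity) and group together the sets whose overlap reaches a threshold. The post does not say which clustering method the project used, so this is a stand-in technique, and the manuscript labels, comment identifiers, and threshold are all invented for the example.

```python
def jaccard(a, b):
    """Share of items the two collections have in common (0.0 to 1.0)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def group_by_similarity(sets, threshold=0.5):
    """Single-link grouping: a set joins a group if it resembles any member."""
    groups = []
    for name in sets:
        merged = [name]
        remaining = []
        for group in groups:
            if any(jaccard(sets[name], sets[m]) >= threshold for m in group):
                merged.extend(group)  # absorb every group this set resembles
            else:
                remaining.append(group)
        groups = remaining + [sorted(merged)]
    return sorted(groups)


# Hypothetical data: which comments appear in which set of scholia.
scholia = {
    "ms1": {"c1", "c2", "c3"},
    "ms2": {"c1", "c2", "c4"},
    "ms3": {"c7", "c8"},
    "ms4": {"c7", "c8", "c9"},
}
clusters = group_by_similarity(scholia)
```

With these toy inputs the first two sets share half of their combined comments and the last two share two thirds, while the two pairs share nothing with each other, which is the kind of pattern that points to distinct sources rather than a single archetype.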

The goal of the e-codices project is to provide access to all medieval and selected early modern manuscripts held in Switzerland via a virtual library. At the moment, the virtual library contains 605 manuscripts from 25 different libraries. We are grateful for the opportunity to collaborate with E-codices to bring this manuscript on-line with high-resolution images. The manuscript is being carefully restored in preparation for imaging. We anticipate that the imaging will take place in Spring 2011.

[1] Preliminary results have been presented at the conference "Text Mining Services" (Leipzig, 2009) and are forthcoming in the conference proceedings (Springer Verlag, 2009) as Gabriel Weaver and Neel Smith, "Applying Domain Knowledge from Structured Citation Formats to Text and Data Mining: Examples Using the CITE Architecture."

Monday, April 19, 2010

Rediscovery of Homer thwarted by Iceland's Vesuvius

This week I had planned to travel to Germany to present the Homer Multitext at a conference organized by Teuchos – Zentrum für Handschriften- und Textforschung. Unfortunately, Iceland's volcano has prevented me and no doubt several others from attending. The text of the presentation, entitled "Rediscovering Homer: Manuscript Digitization and the Homer Multitext Project," and a PDF of the accompanying slides (66MB) can be found here and here. Thank you to Daniel Deckers and his colleagues for inviting me to the conference. I am sure it will be very productive and illuminating for those who are able to attend, and I very much regret that I will not be among them.

Friday, January 22, 2010

Homeric Papyri Service Online

Homer and the Papyri was first created by Professor Dana Sutton of the University of California, Irvine, to be a database of fragments of the Homeric Iliad and Odyssey that survive on papyrus from Graeco-Roman Egypt.

The Center for Hellenic Studies inherited this valuable data, and the project is now under the editorship of Casey Dué and Mary Ebbott, as part of the ongoing Homer Multitext project.

We are pleased to announce the first publication of a new service for scholars and readers interested in Homeric papyri: The Homeric Papyri Canonical Text Service. This is an application hosted on Google AppEngine. While the service is intended primarily to allow other online applications to discover and retrieve texts and passages of text, it does provide a human-readable interface to these papyri.

The texts as delivered by this service include full editorial markup, in TEI-P5-compliant XML. The human-readable form (visible by default) intentionally hides any text that is not physically present on the papyrus. Future versions of the user-interface to this service may give the option to show or hide editorially supplied text at the user’s discretion.
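A minimal sketch of how a reader's view might hide editorially supplied text: walk the XML and skip the contents of any element that marks a restoration, while keeping the text around it. The sketch assumes the restorations are wrapped in the TEI `supplied` element, which is an assumption about the editors' markup rather than something stated here; the function `visible_text` and the sample line, with Latin letters standing in for Greek, are invented for illustration and do not describe the service's actual implementation.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
SUPPLIED = f"{{{TEI_NS}}}supplied"


def visible_text(xml_string):
    """Return the text of a TEI fragment, omitting <supplied> content."""
    root = ET.fromstring(xml_string)

    def walk(node):
        parts = []
        if node.tag != SUPPLIED:
            # Keep this element's own text and descend into its children.
            parts.append(node.text or "")
            for child in node:
                parts.extend(walk(child))
        # Text following an element belongs to its parent, so always keep it.
        parts.append(node.tail or "")
        return parts

    return "".join(walk(root)).strip()


# Invented sample: two restored stretches around surviving letters.
sample = (
    '<l xmlns="http://www.tei-c.org/ns/1.0" n="1">'
    '<supplied reason="lost">abc</supplied>def '
    '<supplied reason="lost">gh</supplied>ij'
    "</l>"
)
shown = visible_text(sample)
```

Offering the show-or-hide option mentioned above would amount to making the skip conditional, so that the same XML yields either view.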

For more information on the Canonical Text Services Protocol (“CTS”), see the project’s Sourceforge site.

— Christopher W. Blackwell

Thursday, January 21, 2010

Multitext related articles from Google Books

Allen, T. W. 1899. “On the Composition of Some Greek Manuscripts: The Venetian Homer.” Journal of Philology 26: 161-181. 

Monro, D. B. 1883. “On the Fragment of Proclus’ Abstract of the Epic Cycle Contained in the Codex Venetus of the Iliad.” Journal of Hellenic Studies 4: 305-34.

I'll post more as I come across them. Dindorf's editions of the scholia are available as well. 

New, forthcoming, and older Multitext related publications

In May of last year, the Center for Hellenic Studies and Harvard University Press published Recapturing a Homeric Legacy: Images and Insights from the Venetus A Manuscript of the Iliad. This book consists of seven essays and a variety of high-resolution images of the Venetus A, the oldest complete text of the Iliad in existence, meticulously crafted during the tenth century CE.

In Spring 2010,  the Center for Hellenic Studies and Harvard University Press will publish Iliad 10 and the Poetics of Ambush: A Multitext Edition with Essays and Commentary. In this book Casey Dué and Mary Ebbott use approaches based on oral traditional poetics to illuminate many of the interpretive questions that strictly literary approaches find unsolvable. The introductory essays explain their textual and interpretive approaches and explicate the ambush theme within the whole Greek epic tradition. The critical texts (presented as a sequence of witnesses, including the tenth-century Venetus A manuscript and select papyri) highlight the individual witnesses and the variations they offer. The commentary demonstrates how the unconventional Iliad 10 shares in the oral traditional nature of the whole epic, even though its poetics are specific to its nocturnal ambush plot.

In early 2009 Digital Humanities Quarterly published a special issue in honor of Ross Scaife, founder of the Stoa Consortium.  See Casey Dué and Mary Ebbott, "Digital Criticism: Editorial Standards for the Homer Multitext," Neel Smith, "Citation in Classical Studies," and Christopher W. Blackwell and Thomas R. Martin, "Technology, Collaboration, and Undergraduate Research."

See also Homeric Questions by Gregory Nagy,  Homeric Variations on a Lament by Briseis by Casey Dué, and Imagining Illegitimacy in Classical Greek Literature by Mary Ebbott, together with a variety of other publications on the CHS website. Coming soon to the CHS website and the Milman Parry Collection of Oral Literature: Albert Lord's The Singer of Tales.

Sunday, January 17, 2010

Philology in the Age of Corpus and Computational Linguistics

I'd like to begin this blog by thanking Greg Crane and Anke Lüdeling for the workshop they organized at Tufts this past week. This workshop brought together classicists, linguists, and computer scientists. It gave us the opportunity to learn from one another and discuss best practices and new possibilities for Digital Humanities projects. We are grateful for the opportunity to present the Homer Multitext to such a distinguished and innovative group of scholars. Stay tuned to this blog for updates about new and forthcoming developments for the HMT, as we acquire more images, add various texts and transcriptions to the site, and develop new tools for interacting with them.