Tuesday, 3 September 2013

How Dickinson College Commentaries linked up with Pelagios

Thanks to the Pelagios Project, Dickinson College Commentaries has recently stepped up into the world of linked geographical data, and I am very grateful to Elton Barker, Rainer Simon, and Leif Isaksen at Pelagios, and to Tom Elliot, Sean Gillies, and Sebastian Heath at Pleiades for making it possible. In this post I want to talk about how Pelagios and Pleiades have helped us and our users, and to say a little bit about the work flow on our end.

screen shot of Dickinson College Commentaries

DCC explores a model of textual commentary that tries to take full advantage of the digital medium, harnessing the best of traditional philological, historical, and archaeological scholarship, and focusing on the user experience in a way to enhance reading, rather than just searching. We’re not really a database, but a reading environment, so we try not to bury the user in information, but to offer scholarly guidance informed by teaching experience. We also have some limitations financially and institutionally. We are lucky to have an endowment at the Department of Classical Studies at Dickinson, on which we can draw to hire undergraduate students. And we have a strong support system in the Academic Technology unit at Dickinson, where Ryan Burke built the structure of our site in Drupal, and helps to maintain and improve it. But we have no graduate students, no dedicated programmers or web developers, and no full time staff. I teach a full load at Dickinson and do this in my spare time, as it were, with help of a number of colleagues at other institutions who are on our editorial board. This is all to say that I have to be careful about not getting in over my head when it comes to site maintenance. I value user functionality and solid content above all, but simplicity runs a very close third.

Pelagios, with its machine linking of places mentioned in our commentaries to the unique place identifiers in Pleiades, delivers simplicity itself. On our end what needed to be done was to create a single file that listed all of our geographical annotations, with their locations (urls). We already had Google Earth maps made in summer 2012 by Dickinson student Merri Wilson, that contained placemarks with all places mentioned in two of the existing commentaries, each placemark annotated with Pleiades URIs (unique identifiers). A third Google Earth map, for Caesar’s Gallic War, did not have the Pleiades URIs, and all the linkages in the other two commentaries (Sulpicius Severus’ Life of St. Martin and Book 1 of Ovid’s Amores) had to be checked for errors. Archaeology and Classics major Dan Plekhov was perfect for this job, which required a good knowledge of ancient geography, Latin, Greek, and solid research skills. He worked in Carlisle for 8 weeks in the summer of 2013, with approximately two weeks devoted to this aspect of the project.

Meanwhile, computer science major Qingyu Wang investigated the .RDF format we were to use for the comprehensive file, and the very specific formatting required by Pelagios. This is not exactly the kind of thing computer science majors do all day, but she taught herself the skills she needed to complete the work, spending about a week on it all told. She was aided by good advice from Sebastian Heath at New York University, and Rainer Simon of Pelagios. We had to invent a human-readable code for our specific type of annotations—so we could keep track of things and every annotation would have a unique designation—then put all that into a format that Pelagios could deal with. My role was deciding on concise but informative conventions that fit our material. Once we figured all that out, Qingyu created the .RDF file that specifies the linkages between a unique ancient place as referred to in Pleiades, with a specific annotation on a page of our site. Now, when you go to that place in Pleiades (Gallia, for instance), under "Related Content from Pelagios" you will see "Pleiades urls Dickinson College Commentaries." So someone exploring Gaul could now go straight to DCC, read Caesar’s account, or watch our little video of the famous opening paragraph of the BG.

Here are some examples of the lists of references we adapted from the Pelagios template. The first is a reference to the Alps in Sulpicius Severus' Life of St. Martin, section 5.

<rdf:Description xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:h="http://www.w3.org/1999/xhtml" xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#" xmlns:oac="http://www.openannotation.org/ns/" xmlns:dcterms="http://purl.org/dc/terms/" rdf:ID="sulpicsev-martin-5.4-alpes">
  <rdf:type rdf:resource="http://www.openannotation.org/ns/Annotation"/>
  <oac:hasBody rdf:resource="http://pleiades.stoa.org/places/783"/>
  <oac:hasTarget rdf:resource="http://dcc.dickinson.edu/sulpicius-severus/section-5"/>
  <dcterms:creator rdf:resource="http://dcc.dickinson.edu/"/>
  <dcterms:title>"Sulpicius Severus, Life of St. Martin 5.4"</dcterms:title>
</rdf:Description>

The Gallic tribe the Boii in Caesar, Gallic War 1.5:

<rdf:Description xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:h="http://www.w3.org/1999/xhtml" xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#" xmlns:oac="http://www.openannotation.org/ns/" xmlns:dcterms="http://purl.org/dc/terms/" rdf:ID="caesar-bg-1.5-boii">
  <rdf:type rdf:resource="http://www.openannotation.org/ns/Annotation"/>
  <oac:hasBody rdf:resource="http://pleiades.stoa.org/places/197173"/>
  <oac:hasTarget rdf:resource="http://dcc.dickinson.edu/caesar/book-1/chapter-1-5"/>
  <dcterms:creator rdf:resource="http://dcc.dickinson.edu/"/>
  <dcterms:title>Julius Caesar, Gallic War 1.5</dcterms:title>
</rdf:Description>

Mt. Olympus in Ovid, Amores 1.2.39:

<rdf:Description xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:h="http://www.w3.org/1999/xhtml" xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#" xmlns:oac="http://www.openannotation.org/ns/" xmlns:dcterms="http://purl.org/dc/terms/" rdf:ID="ovid-amores-1.2.39-olympusmons">
  <rdf:type rdf:resource="http://www.openannotation.org/ns/Annotation"/>
  <oac:hasBody rdf:resource="http://pleiades.stoa.org/places/491677"/>
  <oac:hasTarget rdf:resource="http://dcc.dickinson.edu/ovid-amores/amores-1-2"/>
  <dcterms:creator rdf:resource="http://dcc.dickinson.edu/"/>
  <dcterms:title>"Ovid, Amores 1.2.39"</dcterms:title>
</rdf:Description>

Our full .rdf file is available here.

Another aspect of that process, in a sense the reverse of it, was the automatic channeling of data from Pleiades into DCC, via the addition of thumbnail pop-ups on the names of places mentioned in the notes fields. As of this summer, when you mouse over such a linked place name in DCC, a thumbnail with a small map pops up, with the link to Pleiades.

screen shot of text and notes to Sulpicius Severus with thumbnail popup to Pleiades


The beauty of this is that one does not have to navigate away from the text to get an idea of where roughly the place is; but at the same time, Pleiades is only a click way. Qingyu and Ryan Burke made this happen, using a bit of css code created by Sebastian Heath for use in his ISAW papers. One nagging issue is that when viewed on an iPad, the pop-ups do not go away, and one must reload the page to get rid of them. But I view this is a superb use of the digital medium to enhance the reading experience. Geographical knowledge is delivered on time, as needed, unobtrusively, right there beside the text, in way simply impossible in print. And all that is required, once the css code is in place, is to create the normal html link in the Drupal editor.

I’m here at a liberal arts college doing digital humanities at a fairly small scale, compared to what’s going on at large research universities, or at a well-funded outfit like the Perseus Project. Small size has certain advantages, I suppose, but the biggest danger is probably isolation. On an organizational level I try to avoid that by reaching out to colleagues at other institutions and getting them involved, as the Bryn Mawr Classical Review has done so successfully. But Pelagios offers DCC and projects like it an equally potent way to combat isolation, by allowing our small project to make a contribution to the much larger world of linked geographical data. Maybe someday there will be a similar infrastructure of sharing linked data about ancient persons, texts, and material objects as well, and I’d like to be there adding to it.

Chris Francese (francese@dickinson.edu)

Monday, 1 July 2013

A SQL version of the Pleiades dataset



For those of you who use the Pleiades data in your applications I’ve created a .sql file for their newest data dump which you can find here.  I used the Pleiades data dump from June 27, 2013.  What’s nice about the Pleiades people is that when they say a file is comma separated it really is.  I downloaded this file and extracted it into the promised .csv and then imported it directly into Excel where I formatted it into a set of .sql inserts.  The columns seem to have changed somewhat from the previous versions; there are fewer name columns.  As a result I reformatted my Pleiades data table in SquinchPix’s database so that it now looks like this:

CREATE TABLE `PlacRefer` (
  `bbox` varchar(100) default NULL COMMENT 'bounding box string',
  `description` varchar(512) default NULL COMMENT 'Free-form text description',
  `id` varchar(100) default NULL COMMENT 'Pleiades ID number',
  `max_date` varchar(255) default NULL COMMENT 'last date',
  `min_date` varchar(100) default NULL COMMENT 'earliest date',
  `reportLat` float default NULL COMMENT 'Latitude',
  `reportLon` float default NULL COMMENT 'Longitude',
  `Era` varchar(24) default NULL COMMENT 'Character indicate the era',
  `place_name` varchar(100) default NULL COMMENT 'Name string for the place'
);

I formatted the bounding box as a comma-separated varchar.  I do this because the bounding box requires special treatment; it might be missing altogether or it might be less than four points so, if you’re working with it, just get it into a string and split the string on commas.  Then you’ll have an array of items that you can treat as floats.  I finally got it through my thick skull that the description line can be parsed into keywords so I’ll be using that more in the future.  The ‘id’ field is the regular Pleiades ID.  Is it my imagination or did the Pleiades people suddenly get a large dump of data from the Near East?  The number of items in the file is now 34,000+ and this looks like a big increase.  The max_date and min_date fields give the terminus ante quem and terminus post quem, respectively, for any human settlement of the place in question.  The reportLat and reportLon fields haven’t changed.  The ‘era’ field gives zero or more characters that indicate the period of existence of any site: ‘R’ for ‘Roman’, ‘H’ for ‘Hellenistic’, etc.  I included them because it might be handy for your chronological interpretation.  The ‘place_name’ field is the only name field in the current setup.

If this table layout is satisfactory for you then you can get all the Sql to create and populate the table with all the newest Pleiades data from Google Drive here. Be careful; this new .sql deletes the PlacRefer table first.

I modified my Regnum Francorum Online parser to use this renewed table. The relevant code looks like this:

 $place_no = $l5[0];  // $l5[0] is a fragment of the input record which contains the     // Pleiades ID.
 unset($lat);  // we test for unset later
 unset($lon);

 $querygeo = "select a.reportLat, a.reportLon from PlacRefer a where a.id = $place_no;";
 $resultgeo = mysql_query($querygeo);
 $rowgeo    = mysql_fetch_array($resultgeo);

 $lat  = $rowgeo[0];
 $lon  = $rowgeo[1];

This is how you’ll probably use it most of the time – using the Pleiades ID to retrieve the lat/lon pair.  I was pleasantly surprised at how much the data has improved.   I redid all the Regnum Francorum Online records with the new data and it looks a lot better.  So congratulations to the Pleiades guys!  Although they should double check the exact location of Nördlingen.  Here's how the first 500 Regnum Francorum Online records look on a map.

First 500 Regnum Francorum Online records displayed on SquinchPix using new Pleiades data.
A big improvement over the previous version which you can see here.

If you want to do this yourself from the original Pleiades data dump then be sure to convert (no parentheses in the following sequences) all double quote characters to (&quot;) , left single quote to (&lsquo;) and right single quote to (&rsquo;).  The data has elaborate description fields which have been formatted with lots of portions quoted in various ways by various workers.  Also many place names in the Near East and many French names contain embedded single quotes that must be changed to (&lsquo;) or (&rsquo;) or the equivalent.  If you need a guide go here.

Get this right first because if you’re not absolutely sure that you’ve got all the pesky quotes taken care of then the sql import won’t run.

But you can avoid all that hassle by just downloading my .sql file from Google Drive and importing it to your DB.  Have fun!

Robert Consoli
Cross-posted from Squinches.

Thursday, 27 June 2013

A New Dimension for Pelagios

For a number of reasons this is merely a News Flash, but we are extremely pleased to report that thanks to the generosity of the Andrew W. Mellon Foundation, Pelagios will shortly be exploring some new directions. Needless to say, we will still maintain our role as a connecting medium for Ancient World resources, but we think the coming year will also unlock a wide range of additional opportunities in historical Linked Open Data as well. To find out more, tune in on September 1st. Until then, we'll continue to keep you updated on partner activities, for which you could do no better than to check out the exciting stuff Bob Consoli has been up to over at Squinchpix...