Showing posts with label Pleiades. Show all posts

Tuesday, 3 September 2013

How Dickinson College Commentaries linked up with Pelagios

Thanks to the Pelagios Project, Dickinson College Commentaries has recently stepped up into the world of linked geographical data, and I am very grateful to Elton Barker, Rainer Simon, and Leif Isaksen at Pelagios, and to Tom Elliot, Sean Gillies, and Sebastian Heath at Pleiades for making it possible. In this post I want to talk about how Pelagios and Pleiades have helped us and our users, and to say a little bit about the work flow on our end.

screen shot of Dickinson College Commentaries

DCC explores a model of textual commentary that tries to take full advantage of the digital medium, harnessing the best of traditional philological, historical, and archaeological scholarship, and focusing on the user experience in a way that enhances reading, rather than just searching. We’re not really a database but a reading environment, so we try not to bury the user in information, but to offer scholarly guidance informed by teaching experience. We also have some limitations, financially and institutionally. We are lucky to have an endowment at the Department of Classical Studies at Dickinson, on which we can draw to hire undergraduate students. And we have a strong support system in the Academic Technology unit at Dickinson, where Ryan Burke built the structure of our site in Drupal and helps to maintain and improve it. But we have no graduate students, no dedicated programmers or web developers, and no full-time staff. I teach a full load at Dickinson and do this in my spare time, as it were, with the help of a number of colleagues at other institutions who are on our editorial board. This is all to say that I have to be careful about not getting in over my head when it comes to site maintenance. I value user functionality and solid content above all, but simplicity runs a very close third.

Pelagios, with its machine linking of places mentioned in our commentaries to the unique place identifiers in Pleiades, delivers simplicity itself. On our end, what needed to be done was to create a single file that listed all of our geographical annotations, with their locations (URLs). We already had Google Earth maps, made in summer 2012 by Dickinson student Merri Wilson, that contained placemarks for all places mentioned in two of the existing commentaries, each placemark annotated with Pleiades URIs (unique identifiers). A third Google Earth map, for Caesar’s Gallic War, did not have the Pleiades URIs, and all the linkages in the other two commentaries (Sulpicius Severus’ Life of St. Martin and Book 1 of Ovid’s Amores) had to be checked for errors. Archaeology and Classics major Dan Plekhov was perfect for this job, which required a good knowledge of ancient geography, Latin, and Greek, and solid research skills. He worked in Carlisle for eight weeks in the summer of 2013, with approximately two weeks devoted to this aspect of the project.

Meanwhile, computer science major Qingyu Wang investigated the .RDF format we were to use for the comprehensive file, and the very specific formatting required by Pelagios. This is not exactly the kind of thing computer science majors do all day, but she taught herself the skills she needed to complete the work, spending about a week on it all told. She was aided by good advice from Sebastian Heath at New York University and Rainer Simon of Pelagios. We had to invent a human-readable code for our specific type of annotations—so we could keep track of things and every annotation would have a unique designation—then put all that into a format that Pelagios could deal with. My role was deciding on concise but informative conventions that fit our material. Once we figured all that out, Qingyu created the .RDF file that specifies the linkages between a unique ancient place as referred to in Pleiades and a specific annotation on a page of our site. Now, when you go to that place in Pleiades (Gallia, for instance), under "Related Content from Pelagios" you will see "Pleiades urls Dickinson College Commentaries." So someone exploring Gaul could now go straight to DCC, read Caesar’s account, or watch our little video of the famous opening paragraph of the BG.

Here are some examples of the lists of references we adapted from the Pelagios template. The first is a reference to the Alps in Sulpicius Severus' Life of St. Martin, section 5.

<rdf:Description xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:h="http://www.w3.org/1999/xhtml" xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#" xmlns:oac="http://www.openannotation.org/ns/" xmlns:dcterms="http://purl.org/dc/terms/" rdf:ID="sulpicsev-martin-5.4-alpes">
  <rdf:type rdf:resource="http://www.openannotation.org/ns/Annotation"/>
  <oac:hasBody rdf:resource="http://pleiades.stoa.org/places/783"/>
  <oac:hasTarget rdf:resource="http://dcc.dickinson.edu/sulpicius-severus/section-5"/>
  <dcterms:creator rdf:resource="http://dcc.dickinson.edu/"/>
  <dcterms:title>Sulpicius Severus, Life of St. Martin 5.4</dcterms:title>
</rdf:Description>

The Gallic tribe the Boii in Caesar, Gallic War 1.5:

<rdf:Description xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:h="http://www.w3.org/1999/xhtml" xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#" xmlns:oac="http://www.openannotation.org/ns/" xmlns:dcterms="http://purl.org/dc/terms/" rdf:ID="caesar-bg-1.5-boii">
  <rdf:type rdf:resource="http://www.openannotation.org/ns/Annotation"/>
  <oac:hasBody rdf:resource="http://pleiades.stoa.org/places/197173"/>
  <oac:hasTarget rdf:resource="http://dcc.dickinson.edu/caesar/book-1/chapter-1-5"/>
  <dcterms:creator rdf:resource="http://dcc.dickinson.edu/"/>
  <dcterms:title>Julius Caesar, Gallic War 1.5</dcterms:title>
</rdf:Description>

Mt. Olympus in Ovid, Amores 1.2.39:

<rdf:Description xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:h="http://www.w3.org/1999/xhtml" xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#" xmlns:oac="http://www.openannotation.org/ns/" xmlns:dcterms="http://purl.org/dc/terms/" rdf:ID="ovid-amores-1.2.39-olympusmons">
  <rdf:type rdf:resource="http://www.openannotation.org/ns/Annotation"/>
  <oac:hasBody rdf:resource="http://pleiades.stoa.org/places/491677"/>
  <oac:hasTarget rdf:resource="http://dcc.dickinson.edu/ovid-amores/amores-1-2"/>
  <dcterms:creator rdf:resource="http://dcc.dickinson.edu/"/>
  <dcterms:title>Ovid, Amores 1.2.39</dcterms:title>
</rdf:Description>

Our full .rdf file is available here.
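Because the three entries differ only in their ID, body, target, and title, the comprehensive file can be produced from a plain list of annotations. Here is a rough sketch in Python of how such a file could be generated (this is not the script Qingyu actually wrote; the list, template, and function names are invented for illustration):

```python
# Sketch: render Pelagios-style OAC annotation entries from a list of
# (annotation_id, pleiades_uri, target_url, title) tuples.
ANNOTATION_TEMPLATE = """<rdf:Description xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:oac="http://www.openannotation.org/ns/" xmlns:dcterms="http://purl.org/dc/terms/" rdf:ID="{aid}">
  <rdf:type rdf:resource="http://www.openannotation.org/ns/Annotation"/>
  <oac:hasBody rdf:resource="{pleiades}"/>
  <oac:hasTarget rdf:resource="{target}"/>
  <dcterms:creator rdf:resource="http://dcc.dickinson.edu/"/>
  <dcterms:title>{title}</dcterms:title>
</rdf:Description>"""

annotations = [
    ("caesar-bg-1.5-boii",
     "http://pleiades.stoa.org/places/197173",
     "http://dcc.dickinson.edu/caesar/book-1/chapter-1-5",
     "Julius Caesar, Gallic War 1.5"),
]

def render(annotations):
    # One rdf:Description block per annotation, separated by blank lines.
    return "\n\n".join(
        ANNOTATION_TEMPLATE.format(aid=a, pleiades=p, target=t, title=ti)
        for a, p, t, ti in annotations)

print(render(annotations))
```

Each tuple supplies the human-readable annotation code, the Pleiades URI, the DCC page URL, and the title, so adding a new annotation is just a matter of adding a line to the list.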

Another aspect of that process, in a sense the reverse of it, was the automatic channeling of data from Pleiades into DCC, via the addition of thumbnail pop-ups on the names of places mentioned in the notes fields. As of this summer, when you mouse over such a linked place name in DCC, a thumbnail with a small map pops up, with the link to Pleiades.

screen shot of text and notes to Sulpicius Severus with thumbnail popup to Pleiades


The beauty of this is that one does not have to navigate away from the text to get an idea of roughly where the place is; but at the same time, Pleiades is only a click away. Qingyu and Ryan Burke made this happen, using a bit of CSS code created by Sebastian Heath for use in his ISAW papers. One nagging issue is that when viewed on an iPad, the pop-ups do not go away, and one must reload the page to get rid of them. But I view this as a superb use of the digital medium to enhance the reading experience. Geographical knowledge is delivered on time, as needed, unobtrusively, right there beside the text, in a way simply impossible in print. And all that is required, once the CSS code is in place, is to create the normal html link in the Drupal editor.

I’m here at a liberal arts college doing digital humanities at a fairly small scale, compared to what’s going on at large research universities, or at a well-funded outfit like the Perseus Project. Small size has certain advantages, I suppose, but the biggest danger is probably isolation. On an organizational level I try to avoid that by reaching out to colleagues at other institutions and getting them involved, as the Bryn Mawr Classical Review has done so successfully. But Pelagios offers DCC and projects like it an equally potent way to combat isolation, by allowing our small project to make a contribution to the much larger world of linked geographical data. Maybe someday there will be a similar infrastructure of sharing linked data about ancient persons, texts, and material objects as well, and I’d like to be there adding to it.

Chris Francese (francese@dickinson.edu)

Monday, 1 July 2013

A SQL version of the Pleiades dataset



For those of you who use the Pleiades data in your applications I’ve created a .sql file for their newest data dump which you can find here.  I used the Pleiades data dump from June 27, 2013.  What’s nice about the Pleiades people is that when they say a file is comma separated it really is.  I downloaded this file and extracted it into the promised .csv and then imported it directly into Excel where I formatted it into a set of .sql inserts.  The columns seem to have changed somewhat from the previous versions; there are fewer name columns.  As a result I reformatted my Pleiades data table in SquinchPix’s database so that it now looks like this:

CREATE TABLE `PlacRefer` (
  `bbox` varchar(100) default NULL COMMENT 'bounding box string',
  `description` varchar(512) default NULL COMMENT 'Free-form text description',
  `id` varchar(100) default NULL COMMENT 'Pleiades ID number',
  `max_date` varchar(255) default NULL COMMENT 'last date',
  `min_date` varchar(100) default NULL COMMENT 'earliest date',
  `reportLat` float default NULL COMMENT 'Latitude',
  `reportLon` float default NULL COMMENT 'Longitude',
  `Era` varchar(24) default NULL COMMENT 'Characters indicating the era',
  `place_name` varchar(100) default NULL COMMENT 'Name string for the place'
);

I formatted the bounding box as a comma-separated varchar.  I do this because the bounding box requires special treatment; it might be missing altogether or it might have fewer than four points, so, if you’re working with it, just get it into a string and split the string on commas.  Then you’ll have an array of items that you can treat as floats.  I finally got it through my thick skull that the description line can be parsed into keywords, so I’ll be using that more in the future.  The ‘id’ field is the regular Pleiades ID.  Is it my imagination or did the Pleiades people suddenly get a large dump of data from the Near East?  The number of items in the file is now 34,000+, which looks like a big increase.  The max_date and min_date fields give the terminus ante quem and terminus post quem, respectively, for any human settlement of the place in question.  The reportLat and reportLon fields haven’t changed.  The ‘Era’ field gives zero or more characters that indicate the period of existence of the site: ‘R’ for ‘Roman’, ‘H’ for ‘Hellenistic’, etc.  I included them because they might be handy for your chronological interpretation.  The ‘place_name’ field is the only name field in the current setup.
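The bbox handling described above might look like this in outline. The site code shown later in this post is PHP; this sketch uses Python for brevity, and the function name is invented:

```python
# Sketch of the bbox special treatment: the bounding box is stored as a
# comma-separated string that may be missing altogether or contain fewer
# than four numbers, so split on commas and convert whatever is there.
def parse_bbox(bbox):
    if not bbox:                 # missing altogether -> nothing to do
        return []
    return [float(part) for part in bbox.split(",") if part.strip()]

print(parse_bbox("12.5,41.8,12.6,41.9"))   # a full four-point box
print(parse_bbox(""))                      # missing bbox -> empty list
```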

If this table layout is satisfactory for you then you can get all the SQL to create and populate the table with all the newest Pleiades data from Google Drive here. Be careful: this new .sql deletes the PlacRefer table first.

I modified my Regnum Francorum Online parser to use this renewed table. The relevant code looks like this:

 $place_no = $l5[0];   // $l5[0] is the fragment of the input record
                       // that contains the Pleiades ID
 unset($lat);          // we test for unset later
 unset($lon);

 // note: id is a varchar column, so the value must be quoted
 $querygeo  = "select a.reportLat, a.reportLon from PlacRefer a where a.id = '$place_no';";
 $resultgeo = mysql_query($querygeo);
 $rowgeo    = mysql_fetch_array($resultgeo);

 $lat = $rowgeo[0];
 $lon = $rowgeo[1];

This is how you’ll probably use it most of the time – using the Pleiades ID to retrieve the lat/lon pair.  I was pleasantly surprised at how much the data has improved.   I redid all the Regnum Francorum Online records with the new data and it looks a lot better.  So congratulations to the Pleiades guys!  Although they should double check the exact location of Nördlingen.  Here's how the first 500 Regnum Francorum Online records look on a map.

First 500 Regnum Francorum Online records displayed on SquinchPix using new Pleiades data.
A big improvement over the previous version which you can see here.

If you want to do this yourself from the original Pleiades data dump then be sure to convert all double quote characters to &quot;, left single quotes to &lsquo;, and right single quotes to &rsquo;.  The data has elaborate description fields which have been formatted with lots of portions quoted in various ways by various workers.  Also, many place names in the Near East and many French names contain embedded single quotes that must be changed to &lsquo; or &rsquo; or the equivalent.  If you need a guide go here.

Get this right first, because if you’re not absolutely sure that you’ve got all the pesky quotes taken care of, the SQL import won’t run.
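The cleanup itself is just a handful of substitutions. A Python sketch of the idea (the function name is invented, and in practice you would run this over the whole .csv before building the inserts):

```python
# Replace double quotes and curly single quotes with HTML entities so that
# embedded quotes in descriptions and place names cannot break the
# SQL INSERT statements built from the CSV.
def escape_quotes(text):
    return (text.replace('"', "&quot;")
                .replace("\u2018", "&lsquo;")    # left single quote
                .replace("\u2019", "&rsquo;"))   # right single quote

print(escape_quotes('A "quoted" description from \u2018Pleiades\u2019'))
```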

But you can avoid all that hassle by just downloading my .sql file from Google Drive and importing it to your DB.  Have fun!

Robert Consoli
Cross-posted from Squinches.

Thursday, 25 April 2013

How Ancient History Encyclopedia linked up with Pelagios

We're back for some information on how we linked Ancient History Encyclopedia to Pelagios. I hope that this can be of help for future websites that join this excellent project.

First of all, we need to explain how AHE works. The website is entirely based on tags / keywords. Each tag has one (and only one) definition associated with it, and many possible articles, illustrations, or timeline events. It is possible --and indeed necessary for the website to work properly-- that articles, illustrations, and timeline events are linked to many tags. An article on "Trade in Ancient Greece" would be tagged with "Greece", "Economy", "Trade", and "Colonization", and it would subsequently be listed under all those tags' pages.

Now the initial idea was easy: Let's link up every geographical tag of ours (cities, countries, regions) to its equivalent location in Pleiades. We've got 2,400 tags, and we expect to have many more in the future, so we didn't want to do this all by hand. Instead, we wanted something future-proof, that would notify us automatically of possible matches between tags and Pleiades locations.

Every day, we automatically import the Pleiades database of names, their respective location IDs and their locations and mirror it in our database using a cron job. We wrote a nifty little PHP function that converts the Pleiades data to a PHP array -- feel free to use it.
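The real helper is the PHP function linked above; the following is a Python sketch of the same idea, with the CSV column names assumed rather than taken from the actual dump:

```python
# Sketch: read the daily Pleiades names CSV and build a dict mapping each
# place name to its Pleiades ID and coordinates, ready to mirror into a
# database table. Column names here are assumptions, not the real headers.
import csv
import io

def pleiades_names_to_dict(csv_text):
    places = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        # A name can belong to more than one place, so keep a list.
        places.setdefault(row["title"], []).append(
            {"id": row["id"], "lat": row["reprLat"], "lon": row["reprLong"]})
    return places

sample = "id,title,reprLat,reprLong\n197173,Boii,47.0,11.5\n"
print(pleiades_names_to_dict(sample))
```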

In our editorial team's interface we have a page that automatically tries to find possible matches between Pleiades place names and tags on AHE. For links, we only look at those tags which have a definition -- after all we only want to link up content that is of use to potential readers, not empty tags. Editors can then review the link suggestions and either approve or reject them. That way, we already found most of the links between our datasets.

Suggestion from the automatic linking script
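The suggestion step can be sketched as a simple exact-name comparison. This is illustrative only: the tag names and place IDs below are made up, and the real script also shows a map alongside each suggestion for the editors to review:

```python
# Sketch: propose a tag-to-Pleiades link whenever a tag's name exactly
# equals a mirrored Pleiades place name (case-insensitive).
def suggest_links(tags, pleiades_names):
    # pleiades_names is a list of (place_id, name) pairs from the mirror.
    index = {name.lower(): pid for pid, name in pleiades_names}
    return {tag: index[tag.lower()] for tag in tags if tag.lower() in index}

tags = ["Athens", "Trade", "Corinth"]        # only tags with a definition
pleiades = [("p1", "Athens"), ("p2", "Corinth")]
print(suggest_links(tags, pleiades))
```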

Then there is the problem of links that aren't found by our automatic matching script. For example, on AHE the tag is called "Greece" whereas on Pleiades it's known as "Hellas". Another example: "Mediterranean" on AHE is known as "Internum Mare" at Pleiades. No script can figure that out!

For those cases, we added another functionality to our tag editor form: Our editorial team can simply search the Pleiades DB mirrored on our server for links, for each tag. An editor could for example see the tag "Greece", notice that it's not linked to Pleiades, open the linking form for the tag Greece and manually search for "Hellas".
Tag listing for editors (2nd last column is the Pleiades link)
The search will give exactly the same type of results as the automatic linking does above, with a map to help the decision-making.

When a tag is linked, we write the Pleiades ID into a newly-created field in that tag's entry in our database (hoping that Pleiades will never change tag IDs).

Now it's time to deliver all this data in a format that Pelagios can understand. We have another script that goes through all the linked tags and fetches their respective definitions, as well as all articles and illustrations that are linked to them in our database. Then we output each tag definition as Turtle/RDF in the Pelagios format, linked to a specific Pleiades ID. All articles and images associated with that tag are also output for that Pleiades ID. The final result looks like this. Notice that while each definition only occurs once (one definition per tag), articles and images can appear multiple times, linked to multiple tags (as one article or image is linked to many tags).
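The grouping rule just described can be sketched as follows (the URLs and the Pleiades ID are invented for illustration; the real script emits full Turtle, not just these pairs):

```python
# Sketch of the grouping logic: each linked tag yields one annotation for
# its definition page plus one per tagged article or image, all bound to
# the same Pleiades ID. An article tagged with several linked tags will
# therefore be emitted again under each of those tags' Pleiades IDs.
def annotation_targets(pleiades_id, definition_url, content_urls):
    return [(pleiades_id, url) for url in [definition_url] + content_urls]

targets = annotation_targets(
    "123456",
    "http://www.ancient.eu.com/greece/",
    ["http://www.ancient.eu.com/article/115/",
     "http://www.ancient.eu.com/image/42/"])
for pid, url in targets:
    print(pid, "->", url)
```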

Personally, I find that Turtle/RDF is somewhat mind-boggling and not exactly easy to understand (I'm not a professional programmer), but with the excellent help of Rainer Simon, Elton Barker, and Leif Isaksen we managed to make it work and validate. Thanks a lot guys... we couldn't have done it without you!

We then submit the generated file to Pelagios (in the next version of Pelagios it'll be imported automatically on a regular basis).

I hope that this was helpful or at the very least interesting to anyone who is looking to link up with Pelagios. If your site is similar to ours, do feel free to drop us a line on {editor AT ancient.eu.com}! We're always happy to help!

Tuesday, 2 October 2012

The Portable Antiquities Scheme joins Pelagios

Hacking Pelagios rdf in the ISAW library, June 2012
Earlier in 2012, the excellent Linked Ancient World Data Institute was held in New York at the Institute for the Study of the Ancient World (ISAW). During this symposium, Leif and Elton convinced many participants that they should contribute their data to the Pelagios project, and I was one of them.

I work for a project based at the British Museum called the Portable Antiquities Scheme, which encourages members of the public within England and Wales to voluntarily record objects that they discover whilst pursuing their hobbies (such as metal-detecting or gardening). The centrepiece of this project is a publicly accessible database which has been online in various guises for over 13 years; the latest version is now in a position to produce interoperable data much more easily than previously.

Image of the finds.org.uk database
The Portable Antiquities Scheme database

Within the database that I have designed and built (using Zend Framework, jQuery, Solr and Twitter Bootstrap), we now hold records for over 812,000 objects, with a high proportion of these being Roman coin records (175,000+ at the time of writing, some with more than 1 coin per record). Many of these coins have mints attached (over 51,000 are available to all access levels on our database, with a further 30,000 or so held back due to our workflow model.) To align these mints with a Pleiades place identifier was straightforward due to the limited number of places that are involved, with the simple addition of columns to our database. Where possible, these mints have also been assigned identifiers from Nomisma, Geonames and Yahoo!'s WOEID system (although that might be on the way out with the recent BOSS news), however some mints I haven't been able to assign - for instance 'mint moving with Republican issuer' or 'C' mint which has an unknown location.

Once these identifiers were assigned in the database, it allowed easy creation of RDF for use by the Pelagios project, and it also facilitated use of their widgets to enhance our site further. To create the RDF for ingestion by Pelagios, our Solr search index dumps XML via a cron-driven cURL request; this is transformed by XSLT on our server every Sunday night, and we use s3sync to send the dump to Amazon S3 (where we have incremental snapshots). These data grow at the rate of around 100-200 coins a week, depending on staff time, knowledge, and whether the state of the coin allows one to attribute a mint (around 45% of the time). The PAS database also has the facility for error reporting and commenting on records, so if you use the attributions provided through Pelagios and find a mistake, do tell us!

At some point in the future, I plan to try and match data extracted from natural language processing (using Yahoo geo tools and OpenCalais) against Pleiades identifiers and attempt to make more annotations available to researchers and Pelagios.

For example, this object WMID-3FE965, the Staffordshire Moorlands patera or trulla (shown below):

Has the following inscription with place names:

This is a list of four forts located at the western end of Hadrian's Wall: Bowness (MAIS), Drumburgh (COGGABATA), Stanwix (UXELODUNUM) and Castlesteads (CAMMOGLANNA). It also incorporates the name of an individual, AELIUS DRACO, and a further place-name, RIGOREVALI. The forts can be given Pleiades identifiers as follows:
  1. Bowness: 89239
  2. Drumburgh: 89151
  3. Stanwix: 967060430
  4. Castlesteads: 89133

Integrating the Pelagios widget and awld.js

Using Pleiades and Nomisma identifers allows the PAS database to enrich records further via the use of rdfa in view scripts and by the incorporation of the Pelagios widget and the ISAW javascript library on a variety of pages. For example, the screenshot below gives a view of a gold aureus of Nero recorded in the North East of England with the Pelagios widget activated:
The pelagios widget embedded on a coin record:  DUR-B4E094 
The JavaScript library by Nick Rabinowitz and Sebastian Heath also allows for enriched web pages; this page for Nero shows the library in action:

These emperor pages also pull in various resources from third party websites (such as Adrian Murdoch's excellent talking head video biographies of Roman emperors), data from dbpedia, nomisma, viaf and the site's internal search engine. The same approach is also used, but in a more pared down way for all other issuer periods on our website, for example: Cnut the Great.


Integrating Johan's map tiles

Following on from Johan's posting on the magnificent set of map tiles that he's produced for the Pelagios project (and as seen in use over at the Pleiades site and OCRE), I've now integrated these into our mapping system. I've done it slightly differently to the examples that Johan gave; due to the volume of traffic that we serve up, it wasn't fair to saddle the Pelagios team with extra bandwidth. Therefore, Johan provided zipped downloads of the map tiles and I store these on our server (if you're a low traffic site, feel free to use our tile store):
Imperium map layer, with parish boundary. Zoom level 10.
The map zoom has been set to the level (10 for Great Britain) at which we decided site security was ensured for the discovery points (although Johan has made tiles available to level 11). This complements the other layers we use:

  • Open Street Map
  • terrain 
  • satellite
  • soil map
  • Stamen map watercolor
  • Stamen map toner 
  • NLS historic OS maps
Each find spot is also reverse geocoded so that a WOEID and a Geonames identifier can be produced and the elevation obtained; subsequently we link to Aaron Straup Cope's excellent woedb for further enhancement of place data. We also serve up boundaries derived from the Ordnance Survey OpenData BoundaryLine dataset, split from shapefiles and converted to KML by ogr2ogr scripts. The incorporation of this layer allows researchers (over 300 projects currently use our data) to interpret the results that they get from searches on our database against the road network and settlement data much more easily, and it has already gathered many positive comments from our staff and research colleagues.

By contributing to the Pelagios project, we hope that people will find our resources more easily and that we in turn can promote the efforts of all the fantastic projects that have been involved in this programme. What we've managed to implement from joining the Pelagios project already outweighs the time spent coding the changes to our system. If you run a database or website with ancient world references, you should join too!


Wednesday, 2 May 2012

Fasti Online - Pelagios Compliant and viewable in the New API


We have some good news from Fasti-Online HQ, we are now Pelagios-compliant and are the first dataset (along with the Ure Museum) to be viewable in the new Pelagios API! Very exciting!

Trebula Mutuesca in the new Pelagios API v2 - showing the Fasti sites
The Fasti representations can be seen at:

Fasti Online in Pelagios

Fasti Online Annotations in Pelagios

Please note that this is using the cutting edge code (version 2 of the API) which is currently in a pre-release state and so may be down or a bit buggy - but it's great to be able to give a preview! A blog post will be coming soon introducing the v2 API, so watch this space!

How did we get there?

Firstly, with a lot of help from the Pelagios team! This was our first foray into the world of Linked Data, so we had quite a few queries initially, but once they were ironed out the process of creating the relevant files was relatively painless.

Linking the Pleiades URIs

Fasti is a database of archaeological excavations, all of which by their very definition happen at a place. So our first step was to match the place information in Fasti with the placenames in Pleiades, which acts as the glue between the Pelagios partners. I have mentioned the process we undertook to do this in a previous entry. We ended up with 339 Fasti sites that link to a Pleiades URI; we hope to improve on this number in the future.

Creating the RDF representation

We then had to create the RDF representation of our data, which can be seen here. Our RDF is relatively simple, in that we just have the Fasti excavation name, followed by the Pleiades URI and the URI to the excavation record in Fasti. A typical entry looks like this:

<http://www.fastionline.org#set1/annotation1> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.openannotation.org/ns/Annotation> .
<http://www.fastionline.org#set1/annotation1> <http://purl.org/dc/terms/title> "Pompeii, house VI 8, 20-21.2" .
<http://www.fastionline.org#set1/annotation1> <http://www.openannotation.org/ns/hasBody> <http://pleiades.stoa.org/places/433032#this> .
<http://www.fastionline.org#set1/annotation1> <http://www.openannotation.org/ns/hasTarget> <http://www.fastionline.org/micro_view.php?item_key=fst_cd&fst_cd=AIAC_1704> .



Of course this file is created programmatically within Fasti's Archaeological Recording Kit (ARK) back-end. ARK is an open-source project, meaning that the codebase can be easily updated and those updates rolled out to all users of the system. This is a plus for us, because it has meant that we only needed to write a few new lines of code into the ARK codebase to output the Fasti dataset in a Pelagios-compliant format. The real beauty is that now the code has been written and committed back to the ARK codebase any other project using ARK (providing they can meaningfully match to Pleiades URIs) can immediately become Pelagios-compliant with the click of a button - very cool, and it bodes well for new partners in the future!
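Those "few new lines" can be sketched as a small function that serialises one Fasti site in the format shown above. The site data below is copied from the example record; the function name is invented and this is not ARK's actual code:

```python
# Sketch: emit the four N-Triples statements for one Fasti excavation
# linked to a Pleiades URI, following the documented annotation pattern.
def fasti_triples(n, title, pleiades_id, fst_cd):
    s = "<http://www.fastionline.org#set1/annotation{}>".format(n)
    target = ("http://www.fastionline.org/micro_view.php"
              "?item_key=fst_cd&fst_cd={}".format(fst_cd))
    return "\n".join([
        s + " <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.openannotation.org/ns/Annotation> .",
        s + ' <http://purl.org/dc/terms/title> "{}" .'.format(title),
        s + " <http://www.openannotation.org/ns/hasBody> <http://pleiades.stoa.org/places/{}#this> .".format(pleiades_id),
        s + " <http://www.openannotation.org/ns/hasTarget> <{}> .".format(target),
    ])

print(fasti_triples(1, "Pompeii, house VI 8, 20-21.2", "433032", "AIAC_1704"))
```

In ARK the loop over matched sites simply calls something like this once per record, which is why regenerating the file whenever a site is added is cheap.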

The VoID file

We also had to create a VoID file, essentially a machine-readable description of the RDF dataset. This was a little bit technical to look at, but thanks to a number of templates provided by the team we just replaced the relevant bits with Fasti-specific information and were good to go. Our VoID file can be found here, if you are using a similar set-up to us then it should just be a matter of replacing the Fasti-specific stuff with your own dataset details.

What's next?

Now that we have the code all cleaned, we will automate the creation of the .n3 file (the RDF representation), so that when a new site is added into Fasti it will automatically update the .n3 file - providing near live data to Pelagios partners. We are also looking forward to the other partners getting their data into the new API so we can start playing around with using the linked data on Fasti (and within other ARKs). We are just entering the fun part of a project like this - after all no one likes crunching their data to fit, but now it is in the right format we can really start using it for advancing the research in the field and learning new things about what we are interested in, archaeology, and not the ins and outs of arcane RDF notation!

This has also come at exactly the right time for the Fasti project as we are gearing up to release the new version of the front-end in the next couple of months, and the Pelagios linkages will become a big part of this overhaul. Fasti's future is bright... Fasti's future is Linked!

Tuesday, 13 December 2011

Converting the Ure Museum data

The Ure Museum of Classical Archaeology in Reading is one of the most recent Pelagios partners. I have just started work on converting the collection data into a Pelagios-compliant format with the help of the curator, Amy Smith.

The main task involved in this is finding a way to figure out for each item in the collection whether there are any places in Pleiades associated with the item. Once we have done this, it should hopefully be straightforward to turn this data into OAC annotations for Pelagios.

You can browse the Ure Museum database online here. There are about 3000 objects in the collection. Any information about places associated with an object is generally under either the 'Fabric' or 'Provenance' listed for the object. The fabric is usually an adjective describing where the item was thought to have been made e.g. Boeotian, Etruscan, Daunian. The provenance is generally less structured. Here are some examples of the contents of this field for a selection of different objects:
  • Probably made in Cyprus (Stubbings)
  • Found on Mount Helicon with an arrowhead, 26.7.13
  • Northern Boeotia (?), provenience unknown
  • From a burial somewhere in the Argolid.
  • Thought to be from Cyprus: T.146.II. From Poli? Cf. JHS 1890.
  • Unknown, similar to Larnaca, Kamelarga finds
  • From Carthage (or other North African site)
  • Central Italian, possibly from the vicinity of Rome.
  • Cast from an original in the Acropolis Museum, Athens
  • Said by vendor to have come from between Thebes and Chalcis
I have been given all the data as an XML dump and want to write a script to match any places in Pleiades with this information from the 'Fabric' and 'Provenance' fields in the data. I also have a copy of Pleiades+ which provides toponyms from GeoNames for the places in Pleiades. You can read more about Pleiades+ here and here.

The rough approach I have taken is to go through each item in the collection and then for each item, go through all the places in Pleiades+ and see if any match with anything in the Fabric or Provenance field.
I am hopefully not far off getting all the special cases sorted out and should have this completed in the early new year.
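The matching pass can be sketched like this. It is a Python outline only: the field contents below are taken from the examples above, while the real script works over the XML dump and the full Pleiades+ list, and has to handle the special cases discussed next:

```python
# Sketch: for each object, scan the Fabric and Provenance fields for any
# Pleiades+ toponym; record every toponym found, since an object may be
# annotated with more than one place.
def match_places(objects, toponyms):
    matches = {}
    for obj_id, fields in objects.items():
        text = " ".join(fields.values()).lower()
        found = [name for name in toponyms if name.lower() in text]
        if found:
            matches[obj_id] = found
    return matches

objects = {
    "obj1": {"Fabric": "Corinthian", "Provenance": "Probably made in Cyprus"},
    "obj2": {"Fabric": "unknown", "Provenance": "From a burial in the Argolid."},
}
toponyms = ["Cyprus", "Argolid", "Carthage"]
print(match_places(objects, toponyms))
```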

Here are a few of the challenges and issues that I have encountered so far.

1) Uncertainty in the data

This was one of the first things that concerned me. As you can see from the examples above, there is often a large degree of uncertainty about where items are from. In addition, items may not have been found in a spot that even has a name, and they may have more than one place associated with them if they have moved locations. However, as I was reminded by Leif, these are annotations we are providing. So you can provide multiple annotations for an object; it's perfectly fine to annotate an object with any location that it is remotely associated with, and an annotation does not indicate an object's definite origin.

2) Location-based adjectives

Most of the fabric information is given as adjectives rather than as place names, e.g. as Corinthian rather than Corinth. Even in the provenance data there are still lots of adjectives. Adjectives associated with a place are outside the scope of Pleiades+, so I have compiled a list of how adjectives map to places with Amy's help. It is relatively limited because it's restricted to the adjectives used in the Ure Museum database as it stands, but would it be useful for us to share this list and allow other people to add to it in some way?

I should point out that there are still some question marks even with this approach, e.g. would you want a reference to Roman Britain to map to Rome? However, I suspect the number of controversial mappings will be small.
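In code, the mapping can live in a small lookup table applied before the ordinary toponym matching. The entries below are a hypothetical excerpt, not the actual list compiled for the museum:

```python
# Hypothetical excerpt of the adjective-to-place list; the real list
# covers only adjectives actually used in the Ure Museum database.
ADJECTIVE_TO_PLACE = {
    "Corinthian": "Corinth",
    "Boeotian": "Boeotia",
    "Etruscan": "Etruria",
    "Attic": "Attica",
}

def expand_adjectives(text):
    """Rewrite location adjectives as place names so that ordinary
    toponym matching against Pleiades+ can pick them up."""
    for adjective, place in ADJECTIVE_TO_PLACE.items():
        text = text.replace(adjective, place)
    return text
```

With this in place, a fabric of 'Corinthian' becomes 'Corinth' and matches Pleiades+ like any other toponym.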

3) Disambiguating places

Sometimes there are multiple places with the same name. For example, there is more than one place called Salamis. How do we make sure that if we know we want the Salamis in Cyprus, it matches http://pleiades.stoa.org/places/707617/ rather than, say, another Salamis? Again, this is where the 'connects with' information in Pleiades would help in theory. However, in practice it looks likely that we are going to have to deal with these ambiguities as special cases in the script.
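In the script, these special cases amount to a small clue-word table; a hypothetical sketch is below. Only the Cypriot Salamis URI is taken from Pleiades; the clue words are assumptions, and unresolved cases simply fall through for manual review.

```python
# Hypothetical special-case table for ambiguous toponyms: pick a URI
# based on a clue word elsewhere in the same Fabric/Provenance text.
AMBIGUOUS = {
    "Salamis": [
        ("Cyprus", "http://pleiades.stoa.org/places/707617/"),
    ],
}

def resolve(name, context):
    """Resolve an ambiguous toponym using clue words in its context,
    or return None to flag the case for manual review."""
    for clue, uri in AMBIGUOUS.get(name, []):
        if clue in context:
            return uri
    return None
```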

4) Granularity of annotations

If we have an object from Salamis in Cyprus, do we annotate it with both Salamis and Cyprus, or just with the more precise location, Salamis? You wouldn't necessarily expect every item from Rome to also be annotated with Italy, so using the more precise location feels sensible. On the other hand, it may not do any harm to annotate with both. And if we do have two places associated with an object, how do we tell that one is contained within the other? Pleiades has information about which places 'connect with' other places, and according to Sean Gillies of Pleiades, 'you'd almost never go wrong in Pleiades by inferring containment between a precisely located place of small extent and a much more extensive place' if you used this data. However, a great deal of connection information is missing from Pleiades, so in practice this approach is unlikely to work well.

5) Pleiades locations enclosed in text not related to the location

If you just go through the Pleiades+ data and search for each place name in the text associated with the object, you get lots of false hits, partly because there is some slightly odd data in Pleiades, such as http://pleiades.stoa.org/places/324652/. Most of these you can rule out by assuming that place names will be capitalised in the Ure Museum data and by insisting on whole-word matches. However, there are still occasional problems. For example, the Pleiades place Artemis matches 'Sanctuary of Artemis Orthia, Sparta', and you may also want to rule out locations of museums mentioned. I have been writing special cases in my script for these. I can do this because the collection isn't too large, but with a larger collection you could easily miss instances like these. I have wondered if the GeoParser used for GAP might help with dealing with this type of unstructured data.
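One way such filtering might look is sketched below: insist on whole-word, capitalised matches, and drop any match that falls inside a known misleading phrase. The exclusion phrases are examples drawn from the cases above, not a complete list.

```python
import re

# Example exclusion phrases; in practice these accumulate as special
# cases are discovered in the collection.
EXCLUDE_PHRASES = ("Sanctuary of Artemis", "Acropolis Museum, Athens")

def place_mentions(text, toponym):
    """Return character spans where the toponym occurs as a whole word,
    skipping lower-case toponyms and matches inside excluded phrases."""
    if not toponym[:1].isupper():
        return []
    blocked = [m.span() for p in EXCLUDE_PHRASES
               for m in re.finditer(re.escape(p), text)]
    hits = []
    for m in re.finditer(r"\b" + re.escape(toponym) + r"\b", text):
        s, e = m.span()
        if not any(bs <= s and e <= be for bs, be in blocked):
            hits.append((s, e))
    return hits
```

So 'Artemis' in 'Sanctuary of Artemis Orthia, Sparta' is suppressed, while 'Sparta' in the same text still matches.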

6) Alternative toponyms not in Pleiades+

Pleiades+ doesn't claim to be comprehensive, and I have come across a fair number of alternative toponyms not in Pleiades+, again with Amy's help, and have written these into my script as special cases. Some of these come from the Barrington Atlas notes in Pleiades, but there are others as well. As with the adjectives, I'm wondering if there is some sensible way of sharing the alternative toponyms we have found, so as to save other people from duplicating our work.

7) Vague geographical data

There are quite a few provenance entries that include locations like 'South Italy' or 'Greek Islands'. I have found no way of specifying these in terms of Pleiades locations, so I have had to resort to annotating them just with 'Italy' or 'Greece', losing some of the information. Objects are also often described as being found in modern countries or places that don't always have a clear equivalent in Pleiades.

8) The historical scope of Pelagios

The Ure Museum contains objects from a wide range of periods, whereas Pleiades focuses on the Greek and Roman world and, in a sense, defines the scope of Pelagios. Should I still annotate a Neolithic object, for example, with the larger region from which it comes, even if the precise location is not in Pleiades?

9) Spelling mistakes in the data

There aren't too many of these, but I have also had to include some special cases for spelling mistakes (as well as for alternative transliterations of place names). Obviously the ideal solution is to get the spelling mistakes fixed in the database itself and then take a fresh download of the data, but I thought I should highlight this as a potential issue. If the data has previously only been read by humans, who, unlike a computer, can easily understand what is intended, it is easy for these typos to slip through.

10) Dealing with updates to the data

It is likely that more data will be added to the Ure Museum database as time goes on. It would obviously be possible to rerun my script, but there are enough special cases that it would be hard to guarantee that any new results would be comprehensive and accurate.

Next stages

Overall this is proving a really interesting exercise and a good introduction to the world of Pelagios.
Once I have finished with the special cases, the next stage will be to turn the data into OAC annotations and to arrange where the data will be hosted. In the meantime, I'm off for the next few weeks seeing what my one-year-old makes of Christmas!



Friday, 9 December 2011

Pelagios Phase 2: Project Plan

Phase two of Pelagios looks to build on our lightweight framework, based on the concept of place (a Pleiades URI) and a standard ontology (Open Annotation Collaboration), by publishing the Pelagios Toolkit: a set of services and documentation that will assist people in annotating, discovering and visualizing references to places in open online ancient world resources.

In all, there are four Work-Packages:
§ WP1 casts the net beyond the existing partners in order to allow anyone to publish their data in a way that maximizes its discoverability. This webcrawling and indexing service will find material and, based on the Pelagios framework and semantic sitemaps, aggregate place metadata in order to create value for the holders of that data.
§ WP2 aims to explore further ways of exploiting the concept of place. The place/space-based APIs and contextualisation service will help other users and data-providers discover relevant data and do interesting things with them.
§ WP3 tackles end-user engagement: i.e. subject specialists who lack the technical coding expertise to use the data underlying what is seen on the screen. The visualization service will explore ways of allowing these users to get to grips with the data, both in a single Pelagios interface and as embedded widgets hosted on each partner’s site.
§ WP4 distils the guidelines into a cookbook providing explicit recipes for producing, finding and making use of geoannotations for the community as a whole. In short, you won’t need to be a Pelagios partner to join in making your data discoverable and usable.

The evolving nature of the Pelagios collective reflects the shift towards community engagement. While partners from the original Pelagios proof-of-concept project will continue to be involved, the main work for phase two of Pelagios will be carried out by: Arachne, CLAROS, DME, Fasti-online, GAP, IET (the Open University), Nomisma, Southampton, SPQR, the Ure Museum.

Deliverables
The outcomes, in more detail, are as follows:

D 1.1: Web Crawling and Indexing Prototype. This infrastructure component traverses resource sets on the Web (registered manually or discovered using semantic search engines like Sindice) and catalogues their place metadata. Place metadata encompasses geographical coordinates as well as Pleiades and Geonames URIs.
D 1.2: Pelagios 2 Graph API. This deliverable is an HTTP API that allows querying of the aggregate data graph generated by the Indexing Prototype. The API will provide responses in JSON and RDF format; and possibly in additional formats (e.g. KML or GeoRSS) if the need is identified in WP3. The initial range of possible queries is based on the outcome of the Pelagios project. The exact scope and structure of the final API will be driven by the requirements identified in WP3.
D 1.3: API Statistics and Reporting Interface. This deliverable will extend the Pelagios 2 Indexing Prototype with means to extract statistics and reports on the use of the API. Data partners can use this interface to gain insight into how their data is being discovered, queried and re-used within the larger online community.

D 2.1: Place-based API. This deliverable will extend the Pelagios 2 API with queries that return resources relevant to specific places or those with mereological (part-whole) relationships.
D 2.2: Space-based API. This deliverable will extend the Pelagios 2 API with queries that permit searches based on geographic scope, e.g. within a certain geographic buffer around a given location set.
D 2.3: Contextualisation Prototype. This deliverable is a service that provides ranked, relevant materials for a certain place or particular Named Entities. Results will be enriched with additional data from sources such as GeoNames, DBpedia and Freebase.

D 3.1: Evaluation of User Needs. This deliverable will report on the results of a formal evaluation of user needs regarding data visualization. The evaluation will be conducted in conjunction with project partners, and will inform the design of a set of online visualization widgets. This deliverable will have the form of a series of blog posts.
D 3.2: Widget Suite, Alpha version. This deliverable encompasses the first (alpha) version of the visualization widgets.
D 3.3: Evaluation of Widget Design. This deliverable will report on the results of observational and participatory design studies. The studies will be conducted on the Widgets as they are continuously and iteratively being developed from alpha state to final (beta) prototype. This deliverable will have the form of a series of blog posts.
D 3.4: Widget Suite, Beta version. This deliverable encompasses the final (beta) version of the visualization widgets.

D 4: Pelagios 2 Cookbook. Content Partners will produce regular documentation on data preparation, practices, tool use, etc. in the form of blog posts. The PI, assisted by the Co-Is will distil this information into a “cookbook” which will make it easier for anyone with Ancient World content to publish their data online in conformance with the Pelagios 2 common open standards.

Saturday, 3 December 2011

Welcome to Pelagios - Phase 2

Pelagios is a growing collective of ancient world projects who are linking together their data so that scholars and members of the public are able to discover all different kinds of stuff about ancient places.

Phase 1 has been the proof of concept. In this stage we have linked some core ancient world projects to each other through the concept of place (a Pleiades URI) and a baseline ontology (Open Annotation). The value of those linkages is demonstrated in the Pelagios Explorer, which allows users to discover and investigate the data from those different projects in a handy search interface.

The second phase of Pelagios is to formalize the process by which anyone can join, or enjoy the fruits of, the Pelagios superhighway. We will provide a ‘digital toolkit’ for anyone producing material about the ancient world (not just universities but also museums, libraries, etc.), so that their data will be more discoverable and usable. We will also be experimenting further with methods of visualizing that data so that subject-specialist users and the general public can discover information about places that interest them, without needing the technical expertise to do the digging themselves.

The Pelagios kick-off meeting in Greenwich: (back row) Andy Meadows (Nomisma), Sebastian Rahtz (CLAROS), Liz Fitzgerald (IET), Amy Smith (Ure Museum), Elton Barker (OU), Rainer Simon (DME), Alex Dutton (CLAROS); (front row) Leif Isaksen (Southampton), Simon Hohl & Rasmus Krempel (Arachne), Juliette Culver (IET)

It was taken by a plaque reading "Greenwich: still the centre of space and time"

Friday, 16 September 2011

The *Child of 10* standard

While Pelagios has been largely about building an alliance of leading ancient world research groups with the aim of linking their data in an open and transparent way, the 'front end' of our product has never been far from our minds. After all, many of the partners are also users of the data that they gather, or, if not the actual users, they have their own user groups to think about and appeal to. As a classicist myself - that is, as someone who spends most of the time reading and analysing ancient Greek texts - I want to be able to access sources easily and trust the data that I get: in other words, I want to be able to turn on the tap and find that the water runs (either hot or cold, depending on what I'm doing); I'm not interested in the plumbing that brings the water to me.

So it was timely that JISC brought to our attention a fellow jiscGEO project, called G3. In an earlier post, they had talked about a useful benchmark in user-interface design being the Child of 10 standard, meaning that a child of 10 should be able to learn to do something useful with the system within 10 minutes. This indicates whether a system is “easy to use” or not.

Will our tool, the Pelagios Graph Explorer, fit the bill, I wonder? While our natural target audience is university researchers (lecturers and undergrads), given the seemingly never-ending appeal of Classics in popular culture, we would be mad not to take seriously the point that a 10-year-old should be able to use our tool to find out interesting stuff about the ancient world. Indeed, the technical skills of the average Classicist researcher (not least this one) make it imperative that we address this question. At the time of writing, then, we are engaged in user testing of the Graph Explorer with a representative sample audience, the results of which will help inform our delivery of the product at the end of October (though it's already clear that this will be a work in progress...). All next week Mia Ridge, who has been conducting the user testing, will blog about it, setting out the methods (why we chose them, what prep is done), what actually happens in a session, and then some initial results.

But I can give a sneak preview here of the answer to that question: does the Pelagios Graph Explorer pass the *child of 10* test? On current performance, that would be a 'no'. Which is not to say that things haven't gone well! On the contrary, the very fact that issues are being raised about what you can do now that stuff is linked shows how successful we've been in linking our data: when we started out, it simply wasn't possible to imagine an ancient world of linked data, let alone think seriously about traversing it. But now that we have linked stuff together, the bar has been raised and people, rightly, want to do more with it. This presents a challenge to all the Pelagios partners to provide as much detail as possible in their metadata, in order to allow the kind of free play that a 10-year-old, or a classicist, might want.

Perhaps we could start with the name: the Pelagios Graph Explorer isn't very sexy. Suggestions on the back of a postcard, or, ideally, on this blog, welcome.

Wednesday, 13 April 2011

To ASCII or not to ASCII

So, you've got your locations all beautifully geoparsed, but how is anyone going to search for Liévin, let alone Łódź or Uherské Hradiště?

In the old days of the web, this wasn't a problem. They couldn't. At least, not without a huge amount of learning and expense on everyone's part. In the English-speaking world, text on the web meant ASCII. This restricted you, basically, to the characters on a standard US keyboard: 128 characters, a third of which were things like 'tab' and 'line feed'. If you wanted your readers to read 'é', you sent them something called an entity (&eacute; in this case). If you wanted your readers to read 'Ł', you might even have to send them a little image of the letter.

This was stupid, obviously. And, thank god, we now have browsers, databases, and programming languages that speak a different text 'language': UTF-8 (Unicode). This gives us millions and millions of characters, and means that if you want people to read 'š', or even 'אַ', you can send them plain text and not have to worry too much about operating systems, browser support, or reproducing 2000 .gif files whenever you change your site design (been there, done that, please don't judge if you haven't supported Internet Explorer 4).

But this only solves half of the problem: reading, but not writing. If you have to type 'Ł', and you're using a standard US/UK keyboard, we haven't come very far. The existing methods are either clumsy, slow, or require a lot of memorisation (or all three). It's all very well expecting admins to take a bit of extra time to enter the correct name - although a little bit of AJAX and GeoNames means that they don't have to. But, as this rather forthright blog points out, is it realistic to expect this from our users?

To give you an example of how this applies to the Pleiades data, take the case of Mérida, in Spain. Mérida (Roman Emerita Augusta, Pleiades 256155, GeoNames 2513917) is quite an important location for the APGRD (Archive of Performances of Greek and Roman Drama), as its wonderfully preserved Roman theatre is the venue for many new versions of classical plays - last year's Lysistrata, for example (warning, very slightly NSFW!).

Because the APGRD has no time frame - we're interested in every single performance, from antiquity onwards - many of our users won't know that Mérida used to be Emerita Augusta, so I can't rely on them searching for that. However, a search on the Pleiades dataset for Merida (no accent) returns no results, because to the computer 'Merida' and 'Mérida' are two different values.

I imagine those of a non-English-speaking background will be tempted to say "so what, we've been typing non-ASCII characters all our lives - you do know our keyboards look different, don't you?". Likewise, those who've spent the effort learning the betacode or combining unicode commands to get the best out of TLG and Perseus may be less forgiving.

However, I must admit that my own sympathies lean somewhat toward the author of that post. My own experience, and what I've learned from geoparsing our database's locations (entered over 10 years by a variety of people) is that:
  1. people don't necessarily know that a location has non-ASCII characters, or what the right ones are (pop quiz, no googling: how do you (properly) spell Liege?). They'll get no results and not realise that they've entered the wrong search string.
  2. a substantial proportion of people, even academics, have no idea how to enter non-ASCII characters - and even those that can are generally only good at doing it in languages/ character sets with which they are familiar.
  3. of these, how many are going to cut-and-paste text from elsewhere (which does not always give you the characters you want)?
So, I'm sure of the problem, but not necessarily of the solution. It seems to me that the best way to do this might be to convert all search strings to ASCII before matching them against ASCII-ised locations (at least, for user search). But does this lead to a horrible loss of precision? Are there better ways?
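One possible version of that conversion, using only Python's standard library: decompose each string to NFKD form and drop the combining marks. This is a sketch of the idea rather than a complete solution, since letters like 'Ł' have no decomposition and survive the fold untouched; a transliteration library would be needed for those.

```python
import unicodedata

def ascii_fold(text):
    """Strip accents by decomposing to NFKD and dropping combining
    marks. Letters without a decomposition (e.g. 'Ł') pass through."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))
```

Applying the same fold to both the stored place names and the user's query gives accent-insensitive search: ascii_fold("Mérida") yields "Merida", so a user typing either form finds the place. But note the limitation: ascii_fold("Łódź") gives "Łodz", not "Lodz".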

And I have it easy - all my locations are at least in the Latin alphabet!

Sunday, 6 February 2011

Welcome to PELAGIOS

PELAGIOS stands for 'Pelagios: Enable Linked Ancient Geodata In Open Systems'. The idea behind the project is simple, even if the actions to fulfill it (and the acronym) are not. Online resources that reference ancient places are multiplying rapidly, bringing huge potential for the researcher, provided that they can be found; but even then, the user currently has no way of bringing the data together. PELAGIOS has teamed up with an international consortium of leading research groups to trial a method of linking open data (LOD) that will enable scholars and enthusiasts alike to discover all kinds of stuff related to ancient places and then to visualize it in accessible and meaningful ways.

The consortium of projects and research groups that make up PELAGIOS are as follows:

We'll be aiming to post about the project's many turns pretty regularly (say, every other week or so), since we believe that the process, as much as any outcome, may be of interest to the community. Above all, since a project of this size and nature will only succeed with input from those who are going to use it as a resource (i.e. you guys), we welcome your feedback. So join us in going places, ancient style.

Pelagios is funded by JISC as part of their #jiscGEO programme.