Thursday, 31 May 2012

Pelagios API Documentation

Just a quick announcement to let everyone know that we now (finally) have some API documentation available as part of the ever-growing Pelagios Cookbook. The documentation is aimed at developers who want to build their own applications or mashups using data from the Pelagios network of partners.

You can find it here: Using Pelagios > Using the Pelagios API

Tuesday, 29 May 2012

Perseus Pelagios Compliancy

Perseus is happy to announce that our Pleiades annotations are now fully Pelagios-compliant and are viewable in the new Pelagios API.

The Perseus annotations are browseable at
http://pelagios.dme.ait.ac.at/api/datasets/21e48d8ca46f666467b81a551fbb1cb

Our first step, publishing annotations identifying occurrences of ancient places represented by Pleiades URIs in the Perseus texts, was described in a Pelagios blog post last year.

Subsequently, we have made some refinements to the process, all of which contribute to enabling some very cool linking possibilities, as demonstrated by the Pelagios API.


Wednesday, 23 May 2012

Pelagios API Alpha Launch!

Those of you following us closely (especially through our twitter account) will have noticed that we have - almost silently - started to populate our all-new Pelagios API with data from our partners. The API is the centerpiece of our Workpackage 1, and exposes human- and machine-readable views (HTML, JSON and RDF) on the place references we aggregate from our partners.

From a user's point of view, the API will currently tell you things such as which place references a partner dataset contains, and which datasets refer to a particular place.

We are still busy testing and completing the current implementation, as well as getting all our partners' data included. Plus, we're adding more advanced features, so watch this space - and be prepared for the occasional downtime as we deploy updates... But apart from that, you are more than welcome to browse around, try it out, and let us know your feedback!
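For readers who want to experiment programmatically, here is a minimal sketch in Python of fetching one of the machine-readable views. It uses the dataset URL pattern shown elsewhere on this blog; whether the API honours an Accept header for content negotiation (rather than, say, a URL suffix) is an assumption to verify against the API documentation.

import requests

# Hypothetical client call - a sketch only, not the documented usage
url = "http://pelagios.dme.ait.ac.at/api/datasets/21e48d8ca46f666467b81a551fbb1cb"
response = requests.get(url, headers={"Accept": "application/json"})
response.raise_for_status()
print(response.json())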

Wednesday, 2 May 2012

Fasti Online - Pelagios Compliant and viewable in the New API


We have some good news from Fasti-Online HQ: we are now Pelagios-compliant and are the first dataset (along with the Ure Museum) to be viewable in the new Pelagios API! Very exciting!

Trebula Mutuesca in the new Pelagios API v2 - showing the Fasti sites
The Fasti representations can be seen at:

Fasti Online in Pelagios

Fasti Online Annotations in Pelagios

Please note that this is using the cutting-edge code (version 2 of the API), which is currently in a pre-release state and so may be down or a bit buggy - but it's great to be able to give a preview! A blog post introducing the v2 API will be coming soon, so watch this space!

How did we get there?

Firstly, with a lot of help from the Pelagios team! This was our first foray into the world of Linked Data, so we had quite a few queries initially, but once they were ironed out the process of creating the relevant files was relatively painless.

Linking the Pleiades URIs

Fasti is a database of archaeological excavations, all of which, by their very definition, happen at a place. So our first step was to match the place information in Fasti with the placenames in Pleiades, which acts as the glue between the Pelagios partners. I described the process we undertook to do this in a previous entry. We ended up with 339 Fasti sites that link to a Pleiades URI; we hope to improve on this number in the future.

Creating the RDF representation

We then had to create the RDF representation of our data, which can be seen here. Our RDF is relatively simple, in that we just have the Fasti excavation name, followed by the Pleiades URI and the URI to the excavation record in Fasti. A typical entry looks like this:

<http://www.fastionline.org#set1/annotation1> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.openannotation.org/ns/Annotation> .
<http://www.fastionline.org#set1/annotation1> <http://purl.org/dc/terms/title> "Pompeii, house VI 8, 20-21.2" .
<http://www.fastionline.org#set1/annotation1> <http://www.openannotation.org/ns/hasBody> <http://pleiades.stoa.org/places/433032#this> .
<http://www.fastionline.org#set1/annotation1> <http://www.openannotation.org/ns/hasTarget> <http://www.fastionline.org/micro_view.php?item_key=fst_cd&fst_cd=AIAC_1704> .



Of course this file is created programmatically within Fasti's Archaeological Recording Kit (ARK) back-end. ARK is an open-source project, meaning that the codebase can be easily updated and those updates rolled out to all users of the system. This is a plus for us, because it meant we only needed to write a few new lines of code in the ARK codebase to output the Fasti dataset in a Pelagios-compliant format. The real beauty is that, now the code has been written and committed back to the ARK codebase, any other project using ARK (provided it can meaningfully match to Pleiades URIs) can immediately become Pelagios-compliant with the click of a button - very cool, and it bodes well for new partners in the future!
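The ARK code itself is not shown here, but as an illustration of what the generation step amounts to, the same kind of triples could be produced in Python with the rdflib library. The function name and output filename below are ours, not ARK's; the namespaces and values come from the example entry above.

from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import DCTERMS, RDF

# The (old-style) Open Annotation namespace used in the triples above
OAC = Namespace("http://www.openannotation.org/ns/")

def add_annotation(graph, n, title, pleiades_uri, fasti_uri):
    # One Pelagios annotation: a Pleiades place (body) linked to a
    # Fasti excavation record (target)
    ann = URIRef("http://www.fastionline.org#set1/annotation%d" % n)
    graph.add((ann, RDF.type, OAC.Annotation))
    graph.add((ann, DCTERMS.title, Literal(title)))
    graph.add((ann, OAC.hasBody, URIRef(pleiades_uri)))
    graph.add((ann, OAC.hasTarget, URIRef(fasti_uri)))

g = Graph()
add_annotation(g, 1, "Pompeii, house VI 8, 20-21.2",
               "http://pleiades.stoa.org/places/433032#this",
               "http://www.fastionline.org/micro_view.php?item_key=fst_cd&fst_cd=AIAC_1704")
g.serialize("fasti_annotations.n3", format="n3")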

The VoID file

We also had to create a VoID file, essentially a machine-readable description of the RDF dataset. This was a little technical to look at, but thanks to a number of templates provided by the team we just replaced the relevant bits with Fasti-specific information and were good to go. Our VoID file can be found here; if you are using a similar set-up to us, it should just be a matter of replacing the Fasti-specific details with your own dataset's details.
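For a rough idea of the kind of information such a file carries, here is a sketch of generating a VoID description with rdflib. Everything below is illustrative: the dataset and dump URIs are placeholders, and the exact set of properties Pelagios expects is defined by the team's templates, not by this sketch.

from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import DCTERMS, FOAF, RDF

VOID = Namespace("http://rdfs.org/ns/void#")

g = Graph()
ds = URIRef("http://www.fastionline.org/pelagios/void#fasti")  # placeholder URI

g.add((ds, RDF.type, VOID.Dataset))
g.add((ds, DCTERMS.title, Literal("Fasti Online")))
g.add((ds, DCTERMS.description,
       Literal("Annotations linking Fasti excavation records to Pleiades places")))
g.add((ds, FOAF.homepage, URIRef("http://www.fastionline.org/")))
# void:dataDump points at the N3 file holding the annotations themselves
g.add((ds, VOID.dataDump,
       URIRef("http://www.fastionline.org/pelagios/fasti_annotations.n3")))

g.serialize("fasti_void.ttl", format="turtle")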

What's next?

Now that we have the code all cleaned up, we will automate the creation of the .n3 file (the RDF representation), so that when a new site is added to Fasti the .n3 file is automatically updated - providing near-live data to Pelagios partners. We are also looking forward to the other partners getting their data into the new API so we can start playing around with using the linked data on Fasti (and within other ARKs). We are just entering the fun part of a project like this: after all, no one likes crunching their data to fit, but now that it is in the right format we can really start using it to advance research in the field and learn new things about what we are actually interested in - archaeology - rather than the ins and outs of arcane RDF notation!

This has also come at exactly the right time for the Fasti project as we are gearing up to release the new version of the front-end in the next couple of months, and the Pelagios linkages will become a big part of this overhaul. Fasti's future is bright... Fasti's future is Linked!

Tuesday, 1 May 2012

Ure Museum data available


Back in December, I blogged about converting the Ure Museum data into Pelagios-compliant format. I'm pleased to say that this data is now available here as an N3 file released under a CC-BY licence. Thank you to the curator, Amy Smith, for all her help with this!

The conversion process

I thought it might be useful for other people undertaking a conversion of data into a Pelagios-compliant format to outline the rough steps involved.

The overall goal was to find any matches between places mentioned in the museum database and places mentioned in the Pleiades gazetteer and then put these into an RDF file in the format required by Pelagios. That format is not yet documented in a formal way but there are lots of examples and various posts on this blog about the topic.

Here are the steps which I went through:

1) Obtain an XML file of the museum data and identify fields that could contain place data.

This stage was essentially getting a feel for the data available and possible issues. My original blog post discussed these in more detail.

2) Write a script to extract useful information from the XML file, identify candidate places and put the information into a spreadsheet.

I assumed that any capitalised word in either the 'Fabric' or 'Provenance' field could be a place. We might have missed places that were not capitalised in the database or that were mentioned in other fields, but based on looking through the data in the previous step, I was fairly confident that we would miss very few this way. The role of the spreadsheet was purely to make it easier to check the data manually.
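As a rough illustration of this extraction step, here is a sketch in Python. The filename, element and attribute names are hypothetical - the real museum export will differ.

import csv
import re
import xml.etree.ElementTree as ET

tree = ET.parse("ure_museum.xml")  # hypothetical filename

candidates = []
for record in tree.iter("record"):  # hypothetical element name
    for field in ("Fabric", "Provenance"):
        text = record.findtext(field) or ""
        # Treat every capitalised word as a candidate place name
        for word in re.findall(r"\b[A-Z][a-z]+\b", text):
            candidates.append((record.get("accession"), field, word))

# Dump to a CSV so the candidates can be checked by hand in a spreadsheet
with open("candidate_places.csv", "w") as f:
    writer = csv.writer(f)
    writer.writerow(("accession", "field", "candidate"))
    writer.writerows(candidates)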

3) Check all the possible places against the Pleiades+ file for matches and add these to the spreadsheet. Create a list of missed possible matches and false matches by manually checking the spreadsheet.

Looking through the results of this matching process, it was clear that there were various places mentioned that were not caught, and also some incorrect matches.
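The matching itself amounts to a simple lookup against the Pleiades+ file, along these lines. The column names here are assumptions - check them against the actual Pleiades+ dump.

import csv
from collections import defaultdict

# Build a name -> Pleiades URI lookup from the Pleiades+ file
lookup = defaultdict(set)
with open("pleiades_plus.csv") as f:
    for row in csv.DictReader(f):
        lookup[row["name"].lower()].add(row["pleiades_uri"])

def match(candidate):
    # Empty set = missed match; more than one URI = needs disambiguation
    return lookup.get(candidate.lower(), set())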

4) Add in special cases to the script, with expert help. 

This is where Amy and I painstakingly went through the list from the previous step to improve the data. The special cases fell into the following main categories:
  • Adjectival references to places, e.g. Spartan rather than Sparta
  • Toponyms not in Pleiades+ - many of these were in the Barrington Atlas Notes but not Pleiades+, or were alternative transliterations, but there were also others (e.g. Zoan is a toponym for Tanis)
  • Places that did not exist at all in Pleiades, such as some neolithic sites - for these we either included just the region or omitted them completely
  • A few typos - I was still working with the original XML file, so even with corrections to the main database I still needed to deal with these, although I could have corrected the XML file instead
  • False matches based on names of museums or temples, e.g. we didn't want 'Temple of Artemis Orthia' to match Artemis or 'Acropolis Museum, Athens' to automatically match Athens. I also learned that East Greece is not in fact the east part of Greece!

5) Add in special cases to disambiguate Pleiades places with identical names, with expert help.

As well as missed and false matches, there were over 40 different place names with multiple matches, i.e. where there was more than one place in Pleiades with the same name. For example, there are a lot of cities called Alexandria besides the famous one. Some of these were easier than others but, again, this was another rather painstaking process, involving Amy's help, going through the Pleiades gazetteer and identifying which was the correct place to match with the item in question. I also had to decide, for example, when a city had the same name as the island where it was located, whether to match the city or the island, and in other cases whether to match a geographical or administrative region.
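In code terms, steps 4 and 5 boil down to a hand-curated table of overrides and exclusions applied before the plain Pleiades+ lookup. A sketch follows; the URIs are placeholders, not the museum's real mapping table.

# Hand-curated special cases built with expert help; placeholder URIs only
OVERRIDES = {
    "Spartan": "http://pleiades.stoa.org/places/<sparta-id>",  # adjectival form of Sparta
    "Zoan": "http://pleiades.stoa.org/places/<tanis-id>",      # alternative toponym for Tanis
}

# Words that look like places but must never be auto-matched,
# e.g. the deity in 'Temple of Artemis Orthia'
BLOCKLIST = {"Artemis", "Temple", "Museum"}

def resolve(candidate, pleiades_lookup):
    # Apply exclusions and overrides before the plain lookup
    if candidate in BLOCKLIST:
        return set()
    if candidate in OVERRIDES:
        return {OVERRIDES[candidate]}
    return pleiades_lookup.get(candidate.lower(), set())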

6) Write the final data as RDF.

This last stage was in fact one of the easiest, despite my initial fears about the lack of documentation, especially as there are now lots of examples of Pelagios-compliant data files, and other project members, especially Rainer Simon, were very helpful in checking the format of my data. One interesting question was what dcterms:title to give each annotation; different partners seem to have taken different approaches here. I decided to include a title for the object based on the Accession Number, Shape and Period, rather than treating this as a title of the annotation itself, partly because I knew this would feed into the information I was using in the widgets, but it is certainly possible to argue that this is technically incorrect.

I also created a VoID file for the data. We discussed splitting the data into various subdatasets but for the time being, since the dataset is relatively small by Pelagios standards (just over 2000 annotations), I kept it as a single dataset for the sake of simplicity.  

The future

There is an interesting question of what to do about updates to the collection database. Because of the number of special cases, I would be a bit nervous about just running the script again without checking things through. We may have to play things by ear here!

For me the interesting questions that have come up are whether we can expand Pleiades+ in some useful way based on the types of special cases that we came across and whether there is any way that the process of disambiguating places in Pleiades could have been made simpler. I think these are both tasks that are likely to be encountered by other institutions trying to convert their data into Pelagios-compliant format, especially those with their place data in free text format, so it would be great if we could do something to make these processes easier for them.