Showing posts with label APIv2. Show all posts

Tuesday, 3 July 2012

Geographical information retrieval of historical regions

In the last few weeks I have been developing an API and an interface on top of the PELAGIOS API to retrieve historical places, and their related annotations, by geographical context. Superimposing a geographical representation over a data collection is not a novel problem. In modern data collections, for example the data produced by public administrations, organisations use geographical nomenclatures such as administrative subdivisions, or NUTS when the data consists of statistical observations.

For historical data collections this is not always feasible, since the administrations of past kingdoms did not always provide a sharp definition of their boundaries (sharp in the modern sense, with precise coordinates for the regions' shapes), nor a deep subdivision that could help our information retrieval task.

Fortunately for PELAGIOS users, some of the boundaries of the provinces of the Roman Empire have been made available as shape files, and this helps us browse the wealth of data annotations provided as geolocated linked data. The shape files in question were digitised from Barrington Atlas rasters (georegistered and supplied by AWMC) by Pedar Foss at DePauw University in 2007 within the context of the MAGIS project, and have been provided by Tom Elliott from the Institute for the Study of the Ancient World, New York University. The regions represented can be seen in the figure below.


Roman provinces up to AD 117 visualised in CartoDB
The present post is not about a single API or interface that can be implemented on top of PELAGIOS data and services; instead, it aims to provide some insight into how to implement geographical browsing using open source tools.

In order to visualize places and annotations from the PELAGIOS API we exploited its geographical search by bounding box. Once we can retrieve the places in the PELAGIOS network contained in a bounding box, we are halfway to filtering them by any polygon. With a GIS we could query the data by polygon directly. Unfortunately, that would require storing all the annotation data and the regions' polygons in the same database, which goes against the distributed nature of the linked data paradigm and is not feasible in general scenarios: a copy of the PELAGIOS data would have to be provided to every interested user, who would in turn be forced to install GIS software and host their own polygons.

Instead, what we did here is decouple the management of the annotation data from the geographical retrieval features, trying to minimize the amount of software to install and to reuse as much as possible the data and services already provided. For this reason we uploaded the polygons we were interested in to CartoDB, a web-enabled version of PostGIS. CartoDB offers limited free use of its web platform, but users can download the open source version and install it on their own server if and when needed. CartoDB accepts SQL queries over HTTP requests, which makes it easy for developers to integrate.
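
To give an idea of what a SQL-over-HTTP request for a region's shape might look like, here is a minimal Python sketch. The account name, the table name and the `name` column are placeholders we made up for illustration, and the endpoint layout follows the CartoDB SQL API as we understand it, so check it against the documentation of your own instance before relying on it.

```python
from urllib.parse import urlencode

# Hypothetical placeholders: not the account or table used for this post.
CARTODB_ACCOUNT = "example-account"
REGIONS_TABLE = "roman_provinces"


def region_shape_url(region_name, account=CARTODB_ACCOUNT, table=REGIONS_TABLE):
    """Build a request URL asking for one region's polygon as GeoJSON."""
    # Double any single quote so the name stays a valid SQL string literal.
    safe_name = region_name.replace("'", "''")
    # 'name' is an assumed column; 'the_geom' is CartoDB's usual geometry column.
    sql = f"SELECT name, the_geom FROM {table} WHERE name = '{safe_name}'"
    query = urlencode({"q": sql, "format": "GeoJSON"})
    return f"https://{account}.cartodb.com/api/v2/sql?{query}"


if __name__ == "__main__":
    print(region_shape_url("Aquitania"))
```

Fetching the resulting URL with any HTTP client should return the region's geometry, which can then be used on the client side without installing any GIS software.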

As said earlier, once we can query by bounding box we are halfway to querying by polygon. By querying CartoDB we can retrieve the shape of a region by name (e.g. Aquitania in the figures below). To retrieve all the PELAGIOS places contained in Aquitania, we first query the PELAGIOS API for the places contained in the bounding box of the polygon, and then filter those places by testing topological containment against the retrieved shape.
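
The two steps can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the polygon is a single ring of (lon, lat) pairs without holes, coordinates are treated as planar, and the place dictionaries with `lon` and `lat` keys are a made-up shape, not the actual PELAGIOS API response format. The containment test is a standard ray-casting check, which may differ from what a GIS library does internally.

```python
def bounding_box(polygon):
    """Return (min_lon, min_lat, max_lon, max_lat) for a list of (lon, lat) vertices."""
    lons = [lon for lon, _ in polygon]
    lats = [lat for _, lat in polygon]
    return min(lons), min(lats), max(lons), max(lats)


def point_in_polygon(point, polygon):
    """Ray casting: count the edges crossed by a ray going east from the point."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the point's latitude can be crossed.
        if (y1 > y) != (y2 > y):
            # Longitude at which the edge meets the ray's latitude.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def filter_places(places, polygon):
    """Keep only the places (dicts with 'lon' and 'lat') inside the polygon."""
    return [p for p in places if point_in_polygon((p["lon"], p["lat"]), polygon)]


if __name__ == "__main__":
    triangle = [(0, 0), (4, 0), (0, 4)]
    # Both candidates lie in the bounding box, only "A" lies in the triangle.
    candidates = [
        {"name": "A", "lon": 1, "lat": 1},
        {"name": "B", "lon": 3, "lat": 3},
    ]
    print(bounding_box(triangle))
    print([p["name"] for p in filter_places(candidates, triangle)])
```

The bounding box keeps the remote query cheap and generic, while the more expensive containment test only runs on the candidates that the bounding box search returns.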


Selection of PELAGIOS places by using the region's bounding box
Filtering of those resources by using the polygon topological containment

The activities involved in extracting places by polygon can be represented by the diagram below and involve three actors: the service implemented by the ECS department in Southampton (named ECS), the PELAGIOS API, and the CartoDB instance used for this scenario.
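
The interplay between the three actors can also be written down as a short Python sketch. The function and parameter names are hypothetical: the three callables stand in for the CartoDB request, the PELAGIOS bounding box search and the containment test, and would have to be wired to the real services. It is not the code of the ECS service itself.

```python
def places_in_region(region_name, fetch_shape, fetch_places_in_bbox, contains):
    """Chain the three steps: region shape, bounding box search, polygon filter."""
    # Step 1 (CartoDB): look up the polygon of the named region,
    # assumed here to be a list of (lon, lat) vertices.
    polygon = fetch_shape(region_name)

    # Step 2 (PELAGIOS API): coarse search using the polygon's bounding box.
    lons = [lon for lon, _ in polygon]
    lats = [lat for _, lat in polygon]
    bbox = (min(lons), min(lats), max(lons), max(lats))
    candidates = fetch_places_in_bbox(bbox)

    # Step 3 (ECS service): keep only the candidates the polygon contains.
    return [place for place in candidates if contains(polygon, place)]
```

Passing the three steps in as functions keeps the annotation data, the polygons and the filtering logic decoupled, so any of them can be replaced without touching the others.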


Wednesday, 2 May 2012

Fasti Online - Pelagios Compliant and viewable in the New API


We have some good news from Fasti-Online HQ: we are now Pelagios-compliant and are the first dataset (along with the Ure Museum) to be viewable in the new Pelagios API! Very exciting!

Trebula Mutuesca in the new Pelagios API v2 - showing the Fasti sites
The Fasti representations can be seen at:

Fasti Online in Pelagios

Fasti Online Annotations in Pelagios

Please note that this is using the cutting edge code (version 2 of the API) which is currently in a pre-release state and so may be down or a bit buggy - but it's great to be able to give a preview! A blog post will be coming soon introducing the v2 API, so watch this space!

How did we get there?

Firstly, with a lot of help from the Pelagios team! This was our first foray into the world of Linked Data, so we had quite a few queries initially, but once they were ironed out the process of creating the relevant files was relatively painless.

Linking the Pleiades URIs

Fasti is a database of archaeological excavations, all of which by their very definition happen at a place. So our first step was to match the place information in Fasti with the placenames in Pleiades, which acts as the glue between the Pelagios partners. I have mentioned the process we undertook to do this in a previous entry. We ended up with 339 Fasti sites that link to a Pleiades URI; we hope to improve on this number in the future.

Creating the RDF representation

We then had to create the RDF representation of our data, which can be seen here. Our RDF is relatively simple, in that we just have the Fasti excavation name, followed by the Pleiades URI and the URI to the excavation record in Fasti. A typical entry looks like this:

<http://www.fastionline.org#set1/annotation1> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.openannotation.org/ns/Annotation> .
<http://www.fastionline.org#set1/annotation1> <http://purl.org/dc/terms/title> "Pompeii, house VI 8, 20-21.2" .
<http://www.fastionline.org#set1/annotation1> <http://www.openannotation.org/ns/hasBody> <http://pleiades.stoa.org/places/433032#this> .
<http://www.fastionline.org#set1/annotation1> <http://www.openannotation.org/ns/hasTarget> <http://www.fastionline.org/micro_view.php?item_key=fst_cd&fst_cd=AIAC_1704> .



Of course this file is created programmatically within Fasti's Archaeological Recording Kit (ARK) back-end. ARK is an open-source project, meaning that the codebase can be easily updated and those updates rolled out to all users of the system. This is a plus for us, because it has meant that we only needed to write a few new lines of code into the ARK codebase to output the Fasti dataset in a Pelagios-compliant format. The real beauty is that now that the code has been written and committed back to the ARK codebase, any other project using ARK (providing they can meaningfully match to Pleiades URIs) can immediately become Pelagios-compliant with the click of a button - very cool, and it bodes well for new partners in the future!
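
For readers who are not using ARK, here is a minimal Python sketch of how the four lines of an entry like the one above could be generated. It is not the ARK code (the function names are ours and purely illustrative); it only reuses the predicates and URIs shown in the example entry.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
OA = "http://www.openannotation.org/ns/"
DC_TITLE = "http://purl.org/dc/terms/title"


def escape_literal(text):
    """Escape backslashes, double quotes and newlines for a string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def annotation_triples(annotation_uri, title, pleiades_uri, target_uri):
    """Return the four lines describing one annotation."""
    subject = f"<{annotation_uri}>"
    return [
        f"{subject} <{RDF_TYPE}> <{OA}Annotation> .",
        f'{subject} <{DC_TITLE}> "{escape_literal(title)}" .',
        f"{subject} <{OA}hasBody> <{pleiades_uri}> .",
        f"{subject} <{OA}hasTarget> <{target_uri}> .",
    ]


if __name__ == "__main__":
    lines = annotation_triples(
        "http://www.fastionline.org#set1/annotation1",
        "Pompeii, house VI 8, 20-21.2",
        "http://pleiades.stoa.org/places/433032#this",
        "http://www.fastionline.org/micro_view.php?item_key=fst_cd&fst_cd=AIAC_1704",
    )
    print("\n".join(lines))
```

Calling the function once per matched site and writing the lines to a file would give a dump in the same shape as the entry shown above.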

The VoID file

We also had to create a VoID file, essentially a machine-readable description of the RDF dataset. This was a little bit technical to look at, but thanks to a number of templates provided by the team we just replaced the relevant bits with Fasti-specific information and were good to go. Our VoID file can be found here; if you are using a similar set-up to ours, it should just be a matter of replacing the Fasti-specific stuff with your own dataset details.

What's next?

Now that we have the code all cleaned, we will automate the creation of the .n3 file (the RDF representation), so that when a new site is added into Fasti the .n3 file is updated automatically - providing near live data to Pelagios partners. We are also looking forward to the other partners getting their data into the new API so we can start playing around with using the linked data on Fasti (and within other ARKs). We are just entering the fun part of a project like this - after all, no one likes crunching their data to fit, but now that it is in the right format we can really start using it for advancing the research in the field and learning new things about what we are interested in, archaeology, and not the ins and outs of arcane RDF notation!

This has also come at exactly the right time for the Fasti project as we are gearing up to release the new version of the front-end in the next couple of months, and the Pelagios linkages will become a big part of this overhaul. Fasti's future is bright... Fasti's future is Linked!