Thursday 17 October 2013

IWP2: Pelagios and the Beakers of Vicarello

The last few weeks have been a busy time for the Pelagios team. In parallel to kicking off our work on linking gazetteers as part of our first Infrastructure Workpackage (IWP 1), we also started to assemble some foundational bits and pieces of our second IWP - which is concerned with building up the data and annotation infrastructure.

Prelude: the Itinerarium Gaditanum

Jump cut to Vicarello, Italy, mid-19th century: excavations at the Aquae Apollinares Baths in 1852 reveal three cylindrical vessels made of silver, with heights varying between 95 and 153 mm. Excavations in 1863 reveal a fourth vessel of a similar kind. Although they differ in the details, each vessel has engraved on its surface the Itinerarium Gaditanum, the land route between Gades (Cadiz) and Rome, listing between 104 and 110 road stations along the way, and the distances between them in units of milia passuum (a thousand Roman paces, or approx. 1481 meters).
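To give a feel for the distance unit, here is a minimal sketch that converts Roman miles to kilometres using the approximate figure of 1481 meters quoted above. The leg distances in the example are made up for illustration and are not read from the beakers.

```python
# Approximate length of one Roman mile in metres, as quoted in the post.
METRES_PER_ROMAN_MILE = 1481


def roman_miles_to_km(miles: float) -> float:
    """Convert a distance in Roman miles to kilometres."""
    return miles * METRES_PER_ROMAN_MILE / 1000


# Hypothetical distances between consecutive stations, in Roman miles.
# These numbers are illustrative only, not taken from the beakers.
example_legs = [12, 24, 16]

total_miles = sum(example_legs)
total_km = sum(roman_miles_to_km(m) for m in example_legs)
print(f"{total_miles} Roman miles is roughly {total_km:.1f} km")
```

Summing the converted legs gives a rough total length for a stretch of the route.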

Photo by Ryan Baumann CC-BY 2.0

The Vicarello Beakers, as they are now frequently referred to, have traditionally been identified as miniature replicas of a milestone probably erected in Gades, perhaps similar in design to the Miliarium Aureum (the Golden Milestone) in Rome. Originally, through the study of the different stations of the route, experts had dated them to different times between the reigns of Augustus and Tiberius. But recent palaeographic studies and comparisons with late documents such as the Antonine Itinerary or the Itinerarium Burdigalense, as well as their resemblance to the missorium of Theodosius, suggest a dating to the late third or early fourth century AD.

Their handy number of toponyms, as well as the fact that there are images and transcriptions available online already, makes the Vicarello Beakers an excellent test case to teach our data infrastructure a few new tricks. Technical details about the upgrades it's about to receive (complete with RDF samples and pointy brackets) will appear on our Wiki and through our mailing list in due course. But, for the purpose of this blog post, let me just give you a sneak preview of some of the things our upgraded data model can do.

Linked Data, Open Annotation, RDF, What?

You may recall that Pelagios is based on the principles of Linked Open Data, and that we have chosen the Open Annotation Data Model as the conceptual basis for our common vocabulary. These foundations will not change. But with a growing network of partners, more diverse content, and increasing amounts of data, it has become painfully clear that our initial data model from the days of Pelagios 1 and 2 has reached its limits. We now have so many partners and so much content that data for major places has become practically unmanageable - just try to find something useful in our data about Rome!

Mapped Pelagios annotations for one of the Vicarello Beakers

So what are the things our new data model will improve?

  • First and foremost, our new model allows for richer item metadata. There is now a much cleaner separation between information about the item, and information about the places that relate to it (and how). There is room to encode dates and temporal characteristics, categories, authorship, languages used in the source document - ordering dimensions which help us to get more structure into the pile of "anonymous place references" we agglomerated through our first two project phases.
  • In line with a richer metadata model, we have also adopted the FRBR distinction of Work and Expression. In FRBR terminology, the Vicarello Beakers are a Work - "a distinct intellectual or artistic creation". Each of the four beakers is termed an Expression of this Work. This is another straightforward ordering principle, which helps us to get more structure and hierarchy into our data.
  • One of the changes that happened in the transition between the (now deprecated) Open Annotation Collaboration model and the new Open Annotation model is support for multiple "annotation bodies". I'll refer to the OA spec for details. But as far as Pelagios is concerned, this change allows us to represent the different "faces" of a place reference in a source document - logical mappings to (a) gazetteer URI(s), its precise transcription, different images of it, etc. - in a much simpler way.
  • Toponyms in a document may follow a certain sequence or layout. The Vicarello Beakers are a prime example of this: they lay out their toponyms in a list with four columns, according to the sequence of the places along the route between Gades and Rome. We're experimenting with ways to record the logical ordering of toponyms in a document, and with putting it to use for visualization.
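To make these four points a little more tangible, here is a rough sketch of how they could hang together, written as plain Python classes rather than RDF. This is not our actual vocabulary: the class names, field names and example.org URIs are all assumptions made up for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Annotation:
    """One place reference with several 'bodies' (hypothetical structure)."""
    transcription: str            # the toponym as written in the source
    gazetteer_uris: List[str]     # one or more gazetteer mappings
    sequence_no: int              # position of the toponym in the document
    image_url: Optional[str] = None


@dataclass
class Expression:
    """A single beaker, in the FRBR sense used above."""
    title: str
    annotations: List[Annotation] = field(default_factory=list)

    def route(self) -> List[str]:
        """Return the transcriptions in their recorded document order."""
        ordered = sorted(self.annotations, key=lambda a: a.sequence_no)
        return [a.transcription for a in ordered]


@dataclass
class Work:
    """The Vicarello Beakers as a whole, carrying the item metadata."""
    title: str
    languages: List[str] = field(default_factory=list)
    expressions: List[Expression] = field(default_factory=list)


# Placeholder URIs and transcriptions, not real gazetteer identifiers.
beaker = Expression("Beaker 1", [
    Annotation("ROMA", ["http://example.org/gazetteer/rome"], sequence_no=2),
    Annotation("GADES", ["http://example.org/gazetteer/gades"], sequence_no=1),
])
work = Work("Vicarello Beakers", languages=["la"], expressions=[beaker])

print(beaker.route())
```

The item metadata lives on the Work, each beaker is its own Expression, every annotation carries several bodies side by side, and the sequence number is what lets a visualization draw the path in the right order.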

This simple mashup shows the toponyms from the four Vicarello Beakers on a map. There's an information box with the Work metadata at the bottom, and if you look to the top-right, you will find a small layer menu which lets you switch places - and the path indicating the toponym sequence - on and off individually for each beaker. Click on a place, and a popup will show you the transcription from the Beakers, along with the gazetteer reference from Pleiades, which corresponds to the place.

What's noteworthy about this demo, however, is not so much the map itself - but rather that the map is generated completely automatically from a Pelagios RDF file, containing item metadata and OA annotations. (You can grab the RDF source file here.) In essence, these are also our first baby steps towards the Visualization Workbench - which is the objective of our third infrastructure workpackage.

In the meantime, stay tuned for the exciting sequel to "Pelagios and the Beakers of Vicarello" - in which the Pelagios team will tackle their next Early Geospatial Document, and where we will shed some light on the workflow we use to compile our data, and how we transform it to Open Annotations.

Tuesday 8 October 2013

New Researcher Joins the Team

The launch of Pelagios 3 also saw a new researcher join the team. And the identity of the fourth musketeer? Over to you, Pau:

Hi all, I'm Pau de Soto from Barcelona (Catalonia, Spain). I have a PhD (2010) from the Autonomous University of Barcelona (UAB) on the use of GIS and Network Analysis in understanding how the Roman Transportation System works in the Iberian Peninsula. Using this methodology I discovered that it is possible to calculate the costs and the times needed to travel along the Roman networks by sea, river and land, from one point to another or from one point to the entire network.

After completing my PhD, I completed an MSc in Geographic Information Systems (2012), before taking a job at the Archaeological Institute of Merida (Spanish National Research Council-CSIC). At the AIM, I’ve been responsible for conducting geophysical surveys at Spanish archaeological sites.

Ok, that’s enough information about me! What am I doing as part of Pelagios?

As a Postdoctoral fellow I will be responsible for the production of Pelagios annotations, including the survey, collation and documentation of primary and secondary literature. I will also work with Rainer to develop new annotation methods and tools. Of course, in true Pelagios style I will be documenting all activity in order to help disseminate that work and continue to build up the community knowledge.

Last but not least I want to thank my new colleagues for the opportunity of contributing to this exciting project. I can’t wait to get started!

Tuesday 1 October 2013

A Web of Gazetteers

Pelagios is all about creating connections between places and data about them. Since we are now in the process of extending our scope beyond the ancient Greco-Roman world, we have been joined by two new infrastructural partners - PastPlace and the China Historical GIS. They will provide us with records for those places that are beyond the spatial or temporal coverage of our long-term partner in crime, the Pleiades Gazetteer of the Ancient World.

Moving from a single gazetteer to a system of three has significant consequences. Gazetteers vary widely in how they represent places conceptually and syntactically: with different abstractions, relations and hierarchy models; with different approaches to express changes over time, or to record the source or bibliographic references that lead to the inclusion of the place in the gazetteer. In fact, even the definition of what a place is can radically differ from one gazetteer to the next. This is especially true for the specialist gazetteers that we are dealing with in the humanities.

The goal of our first infrastructure work package is to bridge these gaps and create a framework to link up our gazetteers to form a coherent whole. Obviously, we can (and will) never find the one generic data model that fits the needs of everyone, and that every gazetteer should adhere to from now on. Apart from practical issues of implementation and migration effort, such a model would inevitably end up being either hugely complex (because it would need to subsume all the complexities and subtleties of each gazetteer known at the time of design), or overly simplistic (because it would force everyone into a rigid, trimmed-down schema, sacrificing the richness and specialization of the original custom models).

Photo by will ockenden CC-BY 2.0

For this reason, we are not aiming to create a common data model in the first place. Instead, we're following our general strategy of "connectivity through common references", which standardizes how to create links between stuff, rather than standardizing how stuff should be represented. That being said: things don't work entirely without any data (or, rather, metadata) specification at all, unfortunately. What we do standardize in our case is a syntax for "descriptive records". Each gazetteer exposes such records about each of its primary entities, and they contain the absolute minimum information we need in order to:

  1. identify and disambiguate places, and
  2. build a searchable index external to the gazetteer, so that we can relate search queries in a third-party application (such as the Pelagios API) to the original entry in the source gazetteer.

The other essential aspect that we need in order to move from a single gazetteer to a system of many is (surprise surprise!) links. Each descriptive record may (and should) include links to entries in other gazetteers in order to indicate similarity. (We are going to use the semantics of skos:closeMatch, which is defined as a relation "[...] used to link two concepts that are sufficiently similar that they can be used interchangeably in some information retrieval applications".) Specifically, we encourage gazetteers to include links to one or more reference gazetteers in their descriptive records - open data gazetteers with global coverage, high community adoption, and a Linked Data representation - such as Wikidata or GeoNames.
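As a rough illustration of the two ingredients described above (minimal descriptive records plus close-match links), here is a small sketch. It is not the actual record syntax, which is still being polished: the class, the function names and the example.org URIs are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Set


@dataclass
class DescriptiveRecord:
    """Bare-minimum description of one gazetteer entry (hypothetical shape)."""
    uri: str                      # the entry's URI in its source gazetteer
    names: List[str]              # names used for search and disambiguation
    close_matches: List[str] = field(default_factory=list)  # links to other gazetteers


def build_name_index(records: Iterable[DescriptiveRecord]) -> Dict[str, Set[str]]:
    """Map each lower-cased name to the URIs of the records that carry it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for record in records:
        for name in record.names:
            index[name.lower()].add(record.uri)
    return index


def group_by_reference(records: Iterable[DescriptiveRecord]) -> Dict[str, Set[str]]:
    """Group record URIs by the reference gazetteer entry they link to."""
    groups: Dict[str, Set[str]] = defaultdict(set)
    for record in records:
        for target in record.close_matches:
            groups[target].add(record.uri)
    return groups


# Two made-up records from two made-up gazetteers, both pointing at the
# same (equally made-up) reference gazetteer entry.
REFERENCE = "http://example.org/reference/rome"
records = [
    DescriptiveRecord("http://example.org/gazetteer-a/1", ["Roma", "Rome"], [REFERENCE]),
    DescriptiveRecord("http://example.org/gazetteer-b/42", ["Rome"], [REFERENCE]),
]

index = build_name_index(records)
groups = group_by_reference(records)
print(sorted(index["rome"]))
print(sorted(groups[REFERENCE]))
```

A search for a name resolves to the original entries in their source gazetteers, and two entries that both link to the same reference entry can be treated as close matches of each other without either gazetteer changing its own data model.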

And what's the result of this? Answer: a dense network of links that makes our specialist gazetteers globally navigable, as well as re-usable and combinable in other contexts and applications. We are still in the process of polishing the spec for our descriptive records. You can find the current status on the Pelagios Cookbook Wiki. Our partners are about to start working on the implementation; and I'm about to extend our core data handling software library to support it as well.

Are you working with a gazetteer dataset you want to see linked up with Pelagios? Let us know - we'd be excited to see a global Web of gazetteers grow and flourish!