Tuesday 1 October 2013

A Web of Gazetteers

Pelagios is all about creating connections between places and data about them. Since we are now in the process of extending our scope beyond the ancient Greco-Roman world, we have been joined by two new infrastructural partners - PastPlace and the China Historical GIS. They will provide us with records for those places that are beyond the spatial or temporal coverage of our long-term partner in crime, the Pleiades Gazetteer of the Ancient World.

Moving from a single gazetteer to a system of three has significant consequences. Gazetteers vary widely in how they represent places conceptually and syntactically: with different abstractions, relations and hierarchy models; with different approaches to expressing change over time, or to recording the sources or bibliographic references that led to the inclusion of a place in the gazetteer. In fact, even the definition of what a place is can differ radically from one gazetteer to the next. This is especially true for the specialist gazetteers that we are dealing with in the humanities.

The goal of our first infrastructure work package is to bridge these gaps and create a framework to link up our gazetteers to form a coherent whole. Obviously, we can never (and will never) find the one generic data model that fits everyone's needs, and that every gazetteer should adhere to from now on. Apart from the practical issues of implementation and migration effort, such a model would inevitably end up either hugely complex (because it would need to subsume all the complexities and subtleties of every gazetteer known at the time of design) or overly simplistic (because it would force everyone into a rigid, trimmed-down schema, sacrificing the richness and specialization of the original custom models).

Photo by will ockenden CC-BY 2.0

For this reason, we are not aiming to create a common data model in the first place. Instead, we're following our general strategy of "connectivity through common references", which standardizes how to create links between stuff, rather than standardizing how stuff should be represented. That said, things unfortunately don't work without any data (or, rather, metadata) specification at all. What we do standardize in our case is a syntax for "descriptive records". Each gazetteer exposes such a record for each of its primary entities, containing the absolute minimum information we need in order to:

  1. identify and disambiguate places, and
  2. build a searchable index external to the gazetteer, so that we can relate search queries in a third-party application (such as the Pelagios API) to the original entry in the source gazetteer.
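To make these two requirements a bit more concrete, here is a minimal Python sketch of what a descriptive record and an external name index might look like. Everything in it is a hypothetical illustration: the field names (`uri`, `title`, `names`, `lat`, `lon`, `close_matches`), the `PlaceIndex` class and the example.org URIs are assumptions, not the actual Pelagios spec (which we are still polishing). The index is a deliberately naive in-memory token index; the point is only that every search hit carries the URI of the original entry in its source gazetteer.

```python
from __future__ import annotations

import re
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class DescriptiveRecord:
    """Hypothetical minimal record; field names are assumptions, not the Pelagios spec."""
    uri: str                    # identifier of the entry in the source gazetteer
    title: str                  # preferred name, for display and disambiguation
    names: list[str] = field(default_factory=list)          # alternative names
    lat: float | None = None    # optional representative point
    lon: float | None = None
    close_matches: list[str] = field(default_factory=list)  # skos:closeMatch targets


def tokens(text: str) -> list[str]:
    """Lowercase a name and split it on non-word characters."""
    return [t for t in re.split(r"\W+", text.lower()) if t]


class PlaceIndex:
    """Toy inverted index that lives outside the gazetteers themselves."""

    def __init__(self) -> None:
        self._postings: dict[str, set[str]] = defaultdict(set)  # token -> record URIs
        self._records: dict[str, DescriptiveRecord] = {}

    def add(self, record: DescriptiveRecord) -> None:
        self._records[record.uri] = record
        for name in [record.title, *record.names]:
            for tok in tokens(name):
                self._postings[tok].add(record.uri)

    def search(self, query: str) -> list[DescriptiveRecord]:
        """Records whose names contain every query token, sorted by URI."""
        query_tokens = tokens(query)
        if not query_tokens:
            return []
        hits = set.intersection(*(self._postings.get(t, set()) for t in query_tokens))
        # Each hit points back to the original entry via its URI.
        return [self._records[uri] for uri in sorted(hits)]


# Usage with made-up records from two hypothetical gazetteers.
index = PlaceIndex()
index.add(DescriptiveRecord("http://example.org/gazetteer-a/1", "Roma", names=["Rome"]))
index.add(DescriptiveRecord("http://example.org/gazetteer-b/42", "Rome (ancient city)"))
print([r.uri for r in index.search("rome")])
# ['http://example.org/gazetteer-a/1', 'http://example.org/gazetteer-b/42']
```

Keeping the index outside the gazetteers means each partner only has to publish records; the lookup structure can be rebuilt whenever the records change.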

The other essential aspect that we need in order to move from a single gazetteer to a system of many is (surprise surprise!) links. Each descriptive record may (and should) include links to entries in other gazetteers in order to indicate similarity. (We are going to use the semantics of skos:closeMatch, which is defined as a relation "[...] used to link two concepts that are sufficiently similar that they can be used interchangeably in some information retrieval applications".) Specifically, we encourage gazetteers to include links to one or more reference gazetteers in their descriptive records - open data gazetteers with global coverage, high community adoption, and a Linked Data representation - such as Wikidata or GeoNames.
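A rough Python sketch of how such links could be followed, under clearly stated assumptions: each specialist record simply lists its closeMatch targets, reference-gazetteer entries are recognized by a hypothetical URI prefix, and all URIs are made up. Since SKOS defines skos:closeMatch as symmetric but not transitive, the sketch stores links in both directions but follows only direct links plus at most one hop through a shared reference-gazetteer entry, rather than merging whole connected clusters.

```python
from __future__ import annotations

from collections import defaultdict


def build_link_graph(links: dict[str, list[str]]) -> dict[str, set[str]]:
    """Undirected adjacency map: skos:closeMatch is symmetric, so store both directions."""
    graph: dict[str, set[str]] = defaultdict(set)
    for source, targets in links.items():
        for target in targets:
            graph[source].add(target)
            graph[target].add(source)
    return graph


def close_match_candidates(
    graph: dict[str, set[str]], uri: str, reference_prefixes: tuple[str, ...]
) -> set[str]:
    """Specialist records reachable from `uri` directly or via one shared reference entry."""
    found: set[str] = set()
    for neighbour in graph.get(uri, set()):
        if neighbour.startswith(reference_prefixes):
            # One hop through a reference-gazetteer entry (hypothetical prefix check).
            found.update(n for n in graph[neighbour] if not n.startswith(reference_prefixes))
        else:
            # Direct link between two specialist gazetteers.
            found.add(neighbour)
    found.discard(uri)  # the starting record is not its own candidate
    return found


# All URIs below are made-up examples.
REFERENCE = ("http://example.org/reference/",)
links = {
    "http://example.org/gazetteer-a/1": ["http://example.org/reference/rome"],
    "http://example.org/gazetteer-b/7": ["http://example.org/reference/rome"],
}
graph = build_link_graph(links)
print(close_match_candidates(graph, "http://example.org/gazetteer-a/1", REFERENCE))
# {'http://example.org/gazetteer-b/7'}
```

In this sketch, two gazetteers that never link to each other directly still become navigable because both point at the same reference entry; capping the walk at one hop keeps chains of "sufficiently similar" from drifting too far.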

And what's the result of this? Answer: a dense network of links that makes our specialist gazetteers globally navigable, as well as re-usable and combinable in other contexts and applications. We are still in the process of polishing the spec for our descriptive records; you can find its current status on the Pelagios Cookbook Wiki. Our partners are about to start working on the implementation, and I'm about to extend our core data handling software library to support it as well.

Are you working with a gazetteer dataset you want to see linked up with Pelagios? Let us know - we'd be excited to see a global Web of gazetteers grow and flourish!
