Wednesday, 1 August 2012

Pelagios: Future Directions and Lessons Learned

Having come such a long way in a short time (it's hard to believe that the first phase of Pelagios began only last year), crystal ball-gazing is surprisingly challenging. On top of this, the UK and international funding landscape is rapidly changing, which may affect the kinds of research and development we can do in future. Nevertheless, it is possible to identify some likely future directions of travel, as well as clarify those services that we expect to sustain.

Sustainability

Pelagios was deliberately developed as a decentralised community of practice to minimise sustainability issues between development cycles. All annotations are hosted by the data partners themselves, so while it is possible for them to disappear individually there is no single point of failure. There is also a natural symmetry to this - the most likely reason for the annotations to disappear is if the resource they annotate goes offline, in which case the annotations would no longer point to anything anyway. The two major pieces of infrastructure we use - Pleiades and Open Annotation - have long-term funding, but it is also worth noting that even were these services to disappear there would still be value in Pelagios annotations. They will create a network of connectivity between data partners, even if the place URIs cannot be directly resolved.

The only components which require direct maintenance from the Pelagios community are the APIs and visualisation interfaces. These are used by some of our partners and so it would be unhelpful for them to be shut down. For that reason we have directed some of our funding towards a year's hosting, with the intention that it will tide us over until the next funding cycle. In case that should fail to materialise not all is lost, however. Our entire API and visualisation codebase is hosted on GitHub and can be installed anywhere else instead (several project partners have informally offered assistance in such an eventuality). This is entirely in the spirit of Pelagios - it is not our intention that the current API be the central access point, but that anyone should be able to set up APIs harvesting and serving data relevant to the needs of their own community. As the data is hosted independently there is no 'lock-in' or dependency on a single host.

Future Directions

There are many directions in which Pelagios can be taken and we are actively exploring several of them. Two forms of data we would like to include more of are maps and geographic writings. Although extant spatial documents from Antiquity are relatively scarce, they are extremely rich in content (sometimes with thousands of toponyms) and the associations between them are still far from clear. By digitally annotating geographical texts and images such as Ptolemy's Geography, the Roman itineraries, the Peutinger Table, Strabo, Pliny, Pomponius Mela, and the Periplus of the Erythraean Sea, we would be able to explore the relationships between them in a far more powerful way. We could see at a glance the levels of coverage, as well as important omissions, or add contextual overlays to the documents themselves.

A second direction is to apply the lessons learned in Pelagios to other regions and periods of history. We are already in discussions about identifying gazetteers for late Antiquity and ancient and medieval China. The power of Pelagios is that it is equally applicable to any time and place - it only requires that a stable URI gazetteer be available. At a yet greater level of abstraction, the Pelagios framework can also be adapted to other conceptual entities, such as people, periods or canonical citations. Matteo Romanello is currently doing some very exciting work in the latter case which we have been following with interest. There is also a long-running community discussion about creating a 'temporal gazetteer' of historical periods, although its relationship to both place and individual assertions by scholars makes this a challenging topic.

However the space which seems to offer most promise currently is references to people. URI authority files such as VIAF already list a large number of well-known people from Antiquity. Likewise there are forthcoming digital prosopographies that could potentially offer stable URIs for less renowned citizens of Antiquity. By establishing a common service for discovering these URIs the stage would be set to annotate resources with references to people. This is not merely of interest to those researching ancient social networks. Because life spans are relatively short (historically speaking), references to people (and especially multiple people) are a powerful way of identifying the temporal salience of a resource, in addition to its spatial relevance. That can be extremely helpful when filtering through the thousands of annotations associated with a city like Rome or Athens!
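The reasoning about life spans can be made concrete with a minimal sketch. It assumes that each person URI comes with approximate birth and death years, and it treats the period in which all referenced people were alive as a rough estimate of a resource's temporal focus. The person URIs, the `temporal_window` function and the use of a simple intersection are all illustrative assumptions, not part of any existing Pelagios service.

```python
# Approximate lifespans as (birth_year, death_year); negative years are BC.
# The URIs are hypothetical placeholders, not identifiers from a real authority file.
LIFESPANS = {
    "http://example.org/person/cicero": (-106, -43),
    "http://example.org/person/caesar": (-100, -44),
    "http://example.org/person/hadrian": (76, 138),
}


def temporal_window(person_uris, lifespans):
    """Return the (start, end) years in which all referenced people were alive.

    Returns None when no lifespan is known or when the lifespans do not overlap.
    """
    spans = [lifespans[uri] for uri in person_uris if uri in lifespans]
    if not spans:
        return None
    start = max(birth for birth, _ in spans)  # latest birth
    end = min(death for _, death in spans)    # earliest death
    return (start, end) if start <= end else None


# A resource mentioning both Cicero and Caesar is narrowed to a few decades.
republican_refs = [
    "http://example.org/person/cicero",
    "http://example.org/person/caesar",
]
window = temporal_window(republican_refs, LIFESPANS)
```

A resource that mentions people whose lives never overlapped returns `None` here; a real service would probably want to report the full span instead, since such a resource may simply cover a longer period.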

These are just some of the ideas we hope to follow up on imminently or over time. We hope you find them as exciting as we do, and if you have an idea of how Pelagios could help facilitate your own work then do get in touch - we'd be delighted to hear about it.

Lessons Learned


And what have we learned along the way? Three key lessons stand out:

  1. Semantically formalizing references is a quicker win than semantically formalizing relationships. Much 'Semantic Web' research in the past has focussed on property and ontology-driven work that permits complex inferencing but is difficult to scale and has little value if the entities referred to are not already normalized. At this stage in the development of the Linked Data Web it may be best to focus on identifying common concepts (places, people, citations, taxonomies - anything you can 'point to'), which enhances discovery and lays the groundwork for the harder task of deriving and aligning ontologies from legacy data.
  2. The Web is designed to facilitate Openness and Decentralization. It doesn't necessarily follow that one ought to act in the spirit of these principles, but if you don't then you will be going against the grain of the technology. Because they are fundamental to Pelagios's goal (making independent ancient world resources easily and mutually discoverable), Web technologies have served us extremely well with few of the technical headaches that come from trying to keep things locked down or centralize everything in an 'ultimate solution'.
  3. Find your place in the ecosystem. Trying to do everything not only limits your horizons but is antithetical to the infinitely expansive nature of the humanities. Pelagios has proved successful by playing a small and tightly defined role in a community of partners who make equally vital contributions of various natures. This has allowed us to avoid mission creep and benefit from the excellence of our colleagues while giving back something in return. It has also allowed us to fully appreciate just how gracious, vibrant and giving the 'digital ancient world' community currently is. Continuing to foster a similar culture across the digital humanities will be fundamental to its success.
We'll look forward to learning further lessons in later phases of Pelagios, but for now it remains to thank the JISC Discovery Programme, all of our partners, and of course people like you, whose interest and support remain the lifeblood of the project. We'll continue to post updates in the coming months and if you have data you'd like to link to Pelagios then do get in touch!

Pelagios phase 2: the last post - for now

(To get a “live action” summary of what Pelagios is all about, watch the Elton and Leif double-act at the recent Digital Humanities 2012 conference in Hamburg.)

Pelagios phase 1 (Feb – Oct 2011) had established the concept that you can link online stuff about the ancient world by using a lightweight framework, based on the concept of place (a Pleiades URI) and a standard ontology (Open Annotation Collaboration). Its guiding principles have been Openness and Decentralization—we store no data ourselves centrally but rather enable connections between different datasets to be made (based on common references to places). Building on this “bottom-up” infrastructure, Pelagios phase 2 (Nov 2011 – Jul 2012) has produced four outcomes:
1.     an indexing service that allows any ancient world scholar working in the digital medium to make their data discoverable;
2.     an API (an interface allowing computers to communicate with each other) that enables other users and data-providers to discover relevant data and do interesting things with them;
3.     a suite of visualization services including widgets that empowers any interested party to find out more about the ancient world—through literature, archaeological finds, visual imagery, maps, etc.
4.     the Pelagios “cookbook”, into which the community’s wisdom and experience have been poured and distilled.
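The place-based linking described above can be illustrated with a minimal sketch. It assumes each annotation is reduced to a pair of a partner-hosted resource and a place URI; the dataset names and resource URLs are hypothetical, the Pleiades-style place URIs are given for illustration only, and the real annotations are RDF following the Open Annotation Collaboration model rather than Python dictionaries.

```python
from collections import defaultdict

# Illustrative Pleiades-style place URIs (check against Pleiades before reuse).
ROME = "http://pleiades.stoa.org/places/423025"
ATHENS = "http://pleiades.stoa.org/places/579885"

# Each annotation links a resource hosted by a partner to a place URI.
# Dataset names and target URLs are hypothetical placeholders.
ANNOTATIONS = [
    {"dataset": "partner-a", "target": "http://example.org/a/inscription/12", "place": ROME},
    {"dataset": "partner-b", "target": "http://example.org/b/coin/7", "place": ROME},
    {"dataset": "partner-b", "target": "http://example.org/b/coin/9", "place": ATHENS},
]


def index_by_place(annotations):
    """Group (dataset, target) pairs by the place URI they reference."""
    index = defaultdict(list)
    for ann in annotations:
        index[ann["place"]].append((ann["dataset"], ann["target"]))
    return dict(index)


def shared_places(annotations):
    """Return the places referenced by two or more datasets, with those datasets."""
    shared = {}
    for place, refs in index_by_place(annotations).items():
        datasets = {dataset for dataset, _ in refs}
        if len(datasets) > 1:
            shared[place] = datasets
    return shared


links = shared_places(ANNOTATIONS)
```

No data is copied between the partners in this sketch: the connection between the two datasets exists only because both point at the same place URI.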

Successes
The Pelagios API has provided at least three quick wins. It helps provide Context for those hosting data online, by allowing you to obtain links to online material that may be relevant to your own. It facilitates Discovery of your data, so that any web-user can find your resources by following links on other partners’ sites. Finally it allows Reuse by providing machine-readable representations (JSON, RDF) by means of which you can mash-up the data you find in ways you want.
The Pelagios API in action

The suite of visualization tools that we’ve been developing illustrates just some ways the API can be used: so, we have created widgets that can be embedded on partners’ websites that enable place searches, a “heat map” that shows annotations within the Pelagios cloud on a map by virtue of their density, and the Graph Explorer, which allows users to search for connections between places in documents or find out about the documents that reference a particular place. Perhaps even more exciting is to see what partners are making of the API themselves. So, for example, Nick Rabinowitz and Sebastian Heath have developed a JavaScript library for Ancient World Linked Data, “awld.js”, which adds functionality to a website by providing a pop-up preview of Web links to Pelagios references for a place, simply by virtue of you passing your browser over the place-name.

The number of partners has grown appreciably. In addition to the “originals” from phase 1 (Pleiades, Arachne, GAP, nomima.org, Perseus Digital Library, SPQR), Pelagios2 introduced CLAROS, Open Context and Ure Museum at the outset, and has since been joined by the following: the British Museum, Fasti Online, Inscriptions of Israel / Palestine, Meketre, OCRE, ORACC, Papyri.info, Ports Antiques and Regnum Francorum Online. It is exciting to note that some of these new partners, such as ORACC (or, to give it its full title, the Open Richly Annotated Cuneiform Corpus), extend the Pelagios family into new geographical areas (i.e. the Near East and Egypt). And this is important not only for challenging the still dominant “eurocentric” vision of antiquity but also because it more accurately reflects the interconnected nature of the ancient world. By doing so, it opens up a whole new range of potentially exciting linkages.

Challenges
This wouldn’t be nearly such an exciting, or fun, project if it didn’t throw up the occasional difficulty. These have tended to focus on the process of data alignment, which is not surprising since mapping your place references to Pleiades is the hardest part. On the one hand, Aggregating Data is inevitably challenging since no two datasets are the same, and the process has thrown up questions of how to label appropriately (references, data containing the reference), what kind of dataset partitions to have (no subsets vs. multiple levels of hierarchy), and how to keep Pelagios up to date with changes you may make to the annotations. On the other hand, we have found that the process of alignment has obliged partners to think about how they are Conceptualizing Data in the first place: i.e. how they are expressing the relation between data and place, such as find-spot vs. origin, uncertain references (probably made in, from the vicinity of), different levels of granularity or specificity (South Italy, Greek Islands, etc.). Because computers are unable to make the “semantic leap”, as humans we have to be a lot clearer about what it is we think we’re doing. To find out about how the partners tackled some of these issues, you can browse through the blog (summarised here in our cookbook) and join the pelagios-group mailing list, where you can also share your experiences.

Pelagios has also been very concerned that all our visible outcomes—the suite of visualizations especially—make sense to everyone. Accordingly, we have been conducting robust and iterative user-testing throughout development, keeping in mind the “Child of 10” standard: for the results of this phase 2 testing, see here and here (and for phase 1, here). But we can still do much better. Part of this perhaps might be better managing the expectations of our home constituency (ancient world scholars), whose excitement at the prospect of being able to gather all different kinds of information about antiquity suggests to them that we’re hosting it—i.e. that we’re a kind of Ancient Wikipedia. Remember: Pelagios is expressly not “one ring to rule them all”, but a means of facilitating connections. Getting out the message that this is in fact a community to which they can also contribute will continue to be central to our mission. Still, this enthusiasm shows that there’s a huge appetite for drawing on, and contributing to, content that is free, open and linkable to across the web.
Pelagios: not “one ring to rule them all”


Futures
Pelagios continues to go from strength to strength. We’re currently in negotiation with another potential partner, which would increase our geographic scope considerably—all the way to ancient China! There has also been discussion about extending the Pelagios “keep it simple”/ “bottom-up” approach to other kinds of common references, such as time periods or people’s names. But to fulfil any of these possibilities will require as much input from our partners and others as Pelagios has been blessed to receive—and we are extremely grateful for everybody’s support!

Leif has much more to say about these aspects in a forthcoming post. Personally speaking, now that we have a working bottom-up infrastructure in place, I would like to see web-users, ordinary non-technical browsers like me, working with the data between which Pelagios enables you to draw connections. For the study of the ancient world (what we in the trade call “Classics” or “Classical Civilization”) is an interdisciplinary subject that encompasses literary texts, material culture, visual artefacts and conceptual ways of thinking. The digital environment affords possibilities for mashing-up and exploring all these different kinds of data in ways that before were simply not imaginable but which are the essence of our subject. With its partners, Pelagios is helping to lay the foundations for the study of the ancient world in the twenty-first century.

Pelagios WP2 at a Glance: Discovery Services


Whereas WP1 shows how Pelagios RDF annotations can be discovered, aggregated and served via a basic API, WP2 focused on the specifically spatial elements of what we were doing. In particular, our goals were to:
  1. develop services to provide ranked, relevant materials based on input of place URIs and Named Entities or spatial coordinates.
  2. provide super users with specific APIs that permit them to perform federated place- and space-based queries over the resources catalogued by WP1.
  3. enrich results with additional data from sources such as GeoNames, DBpedia and Freebase, returned in a variety of optional Web formats (RDF, JSON, KML, Atom).

In order to achieve these, Rainer extended the standard API so that, in addition to returning annotations associated with a single resource, those from multiple places within a coordinate bounding box could be returned. This is very useful for instances in which the relevant coordinates are known, but users are often interested in mereological (part-whole) relationships: returning annotations for all the places in Latium, for example. To accomplish this, Gianluca made use of the online spatial database CartoDB and a shape file of Roman provinces kindly provided by the DARMC project. This allows us to create performant spatial queries by first requesting annotations from the Pelagios API filtered by a bounding box, and then filtering them a second time against a regional polygon.
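The two-stage query described above (a cheap bounding-box filter followed by a more exact polygon test) can be sketched in plain Python. This is a local stand-in written for illustration only: in the actual setup the first stage is performed by the Pelagios API and the second by CartoDB. The function names, the toy triangular "region" and the place records are all hypothetical, and the geometry is treated as planar.

```python
def bbox_of(polygon):
    """Return (min_lon, min_lat, max_lon, max_lat) for a list of (lon, lat) vertices."""
    lons = [lon for lon, _ in polygon]
    lats = [lat for _, lat in polygon]
    return (min(lons), min(lats), max(lons), max(lats))


def in_bbox(point, bbox):
    """Cheap first-stage test: is the point inside the bounding box?"""
    lon, lat = point
    min_lon, min_lat, max_lon, max_lat = bbox
    return min_lon <= lon <= max_lon and min_lat <= lat <= max_lat


def in_polygon(point, polygon):
    """Second-stage test using ray casting (planar approximation)."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def places_in_region(places, polygon):
    """Filter places by bounding box first, then by the region polygon."""
    bbox = bbox_of(polygon)
    candidates = [p for p in places if in_bbox(p["coords"], bbox)]
    return [p["id"] for p in candidates if in_polygon(p["coords"], polygon)]


# Toy data: a triangular region and three hypothetical places.
REGION = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
PLACES = [
    {"id": "a", "coords": (2.0, 2.0)},    # inside the region
    {"id": "b", "coords": (8.0, 8.0)},    # inside the bounding box only
    {"id": "c", "coords": (20.0, 20.0)},  # outside both
]
result = places_in_region(PLACES, REGION)
```

Place "b" shows why the second pass matters: it passes the bounding-box filter but lies outside the region itself.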


The principal difficulty encountered with this approach is one of data granularity. We only have approximate boundaries for Roman provinces most of the time, and these are fluid over time. Indeed, in many cases boundaries in Antiquity were only ever approximately defined in the first place. While better polygon datasets will certainly help us with coarse-grained queries, we will need to accept that any such results must be considered provisional at best and should be subjected to further scrutiny. One long-term aim may be to create RDF associations between places and their regional affiliations which can remain spatially independent.
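To illustrate what spatially independent regional affiliations might look like, here is a minimal sketch in which part-whole statements are stored explicitly and queried without any geometry. A plain dictionary stands in for RDF triples, the identifiers are hypothetical, and each entity is assumed to have at most one parent, which real data would not guarantee.

```python
# Explicit part-whole assertions; the keys and values are hypothetical
# identifiers, not real gazetteer URIs.
PART_OF = {
    "place:roma": "region:latium",
    "place:ostia": "region:latium",
    "region:latium": "region:italia",
    "place:athenae": "region:achaea",
}


def ancestors(entity, part_of):
    """Follow part-of links upwards, returning every containing region in order."""
    chain = []
    current = part_of.get(entity)
    while current is not None and current not in chain:  # guard against cycles
        chain.append(current)
        current = part_of.get(current)
    return chain


def places_within(region, part_of):
    """Return all places that belong to the region, directly or transitively."""
    return sorted(
        entity
        for entity in part_of
        if entity.startswith("place:") and region in ancestors(entity, part_of)
    )


latium_places = places_within("region:latium", PART_OF)
```

Because membership is asserted rather than computed from coordinates, the answer does not depend on where a boundary is drawn, although the assertions themselves remain scholarly judgements.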


We had originally intended to automatically provide additional content associated with GeoNames, Wikipedia and Freebase, but it later occurred to us that this goes against the grain of Pelagios. These are resources just like all our other partners and it makes sense to treat them as such. As a result, we are converting Pleiades+ into an RDF annotation of GeoNames resources (and, where available, Wikipedia and Freebase) that can be incorporated directly through the Pelagios API. These will come online in early August.