Pelagios

Thursday, 14 June 2012

Arachne VoID descriptions

In this blog post we describe how the VoID RDF description of the Arachne Pleiades linkage works. As a result of the Pelagios compliance work, we are introducing some mechanisms into the data structure of Arachne itself that will mean changes in future iterations. We have therefore designed the VoID description of our Pleiades linkage to reflect them.

The VoID descriptions

The VoID dataset describes the data that have been matched to Pleiades. We have chosen void:Linkset for the general definition of the interlinkage set between Arachne and Pleiades.

The general interlinkage set (ArachnePleiadesLinkage) divides into two groups: a place matching (Arachne2Pleiades_Places) and an object matching (Arachne2Pleiades). The matching is split for two reasons. Sometime in the near future, Arachne will start using the DAI gazetteer, where place information will be shared among the different web resources of the DAI: a gazetteer, a web GIS, Arachne, Zenon and so on. At that point the place component will be "outsourced" from Arachne. The other data set contains all objects that are “inferred” from the place matchings, so it relies on the internal linkage between places and Arachne objects.
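As a rough illustration, the set hierarchy described above could be written out as Turtle along the following lines. This is only a sketch: the three set names come from the post, but the base URI is a placeholder and the choice to type every set as void:Linkset is an assumption, not a copy of the actual Arachne VoID file.

```python
# Hypothetical base URI; the real Arachne VoID file may use different URIs.
BASE = "http://example.org/arachne/void#"

PREFIXES = (
    "@prefix void: <http://rdfs.org/ns/void#> .\n"
    f"@prefix : <{BASE}> .\n"
)

# Parent linkset -> its subsets, using the set names given in the post.
HIERARCHY = {
    "ArachnePleiadesLinkage": ["Arachne2Pleiades_Places", "Arachne2Pleiades"],
}


def linkset_turtle(hierarchy):
    """Serialise a parent/subset hierarchy of linksets as Turtle text."""
    lines = [PREFIXES]
    for parent, subsets in hierarchy.items():
        # The parent linkset points at its subsets via void:subset.
        lines.append(f":{parent} a void:Linkset ;")
        subset_list = ", ".join(f":{name}" for name in subsets)
        lines.append(f"    void:subset {subset_list} .")
        # Each subset is declared as a linkset in its own right (an assumption).
        for name in subsets:
            lines.append(f":{name} a void:Linkset .")
    return "\n".join(lines)


print(linkset_turtle(HIERARCHY))
```

Keeping the hierarchy in a plain dictionary means a further subset can be added without touching the serialisation code.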

These two sets have subsets that combine the results of a matching process at a specified time. We tried to include this information in the first matching, but without the VoID data set description this was a time-consuming task, since every annotation carried its own creation time. This problem has been solved by attaching the creation time to the data set instead. The time at which a matching was created is now also reflected in the set hierarchy.
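The idea of one subset per matching run, with the creation time attached to the set rather than to each annotation, might look roughly like this. The run dates, the naming scheme for the dated subsets and the use of dcterms:created are all assumptions made for the sake of the example.

```python
from datetime import date

PREFIXES = (
    "@prefix void: <http://rdfs.org/ns/void#> .\n"
    "@prefix dcterms: <http://purl.org/dc/terms/> .\n"
    "@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .\n"
    "@prefix : <http://example.org/arachne/void#> .\n"  # hypothetical base URI
)

# Hypothetical matching runs: (parent set, run date). The dates are invented.
RUNS = [
    ("Arachne2Pleiades_Places", date(2012, 3, 1)),
    ("Arachne2Pleiades_Places", date(2012, 6, 1)),
]


def run_subset_turtle(parent, created):
    """Describe one dated matching run as a subset of its parent set."""
    # Hypothetical naming scheme: parent name plus the run date.
    name = f"{parent}_{created:%Y%m%d}"
    return (
        f":{parent} void:subset :{name} .\n"
        f":{name} a void:Linkset ;\n"
        f'    dcterms:created "{created.isoformat()}"^^xsd:date .'
    )


def latest_run(runs, parent):
    """Return the date of the most recent run recorded for a parent set."""
    return max(created for p, created in runs if p == parent)


print(PREFIXES)
for parent, created in RUNS:
    print(run_subset_turtle(parent, created))
```

Because the date lives on the subset, finding the most recent matching is a lookup over a handful of sets rather than over every annotation.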

The split between places and other entities in Arachne was a more complicated task because they were held in one triple space. We tried to overcome this issue by putting the entities into different .n3 files in the downloadable zip archive. This can now be achieved by using the VoID descriptions.

In short, our approach tries to address four problems:

1. The data will grow. Neither Arachne nor Pleiades is complete at the time of the matching process. Any data that is put into Arachne or Pleiades after the matching process will not show up in it. So, from a future perspective, the matching is going to be incomplete very soon and will have to be undertaken again.
2. The data themselves will change. For example, if a place gets a more precise coordinate, the matching results will also differ in some way. Here, a versioning of datasets represented in the URI on both sides would be a solution for an “everlasting” matching.
3. The matching process will be enhanced so that, for example, the results can be more accurate.
4. Keeping old stuff available will be important. If you are using data for your project that is not up-to-date, you can still reference the information by a unique data set and a unique URI of a match. This is essential because places can match one time and fail to match another time (depending on Problem 1 or 2).
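To make the last point concrete, here is a small sketch of how two matching runs might be compared once each run is kept as its own data set. All URIs below are invented placeholders, not real Arachne or Pleiades records, and representing a run as a set of URI pairs is an assumption.

```python
# Hypothetical match sets from two runs: (Arachne entity URI, Pleiades place URI).
# Every identifier here is an invented placeholder.
RUN_OLD = {
    ("http://example.org/arachne/place/1", "http://example.org/pleiades/places/100"),
    ("http://example.org/arachne/place/2", "http://example.org/pleiades/places/200"),
}
RUN_NEW = {
    ("http://example.org/arachne/place/1", "http://example.org/pleiades/places/100"),
    ("http://example.org/arachne/place/3", "http://example.org/pleiades/places/300"),
}


def compare_runs(old, new):
    """Split matches into those kept, dropped, and added between two runs."""
    return {
        "kept": old & new,      # present in both runs
        "dropped": old - new,   # matched before, no longer matched
        "added": new - old,     # newly matched in the later run
    }


diff = compare_runs(RUN_OLD, RUN_NEW)
for status in ("kept", "dropped", "added"):
    for arachne_uri, pleiades_uri in sorted(diff[status]):
        print(status, arachne_uri, pleiades_uri)
```

A project that cited a match from the older run can still point to it, even though it falls in the "dropped" group of the newer run.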

Prof Dr. Reinhard Förtsch and Rasmus Krempel, Arachne Database, CoDArchLab

Sunday, 10 June 2012

New Partners

We're happy to announce that in the past week Pelagios has gained five new partners :-) As always, there will be future blog posts describing what they do in their own words, as well as how greater connectivity with other ancient world resources benefits those activities. For now however, we're glad to welcome the following institutions and initiatives into the Pelagios Community:

The British Museum
Inscriptions of Israel/Palestine (Brown University)
Oracc: The Open Richly Annotated Cuneiform Corpus (University of Pennsylvania)
Ports Antiques
Papyri.info (ISAW/NYU)

The Graph of Ancient World Data has been greatly enriched by them!

Friday, 8 June 2012

Pelagios at the Linked Ancient World Data Institute (#lawdi)

Last week (31st May - 2 June) the Institute for the Study of the Ancient World, NYU, hosted a workshop on linked data in the ancient world. Pelagios were well represented, as Leif kicked off the invited presentations with a discussion of the difference between the semantic web and linked data, while I brought up the rear with a personal reflection on what the evolving digital world might mean for a Classical Studies researcher or student. (All presentations can be found here.)

Here I’d like to present 5 take-home points:

1. Of the different approaches to tying together resources on the web, linked open data seems the best bet, and not just because that's what Pelagios is doing! Linked open data uses a decentralised model in which participants agree on certain stable identifiers for things (such as places or names) and a way of mapping their data to them. So, for example, Pelagios uses Pleiades identifiers for ancient places and something known as RDF triples for expressing the relationship. We find that, by doing this, authority is diffused through the Pelagios ecosystem, meaning that there is no single point of failure (unless Pleiades fails, and, if that happens, we're all screwed anyway!) and that the extent to which projects annotate their data depends on the extent to which they want to hook into the network. Above all, as Sean Gillies, Pleiades’ head developer, has already emphasised in a previous post, it means doing what works.

2. Ok, but what difference does it make if your data is linked? Well, one great example provided at LAWDI by Andrew Meadows of Nomisma concerned coins. Within the world of linked data, it's now possible to discover, map and analyse not only find-spots (where the coins are found), but also where the same coins were minted and even the mines from which their metals derive. These data provide hitherto unparalleled access into the political and cultural deep structure that underpinned all kinds of interactions in the ancient world.

3. At one level, this kind of work represents a paradigm shift of sorts. The lone humanities scholar could hardly be expected to provide and analyse all these data by him or herself; linked data presupposes cooperation. But there is also a bigger point. If I think about my own experiences in Hestia, GAP and now Pelagios, it’s not only the case that each project has led to further, and more involved, collaboration; at each point where new skills or tools have been needed, we have found the person to carry out that work and brought them in on the team. Linking data means, when all is said and done, linking with people. Which is fun!

4. While formal collaborations are not the usual humanistic way of doing things, linking data is what scholars have been doing all the time, as evident in footnoting. But scholarship is not only about referring to some other data of some kind; the best scholars chase up the connections. So, for example, the late, great, Oxford don, Don Fowler, writes (in the chapter “On the Shoulders of Giants” of his book Roman Constructions, Oxford, 2000, p.116):

“Classicists have always been concerned with ‘parallels’ – with what goes after the magic word ‘cf.’... What has not been clear with the traditional citation of parallel passages is what the point of the activity is, how the parallels affect the interpretation of the text.”

With this abbreviation “cf.”, which derives from the Latin conferre, Fowler plays upon its meaning to compare or “to bring together”. Imagine reading a footnote and being able to check the ancient source or modern scholar cited, or find out what other materials (images, documents) relate to the place or person under investigation, simply by clicking on a link. This might be blue- or pie-in-the-sky thinking for present publications, but it will soon be possible in ISAW papers, where individual contributions will be identified down to the paragraph level, meaning that any paragraph can be cited, or tweeted, at will. Reading is going to get a lot more interactive.

5. Finally, this idea of linked open data is a powerful metaphor not only for thinking about our own world (and especially the internet) but also for approaching the ancient world. At the beginning of his enquiry (‘historia’) into why the Greeks and Persians came into conflict, Herodotus describes how he ‘came upon towns of men both small and great alike, for of the places that were once great, most have now become small, while those that were great in my time were small before’ (1.5.3). Like an Odysseus wandering the seas and coming to know the minds of many men (Homer, Odyssey 1.3), Herodotus writes about a world in which a people forcibly relocated to Persia (claim to) ultimately derive from refugees from Troy (the Paeonians, 5.13), where places as far flung as Marseilles and Cyprus are brought together for comparing the meaning of a word (5.9), and where the river Ister (Danube) and Nile frame the Histories’ geography (2.26, 33-34; 4.50, 53). In a world that is linked together in a myriad of different ways, investigations require making myriad uses of connections. Herodotus would have approved.
