Wednesday 23 March 2011

Pelagios Workshop SPARQL Demo & RDFa

Note: The following post is heavily based on notes by Mathieu D’Aquin and Sebastian Heath.

Prior to the workshop we have made a test SPARQL endpoint available for anyone who wants to test our approach and see what might be possible. The Demonstrator was built by Mathieu D’Aquin based on our weekly Skype meetings and discussion list (which you can follow at our Google Group). The endpoint is available at:
using the OWLIM triple store with a Sesame interface.

Test with DME data
We have loaded the store with test data from the OAC descriptions available at:

For example, one annotation is described with the following triples (see

<>     rdf:type       rdfs:Resource           
<>     rdf:type       oac:Annotation           
<>     dc:title       "Algier"           
<>     j.1:created    "2011-03-17 15:14:17.899"
<>     j.1:modified   "2011-03-17 15:14:17.899"
<>     dc:creator     "guest"           
<>     oac:hasBody            <>
<>     oac:hasTarget  _:node341

_:node341            rdf:type            rdfs:Resource           
_:node341            rdf:type            oac:ConstrainedTarget
_:node341            oac:constrainedBy   _:node342           
_:node341            oac:constrains      ""           

<>   rdf:type   rdfs:Resource           
<>   rdf:type   oac:Body
<>   rdfs:label "Control Point for 'Algier' (36.752887, 3.042048)"

Example queries:
select distinct ?x where {
     ?x a oac:Annotation
}

gives the list of available annotations. (see{%3Fx%20a%20oac:Annotation}&limit=0&infer=true)

select distinct ?lab where {
    ?x a oac:Annotation.
    ?x oac:hasBody ?b.
    ?b rdfs:label ?lab.
    ?x oac:hasTarget ?t.
    ?t oac:constrains ""
}

gives the list of texts associated with the document
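Since the endpoint URL is omitted here, the behaviour of these two queries can be sketched locally. The following is a minimal in-memory illustration of the pattern matching involved (hypothetical identifiers such as `ann1` and `doc1` stand in for the real URIs), not a real triple store:

```python
# Toy triples mirroring the annotation structure shown above.
# All identifiers are illustrative placeholders, not actual endpoint data.
triples = [
    ("ann1", "rdf:type", "oac:Annotation"),
    ("ann1", "oac:hasBody", "body1"),
    ("ann1", "oac:hasTarget", "target1"),
    ("body1", "rdfs:label", "Control Point for 'Algier' (36.752887, 3.042048)"),
    ("target1", "oac:constrains", "doc1"),
]

def match(triples, s=None, p=None, o=None):
    """Return triples matching the given pattern (None = wildcard)."""
    return [(ts, tp, to) for ts, tp, to in triples
            if (s is None or ts == s)
            and (p is None or tp == p)
            and (o is None or to == o)]

# First query: "?x a oac:Annotation" -> the list of annotations.
annotations = [s for s, _, _ in match(triples, p="rdf:type", o="oac:Annotation")]

# Second query: labels of bodies of annotations whose target constrains doc1.
labels = []
for ann in annotations:
    targets = [o for _, _, o in match(triples, s=ann, p="oac:hasTarget")]
    if any(match(triples, s=t, p="oac:constrains", o="doc1") for t in targets):
        for _, _, body in match(triples, s=ann, p="oac:hasBody"):
            labels += [o for _, _, o in match(triples, s=body, p="rdfs:label")]
```

A real SPARQL engine performs the same basic graph-pattern joins, just declaratively and at scale.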

Added Pleiades links:
A simple association based on similarity was applied to detect when an annotation related to a place known to Pleiades. For each such annotation (e.g. one about “Corsica”), triples similar to the following were added:

oac:hasBody <>
rdf:type wgs84_pos:SpatialThing
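The post does not say exactly which similarity measure was used, so the following is an illustrative sketch only: a stdlib string-similarity match of annotation titles against a toy gazetteer (the names and URI placeholders here are hypothetical):

```python
import difflib

# Hypothetical gazetteer: place name -> URI placeholder.
pleiades = {"Corsica": "pleiades-corsica-uri", "Algiers": "pleiades-algiers-uri"}

def best_match(name, gazetteer, threshold=0.8):
    """Return the gazetteer URI whose name is most similar to `name`,
    or None if nothing scores above the threshold."""
    best, best_score = None, 0.0
    for key, uri in gazetteer.items():
        score = difflib.SequenceMatcher(None, name.lower(), key.lower()).ratio()
        if score > best_score:
            best, best_score = uri, score
    return best if best_score >= threshold else None
```

With a threshold like this, “Algier” (the German spelling seen in the DME data) still matches “Algiers”, while unrelated names fall below the cut-off.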

Small extension of the Ontology:
A new class was created called GeoAnnotation. Intuitively, this class represents the oac:Annotation(s) that point to (oac:hasBody) geographical objects (wgs84_pos:SpatialThing).
It is therefore defined in abstract OWL syntax as 

class (GeoAnnotation partial oac:Annotation)
class (GeoAnnotation complete restriction(oac:hasBody someValuesFrom(wgs84_pos:SpatialThing)))

In other words, GeoAnnotation is the class of annotations that have a body which is a SpatialThing. In triple form:

<>  rdf:type        owl:Class
<>  rdfs:subClassOf oac:Annotation
<>  owl:equivalentClass _:node417
_:node417            rdf:type                 owl:Restriction
_:node417            owl:onProperty           oac:hasBody
_:node417            owl:someValuesFrom       wgs84_pos:SpatialThing           

Based on this definition, the system is able to infer that the annotations that have been connected to Pleiades objects are GeoAnnotations. The query:

select distinct ?x where {
  ?x a <>
}

gives the corresponding results (see{%3Fx%20a%20%3C}&limit=100&infer=true )
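To make the inference step concrete, here is a toy forward-chaining sketch of what the owl:someValuesFrom restriction licenses: any annotation whose oac:hasBody value is typed as wgs84_pos:SpatialThing is classified as a GeoAnnotation. The identifiers are illustrative, not actual endpoint data, and a real reasoner (OWLIM here) does far more than this one rule:

```python
# Toy dataset: ann1 has a SpatialThing body, ann2 does not.
triples = {
    ("ann1", "rdf:type", "oac:Annotation"),
    ("ann1", "oac:hasBody", "body1"),
    ("body1", "rdf:type", "wgs84_pos:SpatialThing"),
    ("ann2", "rdf:type", "oac:Annotation"),
    ("ann2", "oac:hasBody", "body2"),  # body2 is not typed as a SpatialThing
}

def infer_geo_annotations(triples):
    """Apply the single someValuesFrom rule: hasBody some SpatialThing."""
    spatial = {s for s, p, o in triples
               if p == "rdf:type" and o == "wgs84_pos:SpatialThing"}
    return {s for s, p, o in triples
            if p == "oac:hasBody" and o in spatial}
```

Only ann1 would be returned by the GeoAnnotation query; ann2 remains a plain oac:Annotation.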

The query:
select distinct ?d ?b ?l where {
  ?x a <>.
  ?x oac:hasBody ?b.
  ?x oac:hasTarget ?t.
  ?t oac:constrains ?d.
  ?b a wgs84_pos:SpatialThing.         
  ?b rdfs:label ?l
}

gives the list of relationships between documents and Pleiades places (see{%3Fx%20a%20%3C}%0A&limit=100&infer=true )

Test with Arachne Data

Using the following query:*:*&fq=kategorie:topographie&start=0&rows=10000&fl=+Pfad,+id,+kurzbeschreibung,+antikeRoemProvinzTopographie,+ort,+Genauigkeit+Ort_antik,+antikeGriechLandschaftTopographie,+Geonamesid&qt=standard&wt=json&explainOther=&hl.fl=

we can obtain a list of potential annotations from Arachne, with information about images relating to places that can have both modern and ancient names.
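For readers building similar Solr queries programmatically, the parameter string above can be assembled with the standard library. The parameter names below are taken from the query shown (with a shortened field list), while the base URL is omitted here as it is in the post:

```python
from urllib.parse import urlencode

# Parameters from the Arachne Solr query above (field list abbreviated).
params = {
    "q": "*:*",
    "fq": "kategorie:topographie",
    "start": 0,
    "rows": 10000,
    "fl": "Pfad,id,kurzbeschreibung,ort,Geonamesid",
    "wt": "json",
}
# urlencode handles the percent-escaping (e.g. ':' -> '%3A') for us.
query_string = urlencode(params)
```

The resulting string can be appended to the endpoint URL after a `?`.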

There are (apparently) a bit more than 5,000 items in this list; running the same similarity-based process as above, we could relate a bit more than 2,000 of them to Pleiades URIs.

They all appear, with the previous queries, as Annotations and GeoAnnotations. With this dataset, there are now many documents referring to the same place. We can, for example, obtain the list of documents referring to “” with the query:

select distinct ?x where {
     ?y oac:hasBody <> . 
     ?y oac:hasTarget ?x
}

The list of all the (geo)annotations relating to it can be retrieved through the query:

PREFIX oac:<>
select ?x where {
  ?x a <>.
  ?x oac:hasBody <>
}

Other possible extensions
There are different kinds of GeoAnnotation (and different types of documents to consider). For example, some data would mention both documents about a location and documents that cite the location but are about something else. Maps of particular locations might have a specific status as well. We could add sub-properties of oac:hasTarget, for example, or subclasses of GeoAnnotation.

Using html link element to indicate presence of Pelagios ingestible RDF

In addition to Mathieu’s work, Sebastian Heath has been considering implementations in RDFa. It is common to use the html element ‘link’ within an html ‘head’ to indicate the location of alternate versions of a web resource. The most common application of this convention is to indicate the presence of Atom or RSS feeds.

E.g. <link rel="alternate" type="application/atom+xml" title="Atom feed" href="<URI Here>" />

Resource authors/publishers can use the following convention to indicate the location of RDF-serialized Pelagios-compatible oac:Annotations.

 <link rel="x-pelagios-oac-serialization" title="Pelagios compatible version" type="<mime-type>" href="<URI>"/>

  • The @rel MUST be equal to "x-pelagios-oac-serialization".
  • The @type SHOULD match the mime type of the resource pointed to by @href.
  • The @type SHOULD be one of:
  • The value of the title attribute is not significant. Authors MUST NOT use it to communicate any information to the Pelagios crawler.
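As a sketch of the consuming side, a crawler could discover such link elements with a standard HTML parser. The markup below is a made-up example page, not a real Pelagios resource:

```python
from html.parser import HTMLParser

class PelagiosLinkFinder(HTMLParser):
    """Collect (type, href) pairs for Pelagios-convention link elements."""
    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "link" and a.get("rel") == "x-pelagios-oac-serialization":
            self.hrefs.append((a.get("type"), a.get("href")))

# Hypothetical page advertising an RDF/XML serialization of its annotations.
html = ('<html><head><link rel="x-pelagios-oac-serialization" '
        'type="application/rdf+xml" href="/annotations.rdf"/></head></html>')
finder = PelagiosLinkFinder()
finder.feed(html)
```

Per the convention above, the crawler would then fetch the @href and treat the @type as a hint (not a guarantee) of the serialization format.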

Monday 14 March 2011

Pelagios Project Plan Part 7: Budget

Budget forecast*

  • Total Staffing costs 59%
  • Partner collaboration activities (incl. the workshop) 16%
  • Overheads (Estates and Indirects) 27%

*This is the total budget, including contributions In Kind from partners (see below)

Budget Management

The Arts Faculty Research Grants Manager is Suzanne Duncanson-Hunter. She is working closely with both the PI and CoI to make sure that the project keeps to budget. The PI and CoI are also in close correspondence with the JISC manager over how best to manage costs and utilise resources. In addition, the Open University, through Suzanne's efforts, have already drawn up an external consultancy contract with Rainer Simon of DME for work to be undertaken on Pelagios in WP3.

Budget Justification

The ontology work undertaken by LUCERO in WP1 complements the funding that they already have from JISC. Furthermore, all work on ontology specification, mapping and alignment done by the data and documentation partners (GAP, Perseus, DME, SPQR, Arachne) and Pleiades is payment In Kind. Because all of our partners are already committed to linked open data research and have secured sustainable and significant funding for themselves, Pelagios is able to make substantial research and infrastructural advances on a relatively modest budget, thus greatly maximizing Return On Investment for both JISC and project partners.

This is encapsulated by our 'start-up' event, the one-day workshop at KCL, which presents a unique opportunity for an intensive exchange of knowledge and experience on Linked Open GeoData from the community at large, as well as from all of our partners. For this reason we are intending to record the proceedings to help document the discussion of issues and methods relating to linking open data ontologies, so that it may be of use for other groups working in this area in the future. Our budget ensures that all invited external expert speakers will have UK costs borne, while also paying for the time and expenses of all the Pelagios partners, for both their participation at the workshop and the project meeting on the following day.

Saturday 12 March 2011

Choosing an ontology: OAC

Part of Pelagios' first Workpackage is to decide on a Common Ontology for Place References (COPR). In doing so we are not attempting to reinvent the wheel - far from it. Good Linked Open Data practice is to reduce, reuse and recycle. To that end we have been investigating a variety of options and are now basing our approach on the Open Annotation Collaboration ontology.

The OAC is also a work in progress but, as luck would have it, they are holding their workshop in Chicago on exactly the same dates as ours (we now have a great line-up, btw, so register soon as places are filling up). Their fundamental principles seem to be exactly what we are looking for though: the ability to connect a target web document with some information about it (in our case, an ancient place).

The basics should look something like this:

ex:ann1 a oac:Annotation ;
       oac:hasBody "" ;
       oac:hasTarget <some resource identifying the text + fragment> .

But a number of interesting issues remain - should we use Blank Nodes for the annotations themselves (especially in RDFa)? If not, where do we store them? Should we subclass the OAC ontology to specify that it is a geographic annotation? If so, should it be oac:Annotation or oac:hasBody that we subclass, and where should that ontology be hosted? We are fortunate to have the assistance of Bernhard Haslhofer and Robert Sanderson, who are both involved with OAC, and we look forward to them reporting back from OAC's workshop. In the meantime Mathieu D'Aquin is putting together our own SPARQL demonstrator to see how useful this approach may be in practice.

If you have any thoughts on this do let us know and you can follow the discussion itself over on our Google Group.

Thursday 10 March 2011

Pelagios Project Plan Part 6: Projected Timeline, Workplan & Overall Project Methodology

Project plan

Pelagios is divided into three work packages (WP), centred on the three stated outputs of the project: core ontology development; application of that ontology to the project partners’ sample datasets, and documentation of the process; and development of web resources to trial and show the value of the ontology for users.

WP1 Ontology Specification - LUCERO, Southampton ACRG, Pleiades

WP1 has three key elements:

1. First, to develop a core ontology for (ancient) place references (COPR) with help from the partners. This involves:

· Gathering from the project partners sample dataset ontologies that describe references to places in their documents, identifying common elements, and evaluating their robustness

· Constructing a core ontology that can be applicable for all the different datasets represented by the partners (text, database, map), and then testing its implementation with a SPARQL demonstrator

2. Second, to hold a two-day workshop on developing a core ontology for linking open geodata. This involves:

· Disseminating around the project partners and workshop invitees the proposed ontology as well as abstracts from all the speakers

· Hosting a one-day workshop that is open to all members of the community* with three sessions dedicated to: 1) Issues of referencing ancient and contemporary places online; 2) Lightweight ontology approaches; 3) Methods for generating, publishing and consuming compliant data. Each session will consist of several short (15 min) papers followed by half an hour of open discussion.

· Hosting a project meeting on the day following the workshop, which will be devoted to incorporating feedback from the community and then deciding upon the core pelagios ontology that will be used by each of the project partners

3. Lastly, to publish the COPR in RDF (hosted by Pleiades)

*For more information about the community workshop, including registration details, please visit:

WP2 Documentation of Uniform Resource Identifiers (URI) Mapping and RDF publication - GAP, Perseus, SPQR, Arachne, DME

WP2 is divided into 5 parallel streams (A-E) according to the different document types hosted by each of the partners. Each stream will:

i) Detail the process by which local place references are aligned with Pleiades URIs and COPR-compliant RDF is generated and hosted

ii) Make recommendations for document-type extensions to COPR

· Stream A: GAP will document processes related to narrative free texts

· Stream B: Perseus will document processes related to XML-encoded freetext (In Kind)

· Stream C: SPQR will document processes related to fragmentary free texts

· Stream D: Arachne will document processes related to database records

· Stream E: DME will document processes related to rasterized maps

WP3 Development of Web services and tools to facilitate consumption - DME

This last WP trials and develops the various web-applications to which the ontology may be put. This includes:

· Developing a Representational State Transfer (REST) webservice to output COPR-compliant RDF in alternative output formats, e.g. Keyhole Markup Language (KML), GeoJSON

· Developing a suite of three web visualization tools: a map view, a table view, and an ‘ordered’ view (for chronological or narrative timelines)

· Incorporating further feedback from each partners’ user groups in order to permit agile development of the ontology webresources
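As a sketch of the alternative output formats mentioned above, a REST service might render a geo-annotation as a GeoJSON Feature along these lines (the field names here are illustrative, not a Pelagios specification):

```python
import json

def annotation_to_geojson(label, lon, lat, target_uri):
    """Render one geo-annotation as a GeoJSON Feature.
    GeoJSON coordinates are ordered [longitude, latitude]."""
    return {
        "type": "Feature",
        "geometry": {"type": "Point", "coordinates": [lon, lat]},
        "properties": {"label": label, "target": target_uri},
    }

# Hypothetical example using the Algiers control point seen earlier.
feature = annotation_to_geojson("Algiers", 3.042048, 36.752887, "doc1")
body = json.dumps(feature)  # what the service would return for GeoJSON requests
```

A KML output path would follow the same pattern, templating the same label and coordinates into KML Placemark elements instead.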

The Pelagios GANTT chart (01 February 2011 – 31 October 2011)

Project Management

Most project meetings take place via skype:

· A monthly meeting comprising the PI, Co-I and one representative from each of the partners is established for checking progress, identifying and resolving common issues, planning for the forthcoming month’s activities, ensuring aims are met, and disseminating the project outcomes.

· In the run-up to the workshop, a smaller group comprising the PI, Co-I and one representative from the development partners meets to discuss the ontology development.

· The PI and Co-I meet once a month with their JISC PM, David Flanders, to ensure that the project is on target.

The project uses a google group email for more regular, bi-weekly communication:

· This email group keeps partners informed of deadlines and to-dos.

· It also hosts discussion of work flows, issues and methodologies. All communication is archived for the benefit of the broader community.

In addition, the PI and Co-I will make one visit to each partner during the ontology application process. Rapid-iteration Agile methods will be used for software development.

Wednesday 9 March 2011

Pelagios workshop on Linking Open (geo)Data

The Pelagios project is hosting a workshop on Linking Open GeoData in the Humanities on Thursday 24 March at KCL. The event is free of charge, but, if you would like to attend, please sign up at

Further details:

The Pelagios workshop is an open forum for discussing the issues associated with and the infrastructure required for developing methods of linking open data (LOD), specifically geodata. There will be a specific emphasis on places in the ancient world, but the practices discussed should be equally applicable to contemporary named locations. The Pelagios project will also make available a proposal for a lightweight methodology prior to the event in order to focus discussion and elicit critique.

The one-day event will have 3 sessions dedicated to:

1) Issues of referencing ancient and contemporary places online

2) Lightweight ontology approaches

3) Methods for generating, publishing and consuming compliant data

Each session will consist of several short (15 min) papers followed by half an hour of open discussion. The event is FREE to all but places are LIMITED so participants are advised to register early. This is likely to be of interest to anyone working with digital humanities resources with a geospatial component.

Preliminary Timetable

10:30-1:00 Session 1: Issues

2:00-3:30 Session 2: Ontology

4:00-5:30 Session 3: Methods

Confirmed Speakers (including affiliation and relevant project)

Johan Åhlfeldt (University of Lund) Regnum Francorum Online

Ceri Binding (University of Glamorgan) Semantic Technologies Enhancing Links and Linked data for Archaeological Resources

Gianluca Correndo (University of Southampton) EnAKTing

Claire Grover (University of Edinburgh) Edinburgh Geoparser

Adam Rabinowitz (University of Texas at Austin) GeoDia

Sebastian Rahtz (University of Oxford) CLAROS

Sven Schade (European Commission)

Humphrey Southall (University of Portsmouth) Great Britain Historical Geographical Information System

Tuesday 8 March 2011

Pelagios on Big Data

Recently JISC commissioned a reporter to go to the O'Reilly Strata (Big Data) Conference in California. The report, which can be accessed here, throws up a number of interesting questions that are pertinent to our own project. We sketch out some initial responses here—but we’d welcome further comments from within the team or from any of our readers.

Given the opening questions asked about Big Data—What information is out there? How can we find it? And how can it be accessed?—it is perhaps no surprise to find that the Pelagios team is sympathetic to much of the discussion. From a brief reading of this document, three things strike us as being of particular relevance:

1. Open access. The concluding point, ‘if data is less open, it is less useful—limiting access limits value’, is one with which we are in total accord. The importance of data being open is particularly acute in an academic environment, where cutting edge research depends fundamentally on the ability to access datasets of all different kinds. If somewhat more geared towards a science model of research, nevertheless open access could transform (in a positive way) the Humanities, making it incumbent on the researcher to show his/her working out: just what is the evidence for this or that interpretation, and so on. But access is not the only issue; data must also be reusable—which presents its own range of technological and intellectual challenges. Furthermore, care must be given if we are to avoid a tyranny of openness, by virtue of which research that (for whatever reason) is less open (whatever that might mean) is passed over or fails to gain funding or prestige. Or, to put that differently, does the mere fact that something is open make it worthwhile research?

2. Infrastructure. This issue seemed to be the keynote of the blog. It is raised by (among others) Mike Olson (Cloudera), Rod Smith (IBM) and Abhishek Metha (Tresata); and by Steve Midgley (U.S. Department of Education). How do we find and identify valuable resources? And how can they be connected up? Just such an issue has been the concern too at the European Science Foundation, the practice to be guarded against being termed ‘data silos’. But, given the importance of ‘infrastructure to store and share data within sectors’, how can it be achieved? Various ‘top-down’ approaches (whether commercial or educational) have thus far fallen flat either because of insufficient user uptake or the lack of reusability of the data, tools, etc. Accordingly, Pelagios supports a ‘bottom-up’ Linked Data approach, by which a variety of user groups work together to connect their resources in an open and transparent way, which others can assess and to which they may add their own. We certainly support the idea that resources should be shared wherever possible and that research (particularly in the Humanities) needs to move beyond a narrow competitive mentality.

3. Users. This is something touched upon in our previous response and that is implied in much of the blog—but whose fundamental importance is not really addressed. Focus seems to be rather on the producers and how they can provide context for the data or an expert ability to disseminate knowledge and understanding. But what about the user-end of the communication (something which Humanities teaches us to pay attention to)? Who are the users? Why do they use some Big Data and not others? And how can they be empowered to bring together for themselves different kinds of information? This is the critical challenge facing anyone working in the digital medium. In a commercial context it is clear that there is increasing demand for ‘Data Scientists’ capable of making sense of it all. Arguably that has always been the role of academics, yet few of us have the requisite technical skills and domain knowledge demanded by these new resources. Pelagios is a consortium project for this very reason and we suspect that multidisciplinary research groups will, almost of necessity, be the organizational structure best suited to exploiting academic Big Data.

Pelagios Project Plan Part 5: Project Team Relationships and End User Engagement

Pelagios is made up of an international consortium of projects leading digital research on the ancient world, the more established of which already have large user communities: in short, these groups are the end users of the Pelagios project. Each partner has at least one prominent member on the Pelagios working group to represent them. So, welcome to the team:

Elton Barker (GAP, The Open University) Principal Investigator
Elton is the non-techie on the project: he’s a lecturer in Classical Studies at the OU with a specialism in all things Greek (Homer, tragedy, historiography...). But he’s slowly slipping into the murky world of Digital Humanities, having been Principal Investigator of HESTIA - a project investigating spatial concepts in Herodotus - and now GAP. (He’s also assisting in the promotion and understanding of DH at the OU.) Elton is in overall charge of all things Pelagios-related, like overseeing progress of the work packages and making sure that the team delivers on what it promised: in other words, it's his neck on the JISC chopping-block if things go pear-shaped - not that he's worried. He’s also responsible for GAP’s contribution to project documentation (meaning that he’ll try to put the techie talk into plain English).

Leif Isaksen (GAP, Southampton) Co-Investigator
Leif, a strange hybrid - part philosopher, part classicist, part archaeologist, and total computer geek - is a Research Fellow at the Archaeological Computing Research Group (Southampton). Having been the technical consultant on HESTIA, Leif has moved on to be Co-I for both GAP and Pelagios. Leif is the go-to man for all the technical components of the project, working in conjunction with Development Partners to ensure infrastructural outputs are delivered. He’s also responsible for GAP’s data contribution to the project (i.e. the techie stuff).

Tom Elliott, Sean Gillies and Sebastian Heath (Institute for the study of the Ancient World) Development Partners
Tom is managing editor of Pleiades, an innovative ancient world online atlas, which was established with a grant from the U.S. National Endowment for the Humanities and which he has since developed as Associate Director for Digital Programs in the Institute for the Study of the Ancient World at New York University.
Sean Gillies is Pleiades’ chief engineer and representative in open source web and GIS initiatives. Together, he and Tom will provide support and expertise in mapping ancient place references to Pleiades’ uniform resource identifiers (URIs) for each location. As part of this task they are helping to develop the core ontology by which different projects can point their place reference data to the Pleiades URIs.

Sebastian Heath is a ceramics specialist with expertise in Linked Data and is also responsible for

Mathieu d’Aquin (LUCERO, The Open University) Dev Partner
Mathieu is a researcher at the OU’s Knowledge Media Institute and project director of the JISC-funded LUCERO project, which is exploring the means of integrating Linked Data practices for research and education in higher-educational organizations. As our ontology guru, Mathieu is looking to inject some Linked Data goodness into our project: he’s responsible for implementing the core semantic infrastructure, and will assist all the partners in its delivery and documentation.

Rainer Simon (DME, Austrian Institute of Technology) Dev Partner
Rainer is presently involved in the EuropeanaConnect project, where he leads research and development activities concerned with semantic media annotation, including the development of demonstrators for annotating and linking digitised maps and geospatial content. Rainer, then, has the coolest job of us all - to come up with funky ways of experimenting with and visualizing what Pelagios can do for you. He’s also responsible for DME’s data and documentation contributions to the project.

Greg Crane (Perseus, Tufts) Data + Doc Partner

Greg is Editor-in-Chief of the Perseus project, a trail-blazer in the world of Digital Humanities for its programme of digitizing classical texts - it currently hosts the world’s largest classical online library - with a particular focus on organising the data to meet user needs. Among supporters of Perseus are: the Annenberg/CPB Projects; the Digital Library Initiative Phase 2; the National Endowment for the Humanities and the National Science Foundation; the Institute for Museum and Library Services; the Mellon Foundation; and Google. Overseeing this vast digitised Classical empire, Greg is responsible for Perseus’ data and documentation contributions to the project.

Reinhard Foertsch (Arachne, Cologne) Data + Doc Partner
Reinhard is director of the CoDArchLab (computing archaeology) at Cologne University and leads the Arachne project, the central object-database of the German Archaeological Institute (DAI), which has over 6,000 registered users and approximately 981,000 scanned images and documentation for roughly 270,000 sites accessible free of charge and, helpfully, in English. Arachne is a partner of CLAROSnet and has recently won, like GAP and Perseus, a Google Digital Humanities Research Award. Reinhard is responsible for Arachne’s data and documentation contributions to the project.

Mark Hedges (SPQR, KCL) Data + Doc Partner
Mark is Deputy Director of the Centre for e-Research at KCL and Director of SPQR, which is not a project about the famous phrase describing the constituent part of the Roman Republic (Senatus Populusque Romanus) but rather one that adopts Semantic Web and Linked Data approaches to allow researchers to formalise resources and the links between them more flexibly. Mark is responsible for SPQR’s data and documentation contributions to the project.

Neel Smith (Ptolemy, Holy Cross) Data + Doc Partner
Neel is Associate Professor of Classics at the College of the Holy Cross (Worcester, Mass.), an architect of the Homer Multitext project, a principal designer of the Center for Hellenic Studies' CITE architecture, and is editing a geographically-aware digital edition of Ptolemy's Geography. He has been involved in digital scholarship since the early 1980s and still uses ed for quick edits.

Sebastian Rahtz (CLAROS, Oxford) Data + Doc Partner
Sebastian is the head of the Information and Support Group of Oxford University Computing Services. In this capacity he provides the technical support and development for CLAROS. He also directs the Lexicon of Greek Personal Names and works on and for the Text Encoding Initiative.

Other key project members are:
Andreas Geissler & Karin Hoehne (Arachne), Tobias Blanke & Gabriel Bodard (SPQR), Kate Byrne & Eric Kansa (GAP), Alex Dutton (CLAROS)

End-User Engagement
Pelagios has received feedback from a range of national and international organisations and initiatives including Google, EDINA, ADS, STELLAR, EnAKTing, UKLP, GeoNames, GeoDia, the European Commission, Open Context, Orbis, Regnum Francorum Online, Digital Atlas of Roman and Medieval Civilizations (Harvard) and Open Encyclopaedia of Classical Sites (Princeton). Many of these groups will be participating in the Pelagios workshop at KCL on 24 March. This event is open to ALL. Anyone wishing to attend should contact Elton at

In addition to this blog, we are also archiving all group email discussions under At the end of the project (October 2011) we will wrap up with a Web-launch of the ontology, tools and services to maximise impact.

Pelagios has consulted closely with JISC in finalizing the plan, budget and website and will also interact with, and solicit feedback from, the JISC Geospatial Working Group.