Pelagios: January 2012

Thursday, 26 January 2012

Discovering the Discovery Programme

As we are still funded by JISC, you may not have realised that the second phase of Pelagios is actually part of a different programme (also in its second phase, as it happens): the Resource Discovery Programme. Its aim is to help academics and students discover content relevant to their research interests, with a particular focus on resources that are openly available to anyone. As part of that programme we will be attending several events organised by JISC in order to interact with our sister projects and learn from each other.

The first meeting was held in Birmingham on January 10th and broke down into two parts. In the first half, representatives from each project gave a short (5 minute) presentation about what they intend to achieve. It was interesting to me to see how many projects have a textual angle, especially given that texts are fundamental to Pelagios partners like GAP and Perseus. One project that particularly caught my eye is DiscoverEDINA, which (among other things) will be looking to improve the place-based metadata associated with media resources. It's early days for them yet, but there may well be opportunities for our projects to learn from each other.

The second part of the day was a series of rotating group discussions on topics associated with data access and usability. These were relatively short and snappy affairs and principally an opportunity for JISC to get a feel for the state of play amongst the groups concerned. They were also intended to trigger our own thinking about such issues and the day ended with a request that each group respond to a series of issues throughout the lifetime of its blog. They are:

Adopting open licensing
Requiring clear reasonable terms and conditions
Using easily understood data models
Deploying persistent identifiers
Establishing data relationships by re-using authoritative identifiers
Providing clear mechanisms for accessing APIs
Documenting APIs
Adopting widely understood data formats

What I found notable was a broad interest in, if not wholesale adoption of, semantic approaches across the groups. Anyone who's read my PhD thesis (anyone?) will know that I think semantic technologies really only come into their own in an open data environment, so this seems positive to me. I'm also glad to see that JISC have broken down many of the challenges concerned (especially minting and re-using identifiers) so that they can be addressed independently. None of these are trivial issues, and although there's no space to start delving into them here, expect to hear more from us about these topics as Pelagios continues to unfold.

Thursday, 19 January 2012

Scenarios of use and potential end users

Some of us within IET here at the OU have been thinking about the design and evaluation of the Pelagios tools and interfaces/widgets that are going to be developed and tested over the next few months, and we'd like some feedback from all of you please, to help us answer a few questions.

In particular, we're thinking about:

Our user base - the final JISC proposal document mentions super users, end users and policy makers. We think the most important of these are the first two groups: super users and end users. But who exactly are these people, our target groups? Could you give us some examples? Also, if anyone is happy to be included in one or both of these groups, please could you email me your contact details so we can include you in some of the design and evaluation tasks? Or could you propose other people/institutions who might be happy to be involved?
Scenarios of use - why will our users want to use these tools and interfaces? What drives them to look at these resources, and what are their interests or goals? What do they need to get out of these resources?

If you think you could help us with this information, please either post your comment on here or email me directly - many thanks!

Monday, 16 January 2012

FASTI Online - New Project Partner

As Fasti Online is one of the newest Pelagios partners, I thought it was about time we introduced ourselves to the project and let everyone know why we have joined and what we are hoping to bring to the table, and also what we hope to gain from Pelagios ourselves.

Between 1946 and 1987 the International Association for Classical Archaeology (AIAC) published the Fasti Archaeologici. It contained very useful summary notices of excavations throughout the area of the Roman Empire. However, spiraling costs and publication delays combined to render it less and less useful. AIAC's board of directors thus decided in 1998 to discontinue the publication and to seek a new way of recording and diffusing new results. The Fasti Online is the result of this effort.

Working with L - P : Archaeology [creators of the Archaeological Recording Kit (ARK)], AIAC and our project partners[1] have created an online database of over 2,700 archaeological sites in 13 different countries[2]. Each of these sites has had at least one excavation season since the year 2000 (in fact we have over 4,000 excavation seasons in the database). Fasti Online, therefore, is a database of ongoing and recent archaeological projects, and not really a database of ancient places. This is what has made Pelagios so interesting to us: by linking to the data provided by the other partners we can enrich our own data and hopefully enrich theirs as well.

As to how we are planning on making the linkages, one of the fields recorded by the Fasti partners is the Ancient Site Name (where available). For the first round of linking we plan on matching our Ancient Site Names with those held in Pleiades, doing a check on the coordinates to make sure that they are the same place and then adding the Pleiades URI to the Fasti database. An initial run of the linking code has left us with 355 sites that match with Pleiades sites (only 955 of the Fasti sites have an Ancient Site Name attached) so that is not too bad at all for a first run.
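The matching step described above (same name, then a coordinate check) might look roughly like the sketch below. It is not the actual Fasti linking code: the `Place` class, the `link_sites` function, the 10 km tolerance, the "Exampleum" records and the Fasti coordinates are all assumptions made up for illustration. Only the Vulci Pleiades URI and its coordinates come from elsewhere on this blog.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class Place:
    name: str
    lat: float
    lon: float
    uri: Optional[str] = None  # Pleiades URI; None for a not-yet-linked Fasti site


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


def link_sites(fasti_sites, pleiades_places, max_km=10.0):
    """Return {Fasti site name: Pleiades URI} for name matches confirmed by distance."""
    # Index Pleiades places by lower-cased name; several places may share a name.
    by_name = {}
    for place in pleiades_places:
        by_name.setdefault(place.name.strip().lower(), []).append(place)

    links = {}
    for site in fasti_sites:
        for candidate in by_name.get(site.name.strip().lower(), []):
            # The coordinate check guards against two different places sharing a name.
            if haversine_km(site.lat, site.lon, candidate.lat, candidate.lon) <= max_km:
                links[site.name] = candidate.uri
                break
    return links


pleiades = [
    Place("Vulci", 42.4167, 11.5833, "http://pleiades.stoa.org/places/413393#this"),
    Place("Exampleum", 40.0, 20.0, "http://example.org/places/exampleum"),  # invented
]
fasti = [
    Place("VULCI", 42.42, 11.63),     # invented coordinates, a few km from the entry above
    Place("Exampleum", 45.0, 10.0),   # same name, but far away: should not be linked
]
links = link_sites(fasti, pleiades)
```

The tolerance matters: too tight and sites recorded at a trench rather than the town centre are missed, too loose and namesakes get merged, so it would need tuning against real data.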

We hope that at some point in the future we may be able to supply some of our Ancient Site Names back to Pleiades, and of course the Pelagios partners should be able to link to the Fasti database to see if there are any ongoing excavations in their area of interest!

We'll write a further post once the linking script has been run, and we have managed to get an RDF representation of it all. Watch this space!

[1] the project is generously supported by the Packard Humanities Institute, while the Italian and Ukrainian sites receive additional support from the Ministero dei Beni e le Attivita' Culturali and the Ukrainian Studies Fund, respectively.

[2] the countries that are currently part of Fasti are Italy, Serbia, Bulgaria, Romania, Macedonia, Malta, Morocco, Croatia, Albania, Slovenia, Kosovo, Montenegro and Ukraine

Tuesday, 10 January 2012

Progress in CLAROS towards Pelagios

I have been working on getting data into CLAROS (http://www.clarosnet.org/) to make it a proper contributing partner to Pelagios. Not new data exactly (we have millions of RDF triples already), but new connections between data. We are finally almost able to list all the objects and people in CLAROS which can be linked to Pleiades places, as Alex Dutton will explain in a subsequent post. But it may be instructive to describe informally the process we go through, and the tools we use.

The starting point for data providers in CLAROS is a supply of RDF against the CIDOC CRM (obviously, that takes some doing at their end; the wiki at http://www.clarosnet.org/wiki/index.php?title=CIDOC_CRM_RDF/XML helps explain how and what). This RDF (I give examples in XML) typically describes a set of objects, e.g. a <crm:E22_Man-Made_Object rdf:about="http://www.beazley.ox.ac.uk/record/AA1CD952-927D-41D7-B7AF-39520936CF95">, each of which has a section saying where the provider thinks it comes from, in the slightly tortuous way familiar to users of the CRM:

<P16i_was_used_for>
  <E7_Activity>
   <P2_has_type rdf:resource="http://id.clarosnet.org/vocab/Event_FindObject"/>
   <P7_took_place_at>
     <E53_Place>
       <P87_is_identified_by>
         <E48_Place_Name>
           <rdf:value>VULCI</rdf:value>
         </E48_Place_Name>
       </P87_is_identified_by>
       <P89_falls_within>
         <E53_Place>
           <P87_is_identified_by>
             <E48_Place_Name>
               <rdf:value>ETRURIA</rdf:value>
             </E48_Place_Name>
           </P87_is_identified_by>
         </E53_Place>
       </P89_falls_within>
     </E53_Place>
   </P7_took_place_at>
 </E7_Activity>
</P16i_was_used_for>

This is not wrong, but not ideal, since

the E53_Place objects are not identified by a URL and so are not addressable in the RDF
there is no indication of the geographical location of Vulci
there is no link to any other record for Vulci

The CLAROS ingest procedure reads this data, and enhances it by taking the place name "Vulci" and comparing it to a list of known places in an internal gazetteer called Metamorphoses. This has been built up by pulling together ad hoc catalogues from the various projects at Oxford, and gradually enhancing the entries with latitude and longitude acquired by finding places on Google Maps or Earth, and cross-referencing sites from Geonames (http://www.geonames.org/). By then consulting PleiadesPlus (http://googleancientplaces.wordpress.com/2011/01/24/pleiades-adapting-the-ancient-world-gazetteer-for-gap-%E2%80%93-by-leif-isaksen/), we can enhance the gazetteer still further with links to Pleiades. The end result looks like this, utilizing the skos:closeMatch relationship to link up our internal place Vulci with Pleiades and Geonames:

<E53_Place rdf:about="http://id.clarosnet.org/places/metamorphoses/place/vulci">
 <rdfs:label>[IT] Vulci</rdfs:label>
 <P87_is_identified_by>
   <E48_Place_Name rdf:about="http://id.clarosnet.org/places/metamorphoses/placename/vulci">
     <rdf:value>Vulci</rdf:value>
   </E48_Place_Name>
 </P87_is_identified_by>
 <P87_is_identified_by>
   <E47_Place_Spatial_Coordinates rdf:about="http://id.clarosnet.org/places/metamorphoses/place/vulci/coordinates">
     <claros:has_geoObject>
       <geo:Point xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#">
         <geo:lat>42.4167</geo:lat>
         <geo:long>11.5833</geo:long>
       </geo:Point>
     </claros:has_geoObject>
   </E47_Place_Spatial_Coordinates>
 </P87_is_identified_by>
 <skos:closeMatch rdf:resource="http://pleiades.stoa.org/places/413393#this"/>
 <skos:closeMatch rdf:resource="http://sws.geonames.org/3163940/"/>
 <P89_falls_within rdf:resource="http://id.clarosnet.org/places/metamorphoses/country/IT"/>
</E53_Place>
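To show how a consumer might read those skos:closeMatch links back out of such a record, here is a small Python sketch using only the standard library. It is an illustration rather than anything CLAROS runs: the `close_matches` function is invented, the record is trimmed down to the relevant elements, and the `CRM` namespace URI is a placeholder, since the snippets here do not show the real namespace declarations.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
SKOS = "http://www.w3.org/2004/02/skos/core#"
# Placeholder namespace for the CRM elements: an assumption for this sketch,
# not necessarily the namespace CLAROS uses.
CRM = "http://example.org/crm#"

# A trimmed-down copy of the gazetteer record, with namespaces declared.
record = f"""
<E53_Place xmlns="{CRM}" xmlns:rdf="{RDF}" xmlns:skos="{SKOS}"
    rdf:about="http://id.clarosnet.org/places/metamorphoses/place/vulci">
  <skos:closeMatch rdf:resource="http://pleiades.stoa.org/places/413393#this"/>
  <skos:closeMatch rdf:resource="http://sws.geonames.org/3163940/"/>
</E53_Place>
"""


def close_matches(xml_text):
    """Return (place URI, list of skos:closeMatch target URIs) for one record."""
    root = ET.fromstring(xml_text)
    about = root.get(f"{{{RDF}}}about")
    targets = [
        element.get(f"{{{RDF}}}resource")
        for element in root.findall(f"{{{SKOS}}}closeMatch")
    ]
    return about, targets


place_uri, targets = close_matches(record)
# Keep only the links that point at Pleiades.
pleiades_links = [t for t in targets if t.startswith("http://pleiades.stoa.org/")]
```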

Now we can match the "VULCI" from earlier on with this "vulci", and rewrite the <P7_took_place_at> as <P7_took_place_at rdf:resource="http://id.clarosnet.org/places/metamorphoses/place/vulci"/>; this now lets us assert that http://www.beazley.ox.ac.uk/record/AA1CD952-927D-41D7-B7AF-39520936CF95 is associated with http://pleiades.stoa.org/places/413393#this in some way, which is where we meet Pelagios.
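The rewrite itself can be sketched in a few lines of Python. This is only a stand-in to make the idea concrete, not the CLAROS transform: the `link_places` function and the one-entry `GAZETTEER` dictionary are invented for the example, and the `CRM` namespace URI is again a placeholder.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
CRM = "http://example.org/crm#"  # placeholder namespace, an assumption
NS = {"crm": CRM, "rdf": RDF}

# Hypothetical in-memory stand-in for the gazetteer: lower-cased name -> place URI.
GAZETTEER = {"vulci": "http://id.clarosnet.org/places/metamorphoses/place/vulci"}

source = f"""
<E7_Activity xmlns="{CRM}" xmlns:rdf="{RDF}">
  <P7_took_place_at>
    <E53_Place>
      <P87_is_identified_by>
        <E48_Place_Name><rdf:value>VULCI</rdf:value></E48_Place_Name>
      </P87_is_identified_by>
    </E53_Place>
  </P7_took_place_at>
</E7_Activity>
"""


def link_places(root, gazetteer):
    """Swap inline E53_Place descriptions for rdf:resource references where the name is known."""
    # findall returns a list, so the tree can be edited safely inside the loop.
    for prop in root.findall(".//crm:P7_took_place_at", NS):
        place = prop.find("crm:E53_Place", NS)
        if place is None:
            continue
        # Look only at the place's own name, not the names of enclosing regions.
        name = place.find("crm:P87_is_identified_by/crm:E48_Place_Name/rdf:value", NS)
        if name is None or not name.text:
            continue
        uri = gazetteer.get(name.text.strip().lower())
        if uri is None:
            continue  # unknown place: leave the inline description untouched
        prop.remove(place)
        prop.text = None  # drop leftover whitespace so the element serialises empty
        prop.set(f"{{{RDF}}}resource", uri)
    return root


rewritten = link_places(ET.fromstring(source), GAZETTEER)
took_place_at = rewritten.find("crm:P7_took_place_at", NS)
```

Unknown names are deliberately left alone, so nothing the provider supplied is lost when the gazetteer has no entry.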

Most of the normalizing process is done in a single XSLT 2.0 transform of the incoming RDF/XML (which also does quality checks of the RDF), working with the Metamorphoses RDF and a lookup XML file listing common spelling mistakes. When the resulting rewritten RDF is loaded into the triple store, additional inferences are performed to make subsequent retrievals easier. This process is, of course, very open to change and refinement, and as CLAROS develops we will no doubt rewrite it all.
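For readers who don't speak XSLT, the spelling-mistake lookup boils down to something like the Python below. It is an illustration only: the `normalise` function is made up, the real lookup is an XML file whose layout is not shown here, and the "vulchi" entry is an invented example rather than a mistake we have actually seen.

```python
# Hypothetical stand-in for the lookup file: known misspelling -> preferred form.
MISSPELLINGS = {"vulchi": "vulci"}  # invented example entry


def normalise(name, misspellings=MISSPELLINGS):
    """Lower-case a place name, collapse whitespace, then correct known misspellings."""
    key = " ".join(name.lower().split())
    return misspellings.get(key, key)


candidates = [normalise(n) for n in ["VULCI", "  Vulchi ", "Etruria"]]
```

Names that come out of this step are what get compared against the gazetteer, so "VULCI" and "vulci" end up as the same key.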

Does it work? CLAROS' gazetteer currently defines about 7300 places, of which only 1442 are linked to Pleiades. But bearing in mind that CLAROS has a lot of modern place names, and a lot of places in the Middle and Far East, we are not dissatisfied with progress. Our next step will be to gradually go over places in the obvious countries (Greece, Italy, France, Germany etc.), and check them against Pleiades, with the target of complete synchronization across the Mediterranean. It will be slow work...
