- Adopting open licensing
- Requiring clear reasonable terms and conditions
- Using easily understood data models
- Deploying persistent identifiers
- Establishing data relationships by re-using authoritative identifiers
- Providing clear mechanisms for accessing APIs
- Documenting APIs
- Adopting widely understood data formats
Thursday, 26 January 2012
Discovering the Discovery Programme
Thursday, 19 January 2012
Scenarios of use and potential end users
In particular, we're thinking about:
- our user base - the final JISC proposal document mentions super users, end users and policy makers. We think the most important of these are the first two groups: super users and end users. But who exactly are these people, our target groups? Could you give us some examples? Also, if anyone is happy to be included in one or both of these groups, please could you email me your contact details so we can include you in some of the design and evaluation tasks? Or could you propose other people or institutions who might be happy to be involved?
- scenarios of use - why will our users want to use these tools and interfaces? What drives them to look at these resources: what are their interests or goals? What do they need to get out of these resources?
Monday, 16 January 2012
FASTI Online - New Project Partner
As Fasti Online is one of the newest Pelagios partners, I thought it was about time we introduced ourselves to the project and let everyone know why we have joined and what we are hoping to bring to the table, and also what we hope to gain from Pelagios ourselves.
Between 1946 and 1987 the International Association for Classical Archaeology (AIAC) published the Fasti Archaeologici. It contained very useful summary notices of excavations throughout the area of the Roman Empire. However, spiraling costs and publication delays combined to render it less and less useful. AIAC's board of directors thus decided in 1998 to discontinue the publication and to seek a new way of recording and diffusing new results. The Fasti Online is the result of this effort.
Working with L - P : Archaeology [creators of the Archaeological Recording Kit (ARK)], AIAC and our project partners[1] have created an online database of over 2,700 archaeological sites in 13 different countries[2]. Each of these sites has had at least one excavation season since the year 2000 (in fact we have over 4,000 excavation seasons in the database). Fasti Online, therefore, is a database of ongoing and recent archaeological projects, and not really a database of ancient places. This is what has made Pelagios so interesting to us: by linking to the data provided by the other partners we can enrich our own and hopefully enrich theirs as well.
As to how we are planning on making the linkages, one of the fields recorded by the Fasti partners is the Ancient Site Name (where available). For the first round of linking we plan on matching our Ancient Site Names with those held in Pleiades, checking the coordinates to make sure that both records refer to the same place, and then adding the Pleiades URI to the Fasti database. An initial run of the linking code has left us with 355 sites that match Pleiades sites (only 955 of the Fasti sites have an Ancient Site Name attached), so that is not too bad at all for a first run.
We hope that at some point in the future we may be able to supply some of our Ancient Site Names back to Pleiades, and of course the Pelagios partners should be able to link to the Fasti database to see if there are any ongoing excavations in their area of interest!
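The name-then-coordinates check described above could be sketched roughly as follows. This is only an illustration in Python, not the actual Fasti linking code: the record layout (`id`, `ancient_name`, `lat`, `lon`, `uri`), the `FASTI_A`-style identifiers, the sample Fasti coordinates and the 5 km tolerance are all assumptions made for the example. Only the Vulci URI and its coordinates are taken from elsewhere on this blog.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two WGS84 points, in kilometres."""
    radius_km = 6371.0  # mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


def link_sites(fasti_sites, pleiades_places, max_km=5.0):
    """Return {fasti_id: pleiades_uri} where the name matches and the points are close.

    max_km is an assumed tolerance, not a value from the Fasti project.
    """
    # Index the gazetteer by lower-cased name; several places may share a name.
    by_name = {}
    for place in pleiades_places:
        by_name.setdefault(place["name"].strip().lower(), []).append(place)

    links = {}
    for site in fasti_sites:
        name = (site.get("ancient_name") or "").strip().lower()
        if not name:
            continue  # many Fasti sites have no Ancient Site Name
        for candidate in by_name.get(name, []):
            distance = haversine_km(site["lat"], site["lon"], candidate["lat"], candidate["lon"])
            if distance <= max_km:
                links[site["id"]] = candidate["uri"]
                break
    return links


# Hypothetical sample records, for illustration only.
PLEIADES = [
    {"uri": "http://pleiades.stoa.org/places/413393", "name": "Vulci", "lat": 42.4167, "lon": 11.5833},
]
FASTI = [
    {"id": "FASTI_A", "ancient_name": "Vulci", "lat": 42.418, "lon": 11.585},  # same place
    {"id": "FASTI_B", "ancient_name": "Vulci", "lat": 45.0, "lon": 9.0},       # same name, far away
    {"id": "FASTI_C", "ancient_name": None, "lat": 42.4167, "lon": 11.5833},   # no ancient name
]

LINKS = link_sites(FASTI, PLEIADES)
```

Requiring both a name match and a nearby position is what keeps homonymous sites apart. The price is that sites whose recorded coordinates differ noticeably between the two datasets will be missed and need checking by hand.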
We'll write a further post once the linking script has been run, and we have managed to get an RDF representation of it all. Watch this space!
[1] The project is generously supported by the Packard Humanities Institute, while the Italian and Ukrainian sites receive additional support from the Ministero dei Beni e le Attività Culturali and the Ukrainian Studies Fund, respectively.
[2] The countries that are currently part of Fasti are Italy, Serbia, Bulgaria, Romania, Macedonia, Malta, Morocco, Croatia, Albania, Slovenia, Kosovo, Montenegro and Ukraine.
Tuesday, 10 January 2012
Progress in CLAROS towards Pelagios
I have been working on getting data into CLAROS (http://www.clarosnet.org/) to make it a proper contributing partner to Pelagios. Not new data exactly (we have millions of RDF triples already), but new connections between data. Finally, we're almost able (as Alex Dutton will explain in a subsequent post) to list all the objects and people in CLAROS which can be linked to Pleiades places. But it may be instructive to describe informally the process we go through, and the tools we use.
The starting point for data providers in CLAROS is a supply of RDF against the CIDOC CRM (obviously, that takes some doing at their end; the wiki at http://www.clarosnet.org/wiki/index.php?title=CIDOC_CRM_RDF/XML helps explain how and what). This RDF (I give examples in XML) typically describes a set of objects, e.g. a <crm:E22_Man-Made_Object rdf:about="http://www.beazley.ox.ac.uk/record/AA1CD952-927D-41D7-B7AF-39520936CF95">, each of which has a section saying where the provider thinks it comes from, in the slightly tortuous way familiar to users of the CRM:
<P16i_was_used_for>
<E7_Activity>
<P2_has_type rdf:resource="http://id.clarosnet.org/vocab/Event_FindObject"/>
<P7_took_place_at>
<E53_Place>
<P87_is_identified_by>
<E48_Place_Name>
<rdf:value>VULCI</rdf:value>
</E48_Place_Name>
</P87_is_identified_by>
<P89_falls_within>
<E53_Place>
<P87_is_identified_by>
<E48_Place_Name>
<rdf:value>ETRURIA</rdf:value>
</E48_Place_Name>
</P87_is_identified_by>
</E53_Place>
</P89_falls_within>
</E53_Place>
</P7_took_place_at>
</E7_Activity>
</P16i_was_used_for>
This is not wrong, but it is not ideal, since:
- the E53_Place objects are not identified by a URL and so are not addressable in the RDF
- there is no indication of the geographical location of Vulci
- there is no link to any other record for Vulci
The CLAROS ingest procedure reads this data and enhances it by taking the place name "Vulci" and comparing it to a list of known places in an internal gazetteer called Metamorphoses. This has been built up by pulling together ad hoc catalogues from the various projects at Oxford, and gradually enhancing the entries with latitude and longitude acquired by finding places on Google Maps or Google Earth, and cross-referencing sites from Geonames (http://www.geonames.org/). By then consulting PleiadesPlus (http://googleancientplaces.wordpress.com/2011/01/24/pleiades-adapting-the-ancient-world-gazetteer-for-gap-%E2%80%93-by-leif-isaksen/), we can enhance the gazetteer still further with links to Pleiades. The end result looks like this, using the skos:closeMatch relationship to link our internal place Vulci with Pleiades and Geonames:
<E53_Place rdf:about="http://id.clarosnet.org/places/metamorphoses/place/vulci">
<rdfs:label>[IT] Vulci</rdfs:label>
<P87_is_identified_by>
<E48_Place_Name rdf:about="http://id.clarosnet.org/places/metamorphoses/placename/vulci">
<rdf:value>Vulci</rdf:value>
</E48_Place_Name>
</P87_is_identified_by>
<P87_is_identified_by>
<E47_Place_Spatial_Coordinates rdf:about="http://id.clarosnet.org/places/metamorphoses/place/vulci/coordinates">
<claros:has_geoObject>
<geo:Point xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#">
<geo:lat>42.4167</geo:lat>
<geo:long>11.5833</geo:long>
</geo:Point>
</claros:has_geoObject>
</E47_Place_Spatial_Coordinates>
</P87_is_identified_by>
<skos:closeMatch rdf:resource="http://pleiades.stoa.org/places/413393#this"/>
<skos:closeMatch rdf:resource="http://sws.geonames.org/3163940/"/>
<P89_falls_within rdf:resource="http://id.clarosnet.org/places/metamorphoses/country/IT"/>
</E53_Place>
Now we can match the "VULCI" from earlier on with this "vulci", and rewrite the <P7_took_place_at> as <P7_took_place_at rdf:resource="http://id.clarosnet.org/places/metamorphoses/place/vulci"/>; this now lets us assert that http://www.beazley.ox.ac.uk/record/AA1CD952-927D-41D7-B7AF-39520936CF95 is associated with http://pleiades.stoa.org/places/413393#this in some way, which is where we meet Pelagios.
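As a rough illustration of that rewrite step, the sketch below uses Python's standard `xml.etree.ElementTree` instead of the XSLT that CLAROS actually uses. The CRM namespace URI is a placeholder (the snippets in this post omit their namespace declarations), and the one-entry `GAZETTEER` dictionary stands in for Metamorphoses; both are assumptions for the example.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
CRM = "http://example.org/crm#"  # placeholder: the real CRM namespace is not shown in this post
NS = {"rdf": RDF, "crm": CRM}

# Stand-in for the Metamorphoses gazetteer: lower-cased place name -> place URI.
GAZETTEER = {
    "vulci": "http://id.clarosnet.org/places/metamorphoses/place/vulci",
}

# Cut-down version of the provider RDF shown earlier.
SAMPLE = """<E7_Activity xmlns="http://example.org/crm#"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <P7_took_place_at>
    <E53_Place>
      <P87_is_identified_by>
        <E48_Place_Name>
          <rdf:value>VULCI</rdf:value>
        </E48_Place_Name>
      </P87_is_identified_by>
    </E53_Place>
  </P7_took_place_at>
</E7_Activity>"""


def link_places(root, gazetteer):
    """Replace anonymous E53_Place children with rdf:resource links; return the count."""
    rewritten = 0
    # Copy the matches into a list first, because the tree is modified in the loop.
    for prop in list(root.iter(f"{{{CRM}}}P7_took_place_at")):
        place = prop.find("crm:E53_Place", NS)
        if place is None:
            continue  # already a reference, nothing to do
        value = place.find("crm:P87_is_identified_by/crm:E48_Place_Name/rdf:value", NS)
        if value is None or not value.text:
            continue
        uri = gazetteer.get(value.text.strip().lower())
        if uri is None:
            continue  # unknown place: leave the provider's data untouched
        prop.remove(place)
        prop.text = None  # drop the whitespace left around the removed child
        prop.set(f"{{{RDF}}}resource", uri)
        rewritten += 1
    return rewritten


ROOT = ET.fromstring(SAMPLE)
REWRITTEN = link_places(ROOT, GAZETTEER)
```

Places that cannot be found in the gazetteer are left exactly as supplied, so nothing the provider sent is lost when a match fails.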
Most of the normalizing process is done in a single XSLT 2.0 transform of the incoming RDF/XML (which also does quality checks of the RDF), working with the Metamorphoses RDF and a lookup XML file listing common spelling mistakes. When the resulting rewritten RDF is loaded into the triple store, additional inferences are performed to make subsequent retrievals easier. This process is, of course, very open to change and refinement, and as CLAROS develops we will no doubt rewrite it all.
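The name normalization with a spelling-mistake lookup might look something like the following. Again this is a Python sketch for illustration rather than the XSLT itself; the misspelling "vulchi" and the function names are invented for the example, and the real lookup file is XML rather than a dictionary.

```python
def normalise_place_name(raw, misspellings):
    """Collapse whitespace, lower-case, then apply any known spelling correction."""
    key = " ".join(raw.split()).lower()
    return misspellings.get(key, key)


def check_names(names, misspellings, gazetteer):
    """Split incoming names into (resolved, unresolved) as a simple quality check.

    resolved maps each raw name to a gazetteer URI; unresolved lists the rest.
    """
    resolved, unresolved = {}, []
    for raw in names:
        uri = gazetteer.get(normalise_place_name(raw, misspellings))
        if uri is None:
            unresolved.append(raw)
        else:
            resolved[raw] = uri
    return resolved, unresolved


# Hypothetical lookup tables, for illustration only.
MISSPELLINGS = {"vulchi": "vulci"}
GAZETTEER = {"vulci": "http://id.clarosnet.org/places/metamorphoses/place/vulci"}

RESOLVED, UNRESOLVED = check_names(["VULCI", "  Vulchi ", "Atlantis"], MISSPELLINGS, GAZETTEER)
```

Keeping the unresolved names as a separate list gives a worklist of places still to be added to the gazetteer or corrected at source.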
Does it work? CLAROS' gazetteer currently defines about 7300 places, of which only 1442 are linked to Pleiades. But bearing in mind that CLAROS has a lot of modern place names, and a lot of places in the Middle and Far East, we are not dissatisfied with progress. Our next step will be to go gradually over places in the obvious countries (Greece, Italy, France, Germany etc.) and check them against Pleiades, with the target of complete synchronization across the Mediterranean. It will be slow work...