CLAROS currently aggregates data from 12 partners, most of whose material relates to the ancient world. The input is RDF/XML conforming to the CIDOC CRM, largely describing objects:
arachne | Arachne | 185119 objects |
ashmol | Jameel Collection, Ashmolean | 2316 objects |
beazley | Beazley Archive | 130960 objects |
bsa | British School at Athens | (pending) |
bsr | British School at Rome, photographs and plans | 16043 objects |
creswell | Creswell Photographic Archive, Ashmolean | 6521 objects |
cycladic | Cycladic Museum, Athens | 348 objects |
lgpn | Lexicon of Greek Personal Names | 251821 people |
limc | LIMC Paris | 4724 objects |
limcbasel | LIMC Basel | 55852 objects |
metamorphoses | Gazetteer | 9396 places (6325 geolocated) |
waa | World of Ancient Art | 406 places |
- c.9300 places known
- c.6200 places geolocated
- c.1500 places linked to Pleiades
- c.4330 places linked to geonames.org
The majority of the data arriving at CLAROS uses a simple place name, so the main work of our ingest procedure is to attempt to map that name to a known place (and thence to Pleiades). The procedure may be of interest:
- Does the <E53_Place> in the RDF already have a geolocation? OK
- Normalize place name. Translate space to -, lower-case, etc
- Does the name match an entry in our mapping table?
  from="academy" to="athens-academy"
  from="aegypten" to="egypt"
  from="agios-ioannis" to="athens-agios-ioannis"
  from="agli" to="aglie"
  from="agrigento" to="sicily-agrigento"
  from="aidinjik" to="edincik"
  if so, use the canonical form
- Does name of place match a known place? link to that place
- Does name of place partially match a place? create an <E53_Place> which has a <P89_falls_within> linking to the half-match. Example "athens-kerameikos"
- Does <E53_Place> have a geonames link? get lat/long from www.geonames.org
<E53_Place rdf:about="http://id.clarosnet.org/places/metamorphoses/place/astypalaia">
  <rdfs:label>[GR] Astypalaia</rdfs:label>
  <P87_is_identified_by>
    <E48_Place_Name rdf:about="http://id.clarosnet.org/places/metamorphoses/placename/astypalaia">
      <rdf:value>Astypalaia</rdf:value>
    </E48_Place_Name>
  </P87_is_identified_by>
  <P87_is_identified_by>
    <E47_Place_Spatial_Coordinates rdf:about="http://id.clarosnet.org/places/metamorphoses/place/astypalaia/coordinates">
      <claros:has_geoObject>
        <geo:Point xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#">
          <geo:lat>36.58116008943272</geo:lat>
          <geo:long>26.39066203259252</geo:long>
        </geo:Point>
      </claros:has_geoObject>
    </E47_Place_Spatial_Coordinates>
  </P87_is_identified_by>
  <skos:closeMatch rdf:resource="http://sws.geonames.org/264408/"/>
  <skos:closeMatch rdf:resource="http://pleiades.stoa.org/places/599536#this"/>
  <P89_falls_within rdf:resource="http://id.clarosnet.org/places/metamorphoses/country/GR"/>
</E53_Place>
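The name-matching steps of the ingest procedure could be sketched roughly as follows. This is an illustration only, not the CLAROS code: the `MAPPING` entries are the sample shown above, while `KNOWN_PLACES`, the function names, and the rule used for a partial match (the longest known hyphen-delimited prefix) are assumptions made for the example.

```python
import re

# Sample of the mapping table quoted above (variant name -> canonical form).
MAPPING = {
    "academy": "athens-academy",
    "aegypten": "egypt",
    "agios-ioannis": "athens-agios-ioannis",
    "agli": "aglie",
    "agrigento": "sicily-agrigento",
    "aidinjik": "edincik",
}

# Hypothetical stand-in for the gazetteer of known places.
KNOWN_PLACES = {"athens", "egypt", "edincik", "astypalaia", "athens-academy"}


def normalize(name):
    """Trim, lower-case, and turn each run of whitespace into one hyphen."""
    return re.sub(r"\s+", "-", name.strip().lower())


def resolve(name, mapping=MAPPING, known=KNOWN_PLACES):
    """Return (status, place); status is 'exact', 'partial' or 'unknown'."""
    key = normalize(name)
    key = mapping.get(key, key)  # use the canonical form if one is mapped
    if key in known:
        return ("exact", key)
    # Assumed partial-match rule: longest known hyphen-delimited prefix.
    parts = key.split("-")
    for i in range(len(parts) - 1, 0, -1):
        prefix = "-".join(parts[:i])
        if prefix in known:
            # A new place would be created that falls within this one.
            return ("partial", prefix)
    return ("unknown", key)
```

With these sample tables, `resolve("Athens Kerameikos")` reports a partial match on `athens`, which is the case where a new <E53_Place> with a <P89_falls_within> link would be created.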
Once we have the hundreds of thousands of objects and people duly linked to a place, it is easy to associate them with Pleiades, via the <skos:closeMatch> shown in the example. The data is loaded into an RDF triple store (Jena), and then we can run the following SPARQL query to generate a new set of triples containing the needed OAC annotations:
CONSTRUCT {
  ?anno a oac:Annotation ;
        dcterms:conformsTo <http://id.clarosnet.org/annotation-class/find-location> ;
        oac:hasTarget ?object ;
        oac:hasBody ?pleiades .
  ?object a oac:Target, crm:E22_Man-Made_Object ;
          rdfs:label ?label .
}
WHERE {
  ?object crm:P16i_was_used_for [
            crm:P2_has_type <http://id.clarosnet.org/vocab/Event_FindObject> ;
            crm:P7_took_place_at ?place
          ] ;
          rdfs:label ?label .
  ?place skos:closeMatch* ?pleiades .
  FILTER (regex(str(?pleiades), "pleiades")) .
  BIND (uri(concat("http://id.clarosnet.org/annotation/find-location/", sha1(str(?object)))) as ?anno) .
}
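The BIND clause mints each annotation URI from a SHA-1 hash of the object URI, so the same object always yields the same annotation identifier. A minimal Python sketch of the same idea, assuming the triple store emits the hash as lower-case hex (as in the SPARQL 1.1 examples); the function name and the object URI used below are hypothetical:

```python
import hashlib

# Base URI taken from the BIND clause of the query.
ANNOTATION_BASE = "http://id.clarosnet.org/annotation/find-location/"


def annotation_uri(object_uri):
    """Build a stable annotation URI from the SHA-1 hex digest of the object URI."""
    digest = hashlib.sha1(object_uri.encode("utf-8")).hexdigest()
    return ANNOTATION_BASE + digest


# Hypothetical object URI, for illustration only.
example = annotation_uri("http://example.org/object/1")
```

Because the identifier depends only on the object, re-running the query does not create a second annotation URI for the same object.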
The resulting triples are loaded into a new graph called "pelagios" in the triple store, and finally we are able to point the Pelagios folk at http://data.clarosnet.org/graph/pelagios, and the corresponding VoID at http://data.clarosnet.org/graph/void, and results start to appear in Pelagios clients.
So far, so good. But there remain two problems, one practical and one theoretical.
Firstly, the CLAROS collection includes 180000 objects from Arachne; but Arachne is a Pelagios contributor in its own right. This means that the existence of a gold ring from Athens will be reported twice in Pelagios. To solve this, we need to adjust the SPARQL query above to run separately against each of the partner data collections, and generate discrete sets of OAC triples. This will allow Pelagios to avoid harvesting Arachne from CLAROS, on the assumption that it is better for the data to come from the source.
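One possible way to run the query once per partner is to restrict the object pattern to a named graph for each collection. The sketch below only builds the query strings. It assumes that each partner's data sits in its own named graph and that place links are visible outside it; the `SOURCE_GRAPH` naming scheme and the function name are hypothetical, and the partner identifiers are those from the table above.

```python
from string import Template

# Partner identifiers as listed in the table above.
PARTNERS = ["arachne", "ashmol", "beazley", "bsa", "bsr", "creswell",
            "cycladic", "lgpn", "limc", "limcbasel", "metamorphoses", "waa"]

# Hypothetical naming scheme for per-partner source graphs.
SOURCE_GRAPH = "http://data.clarosnet.org/graph/source/"

# Same CONSTRUCT as above, with the object pattern limited to one graph.
QUERY = Template("""\
CONSTRUCT {
  ?anno a oac:Annotation ;
        dcterms:conformsTo <http://id.clarosnet.org/annotation-class/find-location> ;
        oac:hasTarget ?object ;
        oac:hasBody ?pleiades .
  ?object a oac:Target, crm:E22_Man-Made_Object ;
          rdfs:label ?label .
}
WHERE {
  GRAPH <$graph> {
    ?object crm:P16i_was_used_for [
              crm:P2_has_type <http://id.clarosnet.org/vocab/Event_FindObject> ;
              crm:P7_took_place_at ?place
            ] ;
            rdfs:label ?label .
  }
  ?place skos:closeMatch* ?pleiades .
  FILTER (regex(str(?pleiades), "pleiades")) .
  BIND (uri(concat("http://id.clarosnet.org/annotation/find-location/", sha1(str(?object)))) as ?anno) .
}
""")


def partner_queries(partners=PARTNERS):
    """Return one CONSTRUCT query per partner, keyed by partner identifier."""
    return {p: QUERY.substitute(graph=SOURCE_GRAPH + p) for p in partners}
```

Each result set could then be loaded into its own output graph, so that a harvester can take every set except the one derived from Arachne.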
Secondly, some of the relationships in CLAROS start to strain the notion of an annotation. When a person called Alexandros comes from a place called Athens, is it really sensible to say that the person "annotates" the place? It could equally be argued that the place annotates the person. In some ways, this does not matter so long as all the data contributors follow the same conventions, but eventually consumers will find our data sets in isolation, and find them quite confusing. Other similar projects using the same technology may make quite different choices.
The Pelagios idea of using OAC as its structure was a good one, and has let the project proceed fast and efficiently. Whether it can, or should, be maintained as the ancient world semantic web builds up is debatable.