Tuesday, 31 July 2012
Pelagios WP3 at a glance - Pt.1 widgets and user testing
Thursday, 26 July 2012
OCRE Joins Pelagios
Each of these records is described with the Numismatic Description Standard (NUDS), an XML schema influenced by other common metadata standards, such as TEI and EAD, and by the tenets of linked open data. Like other standards common to the library and archival communities, NUDS uses the W3C XLink attributes for semantic linking. The record for Augustus 1a as defined in RIC contains the obverse and reverse type descriptions, legends, and other typological attributes: material, method of manufacture, place of minting, etc. These attributes are intellectual concepts represented by web resources on nomisma.org. The denomination, quinarius, is represented by the URI http://nomisma.org/id/quinarius. The coin was minted in Emerita, represented by http://nomisma.org/id/emerita. The nomisma.org RDF model for Emerita indicates that the Pleiades place http://pleiades.stoa.org/places/256155 is a related resource, and thus the Pleiades URI is stored in OCRE's Solr index for querying. As a result, the contents of OCRE are available in the form of RDF, which is, in turn, ingested into Pelagios. More than 7,600 of the records in OCRE are associated with Pleiades places and accessible through Pelagios. Moreover, OCRE supports an Atom feed driven by Lucene queries. This Atom feed links to the HTML representation of each record, as well as to its RDF, KML, and source NUDS/XML. It is therefore possible to programmatically page through the Atom feed to harvest all of the data in OCRE.
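Paging through the feed could look roughly like the sketch below. It assumes that each page of the feed advertises its successor with a rel="next" link and that each entry's links carry a type attribute; neither detail is confirmed here, and the fetch callable is a stand-in for whatever HTTP client you prefer.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"


def parse_atom_page(xml_text):
    """Return (entries, next_url) for one page of an Atom feed.

    Each entry is a dict mapping a link's media type (or its rel,
    if no type is given) to the link's href.
    """
    root = ET.fromstring(xml_text)
    entries = []
    for entry in root.findall(ATOM + "entry"):
        links = {}
        for link in entry.findall(ATOM + "link"):
            key = link.get("type") or link.get("rel", "alternate")
            links[key] = link.get("href")
        entries.append(links)
    next_url = None
    for link in root.findall(ATOM + "link"):
        # Assumption: paging is exposed through rel="next" links.
        if link.get("rel") == "next":
            next_url = link.get("href")
    return entries, next_url


def harvest(first_url, fetch):
    """Follow "next" links until the feed is exhausted.

    fetch is any callable that maps a URL to the feed's XML text.
    """
    url, records = first_url, []
    while url:
        entries, url = parse_atom_page(fetch(url))
        records.extend(entries)
    return records
```

Passing the fetch function in as a parameter keeps the paging logic independent of the network, so it can be tried out against saved feed pages first.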
Work on OCRE is ongoing. We expect the coins of Antoninus Pius to be published in the near future. It may take several years to fully publish the coin types through Anastasius, but the current phase of the project is an important first step in bringing the study and publication of Roman imperial coins into the twenty-first century.
Wednesday, 25 July 2012
Pelagios WP1 at a Glance - Pt.2: Backstage
This post continues my report on the results of Work Package 1. (Read the first part of the series here.) In this second part, we'll take a look behind the scenes of the Pelagios API.
API Administration Area
The API administration area allows us to easily add new datasets or update existing ones manually. It also has a "statistics dashboard". In project management terms, the statistics dashboard corresponds to Deliverable 1.3, and it tells us how our API is being used.
Using a few lists and charts, we can see what terms users are searching for, which places and datasets are most popular, and what response formats (HTML, JSON, RDF) are most frequently used.
API Statistics Dashboard
Pelagios Monitor
The second essential backstage component is the Pelagios Monitor. The Monitor is separate from the API; in terms of project organization, it represents the main part of Deliverable 1.1. The Monitor's (admittedly boring) task is to periodically crawl our partners' data to check for changes.
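How the Monitor detects a change is not described in this post. One common approach, shown here purely as an illustrative sketch and not as the Monitor's actual implementation, is to compare a digest of each freshly downloaded dump with the digest stored on the previous crawl. The function names and the fetch callable are hypothetical.

```python
import hashlib


def digest(content: bytes) -> str:
    """Fingerprint a downloaded data dump."""
    return hashlib.sha256(content).hexdigest()


def changed_datasets(fetch, urls, known_digests):
    """Return the URLs whose content differs from the last crawl.

    fetch maps a URL to the raw bytes found there; known_digests maps
    URLs to the digest seen last time and is updated in place.
    """
    changed = []
    for url in urls:
        current = digest(fetch(url))
        if known_digests.get(url) != current:
            changed.append(url)
            known_digests[url] = current
    return changed
```

A scheduler would call changed_datasets at a fixed interval and trigger a re-import only for the URLs it returns.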
Needless to say, all software components we produced in WP 1 are open source (licensed under the GPL v3) and available on Github!
The API project is located at http://github.com/pelagios/pelagios-api-v2
The Monitor project is located at http://github.com/pelagios/pelagios-monitor
Rainer Simon
Austrian Institute of Technology
Friday, 20 July 2012
How does CLAROS make its annotations for Pelagios?
CLAROS currently aggregates data from 12 partners, most of whose material relates to the ancient world. The input is RDF/XML modelled on the CIDOC CRM, largely describing objects:
arachne | Arachne | 185119 objects |
ashmol | Jameel Collection, Ashmolean | 2316 objects |
beazley | Beazley Archive | 130960 objects |
bsa | British School at Athens | (pending) |
bsr | British School at Rome, photographs and plans | 16043 objects |
creswell | Creswell Photographic Archive, Ashmolean | 6521 objects |
cycladic | Cycladic Museum, Athens | 348 objects |
lgpn | Lexicon of Greek Personal Names | 251821 people |
limc | LIMC Paris | 4724 objects |
limcbasel | LIMC Basel | 55852 objects |
metamorphoses | Gazetteer | 9396 places (6325 geolocated) |
waa | World of Ancient Art | 406 places |
- c.9300 places known
- c.6200 places geolocated
- c.1500 places linked to Pleiades
- c.4330 places linked to geonames.org
The majority of the data reaching CLAROS uses a simple place name, so the main work of our ingest procedure is to attempt to map that name to a known place (and thence to Pleiades). The procedure may be of interest:
- Does the <E53_Place> in the RDF already have a geolocation? OK
- Normalize place name. Translate space to -, lower-case, etc.
- Does the name match an entry in our mapping table?
    from="academy" to="athens-academy"
    from="aegypten" to="egypt"
    from="agios-ioannis" to="athens-agios-ioannis"
    from="agli" to="aglie"
    from="agrigento" to="sicily-agrigento"
    from="aidinjik" to="edincik"
  If so, use the canonical form.
- Does name of place match a known place? Link to that place.
- Does name of place partially match a place? Create an <E53_Place> which has a <P89_falls_within> linking to the half-match. Example: "athens-kerameikos"
- Does <E53_Place> have a geonames link? Get lat/long from www.geonames.org
<E53_Place rdf:about="http://id.clarosnet.org/places/metamorphoses/place/astypalaia">
  <rdfs:label>[GR] Astypalaia</rdfs:label>
  <P87_is_identified_by>
    <E48_Place_Name rdf:about="http://id.clarosnet.org/places/metamorphoses/placename/astypalaia">
      <rdf:value>Astypalaia</rdf:value>
    </E48_Place_Name>
  </P87_is_identified_by>
  <P87_is_identified_by>
    <E47_Place_Spatial_Coordinates rdf:about="http://id.clarosnet.org/places/metamorphoses/place/astypalaia/coordinates">
      <claros:has_geoObject>
        <geo:Point xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#">
          <geo:lat>36.58116008943272</geo:lat>
          <geo:long>26.39066203259252</geo:long>
        </geo:Point>
      </claros:has_geoObject>
    </E47_Place_Spatial_Coordinates>
  </P87_is_identified_by>
  <skos:closeMatch rdf:resource="http://sws.geonames.org/264408/"/>
  <skos:closeMatch rdf:resource="http://pleiades.stoa.org/places/599536#this"/>
  <P89_falls_within rdf:resource="http://id.clarosnet.org/places/metamorphoses/country/GR"/>
</E53_Place>
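The name-matching steps of the ingest procedure can be sketched in a few lines of Python. This is a simplified illustration rather than the CLAROS code: the mapping table is only an excerpt of the entries quoted above, and the partial-match rule (a known place appearing as a leading, hyphen-separated part of the name) is an assumption made for the example.

```python
import re

# Excerpt of the mapping table quoted above (variant -> canonical form).
MAPPING = {
    "academy": "athens-academy",
    "aegypten": "egypt",
    "agrigento": "sicily-agrigento",
}


def normalize(name):
    """Lower-case the name and translate runs of whitespace to '-'."""
    return re.sub(r"\s+", "-", name.strip().lower())


def resolve(name, known_places):
    """Return (kind, place, parent) for a raw place name.

    kind is "exact", "partial" or "unknown". For a partial match,
    parent is the known place the new place falls within.
    """
    key = normalize(name)
    key = MAPPING.get(key, key)  # use the canonical form if there is one
    if key in known_places:
        return ("exact", key, None)
    # Assumed rule: try ever shorter leading parts of the name.
    parts = key.split("-")
    for i in range(len(parts) - 1, 0, -1):
        prefix = "-".join(parts[:i])
        if prefix in known_places:
            return ("partial", key, prefix)
    return ("unknown", key, None)
```

With "athens" among the known places, resolve("Athens Kerameikos", ...) reports a partial match whose parent is "athens", which corresponds to the P89_falls_within link described in the procedure.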
Once we have the hundreds of thousands of objects and people duly linked to a place, it is easy to associate them with Pleiades, via the <skos:closeMatch> shown in the example. The data is loaded into a RDF triple store (Jena), and then we can run the following SPARQL query to generate a new set of triples containing the needed OAC annotations:
CONSTRUCT {
  ?anno a oac:Annotation ;
    dcterms:conformsTo <http://id.clarosnet.org/annotation-class/find-location> ;
    oac:hasTarget ?object ;
    oac:hasBody ?pleiades .
  ?object a oac:Target, crm:E22_Man-Made_Object ;
    rdfs:label ?label .
}
WHERE {
  ?object crm:P16i_was_used_for [
      crm:P2_has_type <http://id.clarosnet.org/vocab/Event_FindObject> ;
      crm:P7_took_place_at ?place
    ] ;
    rdfs:label ?label .
  ?place skos:closeMatch* ?pleiades .
  FILTER (regex(str(?pleiades), "pleiades")) .
  BIND (uri(concat("http://id.clarosnet.org/annotation/find-location/", sha1(str(?object)))) as ?anno) .
}
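Two details of the query are easy to miss: the annotation URI is derived deterministically from the object URI through a SHA-1 hash, so re-running the query yields the same identifiers, and the FILTER is a plain substring test on the matched URI. The Python sketch below mirrors just those two expressions; the function names are made up for illustration.

```python
import hashlib

BASE = "http://id.clarosnet.org/annotation/find-location/"


def annotation_uri(object_uri):
    """Mirror uri(concat(BASE, sha1(str(?object)))) from the query."""
    return BASE + hashlib.sha1(object_uri.encode("utf-8")).hexdigest()


def is_pleiades(uri):
    """Mirror FILTER (regex(str(?pleiades), "pleiades")): a substring test."""
    return "pleiades" in uri
```

Because the identifier depends only on the object URI, an object linked to more than one Pleiades place would receive the same annotation URI for each of them.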
The resulting triples are loaded into a new graph called "pelagios" in the triple store, and finally we are able to point the Pelagios folk at http://data.clarosnet.org/graph/pelagios, and the corresponding VoID at http://data.clarosnet.org/graph/void, and results start to appear in Pelagios clients.
So far, so good. But there remain two problems, one practical and one theoretical.
Firstly, the CLAROS collection includes 180000 objects from Arachne, but Arachne is a Pelagios contributor in its own right. This means that the existence of a gold ring from Athens will be reported twice in Pelagios. To solve this, we need to adjust the SPARQL query above to run separately against each of the partner data collections and generate discrete sets of OAC triples. This will allow Pelagios to skip the Arachne data when harvesting from CLAROS, on the assumption that it is better to take it from the source.
Secondly, some of the relationships in CLAROS start to strain the notion of an annotation. When a person called Alexandros comes from a place called Athens, is it really sensible to say that the person "annotates" the place? It could equally be argued that the place annotates the person. In some ways, this does not matter so long as all the data contributors follow the same conventions, but eventually consumers will find our data sets in isolation, and find them quite confusing. Other similar projects using the same technology may make quite different choices.
The Pelagios idea of using OAC as its structure was a good one, and has let the project proceed fast and efficiently. Whether it can, or should, be maintained as the ancient world semantic web builds up is debatable.
Pelagios WP1 at a Glance - Pt.1: The API
The Pelagios API enables you to search and browse the data we are aggregating from our partners. It has a basic HTML interface, so you can click your way through the Pelagios network of places, datasets and place references. Your starting point can be either a particular place or a particular dataset. You can search for those by name, or - in case of the datasets - browse the list.
Places in the API
Olympia in the API
The API user interface provides views on the different objects in Pelagios: Places are shown with some basic metadata from their original Pleiades source entry, including labels, a description, their feature type, and so on. A table underneath the place description lists the references the Pelagios network has for that place, sorted by partner dataset; and clicking an entry will take you to the list of references for that place, in that particular dataset.
Below the list, you will find the neighbourhood cloud. This little gadget looks like one of the tag clouds often seen on blogs or social bookmarking sites, where they usually visualize the frequency of often-used words or tags in the system. This way, users can get a quick overall grasp of what the content on a particular site might be about.
The Pelagios neighbourhood cloud has a slightly different purpose: it shows neighbour places - not in a geographical sense, but in terms of how they are connected in the graph. A larger tag means that the neighbour is more strongly connected to the place than the others.
We are still fiddling with the ranking metrics, but in a nutshell, a "strong" connection (as regards our current visualization) is one that runs through datasets that are primarily concerned with those two places. For example: see how the cloud for Thermopylae will render a neighbourhood to Lacedaemonia, Sparta and (to a slightly lesser extent) Salamis; or how the Island of Sardinia sits nicely between places in Italy and North Africa.
Datasets in the API
Datasets have similar overview pages. As for places, these show some basic metadata for the dataset (title, description, license terms, etc.) and list the subsets contained in this dataset. The view also shows a small bar graph listing the five most frequently referenced places.
Machine-Access
But the main purpose of the API is, as the name implies, not the user interface - it is to serve out machine-actionable representations of our data. To this end, the API provides responses in RDF (currently in XML and Turtle serialization), as well as JSON (with support for Content Negotiation). Cross domain requests to the API (essential for supporting client-side mashups) are supported through JSONP and CORS.
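With content negotiation, a client picks the response format through the HTTP Accept header rather than through the URL. A minimal sketch using only the Python standard library follows; the media types are the conventional ones for JSON, RDF/XML and Turtle and are assumed rather than taken from the API documentation, and the URL in the usage note is a placeholder.

```python
from urllib.request import Request

# Assumed media types; check the API documentation for the exact values.
MEDIA_TYPES = {
    "json": "application/json",
    "rdfxml": "application/rdf+xml",
    "turtle": "text/turtle",
}


def build_request(url, fmt="json"):
    """Build a GET request that asks for the given representation."""
    return Request(url, headers={"Accept": MEDIA_TYPES[fmt]})
```

Passing the result to urllib.request.urlopen, for example urlopen(build_request("http://example.org/api/places/1", "turtle")), would then perform the actual call.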
In terms of functionality, the API offers everything that's in the user interface, plus a few additional features which are not (yet) found there, including
- geographical search
- "geo-footprints" for datasets, i.e. the geographical area covered by all places referenced in a dataset
- configurable pagination for everything that comes in long lists (e.g. place references in a particular dataset)
- shortest path search between two places in the graph
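To give an idea of what a shortest path search involves, here is a breadth-first search over a toy graph. The graph layout (places connected through the datasets that reference them) and all names in it are assumptions made for the example; the post does not describe how the API stores or traverses its graph.

```python
from collections import deque


def shortest_path(graph, start, goal):
    """Breadth-first search; returns the list of nodes from start to goal,
    or None if the two are not connected."""
    if start == goal:
        return [start]
    previous = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour in previous:
                continue  # already reached by a path at least as short
            previous[neighbour] = node
            if neighbour == goal:
                # Walk the chain of predecessors back to the start.
                path = [goal]
                while previous[path[-1]] is not None:
                    path.append(previous[path[-1]])
                return path[::-1]
            queue.append(neighbour)
    return None


# Toy graph: places are linked via the (hypothetical) datasets citing them.
GRAPH = {
    "Thermopylae": ["dataset-a"],
    "dataset-a": ["Thermopylae", "Sparta"],
    "Sparta": ["dataset-a", "dataset-b"],
    "dataset-b": ["Sparta", "Salamis"],
    "Salamis": ["dataset-b"],
}
```

Breadth-first search is enough here because every edge counts the same; a weighted ranking of connections would call for something like Dijkstra's algorithm instead.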
Rainer Simon,
Austrian Institute of Technology
Continue with part 2 of this post.
Thursday, 19 July 2012
MEKETRE - New Project Partner Introduction
The MEKETRE project seeks to systematically collect, research, and study the reliefs and paintings of Middle Kingdom tombs of Ancient Egypt. One of its main aims is to map and elaborate the development of the scenes and their content in comparison to the Old Kingdom. The project is funded by the Austrian Science Fund (FWF) and has a duration of three years (late 2009 until late 2012). The project's technical part features an online repository (the MEKETREpository) for easy exploration of the collected data.
Collected Data
The data in the MEKETREpository is, at the highest level, structured into tombs and fragments that contain themes, i.e. specific types of scenes that are part of the tomb decoration programme. Additional information can be attached to these themes in the form of annotations. To each tomb, theme and fragment multiple annotations can be attached that, e.g., highlight specific regions of interest. Furthermore, they connect these regions to descriptions which can be provided as free-text but also as classification terms or keywords from a controlled vocabulary. Annotations are an intuitive means to structure and organize information, for both data consumers and producers.
So far, the egyptological staff of the project has gathered an extensive amount of data, e.g.:
- >240 Objects: ~114 Tombs, ~120 Themes, ~8 Fragments
- >570 Images (3.5 GB)
- ~1900 Annotations
- ~500 Basic Terms, ~500 Classification Terms
- >1700 References to >200 Publications
Linked Data Utilization
Every item in the repository can be viewed in a web browser (cf. this item). There is also the option to download an RDF representation of the item by clicking the triple icon at the top left of the page.
The controlled vocabularies used for annotating the repository items are created with the third-party web application PoolParty. The tool supports scholars from the Egyptology domain in collaboratively building an online thesaurus following SKOS, the de-facto standard for controlled vocabularies on the web. Our thesaurus is linked from the project's homepage or can be accessed directly on the PoolParty server.
In our implementation we use a MySQL database together with Triplify to generate the RDF representation of our content. The representation aims to adopt and reuse as many existing vocabularies as possible (e.g., Dublin Core, FOAF) but also makes use of our own core vocabulary.
Future Work
As a next step we intend to extend our repository with a separate web application that supports easy contribution (e.g., image uploading, creation of annotations, suggestion of new vocabulary terms) by interested users without a scientific background. The goal is to collect even more material on Middle Kingdom artwork that can then be reviewed and amended by scholars. If the quality reaches the necessary level, the material will be integrated into the MEKETREpository.
Tuesday, 17 July 2012
Announcing the Pelagios widgets
Pelagios Place Widget
The first widget is the Pelagios Place Widget. This is an icon with the Pelagios logo that you can add to your website. When you add it you specify a particular place in the Pleiades gazetteer and the widget will then provide information about that place. Here is a screenshot of the icon, which you can see after the link to Delphi.
When the user clicks on the icon, they get information about Delphi from the Pelagios partners as well as relevant photos from Flickr:
You can see a live demonstration of the widget for Corinth here by clicking on this icon: (opens in a new tab or window). There is also a live demonstration of the widget as an overlay for Corinth here along with demonstrations for a selection of other places.
There are various options when you embed the widget. For example, you can choose the whole widget to display immediately rather than via an icon which is clicked open, or you can choose for the map to not be displayed.
Pelagios Search Widget
This consists of a search box. If you search for a place, it will show you all the matches for that place as a list and on a map. You can then click on each one to obtain data from Pelagios partners about that place.
Live demonstration of the search widget
How to embed the widgets
You can add the Pelagios widgets to your site by adding a small snippet of HTML to your page. There are full instructions for embedding the widgets here. Some sites, in particular most blogging and content management systems, restrict the HTML you can add, and in particular will not allow you to add JavaScript. For these, you can add the Pelagios Place Widget via an image and link, although the widget will then open in a new tab or window. Hopefully at some point in the future we will be able to turn the widgets into WordPress plug-ins, Google gadgets and other formats that may be able to help with this.
Feedback
We would warmly welcome any feedback on the widgets and suggestions for future work on them - please send any comments to me at j.culver@open.ac.uk. Please do also feel free to try embedding them on your sites and let us know if you have any problems or if the documentation could be improved.
More about the widgets
There is much more information about the widgets on the Pelagios Widget pages and the source code is available on Github, released under the GNU General Public License v3. They were developed here at the Institute of Educational Technology at The Open University.
Tuesday, 3 July 2012
Geographical information retrieval of historical regions
Roman provinces up to AD 117 visualised in CartoDB
To visualize places and annotations from the PELAGIOS API, we exploited its geographical search by bounding box. Being able to retrieve the places in the PELAGIOS network that fall within a bounding box takes us halfway to filtering them by an arbitrary polygon. With a GIS we could query the data by polygon directly. Unfortunately, that would require storing all the annotation data and the regions' polygons in the same database, which goes against the distributed nature of the linked data paradigm and is not feasible in general scenarios. It would also mean providing a copy of the PELAGIOS data to every interested user, who would then be forced to install GIS software and host their own polygons.
Instead, we decoupled the management of the annotation data from the geographical retrieval features, trying to minimize the amount of software to install and to reuse as much as possible the data and services already provided. For this reason we uploaded the polygons we were interested in to CartoDB, a web-enabled version of PostGIS. CartoDB allows limited free use of its web platform, and users can download the open source version and install it on their own server if and when needed. CartoDB lets developers run SQL queries over HTTP requests, which makes the system easy to integrate.
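As a rough illustration of what "SQL over HTTP" means in practice, the sketch below only builds the request URL for a query. The endpoint pattern, the format parameter, and the table and column names (provinces, name, the_geom) are assumptions for the sake of the example and should be checked against the CartoDB documentation and your own tables.

```python
from urllib.parse import urlencode


def cartodb_sql_url(account, sql, fmt="geojson"):
    """Build the URL for running one SQL query against a CartoDB account.

    Assumption: the SQL API is reachable under /api/v2/sql and accepts
    the query in the q parameter.
    """
    query = urlencode({"q": sql, "format": fmt})
    return "http://%s.cartodb.com/api/v2/sql?%s" % (account, query)


# Hypothetical table and column names.
REGION_SQL = "SELECT the_geom FROM provinces WHERE name = 'Aquitania'"
```

Fetching cartodb_sql_url("myaccount", REGION_SQL) with any HTTP client would then return the region's shape, provided the account and table exist.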
As noted earlier, once we can query by bounding box we are halfway to querying by polygon. By querying CartoDB we can retrieve the shape of a region by its name (e.g. Aquitania in the figures below). To retrieve all the PELAGIOS places contained in the Aquitania region, we first query the PELAGIOS API for the places inside the polygon's bounding box, and then filter those places by testing topological containment against the retrieved shape.
Selection of PELAGIOS places by using region's bounding box
Filtering of those resources by using the polygon topological containment
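The two-step selection described above (bounding box first, polygon containment second) can be sketched in plain Python. This is a simplified stand-in: the containment test is a standard ray-casting check on a simple polygon without holes, coordinates are treated as planar (x, y) pairs, and the place names and shapes are invented. In the real setup the bounding-box step is performed by the PELAGIOS API.

```python
def bounding_box(polygon):
    """Return (min_x, min_y, max_x, max_y) of a list of (x, y) vertices."""
    xs = [x for x, _ in polygon]
    ys = [y for _, y in polygon]
    return min(xs), min(ys), max(xs), max(ys)


def in_bbox(point, bbox):
    x, y = point
    min_x, min_y, max_x, max_y = bbox
    return min_x <= x <= max_x and min_y <= y <= max_y


def in_polygon(point, polygon):
    """Ray casting: a point is inside if a horizontal ray starting at it
    crosses the polygon boundary an odd number of times."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # the edge spans the ray's height
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def places_in_region(places, polygon):
    """Coarse bounding-box selection followed by the exact polygon test."""
    bbox = bounding_box(polygon)
    candidates = {name: p for name, p in places.items() if in_bbox(p, bbox)}
    return sorted(name for name, p in candidates.items()
                  if in_polygon(p, polygon))
```

The bounding-box pass is cheap and discards most places before the more expensive containment test runs on the remaining candidates.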
The activities involved in extracting places by polygon are represented in the diagram below and involve three actors: the service implemented by the ECS dept. in Southampton (named ECS), the PELAGIOS API, and the CartoDB instance used for this scenario.