Tuesday, 7 January 2014

The day of Pelagios: Berlin 11.12.13

Before the seasonal break of mince pies and Glühwein, the Pelagios team held a meeting in Berlin to address a range of issues relating to geospatial data aggregation and analysis. We chose Berlin because of the fortunate co-presence there of a number of different digital humanities initiatives. Our hosts were the German Archaeological Institute (DAI), represented by its ICT Director, Reinhard Förtsch, along with his researchers Philipp Gerth and Wolfgang Schmidle. Others joining us were:
The meeting presented us with the opportunity to talk first about Pelagios and its evolution. The Pelagios model of phases 1 and 2 uses annotations to facilitate linking (in our case through common references to places) rather than trying to unify different models. By enabling linking, each partner’s site also serves as a gateway to another, thereby maximizing the potential discoverability of these resources and avoiding fruitless attempts at creating individual portals that are supposed to do everything. Yet, even if we are decentralized, for linking to be facilitated we need a lightweight structure.

In Pelagios phase 3, work is concentrating on three areas. Since we are extending our model into new regions and time periods, gazetteers - essentially databases of place names - are crucial. Again our approach is to enable linking between resources rather than trying to build a super gazetteer that contains all place names over time. With the aim of aligning gazetteers, we are currently investigating interoperability: What might a gazetteer 'ecosystem' look like? Options include using popular gazetteers as a backbone, though each comes with drawbacks (the Getty Thesaurus of Geographic Names is heavily curated, minimizing community involvement, while Geonames includes extraneous information like every hotel in Berlin), and using the SKOS vocabulary's 'close match' property to enable links between gazetteers. For the meeting we brought along a first preview of our 'cross gazetteer search', which runs on top of the linkages between the datasets from Pleiades and DARE. A screenshot of the user interface to the system is shown below.

Figure 1. Cross-Gazetteer Search Preview UI
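To make the linking idea concrete, here is a minimal sketch of how a search could follow 'close match' links across gazetteers. It is an illustration only, not the code behind the preview: the URIs, the sample names and the functions build_index and search are all invented for this example. It follows direct links only, since SKOS defines 'close match' as symmetric but not transitive.

```python
from collections import defaultdict

# Invented close-match links between entries in two hypothetical gazetteers.
links = [
    ("gazetteer-a:place/1", "gazetteer-b:place/9"),
    ("gazetteer-a:place/2", "gazetteer-b:place/7"),
]

# Invented name lists per entry; a real gazetteer record carries far more.
names = {
    "gazetteer-a:place/1": ["Athenae"],
    "gazetteer-b:place/9": ["Athens"],
    "gazetteer-a:place/2": ["Roma"],
    "gazetteer-b:place/7": ["Rome"],
}


def build_index(pairs):
    """Store each link in both directions, because close match is symmetric."""
    index = defaultdict(set)
    for a, b in pairs:
        index[a].add(b)
        index[b].add(a)
    return index


def search(names, index, query):
    """Return entries whose name equals the query, plus their direct close matches."""
    wanted = query.lower()
    hits = {uri for uri, ns in names.items() if any(n.lower() == wanted for n in ns)}
    expanded = set(hits)
    for uri in hits:
        # Direct neighbours only: close match is not transitive.
        expanded |= index.get(uri, set())
    return sorted(expanded)


index = build_index(links)
results = search(names, index, "athens")
print(results)
```

A query that matches a name in only one gazetteer still surfaces the linked entry in the other, which is the point of aligning rather than merging the datasets.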

Our second task is to enable annotations to be made on primary data (both textual and visual), so that place names can be identified. Initial attempts at building a toolkit for annotating texts will be discussed in forthcoming posts on this blog. As for the challenge of annotating maps, two questions are particularly relevant: where can we get computers to do the heavy lifting? And where do humans have to come into the loop? Finally, we are also investigating ways of visualizing the resources in our network. Our heat map provides an early indication not only of the spatial spread but also the intensity of the resources.

These three areas—relating to gazetteer interoperability, annotation methods and visualization—were the subjects of discussion.

Gazetteers
The DAI started work in May to build a gazetteer of the Institute’s archaeological and bibliographical records. They have also been working with Wikidata and Wikimedia to explore how knowledge about the Roman frontier (the ‘Limes’) can be aggregated and used. One such example is an interactive timeline (seen below), showing how the border changed over time. Markus Schnöpf is currently working on a gazetteer for the Islamic world, which could help provide the basis for future Pelagios activity with Islamic texts. Meanwhile, at Stanford, Josh Ober’s team are developing a digital version of Mogens Hansen’s Polis inventory, which will not only provide a comprehensive dataset of settlements in ancient Greece, but also allow them to be searched in various ways using a simple browser plug-in map. (Watch this space for developments.) These projects join a list that includes Pleiades, the Digital Atlas of the Roman Empire, Chinese Historical GIS, and Past Place, as the key protagonists taking the first steps towards creating a gazetteer ecosystem.

Figure 2. An interactive timeline of the Roman ‘Limes’ (frontier)

Annotation methods
With Greg Crane’s Humboldt Professorship at the University of Leipzig, various new initiatives are being launched with the aim of utilizing digital resources for the study of the ancient world. One of these, the Historical Languages eLearning Project, is experimenting with e-learning strategies for teaching ancient Greek and Latin based around annotation. Pelagios could work with this team to help disambiguate names that prove too challenging for our automated workbench, or to experiment with using games to scale up annotation over larger numbers of documents. The ARIADNE project, here represented by Martin Doerr and Gerald Hiebel, is laying the foundations for inferencing over data rather than just data retrieval (which is what Pelagios focuses on). In particular, the CIDOC-CRM model adopted by ARIADNE uses a formal structure for describing concepts and relationships that, while more complex semantically, is compatible with the Pelagios annotation model; moreover, the results of Pelagios can be used as the basis for CRM-compliant data.

Visualization
Throughout the discussion, we were also concerned with visualization developments that can help in the understanding and analysis of potentially massive datasets. Dirk Wintergrün presented on GeoTemCo, a platform for visualizing spatio-temporal data. This potentially looks very powerful, and will be especially interesting once temporal content (derived from e.g. publication dates, person references and other sources) is combined with place annotations. We give one example below, since it provides a new way of looking at data that members of the Pelagios team have produced in a previous project, GAP. Figure 3 shows GAP data from Herodotus and Pausanias in GeoTemCo, enabling the analysis and comparison of geographical referencing in these different books. In particular, Marian Dörk demonstrated a wide range of exciting visualization possibilities that could answer specific research questions and also appeal to the general public.

Figure 3. A comparison of places in Herodotus and Pausanias, using GAP data in GeoTemCo

Friday, 15 November 2013

The nesting of EAGLE within Pelagios

In our previous post we introduced what EAGLE is and what it hopes to achieve. In this post we outline briefly some particularities with our data structure that demonstrate what we are bringing to Pelagios.

Most fundamentally we use the term ‘place’ as it is defined by Trismegistos Geo: this means taking ‘place’ in its broadest sense, to refer not only to towns and villages, but also to regions, districts and all kinds of micro-toponyms. All toponyms referring to a single place are listed on their individual cards, each of which has a unique TM Geo_ID number. The number itself contains no information, but creates a numerical order. If two places are identified and their cards joined, the Geo_ID number of the old card is preserved but henceforward contains only a reference to the new card.
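The way a joined card keeps its old number as a pointer can be sketched in a few lines. This is only an illustration under our own assumptions, not the actual Trismegistos data model: the Card class, the merge and resolve functions, the ID numbers and the placeholder names are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Card:
    """Hypothetical place card: one ID, the toponyms listed on it, an optional pointer."""
    geo_id: int
    names: List[str] = field(default_factory=list)
    merged_into: Optional[int] = None  # set once this card has been joined to another


def merge(cards: Dict[int, Card], old_id: int, new_id: int) -> None:
    """Join the old card into the new one; the old ID survives only as a reference."""
    old, new = cards[old_id], cards[new_id]
    new.names.extend(n for n in old.names if n not in new.names)
    old.names = []
    old.merged_into = new_id


def resolve(cards: Dict[int, Card], geo_id: int) -> Card:
    """Follow references until reaching a card that still holds the data."""
    seen = set()
    while cards[geo_id].merged_into is not None:
        if geo_id in seen:
            raise ValueError("circular reference between cards")
        seen.add(geo_id)
        geo_id = cards[geo_id].merged_into
    return cards[geo_id]


# Invented IDs and placeholder names, for illustration only.
cards = {
    101: Card(101, ["Toponym A"]),
    205: Card(205, ["Toponym B"]),
}
merge(cards, old_id=101, new_id=205)
current = resolve(cards, 101)
print(current.geo_id, current.names)
```

Because the old number is never deleted, anything that cited it before the two places were identified as one still leads to the current card.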



For example, Trismegistos Geo lists two kinds of places: ancient places attested in both literary and documentary sources, and modern places insofar as an ancient document has been found there. Sometimes no information about the ancient toponym is available at all, and the findspot of an ancient text has to be recorded under its modern name. With regard to ancient places, it is not always clear what is a real toponym and what is a common noun that refers to a geographical item (also called an appellative in linguistic studies). In this matter, Trismegistos follows the practical rule that any toponym listed in the geographical index of a publication is also listed in the geographical database. Trismegistos Geo is also adding Pleiades IDs for some locations, in order to facilitate the recognition of geographical entries in other databases. In addition, the cards store all names and variants; among them a standard name is chosen both for the ancient and the modern name. Moreover, every place is ascribed to a modern country, an ancient region and a Roman provincia, each item in a separate field. The standard name for the modern country is the one used in English, and the correspondences between each modern country or region and the ancient provinces are those in use at the Epigraphic Database Heidelberg.

Aligning the inscriptions in Trismegistos will mean that the “annotated thing” not only will represent the most up-to-date unique entry for that text but also will in turn link to multiple independent editions of the same text where they exist and indeed to all quality curated editions from the EAGLE BPN. In this way we will help minimize the possibility of duplicating records for the same place.


In the long term, we look forward to aligning both Trismegistos and Pleiades to Wikidata, in order to bring together the richness of both of these gazetteers. As we see it, establishing a network of gazetteers—one of the aims of Pelagios 3—is a highly valuable step towards harmonizing practice and making content reusable and extendable. We look forward to working with the Pelagios team to take linked ancient world data one step further in terms of data networking and interoperability, and together help facilitate research in all disciplines of the field, digital or otherwise.  

Thursday, 14 November 2013

The EAGLE flies with Pelagios

EAGLE, the Europeana network of Ancient Greek and Latin Epigraphy, is joining Pelagios. EAGLE is itself a Best-Practice Network (BPN), co-funded through the ICT-Policy Support Programme of the European Commission, and aims to create a new online archive for epigraphy in Europe. As part of Europeana’s multi-lingual online collection of millions of digitised items from European museums, libraries, archives and multi-media collections, EAGLE will link and connect, using Linked Open Data (LOD) best practice, thousands of inscriptions, photos of inscriptions and related contextual items in a single readily-searchable platform.



The project will make available the vast majority of surviving inscriptions from the Greco-Roman world, complete with the essential information about them and, for all the most important, one or more translations. By joining Pelagios, EAGLE will be able to connect with other major online projects about the Ancient World and make its data accessible to other aggregator and LOD projects to increase the quality, usability and accessibility of data provided by the BPN. For example, our partner Trismegistos (KULeuven) has gathered geographical information concerning the provenance of the inscriptions listed by the major content providers—a total of some 35,235 place records and 124,569 place attestation records.

The EAGLE BPN looks forward to the possibilities of connecting materials that have for a long time been viewed only in isolation as a result of separation and localism. Data-wise, there are four tasks towards achieving this vision:
  • To make all content available in Europeana, the largest culture and heritage aggregator in Europe (#AllezCulture)
  • To use Wikidata for our translations of inscriptions. By gathering all existing translations of inscriptions and providing an easy-to-edit online database of translations, EAGLE aims to enrich both those data that are present in Wikimedia Commons with curated content from the databases, and the database contents themselves with contributions from the wider public
  • To produce an open, interoperable format. In the EAGLE portal, data will be available in XML files compliant with EpiDoc/TEI guidelines.
  • To produce open vocabularies that align existing models used by single content providers. These will provide many other URIs which, we hope, will become a way to further connect other data on the basis of Object Type, Material, Type of inscription, to mention just some.


We at EAGLE are excited about joining Pelagios and look forward to helping online research about the ancient world take off.