Friday, 4 November 2011

Perseus and Pelagios

The Perseus geospatial data now includes annotations of ancient places with Pleiades URIs. Beginning next week, the Places widget in the Perseus interface will include links to download the Pleiades annotations in OAC-compliant RDF format. These links will appear for any text that has place entity markup and includes places from this dataset. We are also providing a link to search for the five most frequently mentioned of these places in the Pelagios graph explorer.

In addition, RDF files containing annotations for the occurrences across all texts in each collection will be available from the Perseus Open Source Downloads page.

To produce these annotations, we used the normalized and regular place names from the Pleiades+ dataset to identify likely matches with the Perseus places, and then the longitude and latitude coordinates from each source to validate and disambiguate these matches. Places which matched via this method are annotated with an "is" relationship to the Pleiades URI. For Perseus places which were not automatically mapped to a Pleiades URI via this method, we made a second pass at matching using the location coordinates, looking for Pleiades places within a certain range of the Perseus coordinates. Places which matched in this second pass are annotated with a "nearby" relationship to the Pleiades URI. These mappings are all stored with the Perseus place data in our database, and are available along with the other geospatial data for occurrences of these entities in the Perseus texts.
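
The two-pass matching described above can be sketched roughly as follows. This is only an illustration and not the code used at Perseus: the Place record, the normalize function, the identifiers, and both distance thresholds (validate_km and nearby_km) are assumptions made for the example, since the actual range used is not stated here.

```python
from dataclasses import dataclass
from math import asin, cos, radians, sin, sqrt


@dataclass(frozen=True)
class Place:
    """Hypothetical record: an identifier, a name, and coordinates."""
    uri: str
    name: str
    lat: float
    lon: float


def normalize(name):
    """Assumed normalization: lower-case and collapse whitespace."""
    return " ".join(name.lower().split())


def distance_km(a, b):
    """Great-circle (haversine) distance between two places, in km."""
    dlat = radians(b.lat - a.lat)
    dlon = radians(b.lon - a.lon)
    h = (sin(dlat / 2) ** 2
         + cos(radians(a.lat)) * cos(radians(b.lat)) * sin(dlon / 2) ** 2)
    return 2 * 6371.0 * asin(sqrt(h))


def align(perseus, pleiades, validate_km=50.0, nearby_km=10.0):
    """Map each Perseus URI to (relationship, Pleiades URI).

    Both thresholds are illustrative guesses, not the values used
    for the published annotations.
    """
    by_name = {}
    for p in pleiades:
        by_name.setdefault(normalize(p.name), []).append(p)

    result = {}
    for place in perseus:
        # Pass 1: name match, validated and disambiguated by coordinates.
        named = [c for c in by_name.get(normalize(place.name), [])
                 if distance_km(place, c) <= validate_km]
        if named:
            best = min(named, key=lambda c: distance_km(place, c))
            result[place.uri] = ("is", best.uri)
            continue
        # Pass 2: coordinates only, for places left unmatched by pass 1.
        near = [c for c in pleiades if distance_km(place, c) <= nearby_km]
        if near:
            best = min(near, key=lambda c: distance_km(place, c))
            result[place.uri] = ("nearby", best.uri)
    return result
```

Places that match in neither pass are simply absent from the result, which corresponds to the Perseus places that remain unannotated.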

Going forward, we hope to continue improving the automatic alignment of the Perseus and Pleiades+ place data, and to provide the means for manual refinement and correction of the annotations. In this initial pass, we were able to automatically annotate a little over 15% of the distinct ancient place names already identified in the Perseus texts. We would like not only to increase the percentage of matches with the Pleiades data, but also to begin to take advantage of methods for automatically identifying place entities in the many texts in the Perseus repository which do not yet have this level of curation.

