Tuesday 13 December 2011

Converting the Ure Museum data

The Ure Museum of Classical Archaeology in Reading is one of the most recent Pelagios partners. I have just started work on converting the collection data into a Pelagios-compliant format with the help of the curator, Amy Smith.

The main task involved in this is finding a way to figure out for each item in the collection whether there are any places in Pleiades associated with the item. Once we have done this, it should hopefully be straightforward to turn this data into OAC annotations for Pelagios.

You can browse the Ure Museum database online here. There are about 3000 objects in the collection. Any information about places associated with an object is generally under either the 'Fabric' or 'Provenance' listed for the object. The fabric is usually an adjective describing where the item was thought to have been made e.g. Boeotian, Etruscan, Daunian. The provenance is generally less structured. Here are some examples of the contents of this field for a selection of different objects:
  • Probably made in Cyprus (Stubbings)
  • Found on Mount Helicon with an arrowhead, 26.7.13
  • Northern Boeotia (?), provenience unknown
  • From a burial somewhere in the Argolid.
  • Thought to be from Cyprus: T.146.II. From Poli? Cf. JHS 1890.
  • Unknown, similar to Larnaca, Kamelarga finds
  • From Carthage (or other North African site)
  • Central Italian, possibly from the vicinity of Rome.
  • Cast from an original in the Acropolis Museum, Athens
  • Said by vendor to have come from between Thebes and Chalcis
I have been given all the data as an XML dump and want to write a script to match any places in Pleiades with this information from the 'Fabric' and 'Provenance' fields in the data. I also have a copy of Pleiades+ which provides toponyms from GeoNames for the places in Pleiades. You can read more about Pleiades+ here and here.

The rough approach I have taken is to go through each item in the collection and then for each item, go through all the places in Pleiades+ and see if any match with anything in the Fabric or Provenance field.
I am hopefully not far off getting all the special cases sorted out and should have this completed in the early new year.
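A minimal sketch of that matching pass, assuming the XML dump has been parsed into simple dicts. The field names (`fabric`, `provenance`, `accession`) and the gazetteer identifiers are illustrative placeholders, not the actual schema or real Pleiades URIs:

```python
def match_items(items, gazetteer):
    """For each item, collect identifiers of gazetteer toponyms that
    occur in the Fabric or Provenance text."""
    matches = {}
    for item in items:
        # Concatenate whichever of the two fields are present.
        text = " ".join(filter(None, (item.get("fabric"), item.get("provenance"))))
        found = {uri for toponym, uri in gazetteer if toponym in text}
        if found:
            matches[item["accession"]] = found
    return matches

# Placeholder identifiers, not real Pleiades URIs:
gazetteer = [("Mount Helicon", "uri-helicon"), ("Thebes", "uri-thebes")]
items = [{"accession": "26.7.13",
          "fabric": None,
          "provenance": "Found on Mount Helicon with an arrowhead"}]
print(match_items(items, gazetteer))  # {'26.7.13': {'uri-helicon'}}
```

The real script has to layer the special cases discussed below on top of this naive substring scan.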

Here are a few of the challenges and issues that I have encountered so far.

1) Uncertainty in the data

This was one of the first things that concerned me. As you can see from the examples above, there is often a large degree of uncertainty about where items are from. In addition, items may not have been found in a spot that even has a name, and may have more than one place associated with them if they have moved locations. However, as Leif reminded me, what we are providing are annotations. An object can carry multiple annotations, it is perfectly fine to annotate an object with any location it is even loosely associated with, and an annotation does not assert an object's definite origin.

2) Location-based adjectives

Most of the fabric information is given as adjectives rather than as place names, e.g. Corinthian rather than Corinth. Even in the provenance data there are still lots of adjectives. Adjectives associated with a place are outside the scope of Pleiades+, so with Amy's help I have compiled a list of how adjectives map to places. It is relatively limited because it is restricted to the adjectives used in the Ure Museum database as it stands, but would it be useful for us to share this list and allow other people to add to it in some way?
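The adjective lookup is essentially a hand-built table. A few example entries showing the shape of it (the real list, compiled with Amy's help, covers all the adjectives in the Ure Museum database):

```python
# Hand-compiled mapping from location-based adjectives to place names.
ADJECTIVE_TO_PLACE = {
    "Boeotian": "Boeotia",
    "Corinthian": "Corinth",
    "Etruscan": "Etruria",
    "Attic": "Attica",
    "Daunian": "Daunia",
}

def expand_adjectives(text):
    """Return the place names implied by any known adjectives in the text."""
    return [place for adj, place in ADJECTIVE_TO_PLACE.items() if adj in text]

print(expand_adjectives("Corinthian aryballos"))  # ['Corinth']
```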

I should point out that there are still some question marks even with this approach e.g. would you want a reference to Roman Britain to map to Rome? However, I suspect the number of controversial mappings is going to be small.

3) Disambiguating places

Sometimes there are multiple places with the same name. For example, there is more than one place called Salamis. How do we make sure that if we know we want the Salamis in Cyprus then it matches to http://pleiades.stoa.org/places/707617/ rather than, say, another place of the same name? This is where the 'connections with' information in Pleiades would help in theory. However, in practice it looks likely that we are going to have to deal with these ambiguities as special cases in the script.
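One way to script those special cases is an explicit override table, consulted before the general lookup and keyed on a context word that must also appear in the text. The Cypriot Salamis URI below is the one quoted above; the context keyword and field layout are illustrative assumptions:

```python
# Overrides for ambiguous place names: (context keyword, Pleiades URI).
DISAMBIGUATION = {
    "Salamis": [("Cyprus", "http://pleiades.stoa.org/places/707617/")],
}

def resolve(name, text, default_uri=None):
    """Prefer an override whose context keyword also appears in the text;
    otherwise fall back to whatever the general lookup suggested."""
    for keyword, uri in DISAMBIGUATION.get(name, []):
        if keyword in text:
            return uri
    return default_uri

print(resolve("Salamis", "Thought to be from Salamis, Cyprus"))
```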

4) Granularity of annotations

If we have an object from Salamis in Cyprus, do we annotate it with both Salamis and with Cyprus, or just with the more precise location, Salamis? You wouldn't necessarily expect every item from Rome to also be annotated with Italy, so using the more precise location feels sensible. On the other hand, it may not do any harm to annotate with both. And if we do have two places associated with an object, how do we tell that one is contained within the other? Pleiades has information about which places 'connect with' other places and, according to Sean Gillies of Pleiades, 'you'd almost never go wrong in Pleiades by inferring containment between a precisely located place of small extent and a much more extensive place' if you used this data. However, there is a great deal of connection information missing from Pleiades, so in practice this approach is unlikely to work well.

5) Pleiades locations enclosed in text not related to the location

If you just go through the Pleiades+ data and search for each place name in the text associated with the object, you get lots of false hits, partly because there is some slightly odd data in Pleiades such as http://pleiades.stoa.org/places/324652/. Most of these you can rule out by assuming that place names will be capitalised in the Ure Museum data and by insisting on whole-word matches. However, there are still occasional problems. For example, the Pleiades place Artemis matches 'Sanctuary of Artemis Orthia, Sparta', and you may also want to rule out the locations of museums mentioned in the text. I have been writing special cases in my script for these. I can do this because the collection isn't too large, but with a larger collection you could easily miss instances like these. I have wondered if the GeoParser used for GAP might help with dealing with this type of unstructured data.
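The two filters mentioned, capitalisation and whole-word matching, are easy to express with a regular expression. A sketch, assuming toponyms can arrive from Pleiades+ in arbitrary case:

```python
import re

def whole_word_match(toponym, text):
    """Match a toponym only as a capitalised whole word, so that e.g.
    'Rome' cannot hide inside a longer word."""
    pattern = r"\b" + re.escape(toponym.title()) + r"\b"
    return re.search(pattern, text) is not None

print(whole_word_match("rome", "possibly from the vicinity of Rome."))  # True
print(whole_word_match("rome", "a production of Romeo and Juliet"))     # False: whole words only
print(whole_word_match("Artemis", "Sanctuary of Artemis Orthia, Sparta"))  # True: still needs a special case
```

As the last line shows, these filters are necessary but not sufficient, which is why the special cases remain.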

6) Alternative toponyms not in Pleiades+

Pleiades+ doesn't claim to be comprehensive, and, again with Amy's help, I have come across a fair number of alternative toponyms that are not in Pleiades+; I have been writing these into my script as special cases too. Some of these come from the Barrington Atlas notes in Pleiades, but there are others as well. As with the adjectives, I'm wondering if there is some way of sensibly sharing the alternative toponyms we have found, so as to save other people from duplicating our work.

7) Vague geographical data

There are quite a few provenance entries which include locations like 'South Italy' or 'Greek Islands'. There is no way of specifying these that I have found in terms of Pleiades locations, so I have had to resort to annotating them just with 'Italy' or 'Greece', losing some of the information. Objects are also often described as being found in modern countries or places that don't always have a clear equivalent in Pleiades.

8) The historical scope of Pelagios

The Ure Museum contains objects from a wide range of periods. Pleiades focuses on the Greek and Roman world and Pleiades in a sense defines the scope of Pelagios. However, should I still annotate a Neolithic object for example with the larger region from which it comes even if the precise location is not in Pleiades?

9) Spelling mistakes in the data

There aren't too many of these, but I have also had to include some special cases for spelling mistakes (as well as for alternate transliterations of place names). Obviously the ideal solution is to get the spelling mistakes fixed in the database itself and then get a new download of the data, but I thought I should highlight this as a potential issue. If the data has previously only been read by humans, who unlike a computer can easily understand what is intended, it is easy for these typos to slip through.
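Rather than hand-writing every typo as a special case, candidates could be flagged semi-automatically with fuzzy matching. A sketch using the standard library's difflib (this is a possible refinement, not what the script currently does):

```python
import difflib

def suggest_corrections(word, toponyms, cutoff=0.8):
    """Return known toponyms that `word` may be a misspelling of,
    best matches first."""
    return difflib.get_close_matches(word, toponyms, n=3, cutoff=cutoff)

# A hypothetical typo checked against a few known toponyms:
print(suggest_corrections("Corinthh", ["Corinth", "Carthage", "Cyprus"]))  # ['Corinth']
```

The suggestions would still need human review, since near-matches between genuinely different toponyms are common in this domain.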

10) Dealing with updates to the data

It is likely that more data will be added to the Ure Museum database as time goes on. It would be possible to rerun my script, but there are enough special cases that it would be hard to guarantee that any new results would be comprehensive and accurate.

Next stages

Overall this is proving a really interesting exercise and a good introduction to the world of Pelagios.
Once I have finished with the special cases, the next stage will be to turn the data into OAC annotations and to arrange where the data is going to be hosted. In the meantime, I'm off for the next few weeks seeing what my one-year-old makes of Christmas!

Friday 9 December 2011

Pelagios Phase 2: Project Plan

Phase two of Pelagios looks to build on our lightweight framework, based on the concept of place (a Pleiades URI) and a standard ontology (Open Annotation Collaboration), by publishing the Pelagios Toolkit - a set of services and documentation that will assist people in annotating, discovering and visualizing references to places in open online ancient world resources.

In all, there are four Work-Packages:
§ WP1 casts the net beyond the existing partners in order to allow anyone to publish their data in a way that maximizes its discoverability. This webcrawling and indexing service will find material and - based on the Pelagios framework and semantic sitemaps - aggregate place metadata in order to create value for the holders of that data.
§ WP2 aims to explore further ways of exploiting the concept of place. The place/space-based APIs and contextualisation service will help other users and data-providers discover relevant data and do interesting things with them.
§ WP3 tackles end-user engagement: i.e. subject specialists who lack the technical coding expertise to use the data underlying what is seen on the screen. The visualization service will explore ways of allowing these users to get to grips with the data both in a single Pelagios interface and as embedded widgets hosted on each partner’s site.
§ WP4 distils the guidelines into a cookbook providing explicit recipes for producing, finding and making use of geoannotations for the community as a whole. In short, you won’t need to be a Pelagios partner to be able to join in making your data discoverable and usable.

The evolving nature of the Pelagios collective reflects the shift towards community engagement. While partners from the original Pelagios proof-of-concept project will continue to be involved, the main work for phase two of Pelagios will be carried out by: Arachne, CLAROS, DME, Fasti-online, GAP, IET (the Open University), Nomisma, Southampton, SPQR, the Ure Museum.

The outcomes, in more detail, are as follows:

D 1.1: Web Crawling and Indexing Prototype. This infrastructure component traverses resource sets on the Web (registered manually or discovered using semantic search engines like Sindice) and catalogues their place metadata. Place metadata encompasses geographical coordinates as well as Pleiades and GeoNames URIs.
D 1.2: Pelagios 2 Graph API. This deliverable is an HTTP API that allows querying of the aggregate data graph generated by the Indexing Prototype. The API will provide responses in JSON and RDF format; and possibly in additional formats (e.g. KML or GeoRSS) if the need is identified in WP3. The initial range of possible queries is based on the outcome of the Pelagios project. The exact scope and structure of the final API will be driven by the requirements identified in WP3.
D 1.3: API Statistics and Reporting Interface. This deliverable will extend the Pelagios 2 Indexing Prototype with means to extract statistics and reports on the use of the API. Data partners can use this interface to gain insight into how their data is being discovered, queried and re-used within the larger online community.

D 2.1: Place-based API. This deliverable will extend the Pelagios 2 API with queries that return resources relevant to specific places or those with mereological (part-whole) relationships.
D 2.2: Space-based API. This deliverable will extend the Pelagios 2 API with queries that permit searches based on geographic scope, e.g. within a certain geographic buffer around a given location set.
D 2.3: Contextualisation Prototype. This deliverable is a service that provides ranked, relevant materials for a certain place or particular Named Entities. Results will be enriched with additional data from sources such as GeoNames, DBpedia and Freebase.

D 3.1: Evaluation of User Needs. This deliverable will report on the results of a formal evaluation of user needs regarding data visualization. The evaluation will be conducted in conjunction with project partners, and will inform the design of a set of online visualization widgets. This deliverable will have the form of a series of blog posts.
D 3.2: Widget Suite, Alpha version. This deliverable encompasses the first (alpha) version of the visualization widgets.
D 3.3: Evaluation of Widget Design. This deliverable will report on the results of observational and participatory design studies. The studies will be conducted on the Widgets as they are continuously and iteratively being developed from alpha state to final (beta) prototype. This deliverable will have the form of a series of blog posts.
D 3.4: Widget Suite, Beta version. This deliverable encompasses the final (beta) version of the visualization widgets.

D 4: Pelagios 2 Cookbook. Content Partners will produce regular documentation on data preparation, practices, tool use, etc. in the form of blog posts. The PI, assisted by the Co-Is will distil this information into a “cookbook” which will make it easier for anyone with Ancient World content to publish their data online in conformance with the Pelagios 2 common open standards.

Saturday 3 December 2011

Welcome to Pelagios - Phase 2

Pelagios is a growing collective of ancient world projects who are linking together their data so that scholars and members of the public are able to discover all different kinds of stuff about ancient places.

Phase 1 has been the proof of concept. In this stage we have linked some core ancient world projects to each other through the concept of place (a Pleiades URI) and a baseline ontology (Open Annotation). The value of those linkages is demonstrated in the Pelagios Explorer, which allows users to discover and investigate the data from those different projects in a handy search interface.

The second phase of Pelagios is to formalize the process by which anyone can join or enjoy the fruits of the Pelagios superhighway. We will provide a ‘digital toolkit’ for anyone producing material about the ancient world (not just universities but also museums, libraries, etc.), so that their data will be more discoverable and usable. We will also be experimenting further with methods of visualizing that data so that subject specialist users and the general public can discover information about places that interest them, without having the technical expertise to do the digging themselves.

The Pelagios kick-off meeting in Greenwich: (back row) Andy Meadows (Nomisma), Sebastian Rahtz (CLAROS), Liz Fitzgerald (IET), Amy Smith (Ure Museum), Elton Barker (OU), Rainer Simon (DME), Alex Dutton (CLAROS); (front row) Leif Isaksen (Southampton), Simon Hohl & Rasmus Krempel (Arachne), Juliette Culver (IET)

The photo was taken by a plaque reading "Greenwich: still the centre of space and time"

Wednesday 30 November 2011

How do we balance supporting novice spatial users alongside experts? Or, is geospatial analysis necessarily GIS?

The task for table 5 at the #jiscGEO breakout discussion was to come up with a recommendation about how to balance the needs of newcomers to geospatial analysis with those of ‘experts’ in using Geographical Information Systems (GIS). To have greatest impact and value for the community, there was general agreement that any strategy should address the needs of the subject specialist user community. This means exploring how the technology can be made to work for the user, rather than necessarily ‘up skilling’ the user to become a technical expert (say, in GIS).

By focusing simply on technology training, there is the danger that, as well as being seen as irrelevant, too difficult or simply just boring for users (academics or students), the data gets overlooked or is made to fit a ‘system’ of analysis. For example, one problem of using GIS in humanities is the issue of ‘fuzzy’ data. This isn’t just a case of the system failing to cope with fuzziness: it also betrays an underlying assumption that data can, and should be, disambiguated and clear. For humanists, however, the questions driving research are often precisely those that look to nuance or complicate the material. We like messy results. Humanists need worry less about producing an accurate and/or truthful representation and more about how maps can be used as entry points to explore the data—this is seeing maps as part of the investigative process rather than as an end in and of themselves.

Ideally, then, users should be involved in the development, enrichment and adaptation of geospatial technologies, to make those tools work for them. We therefore recommend that JISC build on its contacts within the HE sector to send teams of subject specialist users (i.e. the successful projects) into universities, where there is already a JISC presence to help with coordination, to show the target group the kinds of geospatial technologies that can be used and to get them involved in shaping these tools for the future: the ‘show and tell’ unconference rolled out across the sector, as it were.

Elton, Nicola, Rasmus, Claire, Ryan, Addy

Sunday 27 November 2011

Growing use of OAC - an inventory (no doubt incomplete) of initiatives

As the Pelagios project’s Common Ontology for Place References (COPR) is based on the Open Annotation Collaboration ontology, JISC (in the person of David Flanders) suggested a blog post on the growing use of OAC in HE and research. It’s a bit late - blogger’s block - but here it is. Many thanks to Rob Sanderson for his useful input.

A lot of the work being done is in the humanities, where research practice is more human-centric and “annotation” - with various meanings - is a core component of research, or a fundamental “scholarly primitive”. Textual studies is a particularly active area:

Stanford University has been using OAC for work on annotating digitised mediaeval manuscripts. As these are frequently illustrated, this involves annotating structured text (maybe already marked up using TEI XML) and images within the texts. This has been taken up more widely in the SharedCanvas project, whose results are being used by various libraries and universities for annotating mediaeval manuscripts, including the British Library, the Bibliothèque nationale de France, and the Bodleian in Oxford, among others.

Emblem Books are another fruitful area for annotation. These form a genre of book, popular during the 16th and 17th centuries, containing collections of emblematic images with explanatory text, usually aiming to inspire the reader to contemplate some moral issue or other. The University of Illinois at Urbana-Champaign and the Herzog August Bibliothek Wolfenbüttel have been collaborating with the Emblematica Online project on using OAC for annotating digitised emblem books. This also involves annotating structured text and images, although in printed books rather than manuscripts.

The AustLit project, based at the University of Queensland in Australia, has been applying OAC to the development of scholarly critical editions, specifically for annotating variations between different versions of a literary work.

An analogous approach could be used with variants within a “stemma” or family of manuscripts. In fact a use case of our own may be provided by the HERA-funded SAWS project, which is looking at complex relationships between mediaeval Greek and Arabic manuscripts of “wise sayings”, so-called gnomologia. I will be looking into this further.

A little (but not entirely) beyond textual studies, OAC is also being used for annotating historical maps - the Digital Mappaemundi project at Drew University is looking at methods of dealing with mediaeval maps and related geographical texts - in fact these maps can be thought of as complex images with original annotations, so the model may fit very well. Also at Cornell, the YUMA Universal Media Annotator has been used with OAC to annotate historical map collections.

OAC has also found applications in the digital libraries and archives world (the applications are not entirely disjoint from the above):

The US National Information Standards Organization (NISO) and the Internet Archive have launched an initiative for developing standards for creating and sharing bookmarks and annotations in e-books (announced October 2011), with various publishers interested. This will take on board the work done in OAC, although the standards developed will go beyond this.

Brown University Library is developing an annotation framework for the Fedora digital repository software based on OAC, linking the annotations created directly with TEI-encoded texts in their repository, and exploring how annotations can be attached to structural and semantic elements within those documents. Brown’s Women’s Writers Project will provide one of the initial test cases.

MITH (Maryland Institute for Technology in the Humanities) have been collaborating with the Alexander Street Press on using OAC to store annotations on their streaming library of educational videos. As an example of what they intend, they have produced a working prototype that allows shapes to be drawn so as to select regions of video for annotation.

And just to show that the sciences are not being ignored here, BIONLP at the University of Colorado - who work on natural language processing of biological texts - are investigating the use of OAC with entities and relationships automatically mined from such texts, and the FP7 Wf4Ever (Workflow Forever) project is using OAC for annotating research objects.

Any more contributions to this list happily accepted!

Friday 11 November 2011

The Pelagios Graph Explorer: An information superhighway for the ancient world

Just as the settlements around the Ancient Mediterranean would seem disconnected without the sea to join them, so online ancient world resources have been separated, until now. Meaning “of the sea”, Pelagios has brought this world together using the principles of Linked Open Geodata. The Pelagios Graph Explorer allows students, researchers and the general public to discover the cities of antiquity and explore the rich interconnections between them.

The Pelagios Graph Explorer
Alice is an archaeology student from Memphis, TN. When not collecting Elvis singles, she loves nothing better than to find out about cities of the past. Recently she has come across Arachne, the database of images and finds from the German Archaeological Institute. She's interested in her hometown's namesake, Memphis, Egypt, and so she types it into the search box (fortunately it's the same word in German) and finds quite a few interesting results: 21 objects and 16 sets of photos. But what do they mean? What stories do they tell? And what role did Memphis play in the ancient world? What Alice doesn't know is that there are many other open resources out there with information about Memphis, its history and material culture, and she has no way to find out about them.

Enter the Pelagios Graph Explorer. Using the principles of Linked Open Data, the Pelagios Explorer allows people like Alice to discover those resources (including Arachne). When she types 'Memphis' into the Explorer's search box she is presented with a graph of information that shows her a range of different resources that relate to the city. Hovering the mouse over the pink circle, a balloon pops up about the Perseus Digital Library which seems to have 13 separate references to it. And clicking on a reference in the data view takes her straight there.

Now that's all well and good, but it rather begs the question: How would she find out about Pelagios in the first place? The answer is simple. As well as being a human interface, Pelagios is also an API, allowing resource curators to embed links right next to their own content. For instance, Carlo the classicist might be exploring the geographic flow of Herodotus's Histories using GapVis which has lots of handy info - a map, related sites, photos, etc. But the handy 'Pelagios Graph Explorer' link takes him straight to Pelagios and even fills in the details for him. This is the power of Linked Open Data - content providers such as Arachne can open up a world of contextual information with a single link.

There's a lot more we could tell you about Pelagios - the fact that you can use the Explorer to find relationships between multiple cities for instance, or that it's an ever-growing collective of content providers committed to the principle of openness and public access. We could also tell you about the plans we have for Pelagios2 - to refine the data, improve the search facilities, and expand the community. But we think the best way to explore it is to have a go yourself. So why not check out our user guides and dive in!

Who are Pelagios?

Pelagios is a collective of projects connected by a shared vision of a world - most eloquently described in Tom Elliott’s article ‘Digital Geography and Classics’ - in which the geography of the past is every bit as interconnected, interactive and interesting as the present. Each project represents a different perspective on Antiquity, whether map, text or archaeological record, but as a group we believe passionately that the combination of all of our contributions is enormously more valuable than the sum of its parts. We are committed to open access and a pragmatic lightweight approach that encourages and enables others to join us in putting the Ancient World online. Pelagios is just the first step in a longer journey which will require many such initiatives, but we welcome anyone who shares our vision to join us in realising it.
Members of the Pelagios Team at our February 2011 kick-off workshop. From left to right: Rainer Simon, Greg Crane, Mark Hedges, Reinhard Förtsch, Mathieu D’Aquin, Elton Barker and Sean Gillies. Missing from the photo are: Leif Isaksen, Sebastian Rahtz, Sebastian Heath, Neel Smith, Eric Kansa, Kate Byrne, Tom Elliott, Alex Dutton, Rasmus Krempel, Bridget Almas, Gabriel Bodard and Ethan Gruber

Pelagios was made possible by the following organizations:

Wednesday 9 November 2011

What Makes Spatial Special?

One of the nice aspects of being part of the jiscGEO programme is that occasionally we're thrown slightly more philosophical questions to chew on. The most recent one is simple but broad: 'What makes spatial special?' This is hardly a new topic of course, as one of our co-projects has pointed out. A lot of people have discussed the significance of the Spatial Turn and Kate Jones has done an excellent job in summarizing many of the key arguments. Rather than repeat them here I thought I'd approach them from a different angle: 'Why has Space become special and not Time?'

On the face of it the two have a great deal in common. For a start they are both values that not only underpin virtually any kind of information you can think of, but as dimensions (or a set of them) they also form a ratio scale which enables us both to order it and calculate relationships such as the closeness and density of data. As the simpler of the two (with just one dimension to deal with, rather than two or three), time seems by far the easier value for people to engage with. And yet there are no Temporal Information Systems, no Volunteered Temporal Information, no Temporal Gazetteers, no 'Temporal Turn' to speak of. So why has space, and not time, become the darling of the digital zeitgeist? Here's my theory: Because we experience space statically but time dynamically, a social asymmetry exists which makes spatial descriptions more useful socially.

Both time and space are affected by the Inverse Square Law of Relevance: as every good hack knows, a person's interest in a topic tends to fall off the further away they are from it, temporally and spatially. Of course that's not an absolute rule, but on the whole people are considerably more interested in today's home game than they are in foreign matches from yesteryear. The difference between space and time is that populations perceive themselves as being randomly dispersed throughout space, whereas time seems to be experienced simultaneously[1]. As a result, maps appear to be universally relevant because the distribution of relevance is spread across them. In contrast, almost our entire global attention is focussed on just one (travelling) moment in time. So while a map of Britain is equally relevant to people in London, Bangor and Inverness, a timeline of Britain is not equally relevant to Saxons, Normans and ourselves because the Saxons and the Normans are dead.

Enough of the beard-stroking, why should we care? It seems to me that there are two important conclusions to be drawn from this. The first is that the importance of maps is created socially and not individually. Because their relevance is determined by having multiple points of view, they can be enormously enhanced through social Web technologies, which is why Webmapping, despite having far less functionality than GIS, has rapidly outstripped it in utility. The less obvious lesson is that despite its ubiquity, spatial relevance is not spread evenly. Sparsely populated parts of the world (i.e. most of it) are not considered highly relevant by many people. By the same token, places in which mankind congregates (cities) tend to be seen as highly relevant. We see this most clearly in the number and diversity of named places they create. Whereas unoccupied spaces tend to have just a handful of big named places, densely occupied spaces have a name for every nook and cranny. That means that to create really powerful, socially relevant maps we need to start thinking about visualizing places, rather than just spaces.

And what of poor old temporal technologies? Will we ever get people to be as interested in the past as they are in the present? That's for another blog post, but if you are interested, come and join us for the NeDiMAH/JISC workshop in Greenwich on November 30th where we'll be devoting plenty of space and time to the subject.

[1] Actually, physics gives us plenty of reasons to doubt that this is the case at all, but it certainly feels that way, which is what's...er...relevant here.

Friday 4 November 2011

SPQR triples - inscriptions and papyri

The SPQR project has produced just over half a million triples, describing approximately 57,500 inscriptions and papyri. The triples were derived from the following epigraphic and papyrological datasets:

The triples can be downloaded as RDF/XML from the following links:

Perseus and Pelagios

The Perseus geospatial data now includes annotations of ancient places with Pleiades URIs. Beginning next week, the Places widget in the Perseus interface will include links to download the Pleiades annotations in OAC compliant RDF format. These links will appear for any text with place entity markup which also has places from this dataset. We are also providing a link to search on the top five most frequently mentioned of these places in the Pelagios graph explorer.

In addition, RDF files containing annotations for the occurrences across all texts in each collection will be available from the Perseus Open Source Downloads page.

To produce these annotations, we used the normalized and regular place names from the Pleiades+ dataset to identify likely matches with the Perseus places, and then the longitude and latitude coordinates from each source to validate and disambiguate these matches. Places which matched via this method are annotated with an "is" relationship to the Pleiades URI. For Perseus places which were not automatically mapped to a Pleiades URI via this method, we do a second pass at matching using the location coordinates, looking for Pleiades places within a certain range of the Perseus coordinates. Places which matched via this method are annotated with a "nearby" relationship to the Pleiades URI. These mappings are all stored with the Perseus place data in our database, and are available along with the other geospatial data for occurrences of these entities in the Perseus texts.
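The two-pass strategy above can be sketched in a few lines of Python. This is only an illustration of the approach, not the actual Perseus code: the field names, the index structure, and the 25 km proximity threshold are all assumptions (the post doesn't specify the actual distance range used).

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points in kilometres."""
    r = 6371.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))

def match_place(perseus_place, pleiades_index, max_km=25.0):
    """Two-pass match against a {normalised name: [candidates]} index.

    Pass 1: a normalised-name match, validated by coordinates -> "is".
    Pass 2: fall back to coordinate proximity alone -> "nearby".
    """
    # Pass 1: look up candidates sharing the normalised name, then use
    # the coordinates from both sources to validate the match.
    for candidate in pleiades_index.get(perseus_place["name"].lower(), []):
        if haversine_km(perseus_place["lat"], perseus_place["lon"],
                        candidate["lat"], candidate["lon"]) <= max_km:
            return candidate["uri"], "is"
    # Pass 2: no name match, so search for any Pleiades place within
    # range of the Perseus coordinates.
    for candidates in pleiades_index.values():
        for candidate in candidates:
            if haversine_km(perseus_place["lat"], perseus_place["lon"],
                            candidate["lat"], candidate["lon"]) <= max_km:
                return candidate["uri"], "nearby"
    return None, None
```

A name match that also passes the coordinate check yields the stronger "is" relationship; a coordinate-only match yields the weaker "nearby" one, mirroring the distinction described above.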

Going forward, we hope to be able to continue to work on improving the automatic alignment of the Perseus and Pleiades+ place data, as well as providing the means for manual refinement and correction of the annotations. In this initial pass, we were able to automatically annotate a little over 15% of the distinct ancient place names already identified in the Perseus texts. We would like not only to increase the percentage of matches with the Pleiades data, but also to begin to take advantage of methods for automatically identifying place entities in the many texts in the Perseus repository which do not yet have this level of curation.

Monday 31 October 2011

Adding more Nomisma Annotations to Pelagios: Direct links to Hoards

I've updated the file of annotations that link Nomisma.org and Pleiades URIs. It's available at http://nomisma.org/nomisma.org.pelagios.rdf .

As a reminder, Nomisma.org is a project establishing stable web addresses (URIs) for concepts in numismatics, which is the study of coins. We have simple definitions of mints, as in the page at http://nomisma.org/id/athens. Go there and you'll see there's not yet much information about the ancient mint of Athens itself. You will, however, see a map of hoards in which coins of Athens were recorded. That's important evidence for economic connections in the ancient Mediterranean and beyond.

The first version of the Pelagios-compliant file that Nomisma published made simple "one-to-one" links from Nomisma mint URIs to the relevant Pleiades URIs. I've now updated the file so that all hoards are linked to the Pleiades URIs of the mints of coins within them. Or at least to the Pleiades URIs that we've entered to date. As in, the following XML excerpt indicates that hoard http://nomisma.org/id/igch0039 is linked to the Pleiades URI http://pleiades.stoa.org/places/579885.

  <rdf:Description rdf:ID="igch0039">
    <rdf:type rdf:resource="http://www.openannotation.org/ns/Annotation"/>
    <oac:hasBody rdf:resource="http://pleiades.stoa.org/places/579885"/>
    <oac:hasTarget rdf:resource="http://nomisma.org/id/igch0039"/>
    <dcterms:creator rdf:resource="http://nomisma.org/"/>
    <dcterms:title>Nomisma.org annotation linking http://nomisma.org/id/igch0039 to http://pleiades.stoa.org/places/579885</dcterms:title>
  </rdf:Description>

It's important to note that the annotation doesn't make the nature of the relationship clear. As in, there's no suggestion here that the URI http://nomisma.org/id/igch0039 is itself an alternate definition of Athens. The annotation only says that http://nomisma.org/id/igch0039 makes reference to or adds to our understanding of Athens. In this case, the additional information is that coins of Athens are found in that place. But once this new file is ingested, it will be up to the user to click through from the Pelagios browser to find out why the annotation was made. That's the case for all the varied information that is making its way into the rich Pelagios ecosystem.
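To see how a consumer might read hoard-to-place links out of an annotation file like this, here is a minimal sketch using Python's standard-library XML parser. The `hoard_place_links` helper is hypothetical, and the embedded sample uses the same element shape as the excerpt above (Dublin Core properties omitted for brevity).

```python
import xml.etree.ElementTree as ET

# Clark-notation namespace prefixes used by ElementTree.
RDF = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}"
OAC = "{http://www.openannotation.org/ns/}"

# A cut-down sample annotation in the same shape as the excerpt above.
DATA = """<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:oac="http://www.openannotation.org/ns/">
  <rdf:Description rdf:ID="igch0039">
    <rdf:type rdf:resource="http://www.openannotation.org/ns/Annotation"/>
    <oac:hasBody rdf:resource="http://pleiades.stoa.org/places/579885"/>
    <oac:hasTarget rdf:resource="http://nomisma.org/id/igch0039"/>
  </rdf:Description>
</rdf:RDF>"""

def hoard_place_links(xml_text):
    """Return (hoard URI, Pleiades URI) pairs from the annotation file:
    the OAC target is the hoard, the OAC body is the Pleiades place."""
    root = ET.fromstring(xml_text)
    links = []
    for ann in root.findall(RDF + "Description"):
        body = ann.find(OAC + "hasBody")
        target = ann.find(OAC + "hasTarget")
        if body is not None and target is not None:
            links.append((target.get(RDF + "resource"),
                          body.get(RDF + "resource")))
    return links

print(hoard_place_links(DATA))
# → [('http://nomisma.org/id/igch0039', 'http://pleiades.stoa.org/places/579885')]
```

A full RDF toolkit (rdflib, for instance) would handle rdf:ID resolution and typed annotations properly; this sketch just shows that the body/target pairing is all you need to recover the hoard-to-place links.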

Tuesday 27 September 2011

Pelagios usability testing results

In my earlier posts, Evaluating Pelagios' usability and Evaluating usability: what happens in a user testing session?, I promised I'd share some preliminary results. Because my last posts were getting rather long, I'll keep this short and sweet.

I had two main questions going into the user testing: could a user discover all the functionality of Pelagios, and could they make sense of what they were shown?  In short, the answer to the first question is 'no', and the answer to the second is 'yes'.

I'm a big fan of testing the usability of a site with real users.  Test participants not only give you incredibly useful insights into your site or application, but they help you clarify your own thoughts about the design. It was exciting to see test participants realise the potential of the site - particularly the map and data views, which is a cue to make them more prominent when the site first loads - but it was clear that the graph interface needs improvements to make the full range of actions available for selecting, linking and exploring datasets more visible to the user.  The test participants also used search heavily when looking for particular resources, so this would be a key area for future work.

If you've been working on a project, user testing is wonderful and painful in equal measures.  It's definitely easier to test someone else's project, not least because it's easier to prioritise tasks from the users' point of view when you don't have to deal with the details of implementing the changes.

The overall goal of the usability testing was to produce a prioritised list of design and development tasks to improve the usability of the Pelagios visualisation for a defined target audience (non/semi-specialist adults with an interest in the ancient world), and this user testing was really successful in giving the team a clear list of future tasks.

Friday 23 September 2011

Evaluating usability: what happens in a user testing session?

In my last post I talked about the test plan for assessing the usability of the Pelagios 'graph explorer' for the project's (deep breath) 'non/semi-specialist adults with an interest in the ancient world' audience. Before I get into the details of what happens in a usability test session, I thought I'd introduce you to our design persona, Johanna.
Image credit: @ANDYwithCAMERA
Johanna is 21, and is a third year History student. She moved from her native Germany to the UK three years ago for university. Her goal is to get a First so she has more options for future academic work, perhaps in the Classics. She's slightly swotty, and is always organised and methodical, but finds that she's easily distracted by Facebook and chat when she's working on the computer. She can often be found having coffee or in the pub with friends, at her part-time job in a clothing store, or in the library (her shared house is often noisy when she's trying to work). She dislikes distractions when she's trying to study, and hates rude customers at work. She likes her bike, RomComs and catching up with friends. Her favourite brands are Facebook, MacBook, Topshop, Spiegel Online and The Body Shop. Her most important personal belongings are her laptop, her mobile phone, and photos of friends and family from Hamburg and college.
Johanna is technically competent, and prefers to learn through trial and error rather than reading manuals or instructions. But she also has limited patience and will give up on interfaces that are too difficult. Johanna is a heavy user of social networks and also uses online research databases and library catalogues.
Johanna has an assignment on inscriptions due in a month. She hates the emphasis on big battles and big men in the subject, and finds inscriptions dry, but has been told they can also convey interesting social history and cultural values. She's not convinced (and she's not sure whether she'll be able to make much of the language of the inscriptions) so she wants to find an ancient place that also has other historical material about it to make the assignment more relevant to her own interests.
To create our persona and design the test tasks, I quizzed Elton on the types of questions people ask when they find out he's a Classicist to get a sense of common (mis)perceptions and interests, and about the types of students he's encountered.

So, onto the usability tests themselves. The time and venue for each test was organised directly with the participant, with the restriction that we had to be able to get online, be in an environment where it was ok to talk aloud, and ideally we'd meet somewhere the participant would feel comfortable.

In my last post I mentioned writing and testing some set tasks for the usability test, a short semi-structured interview, and an introductory script. Once the participant had arrived, and was settled with a cup of tea or whatever, I'd introduce myself and explain how I came to be working with the project. I've included the basic introductory script below so you can get a sense of how a test session starts:
Thank you for agreeing to help us test the usability of the current interface for Pelagios.
We'll be using these tests to produce a prioritised list of design and development tasks to improve the Pelagios visualisation for people like you.
The session will take up to an hour and will start with a short interview, then your initial impressions of the site, and finally we'll go through some typical tasks on the site. I'll ask you to 'think aloud' as you use the site - a running stream of thoughts about what you're seeing and how you think it works. I might also ask you questions to clarify or explore interesting things that come up during the session.
I want you to know that you're not being tested! We're testing the interface - anything that goes wrong is almost definitely its fault, not yours! Also, I haven't been involved in the project design, so you don't need to worry about hurting my feelings - be as direct as you like about what you're seeing.
I won't be recording this, but I will be taking notes as we go, and summarising them to pass on to the project team.
You can stop for a break or questions at any time.
Do you have any questions before we begin?
The next phase of the test session was the short interview. Again, I've included the questions below:
  • Demographic data: what is your age, gender, educational level, nationality/cultural background?
  • What websites do you use regularly (on a daily/weekly basis)?
  • What's your favourite website, and why?
  • What websites do you use in your research/daily work?
  • Have you seen sites like [Guardian, Gapminder, etc] that feature interactive visualisations?
  • How would you describe your level of experience with the classics? (e.g. a lot, a little). Do you focus on any particular area?
  • What is your definition of the classics? (Geographical, chronological scope)
Once the questionnaire was over, and any questions that had arisen had been discussed, the test began. The first part of the test covered first impressions of the 'look and feel' of the site, what participants thought the site might be about, what content it would include, and what they thought the 'blobs' that form the first view of the graph visualisation represented. I was also observing the kinds of interactions participants tried with the visualisation, whether single or double mouse-clicks, dragging, right-clicking, etc, because I wanted to know how much of the functionality of the site was intuitively discoverable.

The first formal task was: "find all the resources related to Cyrene" [or a place related to their own interests]. I'd note the actions the participants took along with their comments as they 'thought aloud'. Sometimes I'd ask for more information about why they were doing certain things, or remind them to tell me about the options they were considering. I also noted the points where the participant expressed confusion or frustration, or gave up on a task, though I didn't time the tasks or record a quantitative count of errors.

After the task, I'd ask (if it hadn't already come up):
  • What do you think these resources are?
  • How do you think they relate to your actions?
  • What contextual information might you need to make sense of these resources?
These questions were based on the team's review of the site and were aimed at making sure we understood the participant's 'mental model' of the site. If there's a mismatch between the users' mental model and what your site actually does, you need to help users develop a more appropriate mental model.

The second task, "Are there links between [Place 1, Place 2]? If so, what are they and how many are there?" was more open-ended and designed to see how participants managed small result sets on the site. Again, I had questions prepared as prompts in case they hadn't already been answered during the task:
  • What do you think you're looking at here?
  • What does the screen tell you?
  • What do you think the links mean/are?
  • What do you think the movements on the screen mean?
  • How do you interpret the results?
  • How do you think they're selected?
Finally, I asked some questions aimed at giving the project some metrics to measure improvement in the usability of the site after design updates: 'would you use the site again?', 'how likely are you to recommend it to a friend?'.  The final questions were: 'what would you suggest as first priority?' and 'any final comments?'.

After running each test, I'd tidy up my notes and summarise the key points for the team so they could prioritise the next items of design or development work. Which leads me onto my next post, which will include some preliminary results...