Thursday, 25 April 2013

How Ancient History Encyclopedia linked up with Pelagios

We're back with some information on how we linked Ancient History Encyclopedia to Pelagios. I hope this will help other websites that join this excellent project in the future.

First of all, we need to explain how AHE works. The website is entirely based on tags / keywords. Each tag has one (and only one) definition associated with it, and many possible articles, illustrations, or timeline events. Articles, illustrations, and timeline events can be linked to many tags, and indeed must be for the website to work properly. An article on "Trade in Ancient Greece" would be tagged with "Greece", "Economy", "Trade", and "Colonization", and it would then be listed on the pages of all those tags.

Now the initial idea was easy: Let's link up every geographical tag of ours (cities, countries, regions) to its equivalent location in Pleiades. We've got 2,400 tags, and we expect to have many more in the future, so we didn't want to do this all by hand. Instead, we wanted something future-proof, that would notify us automatically of possible matches between tags and Pleiades locations.

Every day, a cron job imports the Pleiades database of names, with their respective location IDs and locations, and mirrors it in our database. We wrote a nifty little PHP function that converts the Pleiades data to a PHP array; feel free to use it.
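
A minimal sketch of that conversion step, written in Python rather than PHP and not the actual AHE function. It assumes the Pleiades names dump is a CSV file with `pid`, `title`, `reprLat` and `reprLong` columns; check those column names against the file you download. The sample rows are made up for illustration.

```python
import csv
import io


def parse_pleiades_names(csv_text):
    """Turn a Pleiades names CSV dump into a list of dicts, one per name."""
    places = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        lat, lon = row.get("reprLat"), row.get("reprLong")
        places.append({
            # Assumed pid format: "/places/100001" -> "100001"
            "pid": row["pid"].rsplit("/", 1)[-1],
            "name": row["title"],
            # Some names have no coordinates; keep them, with None.
            "lat": float(lat) if lat else None,
            "lon": float(lon) if lon else None,
        })
    return places


# Made-up sample rows, not real Pleiades data.
SAMPLE = """pid,title,reprLat,reprLong
/places/100001,Athenae,37.97,23.72
/places/100002,Hellas,,
"""

for place in parse_pleiades_names(SAMPLE):
    print(place)
```

A daily cron job could run a script like this on the fresh download and then replace the contents of the mirror table.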

In our editorial team's interface we have a page that automatically tries to find possible matches between Pleiades place names and tags on AHE. For links, we only look at tags that have a definition; after all, we only want to link up content that is of use to potential readers, not empty tags. Editors can then review the link suggestions and either approve or reject them. This way, we found most of the links between our datasets.
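
The matching idea can be sketched roughly as follows, in Python. This is an illustration only: the post does not say how the real script compares names, so the case-insensitive exact match, the `suggest_links` function and the dictionary keys (`name`, `definition`, `pleiades_id`, `pid`) are all assumptions.

```python
def suggest_links(tags, places):
    """Return (tag name, Pleiades ID) pairs for editors to approve or reject."""
    # Index place names case-insensitively; one name may belong to several places.
    by_name = {}
    for place in places:
        by_name.setdefault(place["name"].casefold(), []).append(place)

    suggestions = []
    for tag in tags:
        # Skip empty tags (no definition) and tags that are already linked.
        if not tag.get("definition") or tag.get("pleiades_id"):
            continue
        for place in by_name.get(tag["name"].casefold(), []):
            suggestions.append((tag["name"], place["pid"]))
    return suggestions
```

Nothing is linked automatically here: the function only produces candidates, and an editor makes the final decision.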

Suggestion from the automatic linking script

Then there is the problem of links that our automatic matching script doesn't find. For example, on AHE the tag is called "Greece", whereas on Pleiades it's known as "Hellas". Similarly, "Mediterranean" on AHE is known as "Internum Mare" on Pleiades. No script can figure that out!

For those cases, we added another feature to our tag editor form: our editorial team can search the Pleiades DB mirrored on our server for links, tag by tag. An editor could, for example, see the tag "Greece", notice that it's not linked to Pleiades, open the linking form for that tag, and manually search for "Hellas".

Tag listing for editors (2nd last column is the Pleiades link)

The search gives exactly the same type of results as the automatic linking above, with a map to help the decision-making.
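
As a rough illustration of the manual search, here is a Python sketch that does a case-insensitive substring search over the mirrored names. The `search_places` function and the `name` key are hypothetical; the real form presumably queries the database directly.

```python
def search_places(places, query):
    """Return mirrored Pleiades entries whose name contains the query."""
    needle = query.strip().casefold()
    if not needle:
        return []  # an empty search should not list the whole mirror
    return [p for p in places if needle in p["name"].casefold()]
```

An editor looking at the tag "Greece" would call this with "Hellas" and pick the right entry from the results.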

When a tag is linked, we write the Pleiades ID into a newly created field in that tag's entry in our database (hoping that Pleiades will never change its place IDs).

Now it's time to deliver all this data in a format that Pelagios can understand. We have another script that goes through all the linked tags and fetches their respective definitions, as well as all articles and illustrations that are linked to them in our database. Then we output each tag definition as Turtle/RDF in the Pelagios format, linked to a specific Pleiades ID. All articles and images associated with that tag are also output for that Pleiades ID. The final result looks like this. Notice that while each definition occurs only once (one definition per tag), articles and images can appear multiple times, because one article or image can be linked to many tags.
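
To give a feel for the export step, here is a simplified Python sketch that writes one annotation per item and tag. It is not the actual AHE script, and it does not reproduce the exact Pelagios profile: the use of Open Annotation terms (`oa:Annotation`, `oa:hasTarget`, `oa:hasBody`), the annotation URIs and the input structure are assumptions, so check the Pelagios documentation for the vocabulary it currently expects.

```python
PLEIADES = "http://pleiades.stoa.org/places/"


def turtle_escape(text):
    """Escape a string for use inside a double-quoted Turtle literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def to_turtle(base_url, linked_tags):
    """Emit one annotation per (item, tag) pair, pointing at the tag's Pleiades place."""
    lines = [
        "@prefix oa: <http://www.w3.org/ns/oa#> .",
        "@prefix dcterms: <http://purl.org/dc/terms/> .",
        "",
    ]
    count = 0
    for tag in linked_tags:
        # "items" holds the definition, articles and images for this tag.
        for item in tag["items"]:
            count += 1
            lines += [
                f"<{base_url}#annotation/{count}> a oa:Annotation ;",
                f'    dcterms:title "{turtle_escape(item["title"])}" ;',
                f"    oa:hasTarget <{item['url']}> ;",
                f"    oa:hasBody <{PLEIADES}{tag['pleiades_id']}> .",
                "",
            ]
    return "\n".join(lines)
```

An article linked to two tags ends up in two annotations, one for each Pleiades place, which matches the repetition described above.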

Personally, I find Turtle/RDF somewhat mind-boggling and not exactly easy to understand (I'm not a professional programmer), but with the excellent help of Rainer Simon, Elton Barker, and Leif Isaksen we managed to make it work and validate. Thanks a lot, guys; we couldn't have done it without you!

We then submit the generated file to Pelagios (in the next version of Pelagios it'll be imported automatically on a regular basis).

I hope that this was helpful, or at the very least interesting, to anyone who is looking to link up with Pelagios. If your site is similar to ours, do feel free to drop us a line at {editor AT}! We're always happy to help!
