Tuesday, 25 September 2012

SquinchPix’s experience converting to Pleiades-compliant names


Using Pleiades names comes down to obtaining the specific ID for each name.  Pleiades provides a look-up page that returns these IDs: http://pleiades.stoa.org/places/

On that page the user can type in the required name and retrieve the ID in the resulting URI.
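
As a rough illustration of that last step, here is a minimal Python sketch that pulls the ID out of a place URI.  The function name is our own invention for this example, and the sketch assumes the ID is simply the final, all-digit segment of the path after /places/.

```python
from urllib.parse import urlparse


def pleiades_id_from_uri(uri: str) -> str:
    """Return the trailing numeric ID of a Pleiades place URI.

    Hypothetical helper: assumes URIs of the form
    http://pleiades.stoa.org/places/<digits>
    """
    parts = urlparse(uri).path.rstrip("/").split("/")
    # Expect [..., "places", "<digits>"] at the end of the path.
    if len(parts) < 3 or parts[-2] != "places" or not parts[-1].isdigit():
        raise ValueError(f"not a Pleiades place URI: {uri!r}")
    return parts[-1]


# Example URI, used here only for illustration.
print(pleiades_id_from_uri("http://pleiades.stoa.org/places/423025"))
```

Anything that does not end in /places/ followed by digits raises a ValueError, so a mistyped URI is caught early instead of ending up in the DB.
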

SquinchPix had never actually maintained place-name information for the pictures in its DB.  The place names are embedded in the captions, of course, and in the tag or keyword tables.  But the tag ‘Rome’ is not treated any differently from the tag ‘concrete’ or any other tag.  As a result there is no easy way to pick out place-name tags in an automated fashion.  What SquinchPix has done all along is maintain a pretty accurate lat/long pair for every picture.  It’s the lat/long pairs that drive location services on SquinchPix, such as the Google map that gets generated dynamically for every image.

In order to participate in the Pelagios project SquinchPix decided to make two changes to the DB.  First, in the table which contains information for each picture (‘PI’), a field was added for an unambiguous modern name for the location of the picture.

Then came the work of actually rooting out the place names from the keyword table and associating the right place name with the right picture.  We wrote a script that looked for all the pictures that were keyworded ‘Rome’.  For those pictures the script entered ‘Rome’ in the new dedicated place-name field.  For the pictures which were NOT keyworded ‘Rome’ the script just dumped out the captions.  Then we inspected those captions looking for more place names.  Next came ‘Athens’, then ‘Mycenae’, ‘Naples’, ‘Tiryns’ and the rest.  With each new place name the script labeled that many more pictures and dumped out fewer and fewer captions.  Starting from 20,000 pictures without place names, we iterated in this way until, after two days of work, only about 300 remained.  By the end of that time each locatable picture had a specific place name associated with it.  The remainder were almost all pictures of artifacts with no secure find spot.  That remainder could probably be identified with some larger Pleiades-compliant name such as ‘Syria’ or ‘Mediterranean’, but that work is for another stage.
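
The loop described above can be sketched roughly as follows.  This is not our actual script: the function name, the dictionary keys and the in-memory data structures are all assumptions made for the sake of a self-contained example, with plain Python lists and sets standing in for the DB tables.

```python
def label_pictures(pictures, keywords_by_picture, place_names):
    """Fill in the place field from keywords; return leftover captions.

    pictures            -- list of dicts with 'id', 'caption', 'place'
    keywords_by_picture -- dict mapping picture id to a set of keywords
    place_names         -- place names recognised so far, in priority order
    (All of these structures are hypothetical stand-ins for DB tables.)
    """
    for pic in pictures:
        if pic.get("place"):
            continue  # already labeled on an earlier pass
        tags = keywords_by_picture.get(pic["id"], set())
        for name in place_names:
            if name in tags:
                pic["place"] = name
                break
    # Captions of still-unlabeled pictures, to be inspected by hand.
    return [pic["caption"] for pic in pictures if not pic.get("place")]


# Invented sample rows, for illustration only.
pictures = [
    {"id": 1, "caption": "Temple facade, Rome", "place": None},
    {"id": 2, "caption": "Wall detail, Athens", "place": None},
    {"id": 3, "caption": "Bronze figurine", "place": None},
]
keywords = {1: {"Rome", "concrete"}, 2: {"Athens"}, 3: {"bronze"}}

leftover = label_pictures(pictures, keywords, ["Rome"])
leftover = label_pictures(pictures, keywords, ["Rome", "Athens"])
print(leftover)
```

Each pass is run again with a longer list of place names, and the list of leftover captions shrinks accordingly.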

The second big change to the DB was the creation of a separate table that uses the modern place name established in the first step as an index to a set of pairs.  Each pair is simply the corresponding Pleiades-compliant name and the Pleiades ID.  This table was populated by hand, entry by entry.  On SquinchPix there are about 170 distinct and unambiguous place names, so there are that many records in this new table.  In addition to using the Pleiades look-up facility we made use of the .kml file, which we ran in Google Earth in parallel.  If we couldn’t find the place in Google Earth then we used the look-up facility.  Even though we were dealing with a much smaller number of records, this hand-population took about four days.
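
To make the arrangement concrete, here is a minimal sketch using Python’s built-in sqlite3 module and an in-memory database.  Only the table name ‘PI’ comes from our DB; the other table name, all of the column names and the sample rows are invented for this example, and the two name/ID pairs are shown purely as illustrations of what a hand-entered record looks like.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create a tiny in-memory stand-in for the two tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE PI (
            pic_id  INTEGER PRIMARY KEY,
            caption TEXT,
            place   TEXT              -- unambiguous modern place name
        );
        CREATE TABLE place_lookup (   -- hypothetical table name
            modern_name   TEXT PRIMARY KEY,
            pleiades_name TEXT NOT NULL,
            pleiades_id   TEXT NOT NULL
        );
        """
    )
    conn.executemany(
        "INSERT INTO PI VALUES (?, ?, ?)",
        [
            (1, "Temple facade, Rome", "Rome"),
            (2, "Wall detail, Athens", "Athens"),
            (3, "Bronze figurine", None),  # no secure find spot
        ],
    )
    conn.executemany(
        "INSERT INTO place_lookup VALUES (?, ?, ?)",
        [("Rome", "Roma", "423025"), ("Athens", "Athenae", "579885")],
    )
    return conn


def pleiades_pairs(conn: sqlite3.Connection):
    """Go from each picture to its Pleiades-compliant name/ID pair."""
    return conn.execute(
        """
        SELECT p.pic_id, l.pleiades_name, l.pleiades_id
        FROM PI AS p
        JOIN place_lookup AS l ON l.modern_name = p.place
        ORDER BY p.pic_id
        """
    ).fetchall()


print(pleiades_pairs(build_demo_db()))
```

Because the join runs through the modern place name, a correction to a Pleiades name or ID means editing one row in the look-up table and nothing in the picture table.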

Once that table was populated we had a secure way of going from the specific picture to its modern place name and then to the Pleiades-compliant name/ID pair.  Now we simply wrote a script that would traverse all the pictures, get the Pleiades-compliant name and ID, and use them to write out the Turtle-compliant record.  In this way (by means of the extra table, that is) we could confine the fluctuating nature of the Pleiades project to a ‘localized’ corner of the DB.  We anticipate that this table in our DB will change and will be maintained and updated on an ongoing basis.  The reason for this is that Pleiades is dynamic, and also that our ideas about specific places and names may not mesh cleanly with theirs in all instances, thus necessitating the occasional negotiation.  To their credit they are very responsive to questions and suggestions about place names.  I would urge anyone engaged in a conversion project to communicate with them whenever better ideas about place names or locations should surface.
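
For readers who have not seen Turtle, the final step might look something like the sketch below.  It is only a sketch: the ex: prefix, the property names and the picture URL are placeholders of our own choosing, not the vocabulary that Pelagios actually requires, which should be taken from the project’s own documentation.

```python
# Placeholder vocabulary: NOT the real Pelagios vocabulary.
PREFIX = "@prefix ex: <http://example.org/ns#> .\n\n"


def turtle_record(picture_url: str, pleiades_id: str) -> str:
    """Build one Turtle record linking a picture to a Pleiades place."""
    place_uri = f"http://pleiades.stoa.org/places/{pleiades_id}"
    return (
        f"<{picture_url}#annotation> a ex:Annotation ;\n"
        f"    ex:hasTarget <{picture_url}> ;\n"
        f"    ex:hasBody <{place_uri}> .\n"
    )


def turtle_document(pairs) -> str:
    """Join the records for (picture_url, pleiades_id) pairs into one file."""
    return PREFIX + "\n".join(turtle_record(url, pid) for url, pid in pairs)


# Hypothetical picture URL and an example ID, for illustration only.
print(turtle_document([("http://example.org/pictures/1", "423025")]))
```

Since the script reads the Pleiades ID from the look-up table each time it runs, regenerating the Turtle file after a change to that table is just a matter of running the script again.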
