Thursday 14 June 2012

Arachne VoID descriptions

In this blog post we describe how the VoID RDF description of the Arachne Pleiades linkage works. As a result of the Pelagios compliance work, we are introducing some mechanisms into the data structure of Arachne itself that will lead to changes in future iterations. We have therefore designed the VoID description of our Pleiades linkage to reflect that.

The VoID descriptions

The VoID dataset describes the data that have been matched to Pleiades. We have chosen void:Linkset for the general definition of the interlinkage set between Arachne and Pleiades.

The general interlinkage set (ArachnePleiadesLinkage) divides into two groups: a place matching (Arachne2Pleiades_Places) and an object matching (Arachne2Pleiades). The matching is split for two reasons. Sometime in the near future, Arachne will start using the DAI gazetteer, through which place information will be shared among the different web resources of the DAI (a gazetteer, a Web-GIS, Arachne, Zenon and so on). At that point the place component will be "outsourced" from Arachne. The other data set contains all objects that are "inferred" from the place matchings, so it relies on the internal linkage between places and Arachne objects.
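The set hierarchy described above can be sketched as a handful of triples. This is only an illustration in Python: the base URI is a hypothetical placeholder, and typing the two groups as void:Linkset is our assumption here, not a copy of the published VoID file. Only the set names and the use of void:Linkset for the general set come from the text.

```python
# Hypothetical base URI; the published Arachne VoID file may use another one.
BASE = "http://example.org/arachne/void#"

PREFIXES = {
    "void": "http://rdfs.org/ns/void#",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
}


def linkset_hierarchy():
    """Return (subject, predicate, object) triples for the set hierarchy."""
    top = BASE + "ArachnePleiadesLinkage"
    triples = [(top, "rdf:type", "void:Linkset")]
    for name in ("Arachne2Pleiades_Places", "Arachne2Pleiades"):
        child = BASE + name
        # Assumption: each group is itself modelled as a linkset.
        triples.append((child, "rdf:type", "void:Linkset"))
        triples.append((top, "void:subset", child))
    return triples


def term(value):
    """Wrap full IRIs in angle brackets; leave prefixed names untouched."""
    return f"<{value}>" if value.startswith("http") else value


def to_turtle(triples):
    """Serialise the triples as simple one-line Turtle statements."""
    lines = [f"@prefix {prefix}: <{ns}> ." for prefix, ns in PREFIXES.items()]
    lines += [f"{term(s)} {term(p)} {term(o)} ." for s, p, o in triples]
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_turtle(linkset_hierarchy()))
```

Running the script prints the prefix declarations followed by one statement per triple, which is enough to see how the general set points to its two groups via void:subset.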

These two sets have subsets that combine the results of a matching process at a specified time. We tried to include this information in the first matching, but without the VoID data set description this was a time-consuming task, since previously every single annotation carried its own creation time. This problem has been solved by attaching the creation time to the data set instead. The time at which a matching was created is now also reflected in the set hierarchy.
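As a rough sketch of this idea, the creation time can be stated once on a dated subset rather than on every annotation. The URI pattern (parent URI plus ISO date), the choice of dcterms:created and the example date are assumptions made for illustration only.

```python
from datetime import date


def dated_subset(parent_uri, run_date):
    """Describe one matching run as a subset that carries its own creation date.

    The subset URI pattern is hypothetical: the parent URI followed by the
    ISO date of the run.
    """
    subset_uri = f"{parent_uri}/{run_date.isoformat()}"
    return [
        (parent_uri, "void:subset", subset_uri),
        # Assumption: the creation time is recorded with dcterms:created.
        (subset_uri, "dcterms:created", f'"{run_date.isoformat()}"^^xsd:date'),
    ]


if __name__ == "__main__":
    # Hypothetical parent URI and an arbitrary example date.
    parent = "http://example.org/arachne/void/Arachne2Pleiades_Places"
    for triple in dated_subset(parent, date(2012, 1, 1)):
        print(triple)
```

With this layout, a later matching run simply becomes another dated subset of the same parent set, and individual matches no longer need their own timestamp.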

The split between places and other entities in Arachne has been a more complicated task because they were held in one triple space. We tried to overcome this issue by putting the entities into different .n3 files in the downloadable zip archive. This can now be achieved by using the VoID descriptions.

In short, our approach tries to address four problems:
  1. The data will grow. Neither Arachne nor Pleiades is complete at the time of the matching process. Any data that is put into Arachne or Pleiades after the matching process will not show up in it. So, from a future perspective, the matching is going to be incomplete very soon and will have to be undertaken again.
  2. The data themselves will change. For example, if a place gets a more precise coordinate, the matching results will also differ in some way. Here, a versioning of datasets represented in the URI on both sides would be a solution for an "everlasting" matching.
  3. The matching process will be enhanced, so that, for example, the results can become more accurate.
  4. Keeping old data available will be important. If you are using data for your project that is not up to date, you can still reference the information by a unique data set and a unique URI of a match. This is essential because a place can match in one run and fail to match in another (depending on Problem 1 or 2).
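
To illustrate why older matching runs need to stay referenceable, the following sketch compares two runs and reports which matches were kept, dropped or newly added. The identifiers and the representation of a match as an (Arachne ID, Pleiades ID) pair are made up for this example and do not reflect the actual Arachne data model.

```python
def compare_runs(old_matches, new_matches):
    """Compare two matching runs given as iterables of (arachne_id, pleiades_id)."""
    old, new = set(old_matches), set(new_matches)
    return {
        "kept": old & new,     # present in both runs
        "dropped": old - new,  # matched before, no longer matched
        "added": new - old,    # matched only in the later run
    }


if __name__ == "__main__":
    # Hypothetical identifiers, for illustration only.
    run_1 = [("arachne:1", "pleiades:10"), ("arachne:2", "pleiades:20")]
    run_2 = [("arachne:2", "pleiades:20"), ("arachne:3", "pleiades:30")]
    for label, pairs in compare_runs(run_1, run_2).items():
        print(label, sorted(pairs))
```

A project that cited a match from the first run can still resolve it through that run's data set, even if the match falls into the "dropped" group of a later run.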

Prof. Dr. Reinhard Förtsch and Rasmus Krempel, Arachne Database, CoDArchLab
