Tuesday, 8 March 2011

Pelagios on Big Data

Recently JISC commissioned a reporter to go to the O'Reilly Strata (Big Data) Conference in California. The report, which can be accessed here, throws up a number of interesting questions that are pertinent to our own project. We sketch out some initial responses here, but we’d welcome further comments from within the team or from any of our readers.

Given the opening questions asked about Big Data—What information is out there? How can we find it? And how can it be accessed?—it is perhaps no surprise to find that the Pelagios team is sympathetic to much of the discussion. From a brief reading of this document, three things strike us as being of particular relevance:

1. Open access. The concluding point, ‘if data is less open, it is less useful; limiting access limits value’, is one with which we are in total accord. The importance of data being open is particularly acute in an academic environment, where cutting-edge research depends fundamentally on the ability to access datasets of all different kinds. Although the argument is somewhat more geared towards a science model of research, open access could nevertheless transform the Humanities for the better, making it incumbent on researchers to show their working out: just what is the evidence for this or that interpretation, and so on. But access is not the only issue; data must also be reusable, which presents its own range of technological and intellectual challenges. Furthermore, care must be taken if we are to avoid a tyranny of openness, by virtue of which research that (for whatever reason) is less open (whatever that might mean) is passed over or fails to gain funding or prestige. Or, to put that differently, does the mere fact that something is open make it worthwhile research?

2. Infrastructure. This issue seemed to be the keynote of the blog. It is raised by (among others) Mike Olson (Cloudera), Rod Smith (IBM) and Abhishek Mehta (Tresata), and by Steve Midgley (U.S. Department of Education). How do we find and identify valuable resources? And how can they be connected up? Just such an issue has also been a concern at the European Science Foundation (http://www.esf.org/research-areas/humanities/strategic-activities/research-infrastructures-in-the-humanities.html), where the practice to be guarded against is termed ‘data silos’. But, given the importance of ‘infrastructure to store and share data within sectors’, how can it be achieved? Various ‘top-down’ approaches (whether commercial or educational) have thus far fallen flat, either because of insufficient user uptake or because the data, tools, etc. could not be reused. Accordingly, Pelagios supports a ‘bottom-up’ Linked Data approach, by which a variety of user groups work together to connect their resources in an open and transparent way, which others can assess and to which they may add their own. We certainly support the idea that resources should be shared wherever possible and that research (particularly in the Humanities) needs to move beyond a narrow competitive mentality.

3. Users. This is something touched upon in our previous response and that is implied in much of the blog—but whose fundamental importance is not really addressed. Focus seems to be rather on the producers and how they can provide context for the data or an expert ability to disseminate knowledge and understanding. But what about the user-end of the communication (something which Humanities teaches us to pay attention to)? Who are the users? Why do they use some Big Data and not others? And how can they be empowered to bring together for themselves different kinds of information? This is the critical challenge facing anyone working in the digital medium. In a commercial context it is clear that there is increasing demand for ‘Data Scientists’ capable of making sense of it all. Arguably that has always been the role of academics, yet few of us have the requisite technical skills and domain knowledge demanded by these new resources. Pelagios is a consortium project for this very reason and we suspect that multidisciplinary research groups will, almost of necessity, be the organizational structure best suited to exploiting academic Big Data.

Pelagios Project Plan Part 5: Project Team Relationships and End User Engagement

Pelagios is made up of an international consortium of projects leading digital research on the ancient world, the more established of which already have large user communities: in short, these groups are the end users of the Pelagios project. Each partner has at least one prominent member on the Pelagios working group to represent them. So, welcome to the team:

Elton Barker (GAP, The Open University) Principal Investigator
Elton is the non-techie on the project: he’s a lecturer in Classical Studies at the OU with a specialism in all things Greek (Homer, tragedy, historiography...). But he’s slowly slipping into the murky world of Digital Humanities, having been Principal Investigator of HESTIA - a project investigating spatial concepts in Herodotus - and now GAP. (He’s also assisting in the promotion and understanding of DH at the OU.) Elton is in overall charge of all things Pelagios-related, such as overseeing progress of the work packages and making sure that the team delivers on what it promised: in other words, it's his neck on the JISC chopping-block if things go pear-shaped - not that he's worried. He’s also responsible for GAP’s contribution to project documentation (meaning that he’ll try to put the techie talk into plain English).

Leif Isaksen (GAP, Southampton) Co-Investigator
Leif, a strange hybrid - part philosopher, part classicist, part archaeologist, and total computer geek - is a Research Fellow at the Archaeological Computing Research Group (Southampton). Having been the technical consultant on HESTIA, Leif has moved on to be Co-I for both GAP and Pelagios. Leif is the go-to man for all the technical components of the project, working in conjunction with Development Partners to ensure infrastructural outputs are delivered. He’s also responsible for GAP’s data contribution to the project (i.e. the techie stuff).

Tom Elliott, Sean Gillies and Sebastian Heath (Institute for the Study of the Ancient World) Development Partners
Tom is managing editor of Pleiades, an innovative ancient world online atlas, which was established with a grant from the U.S. National Endowment for the Humanities and which he has since developed as Associate Director for Digital Programs in the Institute for the Study of the Ancient World at New York University.
Sean Gillies is Pleiades’ chief engineer and representative in open source web and GIS initiatives. Together, he and Tom will provide support and expertise in mapping ancient place references to the uniform resource identifiers (so-called URIs) that Pleiades assigns to each location. As part of this task they are helping to develop the core ontology by which different projects can point their place reference data to the Pleiades URIs.

Sebastian Heath is a ceramics specialist with expertise in Linked Data and is also responsible for nomisma.org.

Mathieu d’Aquin (LUCERO, The Open University) Dev Partner
Mathieu is a researcher at the OU’s Knowledge Media Institute and project director of the JISC-funded LUCERO project, which is exploring the means of integrating Linked Data practices for research and education in higher-educational organizations. As our ontology guru, Mathieu is looking to inject some Linked Data goodness into our project: he’s responsible for implementing the core semantic infrastructure, and will assist all the partners in its delivery and documentation.

Rainer Simon (DME, Austrian Institute of Technology) Dev Partner
Rainer is presently involved in the EuropeanaConnect project, where he leads research and development activities concerned with semantic media annotation, including the development of demonstrators for annotating and linking digitised maps and geospatial content. Rainer, then, has the coolest job of us all - to come up with funky ways of experimenting with and visualizing what Pelagios can do for you. He’s also responsible for DME’s data and documentation contributions to the project.

Greg Crane (Perseus, Tufts) Data + Doc Partner

Greg is Editor-in-Chief of the Perseus project, a trail-blazer in the world of Digital Humanities for its programme of digitizing classical texts - it currently hosts the world’s largest classical online library - with a particular focus on organising the data to meet user needs. Among supporters of Perseus are: the Annenberg/CPB Projects; the Digital Library Initiative Phase 2; the National Endowment for the Humanities and the National Science Foundation; the Institute for Museum and Library Services; the Mellon Foundation; and Google. Overseeing this vast digitised Classical empire, Greg is responsible for Perseus’ data and documentation contributions to the project.

Reinhard Foertsch (Arachne, Cologne) Data + Doc Partner
Reinhard is director of the CoDArchLab (computing archaeology) at Cologne University and leads the Arachne project, the central object-database of the German Archaeological Institute (DAI), which has over 6,000 registered users and approximately 981,000 scanned images and documentation for roughly 270,000 sites accessible free of charge and, helpfully, in English. Arachne is a partner of CLAROSnet and has recently won, like GAP and Perseus, a Google Digital Humanities Research Award. Reinhard is responsible for Arachne’s data and documentation contributions to the project.

Mark Hedges (SPQR, KCL) Data + Doc Partner
Mark is Deputy Director of the Centre for e-Research at KCL and Director of SPQR, which is not a project about the famous phrase describing the constituent parts of the Roman Republic (Senatus Populusque Romanus) but rather one that adopts Semantic Web and Linked Data approaches to allow researchers to formalise resources and the links between them more flexibly. Mark is responsible for SPQR’s data and documentation contributions to the project.

Neel Smith (Ptolemy, Holy Cross) Data + Doc Partner
Neel is Associate Professor of Classics at the College of the Holy Cross (Worcester, Mass.), an architect of the Homer Multitext project, a principal designer of the Center for Hellenic Studies' CITE architecture, and is editing a geographically-aware digital edition of Ptolemy's Geography. He has been involved in digital scholarship since the early 1980s and still uses ed for quick edits.

Sebastian Rahtz (CLAROS, Oxford) Data + Doc Partner
Sebastian is the head of the Information and Support Group of Oxford University Computing Services. In this capacity he provides the technical support and development for CLAROS. He also directs the Lexicon of Greek Personal Names and works on and for the Text Encoding Initiative.

Other key project members are:
Andreas Geissler and Karin Hoehne (Arachne), Tobias Blanke and Gabriel Bodard (SPQR), Kate Byrne and Eric Kansa (GAP), Alex Dutton (CLAROS)

End-User Engagement
Pelagios has received feedback from a range of national and international organisations and initiatives including Google, EDINA, ADS, STELLAR, EnAKTing, UKLP, GeoNames, GeoDia, the European Commission, Open Context, Orbis, Regnum Francorum Online, Digital Atlas of Roman and Medieval Civilizations (Harvard) and Open Encyclopaedia of Classical Sites (Princeton). Many of these groups will be participating in the Pelagios workshop at KCL on 24 March. This event is open to ALL. Anyone wishing to attend should contact Elton at e.t.e.barker@open.ac.uk.

In addition to this blog, we are also archiving all group email discussions under pelagios-project@googlegroups.com. At the end of the project (October 2011) we will wrap up with a Web-launch of the ontology, tools and services to maximise impact.

Pelagios has consulted closely with JISC in finalizing the plan, budget and website and will also interact with, and solicit feedback from, the JISC Geospatial Working Group.

Tuesday, 1 March 2011

Pelagios Project Plan Part 4: IPR (Licensing for Content, Source Code and Data)

Update 8 July 2011: 

GAP now makes its place reference metadata available under a CC ZERO license:

-----------------------

Pelagios will make its outputs publicly available under the following open licenses:
* The ontology will be made publicly available under a CC BY 1.0 license (http://creativecommons.org/licenses/by/1.0/) and hosted by the Pleiades Project
* Web services, tools and code will be made available under the European Union Public Licence - EUPL v.1.1 (http://ec.europa.eu/idabc/eupl) by DME
* Perseus’ contribution to Pelagios is entirely self-funded, so its IPR remains its own and is released under CC BY-SA (http://creativecommons.org/licenses/by-sa/3.0/us/); Perseus grants the right for its contributions in Work Packages 1, 2 and 3 to be published (non-exclusively) as part of the Pelagios project output
* GAP will publish the place reference metadata that it generates from Google Books (not including Google’s source metadata) under a CC BY license (http://creativecommons.org/licenses/by/1.0/)
* Arachne publishes its datasets under a CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/3.0/de/)
* DME will publish sample COPR RDF under a CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/3.0/de/) based on public domain sources of rasterized maps (e.g. the Library of Congress Collection)
* SPQR publishes its datasets under (various) Creative Commons licenses (see under http://tinyurl.com/LDClassics for some examples)
Process documentation will emphasise Open Source software packages where possible.