Publishing the Mapping Manuscript Migrations Data

At the heart of the Mapping Manuscript Migrations (MMM) project are its data: more than 20 million RDF triples describing the relationships between 222,000 manuscripts, 435,000 works, 937,000 events, 56,000 actors, and 5,000 places. One of our main goals was to make the data available for reuse by other researchers, preferably through several different channels. We also wanted to document the data publication process publicly, explain the nature and context of the data, and suggest some possible future uses of the data.

The MMM data can be explored through the MMM Portal, which offers a combination of browsing, searching, filtering, and visualizing across the entire dataset. The results of searches in the Portal can be downloaded as CSV spreadsheets via the Yasgui interface, though the size and content of these files are determined by the underlying SPARQL queries built into the MMM Portal. The data can also be inspected and browsed through the Linked Data Finland service. Finally, MMM provides a SPARQL endpoint for querying the data directly.
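As an illustration of querying a SPARQL endpoint directly, the following is a minimal Python sketch that uses only the standard library and the standard SPARQL 1.1 Protocol (a GET request with a `query` parameter). The endpoint URL is a placeholder and not the project's actual address, and the query is a generic one that makes no assumptions about the MMM schema; the names `ENDPOINT`, `build_request`, and `run_query` are introduced here purely for illustration.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumption: placeholder URL. Replace it with the endpoint address
# documented by the project before running a real query.
ENDPOINT = "https://example.org/mmm/sparql"

# A generic query that returns ten arbitrary triples. It does not rely
# on any particular class or property from the MMM schema.
QUERY = """
SELECT ?s ?p ?o
WHERE { ?s ?p ?o }
LIMIT 10
"""


def build_request(endpoint, query):
    """Build a SPARQL 1.1 Protocol GET request that asks for JSON results."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


def run_query(endpoint, query):
    """Send the query and return the list of result bindings."""
    with urlopen(build_request(endpoint, query)) as response:
        return json.load(response)["results"]["bindings"]


if __name__ == "__main__":
    # Each binding maps a variable name to a dict with "type" and "value".
    for row in run_query(ENDPOINT, QUERY):
        print(row["s"]["value"], row["p"]["value"], row["o"]["value"])
```

Unlike the CSV downloads from the Portal, a query sent this way is written by the researcher, so the shape and size of the results are not fixed by the Portal's built-in queries.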

The main venue for data publication is the Zenodo repository, where the full MMM knowledge graph (packaged into four Turtle files) can be downloaded under a CC-BY-NC licence. The package also includes the MMM schema (together with the CIDOC-CRM and FRBR_OO ontologies) and a VoID description.

To document and explain the MMM data, we turned to the Journal of Open Humanities Data (JOHD). Its short data paper format (1,000 words, peer-reviewed) was ideal for our purpose, and complemented the more traditional research articles we were already publishing. Our Zenodo deposit already met an important prerequisite for JOHD publication: deposit in an appropriate public repository, with an assigned DOI. The Open Access framework of JOHD, under a Creative Commons Attribution (CC-BY) licence, also met one of our key criteria for dissemination.

We noted that an Article Publication Charge would apply. While this could be funded by the project, JOHD also offered a waiver programme when an author could prove that they did not have access to such funds.

The highly structured nature of JOHD’s short paper format was very helpful in guiding our presentation of the contextual information about the MMM dataset. Our article explained the process through which the data were aggregated and transformed, the nature of the files deposited in the Zenodo repository, and the online services through which the data were also available. We then discussed the ways in which the data had already been reused, and the purposes for which the data might be used in the future.

The resulting short article is an important component in our approach to presenting and explaining the work of the MMM Project. The process of submitting this article helped us to think about the significance and value of the MMM data in themselves, and how they might be useful to other researchers. This reflected one of the important discussions we had during the MMM Project: what were the respective values of the different products and outputs from the project?

For humanities researchers, it can be difficult to see the data as having their own value, in addition to the value of the research based on the data. For digital humanities specialists, it can be difficult to see the data separately from the Web portals and interfaces we produce. Our data publication process was an important way of learning that datasets constructed to enable and support research are important in their own right.

Author: Toby Burrows, Oxford e-Research Centre, University of Oxford

Toby Burrows is a member of the Journal of Open Humanities Data editorial board.
