Blog

Publishing the Mapping Manuscript Migrations Data

At the heart of the Mapping Manuscript Migrations (MMM) project are its data: more than 20 million RDF triples describing the relationships between 222,000 manuscripts, 435,000 works, 937,000 events, 56,000 actors, and 5,000 places. One of our main goals was to make the data available for reuse by other researchers, preferably through several different channels. We also wanted to document the data publication process publicly, explain the nature and context of the data, and suggest some possible future uses of the data.

The MMM data can be explored through the MMM Portal, which offers a combination of browsing, searching, filtering, and visualizing across the entire dataset. The results of searches in the Portal can be downloaded as CSV spreadsheets via the Yasgui interface, though the size and content of these files are determined by the underlying SPARQL queries built into the MMM Portal. The data can also be inspected and browsed through the Linked Data Finland service, and MMM provides a SPARQL endpoint for querying the data directly.
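For readers who want to try the SPARQL endpoint programmatically, here is a minimal sketch using only the Python standard library. The endpoint URL (http://ldf.fi/mmm/sparql) is taken from this blog; the query itself is a generic triple count, not an MMM-specific query, and the commented-out call at the end requires network access.

```python
# Sketch: querying the MMM SPARQL endpoint over HTTP with the Python
# standard library. The query just counts triples; it makes no assumptions
# about the MMM schema.
import urllib.parse
import urllib.request

ENDPOINT = "http://ldf.fi/mmm/sparql"

QUERY = "SELECT (COUNT(*) AS ?triples) WHERE { ?s ?p ?o }"

def build_request(endpoint: str, query: str) -> urllib.request.Request:
    """Build a SPARQL-over-HTTP GET request asking for JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )

req = build_request(ENDPOINT, QUERY)

# Uncomment to actually run the query (requires network access):
# with urllib.request.urlopen(req) as resp:
#     print(resp.read().decode())
```

The same request shape works against any standards-compliant SPARQL endpoint; only the endpoint URL and query string change.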

The main venue for data publication is the Zenodo repository, where the full MMM knowledge graph (packaged into four Turtle files) can be downloaded under a CC-BY-NC licence. The package also includes the MMM schema (together with the CIDOC-CRM and FRBROO ontologies) and a VoID description.

To document and explain the MMM data, we turned to the Journal of Open Humanities Data (JOHD). Its short data paper format (1,000 words, peer reviewed) was ideal for our purpose, and complemented the more traditional research articles we were already publishing. Our Zenodo deposit already met an important prerequisite for JOHD publication: deposit in an appropriate public repository, with an assigned DOI. JOHD's Open Access framework, under a Creative Commons Attribution (CC-BY) licence, also met one of our key criteria for dissemination.

We noted that an Article Publication Charge would apply. While this could be funded by the project, JOHD also offered a waiver programme for authors who could demonstrate that they did not have access to such funds.

The highly structured nature of JOHD’s short paper format was very helpful in guiding our presentation of the contextual information about the MMM dataset. Our article explained the process through which the data were aggregated and transformed, the nature of the files deposited in the Zenodo repository, and the online services through which the data were also available. We then discussed the ways in which the data had already been reused, and the purposes for which the data might be used in the future.

The resulting short article is an important component in our approach to presenting and explaining the work of the MMM Project. The process of submitting this article helped us to think about the significance and value of the MMM data in themselves, and how they might be useful to other researchers. This reflected one of the important discussions we had during the MMM Project: what were the respective values of the different products and outputs from the project?

For humanities researchers, it can be difficult to see the data as having their own value, in addition to the value of the research based on the data. For digital humanities specialists, it can be difficult to see the data separately from the Web portals and interfaces we produce. Our data publication process was an important way of learning that datasets constructed to enable and support research are important in their own right.

Author: Toby Burrows, Oxford e-Research Centre, University of Oxford

Toby Burrows is a member of the Journal of Open Humanities Data editorial board.

Exploring research questions through browsing: ResearchSpace for MMM

This report, prepared by Graham Klyne and David Lewis of the Oxford e-Research Centre, relates to an investigation into the use of the ResearchSpace software to explore the MMM data.

This document describes an experiment in using ResearchSpace to address the research questions devised for the Mapping Manuscript Migrations (MMM) project against the project's data. It also surveys the questions to assess how well this approach would work, and briefly considers using worksets: named groupings of entities. The initial findings are encouraging, but further work will be needed to validate them in a real research context.

http://blog.mappingmanuscriptmigrations.org/wp-content/uploads/2020/12/Exploring-research-questions-through-browsing_-ResearchSpace-for-MMM.pdf

Recording Provenance in TEI-Encoded Manuscript Catalogues for More Effective Data Sharing

The effective reuse of provenance-related information from catalogues of medieval and Renaissance manuscripts depends, ultimately, on the approach taken by the manuscript cataloguer towards recording provenance. There are two possible approaches:

  • Recording the physical evidence found in the manuscript itself, usually in the form of a series of notes or narrative statements about the manuscript’s history;
  • Assembling a structured list of successive stages in the ownership of the manuscript, usually in chronological order, together with information about the evidence for each stage.

This short paper is the result of the MMM Project’s work in transforming TEI-XML documents from Medieval Manuscripts in Oxford Libraries into RDF triples which could be mapped to the MMM unified data model.

It assesses the results of this process, and examines ways of structuring TEI-encoded descriptions in order to improve the effectiveness of their mapping to event-centric data models.
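To give a flavour of what such a transformation involves, here is a minimal sketch (not the MMM pipeline itself) of turning a TEI provenance note into event-style RDF triples, using only the Python standard library. The TEI fragment, the minted event URI, and the CIDOC-CRM-style predicate names are illustrative assumptions, not the MMM data model.

```python
# Minimal sketch: extracting owner names from a TEI <provenance> element
# and emitting (subject, predicate, object) tuples. The TEI fragment and
# CRM-style predicates are illustrative, not the MMM mapping.
import xml.etree.ElementTree as ET

TEI = """<msDesc xmlns="http://www.tei-c.org/ns/1.0" xml:id="MS_Douce_18">
  <history>
    <provenance><persName>Sir Thomas Phillipps</persName>, 1824.</provenance>
  </history>
</msDesc>"""

NS = {"tei": "http://www.tei-c.org/ns/1.0"}
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

def provenance_triples(tei_xml: str) -> list:
    """Emit one custody-transfer event per <provenance> element."""
    ms = ET.fromstring(tei_xml)
    ms_id = ms.get(XML_ID)
    triples = []
    for i, prov in enumerate(ms.iterfind(".//tei:provenance", NS)):
        event = f"{ms_id}/event/{i}"  # minted event URI (assumption)
        triples.append((event, "crm:P30_transferred_custody_of", ms_id))
        for actor in prov.iterfind("tei:persName", NS):
            triples.append((event, "crm:P29_custody_received_by", actor.text))
    return triples
```

The hard part in practice, as the paper discusses, is that free-text provenance notes rarely carry this much structure: the event-centric triples can only be as good as the markup they are derived from.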

http://blog.mappingmanuscriptmigrations.org/wp-content/uploads/2020/12/Encoding_provenance_in_TEI_rev.pdf

You can read more about the MMM Project’s work with the Oxford TEI-XML data in this forthcoming paper: Burrows, Toby, Athanasios Velios, Matthew Holford, David Lewis, Andrew Morrison, and Kevin Page, ‘Transforming TEI Manuscript Descriptions into RDF Graphs,’ Scholarly Digital Editions, Graph Data-Models and Semantic Web Technologies (GraphSDE) 2019 proceedings, forthcoming 2020

MMM White Paper

The White Paper prepared by the Mapping Manuscript Migrations project as its final report to the Digging into Data Challenge can now be downloaded from the project’s page on the Digging into Data site:

https://diggingintodata.org/awards/2016/project/mapping-manuscript-migrations-digging-data-history-and-provenance-pre-modern

A copy is also available here:

http://blog.mappingmanuscriptmigrations.org/wp-content/uploads/2020/08/mmm.digging.project.white_.paper__0.pdf

Exporting and Re-using Data from the MMM Portal

Author: Toby Burrows

My goal is to export data relating to manuscripts formerly owned by Sir Thomas Phillipps (1792-1872) from the MMM Portal, and then to import these data into a separate nodegoat database of Phillipps manuscripts. This has involved the following steps.

In the MMM Portal:

  • Filter for Thomas Phillipps as an owner: result = 8,750 records
  • Open these results, together with the underlying SPARQL query, in the Yasgui SPARQL service, from which they can be exported as a CSV spreadsheet

In Yasgui:

  • Edit the SPARQL query from MMM:
    • Remove unwanted elements – chiefly the IDs and some provenance events; keep the labels
    • Add two missing fields to the query: Phillipps number and number of miniatures
    • Remove the 10-manuscript limit in the query
  • Re-run the SPARQL query (21 variables)
    • Result = 149,777 rows and 21 columns (81.9 seconds) – an average of 17 rows per manuscript
  • Download the spreadsheet

In Google Sheets: (OpenRefine could also be used here)

  • Upload and open the CSV file
  • Fix UTF-8 character display problems
  • With the Power Tools add-on, use Merge and Combine to combine the rows relating to a single manuscript:
    • Use a semicolon delimiter to merge values
    • Remove the extra semicolons at the beginning and end of merged values, left behind where cells were empty
  • Result = 8,750 rows / manuscripts – 6,882 of them with Phillipps numbers
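The Merge and Combine step above can also be done in plain Python rather than a spreadsheet add-on. The sketch below collapses rows sharing a manuscript URI into one row, joining distinct values with semicolons and skipping empty cells (so no stray semicolons need removing afterwards). The column names and sample values are invented for illustration.

```python
# Sketch: merging multiple CSV rows per manuscript into a single row,
# semicolon-delimited, using only the standard library. Column names and
# sample data are hypothetical.
import csv
import io

RAW = """manuscript,label,owner
http://ldf.fi/mmm/ms/1,Bestiary,Thomas Phillipps
http://ldf.fi/mmm/ms/1,Bestiary,Richard Heber
http://ldf.fi/mmm/ms/2,Psalter,Thomas Phillipps
"""

def merge_rows(csv_text: str, key: str = "manuscript") -> list:
    """Collapse rows per key column; join differing values with '; '."""
    merged = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        bucket = merged.setdefault(row[key], {k: [] for k in row})
        for k, v in row.items():
            if v and v not in bucket[k]:  # skip empties: no stray semicolons
                bucket[k].append(v)
    return [{k: "; ".join(vs) for k, vs in b.items()} for b in merged.values()]
```

OpenRefine's "join multivalued cells" operation does the same job; the point is simply that the reduction from ~17 rows per manuscript to one is a mechanical group-and-join.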

In nodegoat:

  • Upload the amended CSV file
  • Create Import Templates for each section of the import process
  • Import the objects (manuscripts) and descriptive fields to nodegoat
  • Import production and transfer events as sub-objects – use the MMM URI as the field for matching with objects / manuscripts

I will document the nodegoat process in more detail on my Phillipps project blog.

Another new MMM publication

Burrows, T, Emery, D, Fraas, M, Hyvönen, E, Ikkala, E, Koho, M, Lewis, D, Morrison, A, Page, K, Ransom, L, Thomson, E, Tuominen, J, Velios, A and Wijsman, H 2020 “Mapping Manuscript Migrations Knowledge Graph: Data for Tracing the History and Provenance of Medieval and Renaissance Manuscripts,” Journal of Open Humanities Data, 6: 3. DOI: https://doi.org/10.5334/johd.14

MMM data – new upload

The MMM aggregated dataset was updated on May 12th, 2020. This involved a new upload of data from the source datasets, followed by their transformation and mapping to the MMM data model. The updated data can be browsed and searched through the MMM portal and the MMM SPARQL endpoint:

http://mappingmanuscriptmigrations.org/

http://ldf.fi/mmm/sparql