Exporting and Re-using Data from the MMM Portal

Author: Toby Burrows

My goal is to export data relating to manuscripts formerly owned by Sir Thomas Phillipps (1792-1872) from the MMM Portal, and then to import these data into a separate nodegoat database of Phillipps manuscripts. This has involved the following steps.

In the MMM Portal:

  • Filter for Thomas Phillipps as an owner: result = 8,750 records
  • Export these results as a CSV spreadsheet into the Yasgui SPARQL service, with the accompanying SPARQL query

In Yasgui:

  • Edit the SPARQL query from MMM:
    • Remove unwanted elements – chiefly the IDs and some provenance events; keep the labels
    • Add two missing elements to the query: Phillipps number and number of miniatures
    • Remove the 10-manuscript limit in the query
  • Re-run the SPARQL query (21 variables)
    • Result = 149,777 rows and 21 columns (81.9 seconds) – an average of 17 rows per manuscript
  • Download the spreadsheet
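
Removing the 10-manuscript limit can also be scripted outside Yasgui. The Python sketch below works on a hypothetical, heavily simplified query. The `ex:` prefix and property names are placeholders, because the real MMM export query is much longer and uses the portal's own vocabulary. It deletes every `LIMIT n` clause with a regular expression. This is a naive approach that would also match the word LIMIT inside a string literal or IRI, so the edited query should be checked before it is re-run.

```python
import re

# Hypothetical, heavily simplified query: the real MMM export query is much
# longer and uses the portal's own prefixes and properties.
QUERY = """PREFIX ex: <http://example.org/>
SELECT ?manuscript ?label ?phillipps_number WHERE {
  {
    SELECT DISTINCT ?manuscript WHERE {
      ?manuscript ex:formerOwner ex:thomas_phillipps .
    }
    LIMIT 10
  }
  ?manuscript ex:label ?label .
  OPTIONAL { ?manuscript ex:phillippsNumber ?phillipps_number . }
}"""


def remove_limit(query: str) -> str:
    """Delete every 'LIMIT <n>' clause (case-insensitive), with the whitespace before it."""
    return re.sub(r"\s*\bLIMIT\s+\d+", "", query, flags=re.IGNORECASE)


unlimited = remove_limit(QUERY)
print(unlimited)
```

Here the limit sits in a subquery that selects the manuscripts, so removing it lets the outer query return rows for every matching manuscript.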

In Google Sheets (OpenRefine could also be used here):

  • Upload and open the CSV file
  • Fix UTF-8 character display problems
  • With the Power Tools add-on, use Merge and Combine to combine the rows relating to a single manuscript:
    • Use a semicolon delimiter to merge values
    • Remove the extra semicolons left at the beginning and end of merged values where a cell was empty
  • Result = 8,750 rows / manuscripts – 6,882 of them with Phillipps numbers
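
The Merge and Combine step amounts to grouping rows by manuscript and joining each column's values with a semicolon. The following Python sketch illustrates that idea under stated assumptions and does not reproduce the Power Tools algorithm. The column names (`manuscript`, `label`, `owner`, `phillipps_number`) and the sample rows are hypothetical. The sketch also drops duplicate values within a cell, which may not match the settings used in Google Sheets. Because empty cells are skipped before joining, no leading or trailing semicolons are produced.

```python
import csv
import io


def merge_rows(rows, key="manuscript", delimiter=";"):
    """Collapse rows that share the same key value into a single row.

    For every other column, distinct non-empty values are kept in the order
    first seen and joined with the delimiter. Empty cells are skipped, so no
    stray semicolons appear at the start or end of a merged value.
    """
    merged = {}  # key value -> {column name: list of distinct values}
    for row in rows:
        columns = merged.setdefault(row[key], {})
        for column, value in row.items():
            if column == key:
                continue
            values = columns.setdefault(column, [])
            value = (value or "").strip()  # treat missing cells as empty
            if value and value not in values:
                values.append(value)
    # dicts keep insertion order (Python 3.7+), so manuscripts stay in input order
    return [
        {key: uri, **{col: delimiter.join(vals) for col, vals in columns.items()}}
        for uri, columns in merged.items()
    ]


# Hypothetical sample: two rows for the same manuscript, as a SPARQL export produces
SAMPLE = """manuscript,label,owner,phillipps_number
http://example.org/ms/1,MS A,Thomas Phillipps,1234
http://example.org/ms/1,MS A,Another Owner,
http://example.org/ms/2,MS B,Thomas Phillipps,
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
merged = merge_rows(rows)
for row in merged:
    print(row)
```

On the real export, `sum(1 for r in merged if r["phillipps_number"])` would count the manuscripts that have a Phillipps number. One caveat: if a value already contains a semicolon, the merged cell can no longer be split unambiguously.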

In nodegoat:

  • Upload the amended CSV file
  • Create Import Templates for each section of the import process
  • Import the objects (manuscripts) and descriptive fields to nodegoat
  • Import production and transfer events as sub-objects – use the MMM URI as the field for matching with objects / manuscripts
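
In nodegoat the matching is configured through the import templates, so no code is needed for the import itself. Conceptually, though, using the MMM URI as the matching field is a join on that URI. The Python sketch below shows that join and is not nodegoat's own implementation. It uses hypothetical column names (`mmm_uri`, `event`) and placeholder URIs, and it collects event rows whose URI matches no manuscript, which could serve as a quick check on the CSV before importing.

```python
from collections import defaultdict


def attach_events(manuscripts, events, key="mmm_uri"):
    """Group event rows under the manuscript with the same MMM URI.

    Returns (matched, unmatched): matched maps each manuscript URI to its
    events; unmatched lists events whose URI matches no manuscript.
    """
    known = {m[key] for m in manuscripts}
    matched = defaultdict(list)
    unmatched = []
    for event in events:
        if event[key] in known:
            matched[event[key]].append(event)
        else:
            unmatched.append(event)
    return dict(matched), unmatched


# Hypothetical rows: URIs and event types are placeholders, not real MMM data
manuscripts = [{"mmm_uri": "http://example.org/ms/1", "label": "MS A"}]
events = [
    {"mmm_uri": "http://example.org/ms/1", "event": "production"},
    {"mmm_uri": "http://example.org/ms/1", "event": "transfer"},
    {"mmm_uri": "http://example.org/ms/9", "event": "transfer"},
]

matched, unmatched = attach_events(manuscripts, events)
print(len(matched["http://example.org/ms/1"]), "events matched;", len(unmatched), "unmatched")
```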

I will document the nodegoat process in more detail on my Phillipps project blog.
