Author: Toby Burrows
My goal is to export data relating to manuscripts formerly owned by Sir Thomas Phillipps (1792-1872) from the MMM Portal, and then to import these data into a separate nodegoat database of Phillipps manuscripts. This has involved the following steps.
In the MMM Portal:
- Filter for Thomas Phillipps as an owner: result = 8,750 records
- Export these results as a CSV spreadsheet into the Yasgui SPARQL service, with the accompanying SPARQL query
In Yasgui:
- Edit the SPARQL query from MMM:
- Remove unwanted elements – chiefly the IDs and some provenance events; keep the labels
- Add two missing fields to the query: Phillipps number and number of miniatures
- Remove the 10-manuscript limit in the query
- Re-run the SPARQL query (21 variables)
- Result = 149,777 rows and 21 columns (81.9 seconds) – an average of 17 rows per manuscript
- Download the spreadsheet
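The re-run and download steps above were done through the Yasgui interface. For anyone who prefers to script them, the following is a minimal sketch under stated assumptions, not the method used here. It assumes the endpoint follows the standard SPARQL 1.1 Protocol and can return CSV. The endpoint URL is a placeholder, and the function names `build_csv_request` and `download_csv` are hypothetical.

```python
import urllib.parse
import urllib.request

# Placeholder endpoint (an assumption): substitute the endpoint shown in Yasgui.
ENDPOINT = "https://example.org/sparql"


def build_csv_request(endpoint: str, query: str) -> urllib.request.Request:
    """Build a SPARQL 1.1 Protocol POST request that asks for CSV results."""
    body = urllib.parse.urlencode({"query": query}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={
            "Accept": "text/csv",
            "Content-Type": "application/x-www-form-urlencoded",
        },
        method="POST",
    )


def download_csv(endpoint: str, query: str, path: str) -> None:
    """Send the query and write the raw CSV response to a local file."""
    request = build_csv_request(endpoint, query)
    with urllib.request.urlopen(request) as response, open(path, "wb") as out:
        out.write(response.read())
```

Calling `download_csv(ENDPOINT, edited_query, "phillipps.csv")` with the edited query text would save the results locally. A query that runs for over a minute may also need a longer timeout than the endpoint or client allows by default.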
In Google Sheets (OpenRefine could also be used here):
- Upload and open the CSV file
- Fix UTF-8 character display problems
- With the Power Tools add-on, use Merge and Combine to combine the rows relating to a single manuscript:
- Use a semicolon delimiter to merge values
- Remove stray semicolons at the beginning and end of merged values, left behind where there was empty content
- Result = 8,750 rows / manuscripts – 6,882 of them with Phillipps numbers
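The row-merging step can also be done outside a spreadsheet. The sketch below is one possible scripted equivalent, not the Power Tools procedure itself. It groups rows by a key column and joins the values with semicolons. Empty values are skipped, so no stray leading or trailing semicolons appear. Dropping repeated values is an assumption of this sketch. The column names (`manuscript`, `label`, `owner`) and the sample data are hypothetical.

```python
import csv
import io


def merge_rows(rows, key, delimiter=";"):
    """Collapse rows that share the same key into one row per key.

    Distinct non-empty values in each column are joined with the delimiter,
    in first-seen order. Empty cells are skipped, so merged values never
    start or end with a delimiter.
    """
    merged = {}
    for row in rows:
        target = merged.setdefault(row[key], {})
        for column, value in row.items():
            values = target.setdefault(column, [])
            value = (value or "").strip()
            # Assumption: repeated values are kept only once.
            if value and value not in values:
                values.append(value)
    return [
        {column: delimiter.join(values) for column, values in columns.items()}
        for columns in merged.values()
    ]


# Hypothetical sample: three result rows for one manuscript, one for another.
SAMPLE = """manuscript,label,owner
http://example.org/ms/1,MS A,Thomas Phillipps
http://example.org/ms/1,MS A,
http://example.org/ms/1,MS A,Another Owner
http://example.org/ms/2,MS B,Thomas Phillipps
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
merged = merge_rows(rows, key="manuscript")
```

For a real export, the rows would come from `csv.DictReader(open(path, newline="", encoding="utf-8"))`. Reading the file explicitly as UTF-8 may also avoid some of the character display problems mentioned above.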
In nodegoat:
- Upload the amended CSV file
- Create Import Templates for each section of the import process
- Import the objects (manuscripts) and descriptive fields to nodegoat
- Import production and transfer events as sub-objects – use the MMM URI as the field for matching with objects / manuscripts
I will document the nodegoat process in more detail on my Phillipps project blog.