Script

This tutorial by Noah Wakeland teaches how to use OpenRefine for cleaning and reconciling metadata in digital collections. It covers importing datasets, identifying inconsistencies, clustering values, reconciling with Wikidata, and exporting cleaned data. The goal is to enhance the quality and accessibility of metadata for libraries, archives, and museums.

Uploaded by

wakelnoa000
Copyright
© All Rights Reserved

Script

LIS 557
Prof Rucker
Noah Wakeland
April 17, 2025

[Intro Slide – Title]

“How to Use OpenRefine to Clean and Reconcile Metadata for Digital Collections”

Narration:

Hello, my name is Noah Wakeland, and welcome to this tutorial on using OpenRefine to
clean and reconcile metadata for digital collections. This screencast is designed for
students and professionals in libraries, archives, and museums, especially those
working with metadata cleanup and digital curation. By the end of this session, you
should know how to import data into OpenRefine, identify inconsistencies, cluster and
standardize values, and reconcile your data with external authorities like Wikidata.

Let’s get started!

[Step 1: Launching OpenRefine]

Narration:

First, open OpenRefine. If you haven’t installed it yet, you can download it from
openrefine.org. OpenRefine runs in your browser but operates locally on your machine.
Once it’s open, click on “Create Project” to start.

[Step 2: Importing a Dataset]

Narration:

Let’s import a sample metadata file. For this demo, I’ll be using a CSV file with metadata
from a small digital image collection. Click “Choose Files,” select your dataset, and hit
“Next.” You’ll now see a preview. Make sure your columns look correct—especially the
headers—and then click “Create Project.”
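
If you’d like to sanity-check a file before importing it, the same preview idea can be sketched in a few lines of Python. This is only an illustration, not part of OpenRefine: the sample rows and the column names (“Title,” “Creator,” “Subject,” “Date”) are assumptions standing in for the demo file.

```python
import csv
import io

# Hypothetical sample standing in for the demo CSV; column names are assumptions.
SAMPLE_CSV = """Title,Creator,Subject,Date
Main Street at dusk,"Smith, John",Streets,1921
Harbor view,J. Smith,Harbors,1923
"""


def preview(csv_text, rows=2):
    """Return the header names and the first few rows, like an import preview."""
    reader = csv.DictReader(io.StringIO(csv_text))
    header = reader.fieldnames  # reading this consumes the header line
    first_rows = [row for _, row in zip(range(rows), reader)]
    return header, first_rows


header, first_rows = preview(SAMPLE_CSV)
print(header)
for row in first_rows:
    print(row)
```

If the header list shows data values instead of column names, the file probably has no header row or uses a different delimiter.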

[Step 3: Exploring the Data]

Narration:

Here’s our dataset inside OpenRefine. Let’s focus on the “Creator” and “Subject” fields.
As you can see, there are inconsistencies—some creators are listed as “Smith, John,”
others as “J. Smith,” or even “John Smith.” This kind of variation can make searching
and aggregation difficult.
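
To make the problem concrete, here is a minimal Python sketch that tallies each distinct spelling, roughly what a text facet on the column shows you. The list of creator values is made up for illustration.

```python
from collections import Counter

# Hypothetical "Creator" values; in practice these would come from the dataset.
creators = ["Smith, John", "J. Smith", "John Smith", "Smith, John", "Doe, Jane"]


def value_counts(values):
    """Count each distinct value, most frequent first."""
    return Counter(v.strip() for v in values).most_common()


for value, count in value_counts(creators):
    print(f"{count:>3}  {value}")
```

Three of the four distinct values here refer to one person, which is exactly the kind of variation the next step cleans up.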

[Step 4: Clustering and Cleaning Values]

Narration:

To clean this up, click the dropdown arrow on the “Creator” column, go to “Edit cells,”
and select “Cluster and Edit.” OpenRefine offers several clustering methods. Let’s start
with the “key collision” method.

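You’ll see suggestions where OpenRefine thinks the values refer to the same entity. For
example, “Smith, John” and “John Smith” are likely to be clustered together, because key
collision ignores case, punctuation, and word order. An abbreviated form like “J. Smith”
usually won’t land in the same cluster, so you may need to try another method, such as
“nearest neighbor,” or fix it by hand. If a cluster is correct, check its “Merge?” box.
You can manually edit the standardized name in the “New Cell Value” field if needed.

Once you’ve gone through the clusters, click “Merge Selected & Re-Cluster” or “Merge
Selected & Close.”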
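
For anyone curious how key collision works, here is a simplified Python sketch of a fingerprint-style key. It is an approximation written for this tutorial, not OpenRefine’s actual implementation: it strips accents and punctuation, lowercases, then sorts and de-duplicates the words, so values that differ only in case, punctuation, or word order share a key.

```python
import string
import unicodedata
from collections import defaultdict


def fingerprint(value):
    """Simplified fingerprint key (a sketch, not OpenRefine's exact algorithm)."""
    # Fold accented characters to plain ASCII where possible.
    text = unicodedata.normalize("NFKD", value).encode("ascii", "ignore").decode("ascii")
    text = text.strip().lower()
    # Drop punctuation, then sort and de-duplicate the remaining words.
    text = text.translate(str.maketrans("", "", string.punctuation))
    return " ".join(sorted(set(text.split())))


def key_collision_clusters(values):
    """Group distinct values that share a key; keep only groups with 2+ members."""
    groups = defaultdict(list)
    for value in values:
        members = groups[fingerprint(value)]
        if value not in members:
            members.append(value)
    return [members for members in groups.values() if len(members) > 1]


# Hypothetical sample values.
creators = ["Smith, John", "John Smith", "J. Smith", "john  smith"]
print(key_collision_clusters(creators))
# [['Smith, John', 'John Smith', 'john  smith']]
```

Notice that “J. Smith” gets the key “j smith” and stays out of the cluster, which is why abbreviated names often need a different method or a manual edit.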

[Step 5: Reconciliation with Wikidata]

Narration:

Now, let’s reconcile the data with an external authority to enhance interoperability. Go
back to the “Creator” column, click the dropdown, and select “Reconcile” > “Start
Reconciling.”
Choose Wikidata as the reconciliation service. OpenRefine will now try to match each
value with a corresponding entity in Wikidata.

Once the matching is complete, each cell shows either a match or a list of candidate
entities with scores. Review the candidates and confirm or reject matches manually. If a
creator is matched with a Wikidata entity, you can fetch more information, like
birthdates or standardized IDs, by going to “Edit column” and choosing “Add columns
from reconciled values.”
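
Behind the scenes, reconciliation sends batches of queries to the service and gets back scored candidates. The Python sketch below shows the general shape of such a batch and one possible review rule. Treat all of it as an assumption-laden illustration: the helper names, the score threshold of 90, and the placeholder candidates are mine, not OpenRefine’s or Wikidata’s. The only real identifier is Q5, the Wikidata item for “human.”

```python
import json


def build_queries(names, type_id="Q5", limit=3):
    """Build a batch of reconciliation queries keyed q0, q1, ..."""
    return {
        f"q{i}": {"query": name, "type": type_id, "limit": limit}
        for i, name in enumerate(names)
    }


def pick_match(candidates, min_score=90.0):
    """Return the best candidate if it is flagged as a match or scores high enough.

    The threshold is an arbitrary, hypothetical review rule.
    """
    if not candidates:
        return None
    best = max(candidates, key=lambda c: c["score"])
    if best.get("match") or best["score"] >= min_score:
        return best
    return None


print(json.dumps(build_queries(["John Smith"]), indent=2))

# Placeholder candidates: the IDs and scores are invented for illustration.
candidates = [
    {"id": "PLACEHOLDER-1", "name": "John Smith", "score": 95.0, "match": False},
    {"id": "PLACEHOLDER-2", "name": "John Smyth", "score": 61.0, "match": False},
]
print(pick_match(candidates))
```

A rule like this only pre-sorts the work. Common names such as “John Smith” can score highly against the wrong person, so manual review still matters.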

[Step 6: Exporting the Cleaned Data]

Narration:

When you’re done cleaning and reconciling, you can export your dataset. Click “Export”
in the top right and choose your preferred format, such as CSV or Excel. For JSON, use
the “Templating” option in the same menu. This cleaned dataset is now much more
consistent and enriched with linked data references, making it more useful for
discovery and reuse.
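
If you later need to convert an exported file between formats yourself, a short Python sketch like this one would do it. The rows, the “CreatorWikidataID” column name, and the placeholder ID are all hypothetical.

```python
import csv
import io
import json

# Hypothetical cleaned rows; the ID value is a placeholder, not a real Wikidata ID.
rows = [
    {"Title": "Main Street at dusk", "Creator": "Smith, John", "CreatorWikidataID": "PLACEHOLDER-1"},
    {"Title": "Harbor view", "Creator": "Smith, John", "CreatorWikidataID": "PLACEHOLDER-1"},
]


def to_csv(records):
    """Serialize a list of dicts to CSV text, using the first row's keys as headers."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(records[0].keys()), lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


def to_json(records):
    """Serialize the same records to indented JSON text."""
    return json.dumps(records, indent=2)


print(to_csv(rows))
print(to_json(rows))
```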

[Conclusion Slide]

Narration:

Those are the basics of cleaning and reconciling metadata using OpenRefine. This tool can
dramatically improve the quality and consistency of metadata in digital collections,
making resources more accessible and discoverable.

Thanks for watching, and I hope this tutorial helps you in your work with digital
collections and metadata!
