
Digital Tools for Research

This guide provides information about digital tools that can be useful for research data management and analysis.

Getting Started

Creating a Project

  1. Open OpenRefine.
  2. To create a new project, select "Create Project."
  3. Data can be imported from a number of locations and in a number of formats. Select where you would like to get your data from, choose the files, and click "Next."
  4. This screen shows a preview of your data. Before creating the project, adjust the import settings to suit the file format (e.g., CSV, TSV, XLS, XLSX, JSON, MARC, Wikitext, RDF, XML).
    • For CSV or TSV data, set the character encoding to match the original file, specify how columns are separated, and choose options for parsing the data.
    • When you are done, click "Create Project."
  5. You are now ready to work with your data.
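The encoding and separator settings in step 4 are the same decisions any program has to make when reading a delimited text file. As a rough illustration outside OpenRefine (this is not how OpenRefine itself parses files), the sketch below uses Python's standard csv module. The function name preview_delimited and the "Column 1", "Column 2" placeholder names for files without a header row are assumptions made for this example.

```python
import csv
import io
from typing import List, Tuple


def preview_delimited(raw: bytes, encoding: str = "utf-8", delimiter: str = ",",
                      header: bool = True, max_rows: int = 5) -> Tuple[List[str], List[List[str]]]:
    """Decode raw file bytes and split them into columns, mirroring the choices
    made on an import preview screen: character encoding, separator, header row."""
    # Decoding with the wrong encoding raises UnicodeDecodeError (strict mode).
    text = raw.decode(encoding)
    # newline="" lets the csv module handle line endings itself, as its docs recommend.
    rows = list(csv.reader(io.StringIO(text, newline=""), delimiter=delimiter))
    if not rows:
        return [], []
    if header:
        columns, body = rows[0], rows[1:]
    else:
        # Hypothetical placeholder names for files without a header row.
        columns = [f"Column {i + 1}" for i in range(len(rows[0]))]
        body = rows
    return columns, body[:max_rows]


# Example: a semicolon-separated file saved in Latin-1.
raw = "title;author\nCafé society;Smith\n".encode("latin-1")
print(preview_delimited(raw, encoding="latin-1", delimiter=";"))
# (['title', 'author'], [['Café society', 'Smith']])
```

In this sketch, decoding the same Latin-1 bytes as UTF-8 raises a UnicodeDecodeError; a tool that decodes leniently will instead tend to show garbled accented characters in its preview, which is a common sign that the encoding setting does not match the file.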

Opening a Project

Once you've created a project, you can return to it at any time from the home screen: select the "Open Project" tab and choose the project you want to work on from the list.


Adapted from OpenRefine LibGuide (2023). University of Illinois Urbana-Champaign.

Layout

In the top-right corner, there are three buttons:

  1. “Open…” returns you to the home screen where you can select projects.
  2. “Export” opens a dropdown menu of options to export your data.
  3. “Help” opens the OpenRefine User Documentation in a new tab in your browser.

Below the bold header that states how many rows or records there are, there are two options:

  1. “Show as” allows you to change the grid view between rows and records. For more information on the difference between rows and records, see the explanation of Records and Rows below.
  2. “Show” allows you to change the number of rows/records visible in the grid view.

In the centre of the page is the grid view of your data, which looks similar to an Excel spreadsheet. Features of the grid view include:

  1. Column headings with dropdown arrows for choosing functions
  2. Row/Record numbers and alternate row/record shading
  3. Selectable flags and stars

On the left, there is a pane with two tabs:

  1. “Facet/Filter” allows you to work on selected sections of your data, including faceting, clustering, and filtering.
  2. “Undo/Redo” tracks and stores your history, allows you to undo or redo transformations, and lets you export your transformations as a JSON file.

Records and Rows

There are two settings for the grid view in OpenRefine: rows or records.

The difference between rows and records is that “rows” display your data in individual lines, each numbered separately, while “records” display your data in multi-line groupings. OpenRefine uses the first column to form records: a line with a value in that column starts a new record, and any following lines whose first cell is blank belong to the same record.

For example, suppose the author field has been transformed using “split multi-valued cells” so that each author of a multi-author item is on a separate line. Viewed as “records,” the lines for the multiple authors stay grouped together under one record number. Viewed as “rows,” each of the multiple authors appears as a separate, individually numbered line.
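The two views can be modelled in a few lines of Python. This is a simplified sketch, not OpenRefine's implementation: it assumes each row is a plain list of cell values with None for a blank cell, and the helper names split_multivalued and group_records are hypothetical.

```python
from typing import List, Optional

Row = List[Optional[str]]  # one line of the grid; None stands for a blank cell


def split_multivalued(rows: List[Row], col: int, sep: str) -> List[Row]:
    """Split the cell in column `col` on `sep`. The first value stays on the
    original row; each extra value goes on a new row whose other cells are blank."""
    out: List[Row] = []
    for row in rows:
        cell = row[col]
        parts = cell.split(sep) if cell else [cell]
        for i, part in enumerate(parts):
            new_row = list(row) if i == 0 else [None] * len(row)
            new_row[col] = part
            out.append(new_row)
    return out


def group_records(rows: List[Row]) -> List[List[Row]]:
    """Group rows into records: a row with a non-blank first cell starts a new
    record, and rows with a blank first cell join the record above them."""
    records: List[List[Row]] = []
    for row in rows:
        if row[0] or not records:
            records.append([row])
        else:
            records[-1].append(row)
    return records


items = [
    ["item-1", "Smith|Jones", "Title A"],
    ["item-2", "Lee", "Title B"],
]
rows = split_multivalued(items, col=1, sep="|")
print(len(rows))                              # 3 rows
print([len(r) for r in group_records(rows)])  # [2, 1], i.e. 2 records
```

After the split there are three rows but still only two records, because the new “Jones” row has a blank first cell and is attached to the record for item-1.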

NB! Take care when permanently renumbering rows or records, and be aware of which view (rows or records) you are using.



Joining Projects

In OpenRefine, it is possible to merge two of your projects, either to link data you have been working on separately or to add to an existing data set. Remember that this only works with projects stored in your own instance of OpenRefine; it will not work across two different instances of OpenRefine.

What is a Key?

Before you can begin merging your data, make sure that it includes a “key.” Often, data will have a unique identifier that is associated with a set of information. For example, you might have an ISBN that is linked to the title of a book, the author’s name, and the publisher. To merge two sets of data, each row needs some sort of unique identifier, or “key,” so that when the projects are merged, the program can identify which rows “match.”
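Because a join can only match rows reliably when the key is filled in and unique, it can help to check the key column before merging. A minimal sketch in Python, assuming rows are dictionaries keyed by column name; the check_key helper and the sample ISBN values are hypothetical placeholders.

```python
from collections import Counter
from typing import Dict, List, Tuple


def check_key(rows: List[Dict[str, str]], key: str) -> Tuple[List[int], List[str]]:
    """Return (indices of rows with an empty key, key values used more than once).
    Both lists empty means every row can be matched unambiguously."""
    values = [row.get(key) or "" for row in rows]
    blank = [i for i, v in enumerate(values) if not v]
    counts = Counter(v for v in values if v)
    duplicates = sorted(v for v, n in counts.items() if n > 1)
    return blank, duplicates


# Hypothetical sample data; the ISBNs are placeholders.
books = [
    {"isbn": "9780000000001", "title": "Book A"},
    {"isbn": "9780000000002", "title": "Book B"},
    {"isbn": "9780000000001", "title": "Book A (second copy)"},
    {"isbn": "", "title": "Book C"},
]
print(check_key(books, "isbn"))  # ([3], ['9780000000001'])
```

A duplicated key is not necessarily an error, but it means a lookup on that key will find more than one matching row, so you will have to decide which match to use.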

How to Join Two Projects

  1. Identify the two projects you would like to merge:
    1. One project to import data INTO
    2. One project to export data FROM
  2. In the project with the data to be exported, identify the unique key you will be using.
  3. In the project you are importing data into, find the column that contains the matching key and click the arrow button in its column header.
  4. Choose “Edit column” and select “Add a column based on this column.”
  5. In the pop-up window, give the new column a name and then enter this expression in the GREL expression box:

cell.cross('arg1','arg2').cells['arg3'].value[arg4]

  • arg1 = name of the project you are exporting data from
  • arg2 = name of the key column in the project you are exporting data from
  • arg3 = name of the column whose values you are importing
  • arg4 = position in the resulting array of the value to import, used when the key matches more than one row (0, the first match, is recommended)

  6. When the preview shows no syntax errors and you are satisfied with the results, click “OK” and the new column will be added.
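To make the four arguments of the cross expression concrete, here is a rough Python analogue of what that expression computes for each cell in the selected column. It is a sketch under simplifying assumptions, not OpenRefine's code: a “project” is modelled as a list of dictionaries, the function names cross_lookup and add_joined_column are hypothetical, and a missing match returns None, standing in for an empty cell.

```python
from typing import Dict, List, Optional

Project = List[Dict[str, str]]  # simplification: each row maps column name -> cell value


def cross_lookup(key_value: str, source: Project, key_column: str,
                 import_column: str, index: int = 0) -> Optional[str]:
    """Rough analogue of cell.cross('arg1','arg2').cells['arg3'].value[arg4]:
    arg1 -> source, arg2 -> key_column, arg3 -> import_column, arg4 -> index."""
    # Exact, case-sensitive comparison of key values.
    matches = [row for row in source if row.get(key_column) == key_value]
    values = [row.get(import_column) for row in matches]
    # Pick the requested match; None stands in for an empty cell.
    return values[index] if 0 <= index < len(values) else None


def add_joined_column(target: Project, target_key: str, source: Project,
                      key_column: str, import_column: str, new_name: str,
                      index: int = 0) -> Project:
    """Mimic "Add column based on this column": compute one new cell per row."""
    return [
        {**row, new_name: cross_lookup(row.get(target_key, ""), source,
                                       key_column, import_column, index)}
        for row in target
    ]


# Placeholder data: "catalogue" is the project to export FROM,
# "holdings" is the project to import INTO.
catalogue = [{"isbn": "123456789X", "publisher": "Publisher A"},
             {"isbn": "9780000000002", "publisher": "Publisher B"}]
holdings = [{"isbn": "123456789X", "copies": "3"},
            {"isbn": "123456789x", "copies": "1"},  # lowercase x: no match
            {"isbn": "9780000000002", "copies": "2"}]
for row in add_joined_column(holdings, "isbn", catalogue, "isbn", "publisher", "publisher"):
    print(row["isbn"], row["publisher"])
```

The second holding gets no publisher because its key differs from the catalogue entry only in the capitalization of the final character, which illustrates why the names and key values used in the expression have to match exactly.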

Tips

  • Copy down the name of the project you will be exporting data from ahead of time or have the project open in another window so that you don’t have to switch back and forth between projects.
  • Remember that GREL expressions are CASE SENSITIVE. The expression will not return anything if the project name and column names are not typed exactly, and key values must also match exactly, including capitalization.
