OpenRefine features several functions for performing commonly needed transformations to your data. Many of these transformations are intended for data cleaning purposes and can also be done by using regular expressions. All these transformations are logged in your tracked history.
For quick reference, consult the list of transformations and their function below.
Transformation | Function |
---|---|
Trim leading and trailing whitespace | Removes extra spaces before and after the value in each cell in the selected column. |
Collapse consecutive whitespace | Removes extra spaces between individual strings within a cell value in the selected column. |
Unescape HTML entities | Removes HTML strings indicating particular characters (e.g., “&” vs “&”). |
Replace Smart quotes with ascii | Replaces Smart quotes to their ASCII (e.g. ' and ") |
To titlecase | Changes the first letter of all words within a cell value to an uppercase letter. |
To uppercase | Changes all letters of all words within a cell value to uppercase letters. |
To lowercase | Changes all letters of all words within a cell value to lowercase letters. |
To number | Changes the cell format to number. |
To date | Changes the cell format to date. |
To text | Changes the cell format to text. |
To null | Changes the cell format to null. |
To empty string | Changes the cell format to empty string. |
Adapted from OpenRefine LibGuide (2023). University of Illinois Urbana-Champaign.
Your data may include duplicate information which is unnecessary to keep as part of the data set.
Tips
Adapted from OpenRefine LibGuide (2023). University of Illinois Urbana-Champaign.
Sometimes you will have columns with lots of pieces of data all bunched together. There are several ways OpenRefine allows you to separate these pieces of data into more useful divisions, either into new columns or into a multi-row record.
NB! The new rows will still be part of the same record. For more information on rows and records see the Layout section
The pop-up window allows for several options and control when splitting cells. The options are:
Example
A column contains dates in the YYYY/MM/DD format. You wish to divide this into three columns (Year, Month, Day). Under “by field lengths,” use the list “4, 1, 2, 1, 2” indicating “year, slash, month, slash, day.” The result will be five columns, two with only slashes which you can remove, and three with the Year, Month, and Day respectively.
Tips
Example
A column contains dates in the YYYY/MM/DD format. You wish to divide this into three columns (Year, Month, Day). Under “by field lengths,” use the list “4, 1, 2, 1, 2” indicating “year, slash, month, slash, day.” The result will be five columns, two with only slashes which you can remove, and three with the Year, Month, and Day respectively.
Tips
Adapted from OpenRefine LibGuide (2023). University of Illinois Urbana-Champaign.
OpenRefine allows to combine information from multiple columns. This method allows you to combine the contents of two columns, add a specific string or characters to a column's values, or a combination of both. There are several ways to combine contents from multiple columns.
The new column will be placed after the column you chose. You can reorder the columns after.
NB! This function requires the use of regular expressions. The relevant expressions are provided here, but for more information, please refer to the section on Google Refine Expression Language (GREL).
Go to the column you would like alter and click the arrow button on the column header.
Select the “Edit columns” and then choose “Add Column based on this column.” OR
Select the "Edit cells" and then chose "Transform."
If selecting 2, you can create a new column for merging data. See an example of the first picture below.
If selecting 3, you can overwrite merging data in the column you chose. See an example of the second picture below.
A pop-up window will appear with several sections, including a GREL expression input box and a preview panel. For more information on this window, refer to the Google Refine Expression Language (GREL) section.
The expression for combining column information is:
value + cells[‘Column’].value OR
cells[‘Column1’].value + cells[‘Colmn2’].value
"Value" indicates the values in the current column.
"Column" should match the name of the column whose values you would like to combine with the current column.
If a data type in a column is number, the result will return to sum of two values in each row. To avoid this, add string as following step 5.
To combine column information AND additional strings, use a “+.” For example:
value + “-” + cells[‘Column’].value OR
cells[‘Column1’].value + “-“ + cells[‘Colmn2’].value
Adds a dash between the values of the current column and the values of the column to be combined with the current column.
Once you have typed your expression into the expression box, check the preview window to confirm that your transformation looks the way it should.There is also a section to the right of the expression box which will tell you if there are any syntax errors in your expression.
When you are satisfied with the preview, select “OK.”
NB! When inputting the name of the column you wish to combine with the current column, remember that regular expressions are CASE SENSITIVE. Nothing will happen if the column name is not exact.
If some sells contain null values, null cells will return when merging columns using these techniques. To avoid null values, follow a few simple steps:
Adapted from OpenRefine LibGuide (2023). University of Illinois Urbana-Champaign.
OpenRefine includes functions which allow you to reshape your data, transposing information between rows and columns to visualize the information in a different arrangement.
NB! If you enter “2,” OpenRefine will create two columns, pulling cell values from rows in sets of two. For example: (Row 1, Column A) and (Row 2, Column A) will become (Row 1, Column A) and (Row 1, Column B) respectively.
Tips
Tips
Adapted from OpenRefine LibGuide (2023). University of Illinois Urbana-Champaign.
The Library proactively supports and enhances the learning, teaching, and research activities of the University. The Library acts as a catalyst for your success as University of Galway’s hub for scholarly information discovery, sharing, and publication.
Library
University of Galway
University Road,
Galway, Ireland
T. +353 91 493399