CCNMTL (1999-2015) pages for archival purposes only. Please visit CTL.columbia.edu.

Clean Up Messy Data with Google Refine

It is often difficult to analyze data drawn from multiple sources. A field such as "personal income" might be labeled "income, personal" or "personal inc." or "pers incm" or any number of similar names across datasets. Google Refine normalizes these inconsistent labels so that analysis is smoother and more accurate.
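To make the idea concrete, here is a minimal Python sketch of one way variant labels can be grouped: build a normalized key for each label and collect the labels whose keys match. This is an illustration only, not Refine's own code; the function names fingerprint and cluster_labels and the sample list are assumptions made for the example.

```python
import string
from collections import defaultdict


def fingerprint(label: str) -> str:
    """Return a normalized key: lowercase, no punctuation, unique tokens sorted."""
    no_punct = label.strip().lower().translate(
        str.maketrans("", "", string.punctuation)
    )
    tokens = sorted(set(no_punct.split()))
    return " ".join(tokens)


def cluster_labels(labels):
    """Group labels that share the same fingerprint key (hypothetical helper)."""
    groups = defaultdict(list)
    for label in labels:
        groups[fingerprint(label)].append(label)
    return dict(groups)


# Sample labels taken from the variants mentioned above, plus a case/spacing variant.
labels = [
    "personal income",
    "income, personal",
    "Personal Income ",
    "personal inc.",
    "pers incm",
]
clusters = cluster_labels(labels)

for key, members in clusters.items():
    print(key, "->", members)
```

In this sketch, differences in word order, capitalization, spacing, and punctuation collapse into a single group, while abbreviations such as "personal inc." and "pers incm" stay separate. Catching those would take a fuzzier comparison or a manual review step.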

Faculty might use Refine to clean up "real" data before giving it to students for analysis, which can lend authenticity to an otherwise bland "dataset" assignment. Advanced undergraduates or graduate students might use the tool independently to normalize disparate datasets for final or group assignments in which accurate, authentic data is key.

For more information about how to use this tool, visit the project page.

Video source: Google Refine YouTube channel