Data Cleaning and Transformation, COM-3590-331



Spring 2019
TR: 3:00 - 4:15

TEXTBOOKS

  • [PAPER] "Quantitative Data Cleaning for Large Databases", by Joseph M. Hellerstein
  • [SLIDES] "Quantitative Data Cleaning for Large Databases". Keynote, QDB Workshop, 2009. A survey of basic concepts in robust statistics, techniques for scaling them up to large datasets, and implications for improving data entry forms.

Tentative Schedule

Week 1
  Topics: Course overview. Garbage in, garbage out: how dirty data can impact analysis. Errors vs. artifacts. Sources of errors in data and their telltale signs in data sets.
  Readings: INTRO slides

Week 2
Week 3
Week 4

DATA SCIENTISTS TO FOLLOW:



Karlijn Willems (DataCamp, Medium)

RESOURCES



Data Preparation and Feature Engineering in ML
Visual Literacy: An E-Learning Tutorial on Visualization for Communication, Engineering and Business
Gapminder: free teaching resources that make the world understandable through reliable statistics and promote a fact-based worldview everyone can understand.

10 Examples of Data Cleansing
7 Types of Data Artifact