Data Cleaning and Transformation, COM-3590-331
Spring 2019
TR: 3:00 - 4:15
TEXTBOOKS
- [PAPER] “Quantitative Data Cleaning for Large Databases”, by Joseph M. Hellerstein
- [SLIDES] "Quantitative Data Cleaning for Large Databases". Keynote, QDB Workshop, 2009. A survey of basic concepts in Robust Statistics, techniques to scale them up to large datasets, and implications for improving data entry forms.
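The robust statistics surveyed in the keynote can be illustrated with a small sketch. Assuming a plain list of numeric readings, the hypothetical helper `flag_outliers_mad` below (not taken from the paper or slides) flags values whose distance from the median exceeds a chosen multiple of the median absolute deviation (MAD). The 1.4826 factor rescales the MAD so that it estimates the standard deviation for normally distributed data; the cutoff `k=3.0` is an illustrative assumption, not a value prescribed by the readings.

```python
from statistics import median

def mad(values):
    """Median absolute deviation: the median of |x - median(values)|."""
    m = median(values)
    return median([abs(x - m) for x in values])

def flag_outliers_mad(values, k=3.0):
    """Return the values lying more than k scaled MADs from the median.

    Hypothetical helper for illustration; k=3.0 is an assumed cutoff.
    """
    m = median(values)
    # 1.4826 * MAD estimates the standard deviation for normal data.
    scale = 1.4826 * mad(values)
    if scale == 0:
        # MAD is zero when most values equal the median; as an assumed
        # convention, flag every value that differs from the median.
        return [x for x in values if x != m]
    return [x for x in values if abs(x - m) / scale > k]

# Hypothetical ages from a data-entry form; 999 looks like a sentinel or typo.
ages = [23, 25, 31, 27, 29, 24, 999, 30]
print(flag_outliers_mad(ages))  # → [999]
```

Because the median and MAD are computed from the middle of the data, the single bad entry cannot inflate the scale and hide itself, which is what happens when the mean and standard deviation are used instead.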
Tentative Schedule
Week | Topics | Readings
---|---|---
Week 1 | Course overview. Garbage in, garbage out: how dirty data can impact analysis. Errors vs. artifacts. Sources of errors in data and their telltale signs in data sets. Slides: INTRO |
Week 2 | |
Week 3 | |
Week 4 | |
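To make the Week 1 point about dirty data concrete, here is a minimal sketch with made-up numbers (hypothetical values, not from a course dataset): a single mistyped entry, 4500 instead of 45.00, drags the mean far from the typical value, while the median does not move at all.

```python
from statistics import mean, median

def summarize(label, values):
    """Print the mean and median of a list of numbers."""
    print(f"{label}: mean={mean(values):.2f}, median={median(values):.2f}")

# Hypothetical purchase amounts in dollars.
clean = [38.5, 40.0, 42.0, 44.5, 45.0]
# Same data with one entry mistyped: 45.00 entered as 4500 (decimal point dropped).
dirty = [38.5, 40.0, 42.0, 44.5, 4500.0]

summarize("clean", clean)  # clean: mean=42.00, median=42.00
summarize("dirty", dirty)  # dirty: mean=933.00, median=42.00
```

This is the intuition behind robust statistics: one bad value can move the mean arbitrarily far, while the median stays put until roughly half the data is corrupted.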
DATASETS:
Some Datasets Available on the Web:
RESOURCES
Data Preparation and Feature Engineering in ML
Visual Literacy: An E-Learning Tutorial on Visualization for Communication, Engineering and Business
Gapminder. Gapminder produces free teaching resources making the world understandable based on reliable statistics. Gapminder promotes a fact-based worldview everyone can understand.
10 Examples of Data Cleansing
7 Types of Data Artifact