This is a project-based course designed to give students training and experience in solving real-world problems using machine learning, with a focus on problems from public policy and social good.
Through lectures, discussions, readings, and project assignments, students will learn about and experience building end-to-end machine learning systems, starting from project definition and scoping, through modeling, to field validation and turning their analysis into action. Through the course, students will develop skills in problem formulation, working with messy data, communicating about machine learning with non-technical stakeholders, model interpretability, understanding and mitigating algorithmic bias & disparities, and evaluating the impact of deployed models.
Students are expected to know Python and to have prior coursework in machine learning.
Rayid Ghani | Kit Rodolfa |
---|---|
GHC 8023 Office Hours: TBD | GHC 8018 Office Hours: TBD |
Sebastian Caldas | Himil Sheth |
---|---|
Office Hours: Tue 10-11am, Wed 10-11am in GHC 8009 | Office Hours: Mon 2:00-3:00pm, Thu 4:30-5:30pm, GHC 8th Floor, by printers |
See the draft syllabus for much more detail, including information about group projects, grading, and helpful optional readings.
Week | Dates | Holidays? | Lecture/Discussion Topic | Project Activity | Goal | Required Readings | Deliverable / Expected Output |
---|---|---|---|---|---|---|---|
1 | Tu: Jan 14 Th: Jan 16 |
Tu: Intro/Overview + Project Overviews Th: Scoping, Problem Definition, Balancing goals (equity, efficiency, effectiveness) |
Intro/Overview | Get familiar with the class, goals, and understand project choices | Thursday: • Data Science Project Scoping Guide • Using Machine Learning to Assess the Risk of and Prevent Water Main Breaks |
||
2 | Tu: Jan 21 Th: Jan 23 |
Tu: Case Studies + Discussion Th: Acquiring Data, Privacy, Record Linkage |
Project Definition & Data Discovery | Data Audit and Exploration; TA Sessions: SQL, Databases, GitHub |
Tuesday: • Fine-grained dengue forecasting using telephone triage services • Predictive Modeling for Public Health: Preventing Childhood Lead Poisoning • What Happens When an Algorithm Cuts Your Health Care |
Beginning of week, team and project assignments | |
3 | Tu: Jan 28 Th: Jan 30 |
Tu: Data Exploration Th: Building ML Pipelines |
Finalize Project Scope and Data Stories | Tuesday: • TBD reading on data exploration • Practical Statistics for Data Scientists, Chapter 1 Thursday: • Architecting a Machine Learning Pipeline |
ETL of some dataset (census?); Data exploration; Scope refinement |
||
4 | Tu: Feb 4 Th: Feb 6 |
Analytical Formulation / Baselines | Initial Data Science Pipeline Setup and Mockups (problem formulation and validation process) |
Tuesday: • Dissecting Racial Bias in an Algorithm Used to Manage the Health of Populations • Always Start with a Stupid Model, No Exceptions |
First week of deep dives; Project Scope + Proposal with Descriptive Statistics |
||
5 | Tu: Feb 11 Th: Feb 13 |
Feature Engineering / Imputation | Code Pipeline Development | Iteration 1 - Build End to End Code Pipeline (Focus on end-to-end shell) |
Tuesday: • TBD Feature Development Case Study • Missing Data Conundrum |
Skeleton Code (Pipeline), Mockups; Proposal Peer Reviews |
|
6 | Tu: Feb 18 Th: Feb 20 |
Performance Metrics / Evaluation Pt. I (splits, metrics) | Tuesday: • Cross-validation strategies for data with temporal, spatial, hierarchical, or phylogenetic structure • The Secrets of Machine Learning |
Technical Modeling Plan (features, label definition(s), model specifications, etc) | |||
7 | Tu: Feb 25 Th: Feb 27 |
(Feb 24 drop deadline) | Performance Metrics / Evaluation Pt. II (audition) | Iteration 2 - End to End Code Pipeline (Focus on feature development) |
Tuesday: • Evaluating and Comparing Classifiers • Transductive Optimization of Top k Precision |
Code (Pipeline), Initial Models (and analysis) | |
8 | Tu: Mar 3 Th: Mar 5 |
Overfitting, Leakage, Issues in Deployment | Tuesday: • Three Pitfalls to Avoid in Machine Learning • Leakage in Data Mining • Why is Machine Learning Deployment Hard? |
Early Results: Correct but Crappy | |||
9 | Tu: Mar 17 Th: Mar 19 |
(previous week: spring break) | Model Interpretability Pt. I: global + postmodeling | Iteration 3 - End to End Code Pipeline (Focus on evaluation, results, and initial front-end demo) |
Tuesday: • Interpretable Classification Models for Recidivism Prediction • Intelligible Models for HealthCare: Predicting Pneumonia Risk and Hospital 30-day Readmission |
Refined Feature List | |
10 | Tu: Mar 24 Th: Mar 26 |
Model Interpretability Pt. II: local | Tuesday: • Why Should I Trust You? Explaining the Predictions of any Classifier • Model Agnostic Supervised Local Explanations • Explainable machine-learning predictions for the prevention of hypoxaemia during surgery |
Model Interpretation | |||
11 | Tu: Mar 31 Th: Apr 2 |
Bias and Fairness Pt I | Tuesday: • Fairness Definitions Explained • A Theory of Justice, pages 1-19 • Racial Equity in Algorithmic Criminal Justice [Focus on sections: I.B.2, all of section II, III introduction, III.B, and III.D.3] |
Results (across models, features, metrics); Add bias analysis methods |
|||
12 | Tu: Apr 7 Th: Apr 9 |
Bias and Fairness Pt II | Model selection, evaluation, balancing efficiency and equity | Final model choice and understanding its performance and impact on disparities | Tuesday: • A case study of algorithm-assisted decision making in child maltreatment hotline screening decisions • Equality of Opportunity in Supervised Learning • Classification with fairness constraints: A meta-algorithm with provable guarantees |
Draft Research Proposal Section | |
13 | Tu: Apr 14 Th: Apr 16 |
Apr 16 | Causality and Field Validation | Tuesday: • The seven tools of causal inference, with reflections on machine learning • TBD Field Trial Case Study |
No deep dive - Thursday off | ||
14 | Tu: Apr 21 Th: Apr 23 |
Analysis to Action, Accountability and Transparency | Communications & Transition Planning | Project Report and Presentations; Field Trial Design |
Tuesday: • Ethics and Data Science, entire book • Communicating Data with Tableau, Chapter 1 • Teaching Statistics: A Bag of Tricks, Chapter 11 |
Last week of deep dives; Draft Field Trial Design Section |
|
15 | Tu: Apr 28 Th: Apr 30 |
Final Presentations | Presentations | Presentation | |||
16 | May 7 | (Finals Wk) | Final Report Due | Final Report | Report and Repo and Code Documentation |