10-718: Machine Learning in Practice

Instructor: Bryan Wilder ([email protected])

This is a project-based course designed to give students training and experience in solving real-world problems using machine learning, exploring the interface between research and practice. The goal of the course is to expose students to the nuances of applying machine learning in the real world, where common assumptions (such as i.i.d. data and stationarity) break down. Students will learn how to formulate real-world business or policy scenarios as machine learning problems, how to address common challenges that arise in applying ML to such problems (e.g., distribution shift or missingness), and how to rigorously evaluate the results of such interventions in practice (e.g., by designing randomized trials or observational studies). Throughout, we will emphasize issues related to ethics and fairness in machine learning, and discuss how choices throughout the machine learning pipeline – including problem formulation, outcome definition, data collection, and model training – contribute to the social impact of algorithmic systems.

An overview of the schedule is given below. For a detailed schedule, including readings for each day, see the "Detailed schedule and readings" section of this page.

Course Schedule

Week	Dates	Topic	Assignments
1	Tu: Jan 17	Class Intro and Overview
1	Th: Jan 19	ML Project Scoping	Project Team Selection
2	Tu: Jan 24	Analytical Formulation / Baselines	Individual Assignment: Getting to know the class project (due Monday)
2	Th: Jan 26	Model Selection Methodology
3	Tu: Jan 31	Feature Engineering and Imputation	Project Assignment 1: Formulation and Baseline (due Monday)
3	Th: Feb 2	Case study session 1	Paper reflection 1 (due before class)
4	Tu: Feb 7	Case study session 2	Paper reflection 2 (due before class); Project Assignment 2: Validation setup, i.e., initial pipeline with train and validation set(s) and baseline implemented (due Monday)
4	Th: Feb 9	Imputation + introduction to censored data	Note: class is virtual today
5	Tu: Feb 14	Human-AI interaction 1	Project Assignment 3: list of features and some subset implemented (due Monday)
5	Th: Feb 16	Human-AI interaction 2	Paper reflection 3 (due before class)
6	Tu: Feb 21	Hands on: Review of modeling results	Project Assignment 4: modeling results (due Monday)
6	Th: Feb 23	Midterm review + calibration
7	Tu: Feb 28	Hands on: model debugging and updates	Updated model results assignment (+ model selection) (due Monday); take-home midterm available Tuesday
7	Th: Mar 2	No Class - Extra time for midterm/project work	Midterm due Fri
8	Tu: Mar 7	No Class - Mid-semester break
8	Th: Mar 9	No Class - Mid-semester break
9	Tu: Mar 14	Guest lecture: Fei Fang	Paper reflection 4 (due before class)
9	Th: Mar 16	ML ethics
10	Tu: Mar 21	Fairness overview	Paper reflection 5 (due before class)
10	Th: Mar 23	Fairness Methods 1
11	Tu: Mar 28	Fairness Methods 2
11	Th: Mar 30	Fairness Methods 3	Fairness Writeup Due on Friday
12	Tu: Apr 4	Field Trials and Causal Inference 1	Paper reflection 6 (due before class)
12	Th: Apr 6	Field Trials and Causal Inference 2
13	Tu: Apr 11	Uncertainty quantification overview	Paper reflection 7 (due before class)
13	Th: Apr 13	No class (Spring Carnival)
14	Tu: Apr 18	Uncertainty quantification methods 1
14	Th: Apr 20	Uncertainty quantification methods 2
15	Tu: Apr 25	Uncertainty quantification methods 3
15	Th: Apr 27	Wrap-Up	UQ Writeup Due on Friday
	Finals Week		Final Reflection Writeup Due (Date TBD)

Detailed schedule and readings

Tuesday 1/17: Introduction

Thursday 1/19: Project scoping

Required Reading:

Data Science Project Scoping Guide. Available Online

Optional Readings:

Fine-grained dengue forecasting using telephone triage services by Rehman, NA, et al. Sci. Adv. 2016. Available Online
Deconstructing Statistical Questions by Hand, D.J. J. Royal Stat Soc. A 157(3) 1994. Available Online
Predictive Modeling for Public Health: Preventing Childhood Lead Poisoning by Potash, E, et al. KDD 2015. Available Online

Due Friday 1/20: Project groups

Due Monday 1/23: Individual Assignment: Getting to know the class project

Tuesday 1/24: Project formulation and baselines

Required Readings:

Dissecting Racial Bias in an Algorithm Used to Manage the Health of Populations by Obermeyer, Z., Powers, B., et al. Science. 2019. Available Online
Problem Formulation and Fairness by Passi and Barocas. FAT* 2019. Available Online

Optional Readings:

Always Start with a Stupid Model, No Exceptions by Ameisen, E. Medium. Available Online
Create a Common-Sense Baseline First by Ramakrishnan. Medium. Available Online
Data Science for Business by Provost and Fawcett. O’Reilly. 2013. Chapter 2: Business Problems and Data Science. Available Online

Thursday 1/26: Model selection and performance metrics

Required Reading:

Cross-validation strategies for data with temporal, spatial, hierarchical, or phylogenetic structure by Roberts, DR, Bahn, V, et al. Ecography 40:2017. Available Online

Optional Readings:

Time Series Nested Cross-Validation by Cochrane, C. Medium. Available Online
The Secrets of Machine Learning by Rudin, C. and Carlson, D. arXiv preprint: 1906.01998. 2019. Available Online
Big Data and Social Science (2nd edition) edited by Foster, Ghani, et al. Section 7.7 of Chapter 7: Machine Learning. Available Online

Tuesday 1/31: Feature engineering, missing data, and imputation

Optional Readings:

Missing Data Conundrum by Akinfaderin, W. Medium. Available Online
Feature Engineering for Machine Learning by Zhang, A. and Casari, A. O’Reilly. 2018. Chapter 2: Fancy Tricks with Simple Numbers. Available Online
Missing-data imputation by Gelman, A. Available Online

Thursday 2/2: Case study on bandits in maternal and child health

Required reading: Field Study in Deploying Restless Multi-Armed Bandits: Assisting Non-Profits in Improving Maternal and Child Health by Mate et al. Available Online

Tuesday 2/7: Case study on social networks and HIV prevention

Required reading: AI-augmented interventions for HIV prevention in youth experiencing homelessness by Wilder et al. Available Online

Thursday 2/9: Imputation + censoring and survival analysis

Class held virtually (link on Slack)

Optional reading: Censoring Issues in Survival Analysis by Leung et al. Available Online

Tuesday 2/14: Human-AI interaction 1

Optional reading: Explainable Machine Learning for Public Policy: Use Cases, Gaps, and Research Directions by Amarasinghe, K., et al. arXiv preprint: 2010.14374. Available Online

Optional reading: Optimized Scoring Systems: Toward Trust in Machine Learning for Healthcare and Criminal Justice by Ustun and Rudin. INFORMS Journal on Applied Analytics. Available Online

Thursday 2/16: Human-AI interaction 2

Required reading: Effect of Confidence and Explanation on Accuracy and Trust Calibration in AI-Assisted Decision Making by Zhang et al. FAT* 2020. Available Online

Optional reading: Explainable machine-learning predictions for the prevention of hypoxaemia during surgery by Lundberg et al. Nature Biomedical Engineering 2018. Available Online

Tuesday 3/14: Guest Lecture from Fei Fang

Required reading: AI for Food Rescue by Shi et al. Available Online

Thursday 3/16: ML ethics

Optional reading: The Fallacy of AI Functionality by Raji et al. Available Online

Optional reading: Hidden in Plain Sight — Reconsidering the Use of Race Correction in Clinical Algorithms by Vyas et al. Available Online

Tuesday 3/21: Fairness introduction

Required reading: Measurement and Fairness, by Jacobs and Wallach. FAccT 2021. Available Online

Optional reading: The Measure and Mismeasure of Fairness by Corbett-Davies and Goel. Available Online

Tuesday 4/4: Causality and field trials 1

Required reading: Introduction to Randomized Evaluations, by Gibson and Sautmann. Available Online

Optional reading: Randomized Experiments, by Coston, Dulce Rubio, and Kennedy. Available Online

Thursday 4/6: Causality and field trials 2

Optional reading: Difference-in-Differences, by Zeldow, Hatfield, et al. Available Online

Tuesday 4/11: Uncertainty quantification introduction

Required reading: Aleatoric and Epistemic Uncertainty in Machine Learning: An Introduction to Concepts and Methods, Sections 1-3, by Hüllermeier and Waegeman. Available Online

Fairness and Uncertainty Quantification Modules

Each module will span three class sessions. Within each class session, we will cover 2-3 methods, each of which will be presented by one of the groups.

For the day that your group presents, you will give a 20-25 min talk. This talk should present an overview to the class of what problem the method solves and how it works. Then, you should show results from applying this method to your project and discuss the strengths and weaknesses of the method. Accordingly, you must finish implementing this method and analyzing the results by the class session you present in.
You may either implement the method from the paper yourself or use an existing implementation. If you implement the method yourself, comment on what made this process easier or harder (e.g., how complicated is the method to implement? Is it numerically stable? Does it have a lot of tricky hyperparameters or other implementation decisions?). If you use an existing implementation, we expect a correspondingly more thorough analysis of the process of applying the method and of the results (since you're saving time on the implementation).
We will release a signup sheet for groups to sign up for methods/times on a first-come, first-served basis.
For each of the sessions that your group does not present in, you should choose one method to implement (no need to sign up, just pick whichever of the methods from that day you like). By the end of the module, each group will have implemented three methods in total (the one that you implement for your presentation plus two more from the other sessions). At the end of the module, your group will produce an extended technical abstract, 4-5 pages in length (using the NeurIPS template) which provides an overview of the three methods you implemented, describes how they were applied in the context of your project, discusses the results of applying these methods, and makes recommendations about the context in which each method may be more or less suitable.

Each such module contributes 15% of your grade, of which 5% will come from the presentation (clearly presenting the method to the class, implementing it in time for the class session, and discussing your lessons learned) and 10% will come from the writeup.

One note: many of the questions that we study are only "interesting" if the accuracy of the underlying ML system is reasonably strong but less than 100% (e.g., there may not be fairness concerns about differences in error rates across groups if the system gives the right prediction on every single instance). Depending on your problem formulation and the state of the ML pipeline you've developed, you may consider making the ML problem easier or harder (e.g. by adjusting the amount of time for information to accumulate before making a prediction).
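As a rough illustration of the point above, the sketch below computes false positive and false negative rates per group for a binary classifier. It is only a minimal example under stated assumptions: the function name `error_rates_by_group`, the 0/1 label encoding, and the toy data are all hypothetical and not part of the course materials. With a perfect classifier every rate is zero in every group, so there is no gap left to study.

```python
from collections import defaultdict


def error_rates_by_group(y_true, y_pred, groups):
    """Return {group: (false_positive_rate, false_negative_rate)}.

    Hypothetical helper for illustration; assumes binary labels coded 0/1.
    A rate is NaN when a group has no negatives (FPR) or no positives (FNR).
    """
    counts = defaultdict(lambda: {"fp": 0, "fn": 0, "neg": 0, "pos": 0})
    for t, p, g in zip(y_true, y_pred, groups):
        c = counts[g]
        if t == 1:
            c["pos"] += 1
            if p == 0:
                c["fn"] += 1  # missed a true positive
        else:
            c["neg"] += 1
            if p == 1:
                c["fp"] += 1  # flagged a true negative
    rates = {}
    for g, c in counts.items():
        fpr = c["fp"] / c["neg"] if c["neg"] else float("nan")
        fnr = c["fn"] / c["pos"] if c["pos"] else float("nan")
        rates[g] = (fpr, fnr)
    return rates


# Toy data (made up): group "a" gets worse predictions than group "b".
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 1, 0, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

imperfect = error_rates_by_group(y_true, y_pred, groups)
perfect = error_rates_by_group(y_true, y_true, groups)
```

Here `imperfect` shows different error rates for the two groups, while `perfect` shows zero error for both, which is why a problem that is too easy leaves little to analyze in the fairness module.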
