10-718: Machine Learning in Practice

Instructor: Bryan Wilder ([email protected])

TA: Ananya Joshi ([email protected])

Syllabus

This is a project-based course designed to provide students with training and experience in solving real-world problems using machine learning, exploring the interface between research and practice. The goal of this course is to give students exposure to the nuances of applying machine learning in the real world, where common assumptions (like i.i.d. data and stationarity) break down. Students will learn how to formulate real-world business or policy scenarios as machine learning problems, how to address common challenges that arise in applying ML to such problems (e.g., distribution shift or missingness), and how to rigorously evaluate the results of such interventions in practice (e.g., through designing randomized trials or observational studies). We will place an emphasis throughout on issues related to ethics and fairness in machine learning, and discuss how choices throughout the machine learning pipeline, including problem formulation, outcome definition, data collection, and model training, contribute to the social impact of algorithmic systems.

An overview of the schedule is given below. For a detailed schedule, including readings for each day, see the "Detailed schedule and readings" section later on this page.

Course Schedule

| Week | Dates | Topic | Assignments |
| --- | --- | --- | --- |
| 1 | Tu: Jan 17 | Class Intro and Overview | |
| 1 | Th: Jan 19 | ML Project Scoping | Project Team Selection |
| 2 | Tu: Jan 24 | Analytical Formulation / Baselines | Individual Assignment: Getting to know the class project (due Monday) |
| 2 | Th: Jan 26 | Model Selection Methodology | |
| 3 | Tu: Jan 31 | Feature Engineering and Imputation | Project Assignment 1: Formulation and Baseline (due Monday) |
| 3 | Th: Feb 2 | Case study session 1 | Paper reflection 1 (due before class) |
| 4 | Tu: Feb 7 | Case study session 2 | Paper reflection 2 (due before class); Project Assignment 2: validation set up, initial pipeline with train and validation set(s) and baseline implemented (due Monday) |
| 4 | Th: Feb 9 | Imputation + introduction to censored data | Note: class is virtual today |
| 5 | Tu: Feb 14 | Human-AI interaction 1 | Project Assignment 3: list of features and some subset implemented (due Monday) |
| 5 | Th: Feb 16 | Human-AI interaction 2 | Paper reflection 3 (due before class) |
| 6 | Tu: Feb 21 | Hands on: Review of modeling results | Project Assignment 4: modeling results (due Monday) |
| 6 | Th: Feb 23 | Midterm review + calibration | |
| 7 | Tu: Feb 28 | Hands on: model debugging and updates | Updated model results assignment (+ model selection) due Monday; take-home midterm available Tuesday |
| 7 | Th: Mar 2 | No Class - Extra time for midterm/project work | Midterm due Friday |
| 8 | Tu: Mar 7 | No Class - Mid-semester break | |
| 8 | Th: Mar 9 | No Class - Mid-semester break | |
| 9 | Tu: Mar 14 | Guest lecture: Fei Fang | Paper reflection 4 (due before class) |
| 9 | Th: Mar 16 | ML ethics | |
| 10 | Tu: Mar 21 | Fairness overview | Paper reflection 5 (due before class) |
| 10 | Th: Mar 23 | Fairness Methods 1 | |
| 11 | Tu: Mar 28 | Fairness Methods 2 | |
| 11 | Th: Mar 30 | Fairness Methods 3 | Fairness Writeup due Friday |
| 12 | Tu: Apr 4 | Field Trials and Causal Inference 1 | Paper reflection 6 (due before class) |
| 12 | Th: Apr 6 | Field Trials and Causal Inference 2 | |
| 13 | Tu: Apr 11 | Uncertainty quantification overview | Paper reflection 7 (due before class) |
| 13 | Th: Apr 13 | No class (Spring Carnival) | |
| 14 | Tu: Apr 18 | Uncertainty quantification methods 1 | |
| 14 | Th: Apr 20 | Uncertainty quantification methods 2 | |
| 15 | Tu: Apr 25 | Uncertainty quantification methods 3 | |
| 15 | Th: Apr 27 | Wrap-Up | UQ Writeup due Friday |
| Finals Week | | | Final Reflection Writeup due (date TBD) |

Detailed schedule and readings

Tuesday 1/17: Introduction


Thursday 1/19: Project scoping

Required Reading:

Optional Readings:

  • Fine-grained dengue forecasting using telephone triage services by Rehman, NA, et al. Sci. Adv. 2016. Available Online
  • Deconstructing Statistical Questions by Hand, D.J. J. Royal Stat Soc. A 157(3) 1994. Available Online
  • Predictive Modeling for Public Health: Preventing Childhood Lead Poisoning by Potash, E, et al. KDD 2015. Available Online

Due Friday 1/20: Project groups


Due Monday 1/23: Individual Assignment: Getting to know the class project


Tuesday 1/24: Project formulation and baselines

Required Readings:

  • Dissecting Racial Bias in an Algorithm Used to Manage the Health of Populations by Obermeyer, Z., Powers, B., et al. Science. 2019. Available Online

  • Problem Formulation and Fairness by Passi and Barocas. FAT* 2019. Available Online

Optional Readings:

  • Always Start with a Stupid Model, No Exceptions by Ameisen, E. Medium. Available Online

  • Create a Common-Sense Baseline First by Ramakrishnan. Medium. Available Online

  • Data Science for Business by Provost and Fawcett. O’Reilly. 2013. Chapter 2: Business Problems and Data Science Available Online
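
To make the "start simple" advice from these readings concrete, here is a minimal sketch (not part of the course materials) of a trivial baseline comparison, assuming a scikit-learn-style tabular classification setup; the synthetic data stands in for whatever the class project provides:

```python
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the project data.
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

# A majority-class baseline and a "common-sense" logistic regression baseline;
# anything more complex should be justified by beating these.
baselines = {
    "majority class": DummyClassifier(strategy="most_frequent"),
    "logistic regression": LogisticRegression(max_iter=1000),
}
for name, model in baselines.items():
    model.fit(X_train, y_train)
    scores = model.predict_proba(X_val)[:, 1]
    print(f"{name}: AUC = {roc_auc_score(y_val, scores):.3f}")
```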


Thursday 1/26: Model selection and performance metrics

Required Reading:

  • Cross-validation strategies for data with temporal, spatial, hierarchical, or phylogenetic structure by Roberts, DR, Bahn, V, et al. Ecography 40:2017. Available Online

Optional Readings:

  • Time Series Nested Cross-Validation by Cochrane, C. Medium. Available Online

  • The Secrets of Machine Learning by Rudin, C. and Carlson, D. arXiv preprint: 1906.01998. 2019. Available Online

  • Big Data and Social Science (2nd edition) edited by Foster, Ghani, et al. Section 7.7 of Chapter 7: Machine Learning. Available Online
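
As a minimal sketch of the temporal cross-validation idea these readings discuss, the following uses scikit-learn's TimeSeriesSplit on synthetic, time-ordered data (the data, model, and metric are placeholders, not course requirements):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import TimeSeriesSplit

# Synthetic time-ordered data (rows assumed to be sorted by time).
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
y = X[:, 0] + rng.normal(scale=0.5, size=500)

# Each split trains only on the past and validates on the future,
# avoiding the leakage that an ordinary shuffled k-fold would introduce.
tscv = TimeSeriesSplit(n_splits=5)
for fold, (train_idx, val_idx) in enumerate(tscv.split(X)):
    model = RandomForestRegressor(n_estimators=100, random_state=0)
    model.fit(X[train_idx], y[train_idx])
    mae = mean_absolute_error(y[val_idx], model.predict(X[val_idx]))
    print(f"fold {fold}: MAE = {mae:.3f}")
```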


Tuesday 1/31: Feature engineering, missing data, and imputation

Optional Readings:

  • Missing Data Conundrum by Akinfaderin, W. Medium. Available Online

  • Feature Engineering for Machine Learning by Zhang, A. and Casari, A. O’Reilly. 2018. Chapter 2: Fancy Tricks with Simple Numbers Available Online

  • Missing-data imputation by Gelman, A. Available Online
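
One common pattern touched on by these readings is imputing missing values while keeping an explicit record of missingness. Here is a minimal sketch with scikit-learn's SimpleImputer on toy data (the features are made up for illustration):

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Toy feature matrix with missing values (stand-in for real project features).
df = pd.DataFrame({
    "age": [34, np.nan, 52, 41],
    "income": [48000, 61000, np.nan, np.nan],
})

# Median imputation plus explicit missingness indicators, so a downstream
# model can learn whether "missing" is itself informative.
imputer = SimpleImputer(strategy="median", add_indicator=True)
X_imputed = imputer.fit_transform(df)
print(pd.DataFrame(
    X_imputed,
    columns=list(df.columns) + ["age_missing", "income_missing"],
))
```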


Thursday 2/2: Case study on bandits in maternal and child health

Required reading: Field Study in Deploying Restless Multi-Armed Bandits: Assisting Non-Profits in Improving Maternal and Child Health by Mate et al. Available Online


Tuesday 2/7: Case study on social networks and HIV prevention

Required reading: AI-augmented interventions for HIV prevention in youth experiencing homelessness by Wilder et al. Available Online


Thursday 2/9: Imputation + censoring and survival analysis

Class held virtually (link on Slack)

Optional reading: Censoring Issues in Survival Analysis by Leung et al. Available Online
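
For a sense of what handling censored data looks like in code, here is a minimal sketch of a Kaplan-Meier estimate using the lifelines library (one possible choice, not a course-mandated tool) on toy right-censored follow-up data:

```python
import pandas as pd
from lifelines import KaplanMeierFitter

# Toy follow-up data: durations in months and whether the event was observed
# (0 = right-censored, e.g. the subject left the study before the event).
df = pd.DataFrame({
    "duration": [5, 8, 12, 12, 15, 20, 24, 30],
    "event_observed": [1, 0, 1, 1, 0, 1, 0, 1],
})

# Kaplan-Meier uses the censored observations rather than discarding them.
kmf = KaplanMeierFitter()
kmf.fit(durations=df["duration"], event_observed=df["event_observed"])
print(kmf.survival_function_)
```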


Tuesday 2/14: Human-AI interaction 1

Optional reading: Explainable Machine Learning for Public Policy: Use Cases, Gaps, and Research Directions by Amarasinghe, K., et al. arXiv preprint: arxiv/2010.14374 Available Online

Optional reading: Optimized Scoring Systems: Toward Trust in Machine Learning for Healthcare and Criminal Justice by Ustun and Rudin. INFORMS Journal on Applied Analytics Available Online


Thursday 2/16: Human-AI interaction 2

Required reading: Effect of Confidence and Explanation on Accuracy and Trust Calibration in AI-Assisted Decision Making by Zhang et al. FAT* 2020. Available Online

Optional reading: Explainable machine-learning predictions for the prevention of hypoxaemia during surgery by Lundberg et al. Nature Biomedical Engineering 2018. Available Online
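
The Lundberg et al. paper builds on SHAP-style feature attributions. A minimal, illustrative sketch with the open-source shap package, using a synthetic regression task and a generic random forest (neither of which comes from the course project), might look like:

```python
import shap
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-in data and model (placeholders, not course-provided code).
X, y = make_regression(n_samples=500, n_features=10, random_state=0)
model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

# TreeExplainer attributes each prediction to per-feature contributions.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X[:50])  # one attribution per (sample, feature)
print(shap_values.shape)
```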


Tuesday 3/14: Guest Lecture from Fei Fang

Required reading: AI for Food Rescue by Shi et al. Available Online


Thursday 3/16: ML ethics

Optional reading: The Fallacy of AI Functionality by Raji et al. Available Online

Optional reading: Hidden in Plain Sight — Reconsidering the Use of Race Correction in Clinical Algorithms by Vyas et al. Available Online


Tuesday 3/21: Fairness introduction

Required reading: Measurement and Fairness, by Jacobs and Wallach. FAccT 2021. Available Online

Optional reading: The Measure and Mismeasure of Fairness by Corbett-Davies and Goel. Available Online
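
Many of the fairness criteria covered in this module reduce to comparing error rates across groups. As a minimal sketch on made-up predictions and group labels (purely illustrative, not project data), per-group false positive and false negative rates can be computed like this:

```python
import pandas as pd

# Toy predictions with a binary sensitive attribute (illustrative only).
df = pd.DataFrame({
    "group":  ["A", "A", "A", "B", "B", "B", "B", "A"],
    "y_true": [1, 0, 1, 1, 0, 0, 1, 0],
    "y_pred": [1, 1, 0, 1, 0, 1, 1, 0],
})

# Per-group false positive and false negative rates: the basic ingredients
# of error-rate-based criteria such as equalized odds.
for name, g in df.groupby("group"):
    fpr = ((g.y_pred == 1) & (g.y_true == 0)).sum() / (g.y_true == 0).sum()
    fnr = ((g.y_pred == 0) & (g.y_true == 1)).sum() / (g.y_true == 1).sum()
    print(f"group {name}: FPR = {fpr:.2f}, FNR = {fnr:.2f}")
```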


Tuesday 4/4: Causality and field trials 1

Required reading: Introduction to Randomized Evaluations, by Gibson and Sautmann. Available Online

Optional reading: Randomized Experiments, by Coston, Dulce Rubio, and Kennedy. Available Online
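
The simplest analysis a randomized evaluation supports is a difference in treated and control means. A minimal sketch on simulated outcomes (the effect size and sample sizes below are arbitrary):

```python
import numpy as np
from scipy import stats

# Simulated outcomes from a randomized trial (purely illustrative numbers).
rng = np.random.default_rng(0)
treatment = rng.normal(loc=0.55, scale=0.2, size=200)
control = rng.normal(loc=0.50, scale=0.2, size=200)

# Under randomization, the difference in mean outcomes is an unbiased
# estimate of the average treatment effect.
ate_hat = treatment.mean() - control.mean()
t_stat, p_value = stats.ttest_ind(treatment, control, equal_var=False)
print(f"estimated ATE = {ate_hat:.3f}, p = {p_value:.3f}")
```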


Thursday 4/6: Causality and field trials 2

Optional reading: Difference-in-Differences, by Zeldow, Hatfield, et al. Available Online
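
As a minimal sketch of the difference-in-differences idea from the reading, the interaction coefficient in a two-period, two-group regression recovers the DiD estimate; the toy panel below is made up for illustration, and the estimate is only causal under the parallel-trends assumption the reading discusses:

```python
import pandas as pd
import statsmodels.formula.api as smf

# Toy panel: outcomes for treated/control units before and after an intervention.
df = pd.DataFrame({
    "y":       [2.0, 2.1, 3.0, 2.2, 2.3, 2.4, 4.1, 2.5],
    "treated": [1,   0,   1,   0,   1,   0,   1,   0],
    "post":    [0,   0,   0,   0,   1,   1,   1,   1],
})

# The coefficient on treated:post is the difference-in-differences estimate.
model = smf.ols("y ~ treated * post", data=df).fit()
print(model.params["treated:post"])
```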


Tuesday 4/11: Uncertainty quantification introduction

Required reading: Aleatoric and Epistemic Uncertainty in Machine Learning: An Introduction to Concepts and Methods, Sections 1-3, by Hüllermeier and Waegeman. Available Online
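
One concrete way to surface (aleatoric) predictive uncertainty, in the spirit of the distinctions drawn in the reading, is quantile regression. A minimal sketch with scikit-learn's gradient boosting quantile loss on synthetic data (not the project task):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# Synthetic data (illustrative stand-in for the project task).
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(500, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.3, size=500)

# Aleatoric (noise) uncertainty via quantile regression: fit models for the
# lower and upper quantiles to get a predictive interval per example.
models = {
    q: GradientBoostingRegressor(loss="quantile", alpha=q, random_state=0).fit(X, y)
    for q in (0.1, 0.9)
}
X_new = np.array([[0.0], [2.5]])
lower, upper = models[0.1].predict(X_new), models[0.9].predict(X_new)
print(np.column_stack([lower, upper]))  # 80% predictive intervals
```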

Fairness and Uncertainty Quantification Modules

Each module will span three class sessions. Within each class session, we will cover 2-3 methods, each of which will be presented by one of the groups.

  • For the day that your group presents, you will give a 20-25 minute talk. This talk should give the class an overview of the problem the method solves and how it works. Then, you should show results from applying this method to your project and discuss the strengths and weaknesses of the method. Accordingly, you must finish implementing this method and analyzing the results by the class session in which you present.

  • You may either implement the method from the paper yourself, or use an existing implementation. If you implement the method yourself, comment on what made this process easier or harder (e.g., how complicated is the method to implement? is it numerically stable? does it have a lot of tricky hyperparameters or other decisions in implementation?). If you use an existing implementation, we expect a correspondingly more thorough analysis of the process of applying the method and of the results (since you're saving time on the implementation).

  • We will release a signup sheet for groups to sign up for methods/times on a first-come, first-served basis.

  • For each of the sessions that your group does not present in, you should choose one method to implement (no need to sign up, just pick whichever of the methods from that day you like). By the end of the module, each group will have implemented three methods in total (the one that you implement for your presentation plus two more from the other sessions). At the end of the module, your group will produce an extended technical abstract, 4-5 pages in length (using the NeurIPS template), which provides an overview of the three methods you implemented, describes how they were applied in the context of your project, discusses the results of applying these methods, and makes recommendations about the contexts in which each method may be more or less suitable.

Each such module contributes 15% of your grade, of which 5% will come from the presentation (clearly presenting the method to the class, implementing it in time for the class session, and discussing your lessons learned) and 10% will come from the writeup.

One note: many of the questions that we study are only "interesting" if the accuracy of the underlying ML system is reasonably strong but less than 100% (e.g., there may not be fairness concerns about differences in error rates across groups if the system gives the right prediction on every single instance). Depending on your problem formulation and the state of the ML pipeline you've developed, you may consider making the ML problem easier or harder (e.g. by adjusting the amount of time for information to accumulate before making a prediction).
