Papers by Debarghya Mukherjee

Many instances of algorithmic bias are caused by distributional shifts. For example, machine learning (ML) models often perform worse on demographic groups that are underrepresented in the training data. In this paper, we leverage this connection between algorithmic fairness and distribution shifts to show that algorithmic fairness interventions can help ML models overcome distribution shifts, and that domain adaptation methods (for overcoming distribution shifts) can mitigate algorithmic biases. In particular, we show that (i) enforcing suitable notions of individual fairness (IF) can improve the out-of-distribution accuracy of ML models, and that (ii) it is possible to adapt representation alignment methods for domain adaptation to enforce (individual) fairness. The former is unexpected because IF interventions were not developed with distribution shifts in mind. The latter is also unexpected because representation alignment is not a common approach in the IF literature.
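
To make the individual-fairness (IF) penalty concrete, here is a minimal, hypothetical sketch in the spirit of the abstract: a logistic model is trained with an extra term that pulls together the logits of each individual and a "comparable" counterpart (here, the same point with one hypothetical sensitive coordinate flipped). This uses a logit-consistency penalty rather than the representation alignment the paper adapts; the pairing rule, the penalty weight `lam`, and all data are illustrative assumptions, not the authors' construction.

```python
# Hypothetical IF-style training sketch: logistic loss plus a penalty on the
# squared logit gap between each point and a comparable counterpart.
import numpy as np

rng = np.random.default_rng(0)
n, d = 500, 5
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
w_true[0] = 0.0                      # labels ignore the sensitive coordinate
y = (X @ w_true + 0.5 * rng.normal(size=n) > 0).astype(float)
X_cf = X.copy()
X_cf[:, 0] *= -1                     # "comparable" individual: coord 0 flipped

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

w = np.zeros(d)
lam, lr = 1.0, 0.5
D = X - X_cf                         # paired feature differences
for _ in range(2000):
    p = sigmoid(X @ w)
    g_fit = X.T @ (p - y) / n        # logistic-loss gradient
    g_if = 2 * D.T @ (D @ w) / n     # gradient of the mean squared logit gap
    w -= lr * (g_fit + lam * g_if)

print("mean logit gap on comparable pairs:", np.abs(D @ w).mean().round(4))
```

Because the true labels here do not depend on the flipped coordinate, the penalty steers the model toward the invariant rule rather than fighting accuracy.
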
Neural Information Processing Systems, Dec 6, 2021 (also arXiv e-prints, May 4, 2021)
Many instances of algorithmic bias are caused by subpopulation shifts. For example, ML models often perform worse on demographic groups that are underrepresented in the training data. In this paper, we study whether enforcing algorithmic fairness during training improves the performance of the trained model in the target domain. On one hand, we conceive scenarios in which enforcing fairness does not improve performance in the target domain. In fact, it may even harm performance. On the other hand, we derive necessary and sufficient conditions under which enforcing algorithmic fairness leads to the Bayes model in the target domain. We also illustrate the practical implications of our theoretical results in simulations and on real data.
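
As a toy illustration of the subpopulation-shift setting (not the paper's analysis), the sketch below compares plain ERM with simple group-balanced reweighting — a crude stand-in for a fairness intervention — when a minority group is underrepresented at training time but equally represented in the target domain. All distributions and weights are invented for illustration.

```python
# Toy subpopulation-shift experiment: each group has its own decision
# boundary; ERM on skewed data tracks the majority, while group-balanced
# reweighting moves the fit toward the balanced target domain.
import numpy as np

rng = np.random.default_rng(1)

def sample(n, p_minority):
    g = rng.random(n) < p_minority                 # group indicator
    mu = np.where(g, 1.0, -1.0)                    # group-specific boundary
    x = rng.normal(loc=mu, scale=1.0)
    y = (rng.random(n) < 1 / (1 + np.exp(-4 * (x - mu)))).astype(float)
    return x[:, None], y, g

Xtr, ytr, gtr = sample(4000, p_minority=0.05)      # skewed training data
Xte, yte, _ = sample(4000, p_minority=0.50)        # balanced target domain

def fit_predict(X, y, w, Xnew, steps=3000, lr=0.5):
    Xb = np.hstack([X, np.ones((len(X), 1))])      # add intercept
    th = np.zeros(2)
    for _ in range(steps):
        p = 1 / (1 + np.exp(-Xb @ th))
        th -= lr * Xb.T @ (w * (p - y)) / w.sum()  # weighted logistic gradient
    return (np.hstack([Xnew, np.ones((len(Xnew), 1))]) @ th > 0).astype(float)

weights = {
    "plain ERM": np.ones(len(ytr)),
    "group-balanced": np.where(gtr, 0.5 / gtr.mean(), 0.5 / (1 - gtr.mean())),
}
for name, w in weights.items():
    acc = (fit_predict(Xtr, ytr, w, Xte) == yte).mean()
    print(f"{name}: target accuracy = {acc:.3f}")
```

In this toy, ERM fits the majority group's boundary, while balancing group contributions yields a rule that serves both groups in the balanced target domain.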

Electronic Journal of Statistics, 2022
This paper presents a number of new findings about the canonical change point estimation problem. The first part studies the estimation of a change point on the real line in a simple stump model using the robust Huber estimating function which interpolates between the ℓ_1 (absolute deviation) and ℓ_2 (least squares) based criteria. While the ℓ_2 criterion has been studied extensively, its robust counterparts and in particular, the ℓ_1 minimization problem have not. We derive the limit distribution of the estimated change point under the Huber estimating function and compare it to that under the ℓ_2 criterion. Theoretical and empirical studies indicate that it is more profitable to use the Huber estimating function (and in particular, the ℓ_1 criterion) under heavy tailed errors as it leads to smaller asymptotic confidence intervals at the usual levels compared to the ℓ_2 criterion. We also compare the ℓ_1 and ℓ_2 approaches in a parallel setting, where one has m independent single change point problems and the goal is to control the maximal deviation of the estimated change points from the true values, and establish rigorously that the ℓ_1 estimation criterion provides a superior rate of convergence to the ℓ_2, and that this relative advantage is driven by the heaviness of the tail of the error distribution. Finally, we derive minimax optimal rates for the change plane estimation problem in growing dimensions and demonstrate that Huber estimation attains the optimal rate while the ℓ_2 scheme produces a rate sub-optimal estimator for heavy tailed errors. In the process of deriving our results, we establish a number of properties about the minimizers of compound Binomial and compound Poisson processes which are of independent interest.
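
A minimal numerical sketch of the comparison discussed above (illustrative, not the paper's procedure): in a one-dimensional stump model with heavy-tailed t(2) errors, scan candidate split points and score each side with either the ℓ_2 criterion (means / squared error) or the ℓ_1 criterion (medians / absolute error); the Huber criterion interpolates between these two endpoints.

```python
# Illustrative change-point scan in a stump model with heavy-tailed noise,
# comparing the l2 (mean / squared error) and l1 (median / absolute error)
# split criteria. All constants are toy choices.
import numpy as np

rng = np.random.default_rng(2)
n, d0 = 1000, 0.3                     # sample size, true change point
x = np.sort(rng.random(n))
y = np.where(x <= d0, 0.0, 1.0) + rng.standard_t(df=2, size=n)

def split_cost(left, right, crit):
    if crit == "l2":
        return ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
    return np.abs(left - np.median(left)).sum() + np.abs(right - np.median(right)).sum()

for crit in ("l2", "l1"):
    costs = [split_cost(y[:k], y[k:], crit) for k in range(10, n - 10)]
    d_hat = x[10 + int(np.argmin(costs))]
    print(f"{crit}: estimated change point = {d_hat:.3f} (true {d0})")
```

Rerunning the toy with Gaussian errors shrinks the gap between the two criteria, consistent with the abstract's message that the ℓ_1 advantage is driven by tail heaviness.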

arXiv, 2021
Post-processing in algorithmic fairness is a versatile approach for correcting bias in ML systems that are already used in production. The main appeal of post-processing is that it avoids expensive retraining. In this work, we propose general post-processing algorithms for individual fairness (IF). We consider a setting where the learner only has access to the predictions of the original model and a similarity graph between individuals, guiding the desired fairness constraints. We cast the IF post-processing problem as a graph smoothing problem corresponding to graph Laplacian regularization that preserves the desired “treat similar individuals similarly” interpretation. Our theoretical results demonstrate the connection of the new objective function to a local relaxation of the original individual fairness. Empirically, our post-processing algorithms correct individual biases in large-scale NLP models such as BERT, while preserving accuracy.
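
The graph-smoothing objective described above has a simple closed form. Below is a minimal sketch (toy data; the graph, weights, and λ are illustrative): given base-model scores ŷ and a similarity matrix W, minimizing ||f − ŷ||² + λ fᵀLf with Laplacian L = D − W gives f = (I + λL)⁻¹ŷ.

```python
# Minimal graph-Laplacian post-processing sketch: smooth base-model scores
# over a similarity graph so that similar individuals get similar scores.
import numpy as np

rng = np.random.default_rng(3)
n = 6
y_hat = rng.random(n)                        # base-model predictions
W = rng.random((n, n))
W = (W + W.T) / 2                            # symmetric similarity weights
np.fill_diagonal(W, 0.0)
L = np.diag(W.sum(axis=1)) - W               # unnormalized graph Laplacian

lam = 2.0                                    # smoothing strength (illustrative)
f = np.linalg.solve(np.eye(n) + lam * L, y_hat)
print("before:", np.round(y_hat, 3))
print("after: ", np.round(f, 3))
```

For graphs too large to factor directly, the same linear system can be solved iteratively, e.g. with conjugate gradients.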

Non-randomized treatment effect models are widely used for the assessment of treatment effects in various fields, in particular social science disciplines such as political science, psychometrics, and psychology. More specifically, these are situations where treatment is assigned to an individual based on some of their characteristics (e.g. a scholarship is allocated based on merit, or antihypertensive treatments are allocated based on blood pressure level) instead of being allocated randomly, as is the case, for example, in randomized clinical trials. Popular methods that have been largely employed to date for estimation of such treatment effects suffer from slow rates of convergence (i.e. slower than $\sqrt{n}$). In this paper, we present a new model coined SCENTS: Score Explained Non-Randomized Treatment Systems, and a corresponding method that allows estimation of the treatment effect at $\sqrt{n}$ rate in the presence of fairly general forms of confoundedness, when the `score' variable on who...

Optimal transport (OT) measures distances between distributions in a way that depends on the geometry of the sample space. In light of recent advances in computational OT, OT distances are widely used as loss functions in machine learning. Despite their prevalence and advantages, OT loss functions can be extremely sensitive to outliers. In fact, a single adversarially-picked outlier can increase the standard W_2-distance arbitrarily. To address this issue, we propose an outlier-robust formulation of OT. Our formulation is convex but challenging to scale at first glance. Our main contribution is deriving an equivalent formulation based on cost truncation that is easy to incorporate into modern algorithms for computational OT. We demonstrate the benefits of our formulation in mean estimation problems under the Huber contamination model in simulations and outlier detection tasks on real data.
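
The cost-truncation idea is easy to try: cap the ground cost at a threshold τ before running entropic OT, so that a single far-away point cannot inflate the distance. The sketch below uses a toy log-domain Sinkhorn solver; τ, ε, and the data are illustrative choices, not values from the paper.

```python
# Robust-OT sketch: compare entropic OT cost with and without truncating the
# squared-Euclidean ground cost, in the presence of one adversarial outlier.
import numpy as np
from scipy.special import logsumexp

rng = np.random.default_rng(4)
x = rng.normal(size=(50, 2))
y = rng.normal(size=(50, 2))
y[0] = [100.0, 100.0]                        # one adversarial outlier

def sinkhorn_cost(C, eps=1.0, iters=300):
    n, m = C.shape
    log_a = np.full(n, -np.log(n))           # uniform source weights
    log_b = np.full(m, -np.log(m))           # uniform target weights
    f, g = np.zeros(n), np.zeros(m)
    for _ in range(iters):                   # log-domain updates avoid underflow
        f = eps * (log_a - logsumexp((g[None, :] - C) / eps, axis=1))
        g = eps * (log_b - logsumexp((f[:, None] - C) / eps, axis=0))
    P = np.exp((f[:, None] + g[None, :] - C) / eps)   # transport plan
    return (P * C).sum()

C = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)    # squared-Euclidean cost
print("standard OT cost :", round(sinkhorn_cost(C), 2))
print("truncated OT cost:", round(sinkhorn_cost(np.minimum(C, 10.0)), 2))
```

The untruncated cost is dominated by the mass forced onto the outlier, while the truncated cost stays on the scale of the inlier geometry.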

Regression discontinuity design models are widely used for the assessment of treatment effects in psychology, econometrics and biomedicine, specifically in situations where treatment is assigned to an individual based on their characteristics (e.g. a scholarship is allocated based on merit) instead of being allocated randomly, as is the case, for example, in randomized clinical trials. Popular methods that have been largely employed to date for estimation of such treatment effects suffer from slow rates of convergence (i.e. slower than $\sqrt{n}$). In this paper, we present a new model and method that allows estimation of the treatment effect at $\sqrt{n}$ rate in the presence of fairly general forms of confoundedness. Moreover, we show that our estimator is also semi-parametrically efficient in certain situations. We analyze two real datasets via our method and compare our results with those obtained by using previous approaches. We conclude this paper with a discussion on some possible extensions...

Manski's celebrated maximum score estimator for the discrete choice model, which is an optimal linear discriminator, has been the focus of much investigation in both the econometrics and statistics literatures, but its behavior under growing dimension scenarios largely remains unknown. This paper addresses that gap. Two different cases are considered: p grows with n but at a slow rate, i.e. p/n → 0; and p ≫ n (fast growth). In the binary response model, we recast Manski's score estimation as empirical risk minimization for a classification problem, and derive the ℓ_2 rate of convergence of the score estimator under a transition condition in terms of our margin parameter that calibrates the level of difficulty of the estimation problem. We also establish upper and lower bounds for the minimax ℓ_2 error in the binary choice model that differ by a logarithmic factor, and construct a minimax-optimal estimator in the slow growth regime. Some extensions to the general case – the m...
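
For intuition, here is a toy version of the maximum score estimator in d = 2 (not a practical algorithm for growing dimensions): maximize the score Σᵢ (2yᵢ − 1)·1{xᵢᵀβ ≥ 0} over unit vectors β via a grid on angles. The heavy-tailed Cauchy errors with median zero satisfy the model's median restriction; all constants are illustrative.

```python
# Toy maximum score estimation for the binary choice model y = 1{x'b + u >= 0}
# with median(u | x) = 0, by brute-force search over unit vectors in 2D.
import numpy as np

rng = np.random.default_rng(5)
n = 2000
X = rng.normal(size=(n, 2))
beta0 = np.array([np.cos(0.7), np.sin(0.7)])   # true direction (unit norm)
u = rng.standard_cauchy(n)                     # heavy-tailed, median-zero errors
y = (X @ beta0 + u >= 0).astype(float)

angles = np.linspace(0, 2 * np.pi, 1000, endpoint=False)
B = np.stack([np.cos(angles), np.sin(angles)], axis=1)   # candidate directions
scores = ((2 * y - 1)[None, :] * (X @ B.T >= 0).T).sum(axis=1)
beta_hat = B[np.argmax(scores)]
print("true:", np.round(beta0, 3), "estimated:", np.round(beta_hat, 3))
```

The grid search makes the discontinuous score objective tractable in this toy; the abstract's ERM view treats the same objective as a classification risk.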

There is no trade-off: enforcing fairness can improve accuracy
arXiv, 2020
One of the main barriers to the broader adoption of algorithmic fairness in machine learning is the trade-off between fairness and performance of ML models: many practitioners are unwilling to sacrifice the performance of their ML model for fairness. In this paper, we show that this trade-off may not be necessary. If the algorithmic biases in an ML model are due to sampling biases in the training data, then enforcing algorithmic fairness may improve the performance of the ML model on unbiased test data. We study conditions under which enforcing algorithmic fairness helps practitioners learn the Bayes decision rule for (unbiased) test data from biased training data. We also demonstrate the practical implications of our theoretical results in real-world ML tasks.

arXiv: Applications, 2020
We study and predict the evolution of Covid-19 in six US states over the period May 1 through August 31 using a discrete compartment-based model and prescribe active intervention policies, like lockdowns, on the basis of minimizing a loss function, within the broad framework of partially observed Markov decision processes. For each state, Covid-19 data for 40 days (starting from May 1 for two northern states and June 1 for four southern states) are analyzed to estimate the transition probabilities between compartments and other parameters associated with the evolution of the epidemic. These quantities are then used to predict the course of the epidemic in the given state for the next 50 days (test period) under various policy allocations, leading to different values of the loss function over the training horizon. The optimal policy allocation is the one corresponding to the smallest loss. Our analysis shows that none of the six states need lockdowns over the test period, though the ...
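
As a stylized illustration of the policy-selection loop (a drastically simplified stand-in for the paper's compartment model and POMDP machinery), the sketch below simulates a discrete SIR-type epidemic in which a lockdown scales the transmission rate, and ranks a few candidate policies by a loss trading infections against lockdown days. Every parameter and policy here is invented for illustration.

```python
# Toy policy evaluation: a discrete SIR model with a daily lockdown decision
# that scales transmission; policies are ranked by a simple loss.
import numpy as np

def simulate(policy, days=50, beta=0.25, gamma=0.1, lockdown_factor=0.4):
    s, i, r = 0.99, 0.01, 0.0                 # susceptible / infected / removed
    total_infected = i
    for day in range(days):
        b = beta * (lockdown_factor if policy[day] else 1.0)
        new_inf = b * s * i                   # new infections this day
        s, i, r = s - new_inf, i + new_inf - gamma * i, r + gamma * i
        total_infected += new_inf
    return total_infected

def loss(policy, infection_weight=10.0, lockdown_cost=0.02):
    return infection_weight * simulate(policy) + lockdown_cost * sum(policy)

days = 50
policies = {
    "no lockdown": [0] * days,
    "full lockdown": [1] * days,
    "first 3 weeks": [1] * 21 + [0] * (days - 21),
}
for name, p in policies.items():
    print(f"{name}: loss = {loss(p):.3f}")
```

The selected policy is simply the argmin of the loss over the candidate set, mirroring the "smallest loss" criterion in the abstract.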

arXiv: Statistics Theory, 2020
Linear thresholding models postulate that the conditional distribution of a response variable in terms of covariates differs on the two sides of a (typically unknown) hyperplane in the covariate space. A key goal in such models is to learn about this separating hyperplane. Exact likelihood or least squares methods to estimate the thresholding parameter involve an indicator function, which makes them difficult to optimize; they are therefore often tackled by using a surrogate loss based on a smooth approximation to the indicator. In this note, we demonstrate that the resulting estimator is asymptotically normal with a near optimal rate of convergence: $n^{-1}$ up to a log factor, in a classification thresholding model. This is substantially faster than the currently established convergence rates of smoothed estimators for similar models in the statistics and econometrics literatures.
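
A toy rendering of the smoothing device discussed above: the criterion's indicator 1{xᵀβ > 0} is replaced by a sigmoid with a bandwidth σₙ shrinking in n, giving a differentiable surrogate amenable to gradient descent. The bandwidth schedule, the annealing heuristic, and the data below are illustrative assumptions, not the paper's construction.

```python
# Smoothed least-squares estimation of a classification threshold: replace
# the indicator 1{x'beta > 0} with sigmoid(x'beta / sigma) and run gradient
# descent, annealing sigma toward its final (shrinking-in-n) value.
import numpy as np

rng = np.random.default_rng(6)
n = 5000
X = rng.normal(size=(n, 2))
beta0 = np.array([2.0, -1.0]) / np.sqrt(5.0)     # true hyperplane direction
flip = rng.random(n) < 0.15                      # 15% label noise
y = ((X @ beta0 > 0) ^ flip).astype(float)

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

beta = np.array([1.0, 0.0])
for sigma in (0.5, 0.2, n ** (-1 / 3)):          # anneal bandwidth (heuristic)
    for _ in range(500):
        p = sigmoid(X @ beta / sigma)
        grad = -2 * X.T @ ((y - p) * p * (1 - p)) / (n * sigma)
        beta -= 0.5 * grad
        beta /= np.linalg.norm(beta)             # direction identified up to scale
print("true:", np.round(beta0, 3), "estimated:", np.round(beta, 3))
```

Annealing from a wide sigmoid avoids the vanishing gradients a very small bandwidth would cause at a poor starting point; the final bandwidth is what controls the localization of the estimate.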