Papers by Ambarish Chattopadhyay

arXiv (Cornell University), Dec 4, 2023
A growing number of researchers are conducting randomized experiments to analyze causal relationships in network settings where units influence one another. A dominant methodology for analyzing these experiments is design-based, leveraging random treatment assignments as the basis for inference. In this paper, we generalize this design-based approach to accommodate complex experiments with a variety of causal estimands and different target populations. An important special case of such generalized network experiments is a bipartite network experiment, in which treatment is randomized among one set of units, and outcomes are measured on a separate set of units. We propose a broad class of causal estimands based on stochastic interventions for generalized network experiments. Using a design-based approach, we show how to estimate these causal quantities without bias and develop conservative variance estimators. We apply our methodology to a randomized experiment in education where participation in an anti-conflict promotion program is randomized among selected students. Our analysis estimates the causal effects of treating each student or their friends among different target populations in the network. We find that the program improves the overall conflict awareness among students but does not significantly reduce the total number of such conflicts.
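The design-based logic the abstract builds on can be illustrated in its simplest, no-interference form. The sketch below (illustrative only; the paper's estimators handle network interference and stochastic interventions) checks numerically that a Horvitz-Thompson estimator under Bernoulli assignment is unbiased over the randomization distribution; the potential outcomes and probabilities are simulated, not from the paper.

```python
import numpy as np

# Illustrative sketch: under Bernoulli(p) assignment, the Horvitz-Thompson
# estimator weights each observed outcome by the inverse probability of its
# assignment; only the treatment indicators are random (design-based view).
rng = np.random.default_rng(0)
n, p = 500, 0.3
y1 = rng.normal(2.0, 1.0, size=n)          # fixed potential outcomes under treatment
y0 = rng.normal(0.0, 1.0, size=n)          # fixed potential outcomes under control
tau = (y1 - y0).mean()                     # true average treatment effect

def ht_estimate(T):
    # inverse-probability-weighted difference in means
    return np.mean(T * y1 / p - (1 - T) * y0 / (1 - p))

# Average over many random assignments: the estimator centers on tau.
draws = [ht_estimate(rng.binomial(1, p, size=n)) for _ in range(2000)]
```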
Harvard Data Science Review, Jan 31, 2024
Comparison and contrast are the basic means to unveil causation and learn which treatments work. To build good comparison groups, randomized experimentation is key, yet often infeasible. In such non-experimental settings, we illustrate and discuss diagnostics to assess how well the common linear regression approach to causal inference approximates desirable features of randomized experiments, such as covariate balance, study representativeness, interpolated estimation, and unweighted analyses. We also discuss alternative regression modeling, weighting, and matching approaches and argue they should be given strong consideration in empirical work.

arXiv (Cornell University), Apr 13, 2021
A basic principle in the design of observational studies is to approximate the randomized experiment that would have been conducted under ideal circumstances. In practice, linear regression models are commonly used to analyze observational data and estimate causal effects. How do linear regression adjustments in observational studies emulate key features of randomized experiments, such as covariate balance, self-weighted sampling, and study representativeness? In this paper, we provide answers to this and related questions by analyzing the implied (individual-level data) weights of various linear regression methods, bringing new insights at the intersection of regression modeling and causal inference. We derive new closed-form expressions of these implied weights and examine their properties in finite and large samples. Among others, in finite samples we characterize the implied target population of linear regression and in large samples demonstrate the multiply robust properties of regression estimators from the perspective of their implied weights. We show that the implied weights of general regression methods can be equivalently obtained by solving a convex optimization problem. This equivalence allows us to bridge ideas from the regression modeling and causal inference literatures. As a result, we propose novel regression diagnostics for causal inference that are part of the design stage of an observational study. We implement the weights and diagnostics in the new lmw package for R.
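The idea of implied regression weights can be made concrete with a small numerical sketch (this is not the lmw package, just an illustration on simulated data): in the regression of an outcome on an intercept, a binary treatment, and covariates, the coefficient on treatment is a linear functional of the outcomes, and its weights can be read off the least-squares projection.

```python
import numpy as np

# Sketch: extract the implied individual-level weights of an OLS adjustment
# Y ~ 1 + T + X. The coefficient on T equals sum_i w_i * Y_i, where w is the
# row of (Z'Z)^{-1} Z' corresponding to T. Data below are simulated.
rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 2))                   # two covariates
T = rng.binomial(1, 0.4, size=n)              # binary treatment
Y = 1.0 + 2.0 * T + X @ np.array([0.5, -0.3]) + rng.normal(size=n)

Z = np.column_stack([np.ones(n), T, X])       # design matrix [1, T, X]
w = np.linalg.solve(Z.T @ Z, Z.T)[1]          # implied weights for the T coefficient
tau_hat = w @ Y                               # equals the OLS coefficient on T

beta = np.linalg.lstsq(Z, Y, rcond=None)[0]
assert np.isclose(tau_hat, beta[1])           # weights reproduce the estimate
assert np.isclose(w[T == 1].sum(), 1.0)       # treated weights sum to one
assert np.isclose(w[T == 0].sum(), -1.0)      # control weights sum to minus one
assert np.allclose(w @ X, 0.0)                # exact mean balance on the covariates
```

The last assertion shows, in miniature, why such weights support design-stage diagnostics: the weighted covariate contrast is exactly zero by construction, so attention shifts to properties like the weights' signs, dispersion, and implied target population.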
arXiv (Cornell University), May 5, 2021
The original Finite Selection Model (FSM) was developed in the 1970s to enhance the design of the RAND Health Insurance Experiment (HIE; Newhouse et al. 1993). At the time of its development by Carl Morris (Morris 1979), fundamental computational limitations prevented the method from becoming widely available to practitioners. Today, as randomized experiments become increasingly common, there is a need for experimental designs that are randomized, balanced, robust, and easily applicable to several treatment groups. To help address this problem, we revisit the original FSM under the potential outcome framework for causal inference and provide its first readily available software implementation. In this paper, we provide an introduction to the FSM and a step-by-step guide for its use in R.

Trials, Jun 20, 2022
Background: Major depressive disorder (MDD) is a leading cause of disease morbidity. Combined treatment with antidepressant medication (ADM) plus psychotherapy yields a much higher MDD remission rate than ADM only. But 77% of US MDD patients are nonetheless treated with ADM only despite strong patient preferences for psychotherapy. This mismatch is due at least in part to a combination of cost considerations and limited availability of psychotherapists, although stigma and reluctance of PCPs to refer patients for psychotherapy are also involved. Internet-based cognitive behavioral therapy (i-CBT) addresses all of these problems. Methods: Enrolled patients (n = 3360) will be those who are beginning ADM-only treatment of MDD in primary care facilities throughout West Virginia, one of the poorest and most rural states in the country. Participating treatment providers and study staff at West Virginia University School of Medicine (WVU) will recruit patients and, after obtaining informed consent, administer a baseline self-report questionnaire (SRQ) and then randomize patients to 1 of 3 treatment arms with equal allocation: ADM only, ADM + self-guided i-CBT, and ADM + guided i-CBT. Follow-up SRQs will be administered 2, 4, 8, 13, 16, 26, 39, and 52 weeks after randomization. The trial has two primary objectives: to evaluate aggregate comparative treatment effects across the 3 arms and to estimate heterogeneity of treatment effects (HTE). The primary outcome will be episode remission based on a modified version of the patient-centered Remission from Depression Questionnaire (RDQ). The sample was powered to detect predictors of HTE that would increase the proportional remission rate by 20% by optimally assigning individuals as opposed to randomly assigning them into three treatment groups of equal size.

Statistics in Medicine, Sep 3, 2020
There are two seemingly unrelated approaches to weighting in observational studies. One of them maximizes the fit of a model for treatment assignment to then derive weights; we call this the modeling approach. The other directly optimizes certain features of the weights; we call this the balancing approach. The implementations of these two approaches are related: the balancing approach implicitly models the propensity score, while instances of the modeling approach impose balance conditions on the covariates used to estimate the propensity score. In this article, we review and compare these two approaches to weighting. Previous review papers have focused on the modeling approach, emphasizing the importance of checking covariate balance. However, as we discuss, the dispersion of the weights is another important aspect of the weights to consider, in addition to the representativeness of the weighted sample and the sample boundedness of the weighted estimator. In particular, the dispersion of the weights is important because it translates into a measure of effective sample size, which can be used to select between alternative weighting schemes. In this article, we examine the balancing approach to weighting, discuss recent methodological developments, and compare instances of the balancing and modeling approaches in a simulation study and an empirical study. In practice, unless the treatment assignment model is known, we recommend using the balancing approach to weighting, as it systematically results in better covariate balance with weights that are minimally dispersed. As a result, effect estimates tend to be more accurate and stable.
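The link between weight dispersion and effective sample size can be made concrete with Kish's standard approximation (illustrative code, not the authors' implementation):

```python
import numpy as np

def effective_sample_size(w):
    """Kish's approximation: n_eff = (sum w)^2 / sum(w^2)."""
    w = np.asarray(w, dtype=float)
    return w.sum() ** 2 / (w ** 2).sum()

# Uniform weights attain the maximum, n_eff = n ...
print(effective_sample_size(np.ones(100)))            # 100.0
# ... while a single extreme weight sharply reduces it.
print(effective_sample_size([1.0] * 99 + [50.0]))     # ≈ 8.5
```

Two weighting schemes with identical covariate balance can thus differ greatly in the precision they support, which is why dispersion is a selection criterion alongside balance.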

arXiv (Cornell University), Mar 15, 2023
The linear regression model is widely used in the biomedical and social sciences as well as in policy and business research to adjust for covariates and estimate the average effects of treatments. Behind every causal inference endeavor there is at least a notion of a randomized experiment. However, in routine regression analyses in observational studies, it is unclear how well the adjustments made by regression approximate key features of randomized experiments, such as covariate balance, study representativeness, sample boundedness, and unweighted sampling. In this paper, we provide software to empirically address this question. In the new lmw package for R, we compute the implied linear model weights for average treatment effects and provide diagnostics for them. The weights are obtained as part of the design stage of the study; that is, without using outcome information. The implementation is general and applicable, for instance, in settings with instrumental variables and multi-valued treatments; in essence, in any situation where the linear model is the vehicle for adjustment and estimation of average treatment effects with discrete-valued interventions.

arXiv (Cornell University), May 19, 2022
The Finite Selection Model (FSM) was developed by Carl Morris in the 1970s for the design of the RAND Health Insurance Experiment (HIE) (Morris 1979, Newhouse et al. 1993), one of the largest and most comprehensive social science experiments conducted in the U.S. The idea behind the FSM is that each treatment group takes turns selecting units in a fair and random order to optimize a common assignment criterion. At each of its turns, a treatment group selects the available unit that maximally improves the combined quality of its resulting group of units in terms of the criterion. In the HIE and beyond, we revisit, formalize, and extend the FSM as a general tool for experimental design. Leveraging the idea of D-optimality, we propose and analyze a new selection criterion in the FSM. The FSM using the D-optimal selection function has no tuning parameters, is affine invariant, and when appropriate, retrieves several classical designs such as randomized block and matched-pair designs. For multi-arm experiments, we propose algorithms to generate a fair and random selection order of treatments. We demonstrate FSM's performance in a case study based on the HIE and in ten randomized studies from the health and social sciences. On average, the FSM achieves 68% better covariate balance than complete randomization and 56% better covariate balance than rerandomization in a typical study. We recommend the FSM be considered in experimental design for its conceptual simplicity, efficiency, and robustness.
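The turn-taking idea can be sketched in a few lines. The code below is an illustrative two-arm toy, not the paper's exact algorithm (which specifies a particular D-optimal selection function and selection-order generation for multiple arms): the two groups alternate after a coin flip, and at each turn a group greedily takes the available unit that maximizes a D-optimality-flavored criterion, the log-determinant of its group's moment matrix (ridged so early, rank-deficient turns are well defined).

```python
import numpy as np

def fsm_two_arm(X, seed=0):
    # Illustrative FSM-style selection for two treatment groups.
    rng = np.random.default_rng(seed)
    n, p = X.shape
    Z = np.column_stack([np.ones(n), X])          # intercept + covariates
    available = list(range(n))
    groups = {0: [], 1: []}
    turn = int(rng.integers(2))                   # coin flip decides who picks first
    while available:
        def score(i):
            idx = groups[turn] + [i]
            M = Z[idx].T @ Z[idx] + 1e-6 * np.eye(p + 1)
            return np.linalg.slogdet(M)[1]        # log det of the moment matrix
        best = max(available, key=score)          # greedy D-optimal-style pick
        groups[turn].append(best)
        available.remove(best)
        turn = 1 - turn                           # groups alternate turns
    return groups

groups = fsm_two_arm(np.random.default_rng(1).normal(size=(20, 3)))
```

Because the groups alternate strictly, every unit is assigned exactly once and group sizes differ by at most one, while each group's picks pull its covariate distribution toward good balance.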
arXiv (Cornell University), May 23, 2023
Comparison and contrast are the basic means to unveil causation and learn which treatments work. To build good comparison groups that isolate the average effect of treatment from confounding factors, randomization is key, yet often infeasible. In such non-experimental settings, we illustrate and discuss diagnostics to assess how well the common linear regression approach to causal inference approximates desirable features of randomized experiments, such as covariate balance, study representativeness, interpolated estimation, and unweighted analyses. We also discuss alternative regression modeling, weighting, and matching approaches and argue they should be given strong consideration in empirical work.

Biometrika, Oct 29, 2022
A basic principle in the design of observational studies is to approximate the randomized experiment that would have been conducted under ideal circumstances. In practice, linear regression models are commonly used to analyze observational data and estimate causal effects. How do linear regression adjustments in observational studies emulate key features of randomized experiments, such as covariate balance, self-weighted sampling, and study representativeness? In this paper, we provide answers to this and related questions by analyzing the implied (individual-level data) weights of various linear regression methods, bringing new insights at the intersection of regression modeling and causal inference. We derive new closed-form expressions of these implied weights and examine their properties in finite and large samples. Among others, in finite samples we characterize the implied target population of linear regression and in large samples demonstrate the multiply robust properties of regression estimators from the perspective of their implied weights. We show that the implied weights of general regression methods can be equivalently obtained by solving a convex optimization problem. This equivalence allows us to bridge ideas from the regression modeling and causal inference literatures. As a result, we propose novel regression diagnostics for causal inference that are part of the design stage of an observational study. We implement the weights and diagnostics in the new lmw package for R.

arXiv (Cornell University), Mar 16, 2022
The problem of generalization and transportation of treatment effect estimates from a study sample to a target population is central to empirical research and statistical methodology. In both randomized experiments and observational studies, weighting methods are often used with this objective. Traditional methods construct the weights by separately modeling the treatment assignment and study selection probabilities and then multiplying functions (e.g., inverses) of their estimates. In this work, we provide a justification and an implementation for weighting in a single step. We show a formal connection between this one-step method and inverse probability and inverse odds weighting. We demonstrate that the resulting estimator for the target average treatment effect is consistent, asymptotically Normal, multiply robust, and semiparametrically efficient. We evaluate the performance of the one-step estimator in a simulation study. We illustrate its use in a case study on the effects of physician racial diversity on preventive healthcare utilization among Black men in California. We provide R code implementing the methodology.
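The traditional two-step construction the abstract contrasts with can be written down directly. The sketch below uses known probabilities purely for illustration (in practice both pieces must be estimated, which is what motivates the one-step alternative):

```python
import numpy as np

# Two-step transport weights for treated study units: the inverse probability
# of treatment times the inverse odds of study selection,
#   w(x) = (1 / e(x)) * ((1 - pi(x)) / pi(x)),
# shown here with three hypothetical units and known probabilities.
e = np.array([0.50, 0.25, 0.80])   # P(T = 1 | X): propensity scores
pi = np.array([0.60, 0.30, 0.90])  # P(S = 1 | X): study-selection probabilities
w = (1.0 / e) * ((1.0 - pi) / pi)  # inverse probability x inverse odds
print(np.round(w, 3))              # [1.333 9.333 0.139]
```

The middle unit shows how multiplying two estimated inverse quantities can compound into extreme weights, part of the motivation for estimating the combined weight in a single step.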
Comprehensive R Archive Network (CRAN), Mar 10, 2021
