Skip to main content

Suhas Vijaykumar

Followers

25

Following

23

Public Views

University of Zagreb

Uppsala University

University of East London

Armando Marques-Guedes

UNL - New University of Lisbon

University of Leicester

Gwen Robbins Schug

University of North Carolina at Greensboro

Gabriel Gutierrez-Alonso

University of Salamanca

Macquarie University

Universidade Federal do Rio Grande do Sul

Swansea University

Uploads

Papers by Suhas Vijaykumar

DoubleMLDeep: Estimation of Causal Effects with Multimodal Data

arXiv (Cornell University), Feb 1, 2024

This paper explores the use of unstructured, multimodal data, namely text and images, in causal i... more This paper explores the use of unstructured, multimodal data, namely text and images, in causal inference and treatment effect estimation. We propose a neural network architecture that is adapted to the double machine learning (DML) framework, specifically the partially linear model. An additional contribution of our paper is a new method to generate a semi-synthetic dataset which can be used to evaluate the performance of causal effect estimation in the presence of text and images as confounders. The proposed methods and architectures are evaluated on the semi-synthetic dataset and compared to standard approaches, highlighting the potential benefit of using text and images directly in causal studies. Our findings have implications for researchers and practitioners in economics, marketing, finance, medicine and data science in general who are interested in estimating causal quantities using non-traditional data.

Hedonic Prices and Quality Adjusted Price Indices Powered by AI

arXiv (Cornell University), Apr 28, 2023

Accurate, real-time measurements of price index changes using electronic records are essential fo... more Accurate, real-time measurements of price index changes using electronic records are essential for tracking inflation and productivity in today's economic environment. We develop empirical hedonic models that can process large amounts of unstructured product data (text, images, prices, quantities) and output accurate hedonic price estimates and derived indices. To accomplish this, we generate abstract product attributes, or "features," from text descriptions and images using deep neural networks, and then use these attributes to estimate the hedonic price function. Specifically, we convert textual information about the product to numeric features using large language models based on transformers, trained or fine-tuned using product descriptions, and convert the product image to numeric features using a residual network model. To produce the estimated hedonic price function, we again use a multi-task neural network trained to predict a product's price in all time periods simultaneously. To demonstrate the performance of this approach, we apply the models to Amazon's data for first-party apparel sales and estimate hedonic prices. The resulting models have high predictive accuracy, with R 2 ranging from 80% to 90%. Finally, we construct the AI-based hedonic Fisher price index, chained at the year-over-year frequency. We contrast the index with the CPI and other electronic indices.

A Resolution in Algorithmic Fairness: Calibrated Scores for Fair Classifications

ArXiv, 2020

Calibration and equal error rates are fundamental conditions for algorithmic fairness that have b... more Calibration and equal error rates are fundamental conditions for algorithmic fairness that have been shown to conflict with each other, suggesting that they cannot be satisfied simultaneously. This paper shows that the two are in fact compatible and presents a method for reconciling them. In particular, we derive necessary and sufficient conditions for the existence of calibrated scores that yield classifications achieving equal error rates. We then present an algorithm that searches for the most informative score subject to both calibration and minimal error rate disparity. Applied empirically to credit lending, our algorithm provides a solution that is more fair and profitable than a common alternative that omits sensitive features.

Multiple Randomization Designs: Estimation and Inference with Interference

arXiv (Cornell University), 2024

Classical designs of randomized experiments, going back to Fisher and Neyman in the 1930s still d... more Classical designs of randomized experiments, going back to Fisher and Neyman in the 1930s still dominate practice even in online experimentation. However, such designs are of limited value for answering standard questions in settings, common in marketplaces, where multiple populations of agents interact strategically, leading to complex patterns of spillover effects. In this paper, we discuss new experimental designs and corresponding estimands to account for and capture these complex spillovers. We derive the finite-sample properties of tractable estimators for main effects, direct effects, and spillovers, and present associated central limit theorems.

Synthetic Combinations: A Causal Inference Framework for Combinatorial Interventions

arXiv (Cornell University), Mar 24, 2023

Kernel Ridge Regression Inference

arXiv (Cornell University), Feb 13, 2023

Stability and Efficiency of Random Serial Dictatorship

arXiv (Cornell University), Oct 13, 2021

This paper establishes non-asymptotic convergence of the cutoffs in Random serial dictatorship me... more This paper establishes non-asymptotic convergence of the cutoffs in Random serial dictatorship mechanism (RSD), in an environment with many students, many schools, and arbitrary student preferences. Convergence is shown to hold when the number of schools, m, and the number of students, n, satisfy the relation m ln m ≪ n, and we provide an example showing that this result is sharp. We differ significantly from prior work in the mechanism design literature in our use of analytic tools from randomized algorithms and discrete probability, which allow us to show concentration of the RSD lottery probabilities and cutoffs even against adversarial student preferences.

Localization, Convexity, and Star Aggregation

arXiv (Cornell University), May 18, 2021

Offset Rademacher complexities have been shown to provide tight upper bounds for the square loss ... more Offset Rademacher complexities have been shown to provide tight upper bounds for the square loss in a broad class of problems including improper statistical learning and online learning. We show that the offset complexity can be generalized to any loss that satisfies a certain general convexity condition. Further, we show that this condition is closely related to both exponential concavity and self-concordance, unifying apparently disparate results. By a novel geometric argument, many of our bounds translate to improper learning in a non-convex class with Audibert's star algorithm. Thus, the offset complexity provides a versatile analytic tool that covers both convex empirical risk minimization and improper learning under entropy conditions. Applying the method, we recover the optimal rates for proper and improper learning with the p-loss for 1 < p < ∞, and show that improper variants of empirical risk minimization can attain fast rates for logistic regression and other generalized linear models.

Frank Wolfe Meets Metric Entropy

arXiv (Cornell University), May 17, 2022

The Frank-Wolfe algorithm has seen a resurgence in popularity due to its ability to efficiently s... more The Frank-Wolfe algorithm has seen a resurgence in popularity due to its ability to efficiently solve constrained optimization problems in machine learning and high-dimensional statistics. As such, there is much interest in establishing when the algorithm may possess a "linear" O(log(1/)) dimension-free iteration complexity comparable to projected gradient descent. In this paper, we provide a general technique for establishing domain specific and easy-to-estimate lower bounds for Frank-Wolfe and its variants using the metric entropy of the domain. Most notably, we show that a dimensionfree linear upper bound must fail not only in the worst case, but in the average case: for a Gaussian or spherical random polytope in R d with poly(d) vertices, Frank-Wolfe requires up toΩ(d) iterations to achieve a O(1/d) error bound, with high probability. We also establish this phenomenon for the nuclear norm ball. The link with metric entropy also has interesting positive implications for conditional gradient algorithms in statistics, such as gradient boosting and matching pursuit. In particular, we show that it is possible to extract fastdecaying upper bounds on the excess risk directly from an analysis of the underlying optimization procedure. Contents

Classification as Direction Recovery: Improved Guarantees via Scale Invariance

arXiv (Cornell University), May 17, 2022

Modern algorithms for binary classification rely on an intermediate regression problem for comput... more Modern algorithms for binary classification rely on an intermediate regression problem for computational tractability. In this paper, we establish a geometric distinction between classification and regression that allows risk in these two settings to be more precisely related. In particular, we note that classification risk depends only on the direction of the regressor, and we take advantage of this scale invariance to improve existing guarantees for how classification risk is bounded by the risk in the intermediate regression problem. Building on these guarantees, our analysis makes it possible to compare algorithms more accurately against each other and suggests viewing classification as unique from regression rather than a byproduct of it. While regression aims to converge toward the conditional expectation function in location, we propose that classification should instead aim to recover its direction.

Unambiguous Computation Suhas Vijaykumar Spring 2016 Overview

This survey paper is designed to review the literature on unambiguous computation. Broadly speaki... more This survey paper is designed to review the literature on unambiguous computation. Broadly speaking, studying unambiguous computation is an attempt to determine how much of the power of non-determinism comes from the ability to accept many di↵erent proofs of the same statement. The literature on this subject is of both practical and philosophical interest, and the Isolating lemma of Mulmuley, Vazirani and Vazirani [5] appears beautifully throughout. In the first first, we will define the notion of unambiguous computation and the associated complexity classes we will consider. In the second section, we’ll discuss unambiguous polynomial-time computation. Most of the results in this section are from the ‘80s—in particular we’ll discuss relationships between the complexity classes P, UP, and NP, and prove the Isolating lemma as well as the ValiantVazirani theorem. In the third and final section, we’ll discuss some of the more recent results about unambiguous log-space computation. In pa...

A Possibility in Algorithmic Fairness: Calibrated Scores for Fair Classifications

arXiv: Learning, 2020

Calibration and equal error rates are fundamental criteria of algorithmic fairness that have been... more Calibration and equal error rates are fundamental criteria of algorithmic fairness that have been shown to conflict with one another. This paper proves that they can be satisfied simultaneously in settings where decision-makers use risk scores to assign binary treatments. In particular, we derive necessary and sufficient conditions for the existence of calibrated scores that yield classifications achieving equal error rates. We then present an algorithm that searches for the most informative score subject to both calibration and minimal error rate disparity. Applied to a real criminal justice risk assessment, we show that our method can eliminate error disparities while maintaining calibration. In a separate application to credit lending, the procedure provides a solution that is more fair and profitable than a common alternative that omits sensitive features.

A Possibility in Algorithmic Fairness: Can Calibration and Equal Error Rates Be Reconciled?

Decision makers increasingly rely on algorithmic risk scores to determine access to binary treatm... more Decision makers increasingly rely on algorithmic risk scores to determine access to binary treatments including bail, loans, and medical interventions. In these settings, we reconcile two fairness criteria that were previously shown to be in conflict: calibration and error rate equality. In particular, we derive necessary and sufficient conditions for the existence of calibrated scores that yield classifications achieving equal error rates at any given group-blind threshold. We then present an algorithm that searches for the most accurate score subject to both calibration and minimal error rate disparity. Applied to the COMPAS criminal risk assessment tool, we show that our method can eliminate error disparities while maintaining calibration. In a separate application to credit lending, we compare our procedure to the omission of sensitive features and show that it raises both profit and the probability that creditworthy individuals receive loans. 2012 ACM Subject Classification Mat...

Higher Bruhat Orders in Type B

The Electronic Journal of Combinatorics

Motivated by the geometry of hyperplane arrangements, Manin and Schechtman defined for each integ... more Motivated by the geometry of hyperplane arrangements, Manin and Schechtman defined for each integer $n \geq 1$ a hierarchy of finite partially ordered sets $B(n, k),$ indexed by positive integers $k$, called the higher Bruhat orders. The poset $B(n, 1)$ is naturally identified with the weak left Bruhat order on the symmetric group $S_n$, each $B(n, k)$ has a unique maximal and a unique minimal element, and the poset $B(n, k + 1)$ can be constructed from the set of maximal chains in $B(n, k)$. Ben Elias has demonstrated a striking connection between the posets $B(n, k)$ for $k = 2$ and the diagrammatics of Bott-Samelson bimodules in type A, providing significant motivation for the development of an analogous theory of higher Bruhat orders in other Cartan-Killing types, particularly for $k = 2$. In this paper we present a partial generalization to type B, complete up to $k = 2$, prove a direct analogue of the main theorem of Manin and Schechtman, and relate our construction to the ...

Higher Bruhat Orders in Type B

The Electronic Journal of Combinatorics, Jul 22, 2016

Motivated by the geometry of certain hyperplane arrangements, Manin and Schechtman [2] defined fo... more Motivated by the geometry of certain hyperplane arrangements, Manin and Schechtman [2] defined for each integer n ≥ 1 a hierarchy of finite partially ordered sets B(n, k), indexed by positive integers k, called the higher Bruhat orders. The poset B(n, 1) is naturally identified with the weak left Bruhat order on the symmetric group Sn, each B(n, k) has a unique maximal and a unique minimal element, and the poset B(n, k + 1) can be constructed from the set of maximal chains in B(n, k). Elias [1] has demonstrated a striking connection between the posets B(n, k) for k = 2 and the diagrammatics of Bott-Samelson bimodules in type A, providing significant motivation for the development of an analogous theory of higher Bruhat orders in other Cartan-Killing types, particularly for k = 2. In this paper we present a partial generalization to type B, complete up to k = 2, prove a direct analogue of the main theorem of Manin and Schechtman, and relate our construction to the weak Bruhat order and reduced expression graph for Weyl group Bn. Contents 1. Higher Bruhat orders in type A (cf. [2]) 1 2. Connection with Type A Weyl Groups 3 3. Construction in Type B 4 4. Connection with Type B Weyl Groups 9 5. Appendix 11 5.1. Proof of Lemma 14 11 5.2. Proof of Theorem 15 12 6. Acknowledgements 15 References 15

DoubleMLDeep: Estimation of Causal Effects with Multimodal Data

arXiv (Cornell University), Feb 1, 2024

This paper explores the use of unstructured, multimodal data, namely text and images, in causal i... more This paper explores the use of unstructured, multimodal data, namely text and images, in causal inference and treatment effect estimation. We propose a neural network architecture that is adapted to the double machine learning (DML) framework, specifically the partially linear model. An additional contribution of our paper is a new method to generate a semi-synthetic dataset which can be used to evaluate the performance of causal effect estimation in the presence of text and images as confounders. The proposed methods and architectures are evaluated on the semi-synthetic dataset and compared to standard approaches, highlighting the potential benefit of using text and images directly in causal studies. Our findings have implications for researchers and practitioners in economics, marketing, finance, medicine and data science in general who are interested in estimating causal quantities using non-traditional data.

Hedonic Prices and Quality Adjusted Price Indices Powered by AI

arXiv (Cornell University), Apr 28, 2023

Accurate, real-time measurements of price index changes using electronic records are essential fo... more Accurate, real-time measurements of price index changes using electronic records are essential for tracking inflation and productivity in today's economic environment. We develop empirical hedonic models that can process large amounts of unstructured product data (text, images, prices, quantities) and output accurate hedonic price estimates and derived indices. To accomplish this, we generate abstract product attributes, or "features," from text descriptions and images using deep neural networks, and then use these attributes to estimate the hedonic price function. Specifically, we convert textual information about the product to numeric features using large language models based on transformers, trained or fine-tuned using product descriptions, and convert the product image to numeric features using a residual network model. To produce the estimated hedonic price function, we again use a multi-task neural network trained to predict a product's price in all time periods simultaneously. To demonstrate the performance of this approach, we apply the models to Amazon's data for first-party apparel sales and estimate hedonic prices. The resulting models have high predictive accuracy, with R 2 ranging from 80% to 90%. Finally, we construct the AI-based hedonic Fisher price index, chained at the year-over-year frequency. We contrast the index with the CPI and other electronic indices.

A Resolution in Algorithmic Fairness: Calibrated Scores for Fair Classifications

ArXiv, 2020

Calibration and equal error rates are fundamental conditions for algorithmic fairness that have b... more Calibration and equal error rates are fundamental conditions for algorithmic fairness that have been shown to conflict with each other, suggesting that they cannot be satisfied simultaneously. This paper shows that the two are in fact compatible and presents a method for reconciling them. In particular, we derive necessary and sufficient conditions for the existence of calibrated scores that yield classifications achieving equal error rates. We then present an algorithm that searches for the most informative score subject to both calibration and minimal error rate disparity. Applied empirically to credit lending, our algorithm provides a solution that is more fair and profitable than a common alternative that omits sensitive features.

Multiple Randomization Designs: Estimation and Inference with Interference

arXiv (Cornell University), 2024

Classical designs of randomized experiments, going back to Fisher and Neyman in the 1930s still d... more Classical designs of randomized experiments, going back to Fisher and Neyman in the 1930s still dominate practice even in online experimentation. However, such designs are of limited value for answering standard questions in settings, common in marketplaces, where multiple populations of agents interact strategically, leading to complex patterns of spillover effects. In this paper, we discuss new experimental designs and corresponding estimands to account for and capture these complex spillovers. We derive the finite-sample properties of tractable estimators for main effects, direct effects, and spillovers, and present associated central limit theorems.

Synthetic Combinations: A Causal Inference Framework for Combinatorial Interventions

arXiv (Cornell University), Mar 24, 2023

Kernel Ridge Regression Inference

arXiv (Cornell University), Feb 13, 2023

Stability and Efficiency of Random Serial Dictatorship

arXiv (Cornell University), Oct 13, 2021

This paper establishes non-asymptotic convergence of the cutoffs in Random serial dictatorship me... more This paper establishes non-asymptotic convergence of the cutoffs in Random serial dictatorship mechanism (RSD), in an environment with many students, many schools, and arbitrary student preferences. Convergence is shown to hold when the number of schools, m, and the number of students, n, satisfy the relation m ln m ≪ n, and we provide an example showing that this result is sharp. We differ significantly from prior work in the mechanism design literature in our use of analytic tools from randomized algorithms and discrete probability, which allow us to show concentration of the RSD lottery probabilities and cutoffs even against adversarial student preferences.

Localization, Convexity, and Star Aggregation

arXiv (Cornell University), May 18, 2021

Offset Rademacher complexities have been shown to provide tight upper bounds for the square loss ... more Offset Rademacher complexities have been shown to provide tight upper bounds for the square loss in a broad class of problems including improper statistical learning and online learning. We show that the offset complexity can be generalized to any loss that satisfies a certain general convexity condition. Further, we show that this condition is closely related to both exponential concavity and self-concordance, unifying apparently disparate results. By a novel geometric argument, many of our bounds translate to improper learning in a non-convex class with Audibert's star algorithm. Thus, the offset complexity provides a versatile analytic tool that covers both convex empirical risk minimization and improper learning under entropy conditions. Applying the method, we recover the optimal rates for proper and improper learning with the p-loss for 1 < p < ∞, and show that improper variants of empirical risk minimization can attain fast rates for logistic regression and other generalized linear models.

Frank Wolfe Meets Metric Entropy

arXiv (Cornell University), May 17, 2022

The Frank-Wolfe algorithm has seen a resurgence in popularity due to its ability to efficiently s... more The Frank-Wolfe algorithm has seen a resurgence in popularity due to its ability to efficiently solve constrained optimization problems in machine learning and high-dimensional statistics. As such, there is much interest in establishing when the algorithm may possess a "linear" O(log(1/)) dimension-free iteration complexity comparable to projected gradient descent. In this paper, we provide a general technique for establishing domain specific and easy-to-estimate lower bounds for Frank-Wolfe and its variants using the metric entropy of the domain. Most notably, we show that a dimensionfree linear upper bound must fail not only in the worst case, but in the average case: for a Gaussian or spherical random polytope in R d with poly(d) vertices, Frank-Wolfe requires up toΩ(d) iterations to achieve a O(1/d) error bound, with high probability. We also establish this phenomenon for the nuclear norm ball. The link with metric entropy also has interesting positive implications for conditional gradient algorithms in statistics, such as gradient boosting and matching pursuit. In particular, we show that it is possible to extract fastdecaying upper bounds on the excess risk directly from an analysis of the underlying optimization procedure. Contents

Classification as Direction Recovery: Improved Guarantees via Scale Invariance

arXiv (Cornell University), May 17, 2022

Modern algorithms for binary classification rely on an intermediate regression problem for comput... more Modern algorithms for binary classification rely on an intermediate regression problem for computational tractability. In this paper, we establish a geometric distinction between classification and regression that allows risk in these two settings to be more precisely related. In particular, we note that classification risk depends only on the direction of the regressor, and we take advantage of this scale invariance to improve existing guarantees for how classification risk is bounded by the risk in the intermediate regression problem. Building on these guarantees, our analysis makes it possible to compare algorithms more accurately against each other and suggests viewing classification as unique from regression rather than a byproduct of it. While regression aims to converge toward the conditional expectation function in location, we propose that classification should instead aim to recover its direction.

Unambiguous Computation Suhas Vijaykumar Spring 2016 Overview

This survey paper is designed to review the literature on unambiguous computation. Broadly speaki... more This survey paper is designed to review the literature on unambiguous computation. Broadly speaking, studying unambiguous computation is an attempt to determine how much of the power of non-determinism comes from the ability to accept many di↵erent proofs of the same statement. The literature on this subject is of both practical and philosophical interest, and the Isolating lemma of Mulmuley, Vazirani and Vazirani [5] appears beautifully throughout. In the first first, we will define the notion of unambiguous computation and the associated complexity classes we will consider. In the second section, we’ll discuss unambiguous polynomial-time computation. Most of the results in this section are from the ‘80s—in particular we’ll discuss relationships between the complexity classes P, UP, and NP, and prove the Isolating lemma as well as the ValiantVazirani theorem. In the third and final section, we’ll discuss some of the more recent results about unambiguous log-space computation. In pa...

A Possibility in Algorithmic Fairness: Calibrated Scores for Fair Classifications

arXiv: Learning, 2020

Calibration and equal error rates are fundamental criteria of algorithmic fairness that have been... more Calibration and equal error rates are fundamental criteria of algorithmic fairness that have been shown to conflict with one another. This paper proves that they can be satisfied simultaneously in settings where decision-makers use risk scores to assign binary treatments. In particular, we derive necessary and sufficient conditions for the existence of calibrated scores that yield classifications achieving equal error rates. We then present an algorithm that searches for the most informative score subject to both calibration and minimal error rate disparity. Applied to a real criminal justice risk assessment, we show that our method can eliminate error disparities while maintaining calibration. In a separate application to credit lending, the procedure provides a solution that is more fair and profitable than a common alternative that omits sensitive features.

A Possibility in Algorithmic Fairness: Can Calibration and Equal Error Rates Be Reconciled?

Decision makers increasingly rely on algorithmic risk scores to determine access to binary treatm... more Decision makers increasingly rely on algorithmic risk scores to determine access to binary treatments including bail, loans, and medical interventions. In these settings, we reconcile two fairness criteria that were previously shown to be in conflict: calibration and error rate equality. In particular, we derive necessary and sufficient conditions for the existence of calibrated scores that yield classifications achieving equal error rates at any given group-blind threshold. We then present an algorithm that searches for the most accurate score subject to both calibration and minimal error rate disparity. Applied to the COMPAS criminal risk assessment tool, we show that our method can eliminate error disparities while maintaining calibration. In a separate application to credit lending, we compare our procedure to the omission of sensitive features and show that it raises both profit and the probability that creditworthy individuals receive loans. 2012 ACM Subject Classification Mat...

Higher Bruhat Orders in Type B

The Electronic Journal of Combinatorics

Motivated by the geometry of hyperplane arrangements, Manin and Schechtman defined for each integ... more Motivated by the geometry of hyperplane arrangements, Manin and Schechtman defined for each integer $n \geq 1$ a hierarchy of finite partially ordered sets $B(n, k),$ indexed by positive integers $k$, called the higher Bruhat orders. The poset $B(n, 1)$ is naturally identified with the weak left Bruhat order on the symmetric group $S_n$, each $B(n, k)$ has a unique maximal and a unique minimal element, and the poset $B(n, k + 1)$ can be constructed from the set of maximal chains in $B(n, k)$. Ben Elias has demonstrated a striking connection between the posets $B(n, k)$ for $k = 2$ and the diagrammatics of Bott-Samelson bimodules in type A, providing significant motivation for the development of an analogous theory of higher Bruhat orders in other Cartan-Killing types, particularly for $k = 2$. In this paper we present a partial generalization to type B, complete up to $k = 2$, prove a direct analogue of the main theorem of Manin and Schechtman, and relate our construction to the ...

Higher Bruhat Orders in Type B

The Electronic Journal of Combinatorics, Jul 22, 2016

Motivated by the geometry of certain hyperplane arrangements, Manin and Schechtman [2] defined fo... more Motivated by the geometry of certain hyperplane arrangements, Manin and Schechtman [2] defined for each integer n ≥ 1 a hierarchy of finite partially ordered sets B(n, k), indexed by positive integers k, called the higher Bruhat orders. The poset B(n, 1) is naturally identified with the weak left Bruhat order on the symmetric group Sn, each B(n, k) has a unique maximal and a unique minimal element, and the poset B(n, k + 1) can be constructed from the set of maximal chains in B(n, k). Elias [1] has demonstrated a striking connection between the posets B(n, k) for k = 2 and the diagrammatics of Bott-Samelson bimodules in type A, providing significant motivation for the development of an analogous theory of higher Bruhat orders in other Cartan-Killing types, particularly for k = 2. In this paper we present a partial generalization to type B, complete up to k = 2, prove a direct analogue of the main theorem of Manin and Schechtman, and relate our construction to the weak Bruhat order and reduced expression graph for Weyl group Bn. Contents 1. Higher Bruhat orders in type A (cf. [2]) 1 2. Connection with Type A Weyl Groups 3 3. Construction in Type B 4 4. Connection with Type B Weyl Groups 9 5. Appendix 11 5.1. Proof of Lemma 14 11 5.2. Proof of Theorem 15 12 6. Acknowledgements 15 References 15