Econometrics and Machine Learning: A Comprehensive
Comparison
Dr Merwan Roudane
July 24, 2024
Dr Merwan Roudane Econometrics and Machine Learning: A Comprehensive Comparison
July 24, 2024 1 / 45
Table of Contents
1 Leo Breiman’s Philosophy of Modeling
2 Econometrics and Machine Learning: Definitions and Types
3 Terminology: Econometrics vs. Machine Learning
4 Differences Between Econometrics and Machine Learning
5 Challenges and Limitations
6 What Econometrics Can Learn from Machine Learning
7 What Machine Learning Can Learn from Econometrics
8 Data Splitting in ML vs. Full Data Use in Econometrics
9 Combining Econometrics and Machine Learning
10 Research Problems: Econometrics vs. ML
11 Recent Developments
Leo Breiman’s Philosophy of Modeling
Leo Breiman’s “Two Cultures” in Statistical Modeling
Data Modeling Culture
Assumes a stochastic data model
Focus on parameter estimation and inference
Algorithmic Modeling Culture
Treats the data mechanism as unknown
Focus on predictive accuracy
Data Modeling Culture
Rooted in traditional statistics and econometrics
Assumes data is generated by a specific stochastic model
Goals:
Estimate model parameters
Test hypotheses
Make inferences about the population
Examples: Linear regression, logistic regression, time series models
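As an illustration of the data modeling goals listed above, the following minimal sketch estimates a linear regression by OLS and computes classical standard errors and t-statistics. The simulated data, the true coefficients (intercept 1, slope 2), and all variable names are assumptions made for the example, not part of the original slides.

```python
import numpy as np

# Simulated data under an assumed stochastic model: y = 1 + 2x + noise
rng = np.random.default_rng(0)
n = 500
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)

# Parameter estimation: OLS via least squares
X = np.column_stack([np.ones(n), x])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

# Inference: classical (homoskedastic) covariance matrix of the estimates
resid = y - X @ beta_hat
sigma2 = resid @ resid / (n - X.shape[1])
cov = sigma2 * np.linalg.inv(X.T @ X)
se = np.sqrt(np.diag(cov))

# Hypothesis test of H0: coefficient = 0
t_stats = beta_hat / se

print("estimates:", beta_hat)
print("std. errors:", se)
print("t-statistics:", t_stats)
```

The focus here is on the coefficients and their uncertainty rather than on how well the model predicts new observations.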
Algorithmic Modeling Culture
Emerged with the rise of machine learning and data science
Treats the data mechanism as a complex, unknown “black box”
Goals:
Achieve high predictive accuracy
Find a function that maps inputs to outputs
Examples: Random forests, neural networks, support vector machines
Implications of the Two Cultures
Different approaches to model validation
Trade-offs between interpretability and predictive power
Varying emphasis on theoretical foundations
Distinct perspectives on the role of domain knowledge
Ongoing debate about the most appropriate approach in different contexts
Econometrics and Machine Learning: Definitions and Types
Econometrics: Definition
Definition
Econometrics is the application of statistical methods to economic data to
give empirical content to economic relationships.
Combines economic theory, mathematics, and statistical inference
Aims to quantify economic relationships and test economic theories
Focuses on causal inference and parameter estimation
Types of Econometric Methods
Cross-sectional analysis
Time series analysis
Panel data methods
Instrumental variables estimation
Difference-in-differences
Regression discontinuity design
Structural equation modeling
Vector autoregression (VAR)
Machine Learning: Definition
Definition
Machine Learning is a field of study that gives computers the ability to
learn without being explicitly programmed.
Focuses on developing algorithms that can learn from and make
predictions or decisions based on data
Emphasizes predictive accuracy and pattern recognition
Often deals with high-dimensional and unstructured data
Types of Machine Learning Methods
Supervised Learning
Classification (e.g., logistic regression, decision trees)
Regression (e.g., linear regression, random forests)
Unsupervised Learning
Clustering (e.g., k-means, hierarchical clustering)
Dimensionality reduction (e.g., PCA, t-SNE)
Reinforcement Learning
Deep Learning (neural networks; a model family that can be used within any of the paradigms above)
Terminology: Econometrics vs. Machine Learning
Terminology Comparison
Concept          | Econometrics                  | Machine Learning
Output variable  | Dependent variable            | Label / Target
Input variables  | Independent variables         | Features / Predictors
Model fit        | R-squared, Adjusted R-squared | Performance metrics
Model assessment | Hypothesis testing, p-values  | Cross-validation
Error term       | Residual                      | Loss
Model            | Estimator                     | Learner / Algorithm
Terminology Differences: Implications
Reflects different focuses and philosophical approaches
Econometrics terminology emphasizes statistical inference
Machine learning terminology reflects focus on prediction and algorithm performance
Understanding both sets of terminology is crucial for interdisciplinary work
Bridging terminological gaps can lead to better integration of methods
Differences Between Econometrics and Machine Learning
Key Differences
Primary goals
Approach to model selection
Handling of high-dimensional data
Emphasis on interpretability
Treatment of causality
Theoretical foundations
Typical applications
Primary Goals
Econometrics         | Machine Learning
Causal inference     | Predictive accuracy
Parameter estimation | Pattern recognition
Hypothesis testing   | Automation
Policy evaluation    | Scalability
Approach to Model Selection
Econometrics                           | Machine Learning
Theory-driven                          | Data-driven
Emphasis on model interpretability     | Emphasis on predictive performance
Focus on unbiasedness and efficiency   | Use of cross-validation
Use of information criteria (AIC, BIC) | Ensemble methods and model averaging
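To make the contrast between the two selection approaches concrete, the sketch below chooses a polynomial degree once with information criteria (AIC, BIC) and once with 5-fold cross-validation. The simulated quadratic data, the candidate degrees, and the helper names (`design`, `info_criteria`, `cv_mse`) are assumptions for illustration; the criteria use the Gaussian linear-model form up to an additive constant.

```python
import numpy as np

# Assumed data-generating process: quadratic in x with unit-variance noise
rng = np.random.default_rng(1)
n = 200
x = rng.uniform(-2, 2, size=n)
y = 1.0 + 0.5 * x + 1.5 * x**2 + rng.normal(size=n)

def design(x, degree):
    """Polynomial design matrix with columns 1, x, ..., x^degree."""
    return np.vander(x, degree + 1, increasing=True)

def fit(X, y):
    return np.linalg.lstsq(X, y, rcond=None)[0]

def info_criteria(x, y, degree):
    """AIC and BIC for a Gaussian linear model (constants dropped)."""
    X = design(x, degree)
    rss = np.sum((y - X @ fit(X, y)) ** 2)
    k, n_obs = X.shape[1], len(y)
    aic = n_obs * np.log(rss / n_obs) + 2 * k
    bic = n_obs * np.log(rss / n_obs) + k * np.log(n_obs)
    return aic, bic

def cv_mse(x, y, degree, k_folds=5):
    """Average held-out mean squared error over k folds."""
    idx = np.random.default_rng(0).permutation(len(y))
    errors = []
    for test_idx in np.array_split(idx, k_folds):
        train_idx = np.setdiff1d(idx, test_idx)
        coef = fit(design(x[train_idx], degree), y[train_idx])
        pred = design(x[test_idx], degree) @ coef
        errors.append(np.mean((y[test_idx] - pred) ** 2))
    return float(np.mean(errors))

degrees = range(1, 7)
best_aic = min(degrees, key=lambda d: info_criteria(x, y, d)[0])
best_bic = min(degrees, key=lambda d: info_criteria(x, y, d)[1])
best_cv = min(degrees, key=lambda d: cv_mse(x, y, d))
print("degree chosen by AIC:", best_aic)
print("degree chosen by BIC:", best_bic)
print("degree chosen by 5-fold CV:", best_cv)
```

The information criteria penalize in-sample fit analytically, while cross-validation measures held-out error directly; on a given sample the two need not pick the same degree.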
Handling of High-Dimensional Data
Econometrics                                         | Machine Learning
Traditional focus on low-dimensional data            | Designed to handle high-dimensional data
Instrumental variables for endogeneity               | Feature selection and dimensionality reduction
Recent developments in high-dimensional econometrics | Regularization techniques (Lasso, Ridge)
Emphasis on Interpretability
Econometrics                                | Machine Learning
High emphasis on interpretability           | Often sacrifices interpretability for predictive power
Focus on marginal effects and elasticities  | “Black box” models common
Importance of economic significance         | Recent focus on interpretable ML
Treatment of Causality
Econometrics                          | Machine Learning
Central focus on causal relationships | Traditionally focused on correlation, not causation
Extensive toolbox for causal inference | Recent developments in causal ML
Emphasis on identifying assumptions   | Challenges with high-dimensional causality
Theoretical Foundations
Econometrics                      | Machine Learning
Grounded in economic theory       | Rooted in computer science and optimization
Strong statistical foundations    | Focus on computational efficiency
Emphasis on asymptotic properties | Emphasis on empirical performance
Typical Applications
Econometrics              | Machine Learning
Policy evaluation         | Image and speech recognition
Demand estimation         | Recommender systems
Macroeconomic forecasting | Fraud detection
Labor market analysis     | Autonomous vehicles
Challenges and Limitations
Challenges in Econometrics
Endogeneity and identification issues
Model misspecification
Limited external validity
Difficulty handling very large datasets
Assumptions of linearity and normality
Interpretability vs. complexity trade-off
Limitations of Econometrics
Often relies on strong assumptions
May struggle with high-dimensional data
Can be computationally intensive for large datasets
Limited flexibility in modeling complex, non-linear relationships
May overfit in small samples if model is too complex
Challenges in handling unstructured data (text, images)
Challenges in Machine Learning
Lack of causal interpretation
Overfitting and poor generalization
Black-box nature of complex models
Data quality and bias
Computational intensity
Difficulty in quantifying uncertainty
Limited theoretical guarantees
Limitations of Machine Learning
Often lacks clear causal interpretation
May capture spurious correlations
Requires large amounts of data for complex models
Can be sensitive to distribution shifts
May struggle with small, nuanced datasets
Limited ability to incorporate domain knowledge
Challenges in model interpretability and explainability
What Econometrics Can Learn from Machine Learning
Lessons from Machine Learning for Econometrics
Cross-validation for model selection and evaluation
Regularization techniques for high-dimensional data
Ensemble methods for improved prediction
Flexible modeling of non-linear relationships
Handling of unstructured data (text, images)
Scalable algorithms for big data
Cross-validation in Econometrics
Provides a more robust measure of out-of-sample performance
Helps mitigate overfitting
Can be used alongside traditional model selection criteria
Challenges:
Maintaining temporal structure in time series data
Accounting for clustered or hierarchical data structures
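One way to respect temporal structure is an expanding-window (rolling-origin) evaluation, in which each forecast uses only observations that precede it. The sketch below applies this to a simulated AR(1) series; the coefficient 0.7, the window size, and the function names are assumptions for illustration.

```python
import numpy as np

# Simulated AR(1) series: y_t = phi * y_{t-1} + noise (phi is an assumption)
rng = np.random.default_rng(2)
T = 300
phi = 0.7
y = np.zeros(T)
for t in range(1, T):
    y[t] = phi * y[t - 1] + rng.normal()

def fit_ar1(series):
    """OLS slope of y_t on y_{t-1}, without intercept."""
    lagged, current = series[:-1], series[1:]
    return float(lagged @ current / (lagged @ lagged))

def expanding_window_mse(series, first_train_size):
    """One-step-ahead squared errors, refitting on all data before each target."""
    errors = []
    for end in range(first_train_size, len(series)):
        phi_hat = fit_ar1(series[:end])        # uses observations 0..end-1 only
        forecast = phi_hat * series[end - 1]   # forecast for observation `end`
        errors.append((series[end] - forecast) ** 2)
    return float(np.mean(errors))

mse_ar1 = expanding_window_mse(y, first_train_size=100)
mse_zero = float(np.mean(y[100:] ** 2))  # benchmark: always forecast zero

print("expanding-window MSE, AR(1):", mse_ar1)
print("expanding-window MSE, zero forecast:", mse_zero)
```

A randomly shuffled k-fold split would instead let the model train on observations that come after the ones it is evaluated on, which tends to overstate out-of-sample performance for dependent data.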
Regularization in Econometrics
Useful for high-dimensional problems (e.g., many covariates)
Lasso: Can perform variable selection
Ridge: Handles multicollinearity
Elastic Net: Combines Lasso and Ridge
Applications:
Selecting instrumental variables
Estimating treatment effects with many controls
High-dimensional fixed effects models
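The difference between Ridge shrinkage and Lasso variable selection can be seen in a small simulation. The sketch below implements Ridge in closed form and Lasso by coordinate descent with soft-thresholding; the sparse true coefficient vector, the penalty values, and the function names are assumptions chosen for illustration, not tuned recommendations.

```python
import numpy as np

# Assumed sparse design: only the first 3 of 20 covariates matter
rng = np.random.default_rng(3)
n, p = 200, 20
X = rng.normal(size=(n, p))
true_beta = np.zeros(p)
true_beta[:3] = [3.0, -2.0, 1.5]
y = X @ true_beta + rng.normal(size=n)

def ridge(X, y, lam):
    """Closed-form Ridge: minimizes ||y - Xb||^2 + lam * ||b||^2."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

def soft_threshold(z, lam):
    return np.sign(z) * max(abs(z) - lam, 0.0)

def lasso(X, y, lam, n_iter=200):
    """Coordinate descent for (1/2n)||y - Xb||^2 + lam * ||b||_1."""
    n_obs, n_cols = X.shape
    beta = np.zeros(n_cols)
    col_scale = (X ** 2).sum(axis=0) / n_obs
    for _ in range(n_iter):
        for j in range(n_cols):
            # Residual with coordinate j's contribution added back
            partial_resid = y - X @ beta + X[:, j] * beta[j]
            rho = X[:, j] @ partial_resid / n_obs
            beta[j] = soft_threshold(rho, lam) / col_scale[j]
    return beta

beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]
beta_ridge = ridge(X, y, lam=10.0)
beta_lasso = lasso(X, y, lam=0.3)
selected = np.flatnonzero(beta_lasso)

print("covariates kept by Lasso:", selected)
print("norm of OLS vs Ridge coefficients:",
      np.linalg.norm(beta_ols), np.linalg.norm(beta_ridge))
```

Ridge shrinks every coefficient toward zero without setting any exactly to zero, while Lasso sets some coefficients exactly to zero. In practice the penalty would usually be chosen by cross-validation or a theory-based rule.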
Ensemble Methods in Econometrics
Can improve prediction accuracy
Examples:
Bagging for more stable estimates
Random forests for non-linear relationships
Boosting for iterative improvements
Challenges:
Maintaining interpretability
Incorporating economic theory
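Bagging can be sketched with a very simple base learner. The example below fits regression stumps (single-split trees) on bootstrap resamples and averages their predictions; the simulated sine-shaped data, the number of resamples, and the function names are assumptions for illustration. Whether bagging improves accuracy depends on the base learner and the data, so the sketch only shows the mechanics.

```python
import numpy as np

# Assumed non-linear relationship: y = sin(x) + noise
rng = np.random.default_rng(6)
n = 200
x = rng.uniform(0, 2 * np.pi, size=n)
y = np.sin(x) + rng.normal(scale=0.3, size=n)
x_grid = np.linspace(0, 2 * np.pi, 100)

def fit_stump(x, y):
    """Find the single split that minimizes the sum of squared errors."""
    best = None
    for thr in np.unique(x)[:-1]:  # exclude the max so both sides are non-empty
        left, right = y[x <= thr], y[x > thr]
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if best is None or sse < best[0]:
            best = (sse, thr, left.mean(), right.mean())
    return best[1:]  # (threshold, left mean, right mean)

def predict_stump(stump, x_new):
    thr, left_mean, right_mean = stump
    return np.where(x_new <= thr, left_mean, right_mean)

def bagged_predict(x, y, x_new, n_models=50):
    """Average the predictions of stumps fitted on bootstrap resamples."""
    preds = []
    for _ in range(n_models):
        idx = rng.integers(0, len(y), size=len(y))  # sample with replacement
        preds.append(predict_stump(fit_stump(x[idx], y[idx]), x_new))
    return np.mean(preds, axis=0)

single_pred = predict_stump(fit_stump(x, y), x_grid)
bagged_pred = bagged_predict(x, y, x_grid)

print("distinct values, single stump:", len(np.unique(single_pred)))
print("distinct values, bagged stumps:", len(np.unique(bagged_pred)))
```

A single stump predicts only two values, while the bagged average is a smoother step function. The averaged model no longer has a single split point to report, which illustrates the interpretability cost noted above.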
What Machine Learning Can Learn from Econometrics
Lessons from Econometrics for Machine Learning
Causal inference frameworks
Treatment of endogeneity
Incorporation of domain knowledge
Emphasis on model interpretability
Rigorous statistical inference
Handling of panel and time series data
Causal Inference in Machine Learning
Moving beyond prediction to causal relationships
Adapting econometric tools for causal ML:
Instrumental variables
Difference-in-differences
Regression discontinuity
Challenges:
Maintaining flexibility of ML models
Scaling causal inference to high dimensions
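Why prediction alone is not enough for causal questions can be shown with instrumental variables. In the sketch below an unobserved confounder makes the regressor endogenous, so OLS is biased, while manual two-stage least squares recovers the assumed true effect. The data-generating process (true effect 1.0, instrument strength 0.8) and all names are assumptions for illustration.

```python
import numpy as np

# Assumed data-generating process with an unobserved confounder u
rng = np.random.default_rng(5)
n = 5000
z = rng.normal(size=n)                        # instrument: affects x, not y directly
u = rng.normal(size=n)                        # unobserved confounder
x = 0.8 * z + u + rng.normal(size=n)          # endogenous regressor
y = 1.0 * x + 2.0 * u + rng.normal(size=n)    # true causal effect of x is 1.0

def ols(X, y):
    return np.linalg.lstsq(X, y, rcond=None)[0]

const = np.ones(n)

# Naive OLS: biased because x is correlated with the omitted u
beta_ols = ols(np.column_stack([const, x]), y)[1]

# Stage 1: project x on the instrument
Z = np.column_stack([const, z])
x_hat = Z @ ols(Z, x)

# Stage 2: regress y on the fitted values from stage 1
beta_iv = ols(np.column_stack([const, x_hat]), y)[1]

print("OLS estimate:", beta_ols)
print("2SLS estimate:", beta_iv)
```

Only the point estimate is computed here. Standard errors taken from a manual second-stage regression are not valid, so applied work would rely on a dedicated IV routine for inference.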
Incorporating Domain Knowledge in ML
Econometrics often relies heavily on domain expertise
Ways to incorporate domain knowledge in ML:
Feature engineering based on theory
Constrained optimization respecting economic laws
Transfer learning from theoretical models
Bayesian priors informed by economic intuition
Benefits:
Improved interpretability
Better generalization to out-of-sample scenarios
Alignment with existing theoretical frameworks
Data Splitting in ML vs. Full Data Use in Econometrics
Data Splitting Philosophy
Econometrics (Full Data Use)          | Machine Learning (Data Splitting)
Maximizes statistical power           | Estimates out-of-sample performance
Focuses on in-sample fit and inference | Mitigates overfitting
Relies on asymptotic theory           | Validates model generalizability
Rationale Behind Data Splitting
Provides an approximately unbiased estimate of model performance on new data, as long as the test set is not used for fitting or tuning
Helps detect and prevent overfitting
Allows for model selection and hyperparameter tuning
Mimics real-world scenario of predicting on unseen data
Essential for complex, flexible models prone to overfitting
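The overfitting point can be illustrated by comparing training and held-out error for a rigid and a flexible model. The sketch below fits polynomials of degree 3 and 12 to a small simulated training sample; the data-generating process, the sample sizes, and the degrees are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(4)

def simulate(n):
    """Assumed data-generating process: y = sin(3x) + noise."""
    x = rng.uniform(-1, 1, size=n)
    y = np.sin(3 * x) + rng.normal(scale=0.3, size=n)
    return x, y

x_train, y_train = simulate(30)     # small training sample
x_test, y_test = simulate(1000)     # held-out data the model never sees

def poly_mse(degree):
    """Fit a polynomial on the training set; return (train MSE, test MSE)."""
    X_train = np.vander(x_train, degree + 1, increasing=True)
    X_test = np.vander(x_test, degree + 1, increasing=True)
    coef = np.linalg.lstsq(X_train, y_train, rcond=None)[0]
    train_mse = float(np.mean((y_train - X_train @ coef) ** 2))
    test_mse = float(np.mean((y_test - X_test @ coef) ** 2))
    return train_mse, test_mse

for degree in (3, 12):
    train_mse, test_mse = poly_mse(degree)
    print(f"degree={degree:2d}  train MSE={train_mse:.3f}  test MSE={test_mse:.3f}")
```

The more flexible model always fits the training sample at least as well, so in-sample fit alone cannot reveal overfitting; the held-out error is what exposes it.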
How to Split Data
Common split ratios: 70-30, 80-20 (train-test)
Cross-validation: k-fold, leave-one-out
Stratified sampling for imbalanced datasets
Time-based splitting for time series data
Nested cross-validation for model selection and evaluation
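The splitting schemes above can be sketched as index generators using only the Python standard library. The function names, default fractions, and seeds are hypothetical choices for illustration; libraries such as scikit-learn provide more complete versions.

```python
import random

def train_test_split_indices(n, test_fraction=0.2, seed=0):
    """Random train-test split (e.g. 80-20) returning two index lists."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    n_test = int(round(n * test_fraction))
    return indices[n_test:], indices[:n_test]

def k_fold_indices(n, k=5, seed=0):
    """Yield (train, test) index lists; each observation is tested exactly once."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, folds[i]

def time_based_split(n, test_fraction=0.2):
    """Chronological split: the most recent observations form the test set."""
    cut = n - int(round(n * test_fraction))
    return list(range(cut)), list(range(cut, n))

train, test = train_test_split_indices(100)
print("random split sizes:", len(train), len(test))

for fold_number, (fold_train, fold_test) in enumerate(k_fold_indices(10, k=5)):
    print("fold", fold_number, "test indices:", sorted(fold_test))

ts_train, ts_test = time_based_split(100)
print("time-based split: train ends at", ts_train[-1], "test starts at", ts_test[0])
```

Setting `k` equal to `n` in `k_fold_indices` gives leave-one-out cross-validation. Stratified and nested schemes are not covered by this sketch.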
Limitations of Data Splitting
Reduced sample size for model estimation
Potential loss of statistical power
May not work well for small datasets
Can be sensitive to specific random splits
Challenges with non-i.i.d. data (e.g., time series, spatial data)
May not capture long-term or rare events in test set
When Data Splitting is Useful
Large datasets where statistical power is not an issue
Complex models with many parameters
When the primary goal is predictive performance
In presence of potential overfitting
When assessing model generalizability is crucial
For model comparison and selection
Combining Econometrics and Machine Learning
Hybrid Approaches
Double Machine Learning for Treatment Effects
Causal Forests
Neural Network-based Instrumental Variables
High-dimensional Econometrics with ML Feature Selection
ML for Heterogeneous Treatment Effects
Synthetic Control Methods with ML
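As an illustration of the first item, the sketch below implements a simplified version of double machine learning for a partially linear model: the outcome and the treatment are each predicted from the controls on held-out folds (cross-fitting), and the treatment effect is estimated from the residuals. The simulated data, the true effect of 1.0, and the use of a ridge-penalized polynomial as the nuisance learner are all assumptions; any flexible learner could be swapped in for `fit_predict`.

```python
import numpy as np

# Assumed partially linear model: y = theta * d + g(x) + e,  d = m(x) + v
rng = np.random.default_rng(7)
n = 2000
x = rng.uniform(-2, 2, size=(n, 1))
d = 0.5 * x[:, 0] ** 2 + rng.normal(size=n)       # treatment depends on x non-linearly
theta = 1.0                                        # true treatment effect (assumption)
y = theta * d + np.exp(x[:, 0]) + rng.normal(size=n)

def features(x, degree=5):
    return np.vander(x[:, 0], degree + 1, increasing=True)

def fit_predict(x_train, t_train, x_test, lam=1e-3):
    """Stand-in nuisance learner: ridge regression on polynomial features."""
    F_train, F_test = features(x_train), features(x_test)
    penalty = lam * np.eye(F_train.shape[1])
    coef = np.linalg.solve(F_train.T @ F_train + penalty, F_train.T @ t_train)
    return F_test @ coef

def double_ml(x, d, y, k_folds=2, seed=0):
    """Cross-fitted partialling-out estimate of theta with a standard error."""
    n_obs = len(y)
    idx = np.random.default_rng(seed).permutation(n_obs)
    d_res, y_res = np.empty(n_obs), np.empty(n_obs)
    for test_idx in np.array_split(idx, k_folds):
        train_idx = np.setdiff1d(idx, test_idx)
        # Nuisance functions are fitted on one fold and applied to the other
        d_res[test_idx] = d[test_idx] - fit_predict(x[train_idx], d[train_idx], x[test_idx])
        y_res[test_idx] = y[test_idx] - fit_predict(x[train_idx], y[train_idx], x[test_idx])
    theta_hat = d_res @ y_res / (d_res @ d_res)
    psi = (y_res - theta_hat * d_res) * d_res
    se = np.sqrt(np.mean(psi ** 2) / np.mean(d_res ** 2) ** 2 / n_obs)
    return float(theta_hat), float(se)

# Benchmark: OLS that controls for x only linearly
X_naive = np.column_stack([np.ones(n), d, x[:, 0]])
theta_naive = float(np.linalg.lstsq(X_naive, y, rcond=None)[0][1])

theta_hat, se = double_ml(x, d, y)
print("OLS with linear control:", theta_naive)
print("double ML estimate:", theta_hat, "+/-", 1.96 * se)
```

The linear-control benchmark is biased here because the confounding runs through non-linear functions of x, whereas the residual-on-residual estimate removes that part flexibly before estimating the effect.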
Applications of Combined Approaches
Policy Evaluation
Consumer Demand Estimation
Labor Market Analysis
Financial Risk Assessment
Macroeconomic Forecasting
Text-based Economic Indicators
Research Problems: Econometrics vs. ML
Problems Better Suited for Econometrics
Causal impact of minimum wage on employment
Effect of education on earnings
Evaluating the effectiveness of a new economic policy
Estimating price elasticity of demand
Analyzing the determinants of economic growth
Problems Better Suited for Machine Learning
Predicting consumer purchasing behavior
Credit scoring and fraud detection
Stock price prediction
Sentiment analysis of economic news
Clustering countries based on economic indicators
Recent Developments
Recent Developments in Econometrics
Machine Learning for Heterogeneous Treatment Effects
Synthetic Control Methods
High-dimensional Econometrics
Econometrics of Networks
Structural Estimation with Deep Learning
Text Analysis in Economics
Recent Developments in Machine Learning for Economics
Causal Machine Learning
Interpretable and Explainable AI
Reinforcement Learning for Economic Decision Making
Federated Learning for Privacy-preserving Analysis
Automated Machine Learning (AutoML)
Transfer Learning in Economics
Conclusion
Econometrics and Machine Learning have distinct strengths
Increasing convergence and cross-pollination of ideas
Hybrid approaches leverage the best of both worlds
Future research likely to further integrate these fields
Importance of understanding both paradigms for modern data
analysis in economics
References
Breiman, L. (2001). Statistical modeling: The two cultures. Statistical Science, 16(3), 199-231.
Wooldridge, J. M. (2010). Econometric Analysis of Cross Section and Panel Data. MIT Press.
Hastie, T., Tibshirani, R., & Friedman, J. (2009). The Elements of Statistical Learning. Springer.
Varian, H. R. (2014). Big data: New tricks for econometrics. Journal of Economic Perspectives, 28(2), 3-28.
Athey, S., & Imbens, G. W. (2019). Machine learning methods that economists should know about. Annual Review of Economics, 11, 685-725.
Mullainathan, S., & Spiess, J. (2017). Machine learning: An applied econometric approach. Journal of Economic Perspectives, 31(2), 87-106.
Thank You
Questions? Comments?