
Data Science Methods in Finance

R Tutorial 3

November 8, 2024

Important Instructions
• The purpose of this tutorial is for you to practise some of the key concepts we covered in the first topic.

• It should not be submitted, but we strongly encourage you to work through it.

For this exercise, NO write-up of your answers or submission is required. However, we recommend that you begin developing clean programs that you can reuse later in the group assignment and the take-home exam.

Question 1
The task in this question is to predict diamond prices. The main learning goal is to begin familiarizing you with the caret package, which contains functions to streamline the model training process for complex regression and classification problems. This package alone is often all you need for solving almost any supervised machine learning problem; in addition, it provides tools for auxiliary techniques such as:

• Data preparation (imputation, centering/scaling data, etc.)

• Data splitting

• Variable selection

• Model evaluation

1. Load the ggplot2 package and summarize the diamonds dataset (a dataset that comes built in with ggplot2), which contains the prices and other attributes of almost 54,000 diamonds. Split the data into a training and a test sample.
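A minimal sketch of this step, assuming the ggplot2 and caret packages are installed (the 80/20 split and the seed value are arbitrary choices):

```r
library(ggplot2)  # provides the built-in diamonds dataset
library(caret)

summary(diamonds)  # ~54,000 rows: price, carat, cut, color, clarity, ...

set.seed(123)  # arbitrary seed, for reproducibility
train_idx <- createDataPartition(diamonds$price, p = 0.8, list = FALSE)
train <- diamonds[train_idx, ]
test  <- diamonds[-train_idx, ]
```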

2. Estimate a linear model with lm and compare the RMSE in the test sample with the
RMSE based on the training sample.
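One way to sketch this comparison (the split repeats step 1 so the snippet runs on its own; the rmse helper is our own function, not part of caret):

```r
library(ggplot2); library(caret)
set.seed(123)
idx   <- createDataPartition(diamonds$price, p = 0.8, list = FALSE)
train <- diamonds[idx, ];  test <- diamonds[-idx, ]

fit_lm <- lm(price ~ ., data = train)

rmse <- function(actual, predicted) sqrt(mean((actual - predicted)^2))
rmse(train$price, predict(fit_lm, newdata = train))  # in-sample RMSE
rmse(test$price,  predict(fit_lm, newdata = test))   # out-of-sample RMSE
```

With a well-specified linear model the two numbers are usually close, with the test RMSE slightly higher.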

3. Fit a Lasso to the training sample by using the glmnet package. Compare the RMSE
with the RMSE from the linear model for both the training and testing sample.
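A sketch using cv.glmnet, which expects a numeric design matrix rather than a formula; lambda is chosen by the package's built-in cross-validation:

```r
library(ggplot2); library(caret); library(glmnet)
set.seed(123)
idx   <- createDataPartition(diamonds$price, p = 0.8, list = FALSE)
train <- diamonds[idx, ];  test <- diamonds[-idx, ]

# glmnet needs a numeric matrix; model.matrix expands the factor columns
x_train <- model.matrix(price ~ ., data = train)[, -1]
x_test  <- model.matrix(price ~ ., data = test)[, -1]

cv_lasso <- cv.glmnet(x_train, train$price, alpha = 1)  # alpha = 1 -> Lasso

rmse <- function(a, p) sqrt(mean((a - p)^2))
rmse(train$price, predict(cv_lasso, newx = x_train, s = "lambda.min"))
rmse(test$price,  predict(cv_lasso, newx = x_test,  s = "lambda.min"))
```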

4. We now want to repeat the exercise with the caret package. Begin by googling "trainControl caret". The function trainControl generates parameters that further control how models are created. Initialize a trainControl object that trains a model using 5-fold cross-validation.
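This step amounts to a single call:

```r
library(caret)

# 5-fold cross-validation; trainControl only stores the settings,
# the actual resampling happens later inside train()
ctrl <- trainControl(method = "cv", number = 5)
```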

5. Train an elastic-net model and set tuneLength=10. Use the trainControl object and
compare the RMSE with that from the linear model.
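A possible sketch, subsampling the training data to keep the run time short (the 5,000-row subsample is our own shortcut, not part of the exercise):

```r
library(ggplot2); library(caret)
set.seed(123)
idx   <- createDataPartition(diamonds$price, p = 0.8, list = FALSE)
train <- diamonds[idx, ]
train_small <- train[sample(nrow(train), 5000), ]  # shortcut for speed

ctrl <- trainControl(method = "cv", number = 5)
fit_enet <- train(price ~ ., data = train_small, method = "glmnet",
                  trControl = ctrl, tuneLength = 10)

fit_enet$bestTune           # chosen (alpha, lambda) pair
min(fit_enet$results$RMSE)  # best CV RMSE, to compare with the lm RMSE
```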

6. Explain how glmnet can fit both the Lasso and Ridge regressions, which you control via the alpha parameter: setting alpha = 0 gives Ridge regression, while setting alpha = 1 gives Lasso regression.

Question 2
The main task of this question is to forecast the return on the US stock market. To do so, you will need to use time-series cross-validation and the supervised learning methods we covered in class. With your permission, we plan to post the two best codes with output on Canvas.
The following steps are meant to help you get started.

1. We will use the datafile "[Link]"

2. You are free to choose any horizon, but we will use the quarterly horizon in the tutorial

3. The target you want to predict is the variable called "CRSP_SPvw" and the set of predictors are the ones in the Excel sheet. You are free to choose which ones to include, but use at least 5, and make sure to lag the predictors by one period (or lead the target)

4. Use the readxl library to get your data into R and clean it up with dplyr. Using dplyr together with readxl is part of the learning goal. The R file on Canvas, "02_start", provides an example of how to align data.
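The lag-and-align pattern can be sketched as follows. The actual file name comes from Canvas, so the snippet builds a toy tibble instead; the predictor names dp and tbl are hypothetical placeholders, and only CRSP_SPvw is from the exercise:

```r
library(dplyr)
library(tibble)

# In practice: raw <- readxl::read_excel(<file from Canvas>)
# Toy quarterly data with the given target and two hypothetical predictors:
raw <- tibble(
  date      = seq(as.Date("2000-01-01"), by = "quarter", length.out = 8),
  CRSP_SPvw = rnorm(8, 0.02, 0.08),
  dp        = rnorm(8),  # hypothetical predictor
  tbl       = rnorm(8)   # hypothetical predictor
)

df <- raw %>%
  arrange(date) %>%
  mutate(across(c(dp, tbl), lag)) %>%  # lag predictors by one period
  filter(!is.na(dp))                   # drop the first row created by the lag
```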

After having cleaned the data, you should write a program that produces the following output (but feel free to do much more):

5. Write a program that takes predictors as input and predicts the return on the market next period (feel free to play around with different forecast horizons)
6. Calculate the out-of-sample R² over the full period

7. Calculate the out-of-sample R² on a rolling basis

8. Compare the performance of your best model with that of a rolling mean. A rolling mean means that you take the average of the target over the period from time t to T and use that to predict the value of the target for period T + k.
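The out-of-sample R² and the rolling-mean benchmark from steps 6-8 can be sketched as follows. In your program, actual and preds would come from your forecasting loop; here they are simulated so the snippet runs on its own:

```r
set.seed(1)
actual <- rnorm(59, 0.02, 0.08)        # stand-in for realized returns
preds  <- actual + rnorm(59, 0, 0.06)  # stand-in for model forecasts

# Rolling-mean benchmark: expanding average of past observations
bench <- c(NA, cumsum(actual)[-length(actual)] / seq_len(length(actual) - 1))

ok <- !is.na(bench)
r2_oos <- 1 - sum((actual[ok] - preds[ok])^2) /
              sum((actual[ok] - bench[ok])^2)
r2_oos  # > 0 means the model beats the rolling-mean benchmark
```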

Solution Q1.6
The glmnet package in R provides tools for fitting generalized linear models using penalized
maximum likelihood estimation. It supports ridge regression, lasso regression, and elastic-net
regression.
The general formulation of the regression problem solved by glmnet, with predictor matrix X = [x_1, x_2, \dots, x_n], response variable y, and regularization parameters \lambda and \alpha, is:

\min_{\beta} \; \frac{1}{2N} \|y - X\beta\|_2^2 + \lambda \left[ (1-\alpha) \frac{1}{2} \|\beta\|_2^2 + \alpha \|\beta\|_1 \right],

where:

• \frac{1}{2N} \|y - X\beta\|_2^2 represents the mean squared error (MSE) scaled by \frac{1}{2}. This scaling is commonly used to simplify the mathematical derivation of the optimization problem.

• \|\beta\|_2^2 represents the squared L2-norm of the coefficients \beta, known as the ridge penalty term.

• \|\beta\|_1 represents the L1-norm of the coefficients \beta, known as the lasso penalty term.

• N is the number of observations.

• λ is the regularization parameter controlling the amount of shrinkage: larger values of λ result in greater shrinkage.

• α is the elastic-net mixing parameter, where 0 ≤ α ≤ 1:

– α = 1: The penalty is purely the lasso penalty.

– α = 0: The penalty is purely the ridge penalty.

– 0 < α < 1: The penalty combines lasso and ridge penalties, forming the elastic net
penalty.

The parameter λ is typically chosen through cross-validation.


For specific cases:

• Setting α = 1 in the glmnet function fits a lasso model, minimizing:

\min_{\beta} \; \frac{1}{2N} \|y - X\beta\|_2^2 + \lambda \|\beta\|_1 .

• Setting α = 0 fits a ridge regression model, minimizing:

\min_{\beta} \; \frac{1}{2N} \|y - X\beta\|_2^2 + \frac{\lambda}{2} \|\beta\|_2^2 .

Lasso regression performs feature selection by setting some coefficients exactly to zero.
Ridge regression, while shrinking coefficients, retains all predictors in the model.
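A small simulated illustration of this difference (the data-generating process and the lambda value are arbitrary choices):

```r
library(glmnet)
set.seed(1)
x <- matrix(rnorm(200 * 10), 200, 10)
y <- 2 * x[, 1] + rnorm(200)  # only the first predictor matters

lasso <- glmnet(x, y, alpha = 1, lambda = 0.5)  # L1 penalty
ridge <- glmnet(x, y, alpha = 0, lambda = 0.5)  # L2 penalty

sum(coef(lasso)[-1] == 0)  # Lasso: most coefficients exactly zero
sum(coef(ridge)[-1] == 0)  # Ridge: coefficients shrunk but nonzero
```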
