IIMT2641
Introduction to Business Analytics
Tutorial 03 – Introduction to R (II)
Linear Regression - NBA
Functions
• A function takes in data, processes it, and returns a
result or accomplishes a specific task.
• A function is generally called as funcname(input)
• Some basic functions useful for summarizing data:
– length() : length of a vector (number of elements)
– min() : minimum value
– max() : maximum value
– range() : minimum and maximum of the data
– mean() : mean
– sd() : standard deviation
– sum() : sum
The University of Hong Kong 3
• Create a vector
– temp <- c(35, 23, 29, 31, 28, 27)
• length(temp)
[1] 6
• min(temp)
[1] 23
• max(temp)
[1] 35
• range(temp)
[1] 23 35
• mean(temp)
[1] 28.83333
• sd(temp)
[1] 4.020779
• sum(temp)
[1] 173
• In RStudio, type “?<function name>” to look up the
documentation of a function, e.g. ?mean
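As a small check on what sd() computes, the sketch below rebuilds the sample standard deviation of temp from sum(), mean() and length(). The names n and manual_sd are illustrative only and do not come from the tutorial.

```r
# Same vector as in the tutorial
temp <- c(35, 23, 29, 31, 28, 27)

n <- length(temp)                      # number of elements

# Sample standard deviation: squared deviations divided by n - 1
manual_sd <- sqrt(sum((temp - mean(temp))^2) / (n - 1))

manual_sd                              # should agree with sd(temp)
all.equal(manual_sd, sd(temp))         # TRUE if the two values match

# range() returns both extremes; diff() turns them into a single spread
spread <- diff(range(temp))            # max(temp) - min(temp)
spread
```

Note that sd() divides by n - 1 (sample standard deviation), not by n.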
Linear Regression - NBA
Read in the data
> NBA = read.csv("NBA_train.csv")
> str(NBA)
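NBA_train.csv is not reproduced here, so the sketch below uses a tiny made-up data frame to show what read.csv() and str() do, plus a few other common inspection functions. The values are invented, the Team column is an assumption, and toy, toy_in and path are illustrative names.

```r
# Made-up rows standing in for NBA_train.csv (values are NOT real data)
toy <- data.frame(
  Team   = c("A", "B", "C"),
  W      = c(50, 41, 30),
  PTS    = c(8500, 8300, 8100),
  oppPTS = c(8200, 8300, 8400)
)

# Round-trip through a temporary CSV file to mimic read.csv()
path <- tempfile(fileext = ".csv")
write.csv(toy, path, row.names = FALSE)
toy_in <- read.csv(path)

str(toy_in)       # column names, types and first values
head(toy_in, 2)   # first two rows
summary(toy_in)   # per-column summaries
nrow(toy_in)      # number of observations (rows)
```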
Playoffs and Wins
# Compute Points Difference
> NBA$PTSdiff = NBA$PTS - NBA$oppPTS
# Check for linear relationship
> plot(NBA$PTSdiff, NBA$W)
# Linear regression model for wins
> WinsReg = lm(W ~ PTSdiff, data=NBA)
> summary(WinsReg)
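One way to use a fitted line such as W ~ PTSdiff is to read off its coefficients and solve for the points difference that corresponds to a target number of wins. The sketch below does this on made-up data, not the NBA data set. The names toy, fit, target_wins and needed_diff, and the target of 45 wins, are assumptions for illustration.

```r
set.seed(1)  # reproducible made-up data

# Hypothetical seasons: wins depend roughly linearly on points difference
toy <- data.frame(PTSdiff = round(rnorm(40, mean = 0, sd = 300)))
toy$W <- 41 + 0.03 * toy$PTSdiff + rnorm(40, sd = 3)

fit <- lm(W ~ PTSdiff, data = toy)

b <- coef(fit)                 # named vector: "(Intercept)" and "PTSdiff"
b

# Fitted line: W = b0 + b1 * PTSdiff
# Points difference at which the fitted line reaches the target
target_wins <- 45              # arbitrary illustrative target
needed_diff <- (target_wins - b[["(Intercept)"]]) / b[["PTSdiff"]]
needed_diff

# Sanity check: predicting at needed_diff gives back the target
check <- predict(fit, newdata = data.frame(PTSdiff = needed_diff))
check
```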
Points Scored
# Linear regression model for points scored
> PointsReg = lm(PTS ~ X2PA + X3PA + FTA + AST + ORB +
DRB + TOV + STL + BLK, data=NBA)
> summary(PointsReg)
# Sum of Squared Errors
> PointsReg$residuals
> SSE = sum(PointsReg$residuals^2)
> SSE
[1] 28394314
# Root mean squared error
> RMSE = sqrt(SSE/nrow(NBA))
> RMSE
[1] 184.4049
# Average number of points in a season
> mean(NBA$PTS)
[1] 8370.24
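With the values above, the RMSE of 184.4049 is roughly 2.2% of the average 8370.24 points in a season. The sketch below shows on made-up data how residuals, SSE and RMSE fit together, and how the same relative-error ratio can be computed. The names toy, fit and res_manual are illustrative.

```r
set.seed(2)  # made-up data, not the NBA data set

toy <- data.frame(x1 = rnorm(50), x2 = rnorm(50))
toy$y <- 100 + 5 * toy$x1 - 3 * toy$x2 + rnorm(50, sd = 2)

fit <- lm(y ~ x1 + x2, data = toy)

# A residual is the observed value minus the fitted value
res_manual <- toy$y - fitted(fit)
all.equal(unname(res_manual), unname(fit$residuals))

SSE  <- sum(fit$residuals^2)        # sum of squared errors
RMSE <- sqrt(SSE / nrow(toy))       # typical error, in the units of y

RMSE / mean(toy$y)                  # error relative to the average response
```

RMSE is often easier to interpret than SSE because it is on the same scale as the response and does not grow with the number of observations.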
# Remove insignificant variables (TOV is dropped first)
> PointsReg2 = lm(PTS ~ X2PA + X3PA + FTA + AST
+ ORB + DRB + STL + BLK, data=NBA)
> summary(PointsReg2)
# Drop DRB next
> PointsReg3 = lm(PTS ~ X2PA + X3PA + FTA + AST
+ ORB + STL + BLK, data=NBA)
> summary(PointsReg3)
# Drop BLK next
> PointsReg4 = lm(PTS ~ X2PA + X3PA + FTA + AST
+ ORB + STL, data=NBA)
> summary(PointsReg4)
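The models above are refitted by retyping the formula with one variable fewer each time. As an alternative sketch, the p-values can be pulled out of summary() and a variable dropped with update(). This runs on made-up data. The names toy, full, reduced and pvals are illustrative, and x3 is generated without any true effect on y.

```r
set.seed(3)  # made-up data; x3 has no true effect on y

toy <- data.frame(x1 = rnorm(80), x2 = rnorm(80), x3 = rnorm(80))
toy$y <- 10 + 4 * toy$x1 + 2 * toy$x2 + rnorm(80)

full <- lm(y ~ x1 + x2 + x3, data = toy)

# Coefficient table as a matrix; the p-values are in column "Pr(>|t|)"
pvals <- coef(summary(full))[, "Pr(>|t|)"]
pvals

# Refit without x3: ". ~ . - x3" means "same formula, minus x3"
reduced <- update(full, . ~ . - x3)

# Compare R^2 before and after dropping the variable
c(full = summary(full)$r.squared, reduced = summary(reduced)$r.squared)
```

Removing one variable at a time and re-checking the summary matters because p-values of the remaining variables can change after each removal.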
# Compute SSE and RMSE for the new model
> SSE_4 = sum(PointsReg4$residuals^2)
> RMSE_4 = sqrt(SSE_4/nrow(NBA))
> SSE_4
[1] 28421465
> RMSE_4
[1] 184.493
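To put the two RMSE values side by side, the short calculation below uses only the numbers reported in the output above. The variable names are illustrative.

```r
# Values reported in the tutorial output
rmse_full    <- 184.4049   # PointsReg  (nine predictors)
rmse_reduced <- 184.493    # PointsReg4 (six predictors)

# Relative increase in RMSE after dropping TOV, DRB and BLK
rel_increase <- (rmse_reduced - rmse_full) / rmse_full
rel_increase
```

The increase is well under 0.1%, so the simpler six-variable model fits the training data almost as well as the full one.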
Making Predictions
# Read in test set
> NBA_test = read.csv("NBA_test.csv")
# Make predictions on test set
> PointsPredictions = predict(PointsReg4,
newdata=NBA_test)
> PointsPredictions
# Compute out-of-sample R^2
> SSE = sum((PointsPredictions - NBA_test$PTS)^2)
# SST uses the training-set mean as the baseline prediction
> SST = sum((mean(NBA$PTS) - NBA_test$PTS)^2)
> R2 = 1 - SSE/SST
> R2
[1] 0.8127142
# Compute the RMSE
> RMSE = sqrt(SSE/nrow(NBA_test))
> RMSE
[1] 196.3723
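The out-of-sample calculation above can be wrapped in a small helper so it can be reused for other models. This is a minimal sketch on made-up data. The function oos_r2 and the names d, train, test and fit are hypothetical and not part of the tutorial. The 90/30 row split is arbitrary.

```r
# Hypothetical helper: out-of-sample R^2 for a fitted model
oos_r2 <- function(model, train_y, test_data, test_y) {
  pred <- predict(model, newdata = test_data)
  sse  <- sum((pred - test_y)^2)            # model errors on the test set
  sst  <- sum((mean(train_y) - test_y)^2)   # baseline: training-set mean
  1 - sse / sst
}

set.seed(4)  # made-up data
d <- data.frame(x = rnorm(120))
d$y <- 3 + 2 * d$x + rnorm(120)

train <- d[1:90, ]     # first 90 rows used for fitting
test  <- d[91:120, ]   # remaining 30 rows held out

fit <- lm(y ~ x, data = train)
r2  <- oos_r2(fit, train$y, test, test$y)
r2
```

Unlike the in-sample R^2, this quantity can be negative when the model predicts the test set worse than the training-set mean does.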