Beginner-Friendly Guide to R Code: Regression
Analysis, Assumptions, and PCA
1 Setting Up the Environment
The following code sets and confirms the working directory where R will look for and
save files.
setwd("C:/Users/oralc/Desktop/MDA")
getwd()
Purpose:
• setwd(): Sets the working directory.
• getwd(): Confirms the current working directory.
2 Reading the Dataset
This section loads the dataset and makes its columns accessible by name.
rdata = read.csv("Data_lifestyle.csv", header = TRUE)
rdata
names(rdata)
attach(rdata)
Explanation:
• Loads the dataset and assigns it to rdata.
• header = TRUE: First row contains column names.
• attach(): Allows direct access to columns (e.g., Y instead of rdata$Y).
3 Loading Required Libraries
These libraries provide tools for regression diagnostics and transformations.
library(car)
library(lmtest)
Purpose:
• car: For VIF and transformations.
• lmtest: For diagnostics like Durbin-Watson and Breusch-Pagan tests.
4 Building Regression Models
The following code builds multiple linear regression models.
Reg1 = lm(Y ~ X1 + X2)
summary(Reg1)

Reg2 = lm(Y ~ X1 + X2 + ... + X21)   # "..." is shorthand: write out all predictors X1 through X21 in full
summary(Reg2)

Rega = lm(Y ~ ., data = rdata)       # "." uses every remaining column as a predictor
summary(Rega)
Theory: Multiple Linear Regression is defined as:
Y = β0 + β1 X1 + β2 X2 + · · · + βn Xn + ε
The summary() function reports R², the estimated coefficients (β), and their p-values; the sketch below shows how to extract these from the fitted model object.
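A minimal sketch of pulling these quantities out of the fitted object programmatically, assuming the Reg1 model from the listing above:

fit_summary = summary(Reg1)
coef(Reg1)                  # estimated coefficients (beta values)
fit_summary$r.squared       # R-squared
fit_summary$adj.r.squared   # adjusted R-squared
fit_summary$coefficients    # estimates, standard errors, t-values, p-values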
5 Checking for Multicollinearity
This section checks for multicollinearity among predictors.
cor(rdata)
View(cor(rdata))

vif(Reg2)
mean(vif(Reg2))
Theory:
• High correlation among predictors indicates multicollinearity.
• Variance Inflation Factor (VIF):
VIFj = 1 / (1 − Rj²)

where Rj² is the R² obtained by regressing predictor j on the remaining predictors. VIF > 4 suggests multicollinearity; the sketch below computes one VIF by hand to illustrate the formula.
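As an illustration (not part of the original script), the VIF for X1 can be reproduced with an auxiliary regression; the predictor names are assumed from the dataset above:

aux = lm(X1 ~ . - Y, data = rdata)   # regress X1 on the other predictors
r2  = summary(aux)$r.squared
1 / (1 - r2)                         # should match vif(Reg2)["X1"]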
6 Autocorrelation Check
The Durbin-Watson test checks for autocorrelation in residuals.
dwt(Reg2)
Durbin-Watson Test:
• DW ≈ 2: No autocorrelation.
• DW < 1.5: Indicates problematic positive autocorrelation.
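The dwt() result (car's alias for durbinWatsonTest()) can also be stored and inspected; a small sketch:

dw = dwt(Reg2)
dw$dw   # the DW statistic; values near 2 suggest no autocorrelation
dw$p    # p-value for the test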
7 Heteroscedasticity Test
This section tests for constant variance in residuals.
residuals(Reg2)
summary(residuals(Reg2))
plot(residuals(Reg2))

bptest(Reg2)
Breusch-Pagan Test:
• H0 : Constant variance (homoscedasticity).
• H1 : Non-constant variance (heteroscedasticity).
• p > 0.05: Assumption of homoscedasticity holds.
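A short sketch of reading the p-value off the bptest() result object:

bp = bptest(Reg2)
bp$p.value   # p > 0.05: no evidence against constant variance
if (bp$p.value > 0.05) {
  message("Homoscedasticity assumption appears to hold.")
} else {
  message("Evidence of heteroscedasticity; consider transforming Y (Section 10).")
}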
8 Normality of Residuals
The Shapiro-Wilk test checks if residuals are normally distributed.
shapiro.test(residuals(Reg2))
Shapiro-Wilk Test:
• H0 : Residuals are normally distributed.
• p > 0.05: No violation of normality.
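The p-value can be extracted in the same way, and a Q-Q plot gives a complementary visual check; a minimal sketch:

sw = shapiro.test(residuals(Reg2))
sw$p.value                # p > 0.05: no evidence against normality

qqnorm(residuals(Reg2))   # points should lie roughly on the reference line
qqline(residuals(Reg2))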
9 Outlier Detection: Cook's Distance
Cook's Distance identifies influential points in the dataset.
cook = cooks.distance(Reg2)
boxplot(cook)
hist(cook)
plot(cook)
which(cook > 0.01)

rdata1 = data.frame(rdata)                         # working copy of the data
rdata1$cooks.distance = cooks.distance(Reg2)
cleandata = subset(rdata1, cooks.distance < 0.01)
Theory:
• Cook's Distance identifies highly influential points.
• Threshold: 0.01 (or 4/n); the sketch below applies the 4/n rule.
• Remove flagged outliers to create cleandata.
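A minimal sketch of the 4/n rule, where n is the number of observations (cleandata2 is a hypothetical alternative to the cleandata object above):

cook      = cooks.distance(Reg2)
threshold = 4 / nrow(rdata)              # n = number of observations
which(cook > threshold)                  # rows flagged as influential
cleandata2 = rdata[cook <= threshold, ]  # hypothetical alternative to cleandata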
10 Response Variable Transformation
Transformations address non-linearity or heteroscedasticity.
rdata$logY = log(rdata$Y)
trReg = lm(logY ~ . - Y, data = rdata)    # "- Y" keeps the original response out of the predictors
summary(trReg)

pt = powerTransform(rdata$Y)              # Box-Cox: estimates the optimal lambda
rdata$newY = (rdata$Y^0.71)               # Y^lambda with lambda = 0.71, read from the powerTransform() output
boxReg = lm(newY ~ . - Y - logY, data = rdata)   # exclude Y and logY from the predictors
summary(boxReg)
When to Use Transformations:
• log(Y): For strongly skewed or exponential-looking data.
• sqrt(Y): For moderate skewness.
• 1/Y: Compresses very large values when extreme observations dominate.
• Box-Cox: Auto-selects the optimal transformation via powerTransform(); see the sketch below for reading the estimated lambda.
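A minimal sketch of reading the estimated lambda from powerTransform() instead of hard-coding it; bcY is a hypothetical column name used only for illustration:

pt = powerTransform(rdata$Y)
summary(pt)                           # estimated lambda and its confidence interval
lambda = coef(pt)                     # point estimate of lambda
rdata$bcY = bcPower(rdata$Y, lambda)  # car's bcPower() applies the Box-Cox transformation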
11 PCA Preparation (Optional for Multicollinearity)
This step prepares the dataset for Principal Component Analysis (PCA).
rdata1.df = data.frame(rdata)
rdata1 = rdata1.df[, 2:22]   # drop Y (column 1) and keep the 21 predictors
Prepares the data by excluding the dependent variable; a name-based alternative is sketched below.
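A sketch of dropping columns by name instead of by position, which is safer if the column order ever changes; this also drops the logY and newY columns added in Section 10:

rdata1 = rdata[, setdiff(names(rdata), c("Y", "logY", "newY"))]
names(rdata1)   # should list X1 through X21 only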
12 Principal Component Analysis (PCA)
PCA reduces dimensionality and resolves multicollinearity.
install.packages("psy")
install.packages("psych")
install.packages("GPArotation")

library(psy)
library(psych)
library(GPArotation)

scree.plot(rdata1)   # scree plot of the predictor-only data prepared in Section 11

model1 = pca(rdata1, nfactors = 15, rotate = "none")
model1$loadings

PCAmodel = pca(rdata1, nfactors = 4, rotate = "varimax", method = "regression", scores = TRUE)
PCAmodel$loadings
PCAmodel$scores

finalPCAdata = cbind(rdata, PCAmodel$scores)
write.csv(finalPCAdata, file = "finalPCAdata.csv")
Theory:
• PCA reduces dimensionality and resolves multicollinearity.
• Varimax rotation improves interpretability.
• scores = TRUE: Adds the component scores to the output (named RC1, RC2, . . . after varimax rotation); the sketch below uses them as regression predictors.
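As a follow-up sketch (not in the original script), the scores can replace the correlated predictors in a regression; the RC1–RC4 names are those produced by psych's varimax-rotated pca():

pcReg = lm(Y ~ RC1 + RC2 + RC3 + RC4, data = finalPCAdata)
summary(pcReg)
vif(pcReg)   # component scores are nearly uncorrelated, so VIF values are close to 1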
13 Optional GUI for Beginners
R Commander provides a GUI for statistical analysis.
install.packages("Rcmdr")
library(Rcmdr)
Opens R Commander for easier statistical analysis.
14 Summary: Code vs. Theory
Code / Concept              Theory Explanation
lm()                        Multiple Linear Regression
vif()                       Detects multicollinearity (VIF > 4 is problematic)
dwt()                       Durbin-Watson Test (autocorrelation)
bptest()                    Breusch-Pagan Test (homoscedasticity)
shapiro.test()              Normality of residuals (Shapiro-Wilk test)
cooks.distance()            Influential outliers (Cook's D > 0.01)
log(Y), powerTransform()    Fix non-linearity or heteroscedasticity
pca()                       Principal Component Analysis (PCA) for dimensionality reduction

Table 1: Summary of R Code and Corresponding Theory