R Computer Lab4 Instructions

Edps/Psych/Stat 587

Spring 2019
C.J.Anderson

R: EDA of Fixed Effects Structure with a Little Regression Diagnostics

Date: tba

1. The first thing you need to do is install and load the following packages; each should be in require( ) or library( ):
library(lme4)
library(lmerTest)
library(lattice)
library(ggplot2)
library(stringi) # install before HLMdiag or HLMdiag won't load properly
library(HLMdiag) # used for the residual and influence diagnostics below
library(texreg)
library(optimx)
2. Do basic setup, which will be on the lab4 template:
#
# read data into R
# the first row of the data matrix is the variable names
#

# Change path to where your data live

lab4 <- read.table("D:/Dropbox/edps587/Lab_and_homework/Computer lab4/lab3.txt", header=TRUE)

# look at first few rows of data set

head(lab4)

#
# Set-up -- from before
#
# Use if statement to create "boy"

lab4$boy <- ifelse(lab4$gender=="boy", 1, 0) # for some reason this creates "NA" for girls
lab4$boy[is.na(lab4$boy)] <- 1

# Create third

lab4$third <- ifelse(lab4$grade==3, 1, 0)

# School mean math and school-mean-centered math

grpMmath <- as.data.frame(aggregate(math~idschool, data=lab4, "mean"))
names(grpMmath) <- c('idschool', 'grpMmath')
lab4 <- merge(lab4, grpMmath, by=c('idschool'))

lab4$grpCmath <- lab4$math - lab4$grpMmath

#
# make school id a factor variable
#
lab4$idschool <- as.factor(lab4$idschool)

#
# Re-scale for later use
#
lab4$gcmath <- scale(lab4$grpCmath, center=FALSE, scale=TRUE)
lab4$gmmath <- scale(lab4$grpMmath, center=FALSE, scale=TRUE)
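
The centering step above splits each math score into a school mean and a within-school deviation. A minimal sketch on made-up data (the data frame toy and its values are hypothetical, not the lab data) shows the same decomposition using base R's ave( ) in place of aggregate( ) plus merge( ). One practical difference is that merge( ) sorts the rows by school id by default, while ave( ) leaves the row order alone.

```r
# Hypothetical toy data: 3 schools, 2 students each (not the lab data)
toy <- data.frame(idschool = c(1, 1, 2, 2, 3, 3),
                  math     = c(10, 14, 20, 30, 5, 7))

# School mean of math, repeated for every student in the school
toy$grpMmath <- ave(toy$math, toy$idschool, FUN = mean)

# School-mean-centered math: deviation from the student's own school mean
toy$grpCmath <- toy$math - toy$grpMmath

toy

# Check: centered scores average to zero within every school
tapply(toy$grpCmath, toy$idschool, mean)
```

Because the two pieces add back up to the original score, grpMmath carries only between-school differences and grpCmath only within-school differences.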

3. Since this is a very large data set, it is sometimes nice to have a smaller random sample to use.
The following code will do this:

####################################################
# Get random sample of 25 schools
####################################################
#
# Look at individuals within schools
#
####################################################

groups <- unique(lab4$idschool)[sample(1:145,25)]

subset <- lab4[lab4$idschool%in%groups,]
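
The sample above changes every time the code is run, and the 145 is typed in as the number of schools. A sketch of a reproducible version on hypothetical ids follows; the seed value 587, the data frame toy and the name sub25 are arbitrary choices for illustration, not part of the lab.

```r
set.seed(587)                                  # arbitrary seed; fixes the random draw

# Hypothetical data: 40 schools with 5 students each (not the lab data)
toy <- data.frame(idschool = factor(rep(1:40, each = 5)),
                  science  = rnorm(200, mean = 150, sd = 10))

schools <- unique(toy$idschool)                # one entry per school
groups  <- sample(schools, size = 25)          # 25 schools, without replacement

# Keep every student in the sampled schools; droplevels( ) removes the
# unused school levels so table( ) does not list schools with zero students
sub25 <- droplevels(toy[toy$idschool %in% groups, ])

length(unique(sub25$idschool))                 # number of schools kept
nrow(sub25)                                    # students in those schools
```

Sampling from unique( ) of the ids means the code does not need to know the number of schools in advance.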

4. The first graph is a lattice plot to look at individual students within schools. Use the random
sample and plot science by math scores.
a. With points for students and linear regression for each school
#
# Plot 2: Lattice plot with regressions
#
xyplot(science ~ math | idschool, data=subset,
col.line=c('black','blue'),
type=c('p','r'),
main='Variability in Science ~ math relationship')

b. Do the same plot but use smooth regression (i.e., loess) rather than linear. To do
this, change the type argument to

type=c('p','smooth')

c. Note: if you want to include points, loess and linear regression together, add 'r' to
the type statement. You do not have to do this, but you can if you want to see what
it looks like. This code will do it:

xyplot(science ~ math | idschool, data=subset,
type=c('p','smooth','r'),
main='Variability in Science ~ math relationship')

5. Using the sub-sample of the data, draw a lattice plot of science by hours of watching TV (as
a numeric variable) including
a. Linear regression lines
b. Smooth (loess) regressions

6. Using the sub-sample of the data, draw a lattice plot of science by hours of playing computer
games (as a numeric variable) including
a. Linear regression lines
b. Smooth (loess) regressions

7. Draw a plot of the regressions of science on math for all schools in the same plot:
a. Note this code is a bit picky in that it works best if all the code below is on one line
of input. If you split it across lines, end each continued line with + so that R knows the
command continues.

############################################################
# All regressions in one plot: science ~ math
############################################################
ggplot(lab4, aes(x=math, y=science, col=idschool)) +
geom_smooth(method="lm", se=FALSE) +
ggtitle('Regressions all in one plot') +
theme(legend.position="none")

b. Repeat but now only use our sub-sample of data.

8. Plot science by hours of watching TV.

a. Need to compute mean science for each value of watching TV:

(y <- aggregate(science~hoursTV, data=lab4, "mean"))

b. Now plot the means

plot(y[,1], y[,2], type='b',
ylab='Mean Science',
xlab='Hours Watching TV',
main='Marginal of Science x HoursTV')

9. Plot mean science by hours of playing computer games. (Revise code from above).

10. Plot regressions of science on hours watching TV for schools all in one figure

a. Use linear regression:

# Linear regressions
ggplot(lab4, aes(x=hoursTV, y=science, col=idschool)) +
geom_smooth(method="lm", se=FALSE) +
theme(legend.position="none")

b. Use loess (smooth) curve. Replace the geom_smooth( ) layer above with:

geom_smooth(method="loess", se=FALSE)

11. Repeat 10 but plot science by hours playing computer games.

12. Plot science by math, separately by type of community
a. Use linear regression

lab4$community <- as.factor(lab4$typecommunity)

ggplot(lab4, aes(x=math, y=science, col=community)) +
geom_smooth(method='lm', se=FALSE)

b. Use smooth lines

ggplot(lab4, aes(x=math, y=science, col=community)) +
geom_smooth(method='loess', se=FALSE)

c. Check frequencies per type of community

table(lab4$typecommunity)

lab4$group <- cut(lab4$science,10)

table(lab4$group,lab4$typecommunity)

13. Fit simple regression to each school and examine (plot) R-squares by sample size.

Set up for plot and loop

nj <- table(lab4$idschool) # number per school
nj <- as.data.frame(nj)
names(nj) <- c("idschool","nj")
nclusters <- nrow(nj) # number of schools

ssmodel <- (0) # initialize SS model
sstotal <- (0) # initialize SS total
R2 <- matrix(99,nrow=nclusters,ncol=2) # for saving R2s

# For a simple model: science ~ math

# Loop through schools

index <- 1 # initialize

for (i in (unique(lab4$idschool))) {
sub <- lab4[ which(lab4$idschool==i),] # data for one school
model0 <- lm(science ~ math, data=sub) # fit the model
a <- anova(model0) # save anova table
ssmodel <- ssmodel + a[1,2] # add to SS model
sstotal <- sstotal + sum(a[,2]) # sum of all SS
# save the R2
R2[index,1:2] <- as.numeric(cbind(i,summary(model0)$r.squared))
index <- index+1 # up-date index
}

a. Some clean up and prepare for plotting

R2meta <- ssmodel/sstotal

R2 <- as.data.frame(R2)
names(R2) <- c("idschool","R2")
R2.mod1 <- merge(R2,nj,by=c("idschool"))

b. Now plot the results

plot(R2.mod1$nj,R2.mod1$R2,type='p',
ylim=c(0,1),
ylab="R squared",
xlab="n_j (number of observations per school)" ,
main="Science ~ math")
abline(h=R2meta,col='blue') # add line for R2meta
text(70,0.95,'R2meta=.36',col="blue") # text of R2meta's
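
The loop above accumulates sums of squares across schools, so R2meta is the pooled ratio of model to total sums of squares rather than a simple average of the separate R-squares. The sketch below repeats the calculation on simulated data, with split( ) and sapply( ) in place of the explicit loop. The data frame sim, the helper fit_one and all numbers are hypothetical.

```r
set.seed(1)                                    # arbitrary seed for the simulation

# Hypothetical data: 6 schools of unequal size (not the lab data)
n_per <- c(8, 12, 20, 30, 45, 60)
sim <- data.frame(idschool = factor(rep(seq_along(n_per), times = n_per)))
sim$math    <- rnorm(nrow(sim), mean = 150, sd = 10)
sim$science <- 30 + 0.8 * sim$math + rnorm(nrow(sim), sd = 8)

# Fit science ~ math in one school; return n_j, SS model, SS total and R2
fit_one <- function(d) {
  a <- anova(lm(science ~ math, data = d))
  c(nj = nrow(d), ssmodel = a[1, 2], sstotal = sum(a[, 2]),
    R2 = a[1, 2] / sum(a[, 2]))
}

# One row per school
res <- as.data.frame(t(sapply(split(sim, sim$idschool), fit_one)))

# Pooled R2: summed SS model over summed SS total
R2meta <- sum(res$ssmodel) / sum(res$sstotal)

plot(res$nj, res$R2, ylim = c(0, 1),
     xlab = "n_j (number of observations per school)", ylab = "R squared")
abline(h = R2meta, col = "blue")               # reference line at R2meta
```

Written this way, R2meta is a weighted average of the school R-squares with each school weighted by its total sum of squares, so it always lies between the smallest and largest school R-square.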

14. Compute and plot R-squares from regressions of science on math, hours watching TV
and hours playing computer games, where hours watching TV and playing computer
games are numeric variables. Call them

R2.mod2

15. Compute and plot R-squares from regressions of science on math and hours watching
TV, where hours watching TV is a categorical variable. Call them

R2.mod3
16. To compare the R-squares from the models from item 14 (numeric TV) versus those
from item 15 (categorical TV), plot the R-squares against each other and draw in a
reference line:
plot(R2.mod2$R2,R2.mod3$R2, pch=20,cex=1.4,
ylim=c(0,1),
xlim=c(0,1),
ylab='Hours as Discrete',
xlab='Hours as Numeric',
main='Hours Discrete vs Numeric')
abline(a=0,b=1, col='blue') # adds a reference line
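
When two models contain the same other predictors, treating hours as categorical can only raise R-squared, because the linear-in-hours model is a special case of the model with one dummy variable per category. Points in a plot like the one above then fall on or above the reference line. In this handout the numeric model also includes computer games, so that ordering is not guaranteed for R2.mod2 versus R2.mod3. A sketch on simulated data, with hypothetical variable names and values:

```r
set.seed(2)                                    # arbitrary seed

# Hypothetical data for one school (not the lab data)
n     <- 200
math  <- rnorm(n, mean = 150, sd = 10)
hours <- sample(1:5, n, replace = TRUE)        # hours coded 1 to 5
# The simulated effect of hours is curved, so a straight line misses part of it
science <- 40 + 0.7 * math - 1.5 * (hours - 2)^2 + rnorm(n, sd = 6)

m_num <- lm(science ~ math + hours)            # hours as numeric: one slope
m_cat <- lm(science ~ math + factor(hours))    # hours as categorical: four dummies

r2_num <- summary(m_num)$r.squared
r2_cat <- summary(m_cat)$r.squared
c(numeric = r2_num, categorical = r2_cat)

# The numeric model is nested in the categorical one, so an F test
# compares the two fits directly
anova(m_num, m_cat)
```

A large gap between the two R-squares suggests the relationship with hours is not linear; a small gap suggests the numeric coding loses little.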

17. Easiest regression diagnostic to obtain

a. Fit model:
modelxx <- lmer(science ~ 1 + gcmath + gmmath + hrsTV +
hrscg + ( 1 + gcmath|idschool), lab4, REML=FALSE,
control = lmerControl(optimizer ="Nelder_Mead"))

b. Plot the residuals

plot(modelxx, xlab='Fitted Conditional',
ylab='Pearson Residuals')

c. Use HLMdiag to get conditional residuals (i.e., residuals from the fixed effects plus the EB
estimates of the Us):

# Get y-(Xgamma+ZU) where U is estimated by Empirical Bayes

res1 <- HLMresid(modelxx, level=1, type="EB",
standardize=TRUE)

head(res1) # look at what is here

d. Do dotplots of the random effects:

# Plot of random effects with confidence bars

dotplot(ranef(modelxx,condVar=TRUE),
lattice.options=list(layout=c(1,2)))

18. We will now make a panel of plots (similar to what SAS does):

a. Ask for 4 graphs on a page in a 2 x 2 layout:

par(mfrow=c(2,2))

b. Do a scatter plot of standardized residuals x fitted values

fit <- fitted(modelxx) # get conditional fitted values

plot(fit,res1, xlab='Conditional Fitted Values',

ylab='Pearson Std Residuals',
main='Conditional Residuals')

c. Draw a normal Q-Q plot of the residuals and add a reference line

qqnorm(res1) # draws plot

abline(a=0,b=9, col='blue') # reference line

d. Draw a histogram of the standardized residuals and overlay a normal distribution

h <- hist(res1,breaks=15,density=20) # draw histogram

xfit <- seq(-40, 40, length=50) # sets range & number quantiles
yfit <- dnorm(xfit, mean=0, sd=7.177) # should be normal
yfit <- yfit*diff(h$mids[1:2])*length(res1)# use mid-points
lines(xfit, yfit, col='darkblue', lwd=2) # draws normal

e. In the 4th plot put information about the model.

plot.new( ) # a plot with nothing in it.
text(.5,1.0,'Modelxx') # potentially useful text
text(.5,0.9,'Deviance=48,356.5')
text(.5,0.8,'AIC=48,386.5')
text(.5,0.7,'BIC=48,489.5')
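
The scaling in part d works because a density integrates to 1, while the bars of a count histogram have total area equal to the number of residuals times the bin width. The sketch below uses simulated residuals and estimates the standard deviation and plotting range from the data instead of typing them in. The vector res is hypothetical and stands in for res1.

```r
set.seed(3)                                    # arbitrary seed
res <- rnorm(500, mean = 0, sd = 7)            # hypothetical residuals

h <- hist(res, breaks = 15, density = 20,
          main = "Residuals with normal overlay", xlab = "Residual")

binwidth <- diff(h$mids[1:2])                  # width of one bar
xfit <- seq(min(res), max(res), length.out = 50)  # grid over the observed range

# density * bin width * n puts the normal curve on the count scale
yfit <- dnorm(xfit, mean = 0, sd = sd(res)) * binwidth * length(res)
lines(xfit, yfit, col = "darkblue", lwd = 2)

# Check: total bar area (counts * width) equals n * bin width
sum(h$counts) * binwidth
```

Using sd(res) and the observed range keeps the overlay aligned with the bars when the model or the data change.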

19. Influence diagnostics using package HLMdiag functions:

a. Cook's distances:

par(mfrow=c(2,2))

cook <- cooks.distance(modelxx, group="idschool")

dotplot_diag(x=cook, cutoff="internal",
name="cooks.distance", ylab=("Cook's distance"),
xlab=("School"))

b. MDFFITS:

mdfit <- mdffits(modelxx,group="idschool")

dotplot_diag(x=mdfit, cutoff="internal", name="mdffits",
ylab=("MDFFITS"), xlab=("School"))

20. NOT REQUIRED FOR HOMEWORK, but these can be informative. See the lme4 manual for an
explanation of the zeta plot as well as some other graphics.

# zeta plot: straight lines are good

p1 <- profile(lmer(science ~ 1 + gcmath + gmmath + hrsTV + hrscg +
( 1 + gcmath|idschool), lab4, REML=FALSE))
xyplot(p1, aspect=0.7)

xyplot(p1, aspect=0.7, absVal=TRUE)

confint(p1, level=.99)
