Notes
RStudio panes: Source, Console, Environment/History, Viewer
setwd("C:/path/folder")
Sets the working directory (use forward slashes or doubled backslashes in Windows paths; getwd() shows the current one)
R can be used as a calculator (+, -, *, /, ^)
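A quick illustration of the arithmetic operators at the console. The numbers are arbitrary.

```r
# Basic arithmetic at the console
2 + 3        # 5
10 - 4       # 6
6 * 7        # 42
7 / 2        # 3.5
2^3          # 8
(2 + 3) * 4  # 20: parentheses control the order of operations
```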
Object names cannot start with a number
The assignment arrow always points to the object (the name receiving the value)
e.g.
A<-69 YES
123->B YES
986<-C NO!
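A small sketch of the three assignment examples above. The invalid one is run through tryCatch() (not part of the notes) only so the script can show that it fails without stopping.

```r
A <- 69      # left arrow: the value 69 is stored in A
123 -> B     # right arrow: the value 123 is stored in B
A + B        # 192

# A number cannot be used as an object name, so this assignment fails.
# tryCatch() catches the error so the script keeps running.
ok <- tryCatch({
  eval(parse(text = "986 <- C"))
  TRUE
}, error = function(e) FALSE)
ok           # FALSE
```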
The workspace is where all objects (data, results) are temporarily stored during a session
save.image("filename.RData") saves all objects in the workspace
save(list = "object", file = "filename.RData") saves a specific object
load("filename.RData") or load("path/filename.RData") loads the saved workspace
ls() lists all the objects in the workspace
rm(object) removes one object from the workspace
rm(list = ls()) removes all objects from the workspace
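A minimal save/remove/load round trip, assuming made-up objects x and y and a hypothetical file name written to a temporary folder so nothing is left in the working directory.

```r
x <- c(1, 2, 3)
y <- "hello"
# Hypothetical file name, written to a temporary folder to avoid clutter
f <- file.path(tempdir(), "x_only.RData")

save(list = "x", file = f)   # save only the object x
rm(x)                        # remove x from the workspace
exists("x")                  # FALSE

loaded <- load(f)            # load() restores x and returns its name
loaded                       # "x"
x                            # 1 2 3
```

rm(list = ls()) is left out on purpose, since it would also delete f and y.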
Vectors hold one or more values (e.g. numbers) in a one-dimensional array
Can be created using the c() function
Can also be created from a sequence of numbers with the colon operator (e.g. 1:3)
E.g.
A=c(1,2,3)
B=c(1:3)
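The vector examples above, plus seq() (a base R function not mentioned in the notes) for sequences with a step other than 1. The data are arbitrary.

```r
A <- c(1, 2, 3)              # c() combines values into a vector
B <- 1:3                     # colon operator: the sequence 1, 2, 3
evens <- seq(2, 10, by = 2)  # seq() allows steps other than 1: 2 4 6 8 10
length(evens)                # 5
all(A == B)                  # TRUE: same values (A is double, B is integer)
people <- c("ana", "ben")    # vectors can also hold text
```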
Data are stored primarily as vectors; wrapping an assignment in parentheses also prints the result
(Object1=c(1,2,3))
(Object2=c(4,5,6))
(Object3=c(7:9))
Combine them as the columns of a matrix with cbind()
(ObjectGroup=cbind(Object1,Object2,Object3))
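A short look at what cbind() produces from the three vectors above; rbind() and as.data.frame() are added here for comparison and are not in the original notes.

```r
Object1 <- c(1, 2, 3)
Object2 <- c(4, 5, 6)
Object3 <- 7:9
ObjectGroup <- cbind(Object1, Object2, Object3)  # each vector becomes a column
dim(ObjectGroup)             # 3 3 (rows, columns)
colnames(ObjectGroup)        # "Object1" "Object2" "Object3"
ObjectGroup[2, "Object3"]    # 8: row 2 of the Object3 column
rbind(Object1, Object2)      # rbind() stacks the vectors as rows instead
as.data.frame(ObjectGroup)   # convert to a data frame for analysis functions
```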
Importing datasets
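• Install and load the "foreign" package: install.packages("foreign"), then library(foreign)
• To import Stata files (versions 5 to 12), type read.dta("filename.dta")
• To import datasets from Epi Info, type read.epiinfo("filename.rec")
• To import SAS XPORT (transport) files, type read.xport("filename"); it already returns a data frame (or a list of data frames if the file holds several datasets)
• To import SPSS files, type read.spss("filename.sav", to.data.frame = TRUE)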
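Since importing needs a file to read, this sketch first writes a small made-up data frame to a Stata file with write.dta() (also from foreign) and then imports it with read.dta(). The file name is hypothetical and goes to a temporary folder. It assumes the foreign package is installed, which is usually the case as it ships with R.

```r
library(foreign)   # recommended package, usually installed with R

# Hypothetical example data and file name
df <- data.frame(id = 1:3, score = c(10.5, 12, 9))
f <- file.path(tempdir(), "example.dta")

write.dta(df, f)          # write a Stata file (to have something to import)
imported <- read.dta(f)   # import it back as a data frame
str(imported)
```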
Exporting datasets
Important reminder: always check your working directory with getwd(), because files are saved there unless a full path is given
• To export data for R, use save(object, file = "filename.RData")
• To export data as delimited text, use write.table(object, "filename.txt")
• To export data as CSV, use write.csv(object, "filename.csv")
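A sketch of exporting a made-up data frame as tab-delimited text and as CSV, then reading both back to check them. File names are hypothetical; sep = "\t", row.names = FALSE, read.csv() and read.delim() are standard base R options added for illustration.

```r
# Hypothetical example data; files go to a temporary folder
df <- data.frame(id = 1:3, group = c("a", "b", "a"))
txt <- file.path(tempdir(), "example.txt")
csv <- file.path(tempdir(), "example.csv")

write.table(df, txt, sep = "\t", row.names = FALSE)  # tab-delimited text
write.csv(df, csv, row.names = FALSE)  # row.names = FALSE avoids an extra unnamed column

back_txt <- read.delim(txt)   # read the tab-delimited file back
back_csv <- read.csv(csv)     # read the CSV file back
back_csv
```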
Editing
data.entry() allows viewing and editing of existing variables and data frames in R. Upon closing the window, all changes are saved automatically.
• edit() and fix() also open a simple window to review existing data and make corrections. fix(object) saves the changes to the object directly, while edit() returns an edited copy, so assign it: object <- edit(object)
Descriptive statistics
Quantitative variable
• Use summary(variable) or summary(dataset) to get measures of central tendency and dispersion (minimum, quartiles, median, mean, maximum)
• Use mean(variable), sd(variable), median(variable), min(variable) and max(variable) to compute each of these values separately
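The functions above applied to a made-up vector, plus the na.rm argument (not in the notes) for data with missing values.

```r
x <- c(2, 4, 4, 4, 5, 5, 7, 9)   # made-up values

summary(x)   # Min, 1st Qu., Median, Mean, 3rd Qu., Max
mean(x)      # 5
median(x)    # 4.5
sd(x)        # about 2.14 (sample standard deviation, n - 1 denominator)
min(x)       # 2
max(x)       # 9

# Missing values: these functions return NA unless na.rm = TRUE
y <- c(x, NA)
mean(y)                 # NA
mean(y, na.rm = TRUE)   # 5
```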
Qualitative variable
• Use xtabs(~rowvar + columnvar) to get the frequency of each combination of categories (xtabs(~var) for a single variable)
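A frequency-table sketch with made-up categorical data; the data = argument and prop.table() for row proportions are additions to what the notes show.

```r
# Made-up categorical data
df <- data.frame(
  sex    = c("M", "F", "F", "M", "F"),
  smoker = c("yes", "no", "no", "no", "yes")
)

xtabs(~ sex, data = df)                   # one variable: F 3, M 2
tab <- xtabs(~ sex + smoker, data = df)   # rows = sex, columns = smoker
tab
prop.table(tab, margin = 1)               # row proportions
```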
Tests of Normality
Histogram
hist(x, breaks = #, freq = NULL, main = "Histogram of x", ylab = "ylabel", xlab = "xlabel", col = "color")
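A histogram of simulated, roughly normal data (rnorm() with an arbitrary seed, mean and sd). hist() also returns the bins it used, which is handy for checking what breaks actually did.

```r
set.seed(42)                          # reproducible random numbers
x <- rnorm(100, mean = 50, sd = 10)   # simulated, roughly normal data

h <- hist(x, breaks = 10, main = "Histogram of x",
          xlab = "Value", ylab = "Frequency", col = "lightblue")
# breaks = 10 is only a suggestion; R picks "pretty" break points near it.
# freq = FALSE would plot densities instead of counts.
h$breaks    # bin edges actually used
h$counts    # number of observations in each bin
```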
Quantile-quantile (Q-Q) plots (color can be added with col)
Q-Q plots assess the distribution of data by plotting the quantiles of the data against the quantiles of a theoretical distribution
qqnorm(variable) plots the quantiles of the variable against the quantiles of a normal distribution; qqline(variable), called after qqnorm(), adds a reference line through the first and third quartiles to that plot. Points lying close to the line suggest normality.
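A Q-Q plot sketch on simulated normal data. The plot.it = FALSE call is an extra, used only to get the plotted coordinates and show numerically that they fall close to a straight line.

```r
set.seed(42)
x <- rnorm(100, mean = 50, sd = 10)   # simulated, roughly normal data

qqnorm(x, col = "blue")   # sample quantiles vs. theoretical normal quantiles
qqline(x, col = "red")    # add the reference line to the same plot

# plot.it = FALSE returns the plotted coordinates without drawing
qq <- qqnorm(x, plot.it = FALSE)
cor(qq$x, qq$y)           # close to 1 when the points follow a straight line
```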
Shapiro-Wilk test (p < 0.05: reject H0, the variable is not normally distributed)
One of the most powerful tests for determining whether a variable came from a normally distributed population.
H0: The variable is normally distributed
HA: The variable does not follow a normal distribution
Syntax: shapiro.test(variable) (requires between 3 and 5000 values)
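To keep the result reproducible, this sketch uses exact quantiles (via ppoints()) of a normal and of a right-skewed exponential distribution instead of random draws. These are illustrative data, not real measurements.

```r
# Deterministic example data: exact quantiles of a normal and of a
# right-skewed (exponential) distribution, via ppoints()
normal_like <- qnorm(ppoints(100), mean = 50, sd = 10)
skewed      <- qexp(ppoints(100), rate = 1)

res_normal <- shapiro.test(normal_like)
res_skewed <- shapiro.test(skewed)

res_normal$p.value   # large: fail to reject H0 (consistent with normality)
res_skewed$p.value   # below 0.05: reject H0 (not normally distributed)
```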
Hypothesis testing
Test for means
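• t-test for one mean: t.test(var, mu = #)
• t-test for two independent means:
t.test(var1, var2, var.equal = TRUE, paired = FALSE) OR
t.test(outcome ~ group, var.equal = TRUE, paired = FALSE), where group has exactly two levels
(var.equal = TRUE gives Student's t-test; the default var.equal = FALSE gives Welch's t-test)
• t-test for two dependent (paired) means:
t.test(var1, var2, paired = TRUE) (recent R versions no longer accept paired = TRUE with the formula form)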
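A sketch of the three t-tests with made-up scores. It also checks two equivalences: the vector and formula forms give the same statistic, and a paired t-test equals a one-sample t-test on the differences.

```r
# Made-up scores for two independent groups
a <- c(12, 15, 11, 14, 13)
b <- c(16, 18, 15, 17, 19)

t.test(a, mu = 12)   # one mean: is the mean of a different from 12?

# Two independent means: vector form and formula form give the same t
df <- data.frame(score = c(a, b), group = rep(c("A", "B"), each = 5))
t_vec  <- t.test(a, b, var.equal = TRUE)                      # t = -4 on 8 df
t_form <- t.test(score ~ group, data = df, var.equal = TRUE)

# Two dependent (paired) means, e.g. the same subjects measured twice
before <- c(80, 85, 78, 90, 88)
after  <- c(78, 83, 77, 86, 85)
t_pair <- t.test(before, after, paired = TRUE)
# A paired t-test is a one-sample t-test on the differences
t_diff <- t.test(before - after, mu = 0)
```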
Correlation
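• cor(var1, var2) computes the Pearson correlation coefficient (method = "spearman" or "kendall" for rank correlations)
• cor.test(var1, var2) also tests H0: correlation = 0 and reports a p-value (plus a confidence interval for Pearson's r)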
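A correlation sketch on made-up, nearly linear data, comparing Pearson and Spearman and pulling the main results out of cor.test().

```r
x <- c(1, 2, 3, 4, 5)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1)   # made-up, nearly linear in x

cor(x, y)                        # Pearson r, close to 1
cor(x, y, method = "spearman")   # rank-based: 1 (y always increases with x)

ct <- cor.test(x, y)             # H0: true correlation is 0
ct$estimate                      # same r as cor(x, y)
ct$p.value
ct$conf.int                      # 95% confidence interval for r
```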
Scatterplot
plot(x, y, xlab = "xlabel", ylab = "ylabel", xlim = c(#,#), ylim = c(#,#), main = "Plot Title")
pairs(~var1 + var2 + var3, data = object, main = "Scatterplot Matrix")
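Both plot calls filled in with R's built-in mtcars data set (columns mpg, wt, hp). The axis limits and labels are illustrative choices.

```r
# mtcars is a data set built into R (datasets package)
plot(mtcars$wt, mtcars$mpg,
     xlab = "Weight (1000 lbs)", ylab = "Miles per gallon",
     xlim = c(1, 6), ylim = c(10, 35),
     main = "Fuel efficiency vs. weight")

# Scatterplot matrix: every pair of the listed variables
pairs(~ mpg + wt + hp, data = mtcars, main = "Scatterplot Matrix")
```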
Linear Regression
• linear.object = lm(DV ~ IV1 + IV2 + ..., data = obj)
• summary(linear.object)
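A regression sketch using the built-in mtcars data, with mpg as the DV and wt and hp as IVs (an arbitrary choice for illustration). coef() is an extra helper for pulling out just the estimates.

```r
# Predict miles per gallon from weight and horsepower (built-in mtcars data)
linear.object <- lm(mpg ~ wt + hp, data = mtcars)
summary(linear.object)   # coefficients, standard errors, p-values, R-squared
coef(linear.object)      # intercept and slopes only
```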
Chi-square test of association
• xtabs(~var1 + var2)
• chisq.test(var1, var2) or chisq.test(xtabs(~var1 + var2))
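A chi-square sketch on made-up exposure/disease counts, built as a data frame so both forms above can be compared. The "expected counts of at least 5" check is a common rule of thumb, not something enforced by chisq.test().

```r
# Made-up counts: 40 exposed and 40 unexposed people
df <- data.frame(
  exposure = rep(c("yes", "yes", "no", "no"), times = c(30, 10, 15, 25)),
  disease  = rep(c("yes", "no", "yes", "no"), times = c(30, 10, 15, 25))
)

tab <- xtabs(~ exposure + disease, data = df)
tab
res <- chisq.test(tab)   # 2 x 2 tables use Yates' continuity correction by default
res
res$expected             # expected counts (rule of thumb: all at least 5)

# The two-vector form gives the same result
res2 <- chisq.test(df$exposure, df$disease)
```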