Course presented at the Congreso Internacional de Estadística
Escuela Superior Politécnica de Chimborazo, Riobamba, Ecuador; October 2019
Introduction to R
Nicholas T. Longford, Imperial College London, United Kingdom
1. Basics
2. How R works
3. What is special about R
4. How to be clever in R
5. R for all your computational needs; programming
6. Graphics in R — to explore, to inform, to impress
7. Expanding your computational skills and experience
First steps. R as a calculator
Start: Click on the R icon
Finish: q() and click on Don’t Save, Cancel, or Save
Save — Saving your work (Workspace)
— objects you created in the session
A session:
Your work in R between Start and Finish
A session consists of a sequence of expressions (commands)
which are executed, if possible
Missing value: NA. The result of most operations involving NA is also NA
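A minimal sketch of how NA behaves in calculations. The vector x is an illustrative name, not part of the course material.

```r
# A small vector with one missing value (x is an illustrative name)
x <- c(3, NA, 8)

x + 1                  # the NA stays NA: 4 NA 9
sum(x)                 # NA, because one element is unknown
sum(x, na.rm = TRUE)   # 11, the NA is dropped before summing
is.na(x)               # FALSE TRUE FALSE

# An exception to "result is NA": the answer does not depend on the NA
NA & FALSE             # FALSE
```

Many summary functions (sum, mean, max, ...) accept na.rm = TRUE to drop the missing values first.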
The basic rules of R
Syntax — the expression you type has to be interpretable
Execution — R has its rules for what can be executed
Error and warning messages
A comprehensive help system help(help)
Workspace — storing the objects you created
Objects are created by assignment
The syntax of an assignment:
[New name] <- a valid expression
<- — the symbol for assignment
The basic rules of R. Example
Assignment: A <- 5
Check your workspace: ls()
Display the value of A: A
Use A in an expression: (A + 4)^2 / 107.3
Create a new object using A: B <- log((A + 4)^4 + 17)
Remove A from workspace: rm(A)
System-defined objects: functions sqrt, exp, round, ...; constants such as pi; datasets
User-defined objects: A
Apply a function: sqrt(A + 5.2)
R as a calculator. Scalar operations
The standard rules of calculations:
valid representation of numbers
numbers (or objects) separated by symbols for operations
priority indicated by paired parentheses ( )
parentheses can be nested
The object being assigned to must have a syntactically valid name:
alphanumeric (letters and digits; . and _ are also allowed), not beginning with a digit
The value of an object can be over-written
assignment to an object that has already been defined
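A short illustration of the calculation rules above. The object name rate is hypothetical.

```r
# Operator priority: ^ before * and /, which come before + and -
2 + 3 * 4^2          # 50
(2 + 3) * 4^2        # 80
((2 + 3) * 4)^2      # 400, nested parentheses

# Over-writing: the second assignment replaces the first value
rate <- 0.05         # rate is a hypothetical name
rate <- rate * 2
rate                 # 0.1
```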
From scalars to vectors
System-defined function to concatenate scalars and vectors
A3B <- c(2, 6, 4.15, pi, 1004)
Spaces have no interpretation, except to separate object names and numbers
Use spaces as a ‘cosmetic’ feature of your code
A carriage return (new line) indicates the end of an expression
Semicolon ; separates two expressions written on the same line
An expression can be written across more than one line,
if it is incomplete at the end of the line
Try: 17 +
32 - 0.5
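The rules about lines and semicolons, sketched with an illustrative vector v:

```r
v <- c(2, 6, 4.15, pi, 1004)   # v is an illustrative name
length(v); max(v)              # two expressions on one line: 5, then 1004

# An expression that is incomplete at the end of a line continues on the next
total <- 17 +
  32 - 0.5
total                          # 48.5
```

At the console, R shows the prompt + instead of > while it waits for the rest of an incomplete expression.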
From scalars to vectors II
Vectors can be concatenated:
A4B <- c(A3B, 107, -16, -0.03, A3B, round(A3B^2, 1) )
Operations on vectors:
length(A4B); names(A4B)
round — see above
Functions help, args, example
Syntax: func(arg1, arg2, ...)
Examples:
help(round); help(seq); args(floor)
Vectors generated by the system
Regular sequences: seq, rep
seq(5, 15, length=21)
rep(seq(4), 5)
Random numbers
runif(500, 2, 8)
rnorm, rgamma, rbeta, rchisq, rt, . . .
— use help to learn about them
Changing the order of the elements of a vector:
sort(vec); rev(vec); (rank(vec); order(vec))
Combined with other functions: unique(round(vec))
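A sketch of these functions on a small vector, so that the results can be checked by eye. The values of vec are made up for illustration.

```r
seq(5, 15, length.out = 5)   # 5.0 7.5 10.0 12.5 15.0
rep(seq(4), 2)               # 1 2 3 4 1 2 3 4

vec <- c(3.2, 1.7, 3.4, 0.6) # an illustrative vector
sort(vec)                    # 0.6 1.7 3.2 3.4
rev(vec)                     # 0.6 3.4 1.7 3.2
order(vec)                   # 4 2 1 3: positions of the elements in sorted order
rank(vec)                    # 3 2 4 1: rank of each element
vec[order(vec)]              # the same as sort(vec)
unique(round(vec))           # 3 2 1

set.seed(1)                  # makes the random draws reproducible
u <- runif(500, 2, 8)        # 500 uniform draws between 2 and 8
range(u)                     # smallest and largest draw
```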
Arguments of a function
Arguments have (symbolic) names, e.g., runif(n, min=0, max=1)
Mandatory argument: You have to specify a value (e.g., for n)
Optional argument: There is a default;
— it can be overruled by your specification,
e.g., runif(n=500, min=1, max=10)
With the names, the arguments can be presented in any order
Without the names, the order of the arguments has to conform to the definition
Named and unnamed arguments can be mixed, but this is not good practice
The arguments may be values, objects, or expressions (evaluations)
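The different ways of passing arguments, sketched with runif. The object names on the left-hand sides are illustrative.

```r
set.seed(2019)                          # reproducible random numbers

a <- runif(5)                           # only the mandatory n; min = 0, max = 1 by default
b <- runif(n = 5, min = 1, max = 10)    # defaults overruled, arguments named
c1 <- runif(max = 10, n = 5, min = 1)   # named arguments in any order
d <- runif(5, 1, 10)                    # unnamed: the order n, min, max is required

# An argument can itself be an expression
e <- runif(length(a) * 2, min = min(b), max = max(b) + 1)
length(e)                               # 10
```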
Type of objects and their attributes
vectors
functions is.vector, as.vector
scalar is a vector (of length 1)
matrices and arrays — matrix, is.matrix, as.matrix
lists — list, is.list, as.list
collections of objects
functions — function, is.function
data frames — data.frame, is.data.frame
user-defined types
Numeric, character and logical
Three basic types of variables/values
as.numeric, is.numeric
as.character, is.character
as.logical, is.logical
Coercion — forced change from one type to another
chr <- as.character(c(4, 17.4))
cannot use arithmetic on chr
AB * (CC > 0)
CC > 0 is a logical vector,
but in the numeric operation it is interpreted as 0/1
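A sketch of coercion in both directions. AB and CC stand for any two numeric vectors of equal length; the values below are made up.

```r
chr <- as.character(c(4, 17.4))
chr                    # "4" "17.4"
is.character(chr)      # TRUE
as.numeric(chr) * 2    # 8.0 34.8, arithmetic works after converting back

# AB and CC are illustrative vectors of equal length
AB <- c(10, 20, 30, 40)
CC <- c(-1, 2, 0, 5)
CC > 0                 # FALSE TRUE FALSE TRUE
AB * (CC > 0)          # 0 20 0 40
sum(CC > 0)            # 2, the number of positive elements
```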
Character and logical functions
character strings (words) are in double quotes, e.g. "word"
word — an object’s name
nchar, substring, paste
Logical values: TRUE, FALSE (abbreviated T, F)
Logical operators:
==, &, |, !=, !
character and logical vector
types cannot be mixed in a vector
Try: c(15, "A"); c(4.2, T, 9, F); c(T, F)^2
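The functions and operators of this slide in use, followed by the results of the three Try expressions. The objects word and x are illustrative.

```r
word <- "Chimborazo"                 # word is an object; "Chimborazo" is a string
nchar(word)                          # 10
substring(word, 1, 5)                # "Chimb"
paste(word, "Ecuador", sep = ", ")   # "Chimborazo, Ecuador"

x <- c(4, 9, 15)                     # an illustrative vector
(x > 5) & (x != 15)                  # FALSE TRUE FALSE
!(x > 5) | (x == 15)                 # TRUE FALSE TRUE

# Mixed types are coerced to a single type
c(15, "A")                           # "15" "A": everything becomes character
c(4.2, TRUE, 9, FALSE)               # 4.2 1.0 9.0 0.0: logicals become 0/1
c(TRUE, FALSE)^2                     # 1 0
```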
Naming and subsetting
vector AB;
names(AB) <- c("First", "Second", "Third", "Tabasco",
"Quintana Roo", ...)
AB[seq(6)] — the first 6 elements of AB
Subsetting:
by element numbers: ls()[seq(50)]
by names: AB[c("First", "Second")]
by a logical vector (T: include; F: exclude)
by negative element numbers, which mark the elements to exclude (e.g. vec[-seq(4)])
!! The subset-vector can be an expression (evaluations)
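The four ways of subsetting, sketched on a short named vector. The values and names given to AB here are made up for illustration.

```r
AB <- c(12, 7, 31, 5, 18)
names(AB) <- c("First", "Second", "Third", "Fourth", "Fifth")

AB[seq(2)]                              # by element numbers: First, Second
AB[c("Third", "First")]                 # by names, in the order requested
AB[c(TRUE, FALSE, TRUE, FALSE, TRUE)]   # by a logical vector
AB[-seq(3)]                             # by negative numbers: drops the first three
AB[AB > 10]                             # the subset vector as an expression: 12, 31, 18
```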
A bit of fun
In a party of 15 unrelated persons, what is the probability
that at least two persons have their birthday on the same day?
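Probability that 2 persons, 3 persons, 4 persons, ... have birthdays on distinct days:
$$\frac{364}{365}, \quad \frac{364}{365} \times \frac{363}{365}, \quad \frac{364}{365} \times \frac{363}{365} \times \frac{362}{365}, \quad \ldots$$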
1 - prod(seq(365, 365 - 2)) / 365^3
Improvement:
Psz <- 5
1 - prod(seq(365, 365 - Psz + 1)) / 365^Psz
Q. Simultaneous calculation for a sequence of party sizes??
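One possible answer to the question, not necessarily the one intended in the course: the function cumprod (not mentioned on the slides) returns all the partial products at once. The names sizes and PrSame are illustrative.

```r
# Probabilities for party sizes 1, 2, ..., 15 in one step.
# cumprod() returns all the partial products that prod() would give one at a time.
sizes <- seq(15)                                   # sizes is an illustrative name
PrSame <- 1 - cumprod((365 - sizes + 1) / 365)
round(PrSame[c(2, 3, 15)], 4)                      # party sizes 2, 3 and 15
```

Element k of PrSame agrees with 1 - prod(seq(365, 365 - k + 1)) / 365^k.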
Graphics
Histogram:
hist(rnorm(20000, 1.7, 4.2))
Study help(hist)
Additional (optional) arguments:
xlab, ylab, xlim, ylim,
main, sub, . . .
Adding to the histogram:
points, lines, segments, polygon
text, legend
— each function with a vast array of its own arguments
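A sketch of a histogram with optional arguments and two additions. The labels, colours and the choice of overlay are illustrative.

```r
set.seed(1)
x <- rnorm(20000, 1.7, 4.2)          # mean 1.7, standard deviation 4.2

# freq = FALSE puts densities, not counts, on the vertical axis
h <- hist(x, freq = FALSE, xlab = "Value", main = "Simulated normal sample")

# Add the theoretical density curve and a dashed vertical segment at the mean
xs <- seq(min(x), max(x), length.out = 200)
lines(xs, dnorm(xs, 1.7, 4.2), lwd = 2, col = "red")
segments(1.7, 0, 1.7, dnorm(1.7, 1.7, 4.2), lty = 2)
legend("topright", legend = "Normal density", lwd = 2, col = "red")
```

hist also returns its results (breaks, counts, densities) as an object, stored here in h.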
Plots
Function plot
plot(vec)
Additional arguments — the same as for hist
— generic arguments for plotting
Plot types:
argument type=, values: "n", "p", "l", "b"
line width: lwd, symbol size: cex, colour: col
The system-defined palette of colours: colors()
Homework: Study help(plot)
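A sketch of the plot types and the arguments lwd, cex and col. The vector and the labels are made up for illustration.

```r
vec <- cumsum(c(2, -1, 3, 0.5, -2, 4))      # an illustrative vector
plot(vec, type = "b", lwd = 2, cex = 1.5, col = "blue",
     xlab = "Index", ylab = "Value", main = "Points joined by lines")

# type = "n" draws the axes only, so that elements can be added one by one
plot(vec, type = "n")
lines(vec, col = "grey40")
points(vec, col = "red", cex = 1.2)

head(colors())     # the first few system-defined colour names
```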
Functions
Examples (functions with a single expression):
## Function to count the number of unique elements
LeUni <- function(vec)
length(unique(vec))
## The number of missing values in a vector
sumNA <- function(vec)
sum(is.na(vec))
## The probability of same-day birthday
BdayPr <- function(k)
1 - prod((365 - seq(k) + 1)/365)
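A function can also have several expressions, enclosed in { }, and arguments with defaults. BdayPrD below is a hypothetical generalisation of BdayPr, not part of the course material: the number of days in the year becomes an optional argument.

```r
# BdayPrD is a hypothetical generalisation of BdayPr
BdayPrD <- function(k, days = 365)
{
  distinct <- prod((days - seq(k) + 1) / days)   # probability that all k birthdays differ
  1 - distinct                                   # the value of the last expression is returned
}

BdayPrD(3)                      # uses the default of 365 days
BdayPrD(3, days = 366)          # a leap year
sapply(c(5, 15, 23), BdayPrD)   # one probability per party size
```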
Programming
Loops:
for (i in vec)
{
R code (involving i)
}
Conditional loops:
while (condition)
{
R code
}
Example: Iterative algorithms (e.g., GLM)
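A sketch of both kinds of loop. Newton's iteration for a square root is used here as a small stand-in for an iterative algorithm; it is not the GLM fitting procedure itself, but the pattern (update, check convergence, repeat) is the same.

```r
# for loop: accumulate the squares of 1, ..., 5
total <- 0
for (i in seq(5))
{
  total <- total + i^2
}
total                       # 55

# while loop: Newton's iteration for the square root of 2,
# repeated until the update becomes negligible
x <- 1
change <- Inf
while (abs(change) > 1e-10)
{
  xnew <- (x + 2 / x) / 2
  change <- xnew - x
  x <- xnew
}
x                           # agrees with sqrt(2) to many decimal places
```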
Matrices
— two-dimensional arrays
MAT <- matrix(data=seq(16), nrow=8, ncol=6,
dimnames=list(seq(8), LETTERS[seq(6)]))
diag(vec)
Recycling — re-using a vector if necessary
Coercion — the result has elements of the same type
Submatrices:
MAT[, seq(3)]; MAT[c(3, 7, 3, 8), c("C", "E", "A")]
MAT[sort.list(MAT[, 2]), ]
Repetition, using conditions, exclusion, etc.
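A sketch of building a matrix and extracting submatrices. MAT is defined inside the block so that it runs on its own; the condition and the excluded rows are illustrative choices.

```r
# The 16 values are recycled to fill the 8 x 6 matrix, column by column
MAT <- matrix(data = seq(16), nrow = 8, ncol = 6,
              dimnames = list(seq(8), LETTERS[seq(6)]))
dim(MAT)                              # 8 6
MAT[, seq(3)]                         # the first three columns
MAT[c(3, 7, 3, 8), c("C", "E", "A")]  # rows may repeat, columns picked by name
MAT[MAT[, "B"] > 12, ]                # rows selected by a condition
MAT[-seq(6), ]                        # all rows except the first six
diag(c(1, 2, 3))                      # a 3 x 3 diagonal matrix
```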
Working with matrices
matrix multiplication — operator %*%
Function apply — apply a function on every row/column of a matrix
apply(MAT, 1, sum) — the vector of row totals
apply(MAT, 2, LeUni) — the vector of column . . .
1/2 — the dimension (margin)
— can use your own function
The result: a vector if the function’s result is a scalar
a matrix if the function’s result is a vector of fixed length
a list if the function’s result has variable length
A matrix is also a vector: MAT[5 + seq(15)]
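The three kinds of result from apply, sketched on a small matrix. MAT and LeUni are redefined so that the block runs on its own; the anonymous function in the last apply call is an illustrative choice.

```r
MAT <- matrix(seq(16), nrow = 8, ncol = 6)   # rebuilt so the block is self-contained
LeUni <- function(vec) length(unique(vec))

apply(MAT, 1, sum)                     # row totals, a vector of length 8
apply(MAT, 2, LeUni)                   # 8 8 8 8 8 8
apply(MAT, 2, range)                   # a 2 x 6 matrix: each result has length 2
apply(MAT, 2, function(v) v[v > 4])    # a list: the results have different lengths
t(MAT) %*% MAT                         # matrix product, 6 x 6
MAT[5 + seq(3)]                        # 6 7 8: elements in column-by-column order
```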
Input and output
Function scan for inputting a vector
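library(foreign): input of data formatted by other packages
read.csv (base R, no library needed), read.xport and read.ssd (SAS), read.dta (Stata)
Output:
write.csv (base R), write.foreign (for SAS, SPSS or Stata), write.dta (Stata)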
Save one or a set of R objects
save
Recover the objects in a saved file:
load
System-defined datasets stored in R: package datasets
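A round trip through a file, sketched with base R functions only. The data frame, the object names and the use of temporary files are illustrative choices.

```r
# Temporary files keep the example from cluttering the working directory
csvfile <- tempfile(fileext = ".csv")
rdafile <- tempfile(fileext = ".RData")

dat <- data.frame(id = seq(3), score = c(4.5, 7.1, 6.3))   # illustrative data
write.csv(dat, csvfile, row.names = FALSE)
dat2 <- read.csv(csvfile)            # the data frame read back from the file

vals <- c(1.5, 2, 8)
save(dat, vals, file = rdafile)      # several objects in one file
rm(dat, vals)
load(rdafile)                        # dat and vals are back in the workspace

inp <- scan(text = "3 5 8")          # scan reads a vector from a file or, as here, from text
head(mtcars, 3)                      # one of the datasets in package datasets
```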
Lists
A list in R is an indexed collection of objects
list(MAT, letters, LeUni, ...)
The elements of the list may be unrelated
— of different types, with different attributes
Example:
LST <- list() ## An empty list
for (i in seq(10))
LST[[i]] <- seq(i)
Operating on a list:
lapply(LST, sum) ## List of the within-element totals
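The list built by the loop, inspected step by step. The list mixed is an illustrative example of unrelated elements.

```r
LST <- list()                 # an empty list
for (i in seq(10))
  LST[[i]] <- seq(i)

length(LST)                   # 10
LST[[3]]                      # 1 2 3
tot <- lapply(LST, sum)       # a list with 10 elements
tot[[4]]                      # 10, the total of 1 + 2 + 3 + 4

# Elements of different types in one list (mixed is an illustrative name)
mixed <- list(seq(3), letters, sqrt)
is.function(mixed[[3]])       # TRUE
```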
Lists II
as.list --- convert to a list
sapply(LST, sum) ## Turn the result to a vector if possible
The function used in lapply or sapply may even be apply
names(LST) ## list with named elements
LST[[3]] ## 3rd element of the list
LST[seq(2,5)] ## sublist comprising elements 2 -- 5
Example:
ExtrC <- function(mat, cls)
mat[, cls]
lapply(LST, ExtrC, 4) ## The 4th columns of the elements of LST (here a list of matrices)
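ExtrC requires the elements of the list to be matrices. A sketch with a list of three small matrices; MLST and the element names are hypothetical.

```r
# A list of three matrices with 4, 5 and 6 columns (MLST is a hypothetical name)
MLST <- list()
for (i in seq(3))
  MLST[[i]] <- matrix(seq(2 * (i + 3)), nrow = 2)
names(MLST) <- c("small", "medium", "large")

ExtrC <- function(mat, cls)
  mat[, cls]

lapply(MLST, ExtrC, 4)        # list of the 4th columns
sapply(MLST, ExtrC, 4)        # the same, simplified to a 2 x 3 matrix
sapply(MLST, ncol)            # 4 5 6, with the element names attached
lapply(MLST, apply, 1, sum)   # apply used inside lapply: row totals of each matrix
MLST[["medium"]]              # one element, a matrix
MLST[seq(2)]                  # a sublist with the first two elements
```

Additional arguments of the function (here 4 for cls, or 1 and sum for apply) are listed after the function's name.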
Summary (syntax)
( ) priority in evaluation, delimiting the arguments of a function
[ ] subvector, submatrix, or sublist
[[ ]] element of a list
{ } the scope of a function or loop
, separating the arguments of a function
; separating two expressions on a line
: (integer) from – to (like seq)
= assigning values to the arguments of a function
== ‘equal’ as a logical operator
<- assignment (to an object)
+, -, /, *, %*%, ^, sum, prod, |, &, !, !=, <, >, <=, >=, " "