Data Structures
• Vector
• Matrix
• Array
• Data Frame
• List
Vector: 1D Array (either row or column)
Matrix: 2D Array (rows, columns)
Array: 3D or more dimensions (rows, columns, planes)
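As a minimal sketch of how these three structures differ, the snippet below builds one of each and inspects its dimensions. The names v, m, and a are illustrative and do not come from the slides.

```r
# Illustrative names (v, m, a) are assumptions for this sketch.
v <- c(10, 20, 30)                     # vector: one dimension
m <- matrix(1:6, nrow = 2, ncol = 3)   # matrix: rows x columns
a <- array(1:24, dim = c(2, 3, 4))     # array: rows x columns x planes

length(v)   # number of elements in the vector
dim(v)      # NULL: a plain vector carries no dim attribute
dim(m)      # 2 3
dim(a)      # 2 3 4
```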
Data Frames
• A data frame is used for storing data tables. It can be
seen as a collection of column-vectors of equal length.
• Columns:
• Typically called variables or fields
• Must have unique column names
• Each column must have a single Data Type, but the
Data Type of each column can be different.
• Rows:
• Typically called observations or records
• Row names are optional; by default, rows are indexed by a
numerical sequence.
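The properties above can be checked on a small data frame. This is only a sketch: the data frame name, its column names, and its values are made up for illustration, and stringsAsFactors = FALSE is set explicitly so the result does not depend on the R version.

```r
# Hypothetical example data; names and values are assumptions.
stocks <- data.frame(
  ticker = c("AMZN", "AAPL", "DIS"),
  price  = c(1.5, 2.5, 3.5),
  traded = c(TRUE, FALSE, TRUE),
  stringsAsFactors = FALSE
)

sapply(stocks, class)   # one class per column: character, numeric, logical
names(stocks)           # variable (column) names
nrow(stocks)            # number of observations (rows): 3
rownames(stocks)        # default row names: "1" "2" "3"
```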
Differences Between Computer Science and Math
• Coordinates in databases: (row, col) vs. coordinates in math: (x, y)
• Databases: (3, 2) is row 3, column 2 vs. Cartesian: (3, 2) is x = 3, y = 2
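A short sketch of the (row, col) convention in R, assuming an illustrative 4 x 5 matrix named m. Swapping the two indices selects a different cell, which is the usual source of confusion with (x, y) coordinates.

```r
# m is an illustrative 4 x 5 matrix, filled column by column with 1..20.
m <- matrix(1:20, nrow = 4, ncol = 5)

m[3, 2]   # row 3, column 2 -> 7
m[2, 3]   # row 2, column 3 -> 10 (a different cell)
```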
Different ways to create arrays (1/2)
1. With the combine function “c( )”
w <- c("a", "b", "c", "d", "e", "f")
2. With the “matrix(data, nrow, ncol)” function
h <- matrix(0, 4, 5)
3. With the “array(data, c(row, col, plane))” function
z <- array(w, c(3, 2))
z <- array(c("AMZN", "AAPL", "DIS", "FB", "MCD", "TSLA"), c(2, 3))
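The examples above use only two dimensions. As a hedged sketch, the same data can be given a third dimension (a plane), and matrix() can fill by row instead of by column. The names tickers, z3, and by_row are illustrative.

```r
tickers <- c("AMZN", "AAPL", "DIS", "FB", "MCD", "TSLA")

# 3 rows, 2 columns, 1 plane: values fill column by column.
z3 <- array(tickers, c(3, 2, 1))
dim(z3)        # 3 2 1
z3[1, 2, 1]    # "FB": row 1, column 2, plane 1

# matrix() also fills by column unless byrow = TRUE is given.
by_row <- matrix(tickers, nrow = 2, ncol = 3, byrow = TRUE)
by_row[2, 1]   # "FB": first element of the second row
```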
Different ways to create arrays (2/2)
4. Adding elements to an existing variable in the workspace or
environment
x <- 5
x[2] <- 6
x[3] <- 7
If the variable doesn’t exist yet, R raises an error. For example, with
y not yet defined:
y[2] <- "b"
However, we can add non-consecutive elements to an existing variable,
and any undefined elements in between will be NA. For example, the
command below generates NAs in x[4] and x[5]:
x[6] <- 10
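Putting the assignments together, a minimal sketch of how the vector grows and where the NAs land. The name not_yet_defined is hypothetical and is used only to show how exists() can check for a variable before assigning into it.

```r
x <- 5
x[2] <- 6
x[3] <- 7
x[6] <- 10        # positions 4 and 5 are created as NA

x                 # 5 6 7 NA NA 10
length(x)         # 6
which(is.na(x))   # 4 5

# not_yet_defined is a hypothetical name; exists() reports whether it is defined.
exists("not_yet_defined")   # FALSE
```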
Other two dimensional objects
Data frames:
table <- data.frame(x, w)
Other useful data structures exist, such as:
time series “ts( )” (base R)
extensible time series “xts( )” (xts package)
tibbles “tibble( )” (tibble package)
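As a small sketch of one of these, ts() ships with base R and needs no extra package. The series name and its values are made up for illustration. xts() and tibble() are not shown because they require installing their packages first.

```r
# Hypothetical monthly series starting January 2018 (made-up values).
sales <- ts(c(12, 15, 11, 18, 20, 17), start = c(2018, 1), frequency = 12)

class(sales)       # "ts"
frequency(sales)   # 12 observations per year
start(sales)       # 2018 1
length(sales)      # 6
```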
Subsetting
• Subsetting (or Slicing): Once an array is created, its
information can be retrieved either in full, by calling the
variable by name, or in part, by indexing a subset of the array.
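A few common subsetting forms, sketched on the vector w from the earlier slide and on an illustrative matrix m (an assumption, not taken from the slides):

```r
w <- c("a", "b", "c", "d", "e", "f")
m <- matrix(1:20, nrow = 4, ncol = 5)   # illustrative matrix

w             # the whole vector
w[2]          # a single element: "b"
w[2:4]        # a range: "b" "c" "d"
w[c(1, 6)]    # selected positions: "a" "f"
w[-1]         # everything except the first element

m[2, ]        # the entire second row
m[, 3]        # the entire third column
m[1:2, 4:5]   # a 2 x 2 block
```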
Recycling
• Recycling: When performing an operation on structures of
different sizes, R takes the shorter object and reuses it as
many times as needed to match the longer one. If the greater
length is a multiple of the smaller length (or if the smaller
length is 1), R does this silently. Otherwise, for vectors, R
still recycles but issues a warning.
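A minimal sketch of recycling with plain numeric vectors (the name a is illustrative). The last line is expected to return a result together with a warning, because 4 does not divide 6.

```r
a <- c(1, 2, 3, 4, 5, 6)

a + 10              # length-1 operand is reused: 11 12 13 14 15 16
a * c(1, 2)         # length 2 divides 6: 1 4 3 8 5 12
a + c(1, 2, 3, 4)   # 4 does not divide 6: 2 4 6 8 6 8, with a warning
```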
Data Types & Data Structures: Troubleshooting
• Functions to confirm / identify data types & data structures:
str( object ) → structure
typeof( object ) → storage mode
class( object ) → class attribute(s)
• These are mainly useful when troubleshooting, since variable types
are sometimes not the ones we assumed, and the implicit coercion in
some operations might not work as expected.
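A short sketch of these functions in use, including one case of silent coercion. All variable names are illustrative.

```r
n  <- 5L
d  <- 5
s  <- "5"
m  <- matrix(0, 4, 5)
df <- data.frame(id = 1:3, label = c("a", "b", "c"), stringsAsFactors = FALSE)

typeof(n)    # "integer"
typeof(d)    # "double"
typeof(s)    # "character"
class(m)     # "matrix" "array" in R >= 4.0; "matrix" in older versions
class(df)    # "data.frame"
typeof(df)   # "list": a data frame is stored as a list of columns
str(df)      # compact overview of columns and their types

# Coercion: mixing types in c() silently converts everything to character.
mixed <- c(1, "a", TRUE)
typeof(mixed)   # "character"
```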
Next steps…
All rights reserved 2018 Tecnológico de Monterrey
Reproduction of this work, in whole or in part, is prohibited without
the express authorization of Tecnológico de Monterrey.