Foundations of R Software
Loops, Functions
Functions, Loops and Control Statements
• Conditional execution: if()
• Syntax
if (condition) {execute commands if condition is TRUE}
• The code in the {} is evaluated if the logical test contained in
the () is TRUE.
• If the logical test is FALSE, R will ignore all of the code in the
{}.
• if() should not be applied when the condition being
evaluated is a vector. It is best used only when meeting a single
element condition.
Example:
>x=5
> if (x > 4) x * 3
[1] 15
>x=3
> if (x > 4) x * 3
No response is obtained.
>x = 6
>if(x > 3){ print("The value is more than 3")}
[1] "The value is more than 3"
>x = 2
>if(x > 3){print("The value is more than 3")}
No outcome is obtained.
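Because if() expects a single TRUE or FALSE, a vector-valued test is usually reduced to one value first, for example with any() or all(). A minimal sketch, assuming an illustrative vector v that is not part of the lecture examples:

```r
v <- c(2, 5, 7)   # hypothetical example vector

# any() is TRUE if at least one element satisfies the condition
if (any(v > 4)) print("At least one value is more than 4")

# all() is TRUE only if every element satisfies the condition
if (all(v > 4)) print("All values are more than 4")

# Store the reduced conditions so they can be inspected
has_large <- any(v > 4)
all_large <- all(v > 4)
```

Only the first message is printed here, since 2 is not more than 4.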
• Conditional execution: if else()
• Syntax
if (condition) {executes commands if condition is TRUE}
else { executes commands if condition is FALSE }
Please note:
• The condition in this control statement must not be vector valued. In R
versions before 4.2.0 only the first element of such a vector was used (with
a warning); from R 4.2.0 onwards a vector-valued condition is an error.
• if else() should not be applied when the condition being
evaluated is a vector. It is best used only when meeting a single
element condition.
• The condition may be a complex expression where the logical
operators "and" (&&) and "or" (||) can be used.
Example:
>x = 5
>if ( x==3 ) { x = x-1 } else { x = 2*x }
>x
[1] 10
Interpretation:
• If x = 3, then execute x = x – 1.
• If x ≠ 3, then execute x = 2*x.
In this case, x = 5, so x ≠ 3. Thus x = 2*5
>x = 3
>if ( x==3 ) { x = x-1 } else { x = 2*x }
>x
[1] 2
Interpretation:
• If x = 3, then execute x = x – 1.
• If x ≠ 3, then execute x = 2*x.
In this case, x = 3. Thus x = 3‐1 = 2
Example :
• Suppose we want to print if a value is more than 3 or less than 3.
>x = 6
>if(x > 3){print("The value is more than 3")
+} else {print("The value is less than 3")}
[1] "The value is more than 3"
• Suppose we want to print if a value is more than 3 or less than 3.
>x = 2
>if(x > 3){print("The value is more than 3")
+} else {print("The value is less than 3")}
[1] "The value is less than 3"
• Conditional execution: Nested if else if()
The if…else…if statement executes one block of code out of more than two
alternatives.
It extends the earlier if else condition.
• Syntax
if (condition1) {executes commands if condition1 is TRUE
} else if (condition2) {executes commands if condition2 is TRUE
} else if (condition3) {executes commands if condition3 is TRUE
} …… else {
executes commands if all conditions are FALSE
}
Example:
>x = 5
>if ( x==3 ) {x = x-1} else if ( x < 3 ) {x = x+5} else { x = 2*x }
>x
[1] 10
Interpretation:
• If x = 3, then execute x = x – 1.
• If x < 3, then execute x = x + 5.
• If x > 3, then execute x = 2*x.
In this case, x = 5, so x > 3. Thus x = 2*5
>x = 2
>if ( x==3 ) {x = x-1} else if ( x < 3 ) {x = x+5} else { x = 2*x }
>x
[1] 7
Interpretation:
• If x = 3, then execute x = x – 1.
• If x < 3, then execute x = x + 5.
• If x > 3, then execute x = 2*x.
In this case, x = 2, so x < 3. Thus x = 2+5
Example :
>x = 3
>if ( x==3 ) {x = x-1} else if ( x < 3 ) {x = x+5} else { x = 2*x }
>x
[1] 2
Interpretation:
• If x = 3, then execute x = x – 1.
• If x < 3, then execute x = x + 5.
• If x > 3, then execute x = 2*x.
In this case, x = 3. Thus x = 3‐1
Conditional execution: ifelse()
• Syntax
ifelse(test, yes, no)
• Vector‐valued evaluation of conditions .
• For the components in the vector‐valued logical expression test
which provide the value TRUE, the operations given by yes are
executed.
• For the components in the vector‐valued logical expression test
which provide the value FALSE, the operations given by no are
executed.
Example:
> x = 1:10
>x
[1] 1 2 3 4 5 6 7 8 9 10
> ifelse( x<6, x^2, x+1 )
[1] 1 4 9 16 25 7 8 9 10 11
Interpretation
• If x < 6 (TRUE), then x = x^2 (YES).
• If x ≥ 6 (FALSE), then x = x + 1 (NO).
• So for x = 1, 2, 3, 4, 5, we get x = x^2=1, 4, 9, 16, 25
• For x=6, 7, 8, 9, 10, we get x= x+1 = 7, 8, 9, 10, 11
Example :
>x = c(7,9,8,4)
>ifelse(x %% 2 == 0,"even number","odd number")
[1] "odd number" "odd number" "even number" "even number"
[%%: Modulo Division‐ Finds the remainder after division of one number
by another.]
Interpretation:
If the remainder of x divided by 2
• is 0, then print ”even number” (YES) .
• is not equal to 0, then print ”odd number” (NO).
So for x = 7, 9, we get x = "odd number" and
for x=8, 4, we get x= "even number"
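Calls to ifelse() can also be nested when there are more than two outcomes per element. A minimal sketch, assuming an illustrative vector x and the name sign_label, neither of which appears in the lecture:

```r
x <- c(-2, 0, 3)   # hypothetical example vector

# The inner ifelse() is only used where the outer test is FALSE
sign_label <- ifelse(x > 0, "positive",
                     ifelse(x < 0, "negative", "zero"))
sign_label
```

Each element of x is classified independently, which is the vector-valued counterpart of the if ... else if ... else chain.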
Switch Command
• switch is a substitute for long if statements that compare a variable to
several integer or character values.
• switch is a multiway branch statement.
• switch tests an expression for equality against a list of values or cases.
• If more than one case matches a given value, switch returns the first
matched value.
• switch(expr, case1, case2,....)
• switch evaluates expr and accordingly chooses one of the further
arguments (in ...).
• expr: an expression evaluating to a number or a character string.
• A character string expression is matched against the names of the listed
cases.
• A non-character expression is coerced to integer and selects the case at
that position.
Examples: switch() used with an integer expression
> switch(2,"apple", "banana", "orange")
[1] "banana"
> switch(1,"apple", "banana", "orange")
[1] "apple"
• switch() can be used with a character string as well. The value of the
matching named item is returned.
> switch("colour", "colour" = "blue", "gender" ="male", "volume" = 50)
[1] "blue"
> switch("volume", "colour" = "blue", "gender" ="male", "volume" = 50)
[1] 50
• In the case of no match, if there is an unnamed element of ... its
value is returned. Otherwise switch returns NULL invisibly, so nothing is
printed.
> switch(4,"apple", "banana", "orange")
No outcome
> switch("size", "colour" = "blue", "gender" = "male", "volume" = 50)
No outcome
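The default behaviour for character expressions can be sketched as follows. The helper fruit_colour and its cases are hypothetical and only illustrate how an unnamed last argument acts as a fallback:

```r
# Hypothetical helper: maps a fruit name to a colour, with a fallback value
fruit_colour <- function(fruit) {
  switch(fruit,
         "apple"  = "red",
         "banana" = "yellow",
         "unknown")   # unnamed last argument: returned when nothing matches
}

fruit_colour("banana")
fruit_colour("mango")

# With an integer expression there is no default: out-of-range gives NULL
no_match <- switch(4, "apple", "banana", "orange")
is.null(no_match)
```

The fallback applies only when expr is a character string; for an integer expr outside the range of cases the result is NULL.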
Some functions useful in conditional execution: which()
• The which() function returns the position of the elements in a logical
vector which are TRUE.
• Give the TRUE indices of a logical object, allowing for array indices.
which(x, arr.ind, useNames)
x: Specified input logical vector
arr.ind: logical, returns the array indices if x is an array.
useNames: logical; whether the array indices returned should carry
dimension names.
Example :
> x = c(10,15,8,14,6,12)
>x
[1] 10 15 8 14 6 12
> which(x == 14)
[1] 4
> which(x != 12)
[1] 1 2 3 4 5
> which(x > 10)
[1] 2 4 6
> x = matrix(nrow=3, ncol=3, data=1:9)
>x
[,1] [,2] [,3]
[1,] 1 4 7
[2,] 2 5 8
[3,] 3 6 9
> which.min(x) #find the position of the minimum value
[1] 1
> which.max(x) #find the position of the maximum value
[1] 9
> x = matrix(nrow=3, ncol=3, data=1:9)
>x
[,1] [,2] [,3]
[1,] 1 4 7
[2,] 2 5 8
[3,] 3 6 9
> which(x %% 2 == 1)
[1] 1 3 5 7 9
> which(x %% 2 == 1, arr.ind = TRUE)
row col
[1,] 1 1
[2,] 3 1
[3,] 2 2
[4,] 1 3
[5,] 3 3
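The positions returned by which() are often used to extract the matching values. A minimal sketch reusing the vector from the example above; the name idx is an assumption made for illustration:

```r
x <- c(10, 15, 8, 14, 6, 12)

idx <- which(x > 10)   # positions of the values above 10
idx

x[idx]                 # the values at those positions
x[x > 10]              # logical indexing selects the same values
```

Both forms of indexing give the same values here; which() is useful when the positions themselves are needed.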
Control structures in R : Loops
Repetitive commands are executed by loops
• for loop
• while loop
• repeat loop
1. The for loop
If the number of repetitions is known in advance (e.g. if all
commands have to be executed for all cases i = 1,2,...,n in the data),
a for( ) loop can be used
Syntax:
for (name in vector) {commands to be executed}
• A variable with the name name is sequentially set to all values which
are contained in the vector.
• All operations/commands are executed for all these values.
Example:
> for ( i in 1:5 ) { print( i^2 ) }
[1] 1
[1] 4
[1] 9
[1] 16
[1] 25
Note: print is a function to print the argument
> for ( i in c(2,4,6,7) ) { print( i^2 ) }
[1] 4
[1] 16
[1] 36
[1] 49
>x = c(2,4,6,8,10,12)
>excount = function(x){
+count = 0
+for (xval in x) {
+if(xval/2 > 3)
+count = count+1
+}
+print(count)
+}
>excount(x)
[1] 3
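The same count can be obtained without an explicit loop, because comparisons in R are vectorized and TRUE counts as 1 when summed. A short sketch, with count_vec as an illustrative name:

```r
x <- c(2, 4, 6, 8, 10, 12)

# x / 2 > 3 is a logical vector; sum() counts its TRUE entries
count_vec <- sum(x / 2 > 3)
count_vec
```

This mirrors the loop in excount(): the values 8, 10 and 12 satisfy the condition.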
Nested looping with for loop:
We can have a loop inside a loop.
Example :
>child = c("child1", "child2", "child3")
>sweet = c("sweet1", "sweet2", "sweet3")
> for (x in child) {
+ for (y in sweet) {
+ print(paste(x, y))
+}
+}
[1] "child1 sweet1"
[1] "child1 sweet2"
[1] "child1 sweet3"
[1] "child2 sweet1"
[1] "child2 sweet2"
[1] "child2 sweet3"
[1] "child3 sweet1"
[1] "child3 sweet2"
[1] "child3 sweet3"
The break command:
Using the break command, we can stop the loop before it has
looped through all the items.
Ex:
>drink = c("coffee", "lemonade", "tea", "juice")
>for (x in drink) {
+if (x == "tea") {
+break
+}
+print(x)
+}
[1] "coffee"
[1] "lemonade"
The next command:
Using the next command, we can skip an iteration without terminating
the loop.
Suppose we want to skip lemonade.
>drink = c("coffee", "lemonade", "tea", "juice")
>for (x in drink) {
+if (x == "lemonade") {
+next
+}
+print(x)
+}
[1] "coffee"
[1] "tea"
[1] "juice"
Suppose instead we want to skip tea.
>for (x in drink) {
+if (x == "tea") {
+next
+}
+print(x)
+}
[1] "coffee"
[1] "lemonade"
[1] "juice"
2. The while loop
• If the number of iterations is not known in advance, e.g. when an
iterative algorithm to maximize a likelihood function is used, one can
use a while() loop.
Syntax:
while(condition){ commands to be executed as long as
condition is TRUE }
• If the condition is not true before entering the loop, no
commands within the loop are executed.
Example :
>i = 1
>while (i < 10) {
+ print(i^2)
+ i = i+2
+}
[1] 1
[1] 9
[1] 25
[1] 49
[1] 81
>sumfunction = function(){
+sum = 0
+number = as.integer(readline(prompt="Please select any number less than 25: "))
+while (number <= 25) {
+sum = sum + number
+number = number + 1 }
+print(paste("The sum of numbers received from the While Loop:", sum))
+}
> sumfunction()
Please select any number less than 25: 22
[1] "The sum of numbers received from the While Loop: 94"
> 22+23+24+25
[1] 94
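For scripts that cannot prompt the user, the same while loop can take its starting value as an argument. A minimal sketch; the function name sum_to_25 is hypothetical and not part of the lecture:

```r
# Hypothetical non-interactive variant of sumfunction()
sum_to_25 <- function(number) {
  total <- 0
  while (number <= 25) {   # the loop body is skipped if number > 25
    total <- total + number
    number <- number + 1
  }
  total                    # value of the last expression is returned
}

sum_to_25(22)   # 22 + 23 + 24 + 25
sum_to_25(30)   # condition is FALSE at the start, so the loop never runs
```

The second call illustrates that a while() loop executes no commands when its condition is FALSE on entry.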
3. The repeat loop
• The repeat loop doesn’t test any condition — in contrast to the
while() loop — before entering the loop and also not during the
execution of the loop.
• The programmer is therefore responsible for ensuring that the loop
terminates after the appropriate number of iterations. The break
command can be used for this.
Syntax:
repeat{ commands to be executed }
>i = 1
>repeat{
+print( i^2 )
+ i = i+2
+if ( i > 10 )
+break
+}
[1] 1
[1] 9
[1] 25
[1] 49
[1] 81
• Additionally, the command next is available, to return to the
beginning of the loop (to return to the first command in the
loop).
>i = 1
>repeat{
+ i = i+1
+ if (i < 10) next
+ print(i^2)
+if (i >= 13) break
+}
[1] 100
[1] 121
[1] 144
[1] 169
Command Line versus Scripts
Functions
• Functions are a bunch of commands grouped together in a sensible
unit
• Functions take input arguments, do calculations (or make some
graphics, call other functions) and produce some output and return a
result in a variable. The returned variable can be a complex construct,
like a list
Functions : Components
Function Name − This is the name of the function which is stored
as an object with this name.
Arguments − An argument contains the input values. A function
may contain no arguments.
Function Body − Contains statements that define what the
function has to do.
Return Value − The evaluated value of the last expression in the
function body.
Functions : Built‐in functions
R also has built‐in functions.
Simple examples of built‐in functions are
sum(), prod(), mean(), max(), etc.
• Functions
Syntax
Name <- function(Argument1, Argument2, ...)
{
expression(s)
}
where expression(s) is a single command or a group of commands.
You can use = operator also.
Name = function(Argument1, Argument2, ...)
{
expression(s)
}
Function arguments with description and default values
• Function arguments can be given a meaningful name
• Function arguments can be set to default values
• Functions can have the special argument ’...’
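The three points above can be sketched in one small function. The name power_sum and its arguments are hypothetical and chosen only for illustration:

```r
# Hypothetical function: 'power' has a default value and
# '...' is passed on unchanged to sum()
power_sum <- function(x, power = 2, ...) {
  sum(x^power, ...)
}

power_sum(c(1, 2, 3))                  # default power = 2: 1 + 4 + 9
power_sum(c(1, 2, 3), power = 3)       # named argument overrides the default
power_sum(c(1, NA, 3), na.rm = TRUE)   # na.rm reaches sum() through '...'
```

Naming the argument in the call (power = 3) makes the intent clear and lets arguments be given in any order.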
• Functions (Single variable)
• The sign = or <‐ is furthermore used for defining functions:
> abc = function(x){
+ x^2
+}
• A function is called with argument as
> abc(3)
[1] 9
> abc(6)
[1] 36
> abc(9)
[1] 81
Functions (Two variables)
> abc = function(x,y){
+ x^2+y^2
+}
> abc(3,4)
[1] 25
> abc(10, 10)
[1] 200
> abc(-2, -3)
[1] 13
Functions in two variables
> abc=function(x,y){
+ for(i in 1:x){
+ for(j in 1:y){
+ z=i^2+j^2
+ print(z)
+}
+}
+}
> abc(4,4)
[1] 2
[1] 5
[1] 10
[1] 17
[1] 5
[1] 8
[1] 13
[1] 20
[1] 10
[1] 13
[1] 18
[1] 25
[1] 17
[1] 20
[1] 25
[1] 32
Functions in two variables
abc=function(x,y){
A=matrix(nrow=x,ncol=y)
for(i in 1:x){
for(j in 1:y){
z=i^2+j^2
A[i,j]=z
}
}
print(A)
}
> abc(4,4)
[,1] [,2] [,3] [,4]
[1,] 2 5 10 17
[2,] 5 8 13 20
[3,] 10 13 18 25
[4,] 17 20 25 32
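The same matrix can be built without explicit loops using outer(), which applies a vectorized function to every pair of row and column indices. This is an alternative sketch, not the method used in the lecture:

```r
# Entry [i, j] is i^2 + j^2, as in the nested-loop version of abc(4, 4)
A <- outer(1:4, 1:4, function(i, j) i^2 + j^2)
A
```

The function passed to outer() must be vectorized, which holds here because ^ and + operate element by element.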
Functions (Single variable)
> abc = function(x){
+ sin(x)^2+cos(x)^2 + x
+}
> abc(9)
[1] 10
> abc(99)
[1] 100
> abc(-15)
[1] -14
Calling a function without an Argument
> abc = function() {
+ for(i in 1:3) {
+ print(i^3)
+ }
+}
> abc()
[1] 1
[1] 8
[1] 27
cat()
• The cat() function is a versatile tool in R for concatenating and
printing objects.
• Unlike print(), it is optimized for outputting multiple variables on
the same line, making it a preferred choice for many R
programmers.
Syntax
cat(object1, object2, ...): list the objects you want to print,
separated by commas.
Ex: >cat("Hello", "World", "\n")
Hello World
• To print multiple variables, simply include them in
the cat() function:
Ex: >a <- 5
>b <- 10
>cat("Values:", a, b, "\n")
Values: 5 10
Incorporating Text and Variables
You can mix text and variables in a single cat() call:
Ex: >name <- "Alice"
>age <- 30
> cat("Name:", name, "- Age:", age, "\n")
Name: Alice - Age: 30
Using cat() in Loops
cat() is particularly useful in loops for printing dynamic content:
>for (i in 1:3) { cat("Iteration:", i, "\n") }
Iteration: 1
Iteration: 2
Iteration: 3
Advanced Formatting
For more control over formatting, combine cat() with sprintf():
Ex:>pi_value <- 3.14159
>cat(sprintf("Pi to two decimal places: %.2f\n", pi_value))
Pi to two decimal places: 3.14
sprintf() Function:
• In R we can create formatted strings using various methods, including
the sprintf() function and the paste() function.
• Formatted strings allow us to insert variables and values into
a string while controlling their formatting.
Syntax:
sprintf(format, values)
where
format: Format of printing the values
values: to be passed into format
Ex: >x1 <- "Welcome"
>x2 <- "Geeks for Geeks"
>sprintf("%s to %s", x1, x2)
[1] "Welcome to Geeks for Geeks"
Interpretation: The variables x1 and x2 are inserted into the %s
placeholders, resulting in the formatted string "Welcome to
Geeks for Geeks". sprintf() replaces the placeholders with the values
of x1 and x2, allowing for dynamic and structured string
formatting.
Ex: >x1 <- "GeeksforGeeks"
>x2 <- 100
>x3 <- "success"
>sprintf("%s gives %.0f percent %s", x1, x2, x3)
[1] "GeeksforGeeks gives 100 percent success"
>mean_value <- 35.68
>standard_deviation <- 7.42
# Create a formatted string to display the results
>formatted_string <- sprintf("The mean is %.2f, and the standard deviation is %.2f", mean_value, standard_deviation)
# Print the formatted string
>cat(formatted_string)
The mean is 35.68, and the standard deviation is 7.42
paste() Function
• The paste() function is used for concatenating vectors after
converting them to character mode.
• It allows for specifying a separator between the elements being
concatenated.
Ex: x <- c("Hello", "World", "!")
> paste(x)
[1] "Hello" "World" "!"
>x <- c("Hello", "World", "!")
>paste(x, collapse = ", ")
[1] "Hello, World, !"
Difference between cat() Function and paste() Function
Feature            cat()                          paste()
Output Handling    Directly prints the            Returns the concatenated
                   concatenated strings           string as a character vector
Default Separator  Space (" ") added between      Space (" ") added between
                   elements by default            elements by default
Vector Handling    Concatenates elements          Concatenates multiple
                   within a single vector         vectors simultaneously
Output Control     Less flexibility               More flexibility with
                                                  additional arguments
                                                  like collapse
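The main practical difference is that paste() returns a value while cat() only prints. A short sketch; the variable names p and r are assumptions made for illustration:

```r
p <- paste("Hello", "World")       # returns the string; nothing is printed
p

r <- cat("Hello", "World", "\n")   # prints to the console; returns NULL
is.null(r)

paste("a", "b", sep = "")          # sep controls the separator
paste0("a", "b")                   # paste0() is paste() with sep = ""
```

Use paste() when the result is needed later in the program, and cat() when the text only has to be displayed.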
Data frames
• A data frame stores data in the format of a table.
• Data frames can hold different types of data. While the
first column can be character, the second and third can
be numeric or logical.
• However, all values within one column must have the same type.
• Use the data.frame() function to create a data frame
Ex:>Data_Frame <- data.frame (
+Training = c("Strength", "Stamina", "Other"),
+Pulse = c(100, 150, 120),
+Duration = c(60, 30, 45))
>Data_Frame
Training Pulse Duration
1 Strength 100 60
2 Stamina 150 30
3 Other 120 45
> typeof(Data_Frame)
[1] "list"
> class(Data_Frame)
[1] "data.frame"
> summary(Data_Frame)
Training Pulse Duration
Length:3 Min. :100.0 Min. :30.0
Class :character 1st Qu.:110.0 1st Qu.:37.5
Mode :character Median :120.0 Median :45.0
Mean :123.3 Mean :45.0
3rd Qu.:135.0 3rd Qu.:52.5
Max. :150.0 Max. :60.0
> Data_Frame[1] # first column, returned as a data frame
Training
1 Strength
2 Stamina
3 Other
> Data_Frame[["Training"]] # first column, returned as a vector
[1] "Strength" "Stamina" "Other"
> Data_Frame$Training # first column, returned as a vector
[1] "Strength" "Stamina" "Other"
# Add a new row (a one-row data frame keeps Pulse and Duration numeric)
> New_row_DF <- rbind(Data_Frame, data.frame(Training = "Strength", Pulse = 110, Duration = 110))
> New_row_DF
Training Pulse Duration
1 Strength 100 60
2 Stamina 150 30
3 Other 120 45
4 Strength 110 110
# Add a new column
> New_col_DF<-cbind(New_row_DF, Steps
+=c(1000, 6000, 2000,356))
> New_col_DF
Training Pulse Duration Steps
1 Strength 100 60 1000
2 Stamina 150 30 6000
3 Other 120 45 2000
4 Strength 110 110 356
# Create the data frame
>Data_Frame <- data.frame (
Training = c("Strength", "Stamina", "Other"),
Pulse = c(100, 150, 120),
Duration = c(60, 30, 45))
# Remove the first row and column
>Data_Frame_New <- Data_Frame[-c(1), -c(1)]
>Data_Frame_New # Print the new data frame
Pulse Duration
2 150 30
3 120 45
> dim(Data_Frame)
[1] 3 3
> ncol(Data_Frame)
[1] 3
> nrow(Data_Frame)
[1] 3
> length(Data_Frame)
[1] 3
>Data_Frame1 <- data.frame (
Training = c("Strength", "Stamina", "Other"),
Pulse = c(100, 150, 120),
Duration = c(60, 30, 45)
)
>Data_Frame2 <- data.frame (
Training = c("Stamina", "Stamina", "Strength"),
Pulse = c(140, 150, 160),
Duration = c(30, 30, 20)
)
# Stack the rows of both data frames
>New_Data_Frame <- rbind(Data_Frame1, Data_Frame2)
>New_Data_Frame
Training Pulse Duration
1 Strength 100 60
2 Stamina 150 30
3 Other 120 45
4 Stamina 140 30
5 Stamina 150 30
6 Strength 160 20
>Data_Frame3 <- data.frame (
Training = c("Strength", "Stamina", "Other"),
Pulse = c(100, 150, 120),
Duration = c(60, 30, 45)
)
>Data_Frame4 <- data.frame (
Steps = c(3000, 6000, 2000),
Calories = c(300, 400, 300)
)
>New_Data_Frame1 <- cbind(Data_Frame3, Data_Frame4)
>New_Data_Frame1
Training Pulse Duration Steps Calories
1 Strength 100 60 3000 300
2 Stamina 150 30 6000 400
3 Other 120 45 2000 300
>Data_Frame <- data.frame (
Training = c("Strength", "Stamina", "Other"),
Pulse = c(100, 150, 120),
Duration = c(60, 30, 45)
)
# Remove the first row and column
>Data_Frame_New <- Data_Frame[-c(1), -c(1)]
>Data_Frame_New
Pulse Duration
2 150 30
3 120 45
>dim(Data_Frame_New)
[1] 2 2
> ncol(Data_Frame_New)
[1] 2
> nrow(Data_Frame_New)
[1] 2
>length(Data_Frame) #number of columns in a Data Frame
[1] 3
# creating a data frame
>friend.data <- data.frame( friend_id = c(1:5), friend_name =
c("Sachin", "Sourav", "Dravid", "Sehwag", "Dhoni"),
stringsAsFactors = FALSE )
# print the data frame
>print(friend.data)
friend_id friend_name
1 1 Sachin
2 2 Sourav
3 3 Dravid
4 4 Sehwag
5 5 Dhoni
*We can get the structure of the R data frame using str() function
in R.
>print(str(friend.data))
'data.frame': 5 obs. of 2 variables:
$ friend_id : int 1 2 3 4 5
$ friend_name: chr "Sachin" "Sourav" "Dravid" "Sehwag" ...
NULL
• Extract Data from Data Frame in R
Extracting data from an R data frame means accessing its rows
or columns. One can extract a specific column from an R data
frame using its column name.
# Extracting friend_name column
>result <- data.frame(friend.data$friend_name)
>print(result)
friend.data.friend_name
1 Sachin
2 Sourav
3 Dravid
4 Sehwag
5 Dhoni
# Expanding data frame
>friend.data$location <- c("Kolkata", "Delhi", "Bangalore",
"Hyderabad", "Chennai")
>resultant <- friend.data
# print the modified data frame
>print(resultant)
friend_id friend_name location
1 1 Sachin Kolkata
2 2 Sourav Delhi
3 3 Dravid Bangalore
4 4 Sehwag Hyderabad
5 5 Dhoni Chennai
# Access Items using []
>friend.data[1]
# Access Items using [[]]
>friend.data[['friend_name']]
# Access Items using $
>friend.data$friend_id
# Remove a row with friend_id = 3
>data <- subset(friend.data, friend_id != 3)
>data
friend_id friend_name location
1 1 Sachin Kolkata
2 2 Sourav Delhi
4 4 Sehwag Hyderabad
5 5 Dhoni Chennai
# Remove the 'location' column (select() comes from the dplyr package)
>library(dplyr)
>data <- select(data, -location)
>data
friend_id friend_name
1 1 Sachin
2 2 Sourav
4 4 Sehwag
5 5 Dhoni
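The same row and column removal can be done in base R without any extra package. A minimal sketch, assuming the illustrative names friends and kept:

```r
# Rebuild the example data frame so this block is self-contained
friends <- data.frame(
  friend_id   = 1:5,
  friend_name = c("Sachin", "Sourav", "Dravid", "Sehwag", "Dhoni"),
  location    = c("Kolkata", "Delhi", "Bangalore", "Hyderabad", "Chennai")
)

# Keep only the rows whose friend_id is not 3
kept <- friends[friends$friend_id != 3, ]

# Keep only the columns that are not named "location"
kept <- kept[, names(kept) != "location"]
kept
```

Logical indexing of rows and columns like this works on any data frame and keeps the original row names (1, 2, 4, 5).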
Sequences
A sequence is a set of related numbers, events, movements, or
items that follow each other in a particular order.
The regular sequences can be generated in R.
Syntax
seq()
seq(from = 1, to = 1,
by = ((to - from)/(length.out - 1)),
length.out = NULL, along.with = NULL, ...)
• Help for seq
>help("seq")
• The default increment is +1 or −1
> seq(from=2, to=4)
[1] 2 3 4
> seq(from=4, to=2)
[1] 4 3 2
> seq(from=-4, to=4)
[1] -4 -3 -2 -1 0 1 2 3 4
• Sequence with constant increment:
• Generate a sequence from 10 to 20 with an increment of 2 units
> seq(from=10, to=20, by=2)
[1] 10 12 14 16 18 20
• Sequence with constant decrement:
• Generate a sequence from 20 to 10 with a decrement of 2 units
> seq(from=20, to=10, by=-2)
[1] 20 18 16 14 12 10
• Downstream sequence with constant decrement:
• Generate a sequence from 3 to ‐2 with a decrement of 0.5 units
> seq(from=3, to=-2, by=-0.5)
[1] 3.0 2.5 2.0 1.5 1.0 0.5 0.0 -0.5 -1.0 -1.5 -2.0
• Sequences with a predefined length with default increment +1
> seq(to=10, length=10)
[1] 1 2 3 4 5 6 7 8 9 10
• Sequences with a predefined length with default increment +1
> seq(from=10, length=10)
[1] 10 11 12 13 14 15 16 17 18 19
• Sequences with a predefined length with constant fractional
increment
> seq(from=10, length=10, by=0.1)
[1] 10.0 10.1 10.2 10.3 10.4 10.5 10.6 10.7 10.8 10.9
• Sequences with a predefined length with constant decrement
> seq(from=10, length=10, by=-2)
[1] 10 8 6 4 2 0 -2 -4 -6 -8
• Sequences with a predefined length with constant fractional
decrement
> seq(from=10, length=5, by=-.2)
[1] 10.0 9.8 9.6 9.4 9.2
Sequences:
• Continuous sequences with constant unit increment and
decrement
> 1:10
[1] 1 2 3 4 5 6 7 8 9 10
> 10:1
[1] 10 9 8 7 6 5 4 3 2 1
> 5:15
[1] 5 6 7 8 9 10 11 12 13 14 15
> 15:5
[1] 15 14 13 12 11 10 9 8 7 6 5
> -1:-10
[1] -1 -2 -3 -4 -5 -6 -7 -8 -9 -10
> -10:-1
[1] -10 -9 -8 -7 -6 -5 -4 -3 -2 -1
> -5:-15
[1] -5 -6 -7 -8 -9 -10 -11 -12 -13 -14 -15
> -15:-5
[1] -15 -14 -13 -12 -11 -10 -9 -8 -7 -6 -5
> 1.23:10
[1] 1.23 2.23 3.23 4.23 5.23 6.23 7.23 8.23 9.23
> 1.23:10.54
[1] 1.23 2.23 3.23 4.23 5.23 6.23 7.23 8.23 9.23
10.23
> 10.54:2.23
[1] 10.54 9.54 8.54 7.54 6.54 5.54 4.54
3.54 2.54
> -1.23:-10
[1] -1.23 -2.23 -3.23 -4.23 -5.23 -6.23 -7.23
-8.23 -9.23
> -5.23:6
[1] -5.23 -4.23 -3.23 -2.23 -1.23 -0.23 0.77
1.77 2.77 3.77 4.77 5.77
> seq(10)
[1] 1 2 3 4 5 6 7 8 9 10
# is the same as
> seq(1:10)
[1] 1 2 3 4 5 6 7 8 9 10
> x=2
> seq(1, x, x/10)
[1] 1.0 1.2 1.4 1.6 1.8 2.0
> x=50
> seq(0, x, x/10)
[1] 0 5 10 15 20 25 30 35 40 45 50
# Outcome of sequences can be stored.
>x = seq(1, 50, 1/2)
>x
[1] 1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0 5.5 6.0 6.5 7.0 7.5 8.0 8.5 9.0
[18] 9.5 10.0 10.5 11.0 11.5 12.0 12.5 13.0 13.5 14.0 14.5 15.0 15.5 16.0 16.5 17.0 17.5
[35] 18.0 18.5 19.0 19.5 20.0 20.5 21.0 21.5 22.0 22.5 23.0 23.5 24.0 24.5 25.0 25.5 26.0
[52] 26.5 27.0 27.5 28.0 28.5 29.0 29.5 30.0 30.5 31.0 31.5 32.0 32.5 33.0 33.5 34.0 34.5
[69] 35.0 35.5 36.0 36.5 37.0 37.5 38.0 38.5 39.0 39.5 40.0 40.5 41.0 41.5 42.0 42.5 43.0
[86] 43.5 44.0 44.5 45.0 45.5 46.0 46.5 47.0 47.5 48.0 48.5 49.0 49.5 50.0
>y=2*x
>y
[1] 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22
[22] 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43
[43] 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64
[64] 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85
[85] 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100
Sequences: Index vector
• Assignment of an index‐vector
>x = c(9,8,7,6)
>x
[1] 9 8 7 6
>ind = seq(along=x)
>ind
[1] 1 2 3 4
• Accessing an element of the vector through the index vector
> x[ ind[2] ]
[1] 8
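An index vector is a natural loop variable. The sketch below uses seq_along(), which is equivalent to seq(along=x), and shows why it is safer than 1:length(x) when a vector might be empty; the name empty is an assumption for illustration:

```r
x <- c(9, 8, 7, 6)

# Loop over the positions 1, 2, 3, 4 of x
for (i in seq_along(x)) {
  cat("x[", i, "] =", x[i], "\n")
}

empty <- numeric(0)
seq_along(empty)    # integer(0): a loop over it runs zero times
1:length(empty)     # 1:0 counts down, so a loop would run twice
```

For a non-empty vector both forms give the same positions, so the difference only matters for empty input.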
Generating sequences of dates
• Generating current time and date: the Sys.time() command provides the
current time and date from the computer system.
> Sys.time()
[1] "2021-11-29 21:23:57 IST"
• Sys.Date() command provides the current date from the computer
system.
> Sys.Date()
[1] "2021-11-29"
Generating sequences of dates
Usage
seq(from, to, by, length.out = NULL, along.with = NULL, ...)
Arguments
• from starting date (Required)
• to end date (Optional)
• by increment of the sequence. "day", "week", "month", "quarter" or
"year".
• length.out integer, optional. Desired length of the sequence.
• along.with take the length from the length of this argument.
• Sequence of years
Ex: > seq(as.Date("2010-01-01"), as.Date("2017-01-01"), by = "years")
[1] "2010-01-01" "2011-01-01" "2012-01-01" "2013-01-01"
[5] "2014-01-01" "2015-01-01" "2016-01-01" "2017-01-01"
• Sequence of days
> seq(as.Date("2017-01-01"), by = "days",
length = 6)
[1] "2017-01-01" "2017-01-02" "2017-01-03" "2017-01-04"
[5] "2017-01-05" "2017-01-06"
Sequence of months
> seq(as.Date("2017-01-01"), by = "months", length = 6)
[1] "2017-01-01" "2017-02-01" "2017-03-01" "2017-04-01"
[5] "2017-05-01" "2017-06-01"
Sequence by years
> seq(as.Date("2017-01-01"), by = "years", length = 6)
[1] "2017-01-01" "2018-01-01" "2019-01-01" "2020-01-01"
[5] "2021-01-01" "2022-01-01"
To find sequence with defining start and end dates
> startdate = as.Date("2016-1-1")
> enddate = as.Date("2017-1-1")
> out = seq(enddate, startdate, by = "-1 month")
> out
[1] "2017-01-01" "2016-12-01" "2016-11-01" "2016-10-01"
[5] "2016-09-01" "2016-08-01" "2016-07-01" "2016-06-01"
[9] "2016-05-01" "2016-04-01" "2016-03-01" "2016-02-01"
[13] "2016-01-01"
Sequence of letters
letters gives the sequence of lowercase letters
> letters
[1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" "n"
[15] "o" "p" "q" "r" "s" "t" "u" "v" "w" "x" "y" "z"
letters[from_index:to_index] gives the sequence of
lowercase letters from one index to another.
> letters[1:3]
[1] "a" "b" "c"
> letters[3:1]
[1] "c" "b" "a"
> letters[21:23]
[1] "u" "v" "w"
> letters[2]
[1] "b"
Sequence of letters
LETTERS gives the sequence of uppercase letters
> LETTERS
[1] "A" "B" "C" "D" "E" "F" "G" "H" "I" "J" "K" "L" "M" "N"
[15] "O" "P" "Q" "R" "S" "T" "U" "V" "W" "X" "Y" "Z"
LETTERS[from_index:to_index] gives the sequence of
uppercase letters from one index to another.
> LETTERS[1:3]
[1] "A" "B" "C"
> LETTERS[3:1]
[1] "C" "B" "A"
> LETTERS[21:23]
[1] "U" "V" "W"
> LETTERS[2]
[1] "B"
Repeats:
• The rep() function replicates numeric values, text, or the values
of a vector a specified number of times.
• Syntax rep(x) replicates the values in a vector x.
• rep(x, times=n) # Repeat x as a whole n times
• rep(x, each=n) # Repeat each cell n times
• The following commands repeat x until the output vector has the
desired length
• rep(x, length.out=n)
• rep(x, length=n)
• rep_len(x, length.out)
Help for the command rep
> help("rep")
Repeat an object n−times:
> rep(3.5, times=10)
[1] 3.5 3.5 3.5 3.5 3.5 3.5 3.5 3.5 3.5 3.5
> rep(1:4, 2)
[1] 1 2 3 4 1 2 3 4
Repeat an object n−times:
• rep(x, times = n)
Repeat each cell n−times:
• rep(x, each = n)
> x = 1:4
>x
[1] 1 2 3 4
> rep(x, times = 3)
[1] 1 2 3 4 1 2 3 4 1 2 3 4
> rep(x, each = 3)
[1] 1 1 1 2 2 2 3 3 3 4 4 4
Every object is repeated several times successively:
> rep(1:4, each = 2)
[1] 1 1 2 2 3 3 4 4
> rep(1:4, each = 2, times = 3)
[1] 1 1 2 2 3 3 4 4 1 1 2 2 3 3 4 4 1 1 2 2 3 3 4 4
> rep(1:4, times = 3, each = 2)
[1] 1 1 2 2 3 3 4 4 1 1 2 2 3 3 4 4 1 1 2 2 3 3 4 4
• Observe that the order in which each and times are written does not
change the result: each is applied first, then times.
• Every object is repeated a different number of times:
> rep(1:4, 2:5)
[1] 1 1 2 2 2 3 3 3 3 4 4 4 4 4
• Every object is repeated a different number of times:
> ans = seq(from=2, to=8, by=2)
> ans
[1] 2 4 6 8
> rep(1:4, ans)
[1] 1 1 2 2 2 2 3 3 3 3 3 3 4 4 4 4 4 4 4 4
> x = matrix(nrow=2, ncol=2, data=1:4, byrow=T)
>x
[,1] [,2]
[1,] 1 2
[2,] 3 4
> rep(x, 3)
[1] 1 3 2 4 1 3 2 4 1 3 2 4
• Repetition of characters
> rep(c("a", "b", "c"), 2)
[1] "a" "b" "c" "a" "b" "c"
> rep(c("apple", "banana", "cake"), 2)
[1] "apple" "banana" "cake" "apple" "banana" "cake"
Repetition of values for pre‐specified length
> rep(2, length.out=5)
[1] 2 2 2 2 2
> rep(2, length=5)
[1] 2 2 2 2 2
> rep(c(2,3), length=5)
[1] 2 3 2 3 2
> rep(c(2,3,4), length=5)
[1] 2 3 4 2 3
• Repetition of characters for pre‐specified length
> rep("apple", length=5)
[1] "apple" "apple" "apple" "apple" "apple"
> rep(c("a", "b", "c"), length=2)
[1] "a" "b"
> rep(c("a", "b", "c"), length=5)
[1] "a" "b" "c" "a" "b"
Sorting:
sort function sorts the values of a vector in ascending order (by
default) or descending order.
Syntax
sort(x, decreasing = FALSE, ...)
sort(x, decreasing = FALSE, na.last = NA, ...)
x Vector of values to be sorted
decreasing Should the sort be increasing or decreasing
na.last for controlling the treatment of NAs.
If TRUE, missing values in the data are put last;
if FALSE, they are put first;
if NA, they are removed.
Example: > y = c(8,5,7,6)
>y
[1] 8 5 7 6
> sort(y)
[1] 5 6 7 8
> sort(y, decreasing = TRUE)
[1] 8 7 6 5
Example: order() returns the positions of the elements in sorted order,
not the sorted values themselves.
> y = c(9,8,5,7,6)
>y
[1] 9 8 5 7 6
> order(y)
[1] 3 5 4 2 1
> order(y, decreasing = TRUE)
[1] 1 2 4 5 3
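The link between sort() and order(), and the effect of na.last, can be checked directly. A short sketch; the vector z and the name sorted_by_order are assumptions for illustration:

```r
y <- c(9, 8, 5, 7, 6)

# Indexing y by order(y) reproduces sort(y)
sorted_by_order <- y[order(y)]
sorted_by_order

z <- c(3, NA, 1)               # hypothetical vector with a missing value
sort(z)                        # default na.last = NA: NA is removed
sort(z, na.last = TRUE)        # NA is placed last
sort(z, na.last = FALSE)       # NA is placed first
```

order() is the tool of choice when other objects, such as the rows of a data frame, must be rearranged in the same way as y.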
Mode:
Every object has a mode.
The mode indicates how the object is stored in memory: as a
• number,
• character string,
• list of pointers to other objects,
• function etc.
mode function gives us such information.
Syntax:
mode()
Example:> mode(2.432)
[1] "numeric"
> mode(c(3,4,5,6,7,8))
[1] "numeric"
Example:
> mode("India")
[1] "character"
> mode(c("India", "CANADA"))
[1] "character"
> mode(factor(c("UP", "MP")) )
[1] "numeric"
> mode(list("India", "USA"))
[1] "list"
> mode(data.frame(x=1:2, y=c("India", "USA")))
[1] "list"
> mode(print)
[1] "function"
Lists:
• A limitation of vectors, matrices, and arrays is that each of these
types of objects may only contain one type of data.
• For example, a vector may contain all numeric data or all character
data.
• A list is a special type of object that can contain data of multiple
types.
• Lists are characterized by the fact that their elements do not need
to be of the same object type.
• Lists can contain elements of different types so that the list
elements may have different modes.
Lists:
• Lists can even contain other structured objects, such as lists and
data frames, which allows recursive data structures to be created.
• Lists can be indexed by position.
• So x[[5]] refers to the fifth element of x.
Lists can extract sublists.
So x[c(2,5)] is a sublist of x that consists of the second and
fifth elements.
List elements can have names.
Both x[["Students"]] and x$Students refer to the element named
"Students" .
Difference between a vector and a list :
• In a vector, all elements must have the same mode.
• In a list, the elements can have different modes.
Example:
> x1 = matrix(nrow=2, ncol=2, data=1:4, byrow=T)
> x2 = matrix(nrow=2, ncol=2, data=5:8, byrow=T)
> x1
[,1] [,2]
[1,] 1 2
[2,] 3 4
> x2
[,1] [,2]
[1,] 5 6
[2,] 7 8
> x1+x2
[,1] [,2]
[1,] 6 8
[2,] 10 12
> x1[2,1] = "hello"
> x1
[,1] [,2]
[1,] "1" "2"
[2,] "hello" "4"
> x1 + x2
Error in x1 + x2 : non-numeric argument to binary operator
• Lists can contain any kind of objects as well as objects of different
types. For example, lists can contain matrices as objects:
Example:
> x1 = matrix(nrow=2, ncol=2, data=1:4, byrow=T)
> x2 = matrix(nrow=2, ncol=2, data=5:8, byrow=T)
> x1
     [,1] [,2]
[1,]    1    2
[2,]    3    4
> x2
     [,1] [,2]
[1,]    5    6
[2,]    7    8
> matlist = list(x1, x2)
> matlist
[[1]]
     [,1] [,2]
[1,]    1    2
[2,]    3    4

[[2]]
     [,1] [,2]
[1,]    5    6
[2,]    7    8

> matlist[1]
[[1]]
     [,1] [,2]
[1,]    1    2
[2,]    3    4

> matlist[2]
[[1]]
     [,1] [,2]
[1,]    5    6
[2,]    7    8
• An example of a list that contains different object types:
> z1 = list( c("water", "juice", "lemonade"),
rep(1:4, each=2), matrix(data=5:8, nrow=2,
ncol=2, byrow=T) )
> z1
[[1]]
[1] "water"    "juice"    "lemonade"

[[2]]
[1] 1 1 2 2 3 3 4 4

[[3]]
     [,1] [,2]
[1,]    5    6
[2,]    7    8
• Access the elements of a list using the operator [[ ]].
The following command works.
> z1[[1]]
[1] "water"    "juice"    "lemonade"
• Suppose we want to extract "juice". The command
> z1[1][2] # Notice the positions of brackets
[[1]]
NULL
• returns a list containing NULL instead of "juice", because z1[1] is
itself a list of length one, while
> z1[[1]][2] # Notice the positions of brackets
[1] "juice"
• returns the desired result.
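To make the difference between [ ] and [[ ]] concrete, the following sketch rebuilds the list z1 from above and inspects what each operator returns. It is only an illustration of the bracket behaviour.

```r
z1 <- list(c("water", "juice", "lemonade"),
           rep(1:4, each = 2),
           matrix(data = 5:8, nrow = 2, ncol = 2, byrow = TRUE))

class(z1[1])      # "list": single brackets return a sublist
class(z1[[1]])    # "character": double brackets return the element itself
length(z1[1])     # 1, so z1[1][2] asks for an element that does not exist
z1[[1]][2]        # "juice"
z1[[3]][2, 1]     # 7: row 2, column 1 of the matrix stored in the list
```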
Factors:
Factors are used to categorize data.
Examples :
Demography: Male/Female
Music: Rock, Pop, Classic, Jazz
Training: Strength, Stamina
Command: factor()
# Create a factor
> music_genre <- factor(c("Jazz", "Rock", "Classic", "Classic", "Pop",
+                         "Jazz", "Rock", "Jazz"))
# Print the factor
> music_genre
[1] Jazz    Rock    Classic Classic Pop     Jazz    Rock    Jazz
Levels: Classic Jazz Pop Rock
• To print only the levels, use the levels() function:
Example:
> music_genre <- factor(c("Jazz", "Rock", "Classic", "Classic", "Pop",
+                         "Jazz", "Rock", "Jazz"))
> levels(music_genre)
[1] "Classic" "Jazz"    "Pop"     "Rock"
• You can also set the levels by adding the levels argument inside
the factor() function:
> music_genre <- factor(c("Jazz", "Rock", "Classic", "Classic", "Pop",
+                         "Jazz", "Rock", "Jazz"),
+                       levels = c("Classic", "Jazz", "Pop", "Rock", "Other"))
> levels(music_genre)
[1] "Classic" "Jazz"    "Pop"     "Rock"    "Other"
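One consequence of setting levels explicitly is that a level can exist without any observations. As a small sketch using the same data, table() reports such a level with a count of zero:

```r
music_genre <- factor(
  c("Jazz", "Rock", "Classic", "Classic", "Pop", "Jazz", "Rock", "Jazz"),
  levels = c("Classic", "Jazz", "Pop", "Rock", "Other"))

counts <- table(music_genre)   # one count per level, including unused levels
print(counts)
nlevels(music_genre)           # number of levels, here 5
```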
Access Factors:
To access the items in a factor, refer to the index number, using
[ ] brackets:
Example:
> music_genre <- factor(c("Jazz", "Rock", "Classic", "Classic", "Pop",
+                         "Jazz", "Rock", "Jazz"))
> music_genre[3]
[1] Classic
Levels: Classic Jazz Pop Rock
Change Item Value:
To change the value of a specific item, refer to the index number:
> music_genre <- factor(c("Jazz", "Rock", "Classic", "Classic", "Pop",
+                         "Jazz", "Rock", "Jazz"))
> music_genre[3] <- "Pop"
> music_genre[3]
[1] Pop
Levels: Classic Jazz Pop Rock
Note that you cannot change the value of a specific item to a level that
is not already defined in the factor. The following example produces a
warning (not an error), and the item is set to NA:
Example: Trying to change the value of the third item ("Classic") to an
item that is not a predefined level ("Opera"):
> music_genre <- factor(c("Jazz", "Rock", "Classic", "Classic", "Pop",
+                         "Jazz", "Rock", "Jazz"))
> music_genre[3] <- "Opera"
Warning message:
In `[<-.factor`(`*tmp*`, 3, value = "Opera") :
  invalid factor level, NA generated
> music_genre[3]
[1] <NA>
Levels: Classic Jazz Pop Rock
However, if you have already specified it inside the levels argument, it
will work:
Example:
> music_genre <- factor(c("Jazz", "Rock", "Classic", "Classic", "Pop",
+                         "Jazz", "Rock", "Jazz"),
+                       levels = c("Classic", "Jazz", "Pop", "Rock", "Opera"))
> music_genre[3] <- "Opera"
> music_genre[3]
[1] Opera
Levels: Classic Jazz Pop Rock Opera
Example:
> students_gender <- factor(c("male", "female", "male", "transgender",
+                             "female"))
> print(students_gender[1])
[1] male
Levels: female male transgender
> print(students_gender[4])
[1] transgender
Levels: female male transgender
Example:
>x1 <- c("Dec", "Apr", "Jan", "Mar")
>x2 <- c("Dec", "Apr", "Jam", "Mar")
>sort(x1)
[1] "Apr" "Dec" "Jan" "Mar"
>month_levels <- c(
"Jan", "Feb", "Mar", "Apr", "May", "Jun",
"Jul", "Aug", "Sep", "Oct", "Nov", "Dec")
>y1 <- factor(x1, levels = month_levels)
>y1
[1] Dec Apr Jan Mar
Levels: Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
>sort(y1)
[1] Jan Mar Apr Dec
Levels: Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
Example:
Any values not in the set of levels will be silently converted to NA:
>y2 <- factor(x2, levels = month_levels)
>y2
[1] Dec Apr <NA> Mar
Levels: Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
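Because the conversion to NA happens without a warning, it can help to check for unmatched values. A minimal sketch using the same x2 and month_levels (the names bad_values and na_positions are introduced here only for illustration):

```r
month_levels <- c("Jan", "Feb", "Mar", "Apr", "May", "Jun",
                  "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")
x2 <- c("Dec", "Apr", "Jam", "Mar")

# Values that do not match any level (these would become NA)
bad_values <- x2[!(x2 %in% month_levels)]
print(bad_values)

y2 <- factor(x2, levels = month_levels)
na_positions <- which(is.na(y2))   # positions that were converted to NA
print(na_positions)
```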
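Example: A new level can be added to an existing factor by extending
levels() before assigning the new value:
> gender <- factor(c("female", "male", "male", "female"))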
>levels(gender) <- c(levels(gender), "other")
>gender[3] <- "other"
>print(gender)
[1] female male other female
Levels: female male other
Removing Elements from a factor in R
Remove an element by subsetting the factor with a negative index
inside square brackets.
>gender <- factor(c("female", "male", "male", "female" ))
>print(gender[-3])
[1] female male female
Levels: female male
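Removing elements does not remove their levels. As a sketch, assuming a factor that also contains the value "other", base R's droplevels() can be used afterwards to discard levels that no longer occur:

```r
gender <- factor(c("female", "male", "other", "female"))

kept <- gender[gender != "other"]   # the "other" element is removed
levels(kept)                        # the level "other" is still listed

cleaned <- droplevels(kept)         # drop levels that no longer occur
levels(cleaned)
```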
Factors in Data Frame
A data frame in R is similar to a 2D array, where each column represents a
variable and each row represents a set of values for those variables. When
working with data frames in R, we need to keep these points in mind:
• Column names are required and cannot be empty.
• Row names must be unique.
• Columns typically hold factor, numeric, or character data; other types,
such as logical, are also allowed.
• Each column must have the same number of data entries.
Example:
>age <- c(40, 49, 48, 40, 67, 52, 53)
>salary <- c(103200, 106200, 150200, 10606, 10390, 14070, 10220)
>gender <- c("male", "male", "transgender", "female", "male", "female",
"transgender")
>employee <- data.frame(age, salary, gender = factor(gender))
> print(employee)
  age salary      gender
1  40 103200        male
2  49 106200        male
3  48 150200 transgender
4  40  10606      female
5  67  10390        male
6  52  14070      female
7  53  10220 transgender
> print(is.factor(employee$gender))
[1] TRUE
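A factor column is mainly useful for grouping. The following sketch rebuilds the same employee data frame and summarises it by gender with table() and tapply(); the names gender_counts and mean_salary are introduced here only for illustration.

```r
age    <- c(40, 49, 48, 40, 67, 52, 53)
salary <- c(103200, 106200, 150200, 10606, 10390, 14070, 10220)
gender <- c("male", "male", "transgender", "female", "male", "female",
            "transgender")
employee <- data.frame(age, salary, gender = factor(gender))

gender_counts <- table(employee$gender)                          # rows per level
mean_salary   <- tapply(employee$salary, employee$gender, mean)  # mean per level
print(gender_counts)
print(mean_salary)
```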
Example:
> data<-c(1,2,2,3,1,2,3,1,3,3)
> fdata<-factor(data)
> fdata
[1] 1 2 2 3 1 2 3 1 3 3
Levels: 1 2 3
> table(fdata)
fdata
1 2 3
3 3 4
Example:
> fdata<-factor(data,labels=c("I","II","III"))
> fdata
[1] I II II III I II III I III III
Levels: I II III
Example:
> x<-c(10,20,30,10,20,10,40,20,50,30)
> is.factor(x)
[1] FALSE
> y<-factor(x)
> y
[1] 10 20 30 10 20 10 40 20 50 30
Levels: 10 20 30 40 50
> is.factor(y)
[1] TRUE
> mean(y)
[1] NA
Warning message:
In mean.default(y) : argument is not numeric or logical: returning NA
> mean(as.numeric(levels(y)[y]))
[1] 24
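The expression levels(y)[y] works because a factor stores integer codes that index its level labels. A short sketch of the common pitfall, using the same x, shows that as.numeric(y) returns those codes rather than the original numbers:

```r
x <- c(10, 20, 30, 10, 20, 10, 40, 20, 50, 30)
y <- factor(x)

codes  <- as.numeric(y)                 # internal integer codes 1..5, not the data
values <- as.numeric(as.character(y))   # original numbers recovered from the labels

mean(codes)    # 2.4, the mean of the codes
mean(values)   # 24, the mean of the original data
```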