R Language Unit 2

Uploaded by

d8750316

Unit – II Data Structures

A data structure is a particular way of organizing data in a computer so that it can be used effectively. The
idea is to reduce the space and time complexity of different tasks. Data structures in R programming are
tools for holding multiple values.
R's base data structures are usually organized by their dimensionality (1D, 2D or nD) and by whether they are
homogeneous (all elements must be of the same type) or heterogeneous (elements may be of different
types). This gives rise to the six data structures most frequently used in data analysis.
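
The homogeneous/heterogeneous distinction can be illustrated with a minimal sketch in base R. The variable names are illustrative only: c() coerces mixed values to one common type, while list() lets each element keep its own type.

```r
# A vector is homogeneous: mixed inputs are coerced to one common type.
mixed_vector <- c(1, "a", TRUE)
print(class(mixed_vector))        # "character"

# A list is heterogeneous: each element keeps its own type.
mixed_list <- list(1, "a", TRUE)
print(sapply(mixed_list, class))  # "numeric" "character" "logical"
```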

R provides many built-in data structures. Each is used to handle data in different ways:

• Vectors
• Lists
• Arrays
• Matrices
• Data Frames
• Factors

1. Vectors
A vector is an ordered collection of values of a basic data type, with a given length. The key point is that all
the elements of a vector must be of the same data type, i.e. vectors are homogeneous data structures. Vectors are
one-dimensional data structures.

R vectors play the role that one-dimensional arrays play in other languages: they hold multiple data values of the
same type. One major point is that in the R programming language the indexing of a vector starts from 1
and not from 0. We can create numeric, character and logical vectors.

Creating a vector in R

A vector is a basic data structure that represents a one-dimensional array. To create a vector we use
the c() function, which is the most common method in the R programming language. We can also
use the seq() function or the colon operator ":".
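
As a small sketch of the alternatives to c() mentioned above (the variable names are illustrative), the colon operator builds a run of consecutive integers, while seq() accepts either a step size or a target length:

```r
by_colon  <- 3:7                             # consecutive integers 3..7
by_step   <- seq(from = 1, to = 10, by = 3)  # fixed step size
by_length <- seq(0, 1, length.out = 5)       # fixed number of points

print(by_colon)   # 3 4 5 6 7
print(by_step)    # 1 4 7 10
print(by_length)  # 0.00 0.25 0.50 0.75 1.00
```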

Here we create three vectors for numerical, character, and logical types using the function c(), and an integer sequence using the colon operator.

numeric_vector = c(10, 20, 30)
character_vector = c("apple", "banana", "cherry")
logical_vector = c(TRUE, FALSE, TRUE)
numbers <- 1:10

print(numeric_vector)
print(character_vector)
print(logical_vector)
print(numbers)

Output:
[1] 10 20 30
[1] "apple" "banana" "cherry"
[1] TRUE FALSE TRUE
[1] 1 2 3 4 5 6 7 8 9 10

You can access the elements of the vector using square brackets.

# Access the first element
print(numeric_vector[1])
# Access multiple elements
print(character_vector[c(1, 3)])

Output:
[1] 10
[1] "apple" "cherry"

You can also perform mathematical and logical operations with vectors.

# Adding a scalar value to the vector
print(numeric_vector + 2)
# Multiplying elements by a scalar
print(numeric_vector * 10)
# Logical operation: check which elements are greater than 15
print(numeric_vector > 15)

Output:
[1] 12 22 32
[1] 100 200 300
[1] FALSE TRUE TRUE

Other important operations include summing, finding the mean, and finding the minimum and maximum
values.

# Summation
print(sum(numeric_vector))
# Mean
print(mean(numeric_vector))
# Max and min
print(max(numeric_vector))
print(min(numeric_vector))

Output:
[1] 60
[1] 20
[1] 30
[1] 10

Modification of a Vector

X <- c(2, 7, 9, 7, 8, 2)
X[3] <- 1
X[2] <- 9
print(X)

Output:
[1] 2 9 1 7 8 2

Length of an R Vector

print(length(numeric_vector))

Output:
[1] 3

Sorting Elements of an R Vector

X <- c(8, 2, 7, 1, 11, 2)
A <- sort(X)
cat('Ascending order', A, '\n')
B <- sort(X, decreasing = TRUE)
cat('Descending order', B)

Output:
Ascending order 1 2 2 7 8 11
Descending order 11 8 7 2 2 1

2. Lists

A list in R programming is a generic object consisting of an ordered collection of objects. Lists
are one-dimensional, heterogeneous data structures. A list can hold vectors, matrices, character
strings, functions, and so on. A list in R is created with the list() function.
R allows accessing elements of a list by index value. In R, the indexing of a list
starts with 1 instead of 0.
Creating a List

empId = c(1, 2, 3, 4)
empName = c("Debi", "Sandeep", "Subham", "Shiba")
empList = list(empId, empName)
print(empList)

Output:
[[1]]
[1] 1 2 3 4

[[2]]
[1] "Debi" "Sandeep" "Subham" "Shiba"

Access components by indices

cat("Accessing name components using indices\n")
print(empList[[2]])
cat("Accessing Sandeep from name using indices\n")
print(empList[[2]][2])
cat("Accessing 4 from ID using indices\n")
print(empList[[1]][4])

Output:
Accessing name components using indices
[1] "Debi" "Sandeep" "Subham" "Shiba"
Accessing Sandeep from name using indices
[1] "Sandeep"
Accessing 4 from ID using indices
[1] 4

Modifying/Adding Components of a List

empList[[1]][5] = 5
empList[[2]][5] = "Kamala"
cat("After modified the list\n")
print(empList)

Output:
After modified the list
[[1]]
[1] 1 2 3 4 5

[[2]]
[1] "Debi" "Sandeep" "Subham" "Shiba" "Kamala"

Adding Item to List

# my_numbers is created with c(), so it is a vector; append() works the same way on lists.
# append() returns a new object: my_numbers itself is unchanged unless the result is assigned back.
my_numbers = c(1,5,6,3)
append(my_numbers, 45)
my_numbers

Output:
[1] 1 5 6 3 45
[1] 1 5 6 3

Deleting Components of a List

# A negative index drops the element at that position.
print(my_numbers[-2])

Output:
[1] 1 6 3

Check if Item Exists

thislist <- list("apple", "banana", "cherry")
"apple" %in% thislist

Output:
[1] TRUE

Loop Through a List

thislist <- list("apple", "banana", "cherry")
for (x in thislist) {
  print(x)
}

Output:
[1] "apple"
[1] "banana"
[1] "cherry"
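
The list examples above use positional indices only. As a hedged sketch (the list `emp` and its component names are made up for illustration), the following shows named components, the difference between single and double brackets, and how to keep the result of append():

```r
emp <- list(id = c(1, 2, 3), name = c("Debi", "Sandeep", "Subham"))

sub_list  <- emp["name"]    # single brackets: a list of length 1
names_vec <- emp[["name"]]  # double brackets: the character vector itself
same_vec  <- emp$name       # $ is shorthand for [[ ]] with a name

print(class(sub_list))      # "list"
print(class(names_vec))     # "character"

# Add and remove components by name.
emp$dept <- c("IT", "HR", "IT")
emp$id <- NULL              # assigning NULL removes a component
print(names(emp))           # "name" "dept"

# append() returns a new object; assign it back to keep the change.
my_numbers <- c(1, 5, 6, 3)
my_numbers <- append(my_numbers, 45)
print(my_numbers)           # 1 5 6 3 45
```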

3. Arrays

Arrays are R data objects that can store data in more than two dimensions. Arrays are n-dimensional
data structures. For example, if we create an array of dimensions (2, 3, 3) then it
creates 3 rectangular matrices each with 2 rows and 3 columns. They are homogeneous data
structures.

Arrays are important data storage structures defined by a fixed number of dimensions, with their
elements stored at contiguous memory locations.
In the R programming language, one-dimensional arrays behave like vectors, with the length being their
only dimension. Two-dimensional arrays are called matrices, consisting of fixed numbers of rows
and columns. All elements of an R array have the same data type. A vector is supplied as input
to the array() function, which then creates an array based on the given dimensions.
Creating an Array

arr <- c(1, 2, 3, 4, 5, 6, 7, 8, 9)
print(arr)
cat("Length of array : ", length(arr))

Output:
[1] 1 2 3 4 5 6 7 8 9
Length of array : 9

arr = array(2:10, dim = c(3, 3))
print(arr)

Output:
     [,1] [,2] [,3]
[1,]    2    5    8
[2,]    3    6    9
[3,]    4    7   10

Naming of Arrays

row_names <- c("row1", "row2", "row3")
col_names <- c("col1", "col2", "col3")
arr = array(2:10, dim = c(3, 3),
            dimnames = list(row_names, col_names))
print(arr)

Output:
     col1 col2 col3
row1    2    5    8
row2    3    6    9
row3    4    7   10

Accessing Multi-Dimensional Array

cat("\nAccessing element at row 3, column 1:\n")
print(arr[3, 1])
cat("\nAccessing element at row 2, column 3:\n")
print(arr[2, 3])

Output:
Accessing element at row 3, column 1:
[1] 4
Accessing element at row 2, column 3:
[1] 9
Adding elements to array

x <- c(1, 2, 3, 4, 5)

# Add an element with c()
x <- c(x, 6)
print("Array after 1st modification")
print(x)

# Add an element with append()
x <- append(x, 7)
print("Array after 2nd modification")
print(x)

# Add an element by assigning to the position after the last one
len <- length(x)
x[len + 1] <- 8
print("Array after 3rd modification")
print(x)

# Assigning beyond the end fills the skipped position with NA
x[len + 3] <- 9
print("Array after 4th modification")
print(x)

# Append several elements at once
print("Array after 5th modification")
x <- append(x, c(10, 11, 12), after = length(x) + 3)
print(x)

Output:
[1] "Array after 1st modification"
[1] 1 2 3 4 5 6
[1] "Array after 2nd modification"
[1] 1 2 3 4 5 6 7
[1] "Array after 3rd modification"
[1] 1 2 3 4 5 6 7 8
[1] "Array after 4th modification"
[1] 1 2 3 4 5 6 7 8 NA 9
[1] "Array after 5th modification"
[1] 1 2 3 4 5 6 7 8 NA 9 10 11 12

Removing Elements from Array

m <- c(1, 2, 3, 4, 5, 6, 7, 8, 9)
print("Original Array")
print(m)
# Drop the sixth element
print(m[-6])

Output:
[1] "Original Array"
[1] 1 2 3 4 5 6 7 8 9
[1] 1 2 3 4 5 7 8 9
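
The examples in this section are one- and two-dimensional. The (2, 3, 3) array described at the start of the section can be sketched as follows, assuming the values 1 to 18 as illustrative contents; the third index selects one of the three 2 x 3 matrices.

```r
# Three 2 x 3 matrices stacked along the third dimension.
arr3d <- array(1:18, dim = c(2, 3, 3))

print(dim(arr3d))      # 2 3 3
print(arr3d[, , 2])    # the second 2 x 3 matrix (values 7 to 12)
print(arr3d[2, 3, 1])  # row 2, column 3 of the first matrix: 6
```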
4. Matrices

A matrix is a rectangular, two-dimensional arrangement of data in rows and columns. Rows are the
ones that run horizontally and columns are the ones that run vertically. In R programming, matrices
are two-dimensional, homogeneous data structures. These are some examples of matrices:
Creating matrix and Printing with row and column name

mat1 <- matrix(c(1,2,3,4,5,6), nrow = 3, ncol = 2)
rownames(mat1) = c("R1", "R2", "R3")
colnames(mat1) = c("C1", "C2")
print(mat1)

Output:
   C1 C2
R1  1  4
R2  2  5
R3  3  6

Adding Row in existing matrix

mat1 = rbind(mat1, c(11, 12))
print(mat1)

Output:
   C1 C2
R1  1  4
R2  2  5
R3  3  6
   11 12

Adding column

mat1 = cbind(mat1, c(21, 22, 23, 24))
print(mat1)

Output:
   C1 C2
R1  1  4 21
R2  2  5 22
R3  3  6 23
   11 12 24

Deleting row and column

# Drop the fourth row and the third column
mat1 = mat1[-c(4), -c(3)]
print(mat1)

Output:
   C1 C2
R1  1  4
R2  2  5
R3  3  6

Assigning values by Row wise

A = matrix(c(1, 2, 3, 4, 5, 6, 7, 8, 9), nrow = 3, ncol = 3, byrow = TRUE)
cat("The 3x3 matrix:\n")
print(A)

Output:
The 3x3 matrix:
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6
[3,]    7    8    9
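
To make the effect of byrow explicit, here is a minimal sketch (names are illustrative) that fills the same values both ways. By default matrix() fills column by column; byrow = TRUE fills row by row.

```r
by_col <- matrix(1:6, nrow = 2)                # default: filled column by column
by_row <- matrix(1:6, nrow = 2, byrow = TRUE)  # filled row by row

print(by_col[1, ])   # 1 3 5
print(by_row[1, ])   # 1 2 3
print(dim(by_row))   # 2 3
print(t(by_row))     # transpose: a 3 x 2 matrix
```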

5. Data Frames
Data frames are generic data objects of R which are used to store tabular data. Data frames are among the
most popular data objects in R programming because tabular data is familiar and easy to read.
They are two-dimensional, heterogeneous data structures. Internally, a data frame is a list of vectors of equal
length.
Data frames have the following constraints placed upon them:
• A data frame must have column names, and every row should have a unique name.
• Each column must have the same number of items.
• Each item in a single column must be of the same data type.
• Different columns may have different data types.

A data frame can also be interpreted as a matrix in which each column may be of a different data
type. An R data frame is made up of three principal components: the data, the rows, and the columns.
Data_Frame = data.frame(
  Training = c("Strength", "Stamina", "Other"),
  Pulse = c(100, 150, 120),
  Duration = c(60, 30, 45)
)
print(Data_Frame)

Output:
  Training Pulse Duration
1 Strength   100       60
2  Stamina   150       30
3    Other   120       45

# Note: the summary and "Levels" output below assume that Training is stored as a factor.
# That was the default before R 4.0.0; in newer versions pass stringsAsFactors = TRUE to data.frame().
summary(Data_Frame)

Output:
     Training     Pulse          Duration
 Other   :1   Min.   :100.0   Min.   :30.0
 Stamina :1   1st Qu.:110.0   1st Qu.:37.5
 Strength:1   Median :120.0   Median :45.0
              Mean   :123.3   Mean   :45.0
              3rd Qu.:135.0   3rd Qu.:52.5
              Max.   :150.0   Max.   :60.0

# Single brackets return a data frame with one column
print(Data_Frame[1])

Output:
  Training
1 Strength
2  Stamina
3    Other

# Double brackets return the column itself
print(Data_Frame[["Training"]])

Output:
[1] Strength Stamina Other
Levels: Other Stamina Strength

# Add a row
New_row_DF = rbind(Data_Frame, c("Strength", 110, 110))
print(New_row_DF)

Output:
  Training Pulse Duration
1 Strength   100       60
2  Stamina   150       30
3    Other   120       45
4 Strength   110      110

# Add a column
New_col_DF = cbind(New_row_DF, Steps = c(1000, 6000, 2000, 3000))
print(New_col_DF)

Output:
  Training Pulse Duration Steps
1 Strength   100       60  1000
2  Stamina   150       30  6000
3    Other   120       45  2000
4 Strength   110      110  3000

# Remove the first row and the first column
Data_Frame_New = New_col_DF[-c(1), -c(1)]
# Print the new data frame
print(Data_Frame_New)

Output:
  Pulse Duration Steps
2   150       30  6000
3   120       45  2000
4   110      110  3000

# length() of a data frame is its number of columns
print(length(Data_Frame))

Output:
[1] 3

# Combine two data frames row-wise
New_Data_Frame = rbind(Data_Frame, New_row_DF)
New_Data_Frame

Output:
  Training Pulse Duration
1 Strength   100       60
2  Stamina   150       30
3    Other   120       45
4 Strength   100       60
5  Stamina   150       30
6    Other   120       45
7 Strength   110      110
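
One detail of the row-adding step is worth a hedged sketch. A row passed to rbind() as c("Strength", 110, 110) is a character vector, so the numeric columns of the result become character columns. Binding a one-row data frame instead keeps each column's type. The names `df`, `coerced`, `new_row` and `typed` are illustrative.

```r
df <- data.frame(
  Training = c("Strength", "Stamina", "Other"),
  Pulse = c(100, 150, 120),
  Duration = c(60, 30, 45)
)

# A character vector as the new row coerces Pulse and Duration to character.
coerced <- rbind(df, c("Strength", 110, 110))
print(sapply(coerced, class))

# A one-row data frame as the new row keeps the numeric columns numeric.
new_row <- data.frame(Training = "Strength", Pulse = 110, Duration = 110)
typed <- rbind(df, new_row)
print(sapply(typed, class))
print(mean(typed$Pulse))   # 120
```

Keeping the columns numeric matters as soon as functions such as mean() or max() are applied to them.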

6. Factors
Factors are data objects used to categorize data and store it as levels. They are useful for
storing categorical data, and they can be built from both strings and integers. They suit columns with a
limited set of unique values, such as ("TRUE" or "FALSE") or ("MALE" or "FEMALE"), and they are useful in data
analysis for statistical modeling.
# Create a factor
music_genre <- factor(c("Jazz", "Rock", "Classic", "Classic", "Pop", "Jazz", "Rock", "Jazz"))
# Print the factor
music_genre

Output:
[1] Jazz Rock Classic Classic Pop Jazz Rock Jazz
Levels: Classic Jazz Pop Rock

The factor has four levels (categories): Classic, Jazz, Pop and Rock.

music_genre <- factor(c("Jazz", "Rock", "Classic", "Classic", "Pop", "Jazz", "Rock", "Jazz"))
levels(music_genre)

Output:
[1] "Classic" "Jazz" "Pop" "Rock"

Change Item Value

music_genre <- factor(c("Jazz", "Rock", "Classic", "Classic", "Pop", "Jazz", "Rock", "Jazz"))
music_genre[3] <- "Pop"
music_genre[3]

Output:
[1] Pop
Levels: Classic Jazz Pop Rock
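
As a brief sketch of what "stored as levels" means in practice: a factor keeps an integer code for each value, with the codes pointing into the sorted set of levels. The factor `sizes` and its level order are an illustrative assumption, not part of the original notes.

```r
music_genre <- factor(c("Jazz", "Rock", "Classic", "Classic", "Pop", "Jazz", "Rock", "Jazz"))

print(table(music_genre))        # counts per level: Classic 2, Jazz 3, Pop 1, Rock 2
print(as.integer(music_genre))   # underlying codes: 2 4 1 1 3 2 4 2
print(nlevels(music_genre))      # 4

# Levels are sorted alphabetically by default; an explicit order can be given.
sizes <- factor(c("low", "high", "medium", "low"),
                levels = c("low", "medium", "high"))
print(levels(sizes))             # "low" "medium" "high"
```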

Built-in Functions

Character functions

1. tolower(x)
   Description: Converts the string to lower case.
   Example:
   string <- "Learn eTutorials"
   print(tolower(string))
   Output:
   [1] "learn etutorials"

2. toupper(x)
   Description: Converts the string to upper case.
   Example:
   string <- "Learn eTutorials"
   print(toupper(string))
   Output:
   [1] "LEARN ETUTORIALS"

3. strsplit(x, split)
   Description: Splits the elements of character vector x at the split point.
   Example:
   string <- "Learn eTutorials"
   print(strsplit(string, ""))
   Output:
   [[1]]
   [1] "L" "e" "a" "r" "n" " " "e" "T" "u" "t" "o" "r" "i" "a" "l" "s"

4. paste(..., sep="")
   Description: Concatenates strings, using the sep string to separate them.
   Example:
   paste("Str1", 1:3, sep="")
   paste("a", 1:3, sep="M")
   paste("Today is", date())
   Output:
   [1] "Str11" "Str12" "Str13"
   [1] "aM1" "aM2" "aM3"
   [1] "Today is Sun Feb 27 06:26:31 2022"

5. sub(pattern, replacement, x, ignore.case=FALSE, fixed=FALSE)
   Description: Finds pattern in x and replaces the first match with the replacement text.
   If fixed=FALSE then pattern is a regular expression. If fixed=TRUE then pattern is a text string.
   Example:
   string <- "You are learning GOlang in Learn eTutorials"
   sub("GOlang", "R", string)
   Output:
   [1] "You are learning R in Learn eTutorials"

6. grep(pattern, x, ignore.case=FALSE, fixed=FALSE)
   Description: Searches for pattern in x and returns the indices of the matching elements.
   Example:
   string <- c('R', 'GO', 'GOlang')
   pattern <- '^GO'
   print(grep(pattern, string))
   Output:
   [1] 2 3

7. substr(x, start=n1, stop=n2)
   Description: Extracts substrings from a character vector.
   Example:
   string <- "Learn eTutorials"
   substr(string, 1, 5)
   substr(string, 4, 10)
   a <- "123456789"
   substr(a, 5, 3)
   Output:
   [1] "Learn"
   [1] "rn eTut"
   [1] ""
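
Two behaviours from the table above are easy to miss, so here is a hedged sketch with made-up strings. sub() replaces only the first match (gsub() replaces all of them), and fixed = TRUE treats the pattern as literal text rather than as a regular expression.

```r
# sub() replaces the first match only; gsub() replaces every match.
first_only <- sub("a", "A", "banana")
all_hits   <- gsub("a", "A", "banana")
print(first_only)   # "bAnana"
print(all_hits)     # "bAnAnA"

# "." is a regular-expression wildcard unless fixed = TRUE is given.
as_regex   <- sub(".", "-", "a.b")
as_literal <- sub(".", "-", "a.b", fixed = TRUE)
print(as_regex)     # "-.b"  (the wildcard matched the first character)
print(as_literal)   # "a-b"  (the literal dot was replaced)
```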
Statistical functions

1. mean(x, trim=0, na.rm=FALSE)
   Description: Calculates the average (mean) of a set of numbers in object x.
   Example:
   x = c(2,3,4,5)
   mean(x, trim=0, na.rm=FALSE)
   Output:
   [1] 3.5

2. sd(x)
   Description: Returns the standard deviation of an object.
   Example:
   x = c(2,3,4,5)
   print(sd(x))
   Output:
   [1] 1.290994

3. median(x)
   Description: Returns the median.
   Example:
   x = c(2,3,4,5)
   print(median(x))
   Output:
   [1] 3.5

4. range(x)
   Description: Returns the range (minimum and maximum).
   Example:
   x = c(2,3,4,5)
   print(range(x))
   Output:
   [1] 2 5

5. sum(x)
   Description: Returns the sum.
   Example:
   x = c(2,3,4,5)
   print(sum(x))
   Output:
   [1] 14

6. diff(x, lag=1)
   Description: Returns differences between elements, with lag indicating which lag to use.
   Example:
   x = c(2,3,4,5)
   print(diff(x, lag=1))
   print(diff(x, lag=2))
   x = c(5,10,15,20,25,30)
   print(diff(x, lag=2))
   Output:
   [1] 1 1 1
   [1] 2 2
   [1] 10 10 10 10

7. min(x)
   Description: Returns the minimum value of an object.
   Example:
   x = c(5,10,15,20,25,30)
   print(min(x))
   Output:
   [1] 5

8. max(x)
   Description: Returns the maximum value of an object.
   Example:
   x = c(5,10,15,20,25,30)
   print(max(x))
   Output:
   [1] 30

9. scale(x, center=TRUE, scale=TRUE)
   Description: Centers and/or standardizes the columns of a matrix.
   Example:
   x = matrix(1:15, nrow=3, ncol=5, byrow=TRUE)
   print(x)
   Output:
        [,1] [,2] [,3] [,4] [,5]
   [1,]    1    2    3    4    5
   [2,]    6    7    8    9   10
   [3,]   11   12   13   14   15
   print(scale(x, center=TRUE, scale=TRUE))
   Output:
        [,1] [,2] [,3] [,4] [,5]
   [1,]   -1   -1   -1   -1   -1
   [2,]    0    0    0    0    0
   [3,]    1    1    1    1    1
   attr(,"scaled:center")
   [1]  6  7  8  9 10
   attr(,"scaled:scale")
   [1] 5 5 5 5 5
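
The trim and na.rm arguments of mean() appear in the table but are not exercised there. A minimal sketch with made-up data: na.rm = TRUE drops missing values before computing, and trim removes a fraction of observations from each end of the sorted data.

```r
with_na <- c(2, 3, 4, 5, NA)
print(mean(with_na))                 # NA: a missing value propagates
print(mean(with_na, na.rm = TRUE))   # 3.5
print(sd(with_na, na.rm = TRUE))     # about 1.290994

# trim = 0.2 drops 20% of the values from each end (here: 1 and 100).
with_outlier <- c(1, 2, 3, 4, 100)
print(mean(with_outlier))               # 22
print(mean(with_outlier, trim = 0.2))   # 3
```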
Math functions

1. abs(x)
   Description: Returns the absolute value of input x.
   Example:
   x <- -2
   print(abs(x))
   Output:
   [1] 2

2. sqrt(x)
   Description: Returns the square root of input x.
   Example:
   x <- 2
   print(sqrt(x))
   Output:
   [1] 1.414214

3. ceiling(x)
   Description: Returns the smallest integer which is larger than or equal to x.
   Example:
   x <- 2.8
   print(ceiling(x))
   Output:
   [1] 3

4. floor(x)
   Description: Returns the largest integer which is smaller than or equal to x.
   Example:
   x <- 2.8
   print(floor(x))
   Output:
   [1] 2

5. trunc(x)
   Description: Returns the truncated value of input x (the fractional part is dropped).
   Example:
   x <- c(2.2, 6.56, 10.11)
   print(trunc(x))
   Output:
   [1] 2 6 10

6. round(x, digits=n)
   Description: Returns input x rounded to n decimal places.
   Example:
   x = 2.456
   print(round(x, digits=2))
   x = 2.4568
   print(round(x, digits=3))
   Output:
   [1] 2.46
   [1] 2.457

7. cos(x), sin(x), tan(x)
   Description: Return the cosine, sine and tangent of input x (x in radians).
   Example:
   x <- 2
   print(cos(x))
   print(sin(x))
   print(tan(x))
   Output:
   [1] -0.4161468
   [1] 0.9092974
   [1] -2.18504

8. log(x)
   Description: Returns the natural logarithm of input x.
   Example:
   x <- 2
   print(log(x))
   Output:
   [1] 0.6931472

9. log10(x)
   Description: Returns the common (base 10) logarithm of input x.
   Example:
   x <- 2
   print(log10(x))
   Output:
   [1] 0.30103

10. exp(x)
   Description: Returns the exponential of x, i.e. e raised to the power x.
   Example:
   x <- 2
   print(exp(x))
   Output:
   [1] 7.389056
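
The rounding functions in the table differ most visibly on negative numbers. A short sketch with an illustrative value, plus log() with an explicit base (the base argument is part of base R's log()):

```r
x <- -2.8
print(floor(x))     # -3: largest integer not greater than x
print(ceiling(x))   # -2: smallest integer not less than x
print(trunc(x))     # -2: fractional part dropped, toward zero
print(round(x))     # -3: nearest integer

# log() takes an optional base; exp() is the inverse of the natural log.
print(log(8, base = 2))   # 3
print(exp(log(2)))        # 2
```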
User-defined Functions
A function is a block of code which only runs when it is called. You can pass data, known as parameters, into
a function. A function can return data as a result.

We can distinguish four main elements:

Function name. To create a user-defined function (UDF), you first assign it a name and save it as a new object. You then
call the name whenever you want to use the function.
Arguments. The function arguments (also known as parameters) are provided within the parentheses.
Arguments tell the function what data to take as input and/or how to modify the behavior of
the function.
Function body. Within curly brackets comes the body of the function, that is, the instructions to solve a
specific task based on the information provided by the arguments.
Return statement. The return statement hands the result or results of the operations in the function body back to the
caller, so that they can be saved as variables. If return() is omitted, R returns the value of the last evaluated expression.

Parameters or Arguments?
The terms "parameter" and "argument" are often used for the same thing: information that is passed into a
function.
From a function's perspective:
A parameter is the variable listed inside the parentheses in the function definition.
An argument is the value that is sent to the function when it is called.

Creating a Function in R Programming


Functions are created in R by using the keyword function. The general structure of a function definition is as
follows:

add_num <- function(a, b)
{
  sum_result <- a + b
  return(sum_result)
}
# The result is stored in "total" so that the built-in sum() is not masked.
total = add_num(35, 34)
print(total)

Output:
[1] 69

Calling a Function in R
After creating a function, we have to call it to use it. Calling a function in R is done by writing
its name and passing values for its parameters, if any.
Passing Arguments to Functions in R Programming Language
There are several ways we can pass the arguments to the function:
• Case 1: Generally in R, the arguments are passed to the function in the same order as in the function
definition.
• Case 2: If we do not want to follow any order, we can pass the arguments using the
names of the arguments in any order.
• Case 3: If the arguments are not passed, the default values are used to execute the function.
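
The three cases can be sketched with one small function. The function `power` and its default value are hypothetical, chosen only for illustration.

```r
# "exponent" has a default value, so it may be omitted in a call.
power <- function(base, exponent = 2) {
  base^exponent
}

print(power(3, 4))                    # Case 1: positional order -> 81
print(power(exponent = 3, base = 2))  # Case 2: named, any order -> 8
print(power(5))                       # Case 3: default exponent -> 25
```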
R Function Examples

1. Single Input Single Output

areaOfCircle = function(radius){
  area = pi*radius^2
  return(area)
}

print(areaOfCircle(2))

Output:
[1] 12.56637

2. Multiple Input Multiple Output

Rectangle = function(length, width){
  area = length * width
  perimeter = 2 * (length + width)

  # Return several values together as a named list
  result = list("Area" = area, "Perimeter" = perimeter)
  return(result)
}

resultList = Rectangle(2, 3)
print(resultList["Area"])
print(resultList["Perimeter"])

Output:
$Area
[1] 6

$Perimeter
[1] 10

3. Inline Functions in R Programming Language

f = function(x) x^2*4+x/3

print(f(4))
print(f(-2))
print(f(0))

Output:
[1] 65.33333
[1] 15.33333
[1] 0

Recursive Function in R

Recursion, in the simplest terms, is a type of looping technique that exploits the way function calls work
in R.
Recursion is when a function calls itself. Each call triggers another call, which acts as a loop, so a
recursive function can perform iterative tasks without an explicit loop construct. These functions need a
stopping condition (a base case) so that they do not call themselves indefinitely; note that every pending
call occupies space on the call stack. Recursive functions break the problem down into smaller components.
The function calls itself on each of the smaller components, and the results are then put together to solve
the original problem.
Example: Factorial using Recursion in R
rec_fac <- function(x){
  # Stopping condition: 0! and 1! are both 1
  if(x==0 || x==1)
  {
    return(1)
  }
  else
  {
    return(x*rec_fac(x-1))
  }
}

rec_fac(5)
Output:
[1] 120
Here, rec_fac(5) calls rec_fac(4), which then calls rec_fac(3), and so on until the input argument x, has
reached 1. The function returns 1 and is destroyed. The return value is multiplied by the argument value and
returned. This process continues until the first function call returns its output, giving us the final result.
Example: Sum of Series Using Recursion

Recursion in R can be used to find the sum of a series whose terms follow a repeating pattern. In this example,
we will find the sum of squares of a given series of numbers: Sum = 1^2 + 2^2 + … + N^2

sum_series <- function(vec){
  # Stopping condition: at most one element left
  if(length(vec)<=1)
  {
    return(vec^2)
  }
  else
  {
    # Square the first element and recurse on the rest
    return(vec[1]^2+sum_series(vec[-1]))
  }
}
series <- c(1:10)
sum_series(series)
Output:
[1] 385
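
For comparison, both recursive examples can also be written without recursion. This is a sketch rather than a recommendation; `iter_fac` is a hypothetical name, while factorial() and sum() are base R functions.

```r
# Factorial with an explicit loop instead of recursion.
iter_fac <- function(x) {
  result <- 1
  for (i in seq_len(x)) {
    result <- result * i
  }
  result
}

print(iter_fac(5))       # 120
print(factorial(5))      # 120, using the built-in function

# Sum of squares using vectorized arithmetic.
print(sum((1:10)^2))     # 385
```

The loop and vectorized forms avoid building up a chain of pending calls, which matters for long inputs.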

Reading Files in R Programming

So far, the operations in our R programs have been performed at a prompt/terminal, and their results are not stored
anywhere. But in the software industry, most programs are written to store the information they
produce. One such way is to store the information in a file. The two most common operations
that can be performed on a file are:

Importing/Reading Files in R
Exporting/Writing Files in R

Reading Files in R Programming Language

When a program terminates, all of its data is lost. Storing data in a file preserves it even after the
program terminates. If we have to enter a large amount of data, it will take a lot of time to enter it all.
However, if we have a file containing all the data, we can easily access the contents of the file using a few
commands in R. You can also easily move your data from one computer to another without any changes.
Files can be stored in various formats: a .txt (tab-separated value) file, a
tabular .csv (comma-separated value) file, or a file hosted on the internet or in the cloud. R provides
easy methods to read those files.

To know the current working directory: getwd()
To create a new directory: dir.create("mydir")
To set the working directory: setwd("path"), e.g. setwd("d:/mydir")
To create
R base functions for importing data
• read.csv(): for reading "comma separated value" files (".csv").
• read.csv2(): variant used in countries that use a comma "," as decimal point and a semicolon ";" as field
separator.
• read.delim(): for reading "tab-separated value" files (".txt"). By default, point (".") is used as the decimal
point.
• read.delim2(): for reading "tab-separated value" files (".txt"). By default, comma (",") is used as the
decimal point.
The simplified format of these functions is as follows:
# Read tabular data into R
read.table(file, header = FALSE, sep = "", dec = ".")
# Read "comma separated value" files (".csv")
read.csv(file, header = TRUE, sep = ",", dec = ".", ...)
# Or use read.csv2: variant used in countries that
# use a comma as decimal point and a semicolon as field separator.
read.csv2(file, header = TRUE, sep = ";", dec = ",", ...)
# Read TAB delimited files
read.delim(file, header = TRUE, sep = "\t", dec = ".", ...)
read.delim2(file, header = TRUE, sep = "\t", dec = ",", ...)
file: the path to the file containing the data to be imported into R.
sep: the field separator character. “\t” is used for tab-delimited file.
header: logical value. If TRUE, read.table() assumes that your file has a header row, so row 1 is the
name of each column. If that’s not the case, you can add the argument header = FALSE.
dec: the character used in the file for decimal points.
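
The effect of sep, dec and header can be tried without any existing data file. The sketch below writes two small temporary files (the contents and variable names are made up) and reads them back with the functions described above.

```r
# A semicolon-separated file with decimal commas, as read.csv2() expects.
csv_file <- tempfile(fileext = ".csv")
writeLines(c("name;score", "Amiya;7,5", "Niru;8,25"), csv_file)
semi <- read.csv2(csv_file)
print(semi)
print(class(semi$score))   # "numeric": "7,5" was read as 7.5

# A tab-separated file with decimal points, as read.delim() expects.
tsv_file <- tempfile(fileext = ".txt")
writeLines(c("name\tscore", "Amiya\t7.5", "Niru\t8.25"), tsv_file)
tab <- read.delim(tsv_file)
print(tab)

# With header = FALSE the first line is treated as data, not as column names.
no_header <- read.table(tsv_file, header = FALSE, sep = "\t")
print(nrow(no_header))     # 3
```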
Reading a local file
To import a local .txt or a .csv file, the syntax would be:
# Read a txt file, named "mtcars.txt"
my_data <- read.delim("mtcars.txt")
# Read a csv file, named "mtcars.csv"
my_data <- read.csv("mtcars.csv")

Create a stud.csv file as follows


Name,Age,Qualification,Address
Amiya,18,MCA,BBS
Niru,23,Msc,BLS
Debi,23,BCA,SBP
Biku,56,ISC,JJP

# Using read.csv()
myData = read.csv("stud.csv")
print(myData)

Output:

Name Age Qualification Address


1 Amiya 18 MCA BBS
2 Niru 23 Msc BLS
3 Debi 23 BCA SBP
4 Biku 56 ISC JJP
read.csv2(): the variant of read.csv() used in countries that use a comma "," as decimal point and a
semicolon ";" as field separator.
Syntax: read.csv2(file, header = TRUE, sep = ";", dec = ",", ...)
Parameters:
file: the path to the file containing the data to be imported into R.
header: logical value. If TRUE, read.csv2() assumes that your file has a header row, so row 1 is the name of
each column. If that’s not the case, you can add the argument header = FALSE.
sep: the field separator character
dec: the character used in the file for decimal points.

Analyzing the CSV File


By default the read.csv() function returns a data frame. This can be easily checked as follows.
We can also check the number of columns and rows.

data <- read.csv("input.csv")
print(is.data.frame(data))
print(ncol(data))
print(nrow(data))

Output:
[1] TRUE
[1] 5
[1] 8

# Get the maximum salary.
sal <- max(data$salary)
print(sal)

Output:
[1] 843.25

# Get the person detail having max salary.
retval <- subset(data, salary == max(salary))
print(retval)

Output:
  id name salary start_date    dept
5 NA Gary 843.25 2015-03-27 Finance

# Get all the people working in the IT department.
retval <- subset(data, dept == "IT")
print(retval)

Output:
  id     name salary start_date dept
1  1     Rick  623.3 2012-01-01   IT
3  3 Michelle  611.0 2014-11-15   IT
6  6     Nina  578.0 2013-05-21   IT

Writing into a CSV File

data <- read.csv("input.csv")
retval <- subset(data, as.Date(start_date) > as.Date("2014-01-01"))

# Write filtered data into a new file.
# row.names = FALSE keeps the row numbers out of the file, so that reading it
# back gives the columns shown below without an extra leading column.
write.csv(retval, "output.csv", row.names = FALSE)
newdata <- read.csv("output.csv")
print(newdata)

Output:
  id     name salary start_date    dept
1  3 Michelle 611.00 2014-11-15      IT
2  4     Ryan 729.00 2014-05-11      HR
3 NA     Gary 843.25 2015-03-27 Finance
4  8     Guru 722.50 2014-06-17 Finance
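
One point about write.csv() is worth a small sketch: by default it also writes the row names, and they come back as an extra column (named "X") when the file is read again. The data and temporary file names below are illustrative.

```r
df <- data.frame(id = c(3, 4), name = c("Michelle", "Ryan"))

with_rn    <- tempfile(fileext = ".csv")
without_rn <- tempfile(fileext = ".csv")

write.csv(df, with_rn)                        # default: row names are written
write.csv(df, without_rn, row.names = FALSE)  # row names are left out

cols_default <- names(read.csv(with_rn))
cols_clean   <- names(read.csv(without_rn))
print(cols_default)   # "X" "id" "name"
print(cols_clean)     # "id" "name"
```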
