TY R Programming Lab Book
Society’s
Modern College of Arts, Science & Commerce
(Autonomous)
Ganeshkhind, Pune-16.
Semester - V
Lab Book
Student Name:
College:
Year: Division:
Certificate
Prepared by:
• Bringing uniformity in the way the course is conducted across different colleges
• Set A is used for implementing the basic algorithms or implementing data structure
along with its basic operations. Set A is mandatory.
• Set B is used to demonstrate small variations on the implementations carried out in set
A to improve its applicability. Depending on the time availability the students should
be encouraged to complete set C.
Instructions to the students
Please read the following instructions carefully and follow them.
• Students are expected to carry workbooks during every practical.
• Students should prepare themselves beforehand for the assignment by reading the relevant
material.
• The instructor will specify which problems to solve in the lab during the allotted slot;
students should complete them and get them verified by the instructor. However, students
should spend additional hours in the lab and at home to cover as many problems as
possible from this workbook.
• Explain the assignment and related concepts in around ten minutes using a white
board if required or by demonstrating the software.
• The value should also be entered on the assignment completion page of the
respective Lab Course.
Assignment Completion Sheet
4 Data Preprocessing
5 Data Visualization
6 Classifications/Association Rule
Instructor Head
Assignment 1: Basics Of R programming
Why Use R?
It is a great resource for data analysis, data visualization, data science and machine learning
It provides many statistical techniques (such as statistical tests, classification, clustering and data
reduction)
It is easy to draw graphs in R, like pie charts, histograms, box plot, scatter plot, etc
It works on different platforms (Windows, Mac, Linux)
It is open-source and free
It has a large community support
It has many packages (libraries of functions) that can be used to solve different problems
To Install RStudio
1.) Basic functionality of R, variable, data types in R
If you type 5 + 5, and press enter, you will see that R outputs 10.
Example
5+5
Output:
[1] 10
R Syntax
To output text in R, use single or double quotes:
Example
"Hello World!"
To output numbers, just type the number (without quotes):
Example
5
10
25
To do simple calculations, add numbers together:
Example
5+5
R Print
Unlike many other programming languages, R lets you output a value without using a print function:
Example
"Hello World!"
However, R does have a print() function available if you want to use it. This might be useful if you are
familiar with other programming languages, such as Python, which often uses the print() function to
output code.
Example
print("Hello World!")
R Comments
Comments can be used to explain R code and to make it more readable. They can also be used to prevent
execution when testing alternative code. A comment starts with #; R ignores the rest of that line.
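A minimal sketch of both uses of comments described above. The variable name greeting and its value are made up for illustration.

```r
# This is a comment: R ignores everything after the # sign on a line
"Hello World!"   # a comment can also follow code on the same line

# Commenting out a line prevents it from running while you test alternatives:
# greeting <- "Good evening"
greeting <- "Good morning"
greeting
```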
R Variables
Creating Variables in R
Variables are containers for storing data values.
R does not have a command for declaring a variable. A variable is created the moment you first assign
a value to it. To assign a value to a variable, use the <- sign. To output (or print) the variable value,
just type the variable name:
name <- "John"
age <- 40
name # output "John"
age # output 40
From the example above, name and age are variables, while "John" and 40 are values.
In other programming languages, it is common to use = as an assignment operator. In R, we can use
both = and <- as assignment operators.
However, <- is preferred in most cases because the = operator can be forbidden in some contexts in R.
Print / Output Variables
Compared to many other programming languages, you do not have to use a function to print/output
variables in R. You can just type the name of the variable:
Example
name <- "John Doe"
name # auto-print the value of the name variable
However, R does have a print() function available if you want to use it. This might be useful if you are
familiar with other programming languages, such as Python, which often use a print() function to
output variables.
Example
name <- "John Doe"
print(name) # print the value of the name variable
There are times when you must use the print() function to output values, for example inside
for loops (which you will learn more about in a later chapter):
Example
for (x in 1:5) {
  print(x)
}
Output:
[1] 1
[1] 2
[1] 3
[1] 4
[1] 5
Multiple Variables
R allows you to assign the same value to multiple variables in one line:
Example
# Assign the same value to multiple variables in one line
var1 <- var2 <- var3 <- "Orange"
# Print variable values
var1
var2
var3
Variable Names
A variable can have a short name (like x and y) or a more descriptive name (age, carname,
total_volume).
R has a variety of data types and object classes.
Basic Data Types
Basic data types in R can be divided into the following types:
numeric - (10.5, 55, 787)
integer - (1L, 55L, 100L, where the letter "L" declares this as an integer)
complex - (9 + 3i, where "i" is the imaginary part)
character (a.k.a. string) - ("k", "R is exciting", "FALSE", "11.5")
logical (a.k.a. boolean) - (TRUE or FALSE)
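As a quick check of the types listed above, the class() function reports the type of a value. This is a minimal sketch; the variable names are chosen only for illustration.

```r
# One example value for each basic type, taken from the list above
num  <- 10.5
int  <- 55L
cplx <- 9 + 3i
chr  <- "R is exciting"
lgl  <- TRUE

class(num)   # "numeric"
class(int)   # "integer"
class(cplx)  # "complex"
class(chr)   # "character"
class(lgl)   # "logical"

# Quoted values are character data, even when they look like numbers or logicals
class("11.5")   # "character"
class("FALSE")  # "character"
```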
R Numbers
There are three number types in R:
numeric
integer
complex
Variables of number types are created when you assign a value to them:
Example
x <- 10.5 # numeric
y <- 10L # integer
z <- 1i # complex
Output:
> x
[1] 10.5
> y
[1] 10
> z
[1] 0+1i
Numeric
A numeric data type is the most common type in R, and contains any number with or without a decimal,
like 10.5, 55, 787:
Example
x <- 10.5
y <- 55
# Print values of x and y
x
y
# Print the class name of x and y
class(x)
class(y)
Output:
> x
[1] 10.5
> y
[1] 55
> class(x)
[1] "numeric"
> class(y)
[1] "numeric"
Integer
Integers are numeric data without decimals. Use this type when you are certain that the variable will
never need to hold decimals. To create an integer variable, you must use the letter L after
the integer value:
Example
x <- 1000L
y <- 55L
# Print values of x and y
x
y
# Print the class name of x and y
class(x)
class(y)
Output:
> x
[1] 1000
> y
[1] 55
> class(x)
[1] "integer"
> class(y)
[1] "integer"
Complex
A complex number is written with an "i" as the imaginary part:
Example
x <- 3+5i
y <- 5i
# Print values of x and y
x
y
Output:
> x
[1] 3+5i
> y
[1] 0+5i
# Print the class name of x and y
class(x)
class(y)
Output:
> class(x)
[1] "complex"
> class(y)
[1] "complex"
Type Conversion
You can convert from one type to another with the following functions:
as.numeric()
as.integer()
as.complex()
Example
x <- 1L # integer
y <- 2 # numeric
# convert from integer to numeric:
a <- as.numeric(x)
# convert from numeric to integer:
b <- as.integer(y)
# print values of x and y
x
y
# print the class name of a and b
class(a)
class(b)
Output:
# print values of x and y
> x
[1] 1
> y
[1] 2
# print the class name of a and b
> class(a)
[1] "numeric"
> class(b)
[1] "integer"
OPERATORS
R supports several kinds of binary operators that act on a pair of operands. This section covers
the following types of operators in the R programming language and their usage.
Types of the operator in R language
● Arithmetic Operators
● Logical Operators
● Relational Operators
Arithmetic Operators
Logical Operators:-
Relational Operators:-
Operator | Syntax | Description | Example (a = 10, b = 3) | Output
== | a == b | Checks if two values are equal | 10 == 3 | FALSE
!= | a != b | Checks if two values are not equal | 10 != 3 | TRUE
> | a > b | Checks if left value is greater | 10 > 3 | TRUE
< | a < b | Checks if left value is smaller | 10 < 3 | FALSE
>= | a >= b | Checks if left value is greater than or equal to right value | 10 >= 3 | TRUE
<= | a <= b | Checks if left value is less than or equal to right value | 10 <= 3 | FALSE
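The three operator families listed in this section can be tried out with the same operands used in the relational table (a = 10, b = 3). This is a minimal sketch using standard base R operators, not a reproduction of the workbook's own arithmetic and logical tables.

```r
a <- 10
b <- 3

# Arithmetic operators
a + b     # 13
a - b     # 7
a * b     # 30
a / b     # 3.333333
a %% b    # 1    (remainder)
a %/% b   # 3    (integer division)
a ^ b     # 1000 (exponent)

# Relational operators return TRUE or FALSE
a == b    # FALSE
a >= b    # TRUE

# Logical operators combine logical values
(a > 5) & (b > 5)   # FALSE (AND: both sides must be TRUE)
(a > 5) | (b > 5)   # TRUE  (OR: at least one side is TRUE)
!(a > b)            # FALSE (NOT)
```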
Set A
Set B
Set C
Assignment Evaluation
Assignment 2: List, Vectors, Data Frame
a) create a list.
Lists are R objects that can contain elements of different types, such as numbers, strings, vectors and
even another list. A list can also contain a matrix or a function as its elements. A list is created using
the list() function.
Creating a List
Following is an example that creates a list containing strings, numbers, a vector and a logical value.
# Create a list containing strings, numbers, vectors and logical values
list_data <- list("Red", "Green", c(21,32,11), TRUE, 51.23, 119.1)
print(list_data)
Output:
> print(list_data)
[[1]]
[1] "Red"

[[2]]
[1] "Green"

[[3]]
[1] 21 32 11

[[4]]
[1] TRUE

[[5]]
[1] 51.23

[[6]]
[1] 119.1
b) Implement an R script to access elements in the list.
Giving a name to list elements
There are only three steps to print the list data corresponding to the name:
1. Creating a list.
2. Assign a name to the list elements with the help of the names() function.
3. Print the list data.
Example: 1
# Create a list containing a vector, a matrix and a list.
list_data <- list(c("Jan","Feb","Mar"), matrix(c(3,9,5,1,-2,8), nrow = 2),list("green",12.3))
# Give names to the elements in the list.
names(list_data) <- c("1st Quarter", "A_Matrix", "A Inner list")
# Show the list.
print(list_data)
Accessing List Elements
Elements of the list can be accessed by the index of the element in the list. In case of named lists it
can also be accessed using the names.
Example:2
# Create a list containing a vector, a matrix and a list.
list_data <- list(c("Jan","Feb","Mar"), matrix(c(3,9,5,1,-2,8), nrow = 2), list("green",12.3))
# Give names to the elements in the list.
names(list_data) <- c("1st Quarter", "A_Matrix", "A Inner list")
# Access the first element of the list.
print(list_data[1])
# Access the third element. As it is also a list, all its elements will be printed.
print(list_data[3])
# Access the list element using the name of the element.
print(list_data$A_Matrix)
Output:
> print(list_data[1])
$`1st Quarter`
[1] "Jan" "Feb" "Mar"

# Access the third element. As it is also a list, all its elements will be printed.
> print(list_data[3])
$`A Inner list`
$`A Inner list`[[1]]
[1] "green"

$`A Inner list`[[2]]
[1] 12.3

# Access the list element using the name of the element.
> print(list_data$A_Matrix)
     [,1] [,2] [,3]
[1,]    3    5   -2
[2,]    9    1    8
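The output above shows that single brackets return a smaller list. As a hedged sketch, the example below contrasts single brackets with double brackets, which return the element itself. It reuses the same list_data; the helper names sub and months are made up for illustration.

```r
list_data <- list(c("Jan", "Feb", "Mar"),
                  matrix(c(3, 9, 5, 1, -2, 8), nrow = 2),
                  list("green", 12.3))
names(list_data) <- c("1st Quarter", "A_Matrix", "A Inner list")

# Single brackets return a sub-list that still wraps the element
sub <- list_data[1]
class(sub)       # "list"

# Double brackets return the element itself
months <- list_data[[1]]
class(months)    # "character"
months[2]        # "Feb"

# Names containing spaces need [[ ]] with a string, or backticks after $
list_data[["1st Quarter"]]
list_data$`A Inner list`[[2]]   # 12.3

# Elements of a stored matrix can be indexed directly
list_data$A_Matrix[2, 3]        # 8
```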
Merging Lists
You can merge many lists into one list by passing all the lists to the c() function.
# Create two lists.
list1 <- list(1,2,3)
list2 <- list("Sun","Mon","Tue")
# Merge the two lists.
merged.list <- c(list1,list2)
# Print the merged list.
print(merged.list)
Output:
[[1]]
[1] 1

[[2]]
[1] 2

[[3]]
[1] 3

[[4]]
[1] "Sun"

[[5]]
[1] "Mon"

[[6]]
[1] "Tue"
Vectors
A vector is an ordered collection of values of a basic data type, with a given length. The key point is
that all the elements of a vector must be of the same data type, i.e. vectors are homogeneous data
structures. Vectors are one-dimensional data structures.
e.g.
> X = c(1, 3, 5, 7, 8)
> X
[1] 1 3 5 7 8
> length(X)
[1] 5
> class(X)
[1] "numeric"
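A minimal sketch of accessing and modifying vector elements, and of what the "same data type" rule means in practice. It reuses the vector X from above; the vector mixed is made up for illustration.

```r
X <- c(1, 3, 5, 7, 8)

# Access elements (indexing starts at 1)
X[2]          # 3
X[c(1, 5)]    # 1 8
X[X > 4]      # 5 7 8

# Modify an element in place
X[2] <- 30
X             # 1 30 5 7 8

# Mixing types does not fail: R converts every element to one common type
mixed <- c(1, "a", TRUE)
class(mixed)  # "character"
mixed         # "1" "a" "TRUE"
```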
c. Data frames
Data frames are generic data objects of R which are used to store tabular data. Data frames are among
the most popular data objects in R programming because tabular data is familiar and easy to read.
They are two-dimensional, heterogeneous data structures: lists of vectors of equal length.
Data frames have the following constraints placed upon them:
A data-frame must have column names and every row should have a unique name.
Each column must have the identical number of items.
Each item in a single column must be of the same data type.
Different columns may have different data types.
Operation | Syntax | Description | Example Output / Result
Create a data frame | df <- data.frame(name = c("A", "B"), age = c(25, 30)) | Create a simple data frame | 2 rows, 2 columns
View structure | str(df) | Display structure of data frame | data.frame: 2 obs. of 2 variables
Number of rows/columns | nrow(df) / ncol(df) | Get number of rows and columns | 2 / 2
Column names | names(df) | Get or set column names | name, age
Access column | df$name or df[["name"]] or df[, "name"] | Access a column | "A", "B"
Access cell (i,j) | df[1, 2] | Access element in row 1, column 2 | 25
Access row | df[1, ] | Access full first row | name = "A", age = 25
Subset by condition | df[df$age > 25, ] | Filter rows where condition is TRUE | Rows where age > 25
Add a new column | df$gender <- c("F", "M") | Adds a new column | Adds gender column
Remove column | df$age <- NULL | Deletes age column | Only name and gender remain
Rename columns
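The operations in the table can be run in sequence as a single script. This is a minimal sketch that uses the same df as the table; the name older is made up for illustration, and the rename step is left out because its syntax is not given in the table.

```r
# Create the data frame used in the table
df <- data.frame(name = c("A", "B"), age = c(25, 30))

str(df)        # structure: 2 obs. of 2 variables
nrow(df)       # 2
ncol(df)       # 2
names(df)      # "name" "age"

df$name        # the name column
df[1, 2]       # 25 (row 1, column 2)
df[1, ]        # the full first row

# Keep only the rows where the condition is TRUE
older <- df[df$age > 25, ]
older

# Add one column, then delete another
df$gender <- c("F", "M")
df$age <- NULL
names(df)      # "name" "gender"
```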
Set A
1. Create a numeric vector and print it.
2. Create a data frame from vectors.
3. Access elements in vectors and data frames.
4. Access list elements by name.
5. Modify elements in vectors and data frames.
Set B
Set C
1. Perform set operations (union, intersect) on vectors extracted from data frames or lists.
2. Sort and merge multiple data frames and extract subsets.
3. Create nested lists containing vectors and data frames, then access deep elements
Assignment Evaluation
Assignment 3: Matrix, String and Factors
R Matrix
In R, a two-dimensional rectangular data set is known as a matrix. A matrix is created by passing a
vector to the matrix() function. On numeric matrices, we can perform addition, subtraction,
multiplication, and division.
In a matrix, elements are arranged in a fixed number of rows and columns. All elements of a matrix
must be of the same basic type; the matrices used in this assignment hold numbers.
A Matrix is created using the matrix() function.
Syntax
matrix(data, nrow, ncol, byrow, dimnames)
Following is the description of the parameters used:
data is the input vector which becomes the data elements of the matrix.
nrow is the number of rows to be created.
ncol is the number of columns to be created.
byrow is a logical flag. If TRUE, the input vector elements are arranged by row; otherwise (the default) they are arranged by column.
dimnames is the names assigned to the rows and columns, given as a list.
Example
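A minimal sketch of the matrix() parameters described above. The values and the row and column names are made up for illustration.

```r
# Fill a 2 x 3 matrix row by row and name its rows and columns
M <- matrix(1:6, nrow = 2, ncol = 3, byrow = TRUE,
            dimnames = list(c("r1", "r2"), c("c1", "c2", "c3")))
M
M[1, 2]          # 2
M["r2", "c3"]    # 6

# Without byrow = TRUE the same vector is filled column by column
N <- matrix(1:6, nrow = 2, ncol = 3)
N[1, 2]          # 3

# Arithmetic on matrices of the same dimensions works element by element
S <- M + N
S[1, 2]          # 5
P <- M * N
P[1, 2]          # 6
```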
Factors
Factors are R objects created from a vector. A factor stores the vector along with the distinct
values of its elements as labels (levels). The labels are always character, irrespective of whether
the input vector is numeric, character, Boolean, etc. Factors are useful in statistical modeling.
Factors are created using the factor() function. The nlevels() function gives the count of levels.
# Create a vector.
apple_colors <- c('green','green','yellow','red','red','red','green')
# Create a factor object.
factor_apple <- factor(apple_colors)
# Print the factor.
print(factor_apple)
print(nlevels(factor_apple))
Output:
[1] green  green  yellow red    red    red    green
Levels: green red yellow
[1] 3
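To see how a factor stores its data, the sketch below inspects the same factor_apple object with a few base R functions. It is only an illustration of the labels and their internal integer codes.

```r
apple_colors <- c("green", "green", "yellow", "red", "red", "red", "green")
factor_apple <- factor(apple_colors)

# The distinct values, sorted alphabetically, become the levels
levels(factor_apple)       # "green" "red" "yellow"

# Internally each element is stored as the position of its level
as.integer(factor_apple)   # 1 1 3 2 2 2 1

# Count how often each level occurs
counts <- table(factor_apple)
counts                     # green 3, red 3, yellow 1
```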
String
In R programming, a string refers to a sequence of characters enclosed within either single (') or double
(") quotation marks. These strings are stored as elements within character vectors.
The table below summarizes common string operations:
Operation | R Code Example | Explanation | Output / Result
Create a string | str1 <- "Hello, R!" | Assign a string to a variable | "Hello, R!"
String length | nchar(str1) | Get number of characters | 9
Concatenate strings | paste("Hello", "World") | Join strings with a space | "Hello World"
Concatenate without space | paste0("Hello", "World") | Join strings without space | "HelloWorld"
Convert to uppercase | toupper(str1) | Convert string to uppercase | "HELLO, R!"
Convert to lowercase | tolower(str1) | Convert string to lowercase | "hello, r!"
Extract substring | substr(str1, 1, 5) | Extract substring from position 1 to 5 | "Hello"
Find pattern in string | grepl("R", str1) | Check if "R" is in string (TRUE/FALSE) | TRUE
Replace substring | sub("R", "World", str1) | Replace first occurrence | "Hello, World!"
Replace all occurrences | gsub("l", "L", str1) | Replace all occurrences | "HeLLo, R!"
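The string operations in the table can be run together as one script. This is a minimal sketch; the last two lines are an added illustration of how sub() differs from gsub() and of the collapse argument of paste().

```r
str1 <- "Hello, R!"

nchar(str1)                 # 9
paste("Hello", "World")     # "Hello World"
paste0("Hello", "World")    # "HelloWorld"
toupper(str1)               # "HELLO, R!"
tolower(str1)               # "hello, r!"
substr(str1, 1, 5)          # "Hello"
grepl("R", str1)            # TRUE
sub("R", "World", str1)     # "Hello, World!"
gsub("l", "L", str1)        # "HeLLo, R!"

# sub() changes only the first match, gsub() changes every match
sub("l", "L", str1)         # "HeLlo, R!"

# collapse joins the elements of one vector into a single string
paste(c("a", "b", "c"), collapse = "-")   # "a-b-c"
```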
Set A
1. Create a 3x3 matrix with numbers from 1 to 9.
2. Access the element in the 2nd row and 3rd column of a matrix.
3. Add two matrices of the same dimensions.
4. Create a factor variable from a character vector.
5. Create a string variable and find its length.
Set B
Set C
Assignment Evaluation
Practical In-charge
Assignment 4: Data Pre-processing
• Data pre-processing consists of a series of steps to transform raw data derived from data
• Data Preprocessing can be defined as a process of converting raw data into a format that is
understandable and usable for further analysis. It is an important step in the Data Preparation stage.
• It ensures that the outcome of the analysis is accurate, complete, and consistent.
Step | Operation | Description | Example Code | Output/Result
1 | View Data | Display the first few rows | head(data) | Shows all rows of the sample data
2 | Check for Missing (NA) Values | Count NA values in each column | colSums(is.na(data)) | age: 2, gender: 1, score: 1
3 | Find Rows with NA | View rows with any missing value | data[!complete.cases(data), ] | Displays only rows with missing values
4 | Replace NA in Numeric Column with Mean | Fill NA in age and score with mean | data$age[is.na(data$age)] <- mean(data$age, na.rm = TRUE); data$score[is.na(data$score)] <- mean(data$score, na.rm = TRUE) | NA values in age and score are replaced with respective means
5 | Replace NA in Categorical Column | Fill NA in gender with "Unknown" | data$gender[is.na(data$gender)] <- "Unknown" | NA in gender becomes "Unknown"
6 | Convert Categorical to Factor | Convert gender to factor | data$gender <- factor(data$gender) | Enables statistical modeling
7 | Standardize Numeric Column | Scale score to z-score | data$score <- scale(data$score) | Standardized (mean=0, sd=1)
8 | Add New Column | Create pass/fail based on score | data$pass <- ifelse(data$score > 0, "Pass", "Fail") | Adds new column
9 | Drop a Column | Remove name column | data <- subset(data, select = -name) | name is dropped
10 | View Cleaned Data | View final cleaned data | print(data) | Final preprocessed dataset
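The ten steps can be run end to end on a small data frame. The sample data below is hypothetical: it is invented only so that the NA counts match the table (age: 2, gender: 1, score: 1). One small deviation from the table is that scale() returns a one-column matrix, so the sketch wraps it in as.numeric() to keep score an ordinary numeric column.

```r
# Hypothetical sample data (not from the workbook)
data <- data.frame(
  name   = c("Asha", "Ben", "Chitra", "Dev", "Esha"),
  age    = c(21, NA, 23, NA, 22),
  gender = c("F", "M", NA, "M", "F"),
  score  = c(80, 65, NA, 90, 75),
  stringsAsFactors = FALSE
)

head(data)                        # step 1: view data
colSums(is.na(data))              # step 2: NA count per column
data[!complete.cases(data), ]     # step 3: rows with any NA

# Step 4: replace NA in numeric columns with the column mean
data$age[is.na(data$age)]     <- mean(data$age, na.rm = TRUE)
data$score[is.na(data$score)] <- mean(data$score, na.rm = TRUE)

# Steps 5 and 6: fill the categorical column, then convert it to a factor
data$gender[is.na(data$gender)] <- "Unknown"
data$gender <- factor(data$gender)

# Step 7: standardize score to z-scores (mean 0, sd 1)
data$score <- as.numeric(scale(data$score))

# Step 8: add a pass/fail column based on the standardized score
data$pass <- ifelse(data$score > 0, "Pass", "Fail")

# Steps 9 and 10: drop the name column and view the result
data <- subset(data, select = -name)
print(data)
```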
read.csv() – Read CSV file
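A minimal sketch of read.csv(). To keep it self-contained, it first writes a small hypothetical CSV to a temporary file; in the lab you would pass the path of your own file instead. The names csv_path and students are made up for illustration.

```r
# Write a tiny hypothetical CSV file so the example can run anywhere
csv_path <- tempfile(fileext = ".csv")
writeLines(c("name,age,score", "A,25,80", "B,30,NA"), csv_path)

# Read it back; the first line is used as the column names
students <- read.csv(csv_path, header = TRUE, stringsAsFactors = FALSE)

dim(students)      # 2 3 (rows, columns)
names(students)    # "name" "age" "score"
head(students)     # first rows
str(students)      # column types

# The text NA in the file is read as a missing value
is.na(students$score)   # FALSE TRUE
```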
as.character() – Convert to character
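A minimal sketch of as.character(), including one common use during preprocessing: recovering the original numbers from a factor. The variable names and values are made up for illustration.

```r
n <- 42.5
s <- as.character(n)
s                    # "42.5"
class(s)             # "character"

# Converting back allows arithmetic again
as.numeric(s) + 1    # 43.5

# For a factor, as.numeric() alone returns the internal level codes,
# so convert to character first to get the original values
f <- factor(c("10", "20", "10"))
as.numeric(f)                  # 1 2 1
as.numeric(as.character(f))    # 10 20 10
```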
Set A
Set B
Import standard dataset and Use Transformation Techniques
Dataset Name: winequality-red.csv
Dataset Link: http://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-red.csv
Write an R program to perform the following tasks:
1. Display the shape of the dataset.
2. Display the top rows and columns of the dataset.
3. Display the number of columns and the names of the columns.
Assignment Evaluation
Practical In-charge
Assignment 5: Data Visualization
# Plot the bar chart
barplot(A, xlab = "X-axis", ylab = "Y-axis", main = "Bar-Chart")
Output:
Horizontal Bar Chart:- Creating a
="Bar-Chart") Output:
Adding Label, Title and Color in the BarChart
Labels, a title and colors are properties of the bar chart that can be set by passing the
corresponding arguments to barplot().
Approach:
1. To add a title to the bar chart:
barplot(A, main = title_name)
2. To label the X-axis and Y-axis of the bar chart:
barplot(A, xlab = x_label_name, ylab = y_label_name)
3. To add color to the bar chart:
barplot(A, col = color_name)
Example:
="green", main ="GeeksforGeeks-Article
chart") Output:
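The title, label and color arguments described above can be combined in one call. This is a hedged sketch: the vector A, the day labels and the second color are assumed values, since the workbook's own data for A is not shown here. The second call uses the horiz argument of barplot() to draw the bars horizontally.

```r
# Assumed data for illustration only
A <- c(17, 32, 8, 53, 1)
days <- c("Mon", "Tue", "Wed", "Thu", "Fri")

# Vertical bar chart with title, axis labels and color
barplot(A, names.arg = days, xlab = "Day", ylab = "Count",
        main = "Bar-Chart", col = "green")

# Horizontal bar chart: horiz = TRUE swaps the roles of the axes
mids <- barplot(A, names.arg = days, horiz = TRUE, xlab = "Count", ylab = "Day",
                main = "Horizontal Bar-Chart", col = "steelblue")

# barplot() returns the position of each bar, useful for adding text later
mids
```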
3. Histograms in R language:-
A histogram is a graphical representation that groups data points into specified ranges (bins). Each
bin is drawn as a rectangle whose width spans a numerical interval and whose height is proportional
to the frequency of the values in that interval. Unlike a bar chart, a histogram shows no gaps
between adjacent bars. We can create a histogram in the R programming language using the hist()
function.
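A minimal sketch of hist(). The vector v is the one used in this assignment; the axis label, title and color are assumed values. The object returned by hist() holds the bin boundaries and counts, which the sketch also draws as labels above the bars with text().

```r
v <- c(19, 23, 11, 5, 16, 21, 32, 14, 19, 27, 39)

# Compute the bins first, without drawing, to inspect the returned values
m <- hist(v, plot = FALSE)
m$breaks         # bin boundaries
m$counts         # number of values in each bin
sum(m$counts)    # 11, every value falls into exactly one bin

# Draw the histogram, leaving room above the tallest bar for its label
plot(m, xlab = "Value", main = "Histogram of v", col = "green",
     border = "black", ylim = c(0, max(m$counts) + 1))

# Write each count just above its bar (pos = 3 means "above")
text(m$mids, m$counts, labels = m$counts, pos = 3)
```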
4. Creating a simple Histogram in R
To create a simple histogram, pass a numeric vector v to hist().
Example:
v <- c(19, 23, 11, 5, 16, 21, 32, 14, 19, 27, 39) #
5), breaks = 5)
Output:
Using histogram return values for labels using text()
The values returned by hist() can be used to label the bars with text().
# Creating data for the graph. v <- c(19,
5)
= m$counts,
Output:
Set A
1. Write an R program to create a grouped bar chart comparing monthly sales of two products
over six months. Include proper axis labels, a legend, and different colors for each product.
2. Write an R program to create a 3D pie chart showing the market share percentage of five
smartphone brands. Label each slice with the brand name and percentage value.
Set B
1. Write an R program to create a scatter plot of students’ scores in Mathematics vs. Science.
Add a regression line and customize point shapes and colors based on gender.
2. Write an R program to create boxplots comparing the distribution of test scores among three
different classes. Use different colors for each class and add meaningful axis titles.
Assignment Evaluation
Practical In-charge
Assignment 6: Classifications/Association Rule
What is Classification?
Classification is a supervised machine learning technique used to predict the categorical class labels of
new observations based on past data. It involves training a model on a labeled dataset (where the class
labels are known) and then using that model to assign labels to new data points.
Association rule mining is an unsupervised learning method used to discover interesting relationships
(rules) between variables in large datasets, typically transactional data. The goal is to identify rules like:
If a customer buys bread and butter, then they are likely to buy milk.
Lift: How much more likely the consequent is given the antecedent, compared to random chance.
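As a worked example of the rule above, the sketch below computes lift by hand in base R, together with the two standard measures it is built from: support (the fraction of transactions that contain an itemset) and confidence (support of the whole rule divided by support of the antecedent). The five transactions and the helper function support() are hypothetical and exist only for illustration; the arules package is not used here.

```r
# Hypothetical transactions, one character vector per basket
transactions <- list(
  c("bread", "butter", "milk"),
  c("bread", "butter"),
  c("bread", "milk"),
  c("butter", "milk"),
  c("bread", "butter", "milk")
)

# Support: fraction of transactions that contain every item in `items`
support <- function(items) {
  mean(sapply(transactions, function(basket) all(items %in% basket)))
}

# Rule: {bread, butter} => {milk}
antecedent <- c("bread", "butter")
consequent <- "milk"

supp_rule <- support(c(antecedent, consequent))   # 2/5 = 0.4
conf_rule <- supp_rule / support(antecedent)      # 0.4 / 0.6 = 0.667
lift_rule <- conf_rule / support(consequent)      # 0.667 / 0.8 = 0.833

c(support = supp_rule, confidence = conf_rule, lift = lift_rule)
```

A lift below 1, as in this toy data, means the consequent is slightly less likely when the antecedent is present than it is overall; a lift above 1 would indicate a positive association.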
Library | Purpose
arules | Main package for mining frequent itemsets and association rules using Apriori and Eclat algorithms.
arulesViz | Visualization of itemsets and association rules (graphs, scatterplots, matrix, etc.).
data.table | Efficient data manipulation (often used with arules).
dplyr | Data manipulation (helpful in preprocessing transaction data).
Set A
1. Apply the apriori algorithm on the above dataset to generate the frequent itemsets and association
rules. Repeat the process with different min_sup values.
2. Create your own transactions dataset and apply the above process on your dataset.
SET B:
Write an R program to read the dataset and display its information. Preprocess the data (drop null values
etc.)
Convert the categorical values into numeric format.
Apply the apriori algorithm on the above dataset to generate the frequent itemsets and association rules.
2. Download the groceries dataset.
SET C:
Write R code to implement the apriori algorithm. Test the code on any standard dataset.
Assignment Evaluation
Practical In-Charge