TY R Programming Lab Book


P.E. Society's
Modern College of Arts, Science & Commerce (Autonomous)
Ganeshkhind, Pune-16.

T. Y. B.Sc. (Computer Science) (NEP Version 1)

Semester - V

Practical Based on Data Science Using R (COM35401)

Lab Book
Student Name:

College:

Roll No: Exam Seat No:

Year: Division:
Certificate

This is to certify that Mr./Ms. ____________________________________,
Exam Seat Number ____________, has successfully completed the assignments for Lab Course I
(COM35401: Practical on Data Science Using R) during the Academic Year ____________ and has
scored _________________ marks out of 15.

Teacher In charge HOD.


Dept. of Computer Science

Internal Examiner External Examiner


BOARD OF STUDIES

1. Dr. Shubhangi Bhatambrekar
2. Prof. Kumod Sapkal
3. Dr. Dipali Meher
4. Prof. Ranjana Shevkar
5. Dr. Satish Ambike
6. Prof. Sonal Kulakarni
7. Prof. Ashwini Pawar
8. Prof. Prerana Sherla
9. Mrs. Sayali Suryawanshi
10. Mr. Chirantan Dixit
11. Ms. Prerana Sarode
12. Mr. Atharva Gujar
13. Ms. Pradnya Patil
14. Ms. Nikita Gaikwad

Prepared by:

Prof. Sayali Suryawanshi, Modern College of Arts, Science and Commerce (Autonomous), Ganeshkhind, Pune-16
Prof. Kumod Sapkal, Modern College of Arts, Science and Commerce (Autonomous), Ganeshkhind, Pune-16
Introduction

About the workbook

This workbook is intended to be used by T.Y.B.Sc. (Computer Science) students for R Programming.

R Programming Practical is an important core subject of the computer science curriculum, and
hands-on laboratory experience is critical to understanding the theoretical concepts studied
as part of this course. The study of any programming language is incomplete without hands-on
experience of implementing solutions using programming paradigms and verifying them in the
lab. This workbook provides a rich set of problems covering the basic concepts, as well as
numerous computing problems demonstrating the applicability and importance of various
concepts.

The objectives of this book are:

• Defining clearly the scope of the course

• Bringing uniformity in the way the course is conducted across different colleges

• Continuous assessment of the course

• Bringing variation and variety in experiments carried out by different students in a batch

• Providing ready reference for students while working in the lab

• Catering to the needs of slow-paced as well as fast-paced learners

How to use this workbook


The R Programming Practical syllabus is divided into six assignments. Each assignment has
problems divided into three sets: A, B and C.

• Set A is used for implementing the basic algorithms or implementing a data structure
along with its basic operations. Set A is mandatory.

• Set B is used to demonstrate small variations on the implementations carried out in Set
A to improve their applicability.

• Set C: depending on the time available, students should be encouraged to complete Set C.
Instructions to the students
Please read the following instructions carefully and follow them.
• Students are expected to carry the workbook to every practical.

• Students should prepare themselves beforehand for the assignment by reading the relevant
material.

• The instructor will specify which problems to solve in the lab during the allotted slot, and
students should complete them and get them verified by the instructor. However, students
should spend additional hours in the lab and at home to cover as many of the problems
given in this workbook as possible.

• Students will be assessed for each exercise on a scale from 0 to 5:

➢ Not done 0
➢ Incomplete 1
➢ Late Complete 2
➢ Needs improvement 3
➢ Complete 4
➢ Well Done 5

Instruction to the Practical In-Charge

• Explain the assignment and related concepts in around ten minutes, using a white
board if required or by demonstrating the software.

• Choose appropriate problems to be solved by students. Set A is mandatory.
Choose problems from Set B depending on time availability. Discuss Set C with
students and encourage them to solve the problems by spending additional time
in the lab or at home.

• Make sure that students follow the instructions given above.

• You should evaluate each assignment carried out by a student on a scale of 5, as
specified above, by ticking the appropriate box.

• The value should also be entered on the assignment completion page of the
respective Lab Course.
Assignment Completion Sheet

Sr. No   Assignment Name                     Marks (Out of 5)   Signature
1        Basics of R Programming
2        List, Vectors, Data Frame
3        Matrix, String and Factors
4        Data Preprocessing
5        Data Visualization
6        Classifications/Association Rule

This is to certify that Mr./Ms. ____________________, University Exam Seat Number
____________, has successfully completed the course work for R Programming Practice and
has scored _____ Marks out of 20.

Instructor Head

Internal Examiner External Examiner

Assignment 1: Basics of R Programming

1. Installing R and RStudio

R is a programming language and software environment for statistical analysis, graphics
and reporting. R was created by Ross Ihaka and Robert Gentleman at the University of Auckland, New
Zealand, and is currently developed by the R Core Team.
The language was named R after the first letter of the first names of its two authors
(Robert Gentleman and Ross Ihaka).
R is often used for statistical computing and graphical presentation to analyze and visualize data.

Why Use R?
• It is a great resource for data analysis, data visualization, data science and machine learning.
• It provides many statistical techniques (such as statistical tests, classification, clustering and data
reduction).
• It is easy to draw graphs in R, such as pie charts, histograms, box plots and scatter plots.
• It works on different platforms (Windows, Mac, Linux).
• It is open-source and free.
• It has large community support.
• It has many packages (libraries of functions) that can be used to solve different problems.

To Install R

1. Open an internet browser and go to www.r-project.org.
2. Click the "download R" link in the middle of the page under "Getting Started."
3. Select a CRAN location (a mirror site) and click the corresponding link.
4. Click on the "Download R for Windows" link at the top of the page (on a Mac, choose the macOS link instead).
5. Click on the file containing the latest version of R (on Windows, open the "base" link first).
6. Save the installer file (.exe on Windows, .pkg on macOS), double-click it to open, and follow the
installation instructions.
7. Now that R is installed, you need to download and install RStudio.

To Install RStudio

1. Go to www.rstudio.com and click on the "Download RStudio" button.
2. Click on "Download RStudio Desktop."
3. Click on the version recommended for your system and save the installer file on your computer (.exe on
Windows, .dmg on macOS). On Windows, double-click the file and follow the installation instructions; on a
Mac, double-click the .dmg file to open it, and then drag and drop RStudio into your Applications folder.

2. Basic functionality of R, variables and data types in R
If you type 5 + 5 and press Enter, you will see that R outputs 10.
Example
5 + 5
Output:
[1] 10

R Syntax
To output text in R, use single or double quotes:
Example
"Hello World!"
To output numbers, just type the number (without quotes):

Example
5
10
25
To do simple calculations, add numbers together:
Example
5 + 5

R Print
Unlike many other programming languages, R lets you output a value without using a print function:
Example
"Hello World!"
However, R does have a print() function available if you want to use it. This might be useful if you are
familiar with other programming languages, such as Python, which often use a print() function to
produce output.
Example
print("Hello World!")

R Comments
Comments can be used to explain R code and to make it more readable. They can also be used to prevent
a line from executing when testing alternative code. A comment starts with the # symbol; R ignores
everything from # to the end of the line.
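As a minimal sketch of both uses of comments, the snippet below explains a calculation and disables an alternative line. The variable names radius and area are illustrative and do not come from the workbook.

```r
# This is a comment on its own line
radius <- 2            # a comment can also follow code on the same line
area <- pi * radius^2

# The next line is "commented out", so R skips it:
# area <- 3.14 * radius^2

print(area)
```

R has no dedicated multi-line comment syntax, so a longer note is written as several lines that each start with #.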

R Variables
Creating Variables in R
Variables are containers for storing data values.

R does not have a command for declaring a variable. A variable is created the moment you first assign
a value to it. To assign a value to a variable, use the <- sign. To output (or print) the variable value,
just type the variable name:
name <- "John"
age <- 40
name # output "John"
age # output 40
In the example above, name and age are variables, while "John" and 40 are values.
In other programming languages, it is common to use = as the assignment operator. In R, we can use
both = and <- as assignment operators.
However, <- is preferred in most cases, because = is not treated as assignment in every context: inside
a function call, for example, = names an argument instead of assigning to a variable.
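The difference between = and <- shows up inside function calls. The following is a small sketch; the variable names x, y, z, m1 and m2 are chosen only for illustration.

```r
x <- 1:5            # assignment at the top level
y = 1:5             # also an assignment at the top level

# Inside a function call, = names an argument and creates no variable.
m1 <- mean(x = c(2, 4, 6))
x                   # x is still 1:5, untouched by the call above

# Inside a function call, <- assigns first and then passes the value.
m2 <- mean(z <- c(2, 4, 6))
z                   # z now exists and holds 2 4 6
```

Both calls return the same mean; only the second one leaves a new variable behind, which is why mixing the two operators inside calls can be confusing.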
Print / Output Variables
Compared to many other programming languages, you do not have to use a function to print/output
variables in R. You can just type the name of the variable:
Example
name <- "John Doe"
name # auto-print the value of the name variable
However, R does have a print() function available if you want to use it. This might be useful if you are
familiar with other programming languages, such as Python, which often use a print() function to
output variables.
Example
name <- "John Doe"
print(name) # print the value of the name variable
There are times when you must use the print() function to produce output, for example inside
for loops (which you will learn more about in a later chapter):
Example
for (x in 1:5) {
  print(x)
}
Output:
[1] 1
[1] 2
[1] 3
[1] 4
[1] 5

Multiple Variables
R allows you to assign the same value to multiple variables in one line:
Example
# Assign the same value to multiple variables in one line
var1 <- var2 <- var3 <- "Orange"
# Print variable values
var1
var2
var3

Variable Names
A variable can have a short name (like x and y) or a more descriptive name (age, carname,
total_volume).

Rules for R variable names:

• A variable name must start with a letter or a period (.) and can be a combination of letters, digits,
periods (.) and underscores (_). If it starts with a period, the period cannot be followed by a digit.
• A variable name cannot start with a number or an underscore (_).
• Variable names are case-sensitive (age, Age and AGE are three different variables).
• Reserved words cannot be used as variable names (TRUE, FALSE, NULL, if...).

# Legal variable names:
myvar <- "John"
my_var <- "John"
myVar <- "John"
MYVAR <- "John"
myvar2 <- "John"
.myvar <- "John"
# Illegal variable names:
2myvar <- "John"
my-var <- "John"
my var <- "John"
_my_var <- "John"
my_v@ar <- "John"
TRUE <- "John"
Remember that variable names are case-sensitive!
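One way to check these rules without triggering a syntax error is base R's make.names(), which rewrites any string into a syntactically valid name. The helper is_valid_name() below is a hypothetical function written for this sketch, not part of R: it treats a string as valid when make.names() returns it unchanged.

```r
# make.names() rewrites a string into a syntactically valid name,
# so a string is a valid name exactly when it comes back unchanged.
is_valid_name <- function(candidate) {
  make.names(candidate) == candidate
}

candidates <- c("myvar", "my_var", ".myvar", "2myvar", "my-var", "_my_var", "TRUE")
data.frame(name = candidates, valid = is_valid_name(candidates))
```

The first three candidates are reported as valid and the last four as invalid, matching the legal and illegal examples above.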
Data Types
In programming, data type is an important concept.
Variables can store data of different types, and different types can do different things.
In R, variables do not need to be declared with any particular type, and can even change type after they
have been set:
Example
my_var <- 30 # my_var is of type numeric
my_var
Output:
[1] 30
my_var <- "Sally" # my_var is now of type character (aka string)
my_var
Output:
[1] "Sally"

R has a variety of data types and object classes.
Basic Data Types
Basic data types in R can be divided into the following types:
• numeric - (10.5, 55, 787)
• integer - (1L, 55L, 100L, where the letter "L" declares this as an integer)
• complex - (9 + 3i, where "i" marks the imaginary part)
• character (a.k.a. string) - ("k", "R is exciting", "FALSE", "11.5")
• logical (a.k.a. boolean) - (TRUE or FALSE)

Use the class() function to check the data type of a variable:

Example
# numeric
x <- 10.5
class(x)
Output:
[1] "numeric"
# integer
x <- 1000L
class(x)
Output:
[1] "integer"
# complex
x <- 9i + 3
class(x)
Output:
[1] "complex"
# character/string
x <- "R is exciting"
class(x)
Output:
[1] "character"
# logical/boolean
x <- TRUE
class(x)
Output:
[1] "logical"

R Numbers
There are three number types in R:
• numeric
• integer
• complex

Variables of number types are created when you assign a value to them:
Example
x <- 10.5 # numeric
y <- 10L # integer
z <- 1i # complex
Output:
> x
[1] 10.5
> y
[1] 10
> z
[1] 0+1i
Numeric
The numeric data type is the most common type in R. It contains any number, with or without a decimal
point, such as 10.5, 55 or 787:

Example
x <- 10.5
y <- 55
# Print values of x and y
x
y
Output:
> x
[1] 10.5
> y
[1] 55

# Print the class name of x and y
class(x)
class(y)
Output:
> class(x)
[1] "numeric"
> class(y)
[1] "numeric"
Integer
Integers are numeric data without decimals. This type is used when you are certain that the variable
will never need to contain decimals. To create an integer variable, you must use the letter L after
the integer value:
Example
x <- 1000L
y <- 55L
# Print values of x and y
x
y
Output:
> x
[1] 1000
> y
[1] 55

# Print the class name of x and y
class(x)
class(y)
Output:
> class(x)
[1] "integer"
> class(y)
[1] "integer"
Complex
A complex number is written with an "i" as the imaginary part:
Example
x <- 3+5i
y <- 5i
# Print values of x and y
x
y
Output:
> x
[1] 3+5i
> y
[1] 0+5i
# Print the class name of x and y
class(x)
class(y)
Output:
> class(x)
[1] "complex"
> class(y)
[1] "complex"

Type Conversion
You can convert from one type to another with the following functions:
• as.numeric()
• as.integer()
• as.complex()

Example
x <- 1L # integer
y <- 2 # numeric
# convert from integer to numeric:
a <- as.numeric(x)
# convert from numeric to integer:
b <- as.integer(y)
# print values of x and y
x
y
# print the class name of a and b
class(a)
class(b)
Output:
# print values of x and y
> x
[1] 1
> y
[1] 2
# print the class name of a and b
> class(a)
[1] "numeric"
> class(b)
[1] "integer"
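Conversions do not always preserve the value, so it helps to try a few edge cases. The sketch below uses only base R; as.character() is not listed above but belongs to the same family of conversion functions, and the variable n is illustrative.

```r
as.integer(3.9)        # fractional part is dropped (truncation toward zero)
as.integer(-3.9)
as.numeric("12.5")     # character to numeric
as.complex(2)          # numeric to complex
as.character(42L)      # integer to character

# A string that is not a number converts to NA (R also raises a warning,
# which is suppressed here to keep the output clean).
n <- suppressWarnings(as.numeric("abc"))
is.na(n)
```

Because as.integer() truncates rather than rounds, round() is the function to call first when the nearest whole number is wanted.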
OPERATORS
R supports several kinds of binary operators between operands. This section describes the
main types of operators in the R programming language and their usage.
Types of operators in R
● Arithmetic Operators
● Logical Operators
● Relational Operators

Arithmetic Operators

Operator   Syntax     Description                               Example (a = 10, b = 3)
+          a + b      Adds two numbers                          10 + 3
-          a - b      Subtracts right operand from left         10 - 3
*          a * b      Multiplies two numbers                    10 * 3
/          a / b      Divides left operand by right             10 / 3
%%         a %% b     Returns remainder of division (modulus)   10 %% 3
%/%        a %/% b    Integer (floor) division                  10 %/% 3
^ or **    a ^ b      Raises a to the power b                   10 ^ 3

Logical Operators:-

Operator      Name                 Description                                                     Example                            Output
&             Element-wise AND     Returns TRUE if both elements are TRUE (checks each element)    c(TRUE, FALSE) & c(TRUE, TRUE)     TRUE FALSE
|             Element-wise OR      Returns TRUE if either element is TRUE (element-wise)           c(TRUE, FALSE) | c(FALSE, TRUE)    TRUE TRUE
!             NOT                  Reverses the logical state (TRUE → FALSE, FALSE → TRUE)         !TRUE                              FALSE
&&            Short-circuit AND    Compares two single values; the right operand is evaluated      TRUE && FALSE                      FALSE
                                   only if the left one is TRUE
||            Short-circuit OR     Compares two single values; the right operand is evaluated      TRUE || FALSE                      TRUE
                                   only if the left one is FALSE
xor()         Exclusive OR         Returns TRUE if only one of the two is TRUE, not both           xor(TRUE, FALSE)                   TRUE
isTRUE()      Is TRUE check        Returns TRUE only if the value is exactly TRUE (not just truthy) isTRUE(TRUE)                      TRUE
identical()   Exact equality test  Returns TRUE if both values are exactly equal                   identical(TRUE, TRUE)              TRUE
any()         Any TRUE in vector   Returns TRUE if at least one element in the vector is TRUE      any(c(FALSE, TRUE))                TRUE
all()         All TRUE in vector   Returns TRUE if all elements in the vector are TRUE             all(c(TRUE, TRUE))                 TRUE

Note: && and || are meant for single (length-one) values, as used in if conditions. Recent versions of R
report an error when either operand has more than one element.

Relational Operators:-

Operator   Syntax    Description                          Example (a = 10, b = 3)   Output
==         a == b    Checks if two values are equal       10 == 3                   FALSE
!=         a != b    Checks if two values are not equal   10 != 3                   TRUE
>          a > b     Checks if left value is greater      10 > 3                    TRUE
<          a < b     Checks if left value is smaller      10 < 3                    FALSE
>=         a >= b    Greater than or equal to             10 >= 10                  TRUE
<=         a <= b    Less than or equal to                10 <= 3                   FALSE
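The three operator tables can be tried out together in one short script. This is a minimal sketch using the same values as the tables (a = 10, b = 3); nothing here goes beyond base R.

```r
a <- 10
b <- 3

# Arithmetic
a + b; a - b; a * b; a / b
a %% b     # remainder
a %/% b    # integer (floor) division
a ^ b      # power

# Relational: each comparison returns TRUE or FALSE
a == b; a != b; a > b; a <= b

# Logical: & and | work element by element on vectors
c(TRUE, FALSE) & c(TRUE, TRUE)
c(TRUE, FALSE) | c(FALSE, TRUE)

# && and || take single values and stop as soon as the result is known
(a > 5) && (b > 5)
(a > 5) || (b > 5)

xor(TRUE, FALSE)
any(c(FALSE, TRUE)); all(c(TRUE, TRUE))
```

Typing each line at the console and comparing the result with the Output column is a quick way to check one's understanding before attempting Set A.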

Set A

1. Program to perform basic arithmetic operations (+, -, *, /, %%, ^)
2. Program to calculate the area of a circle
3. Program to swap two numbers without using a temporary variable
4. Program to demonstrate basic logical operations (&, |, !)

Set B

1. Program to check eligibility to vote (age check)
2. Program to test if a number is both even and positive
3. Program to check whether a year is a leap year
4. Program to check multiple conditions using if...else if...else

Set C

1. Program to check if the sum of two numbers is even
2. Program to calculate grade based on marks and pass/fail logic
3. Program to find the largest of three numbers
4. Program to calculate the total, average, and check pass/fail for 3 subjects

Assignment Evaluation

0: Not Done [ ] 1: Incomplete [ ] 2: Late Complete [ ]

3: Needs Improvement [ ] 4: Complete [ ] 5: Well Done [ ]

Assignment 2: List, Vectors, Data Frame

a) Create a list
Lists are R objects that can contain elements of different types, such as numbers, strings, vectors and
even another list. A list can also contain a matrix or a function as an element. A list is created using
the list() function.
Creating a List
The following example creates a list containing strings, numbers, a vector and a logical value.
# Create a list containing strings, numbers, a vector and a logical value
list_data <- list("Red", "Green", c(21,32,11), TRUE, 51.23, 119.1)
print(list_data)
Output:
[[1]]
[1] "Red"

[[2]]
[1] "Green"

[[3]]
[1] 21 32 11

[[4]]
[1] TRUE

[[5]]
[1] 51.23

[[6]]
[1] 119.1
b) Implement an R script to access elements in the list
Giving names to list elements
There are three steps to print list data by name:
1. Create a list.
2. Assign names to the list elements with the help of the names() function.
3. Print the list data.

Example: 1
# Create a list containing a vector, a matrix and a list.
list_data <- list(c("Jan","Feb","Mar"), matrix(c(3,9,5,1,-2,8), nrow = 2), list("green",12.3))
# Give names to the elements in the list.
names(list_data) <- c("1st Quarter", "A_Matrix", "A Inner list")
# Show the list.
print(list_data)
Accessing List Elements
• Elements of a list can be accessed by their index in the list. In a named list, they
can also be accessed by name.

Example: 2
# Create a list containing a vector, a matrix and a list.
list_data <- list(c("Jan","Feb","Mar"), matrix(c(3,9,5,1,-2,8), nrow = 2), list("green",12.3))
# Give names to the elements in the list.
names(list_data) <- c("1st Quarter", "A_Matrix", "A Inner list")
# Access the first element of the list.
print(list_data[1])
# Access the third element. As it is also a list, all its elements will be printed.
print(list_data[3])
# Access the list element using the name of the element.
print(list_data$A_Matrix)
Output:
# Access the first element of the list.
$`1st Quarter`
[1] "Jan" "Feb" "Mar"

# Access the third element. As it is also a list, all its elements will be printed.
$`A Inner list`
$`A Inner list`[[1]]
[1] "green"

$`A Inner list`[[2]]
[1] 12.3

# Access the list element using the name of the element.
     [,1] [,2] [,3]
[1,]    3    5   -2
[2,]    9    1    8
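The examples above use single brackets, which return a smaller list. To get the element itself, R provides double brackets. The sketch below reuses list_data from the example; sub_list and months are illustrative names.

```r
list_data <- list(c("Jan", "Feb", "Mar"),
                  matrix(c(3, 9, 5, 1, -2, 8), nrow = 2),
                  list("green", 12.3))
names(list_data) <- c("1st Quarter", "A_Matrix", "A Inner list")

sub_list <- list_data[1]        # single brackets: a list of length 1
months   <- list_data[[1]]      # double brackets: the character vector itself
class(sub_list)
class(months)

# Names containing spaces need [[ ]] with a string, or backticks after $
list_data[["A Inner list"]][[2]]
list_data$`1st Quarter`[2]

# Modify an element inside the nested list
list_data[["A Inner list"]][[1]] <- "blue"
```

Chaining double brackets in this way is the usual route to nested elements, which is what the Set B and Set C list exercises ask for.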

Merging Lists
You can merge several lists into one list by passing all of them to a single c() function call.
# Create two lists.
list1 <- list(1,2,3)
list2 <- list("Sun","Mon","Tue")
# Merge the two lists.
merged.list <- c(list1,list2)
# Print the merged list.
print(merged.list)
Output:
[[1]]
[1] 1

[[2]]
[1] 2

[[3]]
[1] 3

[[4]]
[1] "Sun"

[[5]]
[1] "Mon"

[[6]]
[1] "Tue"

Vectors
A vector is an ordered collection of values of one basic data type, with a given length. The key point is
that all the elements of a vector must be of the same data type; that is, vectors are homogeneous data
structures. Vectors are one-dimensional data structures.
e.g.
> X = c(1, 3, 5, 7, 8)
> X
[1] 1 3 5 7 8
> length(X)
[1] 5
> class(X)
[1] "numeric"

Each row of the table below assumes that v starts as c(1, 2, 3).

Operation           Syntax                             Description                       Example Output
Create a vector     v <- c(1, 2, 3)                    Combine values into a vector      v → 1 2 3
Access element      v[2]                               Access 2nd element                2
Modify element      v[2] <- 5                          Change 2nd element to 5           v → 1 5 3
Length              length(v)                          Get number of elements            3
Add elements        v <- c(v, 4)                       Append element to vector          1 2 3 4
Remove element      v <- v[-2]                         Remove 2nd element                1 3
Arithmetic ops      v * 2, v + 3, v^2                  Element-wise operations           e.g., 2 4 6
Logical filter      v[v > 2]                           Keep values greater than 2        e.g., 3
Sum / Mean          sum(v), mean(v)                    Aggregate functions               e.g., sum = 6
Sort                sort(v)                            Sort values in ascending order    e.g., 1 2 3
Reverse             rev(v)                             Reverse the order of elements     e.g., 3 2 1
Sequence            1:5 or seq(1, 5, by=2)             Generate sequences                1 2 3 4 5 or 1 3 5
Repetition          rep(1:2, times=2)                  Repeat elements                   1 2 1 2
Set operations      union(v1, v2), intersect(v1, v2)   Set union, intersection           e.g., 1 2 3 4
Vector comparison   v1 == v2, v1 > v2                  Element-wise comparison           e.g., TRUE FALSE TRUE
Named elements      names(v) <- c("a", "b", "c")       Assign names to elements          Access: v["a"]

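Run in sequence, the vector operations build on one another, so the results differ slightly from the table rows, which each start from a fresh vector. The following sketch shows the running state in comments; v, v1 and v2 are the same illustrative names used in the table.

```r
v <- c(1, 2, 3)
v[2]                   # access the 2nd element
v[2] <- 5              # modify: v is now 1 5 3
v <- c(v, 4)           # append: 1 5 3 4
v <- v[-2]             # drop the 2nd element: 1 3 4
v * 2                  # element-wise arithmetic
v[v > 2]               # logical filter
sum(v); mean(v)
sort(v, decreasing = TRUE)

v1 <- c(1, 2, 3)
v2 <- c(3, 4)
union(v1, v2)
intersect(v1, v2)

names(v1) <- c("a", "b", "c")
v1["a"]
```

Note that none of these functions changes v in place; a result is kept only when it is assigned back, as in v <- c(v, 4).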
c) Data frames
Data frames are generic data objects in R that are used to store tabular data. They are among the
most popular data objects in R programming because data in tabular form is easy to read.
They are two-dimensional, heterogeneous data structures: lists of vectors of equal length.
Data frames have the following constraints placed upon them:
• A data frame must have column names, and every row should have a unique name.
• Each column must have the same number of items.
• Each item in a single column must be of the same data type.
• Different columns may have different data types.

To create a data frame, we use the data.frame() function.

e.g.
Name = c("Amiya", "Raj", "Asish")
Language = c("R", "Python", "Java")
Age = c(22, 25, 45)
df = data.frame(Name, Language, Age)
print(df)

Operation                Syntax                                                   Description                           Example Output / Result
Create a data frame      df <- data.frame(name = c("A", "B"), age = c(25, 30))    Create a simple data frame            2 rows, 2 columns
View structure           str(df)                                                  Display structure of data frame       data.frame: 2 obs. of 2 variables
Number of rows/columns   nrow(df) / ncol(df)                                      Get number of rows and columns        2 / 2
Column names             names(df)                                                Get or set column names               name, age
Access column            df$name or df[["name"]] or df[, "name"]                  Access a column                       "A", "B"
Access cell (i,j)        df[1, 2]                                                 Access element in row 1, column 2     25
Access row               df[1, ]                                                  Access full first row                 name = "A", age = 25
Subset by condition      df[df$age > 25, ]                                        Filter rows where condition is TRUE   Rows where age > 25
Add a new column         df$gender <- c("F", "M")                                 Adds a new column                     Adds gender column
Remove column            df$age <- NULL                                           Deletes age column                    Only name and gender remain
Rename columns
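The data frame operations can be combined into one short session. This is a sketch under stated assumptions: the renaming line shows one common base R idiom (assigning into names(df)), and the extra data frame more, the column id and the name adults_over_25 are invented for illustration.

```r
df <- data.frame(name = c("A", "B"), age = c(25, 30))
str(df)
df$gender <- c("F", "M")             # add a column
adults_over_25 <- df[df$age > 25, ]  # subset rows by condition
df[1, "age"] <- 26                   # modify a single cell
names(df)[names(df) == "age"] <- "age_years"   # rename one column (base R)
df$gender <- NULL                    # remove a column

# Combine data frames: rbind() stacks rows, cbind() adds columns
more <- data.frame(name = "C", age_years = 41)
df_all <- rbind(df, more)
df_all <- cbind(df_all, id = 1:3)
print(df_all)
```

rbind() requires both data frames to have the same column names, which is why more is created with age_years rather than age.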

Set A
1. Create a numeric vector and print it.
2. Create a data frame from vectors.
3. Access elements in vectors and data frames.
4. Access list elements by name.
5. Modify elements in vectors and data frames.

Set B
1. Filter elements in a vector based on a condition.
2. Subset rows of a data frame based on a condition.
3. Add and remove columns in a data frame.
4. Access and modify nested list elements.
5. Apply functions like mean or sum to vector and data frame columns.
6. Combine two data frames using rbind and cbind.
7. Access list elements that contain data frames and perform operations on them.

Set C
1. Perform set operations (union, intersect) on vectors extracted from data frames or lists.
2. Sort and merge multiple data frames and extract subsets.
3. Create nested lists containing vectors and data frames, then access deep elements.

Assignment Evaluation

0: Not Done [ ] 1: Incomplete [ ] 2: Late Complete [ ]

3: Needs Improvement [ ] 4: Complete [ ] 5: Well Done [ ]

Assignment 3: Matrix, String and Factors

R Matrix
In R, a two-dimensional rectangular data set is known as a matrix. A matrix is created by passing a
vector to the matrix() function. On R matrices, we can perform addition, subtraction,
multiplication, and division.
In an R matrix, elements are arranged in a fixed number of rows and columns. All elements of a matrix
must be of the same type; matrices used for calculation contain numeric elements.
A matrix is created using the matrix() function.
Syntax
matrix(data, nrow, ncol, byrow, dimnames)
Following is the description of the parameters used:
• data is the input vector which becomes the data elements of the matrix.
• nrow is the number of rows to be created.
• ncol is the number of columns to be created.
• byrow is a logical value. If TRUE, the input vector elements are arranged by row.
• dimnames is a list of the names assigned to the rows and columns.

Example

# Arranging elements sequentially by row.
P <- matrix(c(5:16), nrow = 4, byrow = TRUE)
print(P)
# Arranging elements sequentially by column.
Q <- matrix(c(3:14), nrow = 4, byrow = FALSE)
print(Q)
# Defining the column and row names.
row_names = c("row1", "row2", "row3", "row4")
col_names = c("col1", "col2", "col3")
R <- matrix(c(3:14), nrow = 4, byrow = TRUE, dimnames = list(row_names, col_names))
print(R)
Output:
print(P)
     [,1] [,2] [,3]
[1,]    5    6    7
[2,]    8    9   10
[3,]   11   12   13
[4,]   14   15   16
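The arithmetic mentioned above works element by element, which is easy to confuse with the matrix product. The sketch below uses two small matrices, A and B, chosen for illustration so that the results can be checked by hand.

```r
A <- matrix(1:4, nrow = 2)            # filled by column: rows are (1 3) and (2 4)
B <- matrix(c(2, 0, 1, 2), nrow = 2)  # rows are (2 1) and (0 2)

A + B        # element-wise addition
A - B
A * B        # element-wise multiplication, not the matrix product
A %*% B      # matrix product
A / B        # element-wise; dividing by 0 gives Inf
t(A)         # transpose
det(A)       # determinant of a square matrix
A[2, 1]      # element in row 2, column 1
dim(A)
```

Element-wise operators require both matrices to have the same dimensions, while %*% requires the number of columns of the first to equal the number of rows of the second.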

Factors

Factors are R objects that are created from a vector. A factor stores the vector along with the distinct
values of its elements as labels (levels). The labels are always character strings, irrespective of whether
the input vector is numeric, character, Boolean, etc. Factors are useful in statistical modeling.
Factors are created using the factor() function. The nlevels() function gives the number of levels.
# Create a vector.
apple_colors <- c('green','green','yellow','red','red','red','green')
# Create a factor object.
factor_apple <- factor(apple_colors)
# Print the factor.
print(factor_apple)
print(nlevels(factor_apple))
Output:
[1] green  green  yellow red    red    red    green
Levels: green red yellow
[1] 3

Operation                   R Code Example                                    Explanation                                  Output / Result
Create factor               f <- factor(c("low", "medium", "high", "low"))    Create a factor variable (levels are         Levels: high low medium
                                                                              sorted alphabetically by default)
Print factor levels         levels(f)                                         Shows levels of the factor                   "high" "low" "medium"
Count occurrences           table(f)                                          Frequency count of each factor level         high=1, low=2, medium=1
Create a second factor      f2 <- factor(c("medium", "high", "low"))          Another factor variable                      Levels: high low medium
Union of factors            union(levels(f), levels(f2))                      All unique levels combined                   "high" "low" "medium"
Compare factors             f == f2                                           Element-wise comparison; the factors need    Logical vector (with a recycling warning
                                                                              the same level set and should have the       here, because f has 4 elements and f2
                                                                              same length                                  has 3)
Convert factor to integer   as.integer(f)                                     Shows underlying integer codes of factor     2 3 1 2
Factor counts (summary)     summary(f)                                        Summarizes factor counts                     high: 1, low: 2, medium: 1
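By default, factor() sorts levels alphabetically. When the categories have a natural order, the levels and ordered arguments of factor() make that order explicit. The following is a sketch; sizes, f_default, f_ordered, f_other and merged are illustrative names, and merging through as.character() is one common approach rather than the only one.

```r
sizes <- c("low", "medium", "high", "low")

f_default <- factor(sizes)                 # levels sorted alphabetically
levels(f_default)

f_ordered <- factor(sizes,
                    levels = c("low", "medium", "high"),
                    ordered = TRUE)        # explicit order: low < medium < high
levels(f_ordered)
as.integer(f_ordered)                      # codes follow the level order
f_ordered < "high"                         # comparisons work on ordered factors
table(f_ordered)

# Merge two factors with different levels by going through character
f_other <- factor(c("high", "very high"))
merged <- factor(c(as.character(f_default), as.character(f_other)))
levels(merged)
```

With the explicit level order, the integer codes become 1 2 3 1, so "low" is coded 1 and "high" is coded 3.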

String

In R programming, a string is a sequence of characters enclosed within either single (') or double
(") quotation marks. Strings are stored as elements of character vectors.
Common string operations:

Operation                   R Code Example              Explanation                                Output / Result
Create a string             str1 <- "Hello, R!"         Assign a string to a variable              "Hello, R!"
String length               nchar(str1)                 Get number of characters                   9
Concatenate strings         paste("Hello", "World")     Join strings with a space                  "Hello World"
Concatenate without space   paste0("Hello", "World")    Join strings without space                 "HelloWorld"
Convert to uppercase        toupper(str1)               Convert string to uppercase                "HELLO, R!"
Convert to lowercase        tolower(str1)               Convert string to lowercase                "hello, r!"
Extract substring           substr(str1, 1, 5)          Extract substring from position 1 to 5     "Hello"
Find pattern in string      grepl("R", str1)            Check if "R" is in string (TRUE/FALSE)     TRUE
Replace substring           sub("R", "World", str1)     Replace first occurrence                   "Hello, World!"
Replace all occurrences     gsub("l", "L", str1)        Replace all occurrences                    "HeLLo, R!"
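A few further base R string functions are useful for the pattern exercises. The sketch below is illustrative only: text and words are made-up names, and the regular expression simply matches runs of letters.

```r
str1 <- "Hello, R!"
nchar(str1)
paste("Hello", "World", sep = "-")     # sep sets the separator
toupper(substr(str1, 1, 5))
grepl("r", str1)                       # matching is case-sensitive
grepl("r", str1, ignore.case = TRUE)
strsplit("a,b,c", ",")[[1]]            # split into a character vector

# Regular expression: pull every word out of a sentence
text <- "R makes strings easy"
words <- regmatches(text, gregexpr("[A-Za-z]+", text))[[1]]
words
```

The regmatches() and gregexpr() pair returns every match of a pattern, so the same structure can be adapted to other patterns by changing only the regular expression.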

Set A
1. Create a 3x3 matrix with numbers from 1 to 9.
2. Access the element in the 2nd row and 3rd column of a matrix.
3. Add two matrices of the same dimensions.
4. Create a factor variable from a character vector.
5. Create a string variable and find its length.

Set B
1. Calculate the determinant of a square matrix.
2. Check the levels of a factor and reorder the levels.
3. Convert a numeric vector into an ordered factor.
4. Extract a substring from a string.
5. Search for a pattern within a string.
6. Replace all occurrences of a substring in a string.

Set C
1. Merge two factors with different levels.
2. Use factors in a data frame and summarize data by factor levels.
3. Use regular expressions (regex) to extract all email addresses from a text string.

Assignment Evaluation

0: Not Done [ ] 1: Incomplete [ ] 2: Late Complete [ ]

3: Needs Improvement [ ] 4: Complete [ ] 5: Well Done [ ]

Practical In-charge

Assignment 4: Data Pre-processing

• Data pre-processing consists of a series of steps to transform raw data.
• Data preprocessing can be defined as the process of converting raw data into a format that is
understandable and usable for further analysis. It is an important step in the data preparation stage.
• It helps ensure that the outcome of the analysis is accurate, complete, and consistent.

Introduction: Data Munging / Wrangling Operations

• Data munging or wrangling refers to preparing data for a dedicated purpose: taking the data from its
raw state and transforming and mapping it into another format, normally for use beyond its original
intent, so that it is more appropriate and valuable for a variety of downstream purposes such as
analytics.
• The data munging process includes operations such as data cleaning, data transformation, data
reduction and data discretization.

Steps for Data Preprocessing:

1. Importing the libraries
2. Importing the dataset
3. Handling of missing data
4. Data transformation techniques

Creating own dataset

If data is not available for a particular project, you may extract or collect data on your own and create
your own dataset to work with. The self-activity exercises show some examples of creating your own
datasets.
Data in datasets can be of the following types:
1. Numerical - quantitative data
a. Continuous - any value within a range, e.g. temperature
b. Discrete - exact and distinct values, e.g. number of students enrolled
2. Categorical - qualitative data
a. Nominal - naming or labeling variables without order, e.g. country name
b. Ordinal - labels that are ordered or ranked in some particular way, e.g. exam grades
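The four kinds of data listed above map naturally onto R column types. The sketch below builds a small made-up dataset; the data frame students, its column names and all its values are invented for illustration.

```r
students <- data.frame(
  name        = c("Asha", "Ravi", "Meera", "John"),
  temperature = c(36.6, 37.1, 36.9, 38.2),                      # numerical, continuous
  courses     = c(4L, 5L, 3L, 4L),                              # numerical, discrete
  country     = factor(c("India", "India", "Nepal", "Kenya")),  # categorical, nominal
  grade       = factor(c("B", "A", "C", "A"),
                       levels = c("C", "B", "A"),
                       ordered = TRUE)                          # categorical, ordinal: C < B < A
)
str(students)
summary(students)
```

str() confirms the type chosen for each column, and summary() reports statistics for the numeric columns and counts for the factors.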
# Install libraries (only once)
install.packages("tidyverse") # For data manipulation and visualization
install.packages("data.table") # For fast data processing
install.packages("caret") # For machine learning preprocessing
install.packages("janitor") # For cleaning column names

Step 1. View Data
   Description: Display the first few rows
   Example code: head(data)
   Output/Result: Shows the first rows of the sample data (head() returns six rows by default)

Step 2. Check for Missing (NA) Values
   Description: Count NA values in each column
   Example code: colSums(is.na(data))
   Output/Result: age: 2, gender: 1, score: 1

Step 3. Find Rows with NA
   Description: View rows with any missing value
   Example code: data[!complete.cases(data), ]
   Output/Result: Displays only rows with missing values

Step 4. Replace NA in Numeric Column with Mean
   Description: Fill NA in age and score with the mean
   Example code: data$age[is.na(data$age)] <- mean(data$age, na.rm = TRUE)
                 data$score[is.na(data$score)] <- mean(data$score, na.rm = TRUE)
   Output/Result: NA values in age and score are replaced with their respective means

Step 5. Replace NA in Categorical Column
   Description: Fill NA in gender with "Unknown"
   Example code: data$gender[is.na(data$gender)] <- "Unknown"
   Output/Result: NA in gender becomes "Unknown"

Step 6. Convert Categorical to Factor
   Description: Convert gender to factor
   Example code: data$gender <- factor(data$gender)
   Output/Result: Enables statistical modeling

Step 7. Standardize Numeric Column
   Description: Scale score to z-score
   Example code: data$score <- scale(data$score)
   Output/Result: Standardized (mean = 0, sd = 1)

Step 8. Add New Column
   Description: Create pass/fail based on score
   Example code: data$pass <- ifelse(data$score > 0, "Pass", "Fail")
   Output/Result: Adds new column

Step 9. Drop a Column
   Description: Remove name column
   Example code: data <- subset(data, select = -name)
   Output/Result: name is dropped

Step 10. View Cleaned Data
   Description: View final cleaned data
   Example code: print(data)
   Output/Result: Final preprocessed dataset
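The steps above can be run end to end on a small sample. This is a sketch under stated assumptions: the data frame below is invented so that it has the missing-value counts mentioned in Step 2, and as.numeric() is wrapped around scale() as a small adjustment, because scale() returns a one-column matrix.

```r
data <- data.frame(
  name   = c("A", "B", "C", "D", "E"),
  age    = c(21, NA, 25, NA, 30),
  gender = c("F", "M", NA, "F", "M"),
  score  = c(80, 65, NA, 90, 55)
)

colSums(is.na(data))                      # missing values per column
data[!complete.cases(data), ]             # rows with at least one NA

data$age[is.na(data$age)]       <- mean(data$age, na.rm = TRUE)
data$score[is.na(data$score)]   <- mean(data$score, na.rm = TRUE)
data$gender[is.na(data$gender)] <- "Unknown"
data$gender <- factor(data$gender)

# scale() returns a one-column matrix; as.numeric() keeps score a plain vector
data$score <- as.numeric(scale(data$score))
data$pass  <- ifelse(data$score > 0, "Pass", "Fail")
data <- subset(data, select = -name)
print(data)
```

Because the score is standardized before the pass column is created, "Pass" here means "above the average score" rather than above a fixed mark.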

Page 28
• read.csv() – Read CSV file
• read.table() – Read tabular data
• read_excel() – Read Excel file (readxl package)
• str() – Check structure of data
• summary() – Get summary statistics
• head() / tail() – View beginning/end of data
• names() / colnames() – View or set column names
• dim() / nrow() / ncol() – Get dimensions of data
• is.na() – Check for missing values
• na.omit() – Remove rows with missing values
• complete.cases() – Filter complete rows
• replace() – Replace values in data
• mutate() – Create or modify columns (dplyr package)
• filter() – Subset rows (dplyr package)
• select() – Subset columns (dplyr package)
• rename() – Rename columns (dplyr package)
• arrange() – Sort data (dplyr package)
• scale() – Standardize or normalize data
• as.factor() – Convert to factor
• as.numeric() – Convert to numeric
• as.character() – Convert to character
• cut() – Bin continuous data
• gsub() / sub() – Replace strings
• tolower() / toupper() – Change text case
• unique() – Find unique values
• duplicated() – Find duplicate rows
• table() – Frequency table
• merge() – Merge datasets
• cbind() / rbind() – Combine datasets
• library(dplyr) – Load dplyr for data manipulation
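A few of the base R functions from this list, used together on a small example. This is only a sketch: the data frames students and fees and all their values are hypothetical.

```r
# Hypothetical student records; all names and values are made up
students <- data.frame(
  id    = c(1, 2, 2, 3, 4),
  city  = c("Pune ", "MUMBAI", "MUMBAI", "pune", "Nashik"),
  marks = c(82, 47, 47, 65, 91),
  stringsAsFactors = FALSE
)

# duplicated() flags repeated rows; keep only the first occurrence
students <- students[!duplicated(students), ]

# gsub() removes stray spaces, tolower() unifies the text case
students$city <- tolower(gsub(" ", "", students$city))

# cut() bins the continuous marks into labelled ranges:
# (0, 50] Low, (50, 75] Medium, (75, 100] High
students$band <- cut(students$marks,
                     breaks = c(0, 50, 75, 100),
                     labels = c("Low", "Medium", "High"))

# table() counts the rows in each category
city_counts <- table(students$city)

# merge() joins two data frames on their common column (id)
fees <- data.frame(id = c(1, 2, 3), paid = c(TRUE, FALSE, TRUE))
merged <- merge(students, fees, by = "id")

print(students)
print(city_counts)
print(merged)
```

By default merge() keeps only the ids present in both data frames, so the student with id 4 is missing from merged; passing all.x = TRUE would keep that row with NA in paid.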

Set A

Write an R program to perform the following tasks:

1. Create a dataset and do the following:
a) Describe the dataset
b) Display the shape of the dataset
c) Display the first 3 rows of the dataset
2. Handle missing values:
a) Replace the missing values in the salary and age columns with the mean of the respective column.
3. Write a program to create or modify a column using mutate().
4. Write a program to arrange/sort data in ascending or descending order.
5. Write a program to rename columns using rename().

Set B
Import a standard dataset and use transformation techniques
Dataset Name: winequality-red.csv
Dataset Link: http://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-red.csv
(Note: the values in this file are separated by semicolons, so pass sep = ";" to read.csv().)
Write an R program to perform the following tasks:
1. Display the shape of the dataset.
2. Display the top rows and columns of the dataset.
3. Display the number of columns and the names of the columns.

Assignment Evaluation

0: Not Done [ ] 1: Incomplete [ ] 2: Late Complete [ ]

3: Needs Improvement [ ] 4: Complete [ ] 5: Well Done [ ]

Practical In-charge

Assignment 5: Data Visualization

1. Visualize the data using bar chart and box plot

A bar chart is a pictorial representation of categorical data. Each category is drawn as a rectangular bar whose height (or length, for horizontal bars) is proportional to the numeric value it represents.
R uses the function barplot() to create bar charts. Both vertical and horizontal bars can be drawn.
Syntax: barplot(H, xlab, ylab, main, names.arg, col)
Parameters:
• H: A vector or matrix containing the numeric values used in the bar chart.
• xlab: The label for the x-axis.
• ylab: The label for the y-axis.
• main: The title of the bar chart.
• names.arg: A vector of names appearing under each bar.
• col: The colors of the bars.

2. Creating a Simple Bar Chart

1. A vector (H <- c(Values…)) containing the numeric values to be plotted is taken.
2. This vector H is plotted using barplot().
Example:

# Create the data for the chart
A <- c(17, 32, 8, 53, 1)

# Plot the bar chart
barplot(A, xlab = "X-axis", ylab = "Y-axis", main = "Bar-Chart")

Output: a vertical bar chart with five bars, one for each value in A.

Creating a Horizontal Bar Chart

Approach: To create a horizontal bar chart:
1. Take all parameters which are required to make a simple bar chart.
2. To make it horizontal, add the parameter horiz = TRUE: barplot(A, horiz = TRUE)
Example: Creating a horizontal bar chart

# Create the data for the chart
A <- c(17, 32, 8, 53, 1)

# Plot the bar chart
barplot(A, horiz = TRUE, xlab = "X-axis", ylab = "Y-axis", main = "Bar-Chart")

Output: the same five bars, drawn horizontally.

Adding Label, Title and Color in the Bar Chart

Labels, a title and colors are properties of a bar chart that can be set by passing the corresponding arguments to barplot().
Approach:
1. To add a title to the bar chart:
barplot(A, main = title_name)
2. To label the X-axis and Y-axis:
barplot(A, xlab = x_label_name, ylab = y_label_name)
3. To set the color of the bars:
barplot(A, col = color_name)
Example:

# Create the data for the chart
A <- c(17, 2, 8, 13, 1, 22)
B <- c("Jan", "Feb", "Mar", "Apr", "May", "Jun")

# Plot the bar chart
barplot(A, names.arg = B, xlab = "Month", ylab = "Articles", col = "green", main = "GeeksforGeeks-Article chart")

Output: a green bar chart of articles per month, with the month names under the bars.
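The heading of this section also names the box plot, which is not illustrated in the text. A minimal sketch using base R's boxplot() follows. The data frame scores and its values are hypothetical.

```r
# Hypothetical test scores for two classes (values are made up)
scores <- data.frame(
  class = rep(c("A", "B"), each = 5),
  score = c(55, 60, 65, 70, 75, 40, 50, 60, 70, 95)
)

# Draw one box per class; boxplot() also returns the statistics it plots
b <- boxplot(score ~ class, data = scores,
             xlab = "Class", ylab = "Score",
             main = "Scores by class",
             col = c("lightblue", "lightgreen"))

# Each column of b$stats describes one box, from bottom to top:
# lower whisker, lower hinge, median, upper hinge, upper whisker
print(b$names)
print(b$stats)
```

The formula score ~ class tells boxplot() to group the numeric column score by the categorical column class, which is the usual way to compare distributions across groups.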

3. Histograms in R language

A histogram shows the distribution of a numeric variable. The range of values is divided into successive numerical intervals (bins), and a rectangle is drawn over each interval whose height, for equal-width bins, is proportional to the number of data points that fall in it. It looks similar to a vertical bar graph, but there are no gaps between the bars. We can create a histogram in the R programming language using the hist() function.

Syntax: hist(v, main, xlab, xlim, ylim, breaks, col, border)

Parameters:
• v: A vector of the numerical values used in the histogram.
• main: The title of the chart.
• col: The color of the bars.
• xlab: The label for the horizontal axis.
• border: The border color of each bar.
• xlim: The range of values shown on the x-axis.
• ylim: The range of values shown on the y-axis.
• breaks: Controls the bins. It is either a single number suggesting how many bins to use, or a vector giving the boundaries between bins.
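In base R, a single number passed to breaks is only a suggestion for the number of bins, while a numeric vector fixes the bin boundaries exactly. The sketch below compares the two using the data vector from this lab book's histogram examples. plot = FALSE is used so that hist() returns the computed bins without drawing.

```r
v <- c(19, 23, 11, 5, 16, 21, 32, 14, 19, 27, 39)

# A single number: R picks "pretty" boundaries near the suggested count
h_suggested <- hist(v, breaks = 5, plot = FALSE)

# A vector: the bins are exactly (0,10], (10,20], (20,30], (30,40]
# (the first bin also includes its lower boundary)
h_explicit <- hist(v, breaks = seq(0, 40, by = 10), plot = FALSE)

print(h_suggested$breaks)
print(h_explicit$breaks)
print(h_explicit$counts)
```

With the explicit boundaries the counts are 1, 5, 3 and 2. When exact bin widths matter, passing a vector is the safer choice.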

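4. Creating a simple Histogram in R

A simple histogram can be created using the parameters above. A vector v is plotted using hist().
Example:

# Create data for the graph (the same values as in the next example).
v <- c(19, 23, 11, 5, 16, 21, 32, 14, 19, 27, 39)

# Create the histogram.
hist(v, xlab = "No. of Articles", col = "green", border = "black")

Output: a green histogram of v with black bar borders.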

Range of X and Y values

To set the range of values shown on the axes:
1. Take all parameters which are required to make a histogram.
2. Add the xlim and ylim parameters for the X-axis and Y-axis.
Example:

# Create data for the graph.
v <- c(19, 23, 11, 5, 16, 21, 32, 14, 19, 27, 39)

# Create the histogram.
hist(v, xlab = "No. of Articles", col = "green", border = "black", xlim = c(0, 50), ylim = c(0, 5), breaks = 5)

Output: the histogram drawn with the x-axis running from 0 to 50 and the y-axis from 0 to 5.

Using histogram return values for labels using text()

The object returned by hist() can be used to label each bar with its count.

# Creating data for the graph.
v <- c(19, 23, 11, 5, 16, 21, 32, 14, 19, 27, 39, 120, 40, 70, 90)

# Creating the histogram.
m <- hist(v, xlab = "Weight", ylab = "Frequency", col = "darkmagenta", border = "pink", breaks = 5)

# Setting labels
text(m$mids, m$counts, labels = m$counts, adj = c(0.5, -0.5))

Output: a histogram with the count printed above each bar.

Set A

1. Write an R program to create a histogram displaying the distribution of daily temperatures in a city over one year. Adjust the number of bins and add a density curve overlay.

2. Write an R program to create a grouped bar chart comparing monthly sales of two products over six months. Include proper axis labels, a legend, and different colors for each product.

3. Write an R program to create a 3D pie chart showing the market share percentage of five smartphone brands. Label each slice with the brand name and percentage value.

Set B

1. Write an R program to create a scatter plot of students' scores in Mathematics vs. Science. Add a regression line and customize point shapes and colors based on gender.
2. Write an R program to create boxplots comparing the distribution of test scores among three different classes. Use different colors for each class and add meaningful axis titles.

Assignment Evaluation

0: Not Done [ ] 1: Incomplete [ ] 2: Late Complete [ ]

3: Needs Improvement [ ] 4: Complete [ ] 5: Well Done [ ]

Practical In-charge

Assignment 6: Classifications/Association Rule

What is Classification?

Classification is a supervised machine learning technique used to predict the categorical class labels of new observations based on past data. It involves training a model on a labeled dataset (where the class labels are known) and then using that model to assign labels to new data points.
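The train-then-predict workflow can be sketched with the rpart package, which this lab book lists for classification. The sketch assumes rpart is installed and uses R's built-in iris dataset. The 100/50 split and the seed are arbitrary choices for illustration.

```r
library(rpart)  # decision trees; assumed to be installed

set.seed(42)  # make the random split reproducible

# Split the built-in iris data: 100 rows to train, 50 to test
train_idx <- sample(nrow(iris), 100)
train <- iris[train_idx, ]
test <- iris[-train_idx, ]

# Train a classification tree that predicts Species from all other columns
fit <- rpart(Species ~ ., data = train, method = "class")

# Assign class labels to the unseen rows
pred <- predict(fit, newdata = test, type = "class")

# Compare predicted and actual labels
print(table(Predicted = pred, Actual = test$Species))
accuracy <- mean(pred == test$Species)
print(accuracy)
```

The table printed at the end is a confusion matrix: the diagonal holds correct predictions, and everything off the diagonal is a misclassification.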

What is Association Rule Mining?

Association rule mining is an unsupervised learning method used to discover interesting relationships (rules) between variables in large datasets, typically transactional data. The goal is to identify rules like:

If a customer buys bread and butter, then they are likely to buy milk.

• Itemset: A set of items in a transaction.
• Support: The fraction of transactions in the dataset that contain the itemset.
• Confidence: The fraction of transactions containing the antecedent (the "if" part) that also contain the consequent (the "then" part).
• Lift: How much more likely the consequent is given the antecedent, compared to random chance. It is the confidence divided by the support of the consequent.
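Support, confidence and lift for the bread-and-butter rule can be computed by hand in base R. This is a minimal sketch: the five transactions and the helper function support() are hypothetical and written only for illustration. In practice the arules package computes these measures.

```r
# Hypothetical market-basket data: one character vector per transaction
transactions <- list(
  c("bread", "butter", "milk"),
  c("bread", "butter"),
  c("bread", "milk"),
  c("butter", "milk"),
  c("bread", "butter", "milk", "eggs")
)

# Hypothetical helper: fraction of transactions containing every item in `items`
support <- function(items, trans) {
  mean(vapply(trans, function(t) all(items %in% t), logical(1)))
}

# Rule: {bread, butter} => {milk}
antecedent <- c("bread", "butter")
consequent <- "milk"

sup_rule <- support(c(antecedent, consequent), transactions)  # 2 of 5
sup_ante <- support(antecedent, transactions)                 # 3 of 5
sup_cons <- support(consequent, transactions)                 # 4 of 5

confidence <- sup_rule / sup_ante   # share of antecedent baskets with milk
lift <- confidence / sup_cons       # confidence relative to milk's base rate

print(c(support = sup_rule, confidence = confidence, lift = lift))
```

Here the lift is below 1 (about 0.83): milk is already in 80% of all baskets, so buying bread and butter does not make milk more likely in this toy data. A rule is usually considered interesting only when its lift is above 1.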

| Concept | Purpose | R Package | Key Function(s) |
|---|---|---|---|
| Classification | Predict categorical outcomes | rpart | rpart(), predict() |
| Association Rules | Discover interesting itemset relationships | arules | apriori(), inspect() |

| Library | Purpose |
|---|---|
| arules | Main package for mining frequent itemsets and association rules using Apriori and Eclat algorithms. |
| arulesViz | Visualization of itemsets and association rules (graphs, scatterplots, matrix, etc.). |
| data.table | Efficient data manipulation (often used with arules). |
| dplyr | Data manipulation (helpful in preprocessing transaction data). |

Set A
1. Apply the apriori algorithm on the above dataset to generate the frequent itemsets and association rules. Repeat the process with different min_sup values.

2. Create your own transactions dataset and apply the above process on your dataset.

Set B

1. Download the Market basket dataset.
Write an R program to read the dataset and display its information. Preprocess the data (drop null values etc.).
Convert the categorical values into numeric format.
Apply the apriori algorithm on the above dataset to generate the frequent itemsets and association rules.
2. Download the groceries dataset.

Set C
Write R code to implement the apriori algorithm. Test the code on any standard dataset.

Assignment Evaluation

0: Not Done [ ] 1: Incomplete [ ] 2: Late Complete [ ]

3: Needs Improvement [ ] 4: Complete [ ] 5: Well Done [ ]

Practical In-Charge
