BADSIS Lec 19-20 Sep 9 SR R Programming
Module 5
Lecture 19-20: Introduction to R Programming
2
History of R
https://www.researchgate.net/publication/360246719_The_R_Language_An_Engine_for_Bioinformatics_and_Data_Science/figures?lo=1&utm_source=google&utm_medium=organic
3
Why R?
Research & Academic Adoption – New statistical methods and models often appear in R
first, due to its dominance in academia.
https://www.geeksforgeeks.org/r-language/r-programming-language-introduction/
4
Features of R
Wide Packages – CRAN houses more than 10,000 different packages and
extensions that help solve all sorts of problems in data science.
5
Installation of R
Download RStudio and R from: https://posit.co/download/rstudio-desktop/
6
Installation of R Studio
7
Getting to know R Studio
RStudio
• RStudio is available in two editions:
• RStudio Desktop, where the program is run locally as a regular desktop application.
Prepackaged distributions of RStudio Desktop are available for Windows, macOS, and Linux.
• RStudio Server, which runs on a remote Linux server and is accessed through a web browser.
8
Components of R Studio
9
Basics of R
Code: Output:
1 Data Types
2 Variables
3 Keywords
4 Operators
5 Data Structures
10
R Syntax Basics
Variables
As in other programming languages, a variable in R is a name given to a reserved
memory location that can store data of any type.
In R, the assignment can be denoted in three ways:
• = (Simple Assignment)
• <- (Leftward Assignment)
• -> (Rightward Assignment)
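The three assignment forms listed above can be sketched as follows. The variable names are illustrative only and are not taken from the slides.

```r
# Three equivalent ways to bind a value to a name
x = 10     # simple assignment
y <- 20    # leftward assignment (the form most R style guides recommend)
30 -> z    # rightward assignment

total <- x + y + z
print(total)
```

All three forms create ordinary variables, so `total` here is the sum of the three values.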
https://www.scaler.com/topics/images/variables-in-r-thumbnail.webp
11
R Syntax Basics cont’d.
Comments in R
Keywords in R
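A minimal sketch of comments and reserved words, assuming only base R. The variable names are hypothetical; make.names() is used here just to show that a reserved word is not accepted as a name unchanged.

```r
# A comment starts with '#' and runs to the end of the line.
radius <- 2  # a comment can also follow code on the same line

# R has no multi-line comment syntax, so every comment line needs its own '#'.

# Reserved words (keywords) such as if, else, for, while, repeat, function,
# break, next, TRUE, FALSE, NULL, NA, Inf and NaN cannot be used as variable names.
# make.names() rewrites a reserved word into a syntactically valid name.
safe_name <- make.names("if")
print(safe_name)

# The full list of reserved words is on the help page: ?Reserved
```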
https://i.sstatic.net/Dw6Ln.png https://cdn.discuss.boardinfinity.com/original/2X/a/a8973dae9446c9a77f12edcf2f3c7c2e74b4534f.png
12
Datatypes
We can use the class() function to check the data type of a variable:
Code: Output:
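As an illustration of class(), the sketch below checks one value of each basic data type. The values are arbitrary examples.

```r
# One example value per basic data type
values <- list(
  10.5,     # numeric
  10L,      # integer (the L suffix marks an integer literal)
  "R",      # character
  TRUE,     # logical
  3 + 2i    # complex
)

# Apply class() to every element and collect the type names
types <- sapply(values, class)
print(types)
```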
13
Variables and Keywords
Code: Output:
14
Operators
• Arithmetic Operators
• Assignment Operators
• Comparison Operators
• Logical Operators
• Miscellaneous Operators
15
Operators cont’d.
Arithmetic Operators Output
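A short sketch of the operator groups, using two arbitrary values a and b. The result names are hypothetical.

```r
a <- 7
b <- 2

# Arithmetic operators
arithmetic <- c(
  add      = a + b,    # 9
  subtract = a - b,    # 5
  multiply = a * b,    # 14
  divide   = a / b,    # 3.5
  power    = a ^ b,    # 49
  modulus  = a %% b,   # 1 (remainder)
  int_div  = a %/% b   # 3 (integer division)
)
print(arithmetic)

# Comparison and logical operators return logical values
is_greater <- a > b                # TRUE
both_large <- (a > 5) & (b > 5)    # FALSE
any_large  <- (a > 5) | (b > 5)    # TRUE

# Miscellaneous operators
one_to_five <- 1:5                 # sequence operator
is_member   <- 3 %in% one_to_five  # membership test: TRUE
```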
16
Conditional Statements
If Statement
An "if statement" is written with the if keyword, and it is used to specify a block of code to be executed if
a condition is TRUE.
Else If
The else if keyword is R's way of saying "if the previous conditions were not true, then try this
condition".
Else
The else keyword catches anything which isn't caught by the preceding conditions.
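The three branches can be combined as in the sketch below. The score value and the grade thresholds are hypothetical.

```r
score <- 72  # hypothetical input value

if (score >= 80) {
  grade <- "A"          # runs only when the first condition is TRUE
} else if (score >= 60) {
  grade <- "B"          # tried only when the first condition is FALSE
} else {
  grade <- "C"          # catches everything else
}

print(grade)
```

Note that else must sit on the same line as the closing brace of the preceding block when the statement is typed at the top level of a script.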
17
Loops
The for loop is used when we know the exact number of iterations required. It iterates over
a sequence such as a vector, list or numeric range.
Syntax
for (value in sequence)
{
  statement
}
The while loop runs as long as a specified condition holds TRUE. It is useful when
the number of iterations is unknown beforehand.
Syntax
while ( condition )
{
  statement
}
The repeat loop executes indefinitely until explicitly stopped using the break statement. To terminate the
repeat loop, we use a jump statement that is the break keyword.
Syntax
repeat
{
  statement
  if( condition )
  {
    break
  }
}
https://www.geeksforgeeks.org/r-language/loops-in-r-for-while-repeat/
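A small runnable sketch of the three loop types. The variable names and stopping conditions are chosen for illustration only.

```r
# for: the number of iterations is known in advance
squares <- numeric(0)
for (i in 1:5) {
  squares <- c(squares, i^2)   # append the square of i
}

# while: keep halving until the value drops below 1
value <- 20
steps <- 0
while (value >= 1) {
  value <- value / 2
  steps <- steps + 1
}

# repeat: runs until break is reached
counter <- 0
repeat {
  counter <- counter + 1
  if (counter == 3) {
    break
  }
}

print(squares)
print(steps)
print(counter)
```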
18
Loops cont’d.
19
Functions in R
A function is a block of code which only runs when it is called. We can pass data, known as parameters, into a
function.
• A function can return data as a result.
• To create a function, use the function() keyword
There are several ways we can pass the arguments to the
function:
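The usual ways of passing arguments (by position, by name, and through default values) might look like this. power_sum is a hypothetical function written for this sketch.

```r
# Hypothetical function with one required argument and two defaults
power_sum <- function(x, exponent = 2, scale = 1) {
  scale * sum(x ^ exponent)
}

by_position <- power_sum(c(1, 2, 3), 3)                # exponent matched by position: 36
by_name     <- power_sum(exponent = 1, x = c(1, 2, 3)) # named arguments, any order: 6
by_default  <- power_sum(c(1, 2, 3))                   # defaults used: 14
mixed       <- power_sum(c(1, 2, 3), scale = 10)       # positional plus named: 140

print(c(by_position, by_name, by_default, mixed))
```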
20
Built-in functions
Built-in Function: Built-in functions in R are pre-defined functions that are available in the R
programming language to perform common tasks or operations.
Numeric functions
• abs(x) – absolute value
• sqrt(x) – square root
• log(x) – natural logarithm
• round(x, digits = n) – rounding off
Character functions
• strsplit(x, split) – split the elements of character vector x
• toupper(x) – upper case
• paste(..., sep = "") – concatenates strings after using sep string to separate them
Statistical functions
• mean(x, trim = 0, na.rm = FALSE) – mean of object x
• scale(x, center = TRUE, scale = TRUE) – column center or standardize a matrix
• dnorm(x) – normal density function (by default mean = 0, sd = 1)
Others
• seq(from, to, by) – generate a sequence
• rep(x, times) – repeat x the given number of times
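A quick sketch that calls several of these built-in functions. The input values are arbitrary, and the comments show the results for these inputs.

```r
# Numeric functions
abs_val   <- abs(-3.7)                    # 3.7
root      <- sqrt(16)                     # 4
log_val   <- log(100, base = 10)          # 2
rounded   <- round(3.14159, digits = 2)   # 3.14

# Character functions
parts     <- strsplit("a,b,c", split = ",")     # list holding "a" "b" "c"
upper     <- toupper("r language")              # "R LANGUAGE"
joined    <- paste("data", "frame", sep = ".")  # "data.frame"

# Statistical functions
avg       <- mean(c(1, 2, NA, 4), na.rm = TRUE) # NA ignored, mean of 1, 2, 4
scaled    <- scale(matrix(1:6, ncol = 2))       # each column centered and standardized
density0  <- dnorm(0)                           # standard normal density at 0

# Others
sequence  <- seq(1, 10, by = 3)           # 1 4 7 10
repeated  <- rep("x", times = 3)          # "x" "x" "x"

print(rounded)
print(sequence)
```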
21
User defined functions
• User-defined functions are the functions that are created by the user.
• The User defines the working, parameters, default parameter, etc. of that user-defined
function.
• They are available only in the script or session in which they are defined.
Code:
Output:
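A minimal sketch of a user-defined function with a default parameter. rectangle_area is a hypothetical name and is not from the slides.

```r
# Hypothetical user-defined function: area of a rectangle.
# width defaults to length, so a single argument gives the area of a square.
rectangle_area <- function(length, width = length) {
  area <- length * width
  return(area)
}

print(rectangle_area(4, 5))  # both parameters supplied
print(rectangle_area(3))     # default width used
```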
22
Types of Data Structures in R
A data structure is a particular way of organizing data in a computer so that it can be used
effectively.
• R’s base data structures are often organized by their dimensionality (1D, 2D or nD)
and by whether they are:
• homogeneous (all elements must be of the identical type), or
• heterogeneous (the elements can be of various types).
23
Vectors
• The key thing here is that all the elements of a vector must be of the identical data type,
i.e., vectors are homogeneous data structures.
Output
Code:
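A short sketch of creating and using vectors, with made-up values. The last lines show what happens when types are mixed.

```r
marks <- c(85, 92, 78)            # numeric vector built with c()

n_marks <- length(marks)          # 3
second  <- marks[2]               # 92 (indexing starts at 1)
doubled <- marks * 2              # element-wise arithmetic

# Mixing types does not fail: R coerces every element to one common type
mixed <- c(1, "a", TRUE)
mixed_class <- class(mixed)       # "character"

print(doubled)
print(mixed)
```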
24
Lists
• Lists are heterogeneous data structures. These are also one-dimensional data structures.
• A list can be a list of vectors, list of matrices, a list of characters and a list of functions and so
on.
Code: Output:
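A sketch of a list holding elements of different types, including a function. The element names and values are illustrative.

```r
student <- list(
  name    = "Asha",                 # character
  marks   = c(85, 92, 78),          # numeric vector
  passed  = TRUE,                   # logical
  average = function(x) mean(x)     # a function stored as a list element
)

student_name <- student$name                       # access by name with $
second_mark  <- student[["marks"]][2]              # [[ ]] extracts the element itself
avg_mark     <- student$average(student$marks)     # call the stored function

print(sapply(student, class))
```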
25
Data Frames
Data frames are used to store the tabular data. They are two-dimensional, heterogeneous
data structures. These are lists of vectors of equal lengths.
Data frames have the following constraints placed upon them:
•A data-frame must have column names, and every row should have a unique name.
Code: Output:
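A small sketch of a data frame built from vectors of equal length. The column names and values are made up for illustration.

```r
students <- data.frame(
  name   = c("Asha", "Ravi", "Meena"),
  age    = c(21, 23, 22),
  passed = c(TRUE, FALSE, TRUE)
)

str(students)                        # one line per column, each with its own type
shape     <- dim(students)           # rows, columns
col_names <- names(students)         # column names
row_ids   <- row.names(students)     # unique row names, "1" "2" "3" by default

# Select rows with a logical condition and a column by name
passed_names <- as.character(students[students$passed, "name"])

# Add a derived column
students$age_next_year <- students$age + 1
```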
26
Matrices
Code: Output:
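The slide gives no description here, so the following is only a sketch under the usual definition: a matrix is a two-dimensional, homogeneous data structure. The values are arbitrary.

```r
# Fill a 2 x 3 matrix column by column (the default)
m <- matrix(1:6, nrow = 2, ncol = 3)
m_dim   <- dim(m)       # 2 3
element <- m[1, 2]      # row 1, column 2: 3

# Fill row by row instead
m_byrow <- matrix(1:6, nrow = 2, byrow = TRUE)
element_byrow <- m_byrow[1, 2]   # 2

transposed <- t(m)      # 3 x 2
product    <- m %*% t(m)  # matrix multiplication, 2 x 2

print(m)
print(m_byrow)
```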
27
Arrays
• Arrays are the R data objects which store the data in more than two dimensions.
• Arrays are n-dimensional data structures. An array of dimensions (2, 3, 3) creates 3
rectangular matrices each with 2 rows and 3 columns.
• They are homogeneous data structures.
Code: Output:
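The (2, 3, 3) example from the bullets above can be sketched as follows, filling the array with the numbers 1 to 18.

```r
# 3 matrices, each with 2 rows and 3 columns
a <- array(1:18, dim = c(2, 3, 3))

a_dim        <- dim(a)       # 2 3 3
first_matrix <- a[, , 1]     # the first 2 x 3 matrix
one_value    <- a[1, 1, 2]   # row 1, column 1 of the second matrix: 7

print(first_matrix)
```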
28
Factors
• Factors are the data objects which are used to categorize the data and store it as levels.
They are useful for storing categorical data.
• They can store both strings and integers.
• They are useful to categorize unique values in columns like (“TRUE” or “FALSE”) or
(“MALE” or “FEMALE”), etc.
• They are useful in data analysis for statistical modeling.
Code: Output:
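A brief sketch of factors using made-up categorical values. The second factor shows how an explicit level order can be supplied.

```r
gender <- factor(c("MALE", "FEMALE", "MALE", "FEMALE", "MALE"))

gender_levels <- levels(gender)       # levels are sorted: "FEMALE" "MALE"
gender_codes  <- as.integer(gender)   # underlying integer codes: 2 1 2 1 2
gender_counts <- table(gender)        # frequency of each level

# An ordered factor with an explicit level order
size <- factor(c("low", "high", "medium"),
               levels = c("low", "medium", "high"),
               ordered = TRUE)
low_before_high <- size[1] < size[2]  # TRUE, because low comes before high

print(gender_counts)
```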
29
Installing Packages
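One common pattern is to install a package from CRAN only when it is not already present. install_if_missing below is a hypothetical helper written for this sketch; install.packages() and requireNamespace() are base R functions.

```r
# Hypothetical helper: install a package from CRAN only if it is missing.
install_if_missing <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)   # downloads from CRAN, so it needs internet access
  }
  invisible(requireNamespace(pkg, quietly = TRUE))
}

# "stats" ships with R, so this call returns without installing anything.
install_if_missing("stats")

# Typical usage for a CRAN package (commented out to avoid a download here):
# install_if_missing("readxl")
# library(readxl)   # attach the package for the current session
```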
30
Reading excel files in R
To read Excel files in R, we use the readxl package.
The openxlsx package includes the ability to write to XLSX files.
Another method for installing packages in R
31
Reading excel files in R
Reading the excel file by skipping first two rows
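A sketch of skipping the first two rows, assuming the readxl and openxlsx packages are installed. The workbook is created in a temporary file so the example is self-contained; its contents and the variable names are hypothetical.

```r
library(readxl)
library(openxlsx)

# Build a small workbook whose first two rows are notes rather than data.
scores <- data.frame(id = 1:3, score = c(90, 75, 88))
path <- tempfile(fileext = ".xlsx")

wb <- createWorkbook()
addWorksheet(wb, "Sheet1")
writeData(wb, "Sheet1", "Demo report", startRow = 1)
writeData(wb, "Sheet1", "Generated for illustration", startRow = 2)
writeData(wb, "Sheet1", scores, startRow = 3)   # header lands in row 3
saveWorkbook(wb, path, overwrite = TRUE)

# skip = 2 ignores the two note rows, so row 3 is read as the header.
dat <- read_excel(path, skip = 2)
print(dat)
```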
32
Reading excel files in R
33
Reading csv files in R
• read.csv() function is used to read "comma separated value" files.
• It imports data in the form of a data frame.
• The read.csv() function also accepts a number of optional arguments that we can use to modify the import
procedure.
• We can choose to treat the first row as column names, select the delimiter character, and more.
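The defaults and a few of the optional arguments might be used as follows. The two files are written to temporary paths so the sketch runs on its own; the file contents are made up.

```r
# Write two small files to a temporary location.
comma_path <- tempfile(fileext = ".csv")
writeLines(c("name,age", "Asha,21", "Ravi,23"), comma_path)

semi_path <- tempfile(fileext = ".csv")
writeLines(c("Asha;21", "Ravi;23"), semi_path)

# Defaults: the first row is the header and fields are separated by commas.
people <- read.csv(comma_path)

# Optional arguments: no header row, semicolon delimiter, explicit column names.
people_semi <- read.csv(semi_path, header = FALSE, sep = ";",
                        col.names = c("name", "age"))

str(people)        # the result is a data frame
str(people_semi)
```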
34
Data Manipulation in R
https://www.geeksforgeeks.org/r-language/data-manipulation-in-r-with-dplyr-package/
35
tidyr in R
Original dataframe
36
tidyr in R
• full_seq() function: It fills the missing values in a vector which should have
been observed but weren’t.
• fill() function: Used to fill in the missing values in selected columns using the previous
entry.
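A minimal sketch of both functions, assuming the tidyr package is installed. The sales data frame is made up for illustration.

```r
library(tidyr)

# full_seq(): complete a sequence that has gaps, using a step (period) of 1
observed <- c(1, 2, 4, 5, 10)
complete <- full_seq(observed, period = 1)   # 1, 2, 3, ..., 10

# fill(): carry the previous non-missing entry forward in the selected column
sales <- data.frame(
  quarter = c("Q1", "Q2", "Q3", "Q4"),
  region  = c("North", NA, NA, "South")
)
filled <- fill(sales, region)   # default direction is "down"

print(complete)
print(filled)
```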
37
tidyr in R
• drop_na() function: This function drops rows containing missing values.
library(tidyr) # provides drop_na(), replace_na(), tibble() and %>%
# create a tibble df with missing values
df <- tibble(S.No = c(1:10), Name = c('John', 'Smith', 'Peter', 'Luke', 'King', rep(NA, 5)))
# print df tibble
df # Output (i)
# use drop_na() to drop the rows where Name is missing
df %>% drop_na(Name) # Output (ii)
• replace_na() function: It replaces missing values.
# create a data frame df with missing values
df <- data.frame(S.No = c(1:10), Name = c('John', 'Smith', 'Peter', 'Luke', 'King', rep(NA, 5)))
df # Output (i)
# use replace_na() to replace missing values (NA)
df %>% replace_na(list(Name = 'Henry')) # Output (ii)
38
Descriptive statistics in R
Descriptive statistics are techniques used to summarize and find the key characteristics of a
dataset.
A dataset can be summarized in two ways:
1. Quantitative Descriptive Statistics
2. Graphical Descriptive Statistics
Quantitative Analysis:
• Mean
• Median
• Mode
• Variance
• Standard Deviation
• Quartiles
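The quantitative measures listed above can be computed with base R, here on the Sepal.Length column of the built-in iris dataset. R has no built-in function for the statistical mode, so stat_mode is a hypothetical helper written for this sketch.

```r
x <- iris$Sepal.Length   # iris ships with base R

x_mean      <- mean(x)
x_median    <- median(x)
x_var       <- var(x)        # sample variance
x_sd        <- sd(x)         # sample standard deviation
x_min       <- min(x)
x_max       <- max(x)
x_quartiles <- quantile(x)   # 0%, 25%, 50%, 75%, 100%

# Hypothetical helper: the most frequent value(s) in a numeric vector
stat_mode <- function(v) {
  counts <- table(v)
  as.numeric(names(counts)[counts == max(counts)])
}
x_mode <- stat_mode(x)

print(summary(x))   # min, quartiles, median, mean and max in one call
```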
39
Quantitative Descriptive Statistics
Minimum value in Sepal Length column
40
Quantitative Descriptive Statistics cont’d.
Descriptive summary of
each species of the
dataset
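One possible way to summarize each species with base R is sketched below. The slide's own code is not recoverable from the text, so the choice of aggregate(), tapply() and by() is an assumption.

```r
# Mean of every numeric column, one row per species
species_means <- aggregate(. ~ Species, data = iris, FUN = mean)
print(species_means)

# Minimum sepal length within each species
species_min <- tapply(iris$Sepal.Length, iris$Species, min)
print(species_min)

# Full descriptive summary of the four measurements for each species
species_summary <- by(iris[, 1:4], iris$Species, summary)
print(species_summary)
```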
41
Graphical Descriptive Statistics
1. Histogram
• Visualize data distribution, check for skewness and outliers.
2. Boxplot
• Compare distributions, detect outliers and analyze spread.
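Both plots are available in base R graphics. The sketch below uses the iris dataset; the titles and axis labels are illustrative choices.

```r
# Histogram: distribution of sepal length
h <- hist(iris$Sepal.Length,
          main = "Sepal length",
          xlab = "Sepal length (cm)")

# Boxplot: compare the distribution across species and flag outliers
b <- boxplot(Sepal.Length ~ Species, data = iris,
             main = "Sepal length by species",
             ylab = "Sepal length (cm)")

# Both functions also return the numbers behind the plot
print(h$counts)   # observations per histogram bin
print(b$out)      # values drawn as outliers
```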
42
Graphical Descriptive Statistics cont’d.
43
Graphical Descriptive Statistics cont’d.
44
ggplot2 in R
• ggplot2 is a popular data visualization package in the R programming language.
• ggplot2 is a versatile R graphics library that allows for customization of graphics by adding layers.
• It simplifies creating ready-to-publish charts and includes themes for personalizing charts, allowing
for changes in colors, line types, typefaces, and alignment.
Scales scale_: set of values for each aesthetic mapping in the plot
45
ggplot2 in R
• Plotting a scatter plot using ggplot(). To add the geom layer, the addition (+) operator is used.
ggplot(iris, aes(x=Sepal.Length, y=Petal.Length)) + geom_point()
• Plotting a scatter plot using ggplot(). A colour parameter is applied to the
aesthetics. The colour is set to Species.
ggplot(iris, aes(x=Sepal.Length, y=Petal.Length, col=Species)) + geom_point()
46
ggplot2 in R
• Plotting a scatter plot using ggplot(). Different species are denoted by different
shapes.
ggplot(iris, aes(x=Sepal.Length, y=Petal.Length, col=Species, shape=Species)) + geom_point()
• Plotting a scatter plot using ggplot(). A smoothed curve is made to pass through
each set.
ggplot(iris, aes(x=Sepal.Length, y=Petal.Length, col=Species)) + geom_point() + geom_smooth(se = FALSE)
47
ggplot2 in R
• Plotting a bar plot using ggplot().
ggplot(mtcars, aes(x = gear)) + geom_bar()
• A scatter plot of horsepower vs mpg is made, and stat_summary() is then used to draw the mean.
# fun replaces the deprecated fun.y argument (ggplot2 3.3.0 and later)
ggplot(mtcars, aes(hp, mpg)) + geom_point(color = "red") + stat_summary(fun = "mean", geom = "line", linetype = "dotted")
48
ggplot2 in R
• Plotting a histogram using ggplot() with mpg as the x axis.
ggplot(mtcars, aes(x=mpg)) + geom_histogram()
• Plotting a boxplot using ggplot(). cyl is a categorical variable with 3 classes, and
one boxplot is generated for each class.
ggplot(mtcars, aes(x=as.factor(cyl), y=mpg)) + geom_boxplot()
49
ggplot2 in R
• Plotting a boxplot with colours.
• Plotting a violin plot using ggplot().
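The code for these two plots did not survive in the text, so the following is only a sketch of how they might be drawn, assuming ggplot2 is installed and reusing the mtcars columns from the earlier boxplot. The object names are hypothetical.

```r
library(ggplot2)

# Boxplot with colours: map fill to the same categorical variable as x
p_box <- ggplot(mtcars, aes(x = as.factor(cyl), y = mpg, fill = as.factor(cyl))) +
  geom_boxplot() +
  labs(x = "cyl", fill = "cyl")

# Violin plot: same mapping, different geom
p_violin <- ggplot(mtcars, aes(x = as.factor(cyl), y = mpg)) +
  geom_violin() +
  labs(x = "cyl")

print(p_box)
print(p_violin)
```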
50
ggplot2 in R
• Plotting a piechart using ggplot() with the 3 classes of ‘cyl’
# factor(cyl) treats cyl as 3 discrete classes rather than a continuous scale
ggplot(mtcars, aes(x="", y=mpg, fill=factor(cyl))) + geom_bar(stat="identity", width=1) + coord_polar("y", start=0)
• 2D density contour plot between horsepower and mpg generated using ggplot()
ggplot(mtcars, aes(mpg, hp)) + geom_density_2d_filled(show.legend = FALSE) + coord_cartesian(expand = FALSE) + labs(x = "mpg")
51
ggplot2 in R
library(GGally)
ggpairs(mtcars,columns = 1:4,aes(color = cyl,
alpha = 0.5))
52
ggplot2 in R
53
Conclusions:
R provides a simple and consistent syntax for data manipulation, statistical analysis, and model
building.
Packages like ggplot2 and plotly enable powerful graphical visualizations, making data exploration
intuitive and visually appealing.
Data handling is seamless with tools like dplyr, tidyr, and data.table for preprocessing and
wrangling.
Machine learning workflows in R are supported through packages such as caret, mlr3, and
tidymodels.
R integrates statistical rigor with ML algorithms, offering regression, classification, clustering, and
ensemble methods.
Visualization and ML models can be combined for effective interpretation and decision-making.
R remains a versatile environment for both beginners and advanced users, bridging the gap
between data science, statistics, and AI/ML.
54
Thank You!
55