DAR
5. Advanced R programming
5.1 Mean, Mode, Median
Mean:
It is calculated by taking the sum of the values and dividing by the number of values in a data
series.
The function mean() is used to calculate this in R.
Syntax : mean(x, trim = 0, na.rm = FALSE, ...)
Following is the description of the parameters used −
x is the input vector.
trim is the fraction (0 to 0.5) of observations to drop from each end of the sorted vector before the mean is computed.
na.rm is used to remove the missing values from the input vector.
v1<-c(1,4,5,7,6)
mean(v1)
[1] 4.6
Applying the na.rm option:
v1<-c(4,5,2,6,8,NA)
mean(v1,na.rm = TRUE)
[1] 5
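The trim parameter is described above but not demonstrated. The following is a minimal sketch of its effect, assuming a made-up vector x with one extreme value (the data and variable names are illustrative, not from the notes):

```r
# Hypothetical data: nine moderate values and one extreme value
x <- c(2, 4, 5, 6, 7, 8, 9, 10, 12, 100)

# Ordinary mean uses all 10 values
plain_mean <- mean(x)

# trim = 0.1 drops 10% of the observations (here 1 value) from each end
# of the sorted vector, i.e. 2 and 100, before averaging
trimmed_mean <- mean(x, trim = 0.1)

print(plain_mean)
print(trimmed_mean)
```

The trimmed mean is much less affected by the extreme value 100 than the ordinary mean.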
Median:
The middlemost value of a sorted data series is called the median.
The median() function is used in R to calculate this value.
Syntax: median(x, na.rm = FALSE)
Following is the description of the parameters used −
x is the input vector.
na.rm is used to remove the missing values from the input vector.
-Median of a vector with an even number of elements
v<-c(1,2,5,3,6,8) # after sorting, the two middle values (3 and 5) are added and then divided by 2
median(v)
[1] 4
-Median of a vector with an odd number of elements
v<-c(1,2,5,3,6) # the vector is first arranged in ascending order, then the middle value is selected
median(v)
[1] 3
Mode:
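The mode is the value that has the highest number of occurrences in a set of data. Unlike mean and
median, the mode can be computed for both numeric and character data.
R does not have a standard built-in function to calculate the statistical mode; the built-in mode() function returns the storage mode of an object instead.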
v1<-c(1,2,3,3,5,4,5)
table(v1)
v1
1 2 3 4 5
1 1 2 1 2
tbl<-table(v1)
tbl
v1
1 2 3 4 5
1 1 2 1 2
names(tbl) [which(tbl==max(tbl))]
[1] "3" "5"
Another method is to find the mode using a predefined function, mfv() (most frequent value), from the modeest package.
First install and load the package:
- install.packages("modeest")
- library(modeest)
v1<-c(1,2,3,3,5,4,5)
m1<-mfv(v1)
m1
[1] 3 5
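Since the mode also applies to character data, the table() approach can be wrapped in a small helper. This is only a sketch: get_mode is a hypothetical name chosen here, not a base R or modeest function, and the colours vector is made-up data.

```r
# get_mode is a hypothetical helper, not a built-in R function
get_mode <- function(x) {
  tbl <- table(x)                         # frequency of each distinct value
  names(tbl)[which(tbl == max(tbl))]      # every value with the highest count
}

# Character data (made-up example)
colours <- c("red", "blue", "red", "green", "blue", "red")
print(get_mode(colours))

# Numeric data from the notes; the result comes back as character strings
print(get_mode(c(1, 2, 3, 3, 5, 4, 5)))
```

Because the result is taken from the names of the table, numeric modes are returned as strings; as.numeric() can convert them back if needed.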
Using mean and median with another R data structure
classA # classA is a list
$rno
[1] 1 2 3 4
$name
[1] "A" "B" "C" "D"
$marks
[1] 30 10 40 60
$city
[1] "M" "P" "N" "A"
mean(classA$marks)
[1] 35
median(classA$marks)
[1] 35
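The notes print classA without showing how it was built. One way it could have been created, assuming it is an ordinary named list with the values shown above, is sketched here:

```r
# Rebuild the list shown above (the construction itself is an assumption)
classA <- list(
  rno   = c(1, 2, 3, 4),
  name  = c("A", "B", "C", "D"),
  marks = c(30, 10, 40, 60),
  city  = c("M", "P", "N", "A")
)

# mean() and median() operate on the numeric component extracted with $
avg_marks <- mean(classA$marks)
mid_marks <- median(classA$marks)

print(avg_marks)
print(mid_marks)
```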
5.2 Regression in R
Regression is a supervised machine learning technique.
It is a widely used statistical tool for establishing the relationship between two or more variables.
The value of the output variable Y is predicted based on one or more input variables X.
The output variable is also called the response variable or dependent variable.
The input variable is also called the predictor variable or independent variable.
Linear regression establishes a linear relationship between the predictor and response variables, and this mathematical
formula can be used to predict/estimate the value of the response variable when only the predictors are
known.
There are two types of regression techniques:
1. Linear Regression
2. Multiple Regression
1. Linear Regression:
Here only one predictor variable is available.
Formula for linear regression: Y = β1 + β2X + ε
-Y is the response variable
-X is the predictor variable
-β1 is the intercept, which represents the expected or predicted value of Y when X is equal to
zero.
-β2 is the slope, representing the change in Y for a one-unit change in X.
-ε is the error term (estimated by the residuals), which accounts for the variability not explained by the model.
In R, the lm() function is used to fit a linear regression.
Syntax: lm(response_variable ~ predictor_variable, data = data_source)
predictheight = lm(height~age, data=heightage)
Here the lm function establishes a linear relationship between age and height.
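To connect the formula with lm(), the intercept and slope can also be computed by hand with the usual least-squares expressions. The sketch below assumes a small made-up data frame d whose points lie exactly on the line y = 2 + 3x, so the expected coefficients are known in advance:

```r
# Hypothetical data lying exactly on y = 2 + 3x
d <- data.frame(x = c(1, 2, 3, 4, 5))
d$y <- 2 + 3 * d$x

# Fit with lm()
fit <- lm(y ~ x, data = d)

# Least-squares estimates computed by hand
slope <- cov(d$x, d$y) / var(d$x)            # estimate of the slope β2
intercept <- mean(d$y) - slope * mean(d$x)   # estimate of the intercept β1

print(coef(fit))
print(c(intercept, slope))
```

Both approaches give an intercept of 2 and a slope of 3 for this data.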
2. Multiple Regression:
Here more than one predictor variable is available.
In R, the lm() function is also used to fit a multiple regression. E.g.:
gross_sal = lm(gross~basic+hra+ta, data=salary)
Here the lm function establishes a linear relationship between gross and the predictors
basic, hra and ta.
The predict() function is used to predict the value of the response variable.
x = data.frame(age=c(10,12,13))
print(predict(predictheight, x))
Linear Regression Example:
First create a dummy data frame on which to train the model:
demo
age height
1 1 75.0
2 2 84.5
3 3 93.9
4 4 101.6
5 5 108.4
6 6 114.6
7 7 120.6
8 8 126.4
9 9 132.2
10 10 138.3
11 11 142.0
12 12 148.0
13 13 150.0
Then apply the regression formula to the variables of that data frame to fit the
model:
predictheight<-lm(height~age,data=demo)
predictheight
Call:
lm(formula = height ~ age, data = demo)
Coefficients:
(Intercept) age
74.677 6.205
Now supply the input data in the form of a data frame and predict
the value of the output variable for those inputs using the predict() function:
x<-data.frame(age=c(3,17,18))
predict(predictheight,x)
1 2 3
93.29341 180.17033 186.37582
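The example above can be reproduced end to end. The sketch below assumes demo was built with data.frame() from the values printed earlier, and checks that predict() gives the same result as applying intercept + slope * age by hand:

```r
# Rebuild the demo data frame from the values shown above
demo <- data.frame(
  age = 1:13,
  height = c(75.0, 84.5, 93.9, 101.6, 108.4, 114.6, 120.6,
             126.4, 132.2, 138.3, 142.0, 148.0, 150.0)
)

predictheight <- lm(height ~ age, data = demo)
b <- coef(predictheight)              # named vector: (Intercept), age

new_ages <- c(3, 17, 18)

# Prediction by hand: intercept + slope * age
by_hand <- unname(b["(Intercept)"] + b["age"] * new_ages)

# Prediction with predict()
by_predict <- unname(predict(predictheight, data.frame(age = new_ages)))

print(by_hand)
print(by_predict)
```

Note that ages 17 and 18 lie outside the observed range of 1 to 13, so those two predictions are extrapolations of the fitted line.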
Multiple Regression Example:
multiple
basic hra da gross
1 5000 1000 4000 10000
2 7000 1400 5600 14000
3 10000 2000 8000 20000
4 12000 3000 9600 24600
5 15000 3000 13500 31500
6 17000 3400 15300 35700
7 19000 3800 17100 39900
8 20000 4000 18000 42000
9 22000 6600 20900 49500
10 25000 7500 23750 56250
11 28000 8400 26600 63000
12 30000 9000 28500 67500
salary<-lm(gross~basic+hra+da,data=multiple)
salary
Call:
lm(formula = gross ~ basic + hra + da, data = multiple)
Coefficients:
(Intercept) basic hra da
1.68e-11 1.00e+00 1.00e+00 1.00e+00
sal<-data.frame(basic=26000,hra=8000,da=24000)
print(predict(salary,sal))
1
58000
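The coefficients of 1 for basic, hra and da, with an intercept that is effectively zero, arise because in this data gross is exactly basic + hra + da. The sketch below rebuilds the data frame (assuming it was created with data.frame()) and checks this:

```r
# Rebuild the data frame from the values shown above
multiple <- data.frame(
  basic = c(5000, 7000, 10000, 12000, 15000, 17000,
            19000, 20000, 22000, 25000, 28000, 30000),
  hra   = c(1000, 1400, 2000, 3000, 3000, 3400,
            3800, 4000, 6600, 7500, 8400, 9000),
  da    = c(4000, 5600, 8000, 9600, 13500, 15300,
            17100, 18000, 20900, 23750, 26600, 28500),
  gross = c(10000, 14000, 20000, 24600, 31500, 35700,
            39900, 42000, 49500, 56250, 63000, 67500)
)

# gross is an exact sum of the three components in every row
exact_sum <- all(multiple$gross == multiple$basic + multiple$hra + multiple$da)
print(exact_sum)

salary <- lm(gross ~ basic + hra + da, data = multiple)

sal <- data.frame(basic = 26000, hra = 8000, da = 24000)
pred <- unname(predict(salary, sal))
print(pred)
```

With real salary data the relationship would not be exact, and the coefficients and intercept would differ from these idealised values.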
5.3 Object oriented programming in R
OOP is a programming pattern/model that is based on the concept of objects and classes.
An object is an instance of a class. It can represent any real-world entity.
A class is a template from which objects are created. It defines the behaviour of objects by using
attributes and functions.
Classes can be organised in a hierarchy using parent-child relationships.
R has 3 object-oriented systems: S3, S4 and Reference classes.
These systems differ in the way they define classes, methods and objects.
In R, methods are nothing but implementations of generic functions, which can be applied to any type of
input and produce results based on the input type.
A primary use of OOP in R is for the print, summary and plot methods.
The two most important class systems in R are S3 and S4, which play an important role in
implementing OOP concepts.
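The idea of a generic function that behaves differently depending on the input type can be sketched with S3 dispatch. Here describe is a hypothetical generic invented for illustration; it is not part of base R.

```r
# describe is a hypothetical generic function, not part of base R
describe <- function(x, ...) UseMethod("describe")

# Method chosen for numeric input
describe.numeric <- function(x, ...) {
  paste("numeric vector with mean", mean(x))
}

# Method chosen for character input
describe.character <- function(x, ...) {
  paste("character vector with", length(x), "elements")
}

# Fallback when no class-specific method exists
describe.default <- function(x, ...) {
  "no specific method for this input"
}

print(describe(c(1.5, 2.5)))
print(describe(c("a", "b", "c")))
print(describe(TRUE))
```

The same call, describe(x), runs a different function depending on the class of x, which is how print(), summary() and plot() work as well.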
5.3.1 S3 and S4 Classes
S3 Class:
An S3 class does not have a predefined/formal definition; an object belongs to the class simply because its class attribute is set.
It is used to overload generic functions such as print().
Creating classes and objects:
Syntax to create an S3 class:
s1<-list(rno=1, sname="John", city="Mumbai") # create a list
class(s1)<-"COstudent" # assign class to the list
Syntax to create a print method for the S3 class:
print.COstudent <- function(x, ...) {
  cat("student roll no:", x$rno, "\n")
  cat("student name:", x$sname, "\n")
  cat("student City:", x$city, "\n")
}
print(s1)
Function to create objects of the class:
createobj=function(sno,snm,city)
{
  obj=list(rno=sno,sname=snm,city=city)
  attr(obj,"class")="COstudent"
  obj
}
Create a new object of the class using the function:
s2=createobj(2,"jacky","Mumbai")
Add a new attribute to an object:
attr(s2,"phone")<-c(1234567890)
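A runnable sketch of the constructor and the added attribute follows. It assumes the same createobj function as above and uses inherits() and attributes() to inspect the result:

```r
# Constructor function for COstudent objects (as in the notes)
createobj <- function(sno, snm, city) {
  obj <- list(rno = sno, sname = snm, city = city)
  class(obj) <- "COstudent"
  obj
}

s2 <- createobj(2, "jacky", "Mumbai")

# Attach an extra attribute to this one object
attr(s2, "phone") <- 1234567890

print(class(s2))
print(inherits(s2, "COstudent"))   # TRUE: s2 belongs to the class
print(attr(s2, "phone"))           # the attribute is read back with attr()
print(names(attributes(s2)))       # names, class and phone
print(is.null(s2$phone))           # TRUE: an attribute is not a list element
```

An attribute set with attr() is stored alongside the object; it is not one of the list components, so s2$phone is NULL. It also applies only to s2, not to other objects of the class.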
Implementing object-oriented features such as inheritance.
S3 Class Inheritance
• When a new S3 class is derived from an old S3 class, it inherits only the methods/functions of the old class.
• Data members/attributes of objects are not inherited by the new class.
Create a new list:
s3=list(rno=3,sname="max",city="Pune")
class(s3)=c("ITstudent","COstudent") # ITstudent inherits from COstudent
• Class ITstudent can use all the methods/functions written for the COstudent class.
• Following is the new definition of the print method written for the ITstudent class:
print.ITstudent=function(x, ...)
{
  cat("Student", x$rno, "is from", x$city, "\n")
}
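The dispatch order for an object with class c("ITstudent", "COstudent") can be sketched as follows. The use of NextMethod() to hand over to the parent class method is an addition for illustration; the notes themselves only define the two print methods.

```r
# Parent class method
print.COstudent <- function(x, ...) {
  cat("student roll no:", x$rno, "\n")
  invisible(x)
}

# Child class method: prints its own line, then calls the parent method
print.ITstudent <- function(x, ...) {
  cat("Student", x$rno, "is from", x$city, "\n")
  NextMethod()      # continues with print.COstudent
  invisible(x)
}

s3 <- list(rno = 3, sname = "max", city = "Pune")
class(s3) <- c("ITstudent", "COstudent")

# print() picks print.ITstudent first because ITstudent comes first in class(s3)
out <- capture.output(print(s3))
print(out)
```

If print.ITstudent were not defined, print(s3) would fall back to print.COstudent, which is what inheriting the methods of the old class means in S3.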
S4 Class:
• An S4 class has a formal definition.
Creating classes and objects:
• Syntax to create an S4 class using setClass():
setClass("book",slots=list(bid="numeric", bname="character"))
The book class will get created with slots/members bid and bname.
• Create an S4 object of the class using new():
b1<-new("book", bid=1, bname="cyber security")
When we try to add more attributes to an object than are defined in the class structure, new() reports an error.
• Inspect the slots of a class:
getSlots("book") # displays the slots in a class with their classes/datatypes.
slotNames("book") or slotNames(b1) # displays the slots in a class without classes/datatypes.
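The slot inspection functions and the error raised for an undefined slot can be tried with the sketch below. The slot name price and the book name "networks" are made up for the demonstration; the class definition follows the notes.

```r
library(methods)

setClass("book", slots = list(bid = "numeric", bname = "character"))
b1 <- new("book", bid = 1, bname = "cyber security")

print(getSlots("book"))    # slot names with their classes
print(slotNames(b1))       # slot names only

# price is a hypothetical slot that the class does not define,
# so new() signals an error, which is caught here
result <- tryCatch(
  new("book", bid = 2, bname = "networks", price = 100),
  error = function(e) "error: undefined slot"
)
print(result)
```

This is a key difference from S3, where any component or attribute can be attached to an object without checks.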
• Change/set the value of a slot:
slot(b1,"bid")<-2 or b1@bid<-2
• Create a method/function for the S4 class:
setMethod("print", "book", function(x, ...) # x is the object
{
  cat("bookid:", x@bid, "\n")
  cat("bookname:", x@bname, "\n")
})
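As an alternative sketch, a method can be written for show() rather than print(). This is a substitution made here for illustration, not the approach in the notes: show() is the function R calls when an S4 object is auto-printed at the console, and its single argument is named object.

```r
library(methods)

setClass("book", slots = list(bid = "numeric", bname = "character"))

# show() takes one argument, which must be called object
setMethod("show", "book", function(object) {
  cat("bookid:", object@bid, "\n")
  cat("bookname:", object@bname, "\n")
})

b1 <- new("book", bid = 1, bname = "cyber security")
b1@bid <- 2          # change a slot value with @

b1                   # auto-printing at the console calls the show() method

out <- capture.output(show(b1))
```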
Implementing object-oriented features such as inheritance.
S4 Class Inheritance
• When a new S4 class is derived from an old S4 class, it inherits not only the methods/functions of the old class
but also its data members/attributes (slots).
• Create a new S4 class and inherit it from the book class:
setClass("author",slots=list(authorid="numeric",authorname="character"),contains="book")
• With the above R command a new S4 class "author" will get created with members authorid and
authorname.
• The "contains" argument is used to specify inheritance, i.e. class author is inherited from class book.
A1<-new("author", bid=1, bname="cyber security", authorid=1, authorname="Anand Shinde")
• The above R command is used to create an object of class author, with all members inherited from the base
class.
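The inheritance of both slots and methods can be checked with a short sketch. The generic bookLabel is a hypothetical name introduced only for this illustration; the two class definitions follow the notes.

```r
library(methods)

setClass("book", slots = list(bid = "numeric", bname = "character"))
setClass("author",
         slots = list(authorid = "numeric", authorname = "character"),
         contains = "book")

# bookLabel is a hypothetical generic with a method written only for book
setGeneric("bookLabel", function(obj) standardGeneric("bookLabel"))
setMethod("bookLabel", "book", function(obj) paste(obj@bid, obj@bname))

a1 <- new("author", bid = 1, bname = "cyber security",
          authorid = 1, authorname = "Anand Shinde")

print(is(a1, "book"))     # TRUE: an author object is also a book
print(slotNames(a1))      # own slots plus the slots inherited from book
print(bookLabel(a1))      # the method written for book works on author
```

No bookLabel method is defined for author, so the call is dispatched to the method of the parent class book.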