xtabs()…
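• The xtabs() function creates contingency
tables (cross-tabulated counts) from a formula: the
classifying variables go on the right of ~, and an
optional column of counts to sum goes on the left.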
• Use xtabs() when you want to numerically
study the distribution of one categorical
variable, or the relationship between two
categorical variables.
• Categorical variables are also called “factor”
variables in R.
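The frequency-weighted form mentioned above can be sketched as follows. This is a minimal illustration, assuming a small pre-aggregated data frame; the name agg, its columns, and its counts are hypothetical and made up for this sketch. Putting a count column on the left-hand side of the formula makes xtabs() sum those counts instead of counting rows.

```r
# Hypothetical pre-aggregated data: one row per combination, with a count column
agg <- data.frame(
  Group   = c("A", "A", "B", "B"),
  Outcome = c("Yes", "No", "Yes", "No"),
  Freq    = c(12, 8, 5, 15)
)

# Left-hand side = counts to sum; right-hand side = classifying variables
tab <- xtabs(Freq ~ Group + Outcome, data = agg)
print(tab)
```

With an empty left-hand side, as in xtabs(~Group + Outcome, data = agg), each row would simply be counted once.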
Example
• xtabs() with One Categorical Variable
> Data1 <- data.frame(Reference = c("KRXH", "KRPT", "FHRA", "CZKK", "CQTN", "PZXW", "SZRZ",
"RMZE", "STNX", "TMDW"), Status = c("Accepted", "Accepted", "Rejected", "Accepted", "Rejected",
"Accepted", "Rejected", "Rejected", "Accepted", "Accepted"), Gender = c("Female", "Male", "Male",
"Female", "Female", "Female", "Male", "Female", "Female", "Female"), Test = c("Test1", "Test1", "Test2",
"Test3", "Test1", "Test4", "Test4", "Test2", "Test3", "Test1"), NewOrFollowUp = c("New", "New", "New",
"New", "New", "Follow-up", "New", "New", "New", "New"))
> Data1
Reference Status Gender Test NewOrFollowUp
1 KRXH Accepted Female Test1 New
2 KRPT Accepted Male Test1 New
3 FHRA Rejected Male Test2 New
4 CZKK Accepted Female Test3 New
5 CQTN Rejected Female Test1 New
6 PZXW Accepted Female Test4 Follow-up
7 SZRZ Rejected Male Test4 New
8 RMZE Rejected Female Test2 New
9 STNX Accepted Female Test3 New
10 TMDW Accepted Female Test1 New
> xtabs(~Status, data=Data1)
Status
Accepted Rejected
6 4
> xtabs(~Gender, data=Data1)
Gender
Female Male
7 3
> xtabs(~Test, data=Data1)
Test
Test1 Test2 Test3 Test4
4 2 2 2
Example
• xtabs() with Two Categorical Variables
> xtabs(~Reference + Status, data=Data1)
         Status
Reference Accepted Rejected
     CQTN        0        1
     CZKK        1        0
     FHRA        0        1
     KRPT        1        0
     KRXH        1        0
     PZXW        1        0
     RMZE        0        1
     STNX        1        0
     SZRZ        0        1
     TMDW        1        0
> xtabs(~Reference + Gender, data=Data1)
         Gender
Reference Female Male
     CQTN      1    0
     CZKK      1    0
     FHRA      0    1
     KRPT      0    1
     KRXH      1    0
     PZXW      1    0
     RMZE      1    0
     STNX      1    0
     SZRZ      0    1
     TMDW      1    0
> xtabs(~Gender + Status, data=Data1)
        Status
Gender   Accepted Rejected
  Female        5        2
  Male          1        2
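Row and column totals are often wanted alongside a two-way table. A minimal sketch using base R's addmargins(), assuming only the Status and Gender columns of Data1 (copied from the slides) are needed; the names tab2 and with_totals are introduced here for illustration.

```r
# Status and Gender columns copied from Data1 in the slides
Data1 <- data.frame(
  Status = c("Accepted", "Accepted", "Rejected", "Accepted", "Rejected",
             "Accepted", "Rejected", "Rejected", "Accepted", "Accepted"),
  Gender = c("Female", "Male", "Male", "Female", "Female",
             "Female", "Male", "Female", "Female", "Female")
)

tab2 <- xtabs(~Gender + Status, data = Data1)

# addmargins() appends a "Sum" row and a "Sum" column to the table
with_totals <- addmargins(tab2)
print(with_totals)
```

The "Sum" column should reproduce the one-variable counts for Gender (7 and 3) and the "Sum" row those for Status (6 and 4).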
Example
• xtabs() with Three Categorical Variables
> xtabs(~Status + Gender + Test, data=Data1)
, , Test = Test1

          Gender
Status     Female Male
  Accepted      2    1
  Rejected      1    0

, , Test = Test2

          Gender
Status     Female Male
  Accepted      0    0
  Rejected      1    1

, , Test = Test3

          Gender
Status     Female Male
  Accepted      2    0
  Rejected      0    0

, , Test = Test4

          Gender
Status     Female Male
  Accepted      1    0
  Rejected      0    1

> ftable(xtabs(~Status + Gender + Test, data=Data1))
                Test Test1 Test2 Test3 Test4
Status   Gender
Accepted Female          2     0     2     1
         Male            1     0     0     0
Rejected Female          1     1     0     0
         Male            0     1     0     1
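A three-way table can be collapsed back to a two-way table by summing over one variable. A minimal sketch using base R's margin.table(), with the three relevant columns of Data1 copied from the slides; the names tab3 and collapsed are introduced here for illustration.

```r
# Status, Gender and Test columns copied from Data1 in the slides
Data1 <- data.frame(
  Status = c("Accepted", "Accepted", "Rejected", "Accepted", "Rejected",
             "Accepted", "Rejected", "Rejected", "Accepted", "Accepted"),
  Gender = c("Female", "Male", "Male", "Female", "Female",
             "Female", "Male", "Female", "Female", "Female"),
  Test   = c("Test1", "Test1", "Test2", "Test3", "Test1",
             "Test4", "Test4", "Test2", "Test3", "Test1")
)

tab3 <- xtabs(~Status + Gender + Test, data = Data1)

# Keep dimensions 1 (Status) and 2 (Gender); sum over dimension 3 (Test)
collapsed <- margin.table(tab3, c(1, 2))
print(collapsed)

# ftable() can also be told which variable should label the rows
print(ftable(tab3, row.vars = "Test"))
```

The collapsed table holds the same counts as xtabs(~Status + Gender, data = Data1).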
Example
• xtabs() with Three Categorical Variables
> ftable(xtabs(~Reference + Gender + Test, data=Data1))
                 Test Test1 Test2 Test3 Test4
Reference Gender
CQTN      Female          1     0     0     0
          Male            0     0     0     0
CZKK      Female          0     0     1     0
          Male            0     0     0     0
FHRA      Female          0     0     0     0
          Male            0     1     0     0
KRPT      Female          0     0     0     0
          Male            1     0     0     0
KRXH      Female          1     0     0     0
          Male            0     0     0     0
PZXW      Female          0     0     0     1
          Male            0     0     0     0
RMZE      Female          0     1     0     0
          Male            0     0     0     0
STNX      Female          0     0     1     0
          Male            0     0     0     0
SZRZ      Female          0     0     0     0
          Male            0     0     0     1
TMDW      Female          1     0     0     0
          Male            0     0     0     0
Table Proportions…
The margins of a table (the row totals or the column
totals) are often useful for calculating proportions.
> counts <- matrix(c(2,2,4,3,1,4,2,0,1,5,3,3),nrow=4)
> counts
[,1] [,2] [,3]
[1,] 2 1 1
[2,] 2 4 5
[3,] 4 2 3
[4,] 3 0 3
• Note: margin number 1 refers to the rows, so each
count is divided by its row total.
> prop.table(counts,1)
[,1] [,2] [,3]
[1,] 0.5000000 0.2500000 0.2500000
[2,] 0.1818182 0.3636364 0.4545455
[3,] 0.4444444 0.2222222 0.3333333
[4,] 0.5000000 0.0000000 0.5000000
• The columns are the second margin, so to express
the counts as proportions of the relevant column total,
pass 2 as the margin number.
> prop.table(counts,2)
[,1] [,2] [,3]
[1,] 0.1818182 0.1428571 0.08333333
[2,] 0.1818182 0.5714286 0.41666667
[3,] 0.3636364 0.2857143 0.25000000
[4,] 0.2727273 0.0000000 0.25000000
To check that the column proportions sum to
one, use colSums( )
> colSums(prop.table(counts,2))
[1] 1 1 1
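What prop.table() does for each margin can be reproduced by hand, which makes the margin numbers easier to remember. A minimal sketch using the same counts matrix; the names row_props and col_props are introduced here for illustration.

```r
counts <- matrix(c(2, 2, 4, 3, 1, 4, 2, 0, 1, 5, 3, 3), nrow = 4)

# Margin 1: divide every entry by the total of its row.
# rowSums(counts) has one value per row and is recycled down each column.
row_props <- counts / rowSums(counts)

# Margin 2: divide every entry by the total of its column.
# sweep() applies "/" along dimension 2 using the column totals.
col_props <- sweep(counts, 2, colSums(counts), "/")

print(all.equal(row_props, prop.table(counts, 1)))
print(all.equal(col_props, prop.table(counts, 2)))
```

Both comparisons should print TRUE.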
• If you want the proportions expressed as a fraction of the
grand total sum(counts), then simply omit the margin
number
> prop.table(counts)
[,1] [,2] [,3]
[1,] 0.06666667 0.03333333 0.03333333
[2,] 0.06666667 0.13333333 0.16666667
[3,] 0.13333333 0.06666667 0.10000000
[4,] 0.10000000 0.00000000 0.10000000
> sum(prop.table(counts))
[1] 1
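prop.table() also accepts a table built by xtabs(), which ties the two topics together. A minimal sketch, assuming the Status and Gender columns of Data1 from the earlier slides; the names tab2 and row_pct are introduced here for illustration.

```r
# Status and Gender columns copied from Data1 in the slides
Data1 <- data.frame(
  Status = c("Accepted", "Accepted", "Rejected", "Accepted", "Rejected",
             "Accepted", "Rejected", "Rejected", "Accepted", "Accepted"),
  Gender = c("Female", "Male", "Male", "Female", "Female",
             "Female", "Male", "Female", "Female", "Female")
)

tab2 <- xtabs(~Gender + Status, data = Data1)

# Row percentages: share of Accepted/Rejected within each Gender
row_pct <- round(100 * prop.table(tab2, 1), 1)
print(row_pct)
```

Here 5 of the 7 female applicants are accepted (about 71.4%) against 1 of the 3 male applicants (about 33.3%).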
Covariance and Correlation…
1. Covariance
• Covariance can be measured
using the cov() function.
• Covariance is a statistical measure of the
direction of the linear relationship between
two data vectors.
• If the covariance is positive, it denotes a positive
relationship between the two variables; if it is
negative, it denotes a negative relationship.
$$\operatorname{cov}_{xy} = \frac{\sum (x - \bar{x})(y - \bar{y})}{n - 1}$$
x represents the x data vector
y represents the y data vector
$\bar{x}$ represents the mean of the x data vector
$\bar{y}$ represents the mean of the y data vector
n represents the total number of observations
Syntax:
cov(x, y, method)
where,
• x and y represent the data vectors
• method names the method used to compute the covariance: "pearson"
(the default), "kendall", or "spearman".
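The method argument can be sketched as follows, using the x and y values from the worked example in this section. The variable names cov_default, cov_pearson, and cov_spearman are introduced here for illustration.

```r
x <- c(2.1, 2.5, 3.6, 4.0)
y <- c(8, 10, 12, 14)

cov_default  <- cov(x, y)                       # "pearson" is used when method is omitted
cov_pearson  <- cov(x, y, method = "pearson")   # same value as the default
cov_spearman <- cov(x, y, method = "spearman")  # covariance of the ranks of x and y

print(c(default = cov_default, pearson = cov_pearson, spearman = cov_spearman))
```

The Spearman value is the ordinary covariance applied to rank(x) and rank(y), so it reflects only the ordering of the data, not the raw distances.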
Example
Q: Calculate the covariance for the following data set:
x: 2.1, 2.5, 3.6, 4.0 (mean = 3.05)
y: 8, 10, 12, 14 (mean = 11)
Substitute the values into the formula and solve (μ and ν are the means of X and Y):
Cov(X,Y) = Σ((X-μ)(Y-ν)) / (n-1)
= [(2.1-3.05)(8-11) + (2.5-3.05)(10-11) + (3.6-3.05)(12-11) + (4.0-3.05)(14-11)] / (4-1)
= [(-0.95)(-3) + (-0.55)(-1) + (0.55)(1) + (0.95)(3)] / 3
= (2.85 + 0.55 + 0.55 + 2.85) / 3
= 6.8/3
= 2.267
In R…
> s<-c(2.1, 2.5, 3.6, 4.0)
> t<-c( 8, 10, 12, 14 )
> cov(s,t)
[1] 2.266667
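The hand calculation can be mirrored step by step in R to confirm that cov() implements the sample covariance formula. A minimal sketch; the names x, y, n, and cov_manual are introduced here for illustration.

```r
x <- c(2.1, 2.5, 3.6, 4.0)
y <- c(8, 10, 12, 14)
n <- length(x)

# Sum of products of deviations from the means, divided by n - 1
cov_manual <- sum((x - mean(x)) * (y - mean(y))) / (n - 1)

print(cov_manual)
print(all.equal(cov_manual, cov(x, y)))
```

The second line should print TRUE, since both expressions evaluate 6.8/3.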
2. Correlation
• The cor() function in R computes the correlation
coefficient.
• Correlation is a statistical measure, built on the
covariance, of how strongly two vectors are related.
• A correlation always lies between -1 and 1.
• A correlation of -1 denotes a perfectly negative (linear)
relationship, a correlation of 0 denotes no (linear) relationship,
and a correlation of 1 denotes a perfectly positive (linear)
relationship.
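• The correlation coefficient of two variables in a data set
equals their covariance divided by the product of their
individual standard deviations.
• It is a normalized measurement of how the two are linearly
related.
$$r_{xy} = \frac{\operatorname{cov}_{xy}}{s_x \, s_y}$$
or
$$r_{xy} = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \, \sum (y - \bar{y})^2}}$$
Where,
x represents the x data vector
y represents the y data vector
$\bar{x}$ represents the mean of the x data vector
$\bar{y}$ represents the mean of the y data vector
$s_x$ and $s_y$ represent the standard deviations of x and y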
Correlation - Example
> cor(women$height,women$weight)
[1] 0.9954948
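The definition of the correlation coefficient as covariance divided by the product of the standard deviations can be checked directly on the same built-in women data set. A minimal sketch; the names h, w, and r_manual are introduced here for illustration.

```r
# women is a data set shipped with base R (package datasets)
h <- women$height
w <- women$weight

# Covariance divided by the product of the two standard deviations
r_manual <- cov(h, w) / (sd(h) * sd(w))

print(r_manual)
print(all.equal(r_manual, cor(h, w)))
```

Because the covariance is divided by both standard deviations, the result has no units and does not change if height or weight is rescaled.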
Example
A sample of 6 children was selected, and their age
in years and weight in kilograms were recorded as shown in
the following table. It is required to find the correlation
between age and weight.
Serial No   Age (years)   Weight (kg)
1           7             12
2           6             8
3           8             12
4           5             10
5           6             11
6           9             13
Example
These 2 variables are quantitative. One variable (Age) is
called the independent variable and denoted X; the other
(Weight) is called the dependent variable and denoted Y.
To find the relation between age and weight, compute the
simple correlation coefficient using the following formula:
$$r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \, \sum (y - \bar{y})^2}}$$
Example
In R…
> Age<-c(7,6,8,5,6,9)
> Weight<-c(12,8,12,10,11,13)
> cor(Age,Weight)
[1] 0.7595545
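The same result can be reached by computing the sums of deviations by hand, which shows where the value comes from. A minimal sketch using the deviation form of the Pearson coefficient; the names dx, dy, sxy, sxx, syy, and r_manual are introduced here for illustration.

```r
Age    <- c(7, 6, 8, 5, 6, 9)
Weight <- c(12, 8, 12, 10, 11, 13)

dx <- Age - mean(Age)         # deviations of Age from its mean (41/6)
dy <- Weight - mean(Weight)   # deviations of Weight from its mean (11)

sxy <- sum(dx * dy)   # sum of cross-products: 10, up to floating-point rounding
sxx <- sum(dx^2)      # sum of squared Age deviations: 65/6, about 10.83
syy <- sum(dy^2)      # sum of squared Weight deviations: 16

r_manual <- sxy / sqrt(sxx * syy)
print(r_manual)
```

That is, r = 10 / sqrt((65/6) × 16), which is approximately 0.7596 and agrees with cor(Age, Weight).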