
STT553: Categorical Data Analysis (CDA)
Lectured by Md. Kaderi Kibria, STT-HSTU

Lecture #2 Introduction to CDA

Objectives of this lecture:


After reading this unit, you should be able to
• explain the basic concepts of CDA
• describe the distribution functions of categorical data

Multinomial Distribution
Nominal and ordinal response variables have more than two possible outcomes. When the
observations are independent with the same category probabilities for each, the probability
distribution of counts in the outcome categories is the multinomial.
Suppose we have a scenario with r = 3 outcomes, with probabilities p1, p2, p3 respectively, such that p1 + p2 + p3 = 1. Suppose we have n = 7 independent trials, and let Y = (Y1, Y2, Y3) be the random vector of counts of each outcome. Suppose we define each Xi as a one-hot vector (exactly one 1, and the rest 0), so that $Y = \sum_{i=1}^{n} X_i$ (this is exactly like how adding Bernoulli indicators gives a binomial):

Now, what is the probability of the outcome with two of outcome 1, one of outcome 2, and four of outcome 3, that is, (Y1 = 2, Y2 = 1, Y3 = 4)? We get the following:

$$p_{Y_1, Y_2, Y_3}(2, 1, 4) = \frac{7!}{2!\,1!\,4!}\, p_1^{2}\, p_2^{1}\, p_3^{4} = \binom{7}{2,\,1,\,4}\, p_1^{2}\, p_2^{1}\, p_3^{4}$$

This describes the joint distribution of the random vector Y = (Y1, Y2, Y3), and its PMF should remind you of the binomial PMF. We just count the number of ways, $\binom{7}{2,1,4}$, to get these counts (the multinomial coefficient), and make sure we get each outcome that many times: $p_1^{2}\, p_2^{1}\, p_3^{4}$.
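This calculation can be checked numerically. A minimal Python sketch, assuming illustrative values p1 = 0.2, p2 = 0.3, p3 = 0.5 (the notes leave the probabilities symbolic):

```python
# Check of the worked multinomial probability above.
# The probabilities p = (0.2, 0.3, 0.5) are assumed for illustration only.
from math import factorial

from scipy.stats import multinomial

p = [0.2, 0.3, 0.5]      # assumed values for p1, p2, p3
counts = [2, 1, 4]       # (Y1 = 2, Y2 = 1, Y3 = 4), so n = 7

# Direct evaluation of 7!/(2! 1! 4!) * p1^2 * p2^1 * p3^4
coef = factorial(7) // (factorial(2) * factorial(1) * factorial(4))
by_hand = coef * p[0]**2 * p[1]**1 * p[2]**4

# The same probability from scipy's multinomial distribution
by_scipy = multinomial.pmf(counts, n=7, p=p)

print(by_hand, by_scipy)  # both are approximately 0.0788
```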

Now let us define the Multinomial Distribution more generally:


Let c denote the number of outcome categories. We denote their probabilities by (p_1, p_2, ..., p_c), where $\sum_j p_j = 1$. For n independent observations, the multinomial probability that x_1 observations fall in category 1, x_2 fall in category 2, ..., x_c fall in category c, where $\sum_j x_j = n$, equals

$$p(x_1, x_2, \ldots, x_c) = \frac{n!}{x_1!\, x_2! \cdots x_c!}\; p_1^{x_1} p_2^{x_2} \cdots p_c^{x_c}$$
The binomial distribution is the special case with c = 2 categories.

Then, the mean and variance of the multinomial distribution are

$$\mu_i = n p_i \qquad \text{and} \qquad \sigma_i^2 = n p_i (1 - p_i)$$

Then, we can specify the entire mean vector E[Y] and covariance matrix:

$$E[Y] = n\mathbf{p} = \begin{pmatrix} n p_1 \\ \vdots \\ n p_c \end{pmatrix}, \qquad \mathrm{var}(Y_i) = n p_i (1 - p_i), \qquad \mathrm{Cov}(Y_i, Y_j) = -n p_i p_j \quad (\text{for } i \neq j)$$

Proof of Multinomial Covariance. Recall that marginally, Y_i and Y_j are binomial random variables; let us decompose them into their Bernoulli trials. We use different dummy indices because we are dealing with covariances.
Let X_ik, for k = 1, ..., n, be the indicator/Bernoulli random variable of whether the k-th trial resulted in outcome i, so that $Y_i = \sum_{k=1}^{n} X_{ik}$.
Similarly, let X_jl, for l = 1, ..., n, be the indicator of whether the l-th trial resulted in outcome j, so that $Y_j = \sum_{l=1}^{n} X_{jl}$.
Before we begin, note that Cov(X_ik, X_jl) = 0 when k ≠ l, since trials k and l are different and therefore independent. Furthermore, E[X_ik X_jk] = 0, since it is not possible that both outcome i and outcome j occur at trial k.

$$\mathrm{Cov}(Y_i, Y_j) = \mathrm{Cov}\!\left(\sum_{k=1}^{n} X_{ik}, \ \sum_{l=1}^{n} X_{jl}\right) = \sum_{k=1}^{n} \sum_{l=1}^{n} \mathrm{Cov}(X_{ik}, X_{jl}) = \sum_{k=1}^{n} \mathrm{Cov}(X_{ik}, X_{jk})$$


$$= \sum_{k=1}^{n} \left( E[X_{ik} X_{jk}] - E[X_{ik}]\,E[X_{jk}] \right) = \sum_{k=1}^{n} \left( 0 - p_i p_j \right) = -n p_i p_j$$

Note that in the third step we dropped one of the sums because the indicators across different trials k and l are independent (zero covariance); hence we only need to sum the terms with k = l.
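These mean and covariance formulas can also be checked by simulation. A small sketch, again using the assumed values n = 7 and p = (0.2, 0.3, 0.5):

```python
# Simulation check of E[Y] = n*p, var(Y_i) = n*p_i*(1-p_i), Cov(Y_i, Y_j) = -n*p_i*p_j.
# n = 7 and p = (0.2, 0.3, 0.5) are illustrative values, not from the notes.
import numpy as np

rng = np.random.default_rng(0)
n, p = 7, np.array([0.2, 0.3, 0.5])

draws = rng.multinomial(n, p, size=200_000)   # each row is a vector (Y1, Y2, Y3)

print(draws.mean(axis=0))           # ~ n*p = [1.4, 2.1, 3.5]
print(np.cov(draws, rowvar=False))  # diagonal ~ n*p_i*(1-p_i), off-diagonal ~ -n*p_i*p_j
print(n * p * (1 - p))              # theoretical variances [1.12, 1.47, 1.75]
print(-n * np.outer(p, p))          # theoretical covariances (off-diagonal entries)
```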

Relationship between the Poisson and Multinomial Distributions

A Poisson model for (Y_1, Y_2, Y_3, Y_4) treats them as independent Poisson random variables, with parameters (μ_1, μ_2, μ_3, μ_4). The joint probability mass function for {Y_i} is the product of the four mass functions of the form

$$P(y_i) = \frac{\mu_i^{y_i} e^{-\mu_i}}{y_i!}, \qquad y_i = 0, 1, 2, \ldots$$

The total $n = \sum_i Y_i$ also has a Poisson distribution, with parameter $\sum_i \mu_i$.
With Poisson sampling the total count n is random rather than fixed. If we assume a Poisson model but condition on n, {Y_i} no longer have Poisson distributions, since each Y_i cannot exceed n. Given n, {Y_i} are also no longer independent, since the value of one affects the possible range for the others.
For c independent Poisson variates with $E(Y_i) = \mu_i$, the conditional probability of a set of counts {n_i} satisfying $\sum_i Y_i = n$ is

$$P\!\left(Y_1 = n_1, Y_2 = n_2, \ldots, Y_c = n_c \,\Big|\, \sum_j Y_j = n\right) = \frac{P(Y_1 = n_1, Y_2 = n_2, \ldots, Y_c = n_c)}{P\!\left(\sum_j Y_j = n\right)}$$

$$= \frac{n!}{\prod_i n_i!} \times \frac{\prod_i \mu_i^{n_i}}{\left(\sum_j \mu_j\right)^{n}} = \frac{n!}{\prod_i n_i!} \times \prod_i \left(\frac{\mu_i}{\sum_j \mu_j}\right)^{n_i} = \frac{n!}{\prod_i n_i!} \prod_i \pi_i^{n_i} \;\sim\; \mathrm{multinomial}(n, \{\pi_i\}),$$

where $\pi_i = \mu_i / \sum_j \mu_j$.
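A numerical sketch of this identity, using assumed Poisson means (μ1, μ2, μ3) = (2, 5, 3) that are not from the notes:

```python
# Conditioning independent Poisson counts on their total gives multinomial probabilities.
import numpy as np
from scipy.stats import multinomial, poisson

mu = np.array([2.0, 5.0, 3.0])   # assumed Poisson means mu_1, mu_2, mu_3
counts = np.array([1, 4, 2])     # one particular outcome, with total n = 7
n = counts.sum()

# P(Y1=1, Y2=4, Y3=2 | sum Y_j = n), computed directly from the Poisson pmfs
joint = np.prod(poisson.pmf(counts, mu))
total = poisson.pmf(n, mu.sum())  # the sum of independent Poissons is Poisson
conditional = joint / total

# Multinomial probability with pi_i = mu_i / sum_j mu_j
multi = multinomial.pmf(counts, n=n, p=mu / mu.sum())

print(conditional, multi)         # the two values agree
```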


Likelihood Function
Let x_1, x_2, ..., x_n be a random sample of size n from a population with density function f(x; θ). When the joint pdf is regarded as a function of θ, it is called the likelihood function, denoted by L(θ) and defined as

$$L(\theta) = f(x_1; \theta)\, f(x_2; \theta) \cdots f(x_n; \theta) = \prod_i f(x_i; \theta)$$

Maximum Likelihood Estimate


Let $L(\theta) = \prod_i f(x_i; \theta)$ be the likelihood function for the random variables x_1, x_2, ..., x_n. Then the value of θ that maximizes the likelihood function is called the maximum likelihood estimate, usually denoted by $\hat{\theta}$.
The MLE of θ is the solution of the likelihood equation

$$\frac{\partial L(\theta)}{\partial \theta} = 0$$

If $\hat{\theta}$ is the MLE of θ, then

$$\left.\frac{\partial L(\theta)}{\partial \theta}\right|_{\theta = \hat{\theta}} = 0 \qquad \text{and} \qquad \left.\frac{\partial^2 L(\theta)}{\partial \theta^2}\right|_{\theta = \hat{\theta}} < 0$$
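In practice the likelihood equation is often solved numerically rather than analytically. A hedged sketch, using a Poisson sample as an illustrative f(x; θ) since the notes keep f generic:

```python
# Numerical maximum likelihood: minimise the negative log-likelihood over theta.
# The Poisson model and the sample below are illustrative assumptions.
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import poisson

data = np.array([2, 0, 3, 1, 4, 2, 1])   # assumed sample x_1, ..., x_n

def neg_log_likelihood(theta):
    # -log L(theta) = -sum_i log f(x_i; theta)
    return -poisson.logpmf(data, theta).sum()

result = minimize_scalar(neg_log_likelihood, bounds=(1e-6, 20), method="bounded")
print(result.x, data.mean())   # the numerical MLE matches the analytic MLE (the sample mean)
```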

Likelihood Function and ML Estimate for Binomial Distribution


If an experiment has n trials with success probability π, then the probability mass function is

$$p(y) = \binom{n}{y} \pi^{y} (1 - \pi)^{n - y}, \qquad y = 0, 1, 2, \ldots, n$$

The binomial coefficient $\binom{n}{y}$ has no influence on where the maximum occurs with respect to π, so we ignore it. The binomial log-likelihood function becomes

$$L(\pi) = \log p(y) = \log\!\left[\pi^{y} (1 - \pi)^{n - y}\right] = y \log(\pi) + (n - y)\log(1 - \pi)$$

Differentiating with respect to π and setting the derivative to zero,

$$\frac{\partial L(\pi)}{\partial \pi} = \frac{y}{\pi} - \frac{n - y}{1 - \pi} = 0,$$

yields

$$\hat{\pi} = \frac{y}{n}$$

To calculate the variance of the MLE, compute $\frac{\partial^2 L(\pi)}{\partial \pi^2}$ and then take the expectation:


$$-E\!\left(\frac{\partial^2 L(\pi)}{\partial \pi^2}\right) = \frac{n}{\pi(1 - \pi)}$$

Thus, the asymptotic variance of $\hat{\pi}$ is

$$\mathrm{var}(\hat{\pi}) = \frac{1}{-E\!\left(\dfrac{\partial^2 L(\pi)}{\partial \pi^2}\right)} = \frac{\pi(1 - \pi)}{n}$$
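A small simulation sketch of these two results, with assumed values n = 50 and true π = 0.3:

```python
# Check that pi_hat = y/n and that its sampling variance is close to pi*(1-pi)/n.
# n = 50 and pi = 0.3 are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(1)
n, pi = 50, 0.3

y = rng.binomial(n, pi, size=100_000)   # many replicated binomial experiments
pi_hat = y / n                          # the MLE in each replicate

print(pi_hat.mean())                    # ~ 0.3 (the MLE is centred on pi)
print(pi_hat.var())                     # ~ pi*(1-pi)/n = 0.0042
print(pi * (1 - pi) / n)                # the asymptotic variance formula
```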

Estimation of Multinomial Parameters


Suppose a multinomial experiment consists of n trials and each trial can result in any of c possible outcomes y_1, y_2, ..., y_c. Further suppose that the possible outcomes occur with probabilities π_1, π_2, ..., π_c. Then the probability distribution becomes

$$p(y) = \frac{n!}{y_1!\, y_2! \cdots y_c!}\; \pi_1^{y_1} \pi_2^{y_2} \cdots \pi_c^{y_c} = \frac{n!}{\prod_i y_i!} \prod_{i=1}^{c} \pi_i^{y_i}$$

The Lagrangian with the constraint $\sum_i \pi_i = 1$ then has the following form:

$$L(\pi, \lambda) = \log L(\pi) + \lambda\!\left(1 - \sum_i \pi_i\right)$$

To find the maximum, we differentiate this function with respect to each π_i and set the derivative to zero, $\frac{y_i}{\pi_i} - \lambda = 0$, which gives

$$\pi_i = \frac{y_i}{\lambda}$$

To solve for λ, we sum both sides and make use of our initial constraint:

$$\sum_i \pi_i = \sum_i \frac{y_i}{\lambda} = 1 \quad\Rightarrow\quad \lambda = \sum_i y_i = n$$

Thus the MLE of π_i becomes

$$\hat{\pi}_i = \frac{y_i}{n}$$
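This closed-form result can be checked against a direct constrained maximization of the log-likelihood. A sketch with assumed counts y = (12, 30, 8, 50):

```python
# Check pi_hat_i = y_i / n against a numerical constrained maximisation
# of the multinomial log-likelihood kernel. The counts y are assumed values.
import numpy as np
from scipy.optimize import minimize

y = np.array([12, 30, 8, 50])   # assumed observed counts
n = y.sum()

def neg_log_likelihood(pi):
    return -np.sum(y * np.log(pi))   # kernel of the multinomial log-likelihood

constraint = {"type": "eq", "fun": lambda pi: pi.sum() - 1.0}
start = np.full(len(y), 1.0 / len(y))
result = minimize(neg_log_likelihood, start, constraints=[constraint],
                  bounds=[(1e-9, 1.0)] * len(y))

print(result.x)   # numerical maximiser
print(y / n)      # closed-form MLE pi_hat_i = y_i / n; the two agree
```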

Likelihood-Ratio Chi-squared Test


Let us consider the following hypotheses:

$$H_0: \pi_j = \pi_{j0}, \quad j = 1, 2, \ldots, c \qquad \text{vs.} \qquad H_1: \pi_j \neq \pi_{j0} \ \text{for at least one } j$$


The kernel of the multinomial likelihood is

$$\prod_j \pi_j^{\,y_j}$$

Under $H_0$ the likelihood is maximized when $\hat{\pi}_j = \pi_{j0}$. In the general case, it is maximized when $\hat{\pi}_j = y_j / n$. The ratio of the likelihoods equals

$$\Lambda = \frac{\prod_j (\pi_{j0})^{y_j}}{\prod_j (y_j / n)^{y_j}} = \prod_j \left(\frac{\pi_{j0}}{y_j / n}\right)^{y_j}$$

Thus, the likelihood-ratio statistic, denoted by $G^2$, is

$$G^2 = -2 \log \Lambda = -2 \sum_j y_j \left[\log \pi_{j0} - \log(y_j / n)\right] = 2 \sum_j y_j \log\!\left(\frac{y_j}{n \pi_{j0}}\right)$$

This statistic is called the likelihood-ratio chi-squared statistic. Under $H_0$ it has an approximate chi-squared distribution with df = c − 1 for large n. The larger the value of $G^2$, the greater the evidence against $H_0$.
Example:
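The notes leave this example blank. As a stand-in, here is a hedged numerical sketch with assumed counts y = (30, 60, 10) and null probabilities π0 = (0.25, 0.50, 0.25), not taken from the lecture:

```python
# Compute G^2 for assumed counts and null probabilities and compare with
# scipy's power_divergence, which gives the same statistic for lambda_="log-likelihood".
import numpy as np
from scipy.stats import chi2, power_divergence

y = np.array([30, 60, 10])           # assumed observed counts, n = 100
pi0 = np.array([0.25, 0.50, 0.25])   # assumed null probabilities
n = y.sum()

G2 = 2 * np.sum(y * np.log(y / (n * pi0)))
print(G2)                            # approximately 14.5

stat, pval = power_divergence(y, f_exp=n * pi0, lambda_="log-likelihood")
print(stat, pval)                    # same statistic, with its chi-squared p-value

print(chi2.sf(G2, df=len(y) - 1))    # p-value from chi-squared with df = c - 1
```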

Reference Book:
Agresti, A. (2019). An Introduction to Categorical Data Analysis, 3rd edition. John Wiley & Sons, Inc.
<><><><><><><><><> End <><><><><><><><><>
