Bayesian Classifier for Document Classification

This document describes using a Bayesian classifier to classify movie reviews as positive or negative based on the presence of certain words. It: 1) Shows a sample dataset of 5 movie reviews labeled as positive or negative based on the presence of words like "loved", "hated", "great", "poor", etc. 2) Calculates probabilities of words occurring in positive and negative reviews to build the classifier. 3) Tests a new review "I hated the poor acting" and predicts it as negative since that combination of words is more probable in the negative class.

Uploaded by

Abu Talha

Document Classification using Bayesian Classifier

Doc  Text                          Class
1    I loved the movie             +
2    I hated the movie             -
3    A great movie. Good movie     +
4    Poor acting                   -
5    Great acting. A good movie    +

Total 10 unique words (Vocabulary): I, loved, the, movie, hated, a, great, good, poor, acting
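
The vocabulary can be recovered mechanically from the five documents. The following is a minimal Python sketch, assuming the text is lowercased and split on runs of letters; the names DOCS, tokenize and build_vocabulary are illustrative and do not come from the original document.

```python
import re

# The five labeled training documents from the table above.
DOCS = [
    ("I loved the movie", "+"),
    ("I hated the movie", "-"),
    ("A great movie. Good movie", "+"),
    ("Poor acting", "-"),
    ("Great acting. A good movie", "+"),
]

def tokenize(text):
    """Lowercase the text and return its alphabetic words (assumed tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())

def build_vocabulary(docs):
    """Collect the unique words in order of first appearance."""
    vocab = []
    for text, _label in docs:
        for word in tokenize(text):
            if word not in vocab:
                vocab.append(word)
    return vocab

VOCAB = build_vocabulary(DOCS)
print(len(VOCAB), VOCAB)
```

Under these assumptions the sketch yields the same 10 words, in the same order, as the list above.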

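Table 1: Documents converted into a feature set (word counts) and class label

Doc  I  loved  the  movie  hated  a  great  good  poor  acting  class
1    1  1      1    1      0      0  0      0     0     0       +
2    1  0      1    1      1      0  0      0     0     0       -
3    0  0      0    2      0      1  1      1     0     0       +
4    0  0      0    0      0      0  0      0     1     1       -
5    0  0      0    1      0      1  1      1     0     1       +

Prior probability of the + class: p(+) = 3/5 = 0.6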
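
The rows of Table 1 and the class prior can be computed rather than filled in by hand. This is a small Python sketch under the same assumed tokenization (lowercase, letters only); count_row and prior are hypothetical helper names introduced for illustration.

```python
import re
from collections import Counter

DOCS = [
    ("I loved the movie", "+"),
    ("I hated the movie", "-"),
    ("A great movie. Good movie", "+"),
    ("Poor acting", "-"),
    ("Great acting. A good movie", "+"),
]
VOCAB = ["i", "loved", "the", "movie", "hated", "a", "great", "good", "poor", "acting"]

def count_row(text):
    """Return the word counts of one document, ordered like VOCAB."""
    counts = Counter(re.findall(r"[a-z]+", text.lower()))
    return [counts[word] for word in VOCAB]

def prior(label):
    """Fraction of training documents that carry the given class label."""
    return sum(1 for _text, lab in DOCS if lab == label) / len(DOCS)

ROWS = [count_row(text) for text, _label in DOCS]
for (_text, label), row in zip(DOCS, ROWS):
    print(row, label)
print("p(+) =", prior("+"), " p(-) =", prior("-"))
```

Each printed row should correspond to one line of Table 1, with 0 where a word does not occur in the document.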

Compute: P(I|+), P(loved|+), P(the|+), P(movie|+), P(hated|+), P(a|+), P(great|+), P(good|+), P(poor|+), P(acting|+)

Let n be the total number of words in the + class: n = 14.

Let nk be the number of times word k occurs in the + class.

With add-one smoothing: p(wk|+) = (nk + 1) / (n + |Vocabulary|), where |Vocabulary| = 10.
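
Adding 1 to every count keeps a word that never occurs in a class from getting probability 0, which would otherwise zero out the whole product at test time. The estimate can be sketched as a one-line Python function; the name smoothed_likelihood is an assumption for illustration, and exact fractions are used only so that the values are easy to compare with the hand calculation.

```python
from fractions import Fraction

def smoothed_likelihood(n_k, n, vocab_size):
    """Add-one estimate p(w_k | class) = (n_k + 1) / (n + |Vocabulary|)."""
    return Fraction(n_k + 1, n + vocab_size)

# Values for the + class in this example: n = 14 words, |Vocabulary| = 10.
p_movie = smoothed_likelihood(4, 14, 10)  # "movie" occurs 4 times in + documents
p_hated = smoothed_likelihood(0, 14, 10)  # "hated" never occurs in + documents

print("P(movie|+) =", p_movie, "=", round(float(p_movie), 4))
print("P(hated|+) =", p_hated, "=", round(float(p_hated), 4))
```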

Table 2: Feature set for the positive class

Doc  I  loved  the  movie  hated  a  great  good  poor  acting  class
1    1  1      1    1      0      0  0      0     0     0       +
3    0  0      0    2      0      1  1      1     0     0       +
5    0  0      0    1      0      1  1      1     0     1       +

P(I|+) = (1+1)/(14 + 10) = 0.0833 P(loved|+) = (1+1)/(14 + 10) = 0.0833

P(the|+) = (1+1)/(14 + 10) = 0.0833 P(movie|+) = (4+1)/(14 + 10) = 0.2083

P(a|+) = (2+1)/(14 + 10) = 0.125 P(great|+) = (2+1)/(14 + 10) = 0.125

P(acting|+) = (1+1)/(14 + 10) = 0.0833 P(good|+) = (2+1)/(14 + 10) = 0.125

P(hated|+) = (0+1)/(14 + 10) = 0.0417 P(poor|+) = (0+1)/(14 + 10) = 0.0417


Table 3: Feature set for the negative class

Doc  I  loved  the  movie  hated  a  great  good  poor  acting  class
2    1  0      1    1      1      0  0      0     0     0       -
4    0  0      0    0      0      0  0      0     1     1       -

Prior probability of the - class: p(-) = 2/5 = 0.4

Compute: P(I|-), P(loved|-), P(the|-), P(movie|-), P(hated|-), P(a|-), P(great|-), P(good|-), P(poor|-), P(acting|-)

Let n be the total number of words in the - class: n = 6.

Let nk be the number of times word k occurs in the - class.

With add-one smoothing: p(wk|-) = (nk + 1) / (n + |Vocabulary|), where |Vocabulary| = 10.

P(I|-) = (1+1)/(6 + 10) = 0.125 P(loved|-) = (0+1)/(6 + 10) = 0.0625

P(the|-) = (1+1)/(6 + 10) = 0.125 P(movie|-) = (1+1)/(6 + 10) = 0.125

P(hated|-) = (1+1)/(6 + 10) = 0.125 P(a|-) = (0+1)/(6 + 10) = 0.0625

P(great|-) = (0+1)/(6 + 10) = 0.0625 P(good|-) = (0+1)/(6 + 10) = 0.0625

P(poor|-) = (1+1)/(6 + 10) = 0.125 P(acting|-) = (1+1)/(6 + 10) = 0.125
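
A useful sanity check on the two probability tables is that, within each class, the smoothed word probabilities sum to exactly 1: (14 + 10)/24 for the + class and (6 + 10)/16 for the - class. The sketch below rebuilds both tables from the raw documents, assuming the same lowercase, letters-only tokenization; the function and variable names are illustrative.

```python
import re
from collections import Counter
from fractions import Fraction

DOCS = [
    ("I loved the movie", "+"),
    ("I hated the movie", "-"),
    ("A great movie. Good movie", "+"),
    ("Poor acting", "-"),
    ("Great acting. A good movie", "+"),
]

def tokenize(text):
    """Lowercase the text and return its alphabetic words (assumed tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())

VOCAB = sorted({word for text, _label in DOCS for word in tokenize(text)})

def likelihoods(label):
    """Add-one smoothed p(word | label) for every vocabulary word."""
    counts = Counter(
        word for text, lab in DOCS if lab == label for word in tokenize(text)
    )
    n = sum(counts.values())  # total words in this class (14 for +, 6 for -)
    return {word: Fraction(counts[word] + 1, n + len(VOCAB)) for word in VOCAB}

LIK = {label: likelihoods(label) for label in ("+", "-")}
for label, table in LIK.items():
    rounded = {word: round(float(p), 4) for word, p in table.items()}
    print(label, rounded, "sum =", sum(table.values()))
```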

Test document: "I hated the poor acting", class = ?

If Cj = +: p(+) x p(I|+) x p(hated|+) x p(the|+) x p(poor|+) x p(acting|+)
         = 0.6 x 0.0833 x 0.0417 x 0.0833 x 0.0417 x 0.0833 = 6.03 x 10^-7

If Cj = -: p(-) x p(I|-) x p(hated|-) x p(the|-) x p(poor|-) x p(acting|-)
         = 0.4 x 0.125 x 0.125 x 0.125 x 0.125 x 0.125 = 1.22 x 10^-5

Since 1.22 x 10^-5 > 6.03 x 10^-7, the class label of the test document "I hated the poor acting" is -.
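
The whole procedure (priors, add-one smoothed word probabilities, and the final comparison) fits in a short Python sketch. It is one possible implementation under stated assumptions, not the original author's code: text is lowercased and split on letters, test words outside the training vocabulary are skipped, and the product is computed as a sum of logarithms, a common way to avoid underflow on longer documents. The names train and score are hypothetical.

```python
import math
import re
from collections import Counter

DOCS = [
    ("I loved the movie", "+"),
    ("I hated the movie", "-"),
    ("A great movie. Good movie", "+"),
    ("Poor acting", "-"),
    ("Great acting. A good movie", "+"),
]

def tokenize(text):
    """Lowercase the text and return its alphabetic words (assumed tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())

def train(docs):
    """Return {label: (prior, {word: smoothed likelihood})}."""
    vocab = {word for text, _label in docs for word in tokenize(text)}
    model = {}
    for label in {lab for _text, lab in docs}:
        words = [w for text, lab in docs if lab == label for w in tokenize(text)]
        counts = Counter(words)
        prior = sum(1 for _text, lab in docs if lab == label) / len(docs)
        denom = len(words) + len(vocab)  # n + |Vocabulary|
        model[label] = (prior, {w: (counts[w] + 1) / denom for w in vocab})
    return model

def score(model, text):
    """Return p(class) * product of p(word | class) for each class."""
    scores = {}
    for label, (prior, lik) in model.items():
        # Assumption: words never seen in training are ignored.
        log_score = math.log(prior) + sum(
            math.log(lik[w]) for w in tokenize(text) if w in lik
        )
        scores[label] = math.exp(log_score)
    return scores

MODEL = train(DOCS)
SCORES = score(MODEL, "I hated the poor acting")
PREDICTED = max(SCORES, key=SCORES.get)
print(SCORES, "predicted class:", PREDICTED)
```

For the test sentence this should give scores close to the hand-computed 6.03 x 10^-7 for + and 1.22 x 10^-5 for -, so the predicted class is -.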
