Dataset:
• Review 1 (positive): "The food was amazing and the service was excellent."
• Review 2 (negative): "The food was bad, and the service was terrible."
• Review 3 (positive): "Amazing food and excellent service!"
• Review 4 (negative): "The food was awful and the service was slow."
Bigrams (adjacent word pairs, with punctuation removed):
• Review 1 (positive):
◦ "The food"
◦ "food was"
◦ "was amazing"
◦ "amazing and"
◦ "and the"
◦ "the service"
◦ "service was"
◦ "was excellent"
• Review 2 (negative):
◦ "The food"
◦ "food was"
◦ "was bad"
◦ "bad and"
◦ "and the"
◦ "the service"
◦ "service was"
◦ "was terrible"
• Review 3 (positive):
◦ "Amazing food"
◦ "food and"
◦ "and excellent"
◦ "excellent service"
• Review 4 (negative):
◦ "The food"
◦ "food was"
◦ "was awful"
◦ "awful and"
◦ "and the"
◦ "the service"
◦ "service was"
◦ "was slow"
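The extraction above can be sketched in a few lines of Python. This is a minimal illustration, not a definitive tokenizer: the function name `bigrams` and the regular expression are assumptions chosen to mirror the lists above (punctuation dropped, capitalization kept).

```python
import re

def bigrams(text):
    """Return adjacent word pairs, with punctuation stripped and case kept."""
    # Assumption: a "word" is a run of letters or apostrophes.
    words = re.findall(r"[A-Za-z']+", text)
    return [f"{a} {b}" for a, b in zip(words, words[1:])]

reviews = [
    "The food was amazing and the service was excellent.",
    "The food was bad, and the service was terrible.",
    "Amazing food and excellent service!",
    "The food was awful and the service was slow.",
]

for i, review in enumerate(reviews, start=1):
    print(f"Review {i}: {bigrams(review)}")
```

A review with n words yields n - 1 bigrams, which is why the nine-word reviews produce 8 bigrams and the five-word Review 3 produces 4.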
Combined List of Unique Bigrams:
The 18 unique bigrams across all reviews are:
• "The food"
• "food was"
• "was amazing"
• "amazing and"
• "and the"
• "the service"
• "service was"
• "was excellent"
• "was bad"
• "bad and"
• "and excellent"
• "excellent service"
• "was terrible"
• "Amazing food"
• "food and"
• "awful and"
• "was awful"
• "was slow"
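As a quick check on the combined list, the sketch below collects the unique bigrams in first-seen order. The per-review lists are copied from the walkthrough, and the variable names are illustrative. The count comes to 18 once "food and" from Review 3 is included.

```python
# Bigram lists per review, as extracted in the walkthrough.
docs = [
    ["The food", "food was", "was amazing", "amazing and",
     "and the", "the service", "service was", "was excellent"],
    ["The food", "food was", "was bad", "bad and",
     "and the", "the service", "service was", "was terrible"],
    ["Amazing food", "food and", "and excellent", "excellent service"],
    ["The food", "food was", "was awful", "awful and",
     "and the", "the service", "service was", "was slow"],
]

# dict.fromkeys drops duplicates while keeping first-seen order.
vocabulary = list(dict.fromkeys(bg for doc in docs for bg in doc))
print(len(vocabulary))  # 18
```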
Term Frequency (TF) Calculation:
For each bigram in each review, calculate the frequency and then normalize by dividing by the total
number of bigrams in that review.
Review 1 (positive):
• Total number of bigrams = 8
◦ "The food": 1/8 = 0.125
◦ "food was": 1/8 = 0.125
◦ "was amazing": 1/8 = 0.125
◦ "amazing and": 1/8 = 0.125
◦ "and the": 1/8 = 0.125
◦ "the service": 1/8 = 0.125
◦ "service was": 1/8 = 0.125
◦ "was excellent": 1/8 = 0.125
Review 2 (negative):
• Total number of bigrams = 8
◦ "The food": 1/8 = 0.125
◦ "food was": 1/8 = 0.125
◦ "was bad": 1/8 = 0.125
◦ "bad and": 1/8 = 0.125
◦ "and the": 1/8 = 0.125
◦ "the service": 1/8 = 0.125
◦ "service was": 1/8 = 0.125
◦ "was terrible": 1/8 = 0.125
Review 3 (positive):
• Total number of bigrams = 4
◦ "Amazing food": 1/4 = 0.25
◦ "food and": 1/4 = 0.25
◦ "and excellent": 1/4 = 0.25
◦ "excellent service": 1/4 = 0.25
Review 4 (negative):
• Total number of bigrams = 8
◦ "The food": 1/8 = 0.125
◦ "food was": 1/8 = 0.125
◦ "was awful": 1/8 = 0.125
◦ "awful and": 1/8 = 0.125
◦ "and the": 1/8 = 0.125
◦ "the service": 1/8 = 0.125
◦ "service was": 1/8 = 0.125
◦ "was slow": 1/8 = 0.125
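The normalization described above can be sketched as follows. The helper name `term_frequencies` is hypothetical, and the function assumes the review contains at least one bigram.

```python
from collections import Counter

def term_frequencies(doc_bigrams):
    """TF = count of a bigram in the review / total bigrams in the review."""
    counts = Counter(doc_bigrams)
    total = len(doc_bigrams)  # assumed to be non-zero
    return {bg: n / total for bg, n in counts.items()}

review_1 = ["The food", "food was", "was amazing", "amazing and",
            "and the", "the service", "service was", "was excellent"]
review_3 = ["Amazing food", "food and", "and excellent", "excellent service"]

print(term_frequencies(review_1)["was amazing"])  # 0.125
print(term_frequencies(review_3)["food and"])     # 0.25
```

Because no bigram repeats within a single review here, every TF value is simply 1 divided by the number of bigrams in that review.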
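Inverse Document Frequency (IDF) Calculation:
N = 4 (total number of reviews)
IDF(t) = ln(N / df(t)), where df(t) is the number of reviews that contain bigram t
TF-IDF(t, d) = TF(t, d) × IDF(t)
Example: IDF("The food") = ln(4/3) ≈ 0.288, since "The food" appears in 3 of the 4 reviews.
Example: IDF("was amazing") = ln(4/1) ≈ 1.386, since "was amazing" appears in only 1 review.
Bigram             IDF
The food           0.288
food was           0.288
was amazing        1.386
amazing and        1.386
and the            0.288
the service        0.288
service was        0.288
was excellent      1.386
was bad            1.386
bad and            1.386
and excellent      1.386
excellent service  1.386
was terrible       1.386
Amazing food       1.386
food and           1.386
awful and          1.386
was awful          1.386
was slow           1.386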
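The IDF formula can be checked with a short sketch. It assumes the natural logarithm, which is what the value 1.386 = ln(4) implies, and the helper name `idf_table` is hypothetical. With that assumption, bigrams that occur in three of the four reviews get ln(4/3) ≈ 0.288.

```python
import math
from collections import Counter

# Bigram lists per review, as extracted in the walkthrough.
docs = [
    ["The food", "food was", "was amazing", "amazing and",
     "and the", "the service", "service was", "was excellent"],
    ["The food", "food was", "was bad", "bad and",
     "and the", "the service", "service was", "was terrible"],
    ["Amazing food", "food and", "and excellent", "excellent service"],
    ["The food", "food was", "was awful", "awful and",
     "and the", "the service", "service was", "was slow"],
]

def idf_table(docs):
    """IDF(t) = ln(N / df(t)), with df(t) = number of reviews containing t."""
    n_docs = len(docs)
    # set(doc) ensures each review counts a bigram at most once.
    df = Counter(bg for doc in docs for bg in set(doc))
    return {bg: math.log(n_docs / count) for bg, count in df.items()}

idf = idf_table(docs)
print(round(idf["The food"], 3))     # 0.288
print(round(idf["was amazing"], 3))  # 1.386
```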
TF-IDF Calculation:
Let’s calculate the TF-IDF for each bigram in each review by multiplying the TF values with the
corresponding IDF values.
Review 1 (positive):
• TF values:
◦ "The food" = 0.125
◦ "food was" = 0.125
◦ "was amazing" = 0.125
◦ "amazing and" = 0.125
◦ "and the" = 0.125
◦ "the service" = 0.125
◦ "service was" = 0.125
◦ "was excellent" = 0.125
• IDF values:
◦ "The food" = 0.288
◦ "food was" = 0.288
◦ "was amazing" = 1.386
◦ "amazing and" = 1.386
◦ "and the" = 0.288
◦ "the service" = 0.288
◦ "service was" = 0.288
◦ "was excellent" = 1.386
• TF-IDF values for Review 1:
◦ "The food" = 0.125 × 0.288 = 0.036
◦ "food was" = 0.125 × 0.288 = 0.036
◦ "was amazing" = 0.125 × 1.386 = 0.17325
◦ "amazing and" = 0.125 × 1.386 = 0.17325
◦ "and the" = 0.125 × 0.288 = 0.036
◦ "the service" = 0.125 × 0.288 = 0.036
◦ "service was" = 0.125 × 0.288 = 0.036
◦ "was excellent" = 0.125 × 1.386 = 0.17325
Review 2 (negative):
• TF values:
◦ "The food" = 0.125
◦ "food was" = 0.125
◦ "was bad" = 0.125
◦ "bad and" = 0.125
◦ "and the" = 0.125
◦ "the service" = 0.125
◦ "service was" = 0.125
◦ "was terrible" = 0.125
• IDF values:
◦ "The food" = 0.288
◦ "food was" = 0.288
◦ "was bad" = 1.386
◦ "bad and" = 1.386
◦ "and the" = 0.288
◦ "the service" = 0.288
◦ "service was" = 0.288
◦ "was terrible" = 1.386
• TF-IDF values for Review 2:
◦ "The food" = 0.125 × 0.288 = 0.036
◦ "food was" = 0.125 × 0.288 = 0.036
◦ "was bad" = 0.125 × 1.386 = 0.17325
◦ "bad and" = 0.125 × 1.386 = 0.17325
◦ "and the" = 0.125 × 0.288 = 0.036
◦ "the service" = 0.125 × 0.288 = 0.036
◦ "service was" = 0.125 × 0.288 = 0.036
◦ "was terrible" = 0.125 × 1.386 = 0.17325
Review 3 (positive):
• TF values:
◦ "Amazing food" = 0.25
◦ "food and" = 0.25
◦ "and excellent" = 0.25
◦ "excellent service" = 0.25
• IDF values:
◦ "Amazing food" = 1.386
◦ "food and" = 1.386
◦ "and excellent" = 1.386
◦ "excellent service" = 1.386
• TF-IDF values for Review 3:
◦ "Amazing food" = 0.25 × 1.386 = 0.3465
◦ "food and" = 0.25 × 1.386 = 0.3465
◦ "and excellent" = 0.25 × 1.386 = 0.3465
◦ "excellent service" = 0.25 × 1.386 = 0.3465
Review 4 (negative):
• TF values:
◦ "The food" = 0.125
◦ "food was" = 0.125
◦ "was awful" = 0.125
◦ "awful and" = 0.125
◦ "and the" = 0.125
◦ "the service" = 0.125
◦ "service was" = 0.125
◦ "was slow" = 0.125
• IDF values:
◦ "The food" = 0.288
◦ "food was" = 0.288
◦ "was awful" = 1.386
◦ "awful and" = 1.386
◦ "and the" = 0.288
◦ "the service" = 0.288
◦ "service was" = 0.288
◦ "was slow" = 1.386
• TF-IDF values for Review 4:
◦ "The food" = 0.125 × 0.288 = 0.036
◦ "food was" = 0.125 × 0.288 = 0.036
◦ "was awful" = 0.125 × 1.386 = 0.17325
◦ "awful and" = 0.125 × 1.386 = 0.17325
◦ "and the" = 0.125 × 0.288 = 0.036
◦ "the service" = 0.125 × 0.288 = 0.036
◦ "service was" = 0.125 × 0.288 = 0.036
◦ "was slow" = 0.125 × 1.386 = 0.17325
TF-IDF Matrix (rows are bigrams, columns are Reviews 1 to 4):
Bigram             R1       R2       R3       R4
The food           0.036    0.036    0.000    0.036
food was           0.036    0.036    0.000    0.036
was amazing        0.17325  0.000    0.000    0.000
amazing and        0.17325  0.000    0.000    0.000
and the            0.036    0.036    0.000    0.036
the service        0.036    0.036    0.000    0.036
service was        0.036    0.036    0.000    0.036
was excellent      0.17325  0.000    0.000    0.000
was bad            0.000    0.17325  0.000    0.000
bad and            0.000    0.17325  0.000    0.000
and excellent      0.000    0.000    0.3465   0.000
excellent service  0.000    0.000    0.3465   0.000
was terrible       0.000    0.17325  0.000    0.000
Amazing food       0.000    0.000    0.3465   0.000
food and           0.000    0.000    0.3465   0.000
awful and          0.000    0.000    0.000    0.17325
was awful          0.000    0.000    0.000    0.17325
was slow           0.000    0.000    0.000    0.17325
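The whole walkthrough can be reproduced end to end with the sketch below. It is one possible pure-Python implementation under stated assumptions: natural-log IDF with no smoothing, TF normalized by the number of bigrams in the review, and a simple letters-only tokenizer. The names `bigrams` and `tfidf_matrix` are illustrative. Library implementations often use different smoothing and normalization, so their numbers will generally not match this table.

```python
import math
import re
from collections import Counter

REVIEWS = [
    "The food was amazing and the service was excellent.",
    "The food was bad, and the service was terrible.",
    "Amazing food and excellent service!",
    "The food was awful and the service was slow.",
]

def bigrams(text):
    # Assumption: words are runs of letters/apostrophes; case is preserved.
    words = re.findall(r"[A-Za-z']+", text)
    return [f"{a} {b}" for a, b in zip(words, words[1:])]

def tfidf_matrix(texts):
    """Map each bigram to its list of TF-IDF scores, one per review."""
    docs = [bigrams(t) for t in texts]  # each review assumed non-empty
    n_docs = len(docs)
    # Document frequency: number of reviews containing each bigram.
    df = Counter(bg for doc in docs for bg in set(doc))
    matrix = {}
    for bg in df:
        idf = math.log(n_docs / df[bg])
        matrix[bg] = [doc.count(bg) / len(doc) * idf for doc in docs]
    return matrix

matrix = tfidf_matrix(REVIEWS)
for bg, row in matrix.items():
    print(f"{bg:<18}" + "".join(f"{v:8.3f}" for v in row))
```

The code multiplies by unrounded IDF values, so its scores can differ from the hand-computed ones in the last decimal places (for example 0.1733 rather than 0.17325). Bigrams shared by three reviews, such as "The food", score low everywhere, while bigrams unique to one review, such as "was amazing" or "was awful", carry most of the weight.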