
Advanced Techniques in Text-Based Emotion Recognition and Conversations

Introduction

This lecture extends the discussion on emotion recognition from text, focusing on how to build feature
representations, model conversations, and incorporate recent deep learning advances for dynamic
emotion prediction.

Feature Representation Techniques

Bag of Words (BoW)

Represent documents as histograms of word frequency.


Build a vocabulary (dictionary) from all training documents.
Each document is converted to a vector reflecting the frequency of each vocabulary word.
Example: Documents about Delhi might feature frequent use of "government," "parliament," etc.
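A minimal sketch of BoW vectorization, assuming scikit-learn (the lecture does not prescribe a library; the documents are made up for illustration):

```python
# Minimal Bag-of-Words sketch (assumes scikit-learn is available).
from sklearn.feature_extraction.text import CountVectorizer

docs = [
    "Delhi is the seat of government and parliament",  # illustrative documents
    "The parliament passed a new bill today",
]

vectorizer = CountVectorizer()       # builds the vocabulary from the training documents
X = vectorizer.fit_transform(docs)   # each row is a word-frequency histogram

print(vectorizer.get_feature_names_out())
print(X.toarray())                   # document-term frequency matrix
```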

N-grams

Represent sequences of n consecutive words.


Unigram (n=1): Each word is treated individually.
Bigram (n=2): Pairs of consecutive words (e.g., "Delhi is")
Trigram (n=3): Three-word sequences (e.g., "Delhi is the")
Captures contextual co-occurrence patterns, enhancing semantic inference.
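A short sketch of n-gram extraction from a tokenized sentence (plain Python, no library assumed):

```python
# Extract n-grams (sequences of n consecutive tokens) from a token list.
def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = "Delhi is the capital of India".split()
print(ngrams(tokens, 1))  # unigrams: ('Delhi',), ('is',), ...
print(ngrams(tokens, 2))  # bigrams:  ('Delhi', 'is'), ('is', 'the'), ...
print(ngrams(tokens, 3))  # trigrams: ('Delhi', 'is', 'the'), ...
```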

TF-IDF (Term Frequency-Inverse Document Frequency)

Term Frequency (TF): Measures how often a word appears in a document.


Inverse Document Frequency (IDF): Penalizes common words (e.g., "the") occurring across many
documents.
Formula: TF-IDF = TF * log(N / M)
N: Total number of documents
M: Number of documents containing the word
Balances saliency and discriminative power.
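A direct implementation of the formula above, with a toy corpus chosen for illustration:

```python
# TF-IDF sketch following the formula above: TF-IDF = TF * log(N / M).
import math
from collections import Counter

docs = [
    "the parliament met in delhi".split(),
    "the weather in delhi is hot".split(),
    "the bill passed in parliament".split(),
]

N = len(docs)  # total number of documents

def tf_idf(word, doc):
    tf = Counter(doc)[word] / len(doc)      # term frequency in this document
    m = sum(1 for d in docs if word in d)   # number of documents containing the word
    return tf * math.log(N / m)

print(tf_idf("the", docs[0]))         # common word -> IDF = 0, weight is zero
print(tf_idf("parliament", docs[0]))  # rarer, more discriminative word -> higher weight
```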

Part-of-Speech (POS) Tagging

Annotate each word with its grammatical role (noun, verb, adjective).
Provides structural and semantic context.
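A quick sketch of POS tagging, assuming NLTK and its pretrained tagger (one possible tool choice; the lecture names no specific tagger):

```python
# POS tagging sketch with NLTK (requires the 'punkt' and
# 'averaged_perceptron_tagger' resources to be downloaded once).
import nltk

nltk.download("punkt", quiet=True)
nltk.download("averaged_perceptron_tagger", quiet=True)

tokens = nltk.word_tokenize("I feel wonderfully happy today")
print(nltk.pos_tag(tokens))
# e.g. [('I', 'PRP'), ('feel', 'VBP'), ('wonderfully', 'RB'), ('happy', 'JJ'), ('today', 'NN')]
```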

Pointwise Mutual Information (PMI)

Measures association between words based on co-occurrence.


PMI-IR: A variant that estimates co-occurrence probabilities from search engine hit counts.
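A minimal sketch of PMI from co-occurrence counts (the counts here are illustrative placeholders, not taken from the lecture):

```python
# Pointwise mutual information between two words from co-occurrence counts.
import math

def pmi(count_xy, count_x, count_y, total):
    """PMI(x, y) = log( P(x, y) / (P(x) * P(y)) )."""
    p_xy = count_xy / total
    p_x = count_x / total
    p_y = count_y / total
    return math.log(p_xy / (p_x * p_y))

# e.g. "absolutely" and "wonderful" co-occurring in 10,000 context windows
print(pmi(count_xy=30, count_x=120, count_y=150, total=10_000))  # positive -> strong association
```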

Early Emotion Recognition Systems

Example: Blog Mood Classification (Gishe et al.)

Dataset: 815,000 blog posts with mood labels (132 classes) from LiveJournal.
Features Used:
Bag of words
POS tags
PMI and PMI-IR scores
Classifiers trained on these features for mood prediction.

Representation Learning Techniques

Word2Vec (Mikolov et al., 2013)

Learns vector representations of words using neural networks.


CBOW (Continuous Bag of Words):
Predict a target word from surrounding context.
Skip-gram:
Predict surrounding words given a target word.
Input is a one-hot vector; output is a dense, lower-dimensional embedding.
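A small sketch of training both variants, assuming the gensim library (one common implementation choice; corpus and dimensions are toy values):

```python
# Training CBOW and skip-gram embeddings with gensim.
from gensim.models import Word2Vec

sentences = [
    "i am so happy today".split(),
    "this is a sad and gloomy day".split(),
    "what a happy and joyful moment".split(),
]

cbow = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=0)      # CBOW
skipgram = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=1)  # skip-gram

print(skipgram.wv["happy"].shape)                 # dense 50-dimensional embedding
print(skipgram.wv.most_similar("happy", topn=2))  # nearest neighbours in embedding space
```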

Applications

Vectorized representation of each word used for pooling and ML classification.


Doc2Vec (also called Document2Vec or Paragraph Vectors): extension for full document representation.
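A simple sketch of the pooling idea: averaging word vectors into one fixed-length document vector that a standard ML classifier can consume (the toy embeddings below stand in for Word2Vec/GloVe vectors):

```python
# Mean-pooling word vectors into a single document vector.
import numpy as np

embeddings = {                      # toy 3-dimensional embeddings for illustration
    "happy": np.array([0.9, 0.1, 0.0]),
    "joyful": np.array([0.8, 0.2, 0.1]),
    "day": np.array([0.1, 0.5, 0.4]),
}

def doc_vector(tokens, emb, dim=3):
    vecs = [emb[t] for t in tokens if t in emb]
    return np.mean(vecs, axis=0) if vecs else np.zeros(dim)

print(doc_vector("such a happy and joyful day".split(), embeddings))
```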

GloVe (Global Vectors for Word Representation)

Learns embeddings from a global word co-occurrence matrix.


Captures fine-grained semantic similarities.
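In practice GloVe embeddings are usually loaded from the pretrained plain-text releases (one word and its vector per line); a loading sketch, with the file path as a placeholder:

```python
# Load pretrained GloVe vectors from their plain-text format.
import numpy as np

def load_glove(path):
    vectors = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            word, *values = line.rstrip().split(" ")
            vectors[word] = np.asarray(values, dtype=np.float32)
    return vectors

# glove = load_glove("glove.6B.50d.txt")   # assumes the Stanford GloVe release is downloaded
# print(glove["happy"][:5])
```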

Bag of Concepts

Extends BoW to represent clustered conceptual units rather than individual terms.
Uses Word2Vec + K-means clustering + TF-IDF weighting.
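A sketch of the bag-of-concepts pipeline: cluster word embeddings with K-means and describe each document by the frequency of the concepts (clusters) it contains. The random embeddings stand in for Word2Vec vectors, and the TF-IDF weighting step is left as noted:

```python
# Bag-of-concepts sketch: word embeddings -> K-means concepts -> concept histogram.
import numpy as np
from sklearn.cluster import KMeans

vocab = ["happy", "joyful", "sad", "gloomy", "delhi", "parliament"]
emb = np.random.default_rng(0).normal(size=(len(vocab), 50))   # stand-in for Word2Vec vectors

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(emb)
word2concept = dict(zip(vocab, kmeans.labels_))

doc = ["happy", "joyful", "parliament"]
counts = np.bincount([word2concept[w] for w in doc], minlength=3)
print(counts)   # concept-frequency vector; TF-IDF-style weighting can then be applied
```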

Deep Learning Architectures

Example: Kratzwald et al. (2018)

Inputs: Bag of words + word embeddings


Dual stream:
Feedforward feature extractor (BoW + embeddings)
RNN (LSTM) for sequential word modeling
Fusion and classification for emotion prediction
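A minimal PyTorch sketch of this dual-stream idea (not the authors' code): a feedforward branch over BoW features fused with an LSTM branch over word embeddings, with all sizes chosen arbitrarily:

```python
import torch
import torch.nn as nn

class DualStreamEmotionNet(nn.Module):
    def __init__(self, vocab_size, emb_dim, hidden, n_emotions):
        super().__init__()
        self.bow_branch = nn.Sequential(nn.Linear(vocab_size, hidden), nn.ReLU())
        self.embedding = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden, batch_first=True)
        self.classifier = nn.Linear(2 * hidden, n_emotions)

    def forward(self, bow, token_ids):
        bow_feat = self.bow_branch(bow)                      # feedforward feature extractor
        _, (h_n, _) = self.lstm(self.embedding(token_ids))   # sequential word modelling
        fused = torch.cat([bow_feat, h_n[-1]], dim=-1)       # fusion of both streams
        return self.classifier(fused)                        # emotion logits

model = DualStreamEmotionNet(vocab_size=5000, emb_dim=100, hidden=64, n_emotions=6)
logits = model(torch.rand(2, 5000), torch.randint(0, 5000, (2, 20)))
print(logits.shape)   # (batch, n_emotions)
```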

Example: Shelke et al.

Focus: Emotion from social media posts (text + emoticons)


Steps:
Text preprocessing: Tokenization, stop word removal, lemmatization
Emoticon labeling (e.g., joy = 1)
Feature extraction using DepecheMood lexicon
Feature ranking + fusion
Deep Neural Network classification
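A rough sketch of the preprocessing and lexicon-lookup steps only (not the authors' code; the lexicon entries below are placeholders, not actual DepecheMood scores, and lemmatization is omitted):

```python
import re

STOP_WORDS = {"the", "is", "a", "and", "so"}
LEXICON = {"happy": {"joy": 0.9, "sadness": 0.05}, "awful": {"joy": 0.1, "sadness": 0.8}}
EMOTICONS = {":)": "joy", ":(": "sadness"}

def preprocess(post):
    emoticon_labels = [EMOTICONS[e] for e in EMOTICONS if e in post]      # emoticon labeling
    tokens = re.findall(r"[a-z]+", post.lower())                          # crude tokenization
    tokens = [t for t in tokens if t not in STOP_WORDS]                   # stop word removal
    features = [LEXICON[t] for t in tokens if t in LEXICON]               # lexicon lookup
    return tokens, emoticon_labels, features

print(preprocess("The day is so happy :)"))
```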

Semantic Emotion Neural Network (Batbaatar et al.)

Dual stream:
Semantic encoder (RNN) for contextual info
Emotion encoder (CNN) for affective info
Concatenate and classify
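A PyTorch sketch of the dual-encoder idea (not the authors' implementation): an RNN branch for semantic context, a CNN branch for affective features, concatenated before classification:

```python
import torch
import torch.nn as nn

class SemanticEmotionNet(nn.Module):
    def __init__(self, vocab_size, emb_dim=100, hidden=64, n_filters=32, n_emotions=6):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_dim)
        self.semantic_rnn = nn.GRU(emb_dim, hidden, batch_first=True, bidirectional=True)
        self.emotion_cnn = nn.Conv1d(emb_dim, n_filters, kernel_size=3, padding=1)
        self.classifier = nn.Linear(2 * hidden + n_filters, n_emotions)

    def forward(self, token_ids):
        emb = self.embedding(token_ids)                           # (batch, seq, emb_dim)
        rnn_out, _ = self.semantic_rnn(emb)
        semantic = rnn_out[:, -1]                                 # last bidirectional state
        conv = torch.relu(self.emotion_cnn(emb.transpose(1, 2)))  # (batch, filters, seq)
        affective = conv.max(dim=2).values                        # max-pool over time
        return self.classifier(torch.cat([semantic, affective], dim=-1))

print(SemanticEmotionNet(vocab_size=5000)(torch.randint(0, 5000, (2, 20))).shape)
```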

BERT and Transformer-Based Models

Attention-based models parallelize token processing


BERT: Bidirectional Encoder Representations from Transformers
Pretrained on masked language modeling and next sentence prediction
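A usage sketch with the Hugging Face transformers library (an assumed tooling choice): the generic bert-base-uncased checkpoint with a fresh classification head, which would still need fine-tuning on emotion-labeled data:

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=6)

inputs = tokenizer("I cannot believe how wonderful this is!", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits   # one score per emotion class
print(logits.softmax(dim=-1))         # head is untrained here; fine-tuning is still required
```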

Examples:

Huang et al.:

Dual BERT models (FriendBERT and ChatBERT)


Emotion pretraining + fine-tuning on Twitter data

Kumar et al.:

Dual-channel explainable emotion system


CNN-RNN and RNN-CNN pipelines
Explainability via intra-/inter-cluster distance analysis

Emotion Recognition During Conversations

Need for Conversational Modeling

Emotions vary with conversational flow.


Requires contextual tracking of speaker responses over time.

Example: Poria et al.

Emotion dynamics across dialogue turns in a sitcom scene.


Emotions annotated frame-by-frame for each speaker.

DialogueCRN (Hu et al.)

Contextual Reasoning Network for emotion tracking in dialogues


Inputs:
Situation-level context (dialogue history)
Speaker-level context (current utterance)
Outputs:
Fused representation → emotion prediction
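A minimal illustration of this fusion pattern only (not the DialogueCRN implementation): encode the dialogue history with a recurrent layer, combine it with the current utterance representation, and predict an emotion for the turn:

```python
import torch
import torch.nn as nn

class ContextFusionClassifier(nn.Module):
    def __init__(self, utt_dim=128, hidden=64, n_emotions=6):
        super().__init__()
        self.context_rnn = nn.GRU(utt_dim, hidden, batch_first=True)  # situation-level context
        self.speaker_proj = nn.Linear(utt_dim, hidden)                 # speaker-level (current utterance)
        self.classifier = nn.Linear(2 * hidden, n_emotions)

    def forward(self, history, current_utt):
        _, h_n = self.context_rnn(history)                 # encode the dialogue history
        fused = torch.cat([h_n[-1], self.speaker_proj(current_utt)], dim=-1)
        return self.classifier(fused)                      # emotion logits for this turn

model = ContextFusionClassifier()
logits = model(torch.rand(2, 8, 128), torch.rand(2, 128))  # 8 previous utterance vectors
print(logits.shape)
```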

Yeh et al.

Interaction-aware network with GRU + attention


Parallel speaker analysis with bi-directional RNNs

Lian et al.

Domain-Adversarial Network for speech + text


Workflow:
Speech-to-text → utterance-level GRU
Temporal attention
Predict emotion at each time step

Conclusion

Text-based emotion recognition has evolved from simple vector models like BoW and TF-IDF to
sophisticated deep learning and transformer-based approaches. Modern systems also consider
dialogue context and speaker roles to dynamically infer emotions over time. As the field matures,
conversational and multimodal affect prediction continues to gain importance, especially in emotionally
intelligent systems.
