PanoSent: A Panoptic Sextuple Extraction Benchmark for Multimodal Conversational Aspect-based Sentiment Analysis
Meng Luo · Hao Fei · Bobo Li · Shengqiong Wu · Qian Liu · Soujanya Poria · Erik Cambria · Mong-Li Lee · Wynne Hsu
National University of Singapore · Wuhan University · The University of Auckland ·
Singapore University of Technology and Design · Nanyang Technological University
While existing Aspect-based Sentiment Analysis (ABSA) has received extensive attention and made substantial progress, gaps remain in defining a more holistic research target that seamlessly integrates multimodality, conversational context, and fine granularity, while also covering changing sentiment dynamics and cognitive causal rationales. This paper bridges these gaps by introducing a multimodal conversational ABSA setting with two novel subtasks: 1) Panoptic Sentiment Sextuple Extraction, which panoramically recognizes the holder, target, aspect, opinion, sentiment, and rationale from multi-turn, multi-party, multimodal dialogue; and 2) Sentiment Flipping Analysis, which detects dynamic sentiment transformations throughout the conversation together with their causal reasons. To benchmark the tasks, we construct PanoSent, a dataset annotated both manually and automatically, featuring high quality, large scale, multimodality, multilingualism, and multiple scenarios, and covering both implicit and explicit sentiment elements. To effectively address the tasks, we devise a novel Chain-of-Sentiment reasoning framework, together with a novel multimodal large language model (namely Sentica) and a paraphrase-based verification mechanism. Extensive evaluations demonstrate the superiority of our methods over strong baselines, validating the efficacy of all our proposed designs. The work is expected to open up a new era for the ABSA community.
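To make the target structure concrete, the sextuple can be viewed as a record of six elements. The sketch below is purely illustrative and does not reflect the dataset's actual annotation schema:

from typing import NamedTuple

class SentimentSextuple(NamedTuple):
    """Illustrative container for one panoptic sentiment sextuple
    (the actual annotation format in PanoSent may differ)."""
    holder: str     # who expresses the opinion
    target: str     # the entity being talked about
    aspect: str     # the facet of the target under discussion
    opinion: str    # the opinion expression itself
    sentiment: str  # polarity label, e.g. positive / negative / neutral
    rationale: str  # the causal/cognitive reason behind the sentiment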
We develop a novel MLLM, Sentica, which adopts Flan-T5 (XXL) as the core LLM for semantic understanding and decision-making. For non-text inputs, we use multimodal models to encode the signals into LLM-understandable representations. Specifically, we use ImageBind as the unified encoder for all three non-text modalities owing to its strong capabilities, followed by a linear layer that projects the ImageBind representations into the LLM's embedding space.
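As a rough illustration of this encode-then-project design, the following PyTorch sketch maps ImageBind embeddings into the Flan-T5 embedding space; the module name and exact wiring are assumptions, not the repository's actual implementation (see PanoSent/model/projection_layer.py for that):

import torch
import torch.nn as nn

class MultimodalProjector(nn.Module):
    """Illustrative projection of ImageBind features (1024-d for the
    'huge' checkpoint) into the Flan-T5 XXL embedding space (4096-d),
    so non-text signals can be fed to the LLM alongside text tokens."""

    def __init__(self, imagebind_dim: int = 1024, llm_dim: int = 4096):
        super().__init__()
        self.proj = nn.Linear(imagebind_dim, llm_dim)

    def forward(self, imagebind_feats: torch.Tensor) -> torch.Tensor:
        # (batch, num_signals, imagebind_dim) -> (batch, num_signals, llm_dim)
        return self.proj(imagebind_feats)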
PanoSent/
├── data/
│ ├── T-X_pair_data/
│ │ ├── LLaVA/
│ │ ├── miniGPT-4/
│ │ └── VideoChat/
│ ├── PanoSent_train.json
│ └── PpV_train.json
├── PanoSent/
│ ├── model/
│ │ ├── imagebind_encoder.py
│ │ ├── flant5_model.py
│ │ ├── projection_layer.py
│ │ └── lora_utils.py
│ ├── utils/
│ │ └── imagebind_utils.py
│ └── datasets/
│ ├── stage1_caption_dataset.py
│ ├── stage2_sextuple_dataset.py
│ └── stage3_entailment_dataset.py
├── scripts/
│ ├── train_stage1.sh
│ ├── train_stage2.sh
│ └── …
├── train.py
├── evaluate_subtask1.py
├── evaluate_subtask2.py
├── requirements.txt
└── README.md
conda create -n sentica python=3.10
conda activate sentica
git clone https://github.com/PanoSent/PanoSent.git

- ImageBind
  Download the official imagebind_huge.pth checkpoint from the ImageBind release and place it at: ./imagebind/imagebind_huge.pth
- Flan-T5
  We use Flan-T5 XXL as the LLM backbone.
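For reference, one way to obtain the backbone is through Hugging Face Transformers; this is an assumption about sourcing, and the training scripts may instead load a locally downloaded checkpoint:

from transformers import T5ForConditionalGeneration, T5Tokenizer

# Illustrative download of the Flan-T5 XXL backbone (~11B parameters).
tokenizer = T5Tokenizer.from_pretrained("google/flan-t5-xxl")
model = T5ForConditionalGeneration.from_pretrained(
    "google/flan-t5-xxl",
    device_map="auto",   # requires `accelerate`; shards across available GPUs
    torch_dtype="auto",  # load bf16/fp16 weights where available
)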
Sentica consists of three instruction tuning stages. The corresponding datasets are:
- LLaVA
- miniGPT-4
- VideoChat
After downloading these datasets, organize them as:
./data/T-X_pair_data/
├── LLaVA/
├── miniGPT-4/
└── VideoChat/
- PanoSent_train.json: place at ./data/PanoSent_train.json
- PpV_train.json: place at ./data/PpV_train.json
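A quick sanity check that the fine-tuning files are where the scripts expect them (assuming each file is a JSON array of examples) might look like:

import json
from pathlib import Path

for name in ["PanoSent_train.json", "PpV_train.json"]:
    path = Path("./data") / name
    with path.open(encoding="utf-8") as f:
        records = json.load(f)  # assumed to be a list of training examples
    print(f"{name}: {len(records)} records")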
Sentica follows a three-stage training process:
- Stage 1: Multimodal Understanding Stage
  bash scripts/train_stage1.sh
- Stage 2: Sextuple Extraction Understanding
  bash scripts/train_stage2.sh
…
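The repository includes LoRA utilities (PanoSent/model/lora_utils.py). Below is a minimal sketch of how LoRA adapters might be attached to the Flan-T5 backbone with the peft library; the library choice and hyperparameters are assumptions, not the values used in the stage scripts:

from peft import LoraConfig, TaskType, get_peft_model
from transformers import T5ForConditionalGeneration

backbone = T5ForConditionalGeneration.from_pretrained("google/flan-t5-xxl")
lora_cfg = LoraConfig(
    task_type=TaskType.SEQ_2_SEQ_LM,
    r=16,                       # illustrative rank
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q", "v"],  # T5 attention query/value projections
)
model = get_peft_model(backbone, lora_cfg)
model.print_trainable_parameters()  # only the adapter weights are trainable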
python evaluate_subtask1.py --pred pred.json --gt gold.json
python evaluate_subtask2.py --pred pred.json --gt gold.json

If you have any questions or feedback, feel free to open an issue or reach out to us at [email protected]
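For a rough offline check of subtask-1 predictions, one plausible scoring scheme is exact-match micro-F1 over predicted sextuples; this is only an assumption and not necessarily the metric implemented in evaluate_subtask1.py:

def sextuple_micro_f1(pred_sets, gold_sets):
    """Exact-match micro-F1 over (holder, target, aspect, opinion,
    sentiment, rationale) tuples, computed across all dialogues."""
    tp = fp = fn = 0
    for preds, golds in zip(pred_sets, gold_sets):
        preds, golds = set(preds), set(golds)
        tp += len(preds & golds)
        fp += len(preds - golds)
        fn += len(golds - preds)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0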
@inproceedings{luo2024panosent,
title={Panosent: A panoptic sextuple extraction benchmark for multimodal conversational aspect-based sentiment analysis},
author={Luo, Meng and Fei, Hao and Li, Bobo and Wu, Shengqiong and Liu, Qian and Poria, Soujanya and Cambria, Erik and Lee, Mong-Li and Hsu, Wynne},
booktitle={Proceedings of the 32nd ACM International Conference on Multimedia},
pages={7667--7676},
year={2024}
}