PanoSent: A Panoptic Sextuple Extraction Benchmark for Multimodal Conversational Aspect-based Sentiment Analysis
Meng Luo · Hao Fei · Bobo Li · Shengqiong Wu · Qian Liu · Soujanya Poria · Erik Cambria · Mong-Li Lee · Wynne Hsu
National University of Singapore · Wuhan University · The University of Auckland ·
Singapore University of Technology and Design · Nanyang Technological University
While existing Aspect-based Sentiment Analysis (ABSA) has received extensive attention and made substantial progress, gaps remain in defining a more holistic research target that seamlessly integrates multimodality, conversational context, and fine granularity, while also covering changing sentiment dynamics and cognitive causal rationales. This paper bridges these gaps by introducing a multimodal conversational ABSA setting with two novel subtasks: 1) Panoptic Sentiment Sextuple Extraction, which panoramically recognizes the holder, target, aspect, opinion, sentiment, and rationale from multi-turn, multi-party, multimodal dialogue; and 2) Sentiment Flipping Analysis, which detects dynamic sentiment transformations throughout the conversation together with their causal reasons. To benchmark the tasks, we construct PanoSent, a dataset annotated both manually and automatically, featuring high quality, large scale, multimodality, multilingualism, and multiple scenarios, and covering both implicit and explicit sentiment elements. To effectively address the tasks, we devise a novel Chain-of-Sentiment reasoning framework, together with a novel multimodal large language model (namely Sentica) and a paraphrase-based verification mechanism. Extensive evaluations demonstrate the superiority of our methods over strong baselines, validating the efficacy of all our proposed designs. The work is expected to open up a new era for the ABSA community.
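To make the target structure concrete, the sextuple can be viewed as a record of six elements. The sketch below is purely illustrative and does not reflect the dataset's actual annotation schema:

from typing import NamedTuple

class SentimentSextuple(NamedTuple):
    """Illustrative container for one panoptic sentiment sextuple
    (the actual annotation format in PanoSent may differ)."""
    holder: str     # who expresses the opinion
    target: str     # the entity being talked about
    aspect: str     # the facet of the target under discussion
    opinion: str    # the opinion expression itself
    sentiment: str  # polarity label, e.g. positive / negative / neutral
    rationale: str  # the causal/cognitive reason behind the sentiment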
We develop a novel MLLM, Sentica, which adopts Flan-T5 (XXL) as the core LLM for semantic understanding and decision-making. For non-text inputs, we use multimodal models to encode the signals into LLM-understandable representations. Specifically, we use ImageBind as the unified encoder for all three non-text modalities owing to its strong capabilities, followed by a linear layer that projects the ImageBind representations into the LLM's embedding space.
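As a rough illustration of this encode-then-project design, the following PyTorch sketch maps ImageBind embeddings into the Flan-T5 embedding space; the module name and exact wiring are assumptions, not the repository's actual implementation (see PanoSent/model/projection_layer.py for that):

import torch
import torch.nn as nn

class MultimodalProjector(nn.Module):
    """Illustrative projection of ImageBind features (1024-d for the
    'huge' checkpoint) into the Flan-T5 XXL embedding space (4096-d),
    so non-text signals can be fed to the LLM alongside text tokens."""

    def __init__(self, imagebind_dim: int = 1024, llm_dim: int = 4096):
        super().__init__()
        self.proj = nn.Linear(imagebind_dim, llm_dim)

    def forward(self, imagebind_feats: torch.Tensor) -> torch.Tensor:
        # (batch, num_signals, imagebind_dim) -> (batch, num_signals, llm_dim)
        return self.proj(imagebind_feats)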
PanoSent/
├── data/
│ ├── T-X_pair_data/
│ │ ├── LLaVA/
│ │ ├── miniGPT-4/
│ │ └── VideoChat/
│ ├── PanoSent_train.json
│ └── PpV_train.json
├── PanoSent/
│ ├── model/
│ │ ├── imagebind_encoder.py
│ │ ├── flant5_model.py
│ │ ├── projection_layer.py
│ │ └── lora_utils.py
│ ├── utils/
│ │ └── imagebind_utils.py
│ └── datasets/
│ ├── stage1_caption_dataset.py
│ ├── stage2_sextuple_dataset.py
│ └── stage3_entailment_dataset.py
├── scripts/
│ ├── train_stage1.sh
│ ├── train_stage2.sh
│ └── …
├── train.py
├── evaluate_subtask1.py
├── evaluate_subtask2.py
├── requirements.txt
└── README.md
conda create -n sentica python=3.10
conda activate sentica
git clone https://github.com/PanoSent/PanoSent.git

- ImageBind
  Download the official imagebind_huge.pth checkpoint from the ImageBind release and place it at: ./imagebind/imagebind_huge.pth
- Flan-T5
  We use Flan-T5 XXL as the LLM backbone.
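For reference, one way to obtain the backbone is through Hugging Face Transformers; this is an assumption about sourcing, and the training scripts may instead load a locally downloaded checkpoint:

from transformers import T5ForConditionalGeneration, T5Tokenizer

# Illustrative download of the Flan-T5 XXL backbone (~11B parameters).
tokenizer = T5Tokenizer.from_pretrained("google/flan-t5-xxl")
model = T5ForConditionalGeneration.from_pretrained(
    "google/flan-t5-xxl",
    device_map="auto",   # requires `accelerate`; shards across available GPUs
    torch_dtype="auto",  # load bf16/fp16 weights where available
)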
Sentica consists of three instruction tuning stages. The corresponding datasets are:
- LLaVA
- miniGPT-4
- VideoChat
After downloading these datasets, organize them as:
./data/T-X_pair_data/
├── LLaVA/
├── miniGPT-4/
└── VideoChat/
- PanoSent_train.json: place at ./data/PanoSent_train.json
- PpV_train.json: place at ./data/PpV_train.json
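A quick sanity check that the fine-tuning files are where the scripts expect them (assuming each file is a JSON array of examples) might look like:

import json
from pathlib import Path

for name in ["PanoSent_train.json", "PpV_train.json"]:
    path = Path("./data") / name
    with path.open(encoding="utf-8") as f:
        records = json.load(f)  # assumed to be a list of training examples
    print(f"{name}: {len(records)} records")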
Sentica follows a three-stage training process:
- Stage 1: Multimodal Understanding Stage
  bash scripts/train_stage1.sh
- Stage 2: Sextuple Extraction Understanding
  bash scripts/train_stage2.sh
…
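The repository includes LoRA utilities (PanoSent/model/lora_utils.py). Below is a minimal sketch of how LoRA adapters might be attached to the Flan-T5 backbone with the peft library; the library choice and hyperparameters are assumptions, not the values used in the stage scripts:

from peft import LoraConfig, TaskType, get_peft_model
from transformers import T5ForConditionalGeneration

backbone = T5ForConditionalGeneration.from_pretrained("google/flan-t5-xxl")
lora_cfg = LoraConfig(
    task_type=TaskType.SEQ_2_SEQ_LM,
    r=16,                       # illustrative rank
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q", "v"],  # T5 attention query/value projections
)
model = get_peft_model(backbone, lora_cfg)
model.print_trainable_parameters()  # only the adapter weights are trainable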
python evaluate_subtask1.py --pred pred.json --gt gold.json
python evaluate_subtask2.py --pred pred.json --gt gold.json

If you have any questions or feedback, feel free to open an issue or reach out to us at [email protected]
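For a rough offline check of subtask-1 predictions, one plausible scoring scheme is exact-match micro-F1 over predicted sextuples; this is only an assumption and not necessarily the metric implemented in evaluate_subtask1.py:

def sextuple_micro_f1(pred_sets, gold_sets):
    """Exact-match micro-F1 over (holder, target, aspect, opinion,
    sentiment, rationale) tuples, computed across all dialogues."""
    tp = fp = fn = 0
    for preds, golds in zip(pred_sets, gold_sets):
        preds, golds = set(preds), set(golds)
        tp += len(preds & golds)
        fp += len(preds - golds)
        fn += len(golds - preds)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0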
@inproceedings{luo2024panosent,
title={Panosent: A panoptic sextuple extraction benchmark for multimodal conversational aspect-based sentiment analysis},
author={Luo, Meng and Fei, Hao and Li, Bobo and Wu, Shengqiong and Liu, Qian and Poria, Soujanya and Cambria, Erik and Lee, Mong-Li and Hsu, Wynne},
booktitle={Proceedings of the 32nd ACM International Conference on Multimedia},
pages={7667--7676},
year={2024}
}