The codes and datasets about LREC-COLING 2024: A Multimodal In-context Tuning Approach for E-Commerce Product Description Generation

HITsz-TMG/Multimodal-In-Context-Tuning

If you have any questions, please feel free to contact me by e-mail: [email protected], Twitter: @LyxTg.

The main contributions are:

  1. We present a product description generation paradigm based only on a product image and several marketing keywords. For this new setting, we propose a straightforward and effective multimodal in-context tuning approach, named ModICT, which integrates the power of a frozen language model and a visual encoder.

  2. Our work is the first to investigate leveraging the in-context learning and text generation capabilities of various frozen language models for multimodal E-commerce product description generation. ModICT can be plugged into various types of language models, and the training process is parameter-efficient.

  3. We conduct extensive experiments on our newly built datasets covering three product categories. The experimental results indicate that the proposed method achieves state-of-the-art performance on a wide range of evaluation metrics. Using the proposed multimodal in-context tuning technique, small models also achieve competitive performance compared to LLMs.

πŸš€ Our Training Approach: ModICT

Figure: The overall workflow of ModICT. The left part depicts the process of in-context reference construction. The right part shows the efficient multimodal in-context tuning procedure for (1) a sequence-to-sequence language model and (2) an autoregressive language model. Blocks outlined in red are learnable.

πŸ€— Our Proposed Dataset: MD2T

MD2T is a new dataset for multimodal E-commerce description generation from structured marketing keywords and product images.

MD2T Dataset Statistics

| MD2T | Cases & Bags | Clothing | Home Appliances |
|---|---|---|---|
| #Train | 18,711 | 200,000 | 86,858 |
| #Dev | 983 | 6,120 | 1,794 |
| #Test | 1,000 | 8,700 | 2,200 |
| Avg_N #MP | 5.41 | 6.57 | 5.48 |
| Avg_L #MP | 13.50 | 20.34 | 18.30 |
| Avg_L #Desp | 80.05 | 79.03 | 80.13 |

Table: The detailed statistics of MD2T. Avg_N and Avg_L represent the average number and average length, respectively. MP and Desp denote the marketing keywords and the description.
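As a minimal sketch of what the table's Avg_N and Avg_L columns measure, the snippet below computes the same three statistics over a couple of toy records. The record schema (`keywords`, `description` fields) and the character-level length measure are illustrative assumptions, not the dataset's actual format.

```python
# Hedged sketch: computing Avg_N #MP, Avg_L #MP, and Avg_L #Desp as defined in
# the MD2T statistics table. The field names below are hypothetical.

def dataset_stats(records):
    """Return (avg keyword count, avg keyword length, avg description length)."""
    n = len(records)
    # Avg_N #MP: average number of marketing keywords per product
    avg_n_mp = sum(len(r["keywords"]) for r in records) / n
    # Avg_L #MP: average length of a marketing keyword, averaged over products
    avg_l_mp = sum(
        sum(len(k) for k in r["keywords"]) / len(r["keywords"]) for r in records
    ) / n
    # Avg_L #Desp: average length of the product description
    avg_l_desp = sum(len(r["description"]) for r in records) / n
    return avg_n_mp, avg_l_mp, avg_l_desp

records = [
    {"keywords": ["portable", "waterproof"], "description": "A compact bag."},
    {"keywords": ["leather", "handmade", "durable"], "description": "A classic tote."},
]
print(dataset_stats(records))
```

On real MD2T splits the same computation over each category's records would yield the per-column values shown in the table above.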

Our preprocessed data (Text + Images) can be downloaded from https://huggingface.co/datasets/YunxinLi/MD2T.

✏️ Citation

If you find our paper and code useful in your research, please consider giving a star ⭐ and citation πŸ“.

@article{li2024multimodal,
  title={A Multimodal In-Context Tuning Approach for E-Commerce Product Description Generation},
  author={Li, Yunxin and Hu, Baotian and Luo, Wenhan and Ma, Lin and Ding, Yuxin and Zhang, Min},
  journal={LREC-COLING},
  year={2024}
}
