Adapting Language Models to Text Matching based Recommendation Systems

This repository contains the source code for the paper: Adapting Language Models to Text Matching based Recommendation Systems.

Overview

TASTE$^+$ enhances sequential recommendation by adapting language models to text matching. It introduces two pretraining tasks, Masked Item Prediction and Next Item Prediction, which allow the model to capture richer matching signals from user–item sequences. By balancing attention between prompt tokens and item IDs, TASTE$^+$ builds more accurate user representations and improves recommendation performance on Yelp and Amazon datasets, demonstrating the effectiveness of language model pretraining for text matching-based recommendation.

Requirements

1. Conda Environment

conda create -n taste-plus python=3.8
conda activate taste-plus
pip install -r requirements.txt

2. Install OpenMatch

git clone https://github.com/OpenMatch/OpenMatch.git
cd OpenMatch
pip install -e .

Reproduction Guide

This section provides a step-by-step guide to reproduce the TASTE$^+$ results.

1. Dataset Preprocessing

We use the Amazon Product 2014 and Yelp 2020 datasets, obtained from their original sources.

The following example uses the Amazon Beauty dataset.

1.1. Download and Prepare Amazon Beauty Dataset:

wget -c http://snap.stanford.edu/data/amazon/productGraph/categoryFiles/ratings_Beauty.csv
wget -c http://snap.stanford.edu/data/amazon/productGraph/categoryFiles/meta_Beauty.json.gz

1.2. Unzip the Metadata File:

gzip -d meta_Beauty.json.gz
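
As a quick sanity check after unzipping, the metadata can be inspected line by line. To our knowledge, the 2014 Amazon metadata stores each record as a Python-style dict literal (single quotes) rather than strict JSON, so the sketch below tries `json.loads` first and falls back to `ast.literal_eval`. The sample record and the `parse_meta_line` helper are illustrative assumptions, not part of this repository.

```python
import ast
import json


def parse_meta_line(line: str) -> dict:
    """Parse one metadata record, trying strict JSON first."""
    line = line.strip()
    try:
        return json.loads(line)
    except json.JSONDecodeError:
        # Fall back to Python-literal syntax (single-quoted keys and strings).
        return ast.literal_eval(line)


# Hypothetical record in the single-quoted style; not taken from the real file.
sample = "{'asin': 'B0000000AA', 'title': 'Example Lotion', 'price': 9.99}"
record = parse_meta_line(sample)
print(record["asin"], record["title"])
```

`ast.literal_eval` only evaluates literals, so it is a safer choice than `eval` for files downloaded from the web.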

1.3. Organize Files:

mkdir data
mv ratings_Beauty.csv data/
mv meta_Beauty.json data/

1.4. Process Raw Data for RecBole:

mkdir dataset
bash scripts/process_origin.sh
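
For orientation, RecBole reads interactions from tab-separated "atomic" files whose header declares each column as `name:type`. The sketch below shows roughly what such a conversion involves, assuming the ratings CSV has no header and holds `user,item,rating,timestamp` rows. The `write_inter` helper and the sample rows are hypothetical; the script above may differ in its details.

```python
import csv
import io


def write_inter(csv_rows, out):
    """Write (user, item, rating, timestamp) rows as a RecBole-style .inter table."""
    out.write("user_id:token\titem_id:token\trating:float\ttimestamp:float\n")
    for user, item, rating, ts in csv_rows:
        out.write(f"{user}\t{item}\t{float(rating)}\t{int(ts)}\n")


# Two made-up interactions standing in for ratings_Beauty.csv.
raw = "U1,I1,5.0,1400000000\nU1,I2,4.0,1400000100\n"
buf = io.StringIO()
write_inter(csv.reader(io.StringIO(raw)), buf)
print(buf.getvalue())
```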

1.5. Extract and Process Required Data:

bash scripts/process_beauty.sh

2. Data Preprocessing

Before proceeding, process all four original datasets as described above to obtain the atomic files. Then, construct the mixed pretraining data for TASTE$^+$ according to your desired proportions.

2.1. Construct Training and Test Data using RecBole:

bash scripts/gen_dataset.sh

2.2. Generate Item Representations:

bash scripts/gen_pretrain_items.sh

2.3. Sample Pretraining Data for TASTE$^+$:

To construct the TASTE$^+$ pretraining data, we sample the four datasets in a balanced way. The target size for every dataset is the number of training samples in the largest dataset; datasets with fewer training samples are randomly supplemented until they reach that size:

python src/sample_train.py
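
The balancing idea can be sketched as follows. This is a minimal illustration that assumes "randomly supplemented" means drawing extra samples with replacement from the same dataset, and that every dataset is non-empty. The function name and the toy data are hypothetical and are not taken from `src/sample_train.py`.

```python
import random


def balance_by_oversampling(datasets, seed=0):
    """Pad every dataset to the size of the largest one by random resampling."""
    rng = random.Random(seed)
    target = max(len(samples) for samples in datasets.values())
    balanced = {}
    for name, samples in datasets.items():
        # Draw the missing samples with replacement from the same dataset.
        extra = [rng.choice(samples) for _ in range(target - len(samples))]
        balanced[name] = list(samples) + extra
    return balanced


# Toy stand-ins for per-dataset training samples.
toy = {"beauty": ["b1", "b2"], "yelp": ["y1", "y2", "y3", "y4"]}
balanced = balance_by_oversampling(toy)
print({name: len(samples) for name, samples in balanced.items()})
```

Keeping the original samples first and appending the resampled ones ensures that no original training sample is dropped.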

Similarly, the validation set is balanced to the smallest dataset: from each dataset we select as many samples as the dataset with the fewest samples contains:

python src/sample_valid.py
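
A minimal sketch of this kind of balancing, assuming each dataset is randomly downsampled without replacement to the size of the smallest one. The function name and the toy data are hypothetical and are not taken from `src/sample_valid.py`.

```python
import random


def balance_by_downsampling(datasets, seed=0):
    """Cut every dataset down to the size of the smallest one."""
    rng = random.Random(seed)
    target = min(len(samples) for samples in datasets.values())
    # rng.sample draws without replacement, so no sample is repeated.
    return {name: rng.sample(list(samples), target)
            for name, samples in datasets.items()}


# Toy stand-ins for per-dataset validation samples.
toy_valid = {"beauty": ["b1", "b2"], "yelp": ["y1", "y2", "y3", "y4"]}
valid = balance_by_downsampling(toy_valid)
print({name: len(samples) for name, samples in valid.items()})
```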

2.4. Construct Pretraining Data for Sampled Items:

bash scripts/build_pretrain.sh

2.5. Merge Training and Validation Data:

python src/merge_json.py
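
Conceptually, merging concatenates the per-dataset files into a single file. The sketch below assumes the data is stored as JSON Lines (one JSON object per line), which is an assumption on our part. The file names and the `merge_jsonl` helper are hypothetical and may not match what `src/merge_json.py` does.

```python
import json
import os
import tempfile


def merge_jsonl(paths, out_path):
    """Concatenate JSON Lines files into one file and return the record count."""
    count = 0
    with open(out_path, "w", encoding="utf-8") as out:
        for path in paths:
            with open(path, encoding="utf-8") as f:
                for line in f:
                    line = line.strip()
                    if not line:
                        continue
                    json.loads(line)  # fail early on malformed records
                    out.write(line + "\n")
                    count += 1
    return count


# Demonstrate on temporary files with made-up records.
with tempfile.TemporaryDirectory() as tmp:
    parts = []
    for name, rows in [("beauty.jsonl", [{"id": 1}]),
                       ("yelp.jsonl", [{"id": 2}, {"id": 3}])]:
        path = os.path.join(tmp, name)
        with open(path, "w", encoding="utf-8") as f:
            for row in rows:
                f.write(json.dumps(row) + "\n")
        parts.append(path)
    merged_path = os.path.join(tmp, "train.jsonl")
    total = merge_jsonl(parts, merged_path)
    with open(merged_path, encoding="utf-8") as f:
        merged = [json.loads(line) for line in f]
print(total)
```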

3. Pretraining for TASTE$^+$

Pretrain the T5 model using next item prediction (NIP) and masked item prediction (MIP) tasks.

bash scripts/pretrain.sh
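
To give an intuition for the two tasks, the sketch below builds one training example of each kind from a user's item sequence. It is only one plausible reading of NIP and MIP. The separator, the use of the T5 sentinel `<extra_id_0>` as the mask, the field names, and the item titles are all assumptions for illustration and are not taken from the paper or this repository.

```python
import random

MASK = "<extra_id_0>"  # T5 sentinel token, used here as an illustrative mask


def next_item_example(items):
    """NIP: condition on the history and predict the final item."""
    return {"source": " ; ".join(items[:-1]), "target": items[-1]}


def masked_item_example(items, rng):
    """MIP: hide one randomly chosen item and predict it from the rest."""
    pos = rng.randrange(len(items))
    masked = items[:pos] + [MASK] + items[pos + 1:]
    return {"source": " ; ".join(masked), "target": items[pos]}


# Made-up item titles for a single user.
history = ["Shampoo", "Conditioner", "Hair Mask", "Comb"]
nip = next_item_example(history)
mip = masked_item_example(history, random.Random(0))
print(nip)
print(mip)
```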

Adjust training parameters based on your GPU device. Select the checkpoint with the lowest evaluation loss as the final pretrained checkpoint.
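
Checkpoint selection can be automated once the evaluation losses are available as a list of records. The sketch below assumes each record is a dict with `step` and `eval_loss` keys, similar to the `log_history` entries that the Hugging Face `Trainer` saves in `trainer_state.json`. Whether the pretraining script writes that file is an assumption, and the numbers are made up.

```python
def best_checkpoint(log_history):
    """Return (step, eval_loss) for the record with the lowest evaluation loss."""
    evals = [entry for entry in log_history if "eval_loss" in entry]
    if not evals:
        raise ValueError("no evaluation records found")
    best = min(evals, key=lambda entry: entry["eval_loss"])
    return best["step"], best["eval_loss"]


# Made-up log: training-only records carry no eval_loss and are skipped.
log_history = [
    {"step": 500, "loss": 2.10},
    {"step": 1000, "eval_loss": 1.80},
    {"step": 2000, "eval_loss": 1.65},
    {"step": 3000, "eval_loss": 1.71},
]
step, loss = best_checkpoint(log_history)
print(f"best checkpoint: step {step} (eval_loss={loss})")
```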

4. Finetuning for TASTE$^+$

bash scripts/gen_train_items.sh
bash scripts/build_train.sh

4.1. Train

bash scripts/train_ft.sh

4.2. Evaluate

bash scripts/eval_ft.sh

4.3. Test

bash scripts/test_ft.sh

Contact

For questions, suggestions, or bug reports, please contact:

[email protected]
