CMD: a framework for Context-aware Model self-Detoxification (EMNLP2024 Main)

News

2025/02/10

We have released our Segment-CNN model on Hugging Face (https://huggingface.co/ZetangForward/SegmentCNN). You can use it directly for toxicity detection, classification, and annotation.

Below is the usage:

from transformers import pipeline

# Initialize the Segment-CNN classification pipeline
classifier = pipeline("spancnn-classification", model="ZetangForward/SegmentCNN", trust_remote_code=True)

# Example 1: Positive (Safe) Text
pos_text = "You look good today~!"
result = classifier(pos_text)
print(result)  # Output: 0 (0 represents safe content)

# Example 2: Negative (Toxic) Text
neg_text = "You're too stupid, you're just like a fool"
result = classifier(neg_text)
print(result)  # Output: 1 (1 represents toxic content)

Overview

Highlights

CMD uses language models to synthesize data step by step and then trains them in a chain-of-thought manner, aiming to enable the model to detoxify itself.
To prevent the model from generating toxic content when given a safe context, CMD introduces a contrastive loss that pushes the model's generations away from negative (toxic) samples during training.
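
To make the idea of a contrastive term concrete, here is a minimal sketch of an InfoNCE-style loss in plain Python. It is not the loss from the CMD paper or the code in train_cmd: the function names, the use of cosine similarity, and the temperature value are all assumptions chosen for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def contrastive_loss(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style loss (illustrative only, not the CMD implementation).

    anchor:    representation of the model's generation
    positive:  representation of a non-toxic (safe) sample
    negatives: representations of toxic samples
    The loss is small when the anchor is close to the positive and
    far from every negative.
    """
    pos_logit = cosine(anchor, positive) / temperature
    neg_logits = [cosine(anchor, n) / temperature for n in negatives]
    # Log-sum-exp over all candidates, shifted by the max for stability.
    all_logits = [pos_logit] + neg_logits
    max_logit = max(all_logits)
    log_denominator = max_logit + math.log(
        sum(math.exp(x - max_logit) for x in all_logits)
    )
    return log_denominator - pos_logit


if __name__ == "__main__":
    generation = [1.0, 0.0]
    safe = [0.9, 0.1]
    toxic = [-1.0, 0.2]
    # Generation close to the safe sample: low loss.
    print(contrastive_loss(generation, safe, [toxic]))
    # Roles swapped (toxic sample treated as the target): higher loss.
    print(contrastive_loss(generation, toxic, [safe]))
```

Minimizing a term of this shape rewards generations that sit near safe samples and penalizes those that drift toward toxic ones, which is the behavior the highlight above describes.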

Experiments

Quick Start

We provide the code for Segment-CNN and for CMD training, which enables LLMs to detoxify themselves.

Environment

conda env create -f environment.yaml

CMD

Preprocess

Here we will create the span dataset for training Segment-CNN.

cd utils

python csv_to_json.py \
--input path/to/your/jigsaw/train.csv \
--json_save ../dataset/total.json \
--train_span_json_save ../dataset/segment_cnn_train.json \
--test_span_json_save ../dataset/segment_cnn_test.json

sh perspective_api.sh
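
For readers unfamiliar with span datasets, the sketch below shows one simple way to cut a comment into word-level spans. How csv_to_json.py actually builds its spans is not documented here, so the sliding-window strategy, the `max_len` parameter, and the output fields are hypothetical.

```python
def make_spans(text, max_len=3):
    """Return every contiguous word span of up to max_len words.

    Hypothetical helper: the real preprocessing script may segment
    comments differently.
    """
    words = text.split()
    spans = []
    for start in range(len(words)):
        for length in range(1, max_len + 1):
            end = start + length
            if end > len(words):
                break
            spans.append(
                {"start": start, "end": end, "text": " ".join(words[start:end])}
            )
    return spans


if __name__ == "__main__":
    for span in make_spans("you look good today", max_len=2):
        print(span)
```

Each span could then be scored for toxicity independently, which is the kind of per-span label a span classifier needs for training.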

Train Segment-CNN

cd segment_cnn

python -u run_glue_no_trainer.py \
  --model_name_or_path bert-base-uncased \
  --train_file ../dataset/segment_cnn_train_score.json \
  --validation_file ../dataset/segment_cnn_test_score.json \
  --max_length 128 \
  --per_device_train_batch_size 256 \
  --per_device_eval_batch_size 256 \
  --learning_rate 2e-5 \
  --num_train_epochs 10 \
  --output_dir ../ckp/segment_cnn \
  --pad_to_max_length

Mask Toxic Span

The original RealToxicityPrompts dataset is not divided into training and test sets, so we split its prompts.jsonl into rtp_train.json and rtp_test.json.
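
The repository does not state how this split was made, so the following is only a sketch of one reproducible way to do it. The 10% test fraction, the fixed seed, and the one-JSON-object-per-line output format are assumptions; check what mask_toxic_span.py expects before relying on it.

```python
import json
import random


def split_records(records, test_fraction=0.1, seed=0):
    """Shuffle records deterministically and return (train, test)."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def split_jsonl(src, train_path, test_path, test_fraction=0.1, seed=0):
    """Read a JSON Lines file and write train/test files (one object per line)."""
    with open(src, encoding="utf-8") as f:
        records = [json.loads(line) for line in f if line.strip()]
    train, test = split_records(records, test_fraction, seed)
    for path, rows in ((train_path, train), (test_path, test)):
        with open(path, "w", encoding="utf-8") as f:
            for row in rows:
                f.write(json.dumps(row) + "\n")
    return len(train), len(test)
```

A call such as `split_jsonl("prompts.jsonl", "rtp_train.json", "rtp_test.json")` would produce the two files used in the following steps.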

cd segment_cnn

python ../utils/mask_toxic_span.py \
--input path/to/your/RealToxicityPrompts/rtp_train.json \
--output ../dataset/rtp_mask_span.json \
--model_path ../ckp/segment_cnn
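
As a rough picture of what span masking does, the sketch below replaces word ranges flagged as toxic with a placeholder. It is not the logic of mask_toxic_span.py: the `<mask>` token, the half-open (start, end) word ranges, and the merging of overlapping ranges are assumptions.

```python
def mask_toxic_spans(words, toxic_spans, mask_token="<mask>"):
    """Replace each toxic span of word indices with a single mask token.

    toxic_spans holds half-open (start, end) word ranges, for example
    predictions from a span classifier. Overlapping or adjacent ranges
    are merged first so they produce one mask token.
    """
    merged = []
    for start, end in sorted(toxic_spans):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])

    out, cursor = [], 0
    for start, end in merged:
        out.extend(words[cursor:start])  # keep the safe words before the span
        out.append(mask_token)           # collapse the toxic span
        cursor = end
    out.extend(words[cursor:])           # keep the remaining safe words
    return " ".join(out)


if __name__ == "__main__":
    prompt = "you are a stupid fool today".split()
    print(mask_toxic_spans(prompt, [(3, 4), (4, 5)]))
```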

Remember to use the Perspective API to verify that all masked prompts in rtp_mask_span.json are non-toxic!

Rephrase Masked Prompts

cd utils

python rephrase.py \
--file ../dataset/rtp_mask_span.json \
--save ../dataset/rtp_rephrase.json

Remember to use the Perspective API to verify that all rephrased prompts in rtp_rephrase.json are non-toxic!
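
The non-toxicity check on masked and rephrased prompts can be sketched as follows. This is not the repository's perspective_api_dataset.py; it only shows the request and response shape of the Perspective API `comments:analyze` endpoint. The helper names and the 0.5 threshold are assumptions, and `score_text` needs network access and a valid API key.

```python
import json
import urllib.request

ANALYZE_URL = "https://commentanalyzer.googleapis.com/v1alpha1/comments:analyze"


def build_request(text):
    """Request body asking the Perspective API for a TOXICITY score."""
    return {
        "comment": {"text": text},
        "languages": ["en"],
        "requestedAttributes": {"TOXICITY": {}},
    }


def toxicity_from_response(response):
    """Extract the summary TOXICITY probability from a response dict."""
    return response["attributeScores"]["TOXICITY"]["summaryScore"]["value"]


def score_text(text, api_key):
    """Score one text (performs a real HTTP POST)."""
    request = urllib.request.Request(
        f"{ANALYZE_URL}?key={api_key}",
        data=json.dumps(build_request(text)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as reply:
        return toxicity_from_response(json.load(reply))


def keep_non_toxic(scored_prompts, threshold=0.5):
    """Keep prompts whose score is below the (assumed) threshold."""
    return [prompt for prompt, score in scored_prompts if score < threshold]
```

Prompts that fail the check can be dropped or sent back through masking or rephrasing before moving on to the next step.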

Continual Generation

cd utils

python continuation_inference.py \
--model path/to/your/corresponding_model \
--file ../dataset/rtp_rephrase.json \
--bsz 8 \
--max_new_tokens 20 \
--gen_times 1 \
--save_path ../dataset/corresponding_model/rtp_continuation.json

python perspective_api_dataset.py \
--file ../dataset/corresponding_model/rtp_continuation.json \
--output ../dataset/corresponding_model/rtp_continuation_api.json \
--api_key <your_perspective_api_key>

Make Training Set

python ../utils/make_train_set.py \
--input ../dataset/corresponding_model/rtp_continuation_api.json \
--output ../dataset/corresponding_model/rtp_cmd.json

LLMs self-detoxification

cd ../train_cmd

sh train.sh

Data Release

We provide download links for all the original data used in our paper:

Dataset	Samples	Download Link
Real Toxicity Prompts	~100k	download
Jigsaw Toxic Comment Classification Challenge	~160k (train)	download

Citation

@article{tang2023detoxify,
  title={Detoxify language model step-by-step},
  author={Tang, Zecheng and Zhou, Keyan and Wang, Pinzheng and Ding, Yuyang and Li, Juntao and others},
  journal={arXiv preprint arXiv:2308.08295},
  year={2023}
}
