Code for the paper "P$^3$SUM: Preserving Author’s Perspective in News Summarization with Diffusion Language Models", accepted at NAACL 2024!

Acknowledgement

The diffusion part relies heavily on TESS and SSD-LM. We extend our sincere gratitude to the authors for generously sharing their code in advance and for their exceptional work. Please check out their remarkable papers as well.

Installation

conda create -n preserve python=3.8.5
conda activate preserve
conda env update --file env.yml

Run

First, fill in the paths and other hyperparameters in simplex-diffusion-main/commands_for_sum/run_sum.sh:

cd "path/to/simplex-diffusion-main"
model_name="roberta-base"
model_path="model/" #path to the finetuned diffusion model for summarization task
learning_rate=3e-5
max_steps=200000 #120000
CUDA_VISIBLE_DEVICES=1
datasetname="Sampled_Datasets/cnn_dm_500_raw" #path to the input datasets
dataset_config_name="3.0.0"
cache_dir="./"
preprocessing_num_workers=16
overwrite_cache=false
per_device_train_batch_size=8
per_device_eval_batch_size=16
do_train=false #whether to train the model
do_eval=false
do_predict=true #whether to test the model (generate summaries)
evaluation_strategy="no"
eval_steps=1000
report_to="tensorboard"
overwrite_output_dir=false
max_seq_length=512
max_target_length=100 #recommended to be close to the avg. length of the gold summaries
max_source_length=412 #number of tokens in the news context
val_max_target_length=100
skip_special_tokens=true
max_eval_samples=100
max_predict_samples=500
simplex_value=5
num_diffusion_steps=1000
lr_scheduler_type="linear"
pad_to_max_length=true
beta_schedule="squaredcos_improved_ddpm"
weight_decay=0.0
warmup_steps=2000
max_steps=200000
gradient_accumulation_steps=1
logging_steps=50
save_steps=20000
conditional_generation="ul2"
save_total_limit=1
tokenized_data_path="raw/xsum/roberta"
metric_for_best_model="rouge1"
if_control=true
ctr_model_name='POLITICS_model' #path to the off-the-shelf classifier
#ctr_model_name=None
ctr_opt_label_idx=0 #political leaning of the input news context
output_dir="out/" #path to save generated summaries
decode_ctr_lr=1000
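
Before launching, it can help to confirm that the paths filled in above actually exist. The following is a minimal sketch and is not part of the repository: `check_paths` is a hypothetical helper, and the example call in the comments simply reuses the placeholder values from the configuration above.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check (an assumption, not shipped with P^3Sum).
# It reports every configured path that does not exist on disk.

check_paths() {
    # Usage: check_paths NAME=PATH [NAME=PATH ...]
    # Prints each missing path to stderr; returns 1 if any path is missing.
    local status=0 pair name path
    for pair in "$@"; do
        name="${pair%%=*}"   # text before the first '='
        path="${pair#*=}"    # text after the first '='
        if [ ! -e "$path" ]; then
            echo "missing: $name -> $path" >&2
            status=1
        fi
    done
    return "$status"
}

# Example call with the placeholder values from run_sum.sh:
#   check_paths "model_path=model/" \
#               "datasetname=Sampled_Datasets/cnn_dm_500_raw" \
#               "ctr_model_name=POLITICS_model"
```

If the check passes, the script can presumably be launched with `bash simplex-diffusion-main/commands_for_sum/run_sum.sh`; that invocation is an assumption, since the exact launch command is not stated here.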

Resources

We have made the Sampled_Datasets available online for your convenience. For the off-the-shelf classifier, we recommend reaching out to the authors of the POLITICS paper. If you need the weights of the diffusion model for the summarization task, please feel free to email Yuhan Liu at [email protected].

Cite

@inproceedings{liu-etal-2024-p3sum,
    title = "{P}$^3${S}um: Preserving Author{'}s Perspective in News Summarization with Diffusion Language Models",
    author = "Liu, Yuhan  and
      Feng, Shangbin  and
      Han, Xiaochuang  and
      Balachandran, Vidhisha  and
      Park, Chan Young  and
      Kumar, Sachin  and
      Tsvetkov, Yulia",
    editor = "Duh, Kevin  and
      Gomez, Helena  and
      Bethard, Steven",
    booktitle = "Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)",
    month = jun,
    year = "2024",
    address = "Mexico City, Mexico",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2024.naacl-long.119",
    pages = "2154--2173",
    abstract = "In this work, we take a first step towards designing summarization systems that are faithful to the author{'}s intent, not only the semantic content of the article. Focusing on a case study of preserving political perspectives in news summarization, we find that existing approaches alter the political opinions and stances of news articles in more than 50{\%} of summaries, misrepresenting the intent and perspectives of the news authors. We thus propose P$^3$Sum, a diffusion model-based summarization approach controlled by political perspective classifiers. In P$^3$Sum, the political leaning of a generated summary is iteratively evaluated at each decoding step, and any drift from the article{'}s original stance incurs a loss back-propagated to the embedding layers, steering the political stance of the summary at inference time. Extensive experiments on three news summarization datasets demonstrate that P$^3$Sum outperforms state-of-the-art summarization systems and large language models by up to 13.7{\%} in terms of the success rate of stance preservation, with competitive performance on standard metrics of summarization quality. Our findings present a first analysis of preservation of pragmatic features in summarization, highlight the lacunae in existing summarization models{---}that even state-of-the-art models often struggle to preserve author{'}s intents{---}and develop new summarization systems that are more faithful to author{'}s perspectives.",
}
