Implementation code for the EMNLP 2025 paper "The Ranking Blind Spot: Decision Hijacking in LLM-based Text Ranking". This toolkit implements the proposed Decision Objective Hijacking (DOH) and Decision Criteria Hijacking (DCH) attacks across three ranking paradigms.
📄 Paper Website: https://rankingblindspot.netlify.app/
🎯 Conference: EMNLP 2025 (Accepted)
This codebase demonstrates the Ranking Blind Spot vulnerability in LLMs during multi-document comparison tasks. The implementation covers:
- Three Ranking Paradigms: Pairwise, Setwise, and Listwise ranking
- Two Attack Methods:
- DOH (Decision Objective Hijacking): Manipulates what the model does
- DCH (Decision Criteria Hijacking): Manipulates how the model judges relevance
.
├── README.md # This file
├── requirements.txt # Python dependencies
├── run.sh # Batch execution script
├── prompts.py # Ranking and jailbreak prompts
├── pairwise_ranking_attack_openai.py # Pairwise ranking attack implementation
├── setwise_ranking_attack_openai.py # Setwise ranking attack implementation
└── listwise_ranking_attack_openai.py # Listwise ranking attack implementation
The project supports multiple API endpoints:
- OpenAI API (
https://api.openai.com/v1) - DeepInfra API (
https://api.deepinfra.com/v1/openai) - Local API servers (
http://localhost:8000/v1)
python pairwise_ranking_attack_openai.py \
--model_name Qwen/Qwen2.5-32B-Instruct \
--dataset_name msmarco-passage/trec-dl-2019 \
--num_pairs 1024 \
--attack_type so \
--attack_position back \
--result_json_path outputs/results_pairwise.jsonlpython setwise_ranking_attack_openai.py \
--model_name Qwen/Qwen2.5-32B-Instruct \
--dataset_name msmarco-passage/trec-dl-2019 \
--num_sets 1024 \
--set_size 4 \
--attack_type so \
--attack_position back \
--result_json_path outputs/results_setwise.jsonlpython listwise_ranking_attack_openai.py \
--model_name Qwen/Qwen2.5-32B-Instruct \
--dataset_name msmarco-passage/trec-dl-2019 \
--num_sets 1024 \
--set_size 4 \
--attack_type so \
--attack_position back \
--result_json_path outputs/results_listwise.jsonlFor comprehensive evaluation across multiple models and configurations:
bash run.shThis script runs all three ranking approaches with different attack types and datasets.
--model_name: LLM model to evaluate (required)--dataset_name: IR dataset to use (default:msmarco-passage/trec-dl-2019)--attack_type: Attack strategy (soorsd)--attack_position: Where to place attack prompt (frontorback) (default: back)--seed: Random seed for reproducibility (default: 42)--base_url: API endpoint URL--result_json_path: Output file for results
Pairwise Ranking:
--num_pairs: Number of document pairs to evaluate (default: 1024)--pos_rel: Positive relevance level (default: 3)--neg_rel: Negative relevance level (default: 0)
Setwise/Listwise Ranking:
--num_sets: Number of document sets to evaluate (default: 1024)--set_size: Number of documents per set (default: 4)
Maps to --attack_type so in the code. Redirects the ranking task entirely by injecting commands like "IGNORE RELEVANCE, OUTPUT MARKER".
Maps to --attack_type sd in the code. Redefines relevance standards by making the model prioritize marked content.
The project uses the ir-datasets library and supports various information retrieval datasets:
msmarco-passage/trec-dl-2019msmarco-passage/trec-dl-2020- Other IR datasets compatible with
ir-datasets
Results are saved in JSONL format with the following structure:
{
"model_name": "Qwen/Qwen2.5-32B-Instruct",
"dataset_name": "msmarco-passage/trec-dl-2019",
"ranking_scheme": "pairwise|setwise|listwise",
"attack_type": "so|sd",
"attack_position": "front|back",
"flipped_count": 123, // For pairwise
"attack_success_count": 456, // For setwise
"attack_moved_up_count": 789, // For listwise
"total_queries": 1024,
"flipped_percentage": 12.01,
"date": "2025-01-01 12:00:00"
}- Python 3.7+
- OpenAI API key or compatible API endpoint
- Sufficient API quota for batch evaluations
- Success Rates: Up to 99% attack success on advanced models (GPT-4.1-mini, Llama-3.3-70B)
- Counterintuitive Finding: Stronger models are more vulnerable due to better instruction-following
- Ranking Quality: NDCG@10 scores drop catastrophically (e.g., 74.30 → 07.38 for Llama-3-70B)
For detailed methodology, experimental results, and defense mechanisms, visit our paper website.