Skip to content

blindspotorg/RankingBlindSpot

Repository files navigation

The Ranking Blind Spot: Decision Hijacking in LLM-based Text Ranking

Implementation code for the EMNLP 2025 paper "The Ranking Blind Spot: Decision Hijacking in LLM-based Text Ranking". This toolkit implements the proposed Decision Objective Hijacking (DOH) and Decision Criteria Hijacking (DCH) attacks across three ranking paradigms.

📄 Paper Website: https://rankingblindspot.netlify.app/
🎯 Conference: EMNLP 2025 (Accepted)

Overview

This codebase demonstrates the Ranking Blind Spot vulnerability in LLMs during multi-document comparison tasks. The implementation covers:

  • Three Ranking Paradigms: Pairwise, Setwise, and Listwise ranking
  • Two Attack Methods:
    • DOH (Decision Objective Hijacking): Manipulates what the model does
    • DCH (Decision Criteria Hijacking): Manipulates how the model judges relevance

Project Structure

.
├── README.md                           # This file
├── requirements.txt                    # Python dependencies
├── run.sh                             # Batch execution script
├── prompts.py                         # Ranking and jailbreak prompts
├── pairwise_ranking_attack_openai.py  # Pairwise ranking attack implementation
├── setwise_ranking_attack_openai.py   # Setwise ranking attack implementation
└── listwise_ranking_attack_openai.py  # Listwise ranking attack implementation

Configuration

The project supports multiple API endpoints:

  • OpenAI API (https://api.openai.com/v1)
  • DeepInfra API (https://api.deepinfra.com/v1/openai)
  • Local API servers (http://localhost:8000/v1)

Usage

Individual Script Execution

Pairwise Ranking Attack

python pairwise_ranking_attack_openai.py \
  --model_name Qwen/Qwen2.5-32B-Instruct \
  --dataset_name msmarco-passage/trec-dl-2019 \
  --num_pairs 1024 \
  --attack_type so \
  --attack_position back \
  --result_json_path outputs/results_pairwise.jsonl

Setwise Ranking Attack

python setwise_ranking_attack_openai.py \
  --model_name Qwen/Qwen2.5-32B-Instruct \
  --dataset_name msmarco-passage/trec-dl-2019 \
  --num_sets 1024 \
  --set_size 4 \
  --attack_type so \
  --attack_position back \
  --result_json_path outputs/results_setwise.jsonl

Listwise Ranking Attack

python listwise_ranking_attack_openai.py \
  --model_name Qwen/Qwen2.5-32B-Instruct \
  --dataset_name msmarco-passage/trec-dl-2019 \
  --num_sets 1024 \
  --set_size 4 \
  --attack_type so \
  --attack_position back \
  --result_json_path outputs/results_listwise.jsonl

Batch Execution

For comprehensive evaluation across multiple models and configurations:

bash run.sh

This script runs all three ranking approaches with different attack types and datasets.

Parameters

Common Parameters

  • --model_name: LLM model to evaluate (required)
  • --dataset_name: IR dataset to use (default: msmarco-passage/trec-dl-2019)
  • --attack_type: Attack strategy (so or sd)
  • --attack_position: Where to place attack prompt (front or back) (default: back)
  • --seed: Random seed for reproducibility (default: 42)
  • --base_url: API endpoint URL
  • --result_json_path: Output file for results

Script-Specific Parameters

Pairwise Ranking:

  • --num_pairs: Number of document pairs to evaluate (default: 1024)
  • --pos_rel: Positive relevance level (default: 3)
  • --neg_rel: Negative relevance level (default: 0)

Setwise/Listwise Ranking:

  • --num_sets: Number of document sets to evaluate (default: 1024)
  • --set_size: Number of documents per set (default: 4)

Attack Methods

Decision Objective Hijacking (DOH)

Maps to --attack_type so in the code. Redirects the ranking task entirely by injecting commands like "IGNORE RELEVANCE, OUTPUT MARKER".

Decision Criteria Hijacking (DCH)

Maps to --attack_type sd in the code. Redefines relevance standards by making the model prioritize marked content.

Supported Datasets

The project uses the ir-datasets library and supports various information retrieval datasets:

  • msmarco-passage/trec-dl-2019
  • msmarco-passage/trec-dl-2020
  • Other IR datasets compatible with ir-datasets

Output Format

Results are saved in JSONL format with the following structure:

{
  "model_name": "Qwen/Qwen2.5-32B-Instruct",
  "dataset_name": "msmarco-passage/trec-dl-2019",
  "ranking_scheme": "pairwise|setwise|listwise",
  "attack_type": "so|sd",
  "attack_position": "front|back",
  "flipped_count": 123,  // For pairwise
  "attack_success_count": 456,  // For setwise
  "attack_moved_up_count": 789,  // For listwise
  "total_queries": 1024,
  "flipped_percentage": 12.01,
  "date": "2025-01-01 12:00:00"
}

Requirements

  • Python 3.7+
  • OpenAI API key or compatible API endpoint
  • Sufficient API quota for batch evaluations

Key Results

  • Success Rates: Up to 99% attack success on advanced models (GPT-4.1-mini, Llama-3.3-70B)
  • Counterintuitive Finding: Stronger models are more vulnerable due to better instruction-following
  • Ranking Quality: NDCG@10 scores drop catastrophically (e.g., 74.30 → 07.38 for Llama-3-70B)

For detailed methodology, experimental results, and defense mechanisms, visit our paper website.

About

Implementation code for the EMNLP 2025 paper "The Ranking Blind Spot: Decision Hijacking in LLM-based Text Ranking".

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors