AdaFuse: Adaptive Ensemble Decoding for Large Language Models

Overview

AdaFuse is an adaptive ensemble decoding framework that dynamically combines multiple large language models (LLMs) during inference. Unlike traditional fixed-granularity ensemble methods, AdaFuse adapts its fusion strategy on the fly based on model confidence and generation context, which yields consistent gains across diverse NLP tasks.

Key Features

🎯 Adaptive Word-Level Fusion: Dynamically adjusts fusion granularity during generation based on decoding context
🔍 Confidence-Guided Decoding: Uses uncertainty-based criteria to decide when to apply ensembling
🚀 Diversity-Aware Scaling: Explores alternative continuations only when needed, balancing effectiveness and efficiency
📈 Strong Performance: Achieves a 6.88% average relative improvement over strong ensemble baselines
🔧 Training-Free: Works with pre-trained LLMs without any additional training

Method

AdaFuse's adaptive ensemble decoding strategy has four components:

Confidence Assessment: At each decoding step, evaluates model confidence using top-1 margin (Δ_k)
Adaptive Commitment:
- High confidence → Generate longer word spans directly
- Low confidence → Invoke diversity-aware exploration
Ensemble Decision: Generates multiple candidate continuations and selects the best one based on ensemble scoring
Mid-Generation Correction: Enables flexible correction during generation, not just post-generation
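
The steps above can be illustrated with a minimal, dependency-free sketch. It is not the repository's implementation: it assumes the top-1 margin is the gap between the two highest token probabilities, that a step counts as "high confidence" when the margin meets a threshold, and that ensemble scoring averages one score per model. The helper names (`top1_margin`, `should_commit`, `select_by_ensemble`) are hypothetical, and how the threshold relates to `--theta_delta` is not confirmed by this README.

```python
import math
from typing import Dict, List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of raw scores."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top1_margin(probs: Sequence[float]) -> float:
    """Assumed margin definition: top-1 probability minus top-2 probability."""
    first, second = sorted(probs, reverse=True)[:2]
    return first - second


def should_commit(probs: Sequence[float], threshold: float) -> bool:
    """True when the margin meets the threshold (assumed comparison direction)."""
    return top1_margin(probs) >= threshold


def select_by_ensemble(candidates: Dict[str, List[float]]) -> str:
    """Pick the candidate span with the highest mean score across models.

    `candidates` maps each candidate continuation to one score per model,
    for example a length-normalized log-probability. Mean aggregation is an
    assumption of this sketch.
    """
    return max(
        candidates,
        key=lambda span: sum(candidates[span]) / len(candidates[span]),
    )


confident = softmax([4.0, 1.0, 0.5])  # one clearly dominant token
uncertain = softmax([2.0, 1.9, 0.5])  # two nearly tied tokens

print(should_commit(confident, threshold=0.7))  # True: commit to a longer span
print(should_commit(uncertain, threshold=0.7))  # False: explore alternatives
print(select_by_ensemble({" Paris": [-0.2, -0.4], " Lyon": [-1.5, -0.3]}))
```

In this toy example the peaked distribution has a margin of roughly 0.88, so generation would continue directly, while the near-tie has a margin of roughly 0.04 and would trigger candidate exploration followed by ensemble selection.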

Results

AdaFuse demonstrates consistent improvements across diverse NLP tasks, including open-domain question answering (Natural Questions, TriviaQA), reading comprehension (SQuAD), arithmetic reasoning (GSM8K), and machine translation (FLORES). The framework achieves an average relative improvement of 6.88% over strong ensemble baselines, with particularly notable gains on challenging QA tasks.

Environment & Requirements

Prerequisites

Python 3.8 or higher
CUDA-compatible GPU (recommended for inference)
16 GB or more of GPU memory for 8B models

Required Packages

torch>=2.0.0
transformers>=4.30.0
datasets>=2.0.0
accelerate>=0.20.0
sacrebleu>=2.0.0
word2number>=1.1
sentencepiece>=0.1.99
protobuf>=3.20.0
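
To confirm that an environment satisfies these minimums before running a long job, a small check along the following lines may help. This is a sketch, not part of the repository: the helper names are hypothetical, and the version parsing is deliberately simple (leading numeric components only), so the `packaging` library would be a more robust choice for unusual version strings.

```python
import re
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Tuple

# Minimum versions copied from the package list above.
REQUIRED = {
    "torch": "2.0.0",
    "transformers": "4.30.0",
    "datasets": "2.0.0",
    "accelerate": "0.20.0",
    "sacrebleu": "2.0.0",
    "word2number": "1.1",
    "sentencepiece": "0.1.99",
    "protobuf": "3.20.0",
}


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn '2.1.0+cu118' into (2, 1, 0), stopping at the first non-numeric part."""
    numbers = []
    for piece in text.split("+")[0].split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        numbers.append(int(match.group()))
    return tuple(numbers)


def meets_minimum(installed: str, minimum: str) -> bool:
    """Compare two versions after zero-padding them to the same length."""
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_environment() -> Dict[str, str]:
    """Return 'ok', 'too old', or 'missing' for every required package."""
    report = {}
    for name, minimum in REQUIRED.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            report[name] = "missing"
            continue
        report[name] = "ok" if meets_minimum(installed, minimum) else "too old"
    return report


for package, status in check_environment().items():
    print(f"{package}: {status}")
```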

Usage

Two-Model Ensemble

python AdaFuse_two_models.py \
  --test_set path/to/test_set.jsonl \
  --prompts path/to/prompts.txt \
  --model_path1 meta-llama/Llama-3.1-8B-Instruct \
  --model_path2 internlm/internlm3-8b-instruct \
  --output_file output.jsonl \
  --theta_delta 0.7 \
  --max_words 3 \
  --max_total_tokens 512

Key Parameters

--theta_delta: Confidence threshold for adaptive decoding
- Higher values → More conservative, fewer ensemble decisions
- Lower values → More exploration, more ensemble decisions
--max_words: Maximum words to generate per step
--max_total_tokens: Maximum total tokens for generation
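
Because the best threshold is likely to depend on the task and model pair, it can be useful to sweep `--theta_delta` and compare outputs. The sketch below only builds the command lines, using the flags from the two-model example; the sweep values 0.5 and 0.9 and the output file naming are illustrative assumptions, and `build_command` is a hypothetical helper rather than something the repository provides.

```python
import shlex
from typing import List


def build_command(theta_delta: float, max_words: int, output_file: str) -> List[str]:
    """Assemble one AdaFuse_two_models.py invocation as an argument list."""
    return [
        "python", "AdaFuse_two_models.py",
        "--test_set", "path/to/test_set.jsonl",
        "--prompts", "path/to/prompts.txt",
        "--model_path1", "meta-llama/Llama-3.1-8B-Instruct",
        "--model_path2", "internlm/internlm3-8b-instruct",
        "--output_file", output_file,
        "--theta_delta", str(theta_delta),
        "--max_words", str(max_words),
        "--max_total_tokens", "512",
    ]


# One command per threshold; each run writes to its own output file.
commands = [
    build_command(theta, max_words=3, output_file=f"output_theta_{theta}.jsonl")
    for theta in (0.5, 0.7, 0.9)
]

for command in commands:
    print(shlex.join(command))
```

Each entry in `commands` can be passed to `subprocess.run(command, check=True)` to execute the runs sequentially.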

