MACA: Multi-Agent Consensus Alignment

Internalizing Self-Consistency in Language Models through Multi-Agent Debate


📄 Paper: arXiv:2509.15172

Overview

MACA trains language models to be more consistent reasoners through multi-agent debate and consensus-based reinforcement learning.

Key Features:

  • 🤖 Multi-Agent Debate: Orchestrate debates between agents for improved reasoning
  • 🎯 Consensus Training: Post-train on debate outputs using agreement patterns as rewards (see the sketch below)
  • ⚡ Distributed Processing: Multi-GPU parallel training with QLoRA adapters
  • 📊 Analysis Tools: Built-in performance tracking and visualization
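
Conceptually, a debate runs for a few rounds: every agent first answers independently, then revises after seeing its peers' answers, and the majority vote over the final answers defines the consensus. The sketch below illustrates that loop with hypothetical helper names; the repo's actual orchestration lives in debate.py and model.py and is more involved.

from collections import Counter

def debate_round(agents, question, num_rounds=2):
    """agents: callables mapping a prompt string to an answer string
    (stand-ins for the repo's QLoRA-adapted agents)."""
    # Round 1: every agent answers the question independently.
    answers = [agent(question) for agent in agents]
    # Later rounds: each agent sees its peers' answers and may revise.
    for _ in range(num_rounds - 1):
        answers = [
            agent(
                f"{question}\nPeer answers: "
                + "; ".join(a for j, a in enumerate(answers) if j != i)
                + "\nReconsider and give your final answer."
            )
            for i, agent in enumerate(agents)
        ]
    # Consensus = majority vote over the final-round answers; the vote
    # pattern is what MACA turns into a training signal.
    consensus, votes = Counter(answers).most_common(1)[0]
    return answers, consensus, votes

Because each agent is just a callable from prompt to answer, the loop can be exercised with stub functions before wiring in real models.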

Quick Start

Installation

conda env create -f env.yml
pip install -e .

Multi-Agent Training

python main.py \
  --model qwen2b --dataset gsm8k \
  --gpus_per_model 1 --max_concurrent_tasks 4 \
  --train_size 1500 --test_size 500 \
  --lora_r 128 --lora_alpha 128 \
  --dpo --epoch_dpo 3 --batch_dpo 6 --lr_dpo 1e-5 --beta_dpo 0.1 \
  --gradient_accumulation_steps_dpo 4 \
  --seed 1 --wandb
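
With --dpo, debate transcripts become preference data: responses whose final answer matches the majority are treated as preferred, dissenting responses as rejected. Below is a minimal sketch of that pairing, assuming TRL DPOTrainer's prompt/chosen/rejected record format; the parse argument is a hypothetical stand-in for the repo's answer extraction (parser.py).

from collections import Counter

def mv_dpo_pairs(question, responses, parse):
    # Vote over parsed final answers, not raw text, so differently
    # worded reasoning chains can still agree.
    finals = [parse(r) for r in responses]
    majority, _ = Counter(finals).most_common(1)[0]
    chosen = [r for r, f in zip(responses, finals) if f == majority]
    rejected = [r for r, f in zip(responses, finals) if f != majority]
    # Every (majority, dissent) combination yields one preference pair.
    return [
        {"prompt": question, "chosen": c, "rejected": r}
        for c in chosen
        for r in rejected
    ]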

Single-Agent Training

python maca_single_agent.py \
  --output_dir q2b_sa_runs --model qwen2b \
  --phase kto --kto \
  --train_datasets math gsm8k mathqa \
  --test_datasets math gsm8k mathqa svamp gpqa csqa \
  --use_full_test \
  --lora_r_range 64 --lora_alpha_range 64 --lr_kto 1e-5 \
  --evaluation_batch_size 24 --wandb
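
Unlike DPO, KTO consumes unpaired examples with binary desirability labels, which maps naturally onto majority voting: agreement with the consensus is desirable, disagreement is not. A minimal sketch, assuming TRL KTOTrainer's prompt/completion/label format and the same hypothetical parse helper as above:

from collections import Counter

def mv_kto_examples(question, responses, parse):
    finals = [parse(r) for r in responses]
    majority, _ = Counter(finals).most_common(1)[0]
    # Each response becomes one unpaired example; the boolean label
    # records whether it sided with the majority.
    return [
        {"prompt": question, "completion": r, "label": f == majority}
        for r, f in zip(responses, finals)
    ]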

Key Arguments

  • --model: Quantized base model to use (llama1b/3b/8b, phi4b, qwen2b/7b, gemma4b, mistral7b)
  • --dataset: Dataset (gsm8k, math, mathqa, gpqa, svamp, csqa)
  • --agents: Number of agents in the debate (default: 3)
  • --finetune: Enable Majority Vote Supervised Fine-Tuning (SFT)
  • --post_train: Enable Majority Vote Group-Relative Policy Optimization training (GRPO)
  • --dpo: Enable Majority Vote Direct Preference Optimization training
  • --kto: Enable Majority Vote Kahneman-Tversky Optimization training
  • --use_consensus_reward: Enable consensus-based rewards

W&B Logging

  • --wandb: Enable Weights & Biases logging
  • --project_name: W&B project name (default: llm-marl; requires --wandb)
  • --entity_name: W&B entity/team name (default: llm-marl; requires --wandb)

Project Structure

maca/
├── main.py                    # Main training entry point
├── maca_single_agent.py       # Single agent hyperparameter tuning and testing
├── model.py                   # Agent implementation and reward functions
├── debate.py                  # Multi-agent debate orchestration
├── orchestrator.py            # Training coordination and management
├── data.py                    # Dataset loading and preprocessing
├── parser.py                  # Answer parsing and grading utilities
├── args.py                    # Command-line argument definitions
├── utils.py                   # Utility functions and helpers
├── scheduler.py               # Dynamic job scheduling for adapters
├── train_agent_subprocess.py  # Subprocess training management
├── analyze_experiment_performance.py  # Debate results analysis
├── read_debate_performance.py # Utilities for reading debate results
├── data/                      # Dataset storage and splits
├── experiments/               # Experiment outputs and results
└── checkpoints/               # Model checkpoints and adapters

Training Methods

Built on Hugging Face TRL, MACA supports multiple post-training paradigms with majority-vote (MV) variants:

  • MV-SFT: Supervised fine-tuning on consensus examples
  • MV-GRPO: Reinforcement learning with consensus rewards
  • MV-KTO/DPO: Preference optimization methods

See args.py for complete argument documentation.
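
For MV-GRPO, one illustrative reading of consensus rewards (--use_consensus_reward) is a majority-vote agreement score over the group of completions sampled for a prompt; GRPO then normalizes rewards within the group, so the learning signal is purely relative agreement. This is a sketch of the idea, not the repo's exact reward (the real reward functions live in model.py):

from collections import Counter

def consensus_reward(completions, parse):
    # Reward each sampled completion for agreeing with the group's
    # majority answer; parse is a hypothetical answer extractor.
    finals = [parse(c) for c in completions]
    majority, _ = Counter(finals).most_common(1)[0]
    return [1.0 if f == majority else 0.0 for f in finals]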

Citation

This work was developed at Meta AI in collaboration with Meta Superintelligence Labs and the LIINC Lab at Columbia University.

If you use this framework in your research, please cite:

@misc{samanta2025maca,
  title={Internalizing Self-Consistency in Language Models: Multi-Agent Consensus Alignment},
  author={Ankur Samanta and Akshayaa Magesh and Youliang Yu and Runzhe Wu and Ayush Jain and Daniel Jiang and Boris Vidolov and Paul Sajda and Yonathan Efroni and Kaveh Hassani},
  year={2025},
  eprint={2509.15172},
  archivePrefix={arXiv},
  primaryClass={cs.AI},
  url={https://doi.org/10.48550/arXiv.2509.15172}
}

License

MACA is MIT licensed, as found in the LICENSE file.
