[EMNLP 2025] COM-BOM: Bayesian Exemplar Search for Efficiently Exploring the Accuracy-Calibration Pareto Frontier

GaoxiangLuo/COM-BOM

COM-BOM [EMNLP'25]

This repository contains the reference BoTorch implementation of COM-BOM for the following paper:

COM-BOM: Bayesian Exemplar Search for Efficiently Exploring the Accuracy-Calibration Pareto Frontier
Gaoxiang Luo, Aryan Deshwal
In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, 2025

[Paper] [BibTeX]


Quickstart

Installation

We manage dependencies with uv. Running the command below recreates the environment described in pyproject.toml, which requires Python >=3.12.

uv sync

vLLM Serving

Training and evaluation need a locally hosted LLM. The default config (configs/mmlupro.yaml) expects Qwen/Qwen3-8B to be served at http://localhost:8000.

# 2xA100-40GB
uv run vllm serve Qwen/Qwen3-8B --port 8000 --chat-template misc/qwen3_nonthinking.jinja --enable-prefix-caching --gpu-memory-utilization 0.75 -tp 2

# 1xA100-80GB / 1xH100-80GB
uv run vllm serve Qwen/Qwen3-8B --port 8000 --chat-template misc/qwen3_nonthinking.jinja --enable-prefix-caching --gpu-memory-utilization 0.75
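
Before starting training, it can help to confirm that the server is up and serving the expected model. The following is a minimal sketch, not part of this repository: it assumes the server exposes vLLM's OpenAI-compatible /v1/models route on the port used above, and the helper names (models_endpoint, parse_model_ids, served_models) are hypothetical.

```python
import json
import urllib.request


def models_endpoint(base_url: str) -> str:
    """Build the OpenAI-compatible model-listing URL for the server."""
    return base_url.rstrip("/") + "/v1/models"


def parse_model_ids(payload: dict) -> list[str]:
    """Extract model IDs from a /v1/models style response body."""
    return [entry["id"] for entry in payload.get("data", [])]


def served_models(base_url: str = "http://localhost:8000", timeout: float = 5.0) -> list[str]:
    """Return the model IDs the server reports, or [] if it cannot be reached."""
    try:
        with urllib.request.urlopen(models_endpoint(base_url), timeout=timeout) as resp:
            return parse_model_ids(json.load(resp))
    except (OSError, ValueError):
        # OSError covers refused connections and timeouts; ValueError covers bad JSON.
        return []


if __name__ == "__main__":
    models = served_models()
    status = "ready" if "Qwen/Qwen3-8B" in models else "not reachable or model missing"
    print(f"vLLM server {status}: {models}")
```

If the check reports that the model is missing, the model may still be loading; large checkpoints can take a while to become available after the serve command starts.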

Training

Kick off the Bayesian optimization pipeline with your chosen configuration.

uv run python train.py --config configs/mmlupro.yaml

Testing

Evaluate Pareto candidates and baselines using the same configuration.

uv run python test.py --config configs/mmlupro.yaml
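
To make "Pareto candidates" concrete, here is a small illustrative sketch of non-dominated filtering over two objectives: accuracy (higher is better) and a calibration error (lower is better). It is not the repository's evaluation code. The Candidate type, the function names, and the numbers are all made up for illustration, and the choice of calibration metric is an assumption.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    name: str
    accuracy: float           # higher is better
    calibration_error: float  # lower is better (metric choice is an assumption)


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is no worse than b on both objectives and strictly better on one."""
    no_worse = a.accuracy >= b.accuracy and a.calibration_error <= b.calibration_error
    strictly_better = a.accuracy > b.accuracy or a.calibration_error < b.calibration_error
    return no_worse and strictly_better


def pareto_front(candidates: list[Candidate]) -> list[Candidate]:
    """Keep the candidates that no other candidate dominates."""
    return [c for c in candidates if not any(dominates(o, c) for o in candidates)]


# Made-up numbers purely for illustration, not results from the paper.
pool = [
    Candidate("A", 0.70, 0.12),
    Candidate("B", 0.68, 0.05),
    Candidate("C", 0.66, 0.09),  # dominated by B on both objectives
    Candidate("D", 0.72, 0.20),
]
print([c.name for c in pareto_front(pool)])  # ['A', 'B', 'D']
```

Each surviving candidate represents a different trade-off: none can be improved on one objective without giving something up on the other.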

Repository Structure

  • train.py: Task-agnostic Bayesian optimization driver that loads YAML configurations
  • test.py: Evaluation script for trained models and baselines
  • configs/: YAML configuration templates for experiments
  • combom/: Core optimization components:
    • Trust region management
    • Gaussian process modeling with categorical kernels
    • Multi-objective acquisition function optimization
  • tasks/: Task definitions with evaluation logic and metric specifications
  • utils/: Shared utilities (random seeding, etc.)
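
As a rough illustration of what a categorical kernel for a Gaussian process can look like, the sketch below implements one common choice in pure Python: the exponential of the negative normalized Hamming distance. This README does not say which kernel combom/ uses, so treat the form, the function names, and the example inputs as assumptions rather than a description of the actual implementation.

```python
import math


def hamming_kernel(x: list[int], y: list[int], lengthscale: float = 1.0) -> float:
    """k(x, y) = exp(-(fraction of positions where x and y differ) / lengthscale)."""
    if len(x) != len(y) or not x:
        raise ValueError("inputs must be non-empty and of equal length")
    mismatch = sum(a != b for a, b in zip(x, y)) / len(x)
    return math.exp(-mismatch / lengthscale)


def gram_matrix(points: list[list[int]], lengthscale: float = 1.0) -> list[list[float]]:
    """Pairwise kernel values between all points."""
    return [[hamming_kernel(p, q, lengthscale) for q in points] for p in points]


# Hypothetical encoding: each point picks one exemplar index for each of three prompt slots.
points = [[0, 4, 7], [0, 4, 2], [5, 1, 3]]
K = gram_matrix(points)
for row in K:
    print([round(v, 3) for v in row])
```

Selections that share more exemplars get a kernel value closer to 1, so the surrogate model treats them as more likely to behave similarly.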

Citing COM-BOM

If you use COM-BOM in your research, find the code useful, or would like to acknowledge our work, please consider citing our paper:

@inproceedings{luo-deshwal-2025-com,
    title = "COM-BOM: Bayesian Exemplar Search for Efficiently Exploring the Accuracy-Calibration Pareto Frontier",
    author = "Luo, Gaoxiang and Deshwal, Aryan",
    booktitle = "Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing",
    month = nov,
    year = "2025",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2025.emnlp-main.1027/",
    pages = "20350--20363",
    ISBN = "979-8-89176-332-6",
}
