We welcome everyone to open an issue for any related work we haven't covered, and we'll try to address it in the next release!
- [2026-05] 🔥 Paper available on preprints.org: https://www.preprints.org/manuscript/202605.1041
If you find this work helpful, please cite us:
@article{bai2026inferencetime,
title = {Inference-Time Control for Trustworthy Large Language Models},
author = {Bai, Yuyang and Liu, Zheyuan and Yan, Han and Xu, Zhangchen and Wan, Yixin and Chen, Canyu and Wang, Zehong and Yuan, Xiangchi and Huang, Yue and Dou, Guangyao and Zhang, Yuji and Zhu, Hangxiao and Li, Zhuofeng and Li, Manling and Zhang, Xiangliang and Bansal, Mohit and Koyejo, Sanmi and Chang, Kai-Wei and Zhang, Yu and Jiang, Meng},
journal = {Preprints},
year = {2026},
month = {May},
publisher = {Preprints},
doi = {10.20944/preprints202605.1041.v1},
url = {https://doi.org/10.20944/preprints202605.1041.v1}
}

This work covers inference-time control methods for building trustworthy LLMs, organized into three tiers:
- Tier 1: External Controls. Treat the model as a black box. Shape behavior by modifying inputs, decoding process, or outputs, without changing internal weights or activations.
  - Context Engineering: Strategic prompt design through rules, instructions, or few-shot exemplars.
  - Guardrails: External modules that inspect inputs/outputs against safety or policy constraints.
  - Decoding Strategies: Manipulation of token-level distributions during generation.
- Tier 2: Internal Manipulations. Require white-box access. Intervene directly in the model's internal computation.
  - Representation Engineering: Direct modification of internal activations via steering vectors.
  - Unlearning: Targeted removal of information, behaviors, or biases from a pre-trained model.
  - Pruning: Post-training removal of weights, neurons, or attention heads for trust-related effects.
- Tier 3: System-Level Orchestration. Coordinate multiple LLM agents through structured interaction patterns.
  - Multi-Agent Systems: Coordinated agent interactions such as debate or cross-verification.
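
To make the Tier 1 categories concrete, here is a minimal, illustrative sketch of two external controls: an input guardrail and a decoding-time token mask. It is not taken from the paper or from any specific method in this list. The toy vocabulary, the blocklists, and all function names are hypothetical and chosen only for illustration. A real system would work on a model's actual tokenizer and logits.

```python
import math

# Hypothetical toy vocabulary and policies (assumptions for illustration only).
VOCAB = ["hello", "world", "secret", "password", "."]
BLOCKED_TOKENS = {"secret", "password"}
BLOCKED_INPUT_PHRASES = ("ignore previous instructions",)


def input_guardrail(prompt: str) -> bool:
    """Guardrail: return True if the prompt passes a simple phrase check."""
    lowered = prompt.lower()
    return not any(phrase in lowered for phrase in BLOCKED_INPUT_PHRASES)


def softmax(logits):
    """Softmax that treats -inf logits as zero probability.

    Assumes at least one logit is finite.
    """
    peak = max(x for x in logits if x != -math.inf)
    exps = [0.0 if x == -math.inf else math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def constrained_distribution(logits):
    """Decoding strategy: mask blocked tokens, then renormalize."""
    masked = [
        -math.inf if token in BLOCKED_TOKENS else logit
        for token, logit in zip(VOCAB, logits)
    ]
    return softmax(masked)


def greedy_pick(logits):
    """Pick the most likely token under the constrained distribution."""
    probs = constrained_distribution(logits)
    best = max(range(len(VOCAB)), key=probs.__getitem__)
    return VOCAB[best]
```

Both controls sit outside the model: the guardrail inspects the input text, and the mask edits the token distribution the model has already produced, so neither touches weights or activations. Tier 2 methods, by contrast, would need access to the hidden states themselves.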
We thank all the researchers who contributed to this field. This list is maintained by the authors. If you find any missing papers or errors, please open an issue.
