This repo contains the source code for the paper: Predicting Task Performance with Context-aware Scaling Laws.
📃 [Paper] • 💻 [GitHub] • 🤗 [Hugging Face]
Clone this repository and install the dependencies. This codebase was built and tested with Python 3.10.
```bash
git clone https://github.com/wang-research-lab/context-scaling.git
cd context-scaling
conda create --name context-scaling python=3.10
conda activate context-scaling
pip install -r requirements.txt
pip install flash-attn==2.4.3 --no-build-isolation
```

Our codebase supports both training (context extension via YaRN) and evaluation.
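If the flash-attn build succeeds, a quick import check can confirm the wheel works in your environment. This is a minimal sanity check, not part of the repo's setup steps:

```bash
# Optional sanity check (not part of the repo's instructions):
# confirm flash-attn imports cleanly and reports the installed version.
python -c "import flash_attn; print(flash_attn.__version__)"
```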
We reimplement the YaRN context extension method in train.py, closely following the official implementation. We use Accelerate and FSDP for distributed training. Configuration is managed via Hydra, and the base configuration can be found at configs/training/train.yaml. For example, to extend the context limit of Llama-2-7b to 8k tokens, run:
```bash
accelerate launch --config_file configs/training/accelerate/fsdp_8gpu.yaml train.py \
    run_name=Yarn-Llama-2-7b-8k \
    model.pretrained_model_name_or_path=meta-llama/Llama-2-7b-hf \
    rope_scaling.factor=2.0
```

Extended checkpoints of Llama-2 7b and 13b can be found here.
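Other target lengths follow the same pattern: Llama-2's pretraining context is 4,096 tokens, so the extended context is roughly 4,096 × rope_scaling.factor (factor=2.0 above gives 8k). As a sketch, a ~16k extension would look like the following; the run_name here is illustrative, not a released checkpoint:

```bash
# Hypothetical example: extend Llama-2-7b to ~16k tokens (4096 * 4.0).
# run_name is illustrative; adjust it to your own naming scheme.
accelerate launch --config_file configs/training/accelerate/fsdp_8gpu.yaml train.py \
    run_name=Yarn-Llama-2-7b-16k \
    model.pretrained_model_name_or_path=meta-llama/Llama-2-7b-hf \
    rope_scaling.factor=4.0
```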
Evaluation code lives in eval.py. Configuration is managed via Hydra; the base configuration is at configs/eval/eval.yaml, and supported datasets are listed under configs/eval/dataset. For example, to evaluate Yarn-Llama-2-7b-128k on GSM8K, run:
```bash
python eval.py dataset=gsm model=WangResearchLab/Yarn-Llama-2-7b-128k
```

Note: To enable vLLM to process inputs that exceed the model's context window, set the environment variable VLLM_ALLOW_LONG_MAX_MODEL_LEN to 1. Furthermore, when using the base 7b and 13b models, set the environment variable PATCH_VLLM_ROPE to 1 to avoid an illegal memory access error.
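One way to set these variables is to export them inline with the evaluation command; this is a sketch, and the exact invocation depends on your shell and model choice:

```bash
# Allow vLLM to accept inputs longer than the model's declared context window,
# and apply the RoPE patch when evaluating the base (non-extended) models.
VLLM_ALLOW_LONG_MAX_MODEL_LEN=1 PATCH_VLLM_ROPE=1 \
    python eval.py dataset=gsm model=meta-llama/Llama-2-7b-hf
```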
If you find this work useful or relevant to your research, please cite our paper:
```bibtex
@misc{contextscaling2025,
    title={Predicting Task Performance with Context-aware Scaling Laws},
    author={Kyle Montgomery* and David Park* and Jianhong Tu and Michael Bendersky and Beliz Gunel and Dawn Song and Chenguang Wang},
    year={2025},
    archivePrefix={arXiv},
    url={https://arxiv.org/abs/2510.14919}
}
```