
Predicting Task Performance with Context-aware Scaling Laws

This repo contains the source code for the paper: Predicting Task Performance with Context-aware Scaling Laws.

📃 [Paper] • 💻 [GitHub] • 🤗 [Hugging Face]

Installation

Clone this repository and install the dependencies. This codebase was built and tested with Python 3.10.

git clone https://github.com/wang-research-lab/context-scaling.git
cd context-scaling

conda create --name context-scaling python=3.10
conda activate context-scaling

pip install -r requirements.txt
pip install flash-attn==2.4.3 --no-build-isolation
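
As an optional sanity check that the environment is set up correctly (this assumes only that the pinned packages installed cleanly), verify that flash-attn imports:

python -c "import flash_attn; print(flash_attn.__version__)"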

Usage

Our codebase supports both training (context extension via YaRN) and evaluation.

Training

We reimplement the YaRN context extension method in train.py, closely following the official implementation. We use Accelerate and FSDP for distributed training. Configuration is managed via Hydra; the base configuration lives at configs/training/train.yaml. For example, to extend the context window of Llama-2-7b to 8k tokens (a 2x scaling of its native 4k context), run:

accelerate launch --config_file configs/training/accelerate/fsdp_8gpu.yaml train.py run_name=Yarn-Llama-2-7b-8k model.pretrained_model_name_or_path=meta-llama/Llama-2-7b-hf rope_scaling.factor=2.0
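
The same pattern extends to other models and context lengths. As a sketch (assuming the 13b run follows the same naming convention and that rope_scaling.factor scales the native 4k window), extending Llama-2-13b to 16k would look like:

accelerate launch --config_file configs/training/accelerate/fsdp_8gpu.yaml train.py run_name=Yarn-Llama-2-13b-16k model.pretrained_model_name_or_path=meta-llama/Llama-2-13b-hf rope_scaling.factor=4.0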

Extended checkpoints of Llama-2 7b and 13b are available on Hugging Face.

Evaluation

Evaluation code is in eval.py. Configuration is managed via Hydra; the base configuration is at configs/eval/eval.yaml, and supported datasets are listed under configs/eval/dataset. For example, to evaluate Yarn-Llama-2-7b-128k on GSM8K, run:

python eval.py dataset=gsm model=WangResearchLab/Yarn-Llama-2-7b-128k

Note: To allow vLLM to process inputs that exceed the model's context window, set the environment variable VLLM_ALLOW_LONG_MAX_MODEL_LEN to 1. When evaluating the base 7b and 13b models, also set PATCH_VLLM_ROPE to 1 to avoid an illegal memory access error.
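
Putting the two together, a long-context evaluation of a base model would look like the following (a sketch; it assumes base models are passed via the same model= override shown above):

VLLM_ALLOW_LONG_MAX_MODEL_LEN=1 PATCH_VLLM_ROPE=1 python eval.py dataset=gsm model=meta-llama/Llama-2-7b-hf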

Citation

If you find this work useful or relevant to your research, please cite our paper:

@misc{contextscaling2025,
  title={Predicting Task Performance with Context-aware Scaling Laws},
  author={Kyle Montgomery* and David Park* and Jianhong Tu and Michael Bendersky and Beliz Gunel and Dawn Song and Chenguang Wang},
  year={2025},
  eprint={2510.14919},
  archivePrefix={arXiv},
  url={https://arxiv.org/abs/2510.14919}
}
