
Conversation

@qgallouedec
Member

After a bunch of tests:

  • `pytest tests/test_vllm_client_server.py` for client-server
  • for colocate:

    ```python
    from datasets import load_dataset
    from trl import GRPOTrainer, GRPOConfig
    from trl.rewards import accuracy_reward

    dataset = load_dataset("trl-lib/DeepMath-103K", split="train[:100]")

    trainer = GRPOTrainer(
        model="Qwen/Qwen3-0.6B",
        reward_funcs=accuracy_reward,
        args=GRPOConfig(use_vllm=True, vllm_mode="colocate"),
        train_dataset=dataset,
    )
    trainer.train()
    ```

it seems that TRL works with vLLM 0.11.x.
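For reference, the client-server path exercised by the pytest run above pairs a standalone vLLM server process with the trainer running in `"server"` mode instead of `"colocate"`. A minimal sketch, untested here; the `trl vllm-serve` CLI and the `vllm_mode="server"` setting are assumptions based on TRL's vLLM integration docs, not something shown in this PR:

```python
# Step 1 (in a separate terminal, ideally on its own GPU): start the vLLM
# server that the trainer will send generation requests to. Assumed CLI:
#
#   trl vllm-serve --model Qwen/Qwen3-0.6B
#
# Step 2: run the same GRPO setup as the colocate test, but point it at the
# external server via vllm_mode="server".
from datasets import load_dataset
from trl import GRPOTrainer, GRPOConfig
from trl.rewards import accuracy_reward

dataset = load_dataset("trl-lib/DeepMath-103K", split="train[:100]")

trainer = GRPOTrainer(
    model="Qwen/Qwen3-0.6B",
    reward_funcs=accuracy_reward,
    # "server" delegates generation to the external vLLM process, while
    # training stays in this process; weights are synced between the two.
    args=GRPOConfig(use_vllm=True, vllm_mode="server"),
    train_dataset=dataset,
)
trainer.train()
```

By default the trainer connects to a server on localhost; the host/port are configurable on `GRPOConfig` (exact field names assumed, check the TRL docs for your version).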

Warning

vLLM 0.12.0 was released a few days ago, and it is NOT compatible with TRL. We will have to modify TRL to support this version.

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@albertvillanova (Member) left a comment


Thanks!

@sergiopaniego (Member) left a comment


Thanks!

@kashif merged commit d250e4b into main on Dec 8, 2025 (10 of 11 checks passed) and deleted the vllm-0-11 branch on December 8, 2025 at 11:14.
