⚖️ Fix scale_rewards issue in GRPO #3992

Peter-Chou · 2025-09-02T06:58:02Z

What does this PR do?

Fixes the scale_rewards malfunction in GRPOTrainer (#3991).

Before submitting

This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
Did you read the contributor guideline,
Pull Request section?
Was this discussed/approved via a GitHub issue? Please add a link
to it if that's the case.
Did you make sure to update the documentation with your changes?
Did you write any new necessary tests?

Who can review?

Amazing team members such as @qgallouedec

LeonEricsson

LGTM

LeonEricsson · 2025-09-02T08:55:25Z

trl/trainer/grpo_config.py

+        self.scale_rewards = {"true": "group", "false": "none"}.get(self.scale_rewards.lower(), self.scale_rewards.lower())
+        if self.scale_rewards not in ["batch", "none", "group"]:
+            raise ValueError(
+                f"Invalid value for scale_rewards: {self.scale_rewards}. Must be one of 'batch', 'group', or 'none'."
+            )
+


@qgallouedec I noticed that you moved this into the trainer in the original PR. Thoughts? I prefer eager validation like this.

Yes, I usually try to minimize what we have in post-init. But here I guess it's ok
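
The eager-validation idea discussed above can be sketched in isolation. This is a minimal illustration, not the merged code: `ScaleRewardsConfig` is a hypothetical stand-in for the real config class, and the `str(...)` coercion that lets boolean inputs through is an assumption about how bools could be accepted (the snippet in the diff calls `.lower()` directly, which only works on strings).

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class ScaleRewardsConfig:
    """Hypothetical stand-in for the config class; not the actual GRPOConfig."""

    scale_rewards: Union[str, bool] = "group"

    def __post_init__(self):
        # Assumption: coerce to a lowercase string first so that
        # True, "True" and "true" are all treated the same way.
        value = str(self.scale_rewards).lower()
        # Map the boolean spellings onto the string options.
        value = {"true": "group", "false": "none"}.get(value, value)
        # Fail at construction time rather than later inside the trainer.
        if value not in ["batch", "none", "group"]:
            raise ValueError(
                f"Invalid value for scale_rewards: {self.scale_rewards}. "
                "Must be one of 'batch', 'group', or 'none'."
            )
        self.scale_rewards = value
```

With this shape, a bad value surfaces as soon as the config object is built, which is the trade-off the reviewers weigh against keeping post-init logic minimal.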

HuggingFaceDocBuilderDev · 2025-09-02T08:57:55Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

qgallouedec

LGTM, thanks a lot. A patch release will follow soon.

Co-authored-by: Leon <[email protected]> Co-authored-by: Quentin Gallouédec <[email protected]>

fix scale_rewards issue in GRPO

753cdcf

LeonEricsson approved these changes Sep 2, 2025

LeonEricsson reviewed Sep 2, 2025

allow bool scale_rewards

ca300d1

qgallouedec added the 🩹 for patch label Sep 2, 2025

qgallouedec reviewed Sep 2, 2025

qgallouedec reviewed Sep 2, 2025

qgallouedec reviewed Sep 2, 2025

apply suggestions

a25960a

qgallouedec changed the title ~~fix scale_rewards issue in GRPO~~ ⚖️ Fix scale_rewards issue in GRPO Sep 2, 2025

qgallouedec merged commit 35702ce into huggingface:main Sep 3, 2025
10 checks passed

Peter-Chou deleted the scale_rewards_fix branch September 3, 2025 01:27

qgallouedec added a commit that referenced this pull request Sep 3, 2025

⚖️ Fix scale_rewards issue in GRPO (#3992)

a436c0a

Co-authored-by: Leon <[email protected]> Co-authored-by: Quentin Gallouédec <[email protected]>

SamY724 pushed a commit to SamY724/trl that referenced this pull request Sep 6, 2025

⚖️ Fix scale_rewards issue in GRPO (huggingface#3992)

50628b7

Co-authored-by: Leon <[email protected]> Co-authored-by: Quentin Gallouédec <[email protected]>
