⚔️ Optimize truncate_with_protected_tokens to use vectorized operations #3875
Conversation
Replace the list comprehension with `torch.isin` for protected-token checking. This reduces CPU-GPU synchronization from O(seq_len) to O(1) per sequence, achieving a ~9x speedup for typical batch sizes.
Pull Request Overview
This PR optimizes the truncate_with_protected_tokens function in the GRPO trainer by replacing inefficient list comprehensions with vectorized tensor operations. The optimization significantly reduces CPU-GPU synchronization overhead and improves performance.
Key changes:
- Replaces list comprehension with `torch.isin()` for vectorized membership testing
- Eliminates O(seq_len) CPU-GPU synchronization points per sequence
- Achieves ~9x speedup on typical workloads
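The membership test the review refers to can be illustrated with a minimal example (the token ids below are made up for illustration, not taken from the PR):

```python
import torch

# torch.isin tests each element of `ids` for membership in `protected`
# in a single vectorized call, instead of a Python loop that calls
# .item() once per token (one CPU-GPU sync per position).
ids = torch.tensor([101, 42, 7, 102])
protected = torch.tensor([101, 102])  # e.g. special/protected token ids (assumed)
mask = torch.isin(ids, protected)
print(mask.tolist())  # [True, False, False, True]
```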
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
…ns (huggingface#3875) Co-authored-by: Kashif Rasul <[email protected]> Co-authored-by: Quentin Gallouédec <[email protected]>
What does this PR do?
This PR optimizes the `truncate_with_protected_tokens` function in the GRPO trainer by replacing list comprehensions with vectorized tensor operations. The optimization reduces CPU-GPU synchronization overhead from O(seq_len) to O(1) per sequence.
Performance improvement
The original implementation used a list comprehension that called `.item()` for every token in the sequence to check whether it was in the protected set. The optimized version uses `torch.isin` for vectorized membership testing.
Benchmark results
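The contrast between the two approaches can be sketched as follows. This is a minimal illustration with assumed function names and token ids, not TRL's actual implementation:

```python
import torch

def protected_mask_loop(ids: torch.Tensor, protected_set: set) -> torch.Tensor:
    # The pattern the PR removes: one .item() call per token, i.e. one
    # CPU-GPU synchronization point per position in the sequence.
    return torch.tensor([t.item() in protected_set for t in ids], dtype=torch.bool)

def protected_mask_vectorized(ids: torch.Tensor, protected: torch.Tensor) -> torch.Tensor:
    # The replacement: a single vectorized membership test.
    return torch.isin(ids, protected)

# Illustrative token ids (assumed, not from the PR):
ids = torch.tensor([5, 1, 7, 2, 9])
assert torch.equal(protected_mask_loop(ids, {1, 2}),
                   protected_mask_vectorized(ids, torch.tensor([1, 2])))
```

Both functions produce the same boolean mask; the vectorized version simply stays on-device instead of round-tripping each token through Python.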
Before submitting
- Did you read the contributor guideline, Pull Request section?
- Was this discussed/approved via a GitHub issue? Please add a link to it if that's the case.
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.