
Conversation

@SunMarc (Member) commented May 28, 2025

What does this PR do?

This PR does the following:

  • Move FlashAttentionKwargs and ForCausalKwargs to the generic folder
  • Better handle the kwargs check for gradient accumulation (we now also check that the model has the LossKwargs typing; otherwise, we disable the fix)
  • Extend the forward functions with the LossKwargs typing (see the sketch after this list)
  • Better tests to see which models still need to be fixed
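
For reference, the typing change looks roughly like the sketch below. This is a minimal illustration, assuming the import locations of recent transformers releases (this PR moves the kwargs definitions into the generic module, so exact paths may differ); `MyCausalLM` is a made-up model used only to show the pattern:

```python
from typing import Optional

from typing_extensions import Unpack  # typing.Unpack on Python >= 3.11

import torch
from transformers.modeling_flash_attention_utils import FlashAttentionKwargs
from transformers.utils import LossKwargs


# Same pattern as in the modeling files: one TypedDict that bundles the
# flash-attention kwargs with the loss kwargs (e.g. num_items_in_batch).
class KwargsForCausalLM(FlashAttentionKwargs, LossKwargs): ...


class MyCausalLM(torch.nn.Module):  # hypothetical model, for illustration only
    def forward(
        self,
        input_ids: torch.LongTensor,
        labels: Optional[torch.LongTensor] = None,
        **kwargs: Unpack[KwargsForCausalLM],
    ):
        # With this annotation the Trainer can detect that loss kwargs such as
        # num_items_in_batch are actually consumed, and keep the gradient
        # accumulation fix enabled for this model.
        ...
```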

How to test

RUN_SLOW=True CUDA_VISIBLE_DEVICES=0 pytest tests/models/ -k "test_model_accepts_loss_kwargs" -s -vvvvv
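
Conceptually, the check behind `test_model_accepts_loss_kwargs` (and behind enabling the gradient accumulation fix) boils down to inspecting the `**kwargs` annotation on `forward`. The sketch below is only illustrative; the helper name is hypothetical and the exact introspection used in the PR may differ:

```python
import inspect

from transformers.utils import LossKwargs


def forward_is_typed_with_loss_kwargs(model) -> bool:  # hypothetical helper name
    """Return True if `model.forward` annotates its **kwargs with Unpack[<... LossKwargs ...>]."""
    for param in inspect.signature(model.forward).parameters.values():
        if param.kind is inspect.Parameter.VAR_KEYWORD and param.annotation is not inspect.Parameter.empty:
            # Unpack[KwargsForCausalLM] keeps the TypedDict class in __args__; its
            # direct bases (e.g. LossKwargs) are exposed through __orig_bases__.
            # Note: this only looks at direct bases, which is enough for the
            # KwargsForCausalLM(FlashAttentionKwargs, LossKwargs) pattern.
            for hinted in getattr(param.annotation, "__args__", ()):
                if LossKwargs in getattr(hinted, "__orig_bases__", ()):
                    return True
    return False
```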

Remaining work:

176 failed, 154 passed, 75 skipped, 95268 deselected, 8 warnings in 84.25s (0:01:24)

@HuggingFaceDocBuilderDev commented

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
