Conversation

@sergiopaniego (Member) commented Aug 28, 2025

What does this PR do?

  • Update all example scripts to accept kernels from the Hub (see the sketch after this list).
  • Only supported for the SFT Trainer?
  • Benchmarks
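
For context, a minimal sketch of what the first item enables, assuming transformers' kernels integration lets `attn_implementation` name a kernel repo on the Hub (the model and kernel repo ids below are illustrative, not taken from this PR):

```python
# Hedged sketch: load a model whose attention kernel is fetched from the Hub
# instead of being compiled locally. Assumes `pip install transformers kernels`.
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-0.5B",  # illustrative model id
    attn_implementation="kernels-community/flash-attn",  # kernel repo on the Hub
)
```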

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline, Pull Request section?
  • Was this discussed/approved via a GitHub issue? Please add a link to it if that's the case.
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

  title: Command Line Interface (CLI)
- local: jobs_training
  title: Training using Jobs
- local: kernels_hub
Member

I'd move this section under "Integration"

  - local: deepspeed_integration
    title: DeepSpeed
  - local: kernel_hub_integration
    title: Kernel Hub
  - local: liger_kernel_integration
    title: Liger Kernel

Member Author

Updated!
I considered Integrations to be the section for external toolkits, so I was unsure about adding it there.


[PLOT]

## Combining FlashAttention Kernels with Liger Kernels
Contributor

Do we pull the kernel from the Hub here as well? If yes, then let's mention it.

Member Author

Not pulled from the Hub at the moment.
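
For reference, a minimal sketch of the combination that section documents, assuming TRL's `SFTConfig` inherits `use_liger_kernel` from `TrainingArguments` and that `model_init_kwargs` forwards `attn_implementation` to the model loader (the ids below are illustrative):

```python
# Hedged sketch: Liger kernels for the Transformer blocks plus a Hub-provided
# attention kernel. Assumes `pip install trl kernels liger-kernel`.
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

dataset = load_dataset("trl-lib/Capybara", split="train")

training_args = SFTConfig(
    output_dir="sft-liger-kernels",
    use_liger_kernel=True,  # patch the model with Liger kernels
    model_init_kwargs={
        # A Hub repo id here selects a kernels-based attention implementation
        # instead of compiling flash-attn locally (assumption).
        "attn_implementation": "kernels-community/flash-attn",
    },
)

trainer = SFTTrainer(
    model="Qwen/Qwen2.5-0.5B",  # illustrative model id
    args=training_args,
    train_dataset=dataset,
)
trainer.train()
```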



## Comparing Attention Implementations
Contributor

Not sure if this is really needed tbh (in the spirit of shipping fast).

@sergiopaniego (Member Author)

Should I add the script used for benchmarking somewhere? It's a modification of sft.py with a callback (gist)
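
A minimal sketch of that kind of callback, assuming a plain `transformers.TrainerCallback` that times optimizer steps (names are illustrative; the actual gist may differ):

```python
import time

from transformers import TrainerCallback


class StepTimerCallback(TrainerCallback):
    """Records per-step wall-clock time so attention implementations can be compared."""

    def __init__(self):
        self.step_times = []
        self._t0 = None

    def on_step_begin(self, args, state, control, **kwargs):
        self._t0 = time.perf_counter()

    def on_step_end(self, args, state, control, **kwargs):
        self.step_times.append(time.perf_counter() - self._t0)

    def on_train_end(self, args, state, control, **kwargs):
        avg = sum(self.step_times) / max(len(self.step_times), 1)
        print(f"avg step time: {avg:.4f}s over {len(self.step_times)} steps")
```

Passing an instance via `SFTTrainer(..., callbacks=[StepTimerCallback()])` would collect the timings during a normal run.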

@lewtun (Member) left a comment

LGTM with a turbo nit :) Great stuff @sergiopaniego !


Building Flash Attention from source can be time-consuming, often taking anywhere from several minutes to hours, depending on your hardware, CUDA/PyTorch configuration, and whether precompiled wheels are available.

In contrast, **Hugging Face Kernels** provide a much faster and more reliable workflow. Developers don’t need to worry about complex setups—everything is handled automatically. In our benchmarks, kernels were ready to use in about **2.5 seconds**, with no compilation required. This allows you to start training almost instantly, significantly accelerating development. Simply specify the desired version, and `kernels` takes care of the rest.
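
As an illustration of that workflow, a minimal sketch using the `kernels` library's `get_kernel` entry point (the repo id is one published example, not pinned by this PR):

```python
# Hedged sketch: fetch a precompiled kernel from the Hub instead of building
# flash-attn from source. Assumes `pip install kernels` and a CUDA-capable setup.
from kernels import get_kernel

flash_attn = get_kernel("kernels-community/flash-attn")

# The returned module exposes the kernel's functions; the exact attribute
# names depend on the kernel repo (assumption: a flash-attn-style API).
print([name for name in dir(flash_attn) if not name.startswith("_")])
```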
Member

This is such a cool feature of kernels! I've lost a cumulative few days of my life waiting for FA2 to compile :D

Member Author

I know the feeling 😭

@sergiopaniego merged commit 0c69fd2 into main Sep 4, 2025
10 of 11 checks passed
@sergiopaniego deleted the kernels_hub_docs branch September 4, 2025 13:37
SamY724 pushed a commit to SamY724/trl that referenced this pull request Sep 6, 2025