Quentin Gallouédec
1,115 posts
Quentin Gallouédec
@QGallouedec
PhD - Post-training @huggingface 🤗 TRL lead maintainer 🇫🇷 in 🇨🇦
Joined May 2019
812
Following
4,458
Followers
  • Pinned
    Quentin Gallouédec
    @QGallouedec
    Mar 31
    We finally shipped TRL v1.0!! stable APIs, broad integrations, and a design built to absorb whatever the field throws at it next. Let's go! hf.co/blog/trl-v1
    17K
  • Quentin Gallouédec
    @QGallouedec
    Jan 25, 2025
    Last moments of closed-source AI 🪦: Hugging Face is openly reproducing the pipeline of 🐳 DeepSeek-R1. Open data, open training, open models, open collaboration. 🫵 Let's go!
    GitHub - huggingface/open-r1: Fully open reproduction of DeepSeek-R1
    From github.com
    180K
  • Quentin Gallouédec
    @QGallouedec
    Mar 24, 2025
    ☄️ GRPO now scales to 70B+ models with multi-node training and super-fast performance. Install the latest version of TRL, v0.16: `pip install trl`. With all the fresh features and optimizations that we've added, you can train up to 60 times faster! More details in the
    69K
  • Quentin Gallouédec
    @QGallouedec
    Feb 9, 2025
    Train an agent with GRPO? Yes, it works! I've made a small demo example if you're interested!
    70K
  • Quentin Gallouédec
    @QGallouedec
    Feb 2, 2025
    One week into Open-R1, our project to replicate DeepSeek-R1's training pipeline and synthetic data. A thread 🧵 (0/13) More details here:
    Open-R1: Update #1
    From huggingface.co
    72K
  • Quentin Gallouédec
    @QGallouedec
    Apr 25, 2025
    just pip install trl
    61K
  • Quentin Gallouédec
    @QGallouedec
    Apr 22, 2024
    🆕 Introducing JAT, the first open-source multi-modal, multi-task, multi-domain agent! 🤖 A step toward open generalist agents! 🚀 📰 Blog: huggingface.co/blog/jat
    73K
  • Quentin Gallouédec
    @QGallouedec
    Mar 22, 2025
    🪂 Getting GRPO Done Right (Dr GRPO) is now in TRL. @zzlccc proved that scaling by the std introduces question-level difficulty bias! You can now remove this bias 🗑️
    51K
  • Quentin Gallouédec
    @QGallouedec
    Apr 30, 2025
    GRPO x Curriculum learning 😳 The only difference is that I sorted the dataset (math questions) by difficulty. Do you agree that it's the kind of curve you'd expect? But the most interesting question is, does it give better results? Answer in the thread 🧵 (0/n)
    55K
  • Quentin Gallouédec
    @QGallouedec
    Aug 18, 2025
    Replying to @_ma_thusal_em
    SFR, 200%. What you're seeing is a fiber broken by the technician, yet it's the customer who has to pay for the repair
    84K
  • Quentin Gallouédec
    @QGallouedec
    Aug 14, 2025
    🚨 Big news! We decided that @huggingface’s post-training library, TRL, will natively support training Vision Language Models 🖼️ This builds on our recent VLM support in SFTTrainer, and we’re not stopping until TRL is the #1 VLM training library 🥇 More here 👉
    30K
  • Quentin Gallouédec
    @QGallouedec
    Mar 20, 2025
    🤹‍♀️ GRPO Trainer in TRL now handles mixed objectives! Simply return `None` if the reward function doesn’t apply to the sample. More in the documentation! Kudos to Shirin for contributing this feature to TRL.
    17K
  • Quentin Gallouédec
    @QGallouedec
    Jul 29, 2025
    📢 TRL 0.20 drops: Fine-tune your VLM with GRPO! And it also includes GSPO. So basically, fine-tune your VLM with GSPO.
    24K
  • Quentin Gallouédec
    @QGallouedec
    Jul 27, 2025
    Merry Christmas 🎁 GSPO is in TRL. Looking forward to seeing your reward curves 📈
    32K
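
Editor's sketch for the Mar 22, 2025 post on Dr GRPO, which says that scaling by the std introduces question-level difficulty bias. The snippet below is a plain-Python illustration with made-up binary rewards, not TRL's implementation: the function name `group_advantages`, its `scale_by_std` flag, the `eps` value, and the use of the population std are all assumptions chosen for the example. It computes group-relative advantages for one question as reward minus group mean, optionally divided by the group std.

```python
from statistics import mean, pstdev


def group_advantages(rewards, scale_by_std=True, eps=1e-4):
    """Group-relative advantages for completions sampled from one question.

    Hypothetical helper: centers rewards on the group mean and, if requested,
    divides by the group's population std (plus eps to avoid division by zero).
    """
    mu = mean(rewards)
    centered = [r - mu for r in rewards]
    if not scale_by_std:
        return centered
    sigma = pstdev(rewards)
    return [c / (sigma + eps) for c in centered]


# Made-up binary rewards for 8 completions per question.
easy = [1, 1, 1, 1, 1, 1, 1, 0]    # almost always solved: low std
medium = [1, 1, 1, 1, 0, 0, 0, 0]  # solved half the time: highest std

for name, rewards in [("easy", easy), ("medium", medium)]:
    scaled = group_advantages(rewards, scale_by_std=True)
    unscaled = group_advantages(rewards, scale_by_std=False)
    # Largest advantage magnitude with and without std scaling.
    print(name, max(abs(a) for a in scaled), max(abs(a) for a in unscaled))
```

With these numbers, the single failed completion on the easy question gets a scaled advantage of magnitude about 2.6, against about 1.0 for any completion on the medium question. Dividing by the std therefore gives low-variance questions (very easy or very hard ones) a larger per-sample weight, which is one way to read the bias the post mentions. Without scaling, the magnitudes are 0.875 and 0.5.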
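
Editor's sketch for the Mar 20, 2025 post on mixed objectives, where a reward function returns `None` when it doesn't apply to a sample. The snippet is plain Python and makes several assumptions that are not in the post: each sample carries a `task` label, and the two reward functions and the `combine_rewards` helper are hypothetical stand-ins for illustration, not TRL internals.

```python
def math_reward(completions, task, answers):
    """Score math samples by exact match; return None for other tasks."""
    return [
        (1.0 if c.strip() == a else 0.0) if t == "math" else None
        for c, t, a in zip(completions, task, answers)
    ]


def brevity_reward(completions, task, answers):
    """Score summaries by brevity; return None for other tasks."""
    return [
        (1.0 if len(c.split()) <= 5 else 0.0) if t == "summary" else None
        for c, t in zip(completions, task)
    ]


def combine_rewards(reward_funcs, **batch):
    """Sum per-sample rewards, skipping any function that returned None."""
    per_func = [f(**batch) for f in reward_funcs]
    return [
        sum(r for r in sample if r is not None)
        for sample in zip(*per_func)
    ]


# Hypothetical mixed batch: two math samples and one summarization sample.
batch = {
    "completions": ["4", "5", "A short summary."],
    "task": ["math", "math", "summary"],
    "answers": ["4", "4", None],
}
print(combine_rewards([math_reward, brevity_reward], **batch))
```

Each sample is scored only by the reward function that applies to it, so a math sample is never penalized by the summary objective and vice versa.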
