Quentin Gallouédec (@QGallouedec) / X

Quentin Gallouédec

1,115 posts

@QGallouedec

PhD - Post-training @huggingface 🤗 TRL lead maintainer 🇫🇷 in 🇨🇦

Joined May 2019

Pinned
Quentin Gallouédec
@QGallouedec
Mar 31
We finally shipped TRL v1.0!! Stable APIs, broad integrations, and a design built to absorb whatever the field throws at it next. Let's go! hf.co/blog/trl-v1
Quentin Gallouédec
@QGallouedec
Jan 25, 2025
Last moments of closed-source AI 🪦 : Hugging Face is openly reproducing the pipeline of 🐳 DeepSeek-R1. Open data, open training, open models, open collaboration. 🫵 Let's go!
GitHub - huggingface/open-r1: Fully open reproduction of DeepSeek-R1
From github.com
Quentin Gallouédec
@QGallouedec
Mar 24, 2025
☄️ GRPO now scales to 70B+ models with multi-node training and super-fast performance. Install the latest version of TRL (v0.16): `pip install trl`. With all the freshest features and optimizations that we've added, you can train up to 60 times faster! More details in the
Quentin Gallouédec
@QGallouedec
Feb 9, 2025
Train an agent with GRPO? Yes, it works! I've made a small demo example if you're interested!
Quentin Gallouédec
@QGallouedec
Feb 2, 2025
One week into Open-R1, our project to replicate DeepSeek-R1's training pipeline and synthetic data. A thread 🧵 (0/13) More details here:
Open-R1: Update #1
From huggingface.co
Quentin Gallouédec
@QGallouedec
Apr 25, 2025
just pip install trl
Quentin Gallouédec
@QGallouedec
Apr 22, 2024
🆕 Introducing JAT, the first open-source multi-modal, multi-task, multi-domain agent! 🤖 A step toward open generalist agents! 🚀 📰 Blog: huggingface.co/blog/jat
Quentin Gallouédec
@QGallouedec
Mar 22, 2025
🪂 Getting GRPO Done Right (Dr GRPO) is now in TRL. @zzlccc proved that scaling by the std introduces question-level difficulty bias! You can now remove this bias 🗑️
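To make the bias mentioned above concrete, here is a minimal sketch of group-relative advantages computed with and without division by the standard deviation. It is an illustration of the idea only, not TRL's implementation: the function name `group_advantages`, the `scale_by_std` flag, the `eps` value, and the use of the population standard deviation are all assumptions made for this example.

```python
from statistics import mean, pstdev


def group_advantages(rewards, scale_by_std=True, eps=1e-4):
    """Advantages for one group of completions sampled from the same question.

    Hypothetical helper for illustration; names and defaults are assumptions.
    """
    mu = mean(rewards)
    centered = [r - mu for r in rewards]
    if not scale_by_std:
        # Variant without std scaling: only subtract the group mean.
        return centered
    # Std-scaled variant: divide by the group's standard deviation.
    sigma = pstdev(rewards)
    return [c / (sigma + eps) for c in centered]


# Two made-up groups of rewards: one with a wide spread, one with a tiny spread.
wide_spread = [1.0, 0.0, 1.0, 0.0]
tiny_spread = [1.0, 1.0, 1.0, 0.9]

for name, rewards in [("wide", wide_spread), ("tiny", tiny_spread)]:
    scaled = group_advantages(rewards, scale_by_std=True)
    unscaled = group_advantages(rewards, scale_by_std=False)
    print(name, "scaled:", scaled, "unscaled:", unscaled)
```

With std scaling, the question whose rewards barely differ ends up with advantages of roughly the same magnitude as the question with a wide spread, so its weight in the update depends on the question rather than on how much the completions actually differ. Dropping the division keeps the advantages proportional to the raw reward gaps.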
Quentin Gallouédec
@QGallouedec
Apr 30, 2025
GRPO x Curriculum learning 😳 The only difference is that I sorted the dataset (math questions) by difficulty. Do you agree that it's the kind of curve you'd expect? But the most interesting question is, does it give better results? Answer in the thread 🧵 (0/n)
Quentin Gallouédec
@QGallouedec
Aug 18, 2025
Replying to @_ma_thusal_em
SFR, 200%. What you're seeing is a fiber broken by the technician, but it's up to the customer to pay for the repair
Quentin Gallouédec
@QGallouedec
Aug 14, 2025
🚨 Big news! We decided that @huggingface’s post-training library, TRL, will natively support training Vision Language Models 🖼️ This builds on our recent VLM support in SFTTrainer, and we’re not stopping until TRL is the #1 VLM training library 🥇 More here 👉
Quentin Gallouédec
@QGallouedec
Mar 20, 2025
🤹‍♀️ GRPO Trainer in TRL now handles mixed objectives! Simply return `None` if the reward function doesn’t apply to the sample. More in the documentation! Kudos to Shirin for contributing this feature to TRL.
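A minimal sketch of what such a reward function could look like, assuming completions arrive as plain strings and that the dataset carries two extra columns passed as keyword arguments. The function name `math_reward` and the column names `task` and `answer` are hypothetical, chosen only for this example; check the TRL documentation for the exact conventions.

```python
def math_reward(completions, task, answer, **kwargs):
    """Hypothetical reward: exact-match score for math samples, None otherwise.

    `task` and `answer` are assumed dataset columns, one entry per completion.
    """
    rewards = []
    for completion, sample_task, expected in zip(completions, task, answer):
        if sample_task != "math":
            # This reward does not apply to the sample, so return None for it.
            rewards.append(None)
        else:
            rewards.append(1.0 if completion.strip() == expected else 0.0)
    return rewards


# Example call on a mixed batch (made-up data).
print(math_reward(
    completions=["4", "def add(a, b): return a + b", " 7 "],
    task=["math", "code", "math"],
    answer=["4", "", "8"],
))
```

A second function written the same way could score only the `code` samples, and both could then be passed together as the trainer's reward functions.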
Quentin Gallouédec
@QGallouedec
Jul 29, 2025
📢 TRL 0.20 drops: Fine-tune your VLM with GRPO! And it also includes GSPO. So basically, fine-tune your VLM with GSPO.
Quentin Gallouédec
@QGallouedec
Jul 27, 2025
Merry Christmas 🎁 GSPO is in TRL. Looking forward to seeing your reward curves 📈