Dan Fu (@realDanFu) / X

Dan Fu

898 posts

VP, Kernels @togethercompute Assistant Professor @ucsd_cse Looking for talented kernel engineers and performance engineers!

danfu.org

Joined September 2019

Pinned
Dan Fu
@realDanFu
Aug 19, 2024
Excited to share that I will be joining UCSD CSE as an assistant professor in January 2026! I'll be recruiting PhD students from the 2024 application pool - if you're interested in anything ML Sys/efficiency/etc please reach out & put my name on your application! Until then
Dan Fu
@realDanFu
Jan 23, 2023
Attention is all you need... but how much of it do you need? Announcing H3 - a new generative language model that outperforms GPT-Neo-2.7B with only *2* attention layers! Accepted as a *spotlight* at #ICLR2023! 📣 w/ @tri_dao 📜 arxiv.org/abs/2212.14052 1/n
arxiv.org
Hungry Hungry Hippos: Towards Language Modeling with State Space Models
State space models (SSMs) have demonstrated state-of-the-art sequence modeling performance in some modalities, but underperform attention in language modeling. Moreover, despite scaling nearly...
Dan Fu
@realDanFu
Oct 13, 2022
We spent a couple days this week speeding up Stable Diffusion in @huggingface Diffusers using FlashAttention. 3-4x faster than the original version, 33% faster than the super optimized v0.4.1 - and >1 image/s throughput on A100. w/ @tri_dao A short thread on how we did it👇
Dan Fu
@realDanFu
Nov 13, 2023
Announcing FlashFFTConv: Efficient Convolutions for Long Sequences with Tensor Cores! We speed up exact FFT convolutions by up to 7.93x over PyTorch, reduce memory footprint, and get 4.4x speedup end-to-end. Read on for more details: Thanks @arankomatsuzaki and @_akhaliq for
Dan Fu
@realDanFu
Oct 23, 2023
Excited about models that are sub-quadratic in sequence length and model dimension? Our Monarch Mixer paper is now on arXiv -- and super excited to present it as an oral at #NeurIPS2023! Let's dive in to what's new with the paper and the new goodies from this release: Monarch
Dan Fu
@realDanFu
Mar 28, 2023
This sentiment is exactly right - and why we've been working to increase sequence length in our lab for the past two years! From FlashAttention, to S4, H3, Hyena, and more - check out our blog post putting this line of work into context: hazyresearch.stanford.edu/blog/2023-03-2… More below: 1/n
Sam Altman
@sama
Mar 25, 2023
we thought we wanted flying cars and not 140/280 characters, but really we wanted 32000 tokens
hazyresearch.stanford.edu
From Deep to Long Learning?
Dan Fu
@realDanFu
Jan 11, 2024
New year, new model drop! w/ @JonSaadFalcon, @simran_s_arora, excited to release new long-context retrieval models with Monarch Mixer, up to 32K sequence length! First step 2 long-context retrieval, outperforming Mistral, BGE, OpenAI on long-context document retrieval. 1/
Dan Fu
@realDanFu
Jun 23, 2022
S4 is an amazing sequence model - but has seemed mysterious. It doesn't have to be! In this blog (originally an internal explainer for our group), @HazyResearch looks at S4 from first principles that are familiar to most sophomore engineering students.
hazyresearch.stanford.edu
Simplifying S4
Explaining S4 from the first principles of signal processing.
Dan Fu
@realDanFu
Feb 15, 2023
What's the simplest model that can get the job done? New paper and blog post on how the answer for sequence modeling (including language) may be convolutions... with a touch of regularization. 📜 arxiv.org/abs/2302.06646 🖥️ github.com/HazyResearch/s… ⌨️ hazyresearch.stanford.edu/blog/2023-02-1… 1/n
arxiv.org
Simple Hardware-Efficient Long Convolutions for Sequence Modeling
State space models (SSMs) have high performance on long sequence modeling but require sophisticated initialization techniques and specialized implementations for high quality and runtime...
Dan Fu
@realDanFu
Jul 25, 2023
You've heard of models that are sub-quadratic in sequence length, but what if they were sub-quadratic in model *dimension* too? Announcing a preview of Monarch Mixer - a fully sub-quadratic & hardware-efficient architecture that matches BERT in quality! w/ @simran_s_arora 1/
Dan Fu
@realDanFu
Mar 5, 2025
Super excited to announce ThunderMLA: fast MLA decode in ThunderKittens ⚡️🐱! Up to 35% faster than FlashMLA. Where does that speedup come from? It's all in the scheduling! 1/
Dan Fu
@realDanFu
Jan 23, 2023
Replying to @realDanFu
One key point: SSMs are *linear* in sequence length instead of quadratic, and have no fixed context length. Long context for everyone! We're super excited, so we're releasing our code and model weights today - up to 2.7B parameters! github.com/HazyResearch/H3 2/n
GitHub - HazyResearch/H3: Language Modeling with the H3 State Space Model
Dan Fu
@realDanFu
Jan 10, 2022
The Stanford MLSys Seminar is now available in podcast form on Apple Podcasts, Spotify, Google, and more! We release new podcasts every Monday and Friday (new episodes on Fridays, old episodes from the backlog on Mondays). Check us out on your favorite platform below! (1/n)
Dan Fu
@realDanFu
Apr 19, 2022
Blog alert! 📣 How does contrastive learning work? How can we apply it effectively? New *3-part series* covering *2 new papers* on getting better transfer & robustness, and how to apply contrastive w types to improve entity retrieval. Part 1: hazyresearch.stanford.edu/blog/2022-04-1… 👇 (1/n)
hazyresearch.stanford.edu
Advances in Understanding, Improving, and Applying Contrastive Learning
Part 1 of a 3-part blog series on advances in contrastive learning.