Dan Fu
Together AI
898 posts
@realDanFu
VP, Kernels @togethercompute Assistant Professor @ucsd_cse Looking for talented kernel engineers and performance engineers!
danfu.org
Joined September 2019
241 Following
7,765 Followers
  • Pinned
    Dan Fu
    Together AI
    @realDanFu
    Aug 19, 2024
    Excited to share that I will be joining UCSD CSE as an assistant professor in January 2026! I'll be recruiting PhD students from the 2024 application pool - if you're interested in anything ML Sys/efficiency/etc please reach out & put my name on your application! Until then
    116K
  • Dan Fu
    Together AI
    @realDanFu
    Jan 23, 2023
    Attention is all you need... but how much of it do you need? Announcing H3 - a new generative language model that outperforms GPT-Neo-2.7B with only *2* attention layers! Accepted as a *spotlight* at #ICLR2023! 📣 w/ @tri_dao 📜 arxiv.org/abs/2212.14052 1/n
    arxiv.org
    Hungry Hungry Hippos: Towards Language Modeling with State Space Models
    State space models (SSMs) have demonstrated state-of-the-art sequence modeling performance in some modalities, but underperform attention in language modeling. Moreover, despite scaling nearly...
    373K
  • Dan Fu
    Together AI
    @realDanFu
    Oct 13, 2022
    We spent a couple days this week speeding up Stable Diffusion in @huggingface Diffusers using FlashAttention. 3-4x faster than the original version, 33% faster than the super optimized v0.4.1 - and >1 image/s throughput on A100. w/ @tri_dao A short thread on how we did it👇
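The thread does not spell out how FlashAttention gets its speedup. One of its central ideas is to walk over the keys and values in blocks while keeping a running maximum and a running softmax denominator (an "online softmax"), so the full attention-score matrix is never stored. The pure-Python sketch below shows that idea for a single query. The function names and `block_size` are illustrative assumptions, and it deliberately omits the 1/sqrt(d) scaling, GPU tiling, and the backward pass, so it is not the FlashAttention kernel itself.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]

def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))

def naive_attention(q: Vector, keys: List[Vector], values: List[Vector]) -> List[float]:
    """Reference: materialize every score, softmax them, then average the values."""
    scores = [dot(q, k) for k in keys]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    return [sum(w * v[d] for w, v in zip(weights, values)) / total
            for d in range(len(values[0]))]

def blocked_attention(q: Vector, keys: List[Vector], values: List[Vector],
                      block_size: int = 2) -> List[float]:
    """Same result, but keys/values are visited one block at a time (online softmax)."""
    running_max = -math.inf        # largest score seen so far
    running_sum = 0.0              # softmax denominator, relative to running_max
    acc = [0.0] * len(values[0])   # unnormalized output, relative to running_max
    for start in range(0, len(keys), block_size):
        scores = [dot(q, k) for k in keys[start:start + block_size]]
        new_max = max(running_max, max(scores))
        # Rescale everything accumulated so far to the new maximum.
        scale = math.exp(running_max - new_max)
        running_sum *= scale
        acc = [a * scale for a in acc]
        for s, v in zip(scores, values[start:start + block_size]):
            w = math.exp(s - new_max)
            running_sum += w
            acc = [a + w * vd for a, vd in zip(acc, v)]
        running_max = new_max
    return [a / running_sum for a in acc]
```

Calling both functions on the same inputs with any `block_size` should give the same vector up to floating-point rounding; the blocked version only ever holds one block of scores at a time.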
  • Dan Fu
    Together AI
    @realDanFu
    Nov 13, 2023
    Announcing FlashFFTConv: Efficient Convolutions for Long Sequences with Tensor Cores! We speed up exact FFT convolutions by up to 7.93x over PyTorch, reduce memory footprint, and get 4.4x speedup end-to-end. Read on for more details: Thanks @arankomatsuzaki and @_akhaliq for
    70K
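For context on what an "exact FFT convolution" is: convolving a length-N input with a length-N kernel directly costs O(N^2), while zero-padding both to a power of two of at least 2N - 1, multiplying their FFTs, and inverting gives the same causal result in O(N log N). The sketch below is a plain radix-2 FFT convolution in pure Python, for illustration only; it does not reproduce FlashFFTConv's tensor-core decomposition or its memory optimizations, and the function names are hypothetical.

```python
import cmath
from typing import List, Sequence

def fft(a: Sequence[complex], invert: bool = False) -> List[complex]:
    """Recursive radix-2 Cooley-Tukey FFT; len(a) must be a power of two.
    With invert=True it computes the unnormalized inverse transform."""
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2], invert)
    odd = fft(a[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def fft_causal_conv(u: Sequence[float], k: Sequence[float]) -> List[float]:
    """y[t] = sum_{j<=t} k[j] * u[t-j] for t < len(u), computed with zero-padded FFTs."""
    size = 1
    while size < len(u) + len(k) - 1:   # padding makes circular conv equal linear conv
        size *= 2
    U = fft([complex(x) for x in u] + [0j] * (size - len(u)))
    K = fft([complex(x) for x in k] + [0j] * (size - len(k)))
    Y = fft([a * b for a, b in zip(U, K)], invert=True)
    return [(y / size).real for y in Y[:len(u)]]   # normalize and keep the causal part

def direct_causal_conv(u: Sequence[float], k: Sequence[float]) -> List[float]:
    """O(N^2) reference implementation of the same causal convolution."""
    return [sum(k[j] * u[t - j] for j in range(min(t + 1, len(k))))
            for t in range(len(u))]
```

For example, `fft_causal_conv([1.0, 2.0, 3.0], [1.0, 1.0])` gives approximately `[1.0, 3.0, 5.0]`, matching `direct_causal_conv` up to rounding.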
  • Dan Fu
    Together AI
    @realDanFu
    Oct 23, 2023
    Excited about models that are sub-quadratic in sequence length and model dimension? Our Monarch Mixer paper is now on arXiv -- and super excited to present it as an oral at #NeurIPS2023! Let's dive in to what's new with the paper and the new goodies from this release: Monarch
    81K
  • Dan Fu
    Together AI
    @realDanFu
    Mar 28, 2023
    This sentiment is exactly right - and why we've been working to increase sequence length in our lab for the past two years! From FlashAttention, to S4, H3, Hyena, and more - check out our blog post putting this line of work into context: hazyresearch.stanford.edu/blog/2023-03-2… More below: 1/n
    Sam Altman
    OpenAI
    @sama
    Mar 25, 2023
    we thought we wanted flying cars and not 140/280 characters, but really we wanted 32000 tokens
    hazyresearch.stanford.edu
    From Deep to Long Learning?
    92K
  • Dan Fu
    Together AI
    @realDanFu
    Jan 11, 2024
    New year, new model drop! w/ @JonSaadFalcon, @simran_s_arora, excited to release new long-context retrieval models with Monarch Mixer, up to 32K sequence length! First step toward long-context retrieval, outperforming Mistral, BGE, OpenAI on long-context document retrieval. 1/
    54K
  • Dan Fu
    Together AI
    @realDanFu
    Jun 23, 2022
    S4 is an amazing sequence model - but has seemed mysterious. It doesn't have to be! In this blog (originally an internal explainer for our group), @HazyResearch looks at S4 from first principles that are familiar to most sophomore engineering students.
    hazyresearch.stanford.edu
    Simplifying S4
    Explaining S4 from the first principles of signal processing.
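The signal-processing view that post builds on: a discrete linear state space model x_t = A x_{t-1} + B u_t, y_t = C x_t (starting from x = 0) unrolls to y_t = sum_j (C A^j B) u_{t-j}, i.e. a causal convolution with kernel K_j = C A^j B. The sketch below assumes an already-discretized, diagonal A and a single input/output channel; S4's actual parameterization and discretization step are not shown, and the function names are hypothetical.

```python
from typing import List, Sequence

def ssm_kernel(a_diag: Sequence[float], b: Sequence[float],
               c: Sequence[float], length: int) -> List[float]:
    """K[j] = C A^j B for diagonal A, for j = 0..length-1."""
    kernel = []
    power = [1.0] * len(a_diag)          # diagonal of A^j, starting at A^0 = I
    for _ in range(length):
        kernel.append(sum(ci * pi * bi for ci, pi, bi in zip(c, power, b)))
        power = [pi * ai for pi, ai in zip(power, a_diag)]
    return kernel

def ssm_as_convolution(u: Sequence[float], a_diag: Sequence[float],
                       b: Sequence[float], c: Sequence[float]) -> List[float]:
    """y[t] = sum_{j<=t} K[j] * u[t-j]: the SSM output written as a causal convolution."""
    k = ssm_kernel(a_diag, b, c, len(u))
    return [sum(k[j] * u[t - j] for j in range(t + 1)) for t in range(len(u))]
```

Feeding in an impulse shows the kernel directly: with A = 0.5, B = 1, C = 2, the input `[1.0, 0.0, 0.0]` produces `[2.0, 1.0, 0.5]`, which is exactly C A^j B for j = 0, 1, 2.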
  • Dan Fu
    Together AI
    @realDanFu
    Feb 15, 2023
    What's the simplest model that can get the job done? New paper and blog post on how the answer for sequence modeling (including language) may be convolutions... with a touch of regularization. 📜 arxiv.org/abs/2302.06646 🖥️ github.com/HazyResearch/s… ⌨️ hazyresearch.stanford.edu/blog/2023-02-1… 1/n
    arxiv.org
    Simple Hardware-Efficient Long Convolutions for Sequence Modeling
    State space models (SSMs) have high performance on long sequence modeling but require sophisticated initialization techniques and specialized implementations for high quality and runtime...
    30K
  • Dan Fu
    Together AI
    @realDanFu
    Jul 25, 2023
    You've heard of models that are sub-quadratic in sequence length, but what if they were sub-quadratic in model *dimension* too? Announcing a preview of Monarch Mixer - a fully sub-quadratic & hardware-efficient architecture that matches BERT in quality! w @simran_s_arora 1/
    61K
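A rough sketch of why Monarch-style layers can be sub-quadratic in model dimension: for n = m^2, a Monarch matrix can be written as block-diagonal matrices interleaved with a fixed permutation, so a matrix-vector product costs about 2 * n * sqrt(n) multiply-adds instead of n^2. The code assumes m blocks of size m x m and uses a reshape-and-transpose as the permutation; the helper names are hypothetical, and this is a simplified illustration rather than the Monarch Mixer implementation.

```python
from typing import List, Sequence

Matrix = List[List[float]]

def block_diag_apply(blocks: Sequence[Matrix], rows: Sequence[Sequence[float]]) -> Matrix:
    """Multiply row i of an m x m grid by dense block i (one block of a block-diagonal matrix)."""
    return [[sum(blk[r][c] * row[c] for c in range(len(row))) for r in range(len(blk))]
            for blk, row in zip(blocks, rows)]

def transpose(grid: Sequence[Sequence[float]]) -> Matrix:
    return [list(col) for col in zip(*grid)]

def monarch_matvec(left_blocks: Sequence[Matrix], right_blocks: Sequence[Matrix],
                   x: Sequence[float]) -> List[float]:
    """y = P L P R x for n = m*m, where L and R are block-diagonal (m blocks of m x m)
    and P is the reshape-transpose permutation (its own inverse for a square grid)."""
    m = len(right_blocks)
    assert len(x) == m * m
    grid = [list(x[i * m:(i + 1) * m]) for i in range(m)]   # reshape x into m x m
    grid = block_diag_apply(right_blocks, grid)   # R: mixes entries within each chunk
    grid = transpose(grid)                        # P: regroups entries across chunks
    grid = block_diag_apply(left_blocks, grid)    # L: mixes within the regrouped chunks
    grid = transpose(grid)                        # P again: restore the original order
    return [v for row in grid for v in row]
```

With identity blocks the product returns `x` unchanged. The two block-diagonal factors together hold 2 * m * m^2 = 2 * n^1.5 parameters, versus n^2 for a dense n x n matrix, which is where the sub-quadratic scaling in model dimension comes from.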
  • Dan Fu
    Together AI
    @realDanFu
    Mar 5, 2025
    Super excited to announce ThunderMLA: fast MLA decode in ThunderKittens ⚡️🐱! Up to 35% faster than FlashMLA. Where does that speedup come from? It's all in the scheduling! 1/
    26K
  • Dan Fu
    Together AI
    @realDanFu
    Jan 23, 2023
    Replying to @realDanFu
    One key point: SSMs are *linear* in sequence length instead of quadratic, and have no fixed context length. Long context for everyone! We're super excited, so we're releasing our code and model weights today - up to 2.7B parameters! github.com/HazyResearch/H3 2/n
    GitHub - HazyResearch/H3: Language Modeling with the H3 State Space Model
    From github.com
    15K
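To make "linear in sequence length, no fixed context length" concrete: in recurrent form, a linear SSM keeps only a fixed-size state and updates it once per token, so L tokens cost O(L x state size) and the input can be arbitrarily long. Below is a generic diagonal SSM written as a Python generator; `ssm_stream` and its parameters are illustrative assumptions, not the H3 layer itself.

```python
from itertools import islice, repeat
from typing import Iterable, Iterator, Sequence

def ssm_stream(u: Iterable[float], a_diag: Sequence[float],
               b: Sequence[float], c: Sequence[float]) -> Iterator[float]:
    """Run x_t = A x_{t-1} + B u_t, y_t = C x_t one token at a time (diagonal A).
    Each step costs O(state size), so L tokens cost O(L * state size)."""
    x = [0.0] * len(a_diag)   # fixed-size state; its size never depends on the input length
    for u_t in u:
        x = [ai * xi + bi * u_t for ai, xi, bi in zip(a_diag, x, b)]
        yield sum(ci * xi for ci, xi in zip(c, x))

if __name__ == "__main__":
    # An endless stream of ones: outputs keep coming with memory that does not grow.
    print(list(islice(ssm_stream(repeat(1.0), [0.5], [1.0], [1.0]), 3)))  # [1.0, 1.5, 1.75]
```

Because the generator never looks back at earlier inputs, there is no window to overflow; the trade-off is that everything the model remembers must fit in the fixed-size state `x`.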
  • Dan Fu
    Together AI
    @realDanFu
    Jan 10, 2022
    The Stanford MLSys Seminar is now available in podcast form on Apple Podcasts, Spotify, Google, and more! We release new podcasts every Monday and Friday (new episodes on Fridays, old episodes from the backlog on Mondays). Check us out on your favorite platform below! (1/n)
  • Dan Fu
    Together AI
    @realDanFu
    Apr 19, 2022
    Blog alert! 📣 How does contrastive learning work? How can we apply it effectively? New *3-part series* covering *2 new papers* on getting better transfer & robustness, and how to apply contrastive learning w/ types to improve entity retrieval. Part 1: hazyresearch.stanford.edu/blog/2022-04-1… 👇 (1/n)
    hazyresearch.stanford.edu
    Advances in Understanding, Improving, and Applying Contrastive Learning
    Part 1 of a 3-part blog series on advances in contrastive learning.
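The tweet does not define the objective. A common form of contrastive loss is InfoNCE, which scores how much more similar an anchor is to its positive than to a set of negatives. The sketch below uses cosine similarity and an assumed temperature of 0.1; it is a generic illustration with hypothetical function names, not the specific methods from the two papers in the series.

```python
import math
from typing import Sequence

def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def info_nce(anchor: Sequence[float], positive: Sequence[float],
             negatives: Sequence[Sequence[float]], temperature: float = 0.1) -> float:
    """-log( exp(s_pos/T) / (exp(s_pos/T) + sum_k exp(s_k/T)) ), with cosine similarities s."""
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, neg) / temperature for neg in negatives]
    m = max(logits)   # subtract the max before exponentiating, for numerical stability
    log_denom = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_denom - logits[0]
```

The loss approaches 0 when the positive is far more similar to the anchor than any negative, and grows when a negative is closer; minimizing it pulls positives together and pushes negatives apart in embedding space.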
