Han Guo
3,449 posts
Han Guo
@HanGuo97
PhD Student @MIT_CSAIL | Past: @togethercompute @LTIatCMU @MITIBMLab @UNCNLP, @SFResearch, @BaiduResearch | Machine Learning, NLP.
han-guo.info
Joined August 2016
4,473 Following
4,282 Followers
  • Han Guo
    @HanGuo97
    Jun 6, 2025
    We know Attention and its linear-time variants, such as linear attention and State Space Models. But what lies in between? Introducing Log-Linear Attention with:
    - Log-linear time training
    - Log-time inference (in both time and memory)
    - Hardware-efficient Triton kernels
    264K
  • Han Guo
    @HanGuo97
    Nov 21, 2023
    Introducing LQ-LoRA Decomposing pretrained matrices into (fixed) quantized + (trainable) low-rank components enables more aggressive quantization. We can quantize LLaMA-2 70B to 2.5 bits with minimal degradation in instruction-tuning performance. arxiv.org/abs/2311.12023 🧵1/n
    114K
  • Han Guo
    @HanGuo97
    Jul 21, 2024
    Introducing FLUTE, a CUDA kernel for non-uniformly quantized (via a lookup table) LLM Inference. It accelerates QLoRA's NormalFloat (NF) out of the box and more. As an application, we extended NF4 and are releasing quantized models for LLaMA-3 (8B/70B) and Gemma-2 (9B/27B).
    55K
  • Han Guo
    @HanGuo97
    Jul 22, 2025
    Since our initial arXiv post, several concurrent papers have introduced new architectures with log-linear properties in various forms. Two personal favorites of mine (among others) are:
    - Transformer-PSM by @MorrisYau et al., and
    - Radial Attention by Xingyang and @lmxyy1999 et
    Han Guo
    @HanGuo97
    Jun 6, 2025
    We know Attention and its linear-time variants, such as linear attention and State Space Models. But what lies in between? Introducing Log-Linear Attention with:
    - Log-linear time training
    - Log-time inference (in both time and memory)
    - Hardware-efficient Triton kernels
    21K
  • Han Guo
    @HanGuo97
    Dec 11, 2022
    While I'm not at #EMNLP2022, we have two works on the intersection of RL + NLP. RLPrompt: Optimizing Discrete Text Prompts with Reinforcement Learning (arxiv.org/abs/2205.12548) Efficient (Soft) Q-Learning for Text Generation with Limited Good Data (arxiv.org/abs/2106.07704)
  • Han Guo
    @HanGuo97
    Jan 2, 2021
    Glad to share our latest work "FastIF: Scalable Influence Functions for Efficient Model Interpretation and Debugging"! Joint work with @nazneenrajani @peterbhase @mohitban47 @caimingxiong (@uncnlp @sfresearch). Paper: arxiv.org/abs/2012.15781 Code: github.com/salesforce/fas… 1/5
  • Han Guo
    @HanGuo97
    Oct 14, 2022
    Super excited to be among this cohort of amazing people! A huge thanks to @ericxing, @yoonrkim, @ZhitingHu, @mohitban47, and everyone who provided mentorship and advice!!
    Microsoft Research
    Microsoft
    @MSFTResearch
    Oct 14, 2022
    At Microsoft Research, we aim to empower the next generation of computing related research talent. Today, we're thrilled to announce and congratulate this year's Microsoft Research PhD Fellowship recipients from around the world. Meet the 2022 recipients: aka.ms/phdfellowship
  • Han Guo
    @HanGuo97
    Apr 16, 2020
    Excited to share that I'll be joining @LTIatCMU as a PhD student this fall after three wonderful undergraduate years at @UNCNLP! Huge thanks to everyone who gave me mentorship and help along the way, especially my advisor Mohit @mohitban47 and collaborator Ram @ramakanth1729! 😀
  • Han Guo
    @HanGuo97
    Jun 1, 2024
    I've had some chances recently to share what we've been working on. In doing so, I made a few basic background slides that explain `torch.matmul` from GPU/CUDA's point of view, why LLM decoding is memory bound, and how weight-only quantization could speed up decoding. Slides 👇
    22K
  • Han Guo
    @HanGuo97
    Jan 18, 2024
    Happy to share that LQ-LoRA will appear at #ICLR2024. TLDR: using matrix decomposition to enable more aggressive quantization before LoRA fine-tuning.
    - Paper (updated): arxiv.org/abs/2311.12023.
    - Code (with more artifacts uploaded such as models): github.com/HanGuo97/lq-lo….
    Han Guo
    @HanGuo97
    Nov 21, 2023
    Introducing LQ-LoRA Decomposing pretrained matrices into (fixed) quantized + (trainable) low-rank components enables more aggressive quantization. We can quantize LLaMA-2 70B to 2.5 bits with minimal degradation in instruction-tuning performance. arxiv.org/abs/2311.12023 🧵1/n
    arxiv.org
    LQ-LoRA: Low-rank Plus Quantized Matrix Decomposition for...
    We propose a simple approach for memory-efficient adaptation of pretrained language models. Our approach uses an iterative algorithm to decompose each pretrained matrix into a high-precision...
    25K
  • Han Guo
    @HanGuo97
    Jun 17, 2021
    Excited to share our latest work with Bowen Tan @waterluffy Eric Xing @ZhitingHu! Tldr, a new NLG formulation from soft Q-learning perspective, with app. such as learning from noisy data, text attacks, prompt generation. Paper arxiv.org/abs/2106.07704 Code github.com/HanGuo97/soft-…
  • Han Guo
    @HanGuo97
    Apr 29, 2023
    Unfortunately, I won't be at #ICLR2023, but please check out our recent works on Machine Learning + Systems!
    1. Federated Learning as Variational Inference iclr.cc/virtual/2023/p…
    2. MPCFormer: Fast, Performant, and Private Transformer inference with MPC iclr.cc/virtual/2023/p…
    17K
  • Han Guo
    @HanGuo97
    Jun 6, 2025
    Replying to @HanGuo97
    There has been much recent work on efficient alternatives with sub-quadratic compute and sub-linear memory, including linear attention, state-space models, and long convolution models. Despite their differences, many of these approaches can be captured by the following equation:
    11K
  • Han Guo
    @HanGuo97
    Sep 14, 2021
    Happy to share that our FastIF paper's been accepted at #EMNLP2021! Thanks to wonderful coauthors @nazneenrajani @peterbhase @mohitban47 @CaimingXiong @uncnlp @SFResearch @LTIatCMU Updated paper/code (w. more exps on ANLI/WILDS): arxiv.org/abs/2012.15781 github.com/salesforce/fas…
