


Sehoon Kim 0001
Person information
- affiliation (PhD 2024): University of California, Berkeley, CA, USA
Other persons with the same name
- Sehoon Kim — disambiguation page
2020 – today
- 2025
[c16]Coleman Richard Charles Hooper, Sehoon Kim, Hiva Mohammadzadeh, Monishwaran Maheswaran, Sebastian Zhao, June Paik, Michael W. Mahoney, Kurt Keutzer, Amir Gholami:
Squeezed Attention: Accelerating Long Context Length LLM Inference. ACL (1) 2025: 32631-32652
[c15]Lutfi Eren Erdogan, Nicholas Lee, Sehoon Kim, Suhong Moon, Hiroki Furuta, Gopala Anumanchipalli, Kurt Keutzer, Amir Gholami:
Plan-and-Act: Improving Planning of Agents for Long-Horizon Tasks. ICML 2025
[c14]Rishabh Tiwari, Haocheng Xi, Aditya Tomar, Coleman Richard Charles Hooper, Sehoon Kim, Maxwell Horton, Mahyar Najibi, Michael W. Mahoney, Kurt Keutzer, Amir Gholami:
QuantSpec: Self-Speculative Decoding with Hierarchical Quantized KV Cache. ICML 2025
[i25]Rishabh Tiwari, Haocheng Xi, Aditya Tomar, Coleman Hooper, Sehoon Kim, Maxwell Horton, Mahyar Najibi, Michael W. Mahoney, Kurt Keutzer, Amir Gholami:
QuantSpec: Self-Speculative Decoding with Hierarchical Quantized KV Cache. CoRR abs/2502.10424 (2025)
[i24]Coleman Hooper, Sehoon Kim, Suhong Moon, Kerem Dilmen, Monishwaran Maheswaran, Nicholas Lee, Michael W. Mahoney, Yakun Sophia Shao, Kurt Keutzer, Amir Gholami:
ETS: Efficient Tree Search for Inference-Time Scaling. CoRR abs/2502.13575 (2025)
[i23]Lutfi Eren Erdogan, Nicholas Lee, Sehoon Kim, Suhong Moon, Hiroki Furuta, Gopala Anumanchipalli, Kurt Keutzer, Amir Gholami:
Plan-and-Act: Improving Planning of Agents for Long-Horizon Tasks. CoRR abs/2503.09572 (2025)
[i22]Coleman Hooper, Sebastian Zhao, Luca Manolache, Sehoon Kim, Michael W. Mahoney, Yakun Sophia Shao, Kurt Keutzer, Amir Gholami:
Multipole Attention for Efficient Long Context Reasoning. CoRR abs/2506.13059 (2025)
- 2024
[b1]Sehoon Kim:
Full Stack Approach for Efficient Deep Learning Inference. University of California Berkeley, USA, 2024
[j4]Allison McCarn Deiana, Nhan Tran, Joshua Agar, Michaela Blott, Giuseppe Di Guglielmo, Javier M. Duarte, Philip C. Harris, Scott Hauck, Mia Liu, Mark S. Neubauer, Jennifer Ngadiuba, Seda Ogrenci-Memik, Maurizio Pierini, Thea Aarrestad, Steffen Bähr, Jürgen Becker, Anne-Sophie Berthold, Richard J. Bonventre, Tomás E. Müller-Bravo, Markus Diefenthaler, Zhen Dong, Nick Fritzsche, Amir Gholami, Ekaterina Govorkova, Dongning Guo, Kyle J. Hazelwood, Christian Herwig, Babar Khan, Sehoon Kim, Thomas Klijnsma, Yaling Liu, Kin Ho Lo, Tri Nguyen, Gianantonio Pezzullo, Seyedramin Rasoulinezhad, Ryan A. Rivera, Kate Scholberg, Justin Selig, Sougata Sen, Dmitri Strukov, William Tang, Savannah Thais, Kai Lukas Unger, Ricardo Vilalta, Belina von Krosigk, Shen Wang, Thomas K. Warburton:
Corrigendum: Applications and techniques for fast machine learning in science. Frontiers Big Data 6 (2024)
[j3]Amir Gholami, Zhewei Yao, Sehoon Kim, Coleman Hooper, Michael W. Mahoney, Kurt Keutzer:
AI and Memory Wall. IEEE Micro 44(3): 33-39 (2024)
[c13]Nicholas Lee, Thanakul Wattanawong, Sehoon Kim, Karttikeya Mangalam, Sheng Shen, Gopala Anumanchipalli, Michael W. Mahoney, Kurt Keutzer, Amir Gholami:
LLM2LLM: Boosting LLMs with Novel Iterative Data Enhancement. ACL (Findings) 2024: 6498-6526
[c12]Sehoon Kim, Coleman Hooper, Amir Gholami, Zhen Dong, Xiuyu Li, Sheng Shen, Michael W. Mahoney, Kurt Keutzer:
SqueezeLLM: Dense-and-Sparse Quantization. ICML 2024
[c11]Sehoon Kim, Suhong Moon, Ryan Tabrizi, Nicholas Lee, Michael W. Mahoney, Kurt Keutzer, Amir Gholami:
An LLM Compiler for Parallel Function Calling. ICML 2024
[c10]Coleman Hooper, Sehoon Kim, Hiva Mohammadzadeh, Michael W. Mahoney, Yakun Sophia Shao, Kurt Keutzer, Amir Gholami:
KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization. NeurIPS 2024
[i21]Siddharth Jha, Coleman Hooper, Xiaoxuan Liu, Sehoon Kim, Kurt Keutzer:
Learned Best-Effort LLM Serving. CoRR abs/2401.07886 (2024)
[i20]Coleman Hooper, Sehoon Kim, Hiva Mohammadzadeh, Michael W. Mahoney, Yakun Sophia Shao, Kurt Keutzer, Amir Gholami:
KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization. CoRR abs/2401.18079 (2024)
[i19]Amir Gholami, Zhewei Yao, Sehoon Kim, Coleman Hooper, Michael W. Mahoney, Kurt Keutzer:
AI and Memory Wall. CoRR abs/2403.14123 (2024)
[i18]Nicholas Lee, Thanakul Wattanawong, Sehoon Kim, Karttikeya Mangalam, Sheng Shen, Gopala Anumanchipalli, Michael W. Mahoney, Kurt Keutzer, Amir Gholami:
LLM2LLM: Boosting LLMs with Novel Iterative Data Enhancement. CoRR abs/2403.15042 (2024)
[i17]Siddharth Jha, Lutfi Eren Erdogan, Sehoon Kim, Kurt Keutzer, Amir Gholami:
Characterizing Prompt Compression Methods for Long Context Inference. CoRR abs/2407.08892 (2024)
[i16]Lutfi Eren Erdogan, Nicholas Lee, Siddharth Jha, Sehoon Kim, Ryan Tabrizi, Suhong Moon, Coleman Hooper, Gopala Anumanchipalli, Kurt Keutzer, Amir Gholami:
TinyAgent: Function Calling at the Edge. CoRR abs/2409.00608 (2024)
[i15]Suhong Moon, Siddharth Jha, Lutfi Eren Erdogan, Sehoon Kim, Woosang Lim, Kurt Keutzer, Amir Gholami:
Efficient and Scalable Estimation of Tool Representations in Vector Space. CoRR abs/2409.02141 (2024)
[i14]Coleman Hooper, Sehoon Kim, Hiva Mohammadzadeh, Monishwaran Maheswaran, June Paik, Michael W. Mahoney, Kurt Keutzer, Amir Gholami:
Squeezed Attention: Accelerating Long Context Length LLM Inference. CoRR abs/2411.09688 (2024)
- 2023
[c9]Sehoon Kim, Karttikeya Mangalam, Suhong Moon, Jitendra Malik, Michael W. Mahoney, Amir Gholami, Kurt Keutzer:
Speculative Decoding with Big Little Decoder. NeurIPS 2023
[i13]Sehoon Kim, Karttikeya Mangalam, Jitendra Malik, Michael W. Mahoney, Amir Gholami, Kurt Keutzer:
Big Little Transformer Decoder. CoRR abs/2302.07863 (2023)
[i12]Sehoon Kim, Coleman Hooper, Thanakul Wattanawong, Minwoo Kang, Ruohan Yan, Hasan Genc, Grace Dinh, Qijing Huang, Kurt Keutzer, Michael W. Mahoney, Yakun Sophia Shao, Amir Gholami:
Full Stack Optimization of Transformer Inference: a Survey. CoRR abs/2302.14017 (2023)
[i11]Sehoon Kim, Coleman Hooper, Amir Gholami, Zhen Dong, Xiuyu Li, Sheng Shen, Michael W. Mahoney, Kurt Keutzer:
SqueezeLLM: Dense-and-Sparse Quantization. CoRR abs/2306.07629 (2023)
[i10]Coleman Hooper, Sehoon Kim, Hiva Mohammadzadeh, Hasan Genc, Kurt Keutzer, Amir Gholami, Yakun Sophia Shao:
SPEED: Speculative Pipelined Execution for Efficient Decoding. CoRR abs/2310.12072 (2023)
[i9]Sehoon Kim, Suhong Moon, Ryan Tabrizi, Nicholas Lee, Michael W. Mahoney, Kurt Keutzer, Amir Gholami:
An LLM Compiler for Parallel Function Calling. CoRR abs/2312.04511 (2023)
- 2022
[j2]Allison McCarn Deiana, Nhan Tran, Joshua Agar, Michaela Blott, Giuseppe Di Guglielmo, Javier M. Duarte, Philip C. Harris, Scott Hauck, Mia Liu, Mark S. Neubauer, Jennifer Ngadiuba, Seda Ogrenci Memik, Maurizio Pierini, Thea Aarrestad, Steffen Bähr, Jürgen Becker, Anne-Sophie Berthold, Richard J. Bonventre, Tomás E. Müller-Bravo, Markus Diefenthaler, Zhen Dong, Nick Fritzsche, Amir Gholami, Ekaterina Govorkova, Dongning Guo, Kyle J. Hazelwood, Christian Herwig, Babar Khan, Sehoon Kim, Thomas Klijnsma, Yaling Liu, Kin Ho Lo, Tri Nguyen, Gianantonio Pezzullo, Seyedramin Rasoulinezhad, Ryan A. Rivera, Kate Scholberg, Justin Selig, Sougata Sen, Dmitri Strukov, William Tang, Savannah Thais, Kai Lukas Unger, Ricardo Vilalta, Belina von Krosigk, Shen Wang, Thomas K. Warburton:
Applications and Techniques for Fast Machine Learning in Science. Frontiers Big Data 5: 787421 (2022)
[c8]Sehoon Kim, Amir Gholami, Zhewei Yao, Nicholas Lee, Patrick Wang, Aniruddha Nrusimha, Bohan Zhai, Tianren Gao, Michael W. Mahoney, Kurt Keutzer:
Integer-Only Zero-Shot Quantization for Efficient Speech Recognition. ICASSP 2022: 4288-4292
[c7]Sehoon Kim, Sheng Shen, David Thorsley, Amir Gholami, Woosuk Kwon, Joseph Hassoun, Kurt Keutzer:
Learned Token Pruning for Transformers. KDD 2022: 784-794
[c6]Sehoon Kim, Amir Gholami, Albert E. Shaw, Nicholas Lee, Karttikeya Mangalam, Jitendra Malik, Michael W. Mahoney, Kurt Keutzer:
Squeezeformer: An Efficient Transformer for Automatic Speech Recognition. NeurIPS 2022
[c5]Woosuk Kwon, Sehoon Kim, Michael W. Mahoney, Joseph Hassoun, Kurt Keutzer, Amir Gholami:
A Fast Post-Training Pruning Framework for Transformers. NeurIPS 2022
[c4]Shixing Yu, Zhewei Yao, Amir Gholami, Zhen Dong, Sehoon Kim, Michael W. Mahoney, Kurt Keutzer:
Hessian-Aware Pruning and Optimal Neural Implant. WACV 2022: 3665-3676
[i8]Taebum Kim, Eunji Jeong, Geon-Woo Kim, Yunmo Koo, Sehoon Kim, Gyeong-In Yu, Byung-Gon Chun:
Terra: Imperative-Symbolic Co-Execution of Imperative Deep Learning Programs. CoRR abs/2201.09210 (2022)
[i7]Woosuk Kwon, Sehoon Kim, Michael W. Mahoney, Joseph Hassoun, Kurt Keutzer, Amir Gholami:
A Fast Post-Training Pruning Framework for Transformers. CoRR abs/2204.09656 (2022)
[i6]Sehoon Kim, Amir Gholami, Albert E. Shaw, Nicholas Lee, Karttikeya Mangalam, Jitendra Malik, Michael W. Mahoney, Kurt Keutzer:
Squeezeformer: An Efficient Transformer for Automatic Speech Recognition. CoRR abs/2206.00888 (2022)
- 2021
[j1]Gyeong-In Yu, Saeed Amizadeh, Sehoon Kim, Artidoro Pagnoni, Ce Zhang, Byung-Gon Chun, Markus Weimer, Matteo Interlandi:
WindTunnel: Towards Differentiable ML Pipelines Beyond a Single Model. Proc. VLDB Endow. 15(1): 11-20 (2021)
[c3]Sehoon Kim, Amir Gholami, Zhewei Yao, Michael W. Mahoney, Kurt Keutzer:
I-BERT: Integer-only BERT Quantization. ICML 2021: 5506-5518
[c2]Jingyi Xu, Sehoon Kim, Borivoje Nikolic, Yakun Sophia Shao:
Memory-Efficient Hardware Performance Counters with Approximate-Counting Algorithms. ISPASS 2021: 226-228
[c1]Taebum Kim, Eunji Jeong, Geon-Woo Kim, Yunmo Koo, Sehoon Kim, Gyeong-In Yu, Byung-Gon Chun:
Terra: Imperative-Symbolic Co-Execution of Imperative Deep Learning Programs. NeurIPS 2021: 1468-1480
[i5]Sehoon Kim, Amir Gholami, Zhewei Yao, Michael W. Mahoney, Kurt Keutzer:
I-BERT: Integer-only BERT Quantization. CoRR abs/2101.01321 (2021)
[i4]Amir Gholami, Sehoon Kim, Zhen Dong, Zhewei Yao, Michael W. Mahoney, Kurt Keutzer:
A Survey of Quantization Methods for Efficient Neural Network Inference. CoRR abs/2103.13630 (2021)
[i3]Sehoon Kim, Amir Gholami, Zhewei Yao, Aniruddha Nrusimha, Bohan Zhai, Tianren Gao, Michael W. Mahoney, Kurt Keutzer:
Q-ASR: Integer-only Zero-shot Quantization for Efficient Speech Recognition. CoRR abs/2103.16827 (2021)
[i2]Sehoon Kim, Sheng Shen, David Thorsley, Amir Gholami, Joseph Hassoun, Kurt Keutzer:
Learned Token Pruning for Transformers. CoRR abs/2107.00910 (2021)
[i1]Allison McCarn Deiana, Nhan Tran, Joshua Agar, Michaela Blott, Giuseppe Di Guglielmo, Javier M. Duarte, Philip C. Harris, Scott Hauck, Mia Liu, Mark S. Neubauer, Jennifer Ngadiuba, Seda Ogrenci Memik, Maurizio Pierini, Thea Aarrestad, Steffen Bähr, Jürgen Becker, Anne-Sophie Berthold, Richard J. Bonventre, Tomás E. Müller-Bravo, Markus Diefenthaler, Zhen Dong, Nick Fritzsche, Amir Gholami, Ekaterina Govorkova, Kyle J. Hazelwood, Christian Herwig, Babar Khan, Sehoon Kim, Thomas Klijnsma, Yaling Liu, Kin Ho Lo, Tri Nguyen, Gianantonio Pezzullo, Seyedramin Rasoulinezhad, Ryan A. Rivera, Kate Scholberg, Justin Selig, Sougata Sen, Dmitri Strukov, William Tang, Savannah Thais, Kai Lukas Unger, Ricardo Vilalta, Belina von Krosigk, Thomas K. Warburton, Maria Acosta Flechas, Anthony Aportela, Thomas Calvet, Leonardo Cristella, Daniel Diaz, Caterina Doglioni, Maria Domenica Galati, Elham E Khoda, Farah Fahim, Davide Giri, Benjamin Hawks, Duc Hoang, Burt Holzman, Shih-Chieh Hsu, Sergo Jindariani, Iris Johnson, Raghav Kansal, Ryan Kastner, Erik Katsavounidis, Jeffrey D. Krupa, Pan Li, Sandeep Madireddy, Ethan Marx, Patrick McCormack, Andres Meza, Jovan Mitrevski, Mohammed Attia Mohammed, Farouk Mokhtar, Eric A. Moreno, Srishti Nagu, Rohin Narayan, Noah Palladino, Zhiqiang Que, Sang Eon Park, Subramanian Ramamoorthy, Dylan S. Rankin, Simon Rothman, Ashish Sharma, Sioni Summers, Pietro Vischia, Jean-Roch Vlimant, Olivia Weng:
Applications and Techniques for Fast Machine Learning in Science. CoRR abs/2110.13041 (2021)
last updated on 2026-01-18 21:37 CET by the dblp team
all metadata released as open data under CC0 1.0 license









