Heming Xia
he-ming.xia AT connect.polyu.hk
I am Heming Xia (夏鹤明), a third-year Ph.D. student in the NLP Group at The Hong Kong Polytechnic University, supervised by Prof. Wenjie Li. I received my master’s degree from the MOE Key Lab of Computational Linguistics at Peking University, where I was advised by Prof. Zhifang Sui. Prior to that, I completed my bachelor’s degree at the School of Physics, Peking University, in 2020. I have worked as a research intern at the NLC Group, Microsoft Research Asia, and at Sea AI Lab, where I had the privilege of collaborating with Dr. Tao Ge and Dr. Cunxiao Du. I am an incoming visiting student in the NLP Group at the University of California, San Diego, advised by Prof. Julian McAuley. For more details, please see my CV.
📬 I am open to collaborating with highly motivated students on research related to (but not limited to) the topics below. If interested, please feel free to reach out via email.
Research
My research focuses on efficient and effective NLP, with the goal of making LLMs faster, more scalable, and broadly applicable. Specifically, my work centers on the following directions:
- Speculative Decoding: Exploring inference acceleration techniques that maintain output fidelity. This includes our pioneering work on Speculative Decoding [EMNLP’23-findings, ICLR’25], the widely used benchmark Spec-Bench, and the first comprehensive survey of this paradigm [ACL’24-findings].
- Efficient Reasoning: Developing advanced algorithms to enhance the efficiency of reasoning models, spanning efficient training strategies, inference acceleration [EMNLP’25, arXiv’25], and dense representations such as latent CoT [arXiv’25].
- Applications (Efficiency + X): I am interested in how efficiency-oriented techniques can benefit broader applications, with a recent focus on tool-augmented agents and multimodal models [EMNLP’25].
In addition, I am actively working on tool learning [e.g., EMNLP’24, ACL’25-findings] and vision-language understanding [e.g., ACL’22, EMNLP’23-findings, EMNLP’25-findings].
News
| Date | News |
|---|---|
| Aug 21, 2025 | Got three papers accepted by EMNLP 2025 (2 Main+1 Findings) |
| May 16, 2025 | Got three papers accepted by ACL 2025 (1 Oral+2 Findings) |
| Jan 23, 2025 | Got one paper accepted by ICLR 2025 |
| Jan 19, 2025 | Organized a tutorial on Speculative Decoding at COLING 2025 |
| Sep 21, 2024 | Got four papers accepted by EMNLP 2024 (2 Main+2 Findings) |
Selected Publications
(*) Equal Contribution. (†) Corresponding Author.
- TokenSkip: Controllable Chain-of-Thought Compression in LLMs. In EMNLP, 2025
- Unlocking Efficiency in Large Language Model Inference: A Comprehensive Survey of Speculative Decoding. In Findings of ACL, 2024