I earned my bachelor’s degree at HIT in 2022. In the same year, I joined the School of Computer Science at USTC for my master’s degree and transitioned to the PhD program in 2024. Currently, I am a second-year PhD candidate jointly supervised by Prof. Xike Xie and Prof. S. Kevin Zhou.
Please don’t hesitate to reach out for any discussion. You can contact me via Email: yfung@mail dot ustc dot edu dot cn or WeChat: movingffy. I am always open to engaging in intriguing research endeavors! 😊
Research Interests:
My research began with exploring neural networks’ memorization ability for summarizing large-scale data streams. More recently, I have been studying the KV cache memory of LLMs to understand how they work and to improve applications such as LLM inference efficiency.
🎯 Teaching NNs to memorize data streams & Learnable Large-Scale Data Compression.
- [AAAI 2023 Oral] Meta-sketch: A neural data structure for estimating item frequencies of data streams. link
- [ICLR 2024 Spotlight] Mayfly: a Neural Data Structure for Graph Stream Summarization. link
- [TPAMI 2024] Learning to Sketch: A Neural Approach to Item Frequency Estimation in Streaming Data. link
- [ICML 2025] Lego Sketch: A Scalable Memory-augmented Neural Network for Sketching Data Streams. link
🎯 LLM Efficiency.
[NeurIPS 2025] Ada-KV: Optimizing KV Cache Eviction by Adaptive Budget Allocation for Efficient LLM Inference. link
We introduced the first head-wise adaptive cache compression method and open-sourced our code. Through active collaboration with the community, we have helped drive progress in head-wise cache compression, enabling many follow-up works. Explore our GitHub repo.
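For readers curious about the core idea, here is a minimal, illustrative sketch of head-wise adaptive budget allocation (shapes, scoring, and function names are my simplified assumptions, not the released Ada-KV implementation):

```python
# Illustrative sketch of head-wise adaptive KV cache budget allocation
# (simplified; not the released Ada-KV code).
import torch

def adaptive_evict(attn_scores: torch.Tensor, total_budget: int) -> list:
    """attn_scores: [num_heads, seq_len] importance of each cached token per head.
    Rather than giving every head the same budget, keep the globally top-scoring
    entries, so heads with sharply concentrated attention end up with smaller
    budgets and heads with flat attention keep more entries."""
    num_heads, seq_len = attn_scores.shape
    flat = attn_scores.flatten()                   # pool scores across all heads
    keep = torch.topk(flat, total_budget).indices  # one global top-k selection
    head_ids = keep // seq_len
    token_ids = keep % seq_len
    # indices of cache entries retained for each head (per-head budgets differ)
    return [token_ids[head_ids == h] for h in range(num_heads)]

# toy usage: 4 heads, 16 cached tokens, keep 24 entries in total
scores = torch.rand(4, 16)
kept = adaptive_evict(scores, total_budget=24)
print([len(k) for k in kept])  # budgets adapt to how concentrated each head's scores are
```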
[arXiv] Identify Critical KV Cache in LLM Inference from an Output Perturbation Perspective. link
We present the first perturbation-based analysis showing that KV cache eviction improves when value-cache information is integrated with the LLM’s pretrained parameters. We believe this work offers a novel theoretical perspective on KV cache importance estimation.
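As a rough illustration of the perturbation view (my simplified reading, not the paper’s exact criterion), one can score each cached token not by its attention weight alone, but by that weight combined with how strongly its value vector propagates through the pretrained output projection:

```python
# Illustrative sketch of value-aware criticality scoring for KV cache eviction
# (a simplified reading; the actual perturbation bound in the paper differs).
import torch

def perturbation_scores(attn_weights, values, w_o):
    """attn_weights: [seq_len], values: [seq_len, head_dim], w_o: [head_dim, hidden].
    Returns one criticality score per cached token: tokens whose removal would
    perturb the attention output more are considered more critical."""
    projected_norm = (values @ w_o).norm(dim=-1)  # ||W_o v_i||: value-side magnitude
    return attn_weights * projected_norm          # attention weight alone ignores this term

seq_len, head_dim, hidden = 16, 8, 32
scores = perturbation_scores(torch.rand(seq_len),
                             torch.randn(seq_len, head_dim),
                             torch.randn(head_dim, hidden))
evict = torch.topk(scores, k=4, largest=False).indices  # evict the least critical entries
```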
[EMNLP 2025 Findings] SkewRoute: Training-Free LLM Routing for Knowledge Graph Retrieval-Augmented Generation via Score Skewness. link
We propose the first LLM routing method for KG-RAG, based on the observation that query difficulty correlates strongly with the skewness of the RAG retrieval score distribution.
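A toy sketch of such a routing rule is below; the threshold, the model names, and the assumption that higher skewness indicates an easier query are placeholders for illustration, not the paper’s configuration:

```python
# Illustrative sketch of training-free routing by retrieval-score skewness
# (threshold and routing direction are assumptions for illustration only).
import numpy as np
from scipy.stats import skew

def route(retrieval_scores: np.ndarray, threshold: float = 1.0) -> str:
    """Highly skewed scores suggest a few clearly relevant triples (an easier
    query), so it can be served by a small LLM; a flat score distribution
    suggests a harder query that is routed to a large LLM."""
    return "small-llm" if skew(retrieval_scores) > threshold else "large-llm"

print(route(np.array([9.5, 0.3, 0.2, 0.1, 0.1])))       # skewed scores -> small-llm
print(route(np.array([0.55, 0.50, 0.48, 0.45, 0.40])))  # flat scores -> large-llm
```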
🎯 RAG in LLMs.
I also collaborate with other researchers on retrieval-augmented generation (RAG) for LLMs, focusing on its deployment in practical applications.
- [ACL 2025 Findings] FRAG: A Flexible Modular Framework for Retrieval-Augmented Generation based on Knowledge Graphs. link
- [arXiv] Path Pooling: Training-Free Structure Enhancement for Efficient Knowledge Graph Retrieval-Augmented Generation. link
🎯 Broader Research Horizons.
- Interpretable Memory in LLMs
- Multimodal Large Models
- Test-time Scaling Laws / Reasoning Models
Selected Publications
For full publications, please refer to my Google Scholar.
[NeurIPS 2025] Yuan Feng, Junlin Lv, Yukun Cao, Xike Xie, and S. Kevin Zhou. “Ada-KV: Optimizing KV Cache Eviction by Adaptive Budget Allocation for Efficient LLM Inference.” (paper, code).
[arXiv] Yuan Feng, Junlin Lv, Yukun Cao, Xike Xie, and S. Kevin Zhou. “Identify Critical KV Cache in LLM Inference from an Output Perturbation Perspective.” (paper, code).
[EMNLP 2025 Findings] Hairu Wang, Yuan Feng (co-first author), Yukun Cao, Xike Xie, and S. Kevin Zhou. “SkewRoute: Training-Free LLM Routing for Knowledge Graph Retrieval-Augmented Generation via Score Skewness.” (paper; code will be available soon).
[ICASSP 2025] Junlin Lv, Yuan Feng, Xike Xie, Xin Jia, Qirong Peng, and Guiming Xie. “CritiPrefill: A Segment-wise Criticality-based Approach for Prefilling Acceleration in LLMs.” (paper, code).
[ICML 2025] Yuan Feng, Yukun Cao, Hairu Wang, Xike Xie, and S. Kevin Zhou. “Lego Sketch: A Scalable Memory-augmented Neural Network for Sketching Data Streams.” (paper, code).
[ICLR 2024 Spotlight] Yuan Feng, Yukun Cao, Hairu Wang, Xike Xie, and S. Kevin Zhou. “Mayfly: a Neural Data Structure for Graph Stream Summarization.” (paper, code).
[TPAMI 2024] Yukun Cao, Yuan Feng (co-first author), Hairu Wang, Xike Xie, and S. Kevin Zhou. “Learning to Sketch: A Neural Approach to Item Frequency Estimation in Streaming Data.” (paper, code).
[AAAI 2023 Oral] Yukun Cao, Yuan Feng, and Xike Xie. “Meta-sketch: A neural data structure for estimating item frequencies of data streams.” (paper, code).
Selected Honors
2024 PhD First Prize Scholarship at USTC
2022 & 2023 Master’s First Prize Scholarship at USTC (twice)
2021 Outstanding Student Award at HIT
2020 & 2021 National Scholarship at HIT (twice)
Academic Service
Reviewer
- NeurIPS, 2024 & 2025
- ICLR, 2024 & 2025 & 2026
- ICDE, 2024 & 2025
Education
2024.09 - present, PhD candidate in Computer Science and Technology, University of Science and Technology of China (USTC).
2022.09 - 2024.06, Master’s in Computer Science and Technology, University of Science and Technology of China (USTC).
2018.09 - 2022.06, Bachelor’s in Software Engineering, Harbin Institute of Technology (HIT).
