I earned my bachelor’s degree at HIT in 2022. In the same year, I joined the School of Computer Science at USTC for my master’s degree and transitioned to the PhD program in 2024. Currently, I am a second-year PhD candidate jointly supervised by Prof. Xike Xie and Prof. S. Kevin Zhou.
Please don’t hesitate to reach out for any discussion. You can contact me via Email: yfung@mail dot ustc dot edu dot cn or WeChat: movingffy. I am always open to engaging in intriguing research endeavors! 😊
Research Interests:
My research began with exploring neural networks’ memorization ability for summarizing large-scale data streams. More recently, I have been studying the KV cache memory of LLMs to understand how they work and to improve applications such as LLM inference efficiency.
🎯 Teaching NNs to memorize data streams & Learnable Large-Scale Data Compression.
- [AAAI 2023 Oral] Meta-sketch: A neural data structure for estimating item frequencies of data streams. link
- [ICLR 2024 Spotlight] Mayfly: a Neural Data Structure for Graph Stream Summarization. link
- [TPAMI 2024] Learning to Sketch: A Neural Approach to Item Frequency Estimation in Streaming Data. link
- [ICML 2025] Lego Sketch: A Scalable Memory-augmented Neural Network for Sketching Data Streams. link
🎯 LLM Efficiency.
[NeurIPS 2025] Ada-KV: Optimizing KV Cache Eviction by Adaptive Budget Allocation for Efficient LLM Inference. link
We introduced the first head-wise adaptive cache compression method and open-sourced our code. Through active collaboration with the community, we have helped drive progress in head-wise cache compression, enabling many follow-up works. Explore our GitHub repo.
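For readers curious about the core idea, here is a minimal, illustrative sketch of head-wise adaptive budget allocation (shapes, scoring, and function names are my simplified assumptions, not the released Ada-KV implementation):

```python
# Illustrative sketch of head-wise adaptive KV cache budget allocation
# (simplified; not the released Ada-KV code).
import torch

def adaptive_evict(attn_scores: torch.Tensor, total_budget: int) -> list:
    """attn_scores: [num_heads, seq_len] importance of each cached token per head.
    Rather than giving every head the same budget, keep the globally top-scoring
    entries, so heads with sharply concentrated attention end up with smaller
    budgets and heads with flat attention keep more entries."""
    num_heads, seq_len = attn_scores.shape
    flat = attn_scores.flatten()                   # pool scores across all heads
    keep = torch.topk(flat, total_budget).indices  # one global top-k selection
    head_ids = keep // seq_len
    token_ids = keep % seq_len
    # indices of cache entries retained for each head (per-head budgets differ)
    return [token_ids[head_ids == h] for h in range(num_heads)]

# toy usage: 4 heads, 16 cached tokens, keep 24 entries in total
scores = torch.rand(4, 16)
kept = adaptive_evict(scores, total_budget=24)
print([len(k) for k in kept])  # budgets adapt to how concentrated each head's scores are
```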
[arXiv] Identify Critical KV Cache in LLM Inference from an Output Perturbation Perspective. link
We present the first perturbation-based analysis showing that KV cache eviction improves when value-cache information is integrated with the LLM’s pretrained parameters. We believe this work offers a novel theoretical perspective on KV cache importance estimation.
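As a rough illustration of the perturbation view (my simplified reading, not the paper’s exact criterion), one can score each cached token not by its attention weight alone, but by that weight combined with how strongly its value vector propagates through the pretrained output projection:

```python
# Illustrative sketch of value-aware criticality scoring for KV cache eviction
# (a simplified reading; the actual perturbation bound in the paper differs).
import torch

def perturbation_scores(attn_weights, values, w_o):
    """attn_weights: [seq_len], values: [seq_len, head_dim], w_o: [head_dim, hidden].
    Returns one criticality score per cached token: tokens whose removal would
    perturb the attention output more are considered more critical."""
    projected_norm = (values @ w_o).norm(dim=-1)  # ||W_o v_i||: value-side magnitude
    return attn_weights * projected_norm          # attention weight alone ignores this term

seq_len, head_dim, hidden = 16, 8, 32
scores = perturbation_scores(torch.rand(seq_len),
                             torch.randn(seq_len, head_dim),
                             torch.randn(head_dim, hidden))
evict = torch.topk(scores, k=4, largest=False).indices  # evict the least critical entries
```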
[EMNLP 2025 Findings] SkewRoute: Training-Free LLM Routing for Knowledge Graph Retrieval-Augmented Generation via Score Skewness. link
We propose the first LLM routing method for KG-RAG, based on the observation that query difficulty correlates strongly with the skewness of the RAG retrieval score distribution.
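A toy sketch of such a routing rule is below; the threshold, the model names, and the assumption that higher skewness indicates an easier query are placeholders for illustration, not the paper’s configuration:

```python
# Illustrative sketch of training-free routing by retrieval-score skewness
# (threshold and routing direction are assumptions for illustration only).
import numpy as np
from scipy.stats import skew

def route(retrieval_scores: np.ndarray, threshold: float = 1.0) -> str:
    """Highly skewed scores suggest a few clearly relevant triples (an easier
    query), so it can be served by a small LLM; a flat score distribution
    suggests a harder query that is routed to a large LLM."""
    return "small-llm" if skew(retrieval_scores) > threshold else "large-llm"

print(route(np.array([9.5, 0.3, 0.2, 0.1, 0.1])))       # skewed scores -> small-llm
print(route(np.array([0.55, 0.50, 0.48, 0.45, 0.40])))  # flat scores -> large-llm
```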
🎯 RAG in LLMs.
I also collaborate with other researchers on retrieval-augmented generation (RAG) for LLMs, focusing on its deployment in practical applications.
- [ACL 2025 Findings] FRAG: A Flexible Modular Framework for Retrieval-Augmented Generation based on Knowledge Graphs. link
- [arXiv] Path Pooling: Training-Free Structure Enhancement for Efficient Knowledge Graph Retrieval-Augmented Generation. link
🎯 Broader Research Horizons.
- Interpretable Memory in LLMs
- Multimodal Large Models
- Test-time Scaling Laws / Reasoning Models
Selected Publications
For full publications, please refer to my Google Scholar.
[NeurIPS 2025] Yuan Feng, Junlin Lv, Yukun Cao, Xike Xie, and S. Kevin Zhou. “Ada-KV: Optimizing KV Cache Eviction by Adaptive Budget Allocation for Efficient LLM Inference.” (paper, code).
[arXiv] Yuan Feng, Junlin Lv, Yukun Cao, Xike Xie, and S. Kevin Zhou. “Identify Critical KV Cache in LLM Inference from an Output Perturbation Perspective.” (paper, code).
[EMNLP 2025 Findings] Hairu Wang, Yuan Feng (co-first author), Yukun Cao, Xike Xie, and S. Kevin Zhou. “SkewRoute: Training-Free LLM Routing for Knowledge Graph Retrieval-Augmented Generation via Score Skewness.” (paper; code will be available soon).
[ICASSP 2025] Junlin Lv, Yuan Feng, Xike Xie, Xin Jia, Qirong Peng, and Guiming Xie. “CritiPrefill: A Segment-wise Criticality-based Approach for Prefilling Acceleration in LLMs.” (paper, code).
[ICML 2025] Yuan Feng, Yukun Cao, Hairu Wang, Xike Xie, and S. Kevin Zhou. “Lego Sketch: A Scalable Memory-augmented Neural Network for Sketching Data Streams.” (paper, code).
[ICLR 2024 Spotlight] Yuan Feng, Yukun Cao, Hairu Wang, Xike Xie, and S. Kevin Zhou. “Mayfly: a Neural Data Structure for Graph Stream Summarization.” (paper, code).
[TPAMI 2024] Yukun Cao, Yuan Feng (co-first author), Hairu Wang, Xike Xie, and S. Kevin Zhou. “Learning to Sketch: A Neural Approach to Item Frequency Estimation in Streaming Data.” (paper, code).
[AAAI 2023 Oral] Yukun Cao, Yuan Feng, and Xike Xie. “Meta-sketch: A neural data structure for estimating item frequencies of data streams.” (paper, code).
Selected Honors
2024 PhD First Prize Scholarship at USTC
2022 & 2023 Master’s First Prize Scholarship at USTC (twice)
2021 Outstanding Student Award at HIT
2020 & 2021 National Scholarship at HIT (twice)
Academic Service
Reviewer
- NeurIPS, 2024 & 2025
- ICLR, 2024 & 2025 & 2026
- ICDE, 2024 & 2025
Education
2024.09 - present, PhD candidate in Computer Science and Technology, University of Science and Technology of China (USTC).
2022.09 - 2024.06, Master’s in Computer Science and Technology, University of Science and Technology of China (USTC).
2018.09 - 2022.06, Bachelor’s in Software Engineering, Harbin Institute of Technology (HIT).
