I earned my bachelor’s degree at HIT in 2022. In the same year, I joined the School of Computer Science at USTC for my master’s degree and transitioned to the PhD program in 2024. Currently, I am a second-year PhD candidate jointly supervised by Prof. Xike Xie and Prof. S. Kevin Zhou.

Please don’t hesitate to reach out for any discussion. You can contact me via email (yfung@mail dot ustc dot edu dot cn) or WeChat (movingffy). I am always happy to discuss intriguing research ideas! 😊


Research Interests:

My research began with exploring the memory of neural networks to tackle data summarization tasks. More recently, I have been studying the KV cache memory of LLMs to better understand how they work and to improve applications such as LLM inference efficiency.

🎯 Teaching NNs to memorize data streams & Learnable Large-Scale Data Compression.

  1. [AAAI 2023 Oral] Meta-sketch: A Neural Data Structure for Estimating Item Frequencies of Data Streams. link
  2. [ICLR 2024 Spotlight] Mayfly: A Neural Data Structure for Graph Stream Summarization. link
  3. [TPAMI 2024] Learning to Sketch: A Neural Approach to Item Frequency Estimation in Streaming Data. link
  4. [ICML 2025] Lego Sketch: A Scalable Memory-augmented Neural Network for Sketching Data Streams. link
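As background for the neural sketches listed above, the classical (non-neural) baseline they learn to improve upon is the Count-Min sketch. Below is a minimal illustrative implementation; the class name and parameters are my own choices, not from any of the papers:

```python
import hashlib

class CountMinSketch:
    """Classic Count-Min sketch: a compact frequency estimator
    for data streams, with an overestimate-only error guarantee."""
    def __init__(self, width=1024, depth=4):
        self.width, self.depth = width, depth
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, item, row):
        # One deterministic hash function per row.
        h = hashlib.md5(f"{row}:{item}".encode()).hexdigest()
        return int(h, 16) % self.width

    def add(self, item, count=1):
        for row in range(self.depth):
            self.table[row][self._index(item, row)] += count

    def estimate(self, item):
        # Taking the minimum across rows bounds collision error.
        return min(self.table[row][self._index(item, row)]
                   for row in range(self.depth))

cms = CountMinSketch()
for _ in range(5):
    cms.add("apple")
cms.add("banana")
print(cms.estimate("apple"))  # >= 5; with this tiny load, almost surely 5
```

The appeal of the neural variants is that a learned memory can exploit regularities in the stream that fixed hash tables like this cannot.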

🎯 LLM Efficiency.

  1. [NeurIPS 2025] Ada-KV: Optimizing KV Cache Eviction by Adaptive Budget Allocation for Efficient LLM Inference. link

    We introduced the first head-wise adaptive cache compression method and open-sourced our code. Through active collaboration with the community, we have helped drive progress in head-wise cache compression and enabled many follow-up works. Explore our GitHub repo.

  2. [arXiv] Identify Critical KV Cache in LLM Inference from an Output Perturbation Perspective. link

    We present the first perturbation-based analysis showing that KV cache eviction improves when value-cache information is integrated with the LLM’s pretrained parameters. We believe this work offers a novel theoretical perspective on KV cache importance estimation.

  3. [EMNLP 2025 Findings] SkewRoute: Training-Free LLM Routing for Knowledge Graph Retrieval-Augmented Generation via Score Skewness. link

    We propose the first LLM routing method for KG-RAG, motivated by the observation that query difficulty correlates strongly with the skewness of the retrieval score distribution.
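The head-wise adaptive budget allocation idea behind Ada-KV can be sketched as follows. This is a simplified illustration under my own assumptions (NumPy, per-entry importance scores already computed), not the released implementation:

```python
import numpy as np

def adaptive_budget_allocation(scores, total_budget):
    """Toy head-wise adaptive allocation: rather than giving every
    attention head an equal share of the cache budget, keep the
    globally top-scoring cache entries across all heads.

    scores: (num_heads, seq_len) array of per-entry importance
            scores (e.g. accumulated attention weights).
    Returns a boolean keep-mask of the same shape with exactly
    `total_budget` True entries.
    """
    num_heads, seq_len = scores.shape
    flat = scores.reshape(-1)
    # Indices of the `total_budget` largest scores, across all heads.
    top = np.argpartition(flat, -total_budget)[-total_budget:]
    mask = np.zeros(num_heads * seq_len, dtype=bool)
    mask[top] = True
    return mask.reshape(num_heads, seq_len)

# A uniform scheme would keep 2 entries per head; adaptive allocation
# lets the head with concentrated importance keep more.
scores = np.array([[0.9, 0.8, 0.7, 0.1],
                   [0.2, 0.1, 0.05, 0.02]])
mask = adaptive_budget_allocation(scores, total_budget=4)
print(mask.sum(axis=1))  # per-head budgets: [3 1], not the uniform [2 2]
```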
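The score-skewness signal behind SkewRoute can likewise be illustrated with a toy routing rule. The threshold, routing direction, and function names below are my own illustrative assumptions, not taken from the paper:

```python
import statistics

def skewness(xs):
    """Population (Fisher-Pearson) skewness of a list of scores."""
    n = len(xs)
    mean = sum(xs) / n
    sd = statistics.pstdev(xs)
    return sum((x - mean) ** 3 for x in xs) / (n * sd ** 3)

def route(retrieval_scores, threshold=1.0):
    """Toy routing rule: a highly skewed score distribution (a few
    clearly dominant retrieved items) suggests an 'easy' query that a
    small LLM can handle; a flat distribution goes to the large LLM."""
    return "small-llm" if skewness(retrieval_scores) > threshold else "large-llm"

print(route([0.95, 0.1, 0.1, 0.1, 0.1]))   # one dominant score -> skewed
print(route([0.5, 0.45, 0.55, 0.5, 0.48])) # flat distribution
```

Because the signal is computed from retrieval scores alone, routing adds no training cost and no extra LLM calls.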

🎯 RAG in LLMs.

I am also collaborating with other researchers on LLM RAG, focusing on its deployment in practical applications.

  1. [ACL 2025 Findings] FRAG: A Flexible Modular Framework for Retrieval-Augmented Generation based on Knowledge Graphs. link
  2. [arXiv] Path Pooling: Training-Free Structure Enhancement for Efficient Knowledge Graph Retrieval-Augmented Generation. link

🎯 Broader Research Horizons.

  1. Interpretable Memory in LLMs
  2. Multimodal Large Models
  3. Test-time Scaling Laws / Reasoning Model

Selected Publications

For a full list of publications, please refer to my Google Scholar.

  1. [NeurIPS 2025] Yuan Feng, Junlin Lv, Yukun Cao, Xike Xie, and S. Kevin Zhou. “Ada-KV: Optimizing KV Cache Eviction by Adaptive Budget Allocation for Efficient LLM Inference.” (paper, code).

  2. [arXiv] Yuan Feng, Junlin Lv, Yukun Cao, Xike Xie, and S. Kevin Zhou. “Identify Critical KV Cache in LLM Inference from an Output Perturbation Perspective.” (paper, code).

  3. [EMNLP 2025 Findings] Hairu Wang, Yuan Feng (co-first author), Yukun Cao, Xike Xie, and S. Kevin Zhou. “SkewRoute: Training-Free LLM Routing for Knowledge Graph Retrieval-Augmented Generation via Score Skewness.” (paper; code coming soon).

  4. [ICASSP 2025] Junlin Lv, Yuan Feng, Xike Xie, Xin Jia, Qirong Peng, and Guiming Xie. “CritiPrefill: A Segment-wise Criticality-based Approach for Prefilling Acceleration in LLMs.” (paper, code).

  5. [ICML 2025] Yuan Feng, Yukun Cao, Hairu Wang, Xike Xie, and S. Kevin Zhou. “Lego Sketch: A Scalable Memory-augmented Neural Network for Sketching Data Streams.” (paper, code).

  6. [ICLR 2024 Spotlight] Yuan Feng, Yukun Cao, Hairu Wang, Xike Xie, and S. Kevin Zhou. “Mayfly: A Neural Data Structure for Graph Stream Summarization.” (paper, code).

  7. [TPAMI 2024] Yukun Cao, Yuan Feng (co-first author), Hairu Wang, Xike Xie, and S. Kevin Zhou. “Learning to Sketch: A Neural Approach to Item Frequency Estimation in Streaming Data.” (paper, code).

  8. [AAAI 2023 Oral] Yukun Cao, Yuan Feng, and Xike Xie. “Meta-sketch: A Neural Data Structure for Estimating Item Frequencies of Data Streams.” (paper, code).


Selected Honors

  • 2024 PhD First Prize Scholarship, USTC

  • 2022 & 2023 Master’s First Prize Scholarship, USTC (twice)

  • 2021 Outstanding Student Award, HIT

  • 2020 & 2021 National Scholarship, HIT (twice)


Academic Service

Reviewer

  • NeurIPS, 2024 & 2025
  • ICLR, 2024 & 2025 & 2026
  • ICDE, 2024 & 2025

Education

  • 2024.09 - present, PhD candidate in Computer Science and Technology, University of Science and Technology of China (USTC).

  • 2022.09 - 2024.06, Master’s in Computer Science and Technology, University of Science and Technology of China (USTC).

  • 2018.09 - 2022.06, Bachelor’s in Software Engineering, Harbin Institute of Technology (HIT).