Yixin Huang | LLM Systems & Evaluation

About Me

"A journey of a thousand miles begins with a single step." (Laozi, Tao Te Ching)

I work on LLM systems, evaluation, and GPU-accelerated ML infrastructure. I am currently an M.S. student in Computer Science at UC San Diego (GPA: 4.0), and I previously earned a B.A. in Computer Science & Applied Mathematics from UC Berkeley (GPA: 3.86).

Research Interests

LLM evaluation & benchmarks (agents, games, scientific reasoning)
Large-scale training & inference systems (FSDP, vLLM, Ray, Slurm)
GPU efficiency, memory systems, and model parallelism
Reinforcement learning for agents (GRPO, NeMo Gym)

Current Focus

🔄 Scaling agent evaluation with interactive environments
⚡ Training & serving efficiency on multi-GPU systems
🎯 Reward modeling and RL for LLM agents

Tech Stack

Python PyTorch CUDA vLLM SGLang NeMo RL AReaL Ray Docker Slurm FSDP DeepSpeed Linux Git

Featured Projects

🎮

GamingAgent

⭐ 843

LLM/VLM gaming agents and model evaluation through games. Evaluates long-horizon reasoning, memory & perception in Doom, Sokoban, Tetris, and Pokémon Red.

Python LLM Evaluation

View Project →

🔬

VideoScience

CVPR

Benchmark for scientific correctness in text-to-video models. Evaluates whether generated videos respect physics and chemistry concepts, using VLM-as-a-Judge scoring.

Python Benchmark Video

View Project →

🤖

NVIDIA NeMo Gym

⭐ 603

Framework for building RL environments for LLM training. I am integrating Sokoban and Tetris environments for scalable RL training, reward profiling, and GRPO.

Python RL NVIDIA

View Project →

🌐

lmenv

LLM environment framework for interactive evaluation. Provides standardized interfaces for game-based agent testing.

Python Framework

View Project →

Global Visitors

A privacy-friendly snapshot of where people have visited this site.

Visitor Map

Live widget powered by MapMyVisitors


Blog

Notes on LLM systems, evaluation, and research workflows.

Jan 29, 2026

Context & Learn in Public / 语境与公开学习

A bilingual reflection on why context matters and why learning in public compounds over time.

Bilingual Reflection

Read →

Coming soon

Deploying efficient inference stacks

Notes on GPU scheduling, memory tuning, and vLLM/SGLang integration.

Systems GPU

Announcements

Short updates on new posts, releases, and talks.

Jan 29, 2026

New blog: Context & Learn in Public

Published a bilingual reflection on why context matters in research and why sharing work-in-progress helps over time.

Read the post →

Jan 29, 2026

VideoScience paper on arXiv

Our evaluation work is now on arXiv, covering scientific reasoning benchmarks for video models.

Read the paper →

Jan 29, 2026

VideoScience-Bench leaderboard live

Check out the public leaderboard tracking model performance on the VideoScience-Bench tasks.

View leaderboard →

Coming soon

Research updates

Upcoming paper releases and project milestones will be posted here.

Get in Touch

Feel free to reach out for collaborations, discussions, or just to say hi!

Email GitHub Scholar Zhihu Resume

📧 Recruiters: Feel free to reach out by email.

💬 Open an issue or discussion on any of my repositories!