GamingAgent
⭐ 843
LLM/VLM gaming agents and model evaluation through games. Evaluates long-horizon reasoning, memory & perception in Doom, Sokoban, Tetris, and Pokémon Red.
View Project →
Hi, I'm
Research Assistant at UCSD Hao AI Lab
M.S. Computer Science Student • San Diego, CA
class Researcher:
    def __init__(self):
        self.focus = [
            "LLM Systems",
            "Agent Evaluation",
            "GPU Infrastructure",
        ]
        self.tools = [
            "vLLM", "SGLang",
            "NeMo RL", "Ray",
        ]

    def build(self):
        return "🚀 Innovation"
"A journey of a thousand miles begins with a single step." - Laozi
I work on LLM systems, evaluation, and GPU-accelerated ML infrastructure. I am currently an M.S. Computer Science student at UC San Diego (GPA: 4.0) and previously earned a B.A. in Computer Science & Applied Mathematics from UC Berkeley (GPA: 3.86).
LLM/VLM gaming agents and model evaluation through games. Evaluates long-horizon reasoning, memory & perception in Doom, Sokoban, Tetris, and Pokémon Red.
View Project →
Benchmark for scientific correctness in text-to-video models. Evaluates physics & chemistry concepts using VLM-as-Judge scoring.
View Project →
Build RL environments for LLM training. Integrating Sokoban & Tetris for scalable RL training, reward profiling, and GRPO.
View Project →
LLM environment framework for interactive evaluation. Standardized interfaces for game-based agent testing.
View Project →
A privacy-friendly snapshot of where people have visited this site.
Notes on LLM systems, evaluation, and research workflows.
A bilingual reflection on why context matters and why learning in public compounds over time.
Read →
Notes on GPU scheduling, memory tuning, and vLLM/SGLang integration.
Short updates on new posts, releases, and talks.
Published a bilingual reflection on why context matters in research and why sharing work-in-progress helps over time.
Read the post →
Our evaluation work is now on arXiv, covering scientific reasoning benchmarks for video models.
Read the paper →
Check out the public leaderboard tracking model performance on the VideoScience-Bench tasks.
View leaderboard →
Upcoming paper releases and project milestones will be posted here.