🤖 **Vision-Language-Action (VLA) & World Models** 🧑‍💻 Integrated M.S./Ph.D. @CVLAB in KAIST AI
I study how models can represent and predict the physical interactions between agents (robots and humans) and the world.
I am currently focused on building Vision-Language-Action (VLA) models, and I am interested in World Models as a way to capture the fundamental laws of interaction.
- 🌍 World Models & Physical Interaction – Modeling and predicting how the world changes through agent-environment interactions.
- 🦾 Vision-Language-Action (VLA) – Developing embodied AI that understands multi-modal instructions and translates them into physical actions.
- 🎬 Interaction-Aware Generation – Leveraging generative models to simulate realistic physical dynamics and multi-instance interactions.
- 🧠 Video Understanding – Utilizing MLLMs for deep temporal reasoning and understanding complex object relationships in video.
- Self-Evolving Neural Radiance Fields – Wild3D Workshop @ ICCV 2025 | 🔗 Project Page
- MUG-VOS: Multi-Granularity Video Object Segmentation – AAAI 2025 | 🔗 Project Page
- Referring Video Object Segmentation via Language Aligned Track Selection – arXiv 2025 | 🔗 Project Page
- InterRVOS: Interaction-aware Referring Video Object Segmentation – CVPR 2026 | 🔗 Project Page
- MATRIX: Mask Track Alignment for Interaction-Aware Video Generation – ICLR 2026
- 🎓 Google Scholar
- 💼 LinkedIn
- 🐦 X (Twitter)
- 🌐 Personal Website
✨ “Understanding the World through Video and Multimodalities.”
📅 Last updated: September 28, 2025 | 💻 Made with ❤️ by Deep Overflow