🤖 **Vision-Language-Action (VLA) & World Models** 🧑‍💻 Integrated M.S./Ph.D. @CVLAB in KAIST AI
I study how models can represent and predict the physical interactions between agents (robots and humans) and the world.
I am currently focused on building Vision-Language-Action (VLA) models, and I am interested in World Models as a way to capture the fundamental laws of interaction.
- 🌍 World Models & Physical Interaction – Modeling and predicting how the world changes through agent-environment interactions.
- 🦾 Vision-Language-Action (VLA) – Developing embodied AI that understands multi-modal instructions and translates them into physical actions.
- 🎬 Interaction-Aware Generation – Leveraging generative models to simulate realistic physical dynamics and multi-instance interactions.
- 🧠 Video Understanding – Utilizing MLLMs for deep temporal reasoning and understanding complex object relationships in video.
- Self-Evolving Neural Radiance Fields – Wild3D Workshop @ ICCV 2025 | 🔗 Project Page
- MUG-VOS: Multi-Granularity Video Object Segmentation – AAAI 2025 | 🔗 Project Page
- Referring Video Object Segmentation via Language Aligned Track Selection – arXiv 2025 | 🔗 Project Page
- InterRVOS: Interaction-aware Referring Video Object Segmentation – CVPR 2026 | 🔗 Project Page
- MATRIX: Mask Track Alignment for Interaction-Aware Video Generation – ICLR 2026
- 🎓 Google Scholar
- 💼 LinkedIn
- 🐦 X (Twitter)
- 🌐 Personal Website
✨ “Understanding the World through Video and Multimodalities.”
📅 Last updated: September 28, 2025 | 💻 Made with ❤️ by Deep Overflow