Wonjae (Dan) Kim / 김원재

Lead Research Scientist @ TwelveLabs

I lead the Embedding & Search team at TwelveLabs, where we build multimodal foundation models for video understanding. I’m the first author of ViLT, one of the early works that shaped efficient vision-language architectures. Previously, I was a research scientist at NAVER AI Lab and Kakao, and I hold an M.Sc. and a B.Sc. from Seoul National University.

My current research focuses on:

  • Multimodal Representation Learning (video, audio, text)
  • Large-scale Embedding & Search Systems
  • User Behavior Modeling for Search

We’re Hiring! I’m building a research team at TwelveLabs where your models ship to thousands of customers within months. We’re tackling joint embedding spaces across modalities and containerized asset search—problems that go beyond simple retrieval to true semantic understanding of video structure. If you want to see your work create real-world impact at scale, grab a coffee chat with me. I’m looking for scientists and engineers who are excited to push video-language AI from idea to production. Join us in Seoul →

news

Dec 01, 2025 TwelveLabs releases Marengo 3.0, a new standard for foundation models that understand the world in all its complexity.
Oct 15, 2025 One ICCV-2025 paper to appear: An Efficient Post-hoc Framework for Reducing Task Discrepancy of Text Encoders for Composed Image Retrieval.
Apr 01, 2025 One CVPR-2025 EVAL-FoMo 2 Workshop paper: Emergence of Text Readability in Vision Language Models.
Feb 04, 2025 I’ve started a new chapter at TwelveLabs!
Jan 01, 2025 One ICLR-2025 paper to appear: Probabilistic Language-Image Pre-Training.

selected publications

  1. ECCV Oral
    HYPE: Hyperbolic Entailment Filtering for Underspecified Images and Texts
    Wonjae Kim, Sanghyuk Chun, Taekyung Kim, Dongyoon Han, and Sangdoo Yun
    In 18th European Conference on Computer Vision (ECCV 2024), 2024
  2. ICML Long talk
    ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision
    Wonjae Kim*, Bokyung Son*, and Ildoo Kim
    In 38th International Conference on Machine Learning (ICML 2021), 18–24 Jul 2021
  3. NeurIPS
    Learning Dynamics of Attention: Human Prior for Interpretable Machine Reasoning
    Wonjae Kim and Yoonho Lee
    In 33rd Conference on Neural Information Processing Systems (NeurIPS 2019), 2019