About me

I’m Seung Hyun (Sally) Hahm, an M.S. student in Computer Science at Dartmouth College, advised by Professor SouYoung Jin, where I study how AI can perceive and communicate the world for those who cannot see it.

I build AI systems that address a growing gap created—ironically—by AI itself.

As AI tools become more powerful, they are increasingly woven into everyday life through dashboards, browsers, chat interfaces, and screen-based applications. These systems improve productivity and convenience for many people, but they also make a quiet assumption: that users can see, scan, and navigate visual interfaces.

Because of this, the divide created by AI is no longer only between early adopters and late adopters. It is increasingly between people who can interface with AI naturally and people who fundamentally cannot due to physical constraints—particularly blind and low-vision (BLV) users. At the same time, these are the users who could benefit from AI the most.

My work focuses on building AI systems that act as an interpretive layer between the visual world and BLV users. I began with movies, developing a training-free, high-quality audio description model, and I am now interested in extending these ideas to broader interfaces and environments.

My prior work also focused on automatically constructing large-scale, high-quality datasets to enable multimodal AI systems that are both more useful to humans and more scalable.

Motivation

What is red? (255, 0, 0)? A wavelength range? The color of a rose?

A friend once told me he was colorblind, and I tried—earnestly—to explain what “red” meant. I had facts and metaphors, but I couldn’t translate the experience itself. That gap stuck with me. It showed me how easily systems feel complete simply because they assume a “default” sensory baseline, and how quickly that assumption excludes anyone who does not share it.

Most vision systems assume the user can see the rest. For blind and low-vision (BLV) users, the job is different: translate visual information into another channel without losing what makes it meaningful—who someone is, what changed, and why it matters.

I am primarily motivated by this question: how can computer vision become an independent pathway for understanding, not an add-on to sight?


Research

My research focuses on how computer vision can communicate the world to people who cannot rely on sight.

  • Narrative-grounded Audio Description for movies: Generated long-form video narration that preserves continuity across scenes—who someone is, what changed, and why it matters.

  • Retrieval-Informed Video Understanding (RAG): Grounded descriptions in structured external context to reduce hallucination and improve story fidelity.

  • Social & Emotional Understanding in Movies: Modeled inter-character relationships to enrich descriptions with meaningful social dynamics.

  • Movie AD Evaluation Framework: Built a comprehension-based evaluation with a 6,000+ Q&A benchmark to test whether descriptions actually support BLV understanding.

Publications

  • Tell the Story, Not the Frames: Narrative-Aware Retrieval for Audio Description — CVPR 2026 (under review)
    – Selected as an oral presenter at NECV 2025
  • Character Relationship Prediction in Movies: Toward Emotionally-Aware Automatic Audio Descriptions — High Honors Thesis

Projects

Friendly Spot — Socially Adaptive Human–Robot Interaction (Boston Dynamics Spot), Dartmouth College

  • Built an affect-aware proxemics system that fuses emotion, pose, gesture, identity, and distance into a scalar comfort score, then selects one of six socially meaningful behaviors (e.g., approach slowly, keep distance, back away, sit); a minimal sketch of this scoring idea appears after this list.
  • Validated key perception modules in controlled indoor trials: 90% gesture recognition (36/40 hands), >80% emotion alignment with prompted expressions, and <5% repeat-recognition errors.
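
A minimal Python sketch of the comfort-score idea is below; the signal names, weights, thresholds, and behavior labels are hypothetical placeholders for illustration, not the actual Friendly Spot implementation.

    # Minimal sketch of comfort-score fusion and behavior selection.
    # All weights, signal names, and thresholds are hypothetical placeholders.
    from dataclasses import dataclass

    @dataclass
    class PerceptionSignals:
        emotion_valence: float  # -1 (distressed) .. 1 (positive)
        pose_openness: float    # 0 (closed/guarded) .. 1 (open)
        gesture_invite: float   # 0 (none) .. 1 (beckoning gesture)
        is_known_person: bool   # identity match against enrolled users
        distance_m: float       # distance to the person in meters

    BEHAVIORS = ["back away", "keep distance", "hold position",
                 "approach slowly", "approach", "sit nearby"]

    def comfort_score(s: PerceptionSignals) -> float:
        """Fuse perception signals into a single scalar in [0, 1]."""
        proximity_penalty = max(0.0, (1.5 - s.distance_m) / 1.5)  # closer than ~1.5 m lowers comfort
        score = (0.35 * (s.emotion_valence + 1) / 2
                 + 0.25 * s.pose_openness
                 + 0.20 * s.gesture_invite
                 + 0.20 * (1.0 if s.is_known_person else 0.3))
        return max(0.0, min(1.0, score - 0.3 * proximity_penalty))

    def select_behavior(score: float) -> str:
        """Map the scalar comfort score to one of six discrete behaviors."""
        return BEHAVIORS[min(int(score * len(BEHAVIORS)), len(BEHAVIORS) - 1)]

    # Example: a relaxed, familiar person standing two meters away
    print(select_behavior(comfort_score(PerceptionSignals(0.4, 0.7, 0.0, True, 2.0))))  # "approach slowly"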

InstructBLIP Video-Captioning Optimization, Dartmouth College (Sep 2024–Dec 2024)

  • Optimized InstructBLIP’s Q-Former for video–language modeling on MSR-VTT, improving video comprehension with a +24 CIDEr gain.
  • Achieved #2 on the MSR-VTT leaderboard using only 6K video–text pairs.

Academic Recognition

  • Graduated with High Honors.
  • Received academic citations for exceptional performance in Computer Vision, Deep Learning, and Multimodal Generative AI.

Work Experience

  • NextCare — Co-founded a health-data startup building blockchain-based infrastructure for privacy-preserving medical-record exchange and AI-driven health assistants.
    Focused on designing scalable systems and applying AI for accessible, trustworthy healthcare solutions.

Teaching

  • Graduate TA — COSC 76: Artificial Intelligence (Fall 2025, Prof. Soroush Vosoughi)
  • Undergraduate TA — COSC 74: Machine Learning (Spring 2025, Prof. Soroush Vosoughi)
  • Class Notes Ally — COSC 89.32: Multimodal Generative AI (Fall 2024, Prof. Yu-Wing Tai)
    – A position through Dartmouth’s Student Accessibility Services, providing course notes and adapted materials for students with documented disabilities to ensure equitable academic access.

Hobbies

I also enjoy drawing!