Hi, I'm a Research Scientist at AIRoA, working on Vision-Language-Action (VLA) systems. I also work as a Cooperative Research Fellow at the Institute of Industrial Science, UTokyo.
I have been fortunate to receive the EgoVis Distinguished Paper Award at CVPR 2025, as well as competitive fellowships from Google (2024), Microsoft Research Asia (2023), and ETH Zurich Leading House Asia (2023).
I served as the Principal Investigator of a JST ACT-X project (2020-2023) and as a JSPS Research Fellow (DC1) (2022-2024).
I'm interested in computer vision, machine learning, and robotics for embodied AI, aiming to enable agents that can perceive, understand, and generate interactions in the real world. My research interests include:
3D modeling of human interactions with hands, bodies, and objects
Egocentric perception and video understanding
Generative and diffusion models
Vision-Language-Action (VLA) systems
If you're interested in working with me, please visit my contact page and feel free to reach out!
We propose a powerful framework for visual feature extraction in 3D hand pose estimation built on recent state space models (i.e., Mamba), dubbed Deformable Mamba (DF-Mamba). DF-Mamba is designed to capture global context cues through Mamba's selective state modeling and deformable state scanning.
We present AssemblyHands-X, the first markerless 3D hand-body benchmark for bimanual activities, designed to study the effect of hand-body coordination on action recognition.
We propose a generative prior for hand pose refinement guided by affordance-aware textual descriptions of hand-object interactions. Our method employs a diffusion-based generative model that learns the distribution of plausible hand poses conditioned on affordance descriptions inferred by vision-language models (VLMs).
This paper presents RPEP, the first pre-training method for event-based 3D hand pose estimation using labeled RGB images and unpaired, unlabeled event data.
We introduce the first extensive self-contact dataset with precise body shape registration, Goliath-SC, consisting of 383K self-contact poses across 130 subjects. Using this dataset, we propose generative modeling of a self-contact prior conditioned by body shape parameters, based on a body-part-wise latent diffusion with self-attention.
Our HANDS workshop will gather vision researchers working on perceiving hands performing actions, including 2D & 3D hand detection, segmentation, pose/shape estimation, tracking, etc. We will also cover related applications including gesture recognition, hand-object manipulation analysis, hand activity understanding, and interactive interfaces.
We introduce EmoSign, the first sign video dataset containing sentiment and emotion labels for 200 American Sign Language (ASL) videos with open-ended descriptions of emotion cues.
Alongside the annotations, we include baseline models for sentiment and emotion classification.
SiMHand: Mining Similar Hands for Large-Scale 3D Hand Pose Pre-training
Nie Lin*, Takehiko Ohkawa*, Yifei Huang, Mingfang Zhang, Minjie Cai, Ming Li, Ryosuke Furuta, and Yoichi Sato (*equal contribution)
International Conference on Learning Representations (ICLR), 2025
HANDS Workshop, European Conference on Computer Vision Workshops (ECCVW), 2024
Invited Oral Presentation at Meeting on Image Recognition and Understanding (MIRU), 2025 [Project][Paper][Code][Poster]
We present a framework for pre-training 3D hand pose estimation from in-the-wild hand images that share similar hand characteristics, dubbed SiMHand.
We present EgoYC2, a novel benchmark for cross-view knowledge transfer of dense video captioning, adapting models from web instructional videos with exocentric views to an egocentric view.
Benchmarks and Challenges in Pose Estimation for Egocentric Hand Interactions with Objects
Zicong Fan*, Takehiko Ohkawa*, Linlin Yang*, Nie Lin, Zhishan Zhou, Shihao Zhou, Jiajun Liang, Zhong Gao, Xuanyang Zhang, Xue Zhang, Fei Li, Liu Zheng, Feng Lu, Karim Abou Zeid, Bastian Leibe, Jeongwan On, Seungryul Baek, Aditya Prakash, Saurabh Gupta, Kun He, Yoichi Sato, Otmar Hilliges, Hyung Jin Chang, and Angela Yao (*equal contribution)
European Conference on Computer Vision (ECCV), 2024 [Paper][Poster]
We present a comprehensive summary of the HANDS23 challenge using the AssemblyHands and ARCTIC datasets. Based on the results of the top submitted methods and more recent baselines on the leaderboards, we perform a thorough analysis of 3D hand(-object) reconstruction tasks.
We present AssemblyHands, a large-scale benchmark dataset with accurate 3D hand pose annotations, to facilitate the study of challenging hand-object interactions from egocentric videos.
We present a systematic review of 3D hand pose estimation from the perspective of efficient annotation and learning. 3D hand pose estimation has been an important research area owing to its potential to enable various applications, such as video understanding, AR/VR, and robotics.
We tackled domain adaptation of hand keypoint regression and hand segmentation to in-the-wild egocentric videos with new imaging conditions (e.g., Ego4D).
We propose Background Mixup augmentation, which leverages data-mixing regularization for hand-object detection while avoiding the unintended effects produced by naive Mixup.
We developed a domain adaptation method for hand segmentation, consisting of appearance gap reduction by stylization and learning with pseudo-labels generated by network consensus.
We developed extended consistency regularization for stabilizing the training of image translation models using real, fake, and reconstructed samples.
Honors & Awards
EgoVis Distinguished Paper Award @ CVPR, 2025
Google PhD Fellowship in Machine Perception, 2024
JSPS DC1 Special Stipends for Excellent Research Results, 2024
JSPS Research Fellowship for Young Scientists (DC1), 2022-2024
ETH Zurich Leading House Asia "Young Researchers' Exchange Programme", 2023
Microsoft Research Asia Collaborative Research Program D-CORE, 2023
UTokyo-IIS Research Collaboration Initiative Award, 2021
MIRU Student Encouragement Award, 2021
PRMU Best Presentation of the Month, 2020
Grants
ACT-X Travel Grant for International Research Meetings, 2024
UTokyo-IIS Travel Grant for International Research Meetings, 2024
JST ACT-X Acceleration Phase of "Frontier of Mathematics and Information Science", 2023
JST ACT-X "Frontier of Mathematics and Information Science", 2020-2022
UTokyo-IIS Travel Grant for International Research Meetings, 2022
JASSO Scholarship for Excellent Master Students at UTokyo, 2021
JEES/Softbank AI Scholarship, 2020
Tokio Marine Kagami Memorial Foundation Scholarship, 2018-2020
Talks
Google Developer Groups AI for Science - Japan, Dec 2025. [Link (ja)][Slides]
IPSJ Seminar Series: "Frontiers in Sensing and Analyzing Human Behavior", Nov 2025. [Link (ja)]
UIUC Vision Seminar (hosted by Saurabh Gupta), "Understanding human hands in interactions", Apr 2025.
NUS Seminar (hosted by Angela Yao), "Perceiving hand and action across ego-exo views", Jul 2023.
Academic Activities
Professional Service:
Reviewer: CVPR, ECCV, ICCV, ICLR, TPAMI, ACM MM, ACM IMWUT
Organization:
HANDS Workshop at ICCV 2025, ECCV 2024, and ICCV 2023