Rediscovering Intelligence:
Can AI Still Learn from Humans?

ReLearn studies the relationship between human and machine intelligence, exploring how cognitive and psychological insights can still guide the future of AI.

At a Glance
Format: Half-day · In person
Location: CVPR 2026 · Denver, Colorado
Focus: Human learning + hybrid intelligence
Participation: Talks · Posters · Panel
Date: June 3, 2026
Room and Time: Mile High 4AB, 1-6 pm

Overview

As AI capabilities surge, ReLearn asks whether machines can still learn from humans and how cognitive science can shape the next generation of systems.

We bridge computer vision, cognitive science, and psychology to study reasoning, social understanding, and hybrid learning that blends human insight with autonomous discovery.

Key Themes

  • Human-inspired foundations of reasoning and Theory of Mind
  • Learning with humans via feedback and interaction
  • Beyond the human blueprint through self-supervision

Important Dates

Feb 20, 2026: CVPR final decisions to authors
Mar 18, 2026: Workshop paper submission deadline
Mar 25, 2026: Workshop notification to authors
Apr 10, 2026: Workshop camera-ready deadline
Jun 3, 2026: Workshop date

Invited Speakers

Alexei (Alyosha) Efros
UC Berkeley

Alexei (Alyosha) Efros is a Professor in the Department of Electrical Engineering and Computer Sciences (EECS) at UC Berkeley. Prior to that, he was on the faculty of Carnegie Mellon University. His research is in the area of computer vision and computer graphics, especially at the intersection of the two. He is particularly interested in using data-driven techniques to tackle problems where large quantities of unlabeled visual data are readily available. He is a recipient of the CVPR Best Paper Award (2006), Sloan Fellowship (2008), Guggenheim Fellowship (2008), Okawa Grant (2008), SIGGRAPH Significant New Researcher Award (2010), three PAMI Helmholtz Test-of-Time Prizes (1999, 2003, 2005), the ACM Prize in Computing (2016), Diane McEntyre Award for Excellence in Teaching Computer Science (2019), Jim and Donna Gray Award for Excellence in Undergraduate Teaching of Computer Science (2023), and the PAMI Thomas S. Huang Memorial Prize (2023).

Dima Damen
University of Bristol / Google DeepMind

Dima Damen is a Professor of Computer Vision at the University of Bristol and Senior Research Scientist at Google DeepMind. Dima is currently an EPSRC Fellow (2020-2026), focusing her research on the automatic understanding of object interactions, actions and activities using wearable visual (and depth) sensors. She is best known for her leading work in Egocentric Vision, and has also contributed to novel research questions including mono-to-3D, video object segmentation, assessing action completion, domain adaptation, skill and expertise determination from video sequences, discovering task-relevant objects, dual-domain and dual-time learning, as well as multi-modal fusion using vision, audio, and language.

Saining Xie
NYU Courant

Saining Xie is an Assistant Professor of Computer Science at NYU Courant and part of the CILVR group. He is also affiliated with the NYU Center for Data Science. Before that, he was a research scientist at Facebook AI Research (FAIR), Menlo Park. He received his Ph.D. and M.S. degrees from the CSE Department at UC San Diego, advised by Zhuowen Tu. During his PhD studies, he also interned at NEC Labs, Adobe, Facebook, Google, and DeepMind. Prior to that, he obtained his bachelor's degree from Shanghai Jiao Tong University. His primary research interests are computer vision and machine learning.

Manling Li
Northwestern University

Title: How Foundation Models Build (and Fail to Build) Spatial Minds: A Piagetian View

Abstract: Spatial cognition is a developmental capacity that, in humans, unfolds in stages. This talk asks how far foundation models have traveled along the same path. Following Piaget's account of spatial development, I will trace three layers: topological reasoning over invariants that survive deformation (MindTopo), projective reasoning that constructs beliefs about unseen structure through active exploration (Theory of Space), and metric reasoning that maintains a coherent mental map from only a few views (MindCube). Read through this lens, today's models show a consistent dissociation: they name spatial structure in a static scene yet fail to preserve or act on it once the world moves (ENACT). I will suggest that the missing ingredient is not sharper perception, but a structured and updatable model of space.

Manling Li is an Assistant Professor at Northwestern University and an Amazon Scholar. She was a postdoc at Stanford University, and obtained her PhD in Computer Science at the University of Illinois Urbana-Champaign in 2023. She works on Reasoning, Planning and Compositionality, at the intersection of Language, Vision, and Robotics. Her work has been recognized with the ACL 2025 Dissertation Award Honorable Mention, Outstanding Paper Award at ACL’24, Best Demo Paper Award at NAACL’21 and ACL’20, Best Paper Awards at NeurIPS/ICCV/RSS workshops, MIT Tech Review 35 Innovators Under 35, Microsoft Research PhD Fellowship, EECS Rising Star, etc. She led the tutorials, workshops, and challenges on Foundation Models meet Embodied Agents. Additional information is available at limanling.github.io.

Alan Yuille
Johns Hopkins University

Alan Yuille received the BA degree in mathematics from the University of Cambridge in 1976. His PhD in theoretical physics, supervised by Prof. S.W. Hawking, was approved in 1981. He was a research scientist in the Artificial Intelligence Laboratory at MIT and the Division of Applied Sciences at Harvard University from 1982 to 1988. He served as an assistant and associate professor at Harvard until 1996. He was a senior research scientist at the Smith-Kettlewell Eye Research Institute from 1996 to 2002. He was a full professor of Statistics at the University of California, Los Angeles, with joint appointments in computer science, psychiatry, and psychology. He moved to Johns Hopkins University in January 2016. His research interests include computational models of vision, mathematical models of cognition, medical image analysis, and artificial intelligence and neural networks.

William T. Freeman
MIT CSAIL / Google Research

Joint presentation with Eric Li, senior graduate student.

William T. Freeman is the Thomas and Gerd Perkins Professor of Electrical Engineering and Computer Science (EECS) at MIT, and a member of the Computer Science and Artificial Intelligence Laboratory (CSAIL) there. He was the Associate Department Head of EECS from 2011 to 2014. Since 2015, he has also been a research manager at Google Research in Cambridge, MA.

His current research interests include mid-level vision and computational photography. Previous research topics include steerable filters and pyramids, orientation histograms, the generic viewpoint assumption, color constancy, computer vision for computer games, motion magnification, and belief propagation in networks with loops. He received outstanding paper awards at computer vision or machine learning conferences in 1997, 2006, 2009, 2012 and 2019, and test-of-time awards for papers from 1990, 1995, 2002, 2005, and 2012. He shared the 2020 Breakthrough Prize in Physics for a consulting role with the Event Horizon Telescope collaboration, which reconstructed the first image of a black hole. He is a member of the National Academy of Engineering, and a Fellow of the IEEE, ACM, and AAAI. In 2019, he received the PAMI Distinguished Researcher Award, the highest award in computer vision. He is co-author of the computer vision textbook, https://visionbook.mit.edu/, also available through MIT Press.

Presented Papers

Multi-Modal Manipulation via Multi-Modal Policy Consensus

Haonan Chen, Jiaming Xu, Hongyu Chen, Kaiwen Hong, Binghao Huang, Chaoqi Liu, Jiayuan Mao, Yunzhu Li, Yilun Du, and Katherine Driggs-Campbell

G-RoLA: A Generative World Model Paradigm for Robotic Skill Acquisition from Any Image

Chenkai Gao and Yina Jian

UniVerse: Empower Unified Generation with Reasoning and Knowledge

Kaiyue Sun, Weiyang Jin, Chengqi Duan, Rongyao Fang, Xian Liu, Yuwei Niu, Chunwei Wang, Aoxue Li, and Xihui Liu

MindPower: Enabling Theory-of-Mind Reasoning in VLM-based Embodied Agents

Ruoxuan Zhang, Qiyun Zheng, Zhiyu Zhou, Ziqi Liao, Siyu Wu, Jian-Yu Jiang-Lin, Bin Wen, Hongxia Xie, Jianlong Fu, and Wen-Huang Cheng

Spot The Ball: A Benchmark for Visual Social Inference

Neha Balamurugan, Sarah Wu, Cristobal Eyzaguirre, and Tobias Gerstenberg

Time Blindness: Why Video-Language Models Can't See What Humans Can?

Ujjwal Upadhyay, Mukul Ranjan, Zhiqiang Shen, and Mohamed Elhoseiny

Learning to See Through a Baby's Eyes: Early Visual Diets Enable Robust Visual Intelligence in Humans and Machines

Yusen Cai, Qing Lin, Bhargava Satya Nunna, and Mengmi Zhang

BabyVLM-V2: Toward Developmentally Grounded Pretraining and Benchmarking of Vision Foundation Models

Shengao Wang, Wenqi Wang, Zecheng Wang, Max Whitton, Michael Wakeham, Arjun Chandra, Joey Huang, Pengyue Zhu, Helen Chen, David Li, Jeffrey Li, Shawn Li, Andrew Zagula, Amy Zhao, Andrew Zhu, Sayaka Nakamura, Yuki Yamamoto, Jerry Jun Yokono, Aaron Mueller, Bryan A. Plummer, Kate Saenko, Venkatesh Saligrama, and Boqing Gong

Schedule

Wednesday, June 3, 2026 · Mile High 4AB · PM slot (afternoon)

13:20 Welcome & introduction
13:30 Keynote 1 — Alexei (Alyosha) Efros
14:00 Keynote 2 — Manling Li
14:30 Keynote 3 — Dima Damen
15:00 Presentations of challenge winners
15:10 Posters & coffee break
16:00 Keynote 4 — Alan Yuille
16:30 Keynote 5 — Saining Xie (remote)
17:00 Keynote 6 — William T. Freeman, with Eric Li
17:30 Closing remarks

Call for Papers

We invite papers aligned with the workshop themes, spanning human-inspired foundations, learning with humans, and hybrid intelligence.

Submissions will follow CVPR 2026 formatting and length guidelines. Accepted papers will be presented in oral/spotlight and poster formats.

Suggested Topics

  • Human-inspired architectures, reasoning, and abstraction
  • Theory of Mind and social understanding in AI
  • Human-in-the-loop learning, feedback, and demonstrations
  • Egocentric, multimodal, and embodied interaction
  • Synthetic data, simulation, and self-supervised learning
  • Grounded language and cognitive evaluation
  • Hybrid intelligence, trust, alignment, and safety
  • Ethical and philosophical perspectives on learning

Submission Guidelines

We invite submissions of up to 8 pages, excluding references, using the CVPR template. Submissions should follow the CVPR 2026 instructions. All papers will be subject to a double-blind review process, i.e., authors must not identify themselves in the submitted papers. The review process is single-stage, without rebuttals.

  • Online Submission System: OpenReview
  • Submission Format: CVPR template (double column; no more than 8 pages, excluding references). Submissions are anonymous and should not include any author names, affiliations, or contact information in the PDF.

If you have any questions, feel free to reach out to us.

Challenge

Multimodal Theory of Mind (ToM) Challenge: infer goals and beliefs from videos, textual scene descriptions, and dialogues.

The challenge includes two tracks: single-agent reasoning and multi-agent reasoning.

Challenge website: relearnchallenge.onrender.com.

Use this ReLearn workshop site for workshop updates and paper submissions, and the challenge site for team registration, benchmark resources, and Track 1/Track 2 submissions.

Track 1

Reasoning from a single agent's behavior.

Track 2

Reasoning from multi-agent interactions.

Timeline

Opens Feb 23, 2026 · Submissions due May 3, 2026.

Organizers

Xi Wang
ETH Zurich and TUM
Yen-Ling Kuo
University of Virginia
Tianmin Shu
Johns Hopkins University
Asen Nachkov
INSAIT Sofia University
Yan Zhuang
University of Virginia
Chuanyang Jin
Johns Hopkins University
Luc Van Gool
INSAIT Sofia University
Marc Pollefeys
ETH Zurich / Microsoft

For inquiries regarding organization, please contact:

Sponsors

Lambda provides awards: one best paper award ($3,000 in compute credits), two runner-up awards ($1,500 in credits each), and $400 in credits for each accepted paper.