1

How to AI (Almost) Anything


Lecture 1 - Introduction
Paul Liang
Assistant Professor
MIT Media Lab & MIT EECS

https://pliang279.github.io
[email protected]
@pliang279
2

Your Teaching Team, Spring 2025

Paul Liang
[email protected]
Course instructor

Chanakya Ekbote
[email protected]
Teaching Assistant

David Dai
[email protected]
Teaching Assistant
3

Creating human-AI symbiosis


across scales and sensory mediums
to enhance productivity, creativity, and wellbeing.

Foundations of multisensory AI
Enhancing the human experience
Real-world interaction
4

AI for Anything!

[Figure: a multisensory world of modalities: gestures, voice, spoken words, vision, radiology, histology, sensors, wearables, mobile, tables, networks, activities, movement, and emotional states]
5

AI for Physical Sensing


Sensing in physical systems, manufacturing, smart cities, IoT, robotics

[Lee et al., Making Sense of Vision and Touch: Learning Multimodal Representations for Contact Tasks. ICRA 2019]
[Feng et al., SmellNet: A Large-scale Hierarchical Database for Real-world Smell Recognition. In progress 2024]
6

Multimodal Generative AI
Multimedia, content creation, creativity and the arts

[Kondratyuk et al., VideoPoet: A Large Language Model for Zero-shot Video Generation. ICML 2024]
7

Holistic Health: Physical, Social, and Emotional


The majority of medical indicators will not be taken in the doctor’s office.

[Figure: physical health, social wellness, and emotional wellbeing, sensed through radiology, histology, speech, gestures, wearables, sensors, mobile, tables, vision, dialog, activities, movement, networks, and emotional states]
[Dai et al., Clinical Behavioral Atlas. NEJM AI 2025]


[Hu et al., OpenFace 3.0: An Open-source Foundation Model for Facial Behavior Analysis. In progress 2024]
[Mathur et al., Advancing Social Intelligence In AI: Technical Challenges and Open Questions. EMNLP 2024]
8

Interactive Agents
AI agents for the web and digital automation
Example task: Purchase a set of earphones with a rating of at least 4.5 stars and ship it to me.

[Figure: an assistant exchanges instructions, clarifications, feedback, and actions with a human across computer tasks: webpages, PowerPoint, spreadsheets]

[Zhou et al., WebArena: A Realistic Web Environment for Building Autonomous Agents. ICLR 2024]
[Jang et al., VideoWebArena: Evaluating Multimodal Agents on Video Understanding Web Tasks. ICLR 2025]
9

Time for Introductions!

Your name, department, and program

Your favorite modality(ies)!

Previous research experience in AI

Why are you interested in this course?


10

Course Overview

1. AI for new modalities: data, modeling, evaluation, deployment
(e.g., gestures, radiology, sensors, wearables, mobile, networks)

2. Multimodal AI: connecting multiple different data sources
(e.g., language, gestures, sensing, actuation); a minimal fusion sketch follows below
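To make the second theme concrete, here is a minimal late-fusion sketch, assuming PyTorch; the modalities, dimensions, and architecture are illustrative placeholders, not anything the course prescribes.

```python
# A minimal late-fusion sketch: one encoder per modality, embeddings
# concatenated before a shared prediction head. All dimensions and the
# choice of modalities are illustrative assumptions.
import torch
import torch.nn as nn

class LateFusionModel(nn.Module):
    def __init__(self, sensor_dim=32, text_dim=300, hidden=64, num_classes=4):
        super().__init__()
        self.sensor_encoder = nn.Sequential(nn.Linear(sensor_dim, hidden), nn.ReLU())
        self.text_encoder = nn.Sequential(nn.Linear(text_dim, hidden), nn.ReLU())
        self.head = nn.Linear(2 * hidden, num_classes)

    def forward(self, sensor_x, text_x):
        # Encode each modality separately, then fuse by concatenation
        z = torch.cat([self.sensor_encoder(sensor_x),
                       self.text_encoder(text_x)], dim=-1)
        return self.head(z)

model = LateFusionModel()
logits = model(torch.randn(8, 32), torch.randn(8, 300))  # batch of 8
print(logits.shape)  # torch.Size([8, 4])
```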


11

Learning Objectives

1 Study recent technical achievements in AI research

2 Improve critical and creative thinking skills

3 Understand future research challenges in AI

4 Explore and implement new research ideas in AI


12

Preferred Prerequisites

1 Some knowledge of programming (ideally in Python)

2 Some basic understanding of modern AI capabilities & limitations

3 Bring external (non-AI) domain knowledge about your problem

4 Bonus: worked on AI for some modality


13

Course Delivery Format

• 1-hour lecture every Tuesday


• 1-hour discussion or hands-on tutorial every Thursday
• Reading assignments outside of class
• Significant research project outside of class, with reports and presentations
14

Lecture Topics (subject to change, based on student interests and course discussions)

Module 1: Foundations of AI

Week 1 (2/4): Introduction to AI and AI research


Week 2 (2/11): Data, structure, and information
Week 3 (2/18): Common model architectures
Week 4 (2/25): Learning and generalization

[Figure: icons illustrating spatial and hierarchical structure and a loss-over-epochs training curve]
15

Lecture Topics (subject to change, based on student interests and course discussions)

Module 2: Foundations of multimodal AI

Week 5 (3/4): Multimodal connections and alignment (see the alignment sketch below)


Week 6 (3/11): Multimodal interactions and fusion
Week 7 (3/18): Cross-modal transfer
Week 8 – No class, spring break
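As a preview of what connections and alignment (Week 5) can mean operationally, here is a minimal sketch of a symmetric CLIP-style contrastive alignment objective, assuming paired, L2-normalized embeddings from two modality encoders; the temperature and dimensions are illustrative, not a formulation the course commits to.

```python
# A minimal sketch of a symmetric contrastive (CLIP-style) alignment loss.
# `img_emb` and `txt_emb` are assumed to be paired, L2-normalized embeddings.
import torch
import torch.nn.functional as F

def contrastive_alignment_loss(img_emb, txt_emb, temperature=0.07):
    # Pairwise similarities; matched pairs lie on the diagonal
    logits = img_emb @ txt_emb.t() / temperature
    targets = torch.arange(img_emb.size(0))
    loss_i2t = F.cross_entropy(logits, targets)    # image -> text
    loss_t2i = F.cross_entropy(logits.t(), targets)  # text -> image
    return (loss_i2t + loss_t2i) / 2

img = F.normalize(torch.randn(16, 128), dim=-1)
txt = F.normalize(torch.randn(16, 128), dim=-1)
print(contrastive_alignment_loss(img, txt))
```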
16

Lecture Topics (subject to change, based on student interests and course discussions)

Module 3: Large models and modern AI

Week 9 (4/1): Pre-training, scaling, fine-tuning LLMs


Week 10 – No class, member’s week
Week 11 (4/15): Large multimodal models
Week 12 (4/22): Modern generative AI

[Figure: image generated from the prompt "an armchair in the shape of an avocado"]
17

Lecture Topics (subject to change, based on student interests and course discussions)

Module 4: Interactive AI

Week 13 (4/29): Multi-step reasoning


Week 14 (5/6): Interactive and embodied AI
Week 15 (5/13): Human-AI interaction and safety



18

Grading Overview

• 40% of the grade:


• Reading assignments
• Small group discussions
• Synopsis leads
• 60% of the grade:
• A high-quality research project:
• Proposal with literature review
• Midterm and final reports and presentations
• Bi-weekly updates
19

Reading Assignments

• 7 reading assignments, each with usually 2 required papers, some suggested (but optional) papers, and 5-6 discussion probes.
• Three main assignment parts (due Monday night before discussion):
• Reading notes: Read the assigned papers and summarize the main take-away points
• Optional: note any clarification questions you have about the papers
• Paper scouting: Scout for extra papers, blog posts, or other resources related to the question probes
• Discussion points: Reflect on the question probes related to the assigned papers and prepare discussion points.
20

Weekly Thursday Class

• Joint portion (about 15 mins)


• A short presentation covering the scouted papers and answering student questions about the required papers.
• Separate (TBD based on final enrollment) discussion groups (about 40 mins)
• Two groups of 8-10 students, one instructor per group
• Round-table discussions: Discuss the research question probes. Each student is expected to
actively participate in this discussion.
• Two note-takers per discussion group (alternating note-taking).
21

Discussion Roles

Reading leads (1 per discussion group, 2 total per week):
1. Short presentation (10-15 mins), prepared between Sunday night and Thursday
a) Answer questions from other students
b) Summarize and highlight scouted papers
2. Help with note-taking during discussions

Synopsis leads (1 per discussion group, 2 total per week):
1. Note-taking during discussions
2. Write up the discussion synopsis, between Thursday and Monday
a) Merge notes from both groups
b) Summarize the main discussion points
c) Organize into an overview schema, table, or figure
22

Discussion Topics (subject to change, based on student interests and course discussions)

Week 4 (2/27): Learning and generalization


Week 5 (3/6): Specialized vs general architectures
Week 6 (3/13): Cross-modal transfer
Week 7 (3/20): Large language models
Week 11 (4/17): Large multimodal models
Week 12 (4/24): Modern generative AI
Week 13 (5/1): Human-AI interaction
23

Grading Scheme for Readings and Discussions

• Reading assignments 15%


• 6 points per reading assignment session
• 1 point for scouting relevant resources
• 2 points for take-away messages from the assigned papers
• 3 points for reflections and thoughts on open discussion probes
• Total 7 reading assignments
24

Grading Scheme for Readings and Discussions

• Participation and discussions 15%


• 4 points per discussion session
• 2 points for the insight and quality of the shared discussion points
• 2 points for interactivity and participation as follow-up to others’ questions and suggestions.
• Total 7 reading discussions
25

Grading Scheme for Readings and Discussions

• Special leads 10%


• Reading leads:
• 4 points for preparing and delivering the presentation at the start of class
• 1 point for taking notes during the discussion
• Synopsis leads:
• 1 point for taking notes during the discussion
• 4 points for creating the post-discussion synopsis summarizing the take-home messages
• 1-2 times over the semester for each student
26

Other Discussion Roles


Scientific Peer Reviewer. The paper has not been published yet and is currently submitted to a top conference where you’ve been assigned as a peer reviewer. Complete a full review of the paper answering all prompts of the official review form of the top venue in this research area (e.g., NeurIPS). This includes recommending whether to accept or reject the paper.

Archaeologist. This paper was found buried under ground in the desert. You’re an archaeologist who must determine where this paper sits in the context of previous and subsequent work. Find and report on one older paper cited within the current paper that substantially influenced the current paper and one newer paper that cites this current paper.

Academic Researcher. You’re a researcher who is working on a new project in this area. Propose an imaginary follow-up project not just based on the current paper but only possible due to the existence and success of the current paper.

Industry Practitioner. You work at a company or organization developing an application or product of your choice (that has not already been suggested in a prior session). Bring a convincing pitch for why you should be paid to implement the method in the paper, and discuss at least one positive and negative impact of this application.

Hacker. You’re a hacker who needs a demo of this paper ASAP. Implement a small part or simplified version of the paper on a small dataset or toy problem. Prepare to share the core code of the algorithm with the class and demo your implementation. Do not simply download and run an existing implementation, though you are welcome to use (and give credit to) an existing implementation for “backbone” code.

Private Investigator. You are a detective who needs to run a background check on one of the paper’s authors. Where have they worked? What did they study? What previous projects might have led to working on this one? What motivated them to work on this project? Feel free to contact the authors, but remember to be courteous, polite, and on-topic.

Social Impact Assessor. Identify how this paper self-assesses its (likely positive) impact on the world. Were any additional positive social impacts left out? What are possible negative social impacts that were overlooked or omitted?

Thanks to Alec Jacobson and Colin Raffel


27

A Typical Week for Reading Assignments

• Previous Wednesday - @All reading assignment released


• Monday - @All reading assignment due
• Wednesday - @Reading leads make slides for clarifications + scouted papers
• Thursday - @Reading leads present slides
• Thursday - @All discussion in 2 groups
• Thursday - @Synopsis leads take notes with help from @Reading leads
• Thursday - @Reading leads submit slides for grading
• Thursday - @Synopsis leads submit 2 sets of notes
• Next Monday - @Synopsis leads merge notes and create coherent synopsis
28

Which weeks would you prefer to lead reading & synopsis?

Week 4 (2/27): Learning and generalization


Week 5 (3/6): Specialized vs general architectures
Week 6 (3/13): Cross-modal transfer
Week 7 (3/20): Large language models
Week 11 (4/17): Large multimodal models
Week 12 (4/24): Modern generative AI
Week 13 (5/1): Human-AI interaction
29

Research Project

• Similar in spirit to an independent study project


• Project teams of 1 to 3 students
• Final report should be like a research paper
• Expected to explore new research ideas
• Regular meetings with instructors on Thursday
30

Research Projects on New Modalities


Motivation: Many tasks of real-world impact go beyond image and text.

Challenges:
- AI for modalities where deep learning is not yet effective (e.g., tabular, time-series)
- Multimodal deep learning + time-series analysis + tabular models
- AI for physiological sensing, IoT sensing in cities, climate and environment sensing
- Smell, taste, art, music, tangible and embodied systems

Potential models and datasets to start with


- Brain EEG Signal: https://arxiv.org/abs/2306.16934
- Speech: https://arxiv.org/pdf/2310.02050.pdf
- Facial Motion: https://arxiv.org/abs/2308.10897
- Tactile: https://arxiv.org/pdf/2204.00117.pdf
31

Research Projects on AI Reasoning


Motivation: Robust, reliable, interpretable reasoning in (multimodal) LLMs.

Challenges:
- Fine-grained and compositional reasoning
- Neuro-symbolic reasoning
- Emergent reasoning in foundation models

Potential models and datasets to start with


- Can LLMs actually reason and plan?
- Code for VQA (see the sketch below): CodeVQA: https://arxiv.org/pdf/2306.05392.pdf, VisProg: https://prior.allenai.org/projects/visprog, Viper: https://viper.cs.columbia.edu/
- Cola: https://openreview.net/pdf?id=kdHpWogtX6Y
- NLVR2: https://arxiv.org/abs/1811.00491
- Reference games: https://mcgill-nlp.github.io/imagecode/, https://github.com/Alab-NII/onecommon, https://dmg-photobook.github.io/
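The code-for-VQA line of work above (CodeVQA, VisProg, Viper) has a language model write a short program over visual primitives rather than answer a question directly. Here is a minimal sketch of that idea, with hypothetical stub primitives standing in for real vision models; it illustrates the pattern, not any one paper's API.

```python
# Hypothetical visual primitives; a real implementation would back `find`
# with an open-vocabulary detector, in the spirit of CodeVQA / VisProg / Viper.
def find(image, category):
    # Stub: would return detected crops of `category` in `image`
    return []

def count(objects):
    return len(objects)

# A program a language model might emit for
# "How many mugs are in the image?":
def answer(image):
    mugs = find(image, "mug")
    return count(mugs)

print(answer(image=None))  # 0 with the stub detector
```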
32

Research Projects on Interactive Agents


Motivation: Grounding AI models in the web, computer, or other virtual worlds to help humans with digital tasks (a minimal agent-loop sketch follows the resources below).

Challenges:
- Web visual understanding is quite different from natural image understanding
- Instructions and language grounded in web images, tools, APIs
- Asking for human clarification, human-in-the-loop
- Search over environment and planning

Potential models and datasets to start with


- WebArena: https://arxiv.org/pdf/2307.13854.pdf
- AgentBench: https://arxiv.org/pdf/2308.03688.pdf
- ToolFormer: https://arxiv.org/abs/2302.04761
- SeeAct: https://osu-nlp-group.github.io/SeeAct/
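For orientation, here is a rough sketch of the observe-act loop that benchmarks like WebArena evaluate. `env` and `llm_choose_action` are hypothetical placeholders; real systems define concrete observation spaces (page text, accessibility trees, screenshots) and action executors.

```python
# A minimal observe-act loop sketch for a web agent. Both `env` and
# `llm_choose_action` are hypothetical placeholders, not a real API.

def llm_choose_action(goal, observation, history):
    # Hypothetical: prompt an LLM with the goal, the current page
    # observation, and the action history, then parse its reply into an
    # action string such as 'click [buy_button]'. Stubbed to stop at once.
    return "stop"

def run_agent(env, goal, max_steps=20):
    observation = env.reset()
    history = []
    for _ in range(max_steps):
        action = llm_choose_action(goal, observation, history)
        if action == "stop":
            break
        observation = env.step(action)   # execute in the (web) environment
        history.append((action, observation))
    return history
```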
33

Research Projects on Embodied and Tangible AI


Motivation: Building tangible and embodied AI systems that
help humans in physical tasks.

Challenges:
- Perception, reasoning, and interaction
- Connecting sensing and actuation
- Efficient models that can run on hardware
- Understanding influence of actions on the world (world model)

Potential models and datasets to start with


- Virtual Home: http://virtual-home.org/paper/virtualhome.pdf
- Habitat 3.0: https://ai.meta.com/static-resource/habitat3
- RoboThor: https://ai2thor.allenai.org/robothor
- LangSuite-E: https://github.com/bigai-nlco/langsuite
- Language models and world models: https://arxiv.org/pdf/2305.10626.pdf
34

Research Projects on Socially Intelligent AI


Motivation: Building AI that can understand and interact
with humans in social situations.

Challenges:
- Social interaction, reasoning, and commonsense.
- Building social relationships over months and years.
- Theory-of-Mind and multi-party social interactions.

Potential models and datasets to start with


- Multimodal WereWolf: https://persuasion-deductiongame.socialai-data.org/
- Ego4D: https://arxiv.org/abs/2110.07058
- MMToM-QA: https://openreview.net/pdf?id=jbLM1yvxaL
- 11-866 Artificial Social Intelligence: https://cmu-multicomp-lab.github.io/asi-course/spring2023/
35

Research Projects on Human-AI Interaction


Motivation: What is the right medium for human-AI interaction? How can we really trust AI? How do we enable collaboration and synergy?

Challenges:
- Modeling and conveying model uncertainty: text input uncertainty, visual uncertainty, multimodal uncertainty, cross-modal interaction uncertainty (a minimal sketch follows the resources below)
- Asking for human clarification, human-in-the-loop, types of human feedback, and ways to learn from human feedback through all modalities
- New mediums to interact with AI; new tasks beyond imitating humans, leading to collaboration

Potential models and datasets to start with


- MMHal-Bench (aligning multimodal LLMs): https://arxiv.org/pdf/2309.14525.pdf
- HACL (hallucination + LLMs): https://arxiv.org/pdf/2312.06968.pdf
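As one concrete starting point for the uncertainty challenge above, here is a minimal sketch that uses predictive entropy over a classifier's output distribution as a signal for when to ask the human for clarification; the threshold value is an illustrative assumption, not a recommended setting.

```python
# A minimal sketch: softmax entropy as an uncertainty signal, deferring to
# a human above an (illustrative) threshold.
import torch

def predictive_entropy(logits):
    # Entropy of the softmax distribution; higher means less confident
    probs = torch.softmax(logits, dim=-1)
    return -(probs * probs.clamp_min(1e-12).log()).sum(dim=-1)

logits = torch.tensor([[4.0, 0.1, 0.1],   # confident prediction
                       [1.0, 0.9, 1.1]])  # near-uniform, uncertain
entropy = predictive_entropy(logits)
ask_human = entropy > 0.9  # illustrative threshold for requesting clarification
print(entropy, ask_human)
```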
36

Research Projects on Ethics and Safety


Motivation: Large AI models can emit unsafe text content and generate or retrieve biased images.

Challenges:
- Taxonomizing types of biases: text, vision, audio, generation, etc. (a minimal template-probe sketch follows below)
- Tracing biases to pretraining data, seeing how bias can be amplified during training and fine-tuning.
- New ways of mitigating biases and aligning to human preferences.

Potential models and datasets to start with


- Many works on fairness in LLMs -> how to extend to multimodal?
- Mitigating bias in text generation, image-captioning, image generation
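As a minimal starting point for taxonomizing bias, here is a sketch of a template-based probe, assuming the Hugging Face transformers library; the model choices, template, and group terms are illustrative assumptions, and this is nowhere near a validated bias benchmark.

```python
# A sketch of a template-based bias probe: vary only the group term in a
# prompt, sample continuations, and compare a downstream sentiment score.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")
sentiment = pipeline("sentiment-analysis")

template = "The {group} engineer was described by coworkers as"
for group in ["young", "elderly"]:
    outputs = generator(template.format(group=group), max_new_tokens=15,
                        num_return_sequences=5, do_sample=True)
    signed = []
    for out in outputs:
        s = sentiment(out["generated_text"])[0]
        # Signed sentiment: positive label counts up, negative counts down
        signed.append(s["score"] if s["label"] == "POSITIVE" else -s["score"])
    print(group, sum(signed) / len(signed))
```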
37

Bi-weekly Project Meetings and Updates


• Required meetings on a bi-weekly basis
• About 20 minutes per meeting on Thursday afternoon, after class
• Primary mentor for each team
• Bi-weekly written updates
• Either Google Slides (preferred) or Google Docs
• Due Tuesdays at 9pm before the meeting
38

Schedule for Bi-Weekly Written Updates and Reports


• Week 3: Proposal presentations
• Week 4: Proposal report: baseline results and new ideas
• Week 6: Initial implementation of new ideas
• Week 8: Spring break (no meetings, no work, relax!)
• Week 9: Midterm report: first complete round of results for idea
• Week 9: Midterm presentations
• Week 11: Updated results for research idea
• Week 13: Error analysis, ablations, and visualizations
• Week 14: Project presentations
• Week 16: Final report
39

Course Project Timeline


• Project preferences (Due Tuesday 2/11 at 9pm ET) – You should have selected
your teammates, have ideas about your dataset and task
• Proposal report (Due Tuesday 2/25 at 9pm ET) – Research ideas, review of
relevant papers and initial results
• Midterm report (Due Tuesday 4/1 at 9pm ET) – Intermediate report
documenting the updated results exploring your research ideas.
• Final report (Due Tuesday 5/20 at 9pm ET) – Final report describing explored
research ideas, with results, analysis and discussion.
40

Overall Grades
• The first 40% is for reading assignments and discussions.

• The second 60% comes from the course project (a worked sketch of the weighting follows below):


• Proposal report and presentation 10%
• Midterm report and presentation 15%
• Final report and presentation 25%
• Bi-weekly written updates 10%
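For illustration, a small worked sketch of how these weights combine; the weights come from the slides, while the per-component scores are made up and linear weighting is an assumption about how components aggregate.

```python
# Worked sketch of the overall grade computation (illustrative scores).
weights = {
    "readings_and_discussions": 0.40,
    "proposal": 0.10,
    "midterm": 0.15,
    "final": 0.25,
    "biweekly_updates": 0.10,
}
scores = {  # each as a fraction of that component's maximum (made up)
    "readings_and_discussions": 0.90,
    "proposal": 0.85,
    "midterm": 0.80,
    "final": 0.92,
    "biweekly_updates": 1.00,
}
overall = sum(weights[k] * scores[k] for k in weights)
print(f"overall grade: {overall:.1%}")  # 89.5%
```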
41

Absences and Late Submissions


• Lectures are not recorded; students are expected to attend live.
• If you plan to miss more than one lecture this semester, let us know as soon as possible.

• Reading assignment wildcards (2 per student)


• 24-hour extension, max 1 per week

• Project report wildcards (2 per team)


• 24-hour extensions, can be used together
42

Course Websites
• Course website
• A public version of the course information
• Discussion synopsis will be posted here
• https://mit-mi.github.io/how2ai/spring2025/
• We will set up Canvas for submissions
43

Assignments for This Coming Week


No reading assignment this week.
For project:
• Project preference form (Due Tuesday 2/11 at 9pm ET)
• To help with team matching
• Google Form link will be available on Piazza
• Start thinking about what project you want to work on, and about potential group mates.

This Thursday: lecture on how to do AI research


- From reading papers, to generating ideas, to execution, to paper writing
