Week 8 - MMML - Introduction


Cognitive Computing

Lecture 8
Introduction to Multimodal Machine Learning

Dr. Hany Hanafy Mahmoud


Table of Contents
• What is Multimodal cognitive system?
• Multimodal History
• Multimodal learning
• Core Technical Challenges
• Multimodal Research Task
What is Multimodal cognitive system?

• The science of data drawn from more than one sensory modality

• This approach is rooted in the theoretical assumption that cognitive
performance can be influenced by other modes of psychological processing

• E.g., perceptual, emotional, and social processing, and responses to the
physical environment.

https://www.youtube.com/watch?v=VIq5r7mCAyw&t=4131s
What is Multimodal cognitive system?

Multimodal Communicative Behaviors

• Verbal: what you say (the words)

• Vocal: how you say it

• Visual: what your visual behavior looks like

https://www.youtube.com/watch?v=VIq5r7mCAyw&t=4131s
What is Multimodal cognitive system?

• Verbal:
• Lexicon (words), syntax (part-of-speech tags), …
• Lexical analysis: divides the text into words, phrases, and
paragraphs, and identifies the structure of the words in each sentence
• Semantic analysis: determines whether the text is meaningful and
attempts to recover its intended meaning.
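The lexical analysis step above can be sketched in a few lines. This is a minimal illustration only: the regular expressions, the function names `split_sentences` and `tokenize`, and the sample text are assumptions made for the example, not part of the lecture, and real systems usually rely on a dedicated NLP library.

```python
import re

def split_sentences(text):
    """Split text into sentences at whitespace that follows '.', '!' or '?'."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def tokenize(sentence):
    """Split a sentence into word tokens and single punctuation tokens."""
    return re.findall(r"\w+|[^\w\s]", sentence)

# Hypothetical sample text for the illustration.
text = "The movie was great. I would watch it again!"
sentences = split_sentences(text)
tokens = [tokenize(s) for s in sentences]

for sentence, sentence_tokens in zip(sentences, tokens):
    print(sentence, "->", sentence_tokens)
```

Semantic analysis would then operate on these tokens; it is not covered by this sketch.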

https://www.youtube.com/watch?v=VIq5r7mCAyw&t=4131s
What is Multimodal cognitive system?

• Vocal:
• Voice quality
• Intonation
• Vocal expressions (laughter, …)

https://www.youtube.com/watch?v=VIq5r7mCAyw&t=4131s
What is Multimodal cognitive system?

• Visual:
• Gestures: head gestures, eye gestures
• Body language: arm movements, body posture, proxemics
• Eye contact and head gaze
• Facial expressions: smile, …

https://www.youtube.com/watch?v=VIq5r7mCAyw&t=4131s
What is Multimodal cognitive system?
• Modality: a certain type of information and/or the format in which
that information is represented.
• Sensory modality: one of the primary forms of sensation, such as vision,
hearing, touch, ...
• Medium: a means or instrument for storing and communicating
information.

https://www.youtube.com/watch?v=DPkwjgaRvyI&list=PL-Fhd_vrvisMYs8A5j7sj8YW1wHhoJSmW
https://www.youtube.com/watch?v=VIq5r7mCAyw&t=4131s
What is Multimodal cognitive system?
Multiple Communities

https://www.youtube.com/watch?v=VIq5r7mCAyw&t=4131s
What is Multimodal cognitive system?
Examples of Modalities
• Natural language (text and speech)
• Visual (images or videos)
• Auditory (voice, sound, or music)
• Smell, taste, touch
• Physiological signals: electrocardiogram (ECG), skin conductance
• Other modalities: infrared images, depth images, fMRI

https://www.youtube.com/watch?v=VIq5r7mCAyw&t=4131s
What is Multimodal cognitive system?
Different modalities show diverse qualities, structures, and
representations.

https://www.youtube.com/watch?v=DPkwjgaRvyI&list=PL-Fhd_vrvisMYs8A5j7sj8YW1wHhoJSmW
What is Multimodal cognitive system?
Connection types:
• Correlation: a statistical association between variables. It reflects
things that appear to behave in a "similar" way.
• Causation: a change in one variable causes a change in another variable, that is,
one thing makes another happen.
• Co-occurrence: the frequency with which two or more entities (such as
words, phrases, or concepts) appear together within a given context, such as a
document. It measures how often entities are found in proximity to each other,
indicating potential relationships or associations between them.
• Association: any relationship between two variables, whether linear,
curvilinear, or otherwise non-linear. Therefore, all correlations are associations, but
not all associations are correlations.
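Two of these connection types are easy to check numerically. The sketch below is an illustration under stated assumptions: the quadratic relationship and the three-sentence toy corpus are made up for the example. It shows an association that has essentially no linear (Pearson) correlation, and a simple sentence-level co-occurrence count.

```python
import numpy as np
from collections import Counter
from itertools import combinations

# Association without linear correlation: y is fully determined by x,
# but the relationship is quadratic, so Pearson's r is close to zero.
x = np.linspace(-1.0, 1.0, 201)
y = x ** 2
pearson_r = float(np.corrcoef(x, y)[0, 1])

# Co-occurrence: count how often two words appear in the same sentence.
# The corpus is a made-up toy example.
sentences = [
    "the dog barks loudly",
    "the dog runs",
    "the cat sleeps",
]
pair_counts = Counter()
for sentence in sentences:
    words = sorted(set(sentence.split()))  # sorted so each pair has one canonical order
    pair_counts.update(combinations(words, 2))

print("Pearson r between x and x^2:", pearson_r)
print("'dog' and 'the' co-occur in", pair_counts[("dog", "the")], "sentences")
```

Neither measure says anything about causation; establishing that requires an intervention or further assumptions.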

https://www.youtube.com/watch?v=DPkwjgaRvyI&list=PL-Fhd_vrvisMYs8A5j7sj8YW1wHhoJSmW
What is Multimodal cognitive system?

Multimodal Machine Learning (MMML): the study of computer algorithms that
learn and improve through the use and experience of data from multiple
modalities.

Artificial intelligence for multimodal data: systems that are able to
demonstrate intelligence capabilities such as understanding, reasoning,
planning, …

https://www.youtube.com/watch?v=DPkwjgaRvyI&list=PL-Fhd_vrvisMYs8A5j7sj8YW1wHhoJSmW
What is Multimodal cognitive system?

[Figure; recoverable labels: New Modality, Representation, Prediction]

https://www.youtube.com/watch?v=DPkwjgaRvyI&list=PL-Fhd_vrvisMYs8A5j7sj8YW1wHhoJSmW
Multimodal Challenges
Core Multimodal Challenges

https://www.youtube.com/watch?v=DPkwjgaRvyI&list=PL-Fhd_vrvisMYs8A5j7sj8YW1wHhoJSmW
Multimodal Challenges
1. Representation:
Learning representations that reflect cross-modal interactions between
individual elements across different modalities.
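One simple way to picture a joint representation is fusion by concatenation followed by a projection. The sketch below is only an assumption-laden illustration: the vector sizes, the random weights, and the function name `joint_representation` are hypothetical, and in practice the weights would be learned rather than sampled.

```python
import numpy as np

def joint_representation(text_vec, audio_vec, weights, bias):
    """Fuse two unimodal vectors into one joint vector.

    The vectors are concatenated and passed through a single tanh layer,
    so every output unit can mix information from both modalities.
    """
    fused_input = np.concatenate([text_vec, audio_vec])
    return np.tanh(weights @ fused_input + bias)

rng = np.random.default_rng(0)
text_vec = rng.normal(size=8)    # stand-in for a text feature vector
audio_vec = rng.normal(size=4)   # stand-in for an audio feature vector
weights = rng.normal(scale=0.1, size=(6, 12))  # untrained, for illustration only
bias = np.zeros(6)

joint = joint_representation(text_vec, audio_vec, weights, bias)
print(joint.shape)
```

Other designs keep one representation per modality and coordinate them (for example, through a similarity constraint) instead of fusing them into a single vector.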

https://www.youtube.com/watch?v=DPkwjgaRvyI&list=PL-Fhd_vrvisMYs8A5j7sj8YW1wHhoJSmW
Multimodal Challenges
2. Alignment:
Identifying cross-modal connections between the elements of
multiple modalities, building on the structure of the data.
Most modalities have an internal structure made up of multiple elements.
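A common baseline for finding such connections is to compare element embeddings with cosine similarity and link each element to its best match. The following sketch assumes that both modalities have already been embedded in a shared space; the toy "word" and "image region" vectors and the function name `align` are hypothetical.

```python
import numpy as np

def align(source, target):
    """For each source element, find the most similar target element.

    Returns the index of the best match per source row and the full
    cosine-similarity matrix (rows: source elements, columns: target elements).
    """
    s = source / np.linalg.norm(source, axis=1, keepdims=True)
    t = target / np.linalg.norm(target, axis=1, keepdims=True)
    similarity = s @ t.T
    return similarity.argmax(axis=1), similarity

# Toy embeddings, assumed to live in the same 3-dimensional space.
word_vecs = np.array([[1.0, 0.0, 0.0],
                      [0.0, 1.0, 0.0],
                      [0.0, 0.0, 1.0]])
region_vecs = np.array([[0.1, 0.9, 0.0],
                        [0.0, 0.2, 0.8],
                        [0.9, 0.1, 0.1]])

matches, similarity = align(word_vecs, region_vecs)
print(matches)  # index of the best-matching region for each word
```

This hard, one-to-one matching ignores ordering and many-to-many links; attention mechanisms can be viewed as a soft, learned version of the same similarity matrix.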

https://www.youtube.com/watch?v=DPkwjgaRvyI&list=PL-Fhd_vrvisMYs8A5j7sj8YW1wHhoJSmW
Multimodal Challenges
3. Reasoning:
Combine knowledge through multiple inferential steps,
exploiting multimodal alignment and problem structure

https://www.youtube.com/watch?v=DPkwjgaRvyI&list=PL-Fhd_vrvisMYs8A5j7sj8YW1wHhoJSmW
Multimodal Challenges
4. Generation:
Learning a generative process to produce raw modalities that
reflect cross-modal interactions, structure, and coherence.

https://www.youtube.com/watch?v=DPkwjgaRvyI&list=PL-Fhd_vrvisMYs8A5j7sj8YW1wHhoJSmW
Multimodal Challenges
5. Transference:
Transferring knowledge between modalities to help a target modality
that may be noisy or have limited resources.

https://www.youtube.com/watch?v=DPkwjgaRvyI&list=PL-Fhd_vrvisMYs8A5j7sj8YW1wHhoJSmW
Multimodal Challenges
6. Quantification:
Theoretical study to better understand heterogeneity,
cross-modal interactions, and the multimodal learning process.

https://www.youtube.com/watch?v=DPkwjgaRvyI&list=PL-Fhd_vrvisMYs8A5j7sj8YW1wHhoJSmW
Multimodal History
• Behavioral: 1970s until late 1980s
• Computational: late 1980s until 2000
• Interaction: 2000 to 2010
• Deep learning: 2010s until now

• Next era: ?

https://www.youtube.com/watch?v=VIq5r7mCAyw&t=4131s
Multimodal History
• Behavioral: 1970s until late 1980s

Gestures are in fact the speaker's thinking in action and integral components of speech, not merely accompaniments or additions.

https://www.youtube.com/watch?v=VIq5r7mCAyw&t=4131s
Multimodal History
• Computational: late 1980s until 2000
The goal of affective computing is to create a computing
system capable of perceiving, recognizing, and
understanding human emotions and responding
intelligently, sensitively, and naturally, thus making
human-computer interaction more natural.

https://www.youtube.com/watch?v=VIq5r7mCAyw&t=4131s
Multimodal History
• Interaction: 2000 to 2010

https://www.youtube.com/watch?v=VIq5r7mCAyw&t=4131s
Multimodal History
• Deep learning: 2010s until now

https://www.youtube.com/watch?v=VIq5r7mCAyw&t=4131s
Multimodal History
• 1990 to 202X Timeline:

https://www.youtube.com/watch?v=607EcmU9mFs&list=PL-Fhd_vrvisMYs8A5j7sj8YW1wHhoJSmW&index=3
Multimodal Research Task
Real-world tasks for MMML:
A. Affect recognition: recognizing emotions and sentiment
B. Media description: image and video captioning
C. Multimodal QA: image and video QA, visual reasoning
D. Multimodal navigation: language-guided navigation, autonomous
driving
E. Multimodal dialog: grounded dialog
F. Event recognition: action recognition and segmentation
G. Multimedia information retrieval: content-based, cross-media

https://www.youtube.com/watch?v=607EcmU9mFs&list=PL-Fhd_vrvisMYs8A5j7sj8YW1wHhoJSmW&index=3
Multimodal Research Task
• Dataset

Datasets: https://www.youtube.com/watch?v=607EcmU9mFs&list=PL-Fhd_vrvisMYs8A5j7sj8YW1wHhoJSmW&index=2

On GitHub: https://github.com/topics/multimodal-datasets

https://www.youtube.com/watch?v=607EcmU9mFs&list=PL-Fhd_vrvisMYs8A5j7sj8YW1wHhoJSmW&index=3
Datasets Affect Recognition

https://www.youtube.com/watch?v=fBYu8I52nVM&list=UULFqlHIJTGYhiwQpNuPU5e2gg&index=54
Datasets Affect Recognition
Cross media Retrieval

A confounding variable is an unmeasured third variable that
influences both the supposed cause and the supposed effect.

https://www.youtube.com/watch?v=fBYu8I52nVM&list=UULFqlHIJTGYhiwQpNuPU5e2gg&index=54
Datasets Media Description

https://www.youtube.com/watch?v=fBYu8I52nVM&list=UULFqlHIJTGYhiwQpNuPU5e2gg&index=54
Datasets Multimedia QA

https://www.youtube.com/watch?v=fBYu8I52nVM&list=UULFqlHIJTGYhiwQpNuPU5e2gg&index=54
Example 1: Select-Additive Learning
A sentiment classification task over verbal, acoustic, and visual inputs. Select-Additive Learning
improves the generalizability of trained neural networks for multimodal sentiment analysis.

https://arxiv.org/abs/1609.05244
Confounding variables are factors that influence both the independent and dependent variables in a study, leading to
biased or incorrect conclusions about the relationship between them. In machine learning, addressing confounding variables is
crucial for accurate causal inference and prediction.
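The effect of a confounder can be seen in a small simulation. This is a hedged sketch with synthetic data, not the Select-Additive Learning method itself: a hidden variable `z` drives both `x` and `y`, which makes them strongly correlated even though neither influences the other. Regressing `z` out of both (a partial correlation) removes the apparent relationship. The noise levels, sample size, and helper name `residual` are assumptions chosen for the illustration.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 5000

# z is the confounder: it influences both x and y.
z = rng.normal(size=n)
x = z + 0.5 * rng.normal(size=n)  # x does not depend on y
y = z + 0.5 * rng.normal(size=n)  # y does not depend on x

def residual(values, confounder):
    """Remove the linear effect of the confounder from values."""
    slope, intercept = np.polyfit(confounder, values, deg=1)
    return values - (slope * confounder + intercept)

raw_corr = float(np.corrcoef(x, y)[0, 1])
partial_corr = float(np.corrcoef(residual(x, z), residual(y, z))[0, 1])

print("correlation of x and y:", raw_corr)
print("correlation after controlling for z:", partial_corr)
```

In this setup the raw correlation is high and the partial correlation is near zero. This adjustment is only possible when the confounder has been measured, which is exactly what is missing in the unmeasured case described in the lecture.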
https://www.youtube.com/watch?v=fBYu8I52nVM&list=UULFqlHIJTGYhiwQpNuPU5e2gg&index=54
Example 2: Word-level gated fusion

Multimodal Sentiment Analysis: Gated Multimodal Embedding LSTM with Temporal Attention (GME-LSTM(A)) model
https://arxiv.org/abs/1802.00924

https://www.youtube.com/watch?v=fBYu8I52nVM&list=UULFqlHIJTGYhiwQpNuPU5e2gg&index=54
Example 2: Word-level gated fusion

GME: Gated Multimodal Embedding
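The idea of gating at the word level can be sketched as follows. This is a simplified illustration and not the model from the paper: a soft sigmoid gate stands in for whatever gating and training procedure the paper uses, and the function name, parameter dictionary, and feature sizes are all hypothetical. For each word, a gate near 0 suppresses a noisy acoustic or visual feature vector and a gate near 1 lets it through before the features are concatenated with the word vector.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gated_multimodal_embedding(word_vec, audio_vec, visual_vec, params):
    """Build one word-level input from three modalities.

    Each non-verbal modality gets a scalar gate in (0, 1) that scales its
    features before they are concatenated with the word vector.
    """
    audio_gate = sigmoid(params["audio_w"] @ audio_vec + params["audio_b"])
    visual_gate = sigmoid(params["visual_w"] @ visual_vec + params["visual_b"])
    fused = np.concatenate([word_vec, audio_gate * audio_vec, visual_gate * visual_vec])
    return fused, audio_gate, visual_gate

rng = np.random.default_rng(0)
word_vec = rng.normal(size=5)
audio_vec = rng.normal(size=3)
visual_vec = rng.normal(size=4)

# Hand-set parameters for illustration: audio gate closed, visual gate open.
params = {
    "audio_w": np.zeros(3), "audio_b": -20.0,
    "visual_w": np.zeros(4), "visual_b": 20.0,
}

fused, audio_gate, visual_gate = gated_multimodal_embedding(
    word_vec, audio_vec, visual_vec, params
)
print(fused.shape, audio_gate, visual_gate)
```

The sequence of fused word-level vectors would then be fed to a sequence model such as an LSTM with attention, which this sketch does not include.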

https://www.youtube.com/watch?v=fBYu8I52nVM&list=UULFqlHIJTGYhiwQpNuPU5e2gg&index=54
Multimodal Research Task
Dataset requirements for the project
• The dataset should have at least two modalities
• Teams of 2 or 3 students
• Stages:
• Pre-proposal: define the dataset and research task
• Study work related to your selected research topic
• Experiment with unimodal representations
• Implement and evaluate state-of-the-art model(s)
• Create a GitHub repository that is accessible to course staff
• Each report should include a description of each team member's tasks.
• Make a video that presents the robot in action
• Write a paper.
https://www.youtube.com/watch?v=607EcmU9mFs&list=PL-Fhd_vrvisMYs8A5j7sj8YW1wHhoJSmW&index=3
