
Making Your Dreams A Reality: Decoding the Dreams into a Coherent Video Story from fMRI Signals†

Yanwei Fu∗, Jianxiong Gao, Baofeng Yang, Jianfeng Feng

arXiv:2501.09350v1 [cs.CV] 16 Jan 2025

[Figure 1 diagram: fMRI signals recorded during real visual stimulus perception and during sleep share brain activity patterns; dream snapshots decoded at t = 1, ..., N are integrated into a dream video.]

Figure 1: Our Dream decoding process blends fMRI decoding with tasks related to real and imagined visuals, seamlessly turning
disjointed dream images into complete stories.

ABSTRACT

This paper studies a brave new idea for the Multimedia community and proposes a novel framework to convert dreams into coherent video narratives using fMRI data. Dreams have intrigued humanity for centuries, offering glimpses into our subconscious minds. Recent advancements in brain imaging, particularly functional magnetic resonance imaging (fMRI), have provided new ways to explore the neural basis of dreaming. By combining subjective dream experiences with objective neurophysiological data, we aim to understand the visual aspects of dreams and create complete video narratives. Our process involves three main steps: reconstructing visual perception, decoding dream imagery, and integrating dream stories. Using innovative techniques in fMRI analysis and language modeling, we seek to push the boundaries of dream research and gain deeper insights into visual experiences during sleep. This technical report introduces a novel approach to visually decoding dreams from fMRI signals and weaving the decoded visuals into narratives using language models. We gather a dataset of dreams along with descriptions to assess the effectiveness of our framework.

1 INTRODUCTION

Dreams have always captured the imagination of people, from scientists and philosophers to artists. Although dreams give us a peek into the hidden parts of our minds, revealing the intricate ways we think and feel, they are still shrouded in mystery. For a long time, the fleeting and deeply personal nature of dreams has made them hard to study. However, thanks to new brain imaging tools, especially functional magnetic resonance imaging (fMRI), we are starting to understand what happens in the brain when we dream. Using fMRI, researchers can now see which parts of the brain are active during sleep and relate these patterns to the stories and emotions people describe after they wake up. We consider the task of turning dreams into coherent video stories a bold new direction for the Multimedia community: it offers the unique opportunity to transform dreams into a tangible artifact using a completely new type of input, fMRI brain scans, and it helps us understand some basic capabilities of humans.

The main goal of our task is to use fMRI to transform the visual elements of dreams into detailed videos, going beyond basic categorization to fill gaps in our current understanding of dream content. Analyzing the possibilities and challenges of this task, we recognize that while people can describe their dreams, fMRI reveals the specific brain areas involved in creating and processing these dreams. By combining personal accounts with brain imaging, researchers can start identifying the brain's "signatures" for different aspects of dreaming, such as memory, emotion, and sensory experiences during sleep. By analyzing brain activity during dreams and employing sophisticated techniques to decode these signals, we achieve a high-fidelity reconstruction of visual experiences within dreams, surpassing the limitations of semantic categorization. However, the primary challenge in interpreting dreams is bridging the gap between subjective experiences and objective neurophysiological data.

∗ Corresponding author. Email: [email protected]
† Work in progress

Furthermore, dreams are not merely isolated images but are composed of cohesive, narrative sequences [16]. The conventional method [20] of decoding individual dream images falls short in capturing the continuous brain activity that generates dream experiences. Therefore, we propose a novel toolkit that automatically assembles the media elements decoded from what the user dreams into a coherent video story reflecting those dreams. Critically, our toolkit also produces textual descriptions by leveraging Large Language Models (LLMs) [37] to interpret the decoded images and integrate them into comprehensive dream experiences.

As a brave new idea for this novel application, there is no previous work that tackles this task. We thus propose a pipeline that, for the first time, enables neurally decoding dreams. Particularly, as shown in Fig. 1, our approach involves several important steps: modeling how dreams appear visually in the brain, combining fMRI decoding with tasks involving both real and imagined visuals, and turning disjointed dream images into complete narratives.

We organize the process of decoding dreams from fMRI scans into three main steps. First, in the Reconstructing Visual Stimulus Perception stage, we decode brain activity linked to seeing real objects or scenes by analyzing fMRI data collected while subjects are awake and viewing visual stimuli, aiming to recreate these images from brain processes. Next, in the Decoding Dream Visual Imagery stage, we use brain activity patterns from both awake visual experiences and dream visuals to decode images from the dream, analyzing sleep-state fMRI data to capture dream visuals as they occur at different times. Finally, in the Integrating Dream Narratives stage, we stitch together the decoded images into a full, flowing dream story using advanced language models, aiming to create a cohesive narrative that captures the entire dream experience.

This is a brand-new idea of making your dreams a reality, and our central contribution is to make this task work. We summarize the key technical novelties as follows:
(1) We make a first attempt at introducing a novel method that uses fMRI signals to visually decode dreams and turn them into a coherent video story.
(2) We propose bridging the gap between real and dream visual experiences, providing deeper insights into dream generation.
(3) We present the integration of dream visualizations into complete visual narratives using LLMs, which helps us better understand what dreams mean.

We collect a dataset of dreams along with detailed descriptions to evaluate how well our framework works. Usually, it is quite challenging for participants to provide precise descriptions of their dreams. As a result, the gathered descriptions mostly serve as general indicators to gauge the effectiveness of our framework.

2 RELATED WORK

2.1 Dream Visual Decoding

Dreams often present themselves as disjointed and fantastical scenes, plots, and feelings. Early research suggested that similarities in these phenomena stem from a shared neural substrate between wakefulness and sleep states. Several studies analyzed brain region activations across different states to investigate the neural commonalities and differences [3, 17, 24]. Moreover, Northoff et al. [27] introduced a specific spatiotemporal model of dreams to bridge the gap between neural fluctuations and mental experiences. However, these studies have not precisely determined how specific visual content is represented in brain activity. Horikawa et al. [20] pioneered the use of functional magnetic resonance imaging (fMRI) and machine learning analysis to decode spontaneous brain activity during sleep and identify visual content in dreams. Subsequently, hierarchical visual feature decoding techniques were employed to decode objects observed or imagined by individuals [18] and, more accurately, dream objects [19]. However, these methods of visual decoding only achieve semantic category-level classification, making it challenging to visualize the real visual experiences of dreams. Therefore, this paper aims to fill this research gap by utilizing state-of-the-art fMRI-to-image reconstruction techniques to achieve precise reconstruction of the fine-grained visual experiences in dreams.

2.2 Visual Stimulus Decoding

Compared to Dream Visual Decoding, decoding fMRI signals under real visual stimuli has been studied more extensively. Visual Stimulus Decoding relies on observed images and their corresponding fMRI responses to achieve fMRI-to-image reconstruction. Early work primarily focused on high-level semantic information of images [6, 15, 22, 35] or on specific categories of image tasks such as face recognition. The development of generative models like GANs and VAEs subsequently led to a trend of decoding natural images from fMRI [14, 26, 31, 34]. However, the images decoded by these methods still suffer from blurriness and distortions. Recently, benefiting from the powerful generative capabilities of Diffusion Models [32], researchers have used fMRI signals as conditions for Diffusion Models to obtain high-quality visual reconstruction results [5, 9, 10, 28, 33, 36]. Chen et al. [5] decomposed fMRI-to-image reconstruction into fMRI embedding learning based on MAE and conditional attention layers based on fine-tuning an LDM, improving the semantic relevance and visual quality of reconstruction results. Ozcelik et al. [28] and Scotti et al. [33] proposed learning mappings from fMRI signals to CLIP text and image features and to VAE latent codes, respectively, leveraging high-level CLIP text and image features and low-level intermediate images to improve the low-level consistency between reconstruction results and original visual inputs. However, these works can only be trained on individual subjects and cannot learn shared knowledge among multiple individuals. Qian et al. [29] utilized 40k subjects from the UK Biobank dataset (UKB) [25] to train a large-scale fMRI representation model, which can encode multi-individual brain signals into a unified large-scale latent space. NeuroPictor [21] then adopts fMRI-to-image cross-subject pre-training based on this unified representation to learn shared visual perception among multiple individuals and proposes coupled low-level manipulation and high-level guiding networks to recover original images.

Inspired by the multi-subject pre-training fMRI-to-image methods in Visual Stimulus Decoding, we aim to transfer the knowledge learned from Visual Stimulus Decoding, which involves different individuals and shared brain activity patterns under real and dreamlike stimuli, to Dream Visual Decoding.

3 PROBLEM SETUPS

Dream decoding involves analyzing fMRI data recorded from individuals during sleep to extract their virtual experiences, termed dreams. In this endeavor, individuals undergo fMRI scans while asleep, resulting in the continuous recording of brain activity patterns throughout the sleep cycle. These recorded fMRI data form a spatiotemporal sequence denoted as {F_i}, i = 1, ..., n, where n is the recorded fMRI temporal length, representing the dynamic neural activity occurring during sleep. The objective of dream decoding is to extract from this complex spatiotemporal sequence the virtual visual experiences, commonly referred to as dreams∗, that individuals undergo during sleep. However, dreams manifest as intricate and often surreal amalgamations of images, scenarios, and emotions, subjectively experienced during sleep. To achieve this, two fundamental aspects need consideration: (1) Virtual Visual Frames Generation: at specific moments during sleep, individuals generate virtual visual frames encapsulating their dream experiences. These frames represent snapshots of the dream content perceived by the individual at distinct time points during the sleep cycle. (2) Visual Story Information: these individual virtual visual frames are not isolated entities but pieces of a larger narrative. When integrated, they form a cohesive visual story that conveys the progression and thematic elements of the dream experience.

∗ Please note that we specifically refer to "dreams" to make the new task easier for our Multimedia community to understand. In neurology, however, there are precise definitions for different stages of sleep, and only certain stages are categorized as dreaming. For detailed definitions, please refer to [11].

In summary, we need to associate the brain activity patterns at each time point with the corresponding virtual visual frames, yielding the i-th snapshot image I_i corresponding to F_i. Subsequently, by dynamically assembling these snapshot images, we can construct a complete visual narrative of the dream, denoted as V.

4 DATA COLLECTION AND PREPROCESSING

During the fMRI sessions, participants are comfortably positioned within the MRI scanner, ensuring optimal alignment for the precise recording of their neural activity during the sleep cycle. Following [12, 13], data acquisition was conducted using a 3 Tesla (3T) MRI scanner and a 32-channel RF head coil with a temporal resolution of 1.25 Hz. To facilitate an in-depth exploration of sleep phenomena, volunteers were afforded 40 to 70 minutes of sleep within the controlled environment of the MRI scanner. This extended period allowed for the exploration of various sleep stages characterized by distinct patterns of neural activity and physiological changes. Throughout the data collection process, we employed monitoring devices to track participants' head movements and eyelid states, helping us determine whether participants remained in a stable sleep state throughout the recording session.

Figure 2: We use a 3 Tesla (3T) MRI scanner for data collection. Participants are positioned within the MRI scanner during sleep-stage fMRI sessions.

For preprocessing, we followed a processing pipeline [7, 8] similar to that described in [12, 29] to map the neural activity of different participants onto a common surface map. This process facilitates the dissemination of knowledge across datasets, which is crucial for inter-individual understanding. Specifically, we convert the fMRI time series to fsLR32K surface space using Connectome Workbench, followed by z-scoring values across each session and rendering them into 2D images. This process yields brain surface images resembling a "butterfly" representing cortical activation. Finally, the early and higher visual cortical (VC) regions, including "V1, V2, V3, V3A, V3B, V3CD, V4, LO1, LO2, LO3, PIT, V4t, V6, V6A, V7, V8, PH, FFC, IP0, MT, MST, FST, VVC, VMV1, VMV2, VMV3, PHA1, PHA2, PHA3", are selected for further analysis.
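To make this preprocessing concrete, here is a minimal sketch of the per-session z-scoring and visual-cortex ROI selection, assuming the time series has already been projected to fsLR32K surface space (e.g., with Connectome Workbench) and loaded as a NumPy array; the `atlas_labels` input and the function name are illustrative, not part of a released codebase.

```python
# Minimal sketch, assuming a (T, n_vertices) surface time series and an
# array of per-vertex ROI names from an HCP-style atlas.
import numpy as np

VISUAL_ROIS = {
    "V1", "V2", "V3", "V3A", "V3B", "V3CD", "V4", "LO1", "LO2", "LO3",
    "PIT", "V4t", "V6", "V6A", "V7", "V8", "PH", "FFC", "IP0", "MT",
    "MST", "FST", "VVC", "VMV1", "VMV2", "VMV3", "PHA1", "PHA2", "PHA3",
}

def preprocess_session(ts: np.ndarray, atlas_labels: np.ndarray) -> np.ndarray:
    """ts: (T, n_vertices) surface time series; atlas_labels: (n_vertices,) ROI names."""
    # z-score each vertex across the session
    ts = (ts - ts.mean(axis=0)) / (ts.std(axis=0) + 1e-8)
    # keep only the early and higher visual cortex vertices listed above
    mask = np.isin(atlas_labels, list(VISUAL_ROIS))
    return ts[:, mask]
```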

5 METHOD

Overview. We divide fMRI dream decoding into three stages (illustrated in Figure 3): Visual Stimulus Perception Reconstruction, Dream Visual Imagery Decoding, and Dream Narrative Integration.

i) Visual Stimulus Perception Reconstruction: This stage involves decoding brain activity patterns associated with the perception of real visual stimuli. By analyzing fMRI data collected during wakefulness when subjects are presented with visual stimuli, we aim to reconstruct these stimuli using the neural representations.

ii) Dream Visual Imagery Decoding: In this stage, we leverage shared brain activity patterns across both real visual stimuli and dream-induced visual experiences to decode snapshots of the dream content. By analyzing fMRI data collected during the dream state, our aim is to decode the virtual visual frames representing the dream experiences perceived by individuals at distinct time points during the sleep cycle.

iii) Dream Narrative Integration: This final stage involves integrating the decoded single-frame dream visualizations into a comprehensive narrative of the entire dream sequence. Leveraging Large Language Models (LLMs), we aim to synthesize fragmented dream visualizations into cohesive narratives, providing a complete interpretation of the dream experience.

[Figure 3 diagram: an fMRI encoder feeds NeuroPictor's high-level guiding and low-level manipulation networks, which condition Stable Diffusion for reconstruction across the three stages.]

Figure 3: The fMRI dream decoding process is divided into three stages: i) Visual Stimulus Perception Reconstruction: Decoding
brain activity patterns associated with real visual stimuli perception. ii) Dream Visual Imagery Decoding: Shared brain activity
patterns between real visual stimuli and dream-induced visual experiences aid in decoding snapshots of dream content from
fMRI data. iii) Dream Narrative Integration: By leveraging LLM, we synthesize fragmented dream visualizations into cohesive
narratives, providing a complete interpretation of the dream experience.
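As a reading aid, the three stages in Figure 3 can be summarized in a short pipeline skeleton. This is a hypothetical sketch: `encoder`, `generator`, `captioner`, and `llm` stand in for the components detailed in Sections 5.1-5.3 and are not a released API.

```python
# Hypothetical top-level sketch of the three-stage pipeline in Figure 3.
def decode_dream(fmri_frames, encoder, generator, captioner, llm):
    # Stages 1-2: per-frame zero-shot decoding with the model trained on
    # real visual stimuli, I_i = G(E(F_i))
    snapshots = [generator(encoder(f)) for f in fmri_frames]
    # Stage 3: caption each snapshot, then ask an LLM to weave a narrative
    captions = [captioner(img) for img in snapshots]
    prompt = "\n".join(f"Image {i}: {c}" for i, c in enumerate(captions, start=1))
    script = llm(prompt)
    return snapshots, script
```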

[Figure 4 diagram: LENS captions each dream snapshot ("Image 1: [caption of Image 1]" ... "Image N: [caption of Image N]"); a task prompt guides ChatGPT to produce the dream title, script, and an audio recommendation, which are combined during video integration.]

Figure 4: Pipeline of Dream Narrative Integration. This process includes three steps: Single-Shot Dream Description, Dream
Story Composition, and Video Integration.

5.1 Visual Stimulus Perception Reconstruction

There are two key challenges in directly using dream data for model training: data scarcity and the lack of real dream visual images. On one hand, collecting dream data is constrained by the fact that it can only be obtained during participants' sleep states, and the availability of effective data relies on the number of dreams recalled by participants. Dreams are spontaneously generated by the brain during sleep stages, and even with prolonged sleep, the number of accurately recallable dreams is limited. On the other hand, in terms of interpreting dream content, dream reports provide us with captions of the dreams, but these are limited to higher-level semantics. While such semantic-level labels can be used for training classification tasks, they are insufficient for reconstructing visual images from dreams, making supervised training challenging.

Therefore, to address these issues, we propose to introduce an fMRI-to-image task under real visual stimuli. In the Visual Stimulus Perception Reconstruction task, subjects observe specific images while brain activity is recorded, and the task is then to decode the real visual stimuli from the fMRI signals. This dataset of fMRI-image pairs is significantly larger in scale than dream datasets, addressing the issue of original data scarcity. Moreover, inspired by research [20] suggesting that specific visual experiences during sleep are represented by brain activity patterns shared with stimulus perception, we attempt to utilize shared brain activity patterns across both real visual stimuli and dream-induced visual experiences to decode snapshots of dream content. Specifically, we first learn to reconstruct real visual stimuli from fMRI signals, and then transfer the learned brain activity patterns associated with real visual stimuli to dreamlike visual perceptions.

We base our study on the Natural Scenes Dataset (NSD) [1] for learning the Visual Stimulus Perception Reconstruction task. NSD comprises fMRI-image data from 8 individuals, with the original image data consisting of approximately 65k images from the MS-COCO [23] dataset. This large-scale dataset provides a basis for learning cross-individual, generalizable fMRI-to-image reconstruction. We follow NeuroPictor [21] for the reconstruction process, as it allows for training a unified model across multiple subjects. Specifically, we utilize the fMRI surface map F as input, initially processed through a transformer encoder to obtain a unified fMRI representation across different persons:

    F^r = E(F),    (1)

where E denotes the fMRI encoder. Then, we employ a diffusion generative model integrating a High-Level Guiding Network and a Low-Level Manipulation Network to guide the generation of semantics and low-level details:

    I^real = G(F^r),    (2)

where I^real is the reconstructed image and G is the integrated generative model. We follow the training objectives in [21] on the NSD dataset of eight individuals, facilitating subsequent knowledge transfer between different individuals and between real visual responses and dream-induced visual responses.
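In code, Eqs. (1)-(2) amount to composing the encoder and the integrated generator. The sketch below assumes PyTorch modules and mirrors the equations only at the interface level; the actual architectures follow NeuroPictor [21], and the class name is illustrative.

```python
# Minimal interface-level sketch of Eqs. (1)-(2); not the released model.
import torch
import torch.nn as nn

class VisualStimulusReconstructor(nn.Module):
    def __init__(self, fmri_encoder: nn.Module, generator: nn.Module):
        super().__init__()
        self.encoder = fmri_encoder   # E: fMRI surface map -> unified representation
        self.generator = generator    # G: diffusion model with high-/low-level guidance

    def forward(self, fmri_surface: torch.Tensor) -> torch.Tensor:
        f_r = self.encoder(fmri_surface)   # Eq. (1): F^r = E(F)
        i_real = self.generator(f_r)       # Eq. (2): I^real = G(F^r)
        return i_real
```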
5.2 Dream Visual Imagery Decoding

Building upon the Visual Stimulus Perception Reconstruction model obtained in the previous section, we aim to accomplish Dream Visual Imagery Decoding by leveraging shared brain activation patterns between real and dream experiences. Benefiting from the large scale and image diversity of the NSD dataset, our trained generative model covers different individuals and a wide range of images. Considering previous research suggesting shared visual cortex activation between real and dream experiences, we directly transfer the trained Visual Stimulus Perception Reconstruction model to the Dream Visual Imagery Decoding task. For the fMRI sequences collected during sleep stages, we first average them over a window to align with the fMRI acquisition process under real visual stimuli, where participants observe each image for 3-4 seconds. Then, we perform zero-shot decoding at each discrete timestamp of the fMRI to obtain the decoded image corresponding to the i-th moment:

    I_i^dream = G(E(F_i)),    (3)

where F_i is the original fMRI signal at the i-th moment and I_i^dream is the corresponding dream visual image decoded from F_i.

In contrast to previous approaches to dream decoding, which relied on dream report labels to provide only coarse-level supervision for learning to decode the types of objects present in dreams, our method directly decodes visual images from dream experiences. This direct decoding allows us to visualize the content of dreams rather than conducting rough category-level classification.
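A minimal sketch of this zero-shot decoding loop, assuming the sleep fMRI sequence is a NumPy array and `model` is a callable wrapping the trained reconstructor from Section 5.1; the window length is illustrative and would be chosen to match the 3-4 s stimulus presentations.

```python
# Sketch of Eq. (3): window-average the sleep fMRI, then decode each frame
# zero-shot with the model trained on real visual stimuli (no fine-tuning).
import numpy as np

def decode_sleep_sequence(fmri_seq: np.ndarray, model, window: int = 3):
    """fmri_seq: (T, ...) sleep-stage fMRI frames; returns decoded snapshots."""
    n = len(fmri_seq) // window
    dream_frames = []
    for i in range(n):
        f_i = fmri_seq[i * window:(i + 1) * window].mean(axis=0)  # window average
        dream_frames.append(model(f_i))  # I_i^dream = G(E(F_i))
    return dream_frames
```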
5.3 Dream Narrative Integration

Reconstructing dream visuals frame by frame can only discretely represent the dream scenes generated by participants at specific time points. However, the dream experiences formed by individuals during the sleep stage unfold as a continuous narrative.

We turn to large-scale language models to accomplish the integration of dream stories, employing these models to bridge the fragments into a cohesive narrative. This endeavor can be defined as follows: given discrete dream visual reconstruction results {I_i^dream}, i = 1, ..., N, we aim to dynamically compose these individual shots into a unified story. Inspired by the Intelligent Director framework proposed in [37] for automated video synthesis using ChatGPT [4], we adjust this pipeline to fit the task of Dream Narrative Integration. The process is divided into three steps: Single-Shot Dream Description, Dream Story Composition, and Video Integration (see Figure 4).

In the Single-Shot Dream Description phase, we first employ the image-text QA model LENS [2] to generate a textual description for each individual shot's dream visual reconstruction I_i^dream. We then organize these descriptions into a sequentially arranged caption prompt using the following template:

    "Image 1: [caption of Image 1]
    Image 2: [caption of Image 2]
    ...
    Image N: [caption of Image N]"

During the Dream Story Composition phase, we utilize ChatGPT to generate a logically coherent narrative with a dream-like storytelling style. We tailor a question prompt to transform our dream narrative integration task into ChatGPT-based script generation based on the captions of the individual dream shots. The task-description prompt is defined as follows:

    "I have a collection of photos and videos, with a fixed order. I need your help to organize these materials according to their input sequence. This collection represents scenes from a dream I had, and I want to structure them into my subjective description of the dream based on their captions. Additionally, I require a smoothly written script that connects these images into a cohesive narrative.
    I need you to do two things:
    (1) Provide a subjective description of my dream from my perspective based on the captions of these images and videos, keeping the order fixed according to the input sequence.
    (2) Write a script according to the input material sequence. The script should be concise, fluent, and vivid, and the transitions between different materials should be natural."

In addition to the task-description prompt, we also customize a standardized output prompt. To achieve a harmonious blend of audio and visual elements in the dream story video, we request that ChatGPT generate a script for each dream shot, including dream titles, concluding remarks, and a recommended audio track. The complete prompt template is available in the supplementary materials. In this way, we use ChatGPT to produce logically coherent dream narrative text.

Finally, based on the dream narrative text generated by ChatGPT, we proceed with video integration. Specifically, following the sequence of dream shots, we insert the corresponding script for each shot and combine them with the dream titles, concluding remarks, and recommended audio tracks to create a comprehensive video. This final video encompasses dream imagery, narrative text, and suitable music, thereby reflecting a complete dream experience during the sleep stage.
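As one possible realization of the Video Integration step, the sketch below concatenates the decoded snapshots and attaches the recommended audio track using moviepy 1.x; all file paths, the shot duration, and the function name are illustrative assumptions, and rendering the script text onto the frames is omitted.

```python
# Illustrative sketch of video assembly, assuming decoded snapshots were
# saved as image files and ChatGPT returned an audio recommendation.
from moviepy.editor import ImageClip, AudioFileClip, concatenate_videoclips

def assemble_dream_video(frame_paths, audio_path, out_path="dream.mp4",
                         seconds_per_shot=3.0):
    clips = [ImageClip(p).set_duration(seconds_per_shot) for p in frame_paths]
    video = concatenate_videoclips(clips, method="compose")
    audio = AudioFileClip(audio_path).subclip(0, video.duration)
    video = video.set_audio(audio)
    video.write_videofile(out_path, fps=24)
```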
6 EXPERIMENTS

6.1 Dataset

6.1.1 Natural Scenes Dataset. For the Visual Stimulus Perception Reconstruction task, we utilized all subjects from the Natural Scenes Dataset (NSD) [1] to train a multi-subject fMRI-to-image model. The NSD, collected using a 7T MRI scanner, is currently the largest-scale and highest-quality dataset for visual decoding. This enabled us to train the model using over 64,000 images and their corresponding fMRI signals from eight participants to reconstruct real visual perceptions.

6.1.2 fMRI-Dream Dataset. The dream dataset we collected includes 3.3 hours of fMRI data from three participants during their sleep stages. Based on the participants' dream reports, this consists of five instances of clearly recalled, prolonged dream experiences and several brief, indistinct dreams that could not be precisely recalled. We selected the five instances with clear recall as ground-truth labels to validate the accuracy of our dream interpretation. The brief, indistinct dreams were excluded from consideration since participants could not definitively confirm their recollection. The five instances of clearly recalled dreams are detailed in Table 1. We subsequently analyze three of these segments qualitatively and quantitatively. The performance on the other two samples is relatively lower due to unavoidable domain gaps and complex semantics; see the supplementary materials for further analysis.

Table 1: Instances of clearly recalled dreams.

    Subject    Caption
    Sub-1      Skiing with a snowboard.
               Enjoying a cup of milk tea.
    Sub-2      People running.
    Sub-3      Some cats.
               Some fruits, with plenty of grapes.

6.2 Implementation Details

During the Visual Stimulus Perception Reconstruction phase, we train the fMRI-to-image reconstruction model using data from the 8 NSD subjects. In the Dream Visual Imagery Decoding and Dream Narrative Integration stages, we employ a single RTX 3090 Ti GPU for image and text inference.

6.3 Evaluation Metrics

We utilize image category prediction to assess the alignment between reconstructed dream images and ground-truth descriptions. Specifically, we augment the 80 class labels from the COCO dataset [23] with additional class labels extracted from the ground-truth descriptions, merging identical labels to form the text categories used for dream evaluation. As these introduced labels differ from those in standard datasets, we employ CLIP [30] for zero-shot similarity computation. We organize the text categories into captions using the template "a photo of [label]", then use CLIP to calculate the similarity between each reconstructed dream image and each caption, followed by softmax normalization to obtain the final image-category similarity.
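A minimal sketch of this metric using the Hugging Face CLIP interface; the checkpoint choice is illustrative, and `labels` would be the merged COCO-plus-report label set described above.

```python
# Sketch of the CLIP-based zero-shot image-category similarity (Section 6.3).
import torch
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def category_similarity(image, labels):
    prompts = [f"a photo of {label}" for label in labels]
    inputs = processor(text=prompts, images=image, return_tensors="pt", padding=True)
    with torch.no_grad():
        logits = model(**inputs).logits_per_image   # shape (1, n_labels)
    return logits.softmax(dim=-1).squeeze(0)        # normalized similarities
```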
6.4 Qualitative Results

We present visualizations of dream narratives for "cat," "skiing with a snowboard," and "people running" in Figures 5, 6, and 7, respectively. We first decode individual dream images from the fMRI sequences and then assemble them into a cohesive dream narrative video. The resulting videos align with the participants' descriptions. Leveraging the power of LLMs, our constructed dream narratives are vivid and coherent, seamlessly weaving together disparate dream scenes. Complete videos are available in the supplementary materials.

6.5 Quantitative Results

To assess the correspondence between our decoded dreams and actual dream experiences, we partition the dataset into positive and negative samples for each genuine dream description in Table 1. Positive samples refer to the sleep fMRI data where participants reported the presence of the corresponding dream, while negative samples denote sleep fMRI data without that particular dream. Since the dreams of the three participants are distinct, for a given dream description we can designate the corresponding participant's sleep data as positive samples and the sleep data from the other participants as negative samples.

Following the methodology outlined in Section 6.3, we compute the average similarity between the ground-truth dream descriptions and the fMRI data of the corresponding positive and negative samples separately. The results are depicted in Figure 8. The average similarity of positive instances is consistently higher than that of negative instances, indicating a closer match between our decoded dream imagery and the actual dream descriptions.

[Figure 5 panels, top to bottom: fMRI Surface, Dream Snapshot, Dream Narrative Video.]
Figure 5: Visualization of Dream Narrative Video "some cats".

Figure 6: Visualization of Dream Narrative Video "skiing with a snowboard".

Furthermore, we conducted standard Mann-Whitney U tests on the similarity sequences of the positive and negative instances. As shown in Table 2, for the ground-truth dream descriptions "cat" and "people running," the p-values are smaller than 0.05, indicating a significant difference between the distributions of positive and negative samples. For the description "skis," the p-value was 0.062, suggesting a marginal difference between the distributions of positive and negative samples. While this result does not meet the conventional threshold of significance (p < 0.05), it still hints at a potential distinction.
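The test itself is a one-liner with SciPy. In this sketch, `pos_sims` and `neg_sims` are the per-frame similarity sequences for the positive and negative samples of one dream description; the two-sided alternative is assumed, as is standard.

```python
# Sketch of the significance test on positive vs. negative similarity sequences.
from scipy.stats import mannwhitneyu

def compare_samples(pos_sims, neg_sims):
    stat, p_value = mannwhitneyu(pos_sims, neg_sims, alternative="two-sided")
    return stat, p_value
```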

Figure 7: Visualization of Dream Narrative Video "people running".

7 CONCLUSIONS

This paper has presented an innovative framework for converting dreams into coherent video narratives using fMRI data, marking a significant advancement in the field of multimedia and dream research. Our approach integrates cutting-edge techniques in fMRI analysis and language modeling to bridge the gap between subjective dream experiences and objective neurophysiological data. By reconstructing visual perceptions, decoding dream imagery, and integrating these elements into flowing narratives, we provide a novel method for visualizing and understanding dreams as comprehensive video stories. The implications of our work extend beyond the scientific exploration of dreams: they open up new possibilities for creative expression, allowing individuals to explore their dream experiences.

Figure 8: Comparison of the average similarity between the ground-truth dream descriptions and the fMRI data of the corresponding positive and negative samples.

Table 2: P-values of the Mann-Whitney U test for positive and negative samples.

    Dream Label    skis     cat      people running
    p-value        0.062    0.0142   0.0001

REFERENCES
[1] Emily J Allen, Ghislain St-Yves, Yihan Wu, Jesse L Breedlove, Jacob S Prince, Logan T Dowdle, Matthias Nau, Brad Caron, Franco Pestilli, Ian Charest, et al. 2022. A massive 7T fMRI dataset to bridge cognitive neuroscience and artificial intelligence. Nature Neuroscience 25, 1 (2022), 116–126.
[2] William Berrios, Gautam Mittal, Tristan Thrush, Douwe Kiela, and Amanpreet Singh. 2023. Towards Language Models That Can See: Computer Vision Through the LENS of Natural Language. arXiv:2306.16410 [cs.CL]
[3] Allen R Braun, TJ Balkin, NJ Wesenten, Richard Ellis Carson, M Varga, P Baldwin, S Selbie, Gregory Belenky, and Peter Herscovitch. 1997. Regional cerebral blood flow throughout the sleep-wake cycle: An H2(15)O PET study. Brain: A Journal of Neurology 120, 7 (1997), 1173–1197.
[4] Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, et al. 2020. Language models are few-shot learners. Advances in Neural Information Processing Systems 33 (2020), 1877–1901.
[5] Zijiao Chen, Jiaxin Qing, Tiange Xiang, Wan Lin Yue, and Juan Helen Zhou. 2023. Seeing beyond the brain: Conditional diffusion model with sparse masked modeling for vision decoding. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 22710–22720.
[6] David D Cox and Robert L Savoy. 2003. Functional magnetic resonance imaging (fMRI) "brain reading": detecting and classifying distributed patterns of fMRI activity in human visual cortex. NeuroImage 19, 2 (2003), 261–270.
[7] Oscar Esteban, Ross Blair, Christopher J. Markiewicz, Shoshana L. Berleant, Craig Moodie, Feilong Ma, Ayse Ilkay Isik, Asier Erramuzpe, James D. Kent, Mathias Goncalves, Elizabeth DuPre, Kevin R. Sitek, Daniel E. P. Gomez, Daniel J. Lurie, Zhifang Ye, Russell A. Poldrack, and Krzysztof J. Gorgolewski. 2018. fMRIPrep. Software (2018). https://doi.org/10.5281/zenodo.852659

[8] Oscar Esteban, Christopher Markiewicz, Ross W Blair, Craig Moodie, Ayse Ilkay Isik, Asier Erramuzpe Aliaga, James Kent, Mathias Goncalves, Elizabeth DuPre, Madeleine Snyder, Hiroyuki Oya, Satrajit Ghosh, Jessey Wright, Joke Durnez, Russell Poldrack, and Krzysztof Jacek Gorgolewski. 2019. fMRIPrep: a robust preprocessing pipeline for functional MRI. Nature Methods 16 (2019), 111–116. https://doi.org/10.1038/s41592-018-0235-4
[9] Tao Fang, Qian Zheng, and Gang Pan. 2024. Alleviating the Semantic Gap for Generalized fMRI-to-Image Reconstruction. Advances in Neural Information Processing Systems 36 (2024).
[10] Matteo Ferrante, Furkan Ozcelik, Tommaso Boccato, Rufin VanRullen, and Nicola Toschi. 2023. Brain Captioning: Decoding human brain activity into images and text. arXiv preprint arXiv:2305.11560 (2023).
[11] William David Foulkes. 1962. Dream reports from different stages of sleep. The Journal of Abnormal and Social Psychology 65, 1 (1962), 14.
[12] Jianxiong Gao, Yuqian Fu, Yun Wang, Xuelin Qian, Jianfeng Feng, and Yanwei Fu. 2023. MinD-3D: Reconstruct High-quality 3D objects in Human Brain. arXiv preprint arXiv:2312.07485 (2023).
[13] Jianxiong Gao, Yuqian Fu, Yun Wang, Xuelin Qian, Jianfeng Feng, and Yanwei Fu. 2024. fMRI-3D: A Comprehensive Dataset for Enhancing fMRI-based 3D Reconstruction. arXiv preprint arXiv:2409.11315 (2024).
[14] Zijin Gu, Keith Wakefield Jamison, Meenakshi Khosla, Emily J Allen, Yihan Wu, Ghislain St-Yves, Thomas Naselaris, Kendrick Kay, Mert R Sabuncu, and Amy Kuceyeski. 2022. NeuroGen: activation optimized image synthesis for discovery neuroscience. NeuroImage 247 (2022), 118812.
[15] James V Haxby, M Ida Gobbini, Maura L Furey, Alumit Ishai, Jennifer L Schouten, and Pietro Pietrini. 2001. Distributed and overlapping representations of faces and objects in ventral temporal cortex. Science 293, 5539 (2001), 2425–2430.
[16] J Allan Hobson, Edward F Pace-Schott, and Robert Stickgold. 2000. Dreaming and the brain: toward a cognitive neuroscience of conscious states. Behavioral and Brain Sciences 23, 6 (2000), 793–842.
[17] Charles Chong-Hwa Hong, James C Harris, Godfrey D Pearlson, Jin-Suh Kim, Vince D Calhoun, James H Fallon, Xavier Golay, Joseph S Gillen, Daniel J Simmonds, Peter CM Van Zijl, et al. 2009. fMRI evidence for multisensory recruitment associated with rapid eye movements during sleep. Human Brain Mapping 30, 5 (2009), 1705–1722.
[18] Tomoyasu Horikawa and Yukiyasu Kamitani. 2017. Generic decoding of seen and imagined objects using hierarchical visual features. Nature Communications 8, 1 (2017), 15037.
[19] Tomoyasu Horikawa and Yukiyasu Kamitani. 2017. Hierarchical neural representation of dreamed objects revealed by brain decoding with deep neural network features. Frontiers in Computational Neuroscience 11 (2017), 4.
[20] Tomoyasu Horikawa, Masako Tamaki, Yoichi Miyawaki, and Yukiyasu Kamitani. 2013. Neural decoding of visual imagery during sleep. Science 340, 6132 (2013), 639–642.
[21] Jingyang Huo, Yikai Wang, Yun Wang, Xuelin Qian, Chong Li, Yanwei Fu, and Jianfeng Feng. 2025. NeuroPictor: Refining fMRI-to-image reconstruction via multi-individual pretraining and multi-level modulation. In European Conference on Computer Vision. Springer, 56–73.
[22] Kendrick N Kay, Thomas Naselaris, Ryan J Prenger, and Jack L Gallant. 2008. Identifying natural images from human brain activity. Nature 452, 7185 (2008), 352–355.
[23] Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. 2014. Microsoft COCO: Common objects in context. In Computer Vision–ECCV 2014: 13th European Conference, Zurich, Switzerland, September 6-12, 2014, Proceedings, Part V. Springer, 740–755.
[24] Pierre Maquet, Jean-Marie Péters, Joël Aerts, Guy Delfiore, Christian Degueldre, André Luxen, and Georges Franck. 1996. Functional neuroanatomy of human rapid-eye-movement sleep and dreaming. Nature 383, 6596 (1996), 163–166.
[25] Karla L Miller, Fidel Alfaro-Almagro, Neal K Bangerter, David L Thomas, Essa Yacoub, Junqian Xu, Andreas J Bartsch, Saad Jbabdi, Stamatios N Sotiropoulos, Jesper LR Andersson, et al. 2016. Multimodal population brain imaging in the UK Biobank prospective epidemiological study. Nature Neuroscience 19, 11 (2016), 1523–1536.
[26] Milad Mozafari, Leila Reddy, and Rufin VanRullen. 2020. Reconstructing Natural Scenes from fMRI Patterns using BigBiGAN. In 2020 International Joint Conference on Neural Networks (IJCNN). IEEE, 1–8.
[27] Georg Northoff, Andrea Scalabrini, and Stuart Fogel. 2023. Topographic-dynamic reorganisation model of dreams (TRoD): A spatiotemporal approach. Neuroscience & Biobehavioral Reviews 148 (2023), 105117.
[28] Furkan Ozcelik and Rufin VanRullen. 2023. Natural scene reconstruction from fMRI signals using generative latent diffusion. Scientific Reports 13, 1 (2023), 15666.
[29] Xuelin Qian, Yun Wang, Jingyang Huo, Jianfeng Feng, and Yanwei Fu. 2023. fMRI-PTE: A Large-scale fMRI Pretrained Transformer Encoder for Multi-Subject Brain Activity Decoding. arXiv preprint arXiv:2311.00342 (2023).
[30] Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, et al. 2021. Learning transferable visual models from natural language supervision. In International Conference on Machine Learning. PMLR, 8748–8763.
[31] Ziqi Ren, Jie Li, Xuetong Xue, Xin Li, Fan Yang, Zhicheng Jiao, and Xinbo Gao. 2021. Reconstructing seen image from brain activity by visually-guided cognitive representation and adversarial learning. NeuroImage 228 (2021), 117602.
[32] Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Björn Ommer. 2022. High-resolution image synthesis with latent diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 10684–10695.
[33] Paul S Scotti, Atmadeep Banerjee, Jimmie Goode, Stepan Shabalin, Alex Nguyen, Ethan Cohen, Aidan J Dempster, Nathalie Verlinde, Elad Yundler, David Weisberg, et al. 2023. Reconstructing the Mind's Eye: fMRI-to-Image with Contrastive Learning and Diffusion Priors. arXiv preprint arXiv:2305.18274 (2023).
[34] Guohua Shen, Tomoyasu Horikawa, Kei Majima, and Yukiyasu Kamitani. 2019. Deep image reconstruction from human brain activity. PLoS Computational Biology 15, 1 (2019), e1006633.
[35] Bertrand Thirion, Edouard Duchesnay, Edward Hubbard, Jessica Dubois, Jean-Baptiste Poline, Denis Lebihan, and Stanislas Dehaene. 2006. Inverse retinotopy: inferring the visual content of images from brain activation patterns. NeuroImage 33, 4 (2006), 1104–1116.
[36] Bohan Zeng, Shanglin Li, Xuhui Liu, Sicheng Gao, Xiaolong Jiang, Xu Tang, Yao Hu, Jianzhuang Liu, and Baochang Zhang. 2023. Controllable Mind Visual Diffusion Model. arXiv preprint arXiv:2305.10135 (2023).
[37] Sixiao Zheng, Jingyang Huo, Yu Wang, and Yanwei Fu. 2024. Intelligent Director: An Automatic Framework for Dynamic Visual Composition using ChatGPT. arXiv preprint arXiv:2402.15746 (2024).
