Spatiotemporal Dynamics of Object Location
To interact with objects in complex environments, we must know what they are and where they are in spite of challenging
viewing conditions. Here, we investigated where, how and when representations of object location and category emerge in the
human brain when objects appear on cluttered natural scene images using a combination of functional magnetic resonance
imaging, electroencephalography and computational models. We found location representations to emerge along the ventral
visual stream towards lateral occipital complex, mirrored by gradual emergence in deep neural networks. Time-resolved analysis suggested that computing object location representations involves recurrent processing in high-level visual cortex. Object
category representations also emerged gradually along the ventral visual stream, with evidence for recurrent computations.
These results resolve the spatiotemporal dynamics of the ventral visual stream that give rise to representations of where and
what objects are present in a scene under challenging viewing conditions.
To interact with objects in our environments, the two arguably most basic questions that our brains must answer are what objects are present and where they are. To address the first question and identify an object, we must recognize objects independently of the viewing conditions of a given scene, such as where the object is located. A large body of research has shown that the ventral visual stream1–4, a hierarchically interconnected set of regions, achieves this by transforming retinal input in successive stages marked by increasing tolerance and complexity. At its high stages in high-level ventral visual cortex, object representations are tolerant to changes in retinotopic location5–7.

In contrast, we know considerably less about how the brain determines where an object is located. Current empirical data imply three different theoretical accounts.

One hypothesis (H1) is that object location representations are already present at the early stages of visual processing (Fig. 1a) and thus no further computation is required. Given the idea that ventral stream representations become successively more tolerant to changes in viewing conditions such as location1, it seems plausible that object location representations are to be found at the early stages of the processing hierarchy. Consistent with this view, human studies using multivariate analysis have shown that object location information is often strongest in early visual cortex8,9, likely related to its small receptive field size, which allows for spatial coding with high resolution10.

An alternative account (H2) is that location representations emerge in the dorsal visual stream (Fig. 1a)11. This view is supported by findings from neuropsychology2,4,11 and by studies finding object location information along the dorsal pathway2,12.

A third possibility (H3) is that location representations emerge through extensive processing but in the ventral visual stream (Fig. 1a). This view receives support from the observation that object location information was found across the entire ventral visual stream including high-level ventral visual cortex in human5,8,9,13 and non-human primates14. In line with these observations, high-level ventral visual cortex is known to be retinotopically organized15–17 and exhibits an eccentricity bias18–20.

How can we adjudicate between these hypotheses given the mixed empirical support? We propose that it is key to acknowledge the importance of assessing object location representations under conditions that increase the complexity of the visual scene, to increase ecological validity. Previous research typically investigated object location representations by presenting cut-out objects on blank backgrounds. This creates a direct mapping between the location of visual stimulation and the active portions of retinotopically organized cortex (Fig. 1b, left). In contrast, in daily life, objects appear on backgrounds cluttered by other elements21,22. This activates a large swath of cortex, independently of where the object is (Fig. 1b, right). Whereas in the former case location information can be directly accessible through retinotopic activation in early visual areas (supporting H1), in the latter case additional processing might be required to distil out location information (supporting H2 or H3).

Taking the importance of background into consideration, we used a combination of methods to distinguish between the proposed theoretical hypotheses. We used functional MRI (fMRI), deep neural networks (DNNs) and electroencephalography (EEG) to assess where, how and when location representations emerge in the human brain. We quantified the presence of location representations by the performance of a multivariate pattern classifier to predict object location from brain measurements.

Assessed in this way, the predictions for the hypotheses are as follows: if H1 is correct, independent of the nature of the object's background, object location information peaks in early visual cortex (Fig. 1c, left), early in the DNN processing hierarchy (Fig. 1d, left) and early during visual processing (Fig. 1e, left). For H2 and H3, the prediction of peak location information depends on the background. For cut-out isolated objects, location information is high across the entire dorsal and ventral pathways and the processing hierarchy of the DNN (Fig. 1c,d, middle and right, grey).
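The cross-classification logic behind this quantification can be illustrated with synthetic data. The sketch below uses scikit-learn's LinearSVC on hypothetical voxel patterns; all shapes, effect sizes and variable names are illustrative assumptions, not the authors' pipeline.

```python
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
n_trials, n_voxels = 40, 100  # hypothetical trials per location, voxels per ROI

def simulate(category_offset, location_effect):
    # Synthetic activation patterns: a location-specific signal in the
    # first 10 voxels plus a category-specific offset and Gaussian noise.
    X, y = [], []
    for loc in (0, 1):
        signal = np.zeros(n_voxels)
        signal[:10] = location_effect if loc else -location_effect
        X.append(signal + category_offset
                 + rng.normal(0, 1, (n_trials, n_voxels)))
        y += [loc] * n_trials
    return np.vstack(X), np.array(y)

# Train on one category (say, faces) at two locations, then test on
# another category (say, animals) at the same locations: above-chance
# transfer indicates location information that is tolerant to category.
X_train, y_train = simulate(category_offset=0.5, location_effect=2.0)
X_test, y_test = simulate(category_offset=-0.5, location_effect=2.0)

clf = LinearSVC(dual=False).fit(X_train, y_train)
acc = clf.score(X_test, y_test)
```

Training and testing on different categories is what separates genuine location information from category-confounded decoding.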
1Department of Education and Psychology, Freie Universität Berlin, Berlin, Germany. 2Berlin School of Mind and Brain, Faculty of Philosophy, Humboldt-Universität zu Berlin, Berlin, Germany. 3Department of Computer Science, Goethe Universität, Frankfurt am Main, Germany. 4Bernstein Center for Computational Neuroscience Berlin, Berlin, Germany. ✉e-mail: [Link]@[Link]; rmcichy@[Link]
Results
To investigate where, how and when representations of object location emerge in the brain, we created a visual stimulus set (Fig. 2a) with the three orthogonal factors objects (three exemplars each in four object categories), locations (four quadrants) and backgrounds (three kinds: uniform grey, low- and high-cluttered natural scenes, referred to as 'no', 'low' and 'high' clutter). Collapsing across exemplars, we used a fully crossed design with four categories × four locations × three background conditions, resulting in 48 stimulus conditions. This design allowed us to also investigate representa-
Fig. 1 | Hypotheses and predictions about the pathway of object location representations in the human brain. a, H1: representations of object location emerge in early visual cortex and degrade along further processing stages. H2 and H3: object location representations emerge gradually along the dorsal (H2) or ventral (H3) visual stream. b, Left: when objects are presented on a blank background, object location in the visual field maps retinotopically onto early visual cortex, allowing for direct location
Fig. 2 | Experimental design and tasks. a, Experimental design. We used a fully crossed design with factors of object category, location and background.
Note that, for copyright reasons, all example backgrounds shown are for illustrative purposes and were not used in the experiment. b, Tasks. The
experimental design was adapted to the specifics of each modality by adjusting the interstimulus interval. On each trial, participants viewed images for
500 ms followed by a blank interval (0.5–0.6 s in EEG, 2.5 s in fMRI). The task was to respond with button press to catch trials that were presented on
every fourth trial on average. Catch trials were marked by the presence of a probe (glass) in the EEG experiment and by an image repetition (one-back) in
the fMRI experiment. Image presentation was followed by blank screen (1 s in EEG, 2.5 s in fMRI).
accuracy quantifies object location information independent of object category. This procedure was performed in a space-resolved fashion for fMRI and in a time-resolved fashion for EEG (see Supplementary Fig. 1a for details).

The locus of object location representations. To determine the locus of object location representations, we used a region-of-interest (ROI) fMRI analysis, including early visual regions (V1, V2 and V3) shared by both streams, the hierarchy of the ventral (V4 and LOC23) and the dorsal visual stream (intraparietal sulcus: IPS0, IPS1, IPS2 and superior parietal lobule (SPL)).

As expected, we found that most regions contained above-chance levels of location information in all background clutter conditions (Fig. 3b; N = 25, two-tailed Wilcoxon signed-rank test, P < 0.05, FDR corrected; see Supplementary Table 1 for P values). However, the amount of location information depended critically on the brain region and background condition.

Focusing on the ventral visual stream first, we observed similar amounts of location information across regions when objects were presented without clutter (Fig. 3b, grey bars). In contrast, when objects were presented on cluttered backgrounds, location information emerged along the ventral visual processing hierarchy, with less information in early visual areas than in LOC (Fig. 3b, green and blue bars; N = 25, 5 × 3 repeated-measures ANOVA, post hoc t tests Tukey corrected; see Supplementary Table 2 for P values). These results are at odds with H1, which predicts that location information decreases along the ventral stream independent of background condition. Instead, the observed increase of location information along the ventral visual stream with cluttered backgrounds is consistent with H3.

We ascertained these observations statistically with a 5 × 3 repeated-measures ANOVA with factors ROI (V1, V2, V3, V4 and LOC) and background (no, low and high clutter). Besides both main effects (ROI: F(4,96) = 18.30, P < 0.001, partial η2 = 0.43; background: F(1.44,34.48) = 64.11, P < 0.001, partial η2 = 0.73), we crucially found the interaction to be significant (F(8,192) = 5.40, P < 0.001, partial η2 = 0.18). As the interaction makes the main effects difficult to interpret, we conducted post hoc paired t tests (all reported in Supplementary Table 2, Tukey corrected). The statistical analysis confirmed all the qualitative observations: there were no significant
Fig. 3 | fMRI results of location classification. a, Classification scheme for object location across category. We trained an SVM to distinguish between
brain activation patterns evoked by objects of a particular category presented at two locations (here: faces bottom left and right) and tested the SVM
on activation patterns evoked by objects of another category (here: animals) presented at the same locations. Objects are enlarged for visibility and did
not extend into another quadrant in the original stimuli. b, Location classification in early visual cortex, ventral and dorsal visual ROIs (N = 25, two-tailed
Wilcoxon signed-rank test, P < 0.05, FDR corrected). With no clutter, location information was high across early visual cortex and ventral ROIs. In the
low- and high-clutter conditions, location representations emerged gradually along the ventral stream. In dorsal ROIs, location information was low,
independent of background condition. Stars above bars indicate significance above chance (see Supplementary Tables 1, 2 and 3 for P values). Error bars
represent s.e.m. Dots represent single subject data. c, fMRI searchlight result for classification of object location (N = 25, two-tailed Wilcoxon signed-rank
test, P < 0.05, FDR corrected). Peak classification accuracy is indicated by colour-coded circles (no clutter: left V3 (grey, XYZ coordinates −19 mm,
−97 mm, 13 mm); low clutter: left V1 (green, −5 mm, −86 mm, −3 mm); high clutter: left LOC (blue, −44 mm, −83 mm, 8 mm)). Millimetres (mm)
indicate axial slice position along z axis in Montreal Neurological Institute space. d, Location classification in a DNN. In the high-clutter condition, location
information emerged along the processing hierarchy, analogous to the ventral visual stream.
differences between ROIs in the no-clutter condition, except between V2 and V3 (P = 0.009) and between V2 and V4 (P = 0.001). There was more location information in LOC than in V1, V2 and V3 when background clutter (both low and high) was present than when it was not (Fig. 3b; all P < 0.03, see Supplementary Table 2 for P values, Tukey corrected). This effect was robust for the comparison of locations across, but not within, visual hemifields (Fig. 4a,b): post hoc tests comparing early visual areas versus LOC in the high-clutter condition were significant for the cross-hemifield classification (Fig. 4a; V1: P = 0.003; V2: P < 0.001; V3: P = 0.004, Tukey corrected), but not for the within-hemifield classification (Fig. 4b; V1: P = 0.697; V2: P = 0.281; V3: P = 1.00, Tukey corrected).

Focusing next on the dorsal visual stream, we observed low object location information independent of background condition (Fig. 3b; N = 25, 7 × 3 repeated-measures ANOVA). In the no- and low-clutter conditions, location information was higher in early visual cortex than in dorsal regions (N = 25, post hoc t tests, Tukey corrected; see Supplementary Table 3 for P values). This is inconsistent with H2, which predicts an increase of object location information along the dorsal stream.

Consistent with these qualitative observations, statistical testing by a 7 × 3 repeated-measures ANOVA with factors ROI (V1, V2, V3, IPS0, IPS1, IPS2 and SPL) and background (no, low and high clutter) did not provide statistical evidence for H2. We found significant main (ROI: F(3.16,75.93) = 36.2, P < 0.001, partial η2 = 0.60; background: F(2,48) = 35.8, P < 0.001, partial η2 = 0.60) and interaction effects (F(6.25,149.89) = 14.5, P < 0.001, partial η2 = 0.38). The post hoc tests showed that location information was higher in V1, V2 and V3 compared with dorsal regions in the no- and low-clutter conditions (Fig. 3b, grey and green, except V1 and V2 versus IPS2 and SPL with low clutter, which were n.s.; see Supplementary Table 3 for P values). With high clutter, there was more location information in V3
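Significance throughout these analyses is assessed with FDR correction across tests. A minimal numpy sketch of the Benjamini–Hochberg step-up procedure, a standard choice for this correction (the authors' exact implementation is not specified here), is:

```python
import numpy as np

def fdr_bh(pvals, q=0.05):
    """Benjamini-Hochberg FDR: boolean mask of rejected null hypotheses."""
    p = np.asarray(pvals, dtype=float)
    m = len(p)
    order = np.argsort(p)
    ranked = p[order]
    # Find the largest k with p_(k) <= (k/m) * q, then reject all
    # hypotheses up to and including rank k (step-up procedure).
    thresh = (np.arange(1, m + 1) / m) * q
    below = ranked <= thresh
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])
        reject[order[:k + 1]] = True
    return reject

pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]
reject_mask = fdr_bh(pvals)
```

Unlike Bonferroni correction, the threshold adapts to the rank of each P value, which is why it is favoured when many voxels, electrodes or time points are tested at once.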
Fig. 4 | Location classification within and across hemifields, in IPS3–5 and univariate ROI results. a, Results of location classification across
categories between visual hemifields (left up versus right up, left bottom versus right bottom). Similar to the classification across four locations, the
repeated-measures ANOVA along the ventral stream (five ROIs × three clutter levels) yielded significant main (ROI: F(4,96) = 24.62, P < 0.001, partial
η2 = 0.51; background: F(1.49,35.85) = 45.34, P < 0.001, partial η2 = 0.65) and interaction effects (F(8,192) = 2.95, P = 0.004, partial η2 = 0.11). Post hoc tests
yielded results comparable to the main results (V1, V2 and V3 < LOC with high clutter). Stars above bars indicate significance above chance (N = 25,
two-tailed Wilcoxon signed-rank test, P < 0.05, FDR corrected). b, Location classification across categories within visual hemifields (left up versus
left bottom, right up versus right bottom). As for the main analysis, the ANOVA yielded significant main (ROI: F(4,96) = 4.16, P = 0.004, partial η2 = 0.15;
background: F(1.60,38.43) = 57.90, P < 0.001, partial η2 = 0.71) and interaction effects (F(8,192) = 5.84, P < 0.001, partial η2 = 0.20). The post hoc tests revealed a
significant difference between V3 and LOC in the no-clutter condition (P = 0.030). Stars above bars indicate significance above chance (N = 25, two-tailed Wilcoxon signed-rank test, P < 0.05, FDR corrected). c, Classification accuracies in IPS3, IPS4 and IPS5 were not significantly higher than chance level in any background condition (N = 25, two-sided Wilcoxon signed-rank test, P > 0.05, FDR corrected). Error bars represent s.e.m. Dots represent single-subject data. d, Absolute t values in each background condition and ROI, averaged across locations and categories. A 9 × 3 repeated-measures ANOVA
with factors ROI and clutter revealed a significant main effect of ROI (F(2.60,62.43) = 9.19, P < 0.001, partial η2 = 0.18) and a significant interaction effect
(F(3.40,81.64) = 9.89, P < 0.001, partial η2 = 0.03). Significant post hoc tests are listed in Supplementary Table 4. Overall, post hoc tests showed no clear
pattern of results between early, ventral and dorsal areas, except for higher activation in V1 than in dorsal areas and LOC with no clutter. Stars above bars
indicate significance above chance (N = 25, two-tailed Wilcoxon signed-rank test, P < 0.05, FDR corrected).
than in IPS0, IPS1 and IPS2. Location classification in IPS3, IPS4 and IPS5 did not reveal significant information above chance level (Fig. 4c; N = 25, two-tailed Wilcoxon signed-rank test, all P > 0.05, FDR corrected, see Supplementary Table 1 for P values). Univariate responses were comparable across regions overall (Fig. 4d). Post hoc tests to a 9 × 3 repeated-measures ANOVA with factors ROI (V1, V2, V3, V4, LOC, IPS0, IPS1, IPS2 and SPL) and background (no, low and high clutter) revealed that responses were significantly
Fig. 5 | Temporal dynamics of object location representations. a, Results of time-resolved location classification across category from EEG data. Results
are colour coded by background condition, with significant time points indicated by lines below curves (N = 27, two-tailed Wilcoxon signed-rank test,
P < 0.05, FDR corrected), 95% CI of peak latencies indicated by lines above curves. Shaded areas around curves indicate s.e.m. Inset text at arrows
indicates peak latency (140 ms, 133 ms and 317 ms in the no-, low- and high-clutter condition, respectively). b, Comparison of peak latencies of curves in
a. Error bars represent 95% CI. Stars indicate significant peak latency differences (P < 0.05; N = 27, bootstrap test with 10,000 bootstraps). c, Results of
location across category classification searchlight in EEG channel space at peak latencies in no-, low- and high-clutter condition, down-sampled to 10 ms
steps. Significant electrodes are marked in grey (N = 27, two-tailed Wilcoxon signed-rank test, P < 0.05, FDR corrected across electrodes and time points).
d, Time generalization analysis scheme for classifying object location across category and background condition. The classification scheme was the same
as in a with the differences that (i) the training set conditions always came from the no-clutter while the testing set conditions came from the high-clutter
condition and (ii) training and testing was repeated across all combinations of time points for a peri-stimulus time window between −100 and 600 ms
(see Supplementary Fig. 1b for details). Objects are enlarged for visibility and did not extend into another quadrant in the original stimuli. e, Results of the
time generalization analysis. Dashed black lines indicate stimulus onset; oblique black line highlights the diagonal. Solid white outlines indicate significant
time points (N = 27, two-tailed Wilcoxon signed-rank test, P < 0.05, FDR corrected). Dashed white outline highlights delayed clusters. f, EEG–fMRI fusion.
Results represent the correlations between single-subject fMRI RDVs of classification accuracies and group-averaged RDVs of the EEG peaks in a. Stars
above bars indicate significance above chance (N = 27, two-tailed Wilcoxon signed-rank test, P < 0.05, FDR corrected). Error bars represent the s.e.m. Dots
represent single-subject data points.
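The fusion analysis in Fig. 5f correlates representational dissimilarity vectors (RDVs) across modalities. A minimal sketch with simulated RDVs follows, assuming Spearman's rank correlation as named in the caption; vector sizes and noise levels are illustrative, not the authors' data.

```python
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(3)

# Hypothetical RDVs: one entry per pair of experimental conditions,
# here 48 conditions -> 48 * 47 / 2 pairwise dissimilarities.
n_pairs = 48 * 47 // 2
shared = rng.normal(size=n_pairs)

# Simulate an fMRI RDV for one ROI and a group-averaged EEG RDV at one
# peak latency that share representational structure (illustrative only).
rdv_fmri = shared + rng.normal(scale=0.5, size=n_pairs)
rdv_eeg = shared + rng.normal(scale=0.5, size=n_pairs)

# A high rank correlation links the ROI to that EEG time point.
rho, p = spearmanr(rdv_fmri, rdv_eeg)
```

Because the correlation is computed on condition-pair dissimilarities rather than raw signals, it can bridge measurement spaces (voxels versus electrodes) that are otherwise incommensurable.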
signed-rank test, P < 0.05, FDR corrected). While the 'change' hypothesis predicts highest classification accuracies on the diagonal, the 'delay' hypothesis predicts highest classification accuracies below the diagonal. The results are reported in Fig. 5e. We found that peak latencies in location information as tested across subjects were significantly shifted below the diagonal (mean Euclidean distance 56.31 ms; N = 27, two-tailed Wilcoxon signed-rank test, P < 0.001, r = 0.65, s.e.m. 1.55; see Supplementary Fig. 6a for single-subject peaks), indicating that location representations in the no-clutter condition generalized to the high-clutter condition at later time points (Fig. 5e, white dashed outline). This result was confirmed in a supplementary analysis on the group-averaged peak in Fig. 5e (Euclidean distance 49.50 ms; 10,000 bootstraps; one-tailed bootstrap test against zero, P = 0.010; 95% CI 14.14–77.78). Classification accuracies were significantly higher below than above the diagonal between ~120 and 240 ms in the no-clutter condition and from ~200 ms in the high-clutter condition (N = 27, two-tailed Wilcoxon signed-rank test, P < 0.05, FDR corrected; Supplementary Fig. 6b). Together, these results provide evidence for the 'delay' hypothesis and demonstrate that object location representations in the no- and the high-clutter condition emerge at the same processing stage with a temporal delay.
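The time generalization logic can be sketched with synthetic data. The nearest-class-mean classifier below stands in for the SVM used in the actual analysis, and all array sizes, lags and effect amplitudes are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
n_trials, n_chan, n_times = 30, 20, 50

def simulate(lag):
    # Two-class EEG-like data with a transient class-specific signal
    # starting at sample 20 + lag (synthetic, for illustration only).
    X = rng.normal(0, 1, (2 * n_trials, n_chan, n_times))
    y = np.repeat([0, 1], n_trials)
    X[y == 1, :5, 20 + lag:30 + lag] += 1.5
    return X, y

# Train on the no-clutter analogue, test on a delayed high-clutter analogue.
Xtr, ytr = simulate(lag=0)
Xte, yte = simulate(lag=8)

# A classifier trained at time t and tested at time t2 fills a
# training-time x testing-time generalization matrix; a peak below
# the diagonal (t2 > t) is the 'delay' signature.
acc = np.zeros((n_times, n_times))
for t in range(n_times):
    means = np.stack([Xtr[ytr == c, :, t].mean(axis=0) for c in (0, 1)])
    for t2 in range(n_times):
        d = ((Xte[:, None, :, t2] - means[None]) ** 2).sum(axis=-1)
        acc[t, t2] = (d.argmin(axis=1) == yte).mean()

ti, tj = np.unravel_index(acc.argmax(), acc.shape)
```

Under the 'change' hypothesis the signal would instead be carried by different features at the later time, and the trained classifier would fail to transfer off the training time point altogether.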
Fig. 6 | Spatial and temporal dynamics of object category representations. a, Classification scheme of category across location. b, Location-tolerant
category representations in the ventral and dorsal streams. Stars indicate classification above chance level (two-tailed Wilcoxon signed-rank test,
P < 0.05, FDR corrected). Conventions as in Fig. 3b. c, Results of the time-resolved category classification across locations from EEG activation patterns.
Conventions and statistics as in Fig. 5a. d, Peak latencies of curves in c. Statistics and conventions as in Fig. 5b. e, Results of searchlight in EEG channel
space at peak latencies in no-, low- and high-clutter condition, down-sampled to 10 ms steps. Significant electrodes are marked in grey (N = 27, two-tailed
Wilcoxon signed-rank test, P < 0.05, FDR corrected across electrodes and time points). f, Time generalization analysis scheme for classifying object
category across location and background condition. g, Results of the time generalization analysis (N = 27, two-tailed Wilcoxon signed-rank test, P < 0.05,
FDR corrected). Conventions as in Fig. 5e.
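The bootstrap test for peak latency differences (as in Figs. 5b and 6d) resamples subjects with replacement and recomputes group-average peaks. A sketch under assumed Gaussian-shaped decoding curves follows; all parameters, including the latency jitter, are illustrative.

```python
import numpy as np

rng = np.random.default_rng(4)
times = np.arange(-100, 600)  # peri-stimulus time in ms, 1 ms resolution

def curves(peak_ms, n_sub=27):
    # Hypothetical per-subject decoding time courses peaking near peak_ms,
    # with between-subject latency jitter (synthetic, for illustration).
    lat = peak_ms + rng.normal(0, 20, size=n_sub)
    return np.exp(-(((times[None, :] - lat[:, None]) / 60.0) ** 2))

no_clutter, high_clutter = curves(140), curves(317)

def boot_peak_diff(a, b, n_boot=2000):
    # Resample subjects with replacement, recompute the group-average
    # peak latency per condition, and collect the latency differences.
    diffs = np.empty(n_boot)
    for i in range(n_boot):
        idx = rng.integers(0, len(a), len(a))
        diffs[i] = (times[b[idx].mean(axis=0).argmax()]
                    - times[a[idx].mean(axis=0).argmax()])
    return diffs

# Percentiles of the bootstrap distribution give the 95% CI; a CI
# excluding zero indicates a significant latency difference.
lo, hi = np.percentile(boot_peak_diff(no_clutter, high_clutter), [2.5, 97.5])
```

Resampling whole subjects preserves within-subject dependencies across time points, which a point-wise parametric test would ignore.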
receptive field properties for the eccentricities used in this study59,60, which allows it to encode object location on clutter better than other high-level ventral ROIs. These questions need more investigation in future research.

Our empirical findings were reinforced by the observation that representations of object location emerge in DNNs in a similar way as they emerge in the human brain. Importantly, the DNNs used here were trained on object categorization and not localization. Our results thus show that representations of object properties for which the network is not optimized can emerge in such networks14. One limitation of our approach is that the models used here were specifically designed to model the ventral visual stream25–30, even though they have been shown to predict brain responses in the dorsal stream, too32,33. Therefore, the presented modelling results cannot distinguish between H2 and H3. Future studies could compare location representations in DNNs that model the dorsal versus the ventral stream and investigate how the model's representations relate to brain representations in the two streams.

The time-resolved EEG analyses and the EEG–fMRI fusion analysis38 revealed together that location representations of objects with high clutter were delayed due to a temporal shift within the same processing stage in LOC. Since temporal delays at the same processing stage cannot be explained purely by a feedforward neural architecture, this indicates the involvement of recurrence. Physiologically, this might be implemented via lateral connections within LOC, resulting in slower information accumulation61,62.

Furthermore, we found not only location but also object category representations to be delayed when objects were superimposed on natural scenes. Together with previous reports that object category processing can be delayed when objects are degraded, occluded or are hard to categorize44–46,48, our results add to the emergent view that recurrent computations are critically involved in the processing of fundamental object properties such as what objects are62 and where they are in real-world vision. Future studies could provide more direct evidence for recurrence by manipulating it experimentally, for example, by adding a masking condition to the study design used here.

We find that both object category and object location representations emerged gradually along the ventral visual stream. This might seem counter-intuitive, given that transformations that lead to the emergence of category representations in LOC have been linked to building increasing tolerance to viewing conditions, in particular to changes in object location5–7. However, this apparent contradiction is qualified by the observation that the observed tolerance to changes in viewing conditions is graded rather than absolute63, mirrored by the presence of cells in high-level ventral visual cortex with large overlapping receptive fields10,17. Such tuning properties provide the spatial resolution needed for localization64, while also providing robustness to location translation65, needed for object categorization.

In this study, we deliberately avoided congruence between objects and backgrounds, which is known to lead to interaction effects with category processing40. However, this deviation from normality in our stimulus set might have triggered mismatch responses, leading to additional recurrent processing for disambiguation, or attentional responses triggered by atypical object appearance (for example, size and texture). Further, because objects and backgrounds did not form a coherent scene, objects and backgrounds might have been represented more independently. Another design limitation is that we constrained the number of locations to four to fully cross all stimulus conditions while maintaining a feasible session duration. Future research will have to establish whether congruent versus incongruent scene–object pairings yield different location representations on cluttered backgrounds and whether our results generalize to more locations.

What an object is and where an object is are arguably the two most fundamental properties that we need to know to be able to interact with objects in our environment. Our results uncover the basis of this knowledge by revealing representations of location and category in the human brain when viewing conditions are challenging, as encountered outside of the laboratory. Both object location and category representations emerge along the ventral visual stream towards LOC and depend on recurrent processing. Together, our results provide a spatiotemporally resolved account of object vision in the human brain when viewing conditions are cluttered.

Methods
Participants in EEG and fMRI experiments. The experiment was approved by the ethics committee of the Department of Education and Psychology of the Freie Universität Berlin (ethics reference number 104/2015) and was conducted in accordance with the Declaration of Helsinki. Twenty-nine participants participated in the EEG experiment, of whom two were excluded because of equipment failure (N = 27, mean age 26.8 years, s.d. 4.3 years, 22 female). Twenty-five participants (mean age 28.8 years, s.d. 4.0 years, 17 female) completed the fMRI experiment. The participant pools of the experiments did not overlap except for two participants. Sample size was chosen to exceed comparable magnetoencephalography, EEG and fMRI classification studies to enhance power8,9,43,66–68. All participants had normal or corrected-to-normal vision and no history of neurological disorders. All participants provided informed consent prior to the studies and received a monetary reward or course credit for their participation.

Experimental design. To enable us to investigate the representation of object location, category and background independently, we used a fully crossed design with factors of category (four values: animals, cars, faces and chairs; Fig. 2a, left, with three exemplars per category), location (four values: left up, left bottom, right up and right bottom; Fig. 2a, left centre) and background clutter (three values: no, low and high clutter; Fig. 2a, right centre). This amounted to 144 individual condition combinations (12 object exemplars × 4 locations × 3 background clutter levels). We analysed the data at the level of category, effectively resulting in 48 experimental conditions (4 categories × 4 locations × 3 background clutter levels).

Stimulus set generation. The stimulus material was created by superimposing three-dimensional (3D) rendered objects (Fig. 2a, left) with Gouraud shading in one of four image locations (Fig. 2a, left centre) onto images of real-world backgrounds (Fig. 2a, right centre).

In detail, in each category, one of the objects was rotated by 45°, one by 22.5° and the third by −45° with respect to the frontal view to introduce equal variance in the viewing angle for each category. Locations were in the four quadrants of the screen (Fig. 2a, left centre). Expressing locations in degrees of visual angle, the object's centre was 3° visual angle away from the vertical and horizontal central midlines (that is, 4.2° from image centre; Fig. 2a, right). The size of the objects was adjusted so that all of them fitted into one quadrant of the aperture, while maintaining a similar size (mean (s.d.) size: vertical, 2.4° (0.4°); horizontal, 2.2° (0.6°)).

We used backgrounds with three different clutter levels: no, low and high (Fig. 2a, right centre; note that example backgrounds shown here are for illustrative purposes and were not used in the experiment. The original stimulus material is available for download together with the data). We defined clutter as the organization and quantity of objects that fill up a visual scene69. In the no-clutter condition, the background was uniform grey. In the low- and the high-clutter conditions, we selected a set of 60 natural scene images each from the Places365 database ([Link]) that had low or high clutter, respectively, and did not contain objects of the categories defined in our experimental design (that is, no animals, cars, faces or chairs). We converted the images to greyscale and superimposed a circular aperture of 15° visual angle. The visual angle was the same in the EEG and fMRI experiments.

We confirmed that our selection of low- and high-clutter images was appropriate in an independent behavioural rating experiment (N = 10) in which participants rated clutter level on a scale from 1 to 6 (mean (s.d.) clutter image rating: low clutter, 2.52 (0.85); high clutter, 5.04 (0.87); the difference was significant: N = 10, paired-sample t test, P < 0.0001, t = 14.96).

From the set of 60 low- and high-clutter images, we selected 48, one for each experimental condition of our experimental design. We then randomly paired objects to background images to avoid systematic congruencies between backgrounds and objects. This was done for each of the 20 runs of the EEG experiment and for the 10 runs of the fMRI experiment. This resulted in 144 individual images per run, one for each condition (that is, 12 object exemplars × 4 locations × 3 background clutter levels). The remaining set of 12 low- and high-clutter images was used separately to create catch trials in the EEG experiment (see details below).

Experimental procedures. fMRI main experiment. Each participant completed one fMRI recording session consisting of ten runs (run duration 552 s), resulting
Reporting Summary
Nature Research wishes to improve the reproducibility of the work that we publish. This form provides structure for consistency and transparency
in reporting. For further information on Nature Research policies, see our Editorial Policies and the Editorial Policy Checklist.
Statistics
For all statistical analyses, confirm that the following items are present in the figure legend, table legend, main text, or Methods section.
- The exact sample size (n) for each experimental group/condition, given as a discrete number and unit of measurement
- A statement on whether measurements were taken from distinct samples or whether the same sample was measured repeatedly
- The statistical test(s) used AND whether they are one- or two-sided (only common tests should be described solely by name; describe more complex techniques in the Methods section)
- For null hypothesis testing, the test statistic (e.g. F, t, r) with confidence intervals, effect sizes, degrees of freedom and P value noted (give P values as exact values whenever suitable)
- For Bayesian analysis, information on the choice of priors and Markov chain Monte Carlo settings
- For hierarchical and complex designs, identification of the appropriate level for tests and full reporting of outcomes
- Estimates of effect sizes (e.g. Cohen's d, Pearson's r), indicating how they were calculated
Our web collection on statistics for biologists contains articles on many of the points above.
Data analysis For the data preprocessing and analysis we used the following software: MATLAB R2018b, EEGLAB toolbox (version 14), SASICA plugin for
EEGLAB, LIBSVM-3.11, SPM8 toolbox, CoSMoMVPA toolbox.
For manuscripts utilizing custom algorithms or software that are central to the research but not yet described in published literature, software must be made available to editors and
reviewers. We strongly encourage code deposition in a community repository (e.g. GitHub). See the Nature Research guidelines for submitting code & software for further information.
Data
Policy information about availability of data
All manuscripts must include a data availability statement. This statement should provide the following information, where applicable:
- Accession codes, unique identifiers, or web links for publicly available datasets
April 2020
The experimental stimuli used in this study, the fMRI and EEG data as well as neural network activations are publicly available via [Link] view_only=21a714db58584ffeb2837fc0548bf659.
nature research | reporting summary
Field-specific reporting
Please select the one below that is the best fit for your research. If you are not sure, read the appropriate sections before making your selection.
Life sciences Behavioural & social sciences Ecological, evolutionary & environmental sciences
For a reference copy of the document with all sections, see [Link]/documents/[Link]
Research sample 29 participants took part in the EEG experiment, of whom two were excluded because of equipment failure (N=27, mean age 26.8 years, SD=4.3, 22 female). 25 participants (mean age 28.8 years, SD=4.0, 17 female) completed the fMRI experiment. The participant pools of the two experiments did not overlap except for two participants. All participants provided informed consent prior to the studies and received a monetary reward or course credit for their participation.
Sampling strategy Participants were selected according to the following requirements: 18-40 years old, normal or corrected-to-normal vision, and fulfilment of the MRI safety criteria (no implants or metal parts, tattoos, non-removable piercings, claustrophobia, pregnancy, neurological disorders, etc.).
Sample size was chosen to exceed comparable M/EEG and fMRI classification studies to enhance power.
Data collection During both experiments, participants' responses were recorded with a computer, while the ongoing brain activity during the task was recorded using a 3T fMRI scanner (experiment 1) or EEG (experiment 2). No one was present in the room together with the participants during the experiments. Blinding to the experimental conditions or the study hypothesis was not possible, but data were analyzed using a single pipeline for all subjects.
Timing 1) fMRI experiment: the data collection started February 2019 and ended in March 2019. 2) EEG experiment: the data collection
started in May 2017 and ended in November 2017, with a short gap from July to September 2017 for data analysis.
Data exclusions 1) No participants were excluded in the fMRI experiment. 2) Two participants were excluded in the EEG experiment because of
equipment failure.
Recruitment Participants were recruited using the mailing lists for study participation of the psychology program, of the cognitive neuroscience program and of the medical studies program of the following Berlin universities: Freie Universität Berlin, Humboldt-Universität zu Berlin, Charité.
Design specifications Each participant completed one fMRI recording session consisting of 10 runs (run duration: 552 s), resulting in 92
minutes of fMRI recording of the main experiment. During each run, each of the 144 images of the stimulus set was
shown once (regular trials). Image duration was 0.5 s, with a 2.5 s inter-stimulus interval (ISI). Regular trials were interspersed with catch trials every 3rd to 5th trial (equally probable; 36 per run in total). Catch trials repeated the image shown on the previous trial. Participants were instructed to respond with a button press to these repetitions (i.e. a one-back task).
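The trial structure described above can be sketched in code. This is a minimal illustration, not the study's actual stimulus-delivery script (which is not published here): the function name, the seeded generator and the exact randomization of gap lengths are assumptions, and the sketch only enforces "a one-back repetition after every 3rd to 5th regular trial" rather than the study's exact count of 36 catch trials.

```python
import random

def build_run(images, rng=None):
    """Illustrative sketch: show each image once per run (regular trials) and
    insert one-back catch trials after every 3rd to 5th trial. Gap lengths are
    drawn uniformly from {3, 4, 5}; the study's exact scheme may differ."""
    rng = rng or random.Random(0)
    regular = list(images)
    rng.shuffle(regular)
    trials = []
    while regular:
        gap = rng.randint(3, 5)          # 3 to 5 regular trials between catches
        trials.extend(regular[:gap])
        regular = regular[gap:]
        if regular:                      # catch trial: repeat the previous image
            trials.append(trials[-1])
    return trials

run = build_run(range(144))              # 144 regular trials per run
n_catch = sum(run[i] == run[i - 1] for i in range(1, len(run)))
print(len(run), n_catch)                 # run length = 144 regular + catch trials
```

With gaps averaging four trials, this yields roughly 144/4 ≈ 36 catch trials per run, in line with the design specification above.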
Behavioral performance measures Button presses and response times were recorded for each subject during the experiment. Responses were recorded to
ensure that participants were directing their attention towards the stimuli. Response trials were excluded from analysis.
Acquisition
Imaging type(s) functional and structural MRI
Sequence & imaging parameters We acquired functional images covering the entire brain using a T2*-weighted gradient-echo planar sequence (TR = 2 s, TE = 30 ms, 70° flip angle, 3 mm³ voxel size, 37 slices, 20% gap, 192 mm field of view, 64 × 64 matrix size, interleaved acquisition).
Preprocessing
Preprocessing software We preprocessed fMRI data using SPM8. This involved realignment, coregistration and normalization to the structural MNI template brain. fMRI data from the localizer were smoothed with an 8 mm FWHM Gaussian kernel; the main experiment data were left unsmoothed.
Normalization The normalization method applied on all functional brain data was non-linear. We entered the subject specific T1 structural
image as source image and the MNI standard T1 provided in the SPM8 toolbox as template image.
Normalization template We used the T1 template in MNI space provided in the SPM8 toolbox.
Noise and artifact removal To remove movement artifacts from the fMRI time-series, we realigned the functional brain images in SPM8 using default
parameters. In the GLM, movement parameters were entered as nuisance regressors. We applied no artifact removal for
heart rate and respiration.
Effect(s) tested Whole-brain: for all voxels, we tested whether classification accuracies significantly exceeded chance level. This was done
separately for three background conditions (no, low and high background clutter).
ROI: using a repeated-measures ANOVA with a 5 × 3 design, we tested for the interaction between 5 regions-of-interest in the ventral stream (V1, V2, V3, V4, LOC) and 3 background conditions (no, low and high cluttered backgrounds).
Another repeated-measures ANOVA with a 7 × 3 design tested the interaction between 7 regions-of-interest in the dorsal stream (V1, V2, V3, IPS0, IPS1, IPS2, SPL) and 3 background conditions (no, low and high cluttered backgrounds).
When the assumption of sphericity was violated, the degrees of freedom were corrected using the Greenhouse-Geisser
estimates of sphericity.
We first defined ROIs in early visual cortex (V1, V2, V3), in the ventral stream (V4, LOC) and in the dorsal stream (IPS0, IPS1, IPS2, SPL).
Statistic type for inference (see Eklund et al. 2016) We tested whether classification accuracies significantly exceeded chance level. This was done per ROI, and in the whole-brain searchlight it was done voxel-wise. In both cases we tested this with non-parametric, two-tailed Wilcoxon signed-rank tests. In each case the null hypothesis was that the observed classification accuracies came from a distribution with a median at chance-level performance (i.e., 50% for pairwise classification).
Correction The P-values resulting from the Wilcoxon signed rank tests were corrected for multiple comparisons using false discovery
rate at 5% level under the assumption of independent or positively correlated tests.
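The inference scheme described above (two-tailed Wilcoxon signed-rank tests against chance, followed by Benjamini-Hochberg FDR correction under independence or positive correlation) can be sketched as follows. The original analyses were run in MATLAB with CoSMoMVPA, so this SciPy version is only an illustration; function names and the example numbers are invented.

```python
import numpy as np
from scipy.stats import wilcoxon

def above_chance_p(accuracies, chance=0.5):
    """Two-tailed Wilcoxon signed-rank test: H0 is that classification
    accuracies come from a distribution with median equal to chance level."""
    return wilcoxon(np.asarray(accuracies) - chance).pvalue

def fdr_bh(pvals, alpha=0.05):
    """Benjamini-Hochberg FDR at level alpha, valid for independent or
    positively correlated tests; returns a boolean mask of significant tests."""
    p = np.asarray(pvals)
    order = np.argsort(p)
    m = p.size
    below = p[order] <= alpha * np.arange(1, m + 1) / m
    reject = np.zeros(m, dtype=bool)
    if below.any():
        reject[order[: np.max(np.nonzero(below)[0]) + 1]] = True
    return reject

# Example: 10 subjects' pairwise decoding accuracies in one ROI (synthetic)
accs = 0.5 + np.linspace(0.01, 0.10, 10)
print(above_chance_p(accs))               # small p: accuracies exceed 50%
print(fdr_bh([0.001, 0.02, 0.04, 0.9]))   # only the first two survive correction
```

Note that Benjamini-Hochberg rejects every test up to the largest rank whose sorted p-value falls below its rank-scaled threshold, which is why 0.02 survives here even though 0.04 does not.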
Multivariate modeling and predictive analysis For the ROI-based analysis, for each ROI separately we extracted and arranged t-values into pattern vectors
for each of the 48 conditions and 10 runs. To increase the SNR, we randomly binned run-wise pattern
vectors into five bins of two runs each, which were averaged, resulting in five pseudo-run pattern vectors. We then performed fivefold leave-one-pseudo-run-out cross-validation. In detail, we assigned four pseudo-trials per
location condition of the same category to the training set. We then tested the SVM on one pseudo-trial for
each of the same two location conditions, but now from a different category yielding percent classification
accuracy (50% chance level) as output. Equivalent SVM training and testing was repeated for all
combinations of location and category pairs before results were averaged. The result reflects how much
category-tolerant location information was present for each ROI, participant and background condition
separately.
The searchlight procedure was conceptually equivalent to the ROI-based analysis with the difference of the
selection of voxel patterns entering the analysis.
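The cross-decoding scheme described above can be sketched schematically. The study used LIBSVM in MATLAB; this scikit-learn re-implementation on synthetic data is only an illustration, and the category labels, location labels, effect sizes and noise levels are all invented. The key structure is preserved: runs are averaged into five pseudo-runs, the SVM is trained on location discrimination within one category, and tested on the held-out pseudo-run of a different category, so above-chance accuracy reflects category-tolerant location information.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n_runs, n_voxels = 10, 50

def simulate(mean):
    # Synthetic voxel patterns; the location signal is a mean shift shared
    # across categories (invented for illustration).
    return rng.normal(mean, 1.0, size=(n_runs, n_voxels))

data = {cat: {loc: simulate(mean) for loc, mean in [("left", 0.0), ("right", 1.5)]}
        for cat in ("faces", "cars")}

def pseudo_runs(x):
    # Bin the 10 runs into 5 pseudo-runs of 2 runs each and average them.
    return x.reshape(5, 2, n_voxels).mean(axis=1)

accuracies = []
for test_fold in range(5):  # leave-one-pseudo-run-out cross-validation
    train_x, train_y, test_x, test_y = [], [], [], []
    for label, loc in enumerate(("left", "right")):
        train = pseudo_runs(data["faces"][loc])   # train on one category...
        test = pseudo_runs(data["cars"][loc])     # ...test on the other
        for fold in range(5):
            if fold == test_fold:
                test_x.append(test[fold]); test_y.append(label)
            else:
                train_x.append(train[fold]); train_y.append(label)
    clf = SVC(kernel="linear").fit(train_x, train_y)
    accuracies.append(clf.score(test_x, test_y))

print(np.mean(accuracies))  # category-tolerant location decoding (chance 0.5)
```

In the searchlight variant, the same procedure is simply repeated with the voxel pattern restricted to the sphere around each voxel.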