
The spatiotemporal neural dynamics of object location representations in the human brain

Monika Graumann1,2 ✉, Caterina Ciuffi1, Kshitij Dwivedi1,3, Gemma Roig3 and Radoslaw M. Cichy1,2,4 ✉

To interact with objects in complex environments, we must know what they are and where they are in spite of challenging viewing conditions. Here, we investigated where, how and when representations of object location and category emerge in the human brain when objects appear on cluttered natural scene images, using a combination of functional magnetic resonance imaging, electroencephalography and computational models. We found location representations to emerge along the ventral visual stream towards lateral occipital complex, mirrored by gradual emergence in deep neural networks. Time-resolved analysis suggested that computing object location representations involves recurrent processing in high-level visual cortex. Object category representations also emerged gradually along the ventral visual stream, with evidence for recurrent computations. These results resolve the spatiotemporal dynamics of the ventral visual stream that give rise to representations of where and what objects are present in a scene under challenging viewing conditions.

To interact with objects in our environments, the two arguably most basic questions that our brains must answer are what objects are present and where they are. To address the first question and identify an object, we must recognize objects independently of the viewing conditions of a given scene, such as where the object is located. A large body of research has shown that the ventral visual stream1–4, a hierarchically interconnected set of regions, achieves this by transforming retinal input in successive stages marked by increasing tolerance and complexity. At its high stages in high-level ventral visual cortex, object representations are tolerant to changes in retinotopic location5–7.

In contrast, we know considerably less about how the brain determines where an object is located. Current empirical data imply three different theoretical accounts.

One hypothesis (H1) is that object location representations are already present at the early stages of visual processing (H1, Fig. 1a) and thus no further computation is required. Given the idea that ventral stream representations become successively more tolerant to changes in viewing conditions such as location1, it seems plausible that object location representations are to be found at the early stages of the processing hierarchy. Consistent with this view, human studies using multivariate analysis have shown that object location information is often strongest in early visual cortex8,9, likely related to its small receptive field size, which allows for spatial coding with high resolution10.

An alternative account (H2) is that location representations emerge in the dorsal visual stream (H2, Fig. 1a)11. This view is supported by findings from neuropsychology2,4,11 and by studies finding object location information along the dorsal pathway2,12.

A third possibility is that location representations emerge through extensive processing but in the ventral visual stream (H3, Fig. 1a). This view receives support from the observation that object location information was found across the entire ventral visual stream including high-level ventral visual cortex in human5,8,9,13 and non-human primates14. In line with these observations, high-level ventral visual cortex is known to be retinotopically organized15–17 and exhibits an eccentricity bias18–20.

How can we adjudicate between these hypotheses given the mixed empirical support? We propose that it is key to acknowledge the importance of assessing object location representations under conditions that increase the complexity of the visual scene to increase ecological validity. Previous research typically investigated object location representations by presenting cut-out objects on blank backgrounds. This creates a direct mapping between the location of visual stimulation and the active portions of retinotopically organized cortex (Fig. 1b, left). In contrast, in daily life, objects appear on backgrounds cluttered by other elements21,22. This activates a large swath of cortex, independently of where the object is (Fig. 1b, right). Whereas in the former case location information can be directly accessible through retinotopic activation in early visual areas (supporting H1), in the latter case additional processing might be required to distil out location information (supporting H2 or H3).

Taking the importance of background into consideration, we used a combination of methods to distinguish between the proposed theoretical hypotheses. We used functional MRI (fMRI), deep neural networks (DNNs) and electroencephalography (EEG) to assess where, how and when location representations emerge in the human brain. We quantified the presence of location representations by the performance of a multivariate pattern classifier to predict object location from brain measurements.

Assessed in this way, the predictions for the hypotheses are as follows: if H1 is correct, independent of the nature of the object's background, object location information peaks in early visual cortex (Fig. 1c, left), early in the DNN processing hierarchy (Fig. 1d, left) and early during visual processing (Fig. 1e, left). For H2 and H3, the prediction of peak location information depends on the background. For cut-out isolated objects, location information is high across the entire dorsal and ventral pathways, and the processing hierarchy of the DNN (Fig. 1c,d, middle and right, grey).

1Department of Education and Psychology, Freie Universität Berlin, Berlin, Germany. 2Berlin School of Mind and Brain, Faculty of Philosophy, Humboldt-Universität zu Berlin, Berlin, Germany. 3Department of Computer Science, Goethe Universität, Frankfurt am Main, Germany. 4Bernstein Center for Computational Neuroscience Berlin, Berlin, Germany. ✉e-mail: [Link]@[Link]; rmcichy@[Link]

In contrast, for objects appearing on cluttered backgrounds, object location information emerges late in the DNN hierarchy (Fig. 1d, right, blue) and late in time (Fig. 1e, middle and right, blue). H2 and H3 differ in predicting location information to peak in dorsal (Fig. 1c, middle, blue) or ventral visual cortex (Fig. 1c, right, blue), respectively.

To anticipate, our results strongly support H3. When objects appear on cluttered backgrounds, object location representations emerge late in the hierarchy of the ventral visual stream and of the DNN, as well as late in time, indicating recurrent processing. A corresponding analysis of object category representations revealed an equivalent pattern of results, with emergence along the ventral visual stream and temporal dynamics suggesting recurrence. Taken together, our results resolve where, when and how object representations emerge in the human brain when objects are viewed under more challenging viewing conditions.

Results
To investigate where, how and when representations of object location emerge in the brain, we created a visual stimulus set (Fig. 2a) with the three orthogonal factors objects (three exemplars each in four object categories), locations (four quadrants) and backgrounds (three kinds: uniform grey, low- and high-cluttered natural scenes, referred to as 'no', 'low' and 'high' clutter). Collapsing across exemplars, we used a fully crossed design with four categories × four locations × three background conditions, resulting in 48 stimulus conditions. This design allowed us to also investigate representations of object category as a secondary question of the study.

To resolve human brain responses with high spatial and temporal resolution, participants viewed images from the stimulus set while we recorded fMRI (N = 25) and EEG (N = 27) data in separate sessions. Experimental parameters were optimized for each imaging modality (Fig. 2b). On each trial, participants viewed individual stimuli while fixating on a central fixation cross and performing a one-back (fMRI) or a detection task (EEG) to direct participants' attention towards the images (Fig. 2b). Response trials were excluded from analysis.

We used multivariate pattern classification to track the emergence of object location representations. We consider the peaks in information, that is, in classification, as indicators of where (fMRI) and when (EEG) location representations become most untangled and are thus explicitly represented1. In each case, we trained a support vector machine (SVM) to pairwise classify between activation patterns belonging to one object category shown at two different locations (Fig. 3a, faces at bottom left and right). We then tested the SVM on activation patterns of the same locations with a new object category (Fig. 3a, animals at bottom left and right). Repeated for all combinations of locations and categories, the averaged classification accuracy quantifies object location information independent of object category. This procedure was performed in a space-resolved fashion for fMRI and in a time-resolved fashion for EEG (see Supplementary Fig. 1a for details).
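To make this cross-classification scheme concrete, the following is a minimal sketch of a single train/test fold in Python with scikit-learn. The array shapes, variable names and simulated data are illustrative assumptions rather than the authors' analysis code; the full procedure repeats such folds over all pairs of locations and all pairings of training/testing categories and averages the accuracies.

import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n_trials, n_features = 40, 100   # hypothetical: trials x voxels (fMRI) or channels (EEG)

# Training category (for example, faces) at two locations, coded 0 and 1.
X_train = rng.standard_normal((n_trials, n_features))
y_train = np.repeat([0, 1], n_trials // 2)   # bottom left versus bottom right

# Testing category (for example, animals) at the same two locations.
X_test = rng.standard_normal((n_trials, n_features))
y_test = np.repeat([0, 1], n_trials // 2)

# Train on one category, test on the other: above-chance accuracy indicates
# location information that generalizes across object category.
clf = SVC(kernel='linear').fit(X_train, y_train)
print(f'cross-category location decoding: {clf.score(X_test, y_test):.2f} (chance 0.5)')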

Fig. 1 | Hypotheses and predictions about the pathway of object location representations in the human brain. a, H1: representations of object location emerge in early visual cortex and degrade along further processing stages. H2 and H3: object location representations emerge gradually along the dorsal (H2) or ventral (H3) visual stream. b, Left: when objects are presented on a blank background, object location in the visual field maps retinotopically onto early visual cortex, allowing for direct location read-out (grey). Right: when objects appear in a cluttered scene, large parts of early visual cortex are activated, hindering a direct read-out (blue). Representations are quantified as linearly classifiable object location information from brain or model activity patterns1. c, Predictions in space, colour-coded by background condition: no (grey), low (green) and high (blue) clutter. H1 predicts that independent of the object's background, location information for the object is highest in early processing stages in space. H2 and H3 predict similar levels of location information with no clutter across the entire processing pathway in all assessments. For highly cluttered backgrounds, H2 and H3 predict the emergence of location representations in late processing stages of the dorsal (H2, c) and ventral (H3, c) stream. Location information in the low-clutter condition is expected to be in between the no- and the high-clutter condition. d, Computational model of the ventral visual stream. H1 (left) predicts highest location information in early layers of the model in all conditions. H3 (right) predicts high location information across all layers with no clutter and highest location information in late layers with high clutter. Location information in the low-clutter condition is expected to be in between the other two conditions. Since this is a model of the ventral stream, it does not make predictions about the dorsal stream (H2). e, Location information in time. H1 predicts that location information peaks early in time in all conditions. Both H2 and H3 predict an early peak with no and a late peak with high clutter. The peak for low clutter is expected to be in between no and high clutter.



Fig. 2 | Experimental design and tasks. a, Experimental design. We used a fully crossed design with factors of object category, location and background. Note that, for copyright reasons, all example backgrounds shown are for illustrative purposes and were not used in the experiment. b, Tasks. The experimental design was adapted to the specifics of each modality by adjusting the interstimulus interval. On each trial, participants viewed images for 500 ms, followed by a blank interval (0.5–0.6 s in EEG, 2.5 s in fMRI). The task was to respond with a button press to catch trials that were presented on every fourth trial on average. Catch trials were marked by the presence of a probe (glass) in the EEG experiment and by an image repetition (one-back) in the fMRI experiment. Image presentation was followed by a blank screen (1 s in EEG, 2.5 s in fMRI).
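As a minimal illustration of the fully crossed design, the 48 conditions are simply the Cartesian product of the three factors; the label strings below are our own shorthand for the factor levels named in the text.

from itertools import product

categories  = ['car', 'animal', 'face', 'chair']
locations   = ['left_up', 'right_up', 'left_bottom', 'right_bottom']
backgrounds = ['no_clutter', 'low_clutter', 'high_clutter']

conditions = list(product(categories, locations, backgrounds))
assert len(conditions) == 48   # 4 categories x 4 locations x 3 backgrounds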

The locus of object location representations. To determine the locus of object location representations, we used a regions of interest (ROI) fMRI analysis, including early visual regions (V1, V2 and V3) common to the hierarchy of the ventral (V4 and LOC23) and the dorsal visual stream (intraparietal sulcus: IPS0, IPS1, IPS2 and superior parietal lobule (SPL)).

As expected, we found that most regions contained above-chance level location information in all background clutter conditions (Fig. 3b; N = 25, two-tailed Wilcoxon signed-rank test, P < 0.05, FDR corrected; see Supplementary Table 1 for P values). However, the amount of location information depended critically on the brain region and background condition.

Focusing on the ventral visual stream first, we observed similar amounts of location information across regions when objects were presented without clutter (Fig. 3b, grey bars). In contrast, when objects were presented on cluttered backgrounds, location information emerged along the ventral visual processing hierarchy, with less information in early visual areas than in LOC (Fig. 3b, green and blue bars; N = 25, 5 × 3 repeated-measures ANOVA, post hoc t tests Tukey corrected; see Supplementary Table 2 for P values). These results are at odds with H1, which predicts that location information decreases along the ventral stream independent of background condition. Instead, the observed increase of location information along the ventral visual stream with cluttered backgrounds is consistent with H3.

We ascertained these observations statistically with a 5 × 3 repeated-measures ANOVA with factors ROI (V1, V2, V3, V4 and LOC) and background (no, low and high clutter). Besides both main effects (ROI: F(4,96) = 18.30, P < 0.001, partial η² = 0.43; background: F(1.44,34.48) = 64.11, P < 0.001, partial η² = 0.73), we crucially found the interaction to be significant (F(8,192) = 5.40, P < 0.001, partial η² = 0.18). As the interaction makes the main effects difficult to interpret, we conducted post hoc paired t tests (all reported in Supplementary Table 2, Tukey corrected).
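A repeated-measures ANOVA of this kind can be sketched with statsmodels' AnovaRM. The long-format data frame below is a simulated stand-in for the per-subject decoding accuracies; note that AnovaRM assumes sphericity, whereas the fractional degrees of freedom reported above indicate that an additional sphericity correction was applied in the paper.

import numpy as np
import pandas as pd
from statsmodels.stats.anova import AnovaRM

rng = np.random.default_rng(0)
rois = ['V1', 'V2', 'V3', 'V4', 'LOC']
backgrounds = ['no', 'low', 'high']
df = pd.DataFrame([
    {'subject': s, 'roi': r, 'background': b,
     'accuracy': rng.normal(10, 2)}   # decoding accuracy minus chance (%)
    for s in range(25) for r in rois for b in backgrounds
])

# 5 x 3 repeated-measures ANOVA with within-subject factors ROI and background.
res = AnovaRM(data=df, depvar='accuracy', subject='subject',
              within=['roi', 'background']).fit()
print(res.anova_table)   # F and P values for ROI, background and their interaction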


Fig. 3 | fMRI results of location classification. a, Classification scheme for object location across category. We trained an SVM to distinguish between brain activation patterns evoked by objects of a particular category presented at two locations (here: faces bottom left and right) and tested the SVM on activation patterns evoked by objects of another category (here: animals) presented at the same locations. Objects are enlarged for visibility and did not extend into another quadrant in the original stimuli. b, Location classification in early visual cortex, ventral and dorsal visual ROIs (N = 25, two-tailed Wilcoxon signed-rank test, P < 0.05, FDR corrected). With no clutter, location information was high across early visual cortex and ventral ROIs. In the low- and high-clutter conditions, location representations emerged gradually along the ventral stream. In dorsal ROIs, location information was low, independent of background condition. Stars above bars indicate significance above chance (see Supplementary Tables 1, 2 and 3 for P values). Error bars represent s.e.m. Dots represent single-subject data. c, fMRI searchlight result for classification of object location (N = 25, two-tailed Wilcoxon signed-rank test, P < 0.05, FDR corrected). Peak classification accuracy is indicated by colour-coded circles (no clutter: left V3 (grey, XYZ coordinates −19 mm, −97 mm, 13 mm); low clutter: left V1 (green, −5 mm, −86 mm, −3 mm); high clutter: left LOC (blue, −44 mm, −83 mm, 8 mm)). Millimetres (mm) indicate axial slice position along the z axis in Montreal Neurological Institute space. d, Location classification in a DNN. In the high-clutter condition, location information emerged along the processing hierarchy, analogous to the ventral visual stream.

The statistical analysis confirmed all the qualitative observations: there were no significant differences between ROIs in the no-clutter condition, except between V2 and V3 (P = 0.009) and between V2 and V4 (P = 0.001). There was more location information in LOC than in V1, V2 and V3 when background clutter (both low and high) was present than when it was not (Fig. 3b; all P < 0.03, see Supplementary Table 2 for P values, Tukey corrected). This effect was robust for the comparison of locations across, but not within, visual hemifields (Fig. 4a,b): post hoc tests comparing early visual areas versus LOC in the high-clutter condition were significant for the cross-hemifield classification (Fig. 4a; V1: P = 0.003; V2: P < 0.001; V3: P = 0.004, Tukey corrected), but not for the within-hemifield classification (Fig. 4b; V1: P = 0.697; V2: P = 0.281; V3: P = 1.00, Tukey corrected).

Focusing next on the dorsal visual stream, we observed low object location information independent of background condition (Fig. 3b; N = 25, 7 × 3 repeated-measures ANOVA). In the no- and low-clutter conditions, location information was higher in early visual cortex than in dorsal regions (N = 25, post hoc t tests, Tukey corrected; see Supplementary Table 3 for P values). This is inconsistent with H2, which predicts an increase of object location information along the dorsal stream.

Consistent with these qualitative observations, statistical testing by a 7 × 3 repeated-measures ANOVA with factors ROI (V1, V2, V3, IPS0, IPS1, IPS2 and SPL) and background (no, low and high clutter) did not provide statistical evidence for H2. We found significant main (ROI: F(3.16,75.93) = 36.2, P < 0.001, partial η² = 0.60; background: F(2,48) = 35.8, P < 0.001, partial η² = 0.60) and interaction effects (F(6.25,149.89) = 14.5, P < 0.001, partial η² = 0.38). The post hoc tests showed that location information was higher in V1, V2 and V3 compared with dorsal regions in the no- and low-clutter conditions (Fig. 3b, grey and green, except V1 and V2 versus IPS2 and SPL with low clutter, which were n.s.; see Supplementary Table 3 for P values). With high clutter, there was more location information in V3 than in IPS0, IPS1 and IPS2.



Fig. 4 | Location classification within and across hemifields, in IPS3–5 and univariate ROI results. a, Results of location classification across categories between visual hemifields (left up versus right up, left bottom versus right bottom). Similar to the classification across four locations, the repeated-measures ANOVA along the ventral stream (five ROIs × three clutter levels) yielded significant main (ROI: F(4,96) = 24.62, P < 0.001, partial η² = 0.51; background: F(1.49,35.85) = 45.34, P < 0.001, partial η² = 0.65) and interaction effects (F(8,192) = 2.95, P = 0.004, partial η² = 0.11). Post hoc tests yielded results comparable to the main results (V1, V2 and V3 < LOC with high clutter). Stars above bars indicate significance above chance (N = 25, two-tailed Wilcoxon signed-rank test, P < 0.05, FDR corrected). b, Location classification across categories within visual hemifields (left up versus left bottom, right up versus right bottom). As for the main analysis, the ANOVA yielded significant main (ROI: F(4,96) = 4.16, P = 0.004, partial η² = 0.15; background: F(1.60,38.43) = 57.90, P < 0.001, partial η² = 0.71) and interaction effects (F(8,192) = 5.84, P < 0.001, partial η² = 0.20). The post hoc tests revealed a significant difference between V3 and LOC in the no-clutter condition (P = 0.030). Stars above bars indicate significance above chance (N = 25, two-tailed Wilcoxon signed-rank test, P < 0.05, FDR corrected). c, Classification accuracies in IPS3, IPS4 and IPS5 were not significantly higher than chance level in all background conditions (N = 25, two-sided Wilcoxon signed-rank test, P > 0.05, FDR corrected). Error bars represent s.e.m. Dots represent single-subject data. d, Absolute t values in each background condition and ROI, averaged across locations and categories. A 9 × 3 repeated-measures ANOVA with factors ROI and clutter revealed a significant main effect of ROI (F(2.60,62.43) = 9.19, P < 0.001, partial η² = 0.18) and a significant interaction effect (F(3.40,81.64) = 9.89, P < 0.001, partial η² = 0.03). Significant post hoc tests are listed in Supplementary Table 4. Overall, post hoc tests showed no clear pattern of results between early, ventral and dorsal areas, except for higher activation in V1 than in dorsal areas and LOC with no clutter. Stars above bars indicate significance above chance (N = 25, two-tailed Wilcoxon signed-rank test, P < 0.05, FDR corrected).

Location classification in IPS3, IPS4 and IPS5 did not reveal significant information above chance level (Fig. 4c; N = 25, two-tailed Wilcoxon signed-rank test, all P > 0.05, FDR corrected; see Supplementary Table 1 for P values). Univariate responses were comparable across regions overall (Fig. 4d).
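The recurring significance test (per-subject accuracies tested against chance with a two-tailed Wilcoxon signed-rank test, FDR corrected across tests) can be sketched as follows; the simulated accuracies and the ROI labels are placeholders, not the authors' data or code.

import numpy as np
from scipy.stats import wilcoxon
from statsmodels.stats.multitest import multipletests

rng = np.random.default_rng(0)
acc = rng.normal(5, 3, size=(9, 25))   # accuracy minus chance: 9 ROIs x 25 subjects

# Two-tailed Wilcoxon signed-rank test of the median against zero, per ROI.
pvals = np.array([wilcoxon(roi_acc)[1] for roi_acc in acc])

# Benjamini-Hochberg FDR correction across the ROI-wise tests.
reject, pvals_fdr, _, _ = multipletests(pvals, alpha=0.05, method='fdr_bh')
rois = ['V1', 'V2', 'V3', 'V4', 'LOC', 'IPS0', 'IPS1', 'IPS2', 'SPL']
print(dict(zip(rois, reject)))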

Post hoc tests to a 9 × 3 repeated-measures ANOVA with factors ROI (V1, V2, V3, V4, LOC, IPS0, IPS1, IPS2 and SPL) and background (no, low and high clutter) revealed that responses were significantly higher in V1 compared with the other ROIs in the no-clutter condition (all P < 0.03; all P values listed in Supplementary Table 4, Tukey corrected), but there was no significant difference in activation between LOC and dorsal areas (Fig. 4d, all P values in Supplementary Table 4, Tukey corrected).

To explore whether any other brain regions beyond the investigated ROIs contain location information, we used a spatially unbiased fMRI searchlight analysis24. We did not find statistical evidence for location information beyond the ventral and dorsal stream, and the pattern of results was consistent with the outcome of the ROI analysis (Supplementary Fig. 2). There was widespread location information (N = 25, two-tailed Wilcoxon signed-rank test, P < 0.05, FDR corrected) from the occipital cortex up into the dorsal (precuneus, superior parietal lobule) and ventral (fusiform gyrus) visual stream. Depending on background condition, location information peaked in different visual areas. In the no-clutter condition, the peak was in left V3, in the low-clutter condition in left V1 and in the high-clutter condition in left LOC (Fig. 3c, see caption for coordinates). Distances between peaks were significantly larger than chance (N = 25, bootstrapping of condition labels, 10,000 bootstraps, P < 0.05, one-tailed bootstrap test against chance level, Bonferroni corrected) between the no- and the high-clutter condition (Euclidean distance 15.9, CI 1.0–3.6, P < 0.001) and between the low- and the high-clutter condition (Euclidean distance 22.0, CI 2.0–16.3, P = 0.002), but not for the no- and low-clutter condition (Euclidean distance 13.6, CI 1.4–14.7, P = 0.275).

Together, these results provide consistent evidence for the hypothesis that representations of object location across visual hemifields emerge in the ventral visual stream (H3) when objects appear in cluttered scenes.

Computational modelling. DNNs trained on object categorization are currently the best predicting models of ventral visual stream representations25–27 and show a spatiotemporal correspondence in their processing hierarchy to the visual brain25,28–30. Therefore, they constitute feasible biologically inspired models for computing complex visual representations28,31. If such DNNs are useful models of visual processing in human visual cortex, they should show a similar pattern of results as the ventral visual stream in the representation of object location, too.

To evaluate this prediction, we chose the recurrent CORnet-S model because it is among the best-performing models on a benchmark for predicting neural responses in monkey inferior temporal cortex (IT)26,27 and approximates explicitly the hierarchy of the ventral visual system. Each region of the ventral stream is modelled as one processing block with a corresponding name (V1Cor, V2Cor, etc.). Analogous to the fMRI analysis, we extracted the unit activation patterns to our stimulus set at the last layer of each block and classified object location across category to identify the processing stage of the DNN at which object location representations emerge (Fig. 3d).
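Reading out block-wise activations can be sketched with PyTorch forward hooks. Loading CORnet-S itself is omitted here (see the public dicarlolab/CORnet repository); the block names 'V1', 'V2', 'V4' and 'IT' follow the paper's description of the model hierarchy and are an assumption about the module naming. The helper works for any torch.nn.Module exposing the named submodules.

import torch

def collect_block_activations(model, images, block_names=('V1', 'V2', 'V4', 'IT')):
    # Run one forward pass and return {block name: flattened activations}.
    store, handles = {}, []
    modules = dict(model.named_modules())
    for name in block_names:
        def hook(_module, _inputs, output, name=name):
            store[name] = output.detach().flatten(start_dim=1)  # (batch, units)
        handles.append(modules[name].register_forward_hook(hook))
    with torch.no_grad():
        model(images)
    for h in handles:
        h.remove()
    return store

# The per-block activation patterns then enter the same SVM classification of
# object location across category as used for the brain data, for example:
# activations = collect_block_activations(cornet_s_model, stimulus_batch)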
We found that in the no- and low-clutter conditions, location information was at or close to ceiling in all layers. In the high-clutter condition, however, location information was low in V1Cor and emerged along the processing hierarchy. Qualitatively equivalent results were obtained in three other DNNs (AlexNet, ResNet-50 and CORnet-Z; Supplementary Fig. 3a–c), demonstrating the generalizability of the results pattern. This result was still robust in all four DNNs when limiting the classification to either horizontal or vertical location comparisons (Supplementary Fig. 3e,f).

In sum, we found that DNNs trained on object categorization show a similar pattern of location representations along their processing hierarchy as the human brain. This demonstrates how object location representations might be computed in biological systems. This result lends independent evidence against H1 and yields plausibility to H3, since CORnet-S was built to model the ventral stream. However, this result cannot disambiguate between H2 and H3, as models of this kind have been found to predict human brain activity in both the ventral and dorsal stream32,33.

Temporal dynamics of object location representations. We conducted time-resolved multivariate EEG analysis to determine the time course with which object location representations emerge. The general analysis scheme was the same as for the fMRI analysis presented above (Fig. 3a) but applied to time-specific EEG channel activation patterns rather than fMRI activation patterns.

The analysis revealed location information for all background clutter levels (Fig. 5a; N = 27, two-tailed Wilcoxon signed-rank test, P < 0.05, FDR corrected), but with different dynamics (Fig. 5b; see Supplementary Table 5 for classification onsets and peak values). We report peak latencies with 95% confidence intervals (N = 27, 10,000 bootstraps). Whereas the peak latency was similar for the no- (140 ms (133–147 ms)) and the low-clutter (133 ms (121–233 ms)) condition, it was delayed in the high-clutter condition (317 ms (250–336 ms)). Statistical analysis (N = 27, bootstrap test, 10,000 bootstraps, P < 0.05, one-tailed bootstrap test against zero, FDR corrected) ascertained that the peak latency difference was significant between the high-clutter and the no-clutter conditions (N = 27, 177 ms (94–190 ms), P < 0.001) and between the high- and the low-clutter conditions (184 ms (16–196 ms), P = 0.023), but not between the no- and the low-clutter conditions (7 ms (−11–156 ms), P = 0.620). These delays were also robust when classifying locations across or within visual hemifields (Supplementary Fig. 4). A searchlight in EEG sensor space showed that location information at the peaks of the three background conditions was highest at occipital, occipito-parietal and occipito-temporal electrodes (Fig. 5c; N = 27, two-tailed Wilcoxon signed-rank test, P < 0.05, FDR corrected across electrodes and time points; see Supplementary Fig. 5a–c for time courses), suggesting sources in those areas, which is in line with the fMRI searchlight results (Supplementary Fig. 2) and with univariate EEG topographies (Supplementary Fig. 5d–f).

In sum, this result shows that object location representations emerge later when objects appear on cluttered backgrounds than when they appear on blank backgrounds. This provides further concurrent evidence against H1 and is consistent with H2 and H3, that is, that object location representations emerge at late stages of visual processing when objects are viewed under complex visual conditions.

How is the delay in the peak latencies of the no- and the high-clutter conditions to be interpreted? Assuming that in object processing the brain runs through a series of distinct stages, we see two possible explanations.

One explanation is that the peak latency delay indicates a change in the processing stage at which object location representations emerge. This would mean that in the no-clutter condition location representations emerge at an early stage, whereas with high clutter they emerge during a different, later processing stage (the 'change' hypothesis). An alternative explanation is that the processing stage at which object location representations emerge remains the same, but its emergence is delayed in time in the high-clutter condition (the 'delay' hypothesis).

To distinguish between these explanations, we used temporal generalization analysis34, comparing the representational dynamics with which object location representations emerge in the no- and the high-clutter conditions across time (Fig. 5d). Used in this way, the time generalization analysis yields a two-dimensional matrix indexed in time, indicating at which time points location representations in the no- and the high-clutter conditions are similar. We implemented time generalization by classifying object location across category and background condition for all time point combinations (Fig. 5d and Supplementary Fig. 1b).
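A minimal sketch of the temporal generalization scheme just described: train a classifier at every time point of the no-clutter data and test it at every time point of the high-clutter data, filling the train-time × test-time accuracy matrix. Shapes and names are illustrative assumptions, not the authors' code.

import numpy as np
from sklearn.svm import SVC

def time_generalization(X_train, y_train, X_test, y_test):
    # X_* have shape (trials, channels, time points); y_* code object location.
    n_times = X_train.shape[2]
    acc = np.zeros((n_times, n_times))   # rows: training time; columns: testing time
    for t_train in range(n_times):
        clf = SVC(kernel='linear').fit(X_train[:, :, t_train], y_train)
        for t_test in range(n_times):
            acc[t_train, t_test] = clf.score(X_test[:, :, t_test], y_test)
    return acc

# Under the 'delay' hypothesis, above-chance accuracies fall below the diagonal
# (later test than train times); under the 'change' hypothesis they stay on it.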



Fig. 5 | Temporal dynamics of object location representations. a, Results of time-resolved location classification across category from EEG data. Results are colour coded by background condition, with significant time points indicated by lines below curves (N = 27, two-tailed Wilcoxon signed-rank test, P < 0.05, FDR corrected) and 95% CIs of peak latencies indicated by lines above curves. Shaded areas around curves indicate s.e.m. Inset text at arrows indicates peak latency (140 ms, 133 ms and 317 ms in the no-, low- and high-clutter condition, respectively). b, Comparison of peak latencies of curves in a. Error bars represent 95% CI. Stars indicate significant peak latency differences (P < 0.05; N = 27, bootstrap test with 10,000 bootstraps). c, Results of location across category classification searchlight in EEG channel space at peak latencies in the no-, low- and high-clutter condition, down-sampled to 10 ms steps. Significant electrodes are marked in grey (N = 27, two-tailed Wilcoxon signed-rank test, P < 0.05, FDR corrected across electrodes and time points). d, Time generalization analysis scheme for classifying object location across category and background condition. The classification scheme was the same as in a with the differences that (i) the training set conditions always came from the no-clutter condition while the testing set conditions came from the high-clutter condition and (ii) training and testing were repeated across all combinations of time points for a peri-stimulus time window between −100 and 600 ms (see Supplementary Fig. 1b for details). Objects are enlarged for visibility and did not extend into another quadrant in the original stimuli. e, Results of the time generalization analysis. Dashed black lines indicate stimulus onset; the oblique black line highlights the diagonal. Solid white outlines indicate significant time points (N = 27, two-tailed Wilcoxon signed-rank test, P < 0.05, FDR corrected). The dashed white outline highlights delayed clusters. f, EEG–fMRI fusion. Results represent the correlations between single-subject fMRI RDVs of classification accuracies and group-averaged RDVs of the EEG peaks in a. Stars above bars indicate significance above chance (N = 27, two-tailed Wilcoxon signed-rank test, P < 0.05, FDR corrected). Error bars represent the s.e.m. Dots represent single-subject data points.

Overall, we observed a large significant cluster of above-chance classification accuracies across the time generalization matrix (N = 27, two-tailed Wilcoxon signed-rank test, P < 0.05, FDR corrected). While the 'change' hypothesis predicts highest classification accuracies on the diagonal, the 'delay' hypothesis predicts highest classification accuracies below the diagonal. The results are reported in Fig. 5e. We found that peak latencies in location information, as tested across subjects, were significantly shifted below the diagonal (mean Euclidean distance 56.31 ms; N = 27, two-tailed Wilcoxon signed-rank test, P < 0.001, r = 0.65, s.e.m. 1.55; see Supplementary Fig. 6a for single-subject peaks), indicating that location representations in the no-clutter condition generalized to the high-clutter condition at later time points (Fig. 5e, white dashed outline). This result was confirmed in a supplementary analysis on the group-averaged peak in Fig. 5e (Euclidean distance 49.50 ms; 10,000 bootstraps; one-tailed bootstrap test against zero, P = 0.010; 95% CI 14.14–77.78). Classification accuracies were significantly higher below than above the diagonal between ~120 and 240 ms in the no-clutter condition and from ~200 ms in the high-clutter condition (N = 27, two-tailed Wilcoxon signed-rank test, P < 0.05, FDR corrected; Supplementary Fig. 6b). Together, these results provide evidence for the 'delay' hypothesis and demonstrate that object location representations in the no- and the high-clutter condition emerge at the same processing stage with a temporal delay.

Spatiotemporal similarity of location representations. Temporal delays for the same processing stage cannot be explained by a purely feedforward process, suggesting instead the involvement of recurrent processing. Recurrent processes could account for the observed delay with lateral connections within the same area35,36. The shared processing stage underlying early and late location representations in the no- and the high-clutter conditions should have a common origin in space, too. Based on the fMRI results, we hypothesized that this origin would be in LOC. To test this hypothesis directly, we used EEG–fMRI fusion based on representational similarity of object location representations37–39.

The processing stage at which location representations emerge corresponds to the peak latency of location classification in the EEG for the no- and the high-clutter condition. We thus determined whether location representations identified with EEG at these time points are representationally similar to those identified with fMRI in ventral stream regions for the no- and the high-clutter condition separately. Specifically, we averaged the representational dissimilarity vectors (RDVs) of the time-resolved EEG classification accuracies in Fig. 5a across subjects and time points within the 95% confidence intervals over the peaks. This yielded one RDV per background condition that was then correlated with the single-subject RDV of an fMRI ROI in the same background condition. Results within background and ROI were averaged across fMRI participants.

We found a spatiotemporal correspondence with EEG peak latency for the no-clutter condition in V4 and LOC, but for the high-clutter condition in LOC only (Fig. 5f; N = 25, two-tailed Wilcoxon signed-rank test, P < 0.05, FDR corrected). This establishes LOC as the cortical locus at which object location representations emerge independent of background condition, but involving additional recurrent processing when the background is cluttered. Post hoc tests to a 5 × 2 repeated-measures ANOVA with factors ROI (V1, V2, V3, V4 and LOC) and clutter (no and high) additionally showed that correlations were higher in V4 and LOC than in V1, V2 and V3 with no clutter (see Supplementary Table 6 for P values; main effect of ROI: F(4,96) = 14.30, P < 0.001, partial η² = 0.37; n.s. main effect of background: F(1,24) = 3.62, P = 0.069; interaction: F(4,96) = 8.17, P < 0.001, partial η² = 0.25). The notion that location representations emerge in LOC with recurrence when the background is cluttered finds further support from a supplementary analysis showing that location representations with no and high clutter were significantly similar in LOC, but not in other regions (Supplementary Fig. 7; N = 25, two-tailed Wilcoxon signed-rank test, P < 0.05, FDR corrected). Furthermore, recurrent DNNs showed an advantage compared with shallow feedforward DNNs for the classification of location with high clutter and for the prediction of location representations in LOC (Supplementary Fig. 3c,d; N = 25, 4 × 2 repeated-measures ANOVA). Together, these results suggest that location information of objects on highly cluttered scenes emerges in LOC with local recurrent processes.

Object category representations. The observation that representations of object location depend on the background on which the object appears immediately raises the question of whether representations of object category are affected by background, too. Previous research suggests opposite answers to this question. One line of research demonstrated that object representations in the ventral stream are modulated by the presence of other objects and the background on which they are viewed40–43. Another line of research has provided strong evidence that the ventral stream constructs object representations that are increasingly tolerant to changes in viewing conditions1,5,8, suggesting that object category representations should be unaffected by the background of the objects. Here we bring these two lines of research together by explicitly investigating how background impacts object category representations that are tolerant to location. To do this, we analysed EEG and fMRI data as described in previous sections but exchanging the roles of the experimental factors location and category. In essence, we performed cross-classification analyses of category across location (Fig. 6a) to determine where and when location-tolerant object category representations emerge in the human brain.

The locus of object category representations. We investigated object category representations tolerant to changes in location using an ROI-based fMRI analysis. We observed that location-tolerant object category could be classified in the ventral stream in V4 and LOC (Fig. 6b; N = 25, two-tailed Wilcoxon signed-rank test, P < 0.05, FDR corrected, all P values in Supplementary Table 1), but not at earlier stages and not in dorsal ROIs, except IPS0 with high clutter (P = 0.005). This pattern was not influenced by the level of clutter, suggesting that object category representations that are tolerant to location variations are unaffected by the clutter level of the background on which the object appears.

These observations were statistically ascertained in a 5 × 3 ANOVA along the ventral stream with factors ventral ROI (V1, V2, V3, V4 and LOC) and background (no, low and high clutter), revealing a significant main effect of ROI (F(2.42,58.03) = 21.97, P < 0.001, partial η² = 0.48), but not of background (F(2,48) = 0.68, P = 0.510) and no interaction (F(8,192) = 1.85, P = 0.070; see Supplementary Fig. 8 for searchlight results and Supplementary Table 7 for post hoc tests, Tukey corrected). In the 7 × 3 repeated-measures ANOVA along the dorsal stream with factors ROI (V1, V2, V3, IPS0, IPS1, IPS2 and SPL) and background (no, low and high clutter) we found no significant main effect (ROI: F(6,144) = 1.38, P = 0.227; background: F(2,48) = 0.94, P = 0.396) or interaction effect (F(12,288) = 0.96, P = 0.463).

In sum, our results confirm that the ventral stream constructs object representations that are robust to changes in viewing conditions and show in particular that location-tolerant category representations emerge in the ventral stream unaffected by the clutter level in the object's background.

Object category representations in time. Emergence of object category representations can be delayed, for example when objects are occluded or are hard to categorize44–46. This suggests that object category representations might emerge with a delay also when objects appear on cluttered backgrounds, for example because additional grouping and segmentation operations are necessary that depend on recurrence and hence require additional time47–49.

We therefore investigated whether background clutter influences the timing with which location-tolerant category representations emerge using time-resolved multivariate EEG analysis (Fig. 6c). We found that object category could be reliably classified for all background conditions from the EEG data (Fig. 6c; N = 27, two-tailed Wilcoxon signed-rank test, P < 0.05, FDR corrected), but with distinct temporal dynamics (see Supplementary Table 5 for classification onsets and peak values). Classification peaks were 18 ms later in the high-clutter than in the no- and the low-clutter conditions (no clutter: 215 ms (213–219 ms); low clutter: 215 ms (203–236 ms); high clutter: 233 ms (214–303 ms)). The delay (95% difference CI no clutter: 16–173 ms, P < 0.001; low clutter: 13–171 ms, P = 0.029) was significant (N = 27, bootstrap test, 10,000 bootstraps, P < 0.05, one-tailed bootstrap test against zero, FDR corrected; Fig. 6d). Location-independent category information at the peaks of the three background conditions was most pronounced at occipital and temporal electrodes, as revealed in the EEG searchlight in sensor space (Fig. 6e and Supplementary Fig. 5g–i; N = 27, two-tailed Wilcoxon signed-rank test, P < 0.05, FDR corrected across electrodes and time points). This is in line with the results from the fMRI searchlight analysis (Supplementary Fig. 8), together suggesting neural sources of the peaks in Fig. 6c in occipital and temporal regions.
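The fusion step described above reduces to a rank correlation between two representational dissimilarity vectors. A hedged sketch with hypothetical inputs (one pairwise decoding accuracy per condition pair, in matching order across modalities):

import numpy as np
from scipy.stats import spearmanr

def fuse_rdvs(eeg_rdv: np.ndarray, fmri_rdv: np.ndarray) -> float:
    # Spearman's R between a (group-averaged) EEG RDV at the peak latency
    # and one subject's fMRI ROI RDV, as described in the text.
    rho, _p = spearmanr(eeg_rdv, fmri_rdv)
    return rho

# Per background condition: correlate the EEG peak RDV with each fMRI
# subject's ROI RDV, then average the correlations across fMRI subjects.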



Fig. 6 | Spatial and temporal dynamics of object category representations. a, Classification scheme of category across location. b, Location-tolerant category representations in the ventral and dorsal streams. Stars indicate classification above chance level (two-tailed Wilcoxon signed-rank test, P < 0.05, FDR corrected). Conventions as in Fig. 3b. c, Results of the time-resolved category classification across locations from EEG activation patterns. Conventions and statistics as in Fig. 5a. d, Peak latencies of curves in c. Statistics and conventions as in Fig. 5b. e, Results of searchlight in EEG channel space at peak latencies in the no-, low- and high-clutter condition, down-sampled to 10 ms steps. Significant electrodes are marked in grey (N = 27, two-tailed Wilcoxon signed-rank test, P < 0.05, FDR corrected across electrodes and time points). f, Time generalization analysis scheme for classifying object category across location and background condition. g, Results of the time generalization analysis (N = 27, two-tailed Wilcoxon signed-rank test, P < 0.05, FDR corrected). Conventions as in Fig. 5e.

Univariate EEG activity was strongest in occipital rather than in temporal electrodes (Supplementary Fig. 5j–l). Together, this shows that cortical processing of object category requires more time when objects appear in cluttered scenes compared with artificial blank backgrounds.

Analogous to the delay in location processing (Fig. 5a,b,e), we asked whether this delay indicates a temporal shift in the processing cascade or reflects a change to a later processing stage. To disambiguate, we classified object category across locations in a time generalization analysis across the no- and the high-clutter conditions (Fig. 6f). We identified three main clusters of high classification accuracy with timing corresponding roughly to the timing of the three peaks observed in the time courses of the no- and the high-clutter conditions (Fig. 6g; see Supplementary Table 8 for timing details). To test whether category information in the no-clutter condition generalized to later time points in the high-clutter condition and thus was shifted below the diagonal, we computed the single-subject distances from the peak in the time generalization matrix to the diagonal. Category information peaks were significantly shifted below the diagonal as tested across subjects (mean Euclidean distance 27.24 ms; N = 27, two-sided Wilcoxon signed-rank test, P = 0.025, r = 0.43, s.e.m. 2.50; single-subject peaks shown in Supplementary Fig. 6c), but not as tested for the group-averaged peak (Euclidean distance 28.28 ms; 10,000 bootstraps; one-tailed bootstrap test against zero, P = 0.230; 95% CI −7.07 to 35.35). Classification accuracies were significantly higher below than above the diagonal from ~190 ms (no clutter) and ~240 ms (high clutter) until ~360 ms (no clutter) and ~400 ms (high clutter) (Supplementary Fig. 6d). This pattern of results suggests that object category representations of objects on blank and cluttered backgrounds emerge at a similar processing stage. This stage emerges with a delay when objects are presented on cluttered backgrounds, indicating recurrent processing.

Discussion
Using multivariate analysis of fMRI and EEG data and computational model comparison, we resolved where, how and when object location representations emerge in the human brain. Our results are threefold and depend crucially on whether objects appeared on cluttered backgrounds or on blank backgrounds. First, location representations emerged along the ventral visual pathway and peaked in region LOC when objects were viewed on cluttered backgrounds. Second, this pattern of results was mirrored in DNNs trained on object categorization. Third, location representations emerged later in time when objects were viewed on cluttered backgrounds than when viewed on blank backgrounds. In-depth analysis suggested that this delay indexed recurrent processing in LOC. Together, these results provide converging evidence against the hypothesis that object location is processed in early visual cortex (H1), and in addition the results in space provide evidence for the hypothesis that object location emerges along the ventral stream (H3, Fig. 1a). A corresponding analysis of object category representations revealed an equivalent emergence in the ventral visual stream, and a delay when objects appear on cluttered backgrounds due to a temporal shift in the processing cascade, related to recurrent processing. Thus, the two arguably most fundamental properties of objects, that is, what the object is and where it is, emerge in the ventral visual stream with a similar spatiotemporal processing pattern.

Our fMRI results single out the ventral stream with a peak in LOC (H3), rather than early visual areas (H1) or the dorsal stream (H2), as the processing hierarchy responsible for computing object location in the human brain when objects appear on cluttered backgrounds. This concurs with a primate study14 that found category-orthogonal object representations to emerge in IT (the putative homologue of human LOC50) rather than V4. Together, these results indicate that object location representations emerge along the ventral stream towards LOC when viewing conditions are realistic and challenging.

We observed that location representations with high clutter increased along the ventral stream for the classification of cross- but not within-hemifield locations. This pattern of results might be due to several factors. For one, statistical power is reduced when assessing results of cross- and within-hemifield location classification separately rather than combined, the test for which our study was originally planned. Second, cross-hemifield location representations might be more distinguishable as there is less integration of location information across than within hemispheres: cross-hemifield integration requires trans-callosal connections, whereas within-hemifield integration does not. Third, factors unrelated to location representations that nevertheless affect the hemispheres differently, such as possible vascular changes, can contribute to the effect. Importantly, we do not see a difference between within- versus across-hemifield classification in the high-clutter condition in the EEG and DNN results, supporting our main conclusions and suggesting that the discrepancy in the fMRI results might be related to a decreased signal-to-noise ratio.

When objects are viewed on blank backgrounds rather than on cluttered backgrounds, location information can be read out from V1 because there is a direct mapping from stimulus location to the retinotopic location in V1 that is activated. With clutter, there is no such mapping (Fig. 1b) and therefore visual input is processed through the ventral visual stream cascade, where LOC but not V1 reliably indicates object location representations. Under this assumption, location information in V1 might be an epiphenomenon caused by artificial stimulation conditions, revealing information that can be measured by the experimenter but is not necessarily used by the brain51–53 or relevant for behaviour at this stage of processing. Our results thus further emphasize the importance of increasing image complexity to increase the ecological validity of experimental stimuli21. While our study was designed to establish the presence and nature of object location representations in the brain, it cannot establish the behavioural relevance of those representations. Future studies could investigate this, for example, by using speeded detection tasks for objects presented in different locations and relating detection speed and performance to location representations across the brain.

Our results are seemingly at odds with neuropsychological findings showing that patients with ventral lesions performed well on localization tasks2. However, later studies showed that in fact just localization behaviour was intact in those patients54–56, but not location perception. It is conceivable that these patients recruited sparse location information from spared early visual areas to accomplish the localization tasks (similar to blindsight) and that tasks involving more cluttered displays would have been more challenging for these patients. In line with this, other patients with occipito-temporal lesions had problems with tasks requiring figure–ground segmentation57 or perceptual grouping58, both of which are essential to dissect an object from its background in a cluttered scene. Thus, neuropsychological studies taking background clutter into account are necessary to resolve this issue.

While we do observe location information in dorsal and ventral regions anterior and medial from LOC, the fMRI searchlight analysis (Supplementary Fig. 2) shows the peak in LOC. Why did location information not peak in other high-level ventral or dorsal areas? It is possible that IPS would represent object location more prominently if we optimized our stimulus selection for it by including tools51. However, the univariate response profile of the dorsal and ventral ROIs in our study tentatively suggests comparable activations across ROIs (Fig. 4d and Supplementary Table 4), indicating that univariate activation was not the source of lower information in IPS. Likewise, it is possible that different stimuli (for example, faces) would have yielded stronger effects in other high-level, category-selective ventral regions (for example, fusiform face area or occipital face area).

Nature Human Behaviour | VOL 6 | June 2022 | 796–811 | [Link]/nathumbehav 805


Articles NaTure Human BehaviOur

receptive field properties for the eccentricities used in this study59,60, interact with objects in our environment. Our results reveal the
which allows it to encode object location on clutter better than other basis of this knowledge by revealing representations of location and
high-level ventral ROIs. These questions need more investigation in category in the human brain when viewing conditions are challeng-
future research. ing, as encountered outside of the laboratory. Both object location
Our empirical findings were reinforced by the observation that and category representations emerge along the ventral visual stream
representations of object location emerge in DNNs in a similar way towards LOC and depend on recurrent processing. Together, our
as they emerge in the human brain. Importantly, the DNNs used results provide a spatiotemporally resolved account of object vision
here were trained on object categorization and not localization. Our in the human brain when viewing conditions are cluttered.
results thus show that representations of object properties for which
the network is not optimized can emerge in such networks14. One Methods
limitation of our approach is that the models used here were specifi- Participants in EEG and fMRI experiments. The experiment was approved
cally designed to model the ventral visual stream25–30, even though by the ethics committee of the Department of Education and Psychology of the
Freie Universität Berlin (ethics reference number 104/2015) and was conducted
they have been shown to predict brain responses in the dorsal in accordance with the Declaration of Helsinki. Twenty-nine participants
stream, too32,33. Therefore, the presented modelling results cannot participated in the EEG experiment, of whom two were excluded because
distinguish between H2 and H3. Future studies could compare loca- of equipment failure (N = 27, mean age 26.8 years, s.d. 4.3 years, 22 female).
tion representations in DNNs that model the dorsal versus the ven- Twenty-five participants (mean age 28.8 years , s.d. 4.0 years, 17 female) completed
the fMRI experiment. The participant pools of the experiments did not overlap
tral stream and investigate how the model’s representations relate to
except for two participants. Sample size was chosen to exceed comparable
brain representations in the two streams. magnetoencephalography, EEG and fMRI classification studies to enhance
The time-resolved EEG analyses and the EEG–fMRI fusion power8,9,43,66–68. All participants had normal or corrected-to-normal vision and
analysis38 revealed together that location representations of objects no history of neurological disorders. All participants provided informed
with high clutter were delayed due to a temporal shift within the consent prior to the studies and received a monetary reward or course credit
for their participation.
same processing stage in LOC. Since temporal delays at the same
processing stage cannot be explained purely by a feedforward Experimental design. To enable us to investigate the representation of object
neural architecture, this indicates the involvement of recurrence. location, category and background independently, we used a fully crossed design
Physiologically, this might be implemented via lateral connections with factors of category (four values: animals, cars, faces and chairs; Fig. 2a, left,
within LOC, resulting in slower information accumulation61,62. with three exemplars per category), location (four values: left up, left bottom, right
Furthermore, we found not only location but also object category up and right bottom; Fig. 2a left centre) and background clutter (three values:
no, low and high clutter; Fig. 2a, right centre). This amounted to 144 individual
representations to be delayed when objects were superimposed on condition combinations (12 object exemplars × 4 locations × 3 background clutter
natural scenes. Together with previous reports that object category levels). We analysed the data at the level of category, effectively resulting in 48
processing can be delayed when objects are degraded, occluded or experimental conditions (4 categories × 4 locations × 3 background clutter levels).
are hard to categorize44–46,48, our results add to the emergent view
that recurrent computations are critically involved in the process- Stimulus set generation. The stimulus material was created by superimposing
three-dimensional (3D) rendered objects (Fig. 2a, left) with Gouraud shading
ing of fundamental object properties such as what objects are62 and
in one of four image locations (Fig. 2a, left centre) onto images of real-world
where they are in real world vision. Future studies could provide backgrounds (Fig. 2a, right centre).
more direct evidence for recurrence by manipulating it experi- In detail, in each category, one of the objects was rotated by 45°, one by
mentally, for example, by adding a masking condition to the study 22.5° and the third by −45° with respect to the frontal view to introduce equal
design used here. variance in the viewing angle for each category. Locations were in the four
quadrants of the screen (Fig. 2a, left centre). Expressing locations in degrees of
We find that both object category and object location representa- visual angle, the object’s centre was 3° visual angle away from the vertical and
tions emerged gradually along the ventral visual stream. This might horizontal central midlines (that is, 4.2° from image centre; Fig. 2a, right). The
seem counter-intuitive, given that transformations that lead to the size of the objects was adjusted so that all of them fitted into one quadrant of the
emergence of category representations in LOC have been linked to aperture, while maintaining a similar size (mean (s.d.) size: vertical, 2.4° (0.4°);
building increasing tolerance to viewing conditions, in particular horizontal, 2.2° (0.6°)).
We used backgrounds with three different clutter levels: no, low and high
to changes in object location5–7. However, this apparent contradic- (Fig. 2a, right centre; note that example backgrounds shown here are for illustrative
tion is qualified by the observation that the observed tolerance to purposes and were not used in the experiment. The original stimulus material
changes in viewing conditions is graded rather than absolute63, mir- is available for download together with the data). We defined clutter as the
rored by the presence of cells in high-level ventral visual cortex with organization and quantity of objects that fill up a visual scene69. In the no-clutter
large overlapping receptive fields10,17. Such tuning properties provide condition, the background was uniform grey. In the low- and the high-clutter
condition, we selected a set of 60 natural scene images each from the Places365
the spatial resolution needed for localization64, while also providing database ([Link] that had low or high
robustness to location translation65, needed for object categorization. clutter, respectively, and did not contain objects of the categories defined in our
In this study, we deliberately avoided congruence between objects experimental design (that is, no animals, cars, faces or chairs). We converted the
and backgrounds, which is known to lead to interaction effects with images to greyscale and superimposed a circular aperture of 15° visual angle. The
category processing40. However, this deviation from normality in visual angle was the same in the EEG and fMRI experiments.
We confirmed that our selection of low- and high-clutter images was
our stimulus set might have triggered mismatch responses that lead appropriate by an independent behavioural rating experiment (N = 10) in
to additional recurrent processing for disambiguation or attentional which participants rated clutter level on a scale from 1 to 6 (mean (s.d.) clutter
responses triggered by atypical object appearance (for example, size image rating: low clutter, 2.52 (0.85); high clutter, 5.04 (0.87); the difference was
and texture). Further, because objects and backgrounds did not significant: N = 10, paired-sample t test, P < 0.0001, t = 14.96).
form a coherent scene, objects and backgrounds might have been From the set of 60 low- and high-clutter images, we selected 48, one for
each experimental condition of our experimental design. We then randomly
represented more independently. Another design limitation is that paired objects to background images to avoid systematic congruencies between
we constrain the number of locations to four to fully cross all stimu- backgrounds and objects. This was done for each of the 20 runs of the EEG
lus conditions while maintaining a feasible session duration. Future experiment and for the 10 runs of the fMRI experiment. This resulted in 144
research will have to establish whether congruent versus incongru- individual images per run, one for each condition (that is, 12 object exemplars
× 4 locations × 3 background clutter levels). The remaining set of 12 low-
ent scene–object pairings yield different location representations
and high-clutter images was used separately to create catch trials in the EEG
on cluttered backgrounds and whether our results generalize to experiment (see details below).
more locations.
What an object is and where an object is are arguably the two Experimental procedures. fMRI main experiment. Each participant completed
most fundamental properties that we need to know to be able to one fMRI recording session consisting of ten runs (run duration 552 s), resulting

806 Nature Human Behaviour | VOL 6 | June 2022 | 796–811 | [Link]/nathumbehav


NaTure Human BehaviOur Articles
in 92 min of fMRI recording of the main experiment. During each run, each of the masks from a probabilistic atlas70 for both hemispheres combined (three early
144 images of the stimulus set was shown once (denoted here as ‘regular’ trials) visual ROIs for regions shared between the ventral and dorsal stream (V1, V2 and
in random order. Image duration was 0.5 s, with a 2.5 s inter-stimulus interval V3), two ROIs in mid- and high-level ventral visual cortex (V4 and LOC) and
(ISI). Images were presented at the centre of a black screen, overlaid with a red four ROIs in dorsal visual cortex (IPS0, IPS1, IPS2 and SPL)). To avoid overlap
fixation cross in the centre. Participants were asked to fixate their eyes on the between the ROI masks we removed all overlapping voxels. In a second step we
central cross at all times. Regular trials were interspersed every third to fifth trial selected the 325 most activated voxels of the participant-specific localizer results
(equally probable, in total 36 per run) with catch trials. Catch trials repeated the within the masks, using the objects > scrambled contrast for LOC and the objects
image shown on the previous trial (Fig. 2b, bottom). Participants were instructed & scrambled objects > baseline contrast for the remaining ROIs. This yielded
to respond with a button press to these repetitions (that is, a one-back task). Catch participant-specific ROI definitions.
trials were excluded from further analysis. Since this was a repeated-measures
design, data collection and analysis were not performed blind to the conditions EEG acquisition and pre-processing. We recorded EEG data using an EASYCAP
of the experiment. 64-channel system and a Brainvision actiCHamp amplifier at a sampling rate of
1,000 Hz. The electrodes were placed according to the standard 10–10 system. The
fMRI localizer experiment. To define ROIs in early visual, dorsal and ventral visual data were filtered online between 0.03 and 100 Hz and re-referenced online to FCz.
stream areas, we performed a separate localizer experiment prior to the main Offline pre-processing was conducted using the EEGLAB toolbox (version
fMRI experiment with images in three experimental conditions: faces, objects and 14)71 and incorporated a low-pass filter with a cut-off at 50 Hz and epoching
scrambled objects. Each image shown in the localizer experiment consisted of four trials between −100 ms and 999 ms with respect to stimulus onset. Epochs
identical versions of an object presented at the four locations as defined in the were baseline corrected by subtracting the mean of the 100 ms prestimulus time
main experiment (for example, one particular face shown in all four quadrants) to window from the entire epoch. To clean the data from artefacts such as eye blinks,
approximate the stimulation conditions of the main experiment. eye movements and muscular contractions, we used independent component
The localizer experiment consisted of a single run lasting 384 s, comprising six analysis as implemented in the EEGLAB toolbox. SASICA72 was used to guide the
blocks of presentation of faces, objects, scrambled objects and a blank background visual inspection of components for removal. Components related to horizontal
as baseline. Each stimulation block was 16 s long with presentations of 20 different eye movements were identified using two lateral frontal electrodes (F7 and F8).
objects (500 ms on, 300 ms off), including two one-back repetitions that participants In the last six participants, additional external electrodes were available that
were instructed to respond to with a button press. Stimulation block order was allowed for the direct recording of the horizontal electro-oculogram to identify
first order counterbalanced, with triplets of stimulation blocks being presented in and remove components related to horizontal eye movements. For blink artefact
random order and being interspersed regularly with blank background blocks. detection based on the vertical electro-oculogram, we used two frontal electrodes
(Fp1 and Fp2). On average, 11 (s.d. 4) components were removed per participant.
EEG main experiment. The EEG experiment was a modified version of the fMRI As a final step, we applied multivariate noise normalization to improve the
main experiment with adjusted timing parameters and a different task (Fig. 2b, signal-to-noise ratio and reliability of the data (following the recommendation
top). The EEG recording session consisted of 20 runs of 205 s each (that is, in total of Guggenmos et al.73).
68 min). Twenty-three participants completed all 20 runs, while four participants
completed fewer runs due to technical problems (12 runs, 17 runs and 2 × 13 Object location classification from brain measurements. To determine the
runs). Image duration was 0.5 s, with a 0.5 or 0.6 s ISI (equally probable) on regular amount of location information independent of category present in multivariate
trials. Participants were asked to fixate their eyes on the central cross at all times. brain measurements, we applied a common multivariate cross-classification
Catch trials consisted of the presentation of the target object (a glass) at any of scheme8,66–68. In essence, separately for each background condition, we classified
the four locations and on any type of background. Participants were instructed location while assigning data from different object categories to the training and
to respond with a button press to the glass (that is, a detection task), and to testing sets (Supplementary Fig. 1a). All classification analyses relied on binary
blink their eyes to minimize eye blink contamination on regular trials. To avoid c-support vector classification with a linear kernel as implemented in the libsvm
contamination of movement and eye blink artefacts on subsequent trials, the ISI toolbox74 ([Link] Furthermore, all analyses
was 1 s on catch trials. Catch trials were excluded from further analysis. Since this were conducted in a participant-specific manner.
was a repeated-measures design, data collection and analysis were not performed
blind to the conditions of the experiment. Spatially resolved multivariate fMRI analysis. We conducted an ROI-based and
a spatially unbiased volumetric searchlight procedure24,75. For the ROI-based
Pre-processing and univariate fMRI analysis. fMRI acquisition and pre-processing. analysis, for each ROI separately, we extracted and arranged t values into pattern
We acquired MRI data on a 3-T Siemens Tim Trio scanner with a 12-channel vectors for each of the 48 conditions and 10 runs. To increase the signal-to-noise
head coil. We obtained a structural image using a T1-weighted sequence ratio, we randomly binned run-wise pattern vectors into five bins of two runs,
(magnetization-prepared rapid gradient-echo, 1 mm3 voxel size). For the main which were averaged, resulting in five pseudo-run pattern vectors. We then
experiment and the localizer run, we obtained functional images covering the performed five-fold leave-one-pseudo-run-out cross-validation, training on four
entire brain using a T2*-weighted gradient-echo planar sequence (repetition time and testing on one pseudo-trial per classification iteration. In detail, we assigned
2 ms, echo time 30 ms, 70° flip angle, 3 mm3 voxel size, 37 slices, 20% gap, 192 mm four pseudo-trials per location condition of the same category to the training set
field of view, 64 × 64 matrix size, interleaved acquisition). (Supplementary Fig. 1a). We then tested the SVM on one pseudo-trial for each
We pre-processed fMRI data using SPM8 ([Link] of the same two location conditions, but now from a different category, yielding
This involved realignment, coregistration and normalization to the structural per cent classification accuracy (50% chance level) as output. Equivalent SVM
Montreal Neurological Institute template brain. fMRI data from the localizer was training and testing was repeated for all combinations of location and category
smoothed with an 8 mm full-width at half-maximum Gaussian kernel, but the pairs. With four locations that were all classified pairwise once, this resulted in six
main experiment data was left unsmoothed. pairwise location classifications. In addition, each pairwise location classification
was iterated across all possible training and testing combinations of the four
Univariate fMRI analysis. For the main experiment, we modelled the fMRI categories. This yielded an additional 12 iterations per location classification across
responses to the 48 experimental conditions for each run using a general linear training and testing pairs of categories. Therefore, in total 72 (6 × 12) classification
model (GLM). The onsets and durations of each image presentation entered the accuracies were averaged during each of the five-fold cross-validation iterations,
GLM as regressors and were convolved with a haemodynamic response function. resulting in 360 averaged accuracies in total. The result reflects how much
Movement parameters entered the GLM as nuisance regressors. For each of the 48 category-tolerant location information was present for each ROI, participant and
conditions, we converted GLM parameter estimates into t values by contrasting background condition separately.
each parameter estimate against the implicit baseline. This resulted in 48 The searchlight procedure was conceptually equivalent to the ROI-based
condition-specific t value maps per run and participant. analysis with the difference of the selection of voxel patterns entering the analysis.
For the localizer experiment, we modelled the fMRI response to the three For each voxel vi in the 3D t value maps, we defined a sphere with a radius of
experimental conditions, entering block onsets and durations as regressors of four voxels centred around voxel vi. For each condition and run, we extracted
interest and movement parameters as nuisance regressors before convolving and arranged the t values for each voxel of the sphere into pattern vectors.
with the haemodynamic response function. From the resulting three parameter Classification of location across category proceeded as described above. This
estimates, we generated two contrasts. The first contrast served to localize resulted in one average classification accuracy for voxel vi. Iterated across all voxels,
activations in early, mid-level ventral and dorsal visual regions (V1, V2, this yielded a 3D volume of classification accuracies across the brain for each
V3, V4, IPS0, IPS1, IPS2 and SPL) and was defined as objects + scrambled participant and background condition separately.
objects > baseline. The second contrast served to localize activations in
object-selective area LOC and was defined as objects > scrambled objects. In sum, Time-resolved classification of location from EEG data. To determine the timing
this resulted in two t value maps for the localizer run per participant. with which category-independent location information emerges in the brain, we
conducted time-resolved EEG classification68,76. This procedure was conceptually
Definition of ROIs. To identify regions along the ventral and dorsal visual streams, equivalent to the fMRI location classification in that it classified location
we defined ROIs in a two-step procedure. We first defined ROIs using anatomical while assigning data from different categories to the training and testing sets

Nature Human Behaviour | VOL 6 | June 2022 | 796–811 | [Link]/nathumbehav 807


Articles NaTure Human BehaviOur
and was conducted separately for each background condition and participant EEG–fMRI fusion. To determine the spatiotemporal correspondence between
(Supplementary Fig. 1a). object location representations revealed at particular time points in the EEG signals
For each time point of the epoched EEG data, we extracted 63 EEG channel and localized in particular cortical regions using fMRI, we used representational
activations and arranged them into pattern vectors for each of the 48 conditions similarity analysis-based EEG–fMRI fusion37–39. We focused the analysis on
and 60 raw trials. To increase the signal-to-noise ratio, we randomly assigned raw representations emerging at peak latencies in the EEG and on ventral stream ROIs.
trials into four bins of 15 trials each and averaged them into four pseudo-trials. The rationale for this approach is that time points and ROIs are linked if they
The classification was conducted on those four pseudo-trials. We trained the SVM represent object locations similarly, that is, if their representational geometries
on three pseudo-trials and tested it on the remaining pseudo-trial, yielding per (dissimilarity relations between representations) are comparable.
cent classification accuracy (50% chance level, binary classification) as output. As a measure of (dis-)similarity relations between location representations,
This procedure was repeated 100 times with random assignment of trials to we used the classification results from the multivariate analyses conducted. This
pseudo-trials, and across all combinations of location and all category pairs. As choice assumes that representations for two locations will be classified more easily
for the fMRI classification, in total 72 (6 location pairs × 12 category train–test if they are more dissimilar. In detail, we considered the pairwise classification
pairs) classification accuracies were averaged. With 100 iterations to randomly accuracies between all pairs of locations (six) and all training and testing pairs
assign trials to training and testing bins, this yielded a total of 7,200 classification across categories (six) in both training and testing directions (two), resulting in
accuracies, which were averaged per background condition and participant. The a 72 × 1 RDV. For EEG, we extracted the RDVs for the time points within the
result reflects how much category-tolerant location information was present at confidence intervals around the EEG peak latency, averaged them across time
each time point, participant and background condition separately. points and, following the method employed previously32,77,78, averaged them across
participants, resulting in one EEG RDV per background condition. For fMRI ROIs,
Time-resolved EEG searchlight in sensor space. We conducted an EEG searchlight we extracted the RDVs for each participant and background condition separately.
analysis resolved in time and sensor space (that is, across EEG channels) We compared fMRI and EEG RDVs for representational similarity by
to gain insights into which EEG channels contained the highest amount of correlating (using Spearman’s R) the averaged EEG RDV with the subject-specific
location information and therefore contributed most to the results of the fMRI ROI RDVs, resulting in one correlation per subject, background condition
time-resolved analysis described above. For the EEG searchlight, we conducted and ROI.
the time-resolved EEG classification as described above with the following
difference: For each EEG channel c, we conducted the classification procedure Multivariate classification of category. We conducted a set of spatially resolved
on the five closest channels surrounding c. The classification accuracy was (fMRI: ROI and searchlight), time-resolved and temporally generalized analyses
stored at the position of c. After iterating across all channels and down-sampling (EEG) of object category. The analyses were equivalent to the procedures described
the time points to a 10 ms resolution, this yielded a classification accuracy map above with the crucial difference that the role of the experimental factors location
across all channels and down-sampled time points, for each participant and and category was reversed (Fig. 6a,f).
background condition separately.
Object location classification in DNNs. We investigated whether DNNs
Time generalization analysis of location from EEG data. To determine when object trained on object categorization display a similar pattern of gradually emerging
location representations are similar across background conditions and time, we location representations along their processing hierarchy as we observed in the
used temporal generalization analysis34,38,68,76. human brain.
The procedure was equivalent to the multivariate time-resolved EEG We selected the DNN CORnet-S for investigation, on the basis of its top
location classification analysis but with two crucial differences. First, data from performance in predictivity of neural responses in the ventral stream as quantified
the no-clutter condition were assigned to the training set while data from the on the Brain-Score platform27. CORnet-S is a shallow recurrent DNN consisting of
high-clutter condition were assigned to the testing set (Supplementary Fig. 1b). four computational blocks referred to as areas, analogous to ventral visual areas V1,
The second difference was that the SVM was not only tested on data from the same V2, V4 and IT. Each block consists of four convolutional layers with self-recurrence
time point as that from which the testing data were derived, but additionally on and a skip connection followed by group normalization and a rectified linear unit.
data from each time point from the −100 to 600 ms peri-stimulus time window The response of the final IT block is averaged over the entire receptive field and
(in 10 ms steps). Like previously, training was conducted on three and testing on mapped to categories using a fully connected linear decoder.
one pseudo-trial, resulting in 7,200 classification accuracies (6 location pairs × 12 To investigate the representation of object location in CORnet-S, we performed
category train–test pairs × 100 randomization iterations), which were averaged multivariate pattern analysis analogous to the analysis performed on brain
per time point and participant. This resulted in a two-dimensional matrix of data, classifying object location across category separately for each background
classification accuracies indicating the combination of time points in the no- and condition. For this, we extracted unit activations of the last layer in each block of
high-clutter conditions at which object location representations were similar in the the DNN after running a forward pass of the stimulus material from the 20 runs of
no- and the high-clutter conditions. the EEG experiment.
For the top layer of each block, we arranged the unit activations into
Off-diagonal peak shift in time generalization matrix. To quantify whether pattern vectors for each of the 48 conditions and 60 trials. We then proceeded with
classification accuracies were significantly higher below than on or above the the analysis as done with the EEG data (Supplementary Fig. 1a). We randomly
diagonal, we computed the distance from the post-stimulus classification peak to assigned raw trials into four bins of 15 trials each and averaged them into four
the diagonal for single subjects. For this, we first determined the peak coordinates pseudo-trials. We trained the SVM on three pseudo-trials and tested it on the
(px, py) along the x and y axes. We then computed the coordinates of the point on remaining pseudo-trial. This procedure was repeated 100 times with random
the diagonal that was closest to the peak using assignment of trials to pseudo-trials, and across all combinations of location
(px + py ) and all category pairs before results were averaged. This resulted in one averaged
bx = classification accuracy value per top layer of each CORnet-S block and per
2
background condition. The result reflects how much category-tolerant location
since on the diagonal, bx = by. This allowed us to compute the shortest information was present in CORnet-S.
perpendicular Euclidean distance between the peak and the diagonal as
Statistical testing. Wilcoxon signed-rank test. We performed non-parametric
two-tailed Wilcoxon signed-rank tests to test for above-chance classification

dEuclidean = (px − bx )2 + (py − bx )2 .
accuracy at time points in the EEG time courses, in the EEG time generalization
matrix, for Euclidean distances from peak to diagonal in the time generalization
To be able to later test group distances against zero, we set
matrices, for above-chance classification in the ROI and fusion results and for
dEuclidean = dEuclidean × −1 significant voxels in the fMRI searchlight results. In each case, the null hypothesis
was that the observed parameter (classification accuracy, correlation or Euclidean
for all cases where px < py, which is the case for all peaks above the diagonal. distance) came from a distribution with a median of chance-level performance
(that is, 50% for pairwise classification and zero correlation or Euclidean distance).
Diagonal difference in temporal generalization matrix. To obtain a temporally The resulting P values were corrected for multiple comparisons using false
resolved estimate of the time points at which the classification accuracy was discovery rate (FDR) at 5% level if more than one test was conducted.
higher below than above the diagonal, we subtracted the classification accuracies
above the diagonal from the accuracies below the diagonal. Specifically, we Bootstrap tests. We used bootstrapping to compute confidence intervals and
subtracted each time point from the time point with the equivalent coordinates to determine the significance of peak-to-peak differences in EEG latencies,
mirrored along the diagonal. For example, the time point with coordinates peak-to-peak distances of fMRI searchlight classification peaks and for the distance
300 ms in the no-clutter (y axis) and 100 ms in the high-clutter (x axis) condition from the group-averaged classification peak in the temporal generalization
(above diagonal) was subtracted from the time point with coordinates 100 ms matrix to the diagonal in Figs. 5e and 6g. In each case, we sampled the participant
in the no-clutter (y axis) and 300 ms in the high-clutter (x axis) condition pool 10,000 times with replacement and for each sample calculated the statistic
(below diagonal). of interest.

808 Nature Human Behaviour | VOL 6 | June 2022 | 796–811 | [Link]/nathumbehav


NaTure Human BehaviOur Articles
For the fMRI searchlight peak distances, we first shuffled condition labels of Received: 26 March 2021; Accepted: 14 January 2022;
two background conditions to then generate a distribution of peak distances under Published online: 24 February 2022
the null hypothesis.
To determine whether peak-to-peak Euclidean distances in searchlight
classification maps were significantly longer than expected independent of References
background, we set P < 0.05. If the computed P value was smaller than this 1. DiCarlo, J. J. & Cox, D. D. Untangling invariant object recognition. Trends
threshold with Bonferroni correction, we rejected the null hypothesis of no Cogn. Sci. 11, 333–341 (2007).
peak-to-peak distance. 2. Ungerleider, L. & Haxby, J. V. ‘What’ and ‘where’ in the human brain.
For the EEG peak-to-peak latency differences, we bootstrapped the latency Curr. Opin. Neurobiol. 4, 157–165 (1994).
difference between two background conditions, yielding an empirical distribution 3. DiCarlo, J. J., Zoccolan, D. & Rust, N. C. How does the brain solve visual
that could be compared with zero. object recognition? Neuron 73, 415–434 (2012).
To determine whether peak-to-peak latencies in the EEG time courses were 4. Milner, A. D. & Goodale, M. A. The Visual Brain in Action (Oxford Univ.
significantly different from zero, we computed the proportion of values that were Press, 2006).
equal to or smaller than zero and corrected them for multiple comparisons using 5. Schwarzlose, R. F., Swisher, J. D., Dang, S. & Kanwisher, N. The distribution
FDR at P = 0.05. To compute 95% confidence intervals for single peak latencies in of category and location information across object-selective regions in human
the EEG time courses, we bootstrapped the peaks for each background condition visual cortex. Proc. Natl Acad. Sci. USA 105, 4447–4452 (2008).
and determined the 95% percentiles of this distribution. 6. Rust, N. C. & DiCarlo, J. J. Selectivity and tolerance (‘invariance’) both
increase as visual information propagates from cortical area V4 to IT.
ANOVAs. We ran sets of ANOVAs to test for main effects and the interaction J. Neurosci. 30, 12978–12995 (2010).
between ROIs along the ventral and dorsal stream and background condition, 7. Baeck, A., Wagemans, J. & Op de Beeck, H. P. The distributed representation
which we detail below. For all reported ANOVAs, we tested whether the of random and meaningful object pairs in human occipitotemporal cortex:
assumption of sphericity had been met using Mauchly’s test. Below, we report the the weighted average as a general rule. Neuroimage 70, 37–47 (2013).
effects for which the assumption of sphericity had been violated and for which the 8. Cichy, R. M. et al. Probing principles of large-scale object representation:
Greenhouse–Geisser estimates of sphericity were used to correct the degrees of category preference and location encoding. Hum. Brain Mapp. 34,
freedom. For all remaining effects, the assumption of sphericity had been met. 1636–1651 (2013).
To test for main effects and the interaction between ROIs along the ventral 9. Golomb, J. D. & Kanwisher, N. Higher level visual cortex represents
stream and background condition, we ran two 5 × 3 repeated-measures ANOVAs retinotopic, not spatiotopic, object location. Cereb. Cortex 22,
with within-subject factors of ROI (V1, V2, V3, V4 and LOC) and background (no, 2794–2810 (2012).
low and high clutter). The first ANOVA tested the results of location classification 10. Wandell, B. A. & Winawer, J. Computational neuroimaging and population
across categories. Mauchly’s test indicated that the assumption of sphericity had receptive fields. Trends Cogn. Sci. 19, 349–357 (2015).
been violated for the main effect of background (P = 0.003). Therefore, the degrees 11. Kravitz, D. J., Saleem, K. S., Baker, C. I. & Mishkin, M. A new
of freedom were corrected using the Greenhouse–Geisser estimates of sphericity neural framework for visuospatial processing. Nat. Rev. Neurosci. 12,
(ε = 0.72). The second ANOVA tested the results of category classification across 217–30 (2011).
locations. Mauchly’s test indicated that the assumption of sphericity had been 12. Zachariou, V. et al. Common dorsal stream substrates for the mapping of
violated for the main effect of ROI (P < 0.001). The degrees of freedom were surface texture to object parts and visual spatial processing. J. Cogn. Neurosci.
corrected using the Greenhouse–Geisser estimates of sphericity (ε = 0.61). 27, 2442–2461 (2015).
To test for main effects and the interaction between ROIs along the dorsal 13. Xu, Y. & Vaziri-Pashkam, M. Examining the coding strength of
stream and background condition, we ran two 7 × 3 repeated-measures ANOVAs object identity and nonidentity features in human occipito-temporal
with within-subject factors of ROI (V1, V2, V3, IPS0, IPS1, IPS2 and SPL) and cortex and convolutional neural networks. J. Neurosci. 41, 4234–4252
background (no, low and high clutter). The first ANOVA tested the results (2021).
of location classification across categories. Mauchly’s test indicated that the 14. Hong, H., Yamins, D. L. K., Majaj, N. J. & DiCarlo, J. J. Explicit information
assumption of sphericity had been violated for the main effect of ROI (P < 0.001) for category-orthogonal object properties increases along the ventral stream.
and for the interaction (P = 0.028). Therefore, the degrees of freedom were Nat. Neurosci. 19, 613–622 (2016).
corrected using the Greenhouse–Geisser estimates of sphericity (ε = 0.53 for the 15. Brewer, A. A., Liu, J., Wade, A. R. & Wandell, B. A. Visual field maps and
main effect of ROI, ε = 0.52 for the interaction). The second ANOVA tested the stimulus selectivity in human ventral occipital cortex. Nat. Neurosci. 8,
results of category classification across locations. Mauchly’s test indicated that the 1102–1109 (2005).
assumption of sphericity had been violated for the interaction (P < 0.001). The 16. Larsson, J. & Heeger, D. J. Two retinotopic visual areas in human lateral
degrees of freedom were corrected using the Greenhouse–Geisser estimates of occipital cortex. J. Neurosci. 26, 13128–13142 (2006).
sphericity (ε = 0.59). 17. Groen, I. I. A., Silson, E. H. & Baker, C. I. Contributions of low- and
To test for main effects and the interaction in the results of the EEG–fMRI high-level properties to neural processing of visual scenes in the human
fusion, we ran a 5 × 2 repeated-measures ANOVA with factors of ROI (V1, V2, V3, brain. Philos. Trans. R. Soc. B 372, 20160102 (2017).
V4 and LOC) and clutter (no, high). The assumption of sphericity had been met 18. Grill-Spector, K., Kourtzi, Z. & Kanwisher, N. The lateral occipital complex
for all main and interaction effects. and its role in object recognition. Vis. Res. 41, 1409–1422 (2001).
All post hoc tests were conducted using pairwise t tests, and P values were 19. Malach, R., Levy, I. & Hasson, U. The topography of high-order human
corrected for multiple comparisons using Tukey correction. object areas. Trends Cogn. Sci. 6, 176–184 (2002).
20. Levy, I., Hasson, U., Avidan, G., Hendler, T. & Malach, R. Center–periphery
Effect sizes. For the main and interaction effects of the ANOVAs, we computed the organization of human object areas. Nat. Neurosci. 4, 533–539 (2001).
partial η2 using 21. Sonkusare, S., Breakspear, M. & Guo, C. Naturalistic stimuli in neuroscience:
critically acclaimed. Trends Cogn. Sci. 23, 699–714 (2019).
Sum of squares (SS)Effect 22. Henderson, J. M. & Hollingworth, A. High-level scene perception. Annu. Rev.
Partial η2 = Psychol. 50, 243–271 (1999).
SSEffect + SSResidual
23. Malach, R. et al. Object-related activity revealed by functional magnetic
and the effect size estimate r (ref. 79) for the off-diagonal peak shifts across subjects, resonance imaging in human occipital cortex. Proc. Natl Acad. Sci. USA 92,
as tested with the Wilcoxon signed-rank test, using 8135–8139 (1995).
24. Kriegeskorte, N., Goebel, R. & Bandettini, P. Information-based functional
Z brain mapping. Proc. Natl Acad. Sci. USA 103, 3863–3868 (2006).
r= √ .
N 25. Yamins, D. L. K. & DiCarlo, J. J. Using goal-driven deep learning models to
understand sensory cortex. Nat. Neurosci. 19, 356–365 (2016).
Reporting Summary. Further information on research design is available in the 26. Kubilius, J. et al. in Advances in Neural Information Processing Systems
Nature Research Reporting Summary linked to this article. (eds. Wallach, H. et al.) 32, 12805–12816 (Curran Associates, 2019).
27. Schrimpf, M. et al. Integrative benchmarking to advance neurally mechanistic
models of human intelligence. Neuron 108, 413–423 (2020).
Data availability 28. Kriegeskorte, N. & Douglas, P. K. Cognitive computational neuroscience.
The experimental stimuli, fMRI data, EEG data and the neural network activations Nat. Neurosci. 21, 1148–1160 (2018).
are publicly available via [Link] 29. Yamins, D. L. K. et al. Performance-optimized hierarchical models predict
7fc0548bf659. neural responses in higher visual cortex. Proc. Natl Acad. Sci. USA 111,
8619–8624 (2014).
Code availability 30. Güçlü, U. & van Gerven, M. A. J. Deep neural networks reveal a gradient in
Analysis code is publicly available via [Link] the complexity of neural representations across the ventral stream. J. Neurosci.
ObjectLocationRepresentations. 35, 10005–10014 (2015).

Nature Human Behaviour | VOL 6 | June 2022 | 796–811 | [Link]/nathumbehav 809


Articles NaTure Human BehaviOur
31. Cichy, R. M. & Kaiser, D. Deep neural networks as scientific models. Trends 62. Kietzmann, T. C. et al. Recurrence is required to capture the representational
Cogn. Sci. 23, 305–317 (2019). dynamics of the human visual system. Proc. Natl Acad. Sci. USA 116,
32. Cichy, R. M., Pantazis, D. & Oliva, A. Similarity-based fusion of MEG and 21854–21863 (2019).
fMRI reveals spatio-temporal dynamics in human cortex during visual object 63. Eger, E., Kell, C. A. & Kleinschmidt, A. Graded size sensitivity of
recognition. Cereb. Cortex 26, 1–17 (2016). object-exemplar-evoked activity patterns within human LOC subregions. J.
33. Güçlü, U. & van Gerven, M. A. J. Increasingly complex representations of Neurophysiol. 100, 2038–2047 (2008).
natural movies across the dorsal stream are shared between subjects. 64. Eurich, C. W. & Schwegler, H. Coarse coding: calculation of the resolution
Neuroimage 145, 329–336 (2017). achieved by a population of large receptive field neurons. Biol. Cybern. 76,
34. King, J. R. & Dehaene, S. Characterizing the dynamics of mental 357–363 (1997).
representations: the temporal generalization method. Trends Cogn. Sci. 18, 65. Spirkovska, L. & Reid, M. B. Coarse-coded higher-order neural networks for
203–210 (2014). PSRI object recognition. IEE Trans. Neural Netw. 4, 276–283 (1993).
35. Spoerer, C. J., McClure, P. & Kriegeskorte, N. Recurrent convolutional neural 66. Cichy, R. M., Chen, Y. & Haynes, J. D. Encoding the identity and location of
networks: a better model of biological object recognition. Front. Psychol. 8, objects in human LOC. Neuroimage 54, 2297–2307 (2011).
1551 (2017). 67. Carlson, T., Hogendoorn, H., Fonteijn, H. & Verstraten, F. A. J. Spatial coding
36. Spoerer, C. J., Kietzmann, T. C., Mehrer, J., Charest, I. & Kriegeskorte, N. and invariance in object-selective cortex. Cortex 47, 14–22 (2011).
Recurrent neural networks can explain flexible trading of speed and accuracy 68. Isik, L., Meyers, E. M., Leibo, J. Z. & Poggio, T. The dynamics of invariant
in biological vision. PLoS Comput. Biol. 16, e1008215 (2020). object recognition in the human visual system. J. Neurophysiol. 111,
37. Cichy, R. M. & Oliva, A. A M/EEG-fMRI fusion primer: resolving human 91–102 (2014).
brain responses in space and time. Neuron 107, 772–781 (2020). 69. Park, S., Konkle, T. & Oliva, A. Parametric coding of the size and clutter of
38. Cichy, R. M., Pantazis, D. & Oliva, A. Resolving human object recognition in natural scenes in the human brain. Cereb. Cortex 25, 1792–1805 (2015).
space and time. Nat. Neurosci. 17, 455–462 (2014). 70. Wang, L., Mruczek, R. E. B., Arcaro, M. J. & Kastner, S. Probabilistic maps of
39. Kriegeskorte, N., Mur, M. & Bandettini, P. Representational similarity visual topography in human cortex. Cereb. Cortex 25, 3911–3931 (2015).
analysis – connecting the branches of systems neuroscience. Front. Syst. 71. Delorme, A. & Makeig, S. EEGLAB: an open source toolbox for analysis of
Neurosci. 2, 4 (2008). single-trial EEG dynamics including independent component analysis.
40. Kaiser, D., Quek, G. L., Cichy, R. M. & Peelen, M. V. Object vision in a J. Neurosci. Methods 134, 9–21 (2004).
structured world. Trends Cogn. Sci. 23, 672–685 (2019). 72. Chaumon, M., Bishop, D. V. M. & Busch, N. A. A practical guide to the
41. Võ, M. L. H., Boettcher, S. E. & Draschkow, D. Reading scenes: how scene selection of independent components of the electroencephalogram for artifact
grammar guides attention and aids perception in real-world environments. correction. J. Neurosci. Methods 250, 47–63 (2015).
Curr. Opin. Psychol. 29, 205–210 (2019). 73. Guggenmos, M., Sterzer, P. & Cichy, R. M. Multivariate pattern analysis
42. Biederman, I., Mezzanotte, R. J. & Rabinowitz, J. C. Scene perception: for MEG: a comparison of dissimilarity measures. Neuroimage 173,
detecting and judging objects undergoing relational violations. Cogn. Psychol. 434–447 (2018).
14, 143–177 (1982). 74. Chang, C.-C. & Lin, C.-J. Libsvm: a library for support vector machines.
43. Brandman, T. & Peelen, M. V. Interaction between scene and object ACM Trans. Intell. Syst. Technol. 2, 1–27 (2011).
processing revealed by human fMRI and MEG decoding. J. Neurosci. 37, 75. Haynes, J. D. et al. Reading hidden intentions in the human brain. Curr. Biol.
7700–7710 (2017). 17, 323–328 (2007).
44. Tang, H. et al. Spatiotemporal dynamics underlying object completion in 76. Carlson, T. A., Hogendoorn, H., Kanai, R., Mesik, J. & Turret, J. High
human ventral visual cortex. Neuron 83, 736–748 (2014). temporal resolution decoding of object position and category. J. Vis. 11,
45. Kar, K., Kubilius, J., Schmidt, K., Issa, E. B. & DiCarlo, J. J. Evidence that 1–17 (2011).
recurrent circuits are critical to the ventral stream’s execution of core object 77. Mohsenzadeh, Y., Qin, S., Cichy, R. M. & Pantazis, D. Ultra-rapid serial visual
recognition behavior. Nat. Neurosci. 22, 974–983 (2019). presentation reveals dynamics of feedforward and feedback processes in the
46. Rajaei, K., Mohsenzadeh, Y., Ebrahimpour, R. & Khaligh-Razavi, S.-M. ventral visual pathway. eLife 7, 1–23 (2018).
Beyond core object recognition: recurrent processes account for object 78. Cichy, R. M. & Teng, S. Resolving the neural dynamics of visual and auditory
recognition under occlusion. PLOS Comput. Biol. 15, e1007001 (2019). scene processing in the human brain: a methodological approach. Philos.
47. Lamme, V. A. F. & Roelfsema, P. R. The distinct modes of vision offered by Trans. R. Soc. B 372, 1714 (2017).
feedforward and recurrent processing. Trends Neurosci. 23, 571–579 (2000). 79. Rosenthal, R. Meta-analytic Procedures for Social Research (Sage, 1991).
48. Groen, I. I. A. et al. Scene complexity modulates degree of feedback
activity during object detection in natural scenes. PLoS Comput. Biol. 14, Acknowledgements
e1006690 (2018). We thank D. Kaiser for comments and support. We thank S. Shrestha for
49. Seijdel, N., Tsakmakidis, N., De Haan, E. H. F., Bohte, S. M. & Scholte, H. S. helpful conversations on the math. Computing resources were provided by the
Depth in convolutional neural networks solves scene segmentation. PLoS high-performance computing facilities at ZEDAT, Freie Universität Berlin. EEG and
Comput. Biol. 16, e1008022 (2020). fMRI data were acquired at the Center for Cognitive Neuroscience (CCNB), Freie
50. Kriegeskorte, N. et al. Matching categorical object representations in inferior Universität Berlin, Berlin. The study was supported by the German Research Council
temporal cortex of man and monkey. Neuron 60, 1126–1141 (2008). (DFG) (CI241/1-1, CI241/3-1, R.M.C. and M.G.), by the European Research Council
51. Williams, M. A., Dang, S. & Kanwisher, N. G. Only some spatial patterns (ERC-StG-2018-803370, R.M.C.) and by the Alfons and Gertrud Kassel Foundation
of fMRI response are read out in task performance. Nat. Neurosci. 10, (G.R. and K.D.). The funders had no role in study design, data collection and analysis,
685–686 (2007). decision to publish or preparation of the manuscript.
52. Grootswagers, T., Cichy, R. M. & Carlson, T. A. Finding decodable information
that can be read out in behaviour. Neuroimage 179, 252–262 (2018).
53. de-Wit, L., Alexander, D., Ekroll, V. & Wagemans, J. Is neuroimaging Author contributions
measuring information in the brain? Psychon. Bull. Rev. 23, 1415–1428 (2016). M.G., C.C. and R.M.C. designed research. M.G. and C.C. performed experiments.
54. Milner, A. D. et al. Perception and action in ‘visual form agnosia’. Brain 114, M.G. performed data analyses. K.D. performed computational modelling. M.G., K.D.
405–428 (1991). and R.M.C. wrote the manuscript. G.R. acquired funding.
55. James, T. W., Culham, J., Humphrey, G. K., Milner, A. D. & Goodale, M. A.
Ventral occipital lesions impair object recognition but not object-directed Funding
grasping: an fMRI study. Brain 126, 2463–2475 (2003). Open access funding provided by Freie Universität Berlin.
56. Goodale, M. A. & Milner, A. D. Separate visual pathways for perception and
action. Essent. Sources Sci. Stud Consciousness 15, 20–25 (1992).
57. De Renzi, E. & Lucchelli, F. The fuzzy boundaries of apperceptive agnosia. Competing interests
Cortex 29, 187–215 (1993). The authors declare no competing interests.
58. Riddoch, M. J. & Humphreys, G. W. A case of integrative visual agnosia.
Brain 110, 1431–1462 (1987). Additional information
59. Sayres, R. & Grill-Spector, K. Relating retinotopic and object-selective Supplementary information The online version contains supplementary material
responses in human lateral occipital cortex. J. Neurophysiol. 100, available at [Link]
249–267 (2008).
60. Alvarez, I., de Haas, B., Clark, C. A., Rees, G. & Samuel Schwarzkopf, D. Correspondence and requests for materials should be addressed to Monika Graumann
Comparing different stimulus configurations for population receptive field or Radoslaw M. Cichy.
mapping in human fMRI. Front. Hum. Neurosci. 9, 1–16 (2015). Peer review information Nature Human Behaviour thanks Talia Brandman, Christian
61. Felleman, D. & Van Essen, D. C. Distributed hierarchical processing in the Olivers and the other, anonymous, reviewer(s) for their contribution to the peer review
primate cerebral cortex. Cereb. Cortex 1, 1–47 (1991). of this work.

810 Nature Human Behaviour | VOL 6 | June 2022 | 796–811 | [Link]/nathumbehav


NaTure Human BehaviOur Articles
Reprints and permissions information is available at [Link]/reprints. the Creative Commons license, and indicate if changes were made. The images or other
third party material in this article are included in the article’s Creative Commons license,
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in
unless indicated otherwise in a credit line to the material. If material is not included in
published maps and institutional affiliations.
the article’s Creative Commons license and your intended use is not permitted by statu-
Open Access This article is licensed under a Creative Commons tory regulation or exceeds the permitted use, you will need to obtain permission directly
Attribution 4.0 International License, which permits use, sharing, adap- from the copyright holder. To view a copy of this license, visit [Link]
tation, distribution and reproduction in any medium or format, as long org/licenses/by/4.0/.
as you give appropriate credit to the original author(s) and the source, provide a link to © The Author(s) 2022

Nature Human Behaviour | VOL 6 | June 2022 | 796–811 | [Link]/nathumbehav 811


nature research | reporting summary
Monika Graumann
Corresponding author(s): Radoslaw Martin Cichy
Last updated by author(s): Dec 17, 2021

Reporting Summary
Nature Research wishes to improve the reproducibility of the work that we publish. This form provides structure for consistency and transparency
in reporting. For further information on Nature Research policies, see our Editorial Policies and the Editorial Policy Checklist.

Statistics
For all statistical analyses, confirm that the following items are present in the figure legend, table legend, main text, or Methods section.
n/a Confirmed
The exact sample size (n) for each experimental group/condition, given as a discrete number and unit of measurement
A statement on whether measurements were taken from distinct samples or whether the same sample was measured repeatedly
The statistical test(s) used AND whether they are one- or two-sided
Only common tests should be described solely by name; describe more complex techniques in the Methods section.

A description of all covariates tested


A description of any assumptions or corrections, such as tests of normality and adjustment for multiple comparisons
A full description of the statistical parameters including central tendency (e.g. means) or other basic estimates (e.g. regression coefficient)
AND variation (e.g. standard deviation) or associated estimates of uncertainty (e.g. confidence intervals)

For null hypothesis testing, the test statistic (e.g. F, t, r) with confidence intervals, effect sizes, degrees of freedom and P value noted
Give P values as exact values whenever suitable.

For Bayesian analysis, information on the choice of priors and Markov chain Monte Carlo settings
For hierarchical and complex designs, identification of the appropriate level for tests and full reporting of outcomes
Estimates of effect sizes (e.g. Cohen's d, Pearson's r), indicating how they were calculated
Our web collection on statistics for biologists contains articles on many of the points above.

Software and code


Policy information about availability of computer code
Data collection The data was collected using Matlab and the experimental paradigms were presented using the Psychophysics Toolbox Version 3.0.12
(PTB-3).

Data analysis For the data preprocessing and analysis we used the following software: MATLAB R2018b, EEGLAB toolbox (version 14), SASICA plugin for
EEGLAB, LIBSVM-3.11, SPM8 toolbox, CoSMoMVPA toolbox.
For manuscripts utilizing custom algorithms or software that are central to the research but not yet described in published literature, software must be made available to editors and
reviewers. We strongly encourage code deposition in a community repository (e.g. GitHub). See the Nature Research guidelines for submitting code & software for further information.

Data
Policy information about availability of data
All manuscripts must include a data availability statement. This statement should provide the following information, where applicable:
- Accession codes, unique identifiers, or web links for publicly available datasets
April 2020

- A list of figures that have associated raw data


- A description of any restrictions on data availability

The experimental stimuli used in this study, the fMRI and EEG data as well as neural network activations are publicly available via [Link]
view_only=21a714db58584ffeb2837fc0548bf659.

1
nature research | reporting summary
Field-specific reporting
Please select the one below that is the best fit for your research. If you are not sure, read the appropriate sections before making your selection.
Life sciences Behavioural & social sciences Ecological, evolutionary & environmental sciences
For a reference copy of the document with all sections, see [Link]/documents/[Link]

Behavioural & social sciences study design


All studies must disclose on these points even when the disclosure is negative.
Study description In this study we recorded quantitative data in two separate experiments: 1) 3 Tesla functional magnetic resonance imaging (fMRI) to acquire human brain activity with high spatial resolution, and 2) electroencephalography (EEG) to acquire human brain activity with high temporal resolution. In both experiments, participants performed a visual task while we recorded data.

Research sample 29 participants took part in the EEG experiment, two of whom were excluded because of equipment failure (N=27, mean age 26.8 years, SD=4.3, 22 female). 25 participants (mean age 28.8, SD=4.0, 17 female) completed the fMRI experiment. The participant pools of the two experiments did not overlap except for two participants. All participants provided informed consent prior to the studies and received a monetary reward or course credit for their participation.

Sampling strategy Participants were selected according to the following requirements: 18-40 years old, normal or corrected-to-normal vision, and fulfillment of the MR safety criteria (no implants or metal parts, tattoos, non-removable piercings, claustrophobia, pregnancy, neurological disorders, etc.).
Sample size was chosen to exceed that of comparable M/EEG and fMRI classification studies to enhance power.

Data collection During both experiments, participants' responses were recorded with a computer, while the ongoing brain activity during the task
was recorded using the 3T fMRI scanner (experiment 1) and the EEG (experiment 2). No one was present in the room together with
the participants during the experiments. Blinding to the experimental conditions or the study hypothesis was not possible, but data
was analyzed using a single pipeline for all subjects.

Timing 1) fMRI experiment: data collection started in February 2019 and ended in March 2019. 2) EEG experiment: data collection started in May 2017 and ended in November 2017, with a short gap from July to September 2017 for data analysis.

Data exclusions 1) No participants were excluded in the fMRI experiment. 2) Two participants were excluded in the EEG experiment because of
equipment failure.

Non-participation No participants declined participation or dropped out.

Randomization Participants were not allocated into experimental groups.

Reporting for specific materials, systems and methods


We require information from authors about some types of materials, experimental systems and methods used in many studies. Here, indicate whether each material,
system or method listed is relevant to your study. If you are not sure if a list item applies to your research, read the appropriate section before selecting a response.

Materials & experimental systems
n/a: Antibodies; Eukaryotic cell lines; Palaeontology and archaeology; Animals and other organisms; Clinical data; Dual use research of concern
Involved in the study: Human research participants

Methods
n/a: ChIP-seq; Flow cytometry
Involved in the study: MRI-based neuroimaging

Human research participants


Policy information about studies involving human research participants
Population characteristics See above.

Recruitment Participants were recruited using the mailing lists for study participation of the psychology program, the cognitive neuroscience program and the medical studies program of the following Berlin universities: Freie Universität Berlin, Humboldt-Universität zu Berlin, Charité.

Ethics oversight The study was approved by the ethics committee of the Department of Education and Psychology of the Freie Universität
Berlin, Germany.
Note that full information on the approval of the study protocol must also be provided in the manuscript.

Magnetic resonance imaging


Experimental design
Design type Event-related fMRI design.

Design specifications Each participant completed one fMRI recording session consisting of 10 runs (run duration: 552 s), resulting in 92 minutes of fMRI recording for the main experiment. During each run, each of the 144 images of the stimulus set was shown once (regular trials). Image duration was 0.5 s, with a 2.5 s inter-stimulus interval (ISI). Catch trials were interspersed among the regular trials every 3rd to 5th trial (equally probable; 36 catch trials per run). Catch trials repeated the image shown on the previous trial, and participants were instructed to respond to these repetitions with a button press (i.e., a one-back task).
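To make the catch-trial interspersal rule concrete, here is a minimal Python sketch (a hypothetical re-implementation; the original paradigm was programmed in MATLAB with Psychophysics Toolbox) that builds one run's trial sequence: 144 regular trials with each image shown once, and a catch trial inserted after every 3rd to 5th regular trial with equal probability, 36 catch trials in total.

```python
import random

def draw_gaps(n_catch=36, n_regular=144, choices=(3, 4, 5)):
    # Rejection-sample the number of regular trials preceding each catch
    # trial so that exactly n_regular regular trials are distributed over
    # n_catch gaps (mean gap 4, so the expected sum is 144 and this
    # terminates quickly).
    while True:
        gaps = [random.choice(choices) for _ in range(n_catch)]
        if sum(gaps) == n_regular:
            return gaps

def build_run(images):
    # images: the 144 stimulus identifiers; each is shown exactly once.
    assert len(images) == 144
    order = random.sample(list(images), len(images))
    trials, i = [], 0
    for gap in draw_gaps():
        trials.extend(order[i:i + gap])   # 3-5 regular trials ...
        trials.append(trials[-1])         # ... then a one-back repetition
        i += gap
    return trials                         # 180 trials: 144 regular + 36 catch
```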

Behavioral performance measures Button presses and response times were recorded for each subject during the experiment. Responses were recorded to
ensure that participants were directing their attention towards the stimuli. Response trials were excluded from analysis.

Acquisition
Imaging type(s) functional and structural MRI

Field strength 3 Tesla

Sequence & imaging parameters We acquired functional images covering the entire brain using a T2*-weighted gradient-echo planar sequence (TR = 2 s, TE = 30 ms, 70° flip angle, 3-mm isotropic voxel size, 37 slices, 20% gap, 192-mm field of view, 64 × 64 matrix size, interleaved acquisition).

Area of acquisition Whole brain.

Diffusion MRI Not used

Preprocessing
Preprocessing software We preprocessed fMRI data using SPM8. This involved realignment, coregistration and normalization to the structural MNI template brain. The fMRI data from the localizer were smoothed with an 8-mm FWHM Gaussian kernel; the main-experiment data were left unsmoothed.
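As an illustration, the selective smoothing step could look as follows in Python with nilearn (a hypothetical stand-in; the actual preprocessing was done in SPM8, and the file name is made up):

```python
from nilearn import image

# Smooth only the localizer data with an 8-mm FWHM Gaussian kernel;
# main-experiment runs enter the GLM unsmoothed.
localizer_smoothed = image.smooth_img('sub-01_localizer_bold.nii', fwhm=8)
```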

Normalization The normalization method applied on all functional brain data was non-linear. We entered the subject specific T1 structural
image as source image and the MNI standard T1 provided in the SPM8 toolbox as template image.

Normalization template We used the T1 template in MNI space provided in the SPM8 toolbox.

Noise and artifact removal To remove movement artifacts from the fMRI time-series, we realigned the functional brain images in SPM8 using default
parameters. In the GLM, movement parameters were entered as nuisance regressors. We applied no artifact removal for
heart rate and respiration.

Volume censoring Was not applied.

Statistical modeling & inference


Model type and settings We performed multivariate pattern analysis on the brain activity data. Specifically, we trained and tested support-vector
machines on the individual participants' data and performed a statistical analysis on classification results.

Effect(s) tested Whole-brain: for all voxels, we tested whether classification accuracies significantly exceeded chance level. This was done separately for the three background conditions (no, low and high background clutter).
ROI: using a repeated-measures ANOVA with a 5×3 design, we tested for the interaction between 5 regions of interest in the ventral stream (V1, V2, V3, V4, LOC) and the 3 background conditions. Another repeated-measures ANOVA with a 7×3 design tested the interaction between 7 regions of interest in the dorsal stream (V1, V2, V3, IPS0, IPS1, IPS2, SPL) and the 3 background conditions. When the assumption of sphericity was violated, the degrees of freedom were corrected using the Greenhouse-Geisser estimates of sphericity.
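For illustration, the ROI-level interaction test could be sketched in Python as below. This is a hypothetical re-implementation with statsmodels, not the authors' pipeline; the column names are made up, and note that AnovaRM itself does not apply the Greenhouse-Geisser correction described above, which would have to be applied separately.

```python
import pandas as pd
from statsmodels.stats.anova import AnovaRM

def roi_by_background_anova(df: pd.DataFrame) -> pd.DataFrame:
    # df: long format, one row per participant x ROI x background condition,
    # with columns 'subject', 'roi', 'background' and 'accuracy' (% decoding).
    res = AnovaRM(df, depvar='accuracy', subject='subject',
                  within=['roi', 'background']).fit()
    return res.anova_table   # the 'roi:background' row is the interaction
```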

Specify type of analysis: Both (whole brain and ROI-based)

Anatomical location(s) We first defined ROIs in early visual cortex (V1, V2, V3), in the ventral stream (V4, LOC) and in the dorsal stream (IPS0, IPS1, IPS2, SPL) using anatomical masks from a probabilistic atlas (Wang et al., 2015) for both hemispheres combined. To avoid overlap between the ROI masks we removed all overlapping voxels. In a second step we selected the 325 most activated voxels in the participant-specific localizer results, using the objects > scrambled contrast for LOC and the objects & scrambled objects > baseline contrast for the remaining ROIs. This yielded participant-specific ROI definitions.
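A minimal sketch of this voxel-selection step, assuming flattened NumPy arrays for the anatomical mask and the participant-specific localizer t-map (variable names are hypothetical; the original analysis used SPM8/MATLAB):

```python
import numpy as np

def define_roi(anat_mask, localizer_t, n_vox=325):
    # anat_mask: boolean array over voxels (atlas mask, overlaps removed);
    # localizer_t: t-values from the participant-specific localizer contrast.
    idx = np.flatnonzero(anat_mask)
    top = idx[np.argsort(localizer_t[idx])[::-1][:n_vox]]
    roi = np.zeros_like(anat_mask, dtype=bool)
    roi[top] = True                      # the n_vox most activated voxels
    return roi
```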

Statistic type for inference (see Eklund et al. 2016) We tested whether classification accuracies significantly exceeded chance level, per ROI in the ROI-based analysis and voxel-wise in the whole-brain searchlight. In both cases we used non-parametric, two-tailed Wilcoxon signed-rank tests; the null hypothesis was that the observed classification accuracies came from a distribution with a median at chance-level performance (i.e., 50% for pairwise classification).

Correction The P-values resulting from the Wilcoxon signed-rank tests were corrected for multiple comparisons using the false discovery rate at the 5% level, under the assumption of independent or positively correlated tests.
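A compact Python sketch of this inference step (a hypothetical re-implementation using SciPy and statsmodels rather than the original MATLAB pipeline; `acc` is assumed to be a participants × tests array of classification accuracies, where tests are ROIs or searchlight voxels):

```python
import numpy as np
from scipy.stats import wilcoxon
from statsmodels.stats.multitest import multipletests

def above_chance(acc, chance=50.0, q=0.05):
    # Two-sided Wilcoxon signed-rank test against chance, per ROI/voxel.
    pvals = np.array([wilcoxon(acc[:, j] - chance).pvalue
                      for j in range(acc.shape[1])])
    # Benjamini-Hochberg FDR, valid under independence or positive correlation.
    reject, p_fdr, _, _ = multipletests(pvals, alpha=q, method='fdr_bh')
    return reject, p_fdr
```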

Models & analysis


n/a: Functional and/or effective connectivity; Graph analysis
Involved in the study: Multivariate modeling or predictive analysis

Multivariate modeling and predictive analysis For the ROI-based analysis, for each ROI separately we extracted and arranged t-values into pattern vectors for each of the 48 conditions and 10 runs. To increase the SNR, we randomly binned the run-wise pattern vectors into five bins of two runs each, which were averaged, resulting in five pseudo-run pattern vectors. We then performed 5-fold leave-one-pseudo-run-out cross-validation: we assigned four pseudo-trials per location condition of the same category to the training set, and tested the SVM on one pseudo-trial for each of the same two location conditions, but now from a different category, yielding percent classification accuracy (50% chance level) as output. Equivalent SVM training and testing was repeated for all combinations of location and category pairs before results were averaged. The result reflects how much category-tolerant location information was present for each ROI, participant and background condition separately (see the sketch below).

The searchlight procedure was conceptually equivalent to the ROI-based analysis, differing only in the selection of voxel patterns entering the analysis.
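A minimal Python sketch of this cross-validation scheme, assuming a hypothetical array layout `pat[run, location, category, voxel]` for one ROI and one background condition, with scikit-learn's linear SVC standing in for the LIBSVM classifier used in the original analysis:

```python
import itertools
import numpy as np
from sklearn.svm import SVC

def location_decoding(pat, rng=None):
    # pat: (10 runs, n_locations, n_categories, n_voxels) pattern vectors.
    if rng is None:
        rng = np.random.default_rng(0)
    pairs = rng.permutation(pat.shape[0]).reshape(5, 2)
    ps = np.stack([pat[p].mean(axis=0) for p in pairs])    # 5 pseudo-runs
    n_loc, n_cat = ps.shape[1], ps.shape[2]
    accs = []
    for l1, l2 in itertools.combinations(range(n_loc), 2):
        for c_tr, c_te in itertools.permutations(range(n_cat), 2):
            for fold in range(5):              # leave-one-pseudo-run-out
                train = np.delete(np.arange(5), fold)
                # Train on 4 pseudo-trials per location from one category ...
                X_tr = np.r_[ps[train, l1, c_tr], ps[train, l2, c_tr]]
                y_tr = np.r_[np.zeros(4), np.ones(4)]
                # ... test on the held-out pseudo-run from the other category.
                X_te = np.stack([ps[fold, l1, c_te], ps[fold, l2, c_te]])
                clf = SVC(kernel='linear').fit(X_tr, y_tr)
                accs.append(clf.score(X_te, [0, 1]) * 100)
    return np.mean(accs)   # % category-tolerant location decoding; chance 50%
```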
