Pedestrian Detection: Domain Generalization, CNNs, Transformers and Beyond

arXiv:2201.03176v2 [[Link]] 2 Mar 2022

Abstract—Pedestrian detection is the cornerstone of many vision-based applications, ranging from object tracking to video
surveillance and, more recently, autonomous driving. With the rapid development of deep learning in object detection, pedestrian
detection has achieved very good performance in the traditional single-dataset training and evaluation setting. However, in this study on
generalizable pedestrian detectors, we show that current pedestrian detectors handle even small domain shifts poorly in cross-dataset
evaluation. We attribute the limited generalization to two main factors: the methods and the current sources of data. Regarding the
methods, we illustrate that the bias present in the design choices (e.g., anchor settings) of current pedestrian detectors is the main
contributing factor to the limited generalization. Most modern pedestrian detectors are tailored to a target dataset, where they do
achieve high performance in the traditional single-dataset training and testing pipeline, but suffer a degradation in performance when
evaluated through cross-dataset evaluation. Consequently, a general object detector performs better in cross-dataset evaluation than
state-of-the-art pedestrian detectors, owing to its generic design. As for the data, we show that autonomous driving benchmarks are
monotonous in nature, that is, they are neither diverse in scenarios nor dense in pedestrians. Therefore, benchmarks curated by crawling
the web (which contain diverse and dense scenarios) are an efficient source of pre-training for providing a more robust representation.
Accordingly, we propose a progressive fine-tuning strategy which improves generalization. Additionally, this work also investigates
recent Transformer networks as backbones to test generalization. We demonstrate that, as of now, CNNs outperform Transformer
networks in terms of generalization and in absorbing large-scale datasets for learning robust representations. In conclusion, this paper
suggests a paradigm shift towards cross-dataset evaluation for the next generation of pedestrian detectors. Code and models can be
accessed at [Link]

Index Terms—Pedestrian detection, Object detection, Generalizable pedestrian detection, Autonomous driving, Surveillance
Fig. 1: Left: Pedestrian detection performance over the years for Caltech, CityPersons and EuroCityPersons on the reasonable subset.
EuroCityPersons was released in 2018, but we include results of a few older models on it as well. The dotted line marks human
performance on Caltech. Right: Comparison between traditional single-dataset train-and-test evaluation on Caltech [13] and
cross-dataset evaluation for three pedestrian detectors and one general object detector (Cascade R-CNN). Methods enclosed in
bounding boxes are trained on CityPersons [52] and evaluated on Caltech [13], while the others are trained on Caltech.
in Table 4. Thirdly, these datasets have limited diversity, as they are captured during a small number of sessions by a small team, primarily for dataset creation. These days, dashcam videos are widely available online, e.g., on YouTube and Facebook, enabling the potential curation of much more diverse and realistic datasets. In recent years, a few large and diverse person detection datasets, e.g., CrowdHuman [39], WiderPerson [54] and Wider Pedestrian [31], have been created using images and videos available online and through surveillance cameras. These datasets advance general person detection research significantly, but they are not the most suitable datasets for pedestrian detection, as they contain people in far more diverse scenarios than are relevant for autonomous driving. Nevertheless, they are still beneficial for learning a more general and robust model of pedestrians, as they contain more people per image and are likely to contain more human poses, appearances and occlusion scenarios, which is beneficial for autonomous driving scenarios, provided current pedestrian detectors have the capacity to digest these large-scale data.

In this paper, we investigate the performance characteristics of current pedestrian detection methods in the cross-dataset setting. We show that 1) the existing methods fare poorly compared to general object detectors, without any adaptations, when provided with larger and more diverse datasets, and 2) when carefully trained, state-of-the-art general object detectors, without any pedestrian-specific adaptation on the target data, can significantly outperform pedestrian-specific detection methods on the pedestrian detection task (see Fig. 1 right). In addition, we propose a progressive training pipeline for better utilization of general person datasets to improve pedestrian detection performance in autonomous driving scenarios. We show that by progressively fine-tuning the models, from the dataset furthest from the target domain to the dataset closest to the target domain, large gains in performance can be achieved in terms of MR−2 on the reasonable subset of Caltech (3.7%) and CityPersons (1.5%) without fine-tuning on the target domain. These improvements hold true for models from all pedestrian detection families that we tested, such as Cascade R-CNN [8], Faster R-CNN [38] and embedded-vision-based backbones such as MobileNet [21]. Finally, we also compare the generalization ability of CNNs against the recent Transformer network (Swin Transformer) [30]. We illustrate that, despite the superior performance of Swin Transformer [30], it struggles when the domain shift is large, in comparison with CNNs. To the best of our knowledge, this is the first study to objectively illustrate this.

The paper is organized in the following way. Section 2 reviews the relevant literature. We introduce the datasets and evaluation protocol in Sec. 3. We benchmark our baseline in Sec. 4. We test the generalization capabilities of pedestrian-specific and general object detectors in Sec. 5, along with qualitative results. Subsequently, we compare CNNs with Transformer networks in Sec. 6. We also discuss the effect of fine-tuning on the target set in Sec. 7. Finally, we conclude the paper in Section 8.

2 RELATED WORK

Pedestrian detection. Prior to CNNs, the pioneering work of Viola and Jones [43], which slid windows over all scales and locations, motivated many pedestrian detection methods. To better describe the features of pedestrians, the Histogram of Oriented Gradients (HOG) was presented in the work of Dalal and Triggs [11]. The aggregate channel feature (ACF) detector leveraged features in extended channels to improve the speed of pedestrian detection [12]. In similar ways, the pedestrian detectors in [52], [36] employed spatial pooling with low-level features and filtered channel features, respectively. Nonetheless, their performance and generalization ability were still limited by the hand-crafted features.

With the great progress of Convolutional Neural Networks (CNNs), they came to dominate the research field of generic object detection [38], [19], [42], [26] and considerably improved its accuracy. Pedestrian detectors [1], [20], [7] also benefit from this powerful paradigm. The R-CNN detector [16] was utilized in some of the pioneering efforts for pedestrian detection using CNNs [20], [51] and is still widely employed in this research field. RPN+BF [50] combined a Region Proposal Network and a boosted forest to enhance the performance of pedestrian detection, which overcame the problems of poor resolution and imbalanced classes
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 3
in Faster R-CNN [38]. Although the performance of RPN+BF was outstanding, its learning ability was limited by the non-end-to-end trainable architecture. Owing to the strong performance and high extensibility of Faster R-CNN [38], it inspired a broad spectrum of pedestrian detectors [55], [52], [6], [5], [34].

Some pedestrian detection methods designed more sophisticated architectures and leveraged extra information to further boost detection performance. ALF [28] employed several progressive detection heads on the Single Shot MultiBox Detector (SSD) [26] to gradually refine the initial anchors, which inherited the merit of high efficiency from single-stage detectors and further improved detection accuracy. Inspired by blob detection, CSP [29] reformulated the pedestrian detection task in an anchor-free manner, which only needs to locate the center points and regress the scales of pedestrians without relying on complicated anchor-box settings. To improve detection performance on occluded pedestrians, extra information from the visible-area bounding box was utilized as guidance for the attention mask in MGAN [37].

Pedestrian detection benchmarks. Due to the large practical value of pedestrian detection, a lot of work has been devoted to creating benchmarks to promote the development of pedestrian detection, such as Daimler-DB [35], TownCenter [2], USC [47], INRIA [11], ETH [14] and TUDBrussels [46], which were all from surveillance scenarios and not suitable for applications in autonomous driving. Recently, the great progress of pedestrian detection also attracted the attention of the autonomous driving community, and several datasets were created for this context, such as Caltech [13], KITTI [15], CityPersons [52] and ECP [4]. The cameras in these datasets were typically installed on the front windshield of cars to collect images from a similar field of view as human drivers. Caltech [13] and CityPersons [52] are the most popular benchmarks for recent learning-based pedestrian detectors, but their small data sizes and monotonous scenarios limit their usefulness for training more robust methods. To address these limitations, the ECP [4] dataset collected images from diverse scenarios including various cities, all seasons, and day and night times, and it contains almost ten times more images and eight times more persons than CityPersons [52]. Although ECP [4] has a much larger scale, it still suffers from a low density of persons and high similarity of background scenes, which could be the focus of future datasets. Thus, in this work, we argue that the low density and diversity of these datasets constrain the generalization ability of pedestrian detectors, while web-crawled datasets such as CrowdHuman [39], WiderPerson [54] and Wider Pedestrian [31], which include much more diverse scenes and denser crowds, may increase the upper bound of pedestrian detectors' generalization ability.

Cross-dataset evaluation. Some existing works [52], [4], [39] explored the relations between the performance of pedestrian detectors and training datasets, whose purpose was to show how much performance advantage could be obtained on target datasets by pre-training on more diverse and dense datasets. In this work, by contrast, we aim to thoroughly evaluate the generalization abilities of some popular pedestrian detection methods using cross-dataset evaluation.

3 EXPERIMENTS

3.1 Experimental Settings

Datasets. We conduct extensive experiments on three public pedestrian detection datasets collected from the autonomous driving scenario to evaluate and compare with state-of-the-art pedestrian detection algorithms. These three benchmarks, Caltech [13], CityPersons [52] and EuroCity Persons [4], are categorized as the autonomous driving datasets in this work. Caltech [13] is one of the most popular datasets in the research field of pedestrian detection. It recorded 10 hours of video in Los Angeles, USA with a front-view vehicle camera, and contains roughly 43K images and 13K persons extracted from the video. We perform the evaluations on the refined Caltech annotations from [51]. Compared to Caltech, CityPersons [52] is built upon the Cityscapes dataset and contains more diverse scenarios. It was recorded from street scenes of different cities in and close to Germany. CityPersons contains 2,975, 500 and 1,575 images in the training, validation and testing sets, respectively, and provides full bounding boxes and visible bounding boxes for 31k pedestrians. EuroCity Persons (ECP) [4] is a recently released large-scale dataset recorded in 31 different European cities, which contains more diverse scenarios and is more challenging for pedestrian detectors compared to Caltech and CityPersons. It contains two subsets, ECP day-time and ECP night-time, based on the recording time, and has roughly 200K bounding boxes. Similar to the evaluation procedure in ECP [4], we conduct the experiments on the ECP day-time subset for fair comparison with the existing literature. Unless otherwise stated, all experimental results are from the validation sets, since frequent submission to the online testing server is not allowed. In addition to the autonomous driving datasets of Caltech, CityPersons and ECP, we further conduct experiments on two web-crawled datasets, CrowdHuman and Wider Pedestrian1 [31]. We provide more details of the above datasets in Table 2.

Evaluation protocol. We evaluate the performance of pedestrian detectors with the widely used metric of log-average miss rate over False Positives Per Image (FPPI), computed over the range [10−2, 100] (MR−2), on Caltech [13], CityPersons [52] and ECP [4]. Experimental results on different occlusion levels, including Reasonable, Small, Heavy, Heavy*2 and All, are reported unless stated otherwise. Table 1 provides the specific settings of each set.

TABLE 1: Experimental settings.

Setting      Height     Visibility
Reasonable   [50, inf]  [0.65, inf]
Small        [50, 75]   [0.65, inf]
Heavy        [50, inf]  [0.2, 0.65]
Heavy*       [50, inf]  [0.0, 0.65]
All          [20, inf]  [0.2, inf]

Cross-dataset evaluation. To evaluate the generalization ability of pedestrian detectors, we perform cross-dataset evaluation by using only the training set of dataset A to train models and directly testing them on the validation/testing set of dataset B. This training and testing procedure is consistent for all experiments and is denoted as A→B.

Baseline. Because most of the high-performance pedestrian detectors on the Caltech, CityPersons and ECP datasets are built upon the two-stage detectors Faster/Mask R-CNN [38],

1. Wider Pedestrian contains images from the scenarios of autonomous driving and surveillance. The data provided in the 2019 challenge was used in our experiments. Data can be downloaded from: [Link] org/competitions/20132
2. In the case of CityPersons, for fair comparison with some previous methods under the same setting, we also report the numbers under the visibility level between [0.0, 0.65], which is denoted as Heavy* occlusion.
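The MR−2 protocol above can be made concrete with a short sketch. The helpers below are a simplified illustration, not the official Caltech evaluation code: `subset_mask` applies the height/visibility ranges of Table 1 (default: Reasonable), and `log_average_miss_rate` averages a detector's miss rate, in log space, at nine FPPI reference points log-spaced over [10−2, 100]. The miss-rate-vs-FPPI curve is assumed to be given.

```python
import math

def subset_mask(heights, visibilities,
                h_range=(50, float("inf")), v_range=(0.65, float("inf"))):
    """Select ground-truth boxes for one setting of Table 1 (default: Reasonable)."""
    return [h_range[0] <= h <= h_range[1] and v_range[0] <= v <= v_range[1]
            for h, v in zip(heights, visibilities)]

def log_average_miss_rate(fppi, miss_rate, lo=1e-2, hi=1.0, n_points=9):
    """MR^-2: geometric mean of the miss rate sampled at n_points FPPI values
    log-spaced over [lo, hi]. `fppi` must be sorted ascending; (fppi[i],
    miss_rate[i]) is one operating point of the detector's curve."""
    # reference FPPI values, evenly spaced in log-space
    samples = [10 ** (math.log10(lo) + i * (math.log10(hi) - math.log10(lo)) / (n_points - 1))
               for i in range(n_points)]
    rates = []
    for s in samples:
        # miss rate at the last operating point whose FPPI does not exceed s;
        # if the curve never reaches that FPPI, fall back to its first point
        candidates = [m for f, m in zip(fppi, miss_rate) if f <= s]
        rates.append(candidates[-1] if candidates else miss_rate[0])
    # geometric mean, as in the standard Caltech protocol
    return math.exp(sum(math.log(max(r, 1e-10)) for r in rates) / len(rates))
```

For a detector whose miss rate is constant at 0.5 over the whole FPPI range, the log-average is simply 0.5, which is a convenient sanity check for the implementation.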
TABLE 3: Evaluating generalization abilities of different backbones using our baseline detector.
Backbone Training Testing Reasonable
HRNet WiderPedestrian + CrowdHuman CityPersons 12.8
ResNeXt WiderPedestrian + CrowdHuman CityPersons 12.9
ResNet-101 WiderPedestrian + CrowdHuman CityPersons 15.8
ResNet-50 WiderPedestrian + CrowdHuman CityPersons 16.0
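Since MR−2 is an error metric (lower is better), the backbone comparison in Table 3 reduces to sorting the per-backbone numbers. The snippet below simply restates the table's values to make the ranking explicit:

```python
# MR^-2 (%) on CityPersons for each backbone, copied from Table 3 (lower is better)
table3 = {"HRNet": 12.8, "ResNeXt": 12.9, "ResNet-101": 15.8, "ResNet-50": 16.0}

# rank backbones from best (lowest error) to worst
ranking = sorted(table3, key=table3.get)
```

The two top-ranked backbones are HRNet and ResNeXt, matching the discussion of the baseline below.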
[19], the more powerful multi-stage detector Cascade R-CNN [8], which also belongs to the R-CNN family, is chosen as the baseline. In this work, the terms baseline and Cascade R-CNN are used interchangeably, both referring to the same method of Cascade R-CNN [8]. Multiple detection heads are applied step by step in Cascade R-CNN to gradually filter out false-positive anchors and generate increasingly high-quality proposals. We equipped our baseline with different backbones to evaluate its robustness; the details of these experiments are shown in Table 3. Among them, ResNeXt [48] and HRNet [44] show the top-ranked performance. Unless stated otherwise, HRNet [44] is used as the default backbone network of our baseline. HRNet simultaneously processes multi-level feature maps in a parallel way, retaining both low-level details and high-level semantic information, which may greatly benefit pedestrian detection under large scale variations.

4 BENCHMARKING

The benchmarking results of our baseline, Cascade R-CNN [8], on three autonomous driving datasets, Caltech [13], CityPersons [52] and ECP [4], are presented in Table 4. Without any "bells and whistles", our baseline achieves performance comparable to the specially customized pedestrian detectors on Caltech and CityPersons. Interestingly, the performance gap between our baseline and the state-of-the-art algorithms changes as the sizes of the datasets increase. The relative performance of our baseline is lowest on the smallest dataset, Caltech, and improves significantly on the largest dataset, ECP.

TABLE 4: Benchmarking on autonomous driving datasets.

Method             Testing      Reasonable  Small  Heavy
ALFNet [28]        Caltech      6.1         7.9    51.0
RepLoss [45]       Caltech      5.0         5.2    47.9
CSP [29]           Caltech      5.0         6.8    46.6
Cascade R-CNN [8]  Caltech      6.2         7.4    55.3
RepLoss [45]       CityPersons  13.2        -      -
ALFNet [28]        CityPersons  12.0        19.0   48.1
CSP [29]           CityPersons  11.0        16.0   39.4
Cascade R-CNN [8]  CityPersons  11.2        14.0   37.1
Faster R-CNN [4]   ECP          7.3         16.6   52.0
YOLOv3 [4]         ECP          8.5         17.8   37.0
SSD [4]            ECP          10.5        20.5   42.0
Cascade R-CNN [8]  ECP          6.6         13.6   33.3

5 GENERALIZATION CAPABILITIES

As previously mentioned, existing works evaluate pedestrian detectors in the traditional manner, where training and evaluation data are from the same domain, i.e., within-dataset evaluation. However, we argue that this algorithm development pipeline ignores the generalization capability of pedestrian detectors and makes it easy to over-fit to a specific dataset. Thus, in this work, we emphasize the importance of cross-dataset evaluation in the design of pedestrian detectors: cross-dataset evaluation clearly shows how well pedestrian detectors perform on unseen domains. Extensive cross-dataset experiments are therefore conducted in this section to evaluate the robustness of pedestrian detectors.

5.1 Dataset Illustrations

We showcase some examples of datasets related to pedestrian detection in Figure 2. The top row depicts different scenarios in diverse and dense datasets collected by crawling the web. The bottom row illustrates images from traditional autonomous driving datasets. It can be observed that web-crawled datasets provide a more enriched representation of pedestrians, since they cover several scenarios, such as different poses, illumination conditions and different types of occlusion, whereas autonomous driving benchmarks are monotonous in nature, i.e., same background, view-point, etc. Interestingly, ECP [4] and CityPersons [52] show a striking resemblance (where the camera is mounted, image resolution, geographical location, etc.); this further stresses the point that even when the target domains are not drastically different, current pedestrian detectors do not generalize well (cf. Tables 5, 6 and 7 in the paper).

5.2 Cross-Dataset Evaluation of Existing State-of-the-Art

In this section we demonstrate that existing state-of-the-art pedestrian detectors generalize worse than a general object detector. We show that this is mainly due to the biases in the design of methods for the target set, even when other factors, such as the backbone, are kept consistent.

To see how well state-of-the-art pedestrian detectors generalize to different datasets, we performed cross-dataset evaluation of five state-of-the-art pedestrian detectors and our baseline (Cascade R-CNN) on the CityPersons [52] and Caltech [13] datasets. We evaluated the recently proposed BGCNet [23], CSP [29], PRNet [41], ALFNet [28] and FRCNN [52] (tailored for pedestrian detection). Furthermore, alongside the baseline, we added Faster R-CNN [38] without "bells and whistles", but with a more recent backbone, ResNeXt-101 [48] with FPN [24]. Moreover, we implemented a vanilla FRCNN [52] with VGG-16 [40] as a backbone and with none of the pedestrian-specific adaptations proposed in [52] (namely quantized anchors, input scaling, finer feature stride, Adam solver, ignore-region handling, etc.).
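The A→B protocol used throughout these experiments reduces to a simple harness. The sketch below is a hypothetical illustration, not the paper's actual code: `train_on` and `evaluate_on` are placeholder callables standing in for the real training and evaluation routines, and `evaluate_on` is assumed to return MR−2 (lower is better). Each detector is fitted once on the source training set and then scored, unchanged, on every target set.

```python
def cross_dataset_eval(detectors, source, targets, train_on, evaluate_on):
    """Train each detector on `source` only, then test it unchanged on each target.

    detectors:   {name: untrained detector / config}
    train_on:    callable(detector, dataset_name) -> trained model
    evaluate_on: callable(model, dataset_name) -> MR^-2 score
    Returns {(detector_name, "source->target"): mr2}.
    """
    results = {}
    for name, detector in detectors.items():
        model = train_on(detector, source)   # target-domain data is never touched
        for target in targets:
            results[(name, f"{source}->{target}")] = evaluate_on(model, target)
    return results
```

Within-dataset numbers (e.g. CityPersons→CityPersons) fall out of the same loop by including the source itself among the targets, which is how the readability columns of Table 5 can be produced alongside the cross-dataset ones.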
Fig. 2: Illustration of benchmarks. The top row shows images from diverse and dense datasets, such as CrowdHuman [39] and Wider
Pedestrian [31]. The bottom row presents images from autonomous driving benchmarks: ECP [4], CityPersons [52] and Caltech [13].
TABLE 5: Cross dataset evaluation on Caltech and CityPersons. A→B refers to training on A and testing on B.
TABLE 6: Cross-dataset evaluation of Cascade R-CNN and CSP on autonomous driving benchmarks. Both detectors are trained with
HRNet as the backbone.
Method Training Testing Reasonable Small Heavy
Casc. RCNN CityPersons CityPersons 11.2 14.0 37.0
CSP CityPersons CityPersons 9.4 11.4 36.7
Casc. RCNN ECP CityPersons 10.9 11.4 40.9
CSP ECP CityPersons 11.5 16.6 38.2
Casc. RCNN ECP ECP 6.9 12.6 33.1
CSP ECP ECP 19.4 50.4 57.3
Casc. RCNN CityPersons ECP 17.4 40.5 49.3
CSP CityPersons ECP 19.6 51.0 56.4
Casc. RCNN CityPersons Caltech 8.8 9.8 28.8
CSP CityPersons Caltech 10.1 13.3 34.4
Casc. RCNN ECP Caltech 8.1 9.6 29.9
CSP ECP Caltech 10.4 13.7 31.3
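Table 6 above allows a quick consistency check of the CityPersons→ECP comparison. The snippet below recomputes, from the table's values, the per-setting margin by which Cascade R-CNN outperforms CSP:

```python
# MR^-2 (%) from Table 6, CityPersons -> ECP rows (lower is better)
casc = {"reasonable": 17.4, "small": 40.5, "heavy": 49.3}
csp  = {"reasonable": 19.6, "small": 51.0, "heavy": 56.4}

# margin by which Cascade R-CNN beats CSP in each setting
gaps = {k: round(csp[k] - casc[k], 1) for k in casc}
```

The resulting gaps (2.2, 10.5 and 7.1 points in the reasonable, small and heavy settings) agree with the differences quoted in the discussion of Sec. 5.3.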
We present results for Caltech and CityPersons in Table 5. For readability, we also report results when training is done on the target dataset. For the results presented in Table 5 (fourth column, CityPersons→Caltech), we trained each detector on CityPersons and tested it on Caltech. Similarly, in the last column of Table 5, all detectors were trained on Caltech and evaluated on the CityPersons benchmark. As expected, all methods suffer a performance drop when trained on CityPersons and tested on Caltech. In particular, BGCNet [23], CSP [29], ALFNet [28] and FRCNN [52] degraded by more than 100% (in comparison with the fifth column, Caltech→Caltech), whereas in the case of Cascade R-CNN [8], performance remained comparable to the model trained and tested on the target set. Since CityPersons is a relatively diverse and dense dataset in comparison with Caltech, this performance deterioration cannot be linked to dataset scale and crowd density. This illustrates the better generalization ability of
general object detectors over state-of-the-art pedestrian detectors. Moreover, it is noteworthy that BGCNet [23], like Cascade R-CNN [8], also uses HRNet [44] as a backbone, making it directly comparable to Cascade R-CNN [8].

Importantly, the pedestrian-specific FRCNN [52] performs worse in the cross-dataset setting (fourth column only) than its direct variant, vanilla FRCNN. The only difference between the two is the pedestrian-specific adaptations for the target set, highlighting the bias in the design of tailored pedestrian detectors. Similarly, standard Faster R-CNN [38], though it performs worse than FRCNN [52] when trained and tested on the target dataset, performs better than FRCNN [52] when it is evaluated on Caltech without any training on Caltech.

It is noteworthy that Faster R-CNN [38] also outperforms state-of-the-art pedestrian detectors (except for BGCNet [23]) in cross-dataset evaluation, as presented in Table 5. We again attribute this to the bias present in the design of current state-of-the-art pedestrian detectors, which are tailored for specific datasets and therefore limited in their generalization ability. Moreover, a significant performance drop for all methods (though the ranking is preserved, except for vanilla FRCNN), including Cascade R-CNN [8], can be seen in the last column of Table 5. This performance drop, however, is attributed to the lack of diversity and density of the Caltech dataset: Caltech has fewer annotations than CityPersons, and the number of people per frame is less than 1, as reported in Table 2. Still, it is important to highlight that, even when trained on a limited dataset, general object detectors are usually better at generalization than state-of-the-art pedestrian detectors. Interestingly, Faster R-CNN's [38] error is nearly twice as high as that of BGCNet [23] in within-dataset evaluation, whereas it outperforms BGCNet [23] in cross-dataset evaluation.

As discussed previously, most pedestrian detection methods are extensions of general object detectors (FR-CNN, SSD, etc.) adapted to the task of pedestrian detection. These adaptations are often too specific to the dataset or detector/backbone (e.g., anchor settings [52], [28], finer stride [52], additional annotations [55], [37], constrained aspect ratios and fixed body-line annotation [29], [23], etc.). Such adaptations usually limit generalization, as shown in Table 5; likewise, task-specific configurations of anchors limit generalization, as discussed in [27].

5.3 Autonomous Driving Datasets for Generalization

We show that general object detectors outperform existing pedestrian detection methods (such as CSP [29]) at learning a generic feature representation for pedestrians, even when they are trained on a large dataset (such as ECP) and tested on a small dataset (such as Caltech). Furthermore, detectors achieve higher generalization from larger and denser autonomous driving datasets.

As shown in the last section, cross-dataset evaluation can shed light on the generalization capabilities of pedestrian detectors. Moreover, the characteristics of a dataset are also an important determinant of model generalization. The intrinsic nature of the real world can be captured more effectively by diverse datasets [4]. Consequently, such datasets potentially provide the chance for pedestrian detectors to learn a more generic feature representation that robustly tackles domain shifts. Instead of exploring the impact of the dataset on generalization as in existing studies [4], [39], [54], we aim at presenting a detailed comparison of a general object detector and state-of-the-art pedestrian detection methods when the training and testing datasets are varied. For fair comparison, the backbone of CSP [29] is replaced from ResNet-50 with HRNet [44]. As shown in the second row of Table 6, this change improves the performance of CSP from 11.0% MR−2 to 9.4% MR−2.

First, we train Cascade R-CNN and CSP on the ECP pedestrian detection dataset, which contains more countries and cities and is the largest benchmark with regard to pedestrian density and diversity in the context of autonomous driving. CityPersons [52] is chosen as the evaluation benchmark, and the results are shown in the third and fourth rows of Table 6, respectively. It is clear that, given the same backbone, Cascade R-CNN generalizes better to CityPersons than CSP in the reasonable setting. Considering that CSP significantly outperforms Cascade R-CNN by nearly 2% MR−2 when they are evaluated in the within-dataset setting, it is surprising to see the results turned around.

Secondly, we train CSP and Cascade R-CNN on CityPersons and evaluate them on ECP [4] to further study their generalization abilities under training datasets of different diversity. Similarly, even when the training dataset has low diversity, Cascade R-CNN still outperforms CSP. The performance difference is 10.5% MR−2, 7.1% MR−2 and 2.2% MR−2 in the small, heavy and reasonable settings, respectively.

Finally, we combine CityPersons and ECP as the training data and perform the evaluation on Caltech, which is the smallest data source. The results of Cascade R-CNN and CSP in all settings are shown in the last four rows of Table 6. We conclude that when we use a diverse and dense training dataset such as ECP, Cascade R-CNN has more robust performance than CSP on all evaluation subsets.

5.4 Diverse General Person Detection Datasets for Generalization

We study how much performance improvement dense and diverse datasets can bring. When the testing source is a small dataset from the context of autonomous driving, such as Caltech [13], diverse and dense datasets are still beneficial for generalization, even under large domain gaps between the training and evaluation datasets. Moreover, diverse and dense datasets bring more benefits to general object detection methods, such as Cascade R-CNN, than to specially tailored pedestrian detectors, such as CSP.

CrowdHuman [39] and Wider Pedestrian [31] are two diverse and dense pedestrian detection datasets collected from web crawling and surveillance cameras. Unlike the autonomous driving datasets, the crowd density and scenario diversity are large in CrowdHuman [39] and Wider Pedestrian [31], since they include images from diverse sources, such as street views and surveillance, which increases the data diversity in a different way. Thus, they are ideal sources for pre-training models. We pre-train Cascade R-CNN [8] and CSP [29] on the CrowdHuman [39] and Wider Pedestrian [31] datasets, and show the corresponding results in Table 7. It can be seen that pre-training significantly boosts the performance of pedestrian detectors. When tested on the Caltech dataset, Cascade R-CNN outperforms all previous methods that are trained only on Caltech, and the test error is reduced by nearly half. The same trend of performance improvement is observed in the results of CSP [29], although its improvement is smaller than that of Cascade R-CNN. The performance of neither Cascade R-CNN nor CSP improves on CityPersons [52] when they are trained on CrowdHuman [39]. This is reasonable because CityPersons [52]
TABLE 8: Investigating the effect on performance when CrowdHuman, Wider Pedestrian and ECP are merged and Cascade R-CNN
[8] is trained only on the merged dataset.
Method Training Testing Reasonable Small Heavy
Casc. RCNN CrowdHuman → ECP CP 10.3 12.6 40.7
CSP CrowdHuman → ECP CP 10.4 10.0 36.2
Casc. RCNN Wider Pedestrian → ECP CP 9.7 11.8 37.7
CSP Wider Pedestrian → ECP CP 9.8 14.6 35.4
Casc. RCNN CrowdHuman → ECP Caltech 2.9 11.4 30.8
CSP CrowdHuman → ECP Caltech 11.0 14.7 32.2
Casc. RCNN Wider Pedestrian → ECP Caltech 2.5 9.9 31.0
CSP Wider Pedestrian → ECP Caltech 8.6 12.0 30.3
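Each A → B entry in Table 8 denotes sequential training: pre-train on the source further from the target domain, then fine-tune on the closer one, without ever touching the target-domain training set. A minimal sketch of that schedule follows; `fit` is a placeholder for one round of (pre-)training or fine-tuning, since the actual training loop is not shown in this excerpt:

```python
def progressive_training(model, ordered_sources, fit):
    """Progressive pipeline: fit the model on each source dataset in turn,
    ordered from furthest-from-target to closest-to-target, e.g.
    ["CrowdHuman", "ECP"] for the CrowdHuman -> ECP row of Table 8.

    fit: callable(model, dataset_name) -> updated model.
    The target-domain training data never appears in `ordered_sources`.
    """
    for dataset in ordered_sources:
        model = fit(model, dataset)
    return model
```

Merging sources instead (the A + B strategy) would correspond to a single `fit` call on the concatenated data; as discussed below, that variant improves performance but does not match the progressive schedule.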
is more difficult than Caltech [13] with regard to density and diversity. Similar trends are observed in Table 6 when detectors are tested on ECP [4]: training on CityPersons [52] brings better performance than training on CrowdHuman [39]. From the bottom half of Table 7, we can see that the general object detector benefits more from training on Wider Pedestrian [31]. We hypothesize that this is because Wider Pedestrian [31] is larger in scale and more similar to the target domain than CrowdHuman [39]. The domain difference is reflected in the image scenarios: CrowdHuman [39] includes web-crawled persons with diverse poses, while Wider Pedestrian [31] images come mainly from street views and surveillance cameras.

5.5 Progressive Training Pipeline

We propose a progressive training pipeline to take full advantage of multi-source datasets and thus further improve pedestrian detection performance. This pipeline first trains detectors on a dataset that is farther from the target domain but general and diverse enough, and then fine-tunes them on a dataset that is similar to the target domain.

Extensive experiments are conducted to demonstrate the value of progressive training. To be in line with the study described in the previous section, the target-domain dataset is not touched, and only the training subset of each corresponding dataset is used in our pipeline. We use the symbol A → B to denote pre-training a model on dataset A and fine-tuning it on dataset B. Besides, the two datasets A and B can also be directly merged to train the model, which is denoted as A + B. In this section, the Caltech [13] and CityPersons [52] datasets are respectively used as the evaluation benchmark, and the corresponding results are shown in Table 8. The upper part of Table 8 clearly shows that the performance of Cascade RCNN can be significantly improved by the progressive training pipeline. Noticeably, without training on the CityPersons [52] dataset, Cascade R-CNN achieves results comparable to the state-of-the-art detectors through the progressive training pipeline of Wider Pedestrian [31] → ECP [4]. Besides, progressive training also helps Cascade R-CNN achieve new state-of-the-art results on the Caltech [13] dataset. It is worth noting that our performance on Caltech [13] is very close to the human baseline (0.88).

Finally, we show the experimental results of directly merging all datasets in the third and fourth rows of Table 8. This training strategy can also improve the performance, but it still cannot reach the performance of the progressive training pipeline, which demonstrates the value of pre-training on a general dataset and then fine-tuning on the autonomous driving dataset. Without touching the data in the target domain, our progressive training helps to effectively improve the pedestrian detection performance of state-of-the-art detectors. These experiments demonstrate that our training pipeline offers a way to significantly improve the generalization capability of Cascade R-CNN, putting it on a level with state-of-the-art detectors on CityPersons [52] and achieving the best performance on Caltech [13].

5.6 Application Oriented Models

In this section, we conduct experiments to show that pre-training on dense and diverse datasets can help a light-weight neural network architecture, MobileNet [21], achieve results competitive with state-of-the-art detectors, such as CSP, on the CityPersons [52] dataset.

The computational cost and model size of pedestrian detectors are important factors in many real-world applications, such as drones and autonomous driving cars, which require real-time detection and usually run on limited hardware. To study whether the progressive training pipeline is still effective in improving the performance of a light-weight backbone, we conduct experiments with a widely used light-weight backbone, MobileNet [21] v2, proposed for embedded and mobile computer vision tasks.

We replace the backbone of Cascade R-CNN [8] with a MobileNet [21], and present the results on CityPersons [52] in
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 8
Table 9. To establish reference performance, we train and test MobileNet [21] on CityPersons [52], and show its results in the first row of Table 9. Intuitively, the performance of MobileNet [21] is lower than that of HRNet [44]. However, the results demonstrate that progressive training, i.e., pre-training MobileNet [21] on CrowdHuman [39] and then fine-tuning on ECP [4], still effectively improves the detection performance on CityPersons [52]. Moreover, further improvement can be achieved by replacing the pre-training dataset CrowdHuman [39] with Wider Pedestrian [31]. As shown in the first and fourth rows, our progressive pipeline of Wider Pedestrian [31] → ECP [4] improves the performance by 0.6% MR−2 on the reasonable subset of CityPersons [52]. The results in Tables 9 and 7 both demonstrate that pre-training on the Wider Pedestrian [31] dataset brings a larger performance improvement than CrowdHuman [39] when the evaluation benchmark is CityPersons [52]. This is because Wider Pedestrian [31] includes autonomous driving scenarios and shares more common characteristics with the target domain. It is worth noting that our progressive training pipeline makes Cascade R-CNN with a light-weight MobileNet [21] backbone approach the performance of the state-of-the-art method, CSP [29], which is equipped with a much larger ResNet-50 backbone.

TABLE 9: Investigating the performance of an embedded vision model when pre-trained on diverse and dense datasets.

Training Testing Reasonable Small Heavy
CP CP 12.0 15.3 47.8
ECP CP 19.1 19.3 51.3
CrowdHuman → ECP CP 11.9 15.7 48.9
Wider Pedestrian → ECP CP 11.4 14.6 43.4

5.7 Qualitative Results

We show the detection quality of Cascade R-CNN† on different datasets in Figure 3. The top row contains results from Caltech, and the bottom row shows images from CityPersons and ECP. One can observe that Cascade R-CNN† is robust to crowd density, as the presented images show several instances of occlusion, people walking in close vicinity, etc. Furthermore, we extracted images from different regions across the globe under varying conditions, shown in Figure 4. Four different conditions are showcased; for example, the Netherlands shows rainy conditions, where people are wearing hooded jackets and rain coats and holding umbrellas. In Italy, the summer season is illustrated, and people are walking in the city center in close vicinity, often occluding each other. The winter season can be seen in Germany, as snow is visible. Finally, France illustrates a low-illumination scenario, where car headlights can be seen illuminating the scene (also bringing reflections and shadows into play). The aim of this figure is to demonstrate the robustness of Cascade R-CNN†: since pre-training is done on web-crawled datasets with several diverse and dense scenes, Cascade R-CNN† has learnt a representation capable of handling many real-world scenarios efficiently.

Finally, we show qualitatively in Figure 5 how pedestrian detectors such as CSP lack generalization compared with a general object detector such as Cascade RCNN. For Figure 5, we trained both detectors with HRNet on CityPersons and evaluated on ECP. We picked cases from different cities, under varying lighting conditions (afternoon, evening, etc.) and under different weathers, such as sunny, raining or snowing. Common failure cases for CSP include low-illumination/blurry pedestrians (Fig. 5 (a) and (d)), small-scale pedestrians (Fig. 5 (g) to (l)), and occlusion (Fig. 5 (b, c, e, f, k)). On the contrary, Cascade RCNN seems to be robust to such domain shifts and handles the above-mentioned challenging scenarios better than CSP. We argue that the probable reason behind CSP's poor generalization stems from the fact that it is a single-stage detector without feature alignment, in contrast to a two-stage detector like Cascade RCNN. Feature alignment improves generalization [10], and two-stage detectors have a specific module for feature alignment (RoI Align), leading to better-aligned features on unseen domains. This explicit feature alignment is absent in CSP and, on unseen domains, leads to failure cases, such as occlusion and small-scale pedestrians, where feature alignment becomes pivotal for precise localization.

6 HOW TRANSFORMERS FARE WITH RESPECT TO CNNS

In this section, we discuss the results when CNNs are pitted against the recent Transformer networks. Intuitively, transformer networks do outperform CNN-based backbones in direct and cross-dataset evaluation. However, when the domain gap is large and we increase the size of the training dataset along with more sources of pre-training, we quantitatively illustrate that CNNs are more robust to domain shifts and have a better ability to digest data than transformer networks. To the best of our knowledge, this is one of the first studies that objectively illustrates this.

To make the comparison fair, we used the same detector, Cascade R-CNN, with both backbone networks, HRNet and the Swin Transformer. We included the Swin Transformer for evaluation since it has achieved state-of-the-art performance on general object detection benchmarks. We start by benchmarking in Table 10. It can be observed that the Swin Transformer outperforms HRNet in direct and cross-dataset evaluation. This is intuitive, as the Swin Transformer also outperforms HRNet on general object detection, thanks to its shifted-window-based self-attention, which captures a more powerful representation than a CNN-based backbone such as HRNet. However, when the domain shift is large, i.e., the sources of pre-training are fixed to CrowdHuman and Wider Pedestrian and the models are tested on autonomous-driving-oriented datasets, we notice that the Swin Transformer is significantly outperformed by HRNet across all settings, as shown in Table 11. This finding illustrates that CNNs are more tolerant to domain shifts, especially if the shifts are not subtle. In line with the studies in the sections above, we also test the Swin Transformer through our progressive training pipeline. In Table 12, we pre-train on a diverse general person detection dataset and fine-tune on ECP, which is closer to the target domain. Except for the third row, HRNet outperformed the Swin Transformer on all datasets. This trend persists even when we add the target set to the progressive training pipeline, as shown in Table 13. We attribute this partly to the fact that the hyper-parameters of transformer networks are not yet as well optimized as they are for CNNs. Moreover, transformer networks are also more prone to overfitting than CNNs. Nonetheless, it is still relatively early for transformer networks compared with CNNs, which have been used extensively for nearly a decade and whose hyperparameters have been tuned quite thoroughly.
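The progressive pipeline used throughout Sections 5 and 6 (A → B: pre-train on the diverse general dataset, then fine-tune on one closer to the target domain) and the merged A + B baseline can be sketched as follows. This is a framework-agnostic sketch: `train_fn`, the two learning rates, and the list-style dataset merge are our placeholder assumptions, not the paper's actual training code.

```python
def progressive_train(model, general_data, near_target_data, train_fn):
    """A -> B schedule: pre-train on the general, diverse dataset,
    then fine-tune the same weights on data closer to the target
    domain, with a reduced learning rate for the second stage."""
    model = train_fn(model, general_data, lr=0.02)       # stage A: pre-train
    model = train_fn(model, near_target_data, lr=0.002)  # stage B: fine-tune
    return model


def merged_train(model, general_data, near_target_data, train_fn):
    """A + B baseline: a single training run on the merged data,
    which Table 8 shows trails the two-stage schedule."""
    return train_fn(model, general_data + near_target_data, lr=0.02)
```

Here `train_fn(model, data, lr)` stands for any routine that runs one training stage and returns the updated model; the 10× learning-rate drop for fine-tuning is a common heuristic, not a value reported in the paper.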
Fig. 3: Qualitative results of Cascade R-CNN† on different benchmarks. The top row includes examples from Caltech, the bottom left CityPersons, and the bottom right ECP.

Fig. 4: Cascade R-CNN† across different scenarios, such as summer, winter, rain and low illumination, illustrating the robustness of the general object detector.
7 FINETUNING ON THE TARGET SET

Finally, we add the training part of our target set to our progressive training pipeline, as illustrated in Table 14. Although this paper stresses the importance of cross-dataset evaluation, for the sake of completeness, we also include results for target-set fine-tuning. Visible improvement across all splits can be observed for both methods. Interestingly, on CityPersons, CSP becomes better than Cascade RCNN. We attribute this to the fact that CSP's design choices are optimized for CityPersons (at the cost of generalization). Therefore, it benefits more from target-set fine-tuning on CityPersons than a general-design object detector does. However, both methods are still comparable, and in the case of Caltech, Cascade R-CNN outperforms CSP significantly. Moreover, the performance on Caltech is nearly of the same order as that of humans (0.88). This leads to the conclusion that the Caltech dataset is almost solved. Next-generation pedestrian detectors should use Caltech as a reference, but focus on more challenging datasets such as ECP and CityPersons. Importantly, a general object detector such as Cascade RCNN, without fine-tuning on the target set (cf. Table 8), can already achieve results comparable to those with fine-tuning as in Table 14, making it practical, ready to use and more suitable for real-world problems.

Fig. 5: Generalization comparison between CSP [29] with HRNet and Cascade RCNN [8]. Both methods are trained on CityPersons and evaluated on ECP. Images with yellow bounding boxes are CSP's detections, whereas magenta bounding boxes are Cascade RCNN's detections. Dotted green lines illustrate instances where CSP fails to detect.

7.1 Quantitative Results on Leaderboard

We further evaluated the proposed training pipeline along with an ensemble of our two models, one pre-trained on CrowdHuman [39] and the other on Wider Pedestrian [31]. Ensembling is performed by combining the detections, followed by non-maximum suppression using soft-nms [3]. Final results are evaluated on the dedicated servers (see footnote 3; test set annotations are withheld) of CityPersons [52] and ECP [4], which are maintained by the benchmark publishers and constrain the frequency of submissions. Moreover, we have included results only for the published methods (detailed evaluations of all methods can be seen at the URLs provided in footnote 3). Results are presented in Tables 15 and 16. Our submissions (Cascade RCNN) achieve 1st and 2nd place on the two leaderboards, respectively. These results serve as a reference for future methods. However, no other method, to the best of our knowledge, uses extra training data, giving our submissions an unfair advantage. Finally, as stated above, fine-tuning on the target set is not the goal of this paper, and in many cases it is not practical. In this work, we argue in favor of cross-dataset evaluation and its

3. [Link] [Link]
‡: Corresponds to our submissions, which use additional training data.

TABLE 10: Cross-dataset evaluation of HRNet and Swin-Trans. on autonomous driving benchmarks. Both backbones are used with Casc. R-CNN as the detector.

Method Training Testing Reasonable Small Heavy
HRNet CityPersons CityPersons 11.2 14.0 37.0
Swin-Trans CityPersons CityPersons 9.2 10.5 36.9
HRNet ECP CityPersons 10.9 11.4 40.9
Swin-Trans ECP CityPersons 10.2 12.9 39.6
HRNet ECP ECP 6.9 12.6 33.1
Swin-Trans ECP ECP 4.5 9.6 25.2
HRNet CityPersons ECP 17.4 40.5 49.3
Swin-Trans CityPersons ECP 14.8 28.3 50.2

TABLE 11: Cross-dataset evaluation of HRNet and Swin-Trans. when the sources of training are fixed to Wider Pedestrian and CrowdHuman.

Method Training Testing Reasonable Small Heavy
HRNet CrowdHuman Caltech 3.4 11.2 32.3
Swin-Trans CrowdHuman Caltech 10.1 37.2 55.1
HRNet CrowdHuman CityPersons 15.1 21.4 49.8
Swin-Trans CrowdHuman CityPersons 16.7 24.3 55.0
HRNet CrowdHuman ECP 17.9 36.5 56.9
Swin-Trans CrowdHuman ECP 21.1 42.8 56.0
HRNet Wider Pedestrian Caltech 3.2 10.8 31.7
Swin-Trans Wider Pedestrian Caltech 9.8 30.1 55.6
HRNet Wider Pedestrian CityPersons 16.0 21.6 57.4
Swin-Trans Wider Pedestrian CityPersons 13.9 32.5 57.7
HRNet Wider Pedestrian ECP 16.1 32.8 58.0
Swin-Trans Wider Pedestrian ECP 18.1 32.5 65.8

TABLE 12: Performance of HRNet and Swin-Trans. under the progressive training pipeline: pre-training on CrowdHuman or Wider Pedestrian, then fine-tuning on ECP.

Method Training Testing Reasonable Small Heavy
HRNet CrowdHuman → ECP CP 10.3 12.6 40.7
Swin Trans. CrowdHuman → ECP CP 11.0 12.4 43.4
HRNet Wider Pedestrian → ECP CP 9.7 11.8 37.7
Swin Trans. Wider Pedestrian → ECP CP 9.5 10.8 39.7
HRNet CrowdHuman → ECP Caltech 2.9 11.4 30.8
Swin Trans. CrowdHuman → ECP Caltech 8.0 28.0 54.4
HRNet Wider Pedestrian → ECP Caltech 2.5 9.9 31.0
Swin Trans. Wider Pedestrian → ECP Caltech 8.8 28.1 33.9

TABLE 13: Evaluation of HRNet and Swin-Trans. after fine-tuning on the target set.

Method Training Testing Reasonable Small Heavy
HRNet CrowdHuman → ECP → CP CP 8.0 8.5 27.0
SwinTrans. CrowdHuman → ECP → CP CP 9.1 10.0 30.9
HRNet Wider Pedestrian → ECP → CP CP 7.5 8.0 28.0
SwinTrans. Wider Pedestrian → ECP → CP CP 8.9 10.4 33.8
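The ensembling step in Sec. 7.1 concatenates the detections of the two models and then applies soft-NMS [3]. Below is a minimal pure-Python sketch of the Gaussian variant; the `sigma` and `score_thresh` defaults are illustrative, not the values used in our submissions.

```python
import math

def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area = lambda t: (t[2] - t[0]) * (t[3] - t[1])
    return inter / (area(a) + area(b) - inter + 1e-9)

def soft_nms(dets, sigma=0.5, score_thresh=0.001):
    """Greedy soft-NMS with Gaussian decay: rather than deleting boxes
    that overlap the current best detection, decay their scores by
    exp(-IoU^2 / sigma). dets is a list of (box, score) pairs; returns
    the rescored detections in the order they were selected."""
    dets = [(tuple(b), float(s)) for b, s in dets]
    kept = []
    while dets:
        best = max(range(len(dets)), key=lambda k: dets[k][1])
        box, score = dets.pop(best)
        kept.append((box, score))
        # decay overlapping neighbours instead of removing them outright
        dets = [(b, s * math.exp(-iou(box, b) ** 2 / sigma)) for b, s in dets]
        dets = [(b, s) for b, s in dets if s > score_thresh]
    return kept
```

To ensemble two detectors as described above, merge both models' (box, score) lists for an image into a single list and pass it to `soft_nms`; duplicate detections of the same pedestrian are then down-weighted rather than double-counted.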
[22] C. Jiang, H. Xu, W. Zhang, X. Liang, and Z. Li. Sp-nas: Serial-to-parallel backbone search for object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 11863–11872, 2020.
[23] J. Li, S. Liao, H. Jiang, and L. Shao. Box guided convolution for pedestrian detection. In Proceedings of the 28th ACM International Conference on Multimedia, pages 1615–1624, 2020.
[24] T.-Y. Lin, P. Dollár, R. Girshick, K. He, B. Hariharan, and S. Belongie. Feature pyramid networks for object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2117–2125, 2017.
[25] S. Liu, D. Huang, and Y. Wang. Adaptive nms: Refining pedestrian detection in a crowd. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 6459–6468, 2019.
[26] W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C.-Y. Fu, and A. C. Berg. Ssd: Single shot multibox detector. In European Conference on Computer Vision, pages 21–37. Springer, 2016.
[27] W. Liu, I. Hasan, and S. Liao. Center and scale prediction: A box-free approach for pedestrian and face detection. arXiv preprint arXiv:1904.02948, 2019.
[28] W. Liu, S. Liao, W. Hu, X. Liang, and X. Chen. Learning efficient single-stage pedestrian detectors by asymptotic localization fitting. In Proceedings of the European Conference on Computer Vision (ECCV), pages 618–634, 2018.
[29] W. Liu, S. Liao, W. Ren, W. Hu, and Y. Yu. High-level semantic feature detection: A new perspective for pedestrian detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019.
[30] Z. Liu, Y. Lin, Y. Cao, H. Hu, Y. Wei, Z. Zhang, S. Lin, and B. Guo. Swin transformer: Hierarchical vision transformer using shifted windows. arXiv preprint arXiv:2103.14030, 2021.
[31] C. C. Loy, D. Lin, W. Ouyang, Y. Xiong, S. Yang, Q. Huang, D. Zhou, W. Xia, Q. Li, P. Luo, et al. Wider face and pedestrian challenge 2018: Methods and results. arXiv preprint arXiv:1902.06854, 2019.
[32] R. Lu and H. Ma. Semantic head enhanced pedestrian detection in a crowd. arXiv preprint arXiv:1911.11985, 2019.
[33] D. Mahajan, R. Girshick, V. Ramanathan, K. He, M. Paluri, Y. Li, A. Bharambe, and L. van der Maaten. Exploring the limits of weakly supervised pretraining. In Proceedings of the European Conference on Computer Vision (ECCV), pages 181–196, 2018.
[34] J. Mao, T. Xiao, Y. Jiang, and Z. Cao. What can help pedestrian detection? In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 3127–3136, 2017.
[35] S. Munder and D. M. Gavrila. An experimental study on pedestrian classification. IEEE Transactions on Pattern Analysis and Machine Intelligence, 28(11):1863–1868, 2006.
[36] S. Paisitkriangkrai, C. Shen, and A. Van Den Hengel. Strengthening the effectiveness of pedestrian detection with spatially pooled features. In European Conference on Computer Vision, pages 546–561. Springer, 2014.
[37] Y. Pang, J. Xie, M. H. Khan, R. M. Anwer, F. S. Khan, and L. Shao. Mask-guided attention network for occluded pedestrian detection, 2019.
[38] S. Ren, K. He, R. Girshick, and J. Sun. Faster r-cnn: Towards real-time object detection with region proposal networks. In Advances in Neural Information Processing Systems, pages 91–99, 2015.
[39] S. Shao, Z. Zhao, B. Li, T. Xiao, G. Yu, X. Zhang, and J. Sun. Crowdhuman: A benchmark for detecting human in a crowd. arXiv preprint arXiv:1805.00123, 2018.
[40] K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.
[41] X. Song, K. Zhao, W.-S. C. H. Zhang, and J. Guo. Progressive refinement network for occluded pedestrian detection. In Proc. European Conference on Computer Vision, volume 7, page 9, 2020.
[42] S. Sun, J. Pang, J. Shi, S. Yi, and W. Ouyang. Fishnet: A versatile backbone for image, region, and pixel level prediction. In Advances in Neural Information Processing Systems, pages 760–770, 2018.
[43] P. Viola and M. J. Jones. Robust real-time face detection. International Journal of Computer Vision, 57(2):137–154, 2004.
[44] J. Wang, K. Sun, T. Cheng, B. Jiang, C. Deng, Y. Zhao, D. Liu, Y. Mu, M. Tan, X. Wang, et al. Deep high-resolution representation learning for visual recognition. arXiv preprint arXiv:1908.07919, 2019.
[45] X. Wang, T. Xiao, Y. Jiang, S. Shao, J. Sun, and C. Shen. Repulsion loss: Detecting pedestrians in a crowd. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 7774–7783, 2018.
[46] C. Wojek, S. Walk, and B. Schiele. Multi-cue onboard pedestrian detection. In 2009 IEEE Conference on Computer Vision and Pattern Recognition, pages 794–801. IEEE, 2009.
[47] B. Wu and R. Nevatia. Cluster boosted tree classifier for multi-view, multi-pose object detection. In 2007 IEEE 11th International Conference on Computer Vision, pages 1–8. IEEE, 2007.
[48] S. Xie, R. Girshick, P. Dollár, Z. Tu, and K. He. Aggregated residual transformations for deep neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1492–1500, 2017.
[49] J. Zhang, L. Lin, J. Zhu, Y. Li, Y.-c. Chen, Y. Hu, and C. S. Hoi. Attribute-aware pedestrian detection in a crowd. IEEE Transactions on Multimedia, 2020.
[50] L. Zhang, L. Lin, X. Liang, and K. He. Is faster r-cnn doing well for pedestrian detection? In European Conference on Computer Vision, pages 443–457. Springer, 2016.
[51] S. Zhang, R. Benenson, M. Omran, J. Hosang, and B. Schiele. How far are we from solving pedestrian detection? In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1259–1267, 2016.
[52] S. Zhang, R. Benenson, and B. Schiele. Citypersons: A diverse dataset for pedestrian detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 3213–3221, 2017.
[53] S. Zhang, L. Wen, X. Bian, Z. Lei, and S. Z. Li. Occlusion-aware r-cnn: Detecting pedestrians in a crowd. In Proceedings of the European Conference on Computer Vision (ECCV), pages 637–653, 2018.
[54] S. Zhang, Y. Xie, J. Wan, H. Xia, S. Z. Li, and G. Guo. Widerperson: A diverse dataset for dense pedestrian detection in the wild. IEEE Transactions on Multimedia, 2019.
[55] C. Zhou and J. Yuan. Bi-box regression for pedestrian detection and occlusion estimation. In Proceedings of the European Conference on Computer Vision (ECCV), pages 135–151, 2018.