Towards Open-Set Object Detection and Discovery
Jiyang Zheng⋆† Weihao Li† Jie Hong⋆† Lars Petersson† Nick Barnes⋆
⋆The Australian National University  †Data61-CSIRO
[Link]@{⋆ [Link], [Link]}
child’s perception system will learn from these previously unseen animals’ appearances and cluster them into different categories even without being told what species they are.

In this work, we consider a new task, where we aim to localise objects of both known and unknown classes, assign pre-defined category labels for known objects, and discover new categories for objects of unknown classes (see Fig. 1(c)). We term this task Open-Set Object Detection and Discovery (OSODD). We motivate our proposed task, OSODD, by suggesting that it is better suited to extracting information from images. New category discovery provides additional knowledge of data belonging to classes not seen before, helping intelligent vision-based systems to handle more realistic use cases.

Task           Dataset    Known classes   Unknown classes
ODL            Open-Set   Non-Action      Loc/Cat
OSOD           Open-Set   Detect          Loc
OSODD (Ours)   Open-Set   Detect          Loc/Cat

Table 1. Comparisons of different Object Detection and Discovery tasks. OSOD: open-set object detection; ODL: Object discovery and localization. Loc means localise the objects of interest; Cat means discover novel categories.

We propose a two-stage framework to tackle the problem of OSODD. First, we leverage the ability of an open set object detector to detect objects of known classes and identify objects of unknown classes. The predicted proposals of objects of known and unknown classes are saved to a memory buffer; Second, we explore the recurring pattern of all objects and discover new categories from objects of unknown classes. Specifically, we develop a self-supervised contrastive learning approach with domain-agnostic data augmentation and semi-supervised k-means clustering for category discovery.

Our contributions:
• We formalise the task Open-Set Object Detection and Discovery (OSODD), which enables a richer understanding within real-world detection systems.
• We propose a two-stage framework to tackle this problem, and we present a comprehensive protocol to evaluate the object detection and category discovery performance.
• We propose a category discovery method in our framework using domain-agnostic augmentation, contrastive learning and semi-supervised clustering. The novel method outperforms other baseline methods in experiments.

2. Related Work

Open-Set Recognition. Compared with closed-set learning, which assumes that only previously known classes are present during testing, open-set learning assumes the co-existence of known and unknown classes. Scheirer et al. [40] first introduce the problem of open-set recognition with incomplete knowledge at training time, i.e., unknown classes can appear during testing. They developed a classifier in a one-vs-rest setting, which enables the rejection of unknown samples. [22, 41] extend the framework in [40] to a multi-class classifier using probabilistic models with the extreme value theory to minimise fading confidence of the classifier. Recently, Liu et al. [31] proposed a deep metric learning method to identify unseen classes for imbalanced datasets. Self-supervised learning [14, 35, 43] approaches have been explored to minimise external supervision.

Miller et al. [32] first investigate the utility of label uncertainty in object detection under open-set conditions using dropout sampling. Dhamija et al. [11] define the problem of open-set object detection (OSOD) and conducted a study on traditional object detectors for their abilities in avoiding classifying objects of unknown classes into one of the known classes. An evaluation metric is also provided to assess the performance of the object detector under the open-set condition.

Open-World Recognition. The open-world setting introduced a continual learning paradigm that extends the open-set condition by assuming new semantic classes are introduced gradually at each incremental time step. Bendale et al. [2] first formalise the open-world setting for image recognition and propose an open-set classifier using the nearest non-outlier algorithm. The model evolves when new labels for the unknown are provided by re-calibrating the class probabilities.

Joseph et al. [24] transfer the open-world setting to an object detection system and propose the task of open-world object detection (OWOD). The model uses example replay to make the open-set detector learn new classes incrementally without forgetting the previous ones. The OWOD or OSOD model cannot explore the semantics of the identified unknown objects, and extra human annotation is required to learn novel classes incrementally. In contrast, our OSODD model can discover novel category labels for objects of unknown classes without human effort.

Novel Category Discovery. The novel category discovery task aims to identify similar recurring patterns in the unlabelled dataset. In image recognition, it was earlier viewed as an unsupervised clustering problem. Xie et al. [46] proposed the deep embedding network that can cluster data and at the same time learn a data representation. Han et al. [18] formulated the task of novel class discovery (NCD), which clusters the unlabelled images into novel categories using deep transfer clustering. The NCD setting assumes that the training set contains both labelled and unlabelled
data; the knowledge learned on labelled data could be transferred to targeted unlabelled data for category discovery [13, 17, 23, 48, 52].

Object discovery and localisation (ODL) [6, 9, 27–29, 36] aims to jointly discover and localise dominant objects from an image collection with multiple object classes in an unsupervised manner. Lee and Grauman [27] used object-graph and appearance features for unsupervised discovery. Rambhatla et al. [36] assumed partial knowledge of class labels and conducted the discovery leveraging a dual memory module. Compared to ODL, our OSODD both performs detection on previously known classes and discovers novel categories for unknown objects, which provides a comprehensive scene understanding.

Please refer to Tab. 1 to see the summarised differences between our setting and other similar settings in the object detection problem.

3. Task Format

In this section, we formulate the task of Open-Set Object Detection and Discovery (OSODD). We have a set of known object classes Ck = {C1, C2, ···, Cm}, and there exists a set of unknown visual categories Cu = {Cm+1, Cm+2, ···, Cm+n}, where Ck ∩ Cu = ∅. The training dataset contains objects from Ck, and the testing dataset contains objects from Ck ∪ Cu. An object instance I is represented by I = [c, x, y, w, h], denoting the class label (c ∈ Ck or Cu), the top-left x, y coordinates, and the width and height from the centre of the object bounding box respectively. A model is trained to localise all objects of interest. Then, it classifies objects of a known class as one of Ckt and clusters objects of an unknown class into novel visual categories Cut.

4. Our Approach

This section describes our approach for tackling OSODD, beginning with an overview of our framework. We propose a generic framework consisting of two main modules, Object Detection and Retrieval (ODR) and Object Category Discovery (OCD) (see Fig. 2).

The ODR module uses an open-set object detector with a dual memory buffer for object instance detection and retrieval. The detector predicts objects of known classes with their semantic labels from Ck and the location information, whereas the unknown objects are localised but with no semantic information available. We store the predicted objects in the memory buffer [36], which is used to explore novel categories. The buffer is divided into two parts: known memory and working memory. The known memory contains predicted objects of known classes with semantic labels; the working memory stores all currently identified objects of unknown classes without categorical information. The model studies the recurring pattern of the objects from the memory buffer and discovers novel categories in the working memory. We assign the predicted objects of unknown classes from the detector with novel category labels using the discovered categories. The visualisation is shown in Fig. 4.

The OCD module explores the working memory to discover new visual categories. It consists of an encoder component as the feature extractor and a discriminator which clusters the object representations. To train the encoder, we first retrieve the predicted objects from known classes saved in the known memory and the identified objects of unknown classes saved in working memory. Then, these instance samples are transformed using class-agnostic augmentation to create a generalised view over the data [10, 26, 51]. We use unsupervised contrastive learning where the predicted labels for the objects of known classes are ignored; the pairwise contrastive loss [33] penalises dissimilarity of the same object in different views regardless of the semantic information. The contrastive learning enables the encoder to learn a more discriminating feature representation in the latent space [7, 19]. Lastly, with the learned feature space from the encoder, the discriminator clusters the object embedding into novel categories by using the constrained k-means clustering algorithm [44].

[Figure 2: pipeline diagram. Panel labels: Image set S, Unknown-Aware RPN, ROI Head, OSOD Prediction, Known objects, Unknown objects, Object-wise mix-up augmentation, Object and its augmented version, Other Objects, Encoder, Unsupervised contrastive learning, Constrained k-means clustering, Novel categories, Novel Category Label, OSODD Prediction.]

Figure 2. Illustration of the two-stage method for Open-Set Object Detection and Discovery (OSODD). The first stage includes detecting objects of known classes and identifying objects of unknown classes using an open-set object detector. The instances of unknown classes are saved into the working memory for category discovery. The instances of known classes are saved into the known memory with their predicted semantic categories to assist the representation learning and clustering. The second stage pre-processes the objects from the memory buffer in an unsupervised manner. The representations of these saved objects are first learned in the latent space by contrastive learning, followed by a constrained k-means clustering used to find the novel categories beyond the known classes. Lastly, we update the open-set detection predictions with the novel category labels to generate the final OSODD prediction (See visualisations in Figs. 3 and 4).

4.1. Object Detection and Retrieval

Open-Set Object Detector. An open-set object detector predicts the location of all objects of interest. Then it classifies the objects into semantic classes and identifies the unseen objects as unknown (See ‘OSOD’ in Fig. 3).

We use the Faster RCNN architecture [38] as the baseline model, following ORE [24]. Leveraging the class-agnostic property of the region proposal network, we utilise an unknown-aware RPN to identify unknown objects. The unknown-aware RPN labels the proposals that have high scores but do not overlap with any ground-truth bounding box as the potential unknown objects. To learn a more discriminative representation for each class, we use a prototype-based contrastive loss on the feature vectors fc generated by an intermediate layer in the ROI pooling head. A class prototype pi is computed by the moving average of the class instance representations, and the features fc of objects will keep approaching their class prototype in the latent space. The objective is formulated as:

\[
\ell_{pcl}(f_c) = \sum_{i=0}^{c} \ell(f_c, p_i), \qquad
\ell(f_c, p_i) =
\begin{cases}
\lVert f_c, p_i \rVert & \text{if } i = c \\
\max\left(0, \Delta - \lVert f_c, p_i \rVert\right) & \text{otherwise}
\end{cases}
\tag{1}
\]

where fc is the feature vector of class c, pi is the prototype of class i, ∥f, p∥ measures the distance between feature vectors and ∆ is a fixed value that defines the maximum distance for dissimilar pairs. The total loss for the region of interest pooling is defined as:
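As an illustration, the prototype-based loss of Eq. (1) can be sketched in plain Python. This is a minimal sketch rather than the authors' implementation: the Euclidean distance, the momentum value of the moving-average update, and the names `prototype_loss` and `update_prototype` are assumptions made for the example, and the sum here runs over all available class prototypes.

```python
import math


def euclidean(a, b):
    """Distance between two feature vectors (Euclidean is an assumption)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def prototype_loss(f_c, c, prototypes, delta):
    """Eq. (1): pull f_c towards its own prototype and keep it at least
    `delta` away from the prototypes of all other classes."""
    total = 0.0
    for i, p_i in enumerate(prototypes):
        d = euclidean(f_c, p_i)
        total += d if i == c else max(0.0, delta - d)
    return total


def update_prototype(p, f, momentum=0.9):
    """Moving-average prototype update; the momentum value is hypothetical."""
    return [momentum * pj + (1.0 - momentum) * fj for pj, fj in zip(p, f)]


# Two class prototypes and one feature vector that belongs to class 0.
prototypes = [[0.0, 0.0], [3.0, 4.0]]
feature = [0.0, 1.0]
loss = prototype_loss(feature, 0, prototypes, delta=10.0)
new_p0 = update_prototype(prototypes[0], feature)
```

In this toy case the feature is at distance 1 from its own prototype and at distance sqrt(18) from the other one, so the second term contributes the remaining margin `10 - sqrt(18)`.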
[...] as the positive key, and k− is the representations of other

[...] an unsupervised augmentation strategy [26] which replaces all samples with mixed samples. It minimises the vicinal risk [5], which discriminates classes with very different pattern distributions, and creates more training samples [47]. For each sample in the queue {k}, we combine it with the query object representation q via linear interpolation and generate a new view km,i. Correspondingly, a new virtual label vi for the ith mix sample xm,i is defined as:

                        Task-1   Task-2   Task-3
Known/Unknown Class     20/60    40/40    60/20
Training Set            16551    45520    39402
Validation Set          1000
Test Set                4952

Table 2. Details of class split for the Benchmark. Task-1, Task-2 and Task-3 have different dataset splits of known and unknown classes.
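The linear interpolation between the query and the queued samples can be sketched as follows. This is only an illustration under stated assumptions: the mixing coefficient `lam` is drawn from a Beta distribution as in mixup [47], the virtual label is represented as a pair of weights on the query and on the i-th queued sample, and the function name `mix_views` is hypothetical. None of these details is specified in the text above.

```python
import random


def mix_views(q, queue, lam):
    """Mix the query representation q with every sample in the queue.

    Returns the mixed views k_m and, for each one, an assumed virtual label
    (weight on the query identity, weight on the queued sample identity).
    """
    mixed = [[lam * qj + (1.0 - lam) * kj for qj, kj in zip(q, k)] for k in queue]
    virtual_labels = [(lam, 1.0 - lam) for _ in queue]
    return mixed, virtual_labels


# Deterministic toy example with a fixed mixing coefficient.
q = [1.0, 0.0]
queue = [[0.0, 1.0], [1.0, 1.0]]
mixed, labels = mix_views(q, queue, lam=0.25)

# In practice the coefficient would be sampled, e.g. from Beta(1, 1) (assumed).
rng = random.Random(0)
sampled_lam = rng.betavariate(1.0, 1.0)
```

With `lam=0.25` each mixed view lies a quarter of the way from the queued sample towards the query, and the virtual label carries the same two weights.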
[...] standard mean average precision (mAP) at IoU threshold of 0.5 [38]. To show the incremental learning ability, we provide the mAP measurement for the newly introduced known classes and previously known classes separately [24, 34].

Category Discovery Metrics. Category discovery can be evaluated using clustering metrics [18, 21, 27, 36, 44, 50]. We adopt the three most commonly used clustering metrics for our object-based category discovery performance. Suppose a predicted proposal of an object of an unknown class has been matched to a ground truth unknown object. Let the predicted category label of the object proposal be ŷi, and let the ground truth label for the object be yi. We calculate the clustering accuracy (ACC) [18] by:

\[
\mathrm{ACC} = \max_{p \in P_y} \frac{1}{N} \sum_{i=1}^{N} \mathbb{1}\{y_i = p(\hat{y}_i)\}
\tag{6}
\]

where N is the number of matched unknown objects, and Py is the set of all permutations of the unknown class labels.

Mutual Information I(X, Y) quantifies the correlation between two random variables X and Y. The range of I(X, Y) is from 0 (independent) to +∞. Normalised mutual information (NMI) [42] is bounded in the range [0, 1]. Let Cl be the set of ground truth classes, and Ĉl be the set of predicted clusters. The NMI is formulated as:

\[
\mathrm{NMI} = \frac{I(Cl, \widehat{Cl})}{\left[H(Cl) + H(\widehat{Cl})\right]/2}
\tag{7}
\]

where I(Cl, Ĉl) is the sum of mutual information between each class-cluster pair. H(Cl) and H(Ĉl) compute the entropy using maximum likelihood estimation. The Purity of the clusters is defined as:

\[
\mathrm{Purity} = \frac{1}{N} \sum_{i=1}^{N} \max_{k} \left| Cl_k \cap \widehat{Cl}_i \right|
\tag{8}
\]

Here, N is the number of clusters and max is the highest count of objects for a single class within each cluster.

6. Results and Analysis

6.1. Baselines

Object Detection Baselines. Our framework uses an open-set object detector for known and unknown instance detection. We compare two recent approaches: Faster-RCNN+ [24] and ORE [24]. Faster-RCNN+ is a popular two-stage object detection method, which is modified from Faster RCNN [38] to localise objects of unknown classes by additionally adapting an unknown-aware regional proposal. ORE uses contrastive clustering and an energy-based classifier to discriminate the representations of known and unknown data. Our generic framework could cooperate with any open-set object detector, hence it is highly flexible.

Category Discovery Baselines. We compare our novel method with three baseline methods, including k-means, FINCH [39] and a modified approach from DTC [18].

K-means clustering is a non-parametric clustering method that minimises within-cluster variances. In every iteration, the algorithm first assigns the data points to the cluster with the minimum pairwise squared deviations between samples and centroids; then, it updates cluster centroids with the current data points belonging to the cluster.

FINCH [39] is a parameter-free clustering method that discovers linking chains in the data by using the first nearest neighbour. The method directly develops the grouping of data without any external parameters. To make a fair comparison, we set the number of clusters to the same as the other baseline methods. We discuss the performance of FINCH in estimating the number of novel classes in Sec. 6.2.1.

DTC+. The DTC method [18] was proposed for NCD problems [16], where the setting assumes the availability of unlabelled data at the training phase. The algorithm modifies deep embedded clustering [46] to learn knowledge of the labelled subset and transfer it to the unlabelled subset. This setting requires the unlabelled data in the training and testing set to be from the same classes. However, no unknown instances are available in training under the open-set detection setting. Hence, NCD-based approaches such as DTC cannot be directly applied to our problem. To adapt the method to our setting, we modify it by transferring a portion of the classes from the known memory to the working memory during training and treating them as additional unknown classes. We evaluate DTC’s generalisation performance on our problem in Sec. 6.2.3.

6.2. Experimental Results

We report the quantitative results of the novel category number estimation, object detection and novel category discovery performance in Secs. 6.2.1 to 6.2.3. We show and discuss the qualitative results in Fig. 4 and in the supplementary material.

6.2.1 Novel Category Number Estimation

We show the results of estimating the number of novel categories in Tab. 3. The middle two columns show the automatically discovered grouping by the FINCH algorithm [39]. The numbers are under-estimated by a large margin of 30%, 32.5% and 40% respectively. The last two columns show the result using DTC [18]. It is found that
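The three category discovery metrics (ACC, NMI and Purity) can be sketched in standard-library Python as below. This is an illustrative sketch, not the evaluation code used in the paper: the permutation search in `clustering_accuracy` is brute force and assumes there are no more predicted clusters than ground-truth classes, and `purity` normalises by the number of objects, which is the common convention and an assumption here. All function names are hypothetical.

```python
import math
from collections import Counter
from itertools import permutations


def clustering_accuracy(y_true, y_pred):
    """ACC: best accuracy over all one-to-one mappings of clusters to classes."""
    true_labels = sorted(set(y_true))
    pred_labels = sorted(set(y_pred))
    best = 0
    for perm in permutations(true_labels, len(pred_labels)):
        mapping = dict(zip(pred_labels, perm))
        hits = sum(mapping[p] == t for t, p in zip(y_true, y_pred))
        best = max(best, hits)
    return best / len(y_true)


def entropy(labels):
    """Maximum likelihood estimate of the label entropy."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def nmi(y_true, y_pred):
    """NMI: mutual information divided by the mean of the two entropies."""
    n = len(y_true)
    joint = Counter(zip(y_true, y_pred))
    ct, cp = Counter(y_true), Counter(y_pred)
    mi = sum((nij / n) * math.log(n * nij / (ct[t] * cp[p]))
             for (t, p), nij in joint.items())
    denom = (entropy(y_true) + entropy(y_pred)) / 2.0
    return mi / denom if denom > 0 else 1.0


def purity(y_true, y_pred):
    """Purity: majority-class count per cluster, summed and normalised."""
    clusters = {}
    for t, p in zip(y_true, y_pred):
        clusters.setdefault(p, []).append(t)
    return sum(max(Counter(m).values()) for m in clusters.values()) / len(y_true)


# Six matched unknown objects; one clock is placed in the giraffe cluster.
y_true = ["zebra", "zebra", "giraffe", "giraffe", "clock", "clock"]
y_pred = [1, 1, 0, 0, 2, 0]
acc = clustering_accuracy(y_true, y_pred)
nmi_score = nmi(y_true, y_pred)
pur = purity(y_true, y_pred)
```

For larger numbers of categories, the permutation search would normally be replaced by the Hungarian algorithm; the brute-force version is kept here only to stay dependency-free.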
             Task-1                    Task-2                        Task-3
Method       mAP       UDR     UDP     mAP           UDR     UDP     mAP           UDR     UDP
F-RCNN+      -/56.16   20.14   -       51.09/23.84   21.54   -       35.69/11.53   30.01   -
ORE [24]     -/56.02   20.10   36.74   52.19/25.03   22.63   21.51   37.23/12.02   31.82   23.55

Task   GT   FINCH [39]   Error   Est. [18]   Error
1      60   42           30%     48          20%
2      40   27           32.5%   31          22.5%
3      20   12           40%     16          20%
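The error percentages reported for novel category number estimation are consistent with a simple relative error between the ground-truth number of novel categories and the estimated number. A short check, using only the counts reported for the three tasks (the helper name `relative_error` is hypothetical):

```python
def relative_error(gt, est):
    """Relative error of an estimated category count, in percent."""
    return 100 * abs(gt - est) / gt


# task: (ground truth, FINCH estimate, DTC-based estimate)
counts = {1: (60, 42, 48), 2: (40, 27, 31), 3: (20, 12, 16)}

finch_errors = {t: relative_error(gt, finch) for t, (gt, finch, _) in counts.items()}
dtc_errors = {t: relative_error(gt, dtc) for t, (gt, _, dtc) in counts.items()}
```

Both estimators under-count in every task, and the DTC-based estimate is closer to the ground truth than FINCH in each case.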
Figure 4. Visualisation of OSODD predictions for Task-1. The tennis racket, stop sign, fire hydrant, clock, giraffe and zebra are the novel
classes that have not been introduced at this stage. The same bounding box colour indicates objects that belong to the same class or novel
category. The last column demonstrates a failure case where a giraffe is not detected, and one of the zebras is assigned to the wrong visual
category. More visualised results are provided in the supplementary material.
Table 7. Ablation Study on components of our proposed category discovery method. The complete method with all the proposed modules
achieves the best-aggregated performance in all tasks, which shows the importance of each component contributing to the method.
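The semi-supervised (constrained) k-means step used for category discovery can be sketched as follows. This is a simplified illustration in the spirit of [44], not the authors' code: objects from the known memory keep their cluster fixed to their predicted class, objects from the working memory are assigned to the nearest centroid, and the novel centroids are seeded from random unlabelled points. The seeding strategy, the iteration count and the function name `constrained_kmeans` are assumptions.

```python
import random


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def constrained_kmeans(labelled, labels, unlabelled, n_novel, iters=20, seed=0):
    """Cluster unlabelled points while labelled points stay in their class cluster."""
    rng = random.Random(seed)
    known = sorted(set(labels))
    # One centroid per known class, initialised from the labelled instances.
    centroids = [mean([x for x, y in zip(labelled, labels) if y == c]) for c in known]
    # Novel centroids seeded from random unlabelled points (assumption).
    centroids += [list(p) for p in rng.sample(unlabelled, n_novel)]
    fixed = [known.index(y) for y in labels]
    assign = []
    for _ in range(iters):
        # Assignment step: only unlabelled points are free to move.
        assign = [min(range(len(centroids)), key=lambda j: sq_dist(x, centroids[j]))
                  for x in unlabelled]
        # Update step: each centroid averages its labelled and unlabelled members.
        for j in range(len(centroids)):
            members = [x for x, a in zip(labelled, fixed) if a == j]
            members += [x for x, a in zip(unlabelled, assign) if a == j]
            if members:
                centroids[j] = mean(members)
    return assign, centroids


# One known class ("cat") and one novel category to be discovered.
known_memory = [[0.0, 0.0], [0.0, 1.0]]
known_labels = ["cat", "cat"]
working_memory = [[10.0, 10.0], [10.0, 11.0], [0.5, 0.5], [11.0, 10.0]]
assign, centroids = constrained_kmeans(known_memory, known_labels, working_memory, n_novel=1)
```

Cluster indices below the number of known classes correspond to known categories, and the remaining indices are the discovered novel categories. Removing the labelled centroids and instances reduces this to ordinary k-means.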
[...] semi-supervised clustering in Case III-1, III-2 and IV. In Case III-1, we make the clustering algorithm fully unsupervised by removing the labelled centroids and instances. The results decrease by around 8% in all tasks. Since the FINCH algorithm [39] shows a competitive result in Tab. 5 and Tab. 6, in Case III-2 we replace the semi-supervised clustering with the FINCH algorithm. The results show that Case IV outperforms Case III-2 in the task aggregation scores, which indicates that our model better clusters the samples with the same learned feature space.

Memory Module. To show the effects of the current memory design, we ablate the module by removing the known memory in representation learning. We report the results in the supplementary material.

7. Conclusion

In this work, we propose a framework to detect known objects and discover novel visual categories for unknown objects. We term this task Open-Set Object Detection and Discovery (OSODD), as a natural extension of open-set object detection tasks. We develop a two-stage framework and a novel method for label assignment, outperforming other popular baselines. Compared to detection and discovery tasks, OSODD can provide more comprehensive information for real-world practices. We hope our work will contribute to the object detection community and motivate further research in this area.
References

[1] David Arthur and Sergei Vassilvitskii. k-means++: The advantages of careful seeding. Technical report, Stanford, 2006.
[2] Abhijit Bendale and Terrance Boult. Towards open world recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 1893–1902, 2015.
[3] Zhaowei Cai and Nuno Vasconcelos. Cascade r-cnn: Delving into high quality object detection. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 6154–6162, 2018.
[4] Nicolas Carion, Francisco Massa, Gabriel Synnaeve, Nicolas Usunier, Alexander Kirillov, and Sergey Zagoruyko. End-to-end object detection with transformers. In European conference on computer vision, pages 213–229. Springer, 2020.
[5] Olivier Chapelle, Jason Weston, Léon Bottou, and Vladimir Vapnik. Vicinal risk minimization. Advances in neural information processing systems, 13, 2000.
[6] Jia Chen, Yasong Chen, Weihao Li, Guoqin Ning, Mingwen Tong, and Adrian Hilton. Channel and spatial attention based deep object co-segmentation. Knowledge-Based Systems, 211:106550, 2021.
[7] Ting Chen, Simon Kornblith, Mohammad Norouzi, and Geoffrey Hinton. A simple framework for contrastive learning of visual representations. In International conference on machine learning, pages 1597–1607. PMLR, 2020.
[8] Xinlei Chen, Haoqi Fan, Ross Girshick, and Kaiming He. Improved baselines with momentum contrastive learning. arXiv preprint arXiv:2003.04297, 2020.
[9] Minsu Cho, Suha Kwak, Cordelia Schmid, and Jean Ponce. Unsupervised object discovery and localization in the wild: Part-based matching with bottom-up region proposals. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 1201–1210, 2015.
[10] Ekin D Cubuk, Barret Zoph, Dandelion Mane, Vijay Vasudevan, and Quoc V Le. Autoaugment: Learning augmentation policies from data. arXiv preprint arXiv:1805.09501, 2018.
[11] Akshay Dhamija, Manuel Gunther, Jonathan Ventura, and Terrance Boult. The overlooked elephant of object detection: Open set. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pages 1021–1030, 2020.
[12] Mark Everingham, Luc Van Gool, Christopher KI Williams, John Winn, and Andrew Zisserman. The pascal visual object classes (voc) challenge. International journal of computer vision, 88(2):303–338, 2010.
[13] Enrico Fini, Enver Sangineto, Stéphane Lathuilière, Zhun Zhong, Moin Nabi, and Elisa Ricci. A unified objective for novel class discovery. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 9284–9292, 2021.
[14] Spyros Gidaris, Praveer Singh, and Nikos Komodakis. Unsupervised representation learning by predicting image rotations. In ICLR, 2018.
[15] Ross Girshick. Fast r-cnn. In Proceedings of the IEEE international conference on computer vision, pages 1440–1448, 2015.
[16] Kai Han, Sylvestre-Alvise Rebuffi, Sebastien Ehrhardt, Andrea Vedaldi, and Andrew Zisserman. Automatically discovering and learning new visual categories with ranking statistics. arXiv preprint arXiv:2002.05714, 2020.
[17] Kai Han, Sylvestre-Alvise Rebuffi, Sebastien Ehrhardt, Andrea Vedaldi, and Andrew Zisserman. Autonovel: Automatically discovering and learning novel visual categories. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2021.
[18] Kai Han, Andrea Vedaldi, and Andrew Zisserman. Learning to discover novel visual categories via deep transfer clustering. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), October 2019.
[19] Kaiming He, Haoqi Fan, Yuxin Wu, Saining Xie, and Ross Girshick. Momentum contrast for unsupervised visual representation learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 9729–9738, 2020.
[20] Kaiming He, Georgia Gkioxari, Piotr Dollár, and Ross Girshick. Mask r-cnn. In Proceedings of the IEEE international conference on computer vision, pages 2961–2969, 2017.
[21] Jie Hong, Weihao Li, Junlin Han, Jiyang Zheng, Pengfei Fang, Mehrtash Harandi, and Lars Petersson. Goss: Towards generalized open-set semantic segmentation. arXiv preprint arXiv:2203.12116, 2022.
[22] Lalit P Jain, Walter J Scheirer, and Terrance E Boult. Multi-class open set recognition using probability of inclusion. In European Conference on Computer Vision, pages 393–409. Springer, 2014.
[23] Xuhui Jia, Kai Han, Yukun Zhu, and Bradley Green. Joint representation learning and novel category discovery on single-and multi-modal data. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 610–619, 2021.
[24] KJ Joseph, Salman Khan, Fahad Shahbaz Khan, and Vineeth N Balasubramanian. Towards open world object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 5830–5840, 2021.
[25] Yann LeCun, Sumit Chopra, Raia Hadsell, M Ranzato, and F Huang. A tutorial on energy-based learning. Predicting structured data, 1(0), 2006.
[26] Kibok Lee, Yian Zhu, Kihyuk Sohn, Chun-Liang Li, Jinwoo Shin, and Honglak Lee. I-mix: A domain-agnostic strategy for contrastive representation learning. arXiv preprint arXiv:2010.08887, 2020.
[27] Yong Jae Lee and Kristen Grauman. Object-graphs for context-aware category discovery. In 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pages 1–8. IEEE, 2010.
[28] Weihao Li, Omid Hosseini Jafari, and Carsten Rother. Deep object co-segmentation. In Asian Conference on Computer Vision, pages 638–653. Springer, 2018.
[29] Weihao Li, Omid Hosseini Jafari, and Carsten Rother. Localizing common objects using common component activation map. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, June 2019.
[30] Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. Microsoft coco: Common objects in context. In European conference on computer vision, pages 740–755. Springer, 2014.
[31] Ziwei Liu, Zhongqi Miao, Xiaohang Zhan, Jiayun Wang, Boqing Gong, and Stella X Yu. Large-scale long-tailed recognition in an open world. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 2537–2546, 2019.
[32] Dimity Miller, Lachlan Nicholson, Feras Dayoub, and Niko Sünderhauf. Dropout sampling for robust object detection in open-set conditions. In 2018 IEEE International Conference on Robotics and Automation (ICRA), pages 3243–3249. IEEE, 2018.
[33] Aaron van den Oord, Yazhe Li, and Oriol Vinyals. Representation learning with contrastive predictive coding. arXiv preprint arXiv:1807.03748, 2018.
[34] Can Peng, Kun Zhao, and Brian C Lovell. Faster ilod: Incremental learning for object detectors based on faster rcnn. Pattern Recognition Letters, 140:109–115, 2020.
[35] Pramuditha Perera, Vlad I Morariu, Rajiv Jain, Varun Manjunatha, Curtis Wigington, Vicente Ordonez, and Vishal M Patel. Generative-discriminative feature representations for open-set recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 11814–11823, 2020.
[36] Sai Saketh Rambhatla, Rama Chellappa, and Abhinav Shrivastava. The pursuit of knowledge: Discovering and localizing novel categories using dual memory. arXiv preprint arXiv:2105.01652, 2021.
[37] Joseph Redmon, Santosh Divvala, Ross Girshick, and Ali Farhadi. You only look once: Unified, real-time object detection. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 779–788, 2016.
[38] Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun. Faster r-cnn: towards real-time object detection with region proposal networks. IEEE transactions on pattern analysis and machine intelligence, 39(6):1137–1149, 2016.
[39] Saquib Sarfraz, Vivek Sharma, and Rainer Stiefelhagen. Efficient parameter-free clustering using first neighbor relations. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 8934–8943, 2019.
[40] Walter J Scheirer, Anderson de Rezende Rocha, Archana Sapkota, and Terrance E Boult. Toward open set recognition. IEEE transactions on pattern analysis and machine intelligence, 35(7):1757–1772, 2012.
[41] Walter J Scheirer, Lalit P Jain, and Terrance E Boult. Probability models for open set recognition. IEEE transactions on pattern analysis and machine intelligence, 36(11):2317–2324, 2014.
[42] Alexander Strehl and Joydeep Ghosh. Cluster ensembles - a knowledge reuse framework for combining multiple partitions. Journal of machine learning research, 3(Dec):583–617, 2002.
[43] Jihoon Tack, Sangwoo Mo, Jongheon Jeong, and Jinwoo Shin. Csi: Novelty detection via contrastive learning on distributionally shifted instances. In NeurIPS, 2020.
[44] Sagar Vaze, Kai Han, Andrea Vedaldi, and Andrew Zisserman. Generalized category discovery. arXiv preprint arXiv:2201.02609, 2022.
[45] Xin Wang, Thomas E Huang, Trevor Darrell, Joseph E Gonzalez, and Fisher Yu. Frustratingly simple few-shot object detection. arXiv preprint arXiv:2003.06957, 2020.
[46] Junyuan Xie, Ross Girshick, and Ali Farhadi. Unsupervised deep embedding for clustering analysis. In International conference on machine learning, pages 478–487. PMLR, 2016.
[47] Hongyi Zhang, Moustapha Cisse, Yann N Dauphin, and David Lopez-Paz. mixup: Beyond empirical risk minimization. arXiv preprint arXiv:1710.09412, 2017.
[48] Bingchen Zhao and Kai Han. Novel visual category discovery with dual ranking statistics and mutual knowledge distillation. Advances in Neural Information Processing Systems, 34, 2021.
[49] Xiaowei Zhao, Xianglong Liu, Yifan Shen, Yuqing Ma, Yixuan Qiao, and Duorui Wang. Revisiting open world object detection. arXiv preprint arXiv:2201.00471, 2022.
[50] Zhun Zhong, Enrico Fini, Subhankar Roy, Zhiming Luo, Elisa Ricci, and Nicu Sebe. Neighborhood contrastive learning for novel class discovery. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 10867–10875, 2021.
[51] Zhun Zhong, Liang Zheng, Guoliang Kang, Shaozi Li, and Yi Yang. Random erasing data augmentation. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 34, pages 13001–13008, 2020.
[52] Zhun Zhong, Linchao Zhu, Zhiming Luo, Shaozi Li, Yi Yang, and Nicu Sebe. Openmix: Reviving known knowledge for discovering novel visual categories in an open world. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 9462–9470, 2021.