Deep Bayesian Active Learning with Image Data
Yarin Gal 1 2 Riashat Islam 1 Zoubin Ghahramani 1
1 University of Cambridge, UK. 2 The Alan Turing Institute, UK. Correspondence to: Yarin Gal <[email protected]>.

Abstract

Even though active learning forms an important pillar of machine learning, deep learning tools are not prevalent within it. Deep learning poses several difficulties when used in an active learning setting. First, active learning (AL) methods generally rely on being able to learn and update models from small amounts of data. Recent advances in deep learning, on the other hand, are notorious for their dependence on large amounts of data. Second, many AL acquisition functions rely on model uncertainty, yet deep learning methods rarely represent such model uncertainty. In this paper we combine recent advances in Bayesian deep learning into the active learning framework in a practical way. We develop an active learning framework for high dimensional data, a task which has been extremely challenging so far, with very sparse existing literature. Taking advantage of specialised models such as Bayesian convolutional neural networks, we demonstrate our active learning techniques with image data, obtaining a significant improvement on existing active learning approaches. We demonstrate this on both the MNIST dataset, as well as for skin cancer diagnosis from lesion images (ISIC2016 task).

1. Introduction

A big challenge in many machine learning applications is obtaining labelled data. This can be a long, laborious, and costly process, often making the deployment of ML systems uneconomical. A framework where a system could learn from small amounts of data, and choose by itself what data it would like the user to label, would make machine learning much more widely applicable. Such frameworks for learning are referred to as active learning (Cohn et al., 1996) (also known as "experiment design" in the statistics literature), and have been used successfully in fields such as medical diagnosis, microbiology, and manufacturing (Tong, 2001). In active learning, a model is trained on a small amount of data (the initial training set), and an acquisition function (often based on the model's uncertainty) decides which data points to ask an external oracle for a label. The acquisition function selects one or more points from a pool of unlabelled data points, with the pool points lying outside of the training set. An oracle (often a human expert) labels the selected data points, these are added to the training set, and a new model is trained on the updated training set. This process is then repeated, with the training set increasing in size over time. The advantage of such systems is that they often result in dramatic reductions in the amount of labelling required to train an ML system (and therefore in cost and time).

Even though existing techniques for active learning have proven themselves useful in a variety of tasks, a major remaining challenge in active learning is its lack of scalability to high-dimensional data (Tong, 2001). This data often appears in image form, with a physician classifying MRI scans to diagnose Alzheimer's for example (Marcus et al., 2010), or an expert clinician diagnosing skin cancer from dermoscopic lesion images. To perform active learning, a model has to be able to learn from small amounts of data and represent its uncertainty over unseen data. This severely restricts the class of models that can be used within the active learning framework. As a result most approaches to active learning have focused on low dimensional problems (Tong, 2001; Hernandez-Lobato & Adams, 2015), with only a handful of exceptions (Zhu et al., 2003; Holub et al., 2008; Joshi et al., 2009) relying on kernel or graph-based approaches to handle high-dimensional data.

In recent years, with the increased availability of data in some domains, attention within the machine learning community has shifted from small data problems to big data problems (Sundermeyer et al., 2012; Krizhevsky et al., 2012; Kalchbrenner & Blunsom, 2013; Sutskever et al., 2014). And with the increased interest in big data problems, new tools were developed and existing tools were refined for handling high dimensional data within such regimes. Deep learning, and convolutional neural networks (CNNs) (Rumelhart et al., 1985; LeCun et al., 1989) in particular, are an example of such tools. Originally developed in 1989 to parse handwritten zip codes, these tools have flourished and were adapted to a point where a CNN is able to beat a human on object recognition tasks (given enough training data) (He et al., 2015).
New techniques such as dropout (Hinton et al., 2012; Srivastava et al., 2014) are used extensively to regularise these huge models, which often contain millions of parameters (Jozefowicz et al., 2016). But even though active learning forms an important pillar of machine learning, deep learning tools are not prevalent within it. Deep learning poses several difficulties when used in an active learning setting. First, we have to be able to handle small amounts of data. Recent advances in deep learning, on the other hand, are notorious for their dependence on large amounts of data (Krizhevsky et al., 2012). Second, many AL acquisition functions rely on model uncertainty. But in deep learning we rarely represent such model uncertainty.

Relying on Bayesian approaches to deep learning, in this paper we combine recent advances in Bayesian deep learning into the active learning framework in a practical way. We develop an active learning framework for high dimensional data, a task which has been extremely challenging so far, with very sparse existing literature from the past 15 years (Zhu et al., 2003; Li & Guo, 2013; Holub et al., 2008; Joshi et al., 2009). Taking advantage of specialised models such as Bayesian convolutional neural networks (BCNNs) (Gal & Ghahramani, 2016a;b), we demonstrate our active learning techniques with image data. Using a small model, our system is able to achieve 5% test error on MNIST with only 295 labelled images without relying on unlabelled data (in comparison, 835 labelled images are needed to achieve 5% test error using random sampling – requiring an expert to label more than twice as many images to achieve the same accuracy), and achieves 1.64% test error with 1000 labelled images. This is in comparison to the 2.40% test error of DGN (Kingma et al., 2014) or the 1.53% test error of the Ladder Network Γ-model (Rasmus et al., 2015), both semi-supervised learning techniques which additionally use the entire unlabelled training set. Finally, we study a real-world application by diagnosing melanoma (skin cancer) from a small number of lesion images, by fine-tuning the VGG16 convolutional neural network (Simonyan & Zisserman, 2015) on the ISIC 2016 dataset (Gutman et al., 2016).

2. Related Research

Past attempts at active learning of image data have concentrated on kernel based methods. Using ideas from previous research in active learning of low dimensional data (Tong, 2001), Joshi et al. (2009) used "margin-based uncertainty" and extracted probabilistic outputs from support vector machines (SVMs) (Cortes & Vapnik, 1995). They used linear, polynomial, and Radial Basis Function (RBF) kernels on the raw images, picking the kernel that gave the best classification accuracy. Analogously to SVM approaches, Li & Guo (2013) used Gaussian processes (GPs) with RBF kernels to get model uncertainty. However, Li & Guo (2013) fed low dimensional features (such as SIFT features) to their RBF kernel. Lastly, making use of unlabelled data as well, Zhu et al. (2003) acquire points using a Gaussian random field model, evaluating an RBF kernel over raw images. We compare to this last technique and explain it in more detail below.

Other related work includes semi-supervised learning of image data (Weston et al., 2012; Kingma et al., 2014; Rasmus et al., 2015). In semi-supervised learning a model is given a fixed set of labelled data, and a fixed set of unlabelled data. The model can use the unlabelled data to learn about the distribution of the inputs, in the hopes that this information will aid in learning from the small labelled set as well. Although the learning paradigm is fairly different from active learning, this research forms the closest modern literature to active learning of image data. We will compare to these techniques below as well, in section 5.4.

3. Bayesian Convolutional Neural Networks

In this paper we concentrate on high dimensional image data, and need a model able to represent prediction uncertainty on such data. Existing approaches such as (Zhu et al., 2003; Li & Guo, 2013; Joshi et al., 2009) rely on kernel methods, and feed image pairs through linear, polynomial, and RBF kernels to capture image similarity as an input to an SVM for example. In contrast, we rely on specialised models for image data, and in particular on convolutional neural networks (CNNs) (Rumelhart et al., 1985; LeCun et al., 1989). Unlike the kernels above, which cannot capture spatial information in the input image, CNNs are designed to use this spatial information, and have been used successfully to achieve state-of-the-art results (Krizhevsky et al., 2012). To perform active learning with image data we make use of the Bayesian equivalent of CNNs, proposed in (Gal & Ghahramani, 2016a)¹. These Bayesian CNNs are CNNs with prior probability distributions placed over a set of model parameters $\omega = \{W_1, \ldots, W_L\}$:

$$\omega \sim p(\omega),$$

with for example a standard Gaussian prior $p(\omega)$. We further define a likelihood model

$$p(y = c \mid x, \omega) = \text{softmax}(f^{\omega}(x))$$

for the case of classification, or a Gaussian likelihood for the case of regression, with $f^{\omega}(x)$ the model output (with parameters $\omega$).

¹ As far as we are aware, there are no other tools in the current literature that offer model uncertainty in specialised models for image data which perform as well as CNNs.
To perform approximate inference in the Bayesian CNN model we make use of stochastic regularisation techniques such as dropout (Hinton et al., 2012; Srivastava et al., 2014), originally used to regularise these models. As shown in (Gal & Ghahramani, 2016b; Gal, 2016), dropout and various other stochastic regularisation techniques can be used to perform practical approximate inference in complex deep models. Inference is done by training a model with dropout before every weight layer, and by performing dropout at test time as well to sample from the approximate posterior (stochastic forward passes, referred to as MC dropout). More formally, this approach is equivalent to performing approximate variational inference, where we find a distribution $q_\theta^*(\omega)$ in a tractable family which minimises the Kullback-Leibler (KL) divergence to the true model posterior $p(\omega \mid D_{\text{train}})$ given a training set $D_{\text{train}}$. Dropout can be interpreted as a variational Bayesian approximation, where the approximating distribution is a mixture of two Gaussians with small variances and the mean of one of the Gaussians fixed at zero. The uncertainty in the weights induces prediction uncertainty by marginalising over the approximate posterior using Monte Carlo integration:

$$p(y = c \mid x, D_{\text{train}}) = \int p(y = c \mid x, \omega) \, p(\omega \mid D_{\text{train}}) \, d\omega \approx \int p(y = c \mid x, \omega) \, q_\theta^*(\omega) \, d\omega \approx \frac{1}{T} \sum_{t=1}^{T} p(y = c \mid x, \hat{\omega}_t)$$

with $\hat{\omega}_t \sim q_\theta^*(\omega)$, where $q_\theta(\omega)$ is the Dropout distribution (Gal, 2016).
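To make this concrete, the following is a minimal sketch of MC dropout prediction in Keras (the framework used for the experiments below). The helper name mc_dropout_predict, the default T, and the learning-phase mechanism (which assumes a TF1-era Keras backend) are our illustrative choices, not the paper's released code.

```python
# A minimal MC dropout sketch: dropout is kept active at test time by
# fixing the Keras learning phase to 1, so each forward pass samples
# fresh dropout masks, i.e. a new set of weights w_t ~ q(w).
import numpy as np
from keras import backend as K

def mc_dropout_predict(model, x, T=100):
    """Approximate p(y=c|x, D_train) by averaging T stochastic passes."""
    f = K.function([model.input, K.learning_phase()], [model.output])
    probs = np.stack([f([x, 1])[0] for _ in range(T)])  # shape (T, N, C)
    return probs.mean(axis=0), probs  # predictive mean and the raw samples
```

Averaging the T stochastic softmax outputs gives the Monte Carlo approximation above; the raw samples are kept because several of the acquisition functions in the next section need them.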
Bayesian CNNs work well with small amounts of data (Gal & Ghahramani, 2016a), and possess uncertainty information that can be used with existing acquisition functions (Gal, 2016). Such acquisition functions for the case of classification are discussed next.

4. Acquisition Functions and their Approximations

Given a model $M$, pool data $D_{\text{pool}}$, and inputs $x \in D_{\text{pool}}$, an acquisition function $a(x, M)$ is a function of $x$ that the AL system uses to decide where to query next:

$$x^* = \operatorname*{argmax}_{x \in D_{\text{pool}}} a(x, M).$$

We next explore various acquisition functions appropriate for our image data setting, and develop tractable approximations to use with our Bayesian CNNs. In tasks involving regression we often use the predictive variance, or a quantity derived from it, as our acquisition function (although we still need to be careful to query from informative areas rather than querying noise). For example, we might look for images with high predictive variance and choose those to ask an expert to label – in the hope that these will decrease model uncertainty. However, many tasks involving image data are often phrased as classification problems. For classification, several acquisition functions are available:

1. Choose pool points that maximise the predictive entropy (Max Entropy, (Shannon, 1948)):
$$H[y \mid x, D_{\text{train}}] := -\sum_c p(y = c \mid x, D_{\text{train}}) \log p(y = c \mid x, D_{\text{train}}).$$

2. Choose pool points that are expected to maximise the information gained about the model parameters, i.e. maximise the mutual information between predictions and model posterior (BALD, (Houlsby et al., 2011)):
$$I[y, \omega \mid x, D_{\text{train}}] = H[y \mid x, D_{\text{train}}] - E_{p(\omega \mid D_{\text{train}})} \big[ H[y \mid x, \omega] \big]$$
with $\omega$ the model parameters (here $H[y \mid x, \omega]$ is the entropy of $y$ given model weights $\omega$). Points that maximise this acquisition function are points on which the model is uncertain on average, but for which there exist model parameters that produce disagreeing predictions with high certainty. This is equivalent to points with high variance in the input to the softmax layer (the logits) – thus each stochastic forward pass through the model would have the highest probability assigned to a different class.

3. Maximise the Variation Ratios (Freeman, 1965):
$$\text{variation-ratio}[x] := 1 - \max_y p(y \mid x, D_{\text{train}}).$$
Like Max Entropy, Variation Ratios measures lack of confidence.

4. Maximise the mean standard deviation (Mean STD) (Kampffmeyer et al., 2016; Kendall et al., 2015):
$$\sigma_c = \sqrt{E_{q(\omega)}\big[p(y = c \mid x, \omega)^2\big] - E_{q(\omega)}\big[p(y = c \mid x, \omega)\big]^2}$$
$$\sigma(x) = \frac{1}{C} \sum_c \sigma_c,$$
averaged over all $C$ classes $x$ can take. Compared to the above acquisition functions, this is more of an ad-hoc technique used in recent literature.

5. Random acquisition (baseline): $a(x) = \text{unif}()$, with $\text{unif}()$ a function returning a draw from a uniform distribution over the interval $[0, 1]$. Using this acquisition function is equivalent to choosing a point uniformly at random from the pool.

These acquisition functions and their properties are discussed in more detail in (Gal, 2016, pp. 48–52).
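As a concrete reference, here are minimal numpy sketches of Max Entropy and Variation Ratios, the two functions above that depend only on the predictive distribution itself. The array convention (probs holding one approximate predictive probability vector per pool point) is our assumption.

```python
# Illustrative sketches; `probs` has shape (N_pool, C), with one
# (approximate) predictive distribution p(y=c|x, D_train) per pool point.
import numpy as np

def max_entropy(probs, eps=1e-10):
    # H[y|x, D_train] = -sum_c p_c log p_c, computed per pool point.
    return -(probs * np.log(probs + eps)).sum(axis=-1)

def variation_ratio(probs):
    # variation-ratio[x] = 1 - max_y p(y|x, D_train).
    return 1.0 - probs.max(axis=-1)
```

BALD and Mean STD additionally need the individual stochastic forward passes rather than only their average; a tractable estimator for BALD is derived next.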
We can approximate each of these acquisition functions using our approximate distribution $q_\theta^*(\omega)$. For BALD, for example, we can write the acquisition function as follows:

$$I[y, \omega \mid x, D_{\text{train}}] := H[y \mid x, D_{\text{train}}] - E_{p(\omega \mid D_{\text{train}})} \big[ H[y \mid x, \omega] \big]$$
$$= -\sum_c p(y = c \mid x, D_{\text{train}}) \log p(y = c \mid x, D_{\text{train}}) + E_{p(\omega \mid D_{\text{train}})} \Big[ \sum_c p(y = c \mid x, \omega) \log p(y = c \mid x, \omega) \Big],$$

with $c$ the possible classes $y$ can take. $I[y, \omega \mid x, D_{\text{train}}]$ can be approximated in our setting using the identity $p(y = c \mid x, D_{\text{train}}) = \int p(y = c \mid x, \omega) \, p(\omega \mid D_{\text{train}}) \, d\omega$:

$$I[y, \omega \mid x, D_{\text{train}}] = -\sum_c \int p(y = c \mid x, \omega) \, p(\omega \mid D_{\text{train}}) \, d\omega \cdot \log \int p(y = c \mid x, \omega) \, p(\omega \mid D_{\text{train}}) \, d\omega + E_{p(\omega \mid D_{\text{train}})} \Big[ \sum_c p(y = c \mid x, \omega) \log p(y = c \mid x, \omega) \Big].$$

Swapping the posterior $p(\omega \mid D_{\text{train}})$ with our approximate posterior $q_\theta^*(\omega)$, and through MC sampling, we then have:

$$\approx -\sum_c \int p(y = c \mid x, \omega) \, q_\theta^*(\omega) \, d\omega \cdot \log \int p(y = c \mid x, \omega) \, q_\theta^*(\omega) \, d\omega + E_{q_\theta^*(\omega)} \Big[ \sum_c p(y = c \mid x, \omega) \log p(y = c \mid x, \omega) \Big]$$
$$\approx -\sum_c \Big( \frac{1}{T} \sum_t \hat{p}_c^t \Big) \log \Big( \frac{1}{T} \sum_t \hat{p}_c^t \Big) + \frac{1}{T} \sum_{c,t} \hat{p}_c^t \log \hat{p}_c^t =: \hat{I}[y, \omega \mid x, D_{\text{train}}],$$

defining our approximation, with $\hat{p}_c^t$ the probability of input $x$ with model parameters $\hat{\omega}_t \sim q_\theta^*(\omega)$ to take class $c$:

$$\hat{p}^t = [\hat{p}_1^t, \ldots, \hat{p}_C^t] = \text{softmax}(f^{\hat{\omega}_t}(x)).$$

We then have

$$\hat{I}[y, \omega \mid x, D_{\text{train}}] \xrightarrow{T \to \infty} H[y \mid x, q_\theta^*] - E_{q_\theta^*(\omega)} \big[ H[y \mid x, \omega] \big] \approx I[y, \omega \mid x, D_{\text{train}}],$$

resulting in a computationally tractable estimator approximating the BALD acquisition function. The other acquisition functions can be approximated similarly.
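The estimator $\hat{I}$ above, and similarly the Mean STD acquisition, translate directly into code over the $T$ stochastic forward passes. Below is a minimal sketch under the same assumed (T, N_pool, C) array convention used earlier.

```python
# Illustrative sketches of the MC estimators; `probs` has shape
# (T, N_pool, C), with probs[t] the softmax output under sampled weights
# w_t ~ q(w) (e.g. the raw samples returned by mc_dropout_predict above).
import numpy as np

def bald(probs, eps=1e-10):
    # I-hat = H[mean_t p_t] - mean_t H[p_t]: entropy of the averaged
    # prediction minus the average entropy of individual predictions.
    mean_p = probs.mean(axis=0)
    entropy_of_mean = -(mean_p * np.log(mean_p + eps)).sum(axis=-1)
    mean_entropy = -(probs * np.log(probs + eps)).sum(axis=-1).mean(axis=0)
    return entropy_of_mean - mean_entropy

def mean_std(probs):
    # sigma(x) = (1/C) sum_c std over t of p(y=c|x, w_t).
    return probs.std(axis=0).mean(axis=-1)
```

Both functions score every pool point at once, so an acquisition step reduces to a top-k selection over the returned array.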
In the next section we will experiment with these acquisition functions and assess them empirically. These will be compared to the baseline acquisition function which uniformly acquires new data points from the pool set at random, and to various other techniques for active learning of image data and semi-supervised learning. This is followed by a real-world case study of cancer diagnosis.

5. Active Learning with Bayesian Convolutional Neural Networks

We study the proposed technique for active learning of image data. We compare the various acquisition functions relying on Bayesian CNN uncertainty on a simple image classification benchmark. We then study the importance of model uncertainty by evaluating the same acquisition functions with a deterministic CNN. This is followed by a comparison to a current technique for active learning with image data, which relies on SVMs. We follow with a comparison to the closest modern models to our active learning with image data – semi-supervised techniques with image data. These semi-supervised techniques have access to much more data (the unlabelled data) than our active learning models, yet we still perform comparably to them. Finally, we demonstrate the proposed methodology with a real world application of skin cancer diagnosis from a small number of lesion images, relying on fine-tuning of a large CNN model.

5.1. Comparison of various acquisition functions

We next study all acquisition functions above with our Bayesian CNN trained on the MNIST dataset (LeCun & Cortes, 1998). All acquisition functions are assessed with the same model structure: convolution-relu-convolution-relu-max pooling-dropout-dense-relu-dropout-dense-softmax, with 32 convolution kernels, 4x4 kernel size, 2x2 pooling, a dense layer with 128 units, and dropout probabilities 0.25 and 0.5 (following the example Keras MNIST CNN implementation (fchollet, 2015)); a sketch of this model is given below.
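The following Keras sketch matches the architecture described above. The kernel counts, kernel size, pooling, dense width, and dropout rates follow the text; the optimiser and loss are our assumptions rather than the confirmed training configuration.

```python
# A sketch of the Bayesian CNN used in these experiments (Keras 2 API).
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Dropout, Flatten, Dense

model = Sequential([
    Conv2D(32, (4, 4), activation='relu', input_shape=(28, 28, 1)),
    Conv2D(32, (4, 4), activation='relu'),
    MaxPooling2D(pool_size=(2, 2)),
    Dropout(0.25),                    # first dropout layer
    Flatten(),
    Dense(128, activation='relu'),
    Dropout(0.5),                     # second dropout layer
    Dense(10, activation='softmax'),  # ten MNIST classes
])
model.compile(loss='categorical_crossentropy', optimizer='adam')  # assumed
```

At test time the same network is used both deterministically (standard predictions, as in §5.2 below) and as a Bayesian CNN (MC dropout predictions, as in §3), depending on whether the dropout layers are kept active.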
All models are trained on the MNIST dataset with a (random but balanced) initial training set of 20 data points, and a validation set of 100 points on which we optimise the weight decay (this is a realistic validation set size, in comparison to the standard validation set size of 5K used in similar applications such as semi-supervised learning on MNIST). We further use the standard test set of 10K points, and the rest of the points are used as a pool set. The test error of each model and each acquisition function was assessed after each acquisition, using the dropout approximation at test time. To decide which data points to acquire, though, we used MC dropout following the derivations above. We repeated the acquisition process 100 times, each time acquiring the 10 points that maximised the acquisition function over the pool set (a sketch of this loop is given below). Each experiment was repeated three times and the results averaged (the standard deviation for the three repetitions is shown below)².

² The code for these experiments is available at http://mlg.eng.cam.ac.uk/yarin/publications.html#Gal2016Active.
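The acquisition procedure described above can be sketched as follows. train_model (retraining from scratch on the current labelled set) is a placeholder for the reader's own training routine; mc_dropout_predict and bald are the illustrative sketches from §3 and §4.

```python
# A sketch of the active learning loop: retrain, score the pool with an
# acquisition function, query the oracle for the best points, repeat.
import numpy as np

def active_learning_loop(train_model, x_train, y_train, x_pool, y_pool,
                         n_acquisitions=100, n_queries=10, T=100):
    for _ in range(n_acquisitions):
        model = train_model(x_train, y_train)     # retrain to convergence
        mean_p, probs = mc_dropout_predict(model, x_pool, T)
        scores = bald(probs)  # or max_entropy(mean_p), variation_ratio(mean_p)
        idx = np.argsort(scores)[-n_queries:]     # the 10 highest scorers

        # The oracle's labels for the selected points join the training
        # set, and the points are removed from the pool.
        x_train = np.concatenate([x_train, x_pool[idx]])
        y_train = np.concatenate([y_train, y_pool[idx]])
        x_pool = np.delete(x_pool, idx, axis=0)
        y_pool = np.delete(y_pool, idx, axis=0)
    return x_train, y_train
```

Note that in a real deployment y_pool would not be available up front; here the pool labels stand in for the oracle, as is standard in active learning benchmarks.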
Figure 1. MNIST test accuracy as a function of the number of acquired images from the pool set (up to 1000 images, using validation set size 100, and averaged over 3 repetitions). Four acquisition functions (BALD, Variation Ratios, Max Entropy, and Mean STD) are evaluated and compared to a Random acquisition function.

We compared the acquisition functions BALD, Variation Ratios, Max Entropy, Mean STD, and the baseline Random. We found Random and Mean STD to under-perform compared to BALD, Variation Ratios, and Max Entropy (figure 1). The Variation Ratios acquisition function seems to obtain slightly better accuracy faster than BALD and Max Entropy. It is interesting that Mean STD seems to perform similarly to Random – which samples points at random from the pool set.

Lastly, in table 1 we give the number of acquired images needed to get to test errors of 5% and 10%. As can be seen, BALD, Variation Ratios, and Max Entropy attain a small test error with far fewer acquisitions than Mean STD and Random. This table demonstrates the importance of data efficiency – an expert using the Variation Ratios model, for example, would have to label less than half the number of images she would have had to label had she acquired new images at random.

% error    BALD    Var Ratios    Max Entropy    Mean STD    Random
10%         145           120            165         230       255
5%          335           295            355         695       835

Table 1. Number of acquired images needed to reach 10% and 5% test error on MNIST.

Figure 2. Test accuracy as a function of the number of acquired images for various acquisition functions, using both a Bayesian CNN (red) and a deterministic CNN (blue). Panels: (a) BALD, (b) Var Ratios, (c) Max Entropy.

5.2. Importance of model uncertainty

We assess the importance of model uncertainty in our Bayesian CNN by evaluating three of the acquisition functions (BALD, Variation Ratios, and Max Entropy) with a deterministic CNN. Much like the Bayesian CNN, the deterministic CNN produces a probability vector which can be used with the acquisition functions of §4 (formally, by setting $q_\theta^*(\omega) = \delta(\omega - \theta)$ to be a point mass at the location of the model parameters $\theta$). Such deterministic models can capture aleatoric uncertainty – the noise in the data – but cannot capture epistemic uncertainty – the uncertainty over the parameters of the CNN, which we try to minimise during active learning. The models in this experiment still use dropout, but for regularisation only (i.e. we do not perform MC dropout at test time).

A comparison of the Bayesian models to the deterministic models for the BALD, Variation Ratios, and Max Entropy acquisition functions is given in fig. 2. The Bayesian models, propagating uncertainty throughout the model, attain higher accuracy early on, and converge to a higher accuracy overall. This demonstrates that the uncertainty propagated throughout the Bayesian models has a significant effect on the models' measure of their confidence.

5.3. Comparison to current active learning techniques with image data

We next compare to a method from the sparse existing literature on active learning with image data, concentrating on (Zhu et al., 2003), which relies on a kernel method and further leverages the unlabelled images (this will be discussed in more detail in the next section). Zhu et al. (2003) evaluate an RBF kernel over the raw images to get a similarity graph which can be used to share information about the unlabelled data. Active learning is then performed by greedily selecting unlabelled images to be labelled, such that an estimate of the expected classification error is minimised. This will be referred to as MBR.
MBR was formulated for the binary classification case, hence we compared MBR to the acquisition functions BALD, Variation Ratios, Max Entropy, and Random on a binary classification task (two digits from the MNIST dataset). Classification accuracy is shown in fig. 3. Note that even a random acquisition function, when coupled with a CNN (a specialised model for image data), outperforms MBR, which relies on an RBF kernel. We further experimented with a CNN version of MBR where we replaced the RBF kernel with a CNN. It is interesting to note that this did not give improved results.

Figure 3. MNIST test accuracy (two digit classification) as a function of the number of acquired images, compared to a current technique for active learning of image data: MBR (Zhu et al., 2003).

5.4. Comparison to semi-supervised learning

We continue with a comparison to the closest models (in modern literature) to our active learning with image data: semi-supervised learning with image data. In semi-supervised learning a model is given a fixed set of labelled data, and a fixed set of unlabelled data. The model can use the unlabelled dataset to learn about the distribution of the inputs, in the hopes that this information will aid in learning the mapping to the outputs as well. Several semi-supervised models for image data have been suggested in recent years (Weston et al., 2012; Kingma et al., 2014; Rasmus et al., 2015), models which have set benchmarks on MNIST given a small number of labelled images (1000 random images). These models make further use of a (very) large unlabelled set of 49K images, and a large validation set of 5K-10K labelled images to tune model hyper-parameters and model structure (Rasmus et al., 2015). These models have access to much more data than our active learning models, but we still compare to them as they are the most relevant models in the field given the constraint of small amounts of labelled data.

Test error for our active learning models with various acquisition functions (after the acquisition of 1000 training points), as well as for the semi-supervised models, is given in table 2. In this experiment, to be comparable to the other techniques, we use a validation set of 5K points. Our model attains similar performance to that of the semi-supervised models (although note that we use a fairly small model compared to (Rasmus et al., 2015) for example). Rasmus et al. (2015)'s ladder network (full) attains error 0.84% with 1000 labelled images and 59,000 unlabelled images. However, (Rasmus et al., 2015)'s Γ-model architecture is more directly comparable to ours. The Γ-model attains 1.53% error, compared to the 1.64% error of our Var Ratios acquisition function, which relies on no additional unlabelled data.

Technique                                            Test error
Semi-supervised:
  Semi-sup. Embedding (Weston et al., 2012)               5.73%
  Transductive SVM (Weston et al., 2012)                  5.38%
  MTC (Rifai et al., 2011)                                3.64%
  Pseudo-label (Lee, 2013)                                3.46%
  AtlasRBF (Pitelis et al., 2014)                         3.68%
  DGN (Kingma et al., 2014)                               2.40%
  Virtual Adversarial (Miyato et al., 2015)               1.32%
  Ladder Network (Γ-model) (Rasmus et al., 2015)          1.53%
  Ladder Network (full) (Rasmus et al., 2015)             0.84%
Active learning with various acquisitions:
  Random                                                  4.66%
  BALD                                                    1.80%
  Max Entropy                                             1.74%
  Var Ratios                                              1.64%

Table 2. Test error on MNIST with 1000 labelled training samples, compared to semi-supervised techniques. Active learning has access to only the 1000 acquired images; semi-supervised learning further has access to the remaining images with no labels. Following existing research we use a large validation set of size 5000.
Figure 4. Skin cancer (melanoma) example lesions from the ISIC 2016 melanoma diagnosis dataset. The two lesions on the left are benign (non-cancerous), while the two lesions on the right are malignant (cancerous).

5.5. Cancer diagnosis from lesion image data

We finish by assessing the proposed technique on a real world test case. We experiment with melanoma (skin cancer) diagnosis from dermoscopic lesion images. In this task we are given image data of skin segments, of both malignant (cancerous) as well as benign (non-cancerous) lesions. Our task is to classify the images as malignant or benign (an example is shown in fig. 4). The data used is the ISIC Archive (Gutman et al., 2016). This dataset was collected in order to provide a "large public repository of expertly annotated high quality skin images" to provide clinical support in the identification of skin cancer, and to develop algorithms for skin cancer diagnosis. Specifically, we use the training data of the "ISBI 2016: Skin Lesion Analysis Towards Melanoma Detection – Part 3B: Segmented Lesion Classification" task. The data contains 900 dermoscopic lesion images in JPEG format with EXIF tags removed. Malignancy diagnosis for these lesions was obtained from expert consensus and pathology report information. The data contains lesion segmentation as well, which we did not use.

For our model we replicate the model of (Agarwal et al., 2016). This model achieved second place in the "Part 3B: Segmented Lesion Classification" task, with its code open-sourced. The model relies on data augmentation of the positive examples (flipping the lesions vertically and horizontally), and fine-tunes the VGG16 CNN model (Simonyan & Zisserman, 2015) (i.e. optimises a pre-trained model with a small learning rate). The VGG16 model was pre-trained on ImageNet (Deng et al., 2009). The top layer of the model (1000 logits) was removed and replaced with a 2 dimensional output (for our classification task of malignant/benign). Preceding the last layer are two fully connected layers of size 4096, each one followed by a dropout layer with dropout probability 0.5. This architecture seems to provide good uncertainty estimates, as observed before (Kendall et al., 2015; Gal & Ghahramani, 2016a).
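A sketch of this fine-tuning setup in Keras is given below. The two 4096-unit dense layers, the 0.5 dropout, and the 2-way softmax follow the text; the input size, optimiser, and learning rate are illustrative assumptions rather than the exact configuration of (Agarwal et al., 2016).

```python
# A sketch of the fine-tuned VGG16 model described above (Keras 2 API).
from keras.applications.vgg16 import VGG16
from keras.models import Model
from keras.layers import Flatten, Dense, Dropout
from keras.optimizers import SGD

base = VGG16(weights='imagenet', include_top=False,
             input_shape=(224, 224, 3))   # ImageNet pre-trained weights

x = Flatten()(base.output)
x = Dense(4096, activation='relu')(x)
x = Dropout(0.5)(x)                        # dropout after each dense layer
x = Dense(4096, activation='relu')(x)
x = Dropout(0.5)(x)
out = Dense(2, activation='softmax')(x)    # malignant / benign

model = Model(inputs=base.input, outputs=out)
model.compile(optimizer=SGD(lr=1e-4),      # small learning rate: fine-tuning
              loss='categorical_crossentropy')
```

The two dropout layers here are what the MC dropout estimates of §3 sample from when logging test performance and computing acquisition scores below.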
The data is unbalanced, containing 727 negative (benign) examples and 173 positive (malignant) examples (20% positive examples). Since the dataset is so small, to assess model performance reliably we have to set aside a large balanced test set. We randomly partition the data, and set aside 100 negative and 100 positive examples. All our experiments are performed on two different random splits – since even a test set size of 200 gives very different accuracy with different random splits. Note that on each such random split we repeat our experiments three times and average the results with respect to the fixed test set.

We experiment with active learning using the following procedure. We begin by creating an initial training set of 80 negative examples and 20 positive examples from our training data, as well as a pool set from the remaining data. With each experiment repetition (out of the three experiment repetitions w.r.t. the fixed test split) the pool is shuffled anew. The positive examples in the current training set are augmented following the original training procedure, and a model is trained on the augmented training set for 100 epochs until convergence. We use batch size 8 and weight decay set to $(1 - p) l^2 / N$, where $N$ is the number of training points, $p = 0.5$ is the dropout probability, and the length-scale squared $l^2$ is set to 0.5. An acquisition function is then used to select the 100 most informative images from the pool set. These points are removed from the pool set and added to the (non-augmented) training set, where we use the original expert-provided labels for these points. The process is repeated until all pool points have been exhausted, where at each acquisition step we reset the model to its original pre-trained weights (as we also did in the previous section's experiments). This reset is done in order to avoid local optima, and to avoid confusing model performance improvement with an improvement resulting from simply using longer (cumulative) optimisation time.

After each acquisition the test performance of the model is logged using MC dropout with 20 samples. We further keep track of the number of positive examples acquired after each acquisition. Model performance is assessed using area-under-the-curve (AUC), as this seems to be the most informative of all the metrics used by Gutman et al. (2016). We experimented with the average precision metric suggested by Gutman et al. (2016) as well, but managed to get results improving over the competition winner by simply predicting all points as "benign". This might be because of the data imbalance. AUC, on the other hand, takes into account all possible decision thresholds for classifying an image as malignant.
We assessed two acquisition functions: a uniform baseline, and BALD. Even though Variation Ratios performs well on MNIST above, the function fails with the melanoma data, since most malignant images are given only a slightly higher probability of being malignant compared to the probability of benign images being malignant. As a result all pool points are given an identical Variation Ratios acquisition value.

Figure 5. AUC (left) as well as the number of acquired positive examples (right) for both the BALD acquisition function and the uniform acquisition function, on the ISIC 2016 melanoma diagnosis dataset. Two random test splits are assessed (top and bottom), and on each test set the experiment was repeated three times with different random seeds (shown: mean with standard error). Panels: (a) AUC as a function of acquisition step, first test split; (b) number of positive examples as a function of acquisition step, first test split; (c) AUC as a function of acquisition step, second test split; (d) number of positive examples as a function of acquisition step, second test split.

Experiment results are given in fig. 5, where results are reported on both test splits (top and bottom), and where with each split the experiment is repeated three times and performance results are averaged on that fixed split. For each test split we report the mean with standard error. AUC is reported for each split (left), and the number of acquired positive examples is reported as well (right) for each acquisition step. BALD achieves better AUC faster than uniform, and acquires more positive examples at each acquisition step than uniform (i.e. BALD finds positive examples informative and adds these to the training set, whereas uniform simply selects positive examples from the pool set based on their frequency).

Note how the AUC range varies wildly between the two different test splits, but how AUC is similar for both acquisition functions on each fixed test set before the initial acquisition (when both uniform and BALD models are trained on the same initial training set). This demonstrates the difficulties of handling small data: each test split gives radically different results, and in this case, even though each acquisition function experiment has a relatively small standard error, averaging the AUC of the acquisition functions over the different test splits would artificially increase the standard error. Lastly, it is interesting to experiment with a model trained over the entire pool set, i.e. with the settings of the second place winner in the ISIC2016 task. For the first test split this model attains AUC 0.71 ± 0.003, whereas with the second test split it attains AUC 0.75 ± 0.01. For both test splits this AUC is worse than BALD's converged AUC after 4 acquisition steps. This might be because BALD avoided selecting noisy points – nearby images for which there exist multiple noisy labels of different classes. Such points have large aleatoric uncertainty – uncertainty which cannot be explained away – rather than large epistemic uncertainty – the uncertainty which BALD captures in order to explain it away, i.e. reduce it.
6. Future Research

We presented a new approach for active learning of image data, relying on recent advances at the intersection of Bayesian modelling and deep learning, and demonstrated a real-world application in medical diagnosis. We assessed the performance of the techniques by resetting the models after each acquisition, and training them again to convergence. This was done to isolate the effects of our acquisition functions, which came at a cost of prolonged training times (20 hours for each melanoma experiment, for example). We showed that even with this long running time, our technique still reduces the number of required expert labels, and thus reduces the costs of such a system. This running time can be reduced further by not resetting the system – with the potential price of falling into local optima. We leave this problem for future research.

References

Agarwal, Mohit, Damaraju, Nandita, and Chaieb, Sahbi. DL8803. https://github.com/NanditaDamaraju/DL8803, 2016.

Cohn, David A, Ghahramani, Zoubin, and Jordan, Michael I. Active learning with statistical models. Journal of Artificial Intelligence Research, 1996.

Cortes, Corinna and Vapnik, Vladimir. Support-vector networks. Machine Learning, 20(3):273–297, 1995.

Deng, Jia, Dong, Wei, Socher, Richard, Li, Li-Jia, Li, Kai, and Fei-Fei, Li. ImageNet: A large-scale hierarchical image database. In Computer Vision and Pattern Recognition, 2009. CVPR 2009. IEEE Conference on, pp. 248–255. IEEE, 2009.

fchollet. Keras. https://github.com/fchollet/keras, 2015.

Freeman, Linton G. Elementary Applied Statistics, 1965.

Gal, Yarin. Uncertainty in Deep Learning. PhD thesis, University of Cambridge, 2016.

Gal, Yarin and Ghahramani, Zoubin. Bayesian convolutional neural networks with Bernoulli approximate variational inference. ICLR workshop track, 2016a.

Gal, Yarin and Ghahramani, Zoubin. Dropout as a Bayesian approximation: Representing model uncertainty in deep learning. ICML, 2016b.

Gutman, David, Codella, Noel CF, Celebi, Emre, Helba, Brian, Marchetti, Michael, Mishra, Nabin, and Halpern, Allan. Skin lesion analysis toward melanoma detection: A challenge at the international symposium on biomedical imaging (ISBI) 2016, hosted by the international skin imaging collaboration (ISIC). arXiv preprint arXiv:1605.01397, 2016.

He, Kaiming, Zhang, Xiangyu, Ren, Shaoqing, and Sun, Jian. Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification. In Proceedings of the IEEE International Conference on Computer Vision, pp. 1026–1034, 2015.

Hernandez-Lobato, Jose Miguel and Adams, Ryan. Probabilistic backpropagation for scalable learning of Bayesian neural networks. In Proceedings of The 32nd International Conference on Machine Learning, pp. 1861–1869, 2015.

Hinton, Geoffrey E, Srivastava, Nitish, Krizhevsky, Alex, Sutskever, Ilya, and Salakhutdinov, Ruslan R. Improving neural networks by preventing co-adaptation of feature detectors. arXiv preprint arXiv:1207.0580, 2012.

Holub, Alex, Perona, Pietro, and Burl, Michael C. Entropy-based active learning for object recognition. In Computer Vision and Pattern Recognition Workshops, 2008. CVPRW'08. IEEE Computer Society Conference on, pp. 1–8. IEEE, 2008.

Houlsby, Neil, Huszár, Ferenc, Ghahramani, Zoubin, and Lengyel, Máté. Bayesian active learning for classification and preference learning. arXiv preprint arXiv:1112.5745, 2011.

Joshi, Ajay J, Porikli, Fatih, and Papanikolopoulos, Nikolaos. Multi-class active learning for image classification. In Computer Vision and Pattern Recognition, 2009. CVPR 2009. IEEE Conference on, pp. 2372–2379. IEEE, 2009.

Jozefowicz, Rafal, Vinyals, Oriol, Schuster, Mike, Shazeer, Noam, and Wu, Yonghui. Exploring the limits of language modeling. arXiv preprint arXiv:1602.02410, 2016.

Kalchbrenner, Nal and Blunsom, Phil. Recurrent continuous translation models. In EMNLP, 2013.

Kampffmeyer, Michael, Salberg, Arnt-Borre, and Jenssen, Robert. Semantic segmentation of small objects and modeling of uncertainty in urban remote sensing images using deep convolutional neural networks. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, June 2016.

Kendall, Alex, Badrinarayanan, Vijay, and Cipolla, Roberto. Bayesian SegNet: Model uncertainty in deep convolutional encoder-decoder architectures for scene understanding. arXiv preprint arXiv:1511.02680, 2015.

Kingma, Diederik P, Mohamed, Shakir, Rezende, Danilo Jimenez, and Welling, Max. Semi-supervised learning with deep generative models. In Advances in Neural Information Processing Systems, pp. 3581–3589, 2014.

Krizhevsky, Alex, Sutskever, Ilya, and Hinton, Geoffrey E. ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems, pp. 1097–1105, 2012.

LeCun, Yann and Cortes, Corinna. The MNIST database of handwritten digits, 1998.

LeCun, Yann, Boser, Bernhard, Denker, John S, Henderson, Donnie, Howard, Richard E, Hubbard, Wayne, and Jackel, Lawrence D. Backpropagation applied to handwritten zip code recognition. Neural Computation, 1(4):541–551, 1989.

Lee, Dong-Hyun. Pseudo-label: The simple and efficient semi-supervised learning method for deep neural networks. In Workshop on Challenges in Representation Learning, 2013.

Li, Xin and Guo, Yuhong. Adaptive active learning for image classification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 859–866, 2013.

Marcus, Daniel S, Fotenos, Anthony F, Csernansky, John G, Morris, John C, and Buckner, Randy L. Open access series of imaging studies: longitudinal MRI data in nondemented and demented older adults. Journal of Cognitive Neuroscience, 22(12):2677–2684, 2010.

Miyato, Takeru, Maeda, Shin-ichi, Koyama, Masanori, Nakae, Ken, and Ishii, Shin. Distributional smoothing by virtual adversarial examples. arXiv preprint arXiv:1507.00677, 2015.

Pitelis, Nikolaos, Russell, Chris, and Agapito, Lourdes. Semi-supervised learning using an unsupervised atlas. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pp. 565–580. Springer, 2014.

Rasmus, Antti, Berglund, Mathias, Honkala, Mikko, Valpola, Harri, and Raiko, Tapani. Semi-supervised learning with ladder networks. In Advances in Neural Information Processing Systems, pp. 3546–3554, 2015.

Rifai, Salah, Dauphin, Yann N, Vincent, Pascal, Bengio, Yoshua, and Muller, Xavier. The manifold tangent classifier. In Advances in Neural Information Processing Systems, pp. 2294–2302, 2011.

Rumelhart, David E, Hinton, Geoffrey E, and Williams, Ronald J. Learning internal representations by error propagation. Technical report, DTIC Document, 1985.

Shannon, Claude Elwood. A mathematical theory of communication. Bell System Technical Journal, 27(3):379–423, 1948.

Simonyan, K. and Zisserman, A. Very deep convolutional networks for large-scale image recognition. In International Conference on Learning Representations, 2015.

Srivastava, Nitish, Hinton, Geoffrey, Krizhevsky, Alex, Sutskever, Ilya, and Salakhutdinov, Ruslan. Dropout: A simple way to prevent neural networks from overfitting. JMLR, 2014.

Sundermeyer, Martin, Schlüter, Ralf, and Ney, Hermann. LSTM neural networks for language modeling. In INTERSPEECH, 2012.

Sutskever, Ilya, Vinyals, Oriol, and Le, Quoc VV. Sequence to sequence learning with neural networks. In NIPS, 2014.

Tong, Simon. Active Learning: Theory and Applications. PhD thesis, 2001. AAI3028187.

Weston, Jason, Ratle, Frédéric, Mobahi, Hossein, and Collobert, Ronan. Deep learning via semi-supervised embedding. In Neural Networks: Tricks of the Trade, pp. 639–655. Springer, 2012.

Zhu, X, Lafferty, J, and Ghahramani, Z. Combining active learning and semi-supervised learning using Gaussian fields and harmonic functions. In Proceedings of the ICML-2003 Workshop on The Continuum from Labeled to Unlabeled Data, pp. 58–65. ICML, 2003.