BSCM-04 Estimation of Distribution Algorithms

The document summarizes the Univariate Marginal Distribution Algorithm (UMDA), an estimation of distribution algorithm for binary vectors. The UMDA models the joint probability distribution of binary vectors as the product of univariate marginal probabilities, assuming independence between the attributes. It updates the marginal probabilities based on the mating pool at each generation and samples new solutions by drawing each bit independently from the corresponding marginal probability.

Bioinspired and soft-computing methods in data analysis and optimization

Estimation of Distribution Algorithms

Krzysztof Michalak
[email protected]
https://krzysztof-michalak.pl/

Evolutionary Algorithms (EAs) – reminder

[Diagram: Main population → Mating pool → Evolutionary operators → Next population]

Estimation of Distribution Algorithms

• Population-based algorithms
• Representation of good solutions: a probabilistic model
• In comparison to EAs:
  • Specimens are used for evaluation
  • No genetic operators, instead:
    • A probabilistic model is built
    • New specimens are drawn from this model

• Main steps
  • Select specimens from the population ("mating pool" in EAs)
  • Update the probability model
  • Draw a new population from the model
  • Evaluate

[Diagram: Population Gn → Selected specimens → Model update → Population Gn+1]

Larrañaga P., Lozano J. A., "Estimation of Distribution Algorithms", Genetic Algorithms and Evolutionary Computation series vol. 2, Springer, 2002.
https://link.springer.com/book/10.1007/978-1-4615-1539-5

A step back… to data mining

• Before we go to the EDAs… a question for you!
• What is the underlying assumption of the Naive Bayes classifier?
  • The probability distributions of the attributes (features) are assumed to be independent
  • We can model the joint probability distribution using the marginals:

P(x \mid o_j \in c_i) = \prod_{k=1}^{n} P(x_k \mid o_j \in c_i)

EDAs for binary vectors

• Univariate Marginal Distribution Algorithm (UMDA) [1]
• Models binary vectors in Ω = { 0, 1 }^n
• The joint probability distribution is modelled as:

P(x) = \prod_{k=1}^{n} P(x_k)

• We only need to model the marginals for k = 1, …, n, because we have assumed independence
• A difference w.r.t. the Naive Bayes: we update the model based on the mating pool Q, and not based on the training sample

• Model update
  • Performed using solutions from the mating pool Q
  • The model is a vector of probabilities p_k for k = 1, …, n
  • Each p_k represents the probability of setting the k-th genotype element to 1:

p_k = P(x_k = 1) = \frac{|\{ x \in Q : x_k = 1 \}|}{|Q|}

[1] Mühlenbein H., Paaß G., "From recombination of genes to the estimation of distributions I. Binary parameters", in: Voigt H. M., Ebeling W., Rechenberg I., Schwefel H. P. (eds), Parallel Problem Solving from Nature — PPSN IV, Lecture Notes in Computer Science, vol. 1141, Springer, Berlin, Heidelberg, 1996.
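To make the update step concrete, here is a minimal Python sketch (illustrative, not from the slides; the function name `update_model` and the data layout are assumptions), treating the mating pool Q as a list of equal-length 0/1 lists:

```python
# A minimal sketch of the UMDA model update (illustrative).
# Q is the mating pool: a list of binary vectors, all of length n.

def update_model(Q):
    """Return the model p, where p[k] estimates P(x_k = 1) from Q."""
    n = len(Q[0])
    # p_k is the fraction of solutions in Q that have a 1 at position k
    return [sum(x[k] for x in Q) / len(Q) for k in range(n)]

# Example: a mating pool of four 3-bit vectors
Q = [[1, 0, 1], [1, 1, 0], [1, 0, 0], [0, 0, 1]]
print(update_model(Q))  # [0.75, 0.25, 0.5]
```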

EDAs for binary vectors

• Univariate Marginal Distribution Algorithm (UMDA)
• Model update

[Diagram: the k-th column x_k of the mating pool Q determines the model entry p_k = P(x_k = 1) = |{ x ∈ Q : x_k = 1 }| / |Q|]

• Sampling solutions for the next generation
  • Because we assumed that the probability distributions for different positions in the genotype are independent…
  • … we can draw the value for each position in the genotype independently of the others. For k = 1, …, n:
    • Draw a number ξ uniformly from [0, 1]
    • If ξ < P(x_k = 1) set x_k = 1
    • Otherwise, set x_k = 0
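A matching sketch of the sampling step (again illustrative; `sample_solution` is a hypothetical helper): each bit is drawn independently from its marginal, exactly as in the bullet points above.

```python
# A minimal sketch of UMDA sampling (illustrative).
import random

def sample_solution(p):
    """Draw one binary vector; bit k is 1 with probability p[k]."""
    return [1 if random.random() < pk else 0 for pk in p]

p = [0.75, 0.25, 0.5]                       # a model, e.g. from the update step
new_population = [sample_solution(p) for _ in range(10)]
```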

EDAs for binary vectors

• Univariate Marginal Distribution Algorithm (UMDA)
• Sampling solutions for the next generation

[Diagram: for each position k, if rand([0, 1]) < p_k then x_k = 1, else x_k = 0]

• Pseudocode
  • Start from a population of random binary vectors
  • Until the stopping condition is met:
    • Evaluate solutions in the population
    • Select the mating pool
    • Update the model
    • Sample the next generation from the model

[Diagram: Population Gn → Selected specimens → Model update → Population Gn+1; the next generation is sampled from the model]
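Putting the pseudocode together, here is a compact sketch of the whole UMDA loop. The OneMax fitness, truncation selection (keeping the best half as the mating pool) and all parameter values are illustrative stand-ins for details the pseudocode leaves open:

```python
# A sketch of the full UMDA loop (illustrative choices throughout).
import random

n, pop_size, generations = 20, 50, 100

def fitness(x):                             # OneMax: count the ones
    return sum(x)

population = [[random.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
for _ in range(generations):
    # Evaluate the solutions and select the mating pool (here: the best half)
    population.sort(key=fitness, reverse=True)
    Q = population[:pop_size // 2]
    # Update the model: p_k = |{x in Q : x_k = 1}| / |Q|
    p = [sum(x[k] for x in Q) / len(Q) for k in range(n)]
    # Sample the next generation from the model
    population = [[1 if random.random() < pk else 0 for pk in p]
                  for _ in range(pop_size)]

print(max(population, key=fitness))
```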

EDAs for binary vectors

• Univariate Marginal Distribution Algorithm (UMDA) – pseudocode

Time for a little refresher! Name three selection methods we discussed in the previous lectures:
1) Roulette wheel
2) Binary tournament
3) Elitist selection
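As a reminder of the second of these methods, here is a minimal sketch of binary tournament selection used to build a mating pool (illustrative; `binary_tournament` is a hypothetical helper, and ties are broken in favour of the first competitor):

```python
# A minimal sketch of binary tournament selection (illustrative).
import random

def binary_tournament(population, fitness, pool_size):
    pool = []
    for _ in range(pool_size):
        a, b = random.sample(population, 2)                # two random competitors
        pool.append(a if fitness(a) >= fitness(b) else b)  # the fitter one wins
    return pool
```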

EDAs for binary vectors

• Univariate Marginal Distribution Algorithm (UMDA)
• Generalizations – how to apply a similar approach to:
  • Discrete values, but not binary, e.g. 1, 2, …, h
  • Continuous distributions

• Follow the same approach as for the Naive Bayes
  • Model each marginal distribution separately (for each x_k with k = 1, …, n)
  • A wide variety of models is available (some examples in the following slides)

EDAs for integer vectors

• Generalization to discrete values: j = 1, 2, …, h
• The model for each variable x_k is a histogram

[Figure: the probability distribution for x_k – a histogram of P(x_k = j) over j ∈ { 1, …, h }]

• Model update: approximate the probability that x_k = j with the frequency with which j occurs at position k:

P(x_k = j) = \frac{|\{ x \in Q : x_k = j \}|}{|Q|}, \qquad j = 1, 2, \ldots, h
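A minimal sketch of the histogram update (illustrative; the nested-list layout `model[k][j-1]` is an assumption): each entry accumulates the frequency of value j at position k in the mating pool Q.

```python
# A minimal sketch of the histogram model update for integer genotypes
# (illustrative). Genotype values are assumed to lie in 1..h.
def update_histograms(Q, h):
    n = len(Q[0])
    model = [[0.0] * h for _ in range(n)]   # model[k][j-1] approximates P(x_k = j)
    for x in Q:
        for k in range(n):
            model[k][x[k] - 1] += 1 / len(Q)
    return model

model = update_histograms([[1, 2], [1, 3], [2, 3], [1, 3]], h=3)
print(model[0])  # [0.75, 0.25, 0.0] – frequencies of 1, 2, 3 at position 0
```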

EDAs for integer vectors

• Model update: the histograms represent the frequencies of the values in the mating pool:

P(x_k = j) = \frac{|\{ x \in Q : x_k = j \}|}{|Q|}, \qquad j = 1, 2, \ldots, h

• Sampling: for the genotype element at position k, draw each of the values j = 1, 2, …, h with the probability P(x_k = j)

EDAs for integer vectors

• The model for each variable x_k is a histogram
• Sampling: for the genotype element at position k, draw each of the values j = 1, 2, …, h with the probability P(x_k = j)
• Do you remember, from the previous lectures, a method which does that?
• You can use the roulette-wheel selection. Just use the probabilities P(x_k = j) for the values j = 1, 2, …, h

[Figure: a roulette wheel divided into sectors of sizes P(x_k = 1), …, P(x_k = 6)]
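A minimal sketch of that roulette wheel (illustrative; `draw_value` is a hypothetical helper): the cumulative probabilities partition [0, 1] into sectors of sizes P(x_k = j), and a uniform random number picks a sector.

```python
# A minimal sketch of roulette-wheel sampling from a histogram (illustrative).
import random

def draw_value(histogram):
    """histogram[j-1] = P(x_k = j); returns a value j in 1..h."""
    r, acc = random.random(), 0.0
    for j, p in enumerate(histogram, start=1):
        acc += p
        if r < acc:
            return j
    return len(histogram)                   # guard against rounding error

x_k = draw_value([0.10, 0.25, 0.30, 0.20, 0.15])
```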

EDAs for integer vectors

• Sampling solutions for the next generation

[Diagram: each genotype element x_k is drawn from the corresponding histogram]

EDAs for real vectors

• Generalization to continuous values
• Parametric approach – model update
  • Assume the type of distribution (e.g. Gaussian)
  • Estimate the parameters for this type of distribution
  • Note that we only need univariate distributions
• Sampling
  • Sample x_k for k = 1, …, n from the distribution with the estimated parameters (for each k separately)
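For the Gaussian case, a minimal sketch of the parametric update and sampling (illustrative; both functions are hypothetical helpers): a univariate mean and standard deviation are estimated per position, and each x_k is then sampled independently.

```python
# A minimal sketch of the parametric (Gaussian) model for real vectors
# (illustrative). Q is a list of real-valued vectors of equal length.
import random
import statistics

def update_gaussian_model(Q):
    n = len(Q[0])
    columns = ([x[k] for x in Q] for k in range(n))
    # One (mean, standard deviation) pair per genotype position
    return [(statistics.mean(col), statistics.stdev(col)) for col in columns]

def sample_solution(model):
    return [random.gauss(mu, sigma) for mu, sigma in model]

model = update_gaussian_model([[0.1, 2.0], [0.3, 1.5], [0.2, 1.0]])
x = sample_solution(model)
```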

EDAs for real vectors

• An analogy to the parametric estimation in the Naive Bayes classifier: the marginal distributions

[Figure: data points with univariate marginal distributions fitted along each axis]

• Remember an important difference:
  • we do not try to model multiple classes
  • instead we model the distribution of solutions in the mating pool
• So, for example, we construct a model only for the green points

EDAs for real vectors

• An analogy to the parametric estimation in the Naive Bayes classifier: the joint distribution

[Figure: the joint distribution built as a product of the univariate marginals]

EDAs for binary vectors

• Population-Based Incremental Learning (PBIL) [1]
• Models binary vectors in Ω = { 0, 1 }^n
• The model is a vector of probabilities p_k for k = 1, …, n
• Each p_k represents the probability of setting the k-th genotype element to 1.

[1] Baluja S., "Population-based incremental learning: a method for integrating genetic search based function optimization and competitive learning", Technical report, Carnegie Mellon University, Pittsburgh, PA, USA, 1994.

EDAs for binary vectors

• Population-Based Incremental Learning (PBIL)
• Model update
  • Performed using the best solution x(+) in the mating pool and the worst solution x(-)
  • The second step of the update is mutation
• Sampling
  • Performed in the same way as in UMDA
  • Each element of the new genotype x_k is set to 1 with the probability p_k:
    • For k = 1, …, n draw a number ξ uniformly from [0, 1]
    • If ξ < P(x_k = 1) set x_k = 1, otherwise set x_k = 0

• Parameters (and their default values from the S. Baluja paper)
  • Positive learning rate: η+ (0.1)
  • Negative learning rate: η− (0.075)
  • The sum of the learning rates: η = η+ + η−
  • Mutation probability: P_mut (0.02)
  • Mutation shift: α (0.05)

EDAs for binary vectors

• Population-Based Incremental Learning (PBIL)
• Model update, using the best solution x(+) and the worst solution x(-)
  • If x_k(-) = x_k(+) set:

p_k \leftarrow p_k \cdot (1 - \eta^{+}) + x_k^{(+)} \cdot \eta^{+}

  • If x_k(-) ≠ x_k(+) set:

p_k \leftarrow p_k \cdot (1 - \eta) + x_k^{(+)} \cdot \eta

• Mutation
  • Independently for each position k = 1, …, n in the genotype, with the probability P_mut:
    • Draw a value β ∈ { 0, 1 } with P(0) = P(1) = 0.5
    • Set:

p_k \leftarrow p_k \cdot (1 - \alpha) + \beta \cdot \alpha
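A minimal sketch of the PBIL update and mutation (illustrative; the combined rate η = η+ + η− follows the slide's definition, and the constants are the default values quoted above):

```python
# A minimal sketch of the PBIL model update and mutation (illustrative).
import random

ETA_PLUS, ETA_MINUS = 0.1, 0.075
ETA = ETA_PLUS + ETA_MINUS       # combined rate where best and worst differ
P_MUT, ALPHA = 0.02, 0.05        # mutation probability and mutation shift

def pbil_update(p, best, worst):
    for k in range(len(p)):
        # Learn towards the best solution; learn faster where it differs from the worst
        rate = ETA_PLUS if best[k] == worst[k] else ETA
        p[k] = p[k] * (1 - rate) + best[k] * rate
        # Mutation: with probability P_MUT shift p_k towards a random bit
        if random.random() < P_MUT:
            beta = random.randint(0, 1)
            p[k] = p[k] * (1 - ALPHA) + beta * ALPHA
    return p
```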

EDAs for the TSP – models

• The EH-PBIL model
• Model used in the Edge Histogram-Based Sampling Algorithm (EHBSA)
• A matrix [n × n] with p_{i,j} ∈ [0, 1]
• p_{i,j} – how probable it is that j is placed right after i in good permutations

[Example matrix (rows i, columns j):
 0.1 0.8 0   0.1
 0   0   0.1 0.9
 0.9 0   0   0.1
 0.1 0.1 0.8 0  ]

• Model update
  • From the mating pool select the shortest tour (+) and the longest one (-)
  • Encode the permutations (+) and (-) as permutation matrices x(+) and x(-); e.g. π = 3, 1, 4, 2 becomes the 0/1 matrix with ones at the entries corresponding to the edges 3→1, 1→4 and 4→2
  • Apply the update rule used in the PBIL algorithm using x(+) and x(-)

EDAs for the TSP – models

• The EH-PBIL model
• Model sampling
  • The model is the probability matrix:

 0.1 0.8 0   0.1
 0   0   0.1 0.9
 0.9 0   0   0.1
 0.1 0.1 0.8 0

  • Select the first city in the tour uniformly at random (e.g. j = 3)
  • Now, the j-th city is our starting point (set i = j)
  • We do not want to revisit cities! Set the probabilities in the j-th column to 0 (j = 3):

 0.1 0.8 0 0.1
 0   0   0 0.9
 0.9 0   0 0.1
 0.1 0.1 0 0

Tour: [ 3 ? ? ? ]

EDAs for the TSP – models

• The EH-PBIL model
• Model sampling
  • Select the next city in the tour with the probabilities given in the i-th row (i = 3):

 0.1 0.8 0 0.1
 0   0   0 0.9
 0.9 0   0 0.1
 0.1 0.1 0 0

  • Selected city j = 1
  • Now, the j-th city is our starting point (set i = j)
  • We do not want to revisit cities! Set the probabilities in the j-th column to 0 (j = 1):

 0 0.8 0 0.1
 0 0   0 0.9
 0 0   0 0.1
 0 0.1 0 0

Tour: [ 3 1 ? ? ]

EDAs for the TSP – models

• The EH-PBIL model
• Model sampling
  • Select the next city in the tour with the probabilities given in the i-th row (i = 1):

 0 0.8 0 0.1
 0 0   0 0.9
 0 0   0 0.1
 0 0.1 0 0

  • Selected city j = 2
  • Now, the j-th city is our starting point (set i = j)
  • We do not want to revisit cities! Set the probabilities in the j-th column to 0 (j = 2):

 0 0 0 0.1
 0 0 0 0.9
 0 0 0 0.1
 0 0 0 0

Tour: [ 3 1 2 ? ]

EDAs for the TSP – models

• The EH-PBIL model
• Model sampling
  • Select the next city in the tour with the probabilities given in the i-th row (i = 2):

 0 0 0 0.1
 0 0 0 0.9
 0 0 0 0.1
 0 0 0 0

  • Selected city j = 4
Tour: [ 3 1 2 4 ]
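The whole walkthrough condenses into a short sketch (illustrative; cities are 0-based here, while the slides number them from 1, and the fallback for an all-zero row is an added safeguard, not part of the slides):

```python
# A minimal sketch of sampling a tour from the EH-PBIL edge-probability
# matrix (illustrative).
import random

def sample_tour(P):
    n = len(P)
    P = [row[:] for row in P]                  # work on a copy of the matrix
    i = random.randrange(n)                    # first city, chosen uniformly
    tour = [i]
    for row in P:
        row[i] = 0.0                           # never return to the first city
    for _ in range(n - 1):
        total = sum(P[i])
        if total == 0.0:                       # all-zero row: pick any unvisited city
            j = random.choice([c for c in range(n) if c not in tour])
        else:                                  # roulette wheel over the i-th row
            r, acc = random.random() * total, 0.0
            for j, w in enumerate(P[i]):
                acc += w
                if r < acc:
                    break
        for row in P:
            row[j] = 0.0                       # zero the chosen city's column
        tour.append(j)
        i = j                                  # the chosen city is the new start
    return tour

print(sample_tour([[0.1, 0.8, 0.0, 0.1],
                   [0.0, 0.0, 0.1, 0.9],
                   [0.9, 0.0, 0.0, 0.1],
                   [0.1, 0.1, 0.8, 0.0]]))
```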
EDAs for the TSP – models

• The Mallows model
• Mallows models are used in EDAs for permutation-based optimization problems, like, for example, the TSP
• Exponential, unimodal distribution
• Analogous to the Gaussian distribution
• The probability distribution has two parameters:
  • A central permutation π0
  • A spread parameter θ
• Needs a definition of a distance between permutations D(σ, π)
EDAs for the TSP – models

• The Kendall's-τ distance D(σ, π)
• Measures the number of pairs of items for which σ and π have the opposing ordering
• Example:
  σ = 1 2 3 4 5
  π = 3 4 1 2 5
  D(σ, π) = 4

• Has properties which make model update and sampling simpler
• Right invariance
  • For any permutation ρ: D(σ, π) = D(σ∘ρ, π∘ρ)
  • Thus: D(σ, π) = D(σ∘π⁻¹, id), where id = 1 2 … n is the identity permutation
  • Instead of D(σ∘π⁻¹, id) we write D(ζ), where ζ = σ∘π⁻¹
• Decomposition:

D(\zeta) = \sum_{j=1}^{n-1} V_j(\zeta), \qquad V_j(\zeta) = \sum_{i=j+1}^{n} I[\zeta(j) > \zeta(i)]

• V_j(ζ) is the number of positions of the permutation ζ to the right of j with values smaller than ζ(j)
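A minimal sketch of the distance and its decomposition (illustrative; permutations are tuples of values, and ζ(j) is written zeta[j]):

```python
# A minimal sketch of the Kendall's-tau distance via the V_j decomposition
# (illustrative).
def V(zeta, j):
    """Positions to the right of j holding values smaller than zeta[j]."""
    return sum(1 for i in range(j + 1, len(zeta)) if zeta[j] > zeta[i])

def kendall_distance(zeta):
    return sum(V(zeta, j) for j in range(len(zeta) - 1))

# The slide's example: sigma = 1 2 3 4 5 and pi = 3 4 1 2 5 give
# zeta = sigma o pi^(-1) = (3, 4, 1, 2, 5), so D(sigma, pi) = D(zeta) = 4
print(kendall_distance((3, 4, 1, 2, 5)))   # 4
```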

EDAs for the TSP – models

• The Mallows model – model update (overview)
• Calculate the centroid π0 using the Borda algorithm
• For each permutation π in the mating pool:
  • Decompose D(π, π0) into the sum of n − 1 values V_j(π∘π0⁻¹)
• Average the V_j(π∘π0⁻¹) values over the mating pool
• Calculate the spread parameter θ and the normalization constant ψ(θ)
• The model is:

P(\sigma) = \frac{1}{\psi(\theta)} \, e^{-\theta \sum_{j=1}^{n-1} V_j(\sigma \pi_0^{-1})}

• Note the similarity to the Gaussian distribution:

f(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \, e^{-\frac{1}{2} \left( \frac{x - \mu}{\sigma} \right)^2}

• The normalization constants are ψ(θ) (Mallows) and √(2π)·σ (Gaussian); the exponent measures the distance from the centroid π0 (Mallows) and from the mean μ (Gaussian)
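For the centroid step, a minimal sketch of a Borda-style aggregation (illustrative; it orders the items by their average position in the mating pool, which is one common reading of the Borda algorithm and may differ in detail from the variant used in the lecture):

```python
# A minimal sketch of a Borda-count centroid for a pool of permutations
# (illustrative).
def borda_centroid(pool):
    # avg[v] = average position of value v across all permutations in the pool
    avg = {v: 0.0 for v in pool[0]}
    for perm in pool:
        for pos, v in enumerate(perm):
            avg[v] += pos / len(pool)
    # The centroid orders the values by increasing average position
    return sorted(avg, key=avg.get)

print(borda_centroid([(3, 1, 4, 2), (3, 4, 1, 2), (1, 3, 4, 2)]))  # [3, 1, 4, 2]
```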


EDAs for the TSP – models

• The Mallows model – sampling (overview)
• Draw the V_j(σ∘π0⁻¹) values randomly
• Reconstruct a permutation using the algorithm proposed by Meila et al. [1]
• Many more details can be found in the paper by Josu Ceberio [2]

[1] Meila M., Phadnis K., Patterson A., Bilmes J., "Consensus ranking under the exponential model", in: Proc. 22nd Conf. Uncertainty Artif. Intell., Vancouver, BC, USA, Jul. 2007, pp. 285–294.

[2] Ceberio J., Irurozki E., Mendiburu A., "A Distance-Based Ranking Model Estimation of Distribution Algorithm for the Flowshop Scheduling Problem", IEEE Transactions on Evolutionary Computation, vol. 18, no. 2, 2014.
