0% found this document useful (0 votes)
37 views16 pages

TAG3P: Preliminary Comparison: 5.1 Test Problems

The document describes a preliminary comparison of TAG3P, a genetic programming system designed by the author, against standard genetic programming (GP) and another grammar-guided genetic programming system (CFG-GP). Eight problems are used to test the three systems, including symbolic regression with polynomials of increasing complexity, 6-multiplexer, symbolic integration, and symbolic differentiation. The results of TAG3P are compared to GP and CFG-GP and further analysis is conducted on TAG3P runs to understand its evolutionary dynamics.

Uploaded by

Nam Le
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
37 views16 pages

TAG3P: Preliminary Comparison: 5.1 Test Problems

The document describes a preliminary comparison of TAG3P, a genetic programming system designed by the author, against standard genetic programming (GP) and another grammar-guided genetic programming system (CFG-GP). Eight problems are used to test the three systems, including symbolic regression with polynomials of increasing complexity, 6-multiplexer, symbolic integration, and symbolic differentiation. The results of TAG3P are compared to GP and CFG-GP and further analysis is conducted on TAG3P runs to understand its evolutionary dynamics.

Uploaded by

Nam Le
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd

Chapter 5

TAG3P: Preliminary Comparison


In this chapter, the robustness of the genetic programming system (TAG3P)
designed in the previous chapter is tested on a number of standard problems. The
results are compared with standard genetic programming (GP) and Whighams
grammar guided genetic programming system (CFG-GP). Some of the results
have been published in [NXH2001a], [NXH2001b], [NXH2002a], [NXH2002b],
[NXH2002c], and [NXH2002d]. This chapter is structured as follows. Firstly,
all the problems are stated. Then, the experiment setup is described. Next,
the results are given and discussed. Finally, using some of the rather unique
properties of TAG3P described in the previous chapter, we conduct some further
analyses on TAG3P runs to discover useful information about the dynamics of
its evolutionary process.
5.1 Test Problems
To test the performance of TAG3P compared to standard GP and CFG-GP, eight
problem instances of four standard problems from [Koz92] are employed as test
suite. The four problems are simple symbolic regression, 6-multiplexer, symbolic
integration, and symbolic dierentiation. The description of each problem and
problem instance is briey stated in the following subsections. It is noted that a
full comparison would compare the performance of the systems over a range of
79
CHAPTER 5. TAG3P: PRELIMINARY COMPARISONS 80
parameter settings, but that because of computational limitations, only a sample
of values is used in the thesis.
5.1.1 Simple Symbolic Regression Problem
In the simple symbolic regression problem, the task is to learn a function of one in-
dependent variable in symbolic form such that it ts a given nite sample of data.
As in [Koz92], the function and terminal set are F = {+, , , /, sin, cos, ep, rlog}
and T = {X}. The function set can be divided into three groups. The rst group
contains arithmetic operators: addition, subtraction, multiplication, and division.
The division operator is protected in the sense that if the denominator is 0, it
will return 1. The second group has two basic trigonometric functions sine and
cosine. The third group has the two transcendent functions, namely, the expo-
nential function and logarithm function. The logarithm function is protected by
taking logarithm of the absolute value of the input unless the input value is 0, in
which it returns 0. The number of sampled data points is 20 and the sampling
interval is [1..1].
In [Koz92], the target function for genetic programming is the quadtic poly-
nomial function X
4
+ X
3
+ X
2
+ X. In this chapter, a family of four target
functions in four problem instances are used.
F1 = X
3
+X
2
+X
F2 = X
4
+X
3
+X
2
+X
F3 = X
5
+X
4
+X
3
+X
2
+X
F4 = X
6
+X
5
+X
4
+X
3
+X
2
+X
It is noted that F1 F4 are a family of polynomial functions of increasing order
of structural complexity, where F
i
= F
i1
X + X with i = 2, 3, 4. The purpose
of using theses four polynomial target functions of increasing order of structural
complexity is not only to test the robustness of TAG3P against standard GP
and CFG-GP but also to investigate how it copes with a scaling in structural
complexity of the target function.
CHAPTER 5. TAG3P: PRELIMINARY COMPARISONS 81
5.1.2 6-Multiplexer Problem
A 6-multiplexer is a Boolean device that has six inputs consisting of two address
lines and four data lines. To produce the output, a 6-multiplexer uses the two
address lines as a two binary digit number in order to select one of the four
possible data lines. The task is to learn the Boolean 6-multiplexer function from
all 64 possible examples. The function and terminal sets used here are the same
as in [Koz92]: F = {if, and, or, not} and T = {a
0
, a
1
, d
0
, d
1
, d
2
, d
3
}. Figure 5.1
shows the structure of a 6-multiplexer and one possible solution represented as a
series of if-then-else.
Address Lines
Data Lines
d3 d2 d1 d0
a1
a0
6Multiplexer
ELSE d0
IF a1 THEN d2
ELSE
ELSE d1
IF a1 THEN d3
IF a0 THEN
Figure 5.1: A 6-multiplexer and a solution
5.1.3 Symbolic Integration Problem
The problem of symbolic integration involves nding a mathematical expression
that is the integral, in symbolic form, of a given curve. The function and terminal
sets used in [Koz92] are the same as those for the simple symbolic regression
problem and are used here.
The two problem instances experimented in this chapter are cos x+2x+1 and
4x
3
+3x
2
+2x+1, of which the target symbolic integral functions are sin x+x
2
+x
and x
4
+x
3
+x
2
+x respectively. The number of sampled data points are 50 and
the sampling interval is [0..2] in the rst case and [0..1] in the second.
CHAPTER 5. TAG3P: PRELIMINARY COMPARISONS 82
5.1.4 Symbolic Dierentiation Problem
Symbolic dierentiation involves nding a mathematical expression that is the
derivative, in symbolic form, of a given curve. The function and terminal sets are
the same as in the simple regression and symbolic integration problems ([Koz92]).
The given curve is sin x + x
2
+ x, of which the target derivative function is
cos x +2x +1. The number of randomly sampled data points is 200 (as numeric
dierentiation is considered to be much less accurate than symbolic integration);
and the sampling interval is [0..1].
5.2 Experiment Setup
All three systems: standard genetic programming, CFG-GP, and TAG3P were
tried on eight problem instances of the four standard problems above. The GP
and CFG-GP systems are implemented faithfully to the descriptions in [Koz92]
and [Whi1996]. All three systems used the same random generators and were run
on the same platform. Parameter settings for all three systems were as uniform
as possible. They are listed as follows: population size - POPSIZE=500, maxi-
mal number of generations - MAXGEN=51, crossover probability=0.9, mutation
probability = 0.1, selection mechanism = tournament selection, tournament size
of selection = 3, maximal size for TAG3P - MAXSIZE=40; the size in TAG3P is
the number of tree nodes; maximal depth for GP and CFG-GP - MAXDEPTH
= 15. The initialisation method used for GP was ramped-half-and-half. The
grammars (used for CFG-GP and TAG3P) for all problems are listed as follows:
Grammars for the symbolic regression, symbolic integration, and sym-
bolic dierentiation problems. The grammars (G and G
lex
) used for all three
problems were the same.
G = {

, N, P, EXP},
where

= {+, , , /, sin, cos, ep, rlog, X}, N = {S, PRE, OP, V AR}, and the
rule set P as follows.
S EXP OP EXP
CHAPTER 5. TAG3P: PRELIMINARY COMPARISONS 83
S PRE EXP
S V AR
OP +| | |/
PRE sin|cos|ep|rlog
V AR X
where / is the protected division (return 1 when the denominator is 0), ep
is the exponential function, and rlog is the protected logarithm function (return
the logarithm of the absolute value of the input unless the input is 0, in which
case it return 0). G
lex
= {

, N, I, A, EXP}, where

and N are the same as in


G and the elementary tree set E = A I is depicted in Figure 5.2.
EXP
OP

X
EXP
PRE EXP*
sin
EXP
EXP
PRE EXP*
ep rlog
1 2 3 4
6 5 7 8
9 10 11 12
EXP*
OP
*
EXP
OP
/
EXP
OP
+
EXP* EXP* EXP* EXP*
EXP
PRE EXP*
cos
EXP
PRE
VAR
EXP
VAR
X
EXP
VAR
X
EXP
VAR
X
EXP
VAR
X
EXP
VAR
X
EXP
EXP* OP
+
EXP
VAR
X
EXP
VAR
X
EXP
VAR
X
EXP
/
EXP
EXP* OP

EXP
EXP* OP
*
EXP
EXP* OP
Figure 5.2: TAG elementary trees for symbolic regression, symbolic integration,
and symbolic dierentiation problems
Grammars for the 6-multiplexer problem. G = {

, N, P, B}, where

= {a
0
, a
1
, d
0
, d
1
, d
2
, d
3
}, N = {B}, and the rule set P is as follows:
B if B B B
B B and B
B B or B
B not B
B a
0
|a
1
|d
0
|d
1
|d
2
|d
3
G
lex
= {

, N, I, A, B}, where

and N are the same as in G, the elementary


CHAPTER 5. TAG3P: PRELIMINARY COMPARISONS 84
tree set E = A I is depicted in Figure 5.3, where TL stands for a lexicon that
can be substituted with any of a
0
, a
1
,...,d
3
.
B
and
B
B*
TL
B if
B
B*
TL
B if
B
and
8 7 6 5 4
3 2 1
not B*
B
B* or or B* B*
B
TL
B B*
B
TL
B
TL
B
TL
B
TL
B B* if
B
TL
B
TL
B
TL
B
TL
B
Figure 5.3: TAG elementary trees for the 6-multiplexer problem
The tableaus for symbolic regression, symbolic integration, symbolic dieren-
tiation, and 6-multiplexer are as follows.
Table 5.1. Tableau for symbolic regression, integration, and dierentiation.
Objective Find a solution for either symbolic regression,
symbolic integration, or symbolic dierentiation.
Terminal set X
Function set +,-,*,/,sin, cos, ep, rlog.
Fitness cases Random samples of 20 values (symbolic regression);
50 values (symbolic dierentiation);
200 values (for symbolic integration)
from the intervals of interest.
Raw tness Sum of absolute error for all tness cases.
Standardized tness The same as raw tness.
Hits Number of absolute errors that is smaller then 0.01.
General Parameter POPSIZE=500, MAXGEN=51,
Tournament selection with size 3.
Success Predicate A program hits of 20 (symbolic regression);
50 (symbolic integration); 200 (symbolic dierentiation)
CHAPTER 5. TAG3P: PRELIMINARY COMPARISONS 85
Table 5.2. Tableau for 6-multiplexer.
Objective Find a Boolean function whose output is the same as
the 6-multiplexer function.
Terminal set a
0
, a
1
, d
0
, d
1
, d
2
, d
3
Function set if, and, or, not.
Fitness cases The 64 possible combinations of 6 the inputs;
Raw tness Number of tness cases for which the output is correct
Standardized tness Number of incorrect outputs over all tness cases.
Hits Equivalent to raw tness.
General Parameters POPSIZE=500, MAXGEN=51,
Tournament selection with size 3.
Success Predicate A program scores 64 hits.
5.3 Results and Discussion
For each problem instance, 100 runs were allocated for each system, which made
the total number of runs for all three systems (GP, CFG-GP, TAG3P) to be 2400
runs. The following Table 5.3 summarises the proportion of success of all three
systems on the eight problem instances tried, where SY MF
i
(with i=1,2,3,4)
stands for symbolic regression problem with F
i
as the target function; 6MUL
stands for 6-multiplexer problem; SY MDIFF indicates the symbolic integra-
tion problem; SY MITEGCOSX and SY MITEGX3 are the abbreviations of
symbolic integration problem with the curves cos x+2x+1 and 4x
3
+3x
2
+2x+1
respectively.
CHAPTER 5. TAG3P: PRELIMINARY COMPARISONS 86
Table 5.3. Proportion of success on eight problem instances.
Problem GP CFG-GP TAG3P
SY MF
1
66% 47% 100%
SY MF
2
9% 27% 93%
SY MF
3
3% 12% 82%
SY MF
4
1% 8% 43%
6MUL 63% 61% 63%
SY MITEGCOSX 81% 83% 100%
SY MITEGX3 50% 63% 99%
SY MDIFF 70% 77% 87%
Also, Figures 5.4, 5.5, 5.6, and 5.7 depict the cumulative frequencies of success
of GP, CFG-GP (GGGP), and TAG3P on all the problem instances.
The results show that TAG3P is very competitive with GP and CFG-GP. On all
problem instances tried, TAG3P performed signicantly better than CFG-GP and
GP (with statistical condence level = 0.01 using a one-tailed statistical test of
the dierence between two binomial variables), except for 6-multiplexer, where
all the performances of the three systems are rather similar. In particular, on
the symbolic regression problem, TAG3P not only performed substantially better
than GP and CFG-GP but also worked well when the structural complexity of
the target function was scaled up.
We note that, apart from the fact that these three systems use very dierent
representation and genetic operators, they have dierent types of complexity
bounds on individual programs in their population. For GP, the bound is the
depth of the expression tree; for CFG-GP, the bound is the depth of the derivation
trees generated by G; and, for TAG3P, the bound is the size of G
lex
derivation
trees. This usually leads to dierent absolute types and sizes of bounds and thus
it is very dicult to tune the dierent bounds so that the complexity bounds on
individual programs in all systems are the same. Therefore, it is possible that
the performance dierences between the dierent GP systems might be caused
by the dierent representations (operators) and/or the types and values of the
CHAPTER 5. TAG3P: PRELIMINARY COMPARISONS 87
0 10 20 30 40 50 60
0
10
20
30
40
50
60
70
80
90
100
Generation
C
u
m
u
la
t
iv
e

F
r
e
q
u
e
n
c
y
GP
GGGP
TAG3P
0 10 20 30 40 50 60
0
10
20
30
40
50
60
70
80
90
100
Generation
C
u
m
u
la
t
iv
e

F
r
e
q
u
e
n
c
y
GP
GGGP
TAG3P
F
1
F
2
0 10 20 30 40 50 60
0
10
20
30
40
50
60
70
80
90
Generation
C
u
m
u
la
t
iv
e

F
r
e
q
u
e
n
c
y
GP
GGGP
TAG3P
0 10 20 30 40 50 60
0
5
10
15
20
25
30
35
40
45
Generation
C
u
m
u
la
t
iv
e

F
r
e
q
u
e
n
c
y
GP
GGGP
TAG3P
F
3
F
4
Figure 5.4: Cumulative Frequencies for Symbolic Regression Problem
CHAPTER 5. TAG3P: PRELIMINARY COMPARISONS 88
0 10 20 30 40 50 60
0
10
20
30
40
50
60
70
Generation
C
u
m
u
l
a
t
i
v
e

F
r
e
q
u
e
n
c
y
GP
GGGP
TAG3P
Figure 5.5: Cumulative Frequencies for 6-Multiplexer Problem
0 10 20 30 40 50 60
0
10
20
30
40
50
60
70
80
90
100
Generation
C
u
m
u
la
t
iv
e

F
r
e
q
u
e
n
c
y
GP
GGGP
TAG3P
0 10 20 30 40 50 60
0
10
20
30
40
50
60
70
80
90
100
Generation
C
u
m
u
la
t
iv
e

F
r
e
q
u
e
n
c
y
GP
GGGP
TAG3P
Cos x + 2x + 1 4x
3
+ 3x
2
+ 2x + 1
Figure 5.6: Cumulative Frequencies for Symbolic Integration Problem
CHAPTER 5. TAG3P: PRELIMINARY COMPARISONS 89
0 10 20 30 40 50 60
0
10
20
30
40
50
60
70
80
90
Generation
C
u
m
u
l
a
t
i
v
e

F
r
e
q
u
e
n
c
y
GP
GGGP
TAG3P
Figure 5.7: Cumulative Frequencies for Symbolic Dierentiation Problem
complexity bounds of their individual programs. This problem is investigated in
more detail in chapter 10.
In the following section, some further analyses of the TAG3P runs on two of
the eight problem instances above are given and discussed.
5.4 Some further analyses on TAG3P
In this section, some further analyses of the runs in the previous section are pre-
sented. Some properties of TAG3P inherited from the TAG-based representation
mentioned in the previous chapter are also used to give more information on the
behaviour of TAG3P during the evolutionary process. The runs of two problem
instances used are 6-multiplexer and symbolic regression with function F
2
.
Figures 5.8 and 5.9 depict the time series over generation of the average t-
ness of the population and the average tness of the best individual for the two
problems. These gures show that the TAG3P population converges to better
and better tness over time.
As mentioned in the previous chapter, thanks to the non-xed arity property
CHAPTER 5. TAG3P: PRELIMINARY COMPARISONS 90
0 10 20 30 40 50 60
1
0
1
2
3
4
5
Generation
L
o
g

A
v
e
r
a
g
e

F
it
n
e
s
s
0 10 20 30 40 50 60
5
10
15
20
25
30
Generation
A
v
e
r
a
g
e

F
it
n
e
s
s
Symbolic regression 6-Multiplexer
Figure 5.8: Average Fitness of the population
0 10 20 30 40 50 60
0
0.5
1
1.5
2
2.5
3
Generation
B
e
s
t

F
it
n
e
s
s
0 10 20 30 40 50 60
2
4
6
8
10
12
14
16
Generation
B
e
s
t

F
it
n
e
s
s
Symbolic Regression 6-Multiplexer
Figure 5.9: Average tness of the best in the population
CHAPTER 5. TAG3P: PRELIMINARY COMPARISONS 91
in the TAG-based representation, it is also possible to measure how each subcode
(subtree) contributes to the overall tness of the program containing it. There-
fore, it is possible to conduct a study on the subcode level in order to gain more
understanding of the dynamics of TAG3P during the evolutionary process.
The rst study is to investigate the complexity ratio between active subcode
and the whole individual. A subcode is called inactive if, when it is removed,
the overall tness of the individual containing it does not change; otherwise, it is
called an active subcode. Figures 5.10 and 5.11 show, on the two problems, the
evolution of program size and active subcode size in the population, averaging
over both the whole population and also over just the best individuals.
From the Figures 5.10 and 5.11, there is a clear trend for programs in TAG3P
to bloat to the maximal allowed size. The size of active (tness contributing)
subcodes evolves on a dierent trend. For the rst half of the evolutionary pro-
cess, the size of active subcodes increases (both for the best individual and for
the population). Presumably, this is because most of the solutions are found in
the rst half of the evolutionary process, the active size of the subcode needs to
increase to accumulate useful subcodes, which are partial solutions. Indeed, by
examining some typical successful runs, we found that the most frequent partial
subcodes appearing in the best subcodes detected by the methods described in
the previous chapter, during this period of evolution, were X+(X) or X+(X)
for the symbolic regression problem, and if(a0, d1, ) for the 6-multiplexer prob-
lem. These are all components of full solutions. For the second period, after
the solutions were found, there is a pressure toward solutions with smaller and
smaller sizes of active subcodes. This can be explained by the schema theorem
in chapter 7, where the active subcode belongs to some useful schemata, and in
order to survive they need to be guarded by more and more inactive subcode,
so that the ratio o(H)/n(H) (o(H) is the size of the schema and n(H) is the
size of the individual that matches H) increases. This is some what similar to
conclusion on bloat in [Ban1998]. It is noted that the size ratio of active subcodes
CHAPTER 5. TAG3P: PRELIMINARY COMPARISONS 92
0 10 20 30 40 50 60
5
10
15
20
25
30
35
Generation
A
v
e
r
a
g
e

S
iz
e
0 10 20 30 40 50 60
4
6
8
10
12
14
16
18
20
22
Generation
A
v
e
r
a
g
e

A
c
t
iv
e

S
iz
e
Average size for Average sizes of active
the whole population subcodes in the whole population
0 10 20 30 40 50 60
5
10
15
20
25
30
35
Generation
A
v
e
r
a
g
e

B
e
s
t

S
iz
e
0 10 20 30 40 50 60
4
6
8
10
12
14
16
18
Generation
A
v
e
r
a
g
e

A
c
t
iv
e

B
e
s
t

S
iz
e
Average size of Average sizes of active
the best individual subcodes in the best individual
Figure 5.10: Evolution of size in the runs for the symbolic regression problem
CHAPTER 5. TAG3P: PRELIMINARY COMPARISONS 93
0 10 20 30 40 50 60
5
10
15
20
25
30
35
Generation
A
v
e
r
a
g
e

S
iz
e
0 10 20 30 40 50 60
3
3.5
4
4.5
5
5.5
6
6.5
7
Generation
A
v
e
r
a
g
e

A
c
t
iv
e

S
iz
e
Average size for Average sizes of active
the whole population subcodes in the whole population
0 10 20 30 40 50 60
5
10
15
20
25
30
35
Generation
A
v
e
r
a
g
e

B
e
s
t

S
iz
e
0 10 20 30 40 50 60
3
3.5
4
4.5
5
5.5
6
6.5
7
7.5
Generation
A
v
e
r
a
g
e

B
e
s
t

A
c
t
iv
e

S
iz
e
Average size of Average sizes of active
the best individual subcodes in the best individual
Figure 5.11: Evolution of size in the runs for the 6-Multiplexer problem
CHAPTER 5. TAG3P: PRELIMINARY COMPARISONS 94
versus the whole individual is much smaller in the 6-multiplexer problem than
in symbolic regression. The reason for this is that Boolean domains have more
redundant structures than numerical domains. The average complexity ratio for
the solution when rst found is 0.947% in the symbolic regression problem, and
0.423% in the 6-multiplexer problem.
5.5 Conclusion
In this chapter, the robustness of TAG3P was tested against a number of standard
problem instances, where results were compared with standard GP and CFG-
GP. It has been shown that for the problem instances in this chapter, TAG3P is
very competitive. It outperformed GP and CFG-GP for almost all the problem
instances tried.
The ability to usefully work with subcode tness described in the previous
chapter also helped to extract useful information from the runs in order to gain
insight into the dynamics of the TAG3P evolutionary process.

You might also like