5/27/2010
GENE EXPRESSION
PROGRAMMING
Dr. Sundaram Suresh
School of Computer Engineering
Nanyang Technological University
g p
Singapore
Email: [email protected]
Textbook
2
Candida Ferreira, Gene Expression Programming:
Mathematical Modeling by an Artificial Intelligence,
Intelligence Angra
do Heroismo, Portugal. 2002
Weblink
http://www.gene-expression-programming.com/
www.gepsoft.com
GEP code can be found in
http://jgep.sourceforge.net/
http://www.gene-expression-programming.com/Downloads.asp
Workshop on bio-inspired Computing, VTU, Mysore, 7-10, June, 2010 2
1
5/27/2010
What is Gene Expression
3
Programming?
GEP is also an evolutionary based algorithm.
Gene expression programming is developed by
incorporating both the idea of simple, linear
chromosomes of fixed length used in GAs and the
ramified structures of different sizes and shapes
used in GP.
Genes - codes for a smaller program or sub-
expression tree.
The structure of chromosomes in GEP was
designed to allow the creation of multiple genes.
It is worth emphasizing that GEP is the only
genetic algorithm with multiple genes.
Workshop on bio-inspired Computing, VTU, Mysore, 7-10, June, 2010 3
GEP
4
GA – does not represent the I/O relationship
mathematically
a e a ca y
GP – complexity in genetic operators and
increase in tree length due to genetic operation.
GEP – is combination GA string representation
and GP mathematical expression
GEP uses genetic operators in GA to change the
t
tree length.
l th But,
B t here
h the
th length
l th off the
th string
ti
remains the same.
GEP is based on human genome…
Workshop on bio-inspired Computing, VTU, Mysore, 7-10, June, 2010 4
2
5/27/2010
GEP Representation
5
We want to represent the
arithmetic expression
Chromosome made of
genes
Function set arguments (n)
Gene – head and tails
Heads (h) are specified
for a given problem
T il are calculated
Tails l l db basedd
on number of heads and
t = h(n-1)+1
Workshop on bio-inspired Computing, VTU, Mysore, 7-10, June, 2010
GEP Representation
6
Arithmetic expression - Gene Equivalent – K-
E pression
Expression
0 1 2 3 4 5 6 7
Q * + - a b c d
Three Gene Representation
p
Workshop on bio-inspired Computing, VTU, Mysore, 7-10, June, 2010 6
3
5/27/2010
Tree Construction of Three Gene
7
Workshop on bio-inspired Computing, VTU, Mysore, 7-10, June, 2010 7
Biological process in Gene
8
The main operations that occur in a Genome are:
Genome
G Replications
R li i
Genome restructuring
Transcriptions
Translation and post-translation modifications
Workshop on bio-inspired Computing, VTU, Mysore, 7-10, June, 2010 8
4
5/27/2010
Replication
9
Replication of DNA
molecules.
The strands acts as a
template for a new,
complementary strand.
When copying is complete,
there will be two daughter
DNA molecules, each
identical in sequence to the
mother molecule.
Workshop on bio-inspired Computing, VTU, Mysore, 7-10, June, 2010
Genome Restructuring
10
This operation modify the gene structure.
Introduce g
genetic diversityy
Similar to GA and GP, in GEP also, populations of individuals
(computer programs) evolve by developing new abilities and
becoming better adapted to the environment due to the
genetic modifications accumulated over a certain number of
generations.
Mutation
Recombination
Transposition
Gene duplication
Workshop on bio-inspired Computing, VTU, Mysore, 7-10, June, 2010 10
5
5/27/2010
Mutation Operation
11
Basic mutation operations
are:
Substitution (s)
Deletion (d)
Insertion (i)
The base strand (mother) is
modified using these three
operations
ti tto generate
t
daughters
Workshop on bio-inspired Computing, VTU, Mysore, 7-10, June, 2010
Recombination
12
Different recombination operators:
Homologous
Site
Sit specific
ifi
Non-homologous
Transposition and genetic duplication
Transposition - genetic elements consist of genes that can move from place to
place within the genome.
Genetic duplication - a gene is copied twice during replication.
duplicated through the combined effects of gene transposition and recombination
Workshop on bio-inspired Computing, VTU, Mysore, 7-10, June, 2010 12
6
5/27/2010
Solution Representation
13
Like in GP, GEP the chromosomes (solutions) are represented
using function set and terminal set.
In GEP – chromosomes are represented using genes
Genes – heads and tails
Heads are coded using function and terminal set
Tails are coded using only terminal set.
Let F be function set, F = {*,+,-,Q}, where
Q – square root
Let T be terminal set, T = {a,b,c,d}
Workshop on bio-inspired Computing, VTU, Mysore, 7-10, June, 2010 13
Solution Representation
14
We want to represent the
arithmetic expression
Ch
Chromosome made
d off genes
Max. of arguments for the
elements in the Function set (n)
Gene – head and tails
Heads (h) are specified for a
given problem
Tails are calculated based on
number
b off heads
h d and d
t = h(n-1)+1
Arithmetic expression - Gene 0 1 2 3 4 5 6 7
Equivalent – K-Expression
Q * + - a b c d
Workshop on bio-inspired Computing, VTU, Mysore, 7-10, June, 2010
7
5/27/2010
Example
15
K – expression
Equivalent Tree is
Workshop on bio-inspired Computing, VTU, Mysore, 7-10, June, 2010
Representation…
16
The structural organization of GEP genes is better
understood in terms of open reading frames (ORFs).
In biology, an ORF, or coding sequence of a gene,
begins with the “start” codon, continues with the amino
acid codons, and ends at a termination codon.
However, a gene is more than the respective ORF, with
sequences upstream from the start codon and sequences
downstream from the stop codon.
Although in GEP the start site is always the first
position
iti off a gene, the
th termination
t i ti point i t does
d nott
always coincide with the last position of a gene.
It is common for GEP genes to have non-coding regions
downstream from the termination point.
Workshop on bio-inspired Computing, VTU, Mysore, 7-10, June, 2010 16
8
5/27/2010
Use of non-coding region
17
They are, in fact, the essence of GEP and
evolvability for they allow modification of the
evolvability,
genome using any genetic operator without
restrictions, always producing syntactically correct
programs without the need for a complicated
editing process or highly constrained ways of
implementing genetic operators.
Indeed, this is the paramount difference between
GEP and previous GP implementations, with or
without linear genomes
Workshop on bio-inspired Computing, VTU, Mysore, 7-10, June, 2010 17
Example with Head and Tail
18
Function set : {Q,+,-,*,/}
Terminal set : {a,b}
F ti arguments
Function t =2
Head = 10
Tail = 10(2-1)+1=11
The K- expression
The bold face represent
the tail.
Here ORF ends at 10,
where as the gene end at
20.
ORF is phenotype
representation
Workshop on bio-inspired Computing, VTU, Mysore, 7-10, June, 2010
9
5/27/2010
Use of Non-Coding Region
19
Workshop on bio-inspired Computing, VTU, Mysore, 7-10, June, 2010
Use of Non-Coding Region
20
Workshop on bio-inspired Computing, VTU, Mysore, 7-10, June, 2010
10
5/27/2010
Use of Non-Coding Region
21
Workshop on bio-inspired Computing, VTU, Mysore, 7-10, June, 2010
MultiGene Representation
22
We saw the representation for single gene
representation
representation.
Now, we discuss the multi-gene representation for
the chromosome
Number of genes can be greater than one in a
chromosome.
For all problem, number of genes and number of
heads are fixed prior.
Workshop on bio-inspired Computing, VTU, Mysore, 7-10, June, 2010 22
11
5/27/2010
Three Gene Representation
23
Three genes
n = 2;; h = 4
K-Expressions for
Gene1 :
Gene2 :
Gene3 :
Position ‘0’ is the start of the
gene and position ‘8’ is the end
of the gene.
The ORF ending of each gene
can be calculated after tree
construction.
Workshop on bio-inspired Computing, VTU, Mysore, 7-10, June, 2010
Three Gene Representation
24
Workshop on bio-inspired Computing, VTU, Mysore, 7-10, June, 2010 24
12
5/27/2010
Tree Construction
25
Workshop on bio-inspired Computing, VTU, Mysore, 7-10, June, 2010 25
Translation
26
The process of converting the K-expressions into tree (ET) and
reducing it to mathematical form is called Translation.
GEP chromosomes are composed of one or more ORFs, and
obviously the encoded individuals have different degrees of
complexity.
The simplest individuals are encoded in a single gene, and the
ìorganismî is, in this case, the product of a single gene - an ET.
In other cases, the organism is a multi-subunit ET, in which the
different sub-ETs are linked together by a particular function.
In other cases, the organism emerges from the spatial organization
of different sub-ETs (e.g., in planning and problems with multiple
outputs).
And in yet other cases,
And, cases the organism emerges from the interactions
of conventional sub-ETs with different domains (e.g., neural
networks). However, in all cases, the whole organism is encoded in a
linear genome.
Workshop on bio-inspired Computing, VTU, Mysore, 7-10, June, 2010 26
13
5/27/2010
Linking Function
27
In case of multi-gene representation, the genes are
linked together to form a mathematical eexpression.
pression
The link functions are defined apriori.
This function will not appear in the chromosome
representation.
Now, we show an example for better understanding.
Workshop on bio-inspired Computing, VTU, Mysore, 7-10, June, 2010 27
Example 1
28
Workshop on bio-inspired Computing, VTU, Mysore, 7-10, June, 2010 28
14
5/27/2010
Example with + Link
29
Workshop on bio-inspired Computing, VTU, Mysore, 7-10, June, 2010 29
Analysis
30
Can we represent the final tree with K-Expression?
Yes
Yes.
What is the equivalent K-expression?
Issues?
It is difficult to use the g
genetic operators
p to evolve because of less
number of tails
Multi-genes representation – faster convergence
Workshop on bio-inspired Computing, VTU, Mysore, 7-10, June, 2010 30
15
5/27/2010
Example 2 for Non-coding region in
31
multi-gene
Chromosome has two genes
Head = 3 and tail = 4
Operators: N – NOT, O – OR
Fig a) represent the K
expression
Fig b) the first operator is the
connecting operator
In gene1 – OOcacab – ‘the
last two character ‘ab’
b l
belongs to
t non-coding
di region i
Gene1
In gene2 – NNNbbcb – the
last three character ‘bcb’ Gene2
belongs to non-coding region
Workshop on bio-inspired Computing, VTU, Mysore, 7-10, June, 2010
Example 3
32
Workshop on bio-inspired Computing, VTU, Mysore, 7-10, June, 2010 32
16
5/27/2010
Example 4
33
Workshop on bio-inspired Computing, VTU, Mysore, 7-10, June, 2010 33
Points to Remember
34
The type of linking function, as well as the number of
genes and the length
g g of each gene,
g are a ppriori chosen
for each problem.
So, we can always start by using a single gene
chromosome, gradually increasing the length of the
head; if it becomes very large, we can increase the
number of genes and of course choose a function to link
them.
We can start with addition or OR,
OR but in other cases
another linking function might be more appropriate.
The idea, of course, is to find a good solution, and GEP
provides the means of finding one.
Workshop on bio-inspired Computing, VTU, Mysore, 7-10, June, 2010 34
17
5/27/2010
Mutation
35
Mutations can occur anywhere in the chromosome.
However the structural organization of
However,
chromosomes must remain intact.
In the heads any symbol can change into another
(function or terminal); in the tails terminals can only
change into terminals.
This way,
y, the structural organization
g of chromosomes
is maintained, and all the new individuals produced
by mutation are structurally correct programs.
Workshop on bio-inspired Computing, VTU, Mysore, 7-10, June, 2010 35
Mutation – Mother genome
36
K – Expression – Equation 3.5
Equivalent ET-tree
Workshop on bio-inspired Computing, VTU, Mysore, 7-10, June, 2010 36
18
5/27/2010
Daughter Genome
37
Workshop on bio-inspired Computing, VTU, Mysore, 7-10, June, 2010 37
Mutation…
38
Workshop on bio-inspired Computing, VTU, Mysore, 7-10, June, 2010 38
19
5/27/2010
Neutral Mutation
Mutation occur in non-coding
region is called neutral mutation.
This mutation does not affect the
ET of mother and daughter.
ORF ends at position 7 of the
head
Suppose, mutation occur at tail
position 9.
Change ‘a’a to ‘b’
b
The ‘phenotype’ of daughter
genome is same as mother.
Workshop on bio-inspired Computing,
39
VTU, Mysore, 7-10, June, 2010
Comments on mutation
40
If a function is mutated into a terminal or vice versa,
g
or a function of one argument is mutated into a
function of two arguments or vice versa, the ET is
modified drastically.
The change in tree size take place with-out
increasing the computational complexity.
It is worth noticing that in GEP there are no
constraints neither in the kind of mutation nor the
number of mutations in a chromosome: in all cases
the newly created individuals are syntactically
correct programs.
Workshop on bio-inspired Computing, VTU, Mysore, 7-10, June, 2010 40
20
5/27/2010
Comments on mutation…
41
In nature, a point mutation in the sequence of a gene can
slightly change the structure of the protein or not change it at
all, as neutral mutations are fairly frequent (e.g., mutations in
introns, mutations that result in the same amino acid due to the
redundancy of the genetic code, etc.).
In GEP also neutral mutations exist (e.g., mutations in the non-
coding regions).
A mutation in the coding sequence of a gene has a much more
profound effect: it usually drastically reshapes the ET.
Workshop on bio-inspired Computing, VTU, Mysore, 7-10, June, 2010 41
Transposition and insertion sequence elements
42
The transposable elements of GEP are fragments of the
genome that can be activated and jump to another place in
the
h chromosome.
h
A) Short fragments with a function or terminal in the first position
that transpose to the head of genes, except to the root (insertion
sequence elements or IS elements).
B) Short fragments with a function in the first position that
transpose to the root of genes (root IS elements or RIS elements).
C) Entire genes that transpose to the beginning of chromosomes.
This operator will be useful only when the chromosome has
more than one gene.
Workshop on bio-inspired Computing, VTU, Mysore, 7-10, June, 2010 42
21
5/27/2010
A) Transposition…
43
Any sequence in the genome might become an IS element,
therefore these elements are randomly selected throughout the
chromosome.
A copy of the transposition is made and inserted at any
position in the head of a gene, except at the start position.
Typically, an IS transposition rate (pis) of 0.1 and a set of three
IS elements of different length are used.
Th transposition operator randomly
The d l chooses
h the
h chromosome,
h
the start of the IS element, the target site, and the length of the
transposition.
Workshop on bio-inspired Computing, VTU, Mysore, 7-10, June, 2010 43
A) Transposition…
Suppose that the sequence “bba” in gene 2 (positions 12 through 14) was
chosen to be an IS element, and the target site was bond 6 in gene 1 (between
positions 5 and 6).
Then, a cut is made in bond 6 and the block “bba” is copied into the site of
insertion.
During transposition, the sequence upstream from the insertion site stays
unchanged, whereas the sequence downstream from the copied IS element
loses, at the end of the head, as many symbols as the length of the IS element
(in this case the sequence “a*b” was deleted).
--Mother
-- Daughter
Workshop on bio-inspired Computing,
44
VTU, Mysore, 7-10, June, 2010
22
5/27/2010
B) Root Transposition
45
All RIS elements start with a function, and thus are chosen
among the sequences of the heads. For that, a point is
randomly chosen in the head and the gene is scanned
downstream until a function is found.
This function becomes the start position of the RIS element.
If no functions are found, it does nothing.
Typically a root transposition rate (pris) of 0.1 and a set of
three RIS elements of different sizes are used.
This operator randomly chooses the chromosome, the gene to
be modified, the start of the RIS element, and its length.
Workshop on bio-inspired Computing, VTU, Mysore, 7-10, June, 2010 45
Root…
46
Workshop on bio-inspired Computing, VTU, Mysore, 7-10, June, 2010 46
23
5/27/2010
Root…
47
During root transposition, the whole head shifts to accommodate the RIS
element, losing, at the same time, the last symbols of the head (as many as
the transposon length).
As with IS elements, the tail of the gene subjected to transposition and all
nearby genes stay unchanged.
Note, again, that the newly created programs are syntactically correct
because the structural organization of the chromosome is maintained.
The modifications caused by root transposition are extremely radical,
because the root itself is modified.
In nature, if a transposable element is inserted at the beginning of the
coding sequence of a gene
gene, causing a frameshift mutation,
mutation it radically
changes the encoded protein.
Like mutation and IS transposition, root insertion has a tremendous
transforming power and is excellent for creating genetic variation.
Workshop on bio-inspired Computing, VTU, Mysore, 7-10, June, 2010 47
c) Gene Transposition
48
In gene transposition an entire gene functions as a transposon
and transposes itself to the beginning of the chromosome.
In contrast to the other forms of transposition, in gene
transposition the transposition (the gene) is deleted in the place
of origin.
This way, the length of the chromosome is maintained.
The chromosome to undergo gene transposition is randomly
chosen,
h and
d one off its genes (except
( the
h first,
f obviously)
b l ) is
randomly chosen to transpose.
Workshop on bio-inspired Computing, VTU, Mysore, 7-10, June, 2010 48
24
5/27/2010
Gene…
49
Workshop on bio-inspired Computing, VTU, Mysore, 7-10, June, 2010 49
Recombination operator
50
Three type of recombination operator
One-point
O i
Two-point
Gene
Two parents are randomly chosen and paired to
exchange the genetic material between them.
Workshop on bio-inspired Computing, VTU, Mysore, 7-10, June, 2010 50
25
5/27/2010
One-point
51
Workshop on bio-inspired Computing, VTU, Mysore, 7-10, June, 2010 51
Two-point
52
In two-point recombination the chromosomes are
paired and the two points of recombination are
randomly chosen.
The material between the recombination points is
afterwards exchanged between the two
chromosomes, forming two new daughter
chromosomes.
h
Workshop on bio-inspired Computing, VTU, Mysore, 7-10, June, 2010 52
26
5/27/2010
Two-point…
53
Workshop on bio-inspired Computing, VTU, Mysore, 7-10, June, 2010 53
Gene Recombination
54
Workshop on bio-inspired Computing, VTU, Mysore, 7-10, June, 2010 54
27
5/27/2010
Observation
55
Workshop on bio-inspired Computing, VTU, Mysore, 7-10, June, 2010 55
28