tion primarily on the immediate area of the maze where Pac-Man is currently located. Planning a long sequence of moves in advance would not only be very difficult for a human, but also ineffective, since the dynamics of the game would be likely to make such a strategy poor and irrelevant. Our initial approach for an artificial Pac-Man agent is therefore to view play as a small number of possible turn types, together with basic global information such as the position and movement of the ghost. Figure 2 categorizes the entire maze in terms of turn types, including straight corridors. Note that the representation used below considers turn types from the current orientation of Pac-Man, so that the "true" orientation of a turn type does not need to be directly considered (e.g., vertical versus horizontal corridors).

Figure 2: The first Pac-Man maze in terms of turn types. Some types are more common than others.

Agent() {
    while (game is in play)
        currentdistance = Distance(pacman, ghost)
        if currentdistance > p1 then
            Explore()
        else
            Retreat()
        end
}

Explore() {
    switch (turntype) {
        case "corridor":      newdir = Random(P, prevdir, turntype)
        case "L-turn":        newdir = Random(P, prevdir, turntype)
        case "T-turn":        newdir = Random(P, prevdir, turntype, orientation)
        case "Intersection":  newdir = Random(P, prevdir, turntype)
    }
}

Retreat() {
    switch (turntype) {
        case "corridor":      newdir = Random(P, prevdir, turntype, ghostpos)
        case "L-turn":        newdir = Random(P, prevdir, turntype, ghostpos)
        case "T-turn":        newdir = Random(P, prevdir, turntype, orientation, ghostpos)
        case "Intersection":  newdir = Random(P, prevdir, turntype, ghostpos)
    }
}

Figure 3: Pseudo-code for the Pac-Man agent strategy.
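To make the control flow of Figure 3 concrete, the following is a minimal runnable Python sketch of the same idea. It is an illustration only, not the authors' implementation: the structure assumed for the parameter lookup P and the helper names are assumptions, and Distance() is taken to be the Manhattan distance mentioned in Section 4.2.

import random

def manhattan(a, b):
    """Manhattan distance between two (x, y) grid positions."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def choose_direction(P, turn_type, feasible_dirs, ghost_dir=None):
    """Sample Pac-Man's next move from the probabilities stored for this situation.

    P is assumed to map (turn_type, ghost_dir) to {relative direction: probability};
    ghost_dir is None in the Explore state, so exploration ignores the ghost.
    (The extra 'orientation' argument used for T-turns in Figure 3 is omitted here.)
    """
    weights = [P[(turn_type, ghost_dir)][d] for d in feasible_dirs]
    return random.choices(feasible_dirs, weights=weights, k=1)[0]

def agent_step(P, p1, pacman_pos, ghost_pos, turn_type, feasible_dirs, ghost_dir):
    """One game tick: Explore while the ghost is farther than p1, otherwise Retreat."""
    if manhattan(pacman_pos, ghost_pos) > p1:
        return choose_direction(P, turn_type, feasible_dirs)          # Explore state
    return choose_direction(P, turn_type, feasible_dirs, ghost_dir)   # Retreat state

Here feasible_dirs would be the directions not blocked by walls at the current position, expressed relative to Pac-Man's heading as described in the next paragraph.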
At each time step (tick) of the game, the agent is located in an instance of a certain turn type, and needs to produce an output that becomes the current direction of movement for Pac-Man. Depending on the turn type of the current position (corridor, L-turn, T-junction, intersection), there are a number of different feasible directions to move, including maintaining the current direction (but excluding running directly into walls). In the Explore state, a probabilistic decision is made to choose one of the feasible directions depending on the turn type. As a result, Pac-Man is able to explore the maze in a pseudo-random fashion. Exploration does not use any other game information to determine the agent's output. In the Retreat state, the agent additionally considers the location of the ghost to determine the output. This results in the need to consider a number of possible turn-type and ghost-position combinations. We classified the ghost's position as being either forward, back, left, right, forward-left, forward-right, back-left, or back-right relative to Pac-Man. Note that the first four classifications are determined using only one coordinate dimension. For example, if Pac-Man is currently facing north (up) on the screen, the ghost would be classified as being "back" if its current y-coordinate is less than Pac-Man's current y-coordinate. Adding the final four classifications is one way of refining this information. For the same example (Pac-Man facing north), the ghost's location will be classified as follows:
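As a rough illustrative sketch of this classification (one plausible reading, not the paper's own listing), assuming grid coordinates with y increasing towards the top of the screen and a hypothetical helper name:

def classify_ghost(pacman, ghost, heading):
    """Classify the ghost's position relative to Pac-Man's current heading.

    Returns "forward", "back", "left", "right", or a diagonal refinement such
    as "back-left". Positions are (x, y) grid coordinates, y increasing upwards.
    """
    dx, dy = ghost[0] - pacman[0], ghost[1] - pacman[1]
    # Rotate the offset into Pac-Man's frame: 'ahead' grows in the facing
    # direction, 'side' grows towards Pac-Man's right-hand side.
    if heading == "north":
        ahead, side = dy, dx
    elif heading == "south":
        ahead, side = -dy, -dx
    elif heading == "east":
        ahead, side = dx, -dy
    else:  # "west"
        ahead, side = -dx, dy

    fwd_back = "forward" if ahead > 0 else "back" if ahead < 0 else ""
    left_right = "right" if side > 0 else "left" if side < 0 else ""
    if fwd_back and left_right:
        return fwd_back + "-" + left_right   # diagonal refinement
    return fwd_back or left_right            # ghost lies on one axis only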
4 Simulation Methodology

4.1 Fitness Function

For our simplified Pac-Man game, a fitness function was developed based primarily on the score obtained by Pac-Man. The only way to score points in this simplified game is to eat dots from the maze, and since there is a fixed initial number of dots in a maze, the maximum possible score for each maze is known a priori. It was considered that the time Pac-Man manages to survive should also be a factor in the fitness function. Note, however, that there also exists a possibility that Pac-Man may successfully avoid the ghost indefinitely but fail to clear the maze of dots. Because of this, a term was added to reward the time survived by Pac-Man, while imposing a pre-chosen limit on this factor. The fitness function used is:
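As a hedged reconstruction from the surrounding description (not quoted from the paper), the fitness would take the general form below, where s is the game score, s_max the known maximum score for the maze, t the time survived, and t_max the pre-chosen survival-time limit; the symbols are ours:

    f = s / s_max + min(t, t_max) / t_max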
The score and time factors in the fitness function are normalized by their maximum (respectively, known and chosen) values for each level. Hence, the maximum fitness for each level is 2.

The movement of the ghost in the game has a small amount of randomness, and the agent developed above clearly produces stochastic movement for Pac-Man. As a consequence, the fitness value of each game given a fixed P will also be stochastic. In such a situation it is typical to conduct repeated evaluations of the fitness function with a given parameter set, and produce an average fitness value to be used by the evolutionary algorithm. In the simulations below, 10 games were played per average fitness evaluation. Recall that Pac-Man has three lives in a game, meaning that Pac-Man "runs" in the maze 30 times for each fitness evaluation used in the EA.
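A sketch of this averaging step (illustrative only; play_game stands in for a full game simulation and is not defined in the paper):

def average_fitness(P, play_game, n_games=10):
    """Estimate the fitness of parameter vector P by averaging noisy evaluations.

    play_game(P) is assumed to run one full game (three lives) and return that
    game's fitness; repeating smooths out the randomness in both the ghost's
    movement and the agent's own stochastic choices.
    """
    return sum(play_game(P) for _ in range(n_games)) / n_games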
4.2 Algorithm

The algorithm used to learn the parameters of an agent is an implementation of population-based incremental learning (PBIL) for continuous search spaces [1, 10, 11]. PBIL replaces the conventional genetic operators of mutation and recombination with a probability vector, which is used to generate each population and is updated via a learning rule based on the single best individual in the current population. Each component of the probability vector represents the mean of a Gaussian distribution. In our experiments the population size was set to 25 (computational time prevented a larger value). For the standard deviations of the Gaussians and the PBIL learning rate parameter α, several different values were tested. Note that for α = 1, this algorithm is equivalent to a simple (1,λ) evolution strategy with a constant standard deviation value for all variables [4].

The probability vector was initialized with p1 = 15; this value of p1 was chosen such that the agent initially spent roughly equal amounts of time in the Explore and Retreat states. p1 was allowed to evolve as a continuous value, although the Manhattan distance between Pac-Man and the ghost is integer-valued. The remaining parameters represented probabilities for subsets of variables, so they were initialized with uniform probability for each feasible output direction of the agent (e.g. p2-3 = 0.5, p15-18 = 0.25). These parameters were re-normalized after being updated at each generation by the PBIL learning rule. Note that PBIL evolves a probabilistic model of the search space which converges towards a locally or globally optimal value. In this paper we interpret the probability vector learnt by the algorithm as the realization of our evolved "agent", and a population-based search is used to perform this task.
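The following is a minimal sketch of this kind of continuous PBIL loop, using the averaged fitness above. It illustrates the general scheme rather than the authors' code; the generation count and the default learning rate and standard deviation are placeholders.

import random

def pbil_continuous(fitness, dim, generations=100, pop_size=25,
                    alpha=0.5, sigma=0.2, init_mean=None):
    """Continuous PBIL: evolve a vector of Gaussian means towards good solutions.

    Each generation, a population is sampled around the current mean vector and
    the means are moved towards the single best individual by the learning rate
    alpha. fitness(x) should return the (averaged) fitness of a candidate x.
    """
    mean = list(init_mean) if init_mean is not None else [0.5] * dim
    for _ in range(generations):
        population = [[random.gauss(m, sigma) for m in mean] for _ in range(pop_size)]
        best = max(population, key=fitness)
        # Learning rule: shift each mean towards the best individual.
        mean = [(1 - alpha) * m + alpha * b for m, b in zip(mean, best)]
    return mean

With alpha = 1 the mean vector simply jumps to the best individual of each generation, which is the (1,25)-ES equivalence noted above; the re-normalization of the probability-type parameters after each update is omitted from this sketch.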
5 Results

5.1 Hand-coded Agent Parameters

A feature of the agent implementation described above is that it is transparent: each parameter has a clear function in the control of the output of the agent. It is therefore possible to experiment with an agent of hand-coded parameters. The rule sets allow for basic explore and retreat behaviour only, so the hand-coded parameters were chosen to try and produce an agent with the ability to explore the maze widely in the "Explore" state, and to retreat from the ghost to avoid being eaten in the "Retreat" state. Results for the different hand-coded parameter vectors tested are shown in Table 2.

Vector   Mean    Std. Dev.   Min.    Max.
Ph1      0.215   0.080       0.104   0.511
Ph2      0.613   0.316       0.187   1.699
Ph3      1.372   0.522       0.276   2.062

Table 2: Results for hand-coded parameter vectors tested, summarizing fitness values from 50 games with each parameter vector.

Consider firstly an agent, Ph1, with equal probabilities assigned to each parameter in each relevant subset of values. The behaviour of this agent was observed to be highly erratic, because equal probabilities mean that Pac-Man is just as likely to reverse his direction at any point in time as he is to choose any other feasible direction. As a result, the movement of Pac-Man is dominated by oscillations (e.g., forwards-backwards in a corridor) around the current point. This is reflected in the low fitness values observed for agent Ph1 in Table 2.

For the next parameter vector tested (Ph2), in the Explore state a lower probability was assigned to reversing the current direction, with probabilities for all other feasible directions assigned uniformly (e.g. for a corridor to move forwards and backwards respectively, p2 = 0.8, p3 = 0.2; for an intersection to move forwards, backwards, left and right
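To connect this with the lookup structure sketched earlier for P, hand-coded Explore-state entries of this kind could be written out roughly as follows; only the quoted values are from the text, the rest is illustrative.

# Ph1: equal probability for every feasible direction in every situation,
# e.g. in a corridor Pac-Man reverses as often as it continues forward.
Ph1_corridor     = {"forward": 0.5, "back": 0.5}
Ph1_intersection = {"forward": 0.25, "back": 0.25, "left": 0.25, "right": 0.25}

# Ph2: reversing is made less likely; only the corridor case (p2 = 0.8,
# p3 = 0.2) is quoted above, and other entries would be filled in similarly.
Ph2_corridor = {"forward": 0.8, "back": 0.2}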
Figure 5: Evolution of the minimum (crosses), mean (line) and maximum (dots) fitness values in the population for PBIL with learning rate α = 1.0 (i.e. a (1,25)-ES).

that can consider only "localized" information of the game is a factor that deserves further consideration. Finally, the fitness function used is based on both score achieved and time taken. The impact of the time factor on our results is not clear, and it may be possible to remove this from the fitness function with no adverse effects.

Nevertheless, we believe that the approach taken here could be useful as a benchmark in considering different representations and approaches to evolving a Pac-Man playing agent. It is expected that future work would need to be able to improve on the performance (fitness) of the agent. However, it will also be interesting to compare the complexity (dimensionality, interpretability, computational requirements) of alternative approaches to the rule-based approach developed above. Finally, we conjecture that general aspects of the approach taken here to developing an adaptive agent for Pac-Man may eventually lead to techniques that can be applied successfully to other real-time computer games.
[7] J. Koza. Genetic Programming: On the Programming of Computers by Means of Natural Selection. MIT Press, 1992.