Learning to Play Pac-Man: An Evolutionary, Rule-based Approach

Marcus Gallagher Amanda Ryan


[Link] [Link]

School of Information Technology and Electrical Engineering


University of Queensland 4072
Australia

Abstract- Pac-Man is a well-known, real-time computer game that provides an interesting platform for research. This paper describes an initial approach to developing an artificial agent that replaces the human to play a simplified version of Pac-Man. The agent is specified as a simple finite state machine and ruleset, with parameters that control the probability of movement by the agent given the constraints of the maze at some instant of time. In contrast to previous approaches, the agent represents a dynamic strategy for playing Pac-Man, rather than a pre-programmed maze-solving method. The agent adaptively "learns" through the application of population-based incremental learning (PBIL) to adjust the agent's parameters. Experimental results are presented that give insight into some of the complexities of the game, as well as highlighting the limitations and difficulties of the representation of the agent.

Figure 1: The starting position of the Pac-Man game, showing the maze structure, Pac-Man (lower-center), power pills (large dots), dots (small dots) and ghosts (center).

1 Introduction
Pac-Man is a well-known, real-time arcade computer game originally developed by Toru Iwatani for the Namco Company in 1981. Different versions of the game have been developed subsequently for a large number of home computer, game-console, and hand-held systems.

The typical version of Pac-Man is a one-player game where the human player maneuvers the Pac-Man character around a maze, attempting to avoid four "ghost" characters while eating dots initially distributed throughout the maze. If Pac-Man collides with a ghost, he loses one of his three lives and play resumes with the ghosts reassigned to their initial starting location (the "ghost cage" in the centre of the maze). Four "power pills" are initially positioned near each corner of a maze: when Pac-Man eats a power pill he is able to turn the tables and eat the ghosts for a few seconds of time. The game ends when Pac-Man has lost all (usually 3) of his lives. Figure 1 shows a screen-shot of the starting position of the first maze in the game. Pac-Man is a real-time computer game that resembles a simplified version of many modern first-person environment computer games. That is, the game is centered around navigating the player character around a semi-structured world, accumulating points, avoiding and (when appropriate) attacking non-player game characters¹. This kind of game requires dynamic control of the game agent by the human player and involves task prioritization, planning, and risk assessment. While it is relatively easy for a human to learn a basic strategy for Pac-Man, the game has complex aspects that allow the possibility of developing more intelligent strategies (in conjunction with other skills such as hand-eye coordination). It is also a challenge for a person to describe precisely their Pac-Man-playing strategy, or to represent such a strategy formally (e.g., as a set of rules).

¹ Alternatively, other game characters might be controlled by other human players in a multi-player setting.

This paper describes an initial approach to developing an artificial agent that replaces the human playing Pac-Man. For this work, the game has been significantly simplified to having only a single ghost, dots and the Pac-Man agent present in the mazes. The agent is specified as a simple finite state machine and ruleset, with parameters that specify the state transition and the probabilities of movement according to each rule. Section 2 provides an overview of previous work relevant to Pac-Man and AI game playing, while Section 3 describes the evolutionary, rule-based approach taken in this paper. Some experimental details are discussed in Section 4. Results are presented in Section 5, and Section 6 provides a summary and some discussion of possible future work.

2 Artificial Intelligence Approaches to Pac-Man

A relatively small amount of previous research has been done toward the application of artificial intelligence to Pac-Man or similar games. Koza [7] and Rosca [9] use Pac-Man as an example problem domain to study the effectiveness of genetic programming for task prioritization. Their approach relies on a set of predefined control primitives for perception, action and program control (e.g., advance the agent on the shortest path to the nearest uneaten power pill). The programs produced represent procedures that solve mazes of a given structure, resulting in a sequence of primitives that are followed. Genetic Programming has also been applied to the computer game Tron [3], where coevolution was used to produce agents that learn strategies by playing human opponents over the internet.

Gugler [5] describes "Pac-Tape", a project which attempted to produce a self-playing Pac-Man based on the original Pac-Man arcade game emulated on a desktop PC. The approach appears to be based on brute force search, but is not described or tested. Lawrence [8] builds on the Pac-Tape work, but applies a genetic algorithm to evolve a string of directions (north, south, east, west) to traverse a maze. The aim was to evolve interesting patterns for solving particular mazes. Unfortunately, success was limited, perhaps due to unfavourable interactions between the genetic operators (esp. crossover) and the representation used.

Kalyanpur and Simon [6] use a genetic algorithm to try to improve the strategy of the ghosts in a Pac-Man-like game. Here the solution produced is also a list of directions to be traversed. A neural network is used to determine suitable crossover and mutation rates from experimental data. Finally, De Bonet and Stauffer [2] describe a project using reinforcement learning to develop strategies simultaneously for Pac-Man and the ghosts, by starting with a small, simple maze structure and gradually adding complexity.

The aim of our research is to investigate techniques to develop an adaptive agent that learns inductively to play Pac-Man. Our approach is aimed at producing agents that can learn based only on the information that is available to a human playing the game (or quantities that a human could conceivably estimate in real-time). In particular, the agent should not be able to exploit internal knowledge about the game (e.g., the programmed behaviour of the ghosts) - the (software) agent should be considered external from the game software. Furthermore, this agent should learn generalizable strategies for good game play, based not on the exact structure of a given maze but rather on more general principles of good strategies to play Pac-Man.

3 Designing an Artificially Intelligent Agent that learns to play Pac-Man Inductively

Pac-Man is a simple game to describe and play. Human players have little difficulty in learning the basics of game play, which are to obtain as many points as possible (by clearing mazes of dots and eating power pills and ghosts), whilst avoiding the ghosts. However there is not obviously one correct way of specifying such a strategy precisely, perhaps because it is largely based on perception of the visual presentation of the game and its dynamics in time.

Moving beyond this basic approach makes clear the full complexity of the game domain. A large point incentive is provided for eating power pills, followed by eating ghosts in the few seconds that the power pill remains effective, but the ghosts generally try to retreat from Pac-Man in this time to avoid being eaten. Thus an effective strategy might involve attracting ghosts to the area of the maze near a power pill before eating it. The ghosts' movements are dependent on each other and typically contain an element of randomness, making it difficult to predict their behaviour and plan around it.

Our approach here is to initially remove much of the complexity from the game to see if it is possible to produce an effective agent in a simplified Pac-Man environment. In the experiments discussed below, only the dots and a single ghost are present in a maze with Pac-Man. While this clearly removes several interesting aspects of game-play, we believe that what remains (learning to avoid the ghost and perhaps to include random exploration) is still a fundamental aspect of the full Pac-Man game.

We were motivated to begin by using a simple, transparent representation for the agent. A representation that was potentially capable of capturing the general aspects of a basic human strategy provides a good platform on which to test the feasibility of using an evolutionary approach to learning, as well as serving as a benchmark for comparing future work. It was also interesting to test if the evolutionary algorithm would generate an agent with an obvious strategy. These considerations lead to the agent being represented as a two-state finite state machine, with a set of probabilistic rules for each state that control the movement of the agent. At any given time, the state of the agent is determined by the distance between Pac-Man and the ghost. If this distance is greater than some given value, p1, the agent is in the Explore state, while if the distance is less than p1, the agent switches to the Retreat state.
A human player is able to see the entire maze during game play, but dynamically is usually focusing their attention primarily on the immediate area of the maze where Pac-Man is currently located. Planning a long sequence of moves in advance would not only be very difficult for a human, but also ineffective, since the dynamics of the game would be likely to make such a strategy poor and irrelevant. Our initial approach for an artificial Pac-Man agent is therefore to view play as a small number of possible turn types, together with basic global information such as the position and movement of the ghost. Figure 2 categorizes the entire maze in terms of turn types, including straight corridors. Note that the representation used below considers turn types from the current orientation of Pac-Man, so that the "true" orientation of a turn type does not need to be directly considered (e.g., vertical versus horizontal corridors).

Figure 2: The first Pac-Man maze in terms of turn types. Some types are more common than others.

Agent() {
    while (game is in play)
        currentdistance = Distance(pacman, ghost)
        if currentdistance > p1 then
            Explore()
        else
            Retreat()
        end
}

Explore() {
    switch (turntype) {
        case "corridor":
            newdir = Random(P, prevdir, turntype)
        case "L-turn":
            newdir = Random(P, prevdir, turntype)
        case "T-turn":
            newdir = Random(P, prevdir, turntype, orientation)
        case "Intersection":
            newdir = Random(P, prevdir, turntype)
    }
}

Retreat() {
    switch (turntype) {
        case "corridor":
            newdir = Random(P, prevdir, turntype, ghostpos)
        case "L-turn":
            newdir = Random(P, prevdir, turntype, ghostpos)
        case "T-turn":
            newdir = Random(P, prevdir, turntype, orientation, ghostpos)
        case "Intersection":
            newdir = Random(P, prevdir, turntype, ghostpos)
    }
}

Figure 3: Pseudo-code for the Pac-Man agent strategy.
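The pseudo-code in Figure 3 leaves the probabilistic direction choice (the Random(P, ...) calls) abstract. The following Python sketch shows one way the two-state control loop and that choice could be realised; the rule table, the direction encoding and the helper names (RULES, choose_direction, manhattan) are illustrative assumptions rather than details taken from the paper.

import random

# Hypothetical rule table: maps (state, turntype) or (state, turntype, ghostpos)
# to (relative direction, probability) pairs drawn from the parameter vector P.
RULES = {
    ("Explore", "corridor"): [("forward", 0.99), ("backward", 0.01)],
    ("Explore", "intersection"): [("forward", 0.33), ("backward", 0.01),
                                  ("left", 0.33), ("right", 0.33)],
    ("Retreat", "corridor", "forward"): [("forward", 0.0), ("backward", 1.0)],
    # ... one entry per turn type, and per ghost position in the Retreat state
}

def manhattan(a, b):
    # Direct Manhattan distance, ignoring maze walls, as used for p1 in Section 3.
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def choose_direction(rule):
    # Sample one feasible relative direction according to its probability.
    directions, probs = zip(*rule)
    return random.choices(directions, weights=probs, k=1)[0]

def agent_step(pacman, ghost, turntype, ghostpos, p1):
    # One tick of the agent: Explore while the ghost is far, Retreat when close.
    if manhattan(pacman, ghost) > p1:
        return choose_direction(RULES[("Explore", turntype)])
    return choose_direction(RULES[("Retreat", turntype, ghostpos)])

For example, agent_step((10, 5), (10, 9), "corridor", "forward", 15) finds the ghost at distance 4, switches to the Retreat state, and returns "backward".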
At each time step (tick) of the game, the agent is located in an instance of a certain turn-type, and needs to produce an output that becomes the current direction of movement for Pac-Man. Depending on the turn-type of the current position (corridor, L-turn, T-junction, intersection), there are a number of different feasible directions to move, including maintaining the current direction (but excluding running directly into walls). In the Explore state, a probabilistic decision is made to choose one of the feasible directions depending on the turn type. As a result, Pac-Man is able to explore the maze in a pseudo-random fashion. Exploration does not use any other game information to determine the agent's output. In the Retreat state, the agent additionally considers the location of the ghost to determine the output. This results in the need to consider a number of possible turn-type and ghost position combinations. We classified the ghost's position as being either forward, back, left, right, forward-left, forward-right, back-left, or back-right relative to Pac-Man. Note that the first four classifications are determined using only one coordinate dimension. For example, if Pac-Man is currently facing north (up) on the screen, the ghost would be classified as being "back" if its current y-coordinate is less than Pac-Man's current y-coordinate. Adding the final four classifications is one way of refining this information. For the same example (Pac-Man facing north), the ghost's location will be classified as follows:

back if (ghost_y < pacman_y) and (ghost_x = pacman_x)

back-left/left-back if (ghost_y < pacman_y) and (ghost_x < pacman_x)

back-right/right-back if (ghost_y < pacman_y) and (ghost_x > pacman_x)

forward if (ghost_y > pacman_y) and (ghost_x = pacman_x)

forward-left/left-forward if (ghost_y > pacman_y) and (ghost_x < pacman_x)

forward-right/right-forward if (ghost_y > pacman_y) and (ghost_x > pacman_x)

left if (ghost_y = pacman_y) and (ghost_x < pacman_x)

right if (ghost_y = pacman_y) and (ghost_x > pacman_x)

Note also that, for example, back-left is equivalent to left-back as indicated in the above. Unfortunately, having eight classes for the position of the ghost for each turn-type means that the ruleset (and the number of parameters) also gets larger. Hence in our implementation, eight classes were used to classify the ghost's position for intersections, with only the first four classes used for other turn-types. Pseudo-code for the agent's strategy is shown in Figure 3.

Implementation of the agent based on the specifications above leads to a total of 85 adjustable parameters, collected into a parameter vector P = (p1, ..., p85), that controls the agent's behaviour. A sub-component of P is used to decide Pac-Man's current (new) direction of movement, depending on the previous direction (prevdir), turn type, orientation, and (in retreat mode) the ghost's location. Parameter p1 is the distance value at which the agent shifts from Explore to Retreat (and vice-versa), calculated as the direct (i.e., ignoring maze walls) Manhattan distance between Pac-Man and the ghost. All other parameters of P are probability values, 0 <= pi <= 1, i = 2, ..., 85. Table 1 gives a complete specification of these values.

Parameter   Description
p1          Distance to ghost
            Explore:
p2-3        Corridor: forward, backward
p4-5        L-turn: forward, backward
p6-8        T-turn (a): approach centre
p9-11       T-turn (b): approach left
p12-14      T-turn (c): approach right
p15-18      Intersection
            Retreat:
p19-20      Corridor: ghost forward
p21-22      Corridor: ghost behind
p23-24      L-turn: ghost forward
p25-26      L-turn: ghost behind
p27-29      T-turn (a): ghost behind
p30-32      T-turn (b): ghost behind
p33-35      T-turn (c): ghost behind
p36-38      T-turn (a): ghost on left
p39-41      T-turn (b): ghost on left
p42-44      T-turn (a): ghost on right
p45-47      T-turn (b): ghost on right
p48-50      T-turn (b): ghost forward
p51-53      T-turn (c): ghost forward
p54-57      Intersection: ghost forward
p58-61      Intersection: ghost behind
p62-65      Intersection: ghost left
p66-69      Intersection: ghost right
p70-73      Intersection: ghost forward/left
p74-77      Intersection: ghost forward/right
p78-81      Intersection: ghost behind/left
p82-85      Intersection: ghost behind/right

Table 1: Description of parameter values used in the Pac-Man agent and their usage.
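As an illustration of the classification just described, the sketch below rotates the ghost's offset into Pac-Man's frame of reference (so that the same eight labels apply whatever direction Pac-Man is facing) and then applies the sign tests listed above. The coordinate convention (y increasing towards the top of the screen) and the function names are assumptions made for the example.

def relative_offset(pacman, ghost, heading):
    # Rotate the ghost's offset into Pac-Man's frame so that +y means "forward"
    # and +x means "right". Coordinates are (x, y) with y increasing upwards;
    # heading is one of "north", "south", "east", "west".
    dx, dy = ghost[0] - pacman[0], ghost[1] - pacman[1]
    if heading == "north":
        return dx, dy
    if heading == "south":
        return -dx, -dy
    if heading == "east":
        return -dy, dx
    if heading == "west":
        return dy, -dx
    raise ValueError(heading)

def classify_ghost(pacman, ghost, heading):
    # Eight-way classification of the ghost's position relative to Pac-Man.
    x, y = relative_offset(pacman, ghost, heading)
    front_back = "forward" if y > 0 else ("back" if y < 0 else "")
    side = "right" if x > 0 else ("left" if x < 0 else "")
    if front_back and side:
        return front_back + "-" + side       # e.g. "back-left"
    return front_back or side or "same"      # "same" = coincident positions

With Pac-Man at (12, 3) facing north and the ghost at (10, 1), the relative offset is (-2, -2) and the classification is "back-left", matching the conditions listed above.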
4 Simulation Methodology

4.1 Fitness Function

For our simplified Pac-Man game, a fitness function was developed based primarily on the score obtained by Pac-Man. The only way to score points in this simplified game is to eat dots from the maze, and since there is a fixed initial number of dots in a maze, the maximum possible score for each maze is known a priori. It was considered that the time Pac-Man manages to survive should also be a factor in the fitness function. Note however that there also exists a possibility that Pac-Man may successfully avoid the ghost indefinitely, but fail to clear the maze of dots. Because of this a term was added to reward the time survived by Pac-Man, but imposing a pre-chosen limit on this factor. The fitness function used is the sum of these two factors. The score and time factors in the fitness function are normalized by their maximum (respectively, known and chosen) values for each level. Hence, the maximum fitness for each level is 2.
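The equation itself did not survive in this copy, but the description above pins down its form: a score term normalized by the known maximum score, plus a survival-time term capped at a pre-chosen limit, giving a maximum of 2 per level. The Python sketch below is that assumed form; the argument names and the example numbers are illustrative.

def fitness(score, max_score, time_survived, time_limit):
    # Normalized score term (max_score is known a priori for each maze) plus a
    # normalized, capped survival-time term; both lie in [0, 1], so the maximum
    # fitness per level is 2, as stated above.
    score_term = score / max_score
    time_term = min(time_survived, time_limit) / time_limit
    return score_term + time_term

# Illustrative numbers only: scoring 150 of a possible 200 points and surviving
# 800 ticks of a 1000-tick cap gives 0.75 + 0.8 = 1.55.
print(fitness(150, 200, 800, 1000))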
The movement of the ghost in the game has a small amount of randomness, and the agent developed above clearly produces stochastic movement for Pac-Man. As a consequence, the fitness value of each game given a fixed P will also be stochastic. In such a situation it is typical to conduct repeated evaluations of the fitness function with a given parameter set, and produce an average fitness value to be used by the evolutionary algorithm. In the simulations below, 10 games were played per average fitness evaluation. Recall that Pac-Man has three lives in a game, meaning that Pac-Man "runs" in the maze 30 times for each fitness evaluation used in the EA.
4.2 Algorithm

The algorithm used to learn the parameters of an agent is an implementation of population-based incremental learning (PBIL) for continuous search spaces [1, 10, 11]. PBIL replaces the conventional genetic operators of mutation and recombination with a probability vector, which is used to generate each population and is updated via a learning rule based on the single best individual in the current population. Each component of the probability vector represents the mean of a Gaussian distribution. In our experiments the population size was set to 25 (computational time prevented a larger value). For the standard deviations of the Gaussians and the PBIL learning rate parameter, α, several different values were tested. Note that for α = 1, this algorithm is equivalent to a simple (1,λ) evolution strategy with a constant standard deviation value for all variables [4].

The probability vector was initialized with p1 = 15 - this value of p1 chosen such that the agent initially spent roughly equal amounts of time in the Explore and Retreat states. p1 was allowed to evolve as a continuous value, although the Manhattan distance between Pac-Man and the ghost is integer-valued. The remaining parameters represented probabilities for subsets of variables, so they were initialized with uniform probability for each feasible output direction of the agent (e.g. p2-3 = 0.5, p15-18 = 0.25). These parameters were re-normalized after being updated at each generation by the PBIL learning rule. Note that PBIL evolves a probabilistic model of the search space which converges towards a locally or globally optimal value. In this paper we interpret the probability vector learnt by the algorithm as the realization of our evolved "agent", and a population-based search is used to perform this task.
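A minimal sketch of the continuous PBIL loop as described in Section 4.2: each generation is sampled from Gaussians centred on the probability vector, the best individual (by fitness averaged over 10 games) is found, and the vector is moved towards it with learning rate α. The specific update rule shown is the standard one from the PBIL literature cited above, and evaluate_agent is a stand-in for the game simulator; both are assumptions of this sketch, as are the generation count and the per-subset re-normalization details.

import numpy as np

rng = np.random.default_rng(0)

def evaluate_agent(P, n_games=10):
    # Stand-in for the real evaluation: play n_games games of the simplified
    # Pac-Man with parameters P and return the mean fitness (at most 2 per level).
    return float(rng.random() * 2)

def renormalize(P):
    # Keep the probability parameters (p2 onwards) in [0, 1]; the actual agent
    # also re-normalizes each rule's subset (e.g. p2-3, p15-18) to sum to 1.
    P = P.copy()
    P[1:] = np.clip(P[1:], 0.0, 1.0)
    return P

def pbil(mean, sigma, alpha=0.05, pop_size=25, generations=500):
    # Continuous PBIL: sample a population around `mean`, then move `mean`
    # towards the best individual. With alpha = 1 this reduces to a
    # (1, pop_size)-evolution strategy, as noted in Section 4.2.
    for _ in range(generations):
        population = [renormalize(mean + sigma * rng.standard_normal(mean.size))
                      for _ in range(pop_size)]
        fitnesses = [evaluate_agent(P) for P in population]
        best = population[int(np.argmax(fitnesses))]
        mean = (1.0 - alpha) * mean + alpha * best
    return mean

# Initialization as in Section 4.2, showing only a fragment of the 85-element P:
# p1 = 15 and uniform probabilities elsewhere, with std. dev. 1.0 for p1 and
# 0.01 for the probability parameters (the values used in Section 5.2).
initial_mean = np.array([15.0, 0.5, 0.5, 0.25, 0.25])
sigma = np.array([1.0, 0.01, 0.01, 0.01, 0.01])
evolved = pbil(initial_mean, sigma)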
5 Results

5.1 Hand-coded Agent Parameters

A feature of the agent implementation described above is that it is transparent: each parameter has a clear function in the control of the output of the agent. It is therefore possible to experiment with an agent of hand-coded parameters. The rule sets allow for basic explore and retreat behaviour only, so the hand-coded parameters were chosen to try and produce an agent with the ability to explore the maze widely in the "Explore" state, and to retreat from the ghost to avoid being eaten in the "Retreat" state. Results for the different hand-coded parameter vectors tested are shown in Table 2.

Vector   Mean    Std. Dev.   Min.    Max.
Ph1      0.215   0.080       0.104   0.511
Ph2      0.613   0.316       0.187   1.699
Ph3      1.372   0.522       0.276   2.062

Table 2: Results for hand-coded parameter vectors tested, summarizing fitness values from 50 games with each parameter vector.

Consider firstly an agent, Ph1, with equal probabilities assigned to each parameter in each relevant subset of values. The behaviour of this agent was observed to be highly erratic, because equal probabilities mean that Pac-Man is just as likely to reverse his direction at any point in time as he is to choose any other feasible direction. As a result, the movement of Pac-Man is dominated by oscillations (e.g., forwards-backwards in a corridor) around the current point. This is reflected in the low fitness values observed for agent Ph1 in Table 2.

For the next parameter vector tested (Ph2), in the Explore state a lower probability was assigned to reversing the current direction, with probabilities for all other feasible directions assigned uniformly (e.g. for a corridor to move forwards and backwards respectively, p2 = 0.8, p3 = 0.2; for an intersection to move forwards, backwards, left and right respectively p15,17,18 = 0.3, p16 = 0.1). For the Retreat state, zero probability was assigned to the direction that led to the ghost, with all other feasible options given uniform probability. This agent had reduced oscillations in its movement and was observed to have some ability to retreat from the ghost over "chase" sequences of turns (see Table 2 for results). However, the agent only occasionally comes close to clearing a maze (surviving long enough to do so) and still contains a harmful degree of oscillation in its output.

Finally, a parameter vector (Ph3) was tested that refined Ph2 by having very low probability values for reversing the current direction. For example, for a corridor to move forwards and backwards respectively, p2 = 0.99, p3 = 0.01; for an intersection to move forwards, backwards, left and right respectively p15,17,18 = 0.33, p16 = 0.01. This agent produced improved behaviour (Table 2), typically coming close to clearing the maze and occasionally doing so (6 out of the 50 trials produced fitness values above 2.0). More importantly (since the Explore state for the agent does not have any knowledge of the dots in the maze), the agent is able to evade the ghost for long periods of time, due to the Retreat state. Nevertheless, the limitations of the agent became clear from observing the performance of this agent. The ruleset decides the move direction based on the current location only. In retreat mode, the position of the ghost is considered but only crudely measured. As an example, consider a situation where Pac-Man is located at one of the extreme corners of the maze (ref. Figure 1 or Figure 2). For, say, the top-right corner, the ghost will usually be moving towards Pac-Man from a below-left position. Because of the limited way that the agent considers the position of the ghost, moving left is undesirable because the ghost is to the left, but moving downward is also undesirable because the ghost is also downward. Oscillation results, until the ghost becomes very close to Pac-Man. Other situations allow the agent to be captured by the ghost because equiprobable alternatives for retreating can lead to oscillatory behaviour.
5.2 PBIL-evolved Agent

Figure 4 shows the result of the PBIL algorithm evolving a Pac-Man agent parameter vector, in terms of the minimum, mean, and maximum fitness values in the population distribution. Although the game code was modified to make play as fast as possible, 250 games are played in each generation of the algorithm (population of 25, 10 games per fitness evaluation), meaning several days of simulation time. For this experiment, a standard deviation of 0.01 was used for all parameters except p1, which used a value of 1.0, and the learning rate was α = 0.05. The mean of the population shows a smooth learning curve compared to other standard deviation/learning rate values that we experimented with. After 122 generations the population mean is around 0.8, which is better than our hand-coded parameter sets Ph1 and Ph2 but below the performance of Ph3.

Figure 4: Evolution of the minimum (crosses), mean (line) and maximum (dots) fitness values in the population for PBIL with α = 0.05.

We were interested in the influence of the PBIL learning rate on the results of the algorithm. Figure 5 shows the result of a different experiment, with α = 1.0 (note that fewer generations have been performed). This is equivalent to a (1,25)-evolution strategy. It is evident that this algorithm is able to initially provide more rapid improvement in fitness, but after approximately 100 generations progress is much more noisy than that of Figure 4. In preliminary experiments we observed that a larger standard deviation value had a similar effect. The results suggest that a kind of annealing scheme for either of these parameters may allow the algorithm to converge more reliably. Overall the performance of the algorithms with different learning rates is similar in terms of the kind of fitness values obtained during learning. The probability vectors produced by the two algorithms in the experiments above were examined manually. Many of the parameters appeared to be converging towards values similar to the hand-coded strategy Ph3 above. It is clear from the results shown in Figures 4 and 5 that the probability vector is still subject to significant perturbation, making it difficult to interpret without extending the experiment over more generations.

Figure 5: Evolution of the minimum (crosses), mean (line) and maximum (dots) fitness values in the population for PBIL with learning rate α = 1.0 (i.e. (1,25)-ES).
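The annealing scheme suggested above is not something the paper implements; a minimal sketch of what a decaying schedule for the learning rate (or, equally, the sampling standard deviation) could look like is:

def annealed(start, decay, generation, floor):
    # Exponentially decay a PBIL hyperparameter over generations, never
    # dropping below `floor`. The decay factor and floor are illustrative.
    return max(start * decay ** generation, floor)

# e.g. begin with the fast but noisy alpha = 1.0 and settle towards alpha = 0.05
for gen in (0, 50, 100, 200):
    print(gen, round(annealed(1.0, 0.985, gen, floor=0.05), 3))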
6 Discussion

This paper has developed an approach to constructing an artificial agent that learns to play a simplified version of Pac-Man. The agent is specified as a simple state machine and parameterized ruleset, with the PBIL algorithm used to learn suitable values for these parameters. Hand-coded parameters were also tested.

The results highlight the limitations of the representation used as well as some of the complexities of the game, even in the highly simplified form considered in these experiments. The ruleset could certainly be improved given the difficulties observed in our simulations. For example, considering 8 classifications for the ghost's position for each turn type (Section 3) should reduce the problems observed in Section 5.1. This would however necessarily result in a larger ruleset. The representation also has redundancies in the way that probability values are represented (e.g. replace p3 by (1 - p2)). Nevertheless, the representation seems to have serious limitations for scaling up the "intelligence" of the agent. This may result in a very high dimensional optimization problem which would require a large amount of computation time to allow enough generations for the algorithm to produce a good result. Given the complexities of the full game compared to the version considered here, it seems that extending on a ruleset-based representation may be difficult and impractical.

The methodology used in this paper is a first attempt at this problem and there are several factors that might be improved. Firstly, it is clear that the different strategies (hand-coded and learned) used in this paper do not come close to capturing the variety of possible effective strategies used by humans in the full Pac-Man game. We hypothesize that the basic explore/retreat approach is likely to have a role in an effective strategy for the full Pac-Man game, but this has not been verified. Secondly, our decision to use an agent that can consider only "localized" information of the game is a factor that deserves further consideration. Finally, the fitness function used is based on both score achieved and time taken. The impact of the time factor on our results is not clear and it may be possible to remove this from the fitness function with no adverse effects.

Nevertheless, we believe that the approach taken here could be useful as a benchmark in considering different representations and approaches to evolving a Pac-Man playing agent. It is expected that future work would need to be able to improve on the performance (fitness) of the agent. However it will also be interesting to compare the complexity (dimensionality, interpretability, computational requirements) of alternative approaches, to the rule-based approach developed above. Finally, we conjecture that general aspects of the approach taken here to developing an adaptive agent for Pac-Man may eventually lead to techniques that can be applied successfully to other real-time computer games.
Bibliography

[1] S. Baluja. Population-Based Incremental Learning: A method for integrating genetic search based function optimization and competitive learning. Technical Report CMU-CS-94-163, School of Computer Science, Carnegie Mellon University, 1994.

[2] J. S. De Bonet and C. P. Stauffer. Learning to play pacman using incremental reinforcement learning. Retrieved from [Link] (19/06/03), 1999.

[3] P. Funes, E. Sklar, H. Juillé, and J. Pollack. Animal-animat coevolution: using the animal population as fitness function. In R. Pfeifer et al., editors, From Animals to Animats 5: Proceedings of the Fifth International Conference on Simulation of Adaptive Behaviour, pages 525-533. MIT Press, 1998.

[4] M. Gallagher. Multi-layer perceptron error surfaces: visualization, structure and modelling. PhD thesis, Dept. Computer Science and Electrical Engineering, University of Queensland, 2000.

[5] S. Gugler. Pac-tape. Retrieved from [Link] (21/05/01), 1997.

[6] A. Kalyanpur and M. Simon. Pacman using genetic algorithms and neural networks. Retrieved from [Link] (19/06/03), 2001.
[7] J. Koza. Genetic Programming: On the Programming of Computers by Means of Natural Selection. MIT Press, 1992.

[8] S. Lawrence. Pac-man autoplayer. Retrieved from [Link] (21/05/01), 1999.

[9] J. P. Rosca. Generality versus size in genetic programming. In J. Koza et al., editors, Genetic Programming (GP96) Conference, pages 381-387. MIT Press, 1996.

[10] S. Rudlof and M. Koppen. Stochastic hill climbing with learning by vectors of normal distributions. In 1st Online Workshop on Soft Computing. Retrieved from [Link] (8/12/99), 1996.

[11] M. Sebag and A. Ducoulombier. Extending population-based incremental learning to continuous search spaces. In A. Eiben et al., editors, Parallel Problem Solving from Nature - PPSN V, volume 1498 of Lecture Notes in Computer Science, pages 418-427, Berlin, New York, 1998. Springer.
