Reinforcement Learning with AlphaGo
DeepMind (2016)
Prepared for Academic Purposes
Abstract & Background
In 2016, DeepMind’s AlphaGo achieved a historic milestone by defeating Lee Sedol, one of the
world’s strongest professional Go players, 4-1 in a five-game match, a feat many experts had
expected to be at least a decade away. The system combined deep neural networks and
reinforcement learning with Monte Carlo Tree Search (MCTS), showing that AI could tackle
extremely complex strategy games.
Key Contributions
AlphaGo introduced a hybrid system built around two deep neural networks: a policy network,
which proposed promising moves, and a value network, which estimated the probability of winning
from a given board position. The policy network was first trained by supervised learning on millions
of positions from expert human games and then refined through reinforcement learning from
self-play; the value network was trained on positions generated by self-play. Monte Carlo Tree
Search tied the pieces together, using the policy network to narrow the search to plausible moves
and the value network (together with fast rollouts) to evaluate positions, so the system could look
deep ahead without exhaustively enumerating the game tree.
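To make the interplay between the networks and the search concrete, here is a minimal Python
sketch of a PUCT-style Monte Carlo Tree Search. It is an illustration under simplifying assumptions,
not AlphaGo’s actual code: the NimState toy game and the policy_value stub (uniform priors,
random values) are hypothetical stand-ins for a real Go engine and the trained networks, and
constants such as c_puct and the simulation count are arbitrary choices.

# Minimal sketch of policy/value-guided MCTS in the spirit of AlphaGo.
# NimState and policy_value are hypothetical stand-ins, not the paper's code:
# a real system would plug in a Go engine and the trained neural networks.
import math
import random

class NimState:
    """Toy two-player game (take 1-3 stones; whoever takes the last stone wins)."""
    def __init__(self, stones=10):
        self.stones = stones

    def copy(self):
        return NimState(self.stones)

    def legal_moves(self):
        return [m for m in (1, 2, 3) if m <= self.stones]

    def play(self, move):
        self.stones -= move

    def is_terminal(self):
        return self.stones == 0

    def result(self):
        # The player to move faces an empty pile, so they have already lost.
        return -1.0

def policy_value(state):
    """Stub for the two networks: uniform move priors, random position value."""
    moves = state.legal_moves()
    priors = {m: 1.0 / len(moves) for m in moves}
    return priors, random.uniform(-1.0, 1.0)

class Node:
    def __init__(self, prior):
        self.prior = prior        # P(s, a) from the policy network
        self.visit_count = 0      # N(s, a)
        self.value_sum = 0.0      # W(s, a)
        self.children = {}        # move -> Node

    def value(self):              # Q(s, a)
        return self.value_sum / self.visit_count if self.visit_count else 0.0

def select_child(node, c_puct=1.5):
    """PUCT rule: maximize Q(s, a) + c * P(s, a) * sqrt(N(s)) / (1 + N(s, a))."""
    parent_visits = sum(c.visit_count for c in node.children.values()) + 1
    def score(item):
        _, child = item
        return child.value() + c_puct * child.prior * math.sqrt(parent_visits) / (1 + child.visit_count)
    return max(node.children.items(), key=score)

def expand(node, state):
    """Create children from policy priors; return the leaf value for the player to move."""
    if state.is_terminal():
        return state.result()
    priors, value = policy_value(state)
    for move, p in priors.items():
        node.children[move] = Node(prior=p)
    return value

def run_mcts(root_state, num_simulations=500):
    root = Node(prior=1.0)
    expand(root, root_state)
    for _ in range(num_simulations):
        node, state, path = root, root_state.copy(), [root]
        # 1. Selection: descend the tree with the PUCT rule.
        while node.children:
            move, node = select_child(node)
            state.play(move)
            path.append(node)
        # 2. Expansion and evaluation of the leaf by the (stub) networks.
        value = expand(node, state)
        # 3. Backup: each node stores value from the viewpoint of the player
        #    who moved into it, so the sign flips at every level.
        for n in reversed(path):
            value = -value
            n.visit_count += 1
            n.value_sum += value
    # Play the most-visited move, as AlphaGo does at match time.
    return max(root.children.items(), key=lambda kv: kv[1].visit_count)[0]

if __name__ == "__main__":
    print("Suggested move from 10 stones:", run_mcts(NimState(10)))

In the full system, leaf evaluation blends the value network’s estimate with the outcome of a fast
rollout policy, and the priors come from the trained policy network, but the select/expand/backup
loop above is the same skeleton the paper describes.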
Critical Analysis
AlphaGo demonstrated the power of combining learning-based methods with search. It provided
evidence that reinforcement learning could master domains characterized by high complexity and
large state spaces. However, AlphaGo required immense computational resources (the distributed
version ran on well over a hundred GPUs, and the match version on specialized TPUs), which limited
its immediate practical applications outside research. Nonetheless, its successors extended the
approach: AlphaZero learned chess, shogi, and Go from self-play alone, and MuZero additionally
mastered Atari games by planning with a learned model.
Personal Reflection
From my point of view, AlphaGo was more than a technical achievement—it was a cultural moment
for AI. It highlighted that machines could achieve creativity-like behavior in a game considered
deeply human. While the method may not be directly applicable to everyday tasks, I see it as a
powerful reminder that AI can surprise us when innovative techniques are combined.
References
[1] D. Silver et al., "Mastering the Game of Go with Deep Neural Networks and Tree Search,"
Nature, vol. 529, pp. 484-489, 2016.
[2] D. Silver et al., "Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning
Algorithm," arXiv preprint arXiv:1712.01815, 2017.
[3] J. Schrittwieser et al., "Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model,"
Nature, vol. 588, pp. 604-609, 2020.