AlphaGo is a computer program that recently beat a top professional human player, Lee Sedol. It uses (a) expert knowledge and (b) machine learning, a branch of Artificial Intelligence.

Note: "AI beats human in Go" makes an attractive headline, but one must not belittle the contributions of human intelligence and computer hardware to AlphaGo.
In Go, a game is formed by a sequence of alternating moves by two players. To find the next move, AlphaGo plays millions of games from the current board situation by itself, then chooses the move that leads to the highest score. To do this, AlphaGo needs to do two things:
Task 1 is to shortlist candidate moves. If AlphaGo were to consider every legal move on the board, it would not be able to play many games, because of the enormous number of combinations; if it shortlists too few moves, it may miss good ones. AlphaGo uses (a) expert input and (b) machine learning to decide which moves to consider. Experts suggest which "features" make a move valuable. For example, positions around the last two moves tend to be worth considering, and a position that kills a group tends to be worth considering too. Exactly how these features should be combined is the job of machine learning, which includes (i) learning from 30 million positions played by experts, and (ii) self-play.
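How the features of a candidate move might be "combined" can be illustrated with a minimal linear model in Python. The feature names and weights below are invented for illustration; AlphaGo's real model is a deep neural network, not a hand-coded table:

```python
import math

# Hypothetical hand-picked features of a candidate move, each valued 0 or 1.
# Machine learning's job is to find the weights; these numbers are made up.
WEIGHTS = {"near_last_two_moves": 1.5, "kills_a_group": 2.0, "on_edge": -0.5}

def move_score(features):
    """Linear combination of a move's features with learned weights."""
    return sum(WEIGHTS[name] * value for name, value in features.items())

def shortlist(moves, k=2):
    """Keep only the k highest-scoring moves, softmax-normalised to probabilities."""
    scored = sorted(moves.items(), key=lambda mv: move_score(mv[1]), reverse=True)[:k]
    total = sum(math.exp(move_score(f)) for _, f in scored)
    return {m: math.exp(move_score(f)) / total for m, f in scored}
```

A move near the last two moves that also kills a group scores 1.5 + 2.0 and dominates the shortlist, while an edge move is penalised; only the surviving moves would be explored by simulation.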
Task 2 is to score a board situation. If a game has been played to the end, AlphaGo knows which side won. However, if the game sequence does not end with one side winning, AlphaGo has to estimate how favourable the situation is for either side. If this estimate is grossly inaccurate, the simulation is futile or even misleading.
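The two tasks serve a single simulation loop: from the current position, play out games for each shortlisted move and keep the move whose playouts score best. A minimal sketch, assuming hypothetical game-interface functions (`legal_moves`, `play`, `playout`):

```python
def choose_move(position, legal_moves, play, playout, n_games=1000):
    """Pick the move whose simulated games score best.

    legal_moves(position) -> shortlist of candidate moves (Task 1)
    play(position, move)  -> resulting position
    playout(position)     -> score of one simulated game from there (Task 2)
    """
    best_move, best_score = None, float("-inf")
    for move in legal_moves(position):
        next_pos = play(position, move)
        # Average score over many self-played games starting from this move.
        score = sum(playout(next_pos) for _ in range(n_games)) / n_games
        if score > best_score:
            best_move, best_score = move, score
    return best_move
```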
It is unclear from the publication exactly how AlphaGo evaluates a board situation. Roughly speaking, experts identify factors that they consider important for evaluating a board position: for example, how much territory is occupied by either side? How many pieces have been captured by either side? AlphaGo uses the move generator (part of Task 1) to play many games by itself, in order to learn how those factors should be combined.
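Learning "how those factors should be combined" can be sketched as a regression problem: each self-played game yields the factor values plus the eventual outcome, and gradient descent fits the weights. The two factors and all the numbers below are invented for illustration:

```python
import math

# Hypothetical factors of a board situation: (territory lead, capture lead),
# each labelled with the eventual outcome of a self-played game (1 = win, 0 = loss).
DATA = [((10, 3), 1), ((8, 1), 1), ((1, 2), 1),
        ((-5, 0), 0), ((-2, -4), 0), ((-1, -1), 0)]

def evaluate(factors, w, b):
    """Combine the factors with learned weights; squash to a win probability."""
    z = sum(wi * fi for wi, fi in zip(w, factors)) + b
    return 1.0 / (1.0 + math.exp(-z))

def learn_weights(data, lr=0.1, epochs=500):
    """Fit the factor weights by logistic regression (gradient descent)."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for factors, outcome in data:
            err = evaluate(factors, w, b) - outcome
            w = [wi - lr * err * fi for wi, fi in zip(w, factors)]
            b -= lr * err
    return w, b
```

After training, a large territory lead evaluates close to 1 (a likely win) and a large deficit close to 0; a grossly wrong fit here is exactly what would make the simulations misleading.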
"Artificial intelligence beats human beings" makes an exciting headline, hence machine learning gets most of the publicity. In reality, AlphaGo's power comes from a combination of the following sources:
For Task 1, AlphaGo generates moves using an artificial neural network called the “Policy Network”. The connections in this network, and the weights associated with them, determine how the input features are combined. The weights were learned by (a) supervised learning from expert games, followed by (b) reinforcement learning through self-play.
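A single-layer stand-in for the Policy Network, with one supervised-learning update, might look like this in Python (the real network is deep, and the per-move feature vectors here are placeholders):

```python
import math

def softmax(zs):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(zs)
    es = [math.exp(z - m) for z in zs]
    s = sum(es)
    return [e / s for e in es]

def policy(weights, features_per_move):
    """One-layer stand-in for the Policy Network: each candidate move gets a
    score (weights . features), and softmax turns scores into move probabilities."""
    return softmax([sum(w * f for w, f in zip(weights, fs)) for fs in features_per_move])

def supervised_step(weights, features_per_move, expert_move, lr=0.5):
    """One supervised-learning update: shift weight toward the expert's move."""
    probs = policy(weights, features_per_move)
    new_w = list(weights)
    for i, fs in enumerate(features_per_move):
        # Gradient of log-likelihood of the expert's choice under softmax.
        grad = (1.0 if i == expert_move else 0.0) - probs[i]
        new_w = [w + lr * grad * f for w, f in zip(new_w, fs)]
    return new_w
```

Repeating `supervised_step` over expert positions drives the probability of the expert's move up; the reinforcement-learning phase would instead reward moves that led to wins in self-play.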
For Task 2, AlphaGo uses a “Valuation Network” to estimate how favourable a given board situation is. It is also a neural network; the weights on its connections were learned through self-play simulation using the Policy Network from Task 1.
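The data-generation step for this training — use the Policy Network to self-play and record who won — can be sketched as follows, with hypothetical stand-ins for the game interface:

```python
def self_play_dataset(initial_position, policy_move, play, game_over, winner, n_games=100):
    """Generate (position, outcome) training pairs for the value model by self-play.

    policy_move(position) -> move chosen by the policy network
    play(position, move)  -> next position
    game_over(position)   -> True when the game has ended
    winner(position)      -> outcome of a finished game (e.g. +1 or -1)
    """
    dataset = []
    for _ in range(n_games):
        pos, history = initial_position, []
        while not game_over(pos):
            history.append(pos)
            pos = play(pos, policy_move(pos))
        outcome = winner(pos)
        # Label every position visited in this game with the final outcome.
        dataset.extend((p, outcome) for p in history)
    return dataset
```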
Policy Network representation:
Valuation Network representation:
Machine learning:
Actual play [this is the author's interpretation based on the literature]:
[End]