DeepMind AlphaGo Zero Sets New AI Standard
AlphaGo became the first computer program to defeat a professional human Go player, and it made headlines in 2016 when it defeated former world champion Lee Sedol. AlphaGo was built by an artificial intelligence (AI) team in Google’s DeepMind division and was trained on games played by human experts. Like many AI neural networks, it relied on that training data to learn quickly and develop mastery of the task.
Superior AI with Zero Training Data
DeepMind revealed this week that the new AlphaGo Zero is superior to the previous AlphaGo version, defeating it in a 100-game match without losing once. The difference was the training data: AlphaGo Zero didn’t need any. It learned the game through trial and error by playing against itself; researchers at DeepMind seeded the program only with the rules of the game. By day three it could defeat AlphaGo Lee, the version that bested Lee Sedol. By day 21 it reached the level of AlphaGo Master, the version that beat the reigning world champion in 2017. By day 40 it was superior to all other AlphaGo versions. Will Knight from MIT Technology Review put it this way:
The new program represents a step forward in the quest to build machines that are truly intelligent. That’s because machines will need to figure out solutions to difficult problems even when there isn’t a large amount of training data to learn from.
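To make the self-play idea concrete, here is a minimal sketch of a program learning a game from nothing but its rules. It uses tic-tac-toe and a simple value table rather than AlphaGo Zero’s actual method (a deep neural network guided by Monte Carlo tree search), so every name and parameter below is illustrative only:

```python
import random
from collections import defaultdict

# Minimal self-play sketch on tic-tac-toe. This is NOT AlphaGo Zero's
# algorithm (which pairs a deep network with Monte Carlo tree search);
# it only illustrates the core idea: start with the rules alone and
# improve by playing games against yourself.

LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
         (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
         (0, 4, 8), (2, 4, 6)]              # diagonals

def winner(board):
    """Return 'X' or 'O' if someone has three in a row, else None."""
    for a, b, c in LINES:
        if board[a] != " " and board[a] == board[b] == board[c]:
            return board[a]
    return None

def legal_moves(board):
    return [i for i, cell in enumerate(board) if cell == " "]

Q = defaultdict(float)        # (state, move) -> learned value estimate
EPSILON, ALPHA = 0.1, 0.5     # exploration rate and learning rate (arbitrary)

def choose_move(board):
    moves = legal_moves(board)
    if random.random() < EPSILON:
        return random.choice(moves)                   # explore
    state = "".join(board)
    return max(moves, key=lambda m: Q[(state, m)])    # exploit

def self_play_game():
    board, history, player = [" "] * 9, [], "X"
    while True:
        move = choose_move(board)
        history.append(("".join(board), move, player))
        board[move] = player
        win = winner(board)
        if win or not legal_moves(board):
            # Game over: nudge every (state, move) toward the final outcome,
            # +1 for the winner's moves, -1 for the loser's, 0 for draws.
            for state, m, p in history:
                reward = 0.0 if win is None else (1.0 if p == win else -1.0)
                Q[(state, m)] += ALPHA * (reward - Q[(state, m)])
            return
        player = "O" if player == "X" else "X"

for _ in range(50_000):       # trial and error: no human games needed
    self_play_game()
print(f"Learned value estimates for {len(Q)} state-action pairs.")
```

AlphaGo Zero replaces the table with a neural network and the random exploration with tree search, but the loop is the same: play, score the outcome, update, repeat.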
Improved Compute Efficiency, but Still No Match for Human Efficiency
The MIT Technology Review article is a nice summary of what happened, but it also offers some added perspective. DeepMind CEO Demis Hassabis says the breakthrough means “we don’t need any human data anymore,” and DeepMind researcher and University College London professor David Silver says “we’ve actually removed the constraints of human knowledge,” but others say we still have a long way to go with AI. Mr. Knight caught up with University of Washington professor Pedro Domingos, who commented:
It’s a nice illustration of the recent progress in deep learning and reinforcement learning, but I wouldn’t read too much into it as a sign of what computers can learn without human knowledge. What would be really impressive would be if AlphaGo beat [legendary South Korean champion] Lee Sedol after playing roughly as many games as he played in his career before becoming a champion. We’re nowhere near that.
Mr. Domingos was referring to the fact that AlphaGo Zero played 30 million games before mastering Go. Knight characterizes his sentiment by saying this is “many more [games] than an expert human player does. This suggests that the intelligence the program employs is fundamentally different somehow.” With that said, we shouldn’t apply the same learning metrics to computers that we apply to human intelligence. Thirty million games in 40 days works out to roughly 750,000 games a day. Measured per game, the program learns more slowly than a human, but in wall-clock time it still reaches the end goal faster. Could a human achieve the same mastery in just 40 days if able to complete only a few hundred games? Not likely.
The Future of AI
Everyone agrees that AlphaGo Zero is a great accomplishment. However, it doesn’t mean the singularity is upon us. Nor does it mean that training data is no longer helpful or necessary. Instead, it shows that deep learning can become radically more efficient with better algorithms, and that training data may not be the only way to accomplish goals using AI.
For a more pedestrian demonstration of an earlier AI accomplishment from DeepMind, I have added a video below showing how an AI program learned to play the Atari game Breakout. It is a much simpler illustration of the same trial-and-error learning that AlphaGo Zero used. Enjoy!