DeepMind’s new AI taps games to enhance fundamental algorithms
DeepMind has applied its mastery of games to a more serious business: the foundations of computer science.
The Google subsidiary today unveiled AlphaDev, an AI system that discovers new fundamental algorithms. According to DeepMind, the algorithms it’s unearthed surpass those honed by human experts over decades.
The London-based lab has grand ambitions for the project. As demand for computation grows and silicon chips approach their limits, fundamental algorithms will have to become exponentially more efficient. By enhancing these processes, DeepMind aims to transform the infrastructure of the digital world.
The first target in this mission is sorting algorithms, which are used to order data. Under the covers of our devices, they determine everything from search rankings to movie recommendations.
To enhance their performance, AlphaDev explored assembly instructions, the low-level commands that are translated into the binary code computers execute. After an extensive search, the system uncovered a sorting algorithm that outperformed the previous human-written benchmarks.
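To give a flavour of what a sorting routine at this level looks like, here is a minimal sketch of a fixed three-element sorting network. It is our own illustration, not DeepMind's code, and it assumes inputs of exactly three integers. The names `compare_exchange` and `sort3` are hypothetical. Each compare-exchange step is a `std::min`/`std::max` pair, which compilers can often turn into branch-free conditional-move instructions.

```cpp
#include <algorithm>
#include <array>

// Hypothetical helper: order two values in place so that x <= y afterwards.
// Written with std::min/std::max, so the source has no data-dependent branch.
inline void compare_exchange(int& x, int& y) {
    const int lo = std::min(x, y);
    const int hi = std::max(x, y);
    x = lo;
    y = hi;
}

// A fixed sorting network for exactly three elements: the same three
// compare-exchange steps run no matter what the input is.
inline std::array<int, 3> sort3(std::array<int, 3> v) {
    compare_exchange(v[1], v[2]);  // now v[1] <= v[2]
    compare_exchange(v[0], v[2]);  // now v[2] holds the maximum
    compare_exchange(v[0], v[1]);  // now v[0] <= v[1] <= v[2]
    return v;
}
```

For example, `sort3({3, 1, 2})` returns `{1, 2, 3}`. Routines for short, fixed-length inputs like this are the kind of building block where shaving off even one instruction matters.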
To find the winning combination, DeepMind had to revisit the feats that made it famous: winning board games.
Gaming the system
DeepMind made its name in games. In 2016, the company grabbed headlines when its AI program defeated a world champion of Go, a wickedly complicated Chinese board game.
Following the victory, DeepMind built a more general-purpose system, AlphaZero. Using a process of trial and error called reinforcement learning, the program mastered not only Go, but also chess and shogi (aka “Japanese chess”).
AlphaDev — the new algorithm builder — is based on AlphaZero. But the influence of gaming extends beyond the underlying model.
DeepMind formulated AlphaDev’s task as a single-player game. To win the game, the system had to build a new and improved sorting algorithm.
The system played its moves by selecting assembly instructions to add to the algorithm. To find the optimal instructions, the system had to probe a vast quantity of instruction combinations. According to DeepMind, the number was similar to the number of particles in the universe. And just one bad choice could invalidate the entire algorithm.
After each move, AlphaDev compared the algorithm’s output with the expected results. If the output was correct and the performance was efficient, the system got a “reward” — a signal that it was playing well.
“We penalise it for making mistakes, and we reward it for finding more and more of these sequences that are sorted correctly,” Daniel Mankowitz, the lead researcher, told TNW.
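The game framing can be illustrated with a deliberately tiny toy of our own construction, not AlphaDev's actual setup. Here a "program" is a list of compare-exchange instructions on three registers, and the reward counts how many of the six orderings of three distinct values the program sorts correctly, minus a small penalty per instruction as a stand-in for "prefer faster programs". The names `Instr`, `run`, `reward`, `best_program` and `kLengthPenalty` are hypothetical. AlphaDev chose among real assembly instructions and searched with an AlphaZero-based agent; the brute-force loop below simply tries every short program.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical toy instruction: compare-exchange registers i and j (i < j).
struct Instr {
    int i;
    int j;
};
using Program = std::vector<Instr>;

// Hypothetical penalty per instruction, standing in for "prefer faster programs".
constexpr double kLengthPenalty = 0.1;

// The three possible toy moves: compare-exchange (0,1), (0,2) or (1,2).
const std::array<Instr, 3> kMoves = {{{0, 1}, {0, 2}, {1, 2}}};

// Execute a toy program on three registers.
std::array<int, 3> run(const Program& prog, std::array<int, 3> regs) {
    for (const Instr& ins : prog) {
        if (regs[ins.i] > regs[ins.j]) std::swap(regs[ins.i], regs[ins.j]);
    }
    return regs;
}

// Reward: +1 for each permutation of {0, 1, 2} that ends up sorted, minus a
// length penalty. Programs only swap values, so "sorted" means "correct".
double reward(const Program& prog) {
    std::array<int, 3> input = {0, 1, 2};
    int correct = 0;
    do {
        std::array<int, 3> out = run(prog, input);
        if (std::is_sorted(out.begin(), out.end())) ++correct;
    } while (std::next_permutation(input.begin(), input.end()));
    return correct - kLengthPenalty * static_cast<double>(prog.size());
}

// Brute-force stand-in for the search: try every program of up to max_len
// moves and keep the first one with the highest reward.
Program best_program(std::size_t max_len) {
    Program best;
    double best_reward = reward(best);
    for (std::size_t len = 1; len <= max_len; ++len) {
        std::size_t total = 1;
        for (std::size_t k = 0; k < len; ++k) total *= kMoves.size();
        for (std::size_t code = 0; code < total; ++code) {
            // Decode `code` as a base-3 number: one digit per move.
            Program prog;
            std::size_t c = code;
            for (std::size_t k = 0; k < len; ++k) {
                prog.push_back(kMoves[c % kMoves.size()]);
                c /= kMoves.size();
            }
            const double r = reward(prog);
            if (r > best_reward) {
                best_reward = r;
                best = prog;
            }
        }
    }
    return best;
}
```

Calling `best_program(3)` returns a three-instruction program that sorts all six orderings. No two-instruction program can do so, because two comparisons can distinguish at most four cases, so the length penalty and the correctness reward together steer the search to the shortest correct sequence.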
As you’ve probably guessed, AlphaDev won the game. But the system didn’t only find a correct and faster program. It also discovered novel approaches to the task.
The new algorithms contained instruction sequences that saved a single instruction each time they were applied. Dubbed “swap and copy moves,” these shortcuts made the algorithms more efficient.
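The general principle behind saving an instruction can be shown with a simplified example of our own; it is not AlphaDev's actual instruction sequence. If an earlier step has already guaranteed that `b <= c`, then `c` can never be the strict minimum, so the smallest of `a`, `b` and `c` needs one `std::min` instead of two. Both function names are hypothetical.

```cpp
#include <algorithm>

// General case: the minimum of three values takes two min operations.
int min_of_three(int a, int b, int c) {
    return std::min(std::min(a, b), c);
}

// Hypothetical specialised case: a previous step already ensured b <= c,
// so c is redundant and a single min operation suffices.
int min_of_three_given_b_le_c(int a, int b, int c) {
    (void)c;  // unused: the precondition b <= c makes it unnecessary
    return std::min(a, b);
}
```

Both functions agree whenever `b <= c` holds. In a routine that runs very frequently, dropping one operation per call in this way adds up.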
DeepMind compares the approach to another moment in games: the fabled “move 37,” which an AI system played against Go champion Lee Sedol.
The strange move shocked human experts, who thought the machine had made a mistake. But they soon discovered that the program had a plan.
“It ended up not just winning the game, but also influencing the strategies that professional Go players started using,” said Mankowitz.
The 2016 victory marked the first time an AI had beaten a top-ranked Go professional, a milestone that experts had predicted was still a decade away.
Three years later, Lee retired from professional Go competition. He attributed the decision to the abilities of his AI rivals.
“Even if I become the number one, there is an entity that cannot be defeated,” he said.
Sorting out computing
AlphaDev’s sorting algorithms have now been open-sourced in LLVM’s libc++, one of the main implementations of the C++ standard library, where they’re available to millions of developers and companies. According to DeepMind, this is the first change to this part of the sorting library in over a decade, and the first algorithm designed through reinforcement learning to join the library.
After the sorting game, AlphaDev began to play with hashing, which is used to retrieve, store, and compress data. The result was another enhanced algorithm, which has now been released in the open-source Abseil library. DeepMind estimates that it’s being used trillions of times a day.
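For readers unfamiliar with hashing, here is a minimal sketch of the idea using the well-known 64-bit FNV-1a function. This is not AlphaDev's function or Abseil's hash; it only shows the general mechanism, in which a key is mapped to a fixed-size number and that number picks a slot in a table for storing or retrieving data. The names `fnv1a64` and `bucket_for` are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <string_view>

// 64-bit FNV-1a: XOR each byte into the state, then multiply by the FNV prime.
// A simple, well-known hash, used here purely for illustration.
std::uint64_t fnv1a64(std::string_view key) {
    std::uint64_t h = 14695981039346656037ULL;  // FNV-1a 64-bit offset basis
    for (unsigned char byte : key) {
        h ^= byte;
        h *= 1099511628211ULL;  // FNV-1a 64-bit prime; wraps modulo 2^64
    }
    return h;
}

// Map a key to one of num_buckets slots, as a hash table would when
// storing or looking up the entry for that key.
std::size_t bucket_for(std::string_view key, std::size_t num_buckets) {
    return static_cast<std::size_t>(fnv1a64(key) % num_buckets);
}
```

Because the same key always hashes to the same bucket, a table can find an entry without scanning every item. That is why the speed of the hash function itself matters when it runs as often as DeepMind describes.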
Ultimately, the lab envisions AlphaDev as a step towards transforming the entire computing ecosystem. And it all began with playing board games.