How DeepMind’s AlphaTensor AI Devised a Faster Matrix Multiplication
After developing an artificial intelligence that can achieve superhuman mastery of games like chess and go, in addition to another AI that can predict how proteins fold themselves in three-dimensional space, the researchers over at DeepMind have done it again — this time using a deep learning AI model to efficiently solve a fundamental mathematics problem, while beating a 50-year-old record to boot.
In a blog post from earlier this month, the DeepMind team introduces AlphaTensor, an AI system that is designed for discovering new and more efficient algorithms for solving crucial mathematical operations — in this case, matrix multiplication.
Whether they are used to process or compress images or video, recognizing spoken commands, or running simulations to predict the weather, matrix multiplication underpins much of modern computing.
So it’s little wonder that experts and companies all over the world are constantly looking for more efficient ways to improve the algorithms for solving these mathematical operations behind such tasks.
Matrix multiplication is one of the simplest mathematical operations in algebra, where individual numbers that are arranged in grids — or matrices — are multiplied together and then added in specific way in order to generate a new matrix.
Such matrices are used to represent various types of data like sets of pixels in an image, or the internal functioning of an artificial neural network.
For centuries, mathematicians used what they believed was the most efficient method, until 1969, when German mathematician Volker Strassen rocked the math world with an even better method that could multiply a pair of 2×2 matrices using seven multiplications, rather than the standard eight.
For more than fifty years, Strassen’s record endured, but DeepMind’s AlphaTensor was able to show that it could discover even more efficient methods on its own.
In fact, the team approached the matrix multiplication problem like a game, with AlphaTensor being built upon the lessons learned from its game-playing predecessor, AlphaZero.
Both models use a type of machine learning called reinforcement learning, in addition to Monte Carlo tree search (MCTS) techniques, so that the system can gradually teach itself to improve as it receives feedback from previous “moves” as it plays the “game” — whether that might be chess, or multiplying matrices.
In the case of AlphaTensor, the team reformulated the problem of finding efficient algorithms matrix multiplication as a single-player game, where the “board” is translated as a three-dimensional array of numbers.
In order to arrive at the aim of getting all the numbers to zero in the fewest number of steps, the model has to fill in the grid of numbers correctly, selecting from a collection of allowable moves. This ultimately results in what the team says is “a provably correct matrix multiplication algorithm for any pair of matrices, and its efficiency is captured by the number of steps taken to zero out” the entries in the output matrix.
Every time the system does well, its internal parameters are updated to reinforce its future chances at succeeding again. At the same time, the Monte Carlo tree search technique helps it to predict how successful different pathways toward possible solutions might be, so that more advantageous paths are prioritized, and the results of games are fed back into the network in order to improve the system further.
“We trained an AlphaTensor agent using reinforcement learning to play the game, starting without any knowledge about existing matrix multiplication algorithms,” explained the team.
“Through learning, AlphaTensor gradually improves over time, rediscovering historical fast matrix multiplication algorithms such as Strassen’s, eventually surpassing the realm of human intuition and discovering algorithms faster than previously known.”
The team underscored the difficulty of the seemingly simple problem of multiplying two matrices together: “This game is incredibly challenging — the number of possible algorithms to consider is much greater than the number of atoms in the universe, even for small cases of matrix multiplication.
Compared to the game of Go, which remained a challenge for AI for decades, the number of possible moves at each step of our game is 30 orders of magnitude larger (above 10^33 for one of the settings we consider). Essentially, to play this game well, one needs to identify the tiniest of needles in a gigantic haystack of possibilities.”
During their experiments in testing input matrices up to 5×5, the team found that AlphaTensor not only “rediscovered” previously shown shortcuts in matrix multiplication, but also found new ways to efficiently perform these calculation.
For example, AlphaTensor was able to find an algorithm for multiplying a 4×5 matrix by a 5×5 matrix that took only 76 multiplications, besting a previous algorithm that required 80.
With a larger set of 11×12 and a 12×12 matrices, AlphaTensor was able to reduce the number of required multiplications from 1,022 to 990. AlphaTensor can also optimize matrix multiplication for specific hardware, with the team training the system on two different processors so that performance was optimized for each processor.
Ultimately, the team believes that the new work could have big implications on a variety of fields, from mathematics research to computing.
“From a mathematical standpoint, our results can guide further research in complexity theory, which aims to determine the fastest algorithms for solving computational problems. By exploring the space of possible algorithms in a more effective way than previous approaches, AlphaTensor helps advance our understanding of the richness of matrix multiplication algorithms.
Understanding this space may unlock new results for helping determine the asymptotic complexity of matrix multiplication, one of the most fundamental open problems in computer science. Because matrix multiplication is a core component in many computational tasks, spanning computer graphics, digital communications, neural network training, and scientific computing, AlphaTensor-discovered algorithms could make computations in these fields significantly more efficient.”