
MIT’s Proxyless Algorithm Optimizes and Automates AI Design

The idea of using artificial intelligence to optimize the design of AI is an exciting one that could help democratize AI, allowing a broader swath of both researchers and laypeople to design more efficient neural network architectures.
Apr 19th, 2019 8:47am by

Researchers and businesses alike are benefiting from deep learning, an advanced type of machine learning that uses neural networks to help supercharge natural language processing, unlock medical mysteries, and make self-driving cars smarter. But developing these deep learning neural networks can take a lot of time and computational power when they are engineered from scratch by humans, which raises the question: What if we could use AI to automate the development of AI?

Known as neural architecture search (NAS), this automated approach to using AI to create better artificial neural networks is a new area of research that could yield big results. With the aim of streamlining the development process and making it accessible to a wider range of people, a team from MIT is unveiling an algorithm designed to automatically create more effective neural network architectures without the need for large numbers of specialized processors such as graphics processing units (GPUs), and without having to run the search on a smaller stand-in dataset first.

Using AI to Optimize AI

Automated machine learning — of which neural architecture search is a subset — can be used to develop systems like those that deal with images. One recent example of this is Google’s state-of-the-art NAS algorithm for image classification and detection that took hundreds of GPUs running in parallel and 48,000 GPU hours to generate one convolutional neural network (CNN).

In contrast, the MIT team outlines in its paper how its proxyless neural architecture search algorithm cut the automated design process down to only 200 GPU hours by directly learning specialized convolutional neural networks that are tailored to run on specific hardware. According to the team, this approach drastically reduces the amount of repetitive work associated with tweaking neural network architectures.

“There are all kinds of tradeoffs between model size, inference latency, accuracy, and model capacity,” MIT electrical engineering and computer science professor Song Han told IEEE Spectrum. “[These] all add up to a giant design space. Previously people had designed neural networks based on heuristics. Neural architecture search tried to free this labor-intensive, human heuristic-based exploration [by turning it] into a learning-based, AI-based design space exploration. Just like AI can [learn to] play a Go game, AI can [learn how to] design a neural network.”

The difficulty is that a network can be configured in an enormous number of ways, and the full set of possible architectures, known as the “search space,” is expensive to explore. To get around this, especially with larger datasets, experts will run a neural architecture search on a smaller dataset as a stand-in, or proxy, before transferring the resulting architecture to the actual task. There are downsides to this proxy approach, however, such as diminished accuracy and efficiency, particularly when the architecture runs on systems that it wasn’t originally designed for, such as mobile platforms.

“Pruning” Neural Networks

To create their proxyless neural architecture search, the MIT researchers first created an “over-parameterized” neural network with all its possible “candidate paths.” These paths connect the various computational “layers” and “filters” in the network, which act to parse any given image, grid by grid, into a more condensed form that is more easily analyzed by computers.
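To make the idea of “candidate paths” concrete, here is a minimal Python sketch of a single over-parameterized layer. It is an illustration only, not the MIT team’s code: the operation names, the `alphas` parameters, and both helper functions are hypothetical. It assumes each candidate path gets a learnable score, and that a softmax turns those scores into path probabilities.

```python
import math

# Hypothetical candidate operations for a single layer (names are illustrative,
# not taken from the MIT paper).
CANDIDATE_OPS = ["conv_3x3", "conv_5x5", "conv_7x7", "skip"]


def path_probabilities(alphas):
    """Softmax over per-path scores: returns one probability per candidate path."""
    m = max(alphas)  # subtract the max for numerical stability
    exps = [math.exp(a - m) for a in alphas]
    total = sum(exps)
    return [e / total for e in exps]


def count_single_path_networks(num_layers, num_candidates):
    """How many distinct networks result from picking one path in every layer."""
    return num_candidates ** num_layers


# Before any training, every path is assumed to be equally likely.
alphas = [0.0] * len(CANDIDATE_OPS)
probs = path_probabilities(alphas)

for op, p in zip(CANDIDATE_OPS, probs):
    print(f"{op}: {p:.2f}")

# Even a modest, made-up configuration hides a very large search space.
print(count_single_path_networks(num_layers=20, num_candidates=4))
```

The second function shows why the search space grows so quickly: the number of possible networks multiplies with every added layer.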

The team then trained their NAS algorithm on an image classification task, using a selection of large datasets consisting of millions of images. But instead of storing all of these possible paths in memory, they used “path-level binarization” and “path-level pruning” to reduce memory consumption: only one sampled path is kept in memory at a time, while low-probability paths are trimmed away. Another big improvement is that the team’s algorithm is “hardware-aware,” meaning that the latency the resulting CNNs would incur on a given platform is used as feedback to further optimize the architecture. In testing their algorithm, the researchers found that the generated CNNs were 1.8 times faster than those produced using conventional methods.
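The three ideas in this passage (sampling one path at a time, pruning unlikely paths, and feeding latency back into training) can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the researchers’ implementation: all function names, the per-path latency figures, and the `latency_weight` value are made up, and the loss is assumed to be the task loss plus a weighted expected latency.

```python
import math
import random


def softmax(alphas):
    """Turn per-path scores into probabilities."""
    m = max(alphas)
    exps = [math.exp(a - m) for a in alphas]
    total = sum(exps)
    return [e / total for e in exps]


def sample_binary_gates(probs, rng):
    """Path-level binarization: activate exactly one path, chosen by probability."""
    chosen = rng.choices(range(len(probs)), weights=probs, k=1)[0]
    return [1 if i == chosen else 0 for i in range(len(probs))]


def expected_latency_ms(probs, latencies_ms):
    """Probability-weighted latency of one layer on the target hardware."""
    return sum(p * lat for p, lat in zip(probs, latencies_ms))


def hardware_aware_loss(task_loss, probs, latencies_ms, latency_weight):
    """Assumed objective: task loss plus a penalty for slow architectures."""
    return task_loss + latency_weight * expected_latency_ms(probs, latencies_ms)


def prune_to_best_path(probs):
    """Path-level pruning: keep only the index of the most probable path."""
    return max(range(len(probs)), key=lambda i: probs[i])


rng = random.Random(0)                  # seeded so the run is repeatable
alphas = [1.2, 0.3, -0.5, 0.1]          # hypothetical learned path scores
latencies_ms = [4.0, 7.5, 12.0, 0.5]    # made-up per-path latencies on one device

probs = softmax(alphas)
gates = sample_binary_gates(probs, rng)  # only the gated path needs to be in memory
loss = hardware_aware_loss(0.9, probs, latencies_ms, latency_weight=0.05)
best = prune_to_best_path(probs)

print("gates:", gates)
print("loss with latency penalty:", round(loss, 3))
print("path kept after pruning:", best)
```

Because only the path whose gate is 1 is evaluated in a given step, memory use stays close to that of training a single ordinary network rather than every candidate at once.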

The idea of using AI to optimize the design of AI is an exciting one that could help democratize AI, allowing a broader swath of both researchers and laypeople to design more efficient neural network architectures that can run lightning-fast on particular kinds of hardware, thus accelerating innovation in a wider range of fields.

Read the paper here.

Images: MIT
