MIT Devises a Photonic Processor for Building Optical Neural Networks

As technology in the realm of artificial intelligence has progressed, so has the demand for greater and more energy-efficient computing power. Limited by what is called the von Neumann bottleneck, the capabilities of conventional computer architectures — even those found in supercomputers — will soon be outstripped by the memory and bandwidth requirements of neural networks, which underpin deep learning applications such as those found in voice-recognition and image-classification software.
But that problem may be solved by using light, instead of electrons, to power deep learning computations. A team of Massachusetts Institute of Technology researchers has demonstrated a prototype of what they call a programmable nanophotonic processor, capable of quickly and efficiently carrying out the many repeated multiplications of matrices (arrays of numbers) required in deep learning tasks, using much less power.
The researchers’ findings, which were published in a recent issue of Nature Photonics, detail the potential advantages of a light-based neural network system over one that utilizes conventional CPU or GPU chips.
“The natural advantage of using light to do matrix multiplication plays a big part in the speed up and power savings because dense matrix multiplications are the most power hungry and time-consuming part in AI algorithms,” said MIT postdoctoral researcher Yichen Shen, one of the paper’s co-authors.
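To see why matrix multiplication dominates, note that a dense neural-network layer is essentially a matrix-vector multiplication followed by an activation. A minimal NumPy sketch (the layer sizes here are arbitrary, chosen only for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# A dense layer computes: output = activation(W @ x + b).
# The matrix multiply W @ x dominates the cost as layers grow,
# which is the operation the photonic processor accelerates.
W = rng.standard_normal((4, 8))   # weights: 4 outputs, 8 inputs (illustrative sizes)
b = rng.standard_normal(4)        # biases
x = rng.standard_normal(8)        # input vector

z = W @ x + b                     # the dense matrix multiplication
y = np.maximum(z, 0.0)            # ReLU nonlinearity

print(y.shape)  # (4,)
```

For a layer with m outputs and n inputs, the multiply alone costs m x n multiply-accumulate operations per input vector, which is where electronic chips spend most of their time and energy.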
An Optical Neural Network
The researchers’ prototype for a programmable nanophotonic processor functions the way an ordinary glass lens might decompose a beam of light into its constituent wavelengths. The team’s photonic device features a cascaded array of 56 programmable Mach–Zehnder interferometers in a silicon photonic integrated circuit. It’s capable of funneling light through its interconnected array of photonic waveguides, which can be modified as needed, in order to allow a set of beams to be ‘programmed’ for a certain matrix computation. The multiple beams then interact and produce interference patterns that transmit the intended operation.
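A mesh of interferometers like this naturally implements a unitary transformation of the incoming light. An arbitrary weight matrix can then be handled via its singular value decomposition, M = U Σ V†: the two unitary factors map onto programmed interferometer meshes, and the diagonal Σ onto per-channel gain or attenuation. A hedged NumPy sketch of that decomposition (the optical hardware itself is not modeled here):

```python
import numpy as np

rng = np.random.default_rng(1)

# An arbitrary real weight matrix (illustrative size).
M = rng.standard_normal((4, 4))

# SVD: M = U @ diag(s) @ Vh.  U and Vh are unitary (orthogonal for
# real matrices), so each could in principle be realized by a
# programmed interferometer mesh; diag(s) corresponds to per-channel
# amplification or attenuation.
U, s, Vh = np.linalg.svd(M)

# The factors recompose the original matrix exactly.
M_rebuilt = U @ np.diag(s) @ Vh
print(np.allclose(M, M_rebuilt))  # True

# Unitarity check: U^T U = I.
print(np.allclose(U.T @ U, np.eye(4)))  # True
```

This is only the mathematical skeleton: programming the chip amounts to choosing phase settings for each Mach–Zehnder interferometer so that the mesh as a whole realizes the desired unitary.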
On a larger scale, the team notes that these devices could be arranged in an interleaved configuration, in order to perform what’s called a nonlinear activation function, similar to the way human neurons operate. The digital network of a typical computer chip circuit operates on the principle of a linear activation function, which defines the output of each node in one of two binary states, “on” or “off.”
However, in emulating the way biological neural networks operate, artificial neural networks rely on what’s called a nonlinear activation function, which allows these networks to compute problems efficiently, using only a small number of nodes — something that would no doubt be useful as computational problems get more complex.
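The need for nonlinearity can be seen directly: a stack of purely linear layers collapses to a single matrix, so depth buys nothing without a nonlinear activation between layers. A small worked example with fixed matrices (chosen only so the arithmetic is easy to follow):

```python
import numpy as np

# Fixed small matrices so the arithmetic can be checked by hand.
W1 = np.array([[1.0, -1.0],
               [2.0,  0.0]])
W2 = np.array([[0.0, 1.0],
               [1.0, 1.0]])
x = np.array([1.0, 2.0])

# Two linear layers collapse into one: W2 @ (W1 @ x) == (W2 @ W1) @ x.
linear_stack = W2 @ (W1 @ x)      # [2., 1.]
collapsed = (W2 @ W1) @ x         # [2., 1.]
print(np.allclose(linear_stack, collapsed))  # True

# A ReLU between the layers breaks that collapse.
relu = lambda v: np.maximum(v, 0.0)
nonlinear_stack = W2 @ relu(W1 @ x)          # relu([-1., 2.]) = [0., 2.] -> [2., 2.]
print(np.allclose(nonlinear_stack, collapsed))  # False
```

This is why the interleaved nonlinear units matter: without them, an optical mesh of any depth would still only implement one matrix multiplication.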
“This chip, once you tune it, can carry out matrix multiplication with, in principle, zero energy, almost instantly,” said MIT professor Marin Soljačić. “We’ve demonstrated the crucial building blocks but not yet the full system.”

Image showing the general architecture of an optical neural network. a) A general artificial neural network architecture composed of an input layer, a number of hidden layers, and an output layer. b) Decomposition of the general neural network into individual layers. c) Optical interference and nonlinearity units that compose each layer of the artificial neural network.
During their experiments, the team found that the programmable nanophotonic processor could perform the calculations found in standard artificial intelligence algorithms roughly two orders of magnitude faster than standard electronic chips, while using less than one-thousandth as much energy per operation. The team used their processor for a basic neural network that recognized only four vowel sounds. Even with this relatively simple model, the processor achieved a 77 percent accuracy rate, compared with about 90 percent for conventional systems, which use far more power and take more time to perform the same learning tasks.
The team envisions that the nanophotonic processor could be scaled up and used in places like data centers or security systems, as well as autonomous cars and drones — any setting where massive amounts of computation must be performed under power and time constraints. The device could also enable more efficient analog signal processing during the transmission of many kinds of data.
“High-speed analog signal processing is something this could manage,” said MIT professor Dirk Englund, one of the paper’s authors. “This approach could do processing directly in the analog domain.”
Feature image via Pixabay, other images from MIT.