IBM Research, with the help of the University of Texas at Austin and the University of Maryland, has created a technology called BlockDrop that promises to speed convolutional neural network operations without any loss of fidelity. This could further accelerate the use of neural nets, particularly in places with limited computing capability.
BlockDrop works by looking for layers in deep networks that aren’t necessary to compute to achieve the desired level of accuracy, and then dropping those layers on the fly, allowing the system to allocate resources more efficiently. In tests, BlockDrop sped neural net image recognition on a standard ImageNet dataset by 20 percent on average, and in some cases by 36 percent, according to IBM.
While more organizations are delving into neural nets for image recognition and associated tasks, such as autonomous vehicle navigation, the computational requirements of such deep learning can increase exponentially as greater accuracy and definition are demanded, taxing available resources. Moreover, more and more of this work is expected to be handled by platforms with limited computational resources, such as edge computing nodes and mobile devices.
There is a need for more research on improving the scalability of artificial intelligence and machine learning technologies, Feris said in an interview with The New Stack. Today’s deep learning systems tend to take a one-size-fits-all approach, no matter the complexity or simplicity of the image itself. Neural networks “learn” to recognize objects by passing data through a series of nodes to compare an image with a model of an object. But researchers found that neural networks don’t necessarily require the same number of nodes for each job.
“If you have a very simple image to process, say a dog on a clean background, do we really need to run 100 layers of a neural network to reach a decision” about whether the object in the image is, in fact, a dog? Feris asked.
Some work has been done to streamline neural networks, though much of it has relied on data compression, an approach that still requires the entire job to be carried out. Instead, BlockDrop determines the minimal configuration of layers, or blocks, needed to correctly classify a given input image, removing those blocks that don’t uniquely encode meaningful visual information. The simpler the image, the more layers can be removed and the more time is saved.
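To make the idea concrete, here is a minimal sketch in Python of per-input block dropping. It is not IBM's implementation. It assumes residual-style blocks, so that a skipped block simply passes its input through unchanged, and it replaces the system's block-selection step with a hypothetical hand-written rule (`toy_policy`) that treats the spread of input values as a proxy for image complexity. All function names here are illustrative.

```python
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Block = Callable[[Vector], Vector]


def make_block(scale: float) -> Block:
    """Toy block (assumption): returns a residual update for its input."""
    def block(x: Vector) -> Vector:
        return [scale * v for v in x]
    return block


def toy_policy(x: Vector, num_blocks: int) -> List[bool]:
    """Hypothetical stand-in for the block-selection step.

    Uses the spread of input values (assumed to lie in [0, 1]) as a crude
    complexity score and keeps more blocks for more complex inputs.
    """
    complexity = max(x) - min(x)
    keep = min(num_blocks, max(1, round(complexity * num_blocks)))
    return [i < keep for i in range(num_blocks)]


def forward(x: Vector, blocks: Sequence[Block],
            keep_mask: Sequence[bool]) -> Tuple[Vector, int]:
    """Run only the kept blocks; a dropped block acts as an identity."""
    executed = 0
    for block, keep in zip(blocks, keep_mask):
        if keep:
            update = block(x)
            x = [a + b for a, b in zip(x, update)]
            executed += 1
    return x, executed


blocks = [make_block(0.1) for _ in range(4)]
simple_image = [0.5, 0.5, 0.5, 0.5]   # flat input, e.g. a clean background
complex_image = [0.0, 1.0, 0.2, 0.9]  # high-contrast input

for name, image in [("simple", simple_image), ("complex", complex_image)]:
    mask = toy_policy(image, len(blocks))
    _, executed = forward(image, blocks, mask)
    print(f"{name}: ran {executed} of {len(blocks)} blocks")
```

In this toy version the flat input runs a single block while the high-contrast input runs all four, which mirrors the article's point that simpler images leave more of the network unused.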
The researchers note in the paper that this approach roughly mimics how the human mind works. “An important feature of the human perception system is its ability to adaptively allocate time and scrutiny for visual recognition. For example, a single glimpse is sufficient to recognize some objects and scenes, whereas more time and attention is required to clearly understand occluded or complicated ones,” they write.
Feris’ work is one of a number of initiatives IBM is taking to speed AI-based computer vision operations. At CVPR, company researchers will also present the performance results of an experimental system, based on IBM TrueNorth neuromorphic chips, that ties together a pair of vision sensors to act like a set of eyes. This system, the researchers claim, requires 200 times less power per pixel than comparable systems using traditional hardware.