This Robot Can Visualize Its Immediate Future with Deep Learning

An infant pushing, pulling, and grasping objects may not seem to be doing much. But in fact, babies learn immensely when they engage in this kind of unsupervised experimentation, teaching themselves about the physical environment around them and how to manipulate the objects they encounter. Human children are then able to take these lessons and apply them to new, unfamiliar situations, imagining and predicting how those situations might be solved using what they have learned previously.
While this kind of general, unsupervised learning and predictive anticipation of cause and effect is easy enough for humans to master, it is something that machines find difficult to do. To help tackle this problem, researchers at the University of California, Berkeley's Artificial Intelligence Research lab (BAIR) are developing a deep learning technique they call visual foresight. The technology would imbue robots with a kind of short-term imagination, allowing them to predict the potential outcomes of certain actions based on what they have gleaned in previous unsupervised learning situations: in this case, moving a variety of objects around on their own.
Robotic Visualization
While this new robotic foresight doesn't reach very far, only a few seconds into the future, it is a relatively big leap ahead, as it permits robots to manipulate objects without human intervention or any prior knowledge of the objects or the environment itself. Typically, to get a machine to move objects around, conventional methods require human programmers to supply some kind of labeling information identifying each object, so that the machine can interact with the objects in its environment. While this approach may be sufficient in closed settings like factories, it is impractical to scale up to larger, real-world situations that may require some quick thinking ahead, such as those a self-driving vehicle might encounter on the road.
“In the same way that we can imagine how our actions will move the objects in our environment, this method can enable a robot to visualize how different behaviors will affect the world around it,” said Sergey Levine, an engineering and computer science assistant professor who supervised the research. “This can enable intelligent planning of highly flexible skills in complex real-world situations.”
The team’s findings describe how the robot, named Vestri, is put through an independent phase of “play,” where it is given a random collection of objects to push around on a table. All of the action is captured by Vestri’s camera.
During this play phase, which lasted about a week and consisted of over 59,000 interactions with various objects, the robot used a deep learning technique called convolutional recurrent video prediction, or dynamic neural advection (DNA), to build a predictive model of its world. The model operates directly on image pixels to predict future scenes of motion, “imagining” different outcomes for different actions. Building on the robot’s experiences during self-play, these models can then generalize that knowledge to unfamiliar objects, permitting the robot to perform specified tasks and physically handle new, never-before-seen items without the need for cumbersome labels or specialized coding.
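Conceptually, this kind of planning resembles a sampling-based model-predictive control loop wrapped around the learned video-prediction model: propose candidate action sequences, “imagine” where each would move the object in pixel space, and execute the first action of the best sequence. The sketch below is only a toy illustration of that idea, not the team's actual implementation; the function names (predict_pixel_positions, plan_push) are hypothetical, and a trivial stand-in replaces the trained DNA network.

```python
# Toy sketch of a pixel-based "visual foresight" planning loop.
# Illustrative only: predict_pixel_positions is a stand-in for the learned
# video-prediction network described in the article, and all names here
# are hypothetical.
import numpy as np

rng = np.random.default_rng(0)

def predict_pixel_positions(start_pixel, actions):
    """Stand-in for the learned model: given the pixel currently covering
    the object and a sequence of planar push actions, predict where that
    pixel ends up after each action. A real system would instead roll out
    a video-prediction network trained on the robot's play data."""
    positions = [np.asarray(start_pixel, dtype=float)]
    for a in actions:
        # Toy dynamics: the object pixel drifts roughly in the push direction.
        positions.append(positions[-1] + a + rng.normal(scale=0.5, size=2))
    return np.stack(positions[1:])

def plan_push(start_pixel, goal_pixel, horizon=5, n_samples=200):
    """Sample candidate action sequences, 'imagine' their outcomes with the
    predictive model, and return the first action of the best sequence
    (a simple model-predictive-control loop)."""
    best_cost, best_first_action = np.inf, None
    for _ in range(n_samples):
        actions = rng.uniform(-1.0, 1.0, size=(horizon, 2))  # planar pushes
        predicted = predict_pixel_positions(start_pixel, actions)
        cost = np.linalg.norm(predicted[-1] - goal_pixel)    # final distance to goal
        if cost < best_cost:
            best_cost, best_first_action = cost, actions[0]
    return best_first_action, best_cost

if __name__ == "__main__":
    action, cost = plan_push(start_pixel=(12, 20), goal_pixel=(40, 35))
    print("chosen push:", action, "predicted final distance:", round(cost, 2))
```

Because the cost is measured on predicted pixels rather than on hand-labeled object poses, a loop of this general shape needs no object labels or specialized coding, which is what lets the approach carry over to items the robot has never seen before.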
The possibility of developing an AI that is capable of learning autonomously and “generalizing” those lessons so that they can be applied on a broader scale is the holy grail of current AI research. We’re already seeing glimmers of this generalized, human-like learning in AI that can teach itself to master a complex board game from scratch, or in machine systems that can reliably crack visual puzzles on their own without the benefit of large sets of training data. In this case, we’re seeing how this tabula rasa approach might be applied to predictive models of physical motion in the real world.
“Children can learn about their world by playing with toys, moving them around, grasping, and so forth. Our aim with this research is to enable a robot to do the same: to learn about how the world works through autonomous interaction,” explained Levine. “The capabilities of this robot are still limited, but its skills are learned entirely automatically, and allow it to predict complex physical interactions with objects that it has never seen before by building on previously observed patterns of interaction.”
The team is now working to further develop this video prediction system, which acts as a kind of “visual imagination” for robots, with the aim of enabling robots not just to push objects around, but also to assemble, grasp, and reposition them, as well as to work with malleable objects that can be tied or folded.