Robots are increasingly able to perform tasks that look simple but are actually quite complex. Machines are now taking on activities like making precise surgical cuts, picking-and-placing on the micro-scale, flipping burgers, playing Jenga, and even hugging — all of which require a careful balance between applying the correct amount of physical force and an almost intuitive “feel” for the inherent dynamics as these forces change over time, with different objects and different targets.
Now, a team of engineers from Google, MIT, Columbia University, and Princeton University is turning to artificially intelligent, deep learning neural networks to help robots learn how to grasp and accurately toss arbitrary objects. With the fittingly named TossingBot, the researchers aimed to develop a machine that could pick up unknown objects and toss them into distinct locations, with a minimal amount of training time and human supervision.
As one might expect, tossing a variety of objects accurately takes us humans some practice before we get reasonably proficient at it. For a robot, structuring the way it learns how to throw makes a big difference: rather than learning how to throw every object in existence (now that's an impossibly large dataset if there ever was one), it must learn how to "generalize" and adapt the basic knowledge it does have about throwing to new objects. Put another way, for a robot to throw things reliably and accurately, it must get a good grasp (quite literally) on oddly shaped objects, as well as learn to predict and compensate for unknown dynamics and variability in an unstructured environment.
As detailed in their paper, the team's hybridized approach — which they dubbed "residual physics" — took into account not only the physics and ever-changing aerodynamics behind tossing objects, but also the way objects are initially grasped. Their experimental setup included an industrial-grade UR5 robotic arm, a bin of random objects, and a series of boxes that served as targets for thrown objects, sitting just outside the arm's reach. An overhead camera tracked the arm's progress and accuracy, so that the arm could teach itself to throw and improve its accuracy over time through trial and error.
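The "physics" half of this approach starts from textbook projectile motion: given a target's horizontal distance, ideal ballistics pins down the release speed needed to reach it. The sketch below is a minimal illustration of that idea only — it ignores drag, assumes a fixed 45-degree release angle and equal release and landing heights, and is not the paper's exact model:

```python
import math

G = 9.81  # gravitational acceleration, m/s^2


def release_speed(target_distance_m, release_angle_deg=45.0):
    """Release speed needed for a drag-free projectile to land at the
    given horizontal distance, from the range equation R = v^2 sin(2θ) / g."""
    theta = math.radians(release_angle_deg)
    return math.sqrt(target_distance_m * G / math.sin(2 * theta))


# e.g. a target box roughly 2 m away, thrown at 45 degrees:
v = release_speed(2.0)  # about 4.43 m/s
```

Drag, spin, and the way the object sits in the gripper all push the real landing point away from this ideal prediction — precisely the errors the learned "residual" part of the system is meant to absorb.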
One of the main factors the team focused on was "pre-throw conditions": how an object is picked up and where it is held affect how it is thrown. The team discovered that these pre-throw conditions can make a noticeable difference in whether a toss succeeds, since they shape the object's projectile trajectory. For instance, TossingBot learned to hold asymmetrical objects like bananas or marker pens in a certain way, in order to ensure they don't fly far beyond the intended location.
Prior work on robotic throwing was limited to objects of a certain shape, such as balls or darts, with pre-throw conditions that were manually set up rather than randomized. In contrast, to achieve better accuracy when confronted with a range of variable objects and environments, TossingBot uses a hybrid, physics-based controller that employs analytical models to deliver an initial estimate of how hard and how far to throw something. This initial estimate is then combined with a deep learning model that predicts and compensates for data-driven, "residual" factors, such as aerodynamic drag and projectile velocity (as determined by how the object is picked up and grasped). The robot is then tasked with grasping different objects at different angles, throwing them into a target box, and assessing the accuracy of each attempt via the overhead camera, in order to learn how to throw better next time.
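The "residual physics" combination can be sketched in a few lines: the controller computes an analytical ballistic estimate, then adds a learned correction predicted from the observation. In the toy sketch below, a hardcoded lookup table stands in for the paper's learned network (which predicts residuals from visual input); the object names and correction values are purely illustrative assumptions:

```python
import math

G = 9.81  # gravitational acceleration, m/s^2


def analytic_speed(distance_m, angle_deg=45.0):
    """Ideal drag-free release speed from the projectile range equation."""
    theta = math.radians(angle_deg)
    return math.sqrt(distance_m * G / math.sin(2 * theta))


def predicted_residual(observation):
    """Stand-in for the learned residual model: a small per-object
    velocity correction in m/s. TossingBot predicts this from camera
    images; here it is a hardcoded toy table for illustration."""
    corrections = {"ball": 0.0, "marker": 0.35, "banana": 0.6}
    return corrections.get(observation, 0.0)


def release_velocity(distance_m, observation):
    """Hybrid controller: analytic estimate plus learned residual."""
    return analytic_speed(distance_m) + predicted_residual(observation)
```

The appeal of this split is that the network never has to rediscover Newtonian mechanics from scratch; it only learns the (much smaller) gap between the ideal model and reality, which is what lets the system train in hours rather than weeks.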
In letting TossingBot toss happily away, the team found that the machine initially fared poorly at both grasping objects optimally and throwing them accurately. After about 10,000 training attempts (roughly 14 hours of throwing), however, the bot's throwing accuracy climbed to 85 percent, with grasping reliability at 87 percent, out of 600 possible "pick-and-place" actions in one hour — better results than previous methods, and even better than the average human.
The team is now working to improve the system's accuracy and reliability, and to move beyond purely visual feedback by adding other sensors that gather different kinds of information, such as touch and torque. As it develops further, one can imagine such a system being "generalized" and adapted to other situations: getting a robot to pack boxes, sort laundry, or quickly and efficiently toss collapsed debris during a rescue operation to save lives, along with many other as-yet-unforeseen possibilities.
Images: Google, MIT, Columbia University and Princeton University