Control a Drone with Eye-Tracking Glasses

Controlling a drone isn’t necessarily the most intuitive activity, as many a drone-ravaged wedding (and the subsequent lawsuits) will show. Making drones more autonomous with deep learning and equipping them with safety features can help, but it doesn’t solve the root problem. With other machines, we’ve seen a variety of possible solutions, such as using brain-computer interfaces (BCIs), augmented reality or one’s voice to control devices.
But there are many other possible methods. As a team from the University of Pennsylvania, the U.S. Army Research Laboratory, and New York University is demonstrating, controlling a drone could be as simple as putting on a pair of eye-tracking glasses and deliberately shifting one's gaze to direct the aircraft where it needs to go.
According to the team, the idea was to create an intuitive, non-invasive way for people to remotely control an aerial vehicle. While there have been previous attempts at vision-based drone control, what's different this time around is that this is a standalone system that doesn't rely on external sensors to track the drone relative to the person controlling it.
In addition, navigation is defined relative to the user rather than to the drone, meaning that all points of orientation are taken from the user's standpoint. For instance, if a user tells the drone to go right, it will go to the user's right rather than its own right, which, with the drone facing the user, would otherwise appear as a mirrored movement to the left. As the team has configured the system, all of this wayfinding is done without incorporating external infrastructure such as motion-capture technology or GPS to track the positions of the user and drone relative to each other.
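To make that user-relative convention concrete, here is a minimal sketch of how a "go right" command could be rotated from the user's frame into a world frame using the user's heading. The yaw-only simplification, the frame conventions, and the function names are illustrative assumptions, not the team's actual implementation.

```python
# Minimal sketch (not the team's code): mapping a user-relative command such as
# "go right" into a world-frame offset, given the user's heading from the IMU.
# The forward-left-up user frame and yaw-only rotation are assumptions.
import numpy as np

def user_command_to_world(cmd_user_xyz, user_yaw_rad):
    """Rotate a command expressed in the user's frame (x forward, y left, z up)
    into the world frame using only the user's yaw (heading)."""
    c, s = np.cos(user_yaw_rad), np.sin(user_yaw_rad)
    R = np.array([[c, -s, 0.0],
                  [s,  c, 0.0],
                  [0.0, 0.0, 1.0]])  # rotation about the vertical axis
    return R @ np.asarray(cmd_user_xyz, dtype=float)

# "Go to my right by 1 m" is -y in the assumed forward-left-up user frame;
# the result is the same offset expressed in world coordinates.
offset_world = user_command_to_world([0.0, -1.0, 0.0], user_yaw_rad=np.pi / 2)
```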
According to the research team, the system is built from off-the-shelf components: the Tobii Pro Glasses 2, a gaze-tracking wearable equipped with an inertial measurement unit (IMU) and a high-definition camera, along with an NVIDIA Jetson TX2 CPU/GPU module that processes the data using a deep neural network.
Once the user dons the glasses, the system can detect the drone simply when he or she looks at it. From the apparent size of the quadrotor as seen from the user's position, the processor estimates its approximate relative position, using orientation data from the IMU and the glasses' onboard camera. As the team notes, the IMU also helps determine the user's head orientation, which serves to “decouple the gaze direction from the head motion”, meaning the system works whether the user moves only the eyes or both eyes and head.
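As a rough illustration of how apparent size can yield a range estimate, the pinhole-camera sketch below applies the relation distance ≈ focal_length × real_size / pixel_size. The drone width, focal length, and detection-box interface are placeholder assumptions, not values from the paper.

```python
# Rough sketch (assumed parameters, not the paper's implementation): estimating
# how far away the quadrotor is from its apparent size in the scene camera.
DRONE_WIDTH_M = 0.30     # assumed physical width of the quadrotor (meters)
FOCAL_LENGTH_PX = 900.0  # assumed focal length of the scene camera (pixels)

def estimate_range(bbox_width_px: float) -> float:
    """Approximate range to the drone from the width of its detection box."""
    return FOCAL_LENGTH_PX * DRONE_WIDTH_M / bbox_width_px

print(estimate_range(bbox_width_px=45.0))  # ~6 m for the assumed parameters
```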
“This solution provides the opportunity to create new, non-invasive forms of interactions between a human and robots allowing the human to send new 3D-navigation waypoints to the robot in an uninstrumented environment,” Dr. Giuseppe Loianno, assistant professor of robotics at New York University and director of the GRASP Lab at the University of Pennsylvania, told Digital Trends. “The user can control the drone just pointing at a spatial location using his gaze, which is distinct from the head orientation in our case.”
The team is now working to refine the way the system translates two-dimensional “gaze coordinates” into three-dimensional navigational waypoints. According to the team: “Ideally, the 3D navigation waypoint would come directly from the eye tracking glasses, but we found in our experiments that the depth component reported by the glasses was too noisy to use effectively. In the future, we hope to further investigate this issue in order to give the user more control over depth.”
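For intuition about where depth fits in, the following sketch back-projects a 2D gaze pixel through the scene camera into a 3D waypoint at a hand-picked depth, standing in for the noisy depth the glasses report. The camera intrinsics and the fixed-depth workaround are assumptions for illustration, not the team's method.

```python
# Illustrative sketch only: turning a 2D gaze point (pixels) into a 3D waypoint
# by casting a ray through the camera and fixing the depth by hand, since the
# glasses' own depth estimate is described as too noisy to use directly.
import numpy as np

FX, FY = 900.0, 900.0  # assumed focal lengths (pixels)
CX, CY = 640.0, 360.0  # assumed principal point (pixels)

def gaze_to_waypoint(gaze_px: float, gaze_py: float, depth_m: float) -> np.ndarray:
    """Back-project a gaze pixel into a 3D point in the camera frame at a given depth."""
    x = (gaze_px - CX) / FX
    y = (gaze_py - CY) / FY
    ray = np.array([x, y, 1.0])
    ray /= np.linalg.norm(ray)  # unit ray through the gaze pixel
    return depth_m * ray        # 3D waypoint along that ray

waypoint_cam = gaze_to_waypoint(700.0, 300.0, depth_m=4.0)
```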
The team's ultimate goal is to create human-machine interfaces that are intuitive and responsive, particularly ones that combine multiple modes of interaction, whether through vision, gestures, or voice commands. Besides allowing people with little to no experience to safely pilot drones, such multi-modal interfaces have many potential uses, from making it easier for people with disabilities to control their devices to inspection, law enforcement, and search and rescue missions.
Images: New York University