Deep Learning Drone Detects Fights, Bombs, Shootings in Crowds

Reading the news nowadays can be a harrowing experience that challenges one’s sense of security. We seem to hear more and more about seemingly random outbreaks of violence and horrific acts of terrorism, shootings and bombings in our cities. Even more worryingly, we are also discovering that these incidents are hard to predict and prevent, and even more difficult for law enforcement to stop once they are under way.
Once again, here is where the machines come in. Video surveillance systems have recently been developed for monitoring suspicious and abandoned objects such as bags in public spaces, for spotting incidents of theft, and for detecting fires and smoke.
Now, an international team of researchers from the United Kingdom and India has developed a drone surveillance system that uses computer vision and deep learning AI technology to automatically detect when violence occurs in public places, such as physical fights breaking out among large groups of people.
While such an airborne system would be a practical step up from ground-based video cameras, it would still be hampered by the computational cost of crunching the data gleaned from large numbers of individuals. To address that issue, the team’s preprint paper outlines an improved system for autonomous drone surveillance that works in real time by streamlining the gathering of training data, harnessing the computational power of the cloud, and using deep learning techniques to identify whether an individual is violent or not, based on their movements.
The team was initially inspired to start the project after the Boston Marathon bombing in 2013, but the work stalled after unsatisfactory results. They were spurred on again in 2017 by the Manchester Arena suicide bombing after the Ariana Grande concert, this time incorporating deep learning AI and cloud computing into the project’s development.
“This time we were able to do a relatively better job because the software was able to run in real-time and does a relatively good job of detecting violent individuals,” University of Cambridge Ph.D. student and paper co-author Amarjot Singh told IEEE Spectrum.
Part of the challenge was collecting good training data, as footage from aerial drones can be affected by poor lighting conditions, pixelation and blurring, in addition to the added complexity of human subjects appearing at different scales and orientations, and at different locations. To tackle the problem, the team recruited a group of 25 volunteers to engage in various poses that imitated the motions of punching, kicking, strangling, stabbing and shooting, while being filmed by a Parrot AR drone at varying heights.
These images of human subjects were then annotated with 14 main “pose estimation” labels that would assist the deep learning network in determining whether an individual is engaged in violent behavior (red lines) or not (blue lines). A typical deep learning algorithm would require tens of thousands of annotated images before it could reliably recognize a desired pattern. But instead of going through the time- and labor-intensive process of hand-annotating all these images, the team streamlined the process somewhat by modifying the front-end and back-end layers of the artificial neural network underlying the algorithm, so that it had fixed parameters at the front-end and could learn with a bit of human supervision at the back-end when it came to recognizing different poses. With this approach, the researchers’ ScatterNet Hybrid Deep Learning (SHDL) network requires less data, and can learn faster and with fewer computational resources.
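The split between a fixed front-end and a trainable back-end can be illustrated with a toy Python sketch. This is not the SHDL network: the two hand-picked filters, the 1-D "signals", the logistic-regression back-end and every function name here are assumptions, chosen only to show front-end parameters that never change while a small supervised layer is fitted on top of them.

```python
import math

# Fixed front-end: hand-picked 1-D filters that are never updated.
# (Illustrative stand-ins, NOT the filters used by the researchers.)
FIXED_FILTERS = [
    [1.0, -1.0],  # responds to abrupt changes between neighbours
    [0.5, 0.5],   # local average
]

def front_end(signal):
    """Return one feature per fixed filter: the mean absolute response."""
    features = []
    for filt in FIXED_FILTERS:
        responses = [
            abs(sum(w * signal[i + k] for k, w in enumerate(filt)))
            for i in range(len(signal) - len(filt) + 1)
        ]
        features.append(sum(responses) / len(responses))
    return features

def train_back_end(samples, labels, epochs=200, lr=0.5):
    """Fit logistic-regression weights on top of the frozen features."""
    feats = [front_end(s) for s in samples]  # computed once, never re-learned
    weights = [0.0] * len(FIXED_FILTERS)
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(feats, labels):
            z = sum(w * xi for w, xi in zip(weights, x)) + bias
            err = 1.0 / (1.0 + math.exp(-z)) - y  # predicted prob minus label
            weights = [w - lr * err * xi for w, xi in zip(weights, x)]
            bias -= lr * err
    return weights, bias

def predict(signal, weights, bias):
    """Return 1 if the back-end scores the signal as the positive class."""
    z = sum(w * xi for w, xi in zip(weights, front_end(signal))) + bias
    return 1 if z > 0 else 0

# Made-up toy data: label 1 = jagged signal, label 0 = smooth signal.
SIGNALS = [
    [0, 1, 0, 1, 0, 1, 0, 1],
    [1, 0, 1, 0, 1, 0, 1, 0],
    [0.5] * 8,
    [0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9],
]
LABELS = [1, 1, 0, 0]
WEIGHTS, BIAS = train_back_end(SIGNALS, LABELS)
```

Only `WEIGHTS` and `BIAS` are learned here; `FIXED_FILTERS` stays untouched, which is why a handful of labeled examples is enough for this toy problem.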
The researchers also build upon two existing algorithms to help the system detect violent individuals. First, a feature pyramid network (FPN) detects humans in an aerial image. The team’s SHDL network then estimates the pose of each detected human. Finally, a support vector machine (SVM) identifies which individuals are exhibiting violent behavior. All images recorded by the drone are sent to Amazon’s cloud for real-time processing, meaning that the drones themselves aren’t burdened with these tasks.
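As a rough illustration of the final stage only, the Python sketch below assumes the first two stages have already produced 2-D keypoints for one person, and classifies that pose with a linear SVM trained by sub-gradient descent on the hinge loss. The joint-angle features, the keypoint names, the `make_pose` helper and the synthetic training poses are all hypothetical; the article does not say which features or which kind of SVM the researchers used.

```python
import math

# Hypothetical joints, each given as (end point, vertex, end point).
JOINTS = [("shoulder", "elbow", "wrist"), ("hip", "knee", "ankle")]

def joint_angle(a, b, c):
    """Angle in degrees at vertex b between segments b->a and b->c."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    norm = math.hypot(*v1) * math.hypot(*v2)
    if norm == 0.0:
        return 0.0  # degenerate pose: two keypoints coincide
    cos = (v1[0] * v2[0] + v1[1] * v2[1]) / norm
    return math.degrees(math.acos(max(-1.0, min(1.0, cos))))

def pose_features(pose):
    """One feature per joint: its angle rescaled from [0, 180] to [-1, 1]."""
    return [(joint_angle(pose[a], pose[b], pose[c]) - 90.0) / 90.0
            for a, b, c in JOINTS]

def train_linear_svm(X, y, epochs=200, lr=0.1, lam=0.01):
    """Sub-gradient descent on the L2-regularised hinge loss; y in {-1, +1}."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            w = [wj - lr * lam * wj for wj in w]  # regularisation shrink
            if margin < 1:  # inside the margin: take a hinge-loss step
                w = [wj + lr * yi * xj for wj, xj in zip(w, xi)]
                b += lr * yi
    return w, b

def is_violent(pose, w, b):
    """True if the pose falls on the positive side of the SVM hyperplane."""
    return sum(wj * xj for wj, xj in zip(w, pose_features(pose))) + b > 0

def make_pose(elbow_deg, knee_deg):
    """Build a synthetic stick figure with the given elbow and knee angles."""
    e, k = math.radians(elbow_deg), math.radians(knee_deg)
    return {
        "shoulder": (0.0, 0.0), "elbow": (1.0, 0.0),
        "wrist": (1.0 - math.cos(e), math.sin(e)),
        "hip": (0.0, -2.0), "knee": (0.0, -3.0),
        "ankle": (-math.sin(k), -3.0 + math.cos(k)),
    }

# Made-up labels: +1 = sharply bent arm ("violent"), -1 = nearly straight.
TRAIN = [
    (make_pose(45, 180), 1), (make_pose(60, 100), 1),
    (make_pose(170, 175), -1), (make_pose(150, 170), -1),
]
W, B = train_linear_svm([pose_features(p) for p, _ in TRAIN],
                        [label for _, label in TRAIN])
```

Calling `is_violent(make_pose(45, 180), W, B)` then scores a new pose against the learned hyperplane. A real system would need far richer features and real annotated footage; the labels above are arbitrary and exist only to make the sketch runnable.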
According to the team’s tests, the system’s accuracy starts at 94 percent when identifying one violent individual in an image but slips to 84 percent when there are five violent individuals in one image. The drop may be due to the system classifying poses incorrectly, or to problems with collecting data from groups of people that are too far from the drone.
Currently, the team’s dataset of images focuses solely on hand-to-hand combat, which may look very different from one individual to another, though there are plans to expand the system’s capabilities to recognize individuals armed with weapons like guns and knives, and to track people who are carrying unusual objects and acting suspiciously. But it’s not going to be left up to the machines to decide who’s dangerous and who’s not: the team’s ultimate aim is to develop a more automated system that will make it easier for human operators not only to zero in on violent activity in a crowd, but also to identify illegal border crossings and even recognize kidnappers, potentially saving lives in a quicker and more efficient fashion.