This Tool Defends AI Models Against Adversarial Attacks
The number of potential applications for machine learning has grown tremendously in the last several years as AI models have become increasingly powerful. Machine learning is already used in many areas of daily life, from recommendation algorithms and self-driving cars to novel applications in fields like research and finance. Even more promising is how machine learning models might someday revolutionize healthcare, and may even help us grapple with enormously complex problems like mitigating climate change.
But despite their great potential, machine learning models are not foolproof and can make mistakes, sometimes with disastrous consequences. Such errors are all the more concerning now that image recognition algorithms are increasingly used to evaluate people’s biometric data. At the same time, it is becoming clear that these same models can be easily deceived when images are modified. Unfortunately, the “black box” nature of AI makes it hard to determine why models make the decisions (or mistakes) that they do, which underscores the importance of making models more robust.
A team of researchers from Kyushu University in Japan is working on just that, developing a new method to assess how neural networks handle unfamiliar elements during image recognition tasks. The technique, dubbed Raw Zero-Shot, could help researchers pinpoint the underlying features that lead AI models to make these errors, and ultimately figure out how to create more resilient models.
“There is a range of real-world applications for image recognition neural networks, including self-driving cars and diagnostic tools in healthcare,” explained the study’s lead author Danilo Vasconcellos Vargas in a statement. “However, no matter how well trained the AI, it can fail with even a slight change in an image.”
Typically, the AI models being used for image recognition are initially trained on a large number of images. While some of these larger models can be quite powerful due to their size, recent work has shown that altering an input image — even by only one pixel — can throw the system off. These intentionally altered images are called adversarial images, and can be used as part of a coordinated attack on AI-powered systems.
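To make the one-pixel idea concrete, here is a minimal sketch in Python. It is a toy, not the attack used in the research: the 3x3 "image", the weights, the bias and the function names are all hypothetical, and the "model" is a hand-made linear classifier chosen so that a single pixel is decisive. The search simply tries pushing each pixel to black or white until the predicted label flips.

```python
# Toy illustration only: a hand-made linear "classifier" over a 3x3 grayscale image.
# WEIGHTS, BIAS and IMAGE are hypothetical values, not taken from any real model.

def predict(image, weights, bias):
    """Return class 1 if the weighted pixel sum plus bias is positive, else class 0."""
    score = sum(w * p for w, p in zip(weights, image)) + bias
    return 1 if score > 0 else 0

def one_pixel_attack(image, weights, bias):
    """Set each pixel in turn to 0.0 or 1.0; return the first change that flips the label."""
    original = predict(image, weights, bias)
    for i in range(len(image)):
        for value in (0.0, 1.0):
            candidate = list(image)
            candidate[i] = value
            if predict(candidate, weights, bias) != original:
                return i, value, candidate
    return None  # no single-pixel change fools this model

WEIGHTS = [0.1, -0.2, 0.1, 0.3, 2.0, -0.1, 0.2, -0.3, 0.1]
BIAS = -0.5
IMAGE = [0.5, 0.5, 0.5, 0.5, 0.1, 0.5, 0.5, 0.5, 0.5]

result = one_pixel_attack(IMAGE, WEIGHTS, BIAS)
if result is not None:
    index, value, adversarial = result
    print(f"pixel {index} set to {value} flips the label "
          f"from {predict(IMAGE, WEIGHTS, BIAS)} to {predict(adversarial, WEIGHTS, BIAS)}")
```

Real image classifiers have far more pixels and are far from this simple, so a brute-force scan like this one would not be practical there. The sketch only shows why one heavily weighted input can be enough to change a prediction.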
“Adversarial samples are noise-perturbed samples that can fail neural networks for tasks like image classification,” explained the team. “Since they were discovered some years ago, both the quality and variety of adversarial samples have grown. These adversarial samples can be generated by a specific class of algorithms known as adversarial attacks.”
To investigate the root cause behind these failures, the team focused on twelve of the most common image recognition systems, testing them to see how they would react when confronted with sample images that were not part of their initial training dataset. The team hypothesized that there would be correlations in their subsequent predictions — that is to say, the AI models would be mistaken, but they would make mistakes in the same way.
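The kind of test described here can be sketched roughly as follows. This is not the team's Raw Zero-Shot implementation, and it assumes that "unfamiliar" means a class the model never saw during training. The classifier is a hypothetical nearest-centroid model over made-up two-dimensional features, and the class names and numbers are invented for illustration. The point is only to show what "being wrong in the same way" could look like: every unseen sample is pushed toward the same known class.

```python
import math
from collections import Counter

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def soft_labels(point, centroids):
    """Soft label over the known classes: softmax of negative squared distance to each centroid."""
    names = sorted(centroids)
    scores = [-sum((a - b) ** 2 for a, b in zip(point, centroids[n])) for n in names]
    return dict(zip(names, softmax(scores)))

# Hypothetical "trained" model: one centroid per class seen in training.
CENTROIDS = {"cat": (0.0, 0.0), "dog": (4.0, 0.0), "truck": (0.0, 4.0)}

# Hypothetical samples from a class the model has never seen ("wolf").
UNSEEN_WOLF = [(3.0, 1.0), (3.5, 0.5), (2.8, 0.7), (3.2, 1.2)]

outputs = [soft_labels(p, CENTROIDS) for p in UNSEEN_WOLF]

# How often each known class wins, and the average soft label across unseen samples.
top_choices = Counter(max(o, key=o.get) for o in outputs)
mean_soft = {n: sum(o[n] for o in outputs) / len(outputs) for n in CENTROIDS}

print(top_choices)
print(mean_soft)
```

In this toy setup the model cannot answer "wolf", so it is wrong every time, but it is wrong consistently: all four samples land on the same known class. Comparing such output patterns across models is one plausible way to look for the correlations the team hypothesized.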
The team’s test results ultimately showed that when confronted with these images, the AI models were, in fact, consistently wrong in the same way. The team suggests that the linear structure of some artificial neural networks is one of the main reasons they fail in a similar manner. Other work points to a further factor: such models may be learning “false structures” that are simpler to learn than the ones expected.
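One common reading of the linearity argument, offered here as an assumption rather than as the team's own analysis, is that in a linear model many tiny input changes add up. If every input is nudged by a small amount in the direction that raises the score, the total shift grows with the number of inputs. The function name and the numbers below are hypothetical.

```python
def score_shift(weights, epsilon):
    """Change in a linear score w.x when each input moves by epsilon in the direction sign(w)."""
    return epsilon * sum(abs(w) for w in weights)

EPSILON = 0.01  # a nudge far too small to notice in any single pixel

small = score_shift([0.5] * 10, EPSILON)       # a 10-input model
large = score_shift([0.5] * 10_000, EPSILON)   # a 10,000-input model

print(f"shift with 10 inputs: {small:.2f}")
print(f"shift with 10,000 inputs: {large:.2f}")
```

With the same per-input nudge, the shift is a thousand times larger for the bigger model, which hints at why models with largely linear behavior can share similar failure modes.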
“If we understand what the AI was doing and what it learned when processing unknown images, we can use that same understanding to analyze why AIs break when faced with images with single-pixel changes or slight modifications,” Vargas explained. “Utilization of the knowledge we gained trying to solve one problem by applying it to a different but related problem is known as transferability.”
In the course of their work, the researchers found that an AI model called Capsule Networks (CapsNet) offered the greatest transferability out of all the neural networks tested, while another model called LeNet came in second.
In the end, the team said that AI development should not focus only on accuracy, but also on augmenting the robustness and flexibility of models. According to the team, tools such as Raw Zero-Shot can help experts pinpoint why problems might occur in their models, so that future systems can be designed to be less vulnerable to adversarial attacks.
“Most of these adversarial attacks can also be transformed into real-world attacks,” noted the team, “which confers a big issue, as well as a security risk, for current neural networks’ applications. Despite the existence of many variants of defenses to these adversarial attacks, no known learning algorithm or procedure can defend [against these attacks] consistently. This shows that a more profound understanding of the adversarial algorithms is needed to formulate consistent and robust defenses.”