Google Develops ‘Adversarial Example’ Images that Fool Both Humans and Computers

We know that AI technology is becoming increasingly powerful: it can help us diagnose disease more accurately, help cars recognize pedestrians as they drive themselves and even boost productivity in the construction industry.
But we know that machines, no matter how smart they are, can also make mistakes, sometimes with quite tragic consequences. They can also be fooled intentionally, with so-called adversarial examples: for instance, the computer vision systems in autonomous cars can be deceived by graffiti hidden on road signs into reading them as something else entirely.
But what’s surprising is that humans can apparently fall prey to the same adversarial trickery that confuses artificial neural networks. At least, that’s what Google Brain researchers discovered when they developed an algorithm that generates a layer of extra visual information which, applied over an existing image, leads both machines and humans to misidentify its content. As seen in the main image above, both humans and computers wrongly classified an image of a cat as that of a dog after it was modified by this software.
As previous research has shown, however, it’s not that hard to fool machine learning models, which don’t see the world the same way we humans do. Visual classification models use a type of artificial neural network known as a convolutional neural network (CNN), which can require millions of training images before it can reliably identify an object. Biological vision systems, in contrast, learn from far fewer examples: a human child might be shown only a few different versions of ‘cat’ before making the leap to generalize that all of them fit under the same concept of a ‘cat’, defined by a certain shape to the ears, eyes, body and so on.
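For readers wondering what such a model looks like under the hood, here is a minimal, illustrative convolutional classifier in PyTorch. It is a toy: the layer sizes and class count are arbitrary assumptions, and production models are vastly larger, but the basic pattern of stacked convolutions, pooling and a final classification layer is the same.

```python
import torch.nn as nn

class TinyCNN(nn.Module):
    """A deliberately tiny convolutional image classifier, for illustration only."""
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1),  # learn local edge/texture filters
            nn.ReLU(),
            nn.MaxPool2d(2),                             # downsample, keeping the strongest responses
            nn.Conv2d(16, 32, kernel_size=3, padding=1), # combine them into higher-level patterns
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),                     # pool each feature map to a single value
        )
        self.classifier = nn.Linear(32, num_classes)     # map pooled features to class scores

    def forward(self, x):                                # x: [batch, 3, height, width]
        return self.classifier(self.features(x).flatten(1))
```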
On the other hand, even a well-trained AI model can easily be fooled into making the wrong identification merely by adding a layer of visual “noise” called a perturbation. As seen here in this example of a panda, adding such a perturbation, which is imperceptible to the human eye, can mislead a machine into classifying the image as a gibbon with more than 99 percent confidence.
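The panda example is widely associated with the fast gradient sign method (FGSM), one of the simplest ways to compute such a perturbation: nudge every pixel a tiny step in the direction that most increases the model’s error. The sketch below is a minimal PyTorch version under assumed conditions (pixel values in [0, 1], a hypothetical `model` and `epsilon`); it is not the Google Brain team’s code.

```python
import torch
import torch.nn.functional as F

def fgsm_perturb(model, image, label, epsilon=0.007):
    """Return an adversarially perturbed copy of `image` ([1, 3, H, W], values in [0, 1])."""
    image = image.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(image), label)    # how wrong the model currently is
    loss.backward()                                # gradient of the loss with respect to the pixels
    step = epsilon * image.grad.sign()             # tiny per-pixel step that increases the loss most
    return (image + step).clamp(0.0, 1.0).detach()
```

With a small enough epsilon, the perturbed image looks identical to the original to a human observer yet can flip the model’s prediction.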
Adversarial Examples To Thwart Humans
But as the Google Brain team explained in their paper, humans can be fooled by adversarial examples too, especially ones that have been crafted in a more “robust” and obvious manner, tailored to successfully thwart human vision. Knowing that different computer vision systems with distinct architectures can still be affected by the same adversarial examples, the team’s goal here was to see if this vulnerability in machines could be “transferred” over to the way humans see things.
“We [adapted] machine learning models to mimic the initial visual processing of humans, making it more likely that adversarial examples will transfer from the model to a human observer,” wrote the team. “To better match the initial processing of human visual system, we prepend each model input with a retinal layer, which incorporates some of the transformations performed by the human eye. In that layer, we perform an eccentricity-dependent blurring of the image to approximate the input which is received by the visual cortex of human subjects through their retinal lattice.”
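The paper’s retinal layer is not reproduced here, but a rough sketch of the core idea, eccentricity-dependent blurring, might look like the following: pixels farther from an assumed central fixation point are blurred more heavily, mimicking how acuity falls off away from the fovea. The blur strengths and the banded approximation below are illustrative assumptions, not values from the paper.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def eccentricity_blur(image, sigmas=(0.0, 1.0, 2.0, 4.0)):
    """image: H x W x 3 float array; blur increases with distance from the image center."""
    h, w = image.shape[:2]
    yy, xx = np.mgrid[0:h, 0:w]
    ecc = np.hypot(yy - h / 2, xx - w / 2)       # distance of each pixel from the fixation point
    bands = np.minimum((ecc / ecc.max() * len(sigmas)).astype(int), len(sigmas) - 1)
    out = np.zeros_like(image)
    for i, sigma in enumerate(sigmas):
        # Blur the whole image at this strength, then keep it only in the matching eccentricity band.
        blurred = image if sigma == 0 else gaussian_filter(image, sigma=(sigma, sigma, 0))
        out[bands == i] = blurred[bands == i]
    return out
```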
To achieve this, the researchers played with the edges of objects, softening or enhancing them, altering contrast levels and textures, or modifying the dark areas of an image. Human subjects were shown images grouped into the categories of pets, vegetables and dangerous creatures (spiders, snakes and so on). Each image was either unmodified, modified with an adversarial perturbation layer, or modified with the same perturbation flipped upside down (labeled “flip”), the latter serving as the experiment’s control for whether the adversarial perturbation really had an effect on human vision. Participants had only about 60 to 70 milliseconds to view each image before they had to classify it.
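To make the three viewing conditions concrete, here is a hedged sketch of how the stimuli described above could be assembled from a clean image and a precomputed perturbation. The function name, the vertical flip and the [0, 1] pixel range are assumptions for illustration, not details taken from the paper.

```python
import numpy as np

def make_stimuli(image, perturbation):
    """image, perturbation: H x W x 3 float arrays with pixel values in [0, 1]."""
    return {
        "image": image,                                          # unmodified condition
        "adv":   np.clip(image + perturbation, 0.0, 1.0),        # adversarial perturbation applied
        "flip":  np.clip(image + perturbation[::-1], 0.0, 1.0),  # perturbation flipped upside down (control)
    }
```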
Interestingly enough, humans and computers consistently misclassified images at a higher rate with the upright perturbation layer applied. In this example with a photo of a dog, both machines and people tended to be less accurate in identifying it as such when the adversarial layer (labeled “adv”) was used — a trend that was reflected in the other image groups as well.
“Supernormal Stimuli for Neural Networks”
But these findings are just the tip of the iceberg; more answers are to be found at the intersection of neuroscience and artificial intelligence.
“Our study raises fundamental questions [about] how adversarial examples work, how CNN models work, and how the brain works,” wrote the researchers. “Do adversarial attacks transfer from CNNs to humans because the semantic representation in a CNN is similar to that in the human brain? Do they instead transfer because both the representation in the CNN and the human brain are similar to some inherent semantic representation which naturally corresponds to reality?”
These are fascinating questions, but there are potentially wider implications if adversarial examples are put to more nefarious ends. For instance, deep learning models might be trained to rate how trustworthy a human face looks. In that case, “it might then be possible to generate adversarial perturbations which enhance or reduce human impressions of trustworthiness, and those perturbed images might be used in news reports or political advertising,” said the researchers.
The team also drew parallels between adversarial examples and a naturally occurring phenomenon in which some animals use “supernormal” sensory stimuli to hijack the brain responses of other animals and exploit them. Cuckoo chicks, for instance, do this so that birds of other species will feed them instead of their own chicks.
“Adversarial examples can be seen as a form of supernormal stimuli for neural networks,” said the authors. “A worrying possibility is that supernormal stimuli designed to influence human behavior or emotions, rather than merely the perceived class label of an image, might also transfer from machines to humans.”
Knowing that AI can already be used to create startlingly persuasive fake news videos that fool human viewers, it’s not much of a stretch to imagine AI being employed on an even subtler level to create adversarial images that affect us in more subliminal ways, just below the threshold of awareness: a potentially powerful tool that would be dangerous in the wrong hands.
Images: Google Brain.