
‘Common Sense’ Test Could Lead to Smarter AI

Researchers from IBM, MIT and Harvard University have developed a series of tests that would evaluate an AI's ability to perceive, understand, and judge in a manner that is shared by nearly all humans.
Nov 30th, 2022 7:00am
Images by IBM, MIT, Harvard University.

The goal of achieving what is called artificial general intelligence, meaning the capacity of an engineered system to display human-like general intelligence, is still some way off. Nevertheless, experts in the field have reached some major milestones along the way, including developing AI capable of deep neural reasoning, tactile reasoning, and even AI with rudimentary social skills.

Now, in yet another step toward AI with more human-like intelligence, researchers from IBM, the Massachusetts Institute of Technology and Harvard University have developed a series of tests that would evaluate an AI’s ability to use a machine version of “common sense” — or a basic ability to perceive, understand, and judge in a manner that is shared by nearly all humans.

For most people, common sense isn't necessarily something that needs to be explicitly taught; it can be learned early in infancy through trial and error, yielding a practical kind of judgment that helps us navigate daily life. Think of babies that quickly learn about the laws of physics by constantly manipulating or dropping objects to see what happens. In contrast, common sense doesn't come naturally to machines, which are constrained by the datasets they are trained on and by the rules of their underlying algorithms.

Nevertheless, even if AI isn’t quite capable of teaching itself some common sense, researchers are still keenly interested in finding ways to measure the core psychological reasoning ability of an AI.

As the research team explained: “For machine agents to successfully interact with humans in real-world settings, they will need to develop an understanding of human mental life. Intuitive psychology, the ability to reason about hidden mental variables that drive observable actions, comes naturally to people: even pre-verbal infants can tell agents from objects, expecting agents to act efficiently to achieve goals given constraints. Despite recent interest in machine agents that reason about other agents, it is not clear if such agents learn or hold the core psychology principles that drive human reasoning.”

To better evaluate how machines reason, the team of researchers created a benchmark called Action-Goal-Efficiency-coNstraint-uTility, or AGENT for short. The AGENT tests consist of a dataset of 3D animations that are inspired by previous cognitive development experiments.

As IBM researchers explained, the animations show a virtual agent interacting with different items, under different physical limitations: “The videos comprise distinct trials, each of which includes one or more ‘familiarization’ videos of an agent’s typical behavior in a certain physical environment, paired with ‘test’ videos of the same agent’s behavior in a new environment, which are labeled as either ‘expected’ or ‘surprising,’ given the behavior of the agent in the corresponding familiarization videos.”
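To make that structure concrete, here is a minimal sketch in Python of how one such trial might be represented. The field names, file names, and scenario tag are illustrative assumptions, not the researchers' actual data format.

```python
# Illustrative sketch only -- not the AGENT dataset's actual schema.
# A trial pairs familiarization videos of an agent's typical behavior with a
# test video of the same agent in a new environment, labeled "expected" or
# "surprising".
from dataclasses import dataclass
from typing import List

@dataclass
class Trial:
    familiarization_videos: List[str]  # clips of the agent's typical behavior
    test_video: str                    # the same agent acting in a new environment
    label: str                         # "expected" or "surprising"
    scenario: str                      # which concept the trial probes (assumed tag)

example_trial = Trial(
    familiarization_videos=["fam_01.mp4", "fam_02.mp4"],
    test_video="test_01.mp4",
    label="surprising",
    scenario="goal_preference",
)
print(example_trial.label)
```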

Intuitive Psychology

Inspired by experiments studying cognitive development in children, the AGENT test is structured around the concepts underlying what is called intuitive psychology, which human infants learn before learning to speak. These pre-verbal aspects of intuitive psychology include variables like goal preferences, action efficiency, unobserved constraints, and cost-reward trade-offs.

With goal preferences, one subset of tests determines whether an AI understands that a virtual agent chooses to pursue a particular goal or object based on its preferences, and that pursuing the same goal under different physical conditions can lead to different actions. For action efficiency, another subset of tests checks whether a model understands that a virtual agent can be physically constrained by its environment and will tend to take the most efficient course of action to attain its goal.

The unobserved constraints test probes whether a model can infer a hidden obstacle based on observing an agent's actions. Lastly, the cost-reward trade-off subtest tries to ascertain whether an AI understands what an agent prefers, and that agents plan their actions based on utility, by observing the "level of cost" they willingly expend to attain a goal.
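As a toy illustration of the utility reasoning behind that last subtest, the snippet below sketches an agent that picks whichever goal offers the highest reward after subtracting the cost of reaching it. The function and numbers are my own example, not taken from the paper.

```python
# Toy example of utility-based choice (assumed numbers, not from the AGENT paper).
# An observer who sees the agent pay a high cost can infer that the goal's
# reward must be large enough to justify it.
def choose_goal(goals):
    """goals: list of (name, reward, cost); return the option with the highest utility."""
    return max(goals, key=lambda g: g[1] - g[2])

# Climbing a high wall (cost 8) to reach object_A instead of grabbing nearby
# object_B suggests object_A's reward outweighs the extra effort.
best = choose_goal([("object_A", 10.0, 8.0), ("object_B", 2.0, 1.0)])
print(best)  # ('object_A', 10.0, 8.0)
```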

After being presented with these animations, the AI model must then rate how surprising the virtual agent's actions were in the "test" videos compared with the "familiarization" videos. Under the AGENT benchmark, this rating is then validated against ratings collected from humans who watched the same set of videos.
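One plausible way to score that agreement, assuming surprise is reported as a number per test video, is a simple correlation between the model's ratings and the averaged human ratings. The metric and numbers below are illustrative, not the benchmark's official scoring.

```python
# Hypothetical check of model-versus-human agreement on surprise ratings.
# Pearson correlation is just one plausible measure; the benchmark's actual
# scoring may differ.
import statistics

def pearson(xs, ys):
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

model_surprise = [0.9, 0.2, 0.7, 0.1]  # model's surprise ratings for four test videos
human_surprise = [0.8, 0.3, 0.9, 0.2]  # averaged human ratings for the same videos
print(f"agreement: {pearson(model_surprise, human_surprise):.2f}")
```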

Interestingly, the team deliberately kept the dataset relatively small to ensure that the AI wasn't just randomly arriving at the correct answer. "Training from scratch on just our dataset will not work. Instead, we suggest that to pass the tests, it is necessary to acquire additional knowledge, either via inductive biases in the architectures, or from training on additional data," the researchers said.

Though the test is still being improved, the team believes that AGENT could be a helpful diagnostic tool for evaluating and further advancing common sense in AI systems. Moreover, the study demonstrates the potential of adapting methods from traditional developmental psychology to evaluate intelligent machines. Measuring the reasoning capabilities of an AI matters because we want to know how it will perform in situations that are unpredictable, ambiguous, and not strictly defined by rules.

In these undefined situations, some kind of self-supervised learning would help AI systems better predict what comes next, even if the available data is of lower quality or unlabeled. This would reduce training times, the need for human supervision, and AI systems' reliance on massive datasets, all of which helps to increase efficiency and decrease costs.

Read the team’s paper.
