AI Surprises Researchers by Inventing New Hide-and-Seek Strategies
What does it take to evolve a higher level of intelligence? For humans, it took millions of years of evolution to progress from mere mammals to upright bipeds who use language and tools and who established complex societies along the way. Likewise, the path toward human-like artificial general intelligence (AGI) isn’t necessarily cut-and-dried; it can be a piecemeal process, with various researchers exploring a constellation of techniques to endow AI with the ability to learn from its mistakes, to “reason” and even to develop its own language.
Beyond these significant milestones, AI also seems capable of learning to use tools and of inventing new, unexpected strategies on its own in order to win in a virtual game environment. Recent research from San Francisco-based research lab OpenAI shows that when multiple AI agents are placed within a simple virtual environment with a few objects, and given only basic rules about the game of hide-and-seek, over time they will “train” themselves to work collaboratively and use those objects as tools, in addition to coming up with relatively innovative winning strategies that the researchers didn’t anticipate. Watch to see how the process unfolds:
Emergent Tool Use
Using tools allows us humans to survive and thrive in the most inhospitable regions on the planet, and in turn, the iterative refinement of tools and techniques further accelerates the collective evolution of human intelligence. But while it took us millions of years to get to this point, it took the AI agents in this study about 500 million games of hide-and-seek before these new behaviors and self-taught strategies emerged.
As outlined in the research team’s blog post, the study involved two types of AI agents: one group tasked with finding agents hidden in the environment, and another group whose aim was to hide. The agents were placed in an enclosed virtual space that also included objects such as ramps, walls, and boxes (which could be moved around or locked into place by either team). The experiments used reinforcement learning algorithms, which reward agents with points when they perform their task successfully, essentially giving them an incentive to keep learning through trial and error. In addition, multi-agent learning techniques were used, in which agents learn by dynamically interacting with their environment and with each other.
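The reward idea described above can be illustrated with a minimal Python sketch. It is not OpenAI's code: the function name team_rewards, the +1/-1 point values, the zero-sum split between the teams, and the preparation-phase flag are all assumptions made here for illustration.

```python
from typing import Dict, List


def team_rewards(hider_seen: List[bool], in_preparation_phase: bool = False) -> Dict[str, int]:
    """Hypothetical per-step reward for each team in a hide-and-seek game.

    hider_seen[i] is True if hider i is currently visible to any seeker.
    All point values below are illustrative assumptions.
    """
    # Assumed: no points are handed out while hiders are still getting ready.
    if in_preparation_phase:
        return {"hiders": 0, "seekers": 0}

    # Assumed: hiders succeed only if every hider is out of sight.
    hiders = -1 if any(hider_seen) else 1

    # Assumed: the game is zero-sum, so seekers get the opposite reward.
    return {"hiders": hiders, "seekers": -hiders}


if __name__ == "__main__":
    print(team_rewards([False, False]))  # all hiders hidden
    print(team_rewards([False, True]))   # one hider spotted
```

Under a scheme like this, neither team is told how to hide or seek; each simply repeats whatever tends to earn points, which is the trial-and-error loop the article describes.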
With these elements in place, the agents’ first games of hide-and-seek were unremarkable, showing only random movements within the space. However, somewhere between those first games and 2 million games, the “seekers” learned to chase the “hiders”. After 2.69 million games, the hiders learned to grab blocks and use them to close off doorways. Around the 8 millionth game, the seekers learned to use the ramps to jump over the blockades, prompting the hiders to adapt by hiding the ramps inside the room. At this point, the team noted, the hiders had developed collaborative strategies on their own, such as coordinating who would block the door and who would fetch the ramp.
But the researchers didn’t stop there: they redid the experiment with a more randomized, open environment, adding movable walls to the existing repertoire of stationary partitions and portable blocks and ramps. Learning in this more complex, random setting took longer, but it progressed in much the same way as before: it took the seeker agents 22 million games to learn how to chase the hiders, after which the hiders learned to construct shelters to hide from seekers. At around 88 million games, the seekers responded (once again) by using ramps to get over these shelters. At 115 million games, the hiders adapted by locking the ramps, thus rendering them unusable.
What happened next took the researchers completely by surprise: at 388 million games, the seekers reacted to this new deterrent with a “box surfing” strategy, bringing a box to a locked ramp in order to jump on top of the box, then using it to “surf” over the hiders’ shelter. It was a novel move that the agents invented on their own and that the researchers didn’t foresee. The experiment continued until 500 million games or so, by which point the hiders had adapted by locking all ramps, boxes and walls in sight.
“As agents train against each other in hide-and-seek, as many as six distinct strategies emerge,” explained the team. “Each new strategy creates a previously nonexistent pressure for agents to progress to the next stage. Note that there are no direct incentives for agents to interact with objects or to explore; rather, the emergent strategies … are a result of … multi-agent competition and the simple dynamics of hide-and-seek.”
The results underscore how unpredictable AI behavior could become, especially when such systems are deployed at scale in real-world applications. At the same time, they also point to the likely possibility that AI will find novel ways to solve problems that we humans would never have imagined, suggesting that AI’s co-adaptation strategies may one day generate complex and intelligent behavior that only emerges once it is set loose and left to its own devices, hopefully helping humanity rather than harming it.
Read more over at OpenAI.