In their quest to “ensure that artificial general intelligence benefits all of humanity”, researchers at OpenAI found that AI agents trained in a simulated game of hide-and-seek learned six distinct strategies and counterstrategies, some of which exploited the environment in ways its designers had not anticipated.
As the researchers put it: “Through training in our new simulated hide-and-seek environment, agents build a series of six distinct strategies and counterstrategies, some of which we did not know our environment supported. The self-supervised emergent complexity in this simple environment further suggests that multi-agent co-adaptation may one day produce extremely complex and intelligent behavior.”
This surprising behavior suggests that learning can be driven by intrinsic motivation, by competition with other agents, or by both, though the researchers are still working to fully explain it.
The team also reflected on the practical difficulties: “We’ve shown that agents can learn sophisticated tool use in a high-fidelity physics simulator; however, there were many lessons learned along the way to this result. Building environments is not easy and it is quite often the case that agents find a way to exploit the environment you build or the physics engine in an unintended way.”
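The core dynamic the researchers describe, competing agents repeatedly adapting to each other, can be illustrated with a toy sketch. The code below is a hypothetical, greatly simplified stand-in (it is not OpenAI's training setup): a hider and a seeker each occupy a point on a line, the game is zero-sum in their distance, and each agent co-adapts by keeping random tweaks to its position that improve its own reward against the opponent's current behavior.

```python
import random

def play(hider, seeker):
    """Zero-sum round: the hider wants distance, the seeker wants proximity."""
    d = abs(hider - seeker)
    return d, -d  # (hider reward, seeker reward)

def adapt(pos, opponent, is_hider, step=0.1):
    """One co-adaptation step: try a perturbed position, keep it only if it
    improves this agent's reward against the opponent's current position."""
    cand = min(1.0, max(0.0, pos + random.uniform(-step, step)))
    if is_hider:
        old_r, _ = play(pos, opponent)
        new_r, _ = play(cand, opponent)
    else:
        _, old_r = play(opponent, pos)
        _, new_r = play(opponent, cand)
    return cand if new_r > old_r else pos

random.seed(0)
hider, seeker = 0.5, 0.5
for _ in range(200):
    # Each agent's improvement changes the problem the other one faces,
    # so strategies and counterstrategies emerge in alternation.
    hider = adapt(hider, seeker, is_hider=True)
    seeker = adapt(seeker, hider, is_hider=False)

print(f"hider={hider:.2f} seeker={seeker:.2f}")
```

Even in this minimal setting the arms-race shape is visible: the hider retreats toward a wall of the interval while the seeker tracks it, each side's progress forcing the other to respond.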