Researchers at OpenAI, who previously taught virtual AI agents to play hide and seek, have trained a five-fingered, human-like robotic hand named “Dactyl” to solve Rubik’s Cube puzzles. They discovered that the neural networks powering the highly dexterous digits keep learning continuously, even when faced with progressively more challenging tasks.
The neural networks are trained entirely in simulation, using the same reinforcement learning code as OpenAI Five paired with a new technique called Automatic Domain Randomization (ADR). ADR starts with a single, nonrandomized environment, in which a neural network learns to solve Rubik’s Cube. As the neural network gets better at the task and reaches a performance threshold, the amount of domain randomization is increased automatically. This makes the task harder, since the network must now learn to generalize across more randomized environments. The network keeps learning until it again exceeds the performance threshold, at which point more randomization kicks in and the process repeats.
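The ADR loop described above can be sketched in a few lines of Python. This is a minimal illustration, not OpenAI’s implementation: the randomized parameters (`cube_size`, `friction`), the threshold and step values, and the `evaluate` function are all hypothetical stand-ins.

```python
import random

# Sketch of Automatic Domain Randomization (ADR): a scalar
# `randomization` controls how widely environment parameters are
# sampled. Whenever the policy's success rate crosses a threshold,
# the range expands, making the task harder.

PERFORMANCE_THRESHOLD = 0.8  # assumed success-rate threshold
EXPANSION_STEP = 0.1         # assumed amount to widen the range each time

def sample_environment(randomization):
    """Sample environment parameters within the current range
    (illustrative parameters, perturbed around nominal values)."""
    return {
        "cube_size": 1.0 + random.uniform(-randomization, randomization),
        "friction": 1.0 + random.uniform(-randomization, randomization),
    }

def train_with_adr(evaluate, steps=100):
    """`evaluate(env)` stands in for training the policy in `env`
    and returning its success rate (hypothetical helper)."""
    randomization = 0.0  # start from a single, nonrandomized environment
    for _ in range(steps):
        env = sample_environment(randomization)
        success_rate = evaluate(env)
        if success_rate >= PERFORMANCE_THRESHOLD:
            # Policy has mastered the current distribution:
            # automatically increase the randomization.
            randomization += EXPANSION_STEP
    return randomization
```

The key design point is that the curriculum emerges automatically: no human decides when to make the task harder, because the expansion is gated solely on the policy’s own measured performance.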
The researchers also discovered that the robotic hand will keep attempting to solve the puzzle despite impediments (such as a plush giraffe) deliberately put in its way.
“We find that our system trained with ADR is surprisingly robust to perturbations even though we never trained with them,” the researchers wrote. “The robot can successfully perform most flips and face rotations under all tested perturbations, though not at peak performance.”