OpenAI’s AI robot hand learns to solve a Rubik Cube using Reinforcement learning and Automatic Domain Randomization (ADR)

5 min read

A team of OpenAI researchers shared their research of training neural networks to solve a Rubik’s Cube with a human-like robot hand. The researchers trained the neural networks only in simulation using the same reinforcement learning code as OpenAI Five paired with a new technique called Automatic Domain Randomization (ADR).

In their research paper, the team demonstrates how the system trained only in simulation can handle situations it never saw during training. “Solving a Rubik’s Cube one-handed is a challenging task even for humans, and it takes children several years to gain the dexterity required to master it. Our robot still hasn’t perfected its technique though, as it solves the Rubik’s Cube 60% of the time (and only 20% of the time for a maximally difficult scramble),” the researchers mention on their official blog. The Neural networks were also trained with Kociemba’s algorithm along with RL algorithms, for picking the solution steps.

Read Also: DeepCube: A new deep reinforcement learning approach solves the Rubik’s cube with no human help

What is Automatic Domain Randomization (ADR)?

Domain randomization enables networks trained solely in simulation to transfer to a real robot. However, it was a challenge for the researchers to create an environment with real-world physics in the simulation environment. The team realized that it was difficult to measure factors like friction, elasticity, and dynamics for complex objects like Rubik’s Cubes or robotic hands and domain randomization alone was not enough.

To overcome this, the OpenAI researchers developed a new method called Automatic Domain Randomization (ADR), which endlessly generates progressively more difficult environments in simulation.

In ADR, the neural network learns to solve the cube with a single, nonrandomized environment. As the neural network gets better at the task and reaches a performance threshold, the amount of domain randomization is increased automatically. This makes the task harder since the neural network must now learn to generalize to more randomized environments. The network keeps learning until it again exceeds the performance threshold, when more randomization kicks in, and the process is repeated.

“The hypothesis behind ADR is that a memory-augmented network combined with a sufficiently randomized environment leads to emergent meta-learning, where the network implements a learning algorithm that allows itself to rapidly adapt its behavior to the environment it is deployed in,” the researchers state.


OpenAI’s AI-hand and the Giiker Cube

The researchers have used the Shadow Dexterous E Series Hand (E3M5R) as a humanoid robot hand and the PhaseSpace motion capture system to track the Cartesian coordinates of all five fingertips. They have also used RGB Basler cameras for vision pose estimation. Sensing the state of a Rubik’s cube from vision alone is a challenging task. The team, therefore, used a “smart” Rubik’s cube with built-in sensors and a Bluetooth module as a stepping stone. They also used a Giiker cube for some of the experiments to test the control policy without compounding errors made by the vision model’s face angle predictions. The hardware is based on the Xiaomi Giiker cube. This cube is equipped with a Bluetooth module and allows one to sense the state of the Rubik’s cube. However, it is limited to a face angle resolution of 90◦ , which is not sufficient for state tracking purposes on the robot setup. The team, therefore, replaced some of the components of the original Giiker cube with custom ones in order to achieve a tracking accuracy of approximately 5 degrees.

A few challenges faced

OpenAI’s method currently solves the Rubik’s Cube 20% of the time when applying a maximally difficult scramble that requires 26 face rotations. For simpler scrambles that require 15 rotations to undo, the success rate is 60%.

Researchers consider an attempt to have failed when the Rubik’s Cube is dropped or a timeout is reached. However, their network is capable of solving the Rubik’s Cube from any initial condition. So if the cube is dropped, it is possible to put it back into the hand and continue solving.

The neural network is much more likely to fail during the first few face rotations and flips. The team says this happens because the neural network needs to balance solving the Rubik’s Cube with adapting to the physical world during those early rotations and flips.

The team also implemented a few perturbations while training the AI-robot hand, including:

  • Resetting the hidden state: During a trial, the hidden state of the policy was reset. This leaves the environment dynamics unchanged but requires the policy to re-learn them since its memory has been wiped.

  • Re-sampling environment dynamics: This corresponds to an abrupt change of environment dynamics by resampling the parameters of all randomizations while leaving the simulation state18 and hidden state intact.

  • Breaking a random joint: This corresponds to disabling a randomly sampled joint of the robot hand by preventing it from moving. This is a more nuanced experiment since the overall environment dynamics are the same but the way in which the robot can interact with the environment has changed.

Here’s the complete video on how the AI-robot hand swiftly solved the Rubik cube single-handedly!

To know more about this research in detail, you can read the research paper.

Read Next

Open AI researchers advance multi-agent competition by training AI agents in a simple hide and seek environment

Introducing Open AI’s Reptile: The latest scalable meta-learning Algorithm on the block

Build your first Reinforcement learning agent in Keras [Tutorial]