OpenAI builds reinforcement learning-based system giving robots human-like dexterity

Researchers at OpenAI have developed a system, trained with reinforcement learning algorithms, for dexterous in-hand manipulation. Called Dactyl, the system learns to solve object orientation tasks entirely in simulation, without any human input. After training, it was able to work on a real robot without any fine-tuning.

Using humanoid hand systems to manipulate objects has been a long-standing challenge in robotic control. Current techniques remain limited in their ability to manipulate objects in the real world. Although robotic hands have been available for quite some time, it has been largely impractical to use such complex end-effectors for dexterous manipulation tasks.

The Shadow Dexterous Hand, for instance, has been available since 2005 with five fingers and 24 degrees of freedom. However, it did not see large-scale adoption because of the difficulty of controlling such complex systems.

Now OpenAI researchers have developed a system that trains control policies allowing a robot hand to perform complex in-hand manipulations. The system shows unprecedented levels of dexterity and discovers different hand grasp types found in humans, such as the tripod, prismatic, and tip pinch grasps. It also displays dynamic behaviors such as finger gaiting, multi-finger coordination, the controlled use of gravity, and the application of translational and torsional forces to the object.

How does the OpenAI system work?

First, they used a large distribution of simulations with randomized parameters to collect data for the control policy and vision-based pose estimator.
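The idea of a distribution of simulations with randomized parameters can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the parameter names (friction, object_mass_kg, actuator_gain), the nominal values, and the ranges are hypothetical and are not taken from OpenAI's setup.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SimParams:
    """Hypothetical physical parameters for one simulated environment."""
    friction: float
    object_mass_kg: float
    actuator_gain: float


# Illustrative nominal values and relative ranges (assumptions, not OpenAI's numbers).
NOMINAL = SimParams(friction=1.0, object_mass_kg=0.08, actuator_gain=1.0)
RELATIVE_RANGE = {"friction": 0.3, "object_mass_kg": 0.5, "actuator_gain": 0.25}


def sample_params(rng: random.Random) -> SimParams:
    """Scale each nominal value by a factor drawn uniformly from [1 - r, 1 + r]."""
    scaled = {
        name: getattr(NOMINAL, name) * rng.uniform(1.0 - r, 1.0 + r)
        for name, r in RELATIVE_RANGE.items()
    }
    return SimParams(**scaled)


def sample_distribution(n_envs: int, seed: int = 0) -> List[SimParams]:
    """Draw one parameter set per simulated environment."""
    rng = random.Random(seed)
    return [sample_params(rng) for _ in range(n_envs)]
```

Because every simulated environment behaves slightly differently, a policy trained across all of them cannot rely on any single set of physical constants, which is what makes the later transfer to a real robot plausible.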

The control policy receives observed robot states and rewards from the distributed simulations. It then learns to map observations to actions using a recurrent neural network (RNN) and reinforcement learning.
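To make "mapping observations to actions with an RNN" concrete, here is a minimal sketch of a recurrent policy step in plain Python. It uses a simple tanh recurrent cell as a stand-in and is not the architecture OpenAI used. The class name RecurrentPolicy, the dimensions, and the random weights are all hypothetical, and the reinforcement learning update that would tune the weights from rewards is omitted.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(matrix: Matrix, vector: Vector) -> Vector:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def random_matrix(rows: int, cols: int, rng: random.Random, scale: float = 0.1) -> Matrix:
    """Create a rows x cols matrix of small random weights."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


class RecurrentPolicy:
    """Hypothetical recurrent policy: (observation, hidden) -> (action, new hidden)."""

    def __init__(self, obs_dim: int, hidden_dim: int, action_dim: int, seed: int = 0):
        rng = random.Random(seed)
        self.hidden_dim = hidden_dim
        self.w_in = random_matrix(hidden_dim, obs_dim, rng)      # observation -> hidden
        self.w_rec = random_matrix(hidden_dim, hidden_dim, rng)  # hidden -> hidden
        self.w_out = random_matrix(action_dim, hidden_dim, rng)  # hidden -> action

    def initial_state(self) -> Vector:
        """Start each episode with an all-zero hidden state."""
        return [0.0] * self.hidden_dim

    def step(self, obs: Vector, hidden: Vector) -> Tuple[Vector, Vector]:
        """Update the hidden state from the observation, then read out an action."""
        pre = [a + b for a, b in zip(matvec(self.w_in, obs), matvec(self.w_rec, hidden))]
        new_hidden = [math.tanh(p) for p in pre]
        # tanh keeps every action component in [-1, 1].
        action = [math.tanh(a) for a in matvec(self.w_out, new_hidden)]
        return action, new_hidden
```

The hidden state is what separates a recurrent policy from a purely reactive one: feeding the same observation twice can produce different actions, because the policy carries a summary of what it has seen so far.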

The vision-based pose estimator is trained on rendered scenes collected from the distributed simulations. It learns to predict the pose of the object from images using a convolutional neural network (CNN), trained separately from the control policy.

On the real robot, the object pose is predicted from three camera feeds with the CNN, while the robot fingertip locations are measured using a 3D motion capture system. Both are given to the control policy to produce an action for the robot.
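The data flow at run time can be sketched as a single control step that combines the vision-based pose estimate with the fingertip measurements and hands the result to the policy. Everything here is a stand-in: the function names, the vector sizes, and the fake components are assumptions made for illustration, not OpenAI's code.

```python
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def control_step(
    camera_images: Sequence[object],
    estimate_pose: Callable[[Sequence[object]], Vector],
    read_fingertips: Callable[[], Vector],
    policy_step: Callable[[Vector, Vector], Tuple[Vector, Vector]],
    hidden: Vector,
) -> Tuple[Vector, Vector]:
    """Run one control step: sense, build the observation, then act."""
    object_pose = estimate_pose(camera_images)  # vision: object position + orientation
    fingertips = read_fingertips()              # motion capture: fingertip positions
    observation = object_pose + fingertips      # concatenate both inputs
    return policy_step(observation, hidden)


# Stand-in components (hypothetical), used only to exercise the data flow.
def fake_pose_estimator(images: Sequence[object]) -> Vector:
    if len(images) != 3:
        raise ValueError("expected three camera images")
    # Position (x, y, z) followed by a unit quaternion (w, x, y, z).
    return [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0]


def fake_fingertips() -> Vector:
    # Five fingertips, three coordinates each.
    return [0.0] * 15


def fake_policy(observation: Vector, hidden: Vector) -> Tuple[Vector, Vector]:
    # Illustrative action size; a real policy would compute these values.
    action = [0.0] * 20
    new_hidden = [h + 1.0 for h in hidden]
    return action, new_hidden
```

Calling control_step(["cam0", "cam1", "cam2"], fake_pose_estimator, fake_fingertips, fake_policy, [0.0]) returns an action vector together with the updated hidden state, which would be passed back in on the next step.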

Image source: OpenAI blog

You can place a block in the palm of the Shadow Dexterous Hand, and Dactyl can reposition it into different orientations. For example, it can rotate the block to put a new face on top.
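One common way to judge whether a block has reached a target orientation is to measure the rotation angle between the current and goal orientations, both expressed as unit quaternions. The sketch below shows that calculation. The function names and the default tolerance are assumptions chosen for illustration, not details confirmed by this article.

```python
import math
from typing import Sequence, Tuple

Quaternion = Tuple[float, float, float, float]  # (w, x, y, z), unit length


def quat_from_axis_angle(axis: Sequence[float], angle_rad: float) -> Quaternion:
    """Build a unit quaternion for a rotation of angle_rad about the given axis."""
    x, y, z = axis
    norm = math.sqrt(x * x + y * y + z * z)
    s = math.sin(angle_rad / 2.0) / norm
    return (math.cos(angle_rad / 2.0), x * s, y * s, z * s)


def rotation_angle_between(q1: Quaternion, q2: Quaternion) -> float:
    """Smallest rotation angle (radians) that takes orientation q1 to q2."""
    # abs() handles the fact that q and -q describe the same orientation.
    dot = abs(sum(a * b for a, b in zip(q1, q2)))
    dot = min(1.0, dot)  # guard against rounding slightly above 1
    return 2.0 * math.acos(dot)


def is_goal_reached(current: Quaternion, goal: Quaternion, tolerance_rad: float = 0.4) -> bool:
    """Hypothetical success check: the orientation error is within the tolerance."""
    return rotation_angle_between(current, goal) <= tolerance_rad
```

For a cube, putting a new face on top corresponds to a goal that is a quarter turn (pi / 2 radians) away from the starting orientation about a horizontal axis, so the check only succeeds once most of that rotation has been completed.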
