Yesterday, Google researchers released three research papers describing their work on easy-to-adapt robotic autonomy, which combines deep reinforcement learning (RL) with long-range planning. The research is aimed at people whose mobility impairments leave them home-bound: the researchers propose building service robots, trained with reinforcement learning, to improve the independence of people with limited mobility.
The researchers trained local planner agents to perform basic navigation behaviors and to traverse short distances safely, without colliding with moving obstacles. These local planners take noisy sensor observations, such as readings from a 1D lidar that provides distances to obstacles, and output linear and angular velocities for robot control.
The researchers trained the local planner in simulation with AutoRL (Automated Reinforcement Learning), a method that automates the search for RL rewards and neural network architecture. These local planners transfer both to real robots and to new, previously unseen environments, so they serve as building blocks for navigation in large spaces. The researchers then built a roadmap: a graph whose nodes are locations and whose edges connect two nodes only if the local planner can traverse between them reliably.
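To make the local planner's inputs and outputs concrete, here is a minimal sketch of that interface. It is not the learned policy from the papers: the function names, the Gaussian noise model, the right-to-left beam ordering and the hand-written steering rule are all assumptions chosen for illustration. The only part taken from the description above is the shape of the mapping, from a noisy lidar scan to a linear and an angular velocity.

```python
import random
from typing import List, Tuple


def noisy_lidar(true_distances: List[float], sigma: float,
                rng: random.Random) -> List[float]:
    """Hypothetical sensor model: add Gaussian noise to each beam, clamp at 0."""
    return [max(0.0, d + rng.gauss(0.0, sigma)) for d in true_distances]


def local_planner(scan: List[float], max_speed: float = 0.5,
                  safe_dist: float = 1.0) -> Tuple[float, float]:
    """Hand-written stand-in for a learned policy.

    Input:  lidar distances in meters, assumed ordered from right to left,
            with at least two beams.
    Output: (linear velocity, angular velocity); positive angular = turn left.
    """
    n = len(scan)
    # Slow down in proportion to how close the nearest obstacle is.
    linear = max_speed * min(1.0, min(scan) / safe_dist)
    # Steer toward the half of the scan with more free space.
    right = sum(scan[: n // 2]) / (n // 2)
    left = sum(scan[n // 2:]) / (n - n // 2)
    angular = 0.5 * (left - right) / max(left + right, 1e-6)
    return linear, angular


# Usage: an obstacle close on the right makes the stand-in slow down and turn left.
linear, angular = local_planner([0.5] * 4 + [3.0] * 4)
```

In the papers this mapping is a neural network trained with RL. The stand-in only shows what goes in and what comes out.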
Automating Reinforcement Learning (AutoRL)
In the first paper, Learning Navigation Behaviors End-to-End with AutoRL, the researchers trained the local planners in small, static environments. Training agents with standard deep RL algorithms, such as Deep Deterministic Policy Gradient (DDPG), is difficult, since the reward and the network architecture typically have to be tuned by hand.
To make this easier, the researchers automated the deep RL training. AutoRL is an evolutionary automation layer around deep RL that searches for a reward and a neural network architecture using large-scale hyperparameter optimization. It works in two phases: reward search and neural network architecture search. During the reward search, AutoRL concurrently trains a population of DDPG agents, each with a slightly different reward function. At the end of this phase, the reward that most often leads agents to their destination is selected. The neural network architecture search phase repeats the process, this time keeping the selected reward fixed and tuning the network layers.
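The reward search phase can be sketched as a small evolutionary loop. This is a toy under stated assumptions, not the AutoRL implementation: `evaluate` stands in for "train a DDPG agent with this reward and measure how often it reaches its destination", the reward is reduced to two hypothetical weights, and the mutation scheme and `IDEAL` target are invented so that the example runs on its own.

```python
import random
from typing import Callable, Dict, Tuple

RewardWeights = Dict[str, float]

# Hypothetical "best" reward weights, used only by the toy evaluator below.
IDEAL: RewardWeights = {"goal": 1.0, "collision": -1.0}


def toy_success_rate(weights: RewardWeights) -> float:
    """Stand-in for training an agent and measuring its success rate (0..1]."""
    err = sum((weights[k] - IDEAL[k]) ** 2 for k in IDEAL)
    return 1.0 / (1.0 + err)


def mutate(weights: RewardWeights, scale: float,
           rng: random.Random) -> RewardWeights:
    """Produce a slightly different reward function."""
    return {k: v + rng.gauss(0.0, scale) for k, v in weights.items()}


def reward_search(evaluate: Callable[[RewardWeights], float],
                  initial: RewardWeights, generations: int, population: int,
                  rng: random.Random) -> Tuple[RewardWeights, float]:
    """Keep the reward whose agents succeed most often across generations."""
    best, best_score = initial, evaluate(initial)
    for _ in range(generations):
        # One generation: a population of agents, each with its own reward.
        candidates = [mutate(best, 0.2, rng) for _ in range(population)]
        scored = [(evaluate(c), c) for c in candidates]
        top_score, top = max(scored, key=lambda pair: pair[0])
        if top_score > best_score:
            best, best_score = top, top_score
    return best, best_score


best, score = reward_search(toy_success_rate, {"goal": 0.0, "collision": 0.0},
                            generations=10, population=20,
                            rng=random.Random(0))
```

An architecture search phase would reuse the same loop, with the selected reward held fixed and the mutated quantity being the layer sizes rather than the reward weights.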
Because the search is iterative, AutoRL is not sample efficient. Training one agent takes 5 million samples, while an AutoRL run of around 10 generations of 100 agents requires 5 billion samples, which is equivalent to 32 years of training. The advantage is that AutoRL automates the manual training process, and DDPG does not experience catastrophic forgetfulness. Another advantage is that AutoRL policies are robust to sensor, actuator and localization noise, and they generalize to new environments.
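The sample count follows directly from the figures quoted above, as this short check shows. The final step is our own inference rather than something the article states: if the 32-year figure is read as continuous experience, it implies a rate of roughly five samples per second.

```python
SAMPLES_PER_AGENT = 5_000_000
GENERATIONS = 10
AGENTS_PER_GENERATION = 100

# 10 generations x 100 agents x 5 million samples each = 5 billion samples.
total_samples = SAMPLES_PER_AGENT * GENERATIONS * AGENTS_PER_GENERATION

# Inference only: spread 5 billion samples over 32 years of 365 days.
SECONDS_IN_32_YEARS = 32 * 365 * 24 * 3600
implied_rate = total_samples / SECONDS_IN_32_YEARS  # samples per second

print(total_samples)
print(round(implied_rate, 2))
```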
PRM-RL
In the second paper, PRM-RL: Long-Range Robotic Navigation Tasks by Combining Reinforcement Learning and Sampling-based Planning, the researchers explain that sampling-based planners tackle long-range navigation by approximating robot motions. In this paper, they combined probabilistic roadmaps (PRMs) with hand-tuned RL-based local planners (without AutoRL) to train robots locally and then adapt them to different environments.
For each robot, the researchers trained a local planner policy in a generic simulated training environment. They then built a PRM with respect to that policy, called a PRM-RL, over a floor plan of the deployment environment.
To build a PRM-RL, the researchers connected the sampled nodes using Monte Carlo simulation. The resulting roadmap is tuned to both the abilities and the geometry of the particular robot, so roadmaps for robots with the same geometry but different sensors and actuators will have different connectivity. At execution time, the RL agent navigates from roadmap waypoint to waypoint.
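The roadmap construction can be sketched as follows: an edge is kept only when repeated simulated rollouts of the local planner succeed often enough, and navigation then follows a shortest path through the surviving edges. This is a simplified illustration, not the paper's code. The function names, the thresholds, the undirected edges tested in one direction only, and the distance-based `make_noisy_rollout` stub standing in for a real simulator are all assumptions.

```python
import heapq
import math
import random
from typing import Callable, Dict, List, Tuple

Point = Tuple[float, float]
Graph = Dict[int, List[Tuple[int, float]]]


def make_noisy_rollout(rng: random.Random,
                       reach: float = 3.0) -> Callable[[Point, Point], bool]:
    """Hypothetical simulator stub: success gets less likely with distance."""
    def rollout(a: Point, b: Point) -> bool:
        return rng.random() < max(0.0, 1.0 - math.dist(a, b) / reach)
    return rollout


def build_roadmap(nodes: List[Point],
                  try_traverse: Callable[[Point, Point], bool],
                  max_dist: float, trials: int, min_success: float) -> Graph:
    """Connect two nodes only if enough Monte Carlo rollouts succeed."""
    graph: Graph = {i: [] for i in range(len(nodes))}
    for i in range(len(nodes)):
        for j in range(i + 1, len(nodes)):
            d = math.dist(nodes[i], nodes[j])
            if d > max_dist:
                continue
            wins = sum(try_traverse(nodes[i], nodes[j]) for _ in range(trials))
            if wins / trials >= min_success:
                graph[i].append((j, d))
                graph[j].append((i, d))
    return graph


def shortest_path(graph: Graph, start: int, goal: int) -> List[int]:
    """Dijkstra over the roadmap; returns waypoint indices, or [] if unreachable."""
    dist = {start: 0.0}
    prev: Dict[int, int] = {}
    heap = [(0.0, start)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == goal:
            break
        if d > dist.get(u, math.inf):
            continue
        for v, w in graph[u]:
            if d + w < dist.get(v, math.inf):
                dist[v], prev[v] = d + w, u
                heapq.heappush(heap, (d + w, v))
    if goal not in dist:
        return []
    path = [goal]
    while path[-1] != start:
        path.append(prev[path[-1]])
    return path[::-1]


nodes = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
roadmap = build_roadmap(nodes, make_noisy_rollout(random.Random(0)),
                        max_dist=1.5, trials=20, min_success=0.5)
waypoints = shortest_path(roadmap, 0, 3)
```

Because edges depend on the rollouts rather than on geometry alone, swapping in a rollout function for a robot with different sensors or actuators would produce a different graph over the same nodes, which is the connectivity difference described above.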
Long-Range Indoor Navigation with PRM-RL
In the third paper, the researchers made several improvements to the original PRM-RL. First, they replaced the hand-tuned DDPG with AutoRL-trained local planners, which improves long-range navigation. Second, they added Simultaneous Localization and Mapping (SLAM) maps, which robots use at execution time, as a source for building the roadmaps. Because SLAM maps are noisy, this change closes the “sim2real gap”, a phenomenon in which simulation-trained agents significantly underperform when transferred to real robots. Lastly, they added distributed roadmap building to generate very large-scale roadmaps containing up to 700,000 nodes.
The team compared PRM-RL with a variety of other methods over distances of up to 100 m, well beyond the local planner range. They found that PRM-RL had 2 to 3 times the success rate of the baselines because the nodes were connected appropriately for the robot’s capabilities.
To conclude, autonomous robot navigation can improve the independence of people with limited mobility. This is made possible by automating the learning of basic, short-range navigation behaviors with AutoRL and by using the learned policies with SLAM maps to build roadmaps.
To learn more, check out the Google AI blog post.