Yesterday, researchers at Uber AI Labs released the Paired Open-Ended Trailblazer (POET) algorithm, which pairs the generation of environmental challenges with the optimization of agents to solve those challenges.
The POET algorithm explores many different paths through the space of possible problems and solutions and, critically, allows these stepping-stone solutions to transfer between problems. The algorithm aims to generate new tasks, optimize solutions for them, and transfer agents between tasks to enable advances that would otherwise be unobtainable.
The researchers applied POET to create and solve bipedal walking environments. These environments were adapted from the BipedalWalker environments in OpenAI Gym, popularized in a series of blog posts and papers by David Ha. Each environment E_i is paired with a neural network-controlled agent A_i that tries to learn to navigate through that environment. Here’s an image that depicts an example environment and agent:
Source: Uber Engineering
In this experiment, the POET algorithm pursues two goals:
(1) evolve the population of environments towards diversity and complexity, and
(2) optimize agents to solve their paired environments.
During a single such run, POET generates a diverse range of complex and challenging environments, as well as their solutions.
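To make the two goals concrete, here is a minimal toy sketch of such a loop. It is not the researchers' implementation: the scalar "environment" and "agent", the `score` function, the hill-climbing `optimize_step`, the `mutate_env` rule, and every parameter name are hypothetical stand-ins chosen only to keep the example small and runnable.

```python
import random

def score(agent, env):
    """Toy reward (assumption): best value 0.0 when the agent matches the env."""
    return -(agent - env) ** 2

def optimize_step(agent, env, rng, sigma=0.1, samples=20):
    """One hill-climbing step: keep the best of several random perturbations.

    The current agent is among the candidates, so the score never gets worse.
    """
    candidates = [agent] + [agent + rng.gauss(0.0, sigma) for _ in range(samples)]
    return max(candidates, key=lambda a: score(a, env))

def mutate_env(env, rng, step=0.5):
    """Create a child environment that is at least as hard as its parent."""
    return env + rng.uniform(0.0, step)

def run_poet_sketch(iterations=30, mutate_every=10, max_pairs=4, seed=0):
    rng = random.Random(seed)
    pairs = [(0.0, 0.0)]  # (environment, agent) pairs; start with one trivial pair
    for t in range(1, iterations + 1):
        # Goal (1): periodically add a mutated environment, paired with a
        # copy of its parent's agent.
        if t % mutate_every == 0 and len(pairs) < max_pairs:
            parent_env, parent_agent = rng.choice(pairs)
            pairs.append((mutate_env(parent_env, rng), parent_agent))
        # Goal (2): optimize every agent in its own paired environment.
        pairs = [(env, optimize_step(agent, env, rng)) for env, agent in pairs]
    return pairs

if __name__ == "__main__":
    for env, agent in run_poet_sketch():
        print(f"env={env:.3f} agent={agent:.3f} score={score(agent, env):.4f}")
```

In this sketch each new environment inherits its parent's agent as a starting point, which is one simple way to let earlier solutions act as stepping stones for harder problems.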
POET also periodically performs transfer experiments to explore whether an agent optimized in one environment might serve as a stepping stone to better performance in a different environment. There are two types of transfer attempts:
- Direct transfer: Here, the agents from the originating environment are directly evaluated in the target environment.
- Proposal transfer: Here, agents take one ES optimization step in the target environment.
Source: Uber Engineering
By testing transfers to other active environments, POET harnesses the diversity of its multiple agent-environment pairs to its full potential, i.e., without missing any opportunities to gain an advantage from existing stepping stones.
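The two transfer types described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the published code: the scalar agent and environment, the `score` function, the simplified `es_step` update, and the rule that a transferred agent replaces the incumbent only when it scores strictly higher are all hypothetical choices made for the example.

```python
import random

def score(agent, env):
    """Toy reward (assumption): best value 0.0 when the agent matches the env."""
    return -(agent - env) ** 2

def es_step(theta, env, rng, sigma=0.1, lr=0.05, samples=50):
    """One simplified evolution-strategies-style update on a scalar parameter."""
    noise = [rng.gauss(0.0, 1.0) for _ in range(samples)]
    rewards = [score(theta + sigma * n, env) for n in noise]
    baseline = sum(rewards) / samples  # subtract the mean reward to reduce variance
    grad = sum((r - baseline) * n for r, n in zip(rewards, noise)) / (samples * sigma)
    return theta + lr * grad

def attempt_transfer(source_agent, target_agent, target_env, rng):
    """Try both transfer types and return the agent the target env should keep."""
    # Direct transfer: evaluate the source agent in the target env unchanged.
    direct = source_agent
    # Proposal transfer: let the source agent take one ES step in the target env.
    proposal = es_step(source_agent, target_env, rng)
    best = max([direct, proposal], key=lambda a: score(a, target_env))
    # Assumed replacement rule: swap only if the candidate is strictly better.
    if score(best, target_env) > score(target_agent, target_env):
        return best
    return target_agent

if __name__ == "__main__":
    rng = random.Random(0)
    # An agent tuned for env 1.0 is tested in a nearby env 1.2 whose own
    # agent (0.0) is still far from a solution.
    print(attempt_transfer(source_agent=1.0, target_agent=0.0, target_env=1.2, rng=rng))
```

Comparing both candidates against the incumbent means a transfer can never lower the target environment's current score in this sketch, which matches the idea of using other pairs purely as potential stepping stones.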
The researchers suggest that POET could invent radical new courses and solutions to them at the same time. It could similarly produce fascinating new kinds of soft robots for unique challenges it invents that only soft robots can solve.
POET could also generate simulated test courses for autonomous driving that both expose unique edge cases and demonstrate solutions to them. In their blog post, the researchers said that they will release the source code soon, and added that “more exotic applications are conceivable, like inventing new proteins or chemical processes that perform novel functions that solve problems in a variety of application areas. Given any problem space with the potential for diverse variations, POET can blaze a trail through it”.
Read more about Paired Open-Ended Trailblazer (POET) in detail in its research paper.
Here’s a video that shows the POET algorithm at work: