2 min read
PlaNet solves a variety of image-based control tasks and competes with advanced model-free agents. The Google AI team is also releasing the source code for the research community to further explore and build upon PlaNet.
How does PlaNet work?
PlaNet depends on a compact sequence of hidden, or latent, states. This approach is called a latent dynamics model: instead of predicting directly from one image to the next, the latent state is predicted forward. “By compressing the images in this way, the agent can automatically learn more abstract representations, such as positions and velocities of objects, making it easier to predict forward without having to generate images along the way”, states the Google AI team.
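To make the idea concrete, here is a minimal sketch of a latent dynamics model. The transition here is a toy linear function with random parameters (in PlaNet these would be learned neural networks, and the dimensions below are purely illustrative); the point is that multi-step prediction chains latent states and never generates an image:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy latent dynamics model: all prediction happens in a small latent
# space rather than in pixel space. Dimensions are illustrative.
latent_dim, action_dim = 4, 2

# Stand-ins for learned transition parameters (random for this sketch).
A = rng.normal(size=(latent_dim, latent_dim)) * 0.1
B = rng.normal(size=(latent_dim, action_dim)) * 0.1

def transition(z, a):
    """Predict the next latent state from the current state and an action."""
    return A @ z + B @ a

# Rolling forward never touches image space: we simply chain latent
# predictions for as many steps as the planning horizon requires.
z = np.zeros(latent_dim)
for _ in range(10):
    z = transition(z, rng.normal(size=action_dim))

print(z.shape)  # the trajectory stays latent_dim-dimensional throughout
```

Because each step only produces a small latent vector instead of a full image, thousands of such imagined rollouts are cheap enough to use inside a planner.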
In a latent dynamics model, information from the input images is integrated into the hidden states with the help of an encoder network. The hidden state is then projected forward to predict future images and rewards. For planning, past images are encoded into the current hidden state, and the future rewards for multiple candidate action sequences are then predicted from it.
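The planning step described above can be sketched as follows. This is a simplified random-shooting planner (PlaNet itself uses a more refined search, and the linear transition, linear reward head, and all dimensions here are toy stand-ins for learned networks): starting from the current hidden state, several candidate action sequences are rolled out in latent space, their predicted rewards are summed, and the best sequence is selected.

```python
import numpy as np

rng = np.random.default_rng(1)
latent_dim, action_dim = 4, 2
horizon, n_candidates = 12, 64

# Stand-ins for learned models (random parameters for this sketch).
A = rng.normal(size=(latent_dim, latent_dim)) * 0.1   # latent transition
B = rng.normal(size=(latent_dim, action_dim)) * 0.1
w = rng.normal(size=latent_dim)                       # reward head: r = w . z

def rollout_return(z0, actions):
    """Sum predicted rewards along one imagined latent trajectory."""
    z, total = z0.copy(), 0.0
    for a in actions:
        z = A @ z + B @ a       # predict next latent state
        total += w @ z          # predict reward for that state
    return total

z0 = rng.normal(size=latent_dim)  # current hidden state (from the encoder)
candidates = rng.normal(size=(n_candidates, horizon, action_dim))
returns = np.array([rollout_return(z0, seq) for seq in candidates])
best = candidates[returns.argmax()]

print(best.shape)  # the selected plan: one action per step of the horizon
```

In practice the agent executes only the first action of the best plan and then replans from the new observation, which keeps it responsive to prediction errors.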
PlaNet agents are trained across a variety of image-based control tasks. These tasks pose different challenges, such as partial observability and sparse rewards (for example, a reward given only when a ball is caught). Moreover, a single PlaNet agent was trained to solve all six tasks at once. Without any changes to the hyperparameters, this multi-task agent achieves the same mean performance as the individual agents.
“We advocate for further research that focuses on learning accurate dynamics models on tasks of even higher difficulty, such as 3D environments and real-world robotics tasks. We are excited about the possibilities that model-based reinforcement learning opens up”, states the Google AI team.
For more information, check out the official Google AI PlaNet announcement.