Wayve, a new U.K. self-driving car startup, trained a car to drive in its imagination using a model-based deep reinforcement learning system. The system's prediction model learns from real-world data collected offline. The car observes the motion of other agents in the scene and predicts their direction, which lets it make an informed driving decision.
The deep reinforcement learning system was trained on data collected during sunny weather in Cambridge, U.K. Training used World Models (Ha & Schmidhuber, 2018) with monocular camera input on an autonomous vehicle. Although the system was trained on sunny-weather data, it can still drive successfully in the rain. It is not distracted by reflections from puddles or by water droplets on the camera lens.
The underlying training process
First, the prediction model was trained on the collected data. A variational autoencoder encoded the images into a low-dimensional state. A probabilistic recurrent neural network was then trained as the prediction model, which estimates a probability distribution over the next state given the current state and action. Both the encoder and the prediction model are trained on real-world data.
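To make the idea of a probabilistic prediction model concrete, here is a minimal sketch in Python. It is not Wayve's implementation: a one-dimensional linear-Gaussian model stands in for the variational autoencoder and recurrent network, the "logged" data is synthetic, and every name and constant (fit_transition, TRUE_A, TRUE_B, TRUE_STD) is a hypothetical chosen for illustration. It shows only the core step of fitting a model offline from (state, action, next state) records so that it returns a distribution over the next state.

```python
import random

# Synthetic stand-in for logged driving data (assumption, not real data):
# each record is (state, action, next_state) in a 1-D latent space.
TRUE_A, TRUE_B, TRUE_STD = 0.9, 0.5, 0.05
rng = random.Random(0)
log = []
for _ in range(500):
    s = rng.uniform(-1.0, 1.0)
    u = rng.uniform(-1.0, 1.0)
    log.append((s, u, TRUE_A * s + TRUE_B * u + rng.gauss(0.0, TRUE_STD)))

def fit_transition(transitions):
    """Least-squares fit of next ~ a*state + b*action, plus residual std."""
    sss = sum(s * s for s, u, n in transitions)
    ssu = sum(s * u for s, u, n in transitions)
    suu = sum(u * u for s, u, n in transitions)
    ssn = sum(s * n for s, u, n in transitions)
    sun = sum(u * n for s, u, n in transitions)
    det = sss * suu - ssu * ssu  # determinant of the 2x2 normal equations
    a = (ssn * suu - sun * ssu) / det
    b = (sun * sss - ssn * ssu) / det
    residuals = [n - (a * s + b * u) for s, u, n in transitions]
    std = (sum(r * r for r in residuals) / len(residuals)) ** 0.5
    return a, b, std

def predict(model, state, action):
    """Return (mean, std) of the predicted next state."""
    a, b, std = model
    return a * state + b * action, std

model = fit_transition(log)
mean, std = predict(model, state=0.2, action=-0.1)
print(model, mean, std)
```

The fitted model returns a mean and a standard deviation rather than a single value, which is what allows imagined futures to be sampled from it later.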
Once that is done, a driving policy is initialized and its performance is assessed with the prediction model in simulated experiences. In the same way, the policy can be trained on many simulated sequences by imagining experiences. These imagined sequences can also be visualized to inspect the learned policy.
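Assessing a policy in imagined experience can be sketched as follows. This is an illustrative toy under stated assumptions, not Wayve's code: the "learned" model is a fixed one-dimensional Gaussian transition with made-up constants A, B and STD, the state is a lateral offset from the lane centre, the policy is a simple proportional steering rule, and the reward is the negative squared offset. The point is only that every step comes from the prediction model, so no real vehicle is involved.

```python
import random

# Hypothetical learned model: next offset ~ Normal(A*state + B*action, STD).
A, B, STD = 0.9, 0.5, 0.05

def imagine_step(state, action, rng):
    """Sample the next state from the prediction model, not the real world."""
    return rng.gauss(A * state + B * action, STD)

def make_policy(gain):
    """Proportional steering rule: steer against the current offset."""
    return lambda state: -gain * state

def imagined_return(policy, rng, start=1.0, horizon=20):
    """Roll the policy out inside the model and sum the rewards."""
    state, total = start, 0.0
    for _ in range(horizon):
        state = imagine_step(state, policy(state), rng)
        total -= state * state  # reward: stay close to the lane centre
    return total

def evaluate(policy, episodes=200, seed=0):
    """Average return over many imagined episodes."""
    rng = random.Random(seed)
    return sum(imagined_return(policy, rng) for _ in range(episodes)) / episodes

passive_score = evaluate(make_policy(0.0))   # never steers
steering_score = evaluate(make_policy(1.0))  # steers toward the centre
print(passive_score, steering_score)
```

Because each episode is just a loop over model samples, many episodes can be run in parallel on a server, which is the property the blog post highlights.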
“Using a prediction model, we can dream to drive on a massively parallel server, independent of the robotic vehicle. Furthermore, traditional simulation approaches require people to hand-engineer individual situations to cover a wide variety of driving scenarios. Learning a prediction model from data automates the process of scenario generation, taking the human engineer out of the loop,” reads the Wayve blog post.
In general, simulators differ from the real world in both appearance and behavior, which makes it hard to transfer knowledge acquired in simulation. Wayve’s deep reinforcement learning system avoids this limitation because it is trained directly on real-world data, so there is no major difference between its learned simulation and the real world.
Finally, because the learned simulator is differentiable, the driving policy can be optimized directly using gradient descent.
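A small Python sketch shows what optimizing a policy through a differentiable simulator means. The assumptions are heavy and deliberate: the simulator is a deterministic one-dimensional linear model with made-up constants, the policy has a single gain parameter, and the derivative is propagated by hand alongside the rollout instead of using an automatic differentiation library. None of this reflects Wayve's actual model or training code.

```python
def loss_and_grad(gain, a=0.9, b=0.5, start=1.0, horizon=10):
    """Roll out the policy in the model; return the loss and d(loss)/d(gain).

    Model (assumed): next = a*state + b*action, with action = -gain*state.
    Loss: sum of squared offsets over the imagined rollout.
    """
    state, dstate = start, 0.0  # dstate tracks d(state)/d(gain)
    loss, grad = 0.0, 0.0
    for _ in range(horizon):
        action = -gain * state
        # Differentiate next = (a - b*gain)*state with respect to gain,
        # using the current state before it is overwritten.
        dstate = (a - b * gain) * dstate - b * state
        state = a * state + b * action
        loss += state * state
        grad += 2.0 * state * dstate
    return loss, grad

def optimize(gain=0.0, learning_rate=0.05, steps=200):
    """Plain gradient descent on the policy gain."""
    for _ in range(steps):
        _, grad = loss_and_grad(gain)
        gain -= learning_rate * grad
    return gain

initial_loss, _ = loss_and_grad(0.0)
best_gain = optimize()
final_loss, _ = loss_and_grad(best_gain)
print(initial_loss, best_gain, final_loss)
```

The gradient is exact because every step of the rollout is a differentiable function of the policy parameter. With a hand-engineered simulator that is not differentiable, the same search would typically rely on sampling-based estimates instead.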
“Wayve is committed to developing richer and more robust temporal prediction models and believe this is key to building intelligent and safe autonomous vehicles,” says the Wayve team.
For more information, check out the official Wayve blog post.