Yesterday, Google introduced a new Tensorflow-based framework named Dopamine, which aims to provide flexibility, stability, and reproducibility for both new and experienced RL researchers. This release also includes a set of colabs that clarify how to use the Dopamine framework.
Dopamine is inspired by one of the main components in reward-motivated behavior in the brain. It also reflects a strong historical connection between neuroscience and reinforcement learning research. Its main aim is to enable a speculative research that drives radical discoveries.
Dopamine framework feature highlights
Ease of Use
The two key considerations in Dopamine’s design are its clarity and simplicity. Its code is compact (about 15 Python files) and is well-documented. This is achieved by focusing on the Arcade Learning Environment (a mature, well-understood benchmark), and four value-based agents:
- A carefully curated simplified variant of the Rainbow agent, and
- The Implicit Quantile Network agent, which was presented last month at the International Conference on Machine Learning (ICML).
Google has provided the Dopamine code with full test coverage. These tests also serve as an additional form of documentation. Dopamine follows the recommendations given by Machado et al. (2018) on standardizing empirical evaluation with the Arcade Learning Environment.
It is important for new researchers to be able to quickly benchmark their ideas against established methods. Following this, Google has provided the full training data of the four provided agents, across the 60 games supported by the Arcade Learning Environment. They have also provided a website where one can quickly visualize the training runs for all provided agents on all 60 games.
Given below is a snapshot showcasing the training runs for the 4 agents on Seaquest, one of the Atari 2600 games supported by the Arcade Learning Environment.
The x-axis represents iterations, where each iteration is 1 million game frames (4.5 hours of real-time play); the y-axis is the average score obtained per play. The shaded areas show confidence intervals from 5 independent runs.
The Google community aims to empower researchers to try out new ideas, both incremental and radical with Dopamine ’s flexibility and ease-of-use. It is actively being used in Google’s research, giving them the flexibility to iterate quickly over many ideas.