Facebook launches Horizon, its first open source reinforcement learning platform for large-scale products and services

Facebook launched Horizon, its first open source reinforcement learning (RL) platform for large-scale products and services, yesterday. The workflows and algorithms in Horizon are built on open source frameworks such as PyTorch 1.0, Caffe2, and Spark. Because it builds on these widely used tools, Horizon is accessible to anyone applying RL at scale.

“We developed this platform to bridge the gap between RL’s growing impact in research and its traditionally narrow range of uses in production. We deployed Horizon at Facebook over the past year, improving the platform’s ability to adapt RL’s decision-based approach to large-scale applications”, reads the Facebook blog.

Facebook has already used the platform to deliver more relevant notifications, optimize streaming video bit rates, and improve personalized suggestions in Messenger. Given Horizon's open design and toolset, it could also benefit other organizations that use RL.

Harnessing reinforcement learning for large-scale production

Horizon uses reinforcement learning to make decisions at scale while accounting for issues specific to production environments. These include feature normalization, distributed training, large-scale deployment, and data sets with thousands of varying feature types.

Moreover, according to Facebook, applied RL models are more sensitive to noisy and unnormalized data than traditional deep networks. This is why Horizon preprocesses state and action features in parallel using Apache Spark. Once the training data has been preprocessed, PyTorch-based algorithms handle normalization and training on the GPU.
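To make the idea of feature normalization concrete, here is a minimal sketch in plain Python. It assumes simple z-score scaling of a single numeric feature; the function names are hypothetical and this is not Horizon's actual preprocessing code, which runs on Spark and PyTorch and handles many feature types.

```python
import math

def fit_normalization(values):
    """Compute (mean, std) for one numeric feature.

    Assumption: plain z-score statistics. A constant feature gets
    std = 1.0 so that later division is always safe.
    """
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / n
    std = math.sqrt(variance)
    return mean, (std if std > 0 else 1.0)

def normalize(values, mean, std):
    """Scale a feature to zero mean and unit variance."""
    return [(v - mean) / std for v in values]

# Illustrative raw feature values (made up for this sketch).
raw = [10.0, 20.0, 30.0]
mean, std = fit_normalization(raw)
scaled = normalize(raw, mean, std)
```

After scaling, the feature is centered on zero, which is the kind of well-behaved input that the article says RL models are particularly sensitive to.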

Horizon’s design focuses mainly on large clusters, where distributed training on many GPUs at once lets engineers solve problems with millions of examples. Horizon supports algorithms such as Deep Q-Network (DQN), parametric DQN, and deep deterministic policy gradient (DDPG) models. During training, Horizon also runs counterfactual policy evaluation (CPE), a set of methods used to predict the performance of a newly learned policy, and logs the evaluation results to TensorBoard. Once training is complete, Horizon exports the models using ONNX so that they can be served efficiently at scale.
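For readers unfamiliar with DQN, the core of the algorithm is the one-step Q-learning target that the network is trained to match. The sketch below shows that textbook target in plain Python; the function names and the numbers are illustrative assumptions, not code taken from Horizon.

```python
def dqn_target(reward, next_q_values, gamma=0.99, terminal=False):
    """One-step Q-learning target: r + gamma * max_a' Q(s', a').

    next_q_values holds the estimated Q-values for every action in the
    next state. For terminal transitions the target is just the reward.
    """
    if terminal or not next_q_values:
        return reward
    return reward + gamma * max(next_q_values)

def td_error(current_q, target):
    """Temporal-difference error that the training loss drives toward zero."""
    return target - current_q

# Illustrative transition (values made up for this sketch).
target = dqn_target(reward=1.0, next_q_values=[0.5, 2.0], gamma=0.9)
error = td_error(current_q=2.0, target=target)
```

In a real DQN the Q-values come from a neural network and the squared TD error is minimized over batches of logged transitions.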

In many RL domains, the performance of a model is measured simply by trying it out. However, because Horizon targets large-scale production systems, models must be tested thoroughly before they are deployed at scale. Since Horizon solves policy optimization tasks, its training workflow also automatically runs state-of-the-art policy evaluation techniques, including sequential doubly robust policy evaluation and MAGIC.
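As a rough illustration of what sequential doubly robust evaluation computes, the sketch below applies the standard backward recursion to a single logged trajectory. The field names (`reward`, `rho`, `q_hat`, `v_hat`) and the sample numbers are assumptions made for this example; it is not Horizon's implementation, and a full evaluation would average the estimate over many trajectories.

```python
def sequential_doubly_robust(trajectory, gamma=0.99):
    """Estimate the value of a new policy from one logged trajectory.

    Each step is a dict with:
      reward - logged reward
      rho    - importance ratio pi_new(a|s) / pi_logged(a|s)
      q_hat  - model estimate of Q(s, a) under the new policy
      v_hat  - model estimate of V(s) under the new policy
    Backward recursion:
      V_DR = v_hat + rho * (reward + gamma * V_DR_next - q_hat)
    """
    v_dr = 0.0
    for step in reversed(trajectory):
        v_dr = step["v_hat"] + step["rho"] * (
            step["reward"] + gamma * v_dr - step["q_hat"]
        )
    return v_dr

# Illustrative two-step trajectory (values made up for this sketch).
logged = [
    {"reward": 1.0, "rho": 1.0, "q_hat": 0.3, "v_hat": 0.3},
    {"reward": 2.0, "rho": 1.0, "q_hat": 0.7, "v_hat": 0.7},
]
estimate = sequential_doubly_robust(logged, gamma=0.5)
```

When the new policy matches the logged one (every `rho` is 1 and `v_hat` equals `q_hat`), the estimate reduces to the plain discounted return; when `rho` is 0 it falls back to the model's own value estimate. That combination of a model and importance weighting is what makes the estimator "doubly robust".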

The evaluation is then combined with anomaly detection, which automatically alerts engineers if a new iteration of the model performs radically differently from the previous one, before the policy is deployed to the public.
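The article does not describe how Horizon's anomaly detection works, so the following is only a hypothetical sketch of the general idea: compare the evaluation score of a new model iteration with the previous one and flag large relative changes. The function name and the 50% threshold are assumptions chosen for illustration.

```python
def is_anomalous(previous_score, new_score, max_relative_change=0.5):
    """Flag a new model whose evaluation score shifts too far.

    Hypothetical rule: alert when the relative change from the previous
    score exceeds max_relative_change. If the previous score is zero,
    any nonzero new score is flagged.
    """
    if previous_score == 0:
        return new_score != 0
    relative_change = abs(new_score - previous_score) / abs(previous_score)
    return relative_change > max_relative_change

# Illustrative checks (scores made up for this sketch).
small_shift = is_anomalous(previous_score=1.0, new_score=1.1)
large_shift = is_anomalous(previous_score=1.0, new_score=2.0)
```

A check like this would run after policy evaluation and before deployment, so that a flagged model is reviewed by an engineer rather than shipped automatically.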

Facebook plans to add new models and model improvements to Horizon in the future, along with CPE integrated with real metrics.

“We are leveraging the Horizon platform to discover new techniques in model-based RL and reward shaping, and using the platform to explore a wide range of additional applications at Facebook, such as data center resource allocation and video recommendations. Horizon could transform the way engineers and ML models work together”, says Facebook.

For more information, check out the official Facebook blog.

Natasha Mathur

Tech writer at the Packt Hub. Dreamer, book nerd, lover of scented candles, karaoke, and Gilmore Girls.
