How Facebook data scientists use Bayesian optimization for tuning their online systems

Facebook data scientists had released a paper, Constrained Bayesian Optimization with Noisy Experiments in 2017 where they describe using Bayesian optimization to design rounds of A/B tests based on prior test results. An A/B test is a randomized experiment, used to determine which variant of A and B is more "effective". They are used for improving a product.

Facebook has a large array of backend systems serving billions of people every day. They have a large number of internal parameters that must be tuned carefully using live, randomized experiments, also known as A/B tests. Individual experiments may take a week or longer, so there is a challenge to optimize a set of parameters with the least number of experiments.

Bayesian optimization

Bayesian optimization is a technique used to solve optimization problems where the objective function (the online metric of interest) does not have an analytic expression. It can only be evaluated through some time consuming operations like a randomized experiment. Bayesian optimization allows joint tuning of more parameters with fewer experiments compared to a grid search or manual tuning. It also helps in finding better values.

The Gaussian process (GP) is a Bayesian model that works well for Bayesian optimization. GP provides good uncertainty estimates of how an online metric varies with the parameters of interest. It is illustrated as follows:

how-facebook-data-scientists-use-bayesian-optimization-for-tuning-their-online-systems-img-0

Source: Facebook research blog

The work in the paper was motivated by several challenges in using Bayesian optimization for tuning online systems. The challenges are noise, constraints, and batch experimentation. In the paper, the authors describe a Bayesian approach for handling observation noise in which they include the posterior uncertainty induced by noise in EI’s expectation.

In the paper, they describe a Bayesian approach for handling observation noise. A posterior uncertainty is induced by noise in EI’s expectation. Instead of computing the expectation of I(x) under the posterior of f(x), it is computed under the joint posterior of f(x) and f(x*). This expectation no longer has a closed form like El but can easily draw samples of values at past observations f(x_1), …, f(x_n) from the GP posterior. The conditional distribution f(x) | f(x_1), …, f(x_n) has closed form.

The results

The approach described in the paper is used to optimize various systems at Facebook. Two such optimizations are described in the paper. The first is to optimize six parameters from one of Facebook’s ranking systems. The second one was to optimize seven numeric compiler flags for the HipHop Virtual Machine (HHVM). The web servers powering Facebook use the HHVM to serve requests.

The end goal of this optimization was to reduce CPU usage on the web servers, with a constraint of keeping the peak memory usage less. This following figure shows the CPU usage of each configuration tested. There is a 100 total, it also shows the probability that each point satisfied the memory constraint:

how-facebook-data-scientists-use-bayesian-optimization-for-tuning-their-online-systems-img-1