4 min read

A group of researchers from Intel Labs and Facebook have published a paper titled, “A Study of BFLOAT16 for Deep Learning Training”. The paper presents a comprehensive study indicating the success of Brain Floating Point (BFLOAT16) half-precision format in Deep Learning training across image classification, speech recognition, language modeling, generative networks and industrial recommendation systems.

BFLOAT16 has a 7-bit mantissa and an 8-bit exponent, similar to FP32, but with less precision. BFLOAT16 was originally developed by Google and implemented in its third generation Tensor Processing Unit (TPU).

Many state of the art training platforms use IEEE-754 or automatic mixed precision as their preferred numeric format for deep learning training. However, these formats lack in representing error gradients during back propagation. Thus, they are not able to satisfy the required  performance gains. BFLOAT16 exhibits a dynamic range which can be used to represent error gradients during back propagation. This enables easier migration of deep learning workloads to BFLOAT16 hardware.

Image Source: BFLOAT16

In the above table, all the values are represented as trimmed full precision floating point values with 8 bits of mantissa with their dynamic range comparable to FP32. By adopting to BFLOAT16 numeric format, the core compute primitives such as Fused Multiply Add (FMA) can be built using 8-bit multipliers. This leads to significant reduction in area and power while preserving the full dynamic range of FP32.

How Deep neural network(DNNs) is trained with BFLOAT16?

The below figure shows the mixed precision data flow used to train deep neural networks using BFLOAT16 numeric format.

Image Source: BFLOAT16

  • The BFLOAT16 tensors are taken as input to the core compute kernels represented as General Matrix Multiply (GEMM) operations. It is then forwarded to the FP32 tensors as output.


  • The researchers have developed a library called Quantlib, represented as Q in the figure, to implement the emulation in multiple deep learning frameworks. One of the functions of a Quantlib is to modify the elements of an input FP32 tensor to echo the behavior of BFLOAT16. Quantlib is also used to modify a copy of the FP32 weights to BFLOAT16 for the forward pass.


  • The non-GEMM computations include batch-normalization and activation functions. The  FP32 always maintains the bias tensors.The FP32 copy of the weights updates the step uses to maintain model accuracy.

How does BFLOAT16 perform compared to FP32?

Convolution Neural Networks

Convolutional neural networks (CNN) are primarily used for computer vision applications such as image classification, object detection and semantic segmentation. AlexNet and ResNet-50 are used as the two representative models for the BFLOAT16 evaluation.

AlexNet demonstrates that BFLOAT16 emulation follows very near to the actual FP32 run and achieves 57.2% top-1 and 80.1% top-5 accuracy. Whereas in ResNet-50, the BFLOAT16 emulation follows the FP32 baseline almost exactly and achieves the same top-1 and top-5 accuracy.

Image Source: BFLOAT16

Similarly, the researchers were able to successfully demonstrate that BFLOAT16 is able to represent tensor values across many application domains including Recurrent Neural Networks, Generative Adversarial Networks (GANs) and Industrial Scale Recommendation System.

The researchers thus established that the dynamic range of BFLOAT16 is of the same range as that of FP32 and its conversion to/from FP32 is also easy. It is important to maintain the same range as FP32 since no hyper-parameter tuning is required for convergence in FP32. A hyperparameter is a parameter of choosing a set of optimal hyperparameters in machine learning for a learning algorithm. Researchers of this paper expect to see an industry-wide adoption of BFLOAT16 across emerging domains.

Recent reports suggest that Intel is planning to graft Google’s BFLOAT16 onto its processors  as well as on its initial Nervana Neural Network Processor for training, the NNP-T 1000.

Pradeep Dubey, who directs the Parallel Computing Lab at Intel and is also one of the researchers of this paper believes that for deep learning, the range of the processor is more important than the precision, which is the inverse of the rationale used for IEEE’s floating point formats.

Users are finding it interesting that a BFLOAT16 half-precision format is suitable for deep learning applications.

For more details, head over to the “A Study of BFLOAT16 for Deep Learning Training” paper.

Read Next

Intel’s new brain inspired neuromorphic AI chip contains 8 million neurons, processes data 1K times faster

Google plans to remove XSS Auditor used for detecting XSS vulnerabilities from its Chrome web browser

IntelliJ IDEA 2019.2 Beta 2 released with new Services tool window and profiling tools