Earlier this week, the TensorFlow team released a developer preview of the newly added GPU backend support for TensorFlow Lite. A full open-source release is planned for later in 2019.
The team has been using TensorFlow Lite GPU inference support at Google for several months now in their products. For instance, the new GPU backend accelerated the foreground-background segmentation model by over 4x and the new depth estimation model by over 10x compared with the floating point CPU implementation. Similarly, using the GPU backend for YouTube Stories and Playground Stickers, the team saw speedups of 5-10x in their real-time video segmentation model across a variety of phones.
The team found that the new GPU backend is typically much faster (2-7x) than the original floating point CPU implementation across a range of deep neural network models. They also note that the GPU speedup is most significant on more complex neural network models involving dense prediction/segmentation or classification tasks. For small models the speedup can be smaller, and the CPU may be the better choice, since it avoids the latency costs of memory transfers.
How does it work?
The GPU delegate is initialized once Interpreter::ModifyGraphWithDelegate() is called in Objective-C++, or once the Interpreter's constructor is called with Interpreter.Options in Java. During this process, a canonical representation of the input neural network is built, and a set of transformation rules is applied to it.
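As a rough illustration, enabling the delegate from Java looks like the sketch below. This is a sketch based on the developer preview API, not a definitive implementation: the GpuDelegate package path may differ between releases, and modelFile, input, and output are placeholders for an application's own model and buffers.

```java
// Sketch: running inference with the experimental GPU delegate in Java.
// Package paths are assumptions based on the developer preview;
// modelFile, input, and output are placeholders.
import java.io.File;
import java.nio.ByteBuffer;
import org.tensorflow.lite.Interpreter;
import org.tensorflow.lite.gpu.GpuDelegate;

class GpuInferenceExample {
    static void runWithGpu(File modelFile, ByteBuffer input, ByteBuffer output) {
        GpuDelegate delegate = new GpuDelegate();
        try {
            // Passing the delegate via Interpreter.Options applies it to the
            // graph at construction time, triggering the initialization
            // (graph transformation and shader compilation) described above.
            Interpreter.Options options = new Interpreter.Options().addDelegate(delegate);
            Interpreter interpreter = new Interpreter(modelFile, options);
            interpreter.run(input, output);
            interpreter.close();
        } finally {
            delegate.close();  // release GPU resources when done
        }
    }
}
```

Note that the delegate owns GPU resources, so applications should close it explicitly once the interpreter is no longer needed.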
After this, the compute shaders are generated and compiled. The GPU backend currently uses OpenGL ES 3.1 Compute Shaders on Android and Metal Compute Shaders on iOS. Various architecture-specific optimizations are applied while generating these shaders. Once optimization is complete, the shader programs are compiled and the GPU inference engine is ready. For each inference, inputs are moved to the GPU if necessary, the shader programs are executed, and outputs are moved back to the CPU if necessary.
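The two-phase flow described above, a one-time initialization that compiles shaders followed by per-inference data movement and shader execution, can be sketched schematically. The classes and names below are illustrative stand-ins for exposition only, not TensorFlow Lite APIs:

```java
// Schematic sketch (NOT the real implementation) of the GPU backend's
// two-phase flow: initialize once, then run per-inference.
import java.util.ArrayList;
import java.util.List;

class GpuBackendSketch {
    private final List<String> compiledShaders = new ArrayList<>();
    private boolean initialized = false;

    // Phase 1 (one-time): mirrors delegate initialization, where the graph
    // is transformed and a compute shader is generated and compiled per op.
    void initialize(List<String> ops) {
        for (String op : ops) {
            compiledShaders.add("shader:" + op);  // stand-in for compilation
        }
        initialized = true;
    }

    // Phase 2 (per-inference): upload inputs, execute shaders, download outputs.
    float[] run(float[] cpuInput) {
        if (!initialized) throw new IllegalStateException("initialize first");
        float[] gpuBuffer = cpuInput.clone();      // inputs moved to GPU
        for (String shader : compiledShaders) {
            // Stand-in for executing one compiled shader program:
            for (int i = 0; i < gpuBuffer.length; i++) gpuBuffer[i] += 1.0f;
        }
        return gpuBuffer;                          // outputs moved back to CPU
    }
}
```

The point of the split is that the expensive work (graph transformation, shader generation and compilation) happens once up front, so the per-inference cost is limited to data transfers and shader execution.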
The team intends to expand the coverage of operations, finalize the APIs and optimize the overall performance of the GPU backend in the future.
For more information, check out the official TensorFlow Lite GPU inference release notes.