TensorFlow team releases a developer preview of TensorFlow Lite with new mobile GPU backend support

The TensorFlow team released a developer preview of the newly added GPU backend support for TensorFlow Lite earlier this week. A full open-source release is planned for later in 2019.

The team has been using TensorFlow Lite GPU inference in Google products for several months. For instance, the new GPU backend accelerated the foreground-background segmentation model by over 4x and the new depth estimation model by over 10x compared with CPU inference. Similarly, with GPU backend support in YouTube Stories and Playground Stickers, the team saw its real-time video segmentation model speed up by 5-10x across a variety of phones.

The team found that the new GPU backend is 2-7x faster than the original floating-point CPU implementation across a range of deep neural network models. The team also notes that the GPU speedup is most significant on more complex neural network models involving dense prediction/segmentation or classification tasks. For small models the speedup may be smaller, and running on the CPU can be the better choice because it avoids the latency cost of memory transfers.
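The trade-off between GPU speedup and memory-transfer latency can be illustrated with a rough cost model. The sketch below is not from the TensorFlow team: the class `GpuCostModel`, its method names, and every number in it are hypothetical and chosen only to show why a small model may run faster on the CPU.

```java
// Hypothetical cost model; all names and numbers are illustrative, not measurements.
final class GpuCostModel {

    /**
     * Estimated GPU latency: a fixed CPU<->GPU transfer overhead plus the
     * CPU compute time divided by the assumed kernel speedup.
     */
    static double gpuLatencyMs(double cpuComputeMs, double kernelSpeedup, double transferMs) {
        return transferMs + cpuComputeMs / kernelSpeedup;
    }

    /** True when the estimated GPU latency beats running on the CPU. */
    static boolean gpuIsFaster(double cpuComputeMs, double kernelSpeedup, double transferMs) {
        return gpuLatencyMs(cpuComputeMs, kernelSpeedup, transferMs) < cpuComputeMs;
    }

    public static void main(String[] args) {
        double speedup = 5.0;    // assumed raw kernel speedup
        double transferMs = 4.0; // assumed transfer overhead per inference

        // Small model: 2 ms on CPU vs. 4 + 2/5 = 4.4 ms on GPU.
        System.out.println(gpuIsFaster(2.0, speedup, transferMs));   // prints "false"
        // Large model: 100 ms on CPU vs. 4 + 100/5 = 24 ms on GPU.
        System.out.println(gpuIsFaster(100.0, speedup, transferMs)); // prints "true"
    }
}
```

In this simplified model the transfer overhead is constant, so it dominates for small models and becomes negligible as compute time grows.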

How does it work?

The GPU delegate is initialized when Interpreter::ModifyGraphWithDelegate() is called in Objective-C++, or when the Interpreter’s constructor is called with Interpreter.Options in Java. During initialization, a canonical representation of the input neural network is built, and a set of transformation rules is applied to it.
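To give a feel for what applying transformation rules to a canonical representation can look like, here is a minimal sketch. It is an assumption throughout: the article does not name the actual rules, so the two rules below (dropping no-op IDENTITY nodes and fusing CONV_2D with a following RELU), the class `GraphRewriteSketch`, and the flat list-of-op-names representation are all hypothetical and are not the TensorFlow Lite implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of rule-based graph rewriting on a linear op sequence.
final class GraphRewriteSketch {

    /**
     * Applies two illustrative rules in a single pass:
     * 1. drop IDENTITY nodes, which do no work;
     * 2. fuse a RELU into the CONV_2D that immediately precedes it.
     */
    static List<String> transform(List<String> ops) {
        List<String> out = new ArrayList<>();
        for (String op : ops) {
            if (op.equals("IDENTITY")) {
                continue; // rule 1: remove no-op node
            }
            int last = out.size() - 1;
            if (op.equals("RELU") && last >= 0 && out.get(last).equals("CONV_2D")) {
                out.set(last, "CONV_2D+RELU"); // rule 2: fuse activation into conv
            } else {
                out.add(op);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> ops = Arrays.asList("CONV_2D", "IDENTITY", "RELU", "SOFTMAX");
        System.out.println(transform(ops)); // prints "[CONV_2D+RELU, SOFTMAX]"
    }
}
```

Rewrites of this kind reduce the number of separate GPU programs that need to be generated, which is one plausible reason to normalize the graph before generating shaders.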

After this, the compute shaders are generated and compiled. The GPU backend currently uses OpenGL ES 3.1 Compute Shaders on Android and Metal Compute Shaders on iOS. Various architecture-specific optimizations are applied while the compute shaders are created. Once optimization is complete, the shader programs are compiled and the new GPU inference engine is ready. During inference, for each input, the inputs are moved to the GPU if required, the shader programs are executed, and the outputs are moved to the CPU if necessary.
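The lifecycle described above, one-time setup followed by a per-input loop, can be modeled in a few lines. This is a toy model under stated assumptions, not the TensorFlow Lite API: the class `DelegateLifecycleSketch`, its methods, and its counters are hypothetical and only track which steps would run.

```java
// Hypothetical model of the delegate lifecycle; it counts steps instead of doing GPU work.
final class DelegateLifecycleSketch {
    private boolean initialized = false;
    int shaderCompilations = 0;
    int uploads = 0;
    int executions = 0;
    int downloads = 0;

    /** One-time setup: stands in for graph building, shader generation, and compilation. */
    void initialize() {
        if (initialized) {
            return; // shaders are compiled only once
        }
        shaderCompilations++;
        initialized = true;
    }

    /** Per-input step: upload if needed, execute shaders, download if needed. */
    void runInference(boolean inputAlreadyOnGpu, boolean outputNeededOnCpu) {
        if (!initialized) {
            throw new IllegalStateException("initialize() must be called first");
        }
        if (!inputAlreadyOnGpu) {
            uploads++; // move input to GPU memory
        }
        executions++; // run the compiled shader programs
        if (outputNeededOnCpu) {
            downloads++; // move output back to CPU memory
        }
    }

    public static void main(String[] args) {
        DelegateLifecycleSketch delegate = new DelegateLifecycleSketch();
        delegate.initialize();
        delegate.runInference(false, true); // e.g. input arrives in CPU memory
        delegate.runInference(true, false); // e.g. input is already a GPU buffer
        System.out.println(delegate.shaderCompilations + " " + delegate.uploads
                + " " + delegate.executions + " " + delegate.downloads); // prints "1 1 2 1"
    }
}
```

The point of the model is that compilation cost is paid once, while transfer cost is paid only on the inferences that actually cross the CPU/GPU boundary.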

In the future, the team intends to expand the coverage of operations, finalize the APIs, and optimize the overall performance of the GPU backend.

For more information, check out the official TensorFlow Lite GPU inference release notes.

Natasha Mathur

Tech writer at the Packt Hub. Dreamer, book nerd, lover of scented candles, karaoke, and Gilmore Girls.
