Home Cloud & Networking Cloud Computing Google expands its machine learning hardware portfolio with Cloud TPU Pods (alpha)...

Google expands its machine learning hardware portfolio with Cloud TPU Pods (alpha) to effectively train and deploy TensorFlow machine learning models on GCP

Melisha Dsouza

December 13, 2018 - 3:04 am

3003

3 min read

Today, Google cloud announced the alpha availability of ‘Cloud TPU Pods’ that are tightly-coupled supercomputers built with hundreds of Google’s custom Tensor Processing Unit (TPU) chips and dozens of host machines, linked via an ultrafast custom interconnect. Google states that these pods make it easier, faster, and more cost-effective to develop and deploy cutting-edge machine learning workloads on Google Cloud. Developers can iterate over the training data in minutes and train huge production models in hours or days instead of weeks.

The Tensor Processing Unit (TPU), is an ASIC that powers several of Google’s major products, including Translate, Photos, Search, Assistant, and Gmail. It provides up to 11.5 petaflops of performance in a single pod.

Features of Cloud TPU Pods

#1 Proven Reference Models

Customers can take advantage of Google-qualified reference models that are optimized for performance, accuracy, and quality for many real-world use cases. These include object detection, language modeling, sentiment analysis, translation, image classification, and more.

#2 Connect Cloud TPUs to Custom Machine Types

Users can connect to Cloud TPUs from custom VM types. This will them optimally balance processor speeds, memory, and high-performance storage resources for their individual workloads.

#3 Preemptible Cloud TPU

Preemptible Cloud TPUs are 70% cheaper than on-demand instances. Long training runs with checkpointing or batch prediction on large datasets can now be done at an optimal rate using Cloud TPU’s.

#4 Integrated with GCP

Cloud TPUs and Google Cloud’s Data and Analytics services are fully integrated with other GCP offerings. This provides developers unified access across the entire service line. Developers can run machine learning workloads on Cloud TPUs and benefit from Google Cloud Platform’s storage, networking, and data analytics technologies.

#5 Additional features

Cloud TPUs perform really well at synchronous training. The Cloud TPU software stack transparently distributes ML models across multiple TPU devices in a Cloud TPU Pod to help customers achieve scalability. All Cloud TPUs are integrated with Google Cloud’s high-speed storage systems, ensuring that data input pipelines can keep up with the TPUs. Users do not have to manage parameter servers, deal with complicated custom networking configurations, or set up exotic storage systems to achieve unparalleled training performance in the cloud.

Performance and Cost benchmarking of Cloud TPU

Google compared the Cloud TPU Pods and Google Cloud VMs with NVIDIA Tesla V100 GPUs attached- using one of the MLPerf models called TensorFlow 1.12 implementations of ResNet-50 v1.5 (GPU version, TPU version). They trained ResNet-50 on the ImageNet image classification dataset.

The results of the test show that Cloud TPU Pods deliver near-linear speedups for large-scale training task; the largest Cloud TPU Pod configuration tested (256 chips) delivers a 200X speedup over an individual V100 GPU. Check out their methodology page for further details on this test. Training ResNet-50 on a full Cloud TPU v2 Pod costs almost 40% less than training the same model to the same accuracy on an n1-standard-64 Google Cloud VM with eight V100 GPUs attached. The full Cloud TPU Pod completes the training task 27 times faster.

Head over to Google Cloud’s official page to know more about Cloud TPU Pods. Alternatively, check out Cloud TPU’s documentation for more insights on the same.

Top 6 Cybersecurity Books from Packt to Accelerate Your Career

Your Quick Introduction to Extended Events in Analysis Services from Blog…

Logging the history of my past SQL Saturday presentations from Blog…

Storage savings with Table Compression from Blog Posts – SQLServerCentral

Daily Coping 31 Dec 2020 from Blog Posts – SQLServerCentral

Learning Essential Linux Commands for Navigating the Shell Effectively

Exploring the Strategy Behavioral Design Pattern in Node.js

How to integrate a Medium editor in Angular 8

Implementing memory management with Golang’s garbage collector

How to create sales analysis app in Qlik Sense using DAR…

Google expands its machine learning hardware portfolio with Cloud TPU Pods (alpha) to effectively train and deploy TensorFlow machine learning models on GCP

Features of Cloud TPU Pods

#1 Proven Reference Models

#2 Connect Cloud TPUs to Custom Machine Types

#3 Preemptible Cloud TPU

#4 Integrated with GCP

#5 Additional features

Performance and Cost benchmarking of Cloud TPU

Read Next

MobilePro

datapro

Programming

Subscribe to our newsletter