At the Computer Vision and Pattern Recognition (CVPR) conference, NVIDIA announced that it will make Kubernetes available on its GPUs.
Although it is not yet generally available, developers can use this technology to test the software and provide feedback.
Source: Kubernetes on Nvidia GPUs
Kubernetes on NVIDIA GPUs will allow developers and DevOps engineers to build and deploy scalable GPU-accelerated deep learning training, as well as inference applications, on multi-cloud GPU clusters. Using this technology, developers can handle the growing number of AI applications and services by automating processes such as the deployment, maintenance, scheduling, and operation of GPU-accelerated application containers.
One can orchestrate deep learning and HPC applications on heterogeneous GPU clusters, using easy-to-specify attributes such as GPU type and memory requirements. It also offers integrated metrics and monitoring capabilities for analyzing and improving GPU utilization across clusters.
Interesting features of Kubernetes on NVIDIA GPUs include:
- GPU support in Kubernetes can be used via the NVIDIA device plugin
- One can easily specify GPU attributes such as GPU type and memory requirements for deployment in heterogeneous GPU clusters
- Visualizing and monitoring GPU metrics and health with an integrated GPU monitoring stack of NVIDIA DCGM, Prometheus, and Grafana
- Support for multiple underlying container runtimes such as Docker and CRI-O
- Officially supported on all NVIDIA DGX systems (DGX-1 Pascal, DGX-1 Volta and DGX Station)
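The device-plugin workflow above can be sketched with a minimal pod manifest. The resource name `nvidia.com/gpu` is the one the NVIDIA device plugin registers with Kubernetes; the image tag and the node-selector label used to pick a GPU type are assumptions for illustration and will vary by cluster setup.

```yaml
# Hypothetical pod spec requesting one GPU through the NVIDIA device plugin.
apiVersion: v1
kind: Pod
metadata:
  name: gpu-training-pod
spec:
  containers:
    - name: trainer
      image: nvcr.io/nvidia/tensorflow:latest   # example NGC image; tag is an assumption
      resources:
        limits:
          nvidia.com/gpu: 1   # one GPU; resource name registered by the device plugin
  nodeSelector:
    # Label key/value are assumptions; actual GPU-type attributes depend on how nodes are labeled
    accelerator: nvidia-tesla-v100
```

Deploying such a pod with `kubectl apply -f` lets the scheduler place it only on nodes that both advertise the `nvidia.com/gpu` resource and carry the matching GPU-type label.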
Read more about this exciting news on the NVIDIA Developer blog.
- NVIDIA brings new deep learning updates at CVPR conference
- Kublr 1.9.2 for Kubernetes cluster deployment in isolated environments released!
- Distributed TensorFlow: Working with multiple GPUs and servers