NVIDIA announced the release of CUDA 10.2 last week. This is the last version to have macOS support for developing CUDA applications and will be completely dropped in the next release. Other updates include libcu++, new interoperability APIs, and more.
Key updates in CUDA 10.2
General CUDA 10.2 updates
- New APIs: CUDA 10.2 ships with CUDA Virtual Memory Management APIs. New interoperability APIs are added for buffer allocation, synchronization, and streaming. However, these are in beta and may change in future releases.
- Support for new operating systems: This release adds support for a few new operating systems including Fedora 29, Red Hat Enterprise Linux (RHEL) 7.x and 8.x, OpenSUSE 15.x, SUSE SLES 12.4 and SLES 15.x, Ubuntu 16.04.6 LTS and Ubuntu 18.04.3 LTS. In CUDA 10.2, RHEL 6.x is deprecated and support will be dropped in the next release of CUDA.
- Increased texture size limit for Maxwell+ GPUs: The 1D linear texture size limit for Maxwell+ GPUs in CUDA is now increased to 2^28.
Updates in CUDA tools
- The Nvidia CUDA Compiler (NVCC) now has support for Clang 8.0 and Xcode 10.2 as host compilers.
- There is a new -forward-unknown-to-host-compiler option that allows forwarding options not recognized by NVCC to the host compiler.
- Visual Profiler and NVProf now allow tracing features for non-root and non-admin users on desktop platforms. The events and metrics profiling is still restricted to non-root and non-admin users.
- Also, starting with CUDA 10.2, Visual Profiler and NVProf use dynamic/shared CUPTI library. Users are required to set the path to the CUPTI library before launching Visual Profiler and NVProf.
Updates in CUDA libraries
- cuBLAS: The cuBLAS library is a fast GPU-accelerated implementation of the standard basic linear algebra subroutines (BLAS). In CUDA 10.2, performance is further improved on some large and other GEMM sizes due to increased internal workspace size.
- cuSOLVER: This library includes a collection of direct solvers that deliver significant acceleration for computer vision, CFD, and linear optimization apps. In this release, a new Tensor Cores Accelerated Iterative Refinement Solver (TCAIRS) is introduced. The cusolverMg library includes ‘cusolverMgGetrf’ and ‘cusolverMgGetrs’ to support multi-GPU LU.
- cuFFT: This library provides GPU-accelerated FFT implementations that perform up to 10x faster than CPU-only alternatives. This release comes with improved performance and scalability for these use cases: multi-GPU non-power of 2 transforms, R2C and Z2D odd-sized transforms, 2D transforms with small sizes and large batch counts
These were a few updates in CUDA 10.2. Read the official release notes to know what else has shipped with this release.