
Yesterday, the team at NVIDIA released CUDA 10.1 with a new lightweight GEMM library, new functionalities and performance updates to existing libraries, and improvements to the CUDA Graphs APIs.

What’s new in CUDA 10.1?

nvJPEG gains new encoding and batched decoding functionality. This release also delivers faster performance for a broad set of random number generators in cuRAND, along with improved performance and support for fork/join kernels in the CUDA Graphs APIs.
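Fork/join here refers to graph topologies where one node fans out to several independent kernels that later converge on a common successor. A minimal sketch of such a graph, built with the explicit CUDA Graphs API (names like `addOne` and the buffer sizes are illustrative, not from the release notes):

```cuda
#include <cuda_runtime.h>

// Placeholder kernel standing in for real per-node work.
__global__ void addOne(float* x) { x[threadIdx.x] += 1.0f; }

int main() {
    float* d;
    cudaMalloc(&d, 64 * sizeof(float));
    cudaMemset(d, 0, 64 * sizeof(float));
    float* dHi = d + 32;  // second half, so the forked branches don't race

    cudaGraph_t graph;
    cudaGraphCreate(&graph, 0);

    // Shared node parameters; func/blockDim/args change per node below.
    cudaKernelNodeParams p = {};
    p.func = (void*)addOne;
    p.gridDim = dim3(1);
    p.sharedMemBytes = 0;
    p.extra = nullptr;

    void* argsAll[] = { &d };
    void* argsHi[]  = { &dHi };

    // Root node A touches the whole buffer.
    cudaGraphNode_t a, b1, b2, c;
    p.blockDim = dim3(64); p.kernelParams = argsAll;
    cudaGraphAddKernelNode(&a, graph, nullptr, 0, &p);

    // Fork: B1 and B2 both depend on A and run on disjoint halves.
    p.blockDim = dim3(32); p.kernelParams = argsAll;
    cudaGraphAddKernelNode(&b1, graph, &a, 1, &p);
    p.kernelParams = argsHi;
    cudaGraphAddKernelNode(&b2, graph, &a, 1, &p);

    // Join: C waits on both branches before running.
    cudaGraphNode_t joinDeps[] = { b1, b2 };
    p.blockDim = dim3(64); p.kernelParams = argsAll;
    cudaGraphAddKernelNode(&c, graph, joinDeps, 2, &p);

    // Instantiate once, then launch the whole DAG as a single unit.
    cudaGraphExec_t exec;
    cudaGraphInstantiate(&exec, graph, nullptr, nullptr, 0);
    cudaGraphLaunch(exec, 0);
    cudaDeviceSynchronize();

    cudaGraphExecDestroy(exec);
    cudaGraphDestroy(graph);
    cudaFree(d);
    return 0;
}
```

The payoff of the graph model is that the dependency analysis and launch overhead are paid once at instantiation, after which the whole fork/join structure can be replayed cheaply with a single `cudaGraphLaunch`.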

Compiler

In this release, the CUDA C and CUDA C++ compiler, nvcc, is found in the bin/ directory. It is built on top of the NVVM optimizer, which is itself built on top of the LLVM compiler infrastructure. Developers who wish to target NVVM directly can do so using the Compiler SDK, which is available in the nvvm/ directory.
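For readers new to the toolchain, the typical workflow is to write a `.cu` file mixing host and device code and hand it to nvcc, which splits and compiles both halves. A standard SAXPY example (not from the release notes) illustrates this:

```cuda
// saxpy.cu — a minimal kernel to demonstrate the nvcc workflow.
// Compile and run with:  nvcc -o saxpy saxpy.cu && ./saxpy
#include <cstdio>
#include <cuda_runtime.h>

// Device code: each thread handles one element of y = a*x + y.
__global__ void saxpy(int n, float a, const float* x, float* y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = a * x[i] + y[i];
}

int main() {
    const int n = 1 << 20;
    float *x, *y;
    // Unified memory keeps the host-side setup simple.
    cudaMallocManaged(&x, n * sizeof(float));
    cudaMallocManaged(&y, n * sizeof(float));
    for (int i = 0; i < n; ++i) { x[i] = 1.0f; y[i] = 2.0f; }

    saxpy<<<(n + 255) / 256, 256>>>(n, 3.0f, x, y);
    cudaDeviceSynchronize();

    printf("y[0] = %f\n", y[0]);  // 3*1 + 2 = 5
    cudaFree(x);
    cudaFree(y);
    return 0;
}
```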

Tools

New development tools are available in the bin/ directory, including IDEs such as nsight (Linux, Mac) and Nsight VSE (Windows), and debuggers such as cuda-memcheck, cuda-gdb (Linux), and Nsight VSE (Windows). The tools also include several profilers and utilities.

Libraries

This release comes with cuBLASLt, a new lightweight GEMM library with a flexible API and tensor core support for INT8 inputs and FP16 CGEMM split-complex matrix multiplication. CUDA 10.1 also features the selective eigensolvers SYEVDX and SYGVDX in cuSOLVER. A few of the utility libraries available in the lib/ directory (DLLs on Windows are in bin/) are cublas (BLAS), cublas_device (BLAS Kernel Interface), cuda_occupancy (Kernel Occupancy Calculation [header file implementation]), etc.
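To give a feel for cuBLASLt's more flexible API, here is a hedged sketch of a single-precision GEMM (D = alpha·A·B + beta·C). Error checking is omitted, `lt_sgemm` is an illustrative name, and the `cublasLtMatmulDescCreate` signature shown matches the CUDA 10.x headers; later toolkits add a separate compute-type argument:

```cuda
#include <cublasLt.h>
#include <cuda_runtime.h>

// Multiply column-major device matrices: C = A (m x k) * B (k x n).
void lt_sgemm(int m, int n, int k,
              const float* A, const float* B, float* C) {
    cublasLtHandle_t handle;
    cublasLtCreate(&handle);

    // Describe the operation and the three matrix layouts separately —
    // this separation is what makes the API "flexible".
    cublasLtMatmulDesc_t op;
    cublasLtMatmulDescCreate(&op, CUDA_R_32F);

    cublasLtMatrixLayout_t la, lb, lc;
    cublasLtMatrixLayoutCreate(&la, CUDA_R_32F, m, k, m);
    cublasLtMatrixLayoutCreate(&lb, CUDA_R_32F, k, n, k);
    cublasLtMatrixLayoutCreate(&lc, CUDA_R_32F, m, n, m);

    const float alpha = 1.0f, beta = 0.0f;
    // No algo descriptor supplied (nullptr): the library falls back
    // to a heuristic default. No workspace, default stream.
    cublasLtMatmul(handle, op, &alpha, A, la, B, lb,
                   &beta, C, lc, C, lc,
                   nullptr, nullptr, 0, 0);

    cublasLtMatrixLayoutDestroy(la);
    cublasLtMatrixLayoutDestroy(lb);
    cublasLtMatrixLayoutDestroy(lc);
    cublasLtMatmulDescDestroy(op);
    cublasLtDestroy(handle);
}
```

Unlike the classic cuBLAS GEMM call, the operation descriptor and matrix layouts are first-class objects here, which is how cuBLASLt exposes choices such as INT8 inputs and tensor core usage without multiplying entry points.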

For more details, check out the official post by NVIDIA.

Read Next

Implementing color and shape-based object detection and tracking with OpenCV and CUDA [Tutorial]

ClojureCUDA 0.6.0 now supports CUDA 10

Stable release of CUDA 10.0 out, with Turing support, tools and library changes