Search icon CANCEL
Subscription
0
Cart icon
Your Cart (0 item)
Close icon
You have no products in your basket yet
Save more on your purchases! discount-offer-chevron-icon
Savings automatically calculated. No voucher code required.
Arrow left icon
Explore Products
Best Sellers
New Releases
Books
Videos
Audiobooks
Learning Hub
Newsletter Hub
Free Learning
Arrow right icon
timer SALE ENDS IN
0 Days
:
00 Hours
:
00 Minutes
:
00 Seconds

CUDA 10.1 released with new tools, libraries, improved performance and more

Save for later
  • 120 min read
  • 2019-02-28 07:01:45

article-image
Yesterday, the team at NVIDIA released CUDA 10.1 with a new lightweight GEMM library, new functionalities and performance updates to existing libraries, and improvements to the CUDA Graphs APIs.

What’s new in CUDA 10.1?


Now there are new encoding and batched decoding functionalities in nvJPEG. This release also features faster performance for a broad set of random number generators in cuRAND. In this release, there is improved performance and support for fork/join kernels in CUDA Graphs APIs.

Compiler


In this release, the CUDA-C and CUDA-C++ compiler, nvcc, are found in the bin/ directory. They are built on top of the NVVM optimizer, which itself is built on top of the LLVM compiler infrastructure. Developers who are willing to target NVVM directly can do so by using the Compiler SDK, which is available in the nvvm/directory.

Tools


There are new development tools available in the bin/ directory including, few IDEs like nsight (Linux, Mac), Nsight VSE (Windows) and debuggers like cuda-memcheck, cuda-gdb (Linux), Nsight VSE (Windows). The tools also include a few profilers and utilities.

Unlock access to the largest independent learning library in Tech for FREE!
Get unlimited access to 7500+ expert-authored eBooks and video courses covering every tech area you can think of.
Renews at $19.99/month. Cancel anytime

Libraries


This release comes with cuBLASLt, a new lightweight GEMM library with a flexible API and tensor core support for INT8 inputs and FP16 CGEMM split-complex matrix multiplication. CUDA 10.1 also features selective eigensolvers SYEVDX and SYGVDX in cuSOLVER. Few of the available utility libraries in the lib/ directory (DLLs on Windows are in bin/) are cublas (BLAS), cublas_device (BLAS Kernel Interface), cuda_occupancy (Kernel Occupancy Calculation [header file implementation]), etc.

To know more about this news in detail, check out the post by Nvidia.

Implementing color and shape-based object detection and tracking with OpenCV and CUDA [Tutorial]

ClojureCUDA 0.6.0 now supports CUDA 10

Stable release of CUDA 10.0 out, with Turing support, tools and library changes