Data

NVIDIA releases Kaolin, a PyTorch library to accelerate research in 3D computer vision and AI

3 min read

Deep learning and 3D vision research have led to major developments in the field of robotics and computer graphics. However, there is a dearth of systems that allow easy loading of popular 3D datasets and get the 3D data across various representations converted into modern machine learning frameworks. To overcome this barrier, researchers at NVIDIA have developed a 3D deep learning library for PyTorch called ‘Kaolin’. Last week, the researchers published the details of Kaolin in paper titled “Kaolin: A PyTorch Library for Accelerating 3D Deep Learning Research”.

Kaolin provides an efficient implementation of all core modules that are required to build 3D deep learning applications. According to NVIDIA, Kaolin can slash the job of preparing a 3D model for deep learning from 300 lines of code down to just five.

Key features offered by Kaolin

  • It supports all popular 3D representations like Polygon meshes, Pointclouds, Voxel grid, Signed distance functions, and Depth images.
  • It enables complex 3D datasets to be loaded into machine-learning frameworks, irrespective of how they’re represented or will be rendered. It can be implemented in diverse fields for instance robotics, self-driving cars, medical imaging, and virtual reality.
  • Kaolin has a suite of 3D geometric functions that allow manipulation of 3D content. Several rigid body transformations can be implemented in a variety of parameterizations like Euler angles, Lie groups, and Quaternions. It also permits differentiable image warping layers and also allows for 3D-2D projection, and 2D-3D back projection.
  • Kaolin reduces the large overhead involved in file handling, parsing, and augmentation into a single function call and renders support to many 3D datasets like ShapeNet and PartNet. The access to all data is provided via extensions to the PyTorch Dataset and DataLoader classes which makes pre-processing and loading 3D data simple and intuitive.

Kaolin’s modular differentiable renderer

A differentiable renderer is a process that supplies pixels as a function of model parameters to simulate a physical imaging system. It also supplies derivatives of the pixel values with respect to those parameters. With an aim to allow users the easy use of popular differentiable rendering methods, Kaolin provides a flexible and modular differentiable renderer.

It defines an abstract base class called ‘DifferentiableRenderer’ which contains abstract methods for each component in a rendering pipeline. The abstract methods allowed in Kaolin include geometric transformations, lighting, shading, rasterization, and projection. It also supports multiple lighting, shading, projection, and rasterization modes.

One of the important aspects of any computer vision task is visualizing data. Kaolin delivers visualization support for all of computer vision representation types. It is implemented via lightweight visualization libraries such as Trimesh, and pptk for running time visualization.

The researchers say, “While we view Kaolin as a major step in accelerating 3D DL research, the efforts do not stop here. We intend to foster a strong open-source community around Kaolin, and welcome contributions from other 3D deep learning researchers and practitioners.” The researchers are hopeful that the 3D community will try out Kaolin, and contribute to its development.

Many developers have expressed interest in the Kaolin PyTorch Library.

Read the research paper for more details about Kaolin’s roadmap. You can also check out NVIDIA’s official announcement.

Read Next

Facebook releases PyTorch 1.3 with named tensors, PyTorch Mobile, 8-bit model quantization, and more

Transformers 2.0: NLP library with deep interoperability between TensorFlow 2.0 and PyTorch, and 32+ pretrained models in 100+ languages

Introducing ESPRESSO, an open-source, PyTorch based, end-to-end neural automatic speech recognition (ASR) toolkit for distributed training across GPUs

Baidu adds Paddle Lite 2.0, new development kits, EasyDL Pro, and other upgrades to its PaddlePaddle deep learning platform

CNCF announces Helm 3, Kubernetes package manager and tool to manage charts and libraries

Vincy Davis

A born storyteller turned writer!

Share
Published by
Vincy Davis

Recent Posts

Top life hacks for prepping for your IT certification exam

I remember deciding to pursue my first IT certification, the CompTIA A+. I had signed…

3 years ago

Learn Transformers for Natural Language Processing with Denis Rothman

Key takeaways The transformer architecture has proved to be revolutionary in outperforming the classical RNN…

3 years ago

Learning Essential Linux Commands for Navigating the Shell Effectively

Once we learn how to deploy an Ubuntu server, how to manage users, and how…

3 years ago

Clean Coding in Python with Mariano Anaya

Key-takeaways:   Clean code isn’t just a nice thing to have or a luxury in software projects; it's a necessity. If we…

3 years ago

Exploring Forms in Angular – types, benefits and differences   

While developing a web application, or setting dynamic pages and meta tags we need to deal with…

3 years ago

Gain Practical Expertise with the Latest Edition of Software Architecture with C# 9 and .NET 5

Software architecture is one of the most discussed topics in the software industry today, and…

3 years ago