Deep learning and 3D vision research have led to major developments in robotics and computer graphics. However, there is a dearth of systems that make it easy to load popular 3D datasets and convert 3D data, across its many representations, into forms that modern machine learning frameworks can consume. To overcome this barrier, researchers at NVIDIA have developed ‘Kaolin’, a 3D deep learning library for PyTorch. Last week, the researchers published the details of Kaolin in a paper titled “Kaolin: A PyTorch Library for Accelerating 3D Deep Learning Research”.
Research efforts in #3D computer vision and #AI are on the rise. To accelerate 3D #deeplearning research, NVIDIA releases Kaolin as a PyTorch library. See how researchers use Kaolin to move 3D models into the realm of neural networks. https://t.co/7wyudMnO6o
— NVIDIA AI (@NvidiaAI) November 13, 2019
Kaolin provides efficient implementations of all the core modules required to build 3D deep learning applications. According to NVIDIA, Kaolin can slash the job of preparing a 3D model for deep learning from 300 lines of code down to just five.
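To get a sense of the boilerplate such a library removes, here is a minimal sketch (illustrative only, not Kaolin's actual code) of the kind of hand-rolled mesh parsing that a single loader call replaces:

```python
# Minimal hand-rolled Wavefront OBJ parser -- the kind of boilerplate
# a loader library collapses into one function call.
# (Illustrative sketch; real OBJ files have many more record types.)

def parse_obj(text):
    """Parse 'v' (vertex) and 'f' (face) records from an OBJ string."""
    vertices, faces = [], []
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue
        if parts[0] == "v":
            vertices.append(tuple(float(x) for x in parts[1:4]))
        elif parts[0] == "f":
            # OBJ face indices are 1-based and may carry texture/normal
            # indices after a slash (e.g. "1/1/1"); keep the vertex index only.
            faces.append(tuple(int(p.split("/")[0]) - 1 for p in parts[1:]))
    return vertices, faces

# A single triangle as an OBJ snippet.
obj_text = """
v 0.0 0.0 0.0
v 1.0 0.0 0.0
v 0.0 1.0 0.0
f 1 2 3
"""

verts, faces = parse_obj(obj_text)
print(len(verts), faces[0])  # -> 3 (0, 1, 2)
```

And this covers only one file format and no augmentation; multiply it across formats and datasets and the reported savings become plausible.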
Key features offered by Kaolin
- It supports all popular 3D representations, including polygon meshes, point clouds, voxel grids, signed distance functions, and depth images.
- It enables complex 3D datasets to be loaded into machine learning frameworks, irrespective of how they are represented or will be rendered, and can be applied in diverse fields such as robotics, self-driving cars, medical imaging, and virtual reality.
- Kaolin has a suite of 3D geometric functions for manipulating 3D content. Rigid-body transformations can be expressed in a variety of parameterizations, such as Euler angles, Lie groups, and quaternions. It also provides differentiable image warping layers and supports 3D-to-2D projection and 2D-to-3D back-projection.
- Kaolin reduces the large overhead of file handling, parsing, and augmentation to a single function call and supports many 3D datasets, such as ShapeNet and PartNet. Access to all data is provided via extensions to the PyTorch Dataset and DataLoader classes, which makes pre-processing and loading 3D data simple and intuitive.
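As a concrete illustration of one of the rigid-body parameterizations mentioned above, here is a hedged pure-Python sketch (a hypothetical helper, not Kaolin's API) of rotating a 3D point with a unit quaternion:

```python
import math

def quat_rotate(q, v):
    """Rotate 3D vector v by unit quaternion q = (w, x, y, z).

    Uses the identity v' = v + 2*r x (r x v + w*v), where r = (x, y, z).
    Illustrative helper only, not Kaolin's API.
    """
    w, x, y, z = q
    r = (x, y, z)

    def cross(a, b):
        return (a[1] * b[2] - a[2] * b[1],
                a[2] * b[0] - a[0] * b[2],
                a[0] * b[1] - a[1] * b[0])

    t = cross(r, v)
    t = tuple(ti + w * vi for ti, vi in zip(t, v))
    u = cross(r, t)
    return tuple(vi + 2 * ui for vi, ui in zip(v, u))

# 90-degree rotation about the z-axis: q = (cos(45 deg), 0, 0, sin(45 deg)).
half = math.pi / 4
q = (math.cos(half), 0.0, 0.0, math.sin(half))
print(quat_rotate(q, (1.0, 0.0, 0.0)))  # -> approximately (0, 1, 0)
```

A library like Kaolin additionally provides conversions between such parameterizations and implements them as differentiable tensor operations, so they can sit inside a training loop.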
Kaolin’s modular differentiable renderer
A differentiable renderer simulates a physical imaging system: it produces pixel values as a function of model parameters and also supplies the derivatives of those pixel values with respect to the parameters. To give users easy access to popular differentiable rendering methods, Kaolin provides a flexible and modular differentiable renderer.
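To make the idea concrete, here is a toy one-parameter "renderer" (illustrative only, not Kaolin code): pixel intensities come from a 1D Gaussian blob whose center is the model parameter, and an analytic derivative of every pixel with respect to that center is available alongside the pixels:

```python
import math

def render(center, width=5, sigma=1.0):
    """Toy 1D 'renderer': pixel i gets intensity exp(-(i - center)^2 / (2 sigma^2))."""
    return [math.exp(-(i - center) ** 2 / (2 * sigma ** 2)) for i in range(width)]

def render_grad(center, width=5, sigma=1.0):
    """Analytic derivative of each pixel value with respect to the center parameter."""
    return [math.exp(-(i - center) ** 2 / (2 * sigma ** 2)) * (i - center) / sigma ** 2
            for i in range(width)]

pixels = render(2.0)
grads = render_grad(2.0)
# The gradient tells an optimizer how moving the blob's center changes every
# pixel -- the same kind of signal a differentiable renderer supplies for
# full 3D scene parameters such as pose, shape, and lighting.
```

In a real differentiable renderer these derivatives flow through the full rendering pipeline via automatic differentiation rather than hand-derived formulas.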
It defines an abstract base class called ‘DifferentiableRenderer’ containing an abstract method for each component of a rendering pipeline: geometric transformations, lighting, shading, rasterization, and projection. It also supports multiple lighting, shading, projection, and rasterization modes.
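The described structure could be sketched like this in Python (a hedged illustration of the pattern; the method names are assumptions, not Kaolin's actual class definitions):

```python
from abc import ABC, abstractmethod

class DifferentiableRenderer(ABC):
    """Sketch of an abstract rendering pipeline in the style the paper
    describes; method names are illustrative, not Kaolin's API."""

    @abstractmethod
    def transform(self, vertices):           # geometric transformations
        ...

    @abstractmethod
    def lighting(self, vertices, normals):   # lighting model
        ...

    @abstractmethod
    def shade(self, colors):                 # shading mode
        ...

    @abstractmethod
    def project(self, vertices):             # 3D -> 2D projection
        ...

    @abstractmethod
    def rasterize(self, projected, colors):  # rasterization
        ...

    def render(self, vertices, normals):
        """Compose the pipeline; subclasses swap in concrete modes."""
        v = self.transform(vertices)
        colors = self.shade(self.lighting(v, normals))
        return self.rasterize(self.project(v), colors)
```

Swapping a lighting or rasterization mode then means subclassing and overriding only that one stage, which is what makes the renderer modular.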
Visualizing data is an important aspect of any computer vision task. Kaolin delivers visualization support for all of its 3D representation types, implemented via lightweight visualization libraries such as Trimesh and pptk for run-time visualization.
The researchers say, “While we view Kaolin as a major step in accelerating 3D DL research, the efforts do not stop here. We intend to foster a strong open-source community around Kaolin, and welcome contributions from other 3D deep learning researchers and practitioners.” The researchers are hopeful that the 3D community will try out Kaolin, and contribute to its development.
Many developers have expressed interest in the Kaolin PyTorch Library.
NVIDIA just released Kaolin, A PyTorch Library for Accelerating 3D Deep Learning. They reported a 110X speed up for MeshCNN!! I am looking forward to using their implementation https://t.co/nq5h97QUDT pic.twitter.com/WYtHqcNMg7
— Rana Hanocka (@RanaHanocka) November 13, 2019
Wow, I am really excited about this! I see huge potential of 3D deep learning impacting VR and AR, now I have an accessible library to experiment with ideas! Are there GPU requirements and pretrained models available with this repo?
— Andrew Mendez (@AndrewMendez19) November 13, 2019