NVIDIA has open sourced NVVL, a library that provides GPU-accelerated video decoding for deep learning training.
Quick rundown of NVIDIA NVVL:
- The NVIDIA NVVL library uses hardware acceleration to load sequences of video frames, making it easier to train machine learning algorithms on video.
- It uses FFmpeg’s libraries to parse and read the compressed packets from video files, and the video decoding hardware available on NVIDIA GPUs to off-load and accelerate the decoding of these packets, providing a ready-for-training tensor in GPU device memory.
- NVVL can additionally perform data augmentation while loading the frames. Frames can be scaled, cropped, and flipped horizontally using the GPU's dedicated texture mapping units.
- It significantly reduces the demands on the storage and I/O systems during training by using compressed video files instead of individual frame image files, saving up to 40X on storage space and bandwidth. It also reduces CPU load by 2X when training on video datasets.
NVVL depends on the following:
- CUDA Toolkit. NVIDIA NVVL works well with versions 8.0 and above, and performs better with CUDA 9.0 or later.
- FFmpeg’s libavformat, libavcodec, libavfilter, and libavutil. These can be installed from source, as in the example Dockerfiles, or from the Ubuntu 16.04 packages libavcodec-dev libavfilter-dev libavformat-dev libavutil-dev.
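Before building, it can help to confirm that the CUDA Toolkit and the FFmpeg libraries listed above are visible on the machine. The following is a minimal sketch using only the Python standard library; it is not part of NVVL. The function name `check_nvvl_prerequisites` is hypothetical, and the check assumes that the CUDA Toolkit can be detected through its `nvcc` compiler on the PATH. Note that `find_library` looks for the shared libraries themselves, so it cannot tell whether the `-dev` header packages are installed.

```python
import shutil
from ctypes.util import find_library

# FFmpeg shared libraries listed above as NVVL dependencies.
FFMPEG_LIBS = ("avformat", "avcodec", "avfilter", "avutil")


def check_nvvl_prerequisites():
    """Return a dict mapping each prerequisite to what was found, or None."""
    # find_library returns a library name or path if the linker can locate it.
    found = {name: find_library(name) for name in FFMPEG_LIBS}
    # Assumption: the CUDA Toolkit is detectable via the nvcc compiler on PATH.
    found["nvcc"] = shutil.which("nvcc")
    return found


if __name__ == "__main__":
    for name, location in check_nvvl_prerequisites().items():
        print(f"{name}: {location or 'not found'}")
```

Any entry reported as "not found" points to a dependency that still needs to be installed, or one that lives in a location the system does not search by default.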
NVIDIA has also provided a super-resolution example project that quantifies the performance advantage of using NVVL. When training this example project on an NVIDIA DGX-1, the CPU load when using NVVL was 50-60% of the load seen when using a normal dataloader for .png files.
For a complete list of details and code files, visit the NVIDIA GitHub.