Last month, Nathan Egge, a Senior Research Engineer at Mozilla, explained the technical details behind AV1 in depth at the Mile High Video Workshop in Denver. AV1 is a new open source, royalty-free video codec that promises to help companies and individuals transmit high-quality video over the internet efficiently.
AV1 is developed by the Alliance for Open Media (AOMedia), an association of firms from the semiconductor industry, video on demand providers, and web browser developers, founded in 2015. Mozilla joined AOMedia as a founding member.
AV1 was created for a broad set of industry use cases such as video on demand/streaming, video conferencing, screen sharing, video game streaming, and broadcast. It is widely supported and adopted, and delivers at least 30% better compression than current-generation video codecs.
The alliance hit a key milestone with the release of the AV1 1.0.0 specification in June this year. The codec has seen increasing interest from various companies; for instance, YouTube launched its AV1 Beta Playlist in September.
The following diagram shows the various stages in the working of a video codec:
We will cover the tools and algorithms used in some of these stages. Let's look at some of the technical details from Egge's talk:
Profiles
Profiles specify the bit depths and chroma subsampling formats supported. AV1 defines three profiles: Main, High, and Professional, which differ in their bit depth and chroma subsampling. The following table shows their bit depth and chroma subsampling:

| Profile | Bit depth | Chroma subsampling |
| --- | --- | --- |
| Main | 8-bit and 10-bit | 4:0:0 and 4:2:0 |
| High | 8-bit and 10-bit | 4:0:0, 4:2:0, and 4:4:4 |
| Professional | 8-bit, 10-bit, and 12-bit | 4:0:0, 4:2:0, 4:2:2, and 4:4:4 |
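The profile constraints can be captured in a small lookup. This is a hypothetical helper (the names and structure are my own, not part of any AV1 API); the capability data mirrors the profile descriptions above.

```python
# Hypothetical sketch: map each AV1 profile to the bit depths and chroma
# subsampling formats it allows (data as described in the text above).
PROFILES = {
    "Main":         {"bit_depths": (8, 10),     "subsampling": ("4:0:0", "4:2:0")},
    "High":         {"bit_depths": (8, 10),     "subsampling": ("4:0:0", "4:2:0", "4:4:4")},
    "Professional": {"bit_depths": (8, 10, 12), "subsampling": ("4:0:0", "4:2:0", "4:2:2", "4:4:4")},
}

def profiles_supporting(bit_depth: int, subsampling: str) -> list:
    """Return the profiles that can carry the given bit depth and subsampling."""
    return [name for name, caps in PROFILES.items()
            if bit_depth in caps["bit_depths"] and subsampling in caps["subsampling"]]
```

For example, 12-bit or 4:2:2 content forces the Professional profile, while plain 8-bit 4:2:0 fits any of the three.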
High-level syntax
VP9 has a concept of superframes, which consolidate multiple coded frames into a single chunk; this mechanism becomes complicated at some point.
AV1 comes with a high-level syntax that includes a sequence header, frame headers, tile groups, and tiles. The sequence header starts a video stream, frame headers appear at the beginning of each frame, a tile group is an independent group of tiles, and tiles can be decoded independently.
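In the bitstream, each of these syntax elements is carried in an open bitstream unit (OBU) that begins with a one-byte header. The sketch below parses that header byte per the AV1 1.0.0 layout; it is simplified in that it ignores the optional extension byte and the LEB128-coded size that may follow.

```python
# Sketch of parsing the one-byte OBU (open bitstream unit) header that
# precedes each AV1 syntax element. Field layout follows the AV1 1.0.0
# spec: forbidden bit, 4-bit type, extension flag, has-size flag, reserved.
OBU_TYPES = {
    1: "OBU_SEQUENCE_HEADER",
    2: "OBU_TEMPORAL_DELIMITER",
    3: "OBU_FRAME_HEADER",
    4: "OBU_TILE_GROUP",
    5: "OBU_METADATA",
    6: "OBU_FRAME",
}

def parse_obu_header(byte: int) -> dict:
    """Split an OBU header byte into its fields (MSB first)."""
    return {
        "forbidden":      (byte >> 7) & 1,   # must be 0 in a valid stream
        "type":           OBU_TYPES.get((byte >> 3) & 0xF, "reserved"),
        "extension_flag": (byte >> 2) & 1,   # temporal/spatial layer ids follow
        "has_size_field": (byte >> 1) & 1,   # OBU size in LEB128 follows
    }
```

A typical AV1 stream opens with the byte `0x0A`, which decodes to a sequence header OBU with a size field.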
Multi-symbol entropy coder
Unlike VP9, which uses a tree-based, non-adaptive boolean arithmetic encoder to encode all syntax elements, AV1 uses a symbol-to-symbol adaptive multi-symbol arithmetic coder. Each syntax element is a member of a specific alphabet of N elements, and a context is a set of N probabilities together with a count to facilitate fast early adaptation.
Transform types
In addition to the DCT and ADST transform types, AV1 introduces two other transforms, flipped ADST and the identity transform, as extended transform types. The identity transform lets you effectively code residual blocks containing edges and lines. AV1 thus gains a total of sixteen horizontal and vertical transform type combinations.
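The sixteen combinations follow directly from pairing one of the four 1-D kernels horizontally with one vertically; the enumeration below uses common shorthand names (IDTX for the identity transform).

```python
from itertools import product

# The four 1-D transform kernels AV1 can apply along each axis; IDTX is
# the identity transform. Pairing one horizontal kernel with one vertical
# kernel yields the sixteen 2-D combinations mentioned in the text.
KERNELS = ["DCT", "ADST", "FLIPADST", "IDTX"]
COMBOS = list(product(KERNELS, repeat=2))  # (horizontal, vertical) pairs
```

The pure identity pair (IDTX, IDTX) is the one that leaves sharp edges and lines in the residual untouched, which is why it helps for screen content.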
Intra prediction modes
Along with the 8 main directional modes from VP9, up to 56 more directions are added, though not all of them are available at smaller block sizes. The following are some of the prediction modes introduced in AV1:
- Smooth H + V modes allow you to smoothly interpolate between the values in the left column and the last value in the above row.
- Palette mode is introduced to the intra coder as a general extra coding tool. It will be especially useful for artificial videos like screen capture and games, where blocks can be approximated by a small number of unique colors.
The palette predictor for each plane of a block is depicted by:
- A color palette, with 2 to 8 colors
- Color indices for all pixels in the block
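A toy version of that representation can be sketched as below. This is a hypothetical illustration, not libaom's palette search: it greedily takes the most frequent sample values as the palette and maps each pixel to its nearest palette entry.

```python
from collections import Counter

# Hypothetical sketch (not the actual encoder algorithm): approximate a
# single-plane block with a palette of at most 8 colors plus a per-pixel
# index map, as in AV1's palette mode representation.
def palettize(block, max_colors=8):
    # Most frequent sample values become the palette (2-8 entries in AV1).
    palette = [c for c, _ in Counter(block).most_common(max_colors)]
    nearest = lambda px: min(range(len(palette)), key=lambda i: abs(palette[i] - px))
    indices = [nearest(px) for px in block]   # color index for every pixel
    return palette, indices

# A screen-content-like block with only three distinct values maps exactly.
palette, indices = palettize([0, 0, 255, 255, 128, 0, 255, 128])
```

For blocks like this one, with few unique colors, the palette plus index map reproduces the block exactly, which is the case palette mode targets.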
- Chroma from Luma (CfL) is a chroma-only intra predictor that models chroma pixels as a linear function of coincident reconstructed luma pixels.
First, the reconstructed luma pixels are subsampled to the chroma resolution, and then the DC component is removed to form the AC contribution. To approximate the chroma AC component from the AC contribution, instead of requiring the decoder to derive scaling parameters, CfL determines the parameters based on the original chroma pixels and signals them in the bitstream. This reduces decoder complexity and yields more precise predictions. The DC prediction is computed using the intra DC mode, which is sufficient for most chroma content and has mature fast implementations.
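The core of CfL can be sketched as a least-squares fit, as an encoder with access to the original chroma pixels might do before signaling the scaling parameter (this omits the luma subsampling step and AV1's quantized signaling; it only shows the linear model).

```python
# Illustrative sketch of the CfL idea: model chroma as a linear function of
# the DC-removed (AC) luma, fitting the scaling parameter alpha by least
# squares as an encoder would before signaling it in the bitstream.
def cfl_alpha(luma, chroma):
    luma_dc = sum(luma) / len(luma)
    luma_ac = [l - luma_dc for l in luma]             # AC contribution
    num = sum(l * c for l, c in zip(luma_ac, chroma))
    den = sum(l * l for l in luma_ac)
    return num / den if den else 0.0

def cfl_predict(luma, alpha, chroma_dc):
    """chroma ~= alpha * luma_AC + chroma_DC (DC from intra DC mode)."""
    luma_dc = sum(luma) / len(luma)
    return [alpha * (l - luma_dc) + chroma_dc for l in luma]

# Chroma that really is 0.5 * luma_AC + 100 recovers alpha = 0.5 exactly.
alpha = cfl_alpha([10, 20, 30, 40], [92.5, 97.5, 102.5, 107.5])
```

Because the parameter is fitted against the true chroma and then transmitted, the decoder only evaluates the linear model, which is where the complexity saving comes from.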
Constrained Directional Enhancement Filter (CDEF)
CDEF is a detail-preserving deringing filter, which is designed to be applied after deblocking. It works by estimating edge directions followed by applying a non-separable non-linear low-pass directional filter of size 5×5 with 12 non-zero weights. In order to avoid extra signaling, the decoder uses a normative fast search algorithm to compute the direction per 8×8 block that minimizes the quadratic error from a perfect directional pattern.
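The direction search can be sketched as follows. This is a simplified illustration, not the normative algorithm: it tests only four candidate directions (CDEF tests eight, with its own line assignments) and scores each by how much of the block's energy is explained by per-line means, which is equivalent to minimizing the quadratic error from a perfect directional pattern.

```python
from collections import defaultdict

# Simplified sketch of CDEF-style direction estimation on an 8x8 block:
# group pixels into lines for each candidate direction and pick the
# direction whose line-wise means capture the most energy (equivalently,
# minimize the quadratic error from a perfect directional pattern).
DIRECTIONS = {
    "horizontal":    lambda i, j: i,       # lines are rows
    "vertical":      lambda i, j: j,       # lines are columns
    "diagonal":      lambda i, j: i + j,
    "anti-diagonal": lambda i, j: i - j,
}

def best_direction(block):  # block: 8x8 list of lists of pixel values
    def score(key):
        lines = defaultdict(list)
        for i in range(8):
            for j in range(8):
                lines[key(i, j)].append(block[i][j])
        # Energy captured by each line's mean; higher means a better fit.
        return sum(sum(px) ** 2 / len(px) for px in lines.values())
    return max(DIRECTIONS, key=lambda name: score(DIRECTIONS[name]))

# Rows of constant value form a perfectly horizontal pattern.
horiz = [[10 * i] * 8 for i in range(8)]
```

Because the search is normative (the decoder runs the same fast algorithm), the chosen direction never has to be signaled in the bitstream.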
Film Grain Synthesis
In AV1, film grain synthesis is a normative post-processing step applied outside of the encoding/decoding loop. Film grain is abundant in TV and movie content and needs to be preserved while encoding, but its random nature makes it difficult to compress with traditional coding tools.
In film grain synthesis, the grain is removed from the content before compression, its parameters are estimated and then sent in the AV1 bitstream. The grain is then synthesized based on the received parameters and added to the reconstructed video. For grainy content, film grain synthesis significantly reduces the bitrate necessary to reconstruct the grain with sufficient quality.
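The synthesis side of that pipeline can be sketched as below. This is only illustrative: AV1's normative model generates grain with a 2-D autoregressive filter and per-intensity scaling, whereas this 1-D version just shows the structure of synthesizing grain from signaled parameters and adding it back to the reconstruction.

```python
import random

# Illustrative sketch of the synthesis side of film grain: generate grain
# with a simple 1-D autoregressive model from signaled parameters, then add
# it to the reconstructed pixels. AV1's normative model is 2-D AR with more
# coefficients and intensity-dependent scaling.
def synthesize_grain(n, ar_coeff, scale, seed=0):
    rng = random.Random(seed)   # seed is signaled, so the grain is reproducible
    grain, prev = [], 0.0
    for _ in range(n):
        prev = ar_coeff * prev + rng.gauss(0, 1)  # AR(1) noise sample
        grain.append(scale * prev)
    return grain

def apply_grain(pixels, grain):
    """Add synthesized grain to reconstructed 8-bit pixels, with clamping."""
    return [max(0, min(255, round(p + g))) for p, g in zip(pixels, grain)]
```

Since only the parameters (and a seed) travel in the bitstream rather than the grain itself, the bitrate cost is tiny compared with coding the noise directly.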
You can watch Into the Depths: The Technical Details behind AV1 by Nathan Egge on YouTube: