Nvidia and the MIT Computer Science & Artificial Intelligence Laboratory (CSAIL) have open-sourced their video-to-video synthesis model. The model uses a generative adversarial learning framework to generate high-resolution, photorealistic and temporally coherent video from various input formats, including segmentation masks, sketches and poses.

Video-to-video synthesis has received less research attention than image-to-image translation. It aims to solve the low visual quality and temporal incoherence of video produced by existing image synthesis approaches. The research group proposed a novel video-to-video synthesis approach capable of synthesizing 2K resolution videos of street scenes up to 30 seconds long.

The authors validated the model extensively on various datasets, where it showed better results than existing approaches both quantitatively and qualitatively. When the method was extended to multimodal video synthesis, it produced videos with different visual properties from identical input data, while keeping high resolution and coherency.

The researchers suggested the model may be improved in the future by adding 3D cues such as depth maps to better synthesize turning cars, using object tracking to ensure an object maintains its colour and appearance throughout the video, and training with coarser semantic labels to solve issues in semantic manipulation.

The Video-to-Video Synthesis paper is on arXiv, and the team’s model and data can be found on the GitHub page.
