Data

Video-to-video synthesis method: A GAN by NVIDIA & MIT CSAIL is now Open source

1 min read

Nvidia and the MIT Computer Science & Artificial Intelligence Laboratory (CSAIL) have open-sourced their video-to-video synthesis model. A generative adversarial learning framework is used as a method to generate high-resolution, photorealistic and temporally coherent results with various input format, including segmentation masks, sketches and poses.

There has been less research into video to video synthesis compared to image to image translation. Video to video synthesis aims to solve the problem of low visual quality and incoherency of video results in existing image synthesis approach. The research group proposed a novel video-to-video synthesis approach capable of synthesizing 2K resolution videos of street scenes up to 30 seconds long.

An extensive experimental validation was performed on various datasets by the authors and the model showed better results than existing approaches in quantitative and qualitative perspectives. When this method was extended to multimodal video synthesis with identical input data, it produced new visual properties with high resolution and coherency.

Researchers suggested the model may be improved in the future by adding additional 3D cues such as depth maps to better synthesize turning cars. We can use object tracking to ensure an object maintains its colour and appearance throughout the video; and training with coarser semantic labels to solve issues in semantic manipulation.

The Video-to-Video Synthesis paper is on arxiv, the team’s model and data can be found on the Github page.

Read Next:

NVIDIA shows off GeForce RTX, real-time raytracing GPUs, as the holy grail of computer graphics to gamers

Nvidia unveils new Turing architecture: “The world’s first ray tracing GPU”

Baidu announces ClariNet, neural network for text-to-speech synthesis

Fatema Patrawala

Being a Senior Content Marketing Editor at Packt Publishing, I handle vast array of content in the tech space ranging from Data science, Web development, Programming, Cloud & Networking, IoT, Security and Game development. With prior experience and understanding of Marketing I aspire to grow leaps and bounds in the Content & Digital Marketing field. On the personal front I am an ambivert and love to read inspiring articles and books on life and in general.

Share
Published by
Fatema Patrawala

Recent Posts

Top life hacks for prepping for your IT certification exam

I remember deciding to pursue my first IT certification, the CompTIA A+. I had signed…

3 years ago

Learn Transformers for Natural Language Processing with Denis Rothman

Key takeaways The transformer architecture has proved to be revolutionary in outperforming the classical RNN…

3 years ago

Learning Essential Linux Commands for Navigating the Shell Effectively

Once we learn how to deploy an Ubuntu server, how to manage users, and how…

3 years ago

Clean Coding in Python with Mariano Anaya

Key-takeaways:   Clean code isn’t just a nice thing to have or a luxury in software projects; it's a necessity. If we…

3 years ago

Exploring Forms in Angular – types, benefits and differences   

While developing a web application, or setting dynamic pages and meta tags we need to deal with…

3 years ago

Gain Practical Expertise with the Latest Edition of Software Architecture with C# 9 and .NET 5

Software architecture is one of the most discussed topics in the software industry today, and…

3 years ago