Wouldn’t it be magical if we could watch old black-and-white movie footage and images in color? Deep learning, more precisely GANs, can help here. A recent project by software researcher Jason Antic, called ‘DeOldify’, is a deep learning based effort for colorizing and restoring old images and film footage.
"These happy ladies are salvaging wood from the military barracks in Cork after it had been destroyed by fire…"
Photographer: W.D. Hogan
Date: 1921?@NLIreland Ref.: HOGW 140
— John Breslin (@johnbreslin) May 17, 2019
In one of the sessions at the recent Facebook Developer Conference, held April 30 – May 1, 2019, Antic, along with Jeremy Howard and Uri Manor, talked about how GANs can be used to reconstruct images and videos, such as by increasing their resolution or adding color to black-and-white film. However, they also pointed out that GANs can be slow, difficult, and expensive to train. They demonstrated how to colorize old black-and-white movies and drastically increase the resolution of microscopy images using new PyTorch-based tools from fast.ai, the Salk Institute, and DeOldify that can be trained in just a few hours on a single GPU.
DeOldify makes use of NoGAN training, which combines the benefits of GAN training (wonderful colorization) while eliminating its nasty side effects (like flickering objects in video). NoGAN training is crucial for getting stable, colorful images and video. Antic explains how DeOldify achieves a stable video:
Antic said, “the video is rendered using isolated image generation without any sort of temporal modeling tacked on. The process performs 30-60 minutes of the GAN portion of “NoGAN” training, using 1% to 3% of Imagenet data once. Then, as with still image colorization, we “DeOldify” individual frames before rebuilding the video.”
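In other words, each frame is colorized in isolation and the film is then reassembled. Below is a minimal sketch of that per-frame pipeline, assuming ffmpeg is available on the system; the DeOldify imports and the get_transformed_image call follow the repo’s layout at the time of writing, but the output handling and frame rate here are assumptions (the project’s own video colorizer wraps all of this for you):

```python
# Conceptual sketch of "DeOldify individual frames, then rebuild the video".
# Paths, the frame rate, and get_transformed_image's behavior are assumptions.
import subprocess
from pathlib import Path

from deoldify import device
from deoldify.device_id import DeviceId
device.set(device=DeviceId.GPU0)               # per the project README
from deoldify.visualize import get_image_colorizer

frames, colored = Path('frames'), Path('colored')
frames.mkdir(exist_ok=True)
colored.mkdir(exist_ok=True)

# 1. Split the black-and-white film into still frames.
subprocess.run(['ffmpeg', '-i', 'bw_movie.mp4', str(frames / '%06d.png')],
               check=True)

# 2. Colorize every frame independently; no temporal modeling is involved.
colorizer = get_image_colorizer(artistic=False)
for frame in sorted(frames.glob('*.png')):
    result = colorizer.get_transformed_image(str(frame), render_factor=21)
    result.save(colored / frame.name)

# 3. Rebuild the video from the colorized frames at the source frame rate.
subprocess.run(['ffmpeg', '-framerate', '24', '-i', str(colored / '%06d.png'),
                'color_movie.mp4'], check=True)
```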
The three models in DeOldify
DeOldify includes three models: video, stable, and artistic. Each model has its own strengths, weaknesses, and use cases. The video model is for videos; the other two are for images.
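All three models are exposed through the same interface in the project’s notebooks. A minimal sketch, assuming the repo’s module layout and pretrained weights downloaded per the README (names may change as the project evolves):

```python
# Model selection in DeOldify (names follow the project's notebooks at the
# time of writing; pretrained weights must be downloaded separately).
from deoldify import device
from deoldify.device_id import DeviceId
device.set(device=DeviceId.GPU0)               # choices: CPU, GPU0...GPU7

from deoldify.visualize import get_image_colorizer, get_video_colorizer

artistic = get_image_colorizer(artistic=True)  # most vibrant image results
stable = get_image_colorizer(artistic=False)   # safer landscapes/portraits
video = get_video_colorizer()                  # smooth, flicker-free video

# Colorize one image and plot a before/after comparison.
stable.plot_transformed_image('test_images/photo.jpg', render_factor=35)
```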
A colorized photo of Nikola Tesla, shared on Twitter and described as the "Serbian-American inventor, electrical engineer, mechanical engineer, and futurist who is best known for his contributions to the design of the modern alternating current (AC) electricity supply system" @citnaj pic.twitter.com/gQQv3vsp7i — John Breslin (@johnbreslin), May 10, 2019
The stable model achieves the best results with landscapes and portraits and produces fewer "zombies" (faces or limbs that stay gray rather than being colored in properly). It generally has fewer unusual miscolorations than the artistic model, but it is also less colorful in general.
This model uses a resnet101 backbone on a UNet, with an emphasis on the width of layers on the decoder side. It was trained with 3 critic pretrain/GAN cycle repeats via NoGAN, in addition to the initial generator/critic pretrain/GAN NoGAN training, at 192px. This adds up to a total of 7% of Imagenet data trained once (3 hours of direct GAN training).
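For intuition, here is a heavily simplified PyTorch sketch of what "a resnet backbone on a UNet" means: the pretrained resnet supplies the encoder, and the decoder upsamples while concatenating the encoder’s feature maps. This is not DeOldify’s actual generator (which is built on fastai’s DynamicUnet with spectral normalization and self-attention, discussed later in this piece); the decoder widths are purely illustrative:

```python
# Simplified sketch: torchvision resnet101 as a U-Net encoder. DeOldify's
# real generator is more elaborate; channel widths here are illustrative.
import torch
import torch.nn as nn
from torchvision.models import resnet101

class UpBlock(nn.Module):
    """Upsample, concatenate the matching encoder feature map, convolve."""
    def __init__(self, in_ch, skip_ch, out_ch):
        super().__init__()
        self.up = nn.ConvTranspose2d(in_ch, in_ch // 2, 2, stride=2)
        self.conv = nn.Sequential(
            nn.Conv2d(in_ch // 2 + skip_ch, out_ch, 3, padding=1),
            nn.ReLU(inplace=True))

    def forward(self, x, skip):
        return self.conv(torch.cat([self.up(x), skip], dim=1))

class ResnetUnet(nn.Module):
    def __init__(self):
        super().__init__()
        enc = resnet101(pretrained=True)        # ImageNet-pretrained backbone
        self.stem = nn.Sequential(enc.conv1, enc.bn1, enc.relu)
        self.pool = enc.maxpool
        self.enc1, self.enc2 = enc.layer1, enc.layer2   # 256 / 512 channels
        self.enc3, self.enc4 = enc.layer3, enc.layer4   # 1024 / 2048 channels
        # Wide decoder layers, per the stable model's "emphasis on width".
        self.up1 = UpBlock(2048, 1024, 1024)
        self.up2 = UpBlock(1024, 512, 512)
        self.up3 = UpBlock(512, 256, 256)
        self.head = nn.Sequential(
            nn.Upsample(scale_factor=4, mode='bilinear', align_corners=False),
            nn.Conv2d(256, 3, 1))               # 3-channel color output

    def forward(self, x):
        s = self.stem(x)               # 1/2 resolution
        e1 = self.enc1(self.pool(s))   # 1/4
        e2 = self.enc2(e1)             # 1/8
        e3 = self.enc3(e2)             # 1/16
        e4 = self.enc4(e3)             # 1/32
        d = self.up1(e4, e3)
        d = self.up2(d, e2)
        d = self.up3(d, e1)
        return self.head(d)            # back to full resolution

# DeOldify trains at 192px, so e.g.:
# ResnetUnet()(torch.randn(1, 3, 192, 192)).shape -> (1, 3, 192, 192)
```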
"We believe these are members of the Dobbyn family who lived at Leoville (a big house?) in Waterford. Miss Dobbyn ordered the photograph.
Photographer: Poole Photographic Studios, Waterford
Date: Circa 1892??"#Colourised #DeOldify @NLIreland
— John Breslin (@johnbreslin) May 17, 2019
The artistic model achieves the highest quality results in image coloration, with respect to interesting details and vibrance. However, in order to achieve this, one has to adjust the rendering resolution, or render_factor. Additionally, the model does not do as well as the stable one in a few key common scenarios: nature scenes and portraits.
The artistic model uses a resnet34 backbone on a UNet with an emphasis on the depth of layers on the decoder side. It was trained with 5 critic pretrain/GAN cycle repeats via NoGAN, in addition to the initial generator/critic pretrain/GAN NoGAN training, at 192px. This adds up to a total of 32% of Imagenet data trained once (12.5 hours of direct GAN training). A hypothetical render_factor sweep is sketched below.
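In practice, tuning means trying a few render_factor values per image. A hypothetical sweep (the file path and values are placeholders; per the project README, lower values tend to give more vibrant but less stable color, while higher values give more faithful but more muted results):

```python
# Illustrative render_factor sweep; device setup (see the model-selection
# sketch above) is omitted for brevity.
from deoldify.visualize import get_image_colorizer

colorizer = get_image_colorizer(artistic=True)
# Lower render_factor: more saturated but less stable color.
# Higher render_factor: the model sees a higher-resolution input, giving
# more faithful detail at the cost of more muted colors.
for rf in (15, 25, 35, 45):
    colorizer.plot_transformed_image('test_images/photo.jpg', render_factor=rf)
```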
Antic shared a video example on Twitter:
"I have to say, this turned out way better than I expected given just how difficult the source video is to work with! Here it is: Arrival of a Train at La Ciotat in 1885 [sic], colorized by DeOldify." pic.twitter.com/MNAuOXwcMY — Jason Antic (@citnaj), May 4, 2019
The video model is optimized for smooth, consistent, and flicker-free video. It is the least colorful of the three models, though its results come close to those of the stable model. In terms of architecture, this model is the same as the stable one but differs in training: it is trained on a mere 2.2% of Imagenet data once at 192px, using only the initial generator/critic pretrain/GAN NoGAN training (1 hour of direct GAN training).
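The video workflow is a thin wrapper over this model. A hedged sketch, assuming the method name used in the project’s video notebook (the URL, output name, and render_factor value are placeholders):

```python
# Colorizing a film with the video model; device setup (see the
# model-selection sketch above) is omitted for brevity.
from deoldify.visualize import get_video_colorizer

colorizer = get_video_colorizer()
colorizer.colorize_from_url(
    'https://example.com/old_film.mp4',  # hypothetical source film
    'old_film_colorized.mp4',            # output file name
    render_factor=21)                    # ballpark value from the notebook
```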
DeOldify was built by combining several approaches, including:
Self-Attention Generative Adversarial Network (SAGAN): Antic modified the generator, a pre-trained UNet, to use spectral normalization and self-attention (a generic sketch of such an attention block follows this list).
Two Time-Scale Update Rule (TTUR): in DeOldify this is just one-to-one generator/critic iterations with a higher critic learning rate. It is modified to incorporate a "threshold" critic loss that makes sure the critic is "caught up" before moving on to generator training. This is particularly useful for the NoGAN method (see the training-schedule sketch after this list).
NoGAN: this doesn’t have a separate research paper. It is, in fact, a new type of GAN training developed to solve some key problems in the previous DeOldify model. NoGAN keeps the benefits of GAN training while spending minimal time on direct GAN training, as the second sketch below illustrates.
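The self-attention piece mentioned in the first item comes from the SAGAN paper. Here is a generic PyTorch rendition of such a block, with spectral normalization applied to the 1x1 convolutions; this is the standard SAGAN formulation, not necessarily DeOldify’s exact code:

```python
# SAGAN-style self-attention over feature maps, with spectral normalization.
import torch
import torch.nn as nn
from torch.nn.utils import spectral_norm

class SelfAttention(nn.Module):
    def __init__(self, ch):
        super().__init__()
        self.query = spectral_norm(nn.Conv2d(ch, ch // 8, 1))
        self.key = spectral_norm(nn.Conv2d(ch, ch // 8, 1))
        self.value = spectral_norm(nn.Conv2d(ch, ch, 1))
        self.gamma = nn.Parameter(torch.zeros(1))     # attention starts "off"

    def forward(self, x):
        b, c, h, w = x.shape
        q = self.query(x).flatten(2).transpose(1, 2)  # B x HW x C//8
        k = self.key(x).flatten(2)                    # B x C//8 x HW
        v = self.value(x).flatten(2)                  # B x C x HW
        attn = torch.softmax(q @ k, dim=-1)           # B x HW x HW
        out = (v @ attn.transpose(1, 2)).reshape(b, c, h, w)
        return self.gamma * out + x   # residual: learned blend with the input

# Example: drop the block into a 64-channel feature map.
feats = torch.randn(2, 64, 24, 24)
print(SelfAttention(64)(feats).shape)  # torch.Size([2, 64, 24, 24])
```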
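Putting the threshold critic loss and the NoGAN schedule together, here is a toy illustration of the overall training flow. Every concrete choice in it (the tiny models, dummy data, loss functions, iteration counts, and the 0.7 threshold) is a stand-in chosen to show the phase structure, not DeOldify’s actual code:

```python
# Toy NoGAN schedule with a TTUR-style "threshold" critic. All models,
# data, losses, and constants below are illustrative stand-ins.
import torch
import torch.nn as nn

gen = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 16))
critic = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 1))
gray, color = torch.randn(64, 16), torch.randn(64, 16)  # dummy "images"

# TTUR: one-to-one iterations, with a higher critic learning rate.
g_opt = torch.optim.Adam(gen.parameters(), lr=1e-4)
c_opt = torch.optim.Adam(critic.parameters(), lr=4e-4)
bce = nn.BCEWithLogitsLoss()
ones, zeros = torch.ones(64, 1), torch.zeros(64, 1)

# Phase 1: pretrain the generator alone with a conventional loss
# (DeOldify uses a perceptual/feature loss; plain L1 keeps this short).
for _ in range(200):
    g_opt.zero_grad()
    nn.functional.l1_loss(gen(gray), color).backward()
    g_opt.step()

def critic_step():
    """One critic update: real color images vs. generator fakes."""
    c_opt.zero_grad()
    loss = bce(critic(color), ones) + bce(critic(gen(gray).detach()), zeros)
    loss.backward()
    c_opt.step()
    return loss.item()

# Phase 2: pretrain the critic as a plain classifier on the frozen generator.
for _ in range(200):
    critic_step()

# Phase 3: a *brief* window of direct GAN training. Before each generator
# update, the critic trains until its loss falls under a threshold, so the
# generator only ever faces a "caught up" critic.
CRITIC_THRESHOLD = 0.7   # assumed value
for _ in range(50):
    for _ in range(20):  # cap the catch-up loop
        if critic_step() <= CRITIC_THRESHOLD:
            break
    g_opt.zero_grad()
    bce(critic(gen(gray)), ones).backward()  # generator tries to fool critic
    g_opt.step()
```

The point of the schedule is that phases 1 and 2 are cheap, conventional training; the expensive adversarial part is squeezed into a brief final window, which is what keeps direct GAN time down to the hours quoted above.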
Antic says, “I’m looking to make old photos and film look reeeeaaally good with GANs, and more importantly, make the project useful.” He adds, “I’ll be actively updating and improving the code over the foreseeable future. I’ll try to make this as user-friendly as possible, but I’m sure there’s going to be hiccups along the way.”
To learn more about the hardware requirements and other details, head over to Jason Antic’s GitHub page.