Wouldn’t it be magical if we could watch old black-and-white movie footage and images in color? Deep learning, more precisely GANs, can help here. A recent project by software researcher Jason Antic, called ‘DeOldify’, is a deep learning based effort for colorizing and restoring old images and film footage.
"These happy ladies are salvaging wood from the military barracks in Cork after it had been destroyed by fire…"
Photographer: W.D. Hogan
Date: 1921?@NLIreland Ref.: HOGW 140
— John Breslin (@johnbreslin) May 17, 2019
In one of the sessions at the recent Facebook Developer Conference, held April 30 – May 1, 2019, Antic, along with Jeremy Howard and Uri Manor, talked about how GANs can be used to reconstruct images and videos, such as by increasing their resolution or adding color to black-and-white film. However, they also pointed out that GANs can be slow, difficult, and expensive to train. They demonstrated how to colorize old black-and-white movies and drastically increase the resolution of microscopy images using new PyTorch-based tools from fast.ai, the Salk Institute, and DeOldify that can be trained in just a few hours on a single GPU.
DeOldify makes use of NoGAN training, which combines the benefits of GAN training (wonderful colorization) while eliminating its nasty side effects (like flickering objects in video). NoGAN training is crucial for getting stable, colorful images and video. Antic explains how DeOldify achieves a stable video:
Antic said, “the video is rendered using isolated image generation without any sort of temporal modeling tacked on. The process performs 30-60 minutes of the GAN portion of “NoGAN” training, using 1% to 3% of Imagenet data once. Then, as with still image colorization, we “DeOldify” individual frames before rebuilding the video.”
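In other words, each frame is colorized in isolation and the film is then reassembled. Below is a minimal sketch of that per-frame pipeline, assuming ffmpeg is available on the system; the DeOldify imports and the get_transformed_image call follow the repo’s layout at the time of writing, but the output handling and frame rate here are assumptions (the project’s own video colorizer wraps all of this for you):

```python
# Conceptual sketch of "DeOldify individual frames, then rebuild the video".
# Paths, the frame rate, and get_transformed_image's behavior are assumptions.
import subprocess
from pathlib import Path

from deoldify import device
from deoldify.device_id import DeviceId
device.set(device=DeviceId.GPU0)               # per the project README
from deoldify.visualize import get_image_colorizer

frames, colored = Path('frames'), Path('colored')
frames.mkdir(exist_ok=True)
colored.mkdir(exist_ok=True)

# 1. Split the black-and-white film into still frames.
subprocess.run(['ffmpeg', '-i', 'bw_movie.mp4', str(frames / '%06d.png')],
               check=True)

# 2. Colorize every frame independently; no temporal modeling is involved.
colorizer = get_image_colorizer(artistic=False)
for frame in sorted(frames.glob('*.png')):
    result = colorizer.get_transformed_image(str(frame), render_factor=21)
    result.save(colored / frame.name)

# 3. Rebuild the video from the colorized frames at the source frame rate.
subprocess.run(['ffmpeg', '-framerate', '24', '-i', str(colored / '%06d.png'),
                'color_movie.mp4'], check=True)
```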
The three models in DeOldify
DeOldify includes three models: video, stable, and artistic. Each model has its own strengths, weaknesses, and use cases. The video model is for videos; the other two are for images.
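All three models are exposed through the same interface in the project’s notebooks. A minimal sketch, assuming the repo’s module layout and pretrained weights downloaded per the README (names may change as the project evolves):

```python
# Model selection in DeOldify (names follow the project's notebooks at the
# time of writing; pretrained weights must be downloaded separately).
from deoldify import device
from deoldify.device_id import DeviceId
device.set(device=DeviceId.GPU0)               # choices: CPU, GPU0...GPU7

from deoldify.visualize import get_image_colorizer, get_video_colorizer

artistic = get_image_colorizer(artistic=True)  # most vibrant image results
stable = get_image_colorizer(artistic=False)   # safer landscapes/portraits
video = get_video_colorizer()                  # smooth, flicker-free video

# Colorize one image and plot a before/after comparison.
stable.plot_transformed_image('test_images/photo.jpg', render_factor=35)
```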
A colorized photo of Nikola Tesla, shared on Twitter and described as the "Serbian-American inventor, electrical engineer, mechanical engineer, and futurist who is best known for his contributions to the design of the modern alternating current (AC) electricity supply system" @citnaj pic.twitter.com/gQQv3vsp7i — John Breslin (@johnbreslin), May 10, 2019
The stable model achieves the best results with landscapes and portraits and produces fewer "zombies" (faces or limbs that stay gray rather than being colored in properly). It generally has fewer unusual miscolorations than the artistic model, but it is also less colorful in general.
This model uses a resnet101 backbone on a UNet, with an emphasis on the width of layers on the decoder side. It was trained with 3 critic pretrain/GAN cycle repeats via NoGAN, in addition to the initial generator/critic pretrain/GAN NoGAN training, at 192px. This adds up to a total of 7% of Imagenet data trained once (3 hours of direct GAN training).
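For intuition, here is a heavily simplified PyTorch sketch of what "a resnet backbone on a UNet" means: the pretrained resnet supplies the encoder, and the decoder upsamples while concatenating the encoder’s feature maps. This is not DeOldify’s actual generator (which is built on fastai’s DynamicUnet with spectral normalization and self-attention, discussed later in this piece); the decoder widths are purely illustrative:

```python
# Simplified sketch: torchvision resnet101 as a U-Net encoder. DeOldify's
# real generator is more elaborate; channel widths here are illustrative.
import torch
import torch.nn as nn
from torchvision.models import resnet101

class UpBlock(nn.Module):
    """Upsample, concatenate the matching encoder feature map, convolve."""
    def __init__(self, in_ch, skip_ch, out_ch):
        super().__init__()
        self.up = nn.ConvTranspose2d(in_ch, in_ch // 2, 2, stride=2)
        self.conv = nn.Sequential(
            nn.Conv2d(in_ch // 2 + skip_ch, out_ch, 3, padding=1),
            nn.ReLU(inplace=True))

    def forward(self, x, skip):
        return self.conv(torch.cat([self.up(x), skip], dim=1))

class ResnetUnet(nn.Module):
    def __init__(self):
        super().__init__()
        enc = resnet101(pretrained=True)        # ImageNet-pretrained backbone
        self.stem = nn.Sequential(enc.conv1, enc.bn1, enc.relu)
        self.pool = enc.maxpool
        self.enc1, self.enc2 = enc.layer1, enc.layer2   # 256 / 512 channels
        self.enc3, self.enc4 = enc.layer3, enc.layer4   # 1024 / 2048 channels
        # Wide decoder layers, per the stable model's "emphasis on width".
        self.up1 = UpBlock(2048, 1024, 1024)
        self.up2 = UpBlock(1024, 512, 512)
        self.up3 = UpBlock(512, 256, 256)
        self.head = nn.Sequential(
            nn.Upsample(scale_factor=4, mode='bilinear', align_corners=False),
            nn.Conv2d(256, 3, 1))               # 3-channel color output

    def forward(self, x):
        s = self.stem(x)               # 1/2 resolution
        e1 = self.enc1(self.pool(s))   # 1/4
        e2 = self.enc2(e1)             # 1/8
        e3 = self.enc3(e2)             # 1/16
        e4 = self.enc4(e3)             # 1/32
        d = self.up1(e4, e3)
        d = self.up2(d, e2)
        d = self.up3(d, e1)
        return self.head(d)            # back to full resolution

# DeOldify trains at 192px, so e.g.:
# ResnetUnet()(torch.randn(1, 3, 192, 192)).shape -> (1, 3, 192, 192)
```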
"We believe these are members of the Dobbyn family who lived at Leoville (a big house?) in Waterford. Miss Dobbyn ordered the photograph.
Photographer: Poole Photographic Studios, Waterford
Date: Circa 1892??"#Colourised #DeOldify @NLIreland
— John Breslin (@johnbreslin) May 17, 2019
The artistic model achieves the highest quality results in image coloration, with respect to interesting details and vibrance. However, in order to achieve this, one has to adjust the rendering resolution, or render_factor. Additionally, the model does not do as well as the stable one in a few key common scenarios: nature scenes and portraits.
The artistic model uses a resnet34 backbone on a UNet with an emphasis on the depth of layers on the decoder side. It was trained with 5 critic pretrain/GAN cycle repeats via NoGAN, in addition to the initial generator/critic pretrain/GAN NoGAN training, at 192px. This adds up to a total of 32% of Imagenet data trained once (12.5 hours of direct GAN training). A hypothetical render_factor sweep is sketched below.
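In practice, tuning means trying a few render_factor values per image. A hypothetical sweep (the file path and values are placeholders; per the project README, lower values tend to give more vibrant but less stable color, while higher values give more faithful but more muted results):

```python
# Illustrative render_factor sweep; device setup (see the model-selection
# sketch above) is omitted for brevity.
from deoldify.visualize import get_image_colorizer

colorizer = get_image_colorizer(artistic=True)
# Lower render_factor: more saturated but less stable color.
# Higher render_factor: the model sees a higher-resolution input, giving
# more faithful detail at the cost of more muted colors.
for rf in (15, 25, 35, 45):
    colorizer.plot_transformed_image('test_images/photo.jpg', render_factor=rf)
```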
Antic shared a video example on Twitter:
"I have to say, this turned out way better than I expected given just how difficult the source video is to work with! Here it is: Arrival of a Train at La Ciotat in 1885 [sic], colorized by DeOldify." pic.twitter.com/MNAuOXwcMY — Jason Antic (@citnaj), May 4, 2019
The video model is optimized for smooth, consistent, and flicker-free video. It is the least colorful of the three models, though its results come close to those of the stable model. In terms of architecture, this model is the same as the stable one but differs in training: it is trained on a mere 2.2% of Imagenet data once at 192px, using only the initial generator/critic pretrain/GAN NoGAN training (1 hour of direct GAN training).
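The video workflow is a thin wrapper over this model. A hedged sketch, assuming the method name used in the project’s video notebook (the URL, output name, and render_factor value are placeholders):

```python
# Colorizing a film with the video model; device setup (see the
# model-selection sketch above) is omitted for brevity.
from deoldify.visualize import get_video_colorizer

colorizer = get_video_colorizer()
colorizer.colorize_from_url(
    'https://example.com/old_film.mp4',  # hypothetical source film
    'old_film_colorized.mp4',            # output file name
    render_factor=21)                    # ballpark value from the notebook
```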
DeOldify was built by combining several approaches, including:
Self-Attention Generative Adversarial Network (SAGAN): Antic modified the generator, a pre-trained UNet, to use spectral normalization and self-attention (a generic sketch of such an attention block follows this list).
Two Time-Scale Update Rule (TTUR): in DeOldify this is just one-to-one generator/critic iterations with a higher critic learning rate. It is modified to incorporate a "threshold" critic loss that makes sure the critic is "caught up" before moving on to generator training. This is particularly useful for the NoGAN method (see the training-schedule sketch after this list).
NoGAN: this doesn’t have a separate research paper. It is, in fact, a new type of GAN training developed to solve some key problems in the previous DeOldify model. NoGAN keeps the benefits of GAN training while spending minimal time on direct GAN training, as the second sketch below illustrates.
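The self-attention piece mentioned in the first item comes from the SAGAN paper. Here is a generic PyTorch rendition of such a block, with spectral normalization applied to the 1x1 convolutions; this is the standard SAGAN formulation, not necessarily DeOldify’s exact code:

```python
# SAGAN-style self-attention over feature maps, with spectral normalization.
import torch
import torch.nn as nn
from torch.nn.utils import spectral_norm

class SelfAttention(nn.Module):
    def __init__(self, ch):
        super().__init__()
        self.query = spectral_norm(nn.Conv2d(ch, ch // 8, 1))
        self.key = spectral_norm(nn.Conv2d(ch, ch // 8, 1))
        self.value = spectral_norm(nn.Conv2d(ch, ch, 1))
        self.gamma = nn.Parameter(torch.zeros(1))     # attention starts "off"

    def forward(self, x):
        b, c, h, w = x.shape
        q = self.query(x).flatten(2).transpose(1, 2)  # B x HW x C//8
        k = self.key(x).flatten(2)                    # B x C//8 x HW
        v = self.value(x).flatten(2)                  # B x C x HW
        attn = torch.softmax(q @ k, dim=-1)           # B x HW x HW
        out = (v @ attn.transpose(1, 2)).reshape(b, c, h, w)
        return self.gamma * out + x   # residual: learned blend with the input

# Example: drop the block into a 64-channel feature map.
feats = torch.randn(2, 64, 24, 24)
print(SelfAttention(64)(feats).shape)  # torch.Size([2, 64, 24, 24])
```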
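Putting the threshold critic loss and the NoGAN schedule together, here is a toy illustration of the overall training flow. Every concrete choice in it (the tiny models, dummy data, loss functions, iteration counts, and the 0.7 threshold) is a stand-in chosen to show the phase structure, not DeOldify’s actual code:

```python
# Toy NoGAN schedule with a TTUR-style "threshold" critic. All models,
# data, losses, and constants below are illustrative stand-ins.
import torch
import torch.nn as nn

gen = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 16))
critic = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 1))
gray, color = torch.randn(64, 16), torch.randn(64, 16)  # dummy "images"

# TTUR: one-to-one iterations, with a higher critic learning rate.
g_opt = torch.optim.Adam(gen.parameters(), lr=1e-4)
c_opt = torch.optim.Adam(critic.parameters(), lr=4e-4)
bce = nn.BCEWithLogitsLoss()
ones, zeros = torch.ones(64, 1), torch.zeros(64, 1)

# Phase 1: pretrain the generator alone with a conventional loss
# (DeOldify uses a perceptual/feature loss; plain L1 keeps this short).
for _ in range(200):
    g_opt.zero_grad()
    nn.functional.l1_loss(gen(gray), color).backward()
    g_opt.step()

def critic_step():
    """One critic update: real color images vs. generator fakes."""
    c_opt.zero_grad()
    loss = bce(critic(color), ones) + bce(critic(gen(gray).detach()), zeros)
    loss.backward()
    c_opt.step()
    return loss.item()

# Phase 2: pretrain the critic as a plain classifier on the frozen generator.
for _ in range(200):
    critic_step()

# Phase 3: a *brief* window of direct GAN training. Before each generator
# update, the critic trains until its loss falls under a threshold, so the
# generator only ever faces a "caught up" critic.
CRITIC_THRESHOLD = 0.7   # assumed value
for _ in range(50):
    for _ in range(20):  # cap the catch-up loop
        if critic_step() <= CRITIC_THRESHOLD:
            break
    g_opt.zero_grad()
    bce(critic(gen(gray)), ones).backward()  # generator tries to fool critic
    g_opt.step()
```

The point of the schedule is that phases 1 and 2 are cheap, conventional training; the expensive adversarial part is squeezed into a brief final window, which is what keeps direct GAN time down to the hours quoted above.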
Antic says, “I’m looking to make old photos and film look reeeeaaally good with GANs, and more importantly, make the project useful.” He adds, “I’ll be actively updating and improving the code over the foreseeable future. I’ll try to make this as user-friendly as possible, but I’m sure there’s going to be hiccups along the way.”
To learn more about the hardware requirements and other details, head over to Jason Antic’s GitHub page.