In a paper published last week, NVIDIA researchers describe a way to generate images that look like photos taken with a camera. They do this using generative adversarial networks (GANs).
An alternative architecture for GANs
Borrowing from style transfer literature, the researchers use an alternative generator architecture for GANs. The new architecture learns, automatically and without supervision, to separate the high-level attributes of an image, such as the pose or identity of a person, from stochastic variation in details such as freckles and hair placement. It also allows intuitive, scale-specific control of the synthesis, which makes it possible to generate different variations of an image.
Better image quality than a traditional GAN
The new generator improves on the state of the art in image quality, has better interpolation properties, and disentangles the latent factors of variation better. To quantify interpolation quality and disentanglement, the researchers propose two new automated methods that are applicable to any generator architecture. They also introduce a new high-quality, highly varied dataset of human faces.
With motivation from style transfer literature, the NVIDIA researchers redesign the generator architecture to expose novel ways of controlling image synthesis. The generator starts from a learned constant input and adjusts the style of the image at each convolution layer. These adjustments are based on the latent code, which gives direct control over the strength of image features at different scales. Combined with noise injected directly into the network, this architectural change leads to an automatic, unsupervised separation of high-level attributes from stochastic variation.
In other words, the architecture learns attributes from the images in the dataset, combines them, and applies some random variation to synthesize images that look real.
As shown in the paper, the redesign of the generator, surprisingly, does not compromise image quality but improves it considerably. In line with other work, the researchers conclude that a traditional GAN generator architecture is inferior to a style-based design. Besides human faces, they also generate bedrooms, cars, and cats with the new architecture.
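The generator described above can be sketched in a few lines of NumPy. This is only an illustrative toy, not the paper's implementation: all weights are random stand-ins for learned parameters, a 1x1 channel-mixing step stands in for the convolution, nearest-neighbour repetition stands in for upsampling, and the names `adain`, `init_params`, and `generate` are hypothetical. It shows the three ideas from the text: a constant starting input, a per-layer style (scale and bias) computed from the latent code, and noise injected at each layer.

```python
import numpy as np


def adain(x, scale, bias, eps=1e-8):
    """Normalize each channel of x (C, H, W), then apply a per-channel scale and bias."""
    mean = x.mean(axis=(1, 2), keepdims=True)
    std = x.std(axis=(1, 2), keepdims=True)
    return scale[:, None, None] * (x - mean) / (std + eps) + bias[:, None, None]


def init_params(rng, latent_dim=8, channels=4, n_layers=3):
    """Random stand-ins for weights that would normally be learned (assumption)."""
    layers = []
    for _ in range(n_layers):
        layers.append({
            # 1x1 "convolution": mixes channels only, a simplification
            "conv": rng.standard_normal((channels, channels)) / np.sqrt(channels),
            # per-channel strength of the injected noise
            "noise_weight": 0.1 * rng.standard_normal(channels),
            # affine map from the latent code to a (scale, bias) style
            "affine": 0.1 * rng.standard_normal((2 * channels, latent_dim)),
            "affine_bias": np.concatenate([np.ones(channels), np.zeros(channels)]),
        })
    # the generator starts from a constant 4x4 input, not from the latent code
    return {"const": rng.standard_normal((channels, 4, 4)), "layers": layers}


def generate(latent, params, noise_rng, noise_strength=1.0):
    """Run the toy style-based generator and return a (C, H, W) feature map."""
    x = params["const"]
    channels = x.shape[0]
    for layer in params["layers"]:
        # double the resolution, then mix channels
        x = np.repeat(np.repeat(x, 2, axis=1), 2, axis=2)
        x = np.einsum("oc,chw->ohw", layer["conv"], x)
        # inject one noise image per layer, scaled per channel
        noise = noise_rng.standard_normal((1,) + x.shape[1:])
        x = x + noise_strength * layer["noise_weight"][:, None, None] * noise
        # the latent code sets the style (scale and bias) of this layer
        style = layer["affine"] @ latent + layer["affine_bias"]
        x = adain(x, style[:channels], style[channels:])
    return x


rng = np.random.default_rng(0)
params = init_params(rng)
latent = rng.standard_normal(8)

# Same latent code, different noise: same style, different fine detail.
img_a = generate(latent, params, np.random.default_rng(1))
img_b = generate(latent, params, np.random.default_rng(2))
print(img_a.shape)
```

In this toy, `img_a` and `img_b` differ pixel by pixel because of the noise, but their per-channel mean and spread are identical because both are set by the same latent code. That is a rough analogue of the separation between high-level attributes and stochastic detail that the article describes.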
This synthetic image generation has generated excitement among the public.
A comment from Hacker News reads: “This is just phenomenal. Can see this being a fairly disruptive force in the media industry. Also, sock puppet factories could use this to create endless numbers of fake personas for social media astroturfing.”
Another comment reads: “The improvements in GANs from 2014 are amazing. From coarse 32×32 pixel images, we have gotten to 1024×1024 images that can fool most humans.”
Fake photographic images as evidence?
As a thread on Twitter asks, could this be the end of photography as evidence? Not very likely, at least for the time being. For an image to be considered evidence, it has to show something specific, for example, a specific person doing a specific action. As seen from the results in the paper, some cat images are ugly and deformed, far from looking like the real thing. Training is also expensive. The paper states: “Our training time is approximately one week on an NVIDIA DGX-1 with 8 Tesla V100 GPUs”. That is a setup that costs up to $70K.
Besides, some speculate that there will be bills in 2019 to control the use of such AI systems:
My sense is that a lot of folks are pondering this, and we may see bills on this in 2019.
— Bobby Chesney (@BobbyChesney) December 15, 2018
Even the big names in AI are noticing this paper:
— Ian Goodfellow (@goodfellow_ian) December 13, 2018
You can see a video showcasing the generated images on YouTube.