How Google’s DeepMind is creating images with artificial intelligence

The research team at DeepMind has been using deep reinforcement learning agents to generate images the way humans do. Rather than analyzing the pixels that represent an image on a screen, DeepMind’s AI agents learn how digits, characters, and portraits are actually constructed. The agents interact with a computer paint program, placing strokes on a digital canvas and changing the brush size, pressure, and color.

How does DeepMind generate images?

  • As part of the initial training process, the agent starts by drawing random strokes with no visible intent or structure. Following the reinforcement learning approach, the agent is then ‘rewarded’ for its output, which ‘encourages’ it to produce meaningful drawings.
  • To judge the drawings produced by the first network, DeepMind trained a second neural network, called the discriminator. The discriminator predicts whether a particular drawing was produced by the agent or sampled from a dataset of real photographs.
  • The painting agent is rewarded according to how well it manages to “fool” the discriminator into thinking that its drawings are real.
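
The reward signal described above can be illustrated with a toy sketch. This is not DeepMind’s implementation: the canvas size, the random "agent", and the hand-written `toy_discriminator` (which only looks at how much ink is on the canvas) are all hypothetical stand-ins for the learned networks. The sketch only shows the shape of the loop: the agent draws, the discriminator scores the drawing, and that score becomes the reward.

```python
import math
import random

CANVAS_SIZE = 8  # hypothetical tiny canvas, chosen only for illustration


def blank_canvas():
    """Return an empty CANVAS_SIZE x CANVAS_SIZE canvas (0.0 = no ink)."""
    return [[0.0] * CANVAS_SIZE for _ in range(CANVAS_SIZE)]


def draw_episode(rng, num_strokes):
    """Untrained agent: each 'stroke' inks one randomly chosen pixel."""
    canvas = blank_canvas()
    for _ in range(num_strokes):
        row = rng.randrange(CANVAS_SIZE)
        col = rng.randrange(CANVAS_SIZE)
        canvas[row][col] = 1.0
    return canvas


def toy_discriminator(canvas, target_ink=0.25, sharpness=20.0):
    """Hypothetical stand-in for a learned discriminator.

    Returns a score in (0, 1]; it is highest when the fraction of inked
    pixels matches `target_ink`. A real discriminator would be a neural
    network trained on real images and agent drawings.
    """
    ink = sum(sum(row) for row in canvas) / (CANVAS_SIZE ** 2)
    return math.exp(-sharpness * (ink - target_ink) ** 2)


def agent_reward(canvas, discriminator):
    """The more 'real' the discriminator finds the drawing, the higher the reward."""
    return discriminator(canvas)


if __name__ == "__main__":
    rng = random.Random(0)
    for num_strokes in (1, 8, 16):
        canvas = draw_episode(rng, num_strokes)
        print(num_strokes, "strokes -> reward", agent_reward(canvas, toy_discriminator))
```

In a real system both sides would be trained: the discriminator to separate real images from drawings, and the agent to choose strokes that raise this reward.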

Most importantly, DeepMind’s AI agents produce images by writing graphics programs that interact with a paint environment. This differs from a typical GAN setup, where the generator directly outputs pixels. Moreover, the model can apply what it has learned in the simulated paint program to re-create characters in other, similar environments. This is possible because the framework is interpretable, in the sense that it produces a sequence of motions that control a simulated brush.
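
To make the idea of a "graphics program" concrete, here is a minimal sketch under stated assumptions. The `Stroke` type, its fields, and the `render` function are hypothetical and much simpler than a real paint environment. The point is that the output is a readable list of brush motions rather than a grid of pixels, so the same list can be replayed in a different environment (here, simply a larger canvas).

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Stroke:
    """One brush motion (hypothetical format): a straight line between two points."""
    start: Tuple[int, int]  # (row, col)
    end: Tuple[int, int]    # (row, col)
    size: int = 1           # 1 paints a single pixel, 2 paints a 3x3 square, ...
    color: int = 1


def render(program: List[Stroke], height: int, width: int) -> List[List[int]]:
    """Interpret a list of strokes on a blank canvas and return the pixels."""
    canvas = [[0] * width for _ in range(height)]
    for stroke in program:
        (r0, c0), (r1, c1) = stroke.start, stroke.end
        steps = max(abs(r1 - r0), abs(c1 - c0), 1)
        for i in range(steps + 1):
            # Walk along the line from start to end.
            r = round(r0 + (r1 - r0) * i / steps)
            c = round(c0 + (c1 - c0) * i / steps)
            # Stamp a square brush centered on the current point.
            for dr in range(-(stroke.size - 1), stroke.size):
                for dc in range(-(stroke.size - 1), stroke.size):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < height and 0 <= cc < width:
                        canvas[rr][cc] = stroke.color
    return canvas


# A two-stroke program roughly shaped like the digit 7.
seven = [
    Stroke(start=(0, 0), end=(0, 4)),  # horizontal bar across the top
    Stroke(start=(0, 4), end=(4, 2)),  # diagonal stroke down to the left
]

if __name__ == "__main__":
    for height, width in ((5, 5), (7, 7)):
        for row in render(seven, height, width):
            print("".join("#" if pixel else "." for pixel in row))
        print()
```

A pixel-generating model would have to be retrained or resized for the second canvas, whereas the stroke list carries over unchanged.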

Training DeepMind AI agents

The agent was trained to generate images resembling MNIST digits: it was shown what the digits look like, but not how they are drawn. By attempting to generate images that fool the discriminator, the agent learned to control the brush and to maneuver it to fit the style of different digits.

The model was also trained to reproduce specific images from real datasets. When trained to paint celebrity faces, the agent captures the main traits of a face, such as shape, tone, and hairstyle, much as a street artist would when painting a portrait with a limited number of brush strokes.

Source: DeepMind Blog

For further details on methodology and experimentation, read the research paper.

