2 min read

When last week, Janelle Shane, a Research Scientist in optics, fed the infamous rabbit-duck illusion example to the Google Cloud Vision API, it gave “rabbit” as a result. However, when the image was rotated at a different angle, the Google Cloud Vision API predicted a “duck”.

Inspired by this, Max Woolf, a data scientist at Buzzfeed, further tested and concluded that the result really varies based on the orientation of the image:

Google Cloud Vision provides pretrained API models that allow you to derive insights from input images. The API classifies images into thousands of categories, detects individual objects and faces within images, and reads printed words within images. You can also train custom vision models with AutoML Vision Beta.

Woolf used Python for rotating the image and get predictions from the API for each rotation. He built the animations with R, ggplot2, and gganimate. To render these animations he used ffmpeg. Many times, in deep learning, a model is trained using a strategy in which the input images are rotated to help the model better generalize. Seeing the results of the experiment, Woolf concluded, “I suppose the dataset for the Vision API didn’t do that as much / there may be an orientation bias of ducks/rabbits in the training datasets.

The reaction to this experiment was pretty torn. While many Reddit users felt that there might be an orientation bias in the model, others felt that as the image is ambiguous there is no “right answer” and hence there is no problem with the model.

One of the Redditor said, “I think this shows how poorly many neural networks are at handling ambiguity.” Another Redditor commented, “This has nothing to do with a shortcoming of deep learning, failure to generalize, or something not being in the training set. It’s an optical illusion drawing meant to be visually ambiguous. Big surprise, it’s visually ambiguous to computer vision as well. There’s not ‘correct’ answer, it’s both a duck and a rabbit, that’s how it was drawn. The fact that the Cloud vision API can see both is actually a strength, not a shortcoming.

Woolf has open-sourced the code used to generate this visualization on his GitHub page, which also includes a CSV of the prediction results at every rotation. In case you are more curious, you can test the Cloud Vision API with the drag-and-drop UI provided by Google.

Read Next

Google Cloud security launches three new services for better threat detection and protection in enterprises

Generating automated image captions using NLP and computer vision [Tutorial]

Google Cloud Firestore, the serverless, NoSQL document database, is now generally available