
Researchers feed the rabbit-duck illusion to the Google Cloud Vision API and conclude it shows orientation bias


Last week, when Janelle Shane, a research scientist in optics, fed the famous rabbit-duck illusion to the Google Cloud Vision API, it gave “rabbit” as the result. However, when the image was rotated to a different angle, the Google Cloud Vision API predicted “duck”.

Inspired by this, Max Woolf, a data scientist at BuzzFeed, tested the illusion further and concluded that the result does indeed vary with the orientation of the image.

Google Cloud Vision provides pretrained API models that allow you to derive insights from input images. The API classifies images into thousands of categories, detects individual objects and faces within images, and reads printed words within images. You can also train custom vision models with AutoML Vision Beta.
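For reference, a basic label-detection request with the official google-cloud-vision Python client looks roughly like the sketch below. This is a minimal illustration, not code from the article: the file name is a placeholder, and it assumes Google Cloud credentials are already configured in the environment.

# Minimal label-detection sketch using the google-cloud-vision client.
# Assumes credentials are set up in the environment;
# "duck_rabbit.jpg" is a placeholder file name.
from google.cloud import vision

client = vision.ImageAnnotatorClient()

# Read the image bytes and wrap them in the request type the API expects.
with open("duck_rabbit.jpg", "rb") as f:
    image = vision.Image(content=f.read())

# Ask the API for labels and print each one with its confidence score.
response = client.label_detection(image=image)
for label in response.label_annotations:
    print(label.description, round(label.score, 3))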

Woolf used Python to rotate the image and get predictions from the API for each rotation. He built the animations with R, ggplot2, and gganimate, and rendered them with ffmpeg. In deep learning, a model is often trained with a data-augmentation strategy in which the input images are rotated to help the model generalize better. Seeing the results of the experiment, Woolf concluded, “I suppose the dataset for the Vision API didn’t do that as much / there may be an orientation bias of ducks/rabbits in the training datasets.”
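Woolf’s actual script is on his GitHub page (linked below), but the core of the experiment amounts to a loop like the following sketch, which rotates the image with Pillow and records the API’s top label at each angle. The 15-degree step, the white fill color, and the file name are illustrative assumptions, not details from Woolf’s code.

# Sketch of the rotate-and-classify loop: rotate the illusion in fixed
# steps and record the Vision API's top label at each angle.
# The 15-degree step, fill color, and file name are assumptions.
import io

from PIL import Image
from google.cloud import vision

client = vision.ImageAnnotatorClient()
original = Image.open("duck_rabbit.jpg").convert("RGB")

for angle in range(0, 360, 15):
    # Rotate, padding the corners with white so nothing is cropped.
    rotated = original.rotate(angle, expand=True, fillcolor="white")
    buf = io.BytesIO()
    rotated.save(buf, format="JPEG")

    response = client.label_detection(image=vision.Image(content=buf.getvalue()))
    labels = response.label_annotations
    if labels:
        print(angle, labels[0].description, round(labels[0].score, 3))
    else:
        print(angle, "no label returned")

Collecting the (angle, label, score) rows from a loop like this is what yields the per-rotation prediction results mentioned below.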

The reaction to this experiment was divided. While many Reddit users felt that there might be an orientation bias in the model, others felt that, as the image is ambiguous, there is no “right answer” and hence no problem with the model.

One Redditor said, “I think this shows how poorly many neural networks are at handling ambiguity.” Another Redditor commented, “This has nothing to do with a shortcoming of deep learning, failure to generalize, or something not being in the training set. It’s an optical illusion drawing meant to be visually ambiguous. Big surprise, it’s visually ambiguous to computer vision as well. There’s not ‘correct’ answer, it’s both a duck and a rabbit, that’s how it was drawn. The fact that the Cloud vision API can see both is actually a strength, not a shortcoming.”

Woolf has open-sourced the code used to generate this visualization on his GitHub page, which also includes a CSV of the prediction results at every rotation. If you are curious, you can test the Cloud Vision API yourself with the drag-and-drop UI provided by Google.

Read Next

Google Cloud security launches three new services for better threat detection and protection in enterprises

Generating automated image captions using NLP and computer vision [Tutorial]

Google Cloud Firestore, the serverless, NoSQL document database, is now generally available


Bhagyashree R

