
MIT’s Transparency by Design Network: A high-performance model that uses visual reasoning for machine interpretability


A team of researchers from MIT Lincoln Laboratory’s Intelligence and Decision Technologies Group has created a neural network named the Transparency by Design Network (TbD-net).

This network is capable of performing human-like reasoning to answer questions about the contents of images. The Transparency by Design model visually renders its thought process as it solves a problem, helping human analysts interpret its decision-making process.

The developers built TbD-net with the aim of making the inner workings of a neural network transparent, that is, of uncovering how the network arrives at what it thinks. One example is answering questions such as “What do the neural networks used in self-driving cars think the difference is between a pedestrian and a stop sign?” and “When was the neural network able to come up with that difference?” Finding these answers helps researchers teach the neural network to correct incorrect assumptions.

Beyond that, the Transparency by Design Network aims to close the gap between performance and interpretability, a common trade-off in today’s neural networks.

“Progress on improving performance in visual reasoning has come at the cost of interpretability,” says Ryan Soklaski, a TbD-net developer, as mentioned in the MIT blog post. TbD-net comprises a collection of “modules,” small neural networks specialized to perform specific subtasks. Whenever TbD-net is asked a visual-reasoning question about an image, it first breaks the question down into subtasks and then assigns the appropriate module to each one.
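To make that idea concrete, here is a minimal, hypothetical sketch (not the authors' code) of what turning a question into a chain of sub-tasks could look like. The question, sub-task names, and hard-coded decomposition below are invented purely for illustration:

```python
# Hypothetical sketch of breaking a question into a "program" of sub-tasks.
# In TbD-net this mapping is produced by a learned component; here it is
# hard-coded only to show what a sub-task chain might look like.
def decompose(question: str) -> list[str]:
    toy_programs = {
        "What color is the large metal cube?": [
            "attend_large", "attend_metal", "attend_cube", "query_color",
        ],
    }
    return toy_programs[question]

print(decompose("What color is the large metal cube?"))
# ['attend_large', 'attend_metal', 'attend_cube', 'query_color']
```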

According to Majumdar, another TbD-net developer, “Breaking a complex chain of reasoning into a series of smaller subproblems, each of which can be solved independently and composed, is a powerful and intuitive means for reasoning.”

Each module then builds on the output of the module before it, eventually producing the final answer. Every module’s output is presented visually as an “attention mask,” which overlays heat-map blobs on the objects in the image that the module identifies as its answer.
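The sketch below shows, under assumptions, how such attention masks might flow from one module to the next: each toy module reads the image features weighted by the incoming mask and emits a new mask. The module class, feature sizes, and chain are illustrative, not the published TbD-net implementation:

```python
import torch
import torch.nn as nn

class AttentionModule(nn.Module):
    """Toy module: refines the previous attention mask using image features."""
    def __init__(self, feat_dim: int = 64):
        super().__init__()
        self.conv = nn.Conv2d(feat_dim, 1, kernel_size=3, padding=1)

    def forward(self, image_feats: torch.Tensor, prev_mask: torch.Tensor) -> torch.Tensor:
        # Weight the image features by the incoming mask, then emit a new
        # heat-map-style mask over the image.
        return torch.sigmoid(self.conv(image_feats * prev_mask))

image_feats = torch.randn(1, 64, 28, 28)   # stand-in for CNN features of an image
mask = torch.ones(1, 1, 28, 28)            # initially attend to the whole image

chain = [AttentionModule(), AttentionModule(), AttentionModule()]  # e.g. large -> metal -> cube
for module in chain:
    mask = module(image_feats, mask)       # each module's mask feeds the next

print(mask.shape)  # torch.Size([1, 1, 28, 28]) -- can be rendered as a heat map
```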

Overall, TbD-net draws on standard AI techniques throughout the pipeline: optimization methods such as Adam for training, language-processing steps that interpret the human-language questions and break them into subtasks, and computer vision techniques such as convolutional neural networks to interpret the imagery, with visual reasoning used to share its decision-making process.
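For readers unfamiliar with those ingredients, here is a generic PyTorch pattern showing how a small convolutional network might be trained with the Adam optimizer. This is not the authors' training code; the tiny network, dummy data, and answer count are assumptions made only for illustration:

```python
import torch
import torch.nn as nn

cnn = nn.Sequential(                        # tiny stand-in image encoder
    nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(16, 10),                      # e.g. 10 possible answers
)
optimizer = torch.optim.Adam(cnn.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

images = torch.randn(8, 3, 64, 64)          # dummy batch of images
answers = torch.randint(0, 10, (8,))        # dummy answer labels

logits = cnn(images)                        # forward pass
loss = loss_fn(logits, answers)
optimizer.zero_grad()
loss.backward()                             # backpropagate
optimizer.step()                            # Adam update
print(float(loss))
```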

When TbD-net was put to the test, it achieved results surpassing the best-performing visual reasoning models. The model was evaluated on the CLEVR visual question-answering dataset, which consists of 70,000 training images and 700,000 questions, along with test and validation sets of 15,000 images and 150,000 questions. The model achieved a whopping 98.7 percent test accuracy on the dataset, and the developers then improved this result to 99.1 percent accuracy with the help of regularization and by increasing the spatial resolution.

The attention masks produced by the modules helped the researchers figure out what went wrong and refine the model, which is what pushed its accuracy to 99.1 percent.

“Our model provides straightforward, interpretable outputs at every stage of the visual reasoning process,” says Mascharka.

For more information, be sure to check out the official research paper.

Read Next

Optical training of Neural networks is making AI more efficient

Diffractive Deep Neural Network (D2NN): UCLA-developed AI device can identify objects at the speed of light

MIT’s Duckietown Kickstarter project aims to make learning how to program self-driving cars affordable

Natasha Mathur

Tech writer at the Packt Hub. Dreamer, book nerd, lover of scented candles, karaoke, and Gilmore Girls.

