
iCAN module uses Faster R-CNN for detecting Human-Object Interaction


Researchers from Virginia Tech, Chen Gao, Yuliang Zou, and Jia-Bin Huang, recently published a paper titled ‘iCAN: Instance-Centric Attention Network for Human-Object Interaction Detection.’ In it, they propose an ‘instance-centric attention module’ (iCAN) for detecting human-object interactions. The approach builds on Faster R-CNN, a region-based convolutional neural network (R-CNN) object detector that localizes the people and objects in an image; the iCAN module then helps the network identify and understand how they interact.

To understand what is happening in a scene or an image, computers need to recognize how humans interact with the objects around them. Human-object interaction detection addresses this: it localizes a person and an object, and then identifies the relationship, or interaction, between them.
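The detection task described above can be pictured as producing (person, verb, object) triplets. The sketch below is a minimal illustration under stated assumptions, not code from the paper: `Detection`, `HOITriplet`, `detect_hoi`, and the `verb_scorer` callback are hypothetical names; the boxes are assumed to come from an object detector such as Faster R-CNN; and scoring each triplet as the product of the two detection scores and a verb score is an assumed combination rule.

```python
from dataclasses import dataclass
from itertools import product
from typing import Callable, List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


@dataclass(frozen=True)
class Detection:
    label: str    # object category, e.g. "person" or "cup"
    box: Box
    score: float  # detector confidence in [0, 1]


@dataclass(frozen=True)
class HOITriplet:
    human: Detection
    verb: str
    obj: Detection
    score: float


# Hypothetical interaction classifier: confidence that `human` performs `verb` on `obj`.
VerbScorer = Callable[[Detection, str, Detection], float]


def detect_hoi(detections: Sequence[Detection], verbs: Sequence[str],
               verb_scorer: VerbScorer, threshold: float = 0.5) -> List[HOITriplet]:
    humans = [d for d in detections if d.label == "person"]
    triplets: List[HOITriplet] = []
    # Pair every detected person with every other detected instance.
    for human, obj in product(humans, detections):
        if obj is human:
            continue  # a person is not paired with their own detection
        for verb in verbs:
            # Assumed rule: multiply both detector scores by the verb score.
            score = human.score * obj.score * verb_scorer(human, verb, obj)
            if score >= threshold:
                triplets.append(HOITriplet(human, verb, obj, score))
    # Most confident interactions first.
    return sorted(triplets, key=lambda t: t.score, reverse=True)


# Usage with a toy scorer that only believes in "hold cup".
person = Detection("person", (10.0, 10.0, 60.0, 200.0), 0.9)
cup = Detection("cup", (50.0, 80.0, 70.0, 100.0), 0.8)


def toy_scorer(h: Detection, verb: str, o: Detection) -> float:
    return 1.0 if (verb, o.label) == ("hold", "cup") else 0.1


for t in detect_hoi([person, cup], ["hold", "kick"], toy_scorer):
    print(t.human.label, t.verb, t.obj.label, round(t.score, 2))
```

In a real system the `verb_scorer` would be the learned interaction classifier; here it is a placeholder so the pairing logic can be run on its own.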

The core idea of this research is that the appearance of a person or an object instance contains informative cues about which parts of an image are most relevant for an algorithm to attend to, which should make predicting the interaction easier.

To exploit this cue, the researchers propose an instance-centric attention module that learns to dynamically highlight regions in an image conditioned on the appearance of each instance. This allows the network to selectively aggregate the features that are relevant for recognizing human-object interactions. The researchers validated the efficacy of the proposed network on the V-COCO and HICO-DET datasets and showed that the approach compares favorably with the state of the art.
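The attention idea in the paragraph above can be illustrated with a deliberately simplified, framework-free Python sketch. It is not the authors' implementation: the function name `instance_centric_attention`, the plain projection matrices `w_inst` and `w_map` (standing in for learned embedding layers), and the choice to concatenate the instance feature with the attended context are assumptions made for illustration. The convolutional feature map is flattened into a list of per-location feature vectors.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(w: Matrix, x: Vector) -> Vector:
    """Multiply matrix w (rows x cols) by vector x (length cols)."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def instance_centric_attention(
    instance_feat: Vector,       # appearance feature of one human or object instance
    feature_map: List[Vector],   # one feature vector per spatial location (H*W entries)
    w_inst: Matrix,              # hypothetical stand-in for a learned instance embedding
    w_map: Matrix,               # hypothetical stand-in for a learned feature-map embedding
) -> Tuple[Vector, Vector]:
    # Embed the instance and every spatial location into a shared space.
    query = matvec(w_inst, instance_feat)
    keys = [matvec(w_map, loc) for loc in feature_map]
    # Dot-product similarity between the instance and each location.
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    # Normalize into an attention map that sums to 1 over all locations.
    attention = softmax(scores)
    # Attention-weighted sum of the original location features (the "context").
    dim = len(feature_map[0])
    context = [sum(a * loc[d] for a, loc in zip(attention, feature_map))
               for d in range(dim)]
    # Assumed combination: concatenate instance appearance with attended context.
    return attention, instance_feat + context


# Usage: the same 3-location feature map, attended by two different instances.
identity = [[1.0, 0.0], [0.0, 1.0]]
fmap = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
attn_a, _ = instance_centric_attention([1.0, 0.0], fmap, identity, identity)
attn_b, _ = instance_centric_attention([0.0, 1.0], fmap, identity, identity)
# attn_a peaks at location 0, attn_b at location 1.
```

Because the attention scores depend on `instance_feat`, two different instances over the same feature map produce different attention maps, which is the flexibility the article attributes to instance-centric attention over a single image-level map.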

iCAN module

Highlights of the iCAN paper include:

  1. The researchers have introduced an instance-centric attention module that allows the network to dynamically highlight informative regions for improving human-object interaction (HOI) detection.
  2. They have also established a new state-of-the-art performance on two large-scale HOI benchmark datasets.
  3. They conducted a detailed ablation study and error analysis to identify the relative contributions of the individual components and quantify different types of errors.
  4. They also released the source code and pre-trained models to facilitate future research.

Advantages of the iCAN module

  • Unlike hand-designed contextual features based on pose, the entire image, or secondary regions, iCAN’s attention map is learned automatically and trained jointly with the rest of the network to improve performance.
  • Compared with attention modules designed for image-level classification, the instance-centric attention map provides greater flexibility, as it can attend to different regions of an image depending on the object instance.

To learn about iCAN in detail, head over to the research paper.


Savia Lobo

A data science fanatic. Loves to stay updated with tech happenings around the globe. Loves singing and composing songs. Believes in putting the art in smart.
