Diversity in Faces: IBM Research’s new dataset to help build facial recognition systems that are fair

IBM Research has released the ‘Diversity in Faces’ (DiF) dataset, which is intended to help build fairer facial recognition systems trained on more diverse data. DiF provides annotations for 1 million human facial images. It was built using publicly available images from the YFCC-100M Creative Commons dataset.

Building facial recognition systems that meet fairness expectations has been a long-standing goal for AI researchers. Most AI systems learn from datasets, and if those datasets are not robust and diverse, both accuracy and fairness are at risk. For that reason, AI developers and the research community need to be thoughtful about what data they use for training. With the new DiF dataset, IBM researchers aim to provide training data that is strong, fair, and diverse.

The DiF dataset does not just measure faces by age, gender, and skin tone. It also covers other intrinsic facial features, including craniofacial distances, areas, and ratios; facial symmetry and contrast; subjective annotations; and pose and resolution.
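To make "craniofacial distances and ratios" concrete, here is a minimal sketch in Python. It assumes 2D landmark coordinates have already been extracted from an image. The landmark names (`left_cheek`, `forehead_top`, and so on) and the width-to-height ratio are hypothetical illustrations, not measures taken from the DiF coding schemes.

```python
import math


def distance(p, q):
    """Euclidean distance between two (x, y) points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def face_ratio(landmarks):
    """Width-to-height ratio computed from hypothetical landmark names."""
    width = distance(landmarks["left_cheek"], landmarks["right_cheek"])
    height = distance(landmarks["forehead_top"], landmarks["chin"])
    return width / height


# Hypothetical pixel coordinates for a single face (illustrative only).
example = {
    "left_cheek": (30.0, 100.0),
    "right_cheek": (150.0, 100.0),
    "forehead_top": (90.0, 20.0),
    "chin": (90.0, 180.0),
}

ratio = face_ratio(example)
print(f"width/height ratio: {ratio:.2f}")
```

A ratio like this is dimensionless, so it can be compared across images of different resolution, which is one reason ratios are often used alongside raw distances.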

IBM annotated the faces using 10 well-established and independent coding schemes from the scientific literature. These 10 coding schemes were selected based on their strong scientific basis, computational feasibility, numerical representation, and interpretability.

Through thorough statistical analysis, IBM researchers found that the DiF dataset provided a more balanced distribution and broader coverage of facial images compared to previous datasets. Their analysis of the 10 initial coding schemes also provided them with an understanding of what is important for characterizing human faces.
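One common way to put a number on how "balanced" a distribution is uses Shannon evenness: the entropy of the category frequencies divided by the maximum possible entropy. The sketch below is an assumption about how such a comparison could be done, not a description of the exact statistics the IBM researchers used, and the label lists are made-up examples.

```python
import math
from collections import Counter


def shannon_evenness(labels):
    """Return Shannon evenness in [0, 1]; 1 means perfectly balanced."""
    counts = Counter(labels)
    if len(counts) < 2:
        return 0.0  # a single category carries no diversity
    total = sum(counts.values())
    entropy = -sum(
        (c / total) * math.log(c / total) for c in counts.values()
    )
    return entropy / math.log(len(counts))


# Made-up category labels for two hypothetical datasets.
balanced = shannon_evenness(["a", "b", "c", "d"] * 25)
skewed = shannon_evenness(["a"] * 97 + ["b", "c", "d"])

print(f"balanced: {balanced:.3f}, skewed: {skewed:.3f}")
```

A dataset whose annotations score closer to 1 on a measure like this covers its categories more evenly than one scoring near 0.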

In the future, they plan to use Generative Adversarial Networks (GANs) to generate faces of any variety, so that training data can be synthesized as needed. They also intend to improve on the initial ten coding schemes and add new ones, and they encourage others to do the same.

You can request access to the DiF dataset on the IBM website. You can also read the research paper for more information.

Sugandha Lahoti

Content Marketing Editor at Packt Hub. I blog about new and upcoming tech trends ranging from Data science, Web development, Programming, Cloud & Networking, IoT, Security and Game development.
