
Diversity in Faces: IBM Research’s new dataset to help build facial recognition systems that are fair

January 30, 2019 - 6:19 am



IBM Research has released the ‘Diversity in Faces’ (DiF) dataset, which is intended to help build fairer and more accurate facial recognition systems. DiF provides annotations of 1 million human facial images. It was built using publicly available images from the YFCC-100M Creative Commons dataset.

Building facial recognition systems that meet fairness expectations has been a long-standing goal for AI researchers. Most AI systems learn from datasets. If those datasets are not robust and diverse, accuracy and fairness are at risk. For that reason, AI developers and the research community need to be thoughtful about what data they use for training. With DiF, IBM researchers aim to provide a strong, fair, and diverse dataset.

The DiF dataset does not just measure faces by age, gender, and skin tone. It also covers other intrinsic facial features, including craniofacial distances, areas and ratios, facial symmetry and contrast, subjective annotations, and pose and resolution.
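To give a rough sense of what a craniofacial ratio might look like in practice, here is a minimal Python sketch. It is not IBM's method or any of the DiF coding schemes: the landmark names, the coordinates, and the `craniofacial_ratio` helper are all hypothetical, chosen only to illustrate the idea of dividing one facial distance by another.

```python
import math

def distance(p, q):
    """Euclidean distance between two (x, y) points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

def craniofacial_ratio(landmarks, pair_a, pair_b):
    """Ratio of the distance between pair_a to the distance between pair_b.

    Hypothetical helper for illustration; not part of the DiF dataset.
    """
    a = distance(landmarks[pair_a[0]], landmarks[pair_a[1]])
    b = distance(landmarks[pair_b[0]], landmarks[pair_b[1]])
    return a / b

# Made-up landmark coordinates in pixels (assumption, not real data).
landmarks = {
    "left_eye": (30.0, 40.0),
    "right_eye": (70.0, 40.0),
    "nose_tip": (50.0, 60.0),
    "chin": (50.0, 110.0),
}

# Inter-eye distance relative to nose-to-chin distance.
ratio = craniofacial_ratio(
    landmarks, ("left_eye", "right_eye"), ("nose_tip", "chin")
)
print(round(ratio, 2))
```

Because a ratio divides one distance by another, it does not change when the image is uniformly scaled, which is one reason ratios are convenient for comparing faces photographed at different resolutions.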

IBM annotated the faces using 10 well-established and independent coding schemes from the scientific literature. These 10 coding schemes were selected based on their strong scientific basis, computational feasibility, numerical representation, and interpretability.

Through thorough statistical analysis, IBM researchers found that the DiF dataset provided a more balanced distribution and broader coverage of facial images compared to previous datasets. Their analysis of the 10 initial coding schemes also provided them with an understanding of what is important for characterizing human faces.

In the future, they plan to use Generative Adversarial Networks (GANs) to generate faces of any variety, possibly to synthesize training data as needed. They also plan to improve on the initial ten coding schemes and add new ones, and they encourage others to do the same.

You can request access to the DiF dataset on the IBM website. You can also read the research paper for more information.
