
Yesterday the Financial Times reported that Microsoft has quietly deleted its facial recognition database. More than 10 million images that were reportedly being used by companies to test their facial recognition software have been deleted.

The database, known as MS Celeb, was the largest public facial recognition dataset in the world. The data was amassed by scraping images off the web under a Creative Commons license that allows academic reuse of photos. According to Microsoft Research’s paper on the database, it was originally designed to train tools for image captioning and news video analysis.

The existence of this database was revealed by Adam Harvey, a Berlin-based artist and researcher. Harvey’s team investigates the ethics, origins, and individual privacy implications of face recognition image datasets and their role in the expansion of biometric surveillance technologies.

The Financial Times ran an in-depth investigation that revealed that giant tech companies like IBM and Panasonic, and Chinese firms such as SenseTime and Megvii, as well as military researchers, were using the massive database to test their facial recognition software. And now Microsoft has quietly taken MS Celeb down.

“The site was intended for academic purposes,” Microsoft told FT.com, explaining that the company had deleted it because “it was run by an employee that is no longer with Microsoft and has since been removed.”

Microsoft itself has used the data set to train facial recognition algorithms, Mr Harvey’s investigation found.

The company named the dataset “Celeb” to indicate that the faces it had scraped were photos of public figures. But Mr Harvey found that the dataset also included several arguably private individuals, including security journalists such as Kim Zetter and Adrian Chen; Shoshana Zuboff, the author of Surveillance Capitalism; and Julie Brill, the former FTC commissioner responsible for protecting consumer privacy.

“Microsoft has exploited the term ‘celebrity’ to include people who merely work online and have a digital identity,” said Mr Harvey. “Many people in the target list are even vocal critics of the very technology Microsoft is using their name and biometric information to build.”

Tech experts have also speculated that Microsoft might have deleted the data because continuing to distribute the MS Celeb dataset after the EU’s General Data Protection Regulation (GDPR) came into effect last year could have violated it.

But Microsoft said it was not aware of any GDPR implications and that the site had been retired “because the research challenge is over”.

Engadget reported that, after the FT’s investigation, datasets built by researchers at Duke University and Stanford University were also taken down.

According to Fast Company, last year Microsoft’s president, Brad Smith, spoke about fears that such technology is creeping into everyday life and eroding our civil liberties along the way. The company also turned down a facial recognition contract with California law enforcement on human rights grounds. But while Microsoft may claim it wants regulation for facial recognition, it may also want to use the technology to help sell groceries through its partnership with Kroger, and it has eluded privacy-related scrutiny for years.

Although the database has been deleted, it is still available to researchers and companies that had previously downloaded it. Once a dataset has been posted online and people have downloaded it, copies continue to exist with them.

The dataset is also now completely free from any licensing, rules or controls that Microsoft previously held over it. People are posting it on GitHub and hosting the files on Dropbox and Baidu Cloud, and there is no way to stop them from continuing to post it and use it for their own purposes.

