IBM Releases Dataset to Help Reduce Bias in Facial Recognition Systems

IBM wants to make facial recognition systems fairer and more accurate.

The company just released a research paper along with a dataset of 1 million images annotated with intrinsic facial features, including facial symmetry, skin color, age, and gender.

The tech giant hopes to use the Diversity in Faces (DiF) dataset to advance the study of diversity in facial recognition and further aid the development of the technology.

“Face recognition is a long-standing challenge in the field of Artificial Intelligence (AI),” the authors of the paper wrote. “However, with recent advances in neural networks, face recognition has achieved unprecedented accuracy, built largely on data-driven deep learning methods.”

John Smith, lead scientist at IBM, told CNBC that many prominent datasets lack balance and coverage of facial images.

“In order for the technology to advance it needs to be built on diverse training data,” he said. “The data does not reflect the faces we see in the world.”

Bias in facial recognition technology is an ongoing issue in the industry, and tech companies are starting to take steps to address the problem. In December, Microsoft President Brad Smith wrote a company blog post outlining the risks and potential abuses of facial recognition technology, including threats to privacy and democratic freedoms, as well as discrimination.

The company also wrote that it is calling for new laws that regulate artificial intelligence software to prevent bias.

Joy Buolamwini, a researcher at the M.I.T. Media Lab, studied how biases affect artificial intelligence and found that the technology misidentified the gender of darker-skinned women up to 35 percent of the time.

“You can’t have ethical A.I. that’s not inclusive,” Buolamwini told The New York Times. “And whoever is creating the technology is setting the standards.”

IBM’s Diversity in Faces dataset is available to the public, and the company’s researchers are urging others to build on this work.

“We selected a solid starting point by using one million publicly available face images and by implementing ten facial coding schemes,” they wrote in the paper. “We hope that others will find ways to grow the data set to include more faces.”