20 lessons on bias in machine learning systems by Kate Crawford at NIPS 2017

Kate Crawford is a Principal Researcher at Microsoft Research and a Distinguished Research Professor at New York University. She has spent the last decade studying the social implications of data systems, machine learning, and artificial intelligence. Her recent publications address data bias and fairness, and social impacts of artificial intelligence among others.

This article introduces our readers to Kate’s brilliant keynote speech at NIPS 2017. It covers the different forms of bias in machine learning systems and the ways to tackle such problems. By the end of this article, we are sure you will want to listen to her complete talk on the NIPS Facebook page. All images in this article come from Kate's presentation slides and do not belong to us.

The rise of Machine Learning is every bit as far reaching as the rise of computing itself.

20-lessons-bias-machine-learning-systems-nips-2017-img-0


A vast new ecosystem of techniques and infrastructure is emerging in the field of machine learning, and we are just beginning to learn its full capabilities. But alongside the exciting things that people can do, some really concerning problems are arising. Forms of bias, stereotyping and unfair determination are being found in machine vision systems, object recognition models, natural language processing and word embeddings. High-profile news stories about bias have been on the rise, from women being less likely to be shown high-paying jobs, to gender bias in object recognition datasets like MS COCO, to racial disparities in education AI systems.

20 lessons on bias in machine learning systems

Interest in the study of bias in ML systems has grown exponentially in just the last 3 years. It has more than doubled in the last year alone.

We are speaking different languages when we talk about bias, i.e., it means different things to different people and groups. Eg: in law, in machine learning, in geometry etc. Read more on this in the ‘What is bias?’ section below.

In the simplest terms, for the purpose of understanding fairness in machine learning systems, we can consider ‘bias’ as a skew that produces a type of harm.

Bias in MLaaS is harder to identify and to correct, as we do not build these systems from scratch and are not always privy to how they work under the hood.

Data is not neutral. Data cannot always be neutralized. There is no silver bullet for solving bias in ML & AI systems.

There are two main kinds of harms caused by bias: Harms of allocation and harms of representation. The former takes an economically oriented view while the latter is more cultural.

Allocative harm is when a system allocates an opportunity or resource to certain groups, or withholds it from them. To know more, jump to the ‘Harms of allocation’ section.

When systems reinforce the subordination of certain groups along the lines of identity like race, class, gender etc., they cause representational harm. This is further elaborated in the ‘Harms of representation’ section.

Representational harm can further be classified into five types: stereotyping, recognition, denigration, under-representation and ex-nomination.

There are many technical approaches to dealing with the problem of bias in a training dataset, such as scrubbing to neutral and demographic sampling, among others. But they all still suffer from bias. Eg: who decides what is ‘neutral’?

When we consider bias purely as a technical problem, which is hard enough, we are already missing part of the picture. Bias in systems is commonly caused by bias in training data. We can only gather data about the world we have, which has a long history of discrimination. So, the default tendency of these systems would be to reflect our darkest biases.

Structural bias is a social issue first and a technical issue second. If we are unable to consider both and see it as inherently socio-technical, then these problems of bias are going to continue to plague the ML field.

Instead of just thinking about ML contributing to decision making in say hiring or criminal justice, we also need to think of the role of ML in the harmful representation of human identity.

While technical responses to bias are very important and we need more of them, they won’t get us all the way to addressing representational harms to group identity. Representational harms often exceed the scope of individual technical interventions.

Developing theoretical fixes that come from the tech world for allocational harms is necessary but not sufficient. The ability to move outside our disciplinary boundaries is paramount to cracking the problem of bias in ML systems.

Every design decision has consequences and powerful social implications.

Datasets reflect not only the culture but also the hierarchy of the world that they were made in.

Our current datasets stand on the shoulders of older datasets, building on earlier corpora.

Classifications can be sticky and sometimes they stick around longer than we intend them to, even when they are harmful.

ML can be deployed easily in contentious forms of categorization that could have serious repercussions. Eg: a supposedly ‘free-of-bias’ criminality detector that has physiognomy at its heart, predicting the likelihood of a person being a criminal from their appearance.

What is bias?

14th century: an oblique or diagonal line

16th century: undue prejudice

20th century: systematic differences between the sample and a population

In ML: underfitting (low variance and high bias) vs overfitting (high variance and low bias)

In Law: judgments based on preconceived notions or prejudices as opposed to the impartial evaluation of facts. Impartiality underpins jury selection, due process, limitations placed on judges etc. Bias is hard to fix with model validation techniques alone. So you can have an unbiased system in an ML sense producing a biased result in a legal sense.

Bias is a skew that produces a type of harm.

Where does bias come from?

Commonly from training data. It can be incomplete, biased or otherwise skewed. It can draw from non-representative samples that are wholly defined before use. Sometimes the bias is not obvious because the dataset was constructed in a non-transparent way. Beyond human labeling, there are other ways that human biases and cultural assumptions can creep in, ending in the exclusion or overrepresentation of a subpopulation. Case in point: stop-and-frisk program data used as training data by an ML system. This dataset was biased due to systemic racial discrimination in policing.

Harms of allocation

The majority of the literature understands bias as harms of allocation. Allocative harm is when a system allocates an opportunity or resource to certain groups, or withholds it from them. It is primarily an economically oriented view. Eg: who gets a mortgage, loan etc.

Allocation is immediate: it is a time-bound moment of decision making. It is readily quantifiable. In other words, it raises questions of fairness and justice in discrete and specific transactions.
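Because allocation decisions are discrete and countable, a disparity between groups can be measured directly. The following is a minimal sketch, not something shown in the talk: the function names, the group labels and the loan decisions are all hypothetical, and the gap between the highest and lowest approval rate is just one of several possible ways to quantify such a skew.

```python
from collections import defaultdict

def selection_rates(decisions):
    """Return the share of positive decisions per group.

    `decisions` is an iterable of (group, approved) pairs, where
    `approved` is True when the resource was allocated.
    """
    totals = defaultdict(int)
    approved = defaultdict(int)
    for group, ok in decisions:
        totals[group] += 1
        if ok:
            approved[group] += 1
    return {g: approved[g] / totals[g] for g in totals}

def parity_gap(rates):
    """Difference between the highest and lowest selection rate."""
    return max(rates.values()) - min(rates.values())

# Hypothetical loan decisions, invented purely for illustration.
decisions = (
    [("group_a", True)] * 8 + [("group_a", False)] * 2
    + [("group_b", True)] * 5 + [("group_b", False)] * 5
)

rates = selection_rates(decisions)
print(rates)
print(parity_gap(rates))
```

A large gap does not by itself establish unfairness, and a small one does not rule it out. It only makes the skew in these specific transactions visible.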

Harms of representation

It gets tricky when it comes to systems that represent society but don't allocate resources. Representational harms occur when such systems reinforce the subordination of certain groups along the lines of identity like race, class, gender etc.

Representation is a long-term process that affects attitudes and beliefs. It is harder to formalize and track. It is a diffuse depiction of humans and society. It is at the root of all of the other forms of allocative harm.

5 types of representational harms

20-lessons-bias-machine-learning-systems-nips-2017-img-1

Source: Kate Crawford’s NIPS 2017 Keynote presentation: Trouble with Bias

Stereotyping
- A 2016 paper on word embeddings looked at gender-stereotypical associations and the distances between gender pronouns and occupations.
- Google Translate assigns stereotyped genders to pronouns when translating from a gender-neutral language like Turkish.
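To make the first bullet concrete, here is a rough sketch of how the distance between gendered pronouns and occupation words can be compared in an embedding space. It is not the method of the paper mentioned above. The three-dimensional vectors are invented toy values rather than real embeddings, and `gender_lean` is a hypothetical helper name.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def gender_lean(word_vec, he_vec, she_vec):
    """Positive when the word is closer to 'he', negative when closer to 'she'."""
    return cosine(word_vec, he_vec) - cosine(word_vec, she_vec)

# Toy 3-dimensional vectors, invented for illustration only.
toy = {
    "he": [1.0, 0.0, 0.2],
    "she": [0.0, 1.0, 0.2],
    "engineer": [0.8, 0.2, 0.5],
    "nurse": [0.2, 0.8, 0.5],
    "teacher": [0.5, 0.5, 0.5],
}

for job in ("engineer", "nurse", "teacher"):
    print(job, gender_lean(toy[job], toy["he"], toy["she"]))
```

With these toy values, 'engineer' leans towards 'he', 'nurse' leans towards 'she', and 'teacher' sits in between. With real embeddings, the same kind of comparison is what surfaces stereotypical associations learned from the training text.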

Recognition
- When a group is erased or made invisible by a system
- In a narrow sense, it is purely a technical problem, i.e., does a system recognize a face inside an image or video?
- In the broader sense, it is a failure to recognize someone’s humanity. It is about respect, dignity, and personhood. The broader harm is whether the system works for you.
- Eg: a system could not process darker skin tones, Nikon’s camera software mischaracterized Asian faces as blinking, HP's algorithms had difficulty recognizing anyone with a darker shade of pale.

Denigration
- When people use culturally offensive or inappropriate labels
- Eg: autosuggestions when people typed ‘jews should’

Under-representation
- An image search for 'CEOs' yielded only one woman CEO, at the very bottom of the page. The majority were white men.

Ex-nomination

Technical responses to the problem of biases

Improve accuracy

Blacklist

Scrub to neutral

Demographics or equal representation

Awareness
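As an illustration of the 'Demographics or equal representation' response, one common approach is to reweight training examples so that every group carries the same total weight. This is a minimal sketch under our own assumptions, not a technique attributed to the talk: `balancing_weights` and the group labels are hypothetical.

```python
from collections import Counter, defaultdict

def balancing_weights(groups):
    """Weight each example so that every group has equal total weight."""
    counts = Counter(groups)
    total = len(groups)
    n_groups = len(counts)
    return [total / (n_groups * counts[g]) for g in groups]

# Hypothetical group labels for eight training examples.
groups = ["group_a"] * 6 + ["group_b"] * 2
weights = balancing_weights(groups)

# Sum the weights per group to check that they are balanced.
weight_per_group = defaultdict(float)
for g, w in zip(groups, weights):
    weight_per_group[g] += w
print(dict(weight_per_group))
```

Such a fix still depends on someone deciding which groups exist and how examples are assigned to them, so it inherits the limits of the categories it uses.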

Politics of classification

Where did identity categories come from? What if bias is a deeper and more consistent issue with classification?

20-lessons-bias-machine-learning-systems-nips-2017-img-4

Source: Kate Crawford’s NIPS 2017 Keynote presentation: Trouble with Bias

The fact that bias issues keep creeping into our systems and manifesting in new ways suggests that classification is not simply a technical issue but a social one as well, with real consequences for the people being classified. There are two themes:

Classification is always a product of its time

We are currently in the biggest experiment of classification in human history

Eg: the Labeled Faces in the Wild dataset is 77.5% men and 83.5% white. An ML system trained on this dataset will work best for that group.
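A skew like this can be surfaced with a simple composition check before training. The sketch below is an assumption-laden illustration: the 40 records are hypothetical and chosen only so that one share matches the 77.5% figure quoted above, and the attribute name, category labels and threshold are ours. The categories themselves are a design decision, which is exactly the concern this section raises.

```python
from collections import Counter

def shares(records, attribute):
    """Return the fraction of records holding each value of `attribute`."""
    counts = Counter(r[attribute] for r in records)
    total = sum(counts.values())
    return {value: n / total for value, n in counts.items()}

def over_represented(records, attribute, threshold):
    """List the attribute values whose share exceeds `threshold`."""
    return sorted(
        value
        for value, share in shares(records, attribute).items()
        if share > threshold
    )

# Hypothetical metadata: 31 of 40 records (77.5%) carry the label "man".
records = [{"gender": "man"}] * 31 + [{"gender": "woman"}] * 9

print(shares(records, "gender"))
print(over_represented(records, "gender", threshold=0.5))
```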

What can we do to tackle these problems?

Start working on fairness forensics
- Test our systems. Eg: build pre-release trials to see how a system is working across different populations
- Track the life cycle of a training dataset to know who built it and what the demographic skews in that dataset might be
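A pre-release trial of the kind described above could be sketched as a per-group accuracy check that flags any population the system serves poorly. This is only an illustration: the function names, the evaluation examples and the 0.8 accuracy floor are hypothetical choices, not values from the talk.

```python
from collections import defaultdict

def accuracy_by_group(examples):
    """Compute accuracy per group.

    `examples` is an iterable of (group, true_label, predicted_label).
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for group, truth, predicted in examples:
        total[group] += 1
        correct[group] += int(truth == predicted)
    return {g: correct[g] / total[g] for g in total}

def failing_groups(accuracy, min_accuracy):
    """Return the groups whose accuracy falls below the chosen floor."""
    return sorted(g for g, acc in accuracy.items() if acc < min_accuracy)

# Hypothetical evaluation results, invented for illustration.
examples = [
    ("group_a", 1, 1), ("group_a", 0, 0), ("group_a", 1, 1), ("group_a", 0, 0),
    ("group_b", 1, 0), ("group_b", 0, 0), ("group_b", 1, 1), ("group_b", 1, 0),
]

accuracy = accuracy_by_group(examples)
print(accuracy)
print(failing_groups(accuracy, min_accuracy=0.8))
```

An overall accuracy of 75% would hide the fact that one group here is served perfectly and the other only half the time, which is why the breakdown matters.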

Start taking interdisciplinarity seriously
- Work with people who are not in our field but have deep expertise in other areas. Eg: the FATE (Fairness, Accountability, Transparency and Ethics) group at Microsoft Research
- Build spaces for collaboration like the AI Now Institute.