NeurIPS 2018: Rethinking transparency and accountability in machine learning

Key takeaways from the discussion

To solve problems with machine learning, you must first understand them.

Different people or groups of people are going to define a problem in a different way. So, we shouldn't believe that the way we want to frame the problem computationally is the right way.

If we allow that our systems include people and society, it is clear that we have to help negotiate values, not simply define them.

Last week, at the 32nd NeurIPS 2018 annual conference, Nitin Koli, Joshua Kroll, and Deirdre Mulligan presented the common pitfalls we see when studying the human side of machine learning.

Machine learning is being used in high-impact areas like medicine, criminal justice, employment, and education for making decisions. In recent years, we have seen that this use of machine learning and algorithmic decision making have resulted in unintended discrimination. It’s becoming clear that even models developed with the best of intentions may exhibit discriminatory biases and perpetuate inequality.

Although researchers have been analyzing how to put concepts like fairness, accountability, transparency, explanation, and interpretability into practice in machine learning, properly defining these things can prove a challenge. Attempts have been made to define them mathematically, but this can bring new problems. This is because applying mathematical logic to human concepts that have unique and contested political and social dimensions necessarily has blind spots - every point of contestation can’t be integrated into a single formula. In turn, this can cause friction with other disciplines as well as the public.

Based on their research on what various terms mean in different contexts, Nitin Koli, Joshua Krill, and Deirdre Mulligan drew out some of the most common misconceptions machine learning researchers and practitioners hold.

Sociotechnical problems

To find a solution to a particular problem, data scientists need precise definitions. But how can we verify that these definitions are correct? Indeed, many definitions will be contested, depending on who you are and what you want them to mean. A definition that is fair to you will not necessarily be fair to me”, remarks Mr. Kroll.

Mr. Kroll explained that while definitions can be unhelpful, they are nevertheless essential from a mathematical perspective. This means there appears to be an unresolved conflict between concepts and mathematical rigor.

But there might be a way forward. Perhaps it’s wrong to simply think in this dichotomy of logical rigor v. the messy reality of human concepts.

One of the ways out of this impasse is to get beyond this dichotomy. Although it’s tempting to think of the technical and mathematical dimension on one side, with the social and political aspect on the other, we should instead see them as intricately related. They are, Kroll suggests, socio-technical problems.

Kroll goes on to say that we cannot ignore the social consequences of machine learning: “Technologies don’t live in a vacuum and if we pretend that they do we kind of have put our blinders on and decided to ignore any human problems.”

Fairness in machine learning

In the real world, fairness is a concept directly linked to processes. Think, for example, of the voting system. Citizens cast votes to their preferred candidates and the candidate who receives the most support is elected. Here, we can say that even though the winning candidate was not the one a candidate voted for, but at least he/she got the chance to participate in the process. This type of fairness is called procedural fairness.

However, in the technical world, fairness is often viewed in a subtly different way. When you place it in a mathematical context, fairness centers on outcome rather than process.

Kohli highlighted that trade offs between these different concepts can’t be avoided. They’re inevitable. A mathematical definition of fairness places a constraint over the behavior of a system, and this constraint will narrow down the cause of models that can satisfy these conditions.

So, if we decide to add too many fairness constraints to the system, some of them will be self-contradictory.

One more important point machine learning practitioners should keep in mind is that when we talk about the fairness of a system, that system isn’t a self-contained and coherent thing. It is not a logical construct - it’s a social one. This means there are a whole host of values, ideas, and histories that have an impact on its reality..

In practice, this ultimately means that the complexity of the real world from which we draw and analyze data can have an impact on how a model works. Kohli explained this by saying, “it doesn’t really matter... whether you are building a fair system if the context in which it is developed and deployed in is fundamentally unfair.”

Accountability in machine learning

Accountability is ultimately about trust. It’s about the extent you can be sure you know what is ‘true’ about a system. It refers to the fact that you know how it works and why it does things in certain ways. In more practical terms, it’s all about invariance and reliability.

To ensure accountability inside machine learning models, we need to follow a layered model. The bottom layer is an accounting or recording layer, that keeps track of what a given system is doing and the ways in which it might have been changed..

The next layer is a more analytical layer. This is where those records on the bottom layer are analyzed, with decisions made about performance - whether anything needs to be changed and how they should be changed.

The final and top-most layer is about responsibility. It’s where the proverbial buck stops - with those outside of the algorithm, those involved in its construction. “Algorithms are not responsible, somebody is responsible for the algorithm,” explains Kroll.

Transparency

Transparency is a concept heavily tied up with accountability. Arguably you have no accountability without transparency.

The layered approach discussed above should help with transparency, but it’s also important to remember that transparency is about much more than simply making data and code available. Instead, it demands that the decisions made in the development of the system are made available and clear too. Mr. Kroll emphasizes, “to the person at the ground-level for whom the decisions are being taken by some sort of model, these technical disclosures aren’t really useful or understandable.”

Explainability

In his paper Explanation in Artificial Intelligence: Insights from the Social Sciences, Tim Miller describes what is explainable artificial intelligence.

According to Miller, explanation takes many forms such as causal, contrastive, selective, and social. Causal explanation gives reasons behind why something happened, for example, while contrastive explanations can provide answers to questions like“Why P rather than not-P?".

But the most important point here is that explanations are selective. An explanation cannot include all reasons why something happened; explanations are always context-specific, a response to a particular need or situation.

Think of it this way: if someone asks you why the toaster isn’t working, you could just say that it’s broken. That might be satisfactory in some situations, but you could, of course, offer a more substantial explanation, outlining what was technically wrong with the toaster, how that technical fault came to be there, how the manufacturing process allowed that to happen, how the business would allow that manufacturing process to make that mistake… you could, of course, go on and on.

Data is not the truth

Today, there is a huge range of datasets available that will help you develop different machine learning models. These models can be useful, but it’s essential to remember that they are models. A model isn’t the truth - it’s an abstraction, a representation of the world in a very specific way.

One way of taking this fact into account is the concept of ‘construct validity’. This sounds complicated, but all it really refers to is the extent to which a test - say a machine learning algorithm - actually measures what it says it’s trying to measure. The concept is widely used in disciplines like psychology, but in machine learning, it simply refers to the way we validate a model based on its historical predictive accuracy.

In a nutshell, it’s important to remember that just as data is an abstraction of the world, models are also an abstraction of the data. There’s no way of changing this, but having an awareness that we’re dealing in abstractions ensures that we do not lapse into the mistake of thinking we are in the realm of ‘truth’.

To build a fair(er) systems will ultimately require an interdisciplinary approach, involving domain experts working in a variety of fields. If machine learning and artificial intelligence is to make a valuable and positive impact in fields such as justice, education, and medicine, it’s vital that those working in those fields work closely with those with expertise in algorithms. This won’t fix everything, but it will be a more robust foundation from which we can begin to move forward.

You can watch the full talk on the Facebook page of NeurIPS.

Researchers unveil a new algorithm that allows analyzing high-dimensional data sets more effectively, at NeurIPS conference

Accountability and algorithmic bias: Why diversity and inclusion matters [NeurIPS Invited Talk]

NeurIPS 2018: A quick look at data visualization for Machine learning by Google PAIR researchers [Tutorial]