The Cambridge Analytica scandal and ethics in data science

Earlier this month, Stack Overflow published the results of its 2018 developer survey. In it, there was an interesting set of questions around the concept of 'ethical code'. The main takeaway was ultimately that the area remains a gray area. The Cambridge Analytica scandal, however, has given the issue of 'ethical code' a renewed urgency in the last couple of days. The data analytics company are alleged to have not only been involved in votes in the UK and US, but also of harvesting copious amounts of data from Facebook (illegally).

For whistleblower Christopher Wylie, the issue of ethical code is particularly pronounced. “I created Steve Bannon’s psychological mindfuck tool” he told Carole Cadwalladr in an interview in the Guardian.

Cambridge Analytica: psyops or just market research?

Wylie is a data scientist whose experience over the last half a decade or so has been impressive. It’s worth noting however, that Wylie’s career didn’t begin in politics. His academic career was focused primarily on fashion forecasting. That might all seem a little prosaic, but it underlines the fact that data science never happens in a vacuum. Data scientists always operate within a given field. It might be tempting to view the world purely through the prism of impersonal data and cold statistics. To a certain extent you have to if you’re a data scientist or a statistician. But at the very least this can be unhelpful; at worst a potential threat to global democracy.

At one point in the interview Wylie remarks that:

...it’s normal for a market research company to amass data on domestic populations. And if you’re working in some country and there’s an auxiliary benefit to a current client with aligned interests, well that’s just a bonus.

This is potentially the most frightening thing. Cambridge Analytica’s ostensible role in elections and referenda isn’t actually that remarkable. For all the vested interests and meetings between investors, researchers and entrepreneurs, the scandal is really just the extension of data mining and marketing tactics employed by just about every organization with a digital presence on the planet.

Data scientists are always going to be in a difficult position. True, we're not all going to end up working alongside Steve Bannon. But your skills are always being deployed with a very specific end in mind. It’s not always easy to see the effects and impact of your work until later, but it’s still essential for data scientists and analysts to be aware of whose data is being collected and used, how it’s being used and why.

Who is responsible for the ethics around data and code?

There was another interesting question in the Stack Overflow survey that's relevant to all of this. The survey asked respondents who was ultimately most responsible for code that accomplishes something unethical. 57.5% claimed upper management were responsible, 22.8% said the person who came up with the idea, and 19.7% said it was the responsibility of the developer themselves.

Clearly the question is complex. The truth lies somewhere between all three. Management make decisions about what’s required from an organizational perspective, but the engineers themselves are, of course, a part of the wider organizational dynamic. They should be in a position where they are able to communicate any personal misgivings or broader legal issues with the work they are being asked to do.

The case of Wylie and Cambridge Analytica is unique, however. But it does highlight that data science can be deployed in ways that are difficult to predict. And without proper channels of escalation and the right degree of transparency it's easy for things to remain secretive, hidden in small meetings, email threads and paper trails. That's another thing that data scientists need to remember. Office politics might be a fact of life, but when you're a data scientist you're sitting on the apex of legal, strategic and political issues. To refuse to be aware of this would be naive.

What the Cambridge Analytica story can teach data scientists

But there's something else worth noting. This story also illustrates something more about the world in which data scientists are operating. This is a world where traditional infrastructure is being dismantled. This is a world where privatization and outsourcing is viewed as the route towards efficiency and 'value for money'. Whether you think that’s a good or bad thing isn’t really the point here. What’s important is that it makes the way we use data, even the code we write more problematic than ever because it’s not always easy to see how it’s being used. Arguably Wylie was naive. His curiosity and desire to apply his data science skills to intriguing and complex problems led him towards people who knew just how valuable he could be.

Wylie has evidently developed greater self-awareness. This is perhaps the main reason why he has come forward with his version of events. But as this saga unfolds it’s worth remembering the value of data scientists in the modern world - for a range of organizations. It’s made the concept of the 'citizen data scientist' take on an even more urgent and literal meaning. Yes data science can help to empower the economy and possibly even toy with democracy. But it can also be used to empower people, improve transparency in politics and business.

If anything, the Cambridge Analytica saga proves that data science is a dangerous field - not only the sexiest job of the twenty-first century, but one of the most influential in shaping the kind of world we're going to live in. That's frightening, but it's also pretty exciting.