Pitfalls of Big Data I: What do you care about?
We are seeing increasing use of machine learning in nearly every sphere of life, and I'll discuss some of the possible pitfalls of this trend.
I will start with the rather obvious statement that "machines don't decide what you care about". You decide what your goals are; the machine won't do it for you. For example, you may decide that you care about expected profit and design a trading system around that, but you could also care about other things, like minimizing the maximum loss. When designing a program to sort job applicants, you can decide to sort them by skills, retention, cultural fit, or some combination of these. These goals are often implicit, and we even forget that an alternative can exist. For example, Google Maps implicitly optimizes for the time taken to reach a destination, but as Daniele Quercia discusses in this video (https://www.youtube.com/watch?v=AJg9SXIcPiM), there are other things you might want to optimize for, like how pleasant the route is.

A problem also arises when the decision has already been taken by someone else (the programmer): you don't get to decide what you care about. There are many examples of this. Through its news feed, Facebook wants to increase "engagement" and its ad revenue, but you might want a more diversified world view. This is partly a problem of misaligned incentives, but it is also a "what do you care about" issue, because Facebook could choose to care about diversifying the media people consume, or world peace, or whatever.
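To make the first point concrete, here is a minimal illustrative sketch (the strategy names and numbers are made up) showing that the "best" trading strategy is not a property of the data alone: it changes depending on whether your objective is expected profit or the worst-case loss.

```python
# Illustrative only: made-up outcomes for three hypothetical trading strategies,
# each evaluated across a few scenarios (positive = profit, negative = loss).
strategies = {
    "aggressive": [120, -80, 300, -150],
    "balanced":   [60, 10, 90, -20],
    "cautious":   [20, 15, 25, 10],
}

def expected_profit(outcomes):
    return sum(outcomes) / len(outcomes)

def worst_case(outcomes):
    return min(outcomes)

# Objective 1: maximize expected profit.
best_by_profit = max(strategies, key=lambda s: expected_profit(strategies[s]))

# Objective 2: maximize the worst-case outcome (i.e. minimize the maximum loss).
best_by_worst_case = max(strategies, key=lambda s: worst_case(strategies[s]))

print(best_by_profit)      # "aggressive": highest average profit
print(best_by_worst_case)  # "cautious": never loses much
```

Neither answer is wrong; the point is that the objective function encodes what you care about, and someone has to choose it.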
There has been some recent discussion about the trolley problem in the context of self-driving cars. For example, if a fully autonomous car faces a situation with only two possibilities, save the passenger or save the bystanders, who should the car be designed to save? You can say it should save the greatest number of people, but of course one can come up with situations where this doesn't appear to be the best answer, say the president is in the car. You might invent some all-encompassing framework to solve the problem, but there will exist equally arguable frameworks that lead to different answers. In this case the choice is difficult but explicit; in most situations it will be implicit, and the makers of the system will be making these choices without even thinking about them. And this is true for all technologies, not just machine learning.
Now, with machine learning, people are recognizing that it isn't always "fair". For example, face recognition usually works much more poorly for black faces than for white faces [1], racial bias was found in software that assigns risk scores to inmates [2], and Google image search was shown to exaggerate occupational gender bias [3]. We can talk about why such problems arise and how to solve them, but first you have to decide that these are (also) problems you care about, just as we have decided to care about profit, shortest time of arrival, and so on.
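Deciding to care about fairness means, at a minimum, measuring it. Here is a small sketch (the group names and records are hypothetical) that breaks a classifier's error rate down by group instead of reporting a single aggregate number; a gap between groups is invisible in overall accuracy unless you decide to look for it.

```python
from collections import defaultdict

# Hypothetical evaluation records: (group, true_label, predicted_label)
results = [
    ("group_a", 1, 1), ("group_a", 0, 0), ("group_a", 1, 1), ("group_a", 0, 1),
    ("group_b", 1, 0), ("group_b", 0, 1), ("group_b", 1, 1), ("group_b", 0, 0),
]

# Count errors and totals per group rather than one overall number.
errors = defaultdict(int)
totals = defaultdict(int)
for group, truth, pred in results:
    totals[group] += 1
    if truth != pred:
        errors[group] += 1

for group in sorted(totals):
    print(group, "error rate:", errors[group] / totals[group])
```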
If you think about these things, they get very political very quickly. Do you care about privacy? Do you care about people in low-bandwidth countries? Do you care about blind people being able to access your website? Do you care about black people going to jail more often? But things getting political is a good sign, because recognizing something as political brings an acceptance that there is no single answer to the question: the answer depends on who you are, where you are coming from, what costs of caring you are willing to pay, and who you care about.
[1]: http://motherboard.vice.com/read/the-inherent-bias-of-facial-recognition
[2]: https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing
[3]: http://mjskay.com/papers/chi_2015_gender-bias-in-image-search.pdf
Next Post: Pitfalls of Big Data II: Bad Data