Big data is the current hype, the thing you need to do to find the best job in the world. I started using machine learning tools a decade ago, and when I saw this book, it felt like it was answering some concerns I had. Let’s see what’s inside.
Content and opinions
The first two chapters set the scene for the discussions in the book. The first covers the way a model works, why people trust models and why we want to create new ones; the second explains why we should trust the author. She does indeed have the background to understand models and the job experience to have seen first hand how a model can be used. Of course, something that is missing here is that most of the examples in the book take place in the US. Hopefully the EU will be smart and learn from the US mistakes (at least there are efforts to lower the amount of data Facebook and Google are accumulating on users).
OK, let’s start with the first weapon, the one targeted at students. I was quite baffled that all this started with a random ranking from a low-profile newspaper. And now, every university tries to maximize its reputation based on numbers, not on the actual quality of new students. If universities are really spending that much money on advertising, to the point of driving tuition fees sky-high, that is a future crisis in waiting, by the way!
The second weapon is one we all know: online ads. Almost all websites survive one way or another on revenue from online advertising. All websites are connected through links to social networks, ad agencies… and these companies churn out information and deductions based on this gigantic pile of data. If advertising didn’t have an effect on people, there would be no market for it.
Moving on to something completely different: justice. It is something that also happens in France. We have far-right extremists who want to have stats (it is forbidden to collect racial stats there) to show that some categories of the population need to be checked more often than others. It is indeed completely unfair, and also proof that we are targeting some types of crimes and not others. I found that the way the weapon worked was clearly skewed from the start. How could anyone not see the problem?
Then it gets even worse with getting a job, and with the following chapter about keeping the job. Both times, the main issue is that the WMD helps the companies maximize their profit and minimize their risk. There are two issues there. First, only sure prospects are going to be hired, and this is based on… stats accumulated through the years, which are racially biased. Second, once people have a job, the objective is not to optimize the happiness of the employee, even if doing so would enhance profitability.
The next two are also related: credit and insurance. It is nice to see that credit scores started as a way to remove bias; it is terrible to see that we went back there and that scores are now dictated by obscure tools. And they now even impact insurance, not to optimize one’s cost, but to optimize revenue for the insurance company. I believe in everyone paying the same amount and having the same coverage for things like health (not for driving, because we can all be better drivers, but we cannot optimize our genes!). Everything is heading toward a really individualistic society, and it is scary.
Finally, even elections are rigged. I knew that messages were tailored to appeal to each category of voters, but it is scary to see that this is actually used to lie. We all know that politicians are lying to us, but now they don’t even mind telling each of us different lies. And social networks and ad companies have even more power to make us do things as they see fit. The fact that Facebook officially publishes some of its tests on users just makes me mad.
OK, the book is full of examples of bad usage of big data. I saw first hand on scientific applications that it is easy to make a mistake when creating a model. In my case, it was the optimization of a modeler, and more specifically of the delta between each iteration (the time step). When trying to minimize the number of non-convergence issues, if we only try to find the same time step as the original app, we are missing the point: we are fitting a proxy. The real objective is to find a new time step that also keeps the number of convergence issues low, even if it is different from the original one.
Another example is just all these WMDs, actually. They are more often than not based on neural networks and deep learning algorithms (which is essentially the same thing). We pour a lot of effort into making them better, but the issue is that we don’t know what they are doing (in that regard, every sci-fi horror movie with a crazy AI comes to mind, as well as Asimov’s books). This has been the case for decades, and although we know equivalent algorithms that could give us the explanation, we stay with these black boxes because they are cost-effective (we don’t have to choose the proper equivalent algorithm, we just train) and scalable (which may not be the case for the equivalent algorithms, as they don’t seem to have the same priority in research!). The nice thing about the book is also that it underlines an issue that I hadn’t even thought about. All these algorithms try to reproduce a past behavior. But humanity is evolving, and things that were considered true in the past are no longer true (race, anyone?). As such, if we give these WMDs absolute power, we will just rot as a civilization and probably collapse.
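The point about reproducing past behavior can be made concrete with a toy sketch. This is not an example from the book: the data, the `make_history` helper and the thresholds are all made up for illustration. A tiny logistic regression (an interpretable model, written in plain Python) is trained on hypothetical past hiring decisions in which one group needed a higher skill level to be hired. Because the model is not a black box, we can read its weights and see that it has simply learned the old bias.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def make_history():
    """Hypothetical past decisions: ((skill, group), hired).

    The bias is built in on purpose: group 0 was hired from skill 0.5 upward,
    group 1 only from skill 0.8 upward.
    """
    rows = []
    for group in (0, 1):
        min_step = 5 if group == 0 else 8
        for step in range(10):
            hired = 1 if step >= min_step else 0
            rows.append(((step / 10, float(group)), hired))
    return rows

def train(rows, epochs=5000, lr=0.5):
    # Plain batch gradient descent on the logistic loss.
    w_skill, w_group, bias = 0.0, 0.0, 0.0
    n = len(rows)
    for _ in range(epochs):
        g_skill = g_group = g_bias = 0.0
        for (skill, group), hired in rows:
            error = sigmoid(w_skill * skill + w_group * group + bias) - hired
            g_skill += error * skill
            g_group += error * group
            g_bias += error
        w_skill -= lr * g_skill / n
        w_group -= lr * g_group / n
        bias -= lr * g_bias / n
    return w_skill, w_group, bias

def predict(weights, skill, group):
    # Predicted probability of being hired.
    w_skill, w_group, bias = weights
    return sigmoid(w_skill * skill + w_group * group + bias)

weights = train(make_history())
p_group0 = predict(weights, 0.6, 0)
p_group1 = predict(weights, 0.6, 1)

print("weights (skill, group, bias):", weights)
print("skill 0.6, group 0:", round(p_group0, 3))
print("skill 0.6, group 1:", round(p_group1, 3))
```

The weight on `group` comes out negative, so two candidates with the same skill get different scores depending only on their group. With a model this small the problem is visible at a glance; with a black box trained on the same history, the same bias would be there, but hidden.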
I’m not against big data and machine learning. I think the current trend is clearly explained in this book, and the way forward corresponds to something I felt before this hype: let’s choose a good algorithm, let’s train the model and let’s see why it chooses some answers and not others. We may then be onto something, or we may see that it is biased and that we need to go back to the drawing board. Considering the state of big data, we definitely need to go back to the drawing board.