It seems that Packt Publishing is on a publishing spree on Machine Learning in Python. After Building Machine Learning Systems In Python, for which I was a technical reviewer, Packt published Learning Scikit-Learn In Python last November.
Content and opinions
This book is shorter than the more general Building Machine Learning Systems, and it is geared toward one specific module: scikit-learn. My first impression was the imbalance in the table of contents. Almost half the book is one chapter about supervised learning, and the other three chapters share the other half. It’s noticeable because the last two chapters have no subsections, whereas the second chapter has lots of them. It feels like you will spend a lot of time on supervised learning, and then just fly over the rest. Actually, you don’t: fewer aspects of unsupervised learning and advanced features are covered, but what is covered is covered properly.
OK, let’s start with the usual first chapter. With Packt, the first chapter always tells you how to install what you need, and this book is no different. It also has some warnings about what you may want to avoid (curse of dimensionality…). Then the second chapter starts with SVMs and then a naïve Bayes classifier. This went well, with good images and tables to understand what was going on. Then decision trees were introduced. I didn’t understand why the first example here didn’t use the same function as the other examples to split the data into training and testing sets. It was the only occurrence in the book. I had some trouble understanding what the displayed tree was about and how it was supposed to be read (the display is something provided by the scikit), as I had never seen one before, but the text explained it properly. The last part of this chapter was on regression, with the tools that were used before but now in regression mode.
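To illustrate the point about using one splitting function throughout, here is a minimal sketch, not the book’s code. It assumes a recent scikit-learn (where `train_test_split` lives in `sklearn.model_selection`) and uses the Iris dataset purely as a stand-in. The estimator settings are illustrative choices.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier, export_text

# Iris is only a stand-in dataset here; it is not the one used in the book.
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.25, random_state=0
)

# The same split is reused for every estimator, so the scores are comparable.
models = {
    "svm": SVC(kernel="linear"),
    "naive_bayes": GaussianNB(),
    "decision_tree": DecisionTreeClassifier(max_depth=3, random_state=0),
}
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)
    print(f"{name}: accuracy on the test set = {scores[name]:.2f}")

# Text rendering of the fitted tree: each line is a threshold test on one
# feature, and each leaf gives the predicted class.
tree_text = export_text(
    models["decision_tree"], feature_names=list(iris.feature_names)
)
print(tree_text)
```

The printed tree reads top-down: follow the branch whose threshold test matches the sample until you reach a `class:` leaf.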
The next chapter is a small introduction to unsupervised learning. It mainly revolves around PCA and k-means. Disclaimer: I hate PCA, as usually data (IMHO) cannot be described linearly. At least the authors introduced the kernel trick in PCA, even though it is not clearly mentioned what it does. Clustering with k-means is then introduced, along with some additional algorithms.
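As a rough illustration of what the kernel trick brings to PCA, the sketch below compares linear PCA and kernel PCA on two concentric circles. It is not taken from the book: the dataset, the RBF kernel, `gamma=10`, and the use of a logistic regression as a probe for linear separability are all assumptions made for the example.

```python
from sklearn.datasets import make_circles
from sklearn.decomposition import PCA, KernelPCA
from sklearn.linear_model import LogisticRegression

# Two concentric circles: a case that no linear projection can untangle.
X, y = make_circles(n_samples=400, factor=0.3, noise=0.05, random_state=0)

# Linear PCA only rotates this 2-D data; the circles stay nested.
X_pca = PCA(n_components=2).fit_transform(X)

# Kernel PCA performs PCA in an implicit feature space defined by the kernel.
# The RBF kernel and gamma=10 are illustrative choices, not tuned values.
X_kpca = KernelPCA(n_components=2, kernel="rbf", gamma=10).fit_transform(X)


def linear_accuracy(features, labels):
    """Training accuracy of a linear classifier, used as a separability probe."""
    return LogisticRegression().fit(features, labels).score(features, labels)


pca_acc = linear_accuracy(X_pca, y)
kpca_acc = linear_accuracy(X_kpca, y)
print(f"linear PCA: {pca_acc:.2f}, kernel PCA: {kpca_acc:.2f}")
```

A higher score after kernel PCA means the two circles have become linearly separable in the projected space, which a linear projection alone cannot achieve on this data.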
This is where I almost lost my temper. On page 75, the authors claim that “a theorem called the Law of Large Numbers tells us that when we repeat an experiment a large number of times (for example, measuring somebody’s height), the distribution of results can be approximated by a Gaussian.” Sorry, but this is plainly wrong. The LLN states that by averaging a sufficient number of trials, you can approximate the expected value. This has NOTHING to do with the distribution being Gaussian. The closest theorem I know is the Central Limit Theorem, which states that the suitably normalized sum of a large number of independent trials from a distribution with finite variance can be approximated by a Gaussian. It is quite different. If what the authors claim were true, all random distributions would be Gaussian, which is absurd…
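The difference between the two theorems can be checked with a small simulation. This is a sketch using only the standard library. The uniform distribution, the sample sizes, and the "share within one standard deviation" statistic (about 68% for a Gaussian, about 58% for a uniform distribution) are choices made for the illustration.

```python
import math
import random
import statistics

random.seed(0)  # fixed seed so the run is reproducible


def share_within_one_std(values):
    """Share of values lying within one standard deviation of their mean."""
    mu = statistics.fmean(values)
    sigma = math.sqrt(statistics.fmean([(v - mu) ** 2 for v in values]))
    return sum(abs(v - mu) <= sigma for v in values) / len(values)


# Repeating the experiment: 20,000 draws from a uniform distribution on [0, 1).
draws = [random.random() for _ in range(20_000)]

# LLN: the average of the draws approaches the expected value, 0.5.
lln_estimate = statistics.fmean(draws)

# The draws themselves stay uniform: about 58% fall within one standard
# deviation of the mean, against about 68% for a Gaussian.
share_draws = share_within_one_std(draws)

# CLT: averages of n draws, computed many times, are close to Gaussian.
n = 30
averages = [
    statistics.fmean([random.random() for _ in range(n)])
    for _ in range(20_000)
]
share_averages = share_within_one_std(averages)

print(f"average of the draws: {lln_estimate:.3f}")
print(f"share within one std, raw draws: {share_draws:.3f}")
print(f"share within one std, averages:  {share_averages:.3f}")
```

Repeating the experiment many times does not make the results themselves Gaussian. Only the averages (or sums) of many trials take that shape.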
OK, back to the last chapter. In a way, its first part is close to what PCA does: selecting or extracting features. Are these also unsupervised learning algorithms? No, they play a preprocessing part (just like any dimensionality reduction algorithm would) in a Machine Learning workflow. What is nice is scikit-learn’s tools for model search and selection. They may actually be what I prefer in the scikit because they are so fundamental. At least they get a good overview in the book.
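For readers who have not used these tools, here is a minimal sketch of feature selection as a preprocessing step combined with a grid search. It is not the book’s example: the Iris dataset, the step names `select` and `clf`, and the parameter grid are hypothetical choices, and it assumes a recent scikit-learn where `GridSearchCV` lives in `sklearn.model_selection`.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)  # stand-in dataset, not taken from the book

# Feature selection is a preprocessing step placed in front of the classifier.
pipeline = Pipeline([
    ("select", SelectKBest(score_func=f_classif)),
    ("clf", SVC(kernel="rbf")),
])

# Hypothetical grid: the values are illustrative, not recommendations.
param_grid = {
    "select__k": [1, 2, 3, 4],
    "clf__C": [0.1, 1, 10],
}

# 5-fold cross-validated search over every combination in the grid.
search = GridSearchCV(pipeline, param_grid, cv=5)
search.fit(X, y)

print("best parameters:", search.best_params_)
print(f"best cross-validated accuracy: {search.best_score_:.3f}")
```

Putting the selector inside the pipeline means it is refitted on each training fold, so the held-out fold never influences which features are kept.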
Unfortunately, the book was published just after the more complete Building Machine Learning Systems In Python. As it is shorter and not that much cheaper, it will always be compared to Building ML Systems.
My other worry is about the authors’ lack of knowledge of fundamental statistical theory. This destroys whatever confidence I had, which is too bad because the rest of the book seems sound.