Big data is the current hype, the thing you need to do to find the best job in the world. I started using machine learning tools a decade ago, and when I saw this book, it seemed to address some concerns I had. Let’s see what’s inside.

Read More

I work in an international company, surrounded by lots of people from different cultures with whom I need to interact. At first glance, it seems easy to work with all of them. I mean, how difficult could it be?

Actually, it’s easy, but sometimes interactions are intriguing and people do not react the way you expect. And why is that? Lots of reasons, of course, but one of them is that they have a different culture and do not expect you to explicitly tell them what they did wrong (which is something I do. A lot).

Read More

I have trouble with slides. I hate them. I took a presentation training course to learn to make better ones, or even to present with more or less no slides at all. I liked that training very much, but it’s difficult to apply to scientific presentations. As such, I’ve decided to read this book, which is about scientific presentations (published by IEEE-Wiley), to see how other people approach slides.

Read More

Last year, my colleagues and I presented a paper on giga model simulations at an SPE conference: Giga-Model Simulations In A Commercial Simulator – Challenges & Solutions. During this talk, we discussed the complexity of I/O for such simulations. We had ordered data as input that we needed to split into chunks and send to the relevant MPI ranks. The reverse process was required for the results: gathering the chunks from the ranks and then writing them to disk.
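To make the split/gather idea concrete, here is a minimal sketch in plain Python. It assumes the simplest possible layout, a contiguous block per rank, which is not necessarily what the simulator in the paper does; the helper names `partition`, `scatter` and `gather` are hypothetical and no MPI library is involved.

```python
def partition(n_items, n_ranks):
    """Return one (offset, count) pair per rank for a contiguous block split.

    Assumption: each rank owns a contiguous slice of the ordered input.
    The first `n_items % n_ranks` ranks receive one extra item.
    """
    base, extra = divmod(n_items, n_ranks)
    parts = []
    offset = 0
    for rank in range(n_ranks):
        count = base + (1 if rank < extra else 0)
        parts.append((offset, count))
        offset += count
    return parts


def scatter(data, n_ranks):
    """Split ordered data into the chunk each rank would receive."""
    return [data[off:off + cnt] for off, cnt in partition(len(data), n_ranks)]


def gather(chunks):
    """Reassemble per-rank chunks, in rank order, into one ordered list."""
    result = []
    for chunk in chunks:
        result.extend(chunk)
    return result
```

The (offset, count) pairs are the same kind of information a scatter with variable chunk sizes needs, so the reading side and the writing side can share one description of who owns what.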

The central point is that some clusters have parallel file systems, and these work well when you access big blobs of aligned data. In fact, as they are the bottleneck of the whole system, you need to limit the number of accesses to what you actually require. For instance, in HDF5 you can specify the alignment of datasets, so that all HDF5 datasets are aligned to the file system's block size (for instance 1 MB if your Lustre/GPFS has a chunk size of 1 MB), and then read or write chunks that are multiples of this value.
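The arithmetic behind aligned accesses can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a 1 MiB block size, and hypothetical helpers `aligned_range` and `aligned_requests` that do no actual I/O. In the HDF5 C API, the alignment itself is set with `H5Pset_alignment` on the file access property list.

```python
MIB = 1024 * 1024  # assumed file system block size (1 MiB)


def aligned_range(offset, length, block=MIB):
    """Expand the byte range [offset, offset + length) to block boundaries.

    Returns (start, size) where start is rounded down and the end is
    rounded up to a multiple of `block`.
    """
    if length <= 0:
        raise ValueError("length must be positive")
    start = (offset // block) * block
    end = -(-(offset + length) // block) * block  # ceiling division
    return start, end - start


def aligned_requests(offset, length, block=MIB):
    """List the (position, size) block-sized requests covering the range."""
    start, size = aligned_range(offset, length, block)
    return [(pos, block) for pos in range(start, start + size, block)]
```

For example, a request for 1,000,000 bytes starting at byte 1,500,000 touches two 1 MiB blocks, so it is served by two aligned requests instead of one unaligned access that straddles a block boundary.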

Read More