For each algorithm and program, there are architectures that are better than others. Some computation may need a lot of FLOPS, but FLOPS are not the only thing to consider. Communication and memory bandwidth and latency are as important as computational power, specially since memory speed and CPU speed are decoupled.
I had this discussion with one of my Ph.D. advisors some months ago when we talked about correctly using the computers we had then (dual cores), and I had almost the same one in my new job here: applied maths (finite differences, signal processing, …) graduate students are not taught how to use current computers, so how could they develop an HPC program correctly?
I think it goes even further than that, and it will be a part of this post. What I see is that trainees and newly-hired people (to some extent myself included) lack a lot of basic Computer Science knowledge, and even IT knowledge.
I was looking for an introductory book on peer-to-peer (P2P) application and their application to grid computation. Web services was a bonus, as it is something I don’t usually play with.
After Advanced Computer Architecture and Parallel Processing, I’m going to review another book from the same serie. As the title hints it, the goal of this book is to introduce the tools that may be used in parallel, grid and distributed computing. This is the layer above the architecture the last book presented.
This is my first review. I read this book some time ago but I still want to write about it because the topic is very interesting.
In March 2008 issue, IEEE Computers published a case study on large-scale parallel scientific code development. I’d like to comment this article, a very good one in my mind.
Five research centers were analyzed, or more precisely their development tool and process. Each center did a research in a peculiar domain, but they seem share some Computational Fluid Dynamics basis.
In my lab, we frequently process huge amounts of data, each process can take hours or days. The problem is that we don’t have a usable tool to do this.
Our legacy software is in C and we plan on moving to Python in the next weeks. We could use some commercial software, but it is not optimal.
This is where P2P comes into the game. We have a lot of unused computers or dual cores that are not used even at 50% because we are not trained in parallel computing (and we won’t in the near future). By “we”, I mainly mean PhD students. Our background is signal or image processing, not Computer Science and even less parallel computing. Those unused computers could be used for our computations, but this implies that the computer is only used if nobody works on it, that we only use what is available at a precise moment, and that some computers may get used during the computations. That’s why P2P seems an elegant idea, as a grid computing tool.
P2P computation is not new in the lab (we developed P2P-MPI in Java for instance), but for our team, it is. For the time being, I did not find much about the tools that we could use, but the JXTA protocol seems a good start. I hope I will be able to talk more about this subject in the near future.