In its March 2008 issue, IEEE Computer published a case study on large-scale parallel scientific code development. I’d like to comment on this article, which I think is a very good one.
Five research centers were analyzed, or more precisely their development tools and processes. Each center does research in a particular domain, but they all seem to share a Computational Fluid Dynamics basis.
What technologies are used?
Although the centers are very different, they use a common set of technologies:
- The core languages are C/C++ or Fortran, which is no surprise: these languages are portable across platforms (if used correctly) and can be very fast, thanks to the evolution of compilers.
- MPI is the glue between processors, and nothing else is used (OpenMP, for instance). This can be surprising, but relying on a single tool, and the most generic one, is to be expected. Too many tools for the same application lead to tensions between programmers and between modules. With only MPI, programmers must rely on the MPI implementation to optimize communication between processors that share memory, but it leaves them more time to tune the application (a big issue in scientific code).
- For such large applications, everything must be modular, and modules can even be run separately. This enables teams to work on only a part of the application.
- Version control is integrated into development, with either CVS or Subversion. No tests are run automatically before or after commits. Still, the fact that tests exist (unit, integration, …) must be one of the reasons these applications are still updated, upgraded and holding up over the years. Students are trained to test their code in several ways (I often see untested code in research labs, even though tests would have caught a lot of “stupid” errors), because one does not test scientific code the way one tests standard code.
- Every project uses well-known libraries like BLAS, LAPACK, the usual I/O libraries, …
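To make the remark about testing scientific code more concrete, here is a minimal sketch. The `trapezoid` function and the tolerances are my own illustration, not something taken from the article. The idea is that a numerical result is compared to a known analytical value within a tolerance, and that the error is checked to shrink at the expected rate when the grid is refined. The tests are plain functions, so they can be collected by a test runner or called directly.

```python
import math


def trapezoid(f, a, b, n):
    """Integrate f over [a, b] with the composite trapezoidal rule on n intervals."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * h)
    return total * h


def test_matches_analytical_value():
    # The integral of sin over [0, pi] is exactly 2. A numerical scheme only
    # approaches that value, so compare within a tolerance instead of using ==.
    result = trapezoid(math.sin, 0.0, math.pi, 1000)
    assert math.isclose(result, 2.0, abs_tol=1e-5)


def test_error_decreases_with_refinement():
    # The trapezoidal rule is second order: doubling n should divide the
    # error by roughly 4. The bounds below leave some slack around that ratio.
    coarse = abs(trapezoid(math.sin, 0.0, math.pi, 50) - 2.0)
    fine = abs(trapezoid(math.sin, 0.0, math.pi, 100) - 2.0)
    assert 3.5 < coarse / fine < 4.5
```

The second test is the kind of check that a standard application rarely needs: it does not verify one value but the convergence behavior of the algorithm.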
Whereas Computer Science (CS) students are taught how to write an application in an efficient way (robust but rapidly written), Scientific Computing (SC) students must develop fast algorithms in a short time. This is needed because parallel computation is used when a serial computation would take too much time, but even parallelized, these computations can take several hours or days.
Having a prototype is great, but it is only half the job. Once you have a prototype, you can test it, tune it if needed, and then parallelize it (sometimes this is done during prototyping; I tend to think that parallel code should be introduced after a first draft, which doesn’t mean that I haven’t thought about how to parallelize my code beforehand). At this point, there is no guarantee that the code runs well on several dozen processors, but it can be tested on a small farm. Speaking of farms, one great thing about Subversion is that it can trigger actions, like building and testing code on a farm; this is a must-have.
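As an illustration of Subversion triggering a build, here is a rough sketch of a post-commit hook written in Python. Subversion calls the `post-commit` hook with the repository path and the committed revision number. Everything else is an assumption of mine: the `make test` command is a placeholder for whatever builds and tests the project, and a real setup would more likely notify a build farm than run the tests inside the hook itself.

```python
import subprocess
import tempfile

# Hypothetical test command; replace it with the project's own build and test step.
TEST_COMMAND = ["make", "test"]


def export_command(repos, revision, dest):
    """Build the `svn export` call that fetches one revision into dest."""
    return ["svn", "export", "--quiet", "-r", str(revision), "file://" + repos, dest]


def build_and_test(repos, revision, run=subprocess.run):
    """Export the committed revision into a scratch directory and run the tests.

    Returns True when both the export and the test command succeed.
    The `run` parameter exists only so the function can be tried without svn.
    """
    with tempfile.TemporaryDirectory() as scratch:
        dest = scratch + "/export"
        if run(export_command(repos, revision, dest)).returncode != 0:
            return False
        return run(TEST_COMMAND, cwd=dest).returncode == 0


def main(argv):
    # argv[1] is the repository path, argv[2] the revision just committed.
    repos, revision = argv[1], argv[2]
    return 0 if build_and_test(repos, revision) else 1
```

To use it as a hook, the script would be saved as `hooks/post-commit` in the repository, made executable, and finished with a call such as `sys.exit(main(sys.argv))`. Since the committing client waits for the hook to return, long test runs are better handed over to the farm.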
It seems that none of these centers has found a suitable parallel debugger for its applications. Even for a multithreaded program on a single computer, mastering the debugger and then debugging the code is hard. A lot of manpower will have to be put into this area…
What technologies could also be used?
Here are some of my thoughts about what could be used to enhance the quality of such applications. Some of them are already used in some of these centers (so it is not completely crazy to express them here):
- A compilation and testing farm should always be set up. These centers use a lot of different computers and platforms; a farm lets different slaves build and test the application almost everywhere.
- The application should be steered by a so-called high-level language, like Python. Every module can then really be separated and wrapped in a cross-platform way. One team used Python as a bridge to Fortran, but they abandoned the idea. This is sad because NumPy comes with a nice cross-platform tool (f2py) that works really well for wrapping Fortran code. For C and C++, a lot of tools exist for this bridge (SWIG, my favorite, Boost.Python, …).
- Using Python as a link between modules can help with debugging as well. If there is a problem somewhere, the state of the application can be easily saved and restored to debug the issue. This is not always possible, but for reproducible bugs (they are the simplest ones, I agree; non-reproducible bugs are much more annoying), it can help a lot.
- Sometimes, one node of the cluster fails. In a long computation this means a lot of wasted time. A more robust message-passing tool could be used, but I haven’t found an adequate one so far. IPython1 may be a first clue toward the solution.
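The ideas of steering modules from Python and of saving the state for debugging or after a failure can be sketched together. This is only a single-process illustration under my own assumptions: `build_mesh` and `solve` are hypothetical stand-ins for wrapped Fortran or C/C++ modules, the state is a plain dictionary, and `pickle` is used for the checkpoint. It does not address fault tolerance inside MPI itself.

```python
import os
import pickle
import tempfile


def save_state(state, path):
    """Write the state to a temporary file, then rename it over the checkpoint.

    The rename is atomic, so a crash never leaves a half-written checkpoint.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "wb") as handle:
        pickle.dump(state, handle)
    os.replace(tmp_path, path)


def load_state(path):
    """Read back whatever save_state stored at path."""
    with open(path, "rb") as handle:
        return pickle.load(handle)


def run_pipeline(steps, state, checkpoint_path):
    """Run the steps in order, resuming after the last checkpointed step if any."""
    first = 0
    if os.path.exists(checkpoint_path):
        first, state = load_state(checkpoint_path)
    for index in range(first, len(steps)):
        state = steps[index](state)
        # Store the index of the next step together with the state it needs.
        save_state((index + 1, state), checkpoint_path)
    return state


# Hypothetical stand-ins for wrapped compiled modules.
def build_mesh(state):
    return {**state, "mesh": list(range(state["cells"]))}


def solve(state):
    return {**state, "solution": [2.0 * x for x in state["mesh"]]}
```

Calling `run_pipeline([build_mesh, solve], {"cells": 3}, "run.ckpt")` a second time after a failure in `solve` would skip `build_mesh` and restart from the saved state. The same checkpoint file can be loaded in an interactive session with `load_state` to inspect the state just before the failing step.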
I do not pretend to know the truth with my comments; these applications have been developed over a very long time, far longer than my own development experience, and thus I’m not in a position to know better than the people working on them (if one of them is reading this post, I’d like to congratulate her/him for the hard work). But I think that sometimes a fresh look at a problem may solve it, and Python may be an efficient tool for these applications, leading to even better scientific applications.