A few months ago, I attended a TotalView tutorial through my job. I have now actually used it to debug one of my parallel applications, and I would like to share my experience with this fantastic tool.
First, TotalView is not only a parallel debugger available on several Linux and Unix platforms. It is also a memory checker (MemoryScape, available standalone and as a TotalView plugin) as well as a reverse debugger. That is, you can roll back the execution of a program, even after it has crashed (at which point a standard debugger like GDB would be of no use).
From the main TotalView window, each program, with its processes and threads, can be accessed and reopened, even after you have closed its window. The only drawback is that an application cannot be removed from this window.
Starting TotalView opens a window that lets you launch a new program, attach to a running one, or analyze a core dump. If the application uses MPI, this must be specified (several MPI implementations are supported).
Once the application is launched, you can start debugging it. The interface shows which process and thread are currently selected (the list of processes and threads is available in the bottom tabbed pane). Unfortunately, there is no way to browse the source files directly, so to set a breakpoint you have to navigate through your code (you can “dive” into a function by double-clicking on one of its calls).
In TotalView, breakpoints are a special case of action points. At an action point, you can stop the program or execute a small piece of code. You can also tell TotalView to stop the program only once it has passed through an instruction a given number of times (handy when the error shows up around the hundredth iteration of a loop).
There are several ways to stop at an action point: as soon as one thread or process reaches it, once all of them have reached it, once a given group has reached it, and so on. There are also many other quite useful features.
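To make the count-based option concrete, here is a small C sketch of the logic it implies. This is a hypothetical model written for illustration, not TotalView's API or implementation: the `counted_point` type and the `counted_point_hit` function are made-up names. A counter is incremented every time execution passes the instruction, and a stop is requested only on the chosen pass.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a count-based action point (not TotalView's API). */
typedef struct {
    unsigned hits;    /* times execution has reached the instruction so far */
    unsigned stop_at; /* 1-based pass on which the program should stop */
} counted_point;

/* Call this each time execution reaches the instruction.
   Returns true only on the pass where the debugger would stop. */
bool counted_point_hit(counted_point *cp) {
    assert(cp->stop_at > 0);
    cp->hits++;
    return cp->hits == cp->stop_at;
}
```

With `stop_at` set to 100, a loop that calls `counted_point_hit` on every iteration requests a stop on its hundredth iteration (index 99), and never before, which is the kind of late-appearing loop bug where this option pays off.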
Exploring variables is one of the most obvious uses of a debugger; without it, a debugger is often of little use. TotalView lets you “dive” into a variable and then explore it. A multidimensional variable can be sliced, and the slices compared across processes. When a variable is modified, it is highlighted in yellow. This makes it possible, for instance, to check the result of an MPI communication.
Compared to other parallel debuggers (like DDT), the array display is not as polished. TotalView has other advantages, though, such as having its own C/C++/Fortran debugging engine instead of relying on GDB.
MemoryScape is TotalView’s memory tool. It intercepts the application's memory calls and watches what the application does with that memory.
The first option is to guard memory blocks. This is less thorough than Fortran’s bounds checking, but also less costly, as the guards are only checked when the program stops. Other options include painting blocks (a pattern is “painted” into each block, so if that pattern shows up somewhere else in the program, the block was, for instance, not correctly initialized), hoarding memory (deallocated memory is not immediately released, which helps detect memory corruption) and, of course, leak detection.
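The guard and paint ideas can be sketched in a few lines of C. This is a minimal illustration of the general technique under my own assumptions, not how MemoryScape is implemented: the 8-byte guard size, the 0xAB and 0xCD fill values, and the `guarded_alloc`, `guards_intact`, `painted_bytes` and `guarded_free` helpers are all hypothetical. Guards are filled with a known byte and only verified when `guards_intact` is called, which mirrors the “check only when the program stops” trade-off; painting fills the user area so that bytes still holding the pattern were never written.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical parameters chosen for this sketch, not MemoryScape's values. */
#define GUARD_SIZE 8
#define GUARD_BYTE 0xAB /* fill value of the guard regions */
#define PAINT_BYTE 0xCD /* fill value "painted" into the user area */

typedef struct {
    size_t size;        /* size requested by the caller */
    unsigned char *raw; /* start of the whole allocation, guards included */
} guarded_block;

/* Allocate `size` bytes surrounded by two guard regions,
   and paint the user area. Returns the user pointer or NULL. */
unsigned char *guarded_alloc(guarded_block *blk, size_t size) {
    assert(blk != NULL);
    blk->raw = malloc(size + 2 * GUARD_SIZE);
    if (blk->raw == NULL) return NULL;
    blk->size = size;
    memset(blk->raw, GUARD_BYTE, GUARD_SIZE);                 /* leading guard */
    memset(blk->raw + GUARD_SIZE, PAINT_BYTE, size);          /* painted area */
    memset(blk->raw + GUARD_SIZE + size, GUARD_BYTE, GUARD_SIZE); /* trailing guard */
    return blk->raw + GUARD_SIZE;
}

/* Lazy check: true if neither guard region has been overwritten. */
bool guards_intact(const guarded_block *blk) {
    for (size_t i = 0; i < GUARD_SIZE; i++) {
        if (blk->raw[i] != GUARD_BYTE) return false;
        if (blk->raw[GUARD_SIZE + blk->size + i] != GUARD_BYTE) return false;
    }
    return true;
}

/* Number of user bytes still holding the paint value (likely never written). */
size_t painted_bytes(const guarded_block *blk) {
    size_t n = 0;
    for (size_t i = 0; i < blk->size; i++)
        if (blk->raw[GUARD_SIZE + i] == PAINT_BYTE) n++;
    return n;
}

void guarded_free(guarded_block *blk) {
    free(blk->raw);
    blk->raw = NULL;
}
```

The paint check is only a heuristic: a program that legitimately stores the value 0xCD will look uninitialized to it.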
Several graphs can be drawn, but some are misleading, such as the memory pie chart, which does not give an accurate picture of memory usage.
ReplayEngine is a reverse debugger. When the program crashes, you can rewind the execution to find where the problem first appeared.
Of course, rewinding is based on snapshots. This means that you cannot replay a really big program (one that uses several GB of memory), that ReplayEngine chooses when to take a snapshot, and that the instant you are interested in may not have been captured. I never used ReplayEngine because of these pitfalls (which no reverse debugger can escape).
Although it is quite expensive, TotalView is very helpful. When I had to parallelize a scientific code with MPI, TotalView worked easily with the MPI library I was using, and the variable display helped me fix the communications in no time.
I never had a real use for MemoryScape. The leak detection is effective, but, as with Valgrind, some reported leaks are not real leaks. Guarded memory could have been useful, but my issues were invalid reads, which do not modify the guards, so it could not help me.
In the end, I would recommend TotalView as a parallel debugger. Together with an efficient parallel profiler, it is one of the must-have tools in one’s toolbox.
Link to the official TotalView website: http://totalviewtech.com/