Profiling with Valgrind/Callgrind

1 Star2 Stars3 Stars4 Stars5 Stars (12 votes, average: 3.92 out of 5)
Loading...

Profiling comes in three different flaviors. The first is emulation, where a processor behavior is emulated, the second is sampling, where at regular intervals, the profiler samples the status of a program, and fianlly instrulentation, where the profiler gets information when a subroutine is called and when it returns. As with the Heisenberg uncertainty, profiling changes the exact behavior of your program. This is something you have to remember when analyzing a profile.

Valgrind is an Open Source emulation profiler. It is freely available on standard Linux platforms. As it is an emulation, it is far slower than the actual program. This means that the I/O are underestimated. The advantage is that you can have every detail on the memory behavior (cache misses for instance). Valgrind does not emulate all processors, but you can tweak it to approach your own one.

Read More

Some months ago, I had a TotalView tutorial, thanks to my job. Now, I’ve actually used it to debug one of my parallel applications and I would like to share my experience with fantastic tool.
First TotalView is not only a parallel debugger available on several Linux and Unix platforms. It also is a memory checker (MemoryScape and the TotalView plugin) as well as a reverse debugger, that is, you can roll back the execution of a program, even after it crashed (where it would be useless with a standard debugger like GDB).

Read More

This is the question I asked myself recently. If you write a scientific code in Fortran, you can expect a huge performance boost compared to the same program in C or C++. Well, unless you use some compiler extensions, in which case you get the same performance, or better.
Let’s try this on a 3D propagation sample, with a 8-points stencil.

Read More

Sometimes, a C or C++ array structure must be used in Python, and it’s always better to be able to use the underlying array to do some Numpy computations. To that purpose, Numpy proposes the array interface.

I will now expose an efficient way to use SWIG to generate the array interface and exposing the __array_struct__ property.

Read More

Contrary to what the title may hint to, this book is an introduction to C++ and the Qt library. And in the process, the authors tried to teach some good practices through design patterns. So if you’re a good C++ or Qt programer, this book is not for you. If you’re a beginner, the answer is in my review.

Read More

In March 2008 issue, IEEE Computers published a case study on large-scale parallel scientific code development. I’d like to comment this article, a very good one in my mind.

Five research centers were analyzed, or more precisely their development tool and process. Each center did a research in a peculiar domain, but they seem share some Computational Fluid Dynamics basis.

Read More

This question was asked on the Scipy mailing-list last year (well, one week ago). Nathan Bell proposed a skeleton that I used to create an out typemap for SWIG.

  1. %typemap(out) std::vector<double> {
    
  2.     int length = $1.size();
    
  3.     $result = PyArray_FromDims(1, &amp;length, NPY_DOUBLE);
    
  4.     memcpy(PyArray_DATA((PyArrayObject*)$result),&amp;((*(&amp;$1))[0]),sizeof(double)*length);
    
  5. }

This typemap uses obviously Numpy, so don’t forget to initialize the module and to import it. Then there is a strange instruction in memcpy. &((*(&$1))[0]) takes the address of the array of the vector, but as it is wrapped by SWIG, one has to get to the std::vector by dereferencing the SWIG wrapper. Then one can get the first element in the vector and take the address.

Edit on May 2017: This is my most recent trials with this.

  1. %typemap(out) std::vector<float> {
    
  2.     npy_intp length = $1.size();
    
  3.     $result = PyArray_SimpleNew(1, &amp;length, NPY_FLOAT);
    
  4.     memcpy(PyArray_DATA((PyArrayObject*)$result),$1.data(),sizeof(float)*length);
    
  5. }

Starting from Visual Studio 2005, every executable or dynamic library must declare the libraries it uses with a manifest file. This manifest can be embedded in the executable or library, and this is the best way to deal with it.

When using Scons, this embedding does not occur automatically. One has to overload the SharedLibrary builder so that a post-action is made after building the library :

def MSVCSharedLibrary(env, library, sources, **args):
  cat=env.OriginalSharedLibrary(library, sources, **args)
  env.AddPostAction(cat, 'mt.exe -nologo -manifest ${TARGET}.manifest -outputresource:$TARGET;2')
  return cat
 
 env['BUILDERS']['OriginalSharedLibrary'] = env['BUILDERS']['SharedLibrary']
 env['BUILDERS']['SharedLibrary'] = MSVCSharedLibrary

With this method, the embedding is made for every library, which is handy. The same can be done for the Program builder with the line :

  env.AddPostAction(cat, 'mt.exe -nologo -manifest ${TARGET}.manifest -outputresource:$TARGET;1')

When moving to Python, the real big problem that arises is the transformation of a Python array into the C++ container the team used for years.

Let’s set some hypothesis :

  • there is a separation between the class containing the data and the class that uses the data (iterators, …)
  • the containing class can be changed (policy or strategy pattern)

The first hypothesis is derived from the responsibility principle, the two classes have two distinct responsibilities, the first allocates the data space and allows simple access to it, the second allows usual operations (assignation, comparison tests or iterations for instance).

The second one will be the heart of the wrapper. It allows to change the way data is stored and accessed in a simple way.

Read More