As I have to parallelize some programs developed in my new lab, I monitor CPU usage during their execution. I do not usually need MPI to optimize them (although it is sometimes needed), only OpenMP, which means I can track /proc/ to get instantaneous CPU and memory usage.
So I wrote a small script that can be used by anyone for this purpose. I’ll explain how it works now.
Each process has an entry in /proc/ with its pid. On Linux and BSD, threads inside a process are also processes (lightweight ones) and are located in /proc/%pid%/task/, with content similar to the /proc/%pid% folder. So I only have to read stat to get the data I need, as well as statm for the memory consumption.
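This layout is easy to check from Python; `list_tasks` is a name I made up for illustration:

```python
import os

def list_tasks(pid):
    """Return the thread (task) ids of a process, as listed in /proc."""
    return sorted(os.listdir("/proc/%d/task" % pid))

# Each task directory mirrors /proc/%pid%: it has its own stat and
# statm files, so a thread can be parsed exactly like a process.
if os.path.exists("/proc/self/task"):
    print(list_tasks(os.getpid()))  # contains at least the main thread
```

On Linux the main thread's task id equals the process pid, which is why the main thread can be picked out later when plotting.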
```python
def collectData(pid, task):
    """ Collect the data for one task of the process """
    f1 = open("/proc/%d/task/%s/stat" % (pid, task))
    f2 = open("/proc/%d/task/%s/statm" % (pid, task))
    t = datetime.datetime.now()
    stat = f1.readline().split()
    mem = f2.readline().split()
    # names is a global list of (field name, type) pairs for the stat fields
    d = dict([(name, cons(el)) for ((name, cons), el) in zip(names, stat)])
    # the second statm field is the resident set size, in pages;
    # pagesizepercent is a global converting pages to a fraction of total RAM
    d["pmem"] = 100 * float(mem[1]) * pagesizepercent
    return t, d
```
This function will be called regularly inside a thread for the monitored process and each of its threads.
```python
class MonitorThread(threading.Thread):
    """ The monitor thread saves the process info every second """
    def __init__(self, pid):
        import collections
        self.pid = pid
        threading.Thread.__init__(self)
        self.data = collections.defaultdict(dict)
        self.process = True

    def run(self):
        import os
        import time
        while self.process:
            threads = os.listdir("/proc/%d/task/" % self.pid)
            for thread in threads:
                t, d = collectData(self.pid, thread)
                d["current_time"] = t
                if "now" in self.data[thread]:
                    now = self.data[thread]["now"]
                    d['pcpu'] = 1e6 * ((d['utime'] + d['stime'])
                                       - (now['utime'] + now['stime'])) \
                                / float(getTime(t) - getTime(now["current_time"]))
                self.data[thread][getTime(t)] = d
                self.data[thread]["now"] = d
            time.sleep(1)
```
Here I launch a new acquisition every second, but this can be customized depending on how long your program runs. The CPU usage is computed the same way top and htop compute it.
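The computation reduces to a classic delta: jiffies consumed divided by the elapsed time. Here is a minimal sketch with plain numbers, assuming the usual 100 clock ticks per second (the script's `1e6` factor makes the same assumption, since times are in microseconds):

```python
def cpu_percent(prev_ticks, cur_ticks, elapsed_seconds, hz=100):
    """CPU usage over an interval, the way top and htop compute it:
    clock ticks consumed divided by ticks available in the interval."""
    return 100.0 * (cur_ticks - prev_ticks) / hz / elapsed_seconds

print(cpu_percent(0, 100, 1.0))  # a fully busy thread over 1 s -> 100.0
print(cpu_percent(10, 60, 1.0))  # half busy -> 50.0
```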
Now that I have a thread that can analyze the threads' behaviour, I can use subprocess to launch the monitored program.
```python
if __name__ == "__main__":
    import sys
    import os
    import pickle
    import subprocess

    stdin = open(sys.argv[1])
    stdout = open(sys.argv[2], "w")
    process = subprocess.Popen(sys.argv[3:], stdin=stdin, stdout=stdout)
    thread = MonitorThread(process.pid)
    thread.start()
    process.wait()
    thread.process = False
    thread.join()
    f = open('%d.data' % process.pid, 'wb')
    pickle.dump(thread.data, f)
    f.close()
```
The first argument of the script is the input file for the process, the second the output file. The third argument is the actual program that will be monitored and then analyzed, and subsequent arguments are passed on its command line. For example (with hypothetical file names): `python analyze.py input.txt output.txt ./my_program arg1 arg2`.
The instructions in this block are not very complicated:
- Launch the program
- Launch the monitoring thread with the process' pid
- Wait for the program to stop
- Stop the thread
- Dump the data in a pickle file
Of course, I'd like to draw some graphs, so here is a sample displaying the CPU usage as well as the memory consumption:
```python
def displayCPU(data, pid):
    """ Displays and saves the graph """
    import pylab
    import numpy
    spid = str(pid)
    c = 0
    for thread in sorted(data.keys()):
        d = data[thread]
        keys = sorted(key for key in d.keys() if key != "now")
        mykeys = numpy.array(keys) / 1e6  # convert µs to s
        mykeys -= mykeys[0]  # make the time axis start at 0
        # the first sample has no pcpu value, hence the [1:] slices
        pylab.plot(mykeys[1:], [d[key]['pcpu'] for key in keys[1:]],
                   colours[c], label=thread)
        c = c + 1
        if spid == thread:
            pylab.plot(mykeys[1:], [d[key]['pmem'] for key in keys[1:]],
                       'k', label='MEM')
    pylab.ylim([-5, 105])
    pylab.legend(loc=6)
    pylab.savefig('%d.svg' % pid)
    pylab.savefig('%d.png' % pid)
    pylab.close()
```
I use a different colour for each thread. The memory plot (drawn in black) is only made for the main thread, as all threads share the same memory (their curves would be identical).
This may result in the following graph, where one thread is always idle (the green one; I don't know why Linux uses a third, idle thread), one works until some point near the end of the program (the red one), and one is always working (the blue one):
Some additional variables and functions are not shown here, but they are available in the whole script. You can check them there if you want.
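To make the listings above self-contained, here is a hypothetical sketch of those globals; the names and exact definitions are assumptions, and the real script may differ:

```python
import datetime
import resource

# (field name, type) pairs for the beginning of /proc/%pid%/stat,
# as documented in proc(5); utime and stime are what pcpu needs
names = [("pid", int), ("comm", str), ("state", str), ("ppid", int),
         ("pgrp", int), ("session", int), ("tty_nr", int), ("tpgid", int),
         ("flags", int), ("minflt", int), ("cminflt", int),
         ("majflt", int), ("cmajflt", int), ("utime", int), ("stime", int)]

def getTime(t):
    """Convert a datetime to microseconds since the epoch."""
    delta = t - datetime.datetime(1970, 1, 1)
    return (delta.days * 86400 + delta.seconds) * 10**6 + delta.microseconds

def get_pagesizepercent():
    """Fraction of total RAM covered by one page, so that
    100 * resident_pages * pagesizepercent is a percentage of memory."""
    with open("/proc/meminfo") as f:
        total_kB = int(f.readline().split()[1])  # the MemTotal line
    return resource.getpagesize() / 1024.0 / total_kB

colours = ['b', 'g', 'r', 'c', 'm', 'y']  # one pylab colour code per thread
```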
Download the whole script there (analyze.py).
I noticed a mistake in the memory consumption monitoring that I fixed.
I also fixed a mistake when the monitored program needs more than one command line argument.