I. The situation

I.a. Why do I care about memory leaks ?

Because I am poor.

Poor people tend to have old computers. Old computers tend to have a small amount of RAM. And, sadly, recent GNU/linux distributions tend to be really bloated.

Practical advantages of hunting memory leak are 1) you get to know more stuff about software debugging 2) you can be proud of building a program which can run non-stop on a system without the need to restart it every two days to reset its memory.

Last disclaimer: having a proper test suite is a big plus when doing such optimisations, you can tweak your existing code, recompile it and check it without fearing to introduce some new nasty bugs.

I.b. Is it a post about how Haskell is great ?

It could have been but no. Although I will talk about profiling and debugging a Haskell program you don't need to know Haskell to read this post since it revolves around a language binding to a C library. However you should have a basic understanding of C software (heap, stack, malloc/free) and an elementary sense of logic.

II. How to spot a memory leak from quite a long way away ?

Fortunately computers are deterministic machines, hence we should use empirical tool to monitor them. (sic!)

I could try to use GHC comprehensive profiling tools, valgrind or other advance debug tools to first analyze my program but let's keep thing simple, wouldn't you ?

A memory leak is by definition a bug which causes the program to always need more memory to run. A very idiotic way to search for them is to create a small testbench calling a lots of times the same function(s) in your program and, in the same time, to monitor its memory consumption with ps (that's just an example). What we are interested in is the resident size memory of our program. So a small watch "ps -p $PROGRAM_PID -o rsz,command" should do the trick.

You will eventually get bored of watching the output of ps to check if your program memory stays stable or is continually increasing. This is why I have written a small PERL script polling ps -o rsz and formatting the data to gnuplot to get a nice graph of the situation. The gnuplot animation is based on an idea from Thanassis Tsiodras.

bash:Development/glib $ memory_monitor.pl ./myProgram
leaky memory

can't read SVG ? go www.getfirefox.com

constant memory

can't read SVG ? go www.getfirefox.com

Left, program with a memory leak. Right, garbage collector stabilizing memory.

III. Isolate the problem

I've create 3 small test benches : a reference C program calling only xfconf_init, xfconf_get_channel and xfconf_shutdown, the same program implemented with raw Haskell FFI instructions and finally one calling my binding.

  • The C program runs fine.

  • The simple FFI example too.

  • Only my gtk2hs-based code has a problem.

Good news ! The problem comes effectively from my binding and not from some obscure limitations of GHC or Haskell.

Bad news ! The memory leak comes from one of the 3 previous functions. A rapid recompilation showed that it's effectively a bug with my xfconf_get_channel function. Thus I should try to monitor glib gobjects in my program.

IV. Musing with Glib and GDB

IV.a. Basic commands

Since I only know how to debug C programs and I'm too lazy to search for their Haskell counterparts, I choose not to monitor Haskell GObject functions but directly Glib ones. A quick su -c 'debuginfo-install glib2 xfconf' install all the required debugging symbols on my Fedora system. Let's fire gdb to see what's happening:

bash:Development/glib $ gdb ./myHaskellBinding
GNU gdb (GDB) Fedora (7.1-34.fc13)
Copyright (C) 2010 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "i686-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /mnt/data/Development/glib/myHaskellBinding...(no debugging symbols found)...done.
(gdb)

Ok, now I will set breakpoints on some characteristic GObject functions, namely g_object_ref and g_object_unref and launch the program.

(gdb) b g_object_ref
Breakpoint 1 at 0x804b540
(gdb) b g_object_unref
Breakpoint 2 at 0x804b190
(gdb) run
Starting program: /mnt/data/Development/glib/myHaskellBinding
[Thread debugging using libthread_db enabled]
[New Thread 0x1900b70 (LWP 12649)]

Breakpoint 1, IA__g_object_ref (_object=0x812c190) at gobject.c:2392
2392    {
Missing separate debuginfos, use: debuginfo-install gmp-4.3.1-6.fc13.i686

Right, I've got an object, but which one ? If I simply use print _object the gdb prompt will kindly print its name. If it's not working for you, you can try the old method by calling g_type_name(g_type).

Breakpoint 1, IA__g_object_ref (_object=0x812c190) at gobject.c:2392
2392    {
Missing separate debuginfos, use: debuginfo-install gmp-4.3.1-6.fc13.i686
(gdb) p (GObject*)(_object)
$1 = 0x812c190 [XfconfChannel]
(gdb) p ((GObject*)(_object))->g_type_instance->g_class->g_type
$2 = 135470184
(gdb) p (gchar*)g_type_name(((GObject*)(_object))->g_type_instance->g_class->g_type)
$3 = (gchar *) 0x23acf6 "XfconfChannel"
(gdb) c
Continuing.

Going through all the call to g_object_(un)ref can be quite boring, especially if you are dealing with a lot of different object (or a very buggy binding to glib). Hopefully you can automate GDB …

Note that backtrace returns only a limited set of three frames (the current call to g_object_\*, xxxx_info and ??). That's because Haskell is a functional language and I very doubt that it uses conventional calling conventions or stacks to run. It may happen that GDB get completly lost due to Haskell trashing the C stack but even in this extreme case GDB can be resumed and still offers a lot of valuable help. Unfortunately as a side effect, (hardware) memory watches are not working (do not ask me why ^^).

IV.b. GDB command files

Clearly at this point I am wishing for a way to script GDB and speed up all these typing. Hopefully you can script GDB ! Thanks to Philip Withnall post on the subject. Just use the -x FILE option. … some GDB manual pages later … I've written a quick script to monitor gobject references. Note that you could have put a hardware watch point on ((GObject*)(_object))->ref_count too. Anyway, my GDB script is:

# vim:filetype=gdb:

set pagination off

# You will have to replace '\e' by the value of the escape 0x1B character
# (Control+V+Esc sous Vi)
set $RED="\e[31m"
set $GREEN="\e[32m"
set $RESET="\e[0m"

# We wait in main for glib libraries to be loaded
break main
run

# When in main, define our breakpoints ...
break g_object_ref
commands
silent
set $myType=((GTypeClass*)((GTypeInstance*)_object)->g_class)->g_type
set $myName=(gchar*)g_type_name($myType)
set $myCount=((GObject*)(_object))->ref_count + 1
printf "%s",$GREEN
printf "[%d] +++ %s at 0x%x\n",$myCount,$myName,_object
printf "%s",$RESET
continue
end

# When in main, define our breakpoints ...
break g_object_unref
commands
silent
set $myType=((GTypeClass*)((GTypeInstance*)_object)->g_class)->g_type
set $myName=(gchar*)g_type_name($myType)
set $myCount=((GObject*)(_object))->ref_count - 1
printf "%s",$RED
printf "[%d] --- %s at 0x%x\n",$myCount,$myName,_object
printf "%s",$RESET
continue
end

# When in main, define our breakpoints ... and RESUME !
continue

quit

Which give the following output when run with our simple C program:

bash:Development/glib $ gdb -x glib.gdb ./demo
Breakpoint 1, main () at demo.c:18
18      GError *error = NULL;
Breakpoint 2 at 0xd1cb44: file gobject.c, line 2392.
Breakpoint 3 at 0xd1c825: file gobject.c, line 2422.
received XfconfChannel address [0x804b090]
[0] --- XfconfChannel at 0x804b090
[0] --- DBusGProxy at 0x80532c8
[0] --- DBusGProxy at 0x8053300
received XfconfChannel address [0x804b108]
[0] --- XfconfChannel at 0x804b108
[0] --- DBusGProxy at 0x80533a8
[0] --- DBusGProxy at 0x8053400
received XfconfChannel address [0x804b150]
[0] --- XfconfChannel at 0x804b150
[0] --- DBusGProxy at 0x8053438
[0] --- DBusGProxy at 0x8053470

IV.c. Haskell binding and Garbage Collection

Let's use the previous GDB script on a real faulty program to see how it can help. Some things to keep in mind are:

  • Haskell manages external C structures through ForeignPtrs, they are special wrappers associated with a) a C pointer (array, struct, …) and b) a C function, called in Haskell a finalizer, to free them (free, g_strfreev, g_object_unref, …). This way, ForeignPtr data can be managed by Haskell garbage collector which will call their finalizer when the data are no longer needed.

  • Unfortunately garbage collection adds some delays between the time where you finish using your data and the time when they are effectively freed. You can force the Haskell runtime to perform garbage collection with the performGC function but you must also give it enough time to complete it. Typically a short call to getChar or threadDelay is enough. This is what is used in the following examples (when "press any key to continue" is printed).

  • When your library (glib) requires calling an initialisation and end function (as gtk_init and gtk_main_quit) to set up and clean its environment, it may happen that your call to the "end" function occurred before the garbage collector has finished to clean all the GOBjects, leading to some errors or segfaults.

  • The two following programs are identical. They retrieve 3 XfconfChannel objects with xfconf_channel_get and 3 with xfconf_channel_new. The former returns an singleton XfconfChannel owned by the xfconfd daemon, the latter returns a standalone one.

Naïve implementation
Breakpoint 1, 0x08060683 in main ()
Breakpoint 2 at 0xd1cb44: file gobject.c, line 2392.
Breakpoint 3 at 0xd1c825: file gobject.c, line 2422.
[New Thread 0x1900b70 (LWP 10387)]
[2] +++ XfconfChannel at 0x812f190
[3] +++ XfconfChannel at 0x812f190
[4] +++ XfconfChannel at 0x812f190
[3] --- XfconfChannel at 0x812f190
[New Thread 0x2301b70 (LWP 10388)]
"QuickCheck"
[4] +++ XfconfChannel at 0x812f190
[5] +++ XfconfChannel at 0x812f190
[6] +++ XfconfChannel at 0x812f190
[5] --- XfconfChannel at 0x812f190
"QuickCheck"
[6] +++ XfconfChannel at 0x812f190
[7] +++ XfconfChannel at 0x812f190
[8] +++ XfconfChannel at 0x812f190
[7] --- XfconfChannel at 0x812f190
"QuickCheck"
[2] +++ XfconfChannel at 0x812f1a8
[3] +++ XfconfChannel at 0x812f1a8
[2] --- XfconfChannel at 0x812f1a8
"QuickCheck"
[2] +++ XfconfChannel at 0x812f1c0
[3] +++ XfconfChannel at 0x812f1c0
[2] --- XfconfChannel at 0x812f1c0
"QuickCheck"
[2] +++ XfconfChannel at 0x812f208
[3] +++ XfconfChannel at 0x812f208
[2] --- XfconfChannel at 0x812f208
"QuickCheck"
press any key to continue
[New Thread 0x2d02b70 (LWP 10389)]
[Switching to Thread 0x2301b70 (LWP 10388)]
[6] --- XfconfChannel at 0x812f190
[5] --- XfconfChannel at 0x812f190
[4] --- XfconfChannel at 0x812f190
[1] --- XfconfChannel at 0x812f1a8
[1] --- XfconfChannel at 0x812f1c0
[1] --- XfconfChannel at 0x812f208

[Switching to Thread 0x130e90 (LWP 10383)]
[3] --- XfconfChannel at 0x812f190
[0] --- DBusGProxy at 0x81374c8
[0] --- DBusGProxy at 0x8137500
[Thread 0x2d02b70 (LWP 10389) exited]
[Thread 0x2301b70 (LWP 10388) exited]
[Thread 0x1900b70 (LWP 10387) exited]

Program exited normally.
Patched implementation
Breakpoint 1, 0x08060763 in main ()
Breakpoint 2 at 0xd1cb44: file gobject.c, line 2392.
Breakpoint 3 at 0xd1c825: file gobject.c, line 2422.
[New Thread 0x1900b70 (LWP 9263)]
[2] +++ XfconfChannel at 0x812f190
[3] +++ XfconfChannel at 0x812f190
[2] --- XfconfChannel at 0x812f190
[New Thread 0x2301b70 (LWP 9264)]
"QuickCheck"
[3] +++ XfconfChannel at 0x812f190
[4] +++ XfconfChannel at 0x812f190
[3] --- XfconfChannel at 0x812f190
"QuickCheck"
[4] +++ XfconfChannel at 0x812f190
[5] +++ XfconfChannel at 0x812f190
[4] --- XfconfChannel at 0x812f190
"QuickCheck"
[2] +++ XfconfChannel at 0x812f1a8
[1] --- XfconfChannel at 0x812f1a8
"QuickCheck"
[2] +++ XfconfChannel at 0x812f1c0
[1] --- XfconfChannel at 0x812f1c0
"QuickCheck"
[2] +++ XfconfChannel at 0x812f208
[1] --- XfconfChannel at 0x812f208
"QuickCheck"
press any key to continue
[Switching to Thread 0x2301b70 (LWP 9264)]
[0] --- XfconfChannel at 0x812f208
[0] --- XfconfChannel at 0x812f1c0
[0] --- XfconfChannel at 0x812f1a8
[New Thread 0x2d02b70 (LWP 9265)]
[3] --- XfconfChannel at 0x812f190
[2] --- XfconfChannel at 0x812f190
[1] --- XfconfChannel at 0x812f190

[Switching to Thread 0x130e90 (LWP 9259)]
[0] --- XfconfChannel at 0x812f190
[0] --- DBusGProxy at 0x81374c8
[0] --- DBusGProxy at 0x8137500
[Thread 0x2d02b70 (LWP 9265) exited]
[Thread 0x2301b70 (LWP 9264) exited]
[Thread 0x1900b70 (LWP 9263) exited]

Program exited normally.

Haskell adds in both implementation a useless reference-and-unreference action for each object but we will ignore it since it has no impact on the GObject resource management. So what went wrong within the naïve implementation ? Each new object is referenced one more time than in the patched implementation. Unfortunately this lets the reference count of each standalone object (0x812f1a8, 0x812f1c0, 0x812f208) set to 1 and prevent them from being freed. Same problem with the singleton object (0x812f190) which ends with a reference count of 3 instead of 0.

The patch was pretty simple in this case, I re-read the gtk2hs Glib documentation and source code (!), replaced makeNewGObject by constructNewGObject for the singleton object and write a custom FFI wrapper for xfconf_channel_new. With this patch, every XfconfChannel object ends with a reference count of 0 and is correctly disposed.

End of the story.

/DS, who is working on his next f****** memory leak.