Debugging GObjects in Glib
By Johan on Friday, October 1 2010, 00:58 - The wonderful world of computer programming - Permalink
I'm working (since too long) on a Haskell binding to the XFCE library "xfconf" because, you know, in order to improve one programming skills, one should write code. Anyway, after harassing the Haskell mailing-lists with my newbie questions, I got a not-so-bad working binding to xfconf.
… memory leaks apart.
It took me a lots of efforts to finally solve this memory issue, mainly because I'm a bad programmer who don't know a damn about software runtime and also because I don't know what tools to use with Haskell.
I. The situation
I.a. Why do I care about memory leaks ?
Because I am poor.
Poor people tend to have old computers. Old computers tend to have a small amount of RAM. And, sadly, recent GNU/linux distributions tend to be really bloated.
Practical advantages of hunting memory leak are 1) you get to know more stuff about software debugging 2) you can be proud of building a program which can run non-stop on a system without the need to restart it every two days to reset its memory.
Last disclaimer: having a proper test suite is a big plus when doing such optimisations, you can tweak your existing code, recompile it and check it without fearing to introduce some new nasty bugs.
I.b. Is it a post about how Haskell is great ?
It could have been but no. Although I will talk about profiling and debugging a Haskell program you don't need to know Haskell to read this post since it revolves around a language binding to a C library. However you should have a basic understanding of C software (heap, stack, malloc/free) and an elementary sense of logic.
II. How to spot a memory leak from quite a long way away ?
Fortunately computers are deterministic machines, hence we should use empirical tool to monitor them. (sic!)
I could try to use GHC comprehensive profiling tools, valgrind or other advance debug tools to first analyze my program but let's keep thing simple, wouldn't you ?
A memory leak is by definition a bug which causes the program to always
need more memory to run. A very idiotic way to search for them is to
create a small testbench calling a lots of times the same function(s) in
your program and, in the same time, to monitor its memory consumption
with ps
(that's just an example). What we are interested in is the
resident size memory of our program. So a small watch "ps -p
$PROGRAM_PID -o rsz,command"
should do the trick.
You will eventually get bored of watching the output of ps
to check if
your program memory stays stable or is continually increasing. This is
why I have written a small PERL script polling ps -o rsz
and
formatting the data to gnuplot to get a nice graph of the
situation. The gnuplot animation is based on an idea from Thanassis
Tsiodras.
bash:Development/glib $ memory_monitor.pl ./myProgram
Left, program with a memory leak. Right, garbage collector stabilizing memory.
III. Isolate the problem
I've create 3 small test benches : a reference C program calling only
xfconf_init
, xfconf_get_channel
and xfconf_shutdown
, the same
program implemented with raw Haskell FFI instructions and finally one
calling my binding.
The C program runs fine.
The simple FFI example too.
Only my gtk2hs-based code has a problem.
Good news ! The problem comes effectively from my binding and not from some obscure limitations of GHC or Haskell.
Bad news ! The memory leak comes from one of the 3 previous functions.
A rapid recompilation showed that it's effectively a bug with my
xfconf_get_channel
function. Thus I should try to monitor glib
gobjects in my program.
IV. Musing with Glib and GDB
IV.a. Basic commands
Since I only know how to debug C programs and I'm too lazy to search for
their Haskell counterparts, I choose not to monitor Haskell GObject
functions but directly Glib ones. A quick su -c 'debuginfo-install
glib2 xfconf'
install all the required debugging symbols on my Fedora
system. Let's fire gdb to see what's happening:
bash:Development/glib $ gdb ./myHaskellBinding GNU gdb (GDB) Fedora (7.1-34.fc13) Copyright (C) 2010 Free Software Foundation, Inc. License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html> This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law. Type "show copying" and "show warranty" for details. This GDB was configured as "i686-redhat-linux-gnu". For bug reporting instructions, please see: <http://www.gnu.org/software/gdb/bugs/>... Reading symbols from /mnt/data/Development/glib/myHaskellBinding...(no debugging symbols found)...done. (gdb)
Ok, now I will set breakpoints on some characteristic GObject functions,
namely g_object_ref
and g_object_unref
and launch the program.
(gdb) b g_object_ref Breakpoint 1 at 0x804b540 (gdb) b g_object_unref Breakpoint 2 at 0x804b190 (gdb) run Starting program: /mnt/data/Development/glib/myHaskellBinding [Thread debugging using libthread_db enabled] [New Thread 0x1900b70 (LWP 12649)] Breakpoint 1, IA__g_object_ref (_object=0x812c190) at gobject.c:2392 2392 { Missing separate debuginfos, use: debuginfo-install gmp-4.3.1-6.fc13.i686
Right, I've got an object, but which one ? If I simply use print
_object
the gdb prompt will kindly print its name. If it's not working
for you, you can try the old method by calling g_type_name(g_type)
.
Breakpoint 1, IA__g_object_ref (_object=0x812c190) at gobject.c:2392 2392 { Missing separate debuginfos, use: debuginfo-install gmp-4.3.1-6.fc13.i686 (gdb) p (GObject*)(_object) $1 = 0x812c190 [XfconfChannel] (gdb) p ((GObject*)(_object))->g_type_instance->g_class->g_type $2 = 135470184 (gdb) p (gchar*)g_type_name(((GObject*)(_object))->g_type_instance->g_class->g_type) $3 = (gchar *) 0x23acf6 "XfconfChannel" (gdb) c Continuing.
Going through all the call to g_object_(un)ref
can be quite boring,
especially if you are dealing with a lot of different object (or a very
buggy binding to glib). Hopefully you can automate GDB …
Note that backtrace
returns only a limited set of three frames (the
current call to g_object_\*
, xxxx_info
and ??
). That's because
Haskell is a functional language and I very doubt that it uses
conventional calling conventions or stacks to run. It may happen that
GDB get completly lost due to Haskell trashing the C stack but even in
this extreme case GDB can be resumed and still offers a lot of valuable
help. Unfortunately as a side effect, (hardware) memory watches are not
working (do not ask me why ^^).
IV.b. GDB command files
Clearly at this point I am wishing for a way to script GDB and speed up
all these typing. Hopefully you can script GDB ! Thanks to Philip
Withnall post on the subject. Just use the -x FILE
option. … some
GDB manual pages later … I've written a quick script to monitor gobject
references. Note that you could have put a hardware watch point on
((GObject*)(_object))->ref_count
too. Anyway, my GDB script is:
# vim:filetype=gdb: set pagination off # You will have to replace '\e' by the value of the escape 0x1B character # (Control+V+Esc sous Vi) set $RED="\e[31m" set $GREEN="\e[32m" set $RESET="\e[0m" # We wait in main for glib libraries to be loaded break main run # When in main, define our breakpoints ... break g_object_ref commands silent set $myType=((GTypeClass*)((GTypeInstance*)_object)->g_class)->g_type set $myName=(gchar*)g_type_name($myType) set $myCount=((GObject*)(_object))->ref_count + 1 printf "%s",$GREEN printf "[%d] +++ %s at 0x%x\n",$myCount,$myName,_object printf "%s",$RESET continue end # When in main, define our breakpoints ... break g_object_unref commands silent set $myType=((GTypeClass*)((GTypeInstance*)_object)->g_class)->g_type set $myName=(gchar*)g_type_name($myType) set $myCount=((GObject*)(_object))->ref_count - 1 printf "%s",$RED printf "[%d] --- %s at 0x%x\n",$myCount,$myName,_object printf "%s",$RESET continue end # When in main, define our breakpoints ... and RESUME ! continue quit
Which give the following output when run with our simple C program:
bash:Development/glib $ gdb -x glib.gdb ./demo Breakpoint 1, main () at demo.c:18 18 GError *error = NULL; Breakpoint 2 at 0xd1cb44: file gobject.c, line 2392. Breakpoint 3 at 0xd1c825: file gobject.c, line 2422. received XfconfChannel address [0x804b090] [0] --- XfconfChannel at 0x804b090 [0] --- DBusGProxy at 0x80532c8 [0] --- DBusGProxy at 0x8053300 received XfconfChannel address [0x804b108] [0] --- XfconfChannel at 0x804b108 [0] --- DBusGProxy at 0x80533a8 [0] --- DBusGProxy at 0x8053400 received XfconfChannel address [0x804b150] [0] --- XfconfChannel at 0x804b150 [0] --- DBusGProxy at 0x8053438 [0] --- DBusGProxy at 0x8053470
IV.c. Haskell binding and Garbage Collection
Let's use the previous GDB script on a real faulty program to see how it can help. Some things to keep in mind are:
Haskell manages external C structures through
ForeignPtr
s, they are special wrappers associated with a) a C pointer (array, struct, …) and b) a C function, called in Haskell a finalizer, to free them (free
,g_strfreev
,g_object_unref
, …). This way,ForeignPtr
data can be managed by Haskell garbage collector which will call their finalizer when the data are no longer needed.Unfortunately garbage collection adds some delays between the time where you finish using your data and the time when they are effectively freed. You can force the Haskell runtime to perform garbage collection with the
performGC
function but you must also give it enough time to complete it. Typically a short call togetChar
orthreadDelay
is enough. This is what is used in the following examples (when "press any key to continue" is printed).When your library (glib) requires calling an initialisation and end function (as
gtk_init
andgtk_main_quit
) to set up and clean its environment, it may happen that your call to the "end" function occurred before the garbage collector has finished to clean all the GOBjects, leading to some errors or segfaults.The two following programs are identical. They retrieve 3 XfconfChannel objects with
xfconf_channel_get
and 3 withxfconf_channel_new
. The former returns an singleton XfconfChannel owned by the xfconfd daemon, the latter returns a standalone one.
Naïve implementation Breakpoint 1, 0x08060683 in main () Breakpoint 2 at 0xd1cb44: file gobject.c, line 2392. Breakpoint 3 at 0xd1c825: file gobject.c, line 2422. [New Thread 0x1900b70 (LWP 10387)] [2] +++ XfconfChannel at 0x812f190 [3] +++ XfconfChannel at 0x812f190 [4] +++ XfconfChannel at 0x812f190 [3] --- XfconfChannel at 0x812f190 [New Thread 0x2301b70 (LWP 10388)] "QuickCheck" [4] +++ XfconfChannel at 0x812f190 [5] +++ XfconfChannel at 0x812f190 [6] +++ XfconfChannel at 0x812f190 [5] --- XfconfChannel at 0x812f190 "QuickCheck" [6] +++ XfconfChannel at 0x812f190 [7] +++ XfconfChannel at 0x812f190 [8] +++ XfconfChannel at 0x812f190 [7] --- XfconfChannel at 0x812f190 "QuickCheck" [2] +++ XfconfChannel at 0x812f1a8 [3] +++ XfconfChannel at 0x812f1a8 [2] --- XfconfChannel at 0x812f1a8 "QuickCheck" [2] +++ XfconfChannel at 0x812f1c0 [3] +++ XfconfChannel at 0x812f1c0 [2] --- XfconfChannel at 0x812f1c0 "QuickCheck" [2] +++ XfconfChannel at 0x812f208 [3] +++ XfconfChannel at 0x812f208 [2] --- XfconfChannel at 0x812f208 "QuickCheck" press any key to continue [New Thread 0x2d02b70 (LWP 10389)] [Switching to Thread 0x2301b70 (LWP 10388)] [6] --- XfconfChannel at 0x812f190 [5] --- XfconfChannel at 0x812f190 [4] --- XfconfChannel at 0x812f190 [1] --- XfconfChannel at 0x812f1a8 [1] --- XfconfChannel at 0x812f1c0 [1] --- XfconfChannel at 0x812f208 [Switching to Thread 0x130e90 (LWP 10383)] [3] --- XfconfChannel at 0x812f190 [0] --- DBusGProxy at 0x81374c8 [0] --- DBusGProxy at 0x8137500 [Thread 0x2d02b70 (LWP 10389) exited] [Thread 0x2301b70 (LWP 10388) exited] [Thread 0x1900b70 (LWP 10387) exited] Program exited normally.
Patched implementation Breakpoint 1, 0x08060763 in main () Breakpoint 2 at 0xd1cb44: file gobject.c, line 2392. Breakpoint 3 at 0xd1c825: file gobject.c, line 2422. [New Thread 0x1900b70 (LWP 9263)] [2] +++ XfconfChannel at 0x812f190 [3] +++ XfconfChannel at 0x812f190 [2] --- XfconfChannel at 0x812f190 [New Thread 0x2301b70 (LWP 9264)] "QuickCheck" [3] +++ XfconfChannel at 0x812f190 [4] +++ XfconfChannel at 0x812f190 [3] --- XfconfChannel at 0x812f190 "QuickCheck" [4] +++ XfconfChannel at 0x812f190 [5] +++ XfconfChannel at 0x812f190 [4] --- XfconfChannel at 0x812f190 "QuickCheck" [2] +++ XfconfChannel at 0x812f1a8 [1] --- XfconfChannel at 0x812f1a8 "QuickCheck" [2] +++ XfconfChannel at 0x812f1c0 [1] --- XfconfChannel at 0x812f1c0 "QuickCheck" [2] +++ XfconfChannel at 0x812f208 [1] --- XfconfChannel at 0x812f208 "QuickCheck" press any key to continue [Switching to Thread 0x2301b70 (LWP 9264)] [0] --- XfconfChannel at 0x812f208 [0] --- XfconfChannel at 0x812f1c0 [0] --- XfconfChannel at 0x812f1a8 [New Thread 0x2d02b70 (LWP 9265)] [3] --- XfconfChannel at 0x812f190 [2] --- XfconfChannel at 0x812f190 [1] --- XfconfChannel at 0x812f190 [Switching to Thread 0x130e90 (LWP 9259)] [0] --- XfconfChannel at 0x812f190 [0] --- DBusGProxy at 0x81374c8 [0] --- DBusGProxy at 0x8137500 [Thread 0x2d02b70 (LWP 9265) exited] [Thread 0x2301b70 (LWP 9264) exited] [Thread 0x1900b70 (LWP 9263) exited] Program exited normally.
Haskell adds in both implementation a useless reference-and-unreference
action for each object but we will ignore it since it has no impact on
the GObject resource management. So what went wrong within the naïve
implementation ? Each new object is referenced one more time than in the
patched implementation. Unfortunately this lets the reference count of
each standalone object (0x812f1a8
, 0x812f1c0
, 0x812f208
) set to 1
and prevent them from being freed. Same problem with the singleton
object (0x812f190
) which ends with a reference count of 3 instead of
0.
The patch was pretty simple in this case, I re-read the gtk2hs Glib
documentation and source code (!), replaced makeNewGObject
by
constructNewGObject
for the singleton object and write a custom FFI
wrapper for xfconf_channel_new
. With this patch, every XfconfChannel
object ends with a reference count of 0 and is correctly disposed.
End of the story.
/DS, who is working on his next f****** memory leak.