2012-04-17 18:01:22 UTC
I'm not completely sure this is gc's fault, but it does fail its own
disclaim_test, so I need help fixing that.
That was the short version. The long one is this. Starting with
FreeBSD 9.0 a number of ports that use libgc fail during build; the
ones I care about are STklos and Gauche. Both currently use
libgc 7.1, both segfault on startup inside GC_FreeBSDGetDataStart
(os_dep.c:1654; I can provide backtraces on request), and both
worked fine on FreeBSD 7.x and 8.x.
I tried updating libgc from the ancient 7.1 to 7.2-alpha6 and to
current git (from github.com/ivmai/bdwgc) -- this helps a little.
The programs no longer fail immediately on startup, but STklos hangs
on one of its unit tests (while testing threads), and Gauche crashes
while compiling (or loading?) a thread-related library (the crash
is not inside libgc).
Now, while I'm not completely sure the new problems are caused by
the garbage collector, here's one data point: libgc fails one of
its own regression tests (disclaim_test), but only when it's
configured with --enable-threads=posix and --enable-gc-debug (it
passes all the tests without threads or without gc-debug).
When I run .libs/disclaim_test (from the latest bdwgc git) manually
it produces this output:
Unthreaded disclaim test.
GDB shows that the crash is in GC_finalized_malloc; here's the
backtrace:
#0 0x000000080086c181 in GC_finalized_malloc (client_lb=24,
fclos=0x401460) at fnlz_mlc.c:142
#1 0x0000000000400d9f in pair_new (car=0x0, cdr=0x0) at
#2 0x000000000040112c in test (data=0x0) at tests/disclaim_test.c:169
#3 0x0000000000401201 in main () at tests/disclaim_test.c:212
It appears that the GC_getspecific(GC_thread_key) call at fnlz_mlc.c:139
returns NULL and everything goes downhill from there.
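
To illustrate the kind of failure I mean, here is a minimal sketch using
plain POSIX thread-specific data. It is not libgc code: demo_key,
demo_key_init, demo_register and demo_lookup are hypothetical names I
made up, and I am only assuming that GC_getspecific behaves roughly like
pthread_getspecific here. The point is that a lookup returns NULL in any
thread that never stored a value under the key, so a caller that
dereferences the result without checking will crash.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for a collector's per-thread key; not libgc code. */
static pthread_key_t demo_key;

/* Create the key once. Returns 0 on success. */
int demo_key_init(void)
{
    return pthread_key_create(&demo_key, NULL);
}

/* Associate a value with the key for the calling thread. Returns 0 on success. */
int demo_register(void *value)
{
    return pthread_setspecific(demo_key, value);
}

/* Return the calling thread's value, or NULL if it never registered one. */
void *demo_lookup(void)
{
    return pthread_getspecific(demo_key);
}

#ifdef DEMO_MAIN
/* Build with -DDEMO_MAIN -pthread to get a standalone program. */
int main(void)
{
    int slot = 0;

    if (demo_key_init() != 0)
        return 1;

    /* No value stored yet, so the lookup yields NULL in this thread. */
    assert(demo_lookup() == NULL);
    printf("before register: %p\n", demo_lookup());

    if (demo_register(&slot) != 0)
        return 1;

    /* After registering, the same lookup returns the stored pointer. */
    assert(demo_lookup() == (void *)&slot);
    printf("after register:  %p\n", demo_lookup());
    return 0;
}
#endif
```

If the real code path is similar, the question becomes why the main
thread has nothing stored under GC_thread_key by the time
GC_finalized_malloc runs in this configuration.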
I tested this on a 2-core machine running FreeBSD 9.0 (amd64), but
it seems that all FreeBSD versions are affected -- see build logs
at  (devel/boehm-gc-threaded is the only one that fails regression
tests; that is the version compiled with threads).
So, how do I debug this particular error? What can be causing it?
(The question I actually want to ask is how to fix STklos and Guile,
but I'll start with working regression tests first.)