Discussion:
[Gc] gc, guile, and NetBSD
Thomas Klausner
2014-10-27 15:12:35 UTC
Hi!

I'm trying to get a usable guile-2.0.11 on NetBSD-7.99.1/x86_64.
guile-1.8.x works fine on the same system.

I'm using gc-7.4.2 from pkgsrc, which currently has no additional
patches. It is compiled with --enable-cplusplus and --disable-threads.
'make check' passes without errors.

The guile build fails with:

GEN guile-procedures.texi
GC_is_visible test failed
[1] Broken pipe cat alist.doc ar... |
Abort trap (core dumped) GUILE_INSTALL_LO...

which is the first time that the linked guile binary is actually
executed.

The backtrace looks like this:
Core was generated by `guile'.
Program terminated with signal SIGABRT, Aborted.
#0 0x00007f7ff4d0e2ca in _lwp_kill () from /usr/lib/libc.so.12
(gdb) bt
#0 0x00007f7ff4d0e2ca in _lwp_kill () from /usr/lib/libc.so.12
#1 0x00007f7ff4d0df55 in abort () at /archive/foreign/src/lib/libc/stdlib/abort.c:74
#2 0x00007f7ff7415d2b in GC_default_is_visible_print_proc () from /usr/obj/wip/guile2/work/.buildlink/lib/libgc.so.1
#3 0x00007f7ff7416031 in GC_is_visible () from /usr/obj/wip/guile2/work/.buildlink/lib/libgc.so.1
#4 0x00007f7ff786d479 in scm_storage_prehistory () from /usr/obj/wip/guile2/work/guile-2.0.11/libguile/.libs/libguile-2.0.so.22
#5 0x00007f7ff787bc7a in scm_i_init_guile () from /usr/obj/wip/guile2/work/guile-2.0.11/libguile/.libs/libguile-2.0.so.22
#6 0x00007f7ff78c9b6a in scm_i_init_thread_for_guile () from /usr/obj/wip/guile2/work/guile-2.0.11/libguile/.libs/libguile-2.0.so.22
#7 0x00007f7ff78c9b89 in with_guile_and_parent () from /usr/obj/wip/guile2/work/guile-2.0.11/libguile/.libs/libguile-2.0.so.22
#8 0x00007f7ff7414ca9 in GC_call_with_stack_base () from /usr/obj/wip/guile2/work/.buildlink/lib/libgc.so.1
#9 0x00007f7ff78ca0af in scm_with_guile () from /usr/obj/wip/guile2/work/guile-2.0.11/libguile/.libs/libguile-2.0.so.22
#10 0x00007f7ff787bc3c in scm_boot_guile () from /usr/obj/wip/guile2/work/guile-2.0.11/libguile/.libs/libguile-2.0.so.22
#11 0x000000000040104d in main ()

The code that is running at that point is at
http://git.savannah.gnu.org/gitweb/?p=guile.git;a=blob;f=libguile/gc.c;h=13823c054cb0d08cbaf1ced5209c41d9857b8bd4;hb=HEAD#l623

I've talked with some guile developers on IRC, and they say that
they are just using gc as intended, and that this is a problem in gc
on NetBSD, not in guile.

I'm out of my depth here, and hope that someone on this list has some
insight into what might be going wrong and how to fix it.

Thanks,
Thomas
Ivan Maidanski
2014-10-28 21:44:00 UTC
Hi Thomas,
Based on the stack trace, the pointer passed to GC_is_visible is not visible to the GC.
Did some other (older) GC release work? What about the GC master branch?
Does GC built with --enable-threads work correctly?
Regards,
Ivan
t***@jp.sony.com
2014-10-29 04:06:59 UTC
Hi,
Post by Thomas Klausner
GEN guile-procedures.texi
GC_is_visible test failed
[1] Broken pipe cat alist.doc ar... |
Abort trap (core dumped) GUILE_INSTALL_LO...
The current GC_FirstDLOpenedLinkMap() for NetBSD calls dlinfo(RTLD_SELF,
RTLD_DI_LINKMAP, &lm) to find the link_map, so it finds the link_map of
libgc itself.

In guile's case, libgc is linked to libguile and libguile is linked to
the guile command, so libgc is not the first one in the link_map chain.

That is why the data section of libguile, where scm_protects lives, isn't
added to the GC roots and GC_is_visible fails.
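
A simplified model of this failure, for illustration only. This is not bdwgc code: `root_range`, `add_root`, `is_visible`, and the two fake data sections are hypothetical names, and the real GC_is_visible covers more cases than static roots. The sketch only shows that a pointer into a data section that was never registered as a root range is reported as not visible.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the collector's table of static root ranges. */
struct root_range {
    uintptr_t start;  /* inclusive */
    uintptr_t end;    /* exclusive */
};

#define MAX_ROOTS 16
static struct root_range roots[MAX_ROOTS];
static size_t n_roots;

/* Register [start, end) as a root range, e.g. one library's data section. */
static void add_root(const void *start, const void *end)
{
    assert(n_roots < MAX_ROOTS);
    roots[n_roots].start = (uintptr_t)start;
    roots[n_roots].end = (uintptr_t)end;
    n_roots++;
}

/* Return 1 if p lies inside any registered root range, else 0. */
static int is_visible(const void *p)
{
    uintptr_t a = (uintptr_t)p;
    for (size_t i = 0; i < n_roots; i++) {
        if (a >= roots[i].start && a < roots[i].end)
            return 1;
    }
    return 0;
}

/* Two fake "data sections" standing in for libgc and libguile. */
static char libgc_data[64];
static char libguile_data[64];
```

If only `libgc_data` is passed to `add_root`, `is_visible` returns 0 for any address inside `libguile_data`. That is the situation scm_protects was in before the fix.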

The attached patch works for me.

enami.
Thomas Klausner
2014-10-29 17:01:45 UTC
Hi Enami-san!
Post by t***@jp.sony.com
Hi,
Post by Thomas Klausner
GEN guile-procedures.texi
GC_is_visible test failed
[1] Broken pipe cat alist.doc ar... |
Abort trap (core dumped) GUILE_INSTALL_LO...
The current GC_FirstDLOpenedLinkMap() for NetBSD calls dlinfo(RTLD_SELF,
RTLD_DI_LINKMAP, &lm) to find the link_map, so it finds the link_map of
libgc itself.
In guile's case, libgc is linked to libguile and libguile is linked to
the guile command, so libgc is not the first one in the link_map chain.
That is why the data section of libguile, where scm_protects lives, isn't
added to the GC roots and GC_is_visible fails.
The attached patch works for me.
Thank you very much!

I can confirm that this patch makes guile2 build for me.

Can this patch please be integrated into gc?
Thomas
Post by t***@jp.sony.com
enami.
--- dyn_load.c.orig 2014-06-03 15:08:02.000000000 +0900
+++ dyn_load.c 2014-10-29 13:02:43.000000000 +0900
@@ -687,8 +687,16 @@
     if( cachedResult == 0 ) {
 # if defined(NETBSD) && defined(RTLD_DI_LINKMAP)
         struct link_map *lm = NULL;
-        if (!dlinfo(RTLD_SELF, RTLD_DI_LINKMAP, &lm))
-            cachedResult = lm;
+        if (!dlinfo(RTLD_SELF, RTLD_DI_LINKMAP, &lm) && lm != NULL) {
+            /*
+             * Now, lm points link_map object of libgc. Since it
+             * might not be the first dynamically linked object,
+             * try to find it (object next to the main object).
+             */
+            while (lm->l_prev)
+                lm = lm->l_prev;
+            cachedResult = lm->l_next;
+        }
 # else
         int tag;
         for( dp = _DYNAMIC; (tag = dp->d_tag) != 0; dp++ ) {
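
The effect of the patch can be tried outside the collector with a mock list. The following is a minimal sketch, not the bdwgc source: `mock_link_map`, `first_map_old`, `first_map_patched`, `count_from`, and `build_chain` are hypothetical names, and the load order guile, libguile, libgc, libc is assumed purely for illustration. Only the `l_next`/`l_prev` linkage is taken from the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical mock with the same linkage fields the patch relies on. */
struct mock_link_map {
    const char *l_name;
    struct mock_link_map *l_next;
    struct mock_link_map *l_prev;
};

/* Old behavior: treat the entry we started from as the first object. */
static struct mock_link_map *first_map_old(struct mock_link_map *self)
{
    return self;
}

/* Patched behavior: rewind to the head (main object), then step once. */
static struct mock_link_map *first_map_patched(struct mock_link_map *self)
{
    if (self == NULL)
        return NULL;
    while (self->l_prev != NULL)
        self = self->l_prev;
    return self->l_next;
}

/* Number of objects a forward walk starting at lm would visit. */
static size_t count_from(const struct mock_link_map *lm)
{
    size_t n = 0;
    for (; lm != NULL; lm = lm->l_next)
        n++;
    return n;
}

/* Assumed load order: guile, libguile, libgc, libc. */
static struct mock_link_map chain[4];

/* Build the chain and return the entry for libgc, standing in for
 * what dlinfo(RTLD_SELF, ...) yields when called from inside libgc. */
static struct mock_link_map *build_chain(void)
{
    static const char *names[4] = { "guile", "libguile", "libgc", "libc" };
    for (size_t i = 0; i < 4; i++) {
        chain[i].l_name = names[i];
        chain[i].l_prev = (i > 0) ? &chain[i - 1] : NULL;
        chain[i].l_next = (i < 3) ? &chain[i + 1] : NULL;
    }
    return &chain[2];
}
```

With this chain, a forward walk from `first_map_old` starts at libgc and never reaches libguile, while a walk from `first_map_patched` starts at libguile and covers every shared object after the main program.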
Ivan Maidanski
2014-11-02 07:54:52 UTC
Hi Thomas,

Merged to master: https://github.com/ivmai/bdwgc/commit/8f6f15858cd5ae6c4f7fb6da935f8276632413cc
(I will copy it to release-7_4 later.)

Thank you
_______________________________________________
bdwgc mailing list
https://lists.opendylan.org/mailman/listinfo/bdwgc
Greg Troxel
2014-10-29 12:37:07 UTC
The current GC_FirstDLOpenedLinkMap() for NetBSD calls dlinfo(RTLD_SELF,
RTLD_DI_LINKMAP, &lm) to find the link_map, so it finds the link_map of
libgc itself.

In guile's case, libgc is linked to libguile and libguile is linked to
the guile command, so libgc is not the first one in the link_map chain.

That is why the data section of libguile, where scm_protects lives, isn't
added to the GC roots and GC_is_visible fails.

Very interesting! That's great to have figured out what's going on.

I wonder whether this is a general issue rather than something specific
to NetBSD, and why guile/gc seem to work on Linux and other platforms.
Is it a happy accident of the dynamic linker and its ordering, or is a
more general fix appropriate?
t***@jp.sony.com
2014-11-08 02:33:13 UTC
# Sorry for the late reply; I was traveling last week.
Post by Greg Troxel
I wonder whether this is a general issue rather than something specific
to NetBSD, and why guile/gc seem to work on Linux and other platforms.
Is it a happy accident of the dynamic linker and its ordering, or is a
more general fix appropriate?
GC_FirstDLOpenedLinkMap() itself is used to implement
GC_register_dynamic_libraries() on several platforms (not only
NetBSD).

But the bug was in the NetBSD-only implementation of
GC_FirstDLOpenedLinkMap(), so other platforms do not suffer from it.

When I originally implemented that code to run gauche on netbsd-6, I was
confused about the interface of dlinfo on NetBSD.

enami.
