[Gc] Desperately needing GC 7.1
M***@sophia.inria.fr
2007-12-18 09:02:03 UTC
Hello there,

As stated in the subject of this mail, I'm desperately needing the GC v 7.1.

I'm actually totally confused with the development model adopted for the GC!

Indeed, the current stable version, version 7.0, which is 6 months old,
has a severe problem:

IT IS KNOWN NOT TO COMPILE ON RECENT MACOS X VERSION (LEOPARD).

This is dramatic because it means that all the software that relies
on the GC no longer compiles on that platform. This would not be such a
problem if MacOS X were an obscure, hardly used platform, but today it is
a MAJOR problem...

I know that some patches exist. I know that the current CVS version
most likely fixes this small problem. However, it is NOT POSSIBLE to
rely on a CVS version for releasing STABLE software! It is
impossible to know if the current CVS version works, if it has been
tested, if it contains experimental code. It is impossible to figure
out its life span. Etc, etc. All in all, using this version is the road to
severe problems if used in a final product.

Hence, we (I allow myself to switch from _I_ to _we_ because I cannot
imagine that I'm the only one facing this problem) need milestones. We
need at some point the GC to be stabilized and official releases
produced. We don't need these versions to contain all the patches that
have been submitted. We only need something that evolves with our new
hardware and new OSes.

So please, I beg you: produce these versions, not necessarily often,
but on a regular basis.

Thanks a million in advance.
--
Manuel

ps: I hope that no one will read anything aggressive or rude into this
mail. I'm indebted to all of you who are contributing to the GC. In
particular, I know what I owe to Hans for all his effort during all these
years.
Andreas Tobler
2007-12-18 10:06:09 UTC
Post by M***@sophia.inria.fr
As stated in the subject of this mail, I'm desperately needing the GC v 7.1.
I'm actually totally confused with the development model adopted for the GC!
Indeed, the current stable version, the version 7.0 which is 6 months old
IT IS KNOWN NOT TO COMPILE ON RECENT MACOS X VERSION (LEOPARD).
To whom is it known?

Here it (a gc-7.0 dated July 2, 2007) compiles and runs on
ppc-32, i686 and x86_64 on OS-X 10.5.1.

CVS has one cosmetic change with regard to Darwin, and that change went in
last week.
Post by M***@sophia.inria.fr
I know that some patches exist. I know that the current CVS version
most likely fixes this small problem. However, it is NOT POSSIBLE to
rely on a CVS version for releasing a STABLE software! It is
impossible to know if the current CVS version works, if it has been
tested, if it contains experimental code. It is impossible to figure
out its life span. Etc, etc. All in all, using this version is the road to
severe problems if used in a final product.
Would you mind telling us in detail where you encounter the problem and
how you configure?

Thanks,
Andreas
M***@sophia.inria.fr
2007-12-18 10:10:02 UTC
Hello Andreas,

First of all thanks for your quick answer. I appreciate it.
Post by Andreas Tobler
Post by M***@sophia.inria.fr
As stated in the subject of this mail, I'm desperately needing the GC v 7.1.
I'm actually totally confused with the development model adopted for the GC!
Indeed, the current stable version, the version 7.0 which is 6 months old
IT IS KNOWN NOT TO COMPILE ON RECENT MACOS X VERSION (LEOPARD).
To whom is it known?
On October 30th, I posted a mail about a problem with Leopard. Here is
an excerpt of a mail that was posted to the Bigloo mailing list (Bigloo
being the compiler I'm in charge of), reporting that problem again:

-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----
cd /Users/sgonzi/bigloo3.0c/gc-boehm && \
if [ "yes" = "yes" ]; then \
make \
/Users/sgonzi/bigloo3.0c/lib/3.0c/libbigloogc-3.0c.dylib \
CFLAGS="-DNO_DEBUGGING -Iinclude -Ilibatomic_ops-install/include -DGC_DARWIN_THREADS -DFINALIZE_ON_DEMAND -I/Users/sgonzi/bigloo3.0c/lib/3.0c -O3 -fno-reorder-blocks -fPIC -fPIC" \
LD="gcc -r"; \
fi)

gcc -DNO_DEBUGGING -Iinclude -Ilibatomic_ops-install/include -DGC_DARWIN_THREADS -DFINALIZE_ON_DEMAND -I/Users/sgonzi/bigloo3.0c/lib/3.0c -O3 -fno-reorder-blocks -fPIC -fPIC -c -o os_dep.o os_dep.c
os_dep.c: In function catch_exception_raise:
os_dep.c:3908: error: x86_exception_state32_t has no member named faultvaddr
make[4]: *** [os_dep.o] Error 1
make[3]: *** [gcshared] Error 2
make[2]: *** [gc] Error 1
make[1]: *** [boot-gc] Error 2
make: *** [boot] Error 2
==
My system: 2 GHz Intel Core 2 Duo, Mac OSX 10.5 (Leopard), gcc=i686-apple-darwin9-gcc-4.0.1:
-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----

Thanks to you!
--
Manuel

ps: As I said in my previous mail, this is likely to be a "small" problem,
since the GC 7.0 is binary compatible with 10.4. It's only the compilation
that fails on 10.5.
Andreas Tobler
2007-12-18 10:36:14 UTC
Hi Manuel,
Post by M***@sophia.inria.fr
On October 30th, I have posted a mail about problem with Leopard. Here is
an excerpt of a mail that has been posted to the Bigloo mailing list (the
The problem is in your Bigloo configuration machinery.

Darwin needs the configure scripts from inside gc-boehm to detect if it
has to use Leopard or Tiger style symbol naming.
--
checking for x86_thread_state32_t.eax... no
checking for x86_thread_state32_t.__eax... yes
--

A direct Makefile approach for Darwin with boehm-gc does not work.
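
To illustrate why that configure check matters: the Darwin-specific code has to name the register fields of the thread-state structure, and the spelling differs between Tiger-style headers (eax) and Leopard-style headers (__eax). The following is only a sketch, using a mock structure so that it compiles on any platform. The names mock_thread_state32_t, THREAD_FLD and read_eax are illustrative, and HAS_X86_THREAD_STATE32___EAX is the macro name that comes up later in this thread, assumed here to be what the configure test ends up defining.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mock stand-ins for the two header layouts; the real structure is
   x86_thread_state32_t from the Darwin system headers. */
#ifdef HAS_X86_THREAD_STATE32___EAX
typedef struct { uint32_t __eax; } mock_thread_state32_t;  /* Leopard style */
# define THREAD_FLD(x) __ ## x
#else
typedef struct { uint32_t eax; } mock_thread_state32_t;    /* Tiger style */
# define THREAD_FLD(x) x
#endif

/* All field accesses go through THREAD_FLD, so the same source line
   compiles against either layout. */
uint32_t read_eax(const mock_thread_state32_t *s)
{
    assert(s != NULL);
    return s->THREAD_FLD(eax);
}
```

Without a configure run (or an explicit -D flag), nothing selects the right branch, which would be consistent with the "has no member named" error quoted earlier.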

Andreas
M***@sophia.inria.fr
2007-12-18 10:37:45 UTC
Post by Andreas Tobler
A direct Makefile approach for Darwin with boehm-gc does not work.
Okay. Please correct me if I'm wrong:

You say that it is no longer possible to use Makefile.direct on Leopard
(it was possible up to 10.4) and that from now on we *must* use the traditional:
./configure && make

Thanks again for your help.
--
Manuel
Andreas Tobler
2007-12-18 10:55:07 UTC
Post by M***@sophia.inria.fr
Post by Andreas Tobler
A direct Makefile approach for Darwin with boehm-gc does not work.
You say that it is no longer possible to use Makefile.direct on Leopard
./configure && make
That is correct.

See configure.ac line 232ff.
Post by M***@sophia.inria.fr
Thanks again for your help.
Np.

Andreas
Bruce Hoult
2007-12-18 10:13:25 UTC
Post by M***@sophia.inria.fr
Indeed, the current stable version, the version 7.0 which is 6 months old
Note that that predates Leopard.
Post by M***@sophia.inria.fr
IT IS KNOWN NOT TO COMPILE ON RECENT MACOS X VERSION (LEOPARD).
This is a dramatic because it means that all the softwares that relies
on the GC no longer compiles on that platform. This would not be such a
problem if MacOS X was an obscure platform hardly used but today it is
a MAJOR problem...
I know that some patches exist. I know that the current CVS version
most likely fixes this small problem. However, it is NOT POSSIBLE to
rely on a CVS version for releasing a STABLE software!
It's entirely possible to use a CVS version for a 3rd party stable
release. All you have to do is note the version of each file. True,
it would be easier with subversion, which has a global version number,
but it's certainly possible with CVS.

If you asked nicely, Hans could probably even put a suitable tag on an
appropriate set of file versions for you. Especially if you gave him
the appropriate set of commands. (I am not implying that I'm creating
any obligation on Hans' part to do this)
Post by M***@sophia.inria.fr
Hence, we (I allow myself to switch from _I_ to _we_ because I cannot
imagine that I'm the only one facing this problem) need milestones. We
need at some point the GC to be stabilized and official releases
produced. We don't need these versions to contain all the patches that
have been submitted. We only need something that evolves with our new
hardware and new OSes.
No one has any right to demand anything from volunteers writing free
and open software.

If it doesn't work for you, you're free to fix it. If you don't have
the skills to fix it then you're free to pay someone who does.

For your information, I just downloaded gc-7.0.tar.gz on my Core2 Duo
MacBook Pro running Leopard and typed "./configure;make;make check"
and it built fine and passed the tests.

Regards,
Bruce
M***@sophia.inria.fr
2007-12-18 10:17:28 UTC
Post by Bruce Hoult
No one has any right to demand anything from volunteers writing free
and open software.
Come on, that's not what I said. I repeat myself:
I FEEL IN DEBT WITH HANS.

I'm not saying anything bad against anyone here. I'm just begging for
some help.

Regarding free software, the only reason I feel I can ask for help is
that I contribute myself. I have done my share (for more than
a decade); please visit:
http://www-sop.inria.fr/mimosa/fp/Bigloo/
http://hop.inria.fr
Post by Bruce Hoult
If it doesn't work for you, you're free to fix it. If you don't have
the skills to fix it then you're free to pay someone who does.
No, I cannot fix it by myself simply because I don't have the time to do
it. More precisely, any time I spent on the GC would be taken
from the time I use to develop the other software I'm in charge of.

One more time:

I'M NOT ACCUSING ANYONE.
I'M NOT STARTING A FLAME.

I THANK YOU ALL FOR WHAT YOU ARE DOING AND FOR YOUR CONTRIBUTION.

Sorry, I don't know how to say it more nicely.
Post by Bruce Hoult
For your information, I just downloaded gc-7.0.tar.gz on my Core2 Duo
MacBook Pro running Leopard and typed "./configure;make;make check"
and it built fine and passed the tests.
Great. I'm glad to read that. That's good news. I will then have to
understand why it fails in some situations.
--
Manuel
Piet van Oostrum
2007-12-18 14:35:42 UTC
BH> For your information, I just downloaded gc-7.0.tar.gz on my Core2 Duo
BH> MacBook Pro running Leopard and typed "./configure;make;make check"
BH> and it built fine and passed the tests.
However, there are also other problems with 7.0 on Mac OS X, even on Tiger.
I have experienced lockups in inkscape and w3m, apparently related to the
GC, and I am not alone. Downgrading to 6.8 helped. See
http://www.nabble.com/boehmgc-7.0-td13410364.html#a13419887 and this may
also be related to recent messages about problems with multithreaded
programs.
--
Piet van Oostrum <***@cs.uu.nl>
URL: http://www.cs.uu.nl/~piet [PGP 8DAE142BE17999C4]
Private email: ***@vanoostrum.org
Joel Reymont
2007-12-23 03:41:58 UTC
I would like to add that I'm experiencing the following issue on Leopard
as part of trying to compile Bigloo:

Starting program: /private/tmp/Xactestjoelr
Reading symbols for shared libraries +++. done
^C
Program received signal SIGINT, Interrupt.
0xffff027a in __spin_lock ()
(gdb) where
#0 0xffff027a in __spin_lock ()
#1 0x960a4539 in pthread_mutex_unlock ()
#2 0x0003e47b in GC_release_mark_lock ()
#3 0x0003ea85 in GC_stop_world ()
#4 0x0002f084 in GC_stopped_mark ()
#5 0x0002fb3e in GC_try_to_collect_inner ()
#6 0x00039e04 in GC_init_inner ()
#7 0x00039f03 in GC_init ()
#8 0x00001f69 in main (argc=1, argv=0xbffff2d8)
at /tmp/actestjoelr.c:21

This is using the stock 7.0 tarball. I will try with CVS of course but
would love to know why the above lockup is happening.

Thanks, Joel
Hans Boehm
2008-01-11 04:59:08 UTC
In the interest of making available a GC7.1 asap, I generated a
gc-7.1alpha2. As far as I know, this is in pretty good shape.
However, it needs more testing on a variety of platforms. I'd
appreciate some more help in doing so. This is your chance to make
sure it works on your favorite platform before 7.1 is released!

You can get the tar file from the usual place, i.e.

http://www.hpl.hp.com/personal/Hans_Boehm/gc/gc_source/gc-7.1alpha2.tar.gz

The CVS repository should also have a corresponding tag. My convention
for 7.1+ is that alphaN for N odd is an unreleased CVS version, and
the even version(s) correspond to released snapshots.
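
The odd/even convention described above can be written down as a small check. This is only a sketch: the function name alpha_is_released and the assumption that version strings look exactly like "7.1alpha2" are illustrative, not part of the collector.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Returns 1 if the version names a released snapshot (alphaN, N even),
   0 if it names an unreleased CVS state (alphaN, N odd), and -1 if the
   string is not of the form "X.YalphaN". */
int alpha_is_released(const char *version)
{
    int major, minor, n;
    char trailing;

    assert(version != NULL);
    /* %c only matches if something follows N, so exactly 3 conversions
       means the whole string was consumed. */
    if (sscanf(version, "%d.%dalpha%d%c", &major, &minor, &n, &trailing) != 3)
        return -1;
    if (n < 0)
        return -1;
    return (n % 2 == 0) ? 1 : 0;
}
```

With this, "7.1alpha2" is classified as a released snapshot and "7.1alpha3" as an unreleased CVS state.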

In the absence of major problems, I would like to aim for a 7.1
release by early next week.

7.1alpha2 works around the MacOS10.5 problem by disabling parallel-mark
on the platform and adding a FIXME comment in what I think is the
problem spot. Clearly this needs a better solution, though perhaps
not in time for 7.1.

Hans
M***@sophia.inria.fr
2008-01-11 05:54:25 UTC
Hello Hans,
Post by Hans Boehm
In the interest of making available a GC7.1 asap, I generated a
gc-7.1alpha2. As far as I know, this is in pretty good shape.
However, it needs more testing on a variety of platforms. I'd
appreciate some more help in doing so. This is your chance to make
sure it works on your favorite platform before 7.1 is released!
You can get the tar file from the usual place, i.e.
I will test this new version today. So far, I have only downloaded
the tarball and unpacked it. I already suspect a first small problem.

-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----
$ tar xvfz gc-7.1alpha
...
gc-7.1alpha2/libatomic_ops-1.2/src/libatomic_ops.a
gc-7.1alpha2/libatomic_ops-1.2/src/libatomic_ops_gpl.a
...
$ file gc-7.1alpha2/libatomic_ops-1.2/src/libatomic_ops.a
gc-7.1alpha2/libatomic_ops-1.2/src/libatomic_ops.a: current ar archive
$ ar tv gc-7.1alpha2/libatomic_ops-1.2/src/libatomic_ops.a
rw-r--r-- 11363/100 19200 May 19 02:12 2006 atomic_ops.o
$ ar x libatomic_ops.a
$ file atomic_ops.o
atomic_ops.o: ELF 64-bit LSB relocatable, IA-64, version 1 (SYSV), not stripped
-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----

I suspect that these two .a files should be removed from the tarball.

Yesterday, I struggled for more than two hours with the OSX port because I had
not noticed that gc-7.0.tar.gz contains two .o files compiled for IA64, namely

gc-7.0/libatomic_ops-1.2/src/atomic_ops_stack.o
gc-7.0/libatomic_ops-1.2/src/atomic_ops_malloc.o

For a reason that I have not investigated, these two pre-compiled
files ended up being added to the .a library built on OSX. As a consequence,
of course, the OSX linker complained when linking an executable
against the static version of the GC lib. Removing these two files from
the archive has solved the problem.
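
A stray object of this kind can be spotted mechanically, much as the file command did above ("ELF 64-bit LSB relocatable, IA-64"). The sketch below inspects the first bytes of an object file held in memory; the function name is_ia64_elf is hypothetical, while the offsets and the e_machine value 50 for IA-64 come from the ELF header layout. Native OSX objects are Mach-O rather than ELF, so they fail the magic-number test.

```c
#include <assert.h>
#include <stddef.h>

#define ELF_MACHINE_IA64 50u  /* e_machine value for Intel IA-64 */

/* Returns 1 if buf starts with an ELF header whose e_machine field is
   IA-64, and 0 otherwise (including non-ELF data such as Mach-O). */
int is_ia64_elf(const unsigned char *buf, size_t len)
{
    unsigned machine;

    assert(buf != NULL);
    if (len < 20)
        return 0;
    if (buf[0] != 0x7f || buf[1] != 'E' || buf[2] != 'L' || buf[3] != 'F')
        return 0;
    /* buf[5] is EI_DATA: 1 = little-endian, 2 = big-endian.
       e_machine is the 16-bit field at offset 18. */
    if (buf[5] == 1)
        machine = (unsigned)buf[18] | ((unsigned)buf[19] << 8);
    else if (buf[5] == 2)
        machine = ((unsigned)buf[18] << 8) | (unsigned)buf[19];
    else
        return 0;
    return machine == ELF_MACHINE_IA64;
}
```

Running such a check over the members of a freshly built archive would flag leftover objects before the linker does.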
--
Manuel
Boehm, Hans
2008-01-11 17:58:18 UTC
Thanks. This is a deficiency in the way "make dist" currently works, in that it includes everything in the libatomic_ops tree, combined with my failure to clean up properly. These files were never in the CVS tree.

I added a new 7.1alpha2-revised tar file to the distribution directory that is identical to the original, except that these two files have been removed. I think that if you are just building the GC, the .a files don't really hurt. However, if you are using the tree for something else, e.g. to extract libatomic_ops (which is not unreasonable, since I haven't otherwise had a chance to update that distribution in a while), they are a problem.

Hans
M***@sophia.inria.fr
2008-01-11 10:52:55 UTC
Hello Hans,
Post by Hans Boehm
In the interest of making available a GC7.1 asap, I generated a
gc-7.1alpha2.
I have now integrated the GC7.1alpha2 in my Bigloo development tree.
Post by Hans Boehm
In the absence of major problems, I would like to aim for a 7.1
release by early next week.
So far I have successfully tested the new version of Bigloo on the
following platforms:

- linux-x86
- linux-powerpc
- linux-armel
- MacOSX 10.5 (x86)

Everything went smoothly. Thank you very much.

I still have to test OSX 10.4 and MinGW. I hope to be able to run these
tests Monday morning.

Regarding the Armel port, I still have a small problem. I'm porting Bigloo
and HOP to the Nokia N800. The toolchain for that machine uses QEMU.
It seems that QEMU/arm contains a bug that prevents the plain version of
your GC from working out of the box. The following small patch is needed:

-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----
--- gc-7.1alpha2.orig/include/private/gcconfig.h 2007-08-14 22:49:48.000000000 +0200
+++ gc-7.1alpha2/include/private/gcconfig.h 2008-01-11 10:22:23.000000000 +0100
@@ -1738,9 +1738,19 @@
# endif
# ifdef LINUX
# define OS_TYPE "LINUX"
-# define LINUX_STACKBOTTOM
-# undef STACK_GRAN
-# define STACK_GRAN 0x10000000
+/*---------------------------------------------------------------------*/
+/* Bigloo start */
+/*---------------------------------------------------------------------*/
+# ifdef QEMU
+# define HEURISTIC2
+# else
+# define LINUX_STACKBOTTOM
+# undef STACK_GRAN
+# define STACK_GRAN 0x10000000
+# endif
+/*---------------------------------------------------------------------*/
+/* Bigloo end */
+/*---------------------------------------------------------------------*/
# ifdef __ELF__
# define DYNAMIC_LOADING
# include <features.h>
-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----


Unfortunately, I don't know how to detect automatically whether QEMU is running.
So when I compile under QEMU, I manually pass the flag -DQEMU.

I hope this helps.
--
Manuel
Hans Boehm
2008-01-20 04:40:28 UTC
Manuel -

Thanks.

Is the core ARM/QEMU problem here that /proc/self/stat is not emulated?

Is there any reason to believe that this is really ARM specific?

I'm fine with adding an option that works around this problem. But
I'd really like the macro to be a bit more generic, like
NO_PROC_SELF_STAT, if that sounds right to you.
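
The question above suggests that the LINUX_STACKBOTTOM setting depends on reading /proc/self/stat. As a rough sketch of what such a lookup involves (not the collector's actual code), the function below pulls the 28th field, documented in proc(5) as startstack, out of a stat line that has already been read into a string. The name stat_start_stack is hypothetical. If /proc belongs to a different system than the running process, the value obtained this way describes the wrong address space.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Returns field 28 (startstack) of a /proc/<pid>/stat line,
   or 0 if the line cannot be parsed. */
unsigned long stat_start_stack(const char *line)
{
    const char *p;
    int field = 2;  /* fields 1 (pid) and 2 (comm) end at the last ')' */

    assert(line != NULL);
    /* comm may itself contain spaces or ')', so search from the right. */
    p = strrchr(line, ')');
    if (p == NULL)
        return 0;
    p++;
    while (*p != '\0') {
        while (*p == ' ')
            p++;
        if (*p == '\0')
            break;
        field++;
        if (field == 28)
            return strtoul(p, NULL, 10);
        while (*p != ' ' && *p != '\0')
            p++;
    }
    return 0;
}
```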

Hans
M***@sophia.inria.fr
2008-01-24 09:40:20 UTC
Post by Hans Boehm
Is the core ARM/QEMU problem here that /proc/self/stat is not emulated?
You are correct. /proc is not emulated. QEMU uses the one from the host
platform (i.e. an x32 Linux).
Post by Hans Boehm
Is there any reason to believe that this is really ARM specific?
I'm fine with adding an option that works around this problem. But
I'd really like the macro to be a bit more generic, like
NO_PROC_SELF_STAT, if that sounds right to you.
Yes, it does. Just to be sure that I understand you correctly: you are
only suggesting to replace the name QEMU with NO_PROC_SELF_STAT. Is
that correct?
--
Manuel
Hans Boehm
2008-01-25 05:19:10 UTC
Could you check that the attached patch works? If so, I'll check
that in.

Hans
M***@sophia.inria.fr
2008-01-29 13:27:38 UTC
Hello Hans,

Sorry for my late answer. I have been too busy to conduct the test
earlier. I apologize.
Post by Hans Boehm
Could you check that the attached patch works? If so, I'll check
that in.
It works. I have modified my own version to use your NO_PROC_STAT flag
and everything went fine. Thanks a lot.

Cheers,
--
Manuel
Boehm, Hans
2008-01-29 19:28:09 UTC
Thanks. I checked that patch into CVS.

Hans
Andreas Tobler
2008-01-12 22:08:57 UTC
Post by Hans Boehm
In the interest of making available a GC7.1 asap, I generated a
gc-7.1alpha2. As far as I know, this is in pretty good shape.
However, it needs more testing on a variety of platforms. I'd
appreciate some more help in doing so. This is your chance to make
sure it works on your favorite platform before 7.1 is released!
You can get the tar file from the usual place, i.e.
http://www.hpl.hp.com/personal/Hans_Boehm/gc/gc_source/gc-7.1alpha2.tar.gz
The CVS repository should also have a corresponding tag. My convention
for 7.1+ is that alphaN for N odd is an unreleased CVS version, and
the even version(s) correspond to released snapshots.
In the absence of major problems, I would like to aim for a 7.1
release by early next week.
7.1alpha2 works around the MacOS10.5 problem by disabling parallel-mark
on the platform and adding a FIXME comment in what I think is the
problem spot. Clearly this needs a better solution, though perhaps
not in time for 7.1.
OS-X 10.5.1 intel was ok here (core2duo), cvs version. With and without
parallel-mark. So no improvements.

OS-X 10.5.1 on G4 is borked. Investigating. Doesn't matter if parallel
mark or not. A nasty race condition. The gctest hangs in 99% of trials.

OS-X 10.4.11 was ok on G4. So it might be an OS issue. I have one in the
area of libjava. Real blocking, the OS seems totally frozen until I hit
ctrl-c....

CVS version is borked for builds outside the source tree, I always have
to make distclean in libatomic-ops when configuring.

I'm investigating.

Anyway, OS-X needs some stability time in terms of 'the os should get
stable' before we hurry..... if you know what I mean.

Just FYI.

Andreas
Andreas Tobler
2008-01-13 21:08:17 UTC
Post by Andreas Tobler
OS-X 10.5.1 intel was ok here (core2duo), cvs version. With and without
parallel-mark. So no improvements.
That was not quite correct. Parallel mark indeed does not seem to work
here. Or is it depending on the weather ...
Post by Andreas Tobler
OS-X 10.5.1 on G4 is borked. Investigating. Doesn't matter if parallel
mark or not. A nasty race condition. The gctest hangs in 99% of trials.
Ok, I think I found a reason. The gctest does not use GC_INIT anymore
for this target. Adding a GC_INIT again makes my G4 under Leopard (aka.
10.5.1) work with and without parallel-mark. Yes, with and without,
double checked.

But as said, Tiger on G4 was ok w/o modifying test.c.

Andreas

Index: tests/test.c
===================================================================
RCS file: /cvsroot/bdwgc/bdwgc/tests/test.c,v
retrieving revision 1.12
diff -u -r1.12 test.c
--- tests/test.c 25 Oct 2007 00:41:06 -0000 1.12
+++ tests/test.c 13 Jan 2008 21:06:57 -0000
@@ -65,7 +65,7 @@

/* Call GC_INIT only on platforms on which we think we really need it, */
/* so that we can test automatic initialization on the rest. */
-#if defined(__CYGWIN32__) || defined (_AIX)
+#if defined(__CYGWIN32__) || defined (_AIX) || defined (DARWIN)
# define GC_COND_INIT() GC_INIT()
#else
# define GC_COND_INIT()
Hans Boehm
2008-01-14 04:09:33 UTC
Andreas -

I'm not sure we're on the right track here:

On DARWIN, GC_INIT() expands to just GC_init(), and that should be
happening implicitly, even without the call. I suspect that the explicit
GC_INIT() call just alters the timing enough that we don't observe the
race.

Last I looked at the code, it still seemed to me that there was a problem
with parallel mark in that the thread stopping code seemed to
inadvertently also stop the marker threads (as created by
start_mark_threads()), which I believe could have bad results, such as the
ones we're observing. I suspect this does not happen on uniprocessors
since, without explicit instructions to the contrary via an environment
variable, the collector will not create any separate marker threads in
this case. But it would explain problems on dual core machines. I put a
FIXME comment in roughly the right place, I think.
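
To make the suspected problem concrete: a stop-the-world pass that walks every thread in the task needs to leave out not only the calling thread but also the collector's own marker threads. The sketch below shows that filtering step in isolation. It is hypothetical throughout: thread_id_t stands in for a Mach thread port, and neither function name comes from the collector.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned long thread_id_t;  /* stand-in for a Mach thread port */

static int is_marker(thread_id_t t, const thread_id_t *markers,
                     size_t n_markers)
{
    size_t i;
    for (i = 0; i < n_markers; i++)
        if (markers[i] == t)
            return 1;
    return 0;
}

/* Copies into out the threads that should be suspended: everything in
   all except the caller (self) and the marker threads.  out must have
   room for n_all entries.  Returns the number of entries written. */
size_t select_threads_to_suspend(const thread_id_t *all, size_t n_all,
                                 thread_id_t self,
                                 const thread_id_t *markers, size_t n_markers,
                                 thread_id_t *out)
{
    size_t i, n_out = 0;

    assert(all != NULL && out != NULL);
    assert(markers != NULL || n_markers == 0);
    for (i = 0; i < n_all; i++) {
        if (all[i] == self)
            continue;  /* never suspend the thread doing the stopping */
        if (is_marker(all[i], markers, n_markers))
            continue;  /* markers must keep running during the mark phase */
        out[n_out++] = all[i];
    }
    return n_out;
}
```

If the second check is missing, a suspended marker thread can never finish its share of the work, which is one way to end up with the kind of hang reported in this thread.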

The last paragraph, even if I'm right, clearly does not explain the
failure without parallel marking on Leopard/G4. And it would be great to
understand that. In the event of a failure, is the stack trace the same
as before?

I wouldn't apply this patch without understanding this in more detail.
It seems too likely that we're just hiding a bug that will reappear in a
more complicated context.

Thanks.

Hans
Andreas Tobler
2008-01-14 21:38:59 UTC
Hi Hans,
And I'm not sure if I'm really on the wrong track.
The GC_INIT in test.c was not meant for going into cvs.
It should only help me to get a starting point where to look for the
real issue.
Post by Hans Boehm
On DARWIN, GC_INIT() expands to just GC_init(), and that should be
happening implicitly, even without the call. I suspect that the
explicit GC_INIT() call just alters the timing enough that we don't
observe the race.
Agreed, but, without calling GC_init in the test case I never reach
GC_init_dyld in gdb.
A comment in dyn_load.c says:

/* The _dyld_* functions have an internal lock so no _dyld functions
can be called while the world is stopped without the risk of a deadlock.
Because of this we MUST setup callbacks BEFORE we ever stop the world.
This should be called BEFORE any thread in created and WITHOUT the
allocation lock held. */

Now the trace looks like this:
Entering GC_mprotect_thread_notify
^C
Program received signal SIGINT, Interrupt.
0x943ce9d8 in mach_msg_trap ()
(gdb) bt
#0 0x943ce9d8 in mach_msg_trap ()
#1 0x943d5900 in mach_msg ()
#2 0x00060fe4 in GC_mprotect_thread_notify (id=1) at ../bdwgc/os_dep.c:3547
#3 0x0006743c in GC_stop_world () at ../bdwgc/darwin_stop_world.c:514
#4 0x0005240c in GC_stopped_mark (stop_func=0x520b0 <GC_never_stop_func>) at ../bdwgc/alloc.c:468
#5 0x000531cc in GC_try_to_collect_inner (stop_func=0x520b0 <GC_never_stop_func>) at ../bdwgc/alloc.c:356
#6 0x0005fedc in GC_init_inner () at ../bdwgc/misc.c:730
#7 0x0005ffa8 in GC_enable_incremental () at ../bdwgc/misc.c:788
#8 0x00003d14 in main () at ../bdwgc/tests/test.c:1614


In frame 6 we jump to GC_try_to_collect_inner which calls
GC_stopped_mark and this one calls GC_stop_world before ever having
called GC_init_dyld.
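
The ordering rule from the quoted comment (set up the dyld callbacks BEFORE the world is ever stopped) can be expressed as a guard. This is a toy model only: the function names are invented for the illustration and none of it is the collector's code.

```c
#include <assert.h>

static int callbacks_registered = 0;
static int world_stops = 0;

/* Stand-in for the step that registers the dyld callbacks. */
void register_image_callbacks(void)
{
    callbacks_registered = 1;
}

/* Stand-in for stopping the world; refuses to run before the
   callbacks have been set up. */
void stop_world(void)
{
    assert(callbacks_registered);
    world_stops++;
}

/* Initialization in the order the comment requires: callbacks first,
   then the initial collection, which stops the world. */
void collector_init(void)
{
    register_image_callbacks();
    stop_world();
}

int world_stop_count(void)
{
    return world_stops;
}
```

In this model the trace above corresponds to stop_world() being reached from initialization with the registration step not yet done, which the assertion would catch immediately instead of leaving a hang.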

Putting the GC_init_dyld code before misc.c:730 helps a little bit. But
not reliably.

Only if I add a usleep(1); after GC_init_dyld(); I get a more reliable
result on my G4.

Strange.....

The attached diff is also not meant for inclusion. It should show where
I am.
Post by Hans Boehm
Last I looked at the code, it still seemed to me that there was a
problem with parallel mark in that the thread stopping code seemed to
inadvertently also stop the marker threads (as created by
start_mark_threads()), which I believe could have bad results, such as
the ones we're observing. I suspect this does not happen on
uniprocessors since, without explicit instructions to the contrary via
an environment variable, the collector will not create any separate
marker threads in this case. But it would explain problems on dual core
machines. I put a FIXME comment in roughly the right place, I think.
That's a different problem. Or not ... I'm investigating still.
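
For readers following the parallel-mark point quoted above, a rough sketch in C of what "not stopping the marker threads" could look like. This is an assumption-laden model, not the Darwin stop-world code: the `demo_thread` record, its `is_marker` flag, and `demo_suspend_mutators` are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical thread record; the real collector keeps different data. */
typedef struct {
    int  id;
    bool is_marker;   /* true for helper threads that perform marking */
    bool suspended;
} demo_thread;

/* Suspend every thread except the caller and the marker threads.
 * Returns the number of threads that were suspended. */
static size_t demo_suspend_mutators(demo_thread *threads, size_t n, int self_id) {
    assert(threads != NULL || n == 0);
    size_t stopped = 0;
    for (size_t i = 0; i < n; i++) {
        if (threads[i].id == self_id) {
            continue;               /* never suspend ourselves */
        }
        if (threads[i].is_marker) {
            continue;               /* markers must keep running */
        }
        threads[i].suspended = true;
        stopped++;
    }
    return stopped;
}
```

If the `is_marker` check were missing, a marker holding work or a lock could be frozen mid-collection, which is the kind of hang described in the quoted paragraph.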

Andreas
M***@sophia.inria.fr
2008-01-15 03:58:50 UTC
Permalink
If time permits before the official 7.1 is unleashed, I have two additional
remarks concerning the 7.1 alpha2. The first one in this mail, the second
one in the following...

1- In order to use Makefile.direct with 7.0 and 7.1 on MacOS X 10.5 on 32-bit
x86 computers, I have had to add the following compilation option manually:

-DHAS_X86_THREAD_STATE32___EAX

This was not needed on 10.4.

2- A Bigloo user has reported that Makefile.direct does not work with icc
on Linux. I have not had time to investigate this myself; I will
as soon as possible.
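
As an illustration of why a flag like the one in item 1 is needed, here is a small sketch in C. It assumes the flag selects between two spellings of the register fields in a system thread-state structure (plain names in the older SDK, prefixed names in the newer one). Everything named `demo_*` or `DEMO_*` is hypothetical, and the prefix is spelled `u_` rather than `__` because identifiers starting with two underscores are reserved for the implementation.

```c
#include <assert.h>

/* Hypothetical stand-in for a system header whose register fields were
 * renamed with a prefix in a newer SDK. Build with
 * -DDEMO_PREFIXED_FIELDS to select the newer layout. */
#ifdef DEMO_PREFIXED_FIELDS
typedef struct { unsigned int u_eax; unsigned int u_esp; } demo_thread_state;
# define DEMO_FLD(x) u_ ## x
#else
typedef struct { unsigned int eax; unsigned int esp; } demo_thread_state;
# define DEMO_FLD(x) x
#endif

/* Code written against DEMO_FLD compiles unchanged with either layout. */
static unsigned int demo_stack_pointer(const demo_thread_state *st) {
    assert(st != 0);
    return st->DEMO_FLD(esp);
}
```

With this pattern the only thing that differs between the two systems is the single `-D` option on the compiler command line, which is what has to be added by hand when no configure script detects it.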

I don't know if you wish to maintain Makefile.direct but I hope you
will because using autoconf, automake, and libtool might be a true
headache. I have for instance failed to use these tools for Bigloo because
the compilation needs to bootstrap the compiler and thus it needs to
use the libraries, including the GC, before installing them. I'm not
knowledgeable enough to do this decently with libtool.

In addition, I'm a little bit concerned about the portability, in a
broad sense, of these tools (for instance, what about MacOS Xcode or
Windows VisualStudio).

Cheers,
--
Manuel
Hans Boehm
2008-01-25 05:26:08 UTC
Permalink
I'm generally planning to maintain Makefile.direct. I also tend to
use it for testing on slow machines.

The problem is that occasionally some features are very hard to test
for without autoconf. The MacOSX issue you point out is one of those,
IIRC. In those cases, my inclination is to have Makefile.direct
assume the most common case (sometimes defined to be the one that
applies on my machine :-) ) and have configure.ac do something
more generally correct.

This does mean that I'm still in favor of patches that replace autoconf
tests with macro tests when that's possible without making a complete
mess of things.
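
A small sketch in C of what a "macro test" can look like in place of a configure-time probe. It relies only on macros that compilers commonly predefine (`__INTEL_COMPILER`, `__clang__`, `__GNUC__`, `_MSC_VER`); the function names are hypothetical and this is not taken from the collector's sources.

```c
#include <string.h>

/* Identify the compiler from predefined macros. Order matters: icc and
 * clang typically define __GNUC__ as well, so they are tested first. */
static const char *demo_compiler_name(void) {
#if defined(__INTEL_COMPILER)
    return "icc";
#elif defined(__clang__)
    return "clang";
#elif defined(__GNUC__)
    return "gcc";
#elif defined(_MSC_VER)
    return "msvc";
#else
    return "unknown";
#endif
}

/* Non-zero when GNU-style extensions can be assumed to be available. */
static int demo_has_gnu_extensions(void) {
#if defined(__GNUC__)
    return 1;
#else
    return 0;
#endif
}
```

The limitation Hans mentions still applies: this only works when the property in question is actually visible through a predefined macro, which is not the case for every platform quirk.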

Hans
M***@sophia.inria.fr
2008-01-25 06:04:48 UTC
Permalink
Hello Hans,
Post by Hans Boehm
I'm generally planning to maintain Makefile.direct. I also tend to
use it for testing on slow machines.
Great. I'm glad to read that.
Post by Hans Boehm
The problem is that occasionally some features are very hard to test
for without autoconf. The MacOSX issue you point out is one of those,
IIRC. In those cases, my inclination is to have Makefile.direct
assume the most common case (sometimes defined to be the one that
applies on my machine :-) ) and have configure.ac do something
more generally correct.
I understand that. I'm facing something similar with Bigloo. The tough problems
never occur on a 32-bit x86 Linux box (i.e., the platform I'm personally
using).
Post by Hans Boehm
This does mean that I'm still in favor of patches that replace autoconf
tests with macro tests when that's possible without making a complete
mess of things.
That's actually fine by me. In the future, I will pay attention to this.
If I'm able to provide CPP macros, I will.

Thanks again.
--
Manuel
Boehm, Hans
2008-01-29 19:37:42 UTC
Permalink
Andreas -

It sounds like, in light of your insight, maybe the right thing to do here for now is to actually check in a patch that forces the GC_INIT() call for Darwin, i.e. treats it like AIX or Cygwin. If we do that, and disable parallel marking for now, does everything work reliably?

I think that both of these should be fixed eventually. But it may be a good idea to get 7.1 out in the meantime.

Thanks.

Hans
Andreas Tobler
2008-01-30 21:43:14 UTC
Permalink
Hello Hans,
Post by Boehm, Hans
It sounds like, in light of your insight, maybe the right thing to do here for now is to actually check in a patch that forces the GC_INIT() call for Darwin, i.e. treats it like AIX or Cygwin. If we do that, and disable parallel marking for now, does everything work reliably?
I think that both of these should be fixed eventually. But it may be a good idea to get 7.1 out in the meantime.
The patch for the moment would only consist of the GC_INIT in
tests/test.c, see below.

The documentation has existed since 2003, in doc/README.darwin:
---
Darwin/MacOSX Support - December 16, 2003
=========================================

Important Usage Notes
=====================

GC_init() MUST be called before calling any other GC functions. This
is necessary to properly register segments in dynamic libraries. This
call is required even if you code does not use dynamic libraries as the
dyld code handles registering all data segments.
. . .
---


Unfortunately I do not have much time to follow in depth what's going on
here. A person has become severely ill, which requires more effort on my side.


Adding the below to the test case makes it at least work under ppc-32
and x86 Darwin.

Andreas

Index: tests/test.c
===================================================================
RCS file: /cvsroot/bdwgc/bdwgc/tests/test.c,v
retrieving revision 1.12
diff -u -r1.12 test.c
--- tests/test.c 25 Oct 2007 00:41:06 -0000 1.12
+++ tests/test.c 30 Jan 2008 21:36:04 -0000
@@ -65,7 +65,7 @@

/* Call GC_INIT only on platforms on which we think we really need it, */
/* so that we can test automatic initialization on the rest. */
-#if defined(__CYGWIN32__) || defined (_AIX)
+#if defined(__CYGWIN32__) || defined (_AIX) || defined (DARWIN)
# define GC_COND_INIT() GC_INIT()
#else
# define GC_COND_INIT()
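
The pattern in this patch can be sketched on its own in C. The version below is a self-contained model under stated assumptions: `demo_init` and `demo_alloc` are hypothetical stand-ins for explicit and lazy initialization, and `__APPLE__` (predefined by Apple's compilers) is used in place of the collector's internal `DARWIN` macro.

```c
#include <assert.h>

static int demo_init_calls = 0;
static int demo_initialized = 0;

/* Hypothetical stand-in for the collector's explicit initialization. */
static void demo_init(void) {
    demo_initialized = 1;
    demo_init_calls++;
}

/* Explicit init only on platforms believed to need it. Elsewhere the
 * macro expands to nothing, so the lazy path below gets exercised. */
#if defined(__CYGWIN32__) || defined(_AIX) || defined(__APPLE__)
# define DEMO_COND_INIT() demo_init()
#else
# define DEMO_COND_INIT()
#endif

/* Stand-in for an allocation entry point that initializes lazily. */
static int demo_alloc(void) {
    if (!demo_initialized) {
        demo_init();
    }
    assert(demo_initialized);
    return demo_initialized;
}

/* What a test program's main() would do first. */
static int demo_run(void) {
    DEMO_COND_INIT();
    return demo_alloc();
}
```

Either way initialization happens exactly once; the macro only decides whether it happens up front, as README.darwin requires on Darwin, or implicitly on first use.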
Boehm, Hans
2008-01-30 22:03:43 UTC
Permalink
Thanks. And apologies for dragging you back into this.

I put a variant of that patch in my tree, and will check it in shortly. (I'll also get rid of the underscores in CYGWIN32 and AIX. This includes gc_priv.h, so the internal names are better, and makes this look more consistent.)

I had clearly forgotten about that disclaimer ...

I think that with this change, everything else can wait. If someone else wants to look at the parallel mark issue on Darwin, and whether the marker threads are accidentally stopped, that's probably the next most pressing issue there.

Hans