Discussion:
[Gc] Powerpc/m68k/freebsd issue running test_stack
Ivan Maidanski
2012-07-14 09:54:28 UTC
Hi Ian,

I fixed the build problem on kfreebsd/i386: https://github.com/ivmai/libatomic_ops/commit/613f39d369045e8fc385a439f67a575cddcc6fa1 (pushed to master and to https://github.com/ivmai/libatomic_ops/tree/libatomic_ops-7_2-hotfix-1; it will go into the v7.2d release, including that of bdwgc).

I'll try to find the problem with test_malloc on kfreebsd/i386.

I have no access to ppc and m68k platforms, so it would be good if someone else prepared the patches.

Regards,
Ivan
Hi,
We've received Debian bug #680100 [1] that test_stack is spinning out
on powerpc. There is also a possibly related bug with test_stack
seeming to give a bus error on m68k [2], and on i386 with the freebsd
kernel [3].
The change that you pushed the other day should only have affected
ia64; so I think what we're seeing here is an existing issue just
showing up because the recent update has made libatomic-ops rebuild on
all these architectures. The freebsd one pre-dates recent changes...
Maybe these are related, or maybe not. It would be great if anyone
with these architectures could duplicate the problem to give us some
more clues.
-i
[1] http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=680100
[2] http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=680066
[3] http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=655872
_______________________________________________
Gc mailing list
http://www.hpl.hp.com/hosted/linux/mail-archives/gc/
Ivan Maidanski
2012-07-22 09:11:33 UTC
Hi Ian,

The bug is fixed, and I pushed the fix (https://github.com/ivmai/libatomic_ops/commit/6c81a6bda886a3cd2894b27a734b75bb40586ef5) to the 7.2d release candidate branch as well: https://github.com/ivmai/libatomic_ops/tree/libatomic_ops-7_2-hotfix-1

Regards,
Ivan
Source: libatomic-ops
Source-Version: 7.3~alpha1+git20120718-1
We believe that the bug you reported is fixed in the latest version of
libatomic-ops, which is due to be installed in the Debian FTP archive.
A summary of the changes between this version and the previous one is
attached.
Thank you for reporting the bug, which will now be closed. If you
have further comments, please address them to the bug report address,
and the maintainer will reopen the bug report if appropriate.
Debian distribution maintenance software
pp.
...
Closes: 655872
libatomic-ops (7.3~alpha1+git20120718-1) unstable; urgency=low
.
* Closes: 655872 -- Update from upstream git which fixes issues with
older compilers as used on kfreebsd
(613f39d369045e8fc385a439f67a575cddcc6fa1). Thanks Ivan!
...
Ivan Maidanski
2012-09-23 09:44:34 UTC
Hi Michael, Thorsten and Wouter,

A couple of months ago, you reported a libatomic_ops-7.3alpha test_stack failure on Alpha, PowerPC and m68k:
* http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=680100
* http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=680066

1. Could you please verify whether the problem exists in the recent libatomic_ops-7.2:

git clone git://github.com/ivmai/libatomic_ops.git -b release-7_2
cd libatomic_ops
./configure
make check

2. If the problem exists (in either v7.2 or 7.3alpha) and it is a regression, please point me to the version you are comparing with.

3. I could spend some time (during the next several weeks) finding the root cause of these failures (and preparing a patch), but I don't have access to the mentioned hardware, so collaboration with you is required to fix these problems.

Thank you.

PS. There should be no problem with freebsd/i686 any longer.

Regards,
Ivan
Thorsten Glaser
2012-09-23 10:53:25 UTC
Post by Ivan Maidanski
Hi,
We've received Debian bug #680100 [1] that test_stack is spinning out
on powerpc. There is also a possibly related bug with test_stack
Interesting about the spinning. This sort of matches problems I had
with things like Qt4 which seem to wait for return of children forever,
and I’m still suspecting a threads (possibly kernel/eglibc) bug there
somewhere.
Post by Ivan Maidanski
git clone git://github.com/ivmai/libatomic_ops.git -b release-7_2
cd libatomic_ops
./configure
make check
Will do that, for both release-7_2 and master. Will take a bit though ☺
Post by Ivan Maidanski
3. I could spend some time (during next several weeks) to find out the
root cause of these failures (and prepare the patch) but I don't have
access to the mentioned hardware, so a collaboration with you is
required to fix these problems.
https://wiki.debian.org/Aranym/Quick for m68k.

bye,
//mirabilos
--
As long as you don't pull any dirty tricks, and I mean *really*
dirty tricks, like XORing both pointers of a doubly linked list
and storing them in a single word, Boehm works quite
excellently. -- Andreas Bogk on boehm-gc in d.a.s.r
Thorsten Glaser
2012-09-23 11:07:35 UTC
Yes, it does (see attached file).

Cannot test in master due to:

***@aranym:~/libatomic_ops-master # mksh autogen.sh
libtoolize: putting auxiliary files in `.'.
libtoolize: copying file `./ltmain.sh'
libtoolize: putting macros in AC_CONFIG_MACRO_DIR, `m4'.
libtoolize: copying file `m4/libtool.m4'
libtoolize: copying file `m4/ltoptions.m4'
libtoolize: copying file `m4/ltsugar.m4'
libtoolize: copying file `m4/ltversion.m4'
libtoolize: copying file `m4/lt~obsolete.m4'
configure.ac:14: installing `./compile'
configure.ac:5: installing `./config.guess'
configure.ac:5: installing `./config.sub'
configure.ac:8: installing `./install-sh'
configure.ac:8: installing `./missing'
src/Makefile.am: installing `./depcomp'
configure.ac:166: required file `pkgconfig/atomic_ops-uninstalled.pc.in' not found
autoreconf: automake failed with exit status: 1
1|***@aranym:~/libatomic_ops-master # ls pkgconfig/
atomic_ops.pc.in
Post by Ivan Maidanski
2. If the problem exits (either in v7.2 or 7.3alpha) and it is a
regression, please point me to the version you compare with.
No idea about that, I’ve not been doing m68k work for “very long”,
and I usually build without the tests just to get things doing.
But with an ARAnyM instance or two of your own, it’s easy enough
to look yourself, I guess, gdb works. (Do make sure to use the
current 0.9.13-3.1 upload or a backport thereof, I’ve got backports
for e.g. Kubuntu hardy and lucid on hand if needed.)

bye,
//mirabilos
--
«MyISAM tables -will- get corrupted eventually. This is a fact of life. »
“mysql is about as much database as ms access” – “MSSQL at least descends
from a database” “it's a rebranded SyBase” “MySQL however was born from a
flatfile and went downhill from there” – “at least jetDB doesn’t claim to
be a database” -- Tonnerre, psychoschlumpf and myself in #nosec
Ivan Maidanski
2012-09-23 12:44:30 UTC
Hi Thorsten,

According to your log, the test fails with SIGBUS.
Is the cas.l asm instruction supported by the ARAnyM emulator?

If not, we should add some feature macro tests to gcc/m68k.h (to disable the use of cas.l where it is unsupported).
A temporary workaround is to comment out #define AO_HAVE_compare_and_swap_full in gcc/m68k.h.
(Another, very slow, workaround is to use CFLAGS=-DAO_USE_PTHREAD_DEFS.)

Regards,
Ivan
Thorsten Glaser
2012-09-23 14:36:08 UTC
Is cas.l asm instruction supported on ARAnyM emulator?
Yes, even recently fixed AFAIK, hence my version numbering
recommendation. I just wrote a small test program to see
whether it really matches the description in the M68000PRM
(and yes, it does):

***@aranym:~ # gcc -o t t1.c t2.S
***@aranym:~ # ./t
#1: cmp(1->1) new(2) val(1->2)
#2: cmp(3->2) new(4) val(^->2)
***@aranym:~ # cat t1.c
#include <stdio.h>

volatile int v __attribute__((__aligned__(4)));

extern int cas_l(int cmpval, int newval, volatile int *valptr);

int
main(void)
{
	int rv;

	v = 1;
	rv = cas_l(1, 2, &v);
	printf("#1: cmp(1->%d) new(2) val(1->%d)\n", rv, v);
	rv = cas_l(3, 4, &v);
	printf("#2: cmp(3->%d) new(4) val(^->%d)\n", rv, v);
	return (0);
}
***@aranym:~ # cat t2.S
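	.text
	.globl	cas_l
cas_l:	move.l	4(%sp),%d0	/* compare operand (cmpval) */
	move.l	8(%sp),%d1	/* update operand (newval) */
	move.l	12(%sp),%a0	/* address of the memory operand (valptr) */
	cas.l	%d0,%d1,(%a0)	/* equal: (%a0) = %d1; else: %d0 = (%a0) */
	rts			/* result is returned in %d0 */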
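
For readers without m68k hardware, the same two calls can be replayed in portable C. This is only a sketch, assuming a GCC-compatible compiler: cas_val and cas_selftest are hypothetical names, and the __sync_val_compare_and_swap builtin stands in for the cas.l instruction. Like cas_l() above, it stores the new value only when the memory operand equals the compare value, and it hands back the value that was found in memory.

```c
/*
 * Hypothetical stand-in for cas_l(): returns the value found at *valptr,
 * and stores newval there only if that value equals cmpval.
 */
static int
cas_val(int cmpval, int newval, volatile int *valptr)
{
	return __sync_val_compare_and_swap(valptr, cmpval, newval);
}

/* Replays the two calls from t1.c; returns 0 if both behave as described. */
static int
cas_selftest(void)
{
	volatile int v = 1;
	int rv;

	rv = cas_val(1, 2, &v);		/* compare matches: v becomes 2 */
	if (rv != 1 || v != 2)
		return (1);
	rv = cas_val(3, 4, &v);		/* compare fails: v stays 2 */
	if (rv != 2 || v != 2)
		return (2);
	return (0);
}
```

A zero result from cas_selftest() corresponds to the "#1" and "#2" lines printed by the test program above.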

bye,
//mirabilos
--
find both emacs and vi revolting (joe rules) and consider pine the only
usable text-mode mail client (and I have tried them all). ;)
Hellooooo, I am Holger ("Hello Holger!"), and I am likewise
... a pine user, and a habitual one at that ("Oooooooohhh"). [from dasr]
Thorsten Glaser
2012-11-20 20:42:54 UTC
Dixi quod…
Post by Thorsten Glaser
Yes, it does (see attached file).
By the way:

***@aranym:~/libatomic_ops-7_2 # tests/test_stack
Segmentation fault
139|***@aranym:~/libatomic_ops-7_2 # gdb tests/test_stack
GNU gdb (GDB) 7.4.1-debian
Copyright (C) 2012 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "m68k-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /root/libatomic_ops-7_2/tests/test_stack...done.
(gdb) r
Starting program: /root/libatomic_ops-7_2/tests/test_stack
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/m68k-linux-gnu/libthread_db.so.1".
[New Thread 0xc41664c0 (LWP 22160)]
[Thread 0xc41664c0 (LWP 22160) exited]

Program received signal SIGSEGV, Segmentation fault.
0x80000bbc in add_elements (n=741812669) at test_stack.c:66
66 add_elements(n-1);
(gdb) bt
#0 0x80000bbc in add_elements (n=741812669) at test_stack.c:66
#1 0x80000bc2 in add_elements (n=741812670) at test_stack.c:66
#2 0x80000bc2 in add_elements (n=741812671) at test_stack.c:66
#3 0x80000bc2 in add_elements (n=741812672) at test_stack.c:66
#4 0x80000bc2 in add_elements (n=741812673) at test_stack.c:66
#5 0x80000bc2 in add_elements (n=741812674) at test_stack.c:66
#6 0x80000bc2 in add_elements (n=741812675) at test_stack.c:66
#7 0x80000bc2 in add_elements (n=741812676) at test_stack.c:66
#8 0x80000bc2 in add_elements (n=741812677) at test_stack.c:66
[… this repeats … more than 2100 times…]

With debugging:

# ./test_stack2

Debug: nthreads=1; list_length=1
Debug: nthreads=2; list_length=3
Debug: nthreads=-1073636204; list_length=746006802
Segmentation fault

That is:
for (nthreads = 1; nthreads <= max_nthreads; ++nthreads)
  {
    int i;
    pthread_t thread[MAX_NTHREADS];
    int list_length = nthreads*(nthreads+1)/2;
    long long start_time;
    list_element * le;

    /* + */ printf("Debug: nthreads=%d; list_length=%d\n", nthreads, list_length);
    add_elements(list_length);
#   ifdef VERBOSE

That number (the corrupted nthreads) is actually 0xC0019C94, constant across runs. According to /proc/$$/maps this is:
c0000000-c0019000 r-xp 00000000 fe:81 1030344 /lib/m68k-linux-gnu/ld-2.13.so
c0019000-c001b000 rw-p 00000000 00:00 0
c001b000-c001c000 r--p 00019000 fe:81 1030344 /lib/m68k-linux-gnu/ld-2.13.so

So, maybe something with the dynamic linker?

Anyway, any idea why/how nthreads could be corrupted here?
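
The numbers in the debug output are at least consistent with each other, which the sketch below checks. It assumes that the overflowing multiplication simply wrapped modulo 2^32 on that machine (signed overflow is undefined in C, so the wrap is modelled with unsigned arithmetic); bits_of and wrapped_list_length are hypothetical helper names, not part of test_stack.c.

```c
#include <stdint.h>

/* Bit pattern of a 32-bit int, e.g. for printing a negative value as hex. */
static uint32_t
bits_of(int32_t value)
{
	return (uint32_t)value;		/* well-defined modulo-2^32 conversion */
}

/*
 * nthreads*(nthreads+1)/2 with the product wrapped to 32 bits.
 * This matches a signed division only while the wrapped product is
 * non-negative as a signed int, which holds for the value seen above.
 */
static int32_t
wrapped_list_length(int32_t nthreads)
{
	uint32_t n = (uint32_t)nthreads;
	uint64_t wide = (uint64_t)n * (uint64_t)(uint32_t)(n + 1u);
	uint32_t product = (uint32_t)wide;	/* keep the low 32 bits */

	return (int32_t)(product / 2u);
}
```

With nthreads=-1073636204, bits_of() gives 0xC0019C94 and wrapped_list_length() gives 746006802, the list_length printed before the crash. So only nthreads itself needs to have been overwritten; the huge recursion depth in add_elements() follows from it.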

bye,
//mirabilos (please Cc me, I’m not on the list)
Thorsten Glaser
2012-11-20 21:08:52 UTC
Dixi quod…
Post by Thorsten Glaser
Anyway, an idea why/how nthreads could be corrupted here?
Heh. Might want to look at whether the asm code trashes
registers because I just did this:

# ./test_stack2
Debug: nthreads=1; list_length=1
Debug: nthreads=2; list_length=3
Debug: nthreads=3; list_length=6
Debug: nthreads=4; list_length=10
About 1000000 pushes + 1000000 pops in 1 threads: 15780 msecs
About 1000000 pushes + 1000000 pops in 2 threads: 15840 msecs
About 1000000 pushes + 1000000 pops in 3 threads: 15820 msecs
About 1000000 pushes + 1000000 pops in 4 threads: 15960 msecs

After this:

***@aranym:~/libatomic_ops-7_2 # fgrep -C4 __sync_ src/atomic_ops/sysdeps/gcc/m68k.h
AO_INLINE int
AO_compare_and_swap_full(volatile AO_t *addr,
                         AO_t old, AO_t new_val)
{
  return (int)__sync_bool_compare_and_swap(addr, old, new_val);
}
#define AO_HAVE_compare_and_swap_full

#include "../ao_t_is_int.h"

It’s, of course, much slower (as the GCC __sync_* builtins use the
kernel helper nowadays, which, though, is guaranteed to work and does not
suffer from any (past) emulation bugs or, worse, CAS bugs on real
metal; see the linux-m68k list for some recent explanation of those),
but at least it works.

So you might want to make libatomic-ops on m68k completely use just
the GCC (4.6) atomic builtins if available (I patched our GCC 4.6
to use the trunk code for them and offer them) and fall back to the
asm stuff only for old Linux and nōn-Linux like FreeMiNT.

Just an idea.
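
A minimal sketch of what such a selection could look like, assuming a compile-time version test is acceptable. All demo_/DEMO_ names are hypothetical and this is not the actual gcc/m68k.h; demo_AO_t merely stands in for AO_t. The clang test is an addition for illustration, since clang reports an old GCC version but does provide the __sync_* builtins.

```c
#include <stddef.h>

typedef size_t demo_AO_t;	/* hypothetical stand-in for AO_t */

/* Assumption: GCC 4.6+ (or clang) provides usable __sync_* builtins. */
#if defined(__clang__) || (defined(__GNUC__) && \
    (__GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 6)))
# define DEMO_USE_SYNC_BUILTINS 1
#endif

#ifdef DEMO_USE_SYNC_BUILTINS
static int
demo_compare_and_swap_full(volatile demo_AO_t *addr,
    demo_AO_t old, demo_AO_t new_val)
{
	return (int)__sync_bool_compare_and_swap(addr, old, new_val);
}
# define DEMO_HAVE_compare_and_swap_full
#else
/* The asm (cas.l) implementation would remain here as the fallback. */
#endif

#ifdef DEMO_HAVE_compare_and_swap_full
/* Typical client of the primitive: retry until the CAS succeeds. */
static demo_AO_t
demo_fetch_and_add1(volatile demo_AO_t *addr)
{
	demo_AO_t old;

	do {
		old = *addr;
	} while (!demo_compare_and_swap_full(addr, old, old + 1));
	return old;		/* value before the increment */
}
#endif
```

A version test alone cannot tell whether a particular GCC 4.6 build carries the backported builtins, so a real change would probably need a configure-time check as well.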

bye,
//mirabilos (still, please Cc me)
Thorsten Glaser
2012-11-20 21:26:53 UTC
Dixi quod

Post by Thorsten Glaser
Post by Ivan Maidanski
1. Could you please verify that the problem exists in the recent
Yes, it does (see attached file).
test_stack
Segmentation fault
That being said, a build of the Debian unstable package, with
running the testsuite, succeeded. On the other hand, in the
build chroot I’ve already a patched eglibc with a fix for 5-
and 6-argument “cancellable syscalls” (whatever they are), so
this might also have helped.

See attached build log. It’s not 7.2 though


bye,
//mirabilos
--
Darwinism never[…]applied to wizardkind. There's a more than fair amount of[…]
stupidity in its gene-pool[…]never eradicated[…]magic evens the odds that way.
It's[…]harder to die for us than[…]muggles[…]wonder if, as technology[…]better
[…]same will[…]happen there too. Dursleys' continued existence indicates so.
Ivan Maidanski
2012-11-21 16:03:35 UTC
Hi Thorsten,

Could you please summarize your suggestion for how to fix this? Use the GCC builtins for AO_test_and_set_full and AO_compare_and_swap_full on m68k if GCC is 4.6+, right?

Regards,
Ivan
Thorsten Glaser
2012-11-21 18:51:11 UTC
Hi Ivan,

sorry for my three somewhat disoriented quick posts. I had wanted
to do some “quick” tests, and then, thinking that was “it”,
posted the results, but got something new after each message ☹
Post by Ivan Maidanski
Could you summarize please you suggestion how to fix this?
I need to make a couple more checks. *Maybe* it got fixed
with the new eglibc patch already, since the Debian package
7.3~alpha3+git20121114-1 unexpectedly *passed* all its tests.

So, right now, do nothing.
Post by Ivan Maidanski
Use GCC builtins for AO_test_and_set_full and AO_compare_and_swap_full
for m68k if GCC 4.6+, right?
The GCC builtins can be used on m68k if *all* three of these
conditions are met:
• GCC 4.6 or above
• a link-time check with the functions passes, to weed
out GCC builds that do not yet have the patch to add
these functions
• a Linux kernel with certain bugs fixed, no non-Linux
(I can dig out exact versions if you want, but for
now, roughly 3.2 or up should work)

However, it’s slower in all cases. I’m Cc’ing the porter
list so the Debian and Linux kernel people can say whether
they think that libatomic-ops should continue to use CAS
and friends, or switch to the kernel function if available.

The Linux kernel version check is run-time, but honestly,
if it’s there at compile time it’ll be there at run-time
too, besides GCC and others use the kernel helper as well
nowadays. A non-Linux kernel can't use them, though.
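
The link-time check from the list above could be as small as the probe sketched here. It is an illustration only: sync_builtins_probe is a hypothetical name, and a configure script would wrap the call in a main() and try to compile and link it. On a GCC build that lacks the patch, the link step is expected to fail with unresolved __sync_* symbols; where it links, the return value gives a basic sanity check.

```c
/* Hypothetical configure-style probe for the __sync_* builtins. */
static volatile int probe_word;
static volatile char probe_flag;

int
sync_builtins_probe(void)
{
	probe_word = 0;
	probe_flag = 0;

	/* Compare-and-swap: 0 -> 1 must succeed. */
	if (!__sync_bool_compare_and_swap(&probe_word, 0, 1))
		return 1;
	/* Test-and-set: the previous value must be 0 (clear). */
	if (__sync_lock_test_and_set(&probe_flag, 1) != 0)
		return 2;
	/* Release the flag again (writes 0). */
	__sync_lock_release(&probe_flag);

	return (probe_word == 1 && probe_flag == 0) ? 0 : 3;
}
```

This covers only the compiler side; it says nothing about the kernel condition.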

bye,
//mirabilos
--
Darwinism never[…]applied to wizardkind. There's a more than fair amount of[…]
stupidity in its gene-pool[…]never eradicated[…]magic evens the odds that way.
It's[…]harder to die for us than[…]muggles[…]wonder if, as technology[…]better
[…]same will[…]happen there too. Dursleys' continued existence indicates so.
Thorsten Glaser
2012-11-21 19:21:56 UTC
Dixi quod…
Post by Thorsten Glaser
Post by Ivan Maidanski
Could you summarize please you suggestion how to fix this?
I need to make a couple more checks. *Maybe* it got fixed
with the new eglibc patch already, since the Debian package
7.3~alpha3+git20121114-1 unexpectedly *passed* all its tests.
OK, after some tests, I can confirm that the patched libc,
independent of the kernel, fixes the test_stack program.

@Debian eglibc maintainers: So this means that yes, please
include the patch when your policy permits (it’s fine if it
is not possible until after the wheezy release and unfreeze,
I’ll carry a patched version until then).

@boehm-gc people: I think you do not need to do anything. Note
that I could not test master as autoreconf didn’t want to
work for me, but Ivan, if you could private-mail me a tarball
of master where I just need to run ./configure && make check,
I will gladly do that.

@Debian-68k people: this tests the eglibc patch and validates
that we indeed want it; as for the kernel patch, the test has
not yet compiled far enough to see. I’m running the rebuilt
kernel already (the atari flavour is built early enough ☺),
and it at least shows no regressions.

bye,
//mirabilos
--
“It is inappropriate to require that a time represented as
seconds since the Epoch precisely represent the number of
seconds between the referenced time and the Epoch.”
-- IEEE Std 1003.1b-1993 (POSIX) Section B.2.2.2
Alexander Herz
2012-11-22 16:13:51 UTC
Hi,

is there a way to control the alignment of the boehm allocated memory
(for SSE)?

Thx,
Alex
Ivan Maidanski
2012-11-24 14:14:37 UTC
Hi Alexander,
Hi,
is there a way to control the alignment of the boehm allocated memory
(for SSE)?
There is GC_memalign (and GC_posix_memalign), which replaces GC_malloc when an alignment must be specified. It exists primarily for redirecting malloc routines and is not well tested. An atomic companion (GC_memalign_atomic) is missing (but it could be added on demand). There is no way to set the default alignment at runtime (I can't say right now whether there are configuration options for alignment).

Regards,
Ivan
Ivan Maidanski
2012-11-24 13:28:24 UTC
Hi Thorsten,
Post by Thorsten Glaser
Dixi quod

Post by Ivan Maidanski
Could you summarize please you suggestion how to fix this?
I need to make a couple more checks. *Maybe* it got fixed
with the new eglibc patch already, since the Debian package
7.3~alpha3+git20121114-1 unexpectedly *passed* all its tests.
OK, after some tests, I can confirm that the patched libc,
independent of the kernel, fixes the test_stack program.
@Debian eglibc maintainers: So this means that yes, please
include the patch when your policy permits (it’s fine if it
is not possible until after the wheezy release and unfreeze,
I’ll carry a patched version until then).
@boehm-gc people: I think you need to not do anything. Note
that I could not test master as autoreconf didn’t want to
work for me, but Ivan, if you could private-mail me a tarball
of master where I just need to run ./configure && make check,
I will gladly do that.
Here they are:
http://www.ivmaisoft.com/_bin/bdwgc/gc-7.3alpha3-20121123.tar.bz2
http://www.ivmaisoft.com/_bin/bdwgc/libatomic_ops-7.3alpha3-20121123.tar.bz2

Regards,
Ivan
Thorsten Glaser
2012-12-02 16:40:08 UTC
Post by Ivan Maidanski
http://www.ivmaisoft.com/_bin/bdwgc/libatomic_ops-7.3alpha3-20121123.tar.bz2
Sorry it took so long, but here are the results (after quite
some fiddling): all 4 tests passed. (Fiddling because the
tarball was incomplete, it lacked install-sh, ltmain.sh, etc.)

So the current code appears to work, and whether we want to use
the kernel helper or not is a political/technical discussion
that has only change-request priority; it is not a bug.
Post by Ivan Maidanski
http://www.ivmaisoft.com/_bin/bdwgc/gc-7.3alpha3-20121123.tar.bz2
I guess I could test that one too… one fails:

libtool: link: gcc -fexceptions -Wall -Wextra -g -O2 -fno-strict-aliasing -o .libs/disclaim_bench disclaim_bench.o ./.libs/libgc.so
make[2]: Leaving directory `/root/share/gc-7.3alpha3'
make check-TESTS
make[2]: Entering directory `/root/share/gc-7.3alpha3'
FAILED: File conversions differ
Aborted
FAIL: cordtest
Switched to incremental mode
Emulating dirty bits with mprotect/signals
Completed 6 tests
Allocated 11991263 collectable objects
Allocated 1218 uncollectable objects
Allocated 7500000 atomic objects
Allocated 137760 stubborn objects
Finalized 13217/13217 objects - finalization is probably ok
Total number of bytes allocated is 581540222
Final heap size is 27717632 bytes
Completed 532 collections
Collector appears to work
PASS: gctest
Found 1 leaked objects:
0x80044fd0 (tests/leak_test.c:19, sz=4, NORMAL)
Found 9 leaked objects:
0x80045f10 (tests/leak_test.c:19, sz=5, NORMAL)
0x80045f30 (tests/leak_test.c:19, sz=6, NORMAL)
0x80045f50 (tests/leak_test.c:19, sz=7, NORMAL)
0x80045f70 (tests/leak_test.c:19, sz=8, NORMAL)
0x80045f90 (tests/leak_test.c:19, sz=9, NORMAL)
0x80045fb0 (tests/leak_test.c:19, sz=10, NORMAL)
0x80045fd0 (tests/leak_test.c:19, sz=11, NORMAL)
0x80045ff0 (tests/leak_test.c:19, sz=12, NORMAL)
0x80044fe8 (tests/leak_test.c:12, sz=4, NORMAL)
PASS: leaktest
Final heap size is 131072
PASS: middletest
GC_check_heap_block: found 1 smashed heap objects:
0x8004bff8 in or near object at 0x8004bfd0 (tests/smash_test.c:22, sz=40)
GC_check_heap_block: found 1 smashed heap objects:
0x8004bff8 in or near object at 0x8004bfd0 (tests/smash_test.c:22, sz=40)
GC_check_heap_block: found 2 smashed heap objects:
0x800b72b8 in or near object at 0x800b7290 (tests/smash_test.c:22, sz=40)
0x8004bff8 in or near object at 0x8004bfd0 (tests/smash_test.c:22, sz=40)
PASS: smashtest
PASS: hugetest
Heap size: 65536
Heap size: 131072
PASS: realloc_test
PASS: staticrootstest
Found 1 leaked objects:
0x80046fe8 (tests/thread_leak_test.c:29, sz=4, NORMAL)
Found 1 leaked objects:
0x80046fd0 (tests/thread_leak_test.c:29, sz=4, NORMAL)
Found 1 leaked objects:
0x80046fe8 (tests/thread_leak_test.c:29, sz=4, NORMAL)
Found 1 leaked objects:
0x80046fd0 (tests/thread_leak_test.c:29, sz=4, NORMAL)
Found 1 leaked objects:
0x80046fe8 (tests/thread_leak_test.c:29, sz=4, NORMAL)
PASS: threadleaktest
PASS: threadkey_test
subthread_create: created 185 threads (179 ended)
PASS: subthread_create
PASS: initsecondarythread
Threaded disclaim test.
PASS: disclaim_test
fin. ratio time/s time/fin.
regular finalization: 0.9922 187.34 4.50171e-05
finalize on reclaim: 0.9999 71.79 1.71169e-05
no finalization: 0.0000 14.2 N/A
PASS: disclaim_bench
====================================
1 of 14 tests failed
Please report to ***@linux.hpl.hp.com
====================================
make[2]: *** [check-TESTS] Error 1
make[2]: Leaving directory `/root/share/gc-7.3alpha3'
make[1]: *** [check-am] Error 2
make[1]: Leaving directory `/root/share/gc-7.3alpha3'
make: *** [check-recursive] Error 1

bye,
//mirabilos
--
Darwinism never[…]applied to wizardkind. There's a more than fair amount of[…]
stupidity in its gene-pool[…]never eradicated[…]magic evens the odds that way.
It's[…]harder to die for us than[…]muggles[…]wonder if, as technology[…]better
[…]same will[…]happen there too. Dursleys' continued existence indicates so.
Pavel Raiskup
2012-12-05 07:50:56 UTC
Hello gc guys!
Post by Thorsten Glaser
Post by Ivan Maidanski
http://www.ivmaisoft.com/_bin/bdwgc/libatomic_ops-7.3alpha3-20121123.tar.bz2
Sorry it took so long, but here are the results (after quite
some fiddling): all 4 tests passed.
I still have problems on ppc/ppc64 architectures with the test_stack
test program:

| PASS: test_atomic_pthreads
| FAILED
| FAILED
| /bin/sh: line 5: 9896 Aborted (core dumped) ${dir}$tst
| FAIL: test_stack

I was unable to find in the mail archives the system requirements
(glibc, kernel, ...) for supporting this test on PowerPC, so I'm not sure how
to fix this or help you with the issue.

Something about my system (Red Hat Enterprise Linux 6):

| $ rpm -q automake autoconf gcc glibc
| automake-1.11.1-1.2.el6.noarch
| autoconf-2.63-5.1.el6.noarch
| gcc-4.4.7-3.el6.ppc64
| glibc-2.12-1.106.el6.ppc64
| glibc-2.12-1.106.el6.ppc

And /proc/cpuinfo snippet:

processor : 0
cpu : POWER7 (architected), altivec supported
clock : 3550.000000MHz
revision : 2.1 (pvr 003f 0201)

processor : 1
cpu : POWER7 (architected), altivec supported
clock : 3550.000000MHz
revision : 2.1 (pvr 003f 0201)

processor : 2
cpu : POWER7 (architected), altivec supported
clock : 3550.000000MHz
revision : 2.1 (pvr 003f 0201)

processor : 3
cpu : POWER7 (architected), altivec supported
clock : 3550.000000MHz
revision : 2.1 (pvr 003f 0201)

[..]

Note that I checked the released versions of libatomic_ops as well as the
development tree from git; it seems that AO_stack* support is
still not fixed.

Could I help you with diagnostics somehow?

Pavel
Thorsten Glaser
2012-12-05 18:08:50 UTC
Post by Pavel Raiskup
I was unable to find here in mail archives system (glibc, kernel, ..)
requirements for supporting this test on PowerPC - thus I'm not sure how
to fix/help you regarding this issue.
Can you try with this patch applied for more debugging?

--- tests/test_stack.c~ Wed Dec 5 18:06:29 2012
+++ tests/test_stack.c Wed Dec 5 18:08:22 2012
@@ -78,6 +78,8 @@ typedef struct le {

AO_stack_t the_list = AO_STACK_INITIALIZER;

+#define VERBOSE
+
void add_elements(int n)
{
list_element * le;
@@ -240,6 +242,8 @@ int main(int argc, char **argv)
int list_length = nthreads*(nthreads+1)/2;
long long start_time;
list_element * le;
+printf("Before add_elements: exper_n=%d nthreads=%d max_nthreads=%d list_length=%d\n",exper_n,nthreads,max_nthreads,list_length);
+fflush(stdout);

add_elements(list_length);
# ifdef VERBOSE

bye,
//mirabilos
--
I want one of these. They cost 720 € though… good they don’t have the HD hole,
which indicates 3½″ floppies with double capacity… still. A tad too much, atm.
‣ http://www.floppytable.com/floppytable-images-1.html
Pavel Raiskup
2012-12-06 07:29:42 UTC
Post by Thorsten Glaser
Post by Pavel Raiskup
I was unable to find here in mail archives system (glibc, kernel, ..)
requirements for supporting this test on PowerPC - thus I'm not sure how
to fix/help you regarding this issue.
Can you try with this patch applied for more debugging?
Oh, of course. I should have included the VERBOSE output earlier.
It still shows the same non-deterministic behavior as described before:
it always fails after a different number of steps, and sometimes it gets into
an infinite loop.

I'm attaching a tarball with several runs of the tests/test_stack program.

OK, could you point me to where I should look? I have tried the
"compare_and_swap", "test_and_set", "fetch_and_add" low-level calls and
they work well with 5+ threads.

Note that I don't have access to the same machine as before at the moment;
I need to make a reservation, which may take days. The attached logs
are therefore from another powerpc machine with the same HW (Fedora 16):

$ rpm -q automake autoconf gcc glibc
automake-1.11.1-7.fc16.noarch
autoconf-2.68-2.fc15.noarch
gcc-4.6.2-1.fc16.ppc64
glibc-2.14.90-24.fc16.6.ppc64
glibc-2.14.90-24.fc16.6.ppc

$ head /proc/cpuinfo
processor : 0
cpu : POWER7 (architected), altivec supported
clock : 3550.000000MHz
revision : 2.1 (pvr 003f 0201)

processor : 1
cpu : POWER7 (architected), altivec supported
clock : 3550.000000MHz
revision : 2.1 (pvr 003f 0201)

Pavel

Ivan Maidanski
2012-09-24 18:35:18 UTC
Permalink
Hi Michael,
Post by Ivan Maidanski
Hi Michael, Thorsten and Wouter,
A couple of months ago, you reported a libatomic_ops-7.3alpha test_stack
failure on Alpha, PowerPC and m68k:
* http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=680100
* http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=680066
On Alpha the test_stack failure is not present in the release-7_2 branch but
is present in master. It therefore looks like a regression.
The test_stack failure only happens when running under an SMP Linux kernel.
In the testing I have just performed against git master, it always resulted in
a detected failure in the test_stack test, but the Debian packaged version can
end up with a list item pointing to itself, and thus an infinite loop in the
check_list() function in tests/test_stack.c.
I bisected between the head of the release-7_2 branch and master which results
in:
cc51941b02fddc75952831eb8e28b06d340d2bef is the first bad commit
commit cc51941b02fddc75952831eb8e28b06d340d2bef
Date: Mon Mar 26 08:06:19 2012 +0400
Use __builtin_expect in CAS failure loop condition checks (GCC only)
* src/atomic_ops.c (lock, block_all_signals): Use AO_EXPECT_FALSE.
* src/atomic_ops.h (AO_EXPECT_FALSE): New macro.
* src/atomic_ops/generalize-small.template
(AO_XSIZE_fetch_and_add_full, AO_XSIZE_fetch_and_add_acquire,
AO_XSIZE_fetch_and_add_release): Use AO_EXPECT_FALSE for CAS failure
check.
* src/atomic_ops/generalize.h (AO_fetch_and_add_full,
AO_fetch_and_add_acquire, AO_fetch_and_add_release, AO_fetch_and_add,
AO_and_full, AO_or_full, AO_xor_full): Likewise.
* src/atomic_ops/sysdeps/gcc/arm.h
(AO_compare_double_and_swap_double): Likewise.
* src/atomic_ops_stack.c (AO_stack_push_explicit_aux_release,
AO_stack_pop_explicit_aux_acquire, AO_stack_push_release,
AO_stack_pop_acquire): Likewise.
* src/atomic_ops/generalize-small.h: Regenerate.
:040000 040000 15676aef9361c647a597a25e8dd0e152d34681cb
a209c59999ae9072859594d532125236b898779c M src
Something really subtle must be going on as the only change in the commit is
to wrap if statement conditionals with AO_EXPECT_FALSE().
Looks like.
Would something like this (for master) fix the problem on Alpha?
In src/atomic_ops.h:

#if __GNUC__ >= 3 && !defined(LINT2)
# define AO_EXPECT_FALSE(expr) __builtin_expect(expr, 0)

->

#if __GNUC__ >= 3 && !defined(LINT2) && !defined(__alpha__)
# define AO_EXPECT_FALSE(expr) __builtin_expect(expr, 0)

If so, then this is indeed a gcc/alpha bug. A more precise workaround (targeting the particular AO_EXPECT_FALSE use) could then be done.
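
For reference, here is a minimal self-contained sketch of the proposed guard. It assumes a plain fallback definition for the excluded case: the #else branch and the helper count_nonzero are illustrative and not copied from src/atomic_ops.h. The hint is only meant to influence code layout, so the value of the expression should be the same in both branches, which is why disabling it should be a safe workaround:

```c
#include <assert.h>
#include <stddef.h>

/* Sketch: use the branch hint only on GCC >= 3, and not on Alpha. */
#if defined(__GNUC__) && __GNUC__ >= 3 && !defined(LINT2) \
    && !defined(__alpha__)
# define AO_EXPECT_FALSE(expr) __builtin_expect(expr, 0)
#else
  /* Assumed fallback: evaluate the expression with no hint. */
# define AO_EXPECT_FALSE(expr) (expr)
#endif

/* Hypothetical helper: counts the non-zero entries of v[0..n-1].   */
/* The hint marks the non-zero case as unlikely; it must not change */
/* the result, only the layout of the generated code.               */
static size_t count_nonzero(const int *v, size_t n)
{
  size_t i, cnt = 0;

  assert(v != NULL || n == 0);
  for (i = 0; i < n; i++) {
    if (AO_EXPECT_FALSE(v[i] != 0))
      cnt++;
  }
  return cnt;
}
```

If a build with the hint and a build without it give different results for code like this, the compiler is the likely suspect.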

Regards,
Ivan
Michael Cree
2012-09-26 09:52:09 UTC
Permalink
Post by Ivan Maidanski
Hi Michael,
Post by Ivan Maidanski
Hi Michael, Thorsten and Wouter,
A couple of months ago, you reported a libatomic_ops-7.3alpha test_stack
failure on Alpha, PowerPC and m68k:
* http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=680100
* http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=680066
1. Could you please verify that the problem exists in the recent
On Alpha the test_stack failure is not present in the release-7_2 branch
but is present in master. It therefore looks like a regression.
I bisected between the head of the release-7_2 branch and master which
results in:
cc51941b02fddc75952831eb8e28b06d340d2bef is the first bad commit
commit cc51941b02fddc75952831eb8e28b06d340d2bef
Date: Mon Mar 26 08:06:19 2012 +0400
Use __builtin_expect in CAS failure loop condition checks (GCC only)
Looks like.
#if __GNUC__ >= 3 && !defined(LINT2)
# define AO_EXPECT_FALSE(expr) __builtin_expect(expr, 0)
->
#if __GNUC__ >= 3 && !defined(LINT2) && !defined(__alpha__)
# define AO_EXPECT_FALSE(expr) __builtin_expect(expr, 0)
Yep, that fixes it, but as you state below that is not a real fix.
Post by Ivan Maidanski
If so, then this is indeed a gcc/alpha bug. A more precise workaround
(targeting the particular AO_EXPECT_FALSE use) could then be done.
When I get a chance I'll take a look at the code generated by gcc for one of
the AO_EXPECT_FALSE uses and see if I can spot any problems.

Cheers
Michael.
Ivan Maidanski
2012-10-22 07:08:36 UTC
Permalink
Hi Michael,

I guess you haven't had time to inspect the assembly of the broken code (Alpha target).

I think it would be OK for now just to find which use of AO_EXPECT_FALSE results in broken code and to create a workaround avoiding AO_EXPECT_FALSE at exactly that place.
There are only a few places to check in atomic_ops_stack.c:

1. AO_stack_push_explicit_aux_release:
  while (AO_EXPECT_FALSE(!AO_compare_and_swap_release(list, next, x_bits)));
->
  while (!AO_compare_and_swap_release(list, next, x_bits));

2. AO_stack_pop_explicit_aux_acquire:
  if (AO_EXPECT_FALSE(first != AO_load(list))) {
->
  if (first != AO_load(list)) {

3. AO_stack_pop_explicit_aux_acquire:
  if (AO_EXPECT_FALSE(!AO_compare_and_swap_release(list, first, next))) {
->
  if (!AO_compare_and_swap_release(list, first, next)) {

Could you please check this?
(It would be good to put some workaround in the upcoming libatomic_ops 7.2 and 7.3alpha4 releases.)
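
To make the pattern under discussion concrete, here is a simplified sketch of CAS retry loops of the kind listed above. It is written with C11 <stdatomic.h> rather than the libatomic_ops primitives, and the names (node_t, lf_stack_t, stack_init, stack_push, stack_pop, EXPECT_FALSE, USE_EXPECT_HINT) are hypothetical. The pop ignores the ABA problem that the real AO_stack code has to handle, so it is an illustration only, not a replacement:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical switch: compile with -DUSE_EXPECT_HINT to add the hint. */
#if defined(__GNUC__) && defined(USE_EXPECT_HINT)
# define EXPECT_FALSE(expr) __builtin_expect(expr, 0)
#else
# define EXPECT_FALSE(expr) (expr)
#endif

typedef struct node {
  struct node *next;
  int value;
} node_t;

typedef struct {
  _Atomic(node_t *) head;
} lf_stack_t;

static void stack_init(lf_stack_t *s)
{
  atomic_init(&s->head, NULL);
}

/* Push: link x in front of the current head, retry if the head moved. */
static void stack_push(lf_stack_t *s, node_t *x)
{
  node_t *old;

  assert(x != NULL);
  old = atomic_load_explicit(&s->head, memory_order_relaxed);
  do {
    x->next = old;
    /* On failure, old is updated to the current head by the CAS. */
  } while (EXPECT_FALSE(!atomic_compare_exchange_weak_explicit(
                            &s->head, &old, x,
                            memory_order_release, memory_order_relaxed)));
}

/* Pop: unlink the first element, or return NULL if the stack is empty. */
/* Not ABA-safe; shown only to illustrate the retry condition.          */
static node_t *stack_pop(lf_stack_t *s)
{
  node_t *first = atomic_load_explicit(&s->head, memory_order_acquire);

  while (first != NULL
         && EXPECT_FALSE(!atomic_compare_exchange_weak_explicit(
                             &s->head, &first, first->next,
                             memory_order_acquire, memory_order_acquire))) {
    /* CAS failed: first now holds the new head; retry. */
  }
  return first;
}
```

Building the same file with and without -DUSE_EXPECT_HINT and comparing the generated assembly is one way to check whether the hint alone changes the instruction sequence around the CAS.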

Thank you.

Regards,
Ivan
Michael Cree
2012-10-22 22:37:33 UTC
Permalink
Post by Ivan Maidanski
I guess you haven't had time to inspect the assembly of the broken code (Alpha target).
I was busy at the time but then I totally forgot about this!
Post by Ivan Maidanski
I think it would be OK for now just to find which use of
AO_EXPECT_FALSE results in broken code and to create a workaround
avoiding AO_EXPECT_FALSE at exactly that place.
Interestingly, applying either option 2 or option 3 on its own results in the
test suite passing.
Post by Ivan Maidanski
2. AO_stack_pop_explicit_aux_acquire:
  if (AO_EXPECT_FALSE(first != AO_load(list))) {
->
  if (first != AO_load(list)) {

3. AO_stack_pop_explicit_aux_acquire:
  if (AO_EXPECT_FALSE(!AO_compare_and_swap_release(list, first, next))) {
->
  if (!AO_compare_and_swap_release(list, first, next)) {
They are quite close together in the code. I wonder if there is some
interaction between them. I really must look at the generated
assembly to see if the compiler is at fault but that will probably
have to wait another day or two.

Cheers
Michael.
Ivan Maidanski
2012-11-09 10:10:37 UTC
Permalink
Hi Michael,

I've committed the workaround (avoiding __builtin_expect in AO_stack_pop_explicit_aux_acquire for gcc-4/alpha): https://github.com/ivmai/libatomic_ops/commit/00d7cb807015b109df11b9227fbc7f35babdee16
(But, of course, it would be good if someone reported or fixed the bug in gcc.)

[1] http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=680100#15

Regards,
Ivan
Michael Cree
2012-09-24 08:43:43 UTC
Permalink
Post by Ivan Maidanski
Hi Michael, Thorsten and Wouter,
A couple of months ago, you reported a libatomic_ops-7.3alpha test_stack
failure on Alpha, PowerPC and m68k:
* http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=680100
* http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=680066
On Alpha the test_stack failure is not present in the release-7_2 branch but
is present in master. It therefore looks like a regression.

The test_stack failure only happens when running under an SMP Linux kernel.
In the testing I have just performed against git master, it always resulted in
a detected failure in the test_stack test, but the Debian packaged version can
end up with a list item pointing to itself, and thus an infinite loop in the
check_list() function in tests/test_stack.c.
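
The hang follows directly from the corrupted link: a walker that only stops at NULL never terminates once an element points to itself. A minimal sketch of a defensive walk, assuming a simplified element type (list_elem and check_list_bounded are hypothetical names, not the ones used in tests/test_stack.c), caps the walk at the expected length so that a cycle is reported instead of spinning:

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for the test's list element. */
typedef struct list_elem {
  struct list_elem *next;
  int data;
} list_elem;

/* Returns the number of elements if the list ends in NULL after at  */
/* most max_len elements, or -1 if it is longer than that (e.g.      */
/* because a corrupted element points back into the list).           */
static long check_list_bounded(const list_elem *head, long max_len)
{
  long n = 0;
  const list_elem *p;

  assert(max_len >= 0);
  for (p = head; p != NULL; p = p->next) {
    if (++n > max_len)
      return -1; /* too long: a cycle or unexpected extra elements */
  }
  return n;
}
```

Since the test knows list_length in advance, passing it as max_len would turn the infinite loop into an ordinary detected failure.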

I bisected between the head of the release-7_2 branch and master which results
in:

cc51941b02fddc75952831eb8e28b06d340d2bef is the first bad commit
commit cc51941b02fddc75952831eb8e28b06d340d2bef
Author: Ivan Maidanski <***@mail.ru>
Date: Mon Mar 26 08:06:19 2012 +0400

Use __builtin_expect in CAS failure loop condition checks (GCC only)

* src/atomic_ops.c (lock, block_all_signals): Use AO_EXPECT_FALSE.
* src/atomic_ops.h (AO_EXPECT_FALSE): New macro.
* src/atomic_ops/generalize-small.template
(AO_XSIZE_fetch_and_add_full, AO_XSIZE_fetch_and_add_acquire,
AO_XSIZE_fetch_and_add_release): Use AO_EXPECT_FALSE for CAS failure
check.
* src/atomic_ops/generalize.h (AO_fetch_and_add_full,
AO_fetch_and_add_acquire, AO_fetch_and_add_release, AO_fetch_and_add,
AO_and_full, AO_or_full, AO_xor_full): Likewise.
* src/atomic_ops/sysdeps/gcc/arm.h
(AO_compare_double_and_swap_double): Likewise.
* src/atomic_ops_stack.c (AO_stack_push_explicit_aux_release,
AO_stack_pop_explicit_aux_acquire, AO_stack_push_release,
AO_stack_pop_acquire): Likewise.
* src/atomic_ops/generalize-small.h: Regenerate.

:040000 040000 15676aef9361c647a597a25e8dd0e152d34681cb
a209c59999ae9072859594d532125236b898779c M src

Something really subtle must be going on as the only change in the commit is
to wrap if statement conditionals with AO_EXPECT_FALSE().

Cheers
Michael.