Discussion:
Re[13]: [Gc] Performance of bdwgc7.2 had degraded compared to 6.8 - the patch to test
Ivan Maidanski
2010-12-01 20:47:43 UTC
Hi all,

It seems the cause of the observed degradation can be narrowed down with 2 tests:
1) by benchmarking v71 vs v72a2+test2_patch;
2) by benchmarking v71 vs v72a2+test3_patch.

test2 patch reverts the relevant changes of:
2008-08-21 Hans Boehm <***@hp.com>

test3 patch reverts the relevant changes of:
2009-05-22 Hans Boehm <***@hp.com> (Largely from Ludovic Cortes)

Regards.
Hi,
So, there is no real difference in speed between 7.1 and 7.1+patch, right?
PS. I'm preparing some more test patches...
benchmark 7.2a4 7.2a2 7.1 7.0 7.0a7 6.8 7.1+ivan-30nov
bague 0.76 0.77 0.77 0.76 0.77 0.77 0.77
beval 1.33 1.41 1.29 1.41 1.29 1.31 1.44
boyer 2.23 2.23 2.13 2.14 2.13 2.15 2.13
cgc 0.47 0.48 0.48 0.47 0.48 0.46 0.47
conform 1.91 1.91 1.74 1.72 1.73 1.79 1.71
earley 2.49 2.50 2.08 2.13 2.09 2.23 2.09
fib 0.01 0.01 0.01 0.01 0.01 0.01 0.01
fft 2.51 2.52 2.52 2.50 2.52 2.49 2.5
leval 1.12 1.13 1.05 1.01 1.02 1.09 1.02
maze 1.67 1.40 1.36 1.35 1.26 1.39 1.35
mbrot 7.03 7.05 7.04 7.03 7.05 7.05 7.02
nucleic 1.18 1.20 1.20 1.16 1.16 1.34 1.17
peval 1.46 1.47 1.20 1.19 1.20 1.18 1.2
puzzle 1.96 1.92 1.97 1.96 1.92 1.93 1.92
queens 2.29 2.29 1.55 1.56 1.55 1.44 1.56
qsort 1.65 1.64 1.63 1.62 1.63 1.63 1.63
rgc 1.28 1.28 1.23 1.23 1.24 1.28 1.23
sieve 1.58 1.60 1.44 1.42 1.41 1.51 1.43
traverse 5.14 5.15 3.55 3.60 3.56 3.58 3.59
almabench 1.45 1.45 1.45 1.45 1.45 1.46 1.45
SUM 39.52 39.41 35.69 35.72 35.47 36.09 35.69
--
Manuel
Ludovic Courtès
2010-12-02 20:53:09 UTC
Hi Ivan,
Post by Ivan Maidanski
1) by benchmarking v71 vs v72a2+test2_patch;
2) by benchmarking v71 vs v72a2+test3_patch.
Damn, I feel guilty now. ;-)

Did you measure the effect of each patch individually? It would be
interesting to know.

For the record, the discussion that led to the second patch started here:

http://thread.gmane.org/gmane.comp.programming.garbage-collection.boehmgc/2570

The next-to-final patch was posted here:

http://thread.gmane.org/gmane.comp.programming.garbage-collection.boehmgc/2634

The intent was to /exclude/ ELF sections containing relocated read-only
data from the GC roots on GNU systems, thereby reducing the amount of
memory that needs to be scanned.

The effect on libguile was discussed here:

http://thread.gmane.org/gmane.lisp.guile.devel/9247

In this message, I wrote:

However, when not linking with `-z relro', static allocation leads to
slightly degraded performance and increased heap usage (perhaps due to
misidentified pointers in the `.data.rel.ro' section?). This is
probably worth some investigation on the BDW-GC side.

So, ahem, I feel twice as guilty now...

Could the problem be caused by the search for LOAD segments when a
PT_GNU_RELRO is encountered, in GC_register_dynlib_callback? The code
does:

--8<---------------cut here---------------start------------->8---
  for( i = 0; i < (int)(info->dlpi_phnum); ((i++),(p++)) ) {
    switch( p->p_type ) {
      case PT_GNU_RELRO:
      /* This entry is known to be constant and will eventually be remapped
         read-only. However, the address range covered by this entry is
         typically a subset of a previously encountered `LOAD' segment, so
         we need to exclude it. */
        {
          for (j = n_load_segs; --j >= 0; ) {
--8<---------------cut here---------------end--------------->8---

It does look quadratic to me.
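
A minimal sketch of the search described above, assuming a simplified table of LOAD segments. The `load_seg` type, its field names and both helper functions are hypothetical and only model the idea; they are not the collector's actual data structures.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified record of one LOAD segment registered as a root.
   The field names are illustrative and are not the collector's own. */
typedef struct {
    size_t start, end;            /* [start, end) of the LOAD segment         */
    size_t hole_start, hole_end;  /* excluded RELRO sub-range; empty if equal */
} load_seg;

/* Exclude [relro_start, relro_end) from the LOAD segment that contains it.
   Scans the recorded segments backwards, as the quoted loop does.
   Returns the index of the matching segment, or -1 if none contains it. */
static int exclude_relro(load_seg *segs, int n_load_segs,
                         size_t relro_start, size_t relro_end)
{
    int j;

    assert(relro_start <= relro_end);
    for (j = n_load_segs; --j >= 0; ) {
        if (relro_start >= segs[j].start && relro_end <= segs[j].end) {
            segs[j].hole_start = relro_start;
            segs[j].hole_end = relro_end;
            return j;
        }
    }
    return -1;
}

/* Number of bytes that would still be scanned as roots. */
static size_t scannable_bytes(const load_seg *segs, int n_load_segs)
{
    size_t total = 0;
    int i;

    for (i = 0; i < n_load_segs; i++)
        total += (segs[i].end - segs[i].start)
                 - (segs[i].hole_end - segs[i].hole_start);
    return total;
}
```

Each call costs at most one pass over the LOAD segments recorded so far, so the total work is roughly (number of RELRO entries) x (number of LOAD segments). That is the quadratic shape; whether it matters in practice depends on how many of each there are and how often the registration runs.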

However, it would only impact initialization time, which is probably
negligible on long-running programs (such as the Bigloo benchmarks, I
suppose).

Hmm, OTOH, GC_is_visible calls GC_register_dynamic_libraries, which runs
the above code, so this could be a problem.

What do you think?

Thanks,
Ludo’.
Ivan Maidanski
2010-12-03 21:07:14 UTC
Hi Ludovic,

Please don't hurry to blame yourself ;)

Strangely, the Bigloo tests show that the degradation has nothing to do with the test2 and test3 patches.
_______________________________________________
Gc mailing list
http://www.hpl.hp.com/hosted/linux/mail-archives/gc/
M***@inria.fr
2010-12-03 08:37:56 UTC
Hi Ivan,
2 patches are attached (I'm sorry if you didn't receive them earlier - I sent them to the mailing list).
Yes? I have missed them there too. Strange...
please compare the 2 patches independently (I still don't know exactly where the problem is).
It looks like the performance slowdown comes from something else.
test2 and test3 behave like 7.2a2 and 7.2a4.

benchmark 7.2a4 7.2a2 7.1 7.0 7.0a7 6.8 7.1+ivan-30nov 7.2a2-test2 7.2a2-test3
bague 0.76 0.77 0.77 0.76 0.77 0.77 0.77 0.77 0.77
beval 1.33 1.41 1.29 1.41 1.29 1.31 1.44 1.42 1.42
boyer 2.23 2.23 2.13 2.14 2.13 2.15 2.13 2.24 2.23
cgc 0.47 0.48 0.48 0.47 0.48 0.46 0.47 0.48 0.49
conform 1.91 1.91 1.74 1.72 1.73 1.79 1.71 1.92 1.92
earley 2.49 2.50 2.08 2.13 2.09 2.23 2.09 2.52 2.52
fib 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01
fft 2.51 2.52 2.52 2.50 2.52 2.49 2.5 2.52 2.52
leval 1.12 1.13 1.05 1.01 1.02 1.09 1.02 1.14 1.13
maze 1.67 1.40 1.36 1.35 1.26 1.39 1.35 1.38 1.39
mbrot 7.03 7.05 7.04 7.03 7.05 7.05 7.02 7.06 7.07
nucleic 1.18 1.20 1.20 1.16 1.16 1.34 1.17 1.2 1.21
peval 1.46 1.47 1.20 1.19 1.20 1.18 1.2 1.47 1.49
puzzle 1.96 1.92 1.97 1.96 1.92 1.93 1.92 1.93 1.92
queens 2.29 2.29 1.55 1.56 1.55 1.44 1.56 2.36 2.36
qsort 1.65 1.64 1.63 1.62 1.63 1.63 1.63 1.65 1.65
rgc 1.28 1.28 1.23 1.23 1.24 1.28 1.23 1.29 1.28
sieve 1.58 1.60 1.44 1.42 1.41 1.51 1.43 1.59 1.59
traverse 5.14 5.15 3.55 3.60 3.56 3.58 3.59 5.13 5.15
almabench 1.45 1.45 1.45 1.45 1.45 1.46 1.45 1.46 1.46
SUM 39.52 39.41 35.69 35.72 35.47 36.09 35.69 39.54 39.58

Cheers,
--
Manuel
Ivan Maidanski
2010-12-03 21:27:15 UTC
Hi Manuel,

It's strange...

Please confirm that you don't compile GC (for this benchmark) with multi-threading support and don't use GC_DEBUG (and GC_debug_ routines).

If so, then the only difference between gc71+test1_patch and gc72a2+test2_patch+test3_patch is in GC_clear_a_few_frames() (in alloc.c). Please benchmark gc72a2+test4_patch (which is attached).

Regards.
M***@inria.fr
2010-12-04 04:38:32 UTC
Hi Ivan,
Post by Ivan Maidanski
Please confirm that you don't compile GC (for this benchmark) with
multi-threading support and don't use GC_DEBUG (and GC_debug_
routines).
I do confirm that if I use these I'm not aware of it. On our side, there
is a change between 7.1 and 7.2: up to 7.1 we were using Makefile.direct
and from 7.2 we switched to using the generated Makefile. The
generated Makefile adds many compilation options. For the sake of the
example, here is the command issued by Make to compile the file alloc.c of
the version 7.2:

gcc -DPACKAGE_NAME=\"gc\" -DPACKAGE_TARNAME=\"gc\" -DPACKAGE_VERSION=\"7.2alpha2\" "-DPACKAGE_STRING=\"gc 7.2alpha2\"" -DPACKAGE_BUGREPORT=\"***@hp.com\" -DGC_VERSION_MAJOR=7 -DGC_VERSION_MINOR=2 -DGC_ALPHA_VERSION=2 -DPACKAGE=\"gc\" -DVERSION=\"7.2alpha2\" -DSTDC_HEADERS=1 -DHAVE_SYS_TYPES_H=1 -DHAVE_SYS_STAT_H=1 -DHAVE_STDLIB_H=1 -DHAVE_STRING_H=1 -DHAVE_MEMORY_H=1 -DHAVE_STRINGS_H=1 -DHAVE_INTTYPES_H=1 -DHAVE_STDINT_H=1 -DHAVE_UNISTD_H=1 -DHAVE_DLFCN_H=1 -DNO_EXECUTE_PERMISSION=1 -DALL_INTERIOR_POINTERS=1 -DATOMIC_UNCOLLECTABLE=1 -I./include -fexceptions -I libatomic_ops/src -O3 -DNO_DEBUGGING -Iinclude -Ilibatomic_ops-install/include -DFINALIZE_ON_DEMAND -I/misc/lab/bigloo/bench/3.5b-7.2alpha2-test2/bigloo3.5b/lib/3.5b -fPIC -fPIC -I/misc/lab/bigloo/bench/3.5b-7.2alpha2-test2/bigloo3.5b/lib/3.5b -MT alloc.lo -MD -MP -MF .deps/alloc.Tpo -c alloc.c -o alloc.o >/dev/null 2>&1

For the version 7.1, we have:

gcc -O3 -DNO_DEBUGGING -Iinclude -Ilibatomic_ops-install/include -DFINALIZE_ON_DEMAND -I/misc/lab/bigloo/bench/3.5b-7.1/bigloo3.5b/lib/3.5b -fPIC -c -o alloc.o alloc.c

Could the differences in the compilation options be the reason for the
performance slowdown? In particular, what's the impact of
-DALL_INTERIOR_POINTERS=1? I will run the benchmark without this option
and come back to you when done (once I have worked out how to tweak
the Makefile so that it no longer uses this option).
--
Manuel
Ivan Maidanski
2010-12-04 06:56:11 UTC
Hello Manuel,

1. Also, the command line for v72a2 includes -DNO_EXECUTE_PERMISSION -fexceptions.

2. Please also do benchmarking with that small patch I've attached in my previous post.
M***@inria.fr
2010-12-04 07:02:49 UTC
Post by Ivan Maidanski
1. Also, the command line for v72a2 includes -DNO_EXECUTE_PERMISSION -fexceptions.
...In the meantime I have checked -fexceptions which has no influence on
performance. I will check NO_EXECUTE_PERMISSION too.
--
Manuel
Ludovic Courtès
2010-12-07 15:24:23 UTC
Hello,
Post by M***@inria.fr
Could the differences in the compilation options be the reason for the
performance slowdown? In particular, what's the impact of
-DALL_INTERIOR_POINTERS=1?
All-interior-pointers can increase execution time and/or heap size in my
experience. Note that it can also be turned on/off at run time, by
setting ‘GC_all_interior_pointers’ before any call to GC_INIT.

Did you measure the impact of all-interior-pointers?

Thanks,
Ludo’.
M***@inria.fr
2010-12-04 05:07:27 UTC
ALL_INTERIOR is not the reason for the different performance because
with 7.2alpha4 the Makefile issues commands such as:

/bin/sh ./libtool --tag=CC --mode=compile gcc -DHAVE_CONFIG_H -I./include -I./include -fexceptions -O3 -DNO_DEBUGGING -Iinclude -Ilibatomic_ops-install/include -DFINALIZE_ON_DEMAND -I/misc/lab/bigloo/bench/3.5b-7.2alpha4/bigloo3.5b/lib/3.5b -fPIC -MT alloc.lo -MD -MP -MF .deps/alloc.Tpo -c -o alloc.lo alloc.c
libtool: compile: gcc -DHAVE_CONFIG_H -I./include -I./include -fexceptions -O3 -DNO_DEBUGGING -Iinclude -Ilibatomic_ops-install/include -DFINALIZE_ON_DEMAND -I/misc/lab/bigloo/bench/3.5b-7.2alpha4/bigloo3.5b/lib/3.5b -fPIC -MT alloc.lo -MD -MP -MF .deps/alloc.Tpo -c alloc.c -fPIC -DPIC -o .libs/alloc.o
libtool: compile: gcc -DHAVE_CONFIG_H -I./include -I./include -fexceptions -O3 -DNO_DEBUGGING -Iinclude -Ilibatomic_ops-install/include -DFINALIZE_ON_DEMAND -I/misc/lab/bigloo/bench/3.5b-7.2alpha4/bigloo3.5b/lib/3.5b -fPIC -MT alloc.lo -MD -MP -MF .deps/alloc.Tpo -c alloc.c -o alloc.o >/dev/null 2>&1
mv -f .deps/alloc.Tpo .deps/alloc.Plo
--
Manuel
M***@inria.fr
2010-12-11 07:37:11 UTC
Hi Ivan,
Post by Ivan Maidanski
Please confirm that you don't compile GC (for this benchmark) with multi-threading support and don't use GC_DEBUG (and GC_debug_ routines).
If so, then the only difference between gc71+test1_patch and gc72a2+test2_patch+test3_patch is in GC_clear_a_few_frames() (in alloc.c). Please benchmark gc72a2+test4_patch (which is attached).
No difference (and I confirm once more, no multi-threading, no debugging).

benchmark 7.2a4 7.2a2 7.1 7.0 7.0a7 6.8 7.1+ivan-30nov 7.2a2-test2 7.2a2-test3 7.2a2-test4
bague 0.76 0.77 0.77 0.76 0.77 0.77 0.77 0.77 0.77 0.77
beval 1.33 1.41 1.29 1.41 1.29 1.31 1.44 1.42 1.42 1.47
boyer 2.23 2.23 2.13 2.14 2.13 2.15 2.13 2.24 2.23 2.24
cgc 0.47 0.48 0.48 0.47 0.48 0.46 0.47 0.48 0.49 0.47
conform 1.91 1.91 1.74 1.72 1.73 1.79 1.71 1.92 1.92 1.88
earley 2.49 2.50 2.08 2.13 2.09 2.23 2.09 2.52 2.52 2.5
fib 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01
fft 2.51 2.52 2.52 2.50 2.52 2.49 2.5 2.52 2.52 2.51
leval 1.12 1.13 1.05 1.01 1.02 1.09 1.02 1.14 1.13 1.12
maze 1.67 1.40 1.36 1.35 1.26 1.39 1.35 1.38 1.39 1.44
mbrot 7.03 7.05 7.04 7.03 7.05 7.05 7.02 7.06 7.07 7.05
nucleic 1.18 1.20 1.20 1.16 1.16 1.34 1.17 1.2 1.21 1.18
peval 1.46 1.47 1.20 1.19 1.20 1.18 1.2 1.47 1.49 1.46
puzzle 1.96 1.92 1.97 1.96 1.92 1.93 1.92 1.93 1.92 1.94
queens 2.29 2.29 1.55 1.56 1.55 1.44 1.56 2.36 2.36 2.31
qsort 1.65 1.64 1.63 1.62 1.63 1.63 1.63 1.65 1.65 1.65
rgc 1.28 1.28 1.23 1.23 1.24 1.28 1.23 1.29 1.28 1.29
sieve 1.58 1.60 1.44 1.42 1.41 1.51 1.43 1.59 1.59 1.6
traverse 5.14 5.15 3.55 3.60 3.56 3.58 3.59 5.13 5.15 5.15
almabench 1.45 1.45 1.45 1.45 1.45 1.46 1.45 1.46 1.46 1.45
SUM 39.52 39.41 35.69 35.72 35.47 36.09 35.69 39.54 39.58 39.49

It looks like queens is pretty critical. That benchmark allocates mostly
lists. So one might conjecture that there is a problem with allocating
these particular objects. Since lists are so widely used, Bigloo uses a
special allocation function that is implemented as follows (for the sake
of completeness I give here the implementation for 6.8 and also for
7.x).

-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----
#define GC_INLINE_ALLOC_6xx( res, size, default_alloc ) \
   ptr_t op; \
   ptr_t *opp; \
   DCL_LOCK_STATE; \
   \
   opp = (void **)&(GC_objfreelist[ (long)ALIGNED_WORDS( size ) ]); \
   FASTLOCK(); \
   if( !FASTLOCK_SUCCEEDED() || (op = *opp) == 0 ) { \
      FASTUNLOCK(); \
      return default_alloc; \
   } \
   *opp = obj_link( op ); \
   GC_words_allocd += (long)ALIGNED_WORDS( size ); \
   FASTUNLOCK(); \
   \
   res = (obj_t)op;

#define GC_INLINE_ALLOC_7xx( res, size, default_alloc ) \
   void *op; \
   void **opp; \
   size_t lg; \
   DCL_LOCK_STATE; \
   \
   lg = GC_size_map[ size ]; \
   opp = (void **)&(GC_objfreelist[ lg ]); \
   LOCK(); \
   \
   if( EXPECT((op = *opp) == 0, 0) ) { \
      UNLOCK(); \
      return default_alloc; \
   } \
   *opp = obj_link( op ); \
   GC_bytes_allocd += GRANULES_TO_BYTES( lg ); \
   UNLOCK(); \
   \
   res = (obj_t)op;

#if( BGL_GC_VERSION < 700 )
# define GC_INLINE_ALLOC GC_INLINE_ALLOC_6xx
#else
# define GC_INLINE_ALLOC GC_INLINE_ALLOC_7xx
#endif

GC_API obj_t
make_pair( obj_t car, obj_t cdr ) {
   obj_t pair;

   GC_INLINE_ALLOC( pair, PAIR_SIZE, alloc_make_pair( car, cdr ) );

#if( !defined( TAG_PAIR ) )
   pair->pair_t.header = MAKE_HEADER( PAIR_TYPE, PAIR_SIZE );
#endif
   pair->pair_t.car = car;
   pair->pair_t.cdr = cdr;

   return BPAIR( pair );
}
-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----
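
The fast path that both macros share (pop the head of a per-size free list, otherwise fall back to the out-of-line allocator) can be sketched standalone as follows. This is only an illustration under assumptions: `N_CLASSES`, `GRANULE`, `free_lists`, `push_free`, `slow_alloc` and `fast_alloc` are hypothetical names, `malloc` stands in for the collector's slow path, and locking is left out entirely.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define N_CLASSES 8    /* hypothetical number of size classes */
#define GRANULE   16   /* hypothetical bytes per granule      */

static void  *free_lists[N_CLASSES];  /* head of the free list per size class */
static size_t bytes_allocd;           /* bytes handed out by the fast path    */

/* Put an object back on its free list; its first word stores the link. */
static void push_free(size_t lg, void *obj)
{
    assert(lg >= 1 && lg < N_CLASSES);
    *(void **)obj = free_lists[lg];
    free_lists[lg] = obj;
}

/* Stand-in for the out-of-line allocator (alloc_make_pair in the post). */
static void *slow_alloc(size_t lg)
{
    return malloc(lg * GRANULE);
}

/* Fast path: pop the head of the free list, or fall back when it is empty. */
static void *fast_alloc(size_t lg)
{
    void *op;

    assert(lg >= 1 && lg < N_CLASSES);
    op = free_lists[lg];
    if (op == NULL)
        return slow_alloc(lg);
    free_lists[lg] = *(void **)op;   /* the link stored in the first word */
    bytes_allocd += lg * GRANULE;
    return op;
}
```

After two `push_free(2, ...)` calls, two `fast_alloc(2)` calls return the objects in reverse order and a third call takes the slow path. The point of the sketch is that the fast path is only a handful of loads and stores, so anything that changes how often it falls through to the slow path shows up quickly in a list-heavy benchmark.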

If you think that might help, tomorrow I can try to do a little bit of
profiling in order to understand where the performance difference for
queens comes from.
--
Manuel
Ivan Maidanski
2010-12-13 19:46:30 UTC
Hi Manuel,

Yes, as we can see from the latest benchmark result, the degradation comes from your switch from Makefile.direct to the auto-generated one. So, it should be easy, I hope, to find which compiler flag is to blame...

Regards.
Post by Ludovic Courtès
Hi Ivan,
ok. Try this one (benchmark gc71 vs gc71+test6_patch).
The patch contains only all differences in .c/h files between v71 and
v72a2 (excluding changes in atomic_ops).
Regards.
benchmark 7.2a4 7.2a2 7.1 7.0 7.0a7 6.8 7.1+ivan-30nov 7.2a2-test2 7.2a2-test3 7.2a2-test4 7.1-test6
bague 0.76 0.77 0.77 0.76 0.77 0.77 0.77 0.77 0.77 0.77 0.77
beval 1.33 1.41 1.29 1.41 1.29 1.31 1.44 1.42 1.42 1.47 1.45
boyer 2.23 2.23 2.13 2.14 2.13 2.15 2.13 2.24 2.23 2.24 2.13
cgc 0.47 0.48 0.48 0.47 0.48 0.46 0.47 0.48 0.49 0.47 0.48
conform 1.91 1.91 1.74 1.72 1.73 1.79 1.71 1.92 1.92 1.88 1.74
earley 2.49 2.50 2.08 2.13 2.09 2.23 2.09 2.52 2.52 2.5 2.11
fib 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01
fft 2.51 2.52 2.52 2.50 2.52 2.49 2.5 2.52 2.52 2.51 2.51
leval 1.12 1.13 1.05 1.01 1.02 1.09 1.02 1.14 1.13 1.12 1.02
maze 1.67 1.40 1.36 1.35 1.26 1.39 1.35 1.38 1.39 1.44 1.36
mbrot 7.03 7.05 7.04 7.03 7.05 7.05 7.02 7.06 7.07 7.05 7.04
nucleic 1.18 1.20 1.20 1.16 1.16 1.34 1.17 1.2 1.21 1.18 1.17
peval 1.46 1.47 1.20 1.19 1.20 1.18 1.2 1.47 1.49 1.46 1.22
puzzle 1.96 1.92 1.97 1.96 1.92 1.93 1.92 1.93 1.92 1.94 1.93
queens 2.29 2.29 1.55 1.56 1.55 1.44 1.56 2.36 2.36 2.31 1.56
qsort 1.65 1.64 1.63 1.62 1.63 1.63 1.63 1.65 1.65 1.65 1.63
rgc 1.28 1.28 1.23 1.23 1.24 1.28 1.23 1.29 1.28 1.29 1.23
sieve 1.58 1.60 1.44 1.42 1.41 1.51 1.43 1.59 1.59 1.6 1.43
traverse 5.14 5.15 3.55 3.60 3.56 3.58 3.59 5.13 5.15 5.15 3.58
almabench 1.45 1.45 1.45 1.45 1.45 1.46 1.45 1.46 1.46 1.45 1.47
SUM 39.52 39.41 35.69 35.72 35.47 36.09 35.69 39.54 39.58 39.49 35.84
So, do I understand you correctly when I think that the problem is
coming from the compilation flags? Once again, prior to 7.2, we were
using Makefile.direct which was easier to modify.
M***@inria.fr
2010-12-13 20:09:51 UTC
Post by Ivan Maidanski
Yes, as we can see from the latest benchmark result, the degradation
comes from that you switched from Makefile.direct to the auto-generated
one. So, it will be easy, I hope, to find what compiler flag to blame...
Yes, probably, but I have checked without being able to spot the
difference (apart from -fexceptions that we have already
checked). Libtool adds some complexity so it's not absolutely
straightforward (at least for me) to understand where the two
compilation processes differ.
--
Manuel
Ivan Maidanski
2010-12-13 20:56:51 UTC
Hi,

As you previously pointed out:
Post by M***@inria.fr
gcc -DPACKAGE_NAME=\"gc\" -DPACKAGE_TARNAME=\"gc\"
-DPACKAGE_VERSION=\"7.2alpha2\" "-DPACKAGE_STRING=\"gc 7.2alpha2\""
-DPACKAGE_BUGREPORT=\"***@hp.com\"
-DGC_VERSION_MAJOR=7 -DGC_VERSION_MINOR=2 -DGC_ALPHA_VERSION=2
-DPACKAGE=\"gc\" -DVERSION=\"7.2alpha2\" -DSTDC_HEADERS=1
-DHAVE_SYS_TYPES_H=1 -DHAVE_SYS_STAT_H=1 -DHAVE_STDLIB_H=1 -DHAVE_STRING_H=1
-DHAVE_MEMORY_H=1 -DHAVE_STRINGS_H=1 -DHAVE_INTTYPES_H=1 -DHAVE_STDINT_H=1
-DHAVE_UNISTD_H=1 -DHAVE_DLFCN_H=1 -DNO_EXECUTE_PERMISSION=1
-DALL_INTERIOR_POINTERS=1 -DATOMIC_UNCOLLECTABLE=1 -I./include -fexceptions -I
libatomic_ops/src -O3 -DNO_DEBUGGING -Iinclude -Ilibatomic_ops-install/include
-DFINALIZE_ON_DEMAND
-I/misc/lab/bigloo/bench/3.5b-7.2alpha2-test2/bigloo3.5b/lib/3.5b -fPIC -fPIC
-I/misc/lab/bigloo/bench/3.5b-7.2alpha2-test2/bigloo3.5b/lib/3.5b -MT alloc.lo
-MD -MP -MF .deps/alloc.Tpo -c alloc.c -o alloc.o >/dev/null 2>&1
gcc -O3 -DNO_DEBUGGING -Iinclude -Ilibatomic_ops-install/include
-DFINALIZE_ON_DEMAND -I/misc/lab/bigloo/bench/3.5b-7.1/bigloo3.5b/lib/3.5b
-fPIC -c -o alloc.o alloc.c
So, the difference in options that might influence the speed is that v7.2 has some additional ones:
-DNO_EXECUTE_PERMISSION -DALL_INTERIOR_POINTERS -DATOMIC_UNCOLLECTABLE.

You already tested NO_EXECUTE_PERMISSION, so please test the remaining 2 as well...

Regards.
M***@inria.fr
2010-12-14 06:45:00 UTC
Hurrah. You got it!

Without -DNO_EXECUTE_PERMISSION -DALL_INTERIOR_POINTERS
-DATOMIC_UNCOLLECTABLE here is what we get:

benchmark 7.2a4 7.2a2 7.1 7.0 7.0a7 6.8 7.1+ivan-30nov 7.2a2-test2 7.2a2-test3 7.2a2-test4 7.2a4-sans
bague 0.76 0.77 0.77 0.76 0.77 0.77 0.77 0.77 0.77 0.77 0.76
beval 1.33 1.41 1.29 1.41 1.29 1.31 1.44 1.42 1.42 1.47 1.43
boyer 2.23 2.23 2.13 2.14 2.13 2.15 2.13 2.24 2.23 2.24 2.14
cgc 0.47 0.48 0.48 0.47 0.48 0.46 0.47 0.48 0.49 0.47 0.47
conform 1.91 1.91 1.74 1.72 1.73 1.79 1.71 1.92 1.92 1.88 1.72
earley 2.49 2.50 2.08 2.13 2.09 2.23 2.09 2.52 2.52 2.5 2.1
fib 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01
fft 2.51 2.52 2.52 2.50 2.52 2.49 2.5 2.52 2.52 2.51 2.53
leval 1.12 1.13 1.05 1.01 1.02 1.09 1.02 1.14 1.13 1.12 1.03
maze 1.67 1.40 1.36 1.35 1.26 1.39 1.35 1.38 1.39 1.44 1.36
mbrot 7.03 7.05 7.04 7.03 7.05 7.05 7.02 7.06 7.07 7.05 7.04
nucleic 1.18 1.20 1.20 1.16 1.16 1.34 1.17 1.2 1.21 1.18 1.17
peval 1.46 1.47 1.20 1.19 1.20 1.18 1.2 1.47 1.49 1.46 1.2
puzzle 1.96 1.92 1.97 1.96 1.92 1.93 1.92 1.93 1.92 1.94 1.97
queens 2.29 2.29 1.55 1.56 1.55 1.44 1.56 2.36 2.36 2.31 1.56
qsort 1.65 1.64 1.63 1.62 1.63 1.63 1.63 1.65 1.65 1.65 1.62
rgc 1.28 1.28 1.23 1.23 1.24 1.28 1.23 1.29 1.28 1.29 1.23
sieve 1.58 1.60 1.44 1.42 1.41 1.51 1.43 1.59 1.59 1.6 1.42
traverse 5.14 5.15 3.55 3.60 3.56 3.58 3.59 5.13 5.15 5.15 3.62
almabench 1.45 1.45 1.45 1.45 1.45 1.46 1.45 1.46 1.46 1.45 1.46
SUM 39.52 39.41 35.69 35.72 35.47 36.09 35.69 39.54 39.58 39.49 35.84

One of these options causes the performance slowdown. I will add
a new patch for Bigloo for removing them at once. Many thanks
for your help.
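
To put the table in relative terms, here is a small helper for the arithmetic. It is only a sketch; `slowdown_percent` is a hypothetical name, and the inputs in the note below are taken straight from the 7.2a4 and 7.2a4-sans columns.

```c
#include <assert.h>

/* Relative slowdown of `slow` over `fast`, in percent. */
static double slowdown_percent(double slow, double fast)
{
    assert(fast > 0.0);
    return (slow - fast) / fast * 100.0;
}
```

Fed the figures above, `slowdown_percent(2.29, 1.56)` gives roughly 47% for queens, `slowdown_percent(5.14, 3.62)` roughly 42% for traverse, and `slowdown_percent(39.52, 35.84)` about 10% for SUM.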

Sincerely,
--
Manuel
Александр Петросян (PAF)
2010-12-14 10:02:12 UTC
Manuel, colleagues,
clearly we're all interested in one-by-one elimination:
1) -DNO_EXECUTE_PERMISSION -DATOMIC_UNCOLLECTABLE
(without -DALL_INTERIOR_POINTERS)
2) -DALL_INTERIOR_POINTERS -DATOMIC_UNCOLLECTABLE
(without -DNO_EXECUTE_PERMISSION)
3) -DNO_EXECUTE_PERMISSION -DALL_INTERIOR_POINTERS
(without -DATOMIC_UNCOLLECTABLE)

There are more combinations, but let's hope you'll hit the answer in these three.
I strongly suspect the 1st one ;)

Alexander Petrossian (PAF), Russia, Moscow
Ludovic Courtès
2010-12-14 13:19:03 UTC
Hi,
Post by M***@inria.fr
Without -DNO_EXECUTE_PERMISSION -DALL_INTERIOR_POINTERS
I’d think that only ALL_INTERIOR_POINTERS can have an impact here.
Can you confirm?

Also, disabling it at run-time with ‘GC_all_interior_pointers = 0’
should give the same result.

Thanks,
Ludo’.
Ivan Maidanski
2010-12-14 19:29:42 UTC
Hi all,

I think the same (ALL_INTERIOR_POINTERS slows down the benchmark).

I think -D ALL_INTERIOR_POINTERS should be present by default when building the collector, but an application that does not need pointers to objects' interiors to be recognized should call GC_set_all_interior_pointers(0) at runtime before GC_INIT().
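
A minimal sketch of that start-up order. It assumes the collector's public header is reachable as `<gc.h>` and that the library links as `-lgc` when built with the hypothetical `USE_BDWGC` macro; without that macro, small stand-ins (not the real library) are used so the sketch compiles on its own. `app_gc_start` and `app_alloc_pair` are hypothetical application functions.

```c
#include <assert.h>
#include <stddef.h>

#ifdef USE_BDWGC
#  include <gc.h>       /* real collector: build with -DUSE_BDWGC and link -lgc */
#else
/* Stand-ins, NOT the real library, so the sketch builds without libgc. */
#  include <stdlib.h>
static int stub_all_interior = 1;   /* mimics a build with -DALL_INTERIOR_POINTERS */
static void GC_set_all_interior_pointers(int v) { stub_all_interior = v; }
static int  GC_get_all_interior_pointers(void) { return stub_all_interior; }
#  define GC_INIT()     ((void)0)
#  define GC_MALLOC(n)  malloc(n)
#endif

/* Hypothetical application start-up hook: switch interior-pointer
   recognition off, then initialize the collector.
   Returns the setting in effect afterwards. */
static int app_gc_start(void)
{
    GC_set_all_interior_pointers(0);   /* must come before GC_INIT() */
    GC_INIT();
    return GC_get_all_interior_pointers();
}

/* Allocate one collectable two-word object after start-up. */
static void *app_alloc_pair(void)
{
    void *p = GC_MALLOC(2 * sizeof(void *));

    assert(p != NULL);
    return p;
}
```

The application calls `app_gc_start()` once, before its first allocation. This only suits programs that always keep a pointer to the start of every live object.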

PS. The presence of NO_EXECUTE_PERMISSION should typically have a positive effect on speed, I think.

Regards.
Bruce Hoult
2010-12-15 00:04:07 UTC
Post by Ivan Maidanski
I think -D ALL_INTERIOR_POINTERS should be present by default when building the collector
I can't see any situation in which -D ALL_INTERIOR_POINTERS is needed
for correctness other than:

- in the presence of multiple inheritance in C++, in which case a
pointer to a base class (other than the first) may be the only pointer
to an object.

- in code in which it is possible to take the address of a struct
member or array element and this is the only reference to the object.

In particular, anyone writing a compiler for a language without these
features (which includes Scheme) should be able to safely turn off -D
ALL_INTERIOR_POINTERS unless they are doing something strange in
hand-written libraries.
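
To make "interior pointer" concrete, here is a small sketch of the second case (the address of a struct member). The `pair` layout and both helper functions are hypothetical and are not taken from any of the runtimes discussed here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical object layout: a header word followed by two fields. */
typedef struct {
    long header;
    void *car;
    void *cdr;
} pair;

/* Nonzero if p points inside the object of `size` bytes at `base`, but not
   at its first byte. Only meaningful when p really points into (or just
   past) that object, since C only defines ordering within one object. */
static int is_interior_pointer(const void *base, size_t size, const void *p)
{
    const char *b = (const char *)base;
    const char *q = (const char *)p;

    assert(size > 0);
    return q > b && q < b + size;
}

/* Address of the cdr field: if this is the only surviving reference, the
   object stays alive only when interior pointers are recognized. */
static void **cdr_slot(pair *obj)
{
    return &obj->cdr;
}
```

A pointer to the object itself is not interior, while the result of `cdr_slot()` is. A runtime that only ever stores base pointers (possibly with a fixed tag offset) never produces the second kind, which is the situation in which the option can be switched off.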
Carsten Kehler Holst
2010-12-15 07:17:05 UTC
We need ALL_INTERIOR_POINTERS both for objects (yes, we have multiple inheritance) and strings.

We have used ALL_INTERIOR_POINTERS both in 6.8 and now in 7.2a4, and we experience the degradation. I tried turning ALL_INTERIOR_POINTERS off, but then our programs won't run, so unfortunately I can't give you numbers for that.

I hope Manuel will run his test not only with the different settings but also with 6.8 with ALL_INTERIOR_POINTERS turned on.

Regards
Carsten
Ludovic Courtès
2010-12-15 10:16:10 UTC
Permalink
Hi,
Post by Bruce Hoult
In particular, anyone writing a compiler for a language without these
features (which includes Scheme) should be able to safely turn off -D
ALL_INTERIOR_POINTERS unless they are doing something strange in
hand-written libraries.
Guile happily turns it off and registers a few displacements for
specific cases.
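
As a rough illustration of what "turning it off and registering a few displacements" means, here is a toy C model of the recognition rule. This is a sketch under stated assumptions, not bdwgc's implementation: every `toy_` name is hypothetical, and the real collector's data structures are different. The corresponding real entry points in gc.h are GC_set_all_interior_pointers and GC_register_displacement.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_DISPLACEMENTS 8

/* Toy stand-in for the collector's settings; not bdwgc's data structures. */
struct toy_config {
    int all_interior_pointers;               /* 1: any offset inside the object counts */
    size_t displacements[MAX_DISPLACEMENTS]; /* extra offsets accepted when the flag is 0 */
    size_t n_displacements;
};

/* Record one extra accepted offset; returns 0 when the table is full. */
int toy_register_displacement(struct toy_config *cfg, size_t offset)
{
    assert(cfg != NULL);
    if (cfg->n_displacements == MAX_DISPLACEMENTS)
        return 0;
    cfg->displacements[cfg->n_displacements++] = offset;
    return 1;
}

/* Would a word holding base+offset keep an object of `size` bytes alive? */
int toy_pointer_is_recognized(const struct toy_config *cfg,
                              size_t size, size_t offset)
{
    size_t i;

    assert(cfg != NULL);
    if (offset >= size)
        return 0;   /* outside the object */
    if (offset == 0 || cfg->all_interior_pointers)
        return 1;   /* base pointer, or every interior offset is allowed */
    for (i = 0; i < cfg->n_displacements; i++)
        if (cfg->displacements[i] == offset)
            return 1;   /* explicitly registered offset */
    return 0;
}
```

With a zero-initialized `struct toy_config`, only offset 0 is recognized; after `toy_register_displacement(&cfg, 8)`, offset 8 is recognized as well, while other interior offsets are still ignored. Setting `all_interior_pointers` to 1 accepts every offset inside the object, which is the more permissive setting the benchmarks in this thread point to as the slow one.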

There’s one case where it would have helped, though, which is the
implementation of vlists (see
<https://secure.wikimedia.org/wikipedia/en/wiki/VList>).

Thanks,
Ludo’.
M***@inria.fr
2010-12-17 16:53:42 UTC
Permalink
Post by Ivan Maidanski
I think the same (ALL_INTERIOR_POINTERS slows down the benchmark).
I think -D ALL_INTERIOR_POINTERS should be present by default when building the collector, but an application that does not need pointers to objects' interiors to be recognized should call GC_set_all_interior_pointers(0) at runtime before GC_INIT().
PS. The presence of NO_EXECUTE_PERMISSION should typically improve the speed, I think.
I confirm that too. ALL_INTERIOR_POINTERS is the one that slows down the
performance. The other two have no impact.

Cheers,
--
Manuel
Carsten Kehler Holst
2010-12-17 18:49:32 UTC
Permalink
Did you try 6.8 with ALL_INTERIOR_POINTERS turned on?
Our problem is that we have had it on in both 6.8 and 7.2a4.
It is quite possible that the problem has to do with ALL_INTERIOR_POINTERS, but something must have changed between 6.8 and 7.2.

Regards
Carsten
Visual Prolog Team
Ivan Maidanski
2010-12-17 20:03:36 UTC
Permalink
Hi Carsten,

1. What about GC v7.1? Please tell us between which official releases you see the speed degradation.

2. Please tell us the flags passed when building GC (in both cases).

Regards.
djamel magri
2010-12-20 14:27:49 UTC
Permalink
Hi Ivan,

Any idea when the 7.2 stable version will be released?

Thanks
Ivan Maidanski
2010-12-20 17:52:16 UTC
Permalink
Hi,

Yes, we need at least another alpha release (the previous one is a year old). Anyway, it's up to Hans to decide...

Regards.