Discussion:
[Gc] a bug in v.7.1 ?
Glauco Masotti
2011-03-01 16:35:21 UTC
Hello. I am back here after some years.
The reason is that, after using gc-7.1 without any problem for nearly 2 years, I have hit an error that I think can only be explained by a bug in the collector.
The code is complex, so it's not easy to reproduce the bug in a simple example.

It happens that I get an allocated array overwritten when writing another array!
They should have nothing to do with each other, but in fact they map to overlapping addresses in memory!

Well, I don't see how this can happen unless the collector erroneously considers the space of the former array to be no longer in use
and thus frees it, so that this space is then allocated to the latter array.
I got the same error linking with gc-6.6, while version 6.2 works fine (I tried with MUNMAP enabled).
I would like to know if anyone has faced similar behavior, and thus if this is a known problem,
and if corrections concerning this have been made in the latest v. 7.2, so that I can try it with reasonable confidence.

Comments and suggestions will be greatly appreciated.

Best regards,
--- Glauco Masotti
Bruce Hoult
2011-03-01 17:38:45 UTC
Post by Glauco Masotti
It happens that I get an allocated array overwritten when writing another array!
They should have nothing to do with each other, but in fact they map to
overlapping addresses in memory!
Well, I don't see how this can happen if not for the fact that the collector
considers erroneously the space of the former array as no more used
and thus frees it, so that this space is allocated to the latter array.
Disable GC_free() (for example with "#define GC_free(p) 0") and see if
it still happens.
Glauco Masotti
2011-03-01 19:03:47 UTC
Hi Bruce.

I did more than that: I disabled the GC!
A lot of the code has never been tested in this condition, because for years I have been running with the GC;
however, I continued to write the code as if manual memory management (MMM) could also be used.
Maybe I have some leaks, but so far everything seems to work! I am impressed :-)

But I am so surprised that this problem came up after using 7.1 for years!
6.6 behaves the same, while 6.2, contrary to what I said in my previous mail, only delays the problem.
So, if it's a bug, it has been there for a long time, and it's very strange that no one has found it before.
If it's not a bug, what can be the cause of this problem, given that with MMM the program behaves correctly?!

--- GM

FYI: I am using VC++ 6.0 with Win XP SP3.

Bruce Hoult
2011-03-01 22:14:24 UTC
Post by Glauco Masotti
Hi Bruce.
I made more than that: I disabled the GC! A lot of the code has never been
tested in this condition, because for years I used to go with the GC,
however I continued to write the code as if  manual memory management (MMM)
could also be used.
Maybe I have some leaks, but hitherto everything seems to go right! I am impressed :-)
But I am so surprised that this problem came out after using 7.1 for years!
As for 6.6 it behaves the same, while 6.2, contrarily to what I said in my
previous mail, only delays the problem.
So, if it's a bug it has been there for a long time, so it's very strange
that none has found it before.
If it's not a bug, what can be the cause of this problem, given that with
MMM the program behaves correctly?!
If your code works with malloc() and free() then my guess that you're
using GC_free() is correct?

When you use the GC, are you mapping free() to GC_free() or to no-op?
Glauco Masotti
2011-03-02 09:00:21 UTC
I wish it were so, Bruce.

I normally map free to no-op.

There are only a few selected GC_FREE calls in the code, for some large arrays (allocated with malloc_atomic).
This part of the code has been there for several years and is thoroughly tested, so I cannot believe it's the cause of this
malfunction.
In any case, if I had placed some frees in excess, this would compromise MMM as well, wouldn't it?
On the contrary, the program works fine with MMM (although I verified that there are several leaks, so that the heap gets
huge quickly),
whereas with the GC this problem showed up.

Some 3-4 years ago I dug into MM and GC. I also found some problems/bugs with older versions of the collector,
then I relied on 7.1, as I said, without encountering problems until a few days ago.
Now I am blocked by this bug, and I still have no idea where it comes from.

Maybe my memory of all the issues involved with MM and GC has faded a bit.
Does anyone have other suggestions to debug and possibly solve a case like this?

--- GM


Bruce Hoult
2011-03-02 09:17:58 UTC
Post by Glauco Masotti
I would like it were so Bruce.
I normally map free to no-op.
There are only a few selected GC_FREE calls in the code, for some large
arrays (allocated with malloc_atomic).
This part of the code is there since several years, and super-tested, so I
cannot believe it's the cause of this malfunction.
OK, but have you actually tried it without them?
Post by Glauco Masotti
In any case, if I placed some free in excess, this would compromise MMM as well, isn't it?
Maybe, maybe not.

Use-after-free bugs can be a problem in several ways.

Most obviously, the memory manager may actually have reused that memory
for something else. But merely adding the block to a free list can also
corrupt the first few bytes of the existing data. Many programs won't
actually access that data, or will silently produce incorrect results
if they do. Conversely, modifying those first few bytes through a stale
pointer corrupts the NEXT link, which means that when the block is
eventually used again the pointer to the first free item of that size
becomes junk.

There are a number of things in the GC that can affect the timing of
when any particular block of memory might be allocated again. For
example, for small objects, the exact choice of sizes that allocations
are rounded up to. Or, for large objects, a setting such as
GC_use_entire_heap. I'm sure there are others too. The defaults or
heuristics can easily change between versions and have no effect on
correct code, but result in buggy code causing problems at different
points in execution, or even not at all.
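
To illustrate the free-list point, here is a minimal sketch of a toy fixed-size allocator. It is not the collector's actual code: the names toy_init, toy_alloc, toy_free and the pool layout are invented for illustration. It shows how putting a block back on a free list overwrites the first pointer-sized bytes of the old data, while the rest of the stale data still looks valid.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BLOCK_SIZE 32
#define NUM_BLOCKS 4

/* Toy block: while free, its first bytes hold the link to the next free block. */
typedef union block {
    union block *next;               /* meaningful only while the block is free */
    unsigned char data[BLOCK_SIZE];  /* user data while the block is allocated  */
} block_t;

block_t pool[NUM_BLOCKS];
block_t *free_list = NULL;

/* Put every block of the pool on the free list. */
void toy_init(void)
{
    int i;
    free_list = NULL;
    for (i = NUM_BLOCKS - 1; i >= 0; i--) {
        pool[i].next = free_list;
        free_list = &pool[i];
    }
}

/* Pop the first free block, or return NULL if none is left. */
void *toy_alloc(void)
{
    block_t *b = free_list;
    if (b != NULL)
        free_list = b->next;
    return b;
}

/* Push the block back; this overwrites the first bytes of the user data. */
void toy_free(void *p)
{
    block_t *b = (block_t *)p;
    b->next = free_list;
    free_list = b;
}

/* Returns 1 if, after freeing, the block's first bytes hold the free-list
   link while its last byte still holds the old user data. */
int stale_block_is_partly_clobbered(void)
{
    unsigned char *a;
    block_t *expected_link;

    toy_init();
    a = (unsigned char *)toy_alloc();
    expected_link = free_list;        /* the value toy_free will store in the block */
    memset(a, 0xAB, BLOCK_SIZE);      /* "live" user data */
    toy_free(a);                      /* 'a' is now a stale pointer */

    return memcmp(a, &expected_link, sizeof expected_link) == 0
        && a[BLOCK_SIZE - 1] == 0xAB;
}
```

In this toy, the next toy_alloc() after the free hands out the same address again, which is the moment at which two unrelated arrays would appear to overlap.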
Glauco Masotti
2011-03-02 20:40:58 UTC
Alright Bruce, thanks for your suggestions.
I dusted off some debug tools I developed some years ago for occasions like this, and they indicate that memory gets
corrupted somewhere (some out-of-bounds write, I think), so it's very likely that the collector also gets fooled,
sooner or later.
It's very strange that no apparent malfunction showed up before, apart from this.
The code is quite complex and I fear I am facing a long and tedious debugging session.
Maybe I will also resort to the debug tools of the GC.
I will keep you and the folks out there informed if something relevant or instructive comes out of my labor.
Take care.
---- Glauco Masotti

Boehm, Hans
2011-03-04 20:19:38 UTC
I would also recommend, on general principles, trying the CVS version. There have been quite a few bug fixes since 7.1.

Hans
Glauco Masotti
2011-03-05 09:17:42 UTC
Thanks Hans.
The memory management errors that I have found so far were not critical, so the situation is not clear. I will dig into
this ASAP. Unfortunately, these days I am also busy with other urgent tasks.
--- Glauco Masotti

Glauco Masotti
2011-03-06 18:18:44 UTC
I am at it.
As suggested by Hans, I moved to the latest version.
The problem is still there, so the chances that it's a bug in my code are high.
In order to find it I also need to resort to the debugging facilities of the collector. But... I don't get anything in
the log file!?
I remember that with older versions (e.g. 6.2, 6.6) it was easy to get a very verbose gc.log file.
With 7.1 and 7.2 the default log file is <name of the executable>.log, right? But there is nothing in it, apart from what I
explicitly wrote there with GC_WARNING (!?).
I defined GC_DEBUG before including gc.h, and GC_QUIET=0.
The documentation apparently has nothing new regarding these issues.
What did I do wrong or forget to do?
Sorry to bother you, but perhaps I am getting too old for these tasks :-( and a simple hint from you could save me a lot
of time.
--- GM

Glauco Masotti
2011-03-07 19:00:25 UTC
I compiled the GC as a static library with GC_ASSERTIONS and GC_PRINT_VERBOSE_STATS defined, so I got the log. Nothing
wrong or alarming in it, however.

I followed the steps suggested in debugging.html, in the documentation.
After GC_gc_no++; I placed a call to GC_is_marked(GC_base(problematic_object));
where problematic_object is the object that gets overwritten.

problematic_object = vector(5100, 24359);

where vector is taken from "Numerical Recipes in C" and is defined as:

float *vector(long nl, long nh)
/* allocate a float vector with subscript range v[nl..nh] */
{
    float *v;
    v=(float *)malloc_atomic((size_t) ((nh-nl+1+NR_END)*sizeof(float)));
    if (!v) nrerror("allocation failure in vector()");
    return v-nl+NR_END;
}

It turns out that the object has been marked! Thus it's NOT about to be reclaimed, right?
So the object is still alive after the collection.
Moreover, it contains correct data until I get to a call to:

e = dvector(1, 999) // the double version of vector
This allocation causes problematic_object to be overwritten! <<<
How is it possible that malloc_atomic returns an address which is in the range of a live object?
Perhaps, at this point, the only explanation is that the data structures used by the allocator have been messed up.
But the GC senses everything as normal: no assertion is violated and no warning is issued!
What else can I try in order to find out (and possibly fix) what's wrong?
I need your help in order to proceed. I am at a dead end.

--- Glauco Masotti
Klaus Treichel
2011-03-07 19:48:53 UTC
Hi Glauco,
Post by Glauco Masotti
I compiled the GC as a static library with GC_ASSERTIONS, GC_PRINT_VERBOSE_STATS defined, so that I got the log. Nothing
wrong or alarming however.
I followed the steps suggested in debugging.html, in the documentation.
After GC_gc_no++; I placed a call to GC_is_marked(GC_base(problematic_object));
Where problematic_object is the object that I get overwritten.
problematic_object = vector(5100, 24359);
float *vector(long nl, long nh)
/* allocate a float vector with subscript range v[nl..nh] */
{
float *v;
v=(float *)malloc_atomic((size_t) ((nh-nl+1+NR_END)*sizeof(float)));
if (!v) nrerror("allocation failure in vector()");
return v-nl+NR_END;
}
Is it possible that the value returned is below the real start of the
float vector?
If so, then there is no pointer to the just-allocated memory and the
collector will reclaim that chunk of memory.
Post by Glauco Masotti
It results that the object has been marked! Thus it's NOT about to be reclaimed right?
So the object is still alive after the collection.
e = dvector(1, 999) // the double version of vector
This allocation causes problematic_object to be overwritten! <<<
How is it possible that malloc_atomic returns an address which is in the range of a live object?
Perhaps, at this point, the only explanation is that the data structures used by the allocator has been messed up.
But the GC senses everything as normal, no assertion is violated and no warning is issued!
What else can I try to find out (and possibly fix) what's wrong?
I need your help in order to proceed. I am at a dead point.
Klaus
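
As a worked example of that arithmetic, the following sketch computes, with plain integers rather than real pointers, where the value returned by vector() lands relative to the start of the allocated block. NR_END = 1 is the value mentioned later in this thread; the function names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define NR_END 1

/* Size in bytes of the block that vector(nl, nh) requests. */
long block_size_bytes(long nl, long nh)
{
    return (nh - nl + 1 + NR_END) * (long)sizeof(float);
}

/* Byte offset of the returned pointer (v - nl + NR_END) relative to the
   start of the block. A negative value means "before the block". */
long returned_offset_bytes(long nl)
{
    return (NR_END - nl) * (long)sizeof(float);
}

/* 1 if the returned pointer lies inside the block or one past its end. */
int returned_pointer_in_block(long nl, long nh)
{
    long off = returned_offset_bytes(nl);
    return off >= 0 && off <= block_size_bytes(nl, nh);
}
```

For vector(5100, 24359) and a 4-byte float, the block is 77044 bytes long and the returned pointer lies 20396 bytes before its start, so nothing the program holds points into the block. For nl = 0 or nl = 1 the returned pointer stays inside the block.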
Glauco Masotti
2011-03-07 23:17:54 UTC
Post by Klaus Treichel
Post by Glauco Masotti
Where problematic_object is the object that I get overwritten.
problematic_object = vector(5100, 24359);
float *vector(long nl, long nh)
/* allocate a float vector with subscript range v[nl..nh] */
{
float *v;
v=(float *)malloc_atomic((size_t) ((nh-nl+1+NR_END)*sizeof(float)));
if (!v) nrerror("allocation failure in vector()");
return v-nl+NR_END;
}
Is it possible that the value returned is below the real start of the
float vector?
If yes then there is no pointer to the just allocated memory and the
collector will reclaim that chunk of memory.
Thanks, Klaus, for the suggestion. I was thinking about this too. That is certainly the case!

However, this means that the standard routines of Numerical Recipes (NR) cannot be used with the collector!!!
I think that NR algorithms are pretty widespread, so it's surprising that nobody has hit this issue so far.
I haven't searched all the messages in the archive exhaustively, but perhaps Hans remembers whether this problem has
come up before.

I have a lot of stuff derived from NR algorithms, and I have been using it together with the GC for several years
without having encountered problems so far.
But it's true that the allocated vectors or matrices almost always have a starting index of 1, or sometimes 0.
In these cases (given NR_END = 1), the returned pointer either coincides with what is returned by malloc_atomic or
points inside that area, so this should be fine for the collector.
The allocation routines, however, are generalized to allow the definition of arrays with an arbitrary subscript range.
As long as we use MMM this causes no problem, although it's not ANSI compliant (there is a discussion of this issue in
the book).

However, this is the first time that I have defined a vector with a large initial subscript! And it turned out that the
collector gets fooled by this.
I modified the code to use a subscript range starting from 0, and everything ran smoothly!
So this bug and a solution have been found, but I should place limits on the allowed subscript range for the NR
allocation routines.

Given the popularity of NR code, maybe this case deserves a mention in the documentation. What do you think?

Best regards,
--- Glauco Masotti
Boehm, Hans
2011-03-08 01:51:56 UTC
Post by Glauco Masotti
Given the popularity of NR code, maybe this case deserves a citation in
the documentation. What do you think?
A good idea. I was aware of that issue with the NR code. As Bruce points out, the C and C++ standards have always been clear that this is not conforming, so it was really an unfortunate choice for expressing those algorithms.

Hans
Richard O'Keefe
2011-03-08 03:03:02 UTC
Post by Glauco Masotti
However this means that the standard routines of Numerical Recipes (NR) cannot be used with the collector!!!
This is hardly surprising. The authors deliberately chose to do something
that the C standard was careful (for excellent reason having nothing to do with GC)
to prohibit. That the routines work on any platform is a lucky accident.
Bruce Hoult
2011-03-07 23:38:59 UTC
 return v-nl+NR_END;
This is not standards-conformant C. Pointers to objects must point to
within the object, or 1 item past the end.

It will work fine on pretty much any modern machine and OS, but will
fail hard on one that uses segments and detects out of bounds on them.

Note that even with a traditional malloc(), you can't pass the return
value to free() – you have to pass ptr+nl-NR_END (i.e. v).

You need to make sure that v is stored somewhere that the GC can see
it. Probably the cleanest way is to add an extra argument to vector in
which you pass the address of a local variable which you can
subsequently ignore. This will work unless the array is passed out
from the function that allocates it.

Or, you can rewrite your code to only use arrays where nl is <=
NR_END, as you discovered.
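
A variation on the first option, sketched here under stated assumptions: rather than handing out the offset pointer at all, keep the real base pointer together with the lower bound and do the index shift on every access. The fvec type and its helpers are hypothetical names, and ALLOC_ATOMIC is a stand-in that maps to plain malloc() so the sketch builds without the collector; in the real program it would be the collector's atomic allocator (malloc_atomic in the posted code).

```c
#include <assert.h>
#include <stdlib.h>

#define NR_END 1

/* Stand-in for the collector's atomic allocation (assumption for this sketch). */
#define ALLOC_ATOMIC(n) malloc(n)

/* Hypothetical wrapper: the base pointer is stored as returned by the
   allocator, so a genuine pointer to the block is always visible. */
typedef struct {
    float *base;  /* start of the allocated block */
    long nl;      /* lowest valid subscript  */
    long nh;      /* highest valid subscript */
} fvec;

/* Allocate storage for subscripts nl..nh; returns 1 on success, 0 on failure. */
int fvec_init(fvec *fv, long nl, long nh)
{
    if (nh < nl)
        return 0;
    fv->base = (float *)ALLOC_ATOMIC((size_t)(nh - nl + 1 + NR_END) * sizeof(float));
    fv->nl = nl;
    fv->nh = nh;
    return fv->base != NULL;
}

/* Address of element i, for nl <= i <= nh. The shift is applied to the
   index, so no pointer outside the block is ever formed. */
float *fvec_at(const fvec *fv, long i)
{
    return &fv->base[i - fv->nl + NR_END];
}
```

The cost is an extra subtraction per access and a change to every call site that indexes the vector directly, which is the complication to weigh against simply restricting nl.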
Glauco Masotti
2011-03-08 08:39:19 UTC
First of all, let me thank all those who have helped to clarify this issue.
I am also glad that this will probably be highlighted in the documentation.
Post by Bruce Hoult
Pointers to objects must point to within the object, or 1 item past the end.
1 item past the end is fine? I forgot this detail. (***)
Post by Bruce Hoult
Note that even with a traditional malloc(), you can't pass the return
value to free() – you have to pass ptr+nl-NR_END (i.e. v).
In fact the corresponding free for use with MMM is defined as:

void free_vector(float *v, long nl, long nh)
/* free a float vector allocated with vector() */
{
free((FREE_ARG) (v+nl-NR_END), (nh-nl+1+NR_END)*sizeof(float));
}
Post by Bruce Hoult
You need to make sure that v is stored somewhere that the GC can see
it. Probably the cleanest way is to add an extra argument to vector in
which you pass the address of a local variable which you can
subsequently ignore. This will work unless the array is passed out
from the function that allocates it.
Hmm...
Post by Bruce Hoult
Or, you can rewrite your code to only use arrays where nl is <= NR_END, as you discovered.
I think this is preferable. The complications of the other case are not worth the trouble.
In principle, although somewhat weird, subscripts could even go negative, as long as (***) is satisfied.
But a safe and reasonable assumption could certainly be: 0 <= nl <= NR_END (in practice 0 or 1) :-|

--- Glauco Masotti
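
The restriction 0 <= nl <= NR_END could be enforced in the allocation routine itself. What follows is only a sketch: vector_checked and vector_base are hypothetical names, and ALLOC_ATOMIC maps to plain malloc() here so that it builds without the collector.

```c
#include <assert.h>
#include <stdlib.h>

#define NR_END 1

/* Stand-in for the collector's atomic allocation (assumption for this sketch). */
#define ALLOC_ATOMIC(n) malloc(n)

/* Like vector(), but refuses subscript ranges whose returned pointer would
   fall outside the allocated block. Returns NULL on a bad range or on
   allocation failure. */
float *vector_checked(long nl, long nh)
{
    float *v;

    if (nl < 0 || nl > NR_END || nh < nl)
        return NULL;
    v = (float *)ALLOC_ATOMIC((size_t)(nh - nl + 1 + NR_END) * sizeof(float));
    if (v == NULL)
        return NULL;
    /* The offset (NR_END - nl) is computed first and is 0..NR_END here,
       so the result always points at or inside the block. */
    return v + (NR_END - nl);
}

/* Recover the pointer originally returned by the allocator. */
float *vector_base(float *p, long nl)
{
    return p - (NR_END - nl);
}
```

With this check, a call such as vector_checked(5100, 24359) fails immediately instead of returning a pointer that nothing can trace back to the block.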

David Spencer
2011-03-08 00:01:42 UTC
Post by Glauco Masotti
I followed the steps suggested in debugging.html, in the
documentation.
After GC_gc_no++; I placed a call to
GC_is_marked(GC_base(problematic_object));
Where problematic_object is the object that I get overwritten.
problematic_object = vector(5100, 24359);
float *vector(long nl, long nh)
/* allocate a float vector with subscript range v[nl..nh] */
{
float *v;
v=(float *)malloc_atomic((size_t) ((nh-nl+1+NR_END)*sizeof(float)));
if (!v) nrerror("allocation failure in vector()");
return v-nl+NR_END;
}
Unless NR_END is greater than nl, the behavior of the expression in
the return statement is undefined. The result of the pointer
subtraction (v-nl+NR_END) does not point to the same object as the
pointer operand (v).

There is no valid pointer to the allocated object after the return.
The object is therefore highly problematic.

The bug is in the function, not the collector.