[Gc] using a lot of finalizers?

Discussion:

Basile Starynkevitch

2016-05-25 08:20:32 UTC

Hello,

In my MELT monitor (see https://github.com/bstarynk/melt-monitor-2015/
for details) I have a lot of finalizers. GC-wise, you can think of it as
a sort of Lisp interpreter, but multi-threaded, with a small thread pool
of about half a dozen threads.

Basically, I have (conceptually) a lot of immutable GC-ed values
(allocated with GC_MALLOC) and some mutable GC-ed "items" (also
allocated with GC_MALLOC). Those items are registering a finalizer with
GC_REGISTER_FINALIZER_IGNORE_SELF at creation time.

If I had a large GC heap (e.g. a dozen gigabytes), would it be
acceptable to have many items with finalizers (e.g. half a million)
and many more values (e.g. several million)?

So my question becomes: can I have many items, each having registered a
finalizer, or is it not acceptable performance-wise?

Or should I put a lot of design effort into avoiding finalizers?

My blind guess is that since "gc/gc_cpp.h" is using
GC_REGISTER_FINALIZER_IGNORE_SELF it should be acceptable to have a lot
of finalizers.

Regards.

--
Basile STARYNKEVITCH http://starynkevitch.net/Basile/
email: basile<at>starynkevitch<dot>net mobile: +33 6 8501 2359
8, rue de la Faiencerie, 92340 Bourg La Reine, France
*** opinions {are only mine, sont seulement les miennes} ***

Bruce Hoult

2016-05-25 09:32:52 UTC

Hi Basile,

There is no big scaling problem with having huge numbers of objects with
finalizers.

There is a hash table holding a reference to each object with a finalizer.
The table is grown as necessary, but access will remain O(1).

On each GC, every object in the hash table is walked to see whether it was
unreachable and needs finalizing. This is O(n) on the number of objects
with finalizers. The rest of the gc work is generally O(n) on the number of
reachable objects.

Objects with finalizers require two GC cycles to be collected. The
finalizer is run on the first GC, and the finalizer forgotten. On the 2nd
GC the object will (usually) be a normal unreachable object and be
collected.
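
The two-cycle behaviour described above can be illustrated with a toy model. This is only a sketch of the idea, not bdwgc's code: the names heap_t, obj_t, toy_collect and count_finalize are hypothetical, reachability is a flag set by hand instead of being computed by marking, and the model scans every object rather than a separate finalizer hash table.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { MAX_OBJS = 16 };

/* Hypothetical toy object: not a bdwgc data structure. */
typedef struct obj {
    bool live;                        /* still allocated */
    bool reachable;                   /* set by hand in this model */
    void (*finalizer)(struct obj *);  /* NULL when none is registered */
    int finalized_count;              /* how many times the finalizer ran */
} obj_t;

typedef struct {
    obj_t objs[MAX_OBJS];
    size_t n;                         /* number of slots in use */
} heap_t;

/* Example finalizer: just records that it ran. */
void count_finalize(obj_t *o) { o->finalized_count++; }

/* One simulated collection; returns how many objects were reclaimed. */
size_t toy_collect(heap_t *h) {
    assert(h != NULL && h->n <= MAX_OBJS);
    size_t reclaimed = 0;
    for (size_t i = 0; i < h->n; i++) {
        obj_t *o = &h->objs[i];
        if (!o->live || o->reachable)
            continue;
        if (o->finalizer != NULL) {
            /* First cycle: forget the finalizer, then run it.
               The object itself survives this collection. */
            void (*f)(obj_t *) = o->finalizer;
            o->finalizer = NULL;
            f(o);
        } else {
            /* No finalizer (or already run): reclaim now. */
            o->live = false;
            reclaimed++;
        }
    }
    return reclaimed;
}
```

In this model an unreachable object with a finalizer is only reclaimed by the second call to toy_collect. A finalizer that set reachable back to true would keep the object alive, which is one way to read the "(usually)" above.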

But the question that must be asked is: WHY do you want so many finalizers?
What will they do? This is usually a sign of a bad design. Finalizers
should usually be associated with resources external to the program: files,
or network ports, or GUI windows or the like.

Basile Starynkevitch

2016-05-25 17:14:49 UTC

To be more precise: assuming you know a tiny bit of Scheme (or Lisp),
each MELT-monitor item is a Scheme-symbol-like thing. It has a unique
printable name. For example, both im and im__6u7e2UmXWwKFsm are printable
names of items (and I will refer here to a given item by its printable
name). They both share the same radix "im". The first item im has no
suffix, but the second item im__6u7e2UmXWwKFsm has __6u7e2UmXWwKFsm as
a suffix. Actually, the suffix is some mangling (or name encoding) of a
random 96-bit suffix-number (with 0 corresponding to the lack of suffix).
That suffix is unique for a given radix. So for the radix "ix" there is
only one item of suffix __6u7e2UmXWwKFsm.

The printable names of items are unique, and are useful to persist a
subpart of the heap in some textual file(s). I want to be able to find
an item, given its radix and its suffix-number.

Post by Bruce Hoult
But the question that must be asked is: WHY do you want so many
finalizers? What will they do?
This is usually a sign of a bad design.

I have a "weak symbol table" mapping radix & suffix-number to symbols.
So I am able, given a string like "im__I am managing it by having,
inside each radix, some big array (allocated with GC_MALLOC_ATOMIC)
holding some hash-table. The item's finalizer would remove the entry in
that hash-table.

Since each item is finalized, I am sometimes able to manage payloads
inside items that themselves need finalization, such as files or HTML
identifiers.

Do you have some other suggestion to implement that?

Regards.

Basile Starynkevitch

2016-05-25 17:37:19 UTC

On 05/25/2016 07:14 PM, Basile Starynkevitch wrote:
[...]

Post by Bruce Hoult
But the question that must be asked is: WHY do you want so many
finalizers? What will they do?
This is usually a sign of a bad design.

Sorry for the garbled paragraph in my previous message. It should read:

I have a "weak symbol table" mapping radix & suffix-number to symbols.
So I am able, given a string like "im__6u7e2UmXWwKFsm", to find the item
with that printable name (by first converting the suffix
"__6u7e2UmXWwKFsm" to the equivalent suffix-number). I manage that
"weak symbol table" by having, inside each radix, a big array
(allocated with GC_MALLOC_ATOMIC) holding a hash-table. And of course
I have a dictionary of radixes. The item's finalizer removes the
entry in that hash-table (specific to its radix).
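
A minimal sketch of such a per-radix table, under stated assumptions: the types suffix_t, item_t and radix_t and all function names are hypothetical, the 96-bit suffix-number is split into a 32-bit and a 64-bit word, the table has a fixed power-of-two capacity with linear probing, and plain calloc stands in for GC_MALLOC_ATOMIC so that the example compiles without the collector. With GC_MALLOC_ATOMIC the slot array is not scanned for pointers, which is what makes the table weak. item_finalizer has the (object, client data) shape expected of a bdwgc finalizer, but it is not registered with the collector here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical 96-bit suffix-number; {0, 0} means "no suffix". */
typedef struct { uint32_t hi; uint64_t lo; } suffix_t;

typedef struct radix radix_t;
typedef struct item { suffix_t suffix; radix_t *radix; } item_t;

struct radix {
    size_t capacity;   /* power of two, fixed in this sketch */
    size_t count;      /* number of items currently registered */
    item_t **slots;    /* NULL = empty, TOMBSTONE = removed */
};

#define TOMBSTONE ((item_t *)(uintptr_t)1)

static size_t suffix_hash(suffix_t s) {
    uint64_t h = s.lo ^ ((uint64_t)s.hi * 0x9E3779B97F4A7C15ULL);
    h ^= h >> 29;
    return (size_t)h;
}

static int suffix_eq(suffix_t a, suffix_t b) {
    return a.hi == b.hi && a.lo == b.lo;
}

radix_t *radix_new(size_t capacity) {
    assert(capacity != 0 && (capacity & (capacity - 1)) == 0);
    radix_t *r = malloc(sizeof *r);
    if (r == NULL) return NULL;
    /* Assumption: the real code would use GC_MALLOC_ATOMIC here. */
    r->slots = calloc(capacity, sizeof *r->slots);
    if (r->slots == NULL) { free(r); return NULL; }
    r->capacity = capacity;
    r->count = 0;
    return r;
}

/* Find the item with the given suffix-number, or NULL. */
item_t *radix_find(const radix_t *r, suffix_t s) {
    size_t mask = r->capacity - 1, i = suffix_hash(s) & mask;
    for (size_t n = 0; n < r->capacity; n++, i = (i + 1) & mask) {
        item_t *it = r->slots[i];
        if (it == NULL) return NULL;
        if (it != TOMBSTONE && suffix_eq(it->suffix, s)) return it;
    }
    return NULL;
}

/* Returns 1 on success, 0 if the suffix is taken or the table is full. */
int radix_insert(radix_t *r, item_t *it) {
    if (radix_find(r, it->suffix) != NULL) return 0;
    size_t mask = r->capacity - 1, i = suffix_hash(it->suffix) & mask;
    for (size_t n = 0; n < r->capacity; n++, i = (i + 1) & mask) {
        if (r->slots[i] == NULL || r->slots[i] == TOMBSTONE) {
            r->slots[i] = it;
            it->radix = r;
            r->count++;
            return 1;
        }
    }
    return 0;
}

/* Finalizer-shaped callback: removes the item from its radix table. */
void item_finalizer(void *obj, void *client_data) {
    item_t *it = obj;
    radix_t *r = it->radix;
    (void)client_data;
    if (r == NULL) return;
    size_t mask = r->capacity - 1, i = suffix_hash(it->suffix) & mask;
    for (size_t n = 0; n < r->capacity; n++, i = (i + 1) & mask) {
        if (r->slots[i] == NULL) return;
        if (r->slots[i] == it) {
            r->slots[i] = TOMBSTONE;
            r->count--;
            it->radix = NULL;
            return;
        }
    }
}
```

A real version would also need to grow the table and, with a thread pool, to protect each radix with a lock, since lookups and finalizers could otherwise touch the same table concurrently.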

Post by Basile Starynkevitch
Since each item is finalized, I am sometimes able to manage payloads
inside items that themselves need finalization, such as files or HTML
identifiers.
Do you have some other suggestion to implement that?

Regards, thanks for reading!