Hi Ivan,
Thanks for the response!
Ivan Maidanski writes:
> Could you please summarize your past writings about the
> implementation difficulties? Thanks.
Last weekend I rewrote Guile's weak maps and weak sets, and so these
things are fresh in my mind.
1. You can't reliably use the "buckets and chains" strategy to
implement weak hash tables with libgc.
The reason is that with this strategy, creating an entry in the
table allocates memory: typically one spine segment and one rib
segment. You would then allocate the rib with a GC kind that only
marks the value if the key has not been nulled out.
However, if the GC collects the key and nulls out one or both parts
of the rib, or even the link from the spine to the rib, you are
still left with at least an extra spine segment.
You can lazily collect spine segments, but that will only happen
when your hash table gets full, relatively speaking. Or you can go
over the whole table and remove dead ribs and their corresponding
spine segments, but this is expensive. However, when would you do
this? The right time to do it is after GC, when disappearing links
are nulled out, but there is no after-GC hook. More on this
later.
Consider a use pattern like:
  foo = new_weak_table ();
  while (1)
    weak_table_set (foo, new_key (), new_value ());
If you never go over the whole table to remove links, *there is
practically no upper bound on your memory usage*: you keep adding
entries to the table; GC happens less and less frequently relative
to the insertion rate, because the heap keeps growing (due to
spines, ribs, and the keys and values that haven't yet been
collected); and the dead mappings still take up memory (via the
spines and ribs). For some loads it is actually worse if you have
the ability to resize tables.
I haven't proven that result mathematically, but I'm convinced that
implementing weak maps as buckets-and-chains hash tables is a bad
idea with libgc. So the new weak maps in Guile are open-addressed
linear-probe tables with Robin Hood collision resolution, which
solves this dead-entries-occupying-memory issue. A sketch of the
buckets-and-chains entry follows this item.
2. Keeping an accurate idea of the load factor of the table is
important for knowing when to resize the table, either up or down.
However, if the GC is constantly removing entries, then your idea
of how many elements are in the table is imprecise. In a
multithreaded program there is no way to know how many elements are
actually in your table: you can know an upper bound, but not how
many of those elements have since been nulled out.
This is important in the workload from point (1). Say your load has
reached the point at which you would resize up. But what if you
don't actually need to, because enough entries have been nulled
out? Unfortunately you can't know that.
Also, it's not a good idea to traverse the table looking for nulled
entries before making the resize decision, and not only because
it's expensive to visit each entry. The real reason is that if a
large portion of your keys are really alive, you can get into
tortoise-and-hare races: reaping the dead entries frees just enough
slots to let you keep going, but you end up traversing the whole
table more times than you would have if you had just resized
anyway.
The only way to keep an accurate count in a weak table is to have
integration with the GC. If the GC nulls out an element in a
particular memory region, it could decrement a counter. But there
is no support for this in libgc.
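To illustrate, a sketch of the resize decision with invented names;
the point is that n_items can only ever be an upper bound:

#include <stddef.h>

#define MAX_LOAD 0.9

typedef struct weak_table {
  size_t size;      /* number of slots */
  size_t n_items;   /* insertions minus explicit deletions; the GC's
                       nulling of dead entries is invisible here */
  /* ... */
} weak_table;

static void grow (weak_table *);   /* hypothetical */

static void
maybe_resize_up (weak_table *t)
{
  if (t->n_items > (size_t) (t->size * MAX_LOAD))
    /* Some of n_items may already be dead.  Short of walking the
       whole table, or getting a callback from the GC, we can't
       know, so we may grow a table that is mostly tombstones. */
    grow (t);
}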
3. The proper time to update the count for a weak table is after GC,
as I have said. But libgc does not have any notifier that a GC has
happened. It has the start_callback, but you can't generally do
anything there; what I need is a stop_callback that runs outside
the alloc lock, in which I can allocate memory.
Currently with newer GC I use the start_callback to tell Guile to
register a procedure to be called "whenever", which will happen
after GC. With older GC I have to play a trick with finalizers;
it's in the attached code, and it is terrible. The general shape is
sketched after this item.
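The finalizer trick, as a sketch (the attached code differs in the
details): allocate a sentinel object, drop the reference, and let
its finalizer re-arm itself. Finalizers run outside the alloc lock,
so the hook may allocate:

#include <gc/gc.h>

static void after_gc_hook (void);   /* hypothetical */

static void
sentinel_finalizer (void *sentinel, void *data)
{
  after_gc_hook ();
  /* Re-arm: a fresh unreachable sentinel, finalizable after the
     next GC. */
  GC_register_finalizer_no_order (GC_MALLOC (1), sentinel_finalizer,
                                  NULL, NULL, NULL);
}

static void
install_after_gc_trick (void)
{
  GC_register_finalizer_no_order (GC_MALLOC (1), sentinel_finalizer,
                                  NULL, NULL, NULL);
}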
4. Moving items in an open-addressed table requires moving
disappearing links. There is no way to do this currently in libgc:
you have to unregister one and register another. This allocates
memory.
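Concretely, moving an entry looks something like this sketch
(invented entry type; in real code this runs with the alloc lock
held, so the key can't be collected mid-move):

#include <gc/gc.h>

typedef struct entry {
  void *key;      /* disappearing-link slot */
  void *value;
} entry;

static void
move_entry (entry *from, entry *to)
{
  to->key = from->key;
  to->value = from->value;
  GC_unregister_disappearing_link (&from->key);
  /* Re-registering allocates a fresh link record inside libgc, so a
     long reshuffle can allocate many times over. */
  GC_general_register_disappearing_link (&to->key, to->key);
  from->key = NULL;
  from->value = NULL;
}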
5. Accessing elements of a weak table is very expensive, as it has to
happen with the alloc lock held. If weak tables were part of the
GC, element access could perhaps be made cheaper, but I won't push
for that, as I also need custom equality predicates. Currently,
traversing a set with an occupancy of about 4000 of 7000 slots, and
reshuffling the table as nulled elements are removed, takes about
100 nanoseconds per element. I'm estimating that from the total of
about 300 to 500 microseconds it takes to reap the table, which is
a lot. As you can imagine there is a fair amount of variance,
depending on the number of items that need to be moved.
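For example, a lookup has to be funneled through
GC_call_with_alloc_lock so the collector can't null entries while
we scan; a sketch with invented helpers:

#include <gc/gc.h>

typedef struct weak_set weak_set;                      /* hypothetical */
void *weak_set_lookup_unlocked (weak_set *, void *);   /* hypothetical */

struct lookup_data {
  weak_set *set;
  void *key;
  void *result;
};

static void *
lookup_with_lock_held (void *data)
{
  struct lookup_data *d = data;
  d->result = weak_set_lookup_unlocked (d->set, d->key);
  return NULL;
}

void *
weak_set_lookup (weak_set *set, void *key)
{
  struct lookup_data d = { set, key, NULL };
  /* Takes and releases the allocator lock around the callback. */
  GC_call_with_alloc_lock (lookup_with_lock_held, &d);
  return d.result;
}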
6. There is no way around the fact that values in a weak-key map are
strong references, and can thus keep their keys alive
indefinitely. This issue is covered in the ephemeron paper I
linked to, and could also be solved by the GC.
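The problem in miniature, with a hypothetical map API:

/* Key-weak map: the key slot is weak, but the value is marked as a
   strong reference.  If the value refers back to the key, the key
   stays strongly reachable through the table itself and its
   disappearing link never fires.  Only ephemerons (or equivalent
   help from the GC) break this cycle. */
weak_map_set (map, key, cons (key, data));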
I have attached my new weak set implementation, as it shows most of
the issues. Weak maps are the same, except that each entry has a
value as well; and for singly-weak tables (key-weak or value-weak,
but not doubly-weak), we have to use a special GC kind to get the
marking right. (Doubly-weak tables can just use atomic memory
regions.)
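To give the flavor of that special kind, here is a rough sketch (not
the attached code) using libgc's custom-mark API; entries of this
kind would then be allocated with GC_generic_malloc:

#include <gc/gc.h>
#include <gc/gc_mark.h>

/* Mark procedure for a key-weak entry: push the value on the mark
   stack only if the key slot hasn't been nulled, and never mark the
   key itself, since that slot is a disappearing link. */
static struct GC_ms_entry *
mark_weak_key_entry (GC_word *addr, struct GC_ms_entry *mark_stack_ptr,
                     struct GC_ms_entry *mark_stack_limit, GC_word env)
{
  if (addr[0] != 0)   /* addr[0] is the key slot, addr[1] the value */
    mark_stack_ptr = GC_MARK_AND_PUSH ((void *) addr[1],
                                       mark_stack_ptr, mark_stack_limit,
                                       (void **) &addr[1]);
  return mark_stack_ptr;
}

static unsigned
make_weak_key_kind (void)
{
  return GC_new_kind (GC_new_free_list (),
                      GC_MAKE_PROC (GC_new_proc (mark_weak_key_entry), 0),
                      0, 0);
}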
Thanks for reading. The implementation follows. It's pretty
straightforward but you'll need a couple of typedefs to get it:
/* Function that returns nonzero if the given object is the one we are
   looking for. */
typedef int (*scm_t_set_predicate_fn) (SCM obj, void *closure);

/* Function to fold over the elements of a set. */
typedef SCM (*scm_t_set_fold_fn) (void *closure, SCM key, SCM result);