Discussion:
[Gc] glibc 2.19 lock elision bug
Paul Bone
2014-06-25 01:34:57 UTC
Permalink
Hi.

I wrote earlier about a bug I was trying to find regarding lock elision in
glibc 2.19 that affects Boehm GC. I beleive that this affects all
applications that use Boehm GC, not just Mercury, as mono applications have
also been crashing since I upgraded glibc. I would like some feedback and
help with the following change:

The git branch containing this change can be found here:
https://github.com/PaulBone/bdwgc/tree/fix_tsx_bug

Thank you.
Paul Bone
2014-06-27 02:13:31 UTC
Permalink
Post by Paul Bone
Hi.
I wrote earlier about a bug I was trying to find regarding lock elision in
glibc 2.19 that affects Boehm GC. I beleive that this affects all
applications that use Boehm GC, not just Mercury, as mono applications have
also been crashing since I upgraded glibc. I would like some feedback and
https://github.com/PaulBone/bdwgc/tree/fix_tsx_bug
The patch I sent had a bug. This one fixes it and also enables the
workaround for versions greater than 2.19
Ivan Maidanski
2014-07-19 12:48:39 UTC
Permalink
Hi Paul,

Thank you for the patch.
I reviewed it and applied several changes - see https://github.com/ivmai/bdwgc/commit/757af8aa17ed107ff2915f93967087835c0100dc :
* fix version check: major >= 2 && minor >= 19 -> major > 2 || major == 2 && minor >= 19
* fix scope of workaround application - only Linux/x64 with Glibc (instead of Linux on all CPUs)
* avoid potential recursion (in case strdup/free redirected to GC_strdup/free)
* avoid extra configure checks (currently all major targets could be compiled even without running configure&make)
* code refactoring:
** move version parsing to a separate function
** use mutex static initializer if workaround is not needed
** call ABORT on pthread function call error

Please test new code.

But still I don't know whether the proposed solution is good enough to be in master. The questions are:
* is there a link pointing to a lock elision bug implementation in Glibc? If not, it is better to register such
* this workaround simply disables lock elision (causing potentially degradation of performance). which pthread primitive causes malfunction? if trylock then we could avoid it.

Regards,
Ivan
Post by Paul Bone
Post by Paul Bone
Hi.
I wrote earlier about a bug I was trying to find regarding lock elision in
glibc 2.19 that affects Boehm GC. I beleive that this affects all
applications that use Boehm GC, not just Mercury, as mono applications have
also been crashing since I upgraded glibc. I would like some feedback and
https://github.com/PaulBone/bdwgc/tree/fix_tsx_bug
The patch I sent had a bug. This one fixes it and also enables the
workaround for versions greater than 2.19
From 3c22928504e6a0d306d39a60686e963bcd04ede4 Mon Sep 17 00:00:00 2001
Date: Wed, 25 Jun 2014 11:17:50 +1000
Subject: [PATCH] Workaround Linux NTPL lock elision bug.
glibc 2.19 on Linux x86-64 platforms includes support for lock elision,
by using Intel's TSX support when it is available. Without modifying an
application this converts suitable critical sections that use mutex into
transactional memory critical sections. See http://lwn.net/Articles/534758/
If a problem occurs that means that transactional memory can't be used, such
as a system call or buffer overflow, the pthreads implementation will catch
this error and retry the critical section using a normal mutex.
I noticed that since upgrading glibc that programs using Boehm GC crash, one
of these crashes was an assertion that the owner field of a mutex was
invalid. The assertion was generated by the pthreads implementation.
I believe that there is a bug in glibc that when a mutex cannot be used
safely for transactions that some series of events causes it's owner field
to be set incorrectly (or cleared when it shouldn't be).
I've found that I can work around this problem by having Boehm GC use an
error checking mutex, which I believe doesn't use lock elision and in my
testing doesn't crash.
XXX: This work-around mostly works except for linking the feature detection
in configure.ac to the conditional compilation in pthread_support.c as there
isn't an obvious way to make it work for automake and Makefile.direct.
Could I have some help updating the build system please?
    Define GC_setup_mark_lock() This procedure creates the lock specifying a
    pthread_mutexattr_t structure. This is used to disable lock elision on
    Linux with glibc 2.19 or greater.
    If we're using Linux then check for the gnu extensions required to
    identify the version of glibc at runtime.
    Call GC_setup_mark_lock() when initialising the collector.
---
 configure.ac | 13 ++++++++++
 include/private/pthread_support.h | 2 ++
 misc.c | 3 +++
 pthread_support.c | 53 +++++++++++++++++++++++++++++++++++++++
 4 files changed, 71 insertions(+)
diff --git a/configure.ac b/configure.ac
index 7667949..6853e97 100644
--- a/configure.ac
+++ b/configure.ac
@@ -646,6 +646,19 @@ case "$host" in
  *) AC_MSG_RESULT(no) ;;
 esac
 
+dnl Check for specific glibc functions and definitions that we need to for
+dnl the glibc 2.19 workaround.
+HAVE_LIBC_VERSION_H=no
+HAVE_GNU_GET_LIBC_VERSION=no
+case "${host}" in
+ *-linux*)
+ AC_CHECK_HEADER([gnu/libc-version.h], HAVE_LIBC_VERSION_H=yes)
+ AC_CHECK_FUNC([gnu_get_libc_version], HAVE_GNU_GET_LIBC_VERSION=yes)
+ ;;
+esac
+AC_SUBST(HAVE_LIBC_VERSION_H)
+AC_SUBST(HAVE_GNU_GET_LIBC_VERSION)
+
 dnl Include defines that have become de facto standard.
 dnl ALL_INTERIOR_POINTERS and NO_EXECUTE_PERMISSION can be overridden
 dnl in the startup code.
diff --git a/include/private/pthread_support.h b/include/private/pthread_support.h
index 525a9aa..017f194 100644
--- a/include/private/pthread_support.h
+++ b/include/private/pthread_support.h
@@ -148,6 +148,8 @@ GC_INNER_PTHRSTART GC_thread GC_start_rtn_prepare_thread(
                                         struct GC_stack_base *sb, void *arg);
 GC_INNER_PTHRSTART void GC_thread_exit_proc(void *);
 
+GC_INNER void GC_setup_mark_lock(void);
+
 #endif /* GC_PTHREADS && !GC_WIN32_THREADS */
 
 #endif /* GC_PTHREAD_SUPPORT_H */
diff --git a/misc.c b/misc.c
index df434a1..dccf5f3 100644
--- a/misc.c
+++ b/misc.c
@@ -875,6 +875,9 @@ GC_API void GC_CALL GC_init(void)
         /* else */ InitializeCriticalSection (&GC_allocate_ml);
      }
 # endif /* GC_WIN32_THREADS */
+# if (defined(GC_PTHREADS) && !defined(GC_WIN32_THREADS))
+ GC_setup_mark_lock();
+# endif /* GC_PTHREADS */
 # if (defined(MSWIN32) || defined(MSWINCE)) && defined(THREADS)
       InitializeCriticalSection(&GC_write_cs);
 # endif
diff --git a/pthread_support.c b/pthread_support.c
index c00b93d..56fc94b 100644
--- a/pthread_support.c
+++ b/pthread_support.c
@@ -95,6 +95,10 @@
   typedef unsigned int sem_t;
 #endif /* GC_DGUX386_THREADS */
 
+#ifdef HAVE_LIBC_VERSION_H
+# include <gnu/libc-version.h>
+#endif
+
 /* Undefine macros used to redirect pthread primitives. */
 # undef pthread_create
 # ifndef GC_NO_PTHREAD_SIGMASK
@@ -1973,12 +1977,61 @@ GC_INNER void GC_lock(void)
   /* defined. */
   static pthread_mutex_t mark_mutex =
         {0, 0, 0, PTHREAD_MUTEX_ERRORCHECK_NP, {0, 0}};
+#elif defined(HAVE_GNU_GET_LIBC_VERSION) && defined(HAVE_LIBC_VERSION_H)
+ static pthread_mutex_t mark_mutex;
 #else
   static pthread_mutex_t mark_mutex = PTHREAD_MUTEX_INITIALIZER;
 #endif
 
 static pthread_cond_t builder_cv = PTHREAD_COND_INITIALIZER;
 
+GC_INNER void GC_setup_mark_lock(void)
+{
+#if defined(HAVE_GNU_GET_LIBC_VERSION) && defined(HAVE_LIBC_VERSION_H)
+ pthread_mutexattr_t attr;
+ char *version_str = NULL;
+ char *strtok_save;
+ char *version_part;
+ char *version_str;
+
+ if (0 != pthread_mutexattr_init(&attr)) {
+ goto error;
+ }
+
+ /*
+ ** Check for version 2.19 or greater.
+ */
+ version_str = strdup(gnu_get_libc_version());
+ version_part = strtok_r(version_str, ".", &strtok_save);
+ if ((NULL != version_part) && (2 <= atoi(version_part))) {
+ version_part = strtok_r(NULL, ".", &strtok_save);
+ if ((NULL != version_part) && (19 <= atoi(version_part))) {
+ /*
+ * Disable lock elision on this version of glibc.
+ */
+ if (0 != pthread_mutexattr_settype(&attr,
+ PTHREAD_MUTEX_ERRORCHECK))
+ {
+ goto error;
+ }
+ }
+ }
+
+ if (0 != pthread_mutex_init(&mark_mutex, &attr)) {
+ goto error;
+ }
+ pthread_mutexattr_destroy(&attr);
+ if (NULL != version_str) {
+ free(version_str);
+ }
+ return;
+
+ perror("Error setting up marker mutex");
+ exit(1);
+#endif /* HAVE_GNU_GET_LIBC_VERSION && HAVE_LIBC_VERSION_H */
+}
+
 GC_INNER void GC_acquire_mark_lock(void)
 {
     GC_ASSERT(GC_mark_lock_holder != NUMERIC_THREAD_ID(pthread_self()));
--
2.0.0
--
Paul Bone
_______________________________________________
bdwgc mailing list
https://lists.opendylan.org/mailman/listinfo/bdwgc
Paul Bone
2014-09-18 07:07:03 UTC
Permalink
Post by Ivan Maidanski
Hi Paul,
Thank you for the patch.
* fix version check: major >= 2 && minor >= 19 -> major > 2 || major == 2 && minor >= 19
* fix scope of workaround application - only Linux/x64 with Glibc (instead of Linux on all CPUs)
* avoid potential recursion (in case strdup/free redirected to GC_strdup/free)
* avoid extra configure checks (currently all major targets could be compiled even without running configure&make)
** move version parsing to a separate function
** use mutex static initializer if workaround is not needed
** call ABORT on pthread function call error
Please test new code.
* is there a link pointing to a lock elision bug implementation in Glibc? If not, it is better to register such
* this workaround simply disables lock elision (causing potentially degradation of performance). which pthread primitive causes malfunction? if trylock then we could avoid it.
Hi Ivan,

I've finally made time to test this, it works without any problems. I made
one minor change and that is to use PTHREAD_MUTEX_NORMAL rather than
PTHREAD_MUTEX_ERRORCHECK. My colleague Peter Wang discovered that
PTHREAD_MUTEX_NOMRAL is sufficient to disable lock elision on current
versions of glibc (and I've tested it). The benefit is that without error
checking this may be faster.

There's no glibc bug report that I know of yet but I wrote to the author of
glibc's lock elision, Andi Kleen, who isn't yet convinced that it's a bug in
glibc. He's pointed me at a patch to glibc that I should try and report
back but I haven't yet had the time.

The pthread primitive that throws the error is pthread_cond_wait.

I wrote the following to Andi:

I noticed the following error:
mercury_compile: ../nptl/pthread_mutex_lock.c:80: __pthread_mutex_cond_lock: Assertion `mutex->__data.__owner == 0' failed.

This is thrown (indirectly) from a call to pthread_cond_wait in
pthread_support.c line 2036 in Boehm GC 7.4.2 I have the same problem
with Boehm Gc 7.2. There doesn't appear to be anything suspicious about
the use of the mutex or condition variable involved here.

A different bug affecting libtirpc and mount_nfs also started occuring
when I upgraded to eglibc 2.19. When investigating this I found that
eglibc 2.19 introduced lock elision using TSX extensions and found your
article here: http://lwn.net/Articles/534758/ I use an i7-4770 processor
which supports TSX. (I chose this one because I wanted to experiment
with some lock free code myself.)

I've looked at the NTPL code and the Boehm code and I don't see anything
obvious - not that the NTPL assembler is easy to read. Given that the
assertion refers to the __owner field, and that on elision paths don't
update this field I wonder if they're related, that is that not updating
the __owner field has other issues.

This mutex and condition variable refer to the Boehm collector's marking
phase, which will read and update a lot of memory. Is the mutex code
falling back from lock elision to normal locks for this mutex and then
triggering the assertion because the owner field hasn't been updated?

So I'm afraid that this isn't enough to pin down the cause of the issue.
What I guess is happening is that the mutex starts being used with HLE, but
then due to some reason, possibly pthread_cond_wait, HLE cannot be used but
the __owner field hasn't been fixed. Or pthread_cond_wait simply doesn't
account for mutexes that use HLE and therefore may have an invalid __owner
field. I'm afraid that the glibc sources are quite opaque.

All the best.
--
Paul Bone
Ivan Maidanski
2014-09-23 21:41:31 UTC
Permalink
Hi Thomas,
Thank you. I fixed it (in a bit different way) plus another bug.
PS. Github merge request is generally preferred.
--
Tue, 23 Sept 2014, 11:18 +04:00 from Thomas Schwinge <***@codesourcery.com>:
Hi!
Post by Ivan Maidanski
Thank you for the patch.
I've hit two issues (in GNU Hurd testing, but generally applicable), for
which I'm attaching patches.  Please tell if you'd rather have me submit
these as Github pull requests.
Adding the following include directive avoids an implicit declaration
warning:
commit 6b9b4eef2622a23856af47fb9aa0eebc159960e4
Author: Thomas Schwinge < ***@codesourcery.com >
Date:   Mon Sep 22 13:34:30 2014 +0200
    Avoid compiler warning.
    
        ../master/misc.c: In function 'GC_init':
        ../master/misc.c:892:7: warning: implicit declaration of function 'GC_setup_mark_lock' [-Wimplicit-function-declaration]
    
    * misc.c: Include "private/pthread_support.h".
---
 misc.c | 1 +
 1 file changed, 1 insertion(+)
diff --git misc.c misc.c
index 32cbe24..41bd3f4 100644
--- misc.c
+++ misc.c
@@ -14,6 +14,7 @@
  */
 
 #include "private/gc_pmark.h"
+#include "private/pthread_support.h"
 
 #include <stdio.h>
 #include <limits.h>
Guarding GC_setup_mark_lock usage with PARALLEL_MARK is required to avoid
an undefined reference in non-PARALLEL_MARK configurations:
commit 4c8e0dc234e671c4d64ffab9d4f94b6cd8cc2f63
Author: Thomas Schwinge < ***@codesourcery.com >
Date:   Mon Sep 22 13:36:00 2014 +0200
    Refer to GC_setup_mark_lock only in PARALLEL_MARK code.
    
        ./.libs/libgc.so: undefined reference to `GC_setup_mark_lock'
    
    * misc.c (GC_init): Guard GC_setup_mark_lock usage.
---
 misc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git misc.c misc.c
index 41bd3f4..721a999 100644
--- misc.c
+++ misc.c
@@ -889,7 +889,7 @@ GC_API void GC_CALL GC_init(void)
         /* else */ InitializeCriticalSection (&GC_allocate_ml);
      }
 #   endif /* GC_WIN32_THREADS */
-#   if defined(GC_PTHREADS) && !defined(GC_WIN32_THREADS)
+#   if defined(PARALLEL_MARK) && defined(GC_PTHREADS) && !defined(GC_WIN32_THREADS)
       GC_setup_mark_lock();
 #   endif /* GC_PTHREADS */
 #   if (defined(MSWIN32) || defined(MSWINCE)) && defined(THREADS)
GrÌße,
 Thomas
Thomas Schwinge
2014-09-23 07:18:40 UTC
Permalink
Hi!
Post by Ivan Maidanski
Thank you for the patch.
I've hit two issues (in GNU Hurd testing, but generally applicable), for
which I'm attaching patches. Please tell if you'd rather have me submit
these as Github pull requests.

Adding the following include directive avoids an implicit declaration
warning:

commit 6b9b4eef2622a23856af47fb9aa0eebc159960e4
Author: Thomas Schwinge <***@codesourcery.com>
Date: Mon Sep 22 13:34:30 2014 +0200

Avoid compiler warning.

../master/misc.c: In function 'GC_init':
../master/misc.c:892:7: warning: implicit declaration of function 'GC_setup_mark_lock' [-Wimplicit-function-declaration]

* misc.c: Include "private/pthread_support.h".
---
misc.c | 1 +
1 file changed, 1 insertion(+)

diff --git misc.c misc.c
index 32cbe24..41bd3f4 100644
--- misc.c
+++ misc.c
@@ -14,6 +14,7 @@
*/

#include "private/gc_pmark.h"
+#include "private/pthread_support.h"

#include <stdio.h>
#include <limits.h>

Guarding GC_setup_mark_lock usage with PARALLEL_MARK is required to avoid
an undefined reference in non-PARALLEL_MARK configurations:

commit 4c8e0dc234e671c4d64ffab9d4f94b6cd8cc2f63
Author: Thomas Schwinge <***@codesourcery.com>
Date: Mon Sep 22 13:36:00 2014 +0200

Refer to GC_setup_mark_lock only in PARALLEL_MARK code.

./.libs/libgc.so: undefined reference to `GC_setup_mark_lock'

* misc.c (GC_init): Guard GC_setup_mark_lock usage.
---
misc.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git misc.c misc.c
index 41bd3f4..721a999 100644
--- misc.c
+++ misc.c
@@ -889,7 +889,7 @@ GC_API void GC_CALL GC_init(void)
/* else */ InitializeCriticalSection (&GC_allocate_ml);
}
# endif /* GC_WIN32_THREADS */
-# if defined(GC_PTHREADS) && !defined(GC_WIN32_THREADS)
+# if defined(PARALLEL_MARK) && defined(GC_PTHREADS) && !defined(GC_WIN32_THREADS)
GC_setup_mark_lock();
# endif /* GC_PTHREADS */
# if (defined(MSWIN32) || defined(MSWINCE)) && defined(THREADS)


GrÌße,
Thomas

Loading...