Investigating gcov crashes after fork() on OS X

source link: http://rachelbythebay.com/w/2011/07/12/forkcrash/

I've been adding test coverage to some of my code. One of my newer libraries calls fork() and execv() to run some external programs. Imagine my surprise when I tried to run it in coverage mode and it crashed with "Abort trap". I did a lot of digging to figure out just what was going on. This is my tale.

My original program had a lot of stuff going on. It had a whole bunch of test cases and other crazy things happening. Any of those bits of my code or the third-party testing framework library could have been responsible. They all had to go. I reduced it to a single .cc file with one function that would fork() and execv() something. That still reproduced the problem nicely, which meant all of that testing stuff was not to blame.
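
For illustration, that intermediate reduction might have looked something like the C sketch below. The original .cc file isn't shown, so the helper name run_external and its return convention are assumptions made up for this example, not the actual code.

```c
#include <errno.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper (not from the original post): fork(), execv() the
 * given program in the child, and wait for it in the parent.
 * Returns the child's exit status, or -1 if it could not be started or
 * did not exit normally. */
int run_external(const char *path, char *const argv[]) {
  pid_t pid = fork();
  if (pid < 0) {
    perror("fork");
    return -1;
  }

  if (pid == 0) {
    /* Child: replace this process image with the external program. */
    execv(path, argv);
    perror("execv");  /* only reached if execv() failed */
    _exit(127);
  }

  /* Parent: wait for the child, retrying if a signal interrupts us. */
  int status = 0;
  while (waitpid(pid, &status, 0) < 0) {
    if (errno != EINTR) {
      perror("waitpid");
      return -1;
    }
  }
  return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling run_external("/bin/sh", args) with args set to {"sh", "-c", "exit 3", NULL} should return 3 in an ordinary build. Going by the rest of this post, a --coverage build on an affected OS X release would be expected to abort instead, since the fork() alone is enough to trigger the crash.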

After a bunch of runs through valgrind, gdb, and dtruss, I realized that it was just fork() which was blowing up. I could throw away all of that execv() gunk. Great! My reproduction case shrank again. I kept banging on it. Finally, I got it down to this:

$ echo "int main() { return fork(); }" > fork.c
$ gcc --coverage -o fork fork.c
$ ./fork
Abort trap
$ gcc -o fork fork.c
$ ./fork
$ 

Yeah, now we're talking. One syscall and it all goes down in flames. Now I knew exactly what to blame: the intersection of the libgcov code and fork(). It wasn't anything else. The call trace implicated something Apple added in Snow Leopard for faster shutdowns: there was a "_vproc_transaction_end" right before the call to abort().

I went further and found the source code for libvproc.c online. It lists a bunch of functions which are called by stuff all over the system, including Apple's version of libgcov. It also showed me where things were crashing. I decided to add a call to _vproc_transaction_count() in my code both before and after the fork. It didn't look good.

$ cat fork2.c
#include <stdio.h>
#include <unistd.h>
#include <vproc.h>
 
int main() { 
  printf("pre-fork count: %d\n", _vproc_transaction_count());
 
  fork();
 
  printf("post-fork count: %d\n", _vproc_transaction_count());
 
  return 0;
}
$ gcc --coverage -o fork2 fork2.c
$ ./fork2
pre-fork count: 1
post-fork count: 0
post-fork count: 0
Abort trap

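So not only is the child winding up in some uninitialized state, but the parent is too...? (The post-fork line shows up twice because both the parent and the child print it, and both report a count of zero.) That's messed up. I decided to throw caution to the wind and call Apple's vproc_transaction_begin() myself, the way libgcov does, just to see what happened.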

$ cat fork3.c
#include <stdio.h>
#include <unistd.h>
#include <vproc.h>
 
int main() { 
  printf("pre-fork count: %d\n", _vproc_transaction_count());
 
  fork();
  vproc_transaction_begin(0);
 
  printf("post-fork count: %d\n", _vproc_transaction_count());
 
  return 0;
}
$ gcc --coverage -o fork3 fork3.c
$ ./fork3
pre-fork count: 1
post-fork count: 1
post-fork count: 1
$ 

No crash! This is probably far from ideal, but I'll take it. It's enough to add a quick preprocessor hack in my code to call that when running tests on Apple machines.

I've opened a bug with Apple. It's #9759049, but I don't think other people can see it, so that's probably of little use to anyone but me. For everyone else, enjoy the workaround.

#if defined(__APPLE__)
  if (testing_mode_) {
    vproc_transaction_begin(0);
  }
#endif
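
One way to keep that hack in a single place is to wrap fork() itself. The following is only a sketch under assumptions: the function name fork_with_gcov_workaround and its testing_mode parameter are invented for this example (standing in for the testing_mode_ flag above), and it has only the behavior described in this post to go on, so whether it is needed or even appropriate on other OS X releases is an open question.

```c
#include <sys/types.h>
#include <unistd.h>

#if defined(__APPLE__)
#include <vproc.h>
#endif

/* Hypothetical wrapper (name and parameter are assumptions): behaves like
 * fork(), but when testing_mode is nonzero on Apple platforms it also
 * calls vproc_transaction_begin(0) afterwards, as fork3.c does. */
pid_t fork_with_gcov_workaround(int testing_mode) {
  pid_t pid = fork();

#if defined(__APPLE__)
  /* After a successful fork() this runs in both the parent (pid > 0) and
   * the child (pid == 0), since both reported a count of 0 in fork2.c.
   * Unlike fork3.c, it is skipped when fork() itself failed. */
  if (pid >= 0 && testing_mode) {
    vproc_transaction_begin(0);
  }
#else
  (void)testing_mode;  /* nothing to work around on other platforms */
#endif

  return pid;
}
```

Callers would use it exactly like fork(): a return value of 0 means the child, a positive value is the child's pid in the parent, and a negative value means the fork failed. On non-Apple platforms it reduces to a plain fork().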

September 29, 2011: This post has an update.

