LLDB support for fork(2) and vfork(2)

By Michał Górny

April 15, 2021

Moritz Systems have been contracted by the FreeBSD Foundation to continue our work on modernizing the LLDB debugger’s support for FreeBSD.

The complete Project Schedule is divided into four milestones, each taking approximately one month:

M1 Switch all the non-x86 CPUs to the LLDB FreeBSD Remote-Process-Plugin.
M2 Iteration over regression tests on ARM64 and fixing known bugs, marking the non-trivial ones for future work. Remove the old local-only Process-Plugin.
M3 Implement follow-fork and follow-vfork operations on par with the GNU GDB support. Cover the functionality with LLDB regression tests.
M4 Implement SaveCore functionality for FreeBSD and enhance the regression testing of core files in LLDB. Update the FreeBSD manual.

This report summarizes the work done on M3. In this milestone, we enhanced LLDB with fundamental support for handling forks (clone(2), fork(2) and vfork(2) calls) performed by the debugged process. We chose to implement one of the two models used by GDB, the follow-fork-mode model, as requested in LLDB PR#17972. We aimed to follow the GDB protocol closely, ensuring compatibility between LLDB and GDB.

Upon being notified of a fork, the debugger can either continue following the parent process and detach the new child, or switch to tracing the child and detach the parent. The behavior is controlled by a target.process.follow-fork-mode setting, similarly to GDB. In either case, the debugger ensures that software and hardware breakpoints, as well as watchpoints, are removed from the process being detached and present in the one being traced.

While our work does not target full support for tracing multiple processes simultaneously, it lays the groundwork necessary to implement such support in the future. Even in its current form it is a useful addition, as it permits e.g. debugging programs spawned via a wrapper program. Additionally, it prevents forked children from crashing when breakpoints leak from the parent.

Starting subprocesses on Unix derivatives

fork(2), vfork(2), exec(3) and posix_spawn(3)

There are two main approaches to starting programs as subprocesses on Unix derivatives. The traditional approach is a two-step process: first create a new child process (using fork(2) or vfork(2)), then replace the executable image of that process using the exec(3) family of functions. The alternative approach is to use a single function that takes care of starting the new program as a subprocess. This approach is represented by the posix_spawn(3) function, and is also implemented in convenience libraries such as GLib (g_spawn* family of functions). We will discuss the functions used in both approaches, as well as example uses of them.

fork(2) and vfork(2) model

The exec(3) family of functions is used to replace the current executable image with another one. In other words, they start a new program in place of the one currently running. It is important to understand that the current program is not being terminated — it is being immediately replaced. The exit handlers are not run, the process memory is freed and replaced, the file descriptors are generally inherited. The new process uses the same PID as the original process, and its parent is not notified of the process termination. For more details, see exec in POSIX.
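The claim that the new program keeps the PID of the process that called exec(3) can be checked with a small sketch. The helper name exec_keeps_pid is hypothetical, and the sketch assumes that /bin/sh exists and that its $$ parameter expands to the shell's own PID.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if the program started via exec
 * reports the same PID that fork() returned for the child. */
int exec_keeps_pid(void) {
    int out_pipe[2];
    int ret = pipe(out_pipe);
    assert(ret == 0);

    pid_t cpid = fork();
    assert(cpid != -1);
    if (cpid == 0) {
        /* child: send stdout into the pipe, then replace the image */
        close(out_pipe[0]);
        dup2(out_pipe[1], 1);
        close(out_pipe[1]);
        execl("/bin/sh", "sh", "-c", "echo $$", (char *)NULL);
        /* reached only if exec failed */
        _exit(127);
    }

    /* parent: read the PID printed by the shell */
    close(out_pipe[1]);
    char buf[32] = {0};
    ssize_t rd = read(out_pipe[0], buf, sizeof(buf) - 1);
    close(out_pipe[0]);
    assert(rd > 0);

    int wstat;
    pid_t wret = waitpid(cpid, &wstat, 0);
    assert(wret == cpid);

    return atol(buf) == (long)cpid;
}
```

Calling exec_keeps_pid() from main() is expected to return 1, since the shell runs inside the very process that fork(2) created.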

The fork(2) function creates a new child process that is a copy of the current process, and continues its execution from the call to fork(2). It can be used to create a new process to start another executable but it can also be used to run a different code path in the current executable in parallel. The child process receives a copy of its parent process' memory, file descriptors, etc. However, modern kernels use a technique called Copy-on-Write (CoW) to avoid physically copying the memory until it is actually modified. Usually, it is expected that the child process does not return from the current function and instead either calls exec(3) or terminates via _exit(2) to avoid calling parent process' exit handlers. For more details, see fork in POSIX.
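A minimal sketch of the copy semantics described above: the child changes a variable and the parent's copy stays untouched. The function name fork_copy_demo and the values used are made up for illustration; the child's value is passed back through its exit status.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <assert.h>
#include <unistd.h>

/* Hypothetical helper: forks a child that modifies its copy of a
 * variable.  Stores the child's value in *child_value and returns
 * the value seen by the parent afterwards. */
int fork_copy_demo(int *child_value) {
    int counter = 1;

    pid_t cpid = fork();
    assert(cpid != -1);
    if (cpid == 0) {
        /* child: this write goes to the child's private copy */
        counter += 41;
        /* use _exit() to avoid running the parent's exit handlers */
        _exit(counter);
    }

    /* parent: collect the child's value from its exit status */
    int wstat;
    pid_t wret = waitpid(cpid, &wstat, 0);
    assert(wret == cpid);
    assert(WIFEXITED(wstat));
    *child_value = WEXITSTATUS(wstat);

    /* unchanged, because the child modified a copy */
    return counter;
}
```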

The vfork(2) function is a special variant of fork(2), meant as an optimization for the common case of fork(2) followed immediately by exec(3). Unlike fork(2), the child shares memory with its parent process and therefore must not interfere with the parent’s resources. The parent process is suspended until the child process calls exec(3) or terminates. Today, fork(2) is efficient enough for the vast majority of programs. vfork(2) can still be used as an optimization technique, but it should be used carefully. It is not specified in POSIX. On some implementations, it is an alias to fork(2). The Secure Programming for Linux and Unix HOWTO suggests avoiding vfork.
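The careful use of vfork(2) mentioned above could look like the following sketch, in which the child does nothing except call exec or _exit(2). The helper name run_with_vfork is hypothetical, and the _DEFAULT_SOURCE define is an assumption made for glibc, where it exposes the vfork(2) declaration under strict compiler modes.

```c
#define _DEFAULT_SOURCE /* assumption: exposes vfork() on glibc */

#include <sys/types.h>
#include <sys/wait.h>
#include <assert.h>
#include <unistd.h>

/* Hypothetical helper: runs argv[0] with the given arguments via
 * vfork()+exec.  Returns the exit status, or -1 if the child did
 * not exit normally. */
int run_with_vfork(char *const argv[]) {
    pid_t cpid = vfork();
    assert(cpid != -1);
    if (cpid == 0) {
        /* child: shares memory with the parent, so it only execs
         * or exits and touches nothing else */
        execvp(argv[0], argv);
        _exit(127);
    }

    /* the parent gets here only after the child exec'd or exited */
    int wstat;
    pid_t wret = waitpid(cpid, &wstat, 0);
    assert(wret == cpid);
    return WIFEXITED(wstat) ? WEXITSTATUS(wstat) : -1;
}
```

For example, passing {"sh", "-c", "exit 3", NULL} should yield 3.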

Between the call to fork(2) and the subsequent call to exec(3) additional setup tasks can be performed. For example, the standard input, output and error file descriptors can be replaced in order to pipe the data from/into the subprocess.

posix_spawn(3) model

POSIX.1-2001 introduced a new posix_spawn(3) function that starts a new program as a subprocess in a single call. The function supports most common setup tasks, notably operations on file descriptors and setting additional process properties.

Please consider the following example program that starts gzip as a subprocess and uses a pair of pipes to send input to it and to receive the compressed data, respectively. The program includes #if-ed variants for the traditional fork(2)+exec(3) approach, as well as for posix_spawn(3). Please note that for simplicity, the program does not include the necessary error handling.

#include <sys/types.h>
#include <sys/wait.h>
#include <spawn.h>
#include <stdio.h>
#include <unistd.h>

int main() {
	int read_pipe[2];
	int write_pipe[2];

	/* create pipes used to communicate with the subprocess */
	pipe(read_pipe);
	pipe(write_pipe);

#if defined(USE_FORK_EXEC)

	/* fork the process */
	pid_t cpid = fork();
	if (cpid == 0) {
		/* child process */

		/* close the ends of the pipes used by parent */
		close(read_pipe[0]);
		close(write_pipe[1]);

		/* replace stdin and stdout with the pipes */
		close(0);
		close(1);
		dup2(write_pipe[0], 0);
		dup2(read_pipe[1], 1);

		/* execute the gzip compressor */
		execlp("gzip", "gzip", "-9", NULL);
		/* if execlp fails, force exit */
		_exit(1);
	}

#else

	/* spawn the subprocess */
	posix_spawn_file_actions_t file_actions;
	posix_spawn_file_actions_init(&file_actions);
	posix_spawn_file_actions_addclose(&file_actions, read_pipe[0]);
	posix_spawn_file_actions_addclose(&file_actions, write_pipe[1]);
	posix_spawn_file_actions_addclose(&file_actions, 0);
	posix_spawn_file_actions_addclose(&file_actions, 1);
	posix_spawn_file_actions_adddup2(
			&file_actions, write_pipe[0], 0);
	posix_spawn_file_actions_adddup2(
			&file_actions, read_pipe[1], 1);

	pid_t cpid;
	char* child_argv[] = {"gzip", "-9", NULL};
	posix_spawn_file_actions_t* file_actions_p = &file_actions;
	posix_spawnp(
			&cpid, "gzip", file_actions_p, NULL, child_argv, NULL);
	posix_spawn_file_actions_destroy(file_actions_p);

#endif

	/* close the ends of the pipes used by child */
	close(read_pipe[1]);
	close(write_pipe[0]);

	/* write some data (without the terminating NUL), then close
	 * the fd so that gzip sees the end of input */
	char data[] = "Hello world\n";
	write(write_pipe[1], data, sizeof(data) - 1);
	close(write_pipe[1]);

	/* read the compressed data */
	char buf[512];
	ssize_t rd = read(read_pipe[0], buf, sizeof(buf));
	close(read_pipe[0]);

	/* output the compressed data to stdout */
	write(1, buf, rd);

	/* wait for the child to finish */
	int wstat;
	wait(&wstat);
	fprintf(stderr, "gzip exited with %d status\n",
			WEXITSTATUS(wstat));

	return 0;
}

Other process-related functions

The rfork(2) function can be used to create a child process with more fine-grained control over its semantics, particularly over which resources are shared between the parent and child processes. rfork(2) is originally a Plan 9 function but is also found on FreeBSD.

The clone(2) function provides a more fine-grained control over creating child processes on Linux and NetBSD. Additionally, on Linux it can be used to create threads and to apply Linux namespaces to the child process.

The standard C library also provides a high-level system(3) function that runs the specified process via shell and waits for its completion. POSIX provides popen(3) that can be used to start a process via shell and either pipe input to it, or to pipe its output to the parent (but not both directions simultaneously).

Forks and the ptrace(2) API

The ptrace(2) APIs of Linux, FreeBSD and NetBSD follow the same principles with regard to following child processes. By default, forks are ignored and children are not traced. To trace them, reporting of the appropriate events needs to be enabled. Table 1 summarizes the events corresponding to individual function calls on the discussed platforms.

Table 1. Events associated with function calls

Function        Linux                                  FreeBSD                      NetBSD
fork(2)         fork                                   fork                         fork
vfork(2)        vfork + vfork_done                     vfork + vfork_done           vfork + vfork_done
clone(2)        fork or (vfork + vfork_done) or clone  n/a                          fork or (vfork + vfork_done)
rfork(2)        n/a                                    fork or (fork + vfork_done)  n/a
posix_spawn(3)  vfork + exec + vfork_done              vfork + exec + vfork_done    posix_spawn

All platforms report fork(2) and vfork(2) consistently. Due to its specific semantics, vfork(2) is reported as two events: the first indicates the call to vfork(2), and the second indicates that the parent resumes after the child execs or exits (i.e. the memory sharing stops).

clone(2) and rfork(2) are platform-specific functions that are used to implement more flexible process spawning semantics. Depending on the flags passed to them, they can issue the same events as fork or as vfork. On Linux, there is additionally a clone event corresponding to a clone(2) call that does not correspond to either semantics.

posix_spawn(3) triggers events depending on the implementation. On NetBSD, it is reported as a single event. On the two other platforms, it is reported as the underlying vfork(2) and exec(3) calls.

To start tracing specific events, the debugger needs to issue an appropriate ptrace(2) request, that is PTRACE_SETOPTIONS on Linux or PT_SET_EVENT_MASK on FreeBSD and NetBSD. Table 2 summarizes the parameter values corresponding to the events defined in table 1. Note that on Linux child processes inherit the options from parent, while on FreeBSD and NetBSD they do not inherit the event mask.

Table 2. Event mask setting

Property           Linux                    FreeBSD                      NetBSD
ptrace(2) request  PTRACE_SETOPTIONS        PT_SET_EVENT_MASK            PT_SET_EVENT_MASK
inherited?         yes                      no                           no

Event constants
fork               PTRACE_O_TRACEFORK       PTRACE_FORK                  PTRACE_FORK
vfork              PTRACE_O_TRACEVFORK      PTRACE_VFORK                 PTRACE_VFORK
vfork_done         PTRACE_O_TRACEVFORKDONE  PTRACE_VFORK (as for vfork)  PTRACE_VFORK_DONE
clone              PTRACE_O_TRACECLONE      n/a                          (via other events)
exec               PTRACE_O_TRACEEXEC       PTRACE_EXEC                  always traced
posix_spawn        (via other events)       (via other events)           PTRACE_POSIX_SPAWN

Once event tracing is enabled, the system will stop both the parent process and the child process when one of the events occurs. Tracing will automatically be enabled on the child. The debugger can catch the appropriate events by wait(2)-ing on the processes.

On Linux and FreeBSD, the parent process will be reported as stopped by SIGTRAP, while the child process will be stopped by SIGSTOP. The order in which these events are reported is undefined, i.e. the new child can be reported before the fork is reported for its parent.

On NetBSD, both the parent and the child process will be stopped by SIGTRAP with the same associated event and the PIDs corresponding to the other process.

The method of determining the exact event is entirely platform-dependent.

On Linux, the event is embedded in the status code returned by wait(2). For fork-related events, the ptrace(2) request PTRACE_GETEVENTMSG can be used to obtain the PID of the new child.
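The Linux encoding can be illustrated with a small decoding sketch. It assumes the layout documented in the Linux ptrace(2) manual page, where status>>8 equals SIGTRAP | (event<<8) for event stops. The helper name ptrace_event_of is hypothetical, and the function only inspects an integer, so it does not call ptrace(2) itself.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <assert.h>
#include <signal.h>

/* Hypothetical helper: extracts the ptrace event number from a
 * wait(2) status, assuming the Linux encoding.  Returns 0 for a
 * plain SIGTRAP stop without an event, and -1 if the status does
 * not describe a SIGTRAP stop at all. */
int ptrace_event_of(int status) {
    if (!WIFSTOPPED(status) || WSTOPSIG(status) != SIGTRAP)
        return -1;
    /* bits 8..15 hold the stop signal, bits 16 and up the event */
    return (status >> 16) & 0xff;
}
```

On Linux, a fork stop should decode to PTRACE_EVENT_FORK (numerically 1), after which PTRACE_GETEVENTMSG provides the PID of the new child.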

On FreeBSD, the PT_LWPINFO request can be used to obtain detailed information about the current event. The pl_flags field of the returned structure indicates the event type, while pl_child_pid contains the PID of the new child.

On NetBSD, the PT_GET_SIGINFO request can be used to obtain information about the stop reason. The si_code field of the psi_siginfo member indicates the trap type: TRAP_CHLD for fork-related events, TRAP_EXEC for exec(3). For the former, an additional PT_GET_PROCESS_STATE request provides the exact fork type (pe_report_event field) and the PID of the other process (pe_other_pid field: the child in the parent’s report, the parent in the child’s report).

The following program demonstrates recursively forking a process and observing the events via the ptrace(2) API:

#if defined(__linux__)
#define _GNU_SOURCE
#define SIGDESC sigabbrev_np
#else
#define SIGDESC strsignal
#endif

#include <sys/types.h>
#include <sys/ptrace.h>
#include <sys/wait.h>

#include <assert.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

const int fork_levels = 3;

int main() {
    int ret;
    pid_t pid = fork();
    assert(pid != -1);

    if (pid == 0) {
        /* child -- debugged program */
        /* request tracing */
#if !defined(__linux__)
        ret = ptrace(PT_TRACE_ME, 0, NULL, 0);
#else
        ret = ptrace(PTRACE_TRACEME, 0, NULL, 0);
#endif
        assert(ret != -1);
        ret = raise(SIGSTOP);
        assert(ret != -1);

        /* fork a few times, this will cause nested forking */
        int forks = 0;
        int i;
        for (i = 0; i < fork_levels; ++i) {
            pid_t fret = fork();
            assert(fret != -1);
            if (fret != 0) {
                printf("[INF] parent forked: %d -> %d\n",
                       getpid(), fret);
                ++forks;
            } else
                printf("[INF] child started: %d -> %d\n",
                       getppid(), getpid()); 
        }
        while (1) {
            pid_t wret = wait(&ret);
            if (wret == -1)
                break;
            assert(WIFEXITED(ret));
            assert(WEXITSTATUS(ret) == 0);
            printf("[INF] wait: %d -> %d\n", getpid(), wret);
        }

        exit(0);
    }

    /* parent -- the debugger */
    printf("[DEB] started inferior, pid=%d\n", pid);
    while (1) {
        pid_t waited = waitpid(-1, &ret, 0);

        if (WIFEXITED(ret)) {
            printf("[DEB] exit: pid=%d, ret=%d\n",
                   waited, WEXITSTATUS(ret));
            if (waited == pid)
                break;
        } else if (WIFSIGNALED(ret)) {
            printf("[DEB] exit: pid=%d, sig=%d (%s)\n",
                   waited, WTERMSIG(ret),
                   SIGDESC(WTERMSIG(ret)));
        } else if (WIFSTOPPED(ret)) {
            int sig = WSTOPSIG(ret);
            if (WSTOPSIG(ret) == SIGTRAP) {
                pid_t child_pid;
#if defined(__linux__)
                unsigned long msg;
                /* parenthesized: == binds tighter than | in C */
                assert(ret >> 8 == (SIGTRAP | (PTRACE_EVENT_FORK << 8)));
                ret = ptrace(PTRACE_GETEVENTMSG, waited, 0, &msg);
                assert(ret == 0);
                child_pid = msg;
#elif defined(__FreeBSD__)
                struct ptrace_lwpinfo info;
                ret = ptrace(PT_LWPINFO, waited,
                             (void*)&info, sizeof(info));
                assert(ret == 0);
                assert(info.pl_flags & PL_FLAG_FORKED);
                child_pid = info.pl_child_pid;
#else
                struct ptrace_siginfo siginfo;
                struct ptrace_state info;
                ret = ptrace(PT_GET_SIGINFO, waited,
                             &siginfo, sizeof(siginfo));
                assert(ret == 0);
                assert(siginfo.psi_siginfo.si_code == TRAP_CHLD);
                ret = ptrace(PT_GET_PROCESS_STATE, waited,
                             &info, sizeof(info));
                assert(ret == 0);
                assert(info.pe_report_event == PTRACE_FORK);
                child_pid = info.pe_other_pid;
#endif
                printf("[DEB] stopped: pid=%d, sig=%d (%s), "
                       "child_pid=%d\n",
                       waited, sig, SIGDESC(sig), child_pid);
                sig = 0;
            }
            else
                printf("[DEB] stopped: pid=%d, sig=%d (%s)\n",
                       waited, sig, SIGDESC(sig));

            /* use sig here: ret may have been overwritten above */
            if (sig == SIGSTOP) {
#if defined(__linux__)
                ret = ptrace(PTRACE_SETOPTIONS, waited,
                             0, PTRACE_O_TRACEFORK);
#else
                int event_mask = PTRACE_FORK;
                ret = ptrace(PT_SET_EVENT_MASK, waited,
                             (void*)&event_mask,
                             sizeof(event_mask));
#endif
                assert(ret == 0);
                sig = 0;
            }

            ret = ptrace(PT_CONTINUE, waited, (void*)1, sig);
            assert(ret == 0);
        }
    }

    return 0;
}

Fork support in the debugger

GDB fork support

GDB supports two models of working with forking processes: the simpler follow-fork-mode model and the more complex detach-on-fork off model.

The follow-fork-mode model assumes that the debugger traces only one process at a time and therefore requires only minimal changes to the protocol. Whenever the debugged process forks, the debugger either continues following the parent or switches to debugging the fork, detaching the other process.

The detach-on-fork off model provides the ability to debug multiple processes. Whenever the process forks, both the parent and the child are being traced. This implies that the debugger needs to provide a user interface capable of tracing multiple processes.

Our current goal was to implement the simpler follow-fork-mode model in LLDB. In this mode, the client sets a variable (called follow-fork-mode in GDB) that controls whether the user wishes to follow the parent or the child process on fork. The client and server need only minimal awareness of multiple processes, limited to the time when both processes are stopped pending the detach of either the parent or the child.

follow-fork-mode in the GDB protocol

Follow-Fork-Mode packets

Support for forks in the GDB protocol is built on three protocol extensions:

The multiprocess extension that permits passing a process identifier along with the thread identifier in various protocol packets. This makes it possible to identify multiple processes and select between them with minimal changes to the protocol. Additionally, the extension enables passing PID to the detach packet.
The fork-events extension that enables the server to report that the program has stopped due to fork(2), and to expect the client to take appropriate action.
The vfork-events extension that enables the server to report that the program has stopped due to vfork(2).

The supported extensions are reported by the client and the server during the early qSupported packet exchange. For a particular extension to be actually used, both the client and the server must report the support for it.
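The "both sides must report support" rule can be sketched as a simple check over the semicolon-separated feature lists exchanged via qSupported. The function names feature_enabled and fork_events_negotiated are hypothetical and do not correspond to actual LLDB or GDB code. The sketch assumes that a supported feature is announced as the feature name followed by '+'.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: returns 1 if the semicolon-separated list
 * `features` contains the item "<name>+", and 0 otherwise. */
int feature_enabled(const char *features, const char *name) {
    size_t name_len = strlen(name);
    const char *p = features;

    while (*p != '\0') {
        /* length of the current item, up to ';' or end of string */
        size_t item_len = strcspn(p, ";");
        if (item_len == name_len + 1 &&
            strncmp(p, name, name_len) == 0 &&
            p[name_len] == '+')
            return 1;
        p += item_len;
        if (*p == ';')
            ++p;
    }
    return 0;
}

/* Fork events are used only if both sides announce them. */
int fork_events_negotiated(const char *client_features,
                           const char *server_features) {
    return feature_enabled(client_features, "fork-events") &&
           feature_enabled(server_features, "fork-events");
}
```

Matching whole items rather than substrings matters here, since "vfork-events+" would otherwise be mistaken for "fork-events+".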

If the appropriate event type is determined to be supported and the process forks, it is stopped and the server sends an appropriate stop reason packet to the client. The client decides whether the parent process or the child process should be traced from now on, then makes appropriate preparations and finally requests detaching the other process and resuming the traced process.

Software breakpoints and forks

As a consequence of fork, the child process inherits a copy of the parent’s virtual memory. If the debugger has made any changes to the memory — for example through inserting software traps — these changes also propagate to the child process. This means that the child process' code is incorrect when run outside the debugger, and the program may misbehave or crash upon hitting the breakpoint site.

Therefore, whenever the process forks, the debugger needs to undo memory changes (i.e. remove software breakpoints) from the process that is about to be detached.

The same problem applies to vfork(2), although another solution needs to be applied since the memory is shared between the parent and the child, and so are software breakpoints. The ptrace(2) API explicitly splits vfork(2) notification into two events, called vfork and vfork-done. When the child process is first created, the vfork event is emitted and the debugger temporarily removes software breakpoints from the shared memory. When the child process executes another executable (or exits) and stops sharing memory with the parent, the vfork-done event is emitted and the debugger restores software breakpoints in the parent memory.

The following snippet presents a fragment of gdb-remote protocol exchange from a fork event:

// server reports that fork created a new process, PID 0x259365
putpkt ("$T05fork:p259365.259365;06:0*,;07:70baf*"7f0* ;10:35c5b3f7ff7f0* ;thread:p259321.259321;core:4;#5e"); [noack mode]
// client selects the child process
getpkt ("Hgp259365.259365");  [no ack sent]
putpkt ("$OK#9a"); [noack mode]
// client requests removing inherited software breakpoints
getpkt ("z0,401080,1");  [no ack sent]
putpkt ("$OK#9a"); [noack mode]
// client reselects the parent process
getpkt ("Hgp259321.259321");  [no ack sent]
putpkt ("$OK#9a"); [noack mode]
// client requests detaching the new child
getpkt ("D;259365");  [no ack sent]
Detaching from process 2462565
putpkt ("$OK#9a"); [noack mode]
// client requests resuming the parent
getpkt ("vCont;c:p259321.-1");  [no ack sent]
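
The thread identifiers in the exchange above use the multiprocess syntax p<pid>.<tid>, with both numbers in hexadecimal and -1 standing for all threads. A minimal parsing sketch follows. The function name parse_thread_id is hypothetical, and the sketch handles only the full two-part form seen in this log.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: parses "p<pid>.<tid>" with hexadecimal
 * fields.  A tid of -1 selects all threads of the process.
 * Returns 0 on success and -1 on malformed input. */
int parse_thread_id(const char *s, long *pid, long *tid) {
    char *end;

    if (*s != 'p')
        return -1;
    ++s;

    /* process identifier, terminated by '.' */
    *pid = strtol(s, &end, 16);
    if (end == s || *end != '.')
        return -1;
    s = end + 1;

    /* thread identifier, may be followed by other packet data */
    *tid = strtol(s, &end, 16);
    if (end == s)
        return -1;

    return 0;
}
```

For instance, "p259365.259365" from the stop reply parses to PID 2462565 in decimal, which matches the "Detaching from process 2462565" message in the log.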

Hardware breakpoints and watchpoints

The debug registers used to enable hardware breakpoints and watchpoints are generally not inherited by child processes (and since processes keep separate register sets, they are not shared during vfork(2)). FreeBSD used to be an exception to this, but its behavior was changed to match other platforms. Therefore, if the debugger intends to detach the newly forked child, it does not need to take any action regarding hardware breakpoints or watchpoints.

If the debugger intends to detach the parent and trace the child, it explicitly needs to disable hardware breakpoints and watchpoints in the parent process, and reenable them in the child process. Technically, this could be done via copying the debug registers and clearing them in the parent. However, since the GDB protocol encapsulates the distinction between following the parent and the child entirely in the client, the operation is done via explicit breakpoint and watchpoint packets.

lldb-server implementation

The biggest challenge in our work was that LLDB was originally designed to handle a single traced process. Our goal was to make it capable of dealing with multiple processes, at least for the limited time necessary to deal with a fork or a vfork, while limiting the necessary changes to the bare minimum.

As far as possible, we tried to split our work into small functional changes that can be reviewed and pushed separately. We also tried to split it between the three affected logical parts of LLDB, namely:

The process plugins. By design, each process plugin uses a single class instance to represent a single process. This component was mostly ready for multiprocess support — we just had to extend it to be able to create additional instances of process classes for forked processes.
The gdb-remote server. The server has been designed to hold a single process instance. We had to extend it to maintain multiple instances and be able to select between them.
The gdb-remote client. The client has been designed with little explicit process awareness. It has treated the client-server communications as a flat stream dealing with the currently selected process and thread, with explicit methods to switch between the latter. We had to enable switching processes as well. However, due to the fundamental character of our work, we could get away with limiting the multiprocess awareness to directly dealing with forks, and reselecting a single process on return.

In the end, we’ve split our work into the following logical changes:

adding support for reading thread-ids in the format defined by the multiprocess extension (to server and client)
adding support for tracing fork(2), vfork(2) and clone(2) events and handling them inside the process plugin (without exposing them to the server or client)
refactoring qSupported packet handlers to provide better framework for checking feature support (in client and server)
abstracting away the accesses to the current process in the server, to pave way for multiple processes support
adding support for owning multiple processes in the server, and being able to detach a specific process
adding a new API to indicate support for and enable extended features in the process plugin, to be used to enable reporting forks and vforks
adding new stop reasons for fork, vfork and vforkdone events
adding client support to detach a specific PID
adding initial fork/vfork handlers to the client (detaching the child process and implicitly resuming the parent)
fixing lldb-server main loop to permit defining multiple handlers for a single signal, necessary to permit multiple process instances handling SIGCHLD
enabling the process plugin to actually report fork, vfork and vforkdone events to the server — therefore enabling the handler added above
adding support for selecting a specific process (along with thread) in the server and client code
extending the fork/vfork handlers in the client to handle software breakpoints
adding target.process.follow-fork-mode control variable and extending the fork/vfork handlers to support detaching parent and tracing the child

List of relevant commits

Changes merged upstream

Changes pending review

Summary and future plans

We are really proud that we managed to design our changes in such a way that adding final features — support for dealing with software breakpoints, then following child processes — could be done entirely in the client, without having to change anything more in the server or process plugin.

We have started working on the next milestone, that is improving core dump support. Within this milestone, we are planning to:

Enhance the FreeBSD kernel through adding a new ptrace(2) request to dump core of a running process without crashing it.
Enhance LLDB to support requesting a core dump via ptrace(2) on FreeBSD and NetBSD.
Create a utility script to remove irrelevant data from core dumps, reducing their size.
Add additional tests for LLDB’s support for core dumps, along with test dumps from FreeBSD, Linux and NetBSD.
Address any core dump handling bugs that are discovered.
