
Improving C++ Builds with Split DWARF

 5 years ago
source link: https://www.tuicool.com/articles/hit/2UR7RfE

Large- and medium-sized C++ projects often suffer from long build times. We can distinguish these two scenarios:

  • Scratch build

    After pulling in the latest changes from upstream, or after a major refactoring that affects central headers, a lot of source files need to be rebuilt. This can take a long time. To a large extent this is caused by the insufficient module concept of C++: Each source includes a lot of headers, so after preprocessing, there can be thousands or even millions of lines of C++ code the compiler has to process. Therefore, each source file will take seconds to compile, and a large application can have thousands of source files. Compile clusters can speed things up by distributing the compile jobs across multiple machines.
  • Incremental build

    Most builds are incremental. You have already done a full build, then you make a small change to one file to fix a bug or add an enhancement, and you build and test it. Such an incremental build compiles only a handful of source files and then links the libraries and the application. This is much less work than a scratch make, but since you are doing it many times a day, it is even more critical to make it fast. Compared to scripting languages, the turnaround time for making a change and testing it is quite long for C++.

In this article we will be discussing a great way of speeding up incremental builds which also benefits scratch makes. The goal is to increase developer productivity: Shorter turnaround times allow for quicker iterations. You don’t have to switch to other tasks while the build runs, but can keep your focus on the problem at hand.

Prerequisites

  • Your incremental build should be “minimal”: No unneeded steps should be performed when invoking the build. A repeated build without any changes should not perform any actions at all. Sometimes this is not easy to achieve, and if the redundant actions only take a short time it is tolerable. But a large overhead here means your developers are already wasting time.
  • You should have a somewhat recent toolchain supporting split DWARF (introduced below) across the board. The versions below are the absolute minimum. More recent versions are recommended, especially for gdb.
    gcc >= 4.8
    clang >= 3.3
    gdb >= 7.7
    binutils >= 2.24
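
A quick way to check a machine against these minimums is sketched below. The function names version_at_least and check_tool are made up for this illustration, and the sketch assumes a sort that supports -V (GNU coreutils does). The way the gdb version is extracted from the first line of gdb --version is also an assumption; distributions format that line differently.

```shell
#!/bin/sh
# version_at_least HAVE WANT: succeeds if version HAVE is >= WANT.
# Relies on `sort -V` (version sort) to order the two strings.
version_at_least() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# check_tool LABEL HAVE WANT: print a one-line verdict for a tool.
check_tool() {
    if version_at_least "$2" "$3"; then
        echo "$1 $2: ok (>= $3)"
    else
        echo "$1 $2: too old (need >= $3)"
    fi
}

# Only query tools that are actually installed.
if command -v gcc >/dev/null 2>&1; then
    check_tool gcc "$(gcc -dumpversion)" 4.8
fi
if command -v gdb >/dev/null 2>&1; then
    # Assumption: the version is the last word of the first line.
    check_tool gdb "$(gdb --version | head -n 1 | awk '{print $NF}')" 7.7
fi
```

The same check_tool call can be repeated for clang and binutils once you know how your installation prints their versions.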
    

What Makes Incremental Builds Slow?

For an incremental build, only a few source files have to be recompiled. Most of the time is spent linking the application. And here we can find lots of overhead:

  • The libraries or executables containing the changed source files need to be rebuilt. This means creating them from scratch. All the contained objects need to be read again, even if unchanged, then processed by the linker, and the new binary must be written to disk.¹
  • All other binaries which are depending on binaries that were rebuilt must also be relinked. Although a smarter approach seems possible, in most build systems this means recreating these binaries from scratch.

Note that the linker also needs to process all debug information contained in the object files. Duplicate information gets removed, and the merged debug information is written to the generated binary. It gets duplicated on disk, since it is already contained in the object files. And debug information tends to be very large:

In a large C++ application compiled with -O2 and -g, the debug information accounts for 87% of the total size of the object files sent as inputs to the link step, and 84% of the total size of the output binary.

So a large bottleneck for an incremental build is processing of debug info. Ironically, debug info is most important when analyzing and fixing bugs, during which you are doing lots of incremental builds! For release builds without debug info, linking can be surprisingly fast, and sometimes developers working on large projects use them as a last resort.

Introducing Split DWARF

Linking, and therefore incremental builds, could be much faster if the linker didn’t have to process all the debug information. Split DWARF² makes this possible: It generates a separate file for the debug info which the linker can ignore. This file has the suffix .dwo (DWARF object file). DWARF is a debugging file format generally used on Unix, and it is the default on most Linux distributions. The only special thing here is that the DWARF info is split from the code.

The binaries generated by the linker will not contain debug information, but references to the .dwo files that are already on disk. Let’s examine how this works in detail:

#include <iostream>
 
int main()
{
  int a = 1;
  std::cout << "Split DWARF test" << std::endl;
 
  return 0;
}

We compile this simple program in two ways, with and without split DWARF. First, compiling with debug information only (-g):

$ g++ -c -g main.cpp -o main.o
$ g++ main.o -o app

Now we also enable split DWARF by adding -gsplit-dwarf to the compiler invocation:

$ g++ -c -g -gsplit-dwarf main.cpp -o main_splitdwarf.o
$ g++ main_splitdwarf.o -o app_splitdwarf

The program is not interesting here, but let’s take a look at the files generated:

-rwxrwxr-x 1 prodcpp prodcpp 20256 Oct  7 23:39 app*
-rwxrwxr-x 1 prodcpp prodcpp 12728 Oct  7 23:39 app_splitdwarf*
-rw-r--r-- 1 prodcpp prodcpp   110 Oct  7 22:36 main.cpp
-rw-rw-r-- 1 prodcpp prodcpp 22112 Oct  7 23:39 main.o
-rw-rw-r-- 1 prodcpp prodcpp 12296 Oct  7 23:39 main_splitdwarf.dwo
-rw-rw-r-- 1 prodcpp prodcpp  6968 Oct  7 23:39 main_splitdwarf.o

No surprises for the regular build, which produces main.o and app. The split DWARF compilation creates two files, main_splitdwarf.o and main_splitdwarf.dwo. app_splitdwarf takes up only 12728 bytes, in contrast to app, which is 20256 bytes. The reason is that it references the debug info, instead of containing it:

$ readelf -wi app_splitdwarf | grep dwo
    <20>   DW_AT_GNU_dwo_name: (indirect string, offset: 0x0): main_splitdwarf.dwo

That reference is already present in the object file, so all the linker had to do with regard to debugging information was copy that reference:

$ readelf -wi main_splitdwarf.o | grep dwo
    <20>   DW_AT_GNU_dwo_name: (indirect string, offset: 0x0): main_splitdwarf.dwo
    <2c>   DW_AT_GNU_dwo_id  : 0xae0d75cbd6671bc1

This also means you need to keep the .dwo files as long as you want to debug your application.
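
To find out whether all .dwo files a binary refers to are still around, the readelf output shown above can be filtered. The following is a minimal sketch: dwo_names and missing_dwo are hypothetical helper names, the lookup directory is passed in by the caller, and names containing spaces are not handled. Besides the DW_AT_GNU_dwo_name attribute seen above, the pattern also accepts the DWARF 5 spelling DW_AT_dwo_name.

```shell
#!/bin/sh
# dwo_names: read `readelf -wi` output on stdin and print each
# referenced .dwo name (the last word of a dwo_name attribute line).
dwo_names() {
    awk '/DW_AT_(GNU_)?dwo_name/ { print $NF }'
}

# missing_dwo DIR: read .dwo names on stdin and print those that do
# not exist. Relative names are looked up under DIR.
missing_dwo() {
    dir=$1
    while IFS= read -r name; do
        case $name in
            /*) path=$name ;;
            *)  path=$dir/$name ;;
        esac
        [ -f "$path" ] || echo "$name"
    done
}

# Usage (run in the directory where the objects were compiled):
#   readelf -wi app_splitdwarf | dwo_names | missing_dwo .
```

An empty result means every referenced .dwo file was found in the given directory.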

Although I couldn’t get gdb to trace loading of .dwo files, you can see via strace that it pulls them in:

$ strace -o log gdb --batch-silent --eval-command=quit app_splitdwarf
$ grep dwo log
stat("/projects/prodcpp/splitdwarf/main_splitdwarf.dwo", {st_mode=S_IFREG|0664, st_size=12256, ...}) = 0
open("/projects/prodcpp/splitdwarf/main_splitdwarf.dwo", O_RDONLY|O_CLOEXEC) = 8
lstat("/projects/prodcpp/splitdwarf/main_splitdwarf.dwo", {st_mode=S_IFREG|0664, st_size=12256, ...}) = 0

A Real-Life Example: llvm

In the previous toy example, gains are minimal and speedup for incremental builds would be non-existent since we only have one source file. So let’s take a look at a real application and perform some measurements to gauge the benefits of the split DWARF approach.

We will be building llvm 7.0.0 with and without split DWARF. llvm in its latest incarnation is a rather large C++ project, clocking in at 22838 C/C++ files. On top of that, the clang compiler is linked statically against the llvm libraries, so a lot of work has to be redone even if only one file changes.

First, let’s do a scratch build. I’m using a clone of the git monorepo with the tag RELEASE_700/final checked out. The root of the cmake project is in the llvm directory. To also build all the other projects, I have symlinked them to the root as follows:

$ pwd
/h/sources/llvm-project-20170507
$ cd llvm/tools
$ ln -s ../../lld lld
$ ln -s ../../lldb lldb
$ ln -s ../../clang clang
$ cd ../projects
$ ln -s ../../compiler-rt compiler-rt

We start with the defaults, which give a Debug build without split DWARF.

$ mkdir llvm
$ cd llvm
$ cmake /h/sources/llvm-project-20170507/llvm/
$ /usr/bin/time -v make -j 80
        Percent of CPU this job got: 5072%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 13:00.54
        ...
        Maximum resident set size (kbytes): 11222228
$ du -shL .
55G

Now, with split DWARF:

$ mkdir llvm_sd
$ cd llvm_sd
$ cmake /h/sources/llvm-project-20170507/llvm/ -DLLVM_USE_SPLIT_DWARF=ON 
$ /usr/bin/time -v make -j 80
        Percent of CPU this job got: 5939%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 11:01.42
        ...
        Maximum resident set size (kbytes): 4940236
$ du -shL .
36G

Let’s look at the numbers:

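                            without split DWARF   with split DWARF
Elapsed (wall clock) time   13:00.54              11:01.42
Maximum resident set size   11222228 kB           4940236 kB
Build directory size        55G                   36G (including .dwo files)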

These improvements are nice, considering the low effort needed to obtain them. But what about an incremental build? Let’s change one file, and then rebuild clang. First, without split DWARF:

$ echo "//" >>./llvm/lib/Target/X86/X86FlagsCopyLowering.cpp
$ /usr/bin/time -v make -j 80 clang
        Percent of CPU this job got: 150%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 3:16.90
        ...
        Maximum resident set size (kbytes): 11222232        

With split DWARF:

$ echo "//" >>./llvm/lib/Target/X86/X86FlagsCopyLowering.cpp
$ /usr/bin/time -v make -j 80 clang
        Percent of CPU this job got: 195%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 1:42.74
        ...
        Maximum resident set size (kbytes): 4940236

Elapsed time drops from 3:16.90 to 1:42.74, a speedup of roughly 1.9x (about 48% less wall-clock time). Resident set size shows the same behavior as above, which makes sense considering that clang is probably the largest executable in llvm, linking in all needed llvm libraries statically.
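
If you want to run the same comparison on your own project, the arithmetic on the elapsed times reported by /usr/bin/time -v can be scripted. This is only a sketch: to_seconds and compare_times are names invented here, and the input is assumed to be in the h:mm:ss or m:ss format shown in the transcripts.

```shell
#!/bin/sh
# to_seconds TIME: convert "h:mm:ss" or "m:ss.xx" to seconds.
to_seconds() {
    echo "$1" | awk -F: '{
        s = 0
        for (i = 1; i <= NF; i++) s = s * 60 + $i
        printf "%.2f\n", s
    }'
}

# compare_times BEFORE AFTER: print the speedup factor and the
# relative reduction in wall-clock time.
compare_times() {
    awk -v a="$(to_seconds "$1")" -v b="$(to_seconds "$2")" 'BEGIN {
        printf "%.2fx faster, %.0f%% less time\n", a / b, (1 - b / a) * 100
    }'
}

# Usage with the incremental build times measured above:
#   compare_times 3:16.90 1:42.74
```

For the incremental build this prints "1.92x faster, 48% less time". For the scratch build (13:00.54 versus 11:01.42) it prints "1.18x faster, 15% less time".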

All in all, split DWARF is a huge win for development workflows. At the cost of adding a flag, you get significant improvements for everybody building the code base.

Packaging a Release from a Split DWARF Build

While split DWARF is great for developers, it doesn’t come in so handy for building a release that needs to work on another machine. The debug info is spread over many files, and the dwo references stored in the binaries will expose all your source file names and hierarchy. To solve this, a new tool called dwp was added to binutils. It operates on an executable or shared library and produces a .dwp file with all relevant info to debug that file. gdb in turn will look for dwp files and load debug info from them.

Continuing our example:

$ dwp -e app_splitdwarf
$ ll app_*
-rwxrwxr-x 1 prodcpp prodcpp 20256 Oct  7 23:39 app*
-rwxrwxr-x 1 prodcpp prodcpp 12728 Oct  7 23:39 app_splitdwarf*
-rw-rw-r-- 1 prodcpp prodcpp 12440 Oct  7 23:41 app_splitdwarf.dwp

We now have a new file app_splitdwarf.dwp containing all debug info we need. We can now delete the .dwo file. Let’s verify that debugging still works afterwards:

$ rm *dwo
$ gdb app_splitdwarf
GNU gdb (Ubuntu 7.11.1-0ubuntu1~16.5) 7.11.1
...
Reading symbols from app_splitdwarf...done.
(gdb) b main
Breakpoint 1 at 0x40084e: file main.cpp, line 5.
(gdb) r
Starting program: app_splitdwarf
 
Breakpoint 1, main () at main.cpp:5
5         int a = 1;
(gdb) p a
$1 = 0

The variable can be printed, so debug information is available. Without the .dwp file you will get a warning as follows:

$ rm *dwp
$ gdb app_splitdwarf
GNU gdb (Ubuntu 7.11.1-0ubuntu1~16.5) 7.11.1
...
Reading symbols from app_splitdwarf...
warning: Could not find DWO CU main_splitdwarf.dwo(0xae0d75cbd6671bc1) referenced by CU at offset 0x0 [in module app_splitdwarf]
done.

You will get the same warning when removing the .dwo file (provided there is no .dwp file either).
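
Since the binary and its .dwp file belong together, a release step can refuse to continue when the package is missing. Here is a minimal sketch of such a step. The function name stage_release and the idea of a staging directory are assumptions made for this illustration; the only thing taken from the example above is that dwp -e app_splitdwarf writes app_splitdwarf.dwp next to the binary.

```shell
#!/bin/sh
# stage_release BINARY DEST: copy BINARY together with BINARY.dwp
# into DEST. Fails if the .dwp file has not been produced yet.
stage_release() {
    binary=$1
    dest=$2
    if [ ! -f "$binary.dwp" ]; then
        echo "missing $binary.dwp; run: dwp -e $binary" >&2
        return 1
    fi
    mkdir -p "$dest" &&
    cp "$binary" "$binary.dwp" "$dest/"
}

# Usage:
#   dwp -e app_splitdwarf
#   stage_release app_splitdwarf release/
```

Keeping both files in the same directory matches the layout used in the gdb session above.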

That wraps up our discussion of split DWARF. Hopefully you can make use of it in your projects and reduce your build times!

Limitations

Split DWARF is not used by that many projects, so some friction with tooling is possible. If you encounter any problems, please let me know in the comments.

  • icecream supports split DWARF, distcc doesn’t. I have tested neither. In a build cluster you need to ship two files as a result of the compilation. Other than that, only one piece of information needs to be adjusted: the reference to the .dwo file encoded in the object file. To make it fit the node running the build, the compiler options -fdebug-prefix-map (gcc) or -fdebug-compilation-dir (clang) can be used.
  • clang 7.0.0 recently implemented partial support for DWARF5, and that implementation does not support split DWARF yet. But DWARF5 is not the default.

Notes

¹ Incremental linking is another solution to this problem, but not discussed here. In my experience it does not work as reliably as the split DWARF approach.

² Debug Fission is another name for this technique.

References

DebugFission – DWARF Extensions for Separate Debug Information Files

DWP tool

