"Rust does not have a stable ABI"

source link: https://people.gnome.org/~federico/blog/rust-stable-abi.html

Translations: es

Wednesday 12/August/2020 - Tags: gnome, rust

I've seen GNOME people (often, people who have been working for a long time on C libraries) express concerns along the following lines:

  1. Compiled Rust code doesn't have a stable ABI (application binary interface).
  2. So, we can't have shared libraries in the traditional fashion of Linux distributions.
  3. Also Rust bundles its entire standard library with every binary it compiles, which makes Rust-built libraries huge.

These are extremely valid concerns to be addressed by people like myself who propose that chunks of infrastructural libraries should be done in Rust.

So, let's begin.

The first part of this article is a super-quick introduction to shared libraries and how Linux distributions use them. If you already know those things, feel free to skip to the "Rust does not have a stable ABI" section.

How do distributions use shared libraries?

If several programs run at the same time and use the same shared library (say, libgtk-3.so), the operating system can load a single copy of the library in memory and share the read-only parts of the code/data through the magic of virtual memory.

In theory, if a library gets a bugfix but does not change its interface, one can just recompile the library, stick the new .so in /usr/lib or whatever, and be done with it. Programs that depend on the library do not need to be recompiled.

If libraries limit their public interface to a plain C ABI (application binary interface), they are relatively easy to consume from other programming languages. Those languages don't have to deal with name mangling of C++ symbols, exception handlers, constructors, and all that complexity. Pretty much every language has some form of C FFI (foreign function interface), which roughly means "call C functions without too much trouble".
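From Rust's side, "call C functions without too much trouble" looks roughly like the following. This is a minimal sketch. It assumes a platform such as Linux, where the C library (and thus `strlen`) is already linked into every Rust program. `c_strlen` is a hypothetical wrapper name.

```rust
use std::ffi::CString;
use std::os::raw::c_char;

// Declaration of strlen() from the C standard library.
// `unsafe extern` is accepted from Rust 1.82 and required in the 2024 edition;
// older compilers only accept plain `extern`.
unsafe extern "C" {
    fn strlen(s: *const c_char) -> usize;
}

// Safe Rust wrapper around the C function.
pub fn c_strlen(text: &str) -> usize {
    // C wants a NUL-terminated string; this panics if `text` contains a NUL byte.
    let c = CString::new(text).expect("string must not contain NUL bytes");
    // SAFETY: `c` is a valid NUL-terminated string that outlives the call.
    unsafe { strlen(c.as_ptr()) }
}

fn main() {
    println!("{}", c_strlen("libjpeg"));
}
```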

For the purposes of a library, what's an ABI? Wikipedia says, "An ABI defines how data structures or computational routines are accessed in machine code [...] A common aspect of an ABI is the calling convention", which means that to call a function in machine code you need to frob the call and stack pointers, pass some function arguments in registers or push some others to the stack, etc. Really low-level stuff. Each machine architecture or operating system usually defines a standard C ABI.

For libraries, we commonly understand an ABI to mean the machine-code implications of their programming interface. Which functions are available as public symbols in the .so file? To which numeric values do C enum values correspond, so that they can be passed to those functions? What is the exact order and type of arguments that the functions take? What are the struct sizes, and the order and types and padding of the fields that those functions take? Does one pass arguments in CPU registers or on the stack? Does the caller or the callee clean up the stack after a function call?
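Two of those questions can be pinned down explicitly on the Rust side: which integers the enum values correspond to, and how big the enum is. This is a minimal sketch. `FillRule` and its helper functions are hypothetical names, not an API from any real library.

```rust
use std::mem::size_of;
use std::os::raw::c_int;

// Hypothetical enum shared with C code. #[repr(C)] gives it the same size as
// the equivalent C `enum` on this target, and the explicit discriminants fix
// which integers cross the ABI.
#[repr(C)]
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum FillRule {
    NonZero = 0,
    EvenOdd = 1,
}

// The integer a C caller sees for each variant.
pub fn fill_rule_to_raw(rule: FillRule) -> c_int {
    rule as c_int
}

// Validates an integer coming from C. Receiving a value that matches no
// variant directly as a FillRule would be undefined behavior, so a library
// can take a plain integer at the boundary and check it.
pub fn fill_rule_from_raw(value: c_int) -> Option<FillRule> {
    match value {
        0 => Some(FillRule::NonZero),
        1 => Some(FillRule::EvenOdd),
        _ => None,
    }
}

fn main() {
    println!("EvenOdd is {}", fill_rule_to_raw(FillRule::EvenOdd));
    println!("FillRule is {} bytes", size_of::<FillRule>());
    println!("{:?}", fill_rule_from_raw(7));
}
```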

Bug fixes and security fixes

Linux distributions generally try really hard to have a single version of each shared library installed in the system: a single libjpeg.so, a single libpng.so, a single libc.so, etc.

This is helpful when there needs to be an update to fix a bug, security-related or not: users can just download the updated package for the library, which when installed will just stick in a new .so in the right place, and the calling software won't need to be updated.

This is possible only if the fix really only changes the internal code without changing behavior or interface. If a bug fix requires part of the public API or ABI to change, then you are screwed; all calling software needs to be recompiled. "Irresponsible" library authors either learn really fast when distros complain loudly about this sort of change, or they don't learn and get forever marked by distros as "that irresponsible library" which always requires special handling in order not to break other software.
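To make "the ABI changed" concrete, here is a minimal sketch with a hypothetical public struct (`OptionsV1` and `OptionsV2` are illustrative names). Adding a field can keep C source code compiling. It still changes the struct's size, and every already-compiled caller has that size baked in.

```rust
use std::mem::size_of;

// Version 1 of a hypothetical public struct that callers embed by value.
#[allow(dead_code)]
#[repr(C)]
pub struct OptionsV1 {
    pub width: u32,
    pub height: u32,
}

// Version 2 adds a field. C code that only sets `width` and `height` still
// compiles, but binaries built against version 1 have the old size and field
// offsets baked in, so they reserve too little memory for this struct.
#[allow(dead_code)]
#[repr(C)]
pub struct OptionsV2 {
    pub width: u32,
    pub height: u32,
    pub dpi: f64,
}

fn main() {
    println!("v1: {} bytes, v2: {} bytes", size_of::<OptionsV1>(), size_of::<OptionsV2>());
}
```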

Sidenote: sometimes it's more complicated. Poppler (the PDF rendering library) ships at least two stable APIs, one GLib-based in C, and one Qt-based in C++. However, some software like texlive uses Poppler's internals library directly, which of course does not have a stable API, and thus texlive breaks frequently as Poppler evolves. Someone should extend the public, stable API so that texlive doesn't have to use the library's internals!

Bundled libraries

Sometimes the problem is not irresponsible library authors. Rather, people who use a library find out that its behavior changes subtly over time, maybe without breaking the API or ABI, and that they are better off bundling a specific version of the library with their software. That version is what they test their software against, and they try to learn its quirks.

Distros inevitably complain about this, and either patch the calling software by hand to force it to use the system's shared library, or succeed in getting patches accepted by the software so that they have a --use-system-libjpeg option or similar.

This doesn't work very well if the bundled version of the library has extra patches that are not in a distro's usual patches. Or vice-versa; it may actually work better to use the distro's version of the library, if it has extra fixes that the bundled library doesn't. Who knows! It's a case-by-case situation.

Rust does not have a stable ABI

Indeed it doesn't, by default, because the compiler team wants the freedom to change the data layout and Rust-to-Rust calling conventions at any time, often for performance reasons. For example, it is not guaranteed that struct fields will be laid out in memory in the same order as they are written in the code:

struct Foo {
    bar: bool,
    baz: f64,
    beep: bool,
    qux: i32,
}

The compiler is free to rearrange the struct fields in memory as it sees fit. Maybe it decides to put the two bool fields next to each other to save on inter-field padding due to alignment requirements; maybe it does static analysis or profile-guided optimizations and picks an optimal ordering.
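A quick way to observe this is to compare the size of the struct as written above with a `#[repr(C)]` copy of it, renamed `FooC` here so both fit in one file. This is a sketch. The `repr(C)` layout is guaranteed to follow C's rules (24 bytes on x86-64). The default-layout size is whatever the current compiler picks. With recent compilers on x86-64 it tends to be 16, because reordering the fields removes most of the padding.

```rust
use std::mem::size_of;

// The struct exactly as written above: default (Rust) representation.
#[allow(dead_code)]
struct Foo {
    bar: bool,
    baz: f64,
    beep: bool,
    qux: i32,
}

// Same fields, C representation: declaration order and C padding rules.
#[allow(dead_code)]
#[repr(C)]
struct FooC {
    bar: bool,
    baz: f64,
    beep: bool,
    qux: i32,
}

fn main() {
    // Only the second number is pinned down by a layout guarantee.
    println!("default layout: {} bytes", size_of::<Foo>());
    println!("repr(C) layout: {} bytes", size_of::<FooC>());
}
```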

But we can override this! Let's look at data layout first, and then calling conventions.

Data layout for C versus Rust

The following is the same struct as above, but with an extra #[repr(C)] attribute:

#[repr(C)]
struct Foo {
    bar: bool,
    baz: f64,
    beep: bool,
    qux: i32,
}

With that attribute, the struct will be laid out just as this C struct:

#include <stdbool.h>
#include <stdint.h>

struct Foo {
    bool bar;
    double baz;
    bool beep;
    int32_t qux;
};
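To check that the two layouts really agree, one can measure the field offsets on the Rust side and compare them with what `offsetof()` reports in C. This minimal sketch measures addresses on a real instance. Rust 1.77 and later also provide `std::mem::offset_of!` for this. The expected numbers in the comment assume x86-64 Linux.

```rust
// The C-compatible struct from above.
#[allow(dead_code)]
#[repr(C)]
pub struct Foo {
    pub bar: bool,
    pub baz: f64,
    pub beep: bool,
    pub qux: i32,
}

// Byte offset of each field, measured on a real instance.
pub fn field_offsets(f: &Foo) -> [usize; 4] {
    let base = f as *const Foo as usize;
    [
        &f.bar as *const bool as usize - base,
        &f.baz as *const f64 as usize - base,
        &f.beep as *const bool as usize - base,
        &f.qux as *const i32 as usize - base,
    ]
}

fn main() {
    let f = Foo { bar: true, baz: 1.0, beep: false, qux: 42 };
    // On x86-64 Linux this should print [0, 8, 16, 20] and 24,
    // the same numbers offsetof() and sizeof() give for the C struct.
    println!("{:?} {}", field_offsets(&f), std::mem::size_of::<Foo>());
}
```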

(Aside: it is unfortunate that gboolean is not bool, but that's because gboolean predates C99, and clearly standards from 20 years ago are too new to use. (Aside aside: since I wrote that other post, Rust's repr(C) for bool is actually defined as C99's bool; it's no longer undefined.))
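gboolean is a C `int`, while Rust's `bool` (like C99's `bool`) is a single byte, so bindings have to convert at the boundary. This is a minimal sketch with a hypothetical local alias `Gboolean` that mirrors GLib's `typedef gint gboolean`. In practice the glib-sys and glib-rs crates provide their own definitions and conversions.

```rust
use std::os::raw::c_int;

// Hypothetical local alias mirroring GLib's `typedef gint gboolean;`.
pub type Gboolean = c_int;

// Rust bool to gboolean: GLib's TRUE is 1 and FALSE is 0.
pub fn to_gboolean(b: bool) -> Gboolean {
    b as Gboolean
}

// gboolean to Rust bool: like C, treat any nonzero value as true,
// because C code does not always pass exactly TRUE.
pub fn from_gboolean(g: Gboolean) -> bool {
    g != 0
}

fn main() {
    println!(
        "bool: {} byte(s), gboolean: {} bytes",
        std::mem::size_of::<bool>(),
        std::mem::size_of::<Gboolean>()
    );
    println!("{}", from_gboolean(to_gboolean(true)));
}
```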

Even Rust's data-carrying enums can be laid out in a manner friendly to C and C++:

#[repr(C, u8)]
enum MyEnum {
    A(u32),
    B(f32, bool),
}

This means, use C layout, and a u8 for the enum's discriminant. It will be laid out like this:

#include <stdbool.h>
#include <stdint.h>

enum MyEnumTag {
        A,
        B
};

typedef uint32_t MyEnumPayloadA;

typedef struct {
        float x;
        bool y;
} MyEnumPayloadB;

typedef union {
        MyEnumPayloadA a;
        MyEnumPayloadB b;
} MyEnumPayload;

typedef struct {
        uint8_t tag;
        MyEnumPayload payload;
} MyEnum;
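The equivalence can be checked from Rust itself. The sketch below mirrors the C declarations as Rust `#[repr(C)]` types and reads a `MyEnum` through them, the way C code would. It relies on the layout guarantee for `#[repr(C, u8)]` enums from RFC 2195. `read_as_c` is a hypothetical helper name.

```rust
use std::mem::size_of;

#[allow(dead_code)]
#[repr(C, u8)]
pub enum MyEnum {
    A(u32),
    B(f32, bool),
}

// Rust mirrors of the C declarations above.
#[repr(C)]
#[derive(Clone, Copy)]
pub struct MyEnumPayloadB {
    pub x: f32,
    pub y: bool,
}

#[allow(dead_code)]
#[repr(C)]
pub union MyEnumPayload {
    pub a: u32,
    pub b: MyEnumPayloadB,
}

#[repr(C)]
pub struct MyEnumRepr {
    pub tag: u8,
    pub payload: MyEnumPayload,
}

// Reads a MyEnum the way C code would: check the tag, then the union member.
// Returns the tag and, for variant B (tag 1), its fields.
pub fn read_as_c(e: &MyEnum) -> (u8, Option<(f32, bool)>) {
    // SAFETY: #[repr(C, u8)] gives MyEnum the same layout as MyEnumRepr.
    let r = unsafe { &*(e as *const MyEnum as *const MyEnumRepr) };
    if r.tag == 1 {
        // SAFETY: tag 1 means the `b` member of the union is the active one.
        let b = unsafe { r.payload.b };
        (r.tag, Some((b.x, b.y)))
    } else {
        (r.tag, None)
    }
}

fn main() {
    println!("{} {}", size_of::<MyEnum>(), size_of::<MyEnumRepr>());
    println!("{:?}", read_as_c(&MyEnum::A(7)));
    println!("{:?}", read_as_c(&MyEnum::B(1.5, true)));
}
```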

The gory details of data layout are in the Alternative Representations section of the Rustonomicon and the Unsafe Code Guidelines.

Calling conventions

An ABI's calling conventions detail things like how to call functions in machine code, and how to lay out function arguments in registers or on the stack. The Wikipedia page on x86 calling conventions has a good cheat-sheet, useful when you are looking at assembly code and registers in a low-level debugger.
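In Rust the calling convention is part of a function's type, which makes the distinction visible in code. This is a minimal sketch with hypothetical names. `add_c` uses the platform's C calling convention, so it can be passed where a C-style callback of type `extern "C" fn` is expected. The plain Rust function `add_rust` cannot.

```rust
// Hypothetical function compiled with the platform's C calling convention.
pub extern "C" fn add_c(a: i32, b: i32) -> i32 {
    a + b
}

// Plain Rust function: uses the unspecified Rust calling convention.
pub fn add_rust(a: i32, b: i32) -> i32 {
    a + b
}

// Accepts only C-ABI function pointers, like a callback slot in a C library.
pub fn apply(f: extern "C" fn(i32, i32) -> i32, a: i32, b: i32) -> i32 {
    f(a, b)
}

fn main() {
    println!("{}", apply(add_c, 2, 3));
    // `apply(add_rust, 2, 3)` would not compile: `fn(i32, i32) -> i32` and
    // `extern "C" fn(i32, i32) -> i32` are different types.
    println!("{}", add_rust(2, 3));
}
```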

I've already written about how it is possible to write Rust code to export functions callable from C; one uses extern "C" in the function definition and a #[no_mangle] attribute to keep the symbol name pristine. This is how librsvg is able to have the following:

#[no_mangle]
pub unsafe extern "C" fn rsvg_handle_new_from_file(
    filename: *const libc::c_char,
    error: *mut *mut glib_sys::GError,
) -> *const RsvgHandle {
    // ...
}

Which compiles to what a C compiler would produce for this:

RsvgHandle *rsvg_handle_new_from_file (const gchar *filename, GError **error);

(Aside: librsvg still uses an intermediate C library full of stubs that just call the Rust-exported functions, but there is now tooling to produce a .so directly from Rust which I just haven't had time to investigate. Help is appreciated!)
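For a self-contained look at the same pattern, here is a minimal sketch of a C-callable Rust function. `mylib_string_length` is a hypothetical name, not part of librsvg. The `#[no_mangle]` attribute is left out so the sketch compiles unchanged on every Rust edition (the 2024 edition spells it `#[unsafe(no_mangle)]`).

```rust
use std::ffi::{CStr, CString};
use std::os::raw::{c_char, c_int};

// Hypothetical C-callable function. Its C prototype would be:
//   int mylib_string_length (const char *s);
// A real library would also mark it no_mangle so the exported symbol
// keeps exactly this name.
// Returns the length in bytes of a NUL-terminated string, or -1 for NULL.
pub unsafe extern "C" fn mylib_string_length(s: *const c_char) -> c_int {
    if s.is_null() {
        return -1;
    }
    // SAFETY: the caller promises `s` points to a valid NUL-terminated string.
    let bytes = unsafe { CStr::from_ptr(s) }.to_bytes();
    bytes.len() as c_int
}

fn main() {
    let s = CString::new("librsvg").unwrap();
    // Called from Rust the same way C would call it: with a raw pointer.
    println!("{}", unsafe { mylib_string_length(s.as_ptr()) });
}
```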

Summary of ABI so far

Whether to export a stable C ABI from a Rust library is the library author's decision. There is some awkwardness in how types are laid out in C, because the Rust type system is richer, but things can be made to work well with a little thought. It certainly takes no more thought than designing and maintaining a stable API/ABI in plain C does.
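One common way to keep a C ABI stable while the Rust types behind it keep changing is an opaque handle. C only ever sees a pointer, and all access goes through functions. This is a minimal sketch. `Counter` and the `mylib_counter_*` functions are hypothetical.

```rust
// Hypothetical library type. The C header would only say
//   typedef struct Counter Counter;
// so its fields can change between releases without affecting C callers.
pub struct Counter {
    count: u64,
}

// C: Counter *mylib_counter_new (void);
pub extern "C" fn mylib_counter_new() -> *mut Counter {
    Box::into_raw(Box::new(Counter { count: 0 }))
}

// C: uint64_t mylib_counter_increment (Counter *counter);
pub unsafe extern "C" fn mylib_counter_increment(counter: *mut Counter) -> u64 {
    // SAFETY: the caller passes a non-NULL pointer from mylib_counter_new().
    let counter = unsafe { &mut *counter };
    counter.count += 1;
    counter.count
}

// C: void mylib_counter_free (Counter *counter);
pub unsafe extern "C" fn mylib_counter_free(counter: *mut Counter) {
    if !counter.is_null() {
        // SAFETY: reclaims the Box that mylib_counter_new() handed to C.
        drop(unsafe { Box::from_raw(counter) });
    }
}

fn main() {
    let c = mylib_counter_new();
    unsafe {
        mylib_counter_increment(c);
        println!("{}", mylib_counter_increment(c));
        mylib_counter_free(c);
    }
}
```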

I'll fold the second concern into here — "we can't have shared libraries in traditional distro fashion". Yes, we can, API/ABI-wise, but read on.

Rust bundles its entire standard library with Rust-built .so's

That is, it statically links all the Rust dependencies. This produces a large .so:

  • librsvg-2.so (version 2.40.21, C only) - 1408840 bytes
  • librsvg-2.so (version 2.49.3, Rust only) - 9899120 bytes

Holy crap! What's all that?

(And I'm cheating: this is both with link-time optimization turned on, and by running strip(1) on the .so. If you just autogen.sh && make it will be bigger.)

This has Rust's standard library statically linked (or at least the bits of it that librsvg actually uses), plus all the Rust dependencies (cssparser, selectors, nalgebra, glib-rs, cairo-rs, locale_config, rayon, xml5ever, and an assload of crates). I could explain why each one is needed:

  • cssparser - librsvg needs to parse CSS.
  • selectors - librsvg needs to run the CSS selector matching algorithm.
  • nalgebra - the code for SVG filter effects uses vectors and matrices.
  • glib-rs, cairo-rs - draw to Cairo and export GObject types.
  • locale_config - so that localized SVG files can work.
  • rayon - so filters can use all your CPU cores instead of processing one pixel at a time.
  • Etcetera. SVG is big and requires a lot of helper code!

Is this a problem?

Or more exactly, why does this happen, and why do people perceive it as a problem?

Stable APIs/ABIs and distros

Many Linux distributions have worked really hard to ensure that there is a single copy of "system libraries" in an installation. There is Just One Copy of /usr/lib/libc.so, /usr/lib/libjpeg.so, etc., and packages are compiled with special options to tell them to really use the system libraries instead of their bundled versions, or patched to do so if they don't provide build-time options for that.

In a way, this works well for distros:

  • A bug in a library can be fixed in a single place, and all applications that use it get the fix automatically.

  • A security bug can be patched in a single place, and in theory applications don't need to be audited further.

If you maintain a library that is shipped in Linux distros, and you break the ABI, you'll get complaints from distros very quickly.

This is good because it creates responsible maintainers for libraries that can be depended on. It's how Inkscape/GIMP can have a stable toolkit to be written in.

This is bad because it encourages stagnation in the long term. It's how we get a horrible, unsafe, error-prone API in libjpeg that can never ever be improved because it would require changes in tons of software; it's why gboolean is still a 32-bit int after twenty-something years, even though everything else close to C has decided that booleans are 1 byte. It's how Inkscape/GIMP take many years to move from GTK2 to GTK3 (okay, that's lack of paid developers to do the grunt work, but it is enabled by having forever-stable APIs).

However, a long-term stable API/ABI has a lot of value. It is why the Windows API is the crown jewels; it is why people can rely on glib and glibc to not break their code for many years and take them for granted.

But we only have a single stable ABI anyway

And that is the C ABI. Even C++ libraries have trouble with this, and people sometimes write the internals of a library in C++ for convenience, but export a stable C API/ABI from it.

High level languages like Python have real trouble calling C++ code precisely because of ABI issues.

Actually, in GNOME we have gone further than that

In GNOME we have constructed a sweet little universe where GObject Introspection is basically a C ABI with a ton of machine-generated annotations to make it friendly to language bindings.

Still, we rely on a C ABI underneath. See this exploratory twitter thread on advancing the C ABI from Rust for lots of food for thought.

Single copies of libraries with a C ABI

Okay, let's go back to this. What price do we pay for single copies of libraries that, by necessity, must export a C ABI?

  • Code that can be conveniently called from C, maybe from C++, and moderately to very inconveniently from ANYTHING ELSE. With most new application code definitely not being written in C, maybe we should reconsider our priorities here.

  • No language facilities like generics or field visibility, which are not even "modern language" features. Even C++ templates get compiled and statically linked into the calling code, because there's no way to pass information like the size of T in Array<T> across a C ABI. You wanted to make some struct fields public and some private? You are out of luck.

  • No knowledge of data ownership except by careful reading of the C function's documentation. Does the function free its arguments? How - with free() or g_free() or my_thing_free()? Or does the caller just lend it a reference? Can the data be copied bit-by-bit or must a special function be called to make a copy? GObject-Introspection carries this information in its annotations, while the C ABI has no idea and just ships raw pointers around.
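Once a Rust library exports a C ABI, those ownership rules end up in comments like the ones below instead of in the type system. This is a minimal sketch with hypothetical function names. It shows a string returned with full ownership transfer (what GObject-Introspection would annotate as `(transfer full)`) that has to go back to the library's own free function.

```rust
use std::ffi::{CStr, CString};
use std::os::raw::{c_char, c_int};

// C: char *mylib_describe (int n);
// Ownership: the caller owns the returned string ("transfer full") and must
// release it with mylib_string_free(). Nothing guarantees that Rust's
// allocator is the one behind free() or g_free().
pub extern "C" fn mylib_describe(n: c_int) -> *mut c_char {
    let text = format!("value is {}", n);
    // format! output here contains no NUL bytes, so CString::new cannot fail.
    CString::new(text).unwrap().into_raw()
}

// C: void mylib_string_free (char *s);   (NULL is ignored)
pub unsafe extern "C" fn mylib_string_free(s: *mut c_char) {
    if !s.is_null() {
        // SAFETY: `s` came from CString::into_raw() in mylib_describe().
        drop(unsafe { CString::from_raw(s) });
    }
}

fn main() {
    let p = mylib_describe(42);
    // Borrow the string, copy it into Rust, then give it back to the library.
    let copy = unsafe { CStr::from_ptr(p) }.to_string_lossy().into_owned();
    unsafe { mylib_string_free(p) };
    println!("{}", copy);
}
```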

More food for thought: this twitter thread says this about the C++ ABI: "Also, the ABI matters for whether the actual level of practicality of complying with LGPL matches the level of practicality intended years ago when some project picked LGPL as its license. Of course, the standard does not talk about LGPL, either. LGPL has rather different implications for Rust and Go than it does for C and Java. It was obviously written with C in mind."

Monomorphization and template bloat

While C++ had the problem of "lots of template code in header files", Rust has the problem that monomorphization of generics creates a lot of compiled code. There are tricks to avoid this and they are all the decision of the library/crate author. Both share the root cause that templated or generic code must be recompiled for every specific use, and thus cannot live in a shared library.
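One of those tricks is to keep the generic part of a function tiny and delegate to a non-generic inner function, so that only the thin wrapper gets monomorphized for each caller's type. The standard library itself does this in several `std::fs` functions. This is a minimal sketch, and `count_components` is a hypothetical name.

```rust
use std::path::{Path, PathBuf};

// Thin generic wrapper: monomorphized once per argument type
// (&str, String, PathBuf, ...), but each copy is just a single call.
pub fn count_components<P: AsRef<Path>>(path: P) -> usize {
    count_components_inner(path.as_ref())
}

// Non-generic body: compiled exactly once, however many types call the wrapper.
fn count_components_inner(path: &Path) -> usize {
    path.components().count()
}

fn main() {
    println!("{}", count_components("/usr/lib/librsvg-2.so"));
    println!("{}", count_components(PathBuf::from("share/locale")));
}
```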

Also, see this wonderful article on how different languages implement generics, and think that a plain C ABI means we have NOTHING of the sort.
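For a sense of what "nothing of the sort" means in practice, here is a hypothetical generic Rust function next to a hand-written C-ABI equivalent. With the type erased, the caller has to pass the element size and a callback that knows how to interpret each element, much like C's `qsort()` does. This is a minimal sketch and all names are hypothetical.

```rust
use std::os::raw::c_void;

// Generic Rust version: a separate copy is compiled for every T it is used with.
pub fn count_matching<T>(items: &[T], pred: impl Fn(&T) -> bool) -> usize {
    items.iter().filter(|&item| pred(item)).count()
}

// Type-erased C-ABI version: the element type is gone, so the caller supplies
// the element size and a callback that knows how to read one element.
pub unsafe extern "C" fn count_matching_erased(
    data: *const c_void,
    len: usize,
    elem_size: usize,
    pred: unsafe extern "C" fn(*const c_void) -> bool,
) -> usize {
    let base = data as *const u8;
    let mut count = 0;
    for i in 0..len {
        // SAFETY: the caller guarantees `data` holds `len` elements of `elem_size` bytes.
        let elem = unsafe { base.add(i * elem_size) } as *const c_void;
        let matches = unsafe { pred(elem) };
        if matches {
            count += 1;
        }
    }
    count
}

// Callback for arrays of i32; only valid when `p` points to an i32.
pub unsafe extern "C" fn is_even_i32(p: *const c_void) -> bool {
    let value = unsafe { *(p as *const i32) };
    value % 2 == 0
}

fn main() {
    let values = [1i32, 2, 3, 4, 6];
    println!("{}", count_matching(&values[..], |x| x % 2 == 0));
    let n = unsafe {
        count_matching_erased(
            values.as_ptr() as *const c_void,
            values.len(),
            std::mem::size_of::<i32>(),
            is_even_i32,
        )
    };
    println!("{}", n);
}
```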

Also, see How Swift Achieved Dynamic Linking Where Rust Couldn't for more food for thought. This is extremely roughly equivalent to GObject's boxed types; callers keep values on the heap but know the type layout via annotation magic, while the library's actual implementation is free to have the values on the stack or wherever for its own use.

Should all libraries export APIs with generics and exotic types?

You probably want something like a low-level array of values, Vec<T>, to be inlined everywhere and with code that knows the type of the vector's elements. Element accesses can be inlined to a single machine instruction in the best case.

But not everything requires this absolute raw performance with everything inlined everywhere. It is fine to pass references or pointers to things and do dynamic dispatch from a vtable if you are not in a super-tight loop, as we love to do in the GObject world.
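The trade-off can be seen side by side in Rust. This is a minimal sketch with a hypothetical `Shape` trait. The generic function is compiled once per element type and its calls can be inlined. The `&dyn Shape` version is compiled once and calls through a vtable, roughly like a GObject virtual method.

```rust
// Hypothetical trait for something with an area.
pub trait Shape {
    fn area(&self) -> f64;
}

pub struct Rect {
    pub w: f64,
    pub h: f64,
}

pub struct Circle {
    pub r: f64,
}

impl Shape for Rect {
    fn area(&self) -> f64 {
        self.w * self.h
    }
}

impl Shape for Circle {
    fn area(&self) -> f64 {
        std::f64::consts::PI * self.r * self.r
    }
}

// Static dispatch: one compiled copy per concrete T; area() can be inlined.
pub fn total_area_static<T: Shape>(shapes: &[T]) -> f64 {
    shapes.iter().map(|s| s.area()).sum()
}

// Dynamic dispatch: a single compiled copy; each `&dyn Shape` carries a
// vtable pointer next to the data pointer, and area() is called through it.
pub fn total_area_dyn(shapes: &[&dyn Shape]) -> f64 {
    shapes.iter().map(|s| s.area()).sum()
}

fn main() {
    let rects = [Rect { w: 2.0, h: 3.0 }, Rect { w: 1.0, h: 1.0 }];
    println!("{}", total_area_static(&rects[..]));
    let circle = Circle { r: 1.0 };
    let mixed: [&dyn Shape; 2] = [&rects[0], &circle];
    println!("{}", total_area_dyn(&mixed));
}
```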

Library sizes

I don't have a good answer to librsvg's compiled size. If gnome-shell merges my branch to rustify the CSS code, it will also grow its binary size by quite a bit.

It is my intention to have a Rust crate that both librsvg and gnome-shell share for their CSS styling needs, but right now I have no idea if this would be a shared library or just a normal Rust crate. Maybe it's possible to have a very general CSS library, and the application registers which properties it can parse and how? Is it possible to do this as a shared library without essentially reinventing libcroco? I don't know yet. We'll see.

A metaphor which I haven't fully explored

If every application or end-user package is kind of like a living organism, with its own cycles and behaviors and organs (dependent libraries) that make it possible...

Why do distros expect all the living organisms on your machine to share The World's Single Lungs Service, and The World's Single Stomach Service, and The World's Single Liver Service?

You know, instead of letting every organism have its own slightly different version of those organs, customized for it? We humans know how to do vaccination campaigns and everything; maybe we need better tools to apply bug fixes where they are needed?

I know this metaphor is extremely imperfect and not how things work in software, but it makes me wonder.

