2

On Custom-Width Integer Types

 2 years ago
source link: https://alic.dev/blog/custom-bitwidth.html
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.
neoserver,ios ssh client

On Custom-Width Integer Types

On Custom-Width Integer Types
May 2023

Before we start: by "custom-width integer types" I'm referring to a language feature that lets users specify integer types with any fixed number of bits (e.g. u7, u12) - instead of being provided a handful of integers that correspond to common xword sizes (e.g. u8,u16, u32,u64). The first time I've encountered this idea was in the Zig language, although it also exists in Clang & LLVM IR.

Now, Zig is an odd one. The language was founded on pure systems programming pragmatism - an antithesis of ivory-tower category theory shenanigans, and a place where Curry is just an earthy Indian spice. But time and time again, through pragmatic, real-world justifications, we witness a collision of these two worlds.

How come? It turns out that accurately modeling your program's intended behavior is in everyone's best interest. As long as this model is programatically enforced (e.g. by language semantics), we can get two big benefits: (1) we improve correctness and safety of our programs and (2) we open the door for aggressive compiler optimizations. And custom-width integer types are a perfect example of this.

A First Perspective

We can look at these integers as a limited form of integer ranges, where the range for uX ∈ [0, 2^X). Some languages like Ada have native support for arbitrary ranges:

type u7 is range 0..127;

But Ada goes a step further; you can also declare a subtyping relationship between u7 and Integer:

subtype u7 is Integer is range 0..127;

Creating subtypes that represent subsets of a supertype's possible values is called refinement. If we take a look at Zig, we can see that all custom integer sizes have a strict subtyping hierarchy, so they qualify as refinement types. To illustrate, take a look at this snippet.

pub fn foo(x: u10) u10 { return x; }
pub fn main() !void {
	var y: u9 = 0;
	_ = square(y);
}

The 9-bit integer gets implicitly promoted here and the code compiles without issues, which is exactly the kind of ergonomics you'd expect from refined types. Now, if we change foo to accept a u8, the compiler emits the following error:

note: unsigned 8-bit int cannot represent all possible unsigned 9-bit values

Great! Now, before we make use of this feature, let's take a look at how refinement can be achieved in other languages.

Emulating Refinement through Restricted Construction

To emulate refinement, most statically typed languages use the type system to restrict the construction and modification of objects - thereby encapsulating the refinement invariant in the object. Here's an example for even 64-bit integers:

pub struct EvenU64(u64);
impl EvenU64 {
    pub fn new(i: u64) -> Result<Self, Error> {
        match i % 2 == 0 {
            true => Ok(Self(i)),
            _ => Err(Error::InvalidArgument), 
        }
    }
}

You can find other useful examples in Rust's standard library. For this to work, your language of choice needs something akin to access modifiers (Java, C#, C++, Rust) or modules (ML-family), that allow you to restrict construction and modification of objects. Ironically this is not possible in Zig, because the language does not have a notion of private or immutable fields.

However, the encapsulation approach is still only an emulation of refinement, due to the lack of automatic subtyping. Let's instead take a look at languages that have first-class support for refinement types.

Refinement Through Predicates

The most common form of refinement typing comes in the form of logical predicates attached to types which are evaluated by an SMT solver. Note however that refinement types based on predicates come at a cost, and become undecidable when used with quantifiers like forall and exists; for the same reasons that type inference is undecidable in dependently typed languages. These are examples of boolean refinement types, written in F*:

let pos  = x:int { x > 0 }      //the positive numbers
let neg  = x:int { x < 0 }      //the negative numbers
let even = x:int { x % 2 = 0 }  //the even numbers
let odd  = x:int { x % 2 = 1 }  //the odd numbers

The idea is that we constrain the type to values for which the predicate on the right evaluates to true. Another great example of a language with first-class refinement types is LiquidHaskell:

{-@ type Nat   = {v:Int | 0 <= v}        @-}
{-@ type Even  = {v:Int | v mod 2 == 0 } @-}
{-@ type Lt100 = {v:Int | v < 100}       @-}

After declaring these refined types, you can apply them to types, variables or functions. You can even use them to specify preconditions and postconditions for your function arguments and return values, which are checked at compile time:

{-@ abs :: Int -> Nat @-}
abs           :: Int -> Int
abs n
  | 0 < n     = n
  | otherwise = 0 - n

It's easy to see that refinement types are a significant step up in terms of compile-time safety and enforcement of invariants. Now let's move on to some of the performance opportunities these ideas bring.

Memory Layout Optimizations

When writing performance-critical data structures, every bit of memory footprint counts. Especially for concurrent data structures, our critical bottlenecks often happen to be concentrated on a small number of integers (e.g. indices, RCU pointers, DCAS timestamps, hash keys, bitmaps, tails & heads, metadata, semaphores, etc...).

Let's take pointer compression for example. It's an optimization most often used for making buffer indices so small that they can (1) fit into fewer cache lines to improve locality and read latency or (2) can be read, modified and updated in large batches, especially for SIMD or atomic stores.

space-saving.png
Figure 1: By choosing an inappropriate integer size, we introduce lots of dead space between fields. In cases where memory footprint needs to be kept minimal, this is a big problem.

The downside of shift + mask to get your oddly sized unaligned integer out of a dense array is likely very minimal. The figure below shows a comparison of execution traces on Intel Skylake, with a latency penalty of ~2.5 cycles due to the data dependency.

without.png
with.png
Figure 2: These execution traces were generated with uica.uops.info. The pipeline is filled with a bunch of nops, so the dispatch queue is pretty tight. Unless you're doing nothing but compute indices and fetch memory, the latency penalty is likely neglible.

Another example that often comes up in this context is union types (especially tagged unions). Even small size discrepancy between union variants can drastically increase our memory consumption. This is often an issue when dealing with recursive data structures like ASTs.

Example: I'm parsing a flattened AST and want to add a new node type for block expressions. A block always ends with another expression, but may optionally have a certain number of statements preceding it. Let's take a shot at it with some pseudocode:

pub enum Expr {
    BlockExpr(u32, Option<u32>, u8)
    //        ^^^         ^^^   ^^
    // expr pointer        |     |
    //        statement pointer  |
    //                  statement count
}

The first field is an index into our array of expressions, and the two remaining fields are essentially a fat pointer. That's 10 bytes for unaligned enum values and 16 with padding! The statement pointer seems wastefully large, but 16-bits (~65k statements) is definitely too low. How about u21? That will give us a budget of 2 million total addressable statements, which seems good. Now we have a budget of u10 for the statement count (max of 1024 statements per block seems reasonable) and can use the MSB for the Option discriminant.

We can also think about the number of variants in the Expr enum. It's common to have a few dozen different expression types, so we need some bits for the enum tag. We can take them from the expression pointer.

pub enum Expr {
    BlockExpr(u24, Option<u21>, u10)
}

With these few changes we've just shaved off two bytes, while still having 2-4 bits to spare for metadata, depending on the number of variants. And since the whole thing fits into 8 bytes, we basically get alignment for free. Without native support for custom-width integers, creating these oddly sized nested enums would become a real chore.

With that being said, shaving off unneeded bits isn't only relevant for software. Most x86_64 machines today can only use the lower 48-52 bits for physical addressing, which I believe is done for space and efficiency reasons. This leads us directly to the other end of the spectrum, namely hardware design.

The LLVM Perspective

In 2020, Erich Keane submitted a patch to LLVM implementing custom-width integers and wrote a great follow-up post about it. The main motivation was to enable more efficient high-level synthesis (HLS) of C and C++ programs. The post is absolutely worth a read, and also mentions some of the intricacies around integer promotion.

Conclusion

In my opinion, most modern systems programming languages would greatly benefit from custom-width integers. I'd like to highlight a very relevant sentence from Erich's LLVM post:

Worse, there was no way for the programmer express their intent in the cases where they do not need the full width of a standard integer type.

It's my belief that exactly this shared experience made people from different backgrounds come to the same realization. It's no surprise that whenever we are given the ability to convey more information about our problem, we're often rewarded by the compiler with correctness, consistency and performance. I'm looking forward to more languages beyond Zig adopting this feature in the future.

Further Reading
  • Oli Scherer. "Ranged Integers" (a series of posts on ranges in Rust) cohost.org 2022 [link]

About Joyk


Aggregate valuable and interesting links.
Joyk means Joy of geeK