

LLVM internals, part 4: attributes and attribute groups

Nov 29, 2021

Tags: llvm, rust

Series: llvm-internals


Preword

This is the last post I plan to do on parsing LLVM’s bitcode, at least for a while. I’ll keep working on the various libraries that I’ve started under the mollusc umbrella, but they won’t be accompanied by regular posts anymore — I’d like to refocus on writing smaller, less dense (read: lower-pressure) posts for a while.

Also, as a sort of update: all four of the patchsets1 that I mentioned in the previous post have been fully merged into LLVM, including a couple of documentation and bitcode parsing bugfixes. Many thanks to the LLVM maintainers for reviewing and approving my changes!


Intrinsics, attributes, metadata?

LLVM has no less2 than three different ways to represent some of the metadata that gets stored in its intermediate representation of a program: metadata, attributes, and intrinsics. All three are represented differently in the bitcode format and this post will focus only on attributes, but it’s important to understand the semantic difference between the three.

Using this excellent 2016 LLVM developers’ meeting presentation as a reference:

  • Intrinsics are internal functions whose semantics are well-defined by and well-known to LLVM. Common examples of these include the @llvm.memcpy.* family for semantics similar to the memcpy(3) libc routine, as well as various mathematics intrinsics corresponding roughly to libm routines (such as @llvm.ceil for ceil(3)).

    Generation of these intrinsics by language frontends (such as clang) is vital to LLVM’s ability to perform a wide variety of memory and arithmetic optimizations3, and clang will not hesitate to produce them even for unoptimized runs:

    int main(void) {
      char x[1024];
      char y[1024];
    
      if (rand()) {
          memcpy(x, y, 1024);
      } else {
          memcpy(y, x, 1024);
      }
    
      return 0;
    }
    

    yields:

    define dso_local i32 @main() #0 !dbg !9 {
      %1 = alloca i32, align 4
      %2 = alloca [1024 x i8], align 16
      %3 = alloca [1024 x i8], align 16
      store i32 0, i32* %1, align 4
      call void @llvm.dbg.declare(metadata [1024 x i8]* %2, metadata !14, metadata !DIExpression()), !dbg !19
      call void @llvm.dbg.declare(metadata [1024 x i8]* %3, metadata !20, metadata !DIExpression()), !dbg !21
      %4 = call i32 @rand() #4, !dbg !22
      %5 = icmp ne i32 %4, 0, !dbg !22
      br i1 %5, label %6, label %9, !dbg !24
    
    6:                                                ; preds = %0
      %7 = getelementptr inbounds [1024 x i8], [1024 x i8]* %2, i64 0, i64 0, !dbg !25
      %8 = getelementptr inbounds [1024 x i8], [1024 x i8]* %3, i64 0, i64 0, !dbg !25
      call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 16 %7, i8* align 16 %8, i64 1024, i1 false), !dbg !25
      br label %12, !dbg !27
    
    9:                                                ; preds = %0
      %10 = getelementptr inbounds [1024 x i8], [1024 x i8]* %3, i64 0, i64 0, !dbg !28
      %11 = getelementptr inbounds [1024 x i8], [1024 x i8]* %2, i64 0, i64 0, !dbg !28
      call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 16 %10, i8* align 16 %11, i64 1024, i1 false), !dbg !28
      br label %12
    
    12:                                               ; preds = %9, %6
      %13 = load i32, i32* %1, align 4, !dbg !30
      ret i32 %13, !dbg !30
    }
    

    (View it on Godbolt.)

    Consequently, LLVM’s intrinsics are correctness-bearing in IR programs: it is not safe, in the general case, to remove calls to intrinsic functions without providing an adequate substitute function4.

  • Attributes are markers that define special properties on a small subset of program features: functions (and their callsites), individual parameters, and return values.

    In the textual representation of LLVM’s IR, attributes are associated with program features with one of two syntaxes. For parameters and return values, attributes are shown inline at their definition sites. Using a memcpy intrinsic variant as an example:

    declare void @llvm.memcpy.p0i8.p0i8.i64(i8* noalias nocapture writeonly, i8* noalias nocapture readonly, i64, i1 immarg) #3
    

    Here, we have the following parameter attributes:

    • Our two i8* parameters are noalias and nocapture, meaning that (1) memory accesses through each pointer are exclusive, and (2) that neither pointer is “captured” within the function, e.g. by storing its value to a global or other memory location that outlives the function call.
    • Our first i8* is writeonly and our second is readonly, which enforce precisely what they say on the tin: that the function may only write or read through these respective pointers.
    • Finally, our i1 parameter is marked as immarg, meaning that the corresponding argument must be an immediate value (along with some other constraints) — anything that would have to be further loaded or executed to produce a value is invalid.

    Similarly for a declaration of malloc:

    declare dso_local noalias align 16 i8* @malloc(i64) #2
    

    Here our single parameter has no attributes of its own, but our return value has two: noalias and align 16.

    Our second syntax is also visible in the snippets above: the #N anchor ties each function (or function callsite) to a list of attributes that apply to the entire function or callsite.

    For example, the memcpy intrinsic is associated with #3, which is:

    attributes #3 = { argmemonly nofree nounwind willreturn }
    

    (We’re getting into the weeds now, but a whirlwind tour: argmemonly asserts that the function only modifies memory through any pointer arguments it’s given; nofree asserts that the function does not call a free-type function on its pointer argument(s); nounwind asserts that the function never raises an exception; willreturn asserts that the function either contains UB or returns back to its callsite in some call stack.)

    Each of these, again, has important effects on the program’s correctness, so naively removing them5 is not safe. They also appear on many callsites, e.g.:

    %5 = call i32 @rand() #4, !dbg !24
    
    ; ... snip ...
    
    attributes #4 = { nounwind }
    
  • Metadata is a lot of things. Among them:

    • Source-level debugging information that LLVM is responsible for lowering into a format like DWARF or CodeView.

      Virtually everything that ends up in one of these binary debugging formats corresponds to one or more metadata entries (indicated textually as !N) on LLVM instructions, functions, &c. For example, here’s how LLVM relates an IR-level %struct back to its original C-level layout with metadata entries:

      typedef struct {
          int x;
          char y;
          struct bar {
              long z0;
          } z;
      } foo;
      
      int main(void) {
          foo x;
          foo y;
      
          /* ... snip ... */
      }
      

      Here, the x and y declarations get @llvm.dbg.declare intrinsics with associated metadata entries (comments added):

      %1 = alloca i32, align 4
      %2 = alloca %struct.foo, align 8 ; 'foo x'
      %3 = alloca %struct.foo, align 8 ; 'foo y'
      store i32 0, i32* %1, align 4
      call void @llvm.dbg.declare(metadata %struct.foo* %2, metadata !12, metadata !DIExpression()), !dbg !24
      call void @llvm.dbg.declare(metadata %struct.foo* %3, metadata !25, metadata !DIExpression()), !dbg !26
      

      Following metadata !12, we can see that it accurately describes both x and its type, by way of subsequent metadata entries:

      !12 = !DILocalVariable(name: "x", scope: !7, file: !8, line: 13, type: !13)
      !13 = !DIDerivedType(tag: DW_TAG_typedef, name: "foo", file: !8, line: 10, baseType: !14)
      !14 = distinct !DICompositeType(tag: DW_TAG_structure_type, file: !8, line: 4, size: 128, elements: !15)
      !15 = !{!16, !17, !19}
      !16 = !DIDerivedType(tag: DW_TAG_member, name: "x", scope: !14, file: !8, line: 5, baseType: !11, size: 32)
      !17 = !DIDerivedType(tag: DW_TAG_member, name: "y", scope: !14, file: !8, line: 6, baseType: !18, size: 8, offset: 32)
      !18 = !DIBasicType(name: "char", size: 8, encoding: DW_ATE_signed_char)
      !19 = !DIDerivedType(tag: DW_TAG_member, name: "z", scope: !14, file: !8, line: 9, baseType: !20, size: 64, offset: 64)
      !20 = distinct !DICompositeType(tag: DW_TAG_structure_type, name: "bar", file: !8, line: 7, size: 64, elements: !21)
      !21 = !{!22}
      !22 = !DIDerivedType(tag: DW_TAG_member, name: "z0", scope: !20, file: !8, line: 8, baseType: !23, size: 64)
      !23 = !DIBasicType(name: "long int", size: 64, encoding: DW_ATE_signed)
      

      That’s a lot of metadata!

    • Optional optimization assistance whose presence or absence does not affect program correctness. When present this metadata can be used to progressively enhance some optimizations. For example, the !range metadatum can be attached to a load, call, or invoke instruction to indicate that the value produced by the instruction can only be in one of the associated ranges.

      For example, adapted from the LLVM LangRef:

      ; `%a` can only be {0, 1, 3, 4}
      %a = load i8, i8* %x, align 1, !range !0
      
      ; ... snip ...
      
      ; valid ranges: [0, 2), [3, 5)
      !0 = !{ i8 0, i8 2, i8 3, i8 5 }
      

      Another great example is !callees, which helps LLVM perform indirect call promotion (which, in turn, decreases pressure on the indirect branch predictor at runtime).

So, to wrap things up: a correct bitcode parser needs to accurately handle intrinsics and attributes, while metadata can be more or less ignored until a brave, masochistic soul needs it for their own purposes.

Let’s get to it.

Parsing attributes from the bitcode

For reasons that are unclear to me, LLVM describes all attributes as “parameter attributes” at the bitcode/bitstream level, even when said attributes refer to entire functions or function return values. Similarly confusingly, LLVM splits said “parameter attributes” into two separate bitstream level blocks: PARAMATTR_BLOCK and PARAMATTR_GROUP_BLOCK6.

The former references the latter (by way of indices), so the latter needs to be parsed first. As such, we’ll start with it.

PARAMATTR_GROUP_BLOCK

Here’s what LLVM’s bitcode docs have to say about this block:

The PARAMATTR_GROUP_BLOCK block (id 10) contains a table of entries describing the attribute groups present in the module. These entries can be referenced within PARAMATTR_CODE_ENTRY entries.

The PARAMATTR_GROUP_BLOCK block can only contain one kind of record: PARAMATTR_GRP_CODE_ENTRY. Each PARAMATTR_GRP_CODE_ENTRY record looks like this:

[ENTRY, grpid, paramidx, attr0, attr1, ...]

…where grpid is a unique numeric identifier for this group of attributes, and paramidx identifies one of the following:

  • 0: This group contains attributes for the return value (i.e., call-site?7) of the function that references it (by grpid).
  • 0xFFFFFFFF: This group contains attributes for the function itself.
  • [1, 0xFFFFFFFF): This group contains attributes for the Nth function parameter, starting at 1.

We can represent this with a tidy enum and a total mapping from u32:

#[derive(Clone, Copy, Debug)]
pub enum AttributeGroupDisposition {
    Return,
    Parameter(u32),
    Function,
}

impl From<u32> for AttributeGroupDisposition {
    fn from(value: u32) -> Self {
        match value {
            u32::MAX => Self::Function,
            0 => Self::Return,
            _ => Self::Parameter(value),
        }
    }
}

(Docs link.)
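
Purely as an illustration (this snippet isn’t part of mollusc), the three paramidx cases map as described above:

// Illustrative only: exercising the From<u32> mapping above.
#[cfg(test)]
mod disposition_tests {
    use super::AttributeGroupDisposition as Disp;

    #[test]
    fn paramidx_dispositions() {
        // 0 marks the return value's attribute group...
        assert!(matches!(Disp::from(0u32), Disp::Return));
        // ...values in [1, 0xFFFFFFFF) mark the Nth (1-based) parameter...
        assert!(matches!(Disp::from(2u32), Disp::Parameter(2)));
        // ...and 0xFFFFFFFF marks the function itself.
        assert!(matches!(Disp::from(u32::MAX), Disp::Function));
    }
}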

Each attrN has, in turn, even more internal structure:

{ kind, key [, ...], [value [, ...]] }

…where kind indicates the layout of key and value:

  • 0: key is an integer indicating a “well-known” attribute, such as noalias (9). These are sometimes referred to as “enum attributes” in the LLVM source code. These attributes have no associated value — all necessary information is wholly present in the key itself.
  • 1: key is an integer indicating a “well-known” attribute, and value is also an integer, representing a value associated with the attribute. For example, dereferenceable(<n>) (41) takes <n> via the value field, which in this case is the number of bytes for which the pointer is known to be dereferenceable.
  • 3: key is a null-terminated string, indicating a “string” attribute that isn’t well-known. This string can be free-form and isn’t accompanied by a value. The absence of well-known string attributes means that there’s no single source of truth for values that are understood by various parts of LLVM. That being said, passing -fsplit-stack on clang 13 seems to reliably introduce the "split-stack" string attribute to any function that isn’t marked with __attribute__((no_split_stack)):

    define dso_local i32 @main() #2 !dbg !9 {
      ; ... snip ...
    }
    
    ; ... snip ...
    attributes #2 = { noinline nounwind optnone uwtable "split-stack" }
    

    (View it on Godbolt.)

  • 4: key is a null-terminated string, and value is also a null-terminated string. Like with the key-only variant, neither string is well-known and both are free-form. These are comparatively common, and we can see LLVM translate C-level __attribute__s to them:

    __attribute__((__target__("sse4.2"))) int add1(int x, int y);
    __attribute__((__target__("sse4.1"))) int add2(int x, int y);
    

    These become (simplified):

    attributes #0 = { "target-features"="+cx8,+fxsr,+mmx,+popcnt,+sse,+sse2,+sse3,+sse4.1,+sse4.2,+ssse3,+x87" }
    attributes #2 = { "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+sse3,+sse4.1,+ssse3,+x87" }
    

    (View it on Godbolt.)

    Observe that #2 doesn’t include sse4.2 in its target-features value, since we’ve capped it at sse4.1.

The astute will notice that kind=2 isn’t specified. Why? Beats me8!

Once again, we can model this with a relatively tidy enum:

// Each variant's value is exhaustively enumerated in turn, where possible.
// For example, `EnumAttribute` is not just a `u32` newtype but an exhaustive
// `enum` of all currently known "enum"-kinded LLVM attributes.
pub enum Attribute {
    Enum(EnumAttribute),
    Int(IntAttribute),
    Str(String),
    StrKeyValue(String, String),
}

(Docs link.)

…and to tie it all together, our model for each “attribute group” in PARAMATTR_GROUP_BLOCK:

#[derive(Clone, Debug)]
pub struct AttributeGroup {
    disposition: AttributeGroupDisposition,
    attributes: Vec<Attribute>,
}

(Docs link.)

And our parsing procedure for PARAMATTR_GROUP_BLOCK:

  1. Create an initial mapping of grpid -> AttributeGroup
  2. For each PARAMATTR_GRP_CODE_ENTRY in PARAMATTR_GROUP_BLOCK:
    1. Extract the grpid and paramidx fields, which are always present and are always the first two in the record. Convert the paramidx into its corresponding AttributeGroupDisposition.
    2. Initialize fieldidx as 2, indicating that we’ve already consumed grpid and paramidx.
    3. Parse Attributes from the record using the kind rules above, starting at fieldidx and increasing fieldidx by the number of record fields consumed at each parse step. Complete when fieldidx == record.fields().len().
    4. Construct a new AttributeGroup whose disposition and attributes are those just parsed.
    5. Add the new grpid and AttributeGroup to the mapping

…and here’s what that looks like, in Rust:

impl IrBlock for AttributeGroups {
    const BLOCK_ID: IrBlockId = IrBlockId::ParamAttrGroup;

    fn try_map_inner(block: &UnrolledBlock, _ctx: &mut MapCtx) -> Result<Self, BlockMapError> {
        let mut groups = HashMap::new();

        for record in block.all_records() {
            let code = AttributeCode::try_from(record.code()).map_err(AttributeError::from)?;

            if code != AttributeCode::GroupCodeEntry {
                return Err(AttributeError::WrongBlock(code).into());
            }

            if record.fields().len() < 3 {
                return Err(RecordMapError::BadRecordLayout(format!(
                    "too few fields in {:?}, expected {} >= 3",
                    code,
                    record.fields().len()
                ))
                .into());
            }

            let group_id = record.fields()[0] as u32;
            let disposition: AttributeGroupDisposition = (record.fields()[1] as u32).into();

            let mut fieldidx = 2;
            let mut attributes = vec![];
            while fieldidx < record.fields().len() {
                let (count, attr) = Attribute::from_record(fieldidx, record)?;
                attributes.push(attr);
                fieldidx += count;
            }

            if fieldidx != record.fields().len() {
                return Err(RecordMapError::BadRecordLayout(format!(
                    "under/overconsumed fields in attribute group record ({} fields, {} consumed)",
                    fieldidx,
                    record.fields().len(),
                ))
                .into());
            }

            groups.insert(
                group_id,
                AttributeGroup {
                    disposition,
                    attributes,
                },
            );
        }

        Ok(AttributeGroups(groups))
    }
}
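
The Attribute::from_record call above performs the per-kind parsing described earlier, but isn’t reproduced here. The following is a rough, self-contained sketch of that dispatch — simplified stand-in types and helpers rather than mollusc’s actual signatures, and assuming strings are encoded one byte per record field with a 0 terminator:

// A rough sketch of the kind dispatch performed by something like
// `Attribute::from_record`. The types and helpers are simplified stand-ins.
#[derive(Debug)]
enum RawAttribute {
    Enum(u64),                   // kind=0: well-known attribute key only
    Int(u64, u64),               // kind=1: well-known key plus an integer value
    Str(String),                 // kind=3: free-form string key
    StrKeyValue(String, String), // kind=4: free-form string key and value
}

/// Reads a NUL-terminated string starting at `fields[start]`, returning the
/// string and the number of fields consumed (including the terminator).
fn read_cstr(fields: &[u64], start: usize) -> Option<(String, usize)> {
    let mut s = String::new();
    let mut i = start;
    loop {
        let c = *fields.get(i)?;
        if c == 0 {
            return Some((s, i - start + 1));
        }
        s.push(c as u8 as char); // assumes one ASCII byte per field
        i += 1;
    }
}

/// Parses one attribute beginning at `fields[idx]`, returning the attribute
/// and the number of fields consumed (so the caller can advance `fieldidx`).
fn attribute_from_fields(fields: &[u64], idx: usize) -> Option<(RawAttribute, usize)> {
    match *fields.get(idx)? {
        // kind=0: the key alone identifies a well-known "enum" attribute.
        0 => Some((RawAttribute::Enum(*fields.get(idx + 1)?), 2)),
        // kind=1: a well-known key plus an integer value, e.g. dereferenceable(<n>).
        1 => Some((
            RawAttribute::Int(*fields.get(idx + 1)?, *fields.get(idx + 2)?),
            3,
        )),
        // kind=3: a free-form, NUL-terminated string key with no value.
        3 => {
            let (key, klen) = read_cstr(fields, idx + 1)?;
            Some((RawAttribute::Str(key), 1 + klen))
        }
        // kind=4: a free-form string key followed by a free-form string value.
        4 => {
            let (key, klen) = read_cstr(fields, idx + 1)?;
            let (value, vlen) = read_cstr(fields, idx + 1 + klen)?;
            Some((RawAttribute::StrKeyValue(key, value), 1 + klen + vlen))
        }
        // kind=2 is unspecified; 5 and 6 are undocumented "type" attribute kinds.
        _ => None,
    }
}

The real implementation returns proper error types and maps well-known keys into exhaustive enums, but the field-consumption bookkeeping is the important part.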

That leaves us with the final product of mapping the PARAMATTR_GROUP_BLOCK block: a mapping of grpid -> AttributeGroup. Let’s see how the PARAMATTR_BLOCK uses this mapping.

PARAMATTR_BLOCK

The other half of the attributes equation is the PARAMATTR_BLOCK block, which is documented by LLVM as follows:

The PARAMATTR_BLOCK block (id 9) contains a table of entries describing the attributes of function parameters. These entries are referenced by 1-based index in the paramattr field of module block FUNCTION records, or within the attr field of function block INST_INVOKE and INST_CALL records.

Entries within PARAMATTR_BLOCK are constructed to ensure that each is unique (i.e., no two indices represent equivalent attribute lists).

There are two valid record codes in the PARAMATTR_BLOCK:

  • PARAMATTR_CODE_ENTRY_OLD: This is a legacy record, emitted by LLVM 3.2 and earlier. We don’t bother handling it, since modern versions of LLVM won’t emit it at all and 3.2 is nearly a decade old at this point. However, for completeness, it looks like this:

    [ENTRY, paramidx0, attr0, paramidx1, attr1...]
    

    Look familiar? That’s because it’s nearly identical to the PARAMATTR_GRP_CODE_ENTRY record format that we just comprehensively walked through: the only difference is the absence of the “group” concept. LLVM (presumably) deprecated this format when someone realized that many parameters and whole functions share entire groups of attributes, allowing for deduplication by referencing the groups rather than listing all attributes every time.

  • PARAMATTR_CODE_ENTRY: This is the referencing record mentioned in the PARAMATTR_GROUP_BLOCK documentation. Each PARAMATTR_CODE_ENTRY looks like this:

    [ENTRY, attrgrp0, attrgrp1, ...]
    

    …where attrgrpN is an attribute group index. And the circle is complete: we use our mapping of the PARAMATTR_GROUP_BLOCK to fetch the list of attributes corresponding to each group index, and append them all together into one big list for each PARAMATTR_CODE_ENTRY. These lists, in turn, are referenced by 1-based indices in other blocks and records throughout the bitcode representation (a sketch of that lookup follows after this list).

    That ends up being nice and simple:

    impl IrBlock for Attributes {
      const BLOCK_ID: IrBlockId = IrBlockId::ParamAttr;
    
      fn try_map_inner(block: &UnrolledBlock, ctx: &mut MapCtx) -> Result<Self, BlockMapError> {
          let mut entries = vec![];
    
          for record in block.all_records() {
              let code = AttributeCode::try_from(record.code()).map_err(AttributeError::from)?;
    
              match code {
                  AttributeCode::Entry => {
                      let mut groups = vec![];
                      for group_id in record.fields() {
                          let group_id = *group_id as u32;
                          log::debug!("group id: {}", group_id);
                          groups.push(
                              ctx.attribute_groups()?
                                  .get(group_id)
                                  .ok_or(AttributeError::BadAttributeGroup(group_id))?
                                  .clone(),
                          );
                      }
                      entries.push(AttributeEntry(groups));
                  }
                  AttributeCode::GroupCodeEntry => {
                      // This is a valid attribute code, but it isn't valid in this block.
                      return Err(AttributeError::WrongBlock(code).into());
                  }
                  _ => {
                      return Err(BlockMapError::Unsupported(format!(
                          "unsupported attribute block code: {:?}",
                          code,
                      )))
                  }
              }
          }
    
          Ok(Attributes(entries))
      }
    }
    
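
Since these entry lists are referenced by 1-based indices elsewhere in the bitcode, a later mapping phase has to resolve them. Here’s a minimal sketch of that lookup, assuming Attributes is the newtype over Vec<AttributeEntry> constructed above and that a paramattr index of 0 means "no attributes" (both are my reading of the docs, not necessarily how mollusc ends up doing it):

// Sketch only: resolving the 1-based `paramattr` index from a module-level
// FUNCTION record against the parsed table. Access via `.0` assumes we're
// inside the crate (the newtype's field may not be public).
fn resolve_paramattr(attrs: &Attributes, paramattr: u64) -> Option<&AttributeEntry> {
    match paramattr {
        // A zero index means the function has no attribute entry at all.
        0 => None,
        // Otherwise, subtract one to turn the 1-based index into a Vec index.
        idx => attrs.0.get((idx - 1) as usize),
    }
}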

Takeaways

Parsing the blocks responsible for LLVM’s attributes was moderately troublesome: nowhere near as annoying as the type table9, but not as easy as the identification block or the string table. All told, the current implementation requires slightly under 900 lines of code, much of which is documentation and enum variants.

The end result of it all can be seen in the debug logs of the unroll-bitstream example provided by the llvm-mapper crate:

RUST_LOG=debug ./target/debug/examples/unroll-bitstream some-input.bc

…which, amidst a great deal of other output, should yield some messages like this (formatted for readability):

[2021-11-29T06:54:51Z DEBUG llvm_mapper::block::module] attributes:
  Some(Attributes([
    AttributeEntry(
      [
        AttributeGroup {
          disposition: Function,
          attributes: [
            Enum(NoInline),
            Enum(NoUnwind),
            Enum(OptimizeNone),
            Enum(UwTable),
            StrKeyValue("correctly-rounded-divide-sqrt-fp-math", "false"),
            StrKeyValue("disable-tail-calls", "false"),
            StrKeyValue("frame-pointer", "all"),
            StrKeyValue("less-precise-fpmad", "false"),
            StrKeyValue("min-legal-vector-width", "0"),
            StrKeyValue("no-infs-fp-math", "false"),
            StrKeyValue("no-jump-tables", "false"),
            StrKeyValue("no-nans-fp-math", "false"),
            StrKeyValue("no-signed-zeros-fp-math", "false"),
            StrKeyValue("no-trapping-math", "false"),
            StrKeyValue("stack-protector-buffer-size", "8"),
            StrKeyValue("target-cpu", "x86-64"),
            StrKeyValue("target-features", "+cx8,+fxsr,+mmx,+sse,+sse2,+x87"),
            StrKeyValue("unsafe-fp-math", "false"),
            StrKeyValue("use-soft-float", "false")
          ]
        }
      ]
    )
  ]))

This information isn’t exposed anywhere in mapped LLVM modules, yet: it’s kept purely as state within the MapCtx. Future mapping work (e.g., for IR-level functions, blocks, instructions, &c) will access that state to correctly associate themselves with their attributes.

As I’ve said in previous posts: mollusc still has no particular end goal or state in mind, other than my broad goal of being able to perform some amount of static analysis of LLVM IR in pure Rust. The beatings (read: development) will continue until the masochism (read: curiosity) abates.


  1. D107536, D108441, D108962, and D109655

  2. And possibly more; it’s a big project. In particular, I have no clue how “operand bundles” work or what they do. 

  3. Why does LLVM need these intrinsics, rather than unconditionally inserting a call to e.g. an extremely optimized memcpy(3) implementation? Because even that sometimes isn’t enough: sometimes you want to inline the call entirely, or to call a slightly different optimized memcpy implementation. Or for even simpler reasons: having an intrinsic here gives LLVM a fuller picture of the program than an external call, even a well understood one, would normally allow. 

  4. There are, however, some intrinsics that can be safely deleted without compromising program correctness. The @llvm.dbg.* family is an example of this, but the intricacies of how these intrinsics interact with LLVM’s metadata facilities are well outside the scope of this post. Perhaps another time.

  5. Or failing to interpret them during code generation, or optimization, &c. 

  6. This part would be less confusing if the PARAMATTR_BLOCK and PARAMATTR_GROUP_BLOCK blocks didn’t share a record code namespace. But they do, so it’s not clear why they’re separated at all, especially when they have a tight dependency relationship. 

  7. I think? 

  8. LLVM’s BitcodeReader doesn’t mention kind=2 at all; see here. It does however mention kind=5 and kind=6, despite these not being documented. I haven’t seen these in any bitcode samples yet so I haven’t bothered digging into them, but they’re apparently “type” attribute formats. 

  9. Which had to be rewritten entirely after I published the last post in this series, due to unavoidable lifetime issues with the previous approach. 




