Lifetime Annotations for C++

 2 years ago
source link: https://discourse.llvm.org/t/rfc-lifetime-annotations-for-c/61377/5

Martin Brænne @martinboehme
Rosica Dejanovska @scentini
Gábor Horváth @Xazax-hun
Dmitri Gribenko @gribozavr
Luca Versari @veluca93

Summary

We are designing, implementing, and evaluating an attribute-based annotation scheme for C++ that describes object lifetime contracts. It allows relatively cheap, scalable, local static analysis to find many common cases of heap-use-after-free and stack-use-after-return bugs. It allows other static analysis algorithms to be less conservative in their modeling of the C++ object graph and potential mutations done to it. Lifetime annotations also enable better C++/Rust and C++/Swift interoperability.

This annotation scheme is inspired by Rust lifetimes, but it is adapted to C++ so that it can be incrementally rolled out to existing C++ codebases. Furthermore, the annotations can be automatically added to an existing codebase by a tool that infers the annotations based on the current behavior of each function’s implementation.

Clang has existing features for detecting lifetime bugs ([[clang::lifetimebound]] and -Wdangling-gsl). The lifetime annotations we propose are a strict superset of [[clang::lifetimebound]]. They support the majority of use cases of -Wdangling-gsl and many that it cannot express. A dedicated section below contains a detailed comparison with these existing approaches. We plan to enable our lifetime analysis to understand the existing annotations by translating them into our annotation syntax internally (where possible).

We are looking to contribute our current early implementation of lifetime annotations and supporting static analysis to Clang and Clang-Tidy. Developing it upstream would allow us to more easily collaborate on the design and implementation and get feedback from the community and early adopters that build the LLVM/Clang toolchain from git HEAD (for example, Chrome).

High-level implementation plan

We propose the following:

  • Add a general-purpose type annotation attribute annotate_type to Clang (see the separate annotate_type RFC for details).

  • Add an experimental Clang-Tidy check that infers the lifetime contracts based on the current behavior of source code and suggests annotations to add to the source code that describe these contracts. Each user will use this check once to annotate their codebase.

  • Add an experimental Clang-Tidy check that validates that the code follows the lifetime contracts described by the annotations.

  • Develop lifetime annotations for libc and libc++, stored in API notes files. Upstream the code for ingesting API notes in Sema from the apple/swift-clang fork of Clang.

  • Extend Clang’s API notes to not require Clang Modules.

  • Evaluate the annotations and Clang-Tidy checks with early adopters. Fine-tune the system based on the feedback.

  • Make a decision about stabilizing the Clang-Tidy checks and marking them non-experimental. If and when this happens, we could consider introducing attributes that are specific to our annotation scheme instead of using general-purpose annotation attributes.

  • Move libc++ annotations from API notes files into headers.

Implementation status

We have implemented a work-in-progress Clang-Tidy check that infers the lifetime contracts from un-annotated C++ code. It is already able to infer lifetimes in a wide range of non-trivial situations (see appendix B for examples). This gives us enough confidence in the annotation scheme to present it publicly and propose moving experimentation upstream.

We have not started working on the verification tool yet, but we believe it is a lot less risky than inference. Verification can reuse most of the complex static analysis algorithms required for inference; this also implies that inference will generate lifetimes that satisfy the verification tool. The two main additions that are required for verification are being able to distinguish different local lifetimes and producing good error messages when lifetime contracts are violated. Implementing this additional functionality will require some effort but not much innovation. We prioritize figuring out the type checking and inference rules and are therefore focusing on inference tooling first.

Rollback plan

Because the annotation scheme will be experimental for a while, we are not proposing to add any attributes to Clang that are specific to lifetimes. The only change to core Clang that we are proposing is adding a general-purpose type annotation attribute. Apart from this, the implementation will be contained in new Clang-Tidy checks, making it well isolated from the rest of the codebase.

In case our experimentation fails, the Clang-Tidy checks can be easily removed from Clang without breaking users’ builds. The general-purpose annotate_type attribute will remain in Clang as we expect it to be useful for other purposes.

Use cases enabled by lifetime annotations

Lifetime annotations describe the lifetime contracts of C++ APIs in a modular, machine-readable manner, with enough flexibility to cover many modern C++ architectural and local coding patterns. Having such descriptions available in C++ source code enables the following use cases:

  • Improved readability for humans. Users can easily find the lifetime contracts in the function signature, and trust this information to be correct. Typical current practice is to use prose in documentation comments to describe lifetime contracts, but code authors don’t do this consistently or reliably: The information is often missing, and when it is present, it is sometimes incorrect.

  • Improved static analysis capabilities: understanding of the object graph and mutations. Static analysis tooling today often suffers from an inability to precisely reason about mutations in a modular way. Scalable, local static analysis that needs soundness has to conservatively assume that all pointers passed to a function call will escape, and that subsequent function calls will mutate objects reachable from those pointers. Lifetime annotations allow static analysis tools to derive a more precise approximation of possible object graph state and mutations. See appendix C for an example.

    Concretely, lifetime annotations improve modeling of function call side effects in the Clang dataflow analysis framework.

  • Better C++/Rust interoperability. Lifetime annotations open an avenue for more complete, more automatic, more ergonomic, and safer C++/Rust interoperability than is currently provided by state-of-the-art Rust crates such as cxx and autocxx. Existing interop solutions can only bridge C++ APIs that accept and return objects either by value or inside owning smart pointers. Lifetime annotations allow us to automatically bridge C++ functions with complex lifetime contracts. Lifetime contracts from C++ function signatures can be mapped to Rust lifetimes, enabling us to map C++ pointers and references to safe references in Rust. See appendix D for a concrete example.

  • Better C++/Swift interoperability. Lifetime annotations can help provide safer C++/Swift interoperability. Swift does not expose lifetime annotations in the language, but internally in the compiler the mechanisms and principles are rather similar to what Rust exposes at the language level. The Swift compiler starts tracking object lifetimes after converting Swift AST to the Swift intermediate language (SIL). When Swift code calls a C++ foreign function or uses an instance of a C++ struct/class, the Swift compiler can get the corresponding lifetime contract from the C++ header and validate that the input and output objects live long enough.

Limitations

The static analysis based on the proposed lifetime annotations cannot catch all memory safety problems in C++ code. Specifically, it cannot catch all temporal memory safety bugs (for example, ones caused by iterator invalidation), and of course lifetime annotations don’t help with spatial memory safety (for example, indexing C-style arrays out of bounds). See the comparison with Rust below for a detailed discussion.

Overview of lifetime annotations

Note
This is not a design doc; the design is still in flux. Design docs will be mailed as patches.

The lifetime annotation scheme we propose is inspired by and similar to lifetimes in Rust. Rust is an industrial-strength language with complete and consistent support for static lifetime checking. It embodies a wealth of experience on how to make lifetime checking work on large real-world codebases, and we think this is a good reason to borrow these tried-and-true concepts for C++. We will, however, present the annotation scheme in a way that should make it understandable to readers without any knowledge of Rust.

Defining the annotation scheme completely would take many pages, and we don’t feel it would be productive to go into this level of detail in this high-level RFC. Instead, we will present a few representative examples with explanations that provide enough detail to give a feel for how the annotation scheme works. We’re happy to provide more details if needed.

Example

Here is a simple example:

const std::string& [[clang::annotate_type("lifetime", "a")]] smaller(
    const std::string& [[clang::annotate_type("lifetime", "a")]] s1,
    const std::string& [[clang::annotate_type("lifetime", "a")]] s2) {
  if (s1 < s2) {
    return s1;
  } else {
    return s2;
  }
}

This function takes two references to strings and returns a reference to the lexicographically smaller of the two strings. Because the return value might refer to either of the two input strings, its lifetime is tied to the two inputs. This is expressed by the annotation [[clang::annotate_type("lifetime", "a")]].

The annotate_type attribute has no effect on the formal C++ type system or runtime semantics; the lifetime inference and verification tooling use it to establish a “shadow” type system. For more details, see the RFC for annotate_type.

The annotation in its “raw” form is verbose and obscures the rest of the function signature. In practice, it is preferable to define a macro that expands to the attribute. In the rest of this proposal, we will assume that a macro $a has been defined to expand to [[clang::annotate_type("lifetime", "a")]], and similarly $b, $c, and so on. (Most major compilers, including Clang, GCC, and MSVC, allow $ as an implementation-defined character in identifiers.) With this, the function signature looks as follows:

const std::string& $a smaller(const std::string& $a s1, const std::string& $a s2) {
   ...
}

We think this style of macros makes the lifetimes visually distinctive as well as brief, so we will use it throughout this proposal. However, the macros are not part of our proposal; every codebase can define its own macro shortcuts that work within the context of that codebase.
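As a rough illustration of how a codebase might define such shorthand macros, the sketch below expands them to the attribute only when the compiler reports support for annotate_type, and to nothing otherwise. The helper macro LIFETIME, the fallback logic, and the demo() function are assumptions made for this example; they are not part of the proposal.

```cpp
#include <cassert>
#include <string>

// Use the Clang attribute where the compiler knows it; otherwise expand to
// nothing. LIFETIME is a hypothetical helper macro, not part of the proposal.
#if defined(__has_cpp_attribute)
#  if __has_cpp_attribute(clang::annotate_type)
#    define LIFETIME(name) [[clang::annotate_type("lifetime", name)]]
#  endif
#endif
#ifndef LIFETIME
#  define LIFETIME(name)
#endif

#define $a LIFETIME("a")
#define $b LIFETIME("b")

// The return value may refer to either argument, so all three share $a.
const std::string& $a smaller(const std::string& $a s1,
                              const std::string& $a s2) {
  return (s1 < s2) ? s1 : s2;
}

// Small usage check: the result aliases whichever argument compares smaller.
void demo() {
  std::string x = "apple";
  std::string y = "banana";
  assert(&smaller(x, y) == &x);
  assert(&smaller(y, x) == &x);
}
```

Because the macros expand to nothing on compilers without the attribute, annotated headers written this way should remain usable outside Clang.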

The names of lifetimes have no connection to any other identifiers in the program. A lifetime may happen to have the same name as another entity in the program, but this does not affect its meaning. Lifetimes in function signatures are implicitly scoped to the function in which they appear; we will elaborate on scoping rules in detailed design docs.

Tooling can use the annotations to detect lifetime bugs, for example:

void f() {
  std::string foo = "foo";
  const std::string& first = smaller(foo, "bar");
  std::cout << first << "\n";
}

The second argument to smaller is a temporary std::string object, whose lifetime lasts only until the end of the statement. The lifetime annotations tell us that the reference first may be bound to this temporary, and that therefore accessing this reference in the following line is UB.

Note: Both parameters of smaller are annotated with the same lifetime $a, but this does not mean that the objects passed in as arguments need to have exactly the same lifetime. Indeed, this is not the case in the example call smaller(foo, "bar") above.

Informally, the annotation means that the return value can have the lifetime of either of the two arguments.

Formally, we can think of the lifetime $a as being a generic parameter of the function smaller(). A concrete lifetime is substituted for this parameter at every callsite of smaller(). A reference with a given lifetime may be implicitly converted to a reference of shorter lifetime. For the example call smaller(foo, "bar") above, we therefore choose $a to be the shorter of the two argument lifetimes; this is the lifetime of the second argument, the implicitly constructed temporary. The string foo has a longer lifetime than this temporary, so it can be implicitly converted to a reference with lifetime $a. We therefore conclude that the lifetime of the reference returned by smaller() is equal to the lifetime of the temporary.
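To make the substitution rule concrete, here is a minimal sketch of a call that should satisfy the contract: both arguments are named objects, so the shorter of the two argument lifetimes still covers every use of the returned reference. The function names safe_call() and demo() and the LIFETIME helper macro are assumptions for illustration.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper macro: expands to the attribute only where supported.
#if defined(__has_cpp_attribute)
#  if __has_cpp_attribute(clang::annotate_type)
#    define LIFETIME(name) [[clang::annotate_type("lifetime", name)]]
#  endif
#endif
#ifndef LIFETIME
#  define LIFETIME(name)
#endif
#define $a LIFETIME("a")

const std::string& $a smaller(const std::string& $a s1,
                              const std::string& $a s2) {
  return (s1 < s2) ? s1 : s2;
}

std::string safe_call() {
  std::string foo = "foo";
  std::string bar = "bar";  // a named object, alive until the function returns
  // $a is chosen as the shorter of the two argument lifetimes. Both objects
  // outlive `first`, so the reference never dangles.
  const std::string& first = smaller(foo, bar);
  return first;  // copies the string while it is still alive
}

void demo() {
  assert(safe_call() == "bar");
}
```

Compared with f() above, the only change is that the second argument is no longer a temporary, so the lifetime substituted for $a extends to the end of the enclosing function.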

Lifetime of this

The annotation for the lifetime of a this pointer is placed at the end of the member function declaration, for example:

struct StringPair {
  std::string first, second;
  const std::string& $a smaller() const $a {
    if (first < second) {
      return first;
    } else {
      return second;
    }
  }
};

This expresses that the lifetime of the reference returned by StringPair::smaller() is tied to the lifetime of the StringPair object on which the member function is called. Note that the $a signifying the lifetime of the this pointer comes in a natural position directly after the const signifying the constness of the this pointer. The syntax remains consistent if we add a ref-qualifier, e.g., const std::string& $a smaller() const & $a.
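The following sketch shows what this contract means at a call site: calling smaller() on a named StringPair is fine, while calling it on a temporary would leave the returned reference dangling. The demo() function and the LIFETIME helper macro are assumptions for illustration.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper macro: expands to the attribute only where supported.
#if defined(__has_cpp_attribute)
#  if __has_cpp_attribute(clang::annotate_type)
#    define LIFETIME(name) [[clang::annotate_type("lifetime", name)]]
#  endif
#endif
#ifndef LIFETIME
#  define LIFETIME(name)
#endif
#define $a LIFETIME("a")

struct StringPair {
  std::string first, second;
  // The trailing $a annotates `this`; the result borrows from the object.
  const std::string& $a smaller() const $a {
    return (first < second) ? first : second;
  }
};

std::string demo() {
  StringPair pair{"pear", "fig"};
  const std::string& s = pair.smaller();  // OK: `pair` outlives `s`
  // By contrast, the contract says this reference would dangle, because the
  // temporary StringPair dies at the end of the statement:
  //   const std::string& d = StringPair{"pear", "fig"}.smaller();
  assert(&s == &pair.second);
  return s;
}
```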

Lifetimes in template arguments

Lifetimes may be added to template arguments, e.g.

int* $a get_first(const std::vector<int* $a>& $b v) {
  return v.at(0);
}

This expresses that the lifetime $a of the return value is tied to the lifetime of the pointers contained in the vector, and that this lifetime is independent of the lifetime $b of the vector itself.
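A short sketch of why the two lifetimes are kept separate: the vector can be destroyed while the pointer obtained from it stays valid, because the pointer borrows from the pointee and not from the vector. The demo() function and the LIFETIME helper macro are assumptions for illustration.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper macro: expands to the attribute only where supported.
#if defined(__has_cpp_attribute)
#  if __has_cpp_attribute(clang::annotate_type)
#    define LIFETIME(name) [[clang::annotate_type("lifetime", name)]]
#  endif
#endif
#ifndef LIFETIME
#  define LIFETIME(name)
#endif
#define $a LIFETIME("a")
#define $b LIFETIME("b")

int* $a get_first(const std::vector<int* $a>& $b v) {
  return v.at(0);
}

int demo() {
  int x = 42;        // the pointee, with lifetime $a
  int* p = nullptr;
  {
    std::vector<int*> v = {&x};  // the vector itself, with the shorter lifetime $b
    p = get_first(v);
  }                  // v is destroyed here, but x is not
  assert(p == &x);
  return *p;         // still valid: p is tied to $a, not to $b
}
```

Had the return value been annotated with $b instead, a checker following the contract would have to reject the dereference after the inner block.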

Lifetime-parameterized types

Some types are reference-like in the sense that they refer to data whose lifetime is independent of their own lifetime. An example of this from the standard library is string_view: It refers to string data whose lifetime is independent of the lifetime of the string_view itself.

This is expressed by adding a lifetime parameter to the type that represents the lifetime of the data referred to by the type. Here is an excerpt of what this would look like for a string_view-like type:

class LIFETIME_PARAM(s) simple_string_view {
  char* $s data_ptr;
  size_t data_size;
public:
  const char* $s data() const $a {
    return data_ptr;
  }
// …
};

LIFETIME_PARAM(s) is a macro that expands to the attribute [[clang::annotate("lifetime_param", "s")]]. Again, the particular name of the macro is not part of this proposal.

The lifetime parameter $s is used in the definition of the member variable data_ptr to express that the lifetime of the string data is $s, a lifetime that is independent of the lifetime of the simple_string_view itself.

Similarly, $s is used in the data() member function to express that the lifetime of the return value is equal to the lifetime of the string data pointed to by data_ptr, not the lifetime $a of the simple_string_view itself.

When lifetime-parameterized types are used elsewhere in the code, they should be annotated with a lifetime in the same way that pointers and references are. For example, here is a simple_string_view version of the function smaller() that we showed earlier:

simple_string_view $a smaller(simple_string_view $a s1, simple_string_view $a s2) {
  if (s1 < s2) {
    return s1;
  } else {
    return s2;
  }
}
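The excerpt above leaves out the constructor and the comparison that smaller() relies on. The sketch below fills these in so that the example compiles as a whole. The constructor, size(), operator<, demo(), the const-qualified data_ptr, and the fallback macro definitions are all assumptions added for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical helper macros: expand to attributes only where supported.
#if defined(__has_cpp_attribute)
#  if __has_cpp_attribute(clang::annotate_type)
#    define LIFETIME(name) [[clang::annotate_type("lifetime", name)]]
#  endif
#  if __has_cpp_attribute(clang::annotate)
#    define LIFETIME_PARAM(name) [[clang::annotate("lifetime_param", #name)]]
#  endif
#endif
#ifndef LIFETIME
#  define LIFETIME(name)
#endif
#ifndef LIFETIME_PARAM
#  define LIFETIME_PARAM(name)
#endif
#define $a LIFETIME("a")
#define $s LIFETIME("s")

class LIFETIME_PARAM(s) simple_string_view {
  const char* $s data_ptr;  // the viewed characters have lifetime $s
  std::size_t data_size;
public:
  simple_string_view(const char* $s data, std::size_t size)
      : data_ptr(data), data_size(size) {}
  const char* $s data() const $a { return data_ptr; }
  std::size_t size() const { return data_size; }
};

// Lexicographic comparison of the viewed characters.
bool operator<(simple_string_view lhs, simple_string_view rhs) {
  std::size_t n = std::min(lhs.size(), rhs.size());
  int c = std::memcmp(lhs.data(), rhs.data(), n);
  return c < 0 || (c == 0 && lhs.size() < rhs.size());
}

simple_string_view $a smaller(simple_string_view $a s1,
                              simple_string_view $a s2) {
  return (s1 < s2) ? s1 : s2;
}

std::string demo() {
  std::string a = "kiwi", b = "plum";
  simple_string_view va(a.data(), a.size()), vb(b.data(), b.size());
  simple_string_view s = smaller(va, vb);  // s borrows from a or b, not va/vb
  assert(s.data() == a.data());
  return std::string(s.data(), s.size());
}
```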

Appendix A shows an annotated version of the most important parts of the actual standard string_view type.

Formally, lifetimes are generic type parameters, identified by their index, and type-erased at code generation time.

Lifetime elision

As in Rust, to avoid unnecessary annotation clutter, we allow lifetime annotations to be elided (omitted) from a function signature when they conform to certain regular patterns. Lifetime elision is merely a shorthand for these regular lifetime patterns. Elided lifetimes are treated exactly as if they had been spelled out explicitly; in particular, they are subject to lifetime verification, so they are just as safe as explicitly annotated lifetimes.

We propose to use the same rules as in Rust, as these transfer naturally to C++. We call lifetimes on parameters input lifetimes and lifetimes on return values output lifetimes. (Note that all lifetimes on parameters are called input lifetimes, even if those parameters are output parameters.) Here are the rules:

  1. Each input lifetime that is elided (i.e., not stated explicitly) becomes a distinct lifetime.
  2. If there is exactly one input lifetime (whether stated explicitly or elided), that lifetime is assigned to all elided output lifetimes.
  3. If there are multiple input lifetimes but one of them applies to the implicit this parameter, that lifetime is assigned to all elided output lifetimes.

In practice, lifetime elision allows explicit annotations to be omitted in many cases. For example, the lifetimes in the StringPair::smaller() example shown earlier, const std::string& $a smaller() const $a, are implied by the elision rules, so the declaration could be written without annotations as const std::string& smaller() const.
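The three rules can be sketched with one function each, showing the elided form next to the explicit form it stands for. The functions skip_spaces and copy_value, the struct Config, demo(), and the LIFETIME helper macro are made up for this illustration. Since annotate_type does not change the C++ type, the elided and the explicit declaration of a free function name the same function.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper macro: expands to the attribute only where supported.
#if defined(__has_cpp_attribute)
#  if __has_cpp_attribute(clang::annotate_type)
#    define LIFETIME(name) [[clang::annotate_type("lifetime", name)]]
#  endif
#endif
#ifndef LIFETIME
#  define LIFETIME(name)
#endif
#define $a LIFETIME("a")
#define $b LIFETIME("b")

// Rules 1 and 2: a single input lifetime, which is assigned to the output.
const char* skip_spaces(const char* s);         // elided form
const char* $a skip_spaces(const char* $a s) {  // equivalent explicit form
  while (*s == ' ') ++s;
  return s;
}

// Rule 1 only: each elided input gets its own lifetime; there is no output.
void copy_value(int* dst, const int* src);      // elided form
void copy_value(int* $a dst, const int* $b src) { *dst = *src; }

struct Config {
  std::string user, fallback;
  // Rule 3: several input lifetimes, one of them on `this`. Spelled out:
  //   const std::string& $a get(const std::string& $b key) const $a;
  // Under this contract the body may return a member, but not `key`.
  const std::string& get(const std::string& key) const {
    return key == "user" ? user : fallback;
  }
};

void demo() {
  const char* text = "  x";
  assert(skip_spaces(text) == text + 2);
  int from = 3, to = 0;
  copy_value(&to, &from);
  assert(to == 3);
  Config c{"alice", "nobody"};
  assert(&c.get("user") == &c.user);
}
```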

Introducing lifetimes to a codebase will have to happen incrementally. During this process, missing lifetimes need to be interpreted differently in different files:

  • In files on which we have already run the lifetime inference tooling, the elision rules should be applied to types that require lifetimes but do not have lifetime annotations (these are pointers, references, and lifetime-parameterized types).
  • In files on which we have not yet run the inference tooling, none of the functions have lifetime annotations, and the elision rules should not be applied because the lifetimes they imply are generally not correct.

We therefore propose using a pragma #pragma clang lifetime_elision to mark source files where lifetime elision should be applied. Note that support for this pragma can be implemented entirely within the Clang-Tidy check using the clang::PragmaHandler API; no changes to Clang itself are needed.

Alternative annotation syntax using only [[clang::annotate]]

If our proposal to add a general-purpose type annotation attribute annotate_type to Clang does not meet with approval, we can instead use the existing [[clang::annotate]] attribute, though at the cost of readability. For example:

class [[clang::annotate("lifetime_params", "s")]] simple_string_view {
  [[clang::annotate("member_lifetimes", "s")]]
  const char* data_ptr;
};

[[clang::annotate("function_lifetimes", "a, a -> a")]]
const std::string& smaller(const std::string& s1, const std::string& s2);

template<typename T, typename U>
[[clang::annotate("function_lifetimes", "(a, b) -> a")]]
int* get_first(const std::vector<int*>& v);
int* get_first(const std::vector<int*>& v);

Since [[clang::annotate]] is a declaration attribute, it can’t appear inline within a type, and must be attached to the declaration. This attribute placement detaches the lifetime information from the type, and we think that it is less readable. Certain cases of lifetime elision, where only some of the lifetimes in a function are elided, would also not be possible with this notation.

Current limitations of proposed lifetime annotations

No subtyping constraints between lifetimes

We do not have an equivalent of Rust’s where clauses, which establish “outlives” constraints between lifetimes. Consider this example:

void push_first(std::vector<int*>& a, std::vector<int*>& b) {
  a.push_back(b[0]);
}

We should be able to call push_first if the lifetime of the pointers in b is at least as long as the lifetime of the pointers in a, but there is no way to express this constraint with the current annotations.

This limitation could be solved by introducing a LIFETIME_CONSTRAINTS annotation:

LIFETIME_CONSTRAINTS(a <= b)
void push_first(std::vector<int* $a>& a, std::vector<int* $b>& b) {
  a.push_back(b[0]);
}
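A call-site sketch may help show what the constraint would guarantee. LIFETIME_CONSTRAINTS is the annotation suggested above and does not exist yet, so it is defined here as a macro that expands to nothing; demo() and the LIFETIME helper macro are likewise assumptions for illustration.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper macro: expands to the attribute only where supported.
#if defined(__has_cpp_attribute)
#  if __has_cpp_attribute(clang::annotate_type)
#    define LIFETIME(name) [[clang::annotate_type("lifetime", name)]]
#  endif
#endif
#ifndef LIFETIME
#  define LIFETIME(name)
#endif
#define $a LIFETIME("a")
#define $b LIFETIME("b")

// Placeholder for the suggested annotation; it has no effect in this sketch.
#define LIFETIME_CONSTRAINTS(...)

LIFETIME_CONSTRAINTS(a <= b)
void push_first(std::vector<int* $a>& a, std::vector<int* $b>& b) {
  a.push_back(b[0]);
}

int demo() {
  int long_lived = 7;
  std::vector<int*> a;
  {
    // The pointee of b's element outlives everything stored in a, so the
    // constraint a <= b holds even though the vector b itself is short-lived.
    std::vector<int*> b = {&long_lived};
    push_first(a, b);
  }
  // If b had instead held the address of a variable local to the inner block,
  // a[0] would dangle here; the constraint is meant to rule that call out.
  assert(a.size() == 1);
  return *a[0];
}
```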

No equality constraints between lifetime parameters

If a class has multiple lifetime parameters, those lifetimes are always assumed to be independent of each other; individual member functions cannot impose constraints on them. This creates a limitation in expressivity. For example, we cannot annotate Pair::Method() in the following example with lifetimes since it may only be called when $a == $b:

struct LIFETIME_PARAM(a, b) Pair {
  int* $a first;
  int* $b second;

  void Method() {
    TakeSpecialPair(this);
  }
};

void TakeSpecialPair(Pair $a $a * p);

Rust solves this issue by allowing users to write multiple impl blocks for a struct, where each carries its own generic signature for self.

Again, we could solve this issue in C++ by adding per-method equality constraints:

struct LIFETIME_PARAM(a, b) Pair {
  int* $a first;
  int* $b second;

  LIFETIME_CONSTRAINTS(a == b)
  void Method() {
    TakeSpecialPair(this);
  }
};

C++23’s explicit object parameter syntax (P0847R7, “Deducing this”) will allow this constraint to be expressed directly:

struct LIFETIME_PARAM(a, b) Pair {
  int* $a first;
  int* $b second;

  void Method(this Pair $c $c &self) {
    TakeSpecialPair(&self);
  }
};

Cannot define different constraints for function entry and exit

Our annotation scheme cannot express different lifetime constraints at function entry and exit, i.e., it cannot express separate pre- and post-conditions. Note that Rust has the same limitation.

The use cases for different lifetime constraints at function entry and exit are probably rare, but they do exist. As an example, consider string_view::swap. It exchanges the data pointers of the two string_view objects and hence also their lifetimes, but our annotation scheme cannot express this. Instead, we must more conservatively demand that the lifetimes of the two string_views are the same:

class LIFETIME_PARAM(s) string_view {
  size_t __size;
  const char* $s __data;

public:
  void swap(string_view $s & __other);
};

A similar limitation applies to std::swap.

However, this overly conservative annotation does not appear to be an issue for most practical applications. For example, when using string_view::swap to implement a sorting algorithm, the lifetimes of the string_views being sorted will anyhow be the same. The fact that Rust’s lifetime annotations have the same limitation is further evidence that it does not appear to be a problem in practice.

Lifting this limitation is possible, but it would require more complexity in the annotation scheme and likely also significant additional complexity in the lifetime inference and verification algorithms. We should only commit to this additional complexity if we discover an important use case that requires it.
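To see what the conservative annotation gives up, consider this small sketch using std::string_view (C++17). The code is valid C++ and has no dangling access, but because swap() would force both views to share one lifetime, a checker following the proposed contract would presumably have to treat long_view as having the shorter lifetime of inner and flag the final use. The demo() function is made up for this illustration.

```cpp
#include <cassert>
#include <string>
#include <string_view>

std::string demo() {
  std::string outer = "outer";
  std::string_view long_view = outer;
  {
    std::string inner = "inner";
    std::string_view short_view = inner;

    long_view.swap(short_view);  // long_view now refers to inner's data
    assert(long_view == "inner");

    long_view.swap(short_view);  // swapped back before inner is destroyed
  }
  // At runtime long_view refers to outer again, so this use is safe.
  return std::string(long_view);
}
```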

Comparison with other work in this area

[[clang::lifetimebound]]

Clang implements an attribute [[clang::lifetimebound]] (see “Attributes in Clang” in the Clang documentation) that can express a strict subset of the lifetime annotations that we are proposing. Specifically, [[clang::lifetimebound]] only supports connecting the top-level lifetime of a function argument object to all lifetimes of the return value. It does not support, for example, expressing a relationship between two lifetimes of arguments, or talking about a lifetime that is not at the top level of the type (for example, nested in a template argument):

void push_back_if_not_null(std::vector<int* $a>& xs, int* $a x) {
  if (x != nullptr) {
    xs.push_back(x);
  }
}

The function push_back_if_not_null can be annotated with our proposed lifetime annotations as shown, but cannot be annotated with [[clang::lifetimebound]].

Our lifetime analysis will desugar [[clang::lifetimebound]] into the lifetime representation that it uses.
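For comparison, here is a sketch of the earlier smaller() function written with the existing attribute. Marking both parameters [[clang::lifetimebound]] ties the returned reference to each argument, which corresponds to using $a on both parameters and on the return type. The LIFETIMEBOUND macro and demo() are assumptions for illustration; the macro expands to nothing on compilers that do not report support for the attribute.

```cpp
#include <cassert>
#include <string>

// Hypothetical convenience macro around the existing Clang attribute.
#if defined(__has_cpp_attribute)
#  if __has_cpp_attribute(clang::lifetimebound)
#    define LIFETIMEBOUND [[clang::lifetimebound]]
#  endif
#endif
#ifndef LIFETIMEBOUND
#  define LIFETIMEBOUND
#endif

// The attribute is written after the parameter name.
const std::string& smaller(const std::string& s1 LIFETIMEBOUND,
                           const std::string& s2 LIFETIMEBOUND) {
  return (s1 < s2) ? s1 : s2;
}

std::string demo() {
  std::string foo = "foo";
  std::string bar = "bar";
  const std::string& first = smaller(foo, bar);  // both arguments outlive first
  // Passing a temporary instead, as in smaller(foo, "bar"), is the kind of
  // statement-local mistake that Clang can diagnose with this attribute.
  assert(&first == &bar);
  return first;
}
```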

Lifetime safety: preventing common dangling (WG21 proposal P1179, -Wdangling-gsl)

P1179 describes an analysis that has preliminary implementations in MSVC and in a fork of Clang. This analysis also inspired some statement-local warnings that are implemented in MSVC and in Clang trunk (-Wdangling-gsl, on by default; the source post links to the tests). The statement-local warnings have found many bugs in many real-world codebases.

The analysis described in P1179 is a flow-sensitive points-to analysis. It had the explicit goal to only warn when dangling pointers are actually dereferenced (not when they are created). It aims to prevent many kinds of errors, including:

  • Use after free
  • Use of a moved-from object
  • Dereferencing an invalid iterator
  • Null dereference

The analysis uses contract-style annotations to describe lifetime preconditions and postconditions. The separate pre- and postconditions help circumvent the limitations described above in the section “Cannot define different constraints for function entry and exit”.

Implementations

  • The Clang implementation (in the fork) lacks full support for field-sensitivity.
  • The Clang implementation will not attempt to find use-after-move errors.
  • The MSVC implementation does not support annotations.
  • None of the implementations support the SharedOwner concept that was introduced in the R1 version of the paper.
  • Both implementations do fixed-point iteration (as opposed to doing the acyclic CFG approach suggested by the paper).
  • According to the benchmarks, the Clang implementation adds roughly 5% to the time of a full compilation including codegen (closer to 10% without codegen).
  • Currently, none of the implementations are actively developed, as contracts were not voted into the standard.

The readme of the Clang fork has direct links to the tests that can give a picture of the current state.

Comparison to Rust-style lifetimes

Here is a comparison between the properties of the Rust-style lifetime annotations proposed here and the P1179-style lifetime annotations:

  • Annotation syntax and semantics
    • This proposal: Introduces lifetime parameters via type annotations. Users need to learn a new concept, but the annotations are concise, spelled within the relevant type, and syntactically close to the function parameter names.
    • P1179: Describes points-to relationships via contracts, often in terms of abstract locations (e.g., the syntax o' refers to the memory owned by an owner o). Developers are familiar with points-to relationships, but the contracts-style annotations can be overly verbose and syntactically far from the parameters. Certain ambiguities require additional annotations, e.g., a non-const reference parameter can be either out or in-out, which has implications on its assumed “moved-from”-ness.
  • New concepts
    • This proposal: Introduces a relatively low number of new concepts.
    • P1179: Reuses concepts developers are already familiar with, such as “Owner” or “Pointer”.
  • Scope
    • This proposal: Iterator invalidation, use-after-move, null dereference are not in scope.
    • P1179: Can catch problems related to iterator invalidation but might need additional annotations to avoid certain false positives. Certain patterns (e.g. std::vector::reserve) cannot be supported in the model.
  • Limitations
    • This proposal: Certain patterns (like conditional lifetimes) cannot be represented.
    • P1179: Certain concepts (like conditional points-to relationships) cannot be represented. Moreover, the dataflow analysis cannot handle arbitrary code patterns and can be confused even when the underlying pattern is supported.
  • Treatment of dangling pointers
    • This proposal: Warns when a dangling pointer is created.
    • P1179: Warns when a dangling pointer is dereferenced.
  • Rules for default lifetimes
    • This proposal: Simple, easy-to-understand default lifetimes and lifetime elision rules.
    • P1179: More sophisticated, harder-to-understand rules that infer default annotations from signatures and cover the most common cases.
  • Mutations
    • This proposal: Cannot represent certain mutations (e.g., std::swap(ptr1, ptr2) requires ptr1 and ptr2 to have the same lifetimes).
    • P1179: Has no problems with mutations in general.
  • Support for user-defined classes
    • This proposal: Supports arbitrary user-defined classes as long as they don’t do anything forbidden (e.g., conditional lifetimes).
    • P1179: Certain user-defined constructs are not supported (e.g., a pointer-like type with multiple pointees at the same time).

Examples

Here are some code examples annotated in both styles.

Function that returns a pointer parameter

// This proposal
int* $a f(int* $a i);

// P1179
int* f(int* i)
  [[post: lifetime(Return, i)]];

Struct containing a pointer

// This proposal
struct LIFETIME_PARAM(s) S {
  int* $s m;
};

void f(int* $a i, S $a * out) {
  out->m = i;
}

// P1179
struct S { int* m; };

void f(int* i, S* out)
  [[post: lifetime(out->m, i)]]
{
  out->m = i;
}

Lifetimes of pointers in template arguments

// This proposal
void push_back_if_not_null(std::vector<int* $a>& xs, int* $a x) {
  if (x != nullptr) {
    xs.push_back(x);
  }
}

// Not actually supported by P1179, but the Clang implementation had experiments in
// this direction.
void push_back_if_not_null(std::vector<int*>& xs, int* x)
  [[pre: lifetime(deref(xs), x)]]
  [[post: lifetime(deref(xs), x)]]
{
  if (x != nullptr) {
    xs.push_back(x);
  }
}

The deref notation in the P1179 example above was originally developed for smart pointer types, hence the “dereference” nomenclature. It would require additional annotation (not shown above) of std::vector<int*> member functions that take or return an int*.

Template with multiple pointer arguments

// This proposal
void insert_if_not_null(map<int* $a, int* $b>& m, int* $a key, int* $b value) {
  if (key != nullptr && value != nullptr) {
    m[key] = value;
  }
}

// This is not supported in P1179, as confirmed with Herb Sutter, but he is willing
// to look into making this work (and include something officially for the case above).

Lifetimes and the borrow checker in Rust

Rust code that passes type checking and does not use unsafe is guaranteed to be memory safe. Our proposed lifetime annotations are heavily inspired by Rust, but they don’t catch all memory safety problems in C++ code. Specifically:

  • Lifetimes don’t help with statically proving spatial memory safety (that all reads/writes are in bounds). This is expected, since lifetime annotations and the borrow checker in Rust don’t help with spatial memory safety either. Instead Rust relies on runtime bounds checking and API design that makes accesses in-bounds by construction (for example, range-based for loops).

  • The proposed static analysis for C++ is not a borrow checker. It does not enforce Rust’s borrowing rule: “At any given time, you can have either one mutable reference or any number of immutable references.”

Enforcing the borrowing rule is a critical component of Rust’s memory safety guarantee. For example, memory safety bugs caused by iterator invalidation are not caught by lifetime annotations alone.

The following code, for instance, passes lifetime verification but contains a possible use-after-free (whether it actually occurs at runtime depends on the implementation details of std::vector):

#include <iostream>
#include <vector>

int main() {
  std::vector<int> xs = { 10, 20, 30 };
  auto it = xs.cbegin();
  xs.push_back(40);
  std::cout << *it; // possible use-after-free: dereferencing an iterator that was invalidated
}

The Rust compiler would reject the equivalent Rust code because xs.push_back() needs to borrow xs mutably within the live region of the variable it, which borrows xs immutably.

Unfortunately, C++ iterators seem to be incompatible with Rust’s borrowing rule, since the vast majority of algorithms operate on pairs of non-const iterators borrowed from the same container.
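As a small illustration of this pattern, the sketch below calls a standard algorithm with two non-const iterators obtained from the same container. `sorted_copy` is a hypothetical helper introduced only for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical helper: sorts a copy of the input using a standard algorithm.
std::vector<int> sorted_copy(std::vector<int> xs) {
  // Two non-const iterators into the same container are alive at once. In
  // Rust terms this would be two simultaneous mutable borrows of `xs`.
  std::vector<int>::iterator first = xs.begin();
  std::vector<int>::iterator last = xs.end();
  std::sort(first, last);
  assert(std::is_sorted(xs.begin(), xs.end()));  // postcondition
  return xs;
}
```

This is idiomatic C++, which is why a direct port of the borrowing rule would flag a large amount of existing, correct code.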

To summarize, enforcing the borrowing rule in C++ is unfortunately not so simple because there is a lot of existing code that creates multiple non-const pointers or references to the same object, intentionally violating the borrowing rule. At this point we don’t have a plan of how we could incrementally roll out the borrowing rule to existing C++ code, but it is a very interesting direction for future work.

Appendix A: std::string_view annotated with lifetimes

As an example of real-world code with our proposed lifetime annotations, here is an annotated version of representative parts of `std::string_view`.

namespace std {

template<class _CharT, class _Traits = char_traits<_CharT> >
    class basic_string_view;

typedef basic_string_view<char>     string_view;

template<class _CharT, class _Traits>
class LIFETIME_PARAM(s) basic_string_view {
public:
    // types
    LIFETIME_PARAM(d)  typedef _CharT* $d                pointer;
    LIFETIME_PARAM(d)  typedef const _CharT* $d          const_pointer;
    LIFETIME_PARAM(d)  typedef _CharT& $d                reference;
    LIFETIME_PARAM(d)  typedef const _CharT& $d          const_reference;
    LIFETIME_PARAM(d)  typedef const_pointer $d          const_iterator;
    LIFETIME_PARAM(d)  typedef const_iterator $d         iterator;
    LIFETIME_PARAM(d)  typedef std::reverse_iterator<const_iterator $d>   const_reverse_iterator;

    typedef _Traits                                      traits_type;
    typedef _CharT                                       value_type;
    typedef size_t                                       size_type;
    typedef ptrdiff_t                                    difference_type;
    static _LIBCPP_CONSTEXPR const size_type npos = -1; // size_type(-1);

    basic_string_view();
    basic_string_view(const basic_string_view $s & __s);
    basic_string_view $s & operator=(const basic_string_view $s &);
    basic_string_view(const _CharT* $s __s, size_type __len);
    basic_string_view(const _CharT* $s __s);


    const_iterator $s begin() const;
    const_iterator $s end() const;
    const_pointer $s data() const;


    const_reference $s operator[](size_type __pos) const;
    basic_string_view $s substr(size_type __pos = 0, size_type __n = npos) const;

    void remove_prefix(size_type __n);
    void remove_suffix(size_type __n);

    void swap(basic_string_view $s &__other);

    // copy() and find() don't allow their arguments to escape, therefore their lifetimes
    // are independent of $s.
    // According to lifetime elision rules, they don't need an explicit annotation.
    size_type copy(_CharT* __s, size_type __n, size_type __pos = 0) const;
    size_type find(const _CharT* $t __s, size_type __pos, size_type __n) const;

private:
    const   value_type* $s __data;
    size_type              __size;
};

} // namespace std
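To make the role of `$s` more concrete, the following sketch uses the real `std::string_view` API without annotations. `first_word` and `check_first_word` are hypothetical names. Under the annotated signature of `substr()` above, the returned view would carry the same lifetime `$s` as the view it was called on, that is, the lifetime of the underlying character buffer rather than that of the view object itself.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Hypothetical helper: returns the part of `text` before the first space.
// The result points into the same buffer as `text`; in the proposed notation
// both views would be annotated with the same lifetime.
std::string_view first_word(std::string_view text) {
  // find() returns npos when there is no space, and substr(0, npos) then
  // yields the whole view.
  return text.substr(0, text.find(' '));
}

void check_first_word() {
  std::string owner = "hello world";  // owns the character buffer
  std::string_view word = first_word(owner);
  assert(word == "hello");
  assert(word.data() == owner.data());  // no copy: `word` borrows from `owner`
}
```

The by-value parameter `text` is destroyed when `first_word` returns, yet the result stays valid as long as `owner` does, which is the distinction the `$s` parameter on the class is meant to capture.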

Appendix B: Examples of lifetimes inferred by the current experimental implementation

This appendix contains a selection of functions that illustrate the range of C++ language constructs on which our current experimental implementation can automatically infer lifetimes.

The input to the lifetime inference algorithm is the unannotated source code. All lifetime annotations below were automatically inferred from the function implementations.

A simple example to get started

int* $a get_lesser_of(int* $a a, int* $a b) {
  return *a < *b ? a : b;
}
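For callers, the inferred signature means that the returned pointer may refer to either argument, so it should only be used while both pointees are alive. Below is a minimal sketch of a caller that respects this contract. The annotation is kept in a comment so that the code compiles with an unmodified compiler; `lesser_of_locals` is a hypothetical name, and the null check is an assumption of this sketch.

```cpp
#include <cassert>

// Inferred signature: int* $a get_lesser_of(int* $a a, int* $a b)
int* get_lesser_of(int* a, int* b) {
  assert(a != nullptr && b != nullptr);  // assumption of this sketch
  return *a < *b ? a : b;
}

// Hypothetical caller: the result is used while both `x` and `y` are in scope.
int lesser_of_locals() {
  int x = 3;
  int y = 7;
  int* lesser = get_lesser_of(&x, &y);
  return *lesser;
}
```

Returning `lesser` itself from `lesser_of_locals` would let the pointer outlive `x` and `y`, which is the kind of stack-use-after-return that the annotations are intended to expose.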

Lifetime inference is flow-sensitive

int* $p target(int* $p p, int* a, int* $p b) {
  // Note: `int* a` is not annotated. The lifetime elision rules imply that it has a
  // unique lifetime different from `$p`.
  for (int i = 0; i < *a; i++) {
    p = a;
    p = b;
  }
  return p;
}

Lifetime inference for class template arguments

template <typename A>
struct S { A array; };

void target(S<int* $a *>* s, int* $a p, int* $a q) {
  s->array[0] = p;
  s->array[1] = q;
}

Lifetime inference for variadic class template arguments

template <int idx, typename... Args> struct S {};
template <int idx, typename T, typename... Args>
struct S<idx, T, Args...> {
  T t;
  S<idx+1, Args...> nested;
};

template <typename... Args>
struct tuple: public S<0, Args...> {};

int* $a target(tuple<int*, int* $a>& s) {
  return s.nested.t;
}

Lifetime inference for nested class templates

template <typename T>
struct R {
  R(T t) : t(t) {}
  T t;
};

bool some_condition();

template <typename T>
struct S {
  S(T a, T b) : r(some_condition() ? R(a) : R(b)) {}
  R<T> r;
};

int* $a target(int* $a a, int* $a b) {
  S<int*> s(a, b);
  return s.r.t;
}

// The algorithm infers the following lifetimes for class template instantiations
// (which cannot be annotated directly in the code):
// R<int* $a>::R(int* $a) $b
// S<int* $a>::S(int* $a, int* $a) $b

Appendix C: How lifetime annotations help static analysis better understand the object graph and potential mutations

Lifetime annotations can help static analysis tools in general better understand how a function call may mutate the object graph.

As an example, say we want to implement a static analysis that detects unchecked unwraps of std::optional. Here is an example program:

struct A {
  std::optional<int> opt_int;
};
struct B { … };

void MutateAB(A* a, B* b);
void MutateB(B* b);
void Use(int x);

void Target() {
  A a;
  B b;
  MutateAB(&a, &b);
  if (a.opt_int.has_value()) {
    MutateB(&b);
    Use(*a.opt_int); // Safe?
  }
}

Many programmers will say that accessing the value of the optional in Use(*a.opt_int) is safe because it is protected by the if (... has_value …) check, and the MutateB(&b) call does not change a.

However, MutateAB(&a, &b) could have stored a pointer to a inside b. Subsequently, MutateB(&b) could have cleared a.opt_int, invalidating the if (... has_value…) check.
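To make this scenario concrete, here is a sketch with hypothetical implementations of `MutateAB()` and `MutateB()` that are consistent with the declarations above. The layout of `B` and both function bodies are assumptions chosen to trigger the problem; `CheckStillHoldsAfterMutateB` is a hypothetical name.

```cpp
#include <cassert>
#include <optional>
#include <vector>

struct A {
  std::optional<int> opt_int;
};

// Assumed layout for this sketch: B keeps non-owning pointers to A objects.
struct B {
  std::vector<A*> helpers;
};

// Hypothetical implementation: sets the optional and remembers `a` inside `b`.
void MutateAB(A* a, B* b) {
  a->opt_int = 42;
  b->helpers.push_back(a);
}

// Hypothetical implementation: clears every A that `b` remembers.
void MutateB(B* b) {
  for (A* helper : b->helpers) {
    helper->opt_int.reset();
  }
}

// Returns whether the has_value() check is still valid after MutateB().
bool CheckStillHoldsAfterMutateB() {
  A a;
  B b;
  MutateAB(&a, &b);
  assert(a.opt_int.has_value());  // the check in Target() would pass here
  MutateB(&b);                    // mutates `a` through the pointer stored in `b`
  return a.opt_int.has_value();
}
```

With these definitions `CheckStillHoldsAfterMutateB()` returns false: the optional is empty by the time it would be unwrapped, even though `MutateB()` never received `a` as an argument.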

A sound static analysis must therefore warn that Use(*a.opt_int) is not safe, but many users will flag this warning in their code as a false positive, because in practice modern C++ code rarely has this kind of action-at-a-distance.

Note that even (unsoundly) assuming absence of global variables does not help here, since no global variables are involved. To eliminate this false positive we need to assume that the object graphs reachable from a and b are disjoint. A scalable, local analysis can’t gather enough evidence from the program to make such assumptions on a solid basis.

Lifetime annotations allow the programmer to express the possible mutations to the object graph in a machine-readable way. If B can point to A and MutateAB() sets this pointer, the code can express it with lifetime annotations:

// Indicate that the lifetimes implied by elision rules are indeed correct.
#pragma clang lifetime_elision

struct A {
  std::optional<int> opt_int;
};

struct [[clang::lifetime_param(a)]] B {
  std::vector<A* $a> helpers;
};

// Lifetime annotations express that the object graph behind the pointer `b` may point to `a`:
void MutateAB(A* $a a, B $a * $b b);

// Or, equivalently, using lifetime elision shorthand syntax:
void MutateAB(A* $a a, B $a * b);

void MutateAB(A* a, B* b) {
  b->helpers.push_back(a);
}

Furthermore, the Clang-Tidy check that verifies that the implementation of MutateAB follows its lifetime contract would reject any other lifetime annotations. In other words, lifetime annotations are not just a promise equivalent to comments; they are checked and can be relied upon.

If, conversely, B can’t point to A – the common case that many engineers expect – the original code without explicit annotations already expresses the right semantics:

// Indicate that the lifetimes implied by elision rules are indeed correct.
#pragma clang lifetime_elision

struct A {
  std::optional<int> opt_int;
};

// Absence of lifetime parameters on `B` means that it can't point to other objects
// in the object graph that it does not own.
struct B { … };

// Lifetime annotations express that object graphs behind pointers `a` and `b` are unrelated:
void MutateAB(A* $a a, B* $b b);

// Or, equivalently, using lifetime elision shorthand syntax:
void MutateAB(A* a, B* b);

Appendix D: How lifetime annotations help C++/Rust interoperability

References and pointers in Rust

Rust provides two kinds of indirections, references and pointers, that have different semantics:

References

  • References are safe. Each reference has a lifetime associated with it. For example, a reference to a 32-bit integer with lifetime 'a is written &'a i32. The lifetimes allow the borrow checker to verify that references are used in a memory-safe way.
  • References are non-nullable. Nullability can be added explicitly where necessary by using the Option<T> type, for example Option<&'a i32>.
  • References are ergonomic. Rust’s syntax and libraries are optimized for using references most of the time.
  • References are idiomatic. Rust programmers prefer to use references in their code as much as possible.

Pointers

  • Pointers are unsafe. They don’t carry lifetime information. For example, a non-mutable pointer to a 32-bit integer is simply written *const i32. The borrow checker cannot verify that pointers are used in a memory-safe way.
  • Pointers are nullable. To express a non-null constraint one must use a wrapper type such as NonNull<T>.
  • Pointers lead to non-ergonomic code. For example, verbose casts are required to convert between references and pointers. To convert a pointer x to a reference one must write unsafe {&*x}.
  • Pointers are non-idiomatic. Rust programmers avoid using pointers.

C++/Rust interoperability without lifetime annotations in C++

Let’s say we want to call the following C++ function from Rust:

// C++:
const int& smaller(const int& x, const int& y);

This function signature does not explain the lifetime contract. A tool that generates C++/Rust bindings based on C++ headers (for example, bindgen) has no choice but to declare smaller() using unsafe pointers that don’t have a Rust lifetime:

// Rust bindings (automatically generated):
extern "C" {
  pub fn smaller(x: *const i32, y: *const i32) -> *const i32;
}

Rust code can now call smaller(), but callers must use unsafe pointers:

// Rust caller of C++ `smaller()` function that does not have lifetime annotations:
fn user() {
  let x = 10;
  let y = 5;
  let m = unsafe { smaller(&x, &y) };
  println!("smaller({x}, {y}) is {}", unsafe{*m});
}

C++/Rust interoperability with lifetime annotations in C++

Now let’s annotate smaller() with lifetimes on the C++ side:

// C++:
const int& $a smaller(const int& $a x, const int& $a y);

Equipped with this machine-readable lifetime information, a tool that generates C++/Rust bindings can define a safe Rust wrapper. This wrapper exposes safe Rust references and describes the lifetime contract of smaller() to the Rust borrow checker:

// Rust bindings (automatically generated):
pub fn smaller<'a>(x: &'a i32, y: &'a i32) -> &'a i32 {
  // Glue code to call C++ function through foreign-function interface omitted.
}

Now smaller() can be ergonomically called like any other safe Rust function:

// Rust caller of C++ `smaller()` function that is annotated with lifetimes:
fn user() {
  let x = 10;
  let y = 5;
  let m = smaller(&x, &y);
  println!("smaller({x}, {y}) is {m}");
}
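On the C++ side, a definition that is consistent with this contract could look as follows. This is only a sketch: the text above shows the declaration alone, so the function body and the `check_smaller` helper are assumptions. The annotation is kept in a comment so that the code compiles with an unmodified compiler.

```cpp
#include <cassert>

// Annotated declaration from above:
//   const int& $a smaller(const int& $a x, const int& $a y);
// Assumed body: returns a reference to one of its arguments, never to a local.
const int& smaller(const int& x, const int& y) {
  return y < x ? y : x;
}

void check_smaller() {
  int x = 10;
  int y = 5;
  const int& m = smaller(x, y);
  assert(m == 5);
  assert(&m == &y);  // the result aliases an argument, as `$a` describes
}
```

Because the result may alias either argument, a generated Rust wrapper has to tie the returned reference to both inputs, which is what the single lifetime `'a` in the bindings expresses.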

Appendix E: "Contributing Extensions to Clang" Q&A

Here we answer the usual set of questions about contributing extensions to Clang (https://clang.llvm.org/get_involved.html).

Evidence of a significant user community

  • Large parts of the C++ community are interested in finding memory safety bugs in C++ code. This is evidenced by the popularity of dynamic analysis tools such as AddressSanitizer and UndefinedBehaviorSanitizer, used in combination with manually written tests and fuzzing.

  • Finding memory safety bugs statically is also very interesting to users, since it allows bugs to be found before the tests are written and run. This interest is evidenced by Clang’s existing efforts in this area: -Wreturn-stack-address, -Wdangling, and -Wdangling-gsl. The latter two warnings are based on a partial implementation of the WG21 proposal P1179. All of these warnings have been on by default for a few years, have received little to no pushback from users, and have proven themselves valuable by finding quite a few bugs (based on our experience running them on our internal codebases).

  • Interest in source code annotations that help statically finding memory safety bugs is evidenced by P1179 itself, which has been partially implemented in Clang for a few years.

  • Interoperability between C++ and other languages is desired by some C++ users. For example, https://cxx.rs/ is a relatively popular crate for C++/Rust interop (750K+ downloads on crates.io: https://crates.io/crates/cxx), and C++/Swift interop has been worked on for a few years already. However, the fact that pointers and references in C++ APIs have unclear ownership and lifetime semantics presents a huge obstacle to automatic, ergonomic, safe bridging of C++ to other languages. Due to this issue, for example, cxx.rs does not support borrowed data as much as one could desire to bridge many idiomatic C++ APIs to Rust.

A specific need to reside within the Clang tree

  • Clang-Tidy is one of the industry standard static analysis tools, integrated into many workflows and IDEs (both free and commercial). Having our proposed analysis integrated into Clang-Tidy will allow interested engineers to run it much more easily than with an out-of-tree tool.

  • The only change to core Clang we are proposing is a general-purpose type annotation attribute that is not specific to the lifetime analysis. The lifetime analysis itself is kept separate in Clang-Tidy.

  • We believe that the biggest impact from the proposed static analysis could be realized if it were included in the core compiler as a warning. We are not ready to propose this yet because we are still experimenting with the semantics of the annotations and need to collect feedback from early adopters.

Specification

At this point, we are still experimenting with the semantics of the annotations. This document includes a high-level overview. We will be committing more detailed design docs and specifications together with the implementation, but they will be in flux for some time.

Representation within the appropriate governing organization

We believe it is too early to ask this question. In principle, the existence of P1179 shows that WG21 has some interest in this kind of annotation.

A long-term support plan

If the experimentation confirms that this type of annotation and static analysis is useful in practice, maintaining them will be very similar to maintaining any other Clang-Tidy check: organizations and individuals that enable it for their codebases will do the maintenance work.

A high-quality implementation with a test suite

We will be contributing a high-quality implementation with extensive tests. This is in our best interest since the burden is on us to show that this style of lifetime annotations is worth the added complexity for engineers reading and writing C++.

