dlang adds borrowchecker

 3 years ago
source link: https://dlang.org/changelog/2.092.0.html

Compiler changes

  1. CLI switches -revert=import and -transition=checkimports have been removed

    Those switches were already not doing anything and had been deprecated for a while. The compiler will no longer recognize them.

  2. Added support for mangling C++'s GNU ABI tags

    GNU ABI tags are a feature that was added with C++11 in GCC 5.1. In order for D to fully support the standard C++ library, DMD now recognizes the special UDA gnuAbiTag, declared in core.attribute and publicly aliased in object (so one need not import anything to use it). The ABI tags are a low-level feature that most users will not need to interact with, but they can be used to bind to C++ libraries that need them. In particular, gnuAbiTag is required to bind std::string when targeting C++11 and higher (DMD switch -extern-std={c++11,c++14,c++17}).

    It can be used in the following way:

    extern(C++):
    @gnuAbiTag("tagOnStruct")
    struct MyStruct {}
    @gnuAbiTag("Multiple", "Tags", "On", "Function")
    MyStruct func();

    Only one gnuAbiTag can be present on a symbol at a time. The order of the array entries does not matter (they are sorted on output). The UDA will only have an effect if -extern-std=c++11 or higher is passed to the compiler. The default (-extern-std=c++98) will ignore the UDA. This UDA can only be applied to extern(C++) symbols and cannot be applied to namespaces.

  3. Module constructors and destructors which are not extern(D) are deprecated

    Module constructors and destructors (shared or not) could be marked with a linkage other than extern(D), which would affect their mangling. Since such a mangling is simple and predictable, there was a very small chance of conflict if two constructors or destructors of the same kind were declared under similar conditions. For example, if the third module constructor in module a was on line 479 and the third module constructor in module b was also on line 479, they would have the same mangling.

    While it's unlikely that such a bug is triggered in practice, affected symbols will now trigger a deprecation message.

  4. DIP25 violations will now issue deprecations by default

    DIP25 has been available since v2.067.0, first as its own switch, and more recently under the -preview=dip25 switch. The feature is now fully functional and has been built on, for example by DIP1000.

    Starting from this release, code that would trigger errors when -preview=dip25 is passed to the compiler will also trigger a deprecation message without -preview=dip25. The behavior of the switch is unchanged (errors will still be issued).

    DIP25 aims to make it impossible for @safe code to refer to a destructed object. In practice, functions and methods returning a ref to their parameter might be required to qualify the method or the parameter as return, as hinted by the compiler.

    struct Foo
    {
        int x;
        // returning `this.x` escapes a reference to parameter `this`, perhaps annotate with `return`
        ref int method() /* return */ { return this.x; }
    }
    // returning `v` escapes a reference to parameter `v`, perhaps annotate with `return`
    ref int identity(/* return */ ref int v) { return v; }

    In both cases, uncommenting the return annotation will appease the compiler. The complete description of DIP25 can be found in the DIP25 document.
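    As a minimal sketch of the annotated form, the snippet below repeats the two declarations above with return uncommented and exercises them from a unittest block. The unittest and its values are illustrative additions, not part of the changelog.

```d
struct Foo
{
    int x;
    // `return` states that the result may refer to `this`
    ref int method() return { return this.x; }
}

// `return ref` ties the returned reference to the argument `v`
ref int identity(return ref int v) { return v; }

unittest
{
    Foo f;
    f.method() = 5;      // writes through the returned reference
    assert(f.x == 5);

    int n = 1;
    identity(n) += 2;    // modifies n through the returned reference
    assert(n == 3);
}
```

    Both callers keep the referenced object alive for as long as the returned reference is used, which is the situation the annotation describes.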

  5. Prototype Ownership/Borrowing System for Pointers

    An Ownership/Borrowing (aka OB) system for pointers can guarantee that dereferenced pointers are pointing to a valid memory object.

    Scope of Prototype OB System

    This is a prototype OB system adapted to D. It is initially for pointers only, not dynamic arrays, class references, refs, or pointer fields of aggregates. Adding support for those adds complexity but does not change the nature of the system, so it is deferred to later. RAII objects can safely manage their own memory, so they are not covered by OB. Whether a pointer's memory is allocated using the GC or some other storage allocator is immaterial to OB; the two are not distinguished and are handled identically.

    The system is only active in functions annotated with the @live attribute. It is applied after semantic processing is done, purely as a check for violations of the OB rules. No new syntax is added. No change is made to the code generated. If @live functions call non-@live functions, those called functions are expected to present an @live compatible interface, although this is not checked. If non-@live functions call @live functions, the arguments passed are expected to follow @live conventions.

    The OB system will detect as errors:

    • dereferencing pointers that are in an invalid state
    • more than one active pointer to a mutable memory object

    It will not detect attempts to dereference null pointers or possibly null pointers. Detecting these is unworkable because there is currently no way to annotate a type as a non-null pointer.

    Core OB Principle

    The OB design follows from the following principle:

    For each memory object, there can exist either exactly one mutating pointer to it, or multiple non-mutating (read-only) pointers.

    Design

    The single mutating pointer is called the "owner" of the memory object. It transitively owns the memory object and all memory objects accessible from it (i.e. the memory object graph). Since it is the sole pointer to that memory object, it can safely manage the memory (change its shape, allocate, free and resize) without pulling the rug out from under any other pointers (mutating or not) that may point to it.

    If there are multiple read-only pointers to the memory object graph, they can safely read from it without being concerned about the memory object graph being changed underfoot.

    The rest of the design is concerned with how pointers become owners, read only pointers, and invalid pointers, and how the Core OB Principle is maintained at all times.

    Tracked Pointers

    The only pointers that are tracked are those declared in the @live function as this, function parameters or local variables. Variables from other functions are not tracked, even @live ones, as the analysis of interactions with other functions depends entirely on that function's signature, not its internals. Parameters that are const are not tracked.

    Pointer States

    Each pointer is in one of the following states:

    Undefined: The pointer is in an invalid state. Dereferencing such a pointer is an error.

    Owner: The owner is the sole pointer to a memory object graph. An Owner pointer normally does not have a scope attribute. If a pointer with the scope attribute is initialized with an expression not derived from a tracked pointer, it is an Owner.

    If an Owner pointer is assigned to another Owner pointer, the former enters the Undefined state.

    Borrowed: A Borrowed pointer is one that temporarily becomes the sole pointer to a memory object graph. It enters that state via assignment from an Owner pointer, and the Owner then enters the Lent state until after the last use of the Borrowed pointer.

    A Borrowed pointer must have the scope attribute and must be a pointer to mutable.

    Readonly: A Readonly pointer acquires its value from an Owner. While the Readonly pointer is live, only Readonly pointers can be acquired from that Owner. A Readonly pointer must have the scope attribute and also must not be a pointer to mutable.

    Lifetimes

    The lifetime of a Borrowed or Readonly pointer value starts when it is first read (not when it is initialized or assigned a value), and ends at the last read of that value.

    This is also known as Non-Lexical Lifetimes.
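    The states and lifetimes described above can be sketched in a single @live function. This is an illustration only: the function name borrowDemo is hypothetical, the state named in each comment is a reading of the rules in this section rather than compiler output, and the malloc failure check is omitted for brevity.

```d
import core.stdc.stdlib : malloc, free;

@live int borrowDemo()
{
    // Initialized from an expression not derived from a tracked pointer:
    // `owner` is an Owner.
    int* owner = cast(int*) malloc(int.sizeof);
    *owner = 1;

    // scope and pointer to mutable: `b` is Borrowed, `owner` is Lent.
    scope int* b = owner;
    *b = 2;                      // last read of b ends its lifetime

    // scope and pointer to const: `r` is Readonly.
    scope const(int)* r = owner;
    const result = *r;           // last read of r

    // Owner assigned to Owner: `owner` enters the Undefined state.
    int* next = owner;
    // *owner = 3;               // per the rules above, this would be an error
    free(next);                  // passed by value: ownership moves to free()
    return result;
}

unittest
{
    assert(borrowDemo() == 2);
}
```

    The function is ordinary D when the OB check is not applied, so the unittest exercises only the runtime behavior, not the analysis itself.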

    Pointer State Transitions

    A pointer changes its state when one of these operations is done to it:

    • storage is allocated for it (such as a local variable on the stack), which places the pointer in the Undefined state
    • initialization (treated as assignment)
    • assignment - the source and target pointers change state based on what states they are in and their types and storage classes
    • passed to an out function parameter (changes state after the function returns), treated the same as initialization
    • passed by ref to a function parameter, treated as an assignment to a Borrow or a Readonly depending on the storage class and type of the parameter
    • returned from a function
    • passed by value to a function parameter, which is treated as an assignment to that parameter
    • implicitly passed by ref as a closure variable to a nested function
    • the address of the pointer is taken, which is treated as assignment to whoever receives the address
    • the address of any part of the memory object graph is taken, which is treated as assignment to whoever receives that address
    • a pointer value is read from any part of the memory object graph, which is treated as assignment to whoever receives that pointer
    • merging of control flow reconciles the state of each variable based on the states they have from each edge

Limitations

Being a prototype, there are a lot of aspects not dealt with yet, and won't be until the prototype shows that it is a good design.

Bugs

Expect lots of bugs. Please report them to bugzilla and tag with the "ob" keyword. It's not necessary to report the other limitations that are enumerated here.

Class References and Associative Array References are not Tracked

They are presumed to be managed by the garbage collector.

Borrowing and Reading from Non-Owner Pointers

Owners are tracked for leaks, not other pointers. Borrowers are considered Owners if they are initialized from other than a pointer.

@live void uhoh()
{
    scope p = malloc();  // p is considered an Owner
    scope const pc = malloc(); // pc is not considered an Owner
} // dangling pointer pc is not detected on exit

It doesn't seem to make much sense to have such pointers as scope; perhaps this can be resolved by making such declarations an error.

Pointers Read/Written by Nested Functions

They're not tracked.

@live void ohno()
{
    auto p = malloc();

    void sneaky() { free(p); }

    sneaky();
    free(p);  // double free not detected
}

Exceptions

The analysis assumes no exceptions are thrown.

@live void leaky()
{
    auto p = malloc();
    pitcher();  // throws exception, p leaks
    free(p);
}

One solution is to use scope(exit):

@live void waterTight()
{
    auto p = malloc();
    scope(exit) free(p);
    pitcher();
}

or use RAII objects or call only nothrow functions.

Lazy Parameters

These are not considered.

Quadratic Behavior

The analysis exhibits quadratic behavior, so keeping the @live functions smallish will help.

Mixing Memory Pools

Conflation of different memory pools:

void* xmalloc(size_t);
void xfree(void*);

void* ymalloc(size_t);
void yfree(void*);

auto p = xmalloc(20);
yfree(p);  // should call xfree() instead

is not detected.

This can be mitigated by using type-specific pools:

U* umalloc();
void ufree(U*);

V* vmalloc();
void vfree(V*);

auto p = umalloc();
vfree(p);  // type mismatch

and perhaps disabling implicit conversions to void* in @live functions.

Variadic Function Arguments

Arguments to variadic functions (like printf) are considered to be consumed. While safe, this doesn't seem to be very practical, and will likely need revisiting.

  6. Added -preview=in to make the in storage class mean scope const

Although technically defined to be const scope , the in storage class has never been implemented as such until this preview switch. With the implementation now done, in should be the storage class of choice for purely input function parameters.

Without -preview=in, these two declarations are equivalent:

void fun(in int x);
void fun(const int x);

With -preview=in, these two declarations are equivalent:

void fun(in int x);
void fun(scope const int x);
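
As a small illustration of in used for a purely input parameter, the sketch below sums a slice. The function name sum and the unittest are hypothetical additions; the code is written to compile with or without -preview=in.

```d
// `in` marks a pure input: the callee may read `values` but not
// modify them (and, with -preview=in, may not let them escape).
int sum(in int[] values)
{
    int total = 0;
    foreach (v; values)
        total += v;
    return total;
}

unittest
{
    assert(sum([1, 2, 3]) == 6);
}
```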

  7. Validate printf and scanf (variants too) arguments against format specifiers

Follows the C99 specification 7.19.6.1 for printf and 7.19.6.2 for scanf.

For printf, it takes a generous, rather than strict, view of compatibility. For example, an unsigned value can be formatted with a signed specifier.

For scanf, it takes a strict view of compatibility.

Diagnosed incompatibilities are:

  1. incompatible sizes which will cause argument misalignment
  2. dereferencing arguments that are not pointers
  3. insufficient number of arguments
  4. struct arguments
  5. array and slice arguments
  6. non-pointer arguments to s specifier
  7. non-standard formats
  8. undefined behavior per C99

Per the C Standard, extra arguments are ignored.

No attempt is made to fix the arguments or the format string.

In order to use non-Standard printf/scanf formats, an easy workaround is:

printf("%k\n", value);  // error: non-Standard format k
const format = "%k\n";
printf(format.ptr, value);  // no error

Most of the errors detected are portability issues. For instance,

string s;
printf("%.*s\n", s.length, s.ptr);
printf("%d\n", s.sizeof);
ulong u;
scanf("%lld%*c\n", &u);

should be replaced with:

string s;
printf("%.*s\n", cast(int) s.length, s.ptr);
printf("%zd\n", s.sizeof);
ulong u;
scanf("%llu%*c\n", &u);

Printf-like and scanf-like functions are detected by prefixing them with pragma(printf) for printf-like functions or pragma(scanf) for scanf-like functions.

In addition to the pragma, the functions must conform to the following characteristics:

  1. be extern (C) or extern (C++)
  2. have the format parameter declared as const(char)*
  3. have the format parameter immediately precede the ... for non-v functions, or immediately precede the va_list parameter (which is the last parameter for "v" variants of printf and scanf)

which enables automatic detection of the format string argument and the argument list.

Checking of "v" format strings is not implemented yet.
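
A declaration meeting the three characteristics above might look like the following sketch. The function formatInto is a hypothetical wrapper around the C library's vsnprintf, written only to show where the pragma and the format parameter go.

```d
import core.stdc.stdarg : va_list, va_start, va_end;
import core.stdc.stdio : vsnprintf;

// extern (C), format declared as const(char)*, and the format
// parameter immediately precedes the `...`
pragma(printf)
extern (C) int formatInto(char* buf, size_t len, const(char)* format, ...)
{
    va_list args;
    va_start(args, format);
    const n = vsnprintf(buf, len, format, args);
    va_end(args);
    return n;
}

unittest
{
    char[32] buf;
    const n = formatInto(buf.ptr, buf.length, "%d-%s", 42, "ok".ptr);
    assert(n == 5);
    assert(buf[0 .. 5] == "42-ok");
}
```

With the pragma in place, calls to formatInto with a literal format string should be checked against their arguments in the same way as calls to printf.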

  8. Environment variable SOURCE_DATE_EPOCH is now supported

The environment variable SOURCE_DATE_EPOCH is used for reproducible builds. It is a UNIX timestamp (seconds since 1970-01-01 00:00:00), as described in the SOURCE_DATE_EPOCH specification. DMD now correctly recognizes it and will set the __DATE__, __TIME__, and __TIMESTAMP__ tokens accordingly.
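
A minimal sketch of code affected by this change is shown below. The function name buildStamp is hypothetical. If SOURCE_DATE_EPOCH is set in the compiler's environment, the embedded string should be derived from that timestamp rather than from the wall clock, so repeated builds embed the same text.

```d
// Returns the date and time tokens baked in at compile time.
string buildStamp()
{
    return __DATE__ ~ " " ~ __TIME__;
}

unittest
{
    // The exact text depends on when (or with which SOURCE_DATE_EPOCH)
    // the module was compiled, so only the shape is checked here.
    assert(buildStamp().length == __DATE__.length + 1 + __TIME__.length);
}
```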

