Beingessner: Rust's Unsafe Pointer Types Need An Overhaul

source link: https://lwn.net/Articles/888693/

[Posted March 21, 2022 by corbet]
Aria Beingessner points out a set of problems with Rust's conception of unsafe pointers and proposes some fixes in this highly detailed post.

Rust currently says this code is totally cool and fine:

    // Masking off a tag someone packed into a pointer:
    let mut addr = my_ptr as usize;
    addr = addr & !0x1; 
    let new_ptr = addr as *mut T;
    *new_ptr += 10;

This is some pretty bog-standard code for messing with tagged pointers, what’s wrong with that? [...]

For this to possibly work with Pointer Provenance and Alias Analysis, that stuff must pervasively infect all integers on the assumption that they might be pointers. This is a huge pain in the neck for people who are trying to actually formally define Rust’s memory model, and for people who are trying to build sanitizers for Rust that catch UB. And I assure you it’s just as much a headache for all the LLVM and C(++) people too.


Beingessner: Rust's Unsafe Pointer Types Need An Overhaul

Posted Mar 21, 2022 16:23 UTC (Mon) by smoogen (subscriber, #97) [Link]

Going by how I feel whenever I deal with pointers for too long, I think this is a good quote of the week (from the end of the blog):
```
I think about unsafe pointers in Rust a lot.

I wrote this all in one sitting and I really need dinner.

Head empty only pointers.
```

Beingessner: Rust's Unsafe Pointer Types Need An Overhaul

Posted Mar 21, 2022 19:01 UTC (Mon) by wtarreau (subscriber, #51152) [Link]

Seems like the language is starting to face the real world of computers, in which absolutely nothing distinguishes a pointer from an integer, both are used and stored interchangeably in registers, it's only how they are _used_ that makes us consider one is arbitrarily a pointer or an integer. On x86 a number of instructions even support indirect memory accesses involving [reg1+reg2*scale+disp]. One would think that reg1 is always the pointer, reg2 an index and disp a relative displacement, but it can be used in any form, including as indexes from local variables whose address is known (hence the pointer in disp), and reg1/reg2 can easily be exchanged when the scale is 1.

IMHO that's what "volatile" and "register" address in C (and not in the most elegant way, admittedly). "volatile" may be aliased by anything and will always be reloaded when read. "register" may never be aliased at all and the compiler will happily optimise their accesses.

Ideally we'd need a simplified mechanism in a language to indicate that certain pointers may alias only their own type, nothing at all or everything, and that they may be aliased by the same factors. With this, developers could choose their constructs without having to worry about what the compiler does behind the scenes (exactly like they do in assembly). Having to pretend that something is a register to prevent it from being aliased is annoying and limited since you cannot take its pointer to pass it anywhere. But if we could say "this never aliases anything" some constructs could be more easily optimized. Maybe some scopes would be useful (sort of aliasing barriers for certain variables).

Beingessner: Rust's Unsafe Pointer Types Need An Overhaul

Posted Mar 21, 2022 19:11 UTC (Mon) by acarno (subscriber, #123476) [Link]

Beingessner: Rust's Unsafe Pointer Types Need An Overhaul

Posted Mar 22, 2022 12:42 UTC (Tue) by wtarreau (subscriber, #51152) [Link]

Interesting, thanks for the link, I wasn't aware that Ada did that. It would be interesting to compare evolutions of all such languages and the classes of bugs or the complexities in developing certain classes of programs.

Beingessner: Rust's Unsafe Pointer Types Need An Overhaul

Posted Mar 21, 2022 19:31 UTC (Mon) by pm215 (subscriber, #98099) [Link]

The parts of the article that talk about CHERI, on the other hand, are dealing with the real world of a computer that absolutely *is* distinguishing a pointer (which has the untamperable metadata saying it's a valid pointer) from an integer (which doesn't, even if the 64 bits of 'value' are the same, and will fault if you attempt to use it as a pointer)...

Beingessner: Rust's Unsafe Pointer Types Need An Overhaul

Posted Mar 21, 2022 20:37 UTC (Mon) by NYKevin (subscriber, #129325) [Link]

> "volatile" may be aliased by anything and will always be reloaded when read.

Officially, volatile gets you exactly nothing in terms of the abstract machine semantics and the strict aliasing rule. In practice, if all of the aliasing variables are volatile, it's unlikely that most "reasonable" compilers will have issues, but it's still UB and so the entire code path is still considered poisoned. It's possible that the compiler assumes, for example, that the code path in which the aliased write happens is never executed while the other alias exists, and therefore makes incorrect simplifying assumptions about the overall flow of control.

The purpose of volatile is to control memory-mapped I/O and other hardware that does "magic" stuff to your memory/address space. It is not to enact an end-run-around the strict aliasing rule.

Beingessner: Rust's Unsafe Pointer Types Need An Overhaul

Posted Mar 21, 2022 21:05 UTC (Mon) by walters (subscriber, #7396) [Link]

See e.g. https://www.ralfj.de/blog/2020/12/14/provenance.html which talks about this in the context of C and LLVM.

Beingessner: Rust's Unsafe Pointer Types Need An Overhaul

Posted Mar 21, 2022 22:40 UTC (Mon) by tialaramex (subscriber, #21167) [Link]

The sentence "IMHO that's what..." trails off without ever reaching any conclusion, but perhaps that's appropriate since whatever you were going for here, no they don't.

It's true that you can't alias variables with storage class register because you can't take their address, although this was a relatively late addition (K&R C does not have this rule), but there are no rules about aliasing volatile at all, neither allowing nor forbidding.

The standard says that register is merely a hint [which today your compiler almost certainly ignores], that it be a good idea to put this variable in a register and serves no other purpose despite the restriction.

Meanwhile volatile serves only one clear purpose, you can use it to perform explicit stores and loads from a memory address, likely because you are doing MMIO. This is tricky to get right but since C doesn't provide any intrinsics for this purpose it's the only way that's even a little bit portable. All other uses of volatile are platform specific (in the good cases) or just voodoo / cargo cult C, sprinkled on by people who are hoping maybe the bug goes away if they write volatile in more places.

> But if we could say "this never aliases anything" some constructs could be more easily optimized.

Which is why (safe) Rust gets to go very fast. But attempting to retro-fit this to a language like C is impractical.

Beingessner: Rust's Unsafe Pointer Types Need An Overhaul

Posted Mar 21, 2022 22:53 UTC (Mon) by NYKevin (subscriber, #129325) [Link]

>> But if we could say "this never aliases anything" some constructs could be more easily optimized.

> Which is why (safe) Rust gets to go very fast. But attempting to retro-fit this to a language like C is impractical.

To be fair, C does have the restrict keyword. But that's more or less the opposite of register or Rust's borrow checker (i.e. instead of the type system preventing aliasing from happening and promising the programmer that it has done so, the programmer prevents aliasing from happening and promises the type system that they have done so), and this arguably makes it less useful in more complicated cases.

Beingessner: Rust's Unsafe Pointer Types Need An Overhaul

Posted Mar 22, 2022 9:21 UTC (Tue) by roc (subscriber, #30627) [Link]

'restrict' seems like a nightmare in practice. You'll need a whole new sanitizer to detect restrict violations, and you'll still have restrict violations in untested code paths. Violations of 'restrict' will be painful to track down.

Beingessner: Rust's Unsafe Pointer Types Need An Overhaul

Posted Mar 22, 2022 13:04 UTC (Tue) by immibis (subscriber, #105511) [Link]

Everything in C is like that. I assume that C programmers are accustomed to thinking very slowly and carefully about their programs.

Who am I kidding.

Beingessner: Rust's Unsafe Pointer Types Need An Overhaul

Posted Mar 22, 2022 12:53 UTC (Tue) by wtarreau (subscriber, #51152) [Link]

> To be fair, C does have the restrict keyword. But that's more or less the opposite of register or Rust's borrow checker

Kevin, could you please explain "restrict" to me? I started seeing it a few years ago in includes and man pages, and everything I've read about it was incomprehensible to me. I've always been interested in strong typing (and use const a lot). I'd like to know whether "restrict" would bring me anything at all or whether I shouldn't care.

Beingessner: Rust's Unsafe Pointer Types Need An Overhaul

Posted Mar 22, 2022 14:05 UTC (Tue) by farnz (subscriber, #17727) [Link]

"restrict" is a promise from the programmer to the compiler, which is why it's a pain to understand.

Using the following names for exposition:

    int foo[16];
    int * foo_ptr = &foo[0];
    int * restrict foo_restrict = foo;

With foo_ptr, the programmer makes no promises about aliasing. There can be a second pointer to any element of foo, and you can use foo[2] and *(foo_ptr + 2) interchangeably.

"restrict" makes a promise to the compiler about using overlapping names, and hence a promise that no aliasing is used for as long as the "restrict" pointer is alive. For as long as foo_restrict is alive, you promise not to access foo directly, or via foo_ptr, and you promise that if you use *(foo_restrict + 4), you have not accessed foo[4] any other way since foo_restrict was initialized, and that you will not access it any other way (e.g. via foo[4], or *(foo_ptr + 4)) until the lifetime of foo_restrict ends.

The usual concrete example is memcpy versus memmove; the inputs to memcpy are "restrict" pointers, because if you do memcpy(foo, bar, 16 * sizeof(foo[0]));, you promise the compiler that until memcpy returns, *(foo + 0) through *(foo + 15) cannot be accessed via *(bar + offset). memmove, on the other hand, permits that overlap, so its input pointers cannot be marked restrict.
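As a minimal sketch of the promise described above (the function names `add_into` and `add_into_may_alias` are made up for illustration and are not from any library), the same loop can be written with and without `restrict`:

```c
#include <stddef.h>

/* Hypothetical helper: add src[i] into dst[i] for n elements.
 * The restrict qualifiers are the caller's promise that the two
 * arrays do not overlap for the duration of the call. */
void add_into(int *restrict dst, const int *restrict src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] += src[i];
}

/* Same loop without the promise: dst and src may overlap, so the
 * compiler must assume a store to dst[i] can change a later src[j]. */
void add_into_may_alias(int *dst, const int *src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] += src[i];
}
```

Calling `add_into_may_alias(c + 1, c, 4)` on a single array is well defined, like memmove-style overlap. Making the same call through `add_into` would break the promise and is undefined behaviour, and nothing in the type system will catch it.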

Beingessner: Rust's Unsafe Pointer Types Need An Overhaul

Posted Mar 22, 2022 14:55 UTC (Tue) by wtarreau (subscriber, #51152) [Link]

Many thanks. That's exactly the explanation I was missing, and your memcpy() vs memmove() example is spot on! I think I'm seeing a few cases where that could help, especially when some asm() statements are used and the compiler cannot figure out that some values cannot have changed there. At least now I know what to look for and how to experiment.
Thank you!

Beingessner: Rust's Unsafe Pointer Types Need An Overhaul

Posted Mar 23, 2022 4:11 UTC (Wed) by marcH (subscriber, #57642) [Link]

I don't see much in this thread that wasn't already on https://en.wikipedia.org/wiki/Restrict (and probably elsewhere) but if you do then please go and edit that page.

Beingessner: Rust's Unsafe Pointer Types Need An Overhaul

Posted Mar 23, 2022 10:34 UTC (Wed) by wtarreau (subscriber, #51152) [Link]

It just turns out that wikipedia is not exactly the first place that comes to my mind when searching for the definition of a language keyword :-) But indeed it seems clear enough there as well.

What I previously found was this: https://gcc.gnu.org/onlinedocs/gcc-11.2.0/gcc/Restricted-... but it wasn't very clear to me. Of course there's nothing wrong in it, it's just that when the use cases are unclear to you they can remain unclear after reading the doc.

Beingessner: Rust's Unsafe Pointer Types Need An Overhaul

Posted Mar 21, 2022 23:29 UTC (Mon) by jlombera (guest, #155698) [Link]

> Meanwhile volatile serves only one clear purpose, you can use it to perform explicit stores and loads from a memory address, likely because you are doing MMIO. This is tricky to get right but since C doesn't provide any intrinsics for this purpose it's the only way that's even a little bit portable. All other uses of volatile are platform specific (in the good cases) or just voodoo / cargo cult C, sprinkled on by people who are hoping maybe the bug goes away if they write volatile in more places.

When accessing/modifying shared memory between processes/threads, volatile is sometimes the right thing to do to ensure stores/loads to/from memory. Thus it's not limited to MMIO.

Beingessner: Rust's Unsafe Pointer Types Need An Overhaul

Posted Mar 22, 2022 1:26 UTC (Tue) by tialaramex (subscriber, #21167) [Link]

The Right Thing™ in your scenario is atomics. Asking for volatile and expecting atomics is mostly not dangerous on Windows, using Microsoft's C++ compiler.

On other platforms, with other compilers, you get what you asked for, not what you expected. Maybe you get lucky and maybe you don't. Maybe if you get unlucky you can write "volatile" in a few extra places and now it works. Voodoo.

Beingessner: Rust's Unsafe Pointer Types Need An Overhaul

Posted Mar 22, 2022 1:47 UTC (Tue) by jlombera (guest, #155698) [Link]

I don't see how this is atomics. Sure, you might want/need atomics/synchronization in addition to volatile in some cases, but the use of volatile in this case is to ask the compiler not to optimize access to certain memory and to always go to memory.

Beingessner: Rust's Unsafe Pointer Types Need An Overhaul

Posted Mar 22, 2022 3:34 UTC (Tue) by mrugiero (guest, #153040) [Link]

You probably still need atomics because "go to memory" in pretty much all non-microcontroller hardware really means "go to cache", and if you're sharing with other threads/processes you really want to synchronize the data, not just make sure it gets out of the register. Your C program typically doesn't know about nor can control how cache gets flushed and sync'd. You may not notice the problem in x86 because MESI solves it behind the scenes, but that's not universal.
For MMIO it works because the OS can mark a page as cache disabled so it goes straight to "memory" (which really is a mapped device).

Beingessner: Rust's Unsafe Pointer Types Need An Overhaul

Posted Mar 22, 2022 4:16 UTC (Tue) by jlombera (guest, #155698) [Link]

> Your C program typically doesn't know about nor can control how cache gets flushed and sync'd.

The processor knows, though. All the compiler needs to do is not to register-optimize and emit memory access instructions instead, the processor takes care of maintaining cache coherence.

Beingessner: Rust's Unsafe Pointer Types Need An Overhaul

Posted Mar 22, 2022 4:45 UTC (Tue) by NYKevin (subscriber, #129325) [Link]

To the best of my understanding, this is somewhat true on x86, and not at all true on basically any (modern) architecture other than x86 (e.g. ARM).

Beingessner: Rust's Unsafe Pointer Types Need An Overhaul

Posted Mar 22, 2022 5:02 UTC (Tue) by jlombera (guest, #155698) [Link]

I can assure you "volatile" works as described on at least X64, IA64, SPARC, IBM Z (don't remember the actual name of the arch) using different compilers (GCC, Clang, MSVC, ICC, ACC, Solaris Studio, XLC).

Beingessner: Rust's Unsafe Pointer Types Need An Overhaul

Posted Mar 22, 2022 11:29 UTC (Tue) by tialaramex (subscriber, #21167) [Link]

"Volatile" is a C qualifier but the CPU doesn't run C it runs machine code. And so your volatile storage qualifier is long gone by the time the CPU is running your program.

You can take a look at a toy example with Godbolt and see for yourself what the compiler actually tells your CPU to do. On x86 (and x86-64) you get Acquire/Release semantics (but not full consistency) "for free". (In fact everybody is paying for these semantics all the time on this platform, so it's only "free" the same way the ice is "free" with a $5 coke in a restaurant.) But on other platforms, if you don't see the CPU being told to do this work, it's not doing the work. Maybe you get away with it, and maybe you don't. You are gambling every time.

Beingessner: Rust's Unsafe Pointer Types Need An Overhaul

Posted Mar 22, 2022 16:14 UTC (Tue) by jlombera (guest, #155698) [Link]

>"Volatile" is a C qualifier but the CPU doesn't run C it runs machine code. And so your volatile storage qualifier is long gone by the time the CPU is running your program.

Again, "volatile" is to ask the *compiler* not to optimize access to memory in the generated binary code.

>You can take a look at a toy example with Godbolt, and see for yourself what the compiler actually tells your CPU to do

Sure, feel free to play with this (very contrived) example in godbolt.org (sorry for the formatting, I couldn't find a way to make this work as a plain text comment):

```
void f(volatile int *x_p) {
    while (!*x_p)
        ;
}

void g(int *x_p) {
    while (!*x_p)
        ;
}
```

Feel free to play with different compilers, optimization levels, even different archs. You'll see that in every case, in the loop in f() *x_p is read from memory in every iteration, whereas for g(), different kind of optimizations are performed.

Beingessner: Rust's Unsafe Pointer Types Need An Overhaul

Posted Mar 22, 2022 17:13 UTC (Tue) by NYKevin (subscriber, #129325) [Link]

> You'll see that in every case, in the loop in f() *x_p is read from memory in every iteration, whereas for g(), different kind of optimizations are performed.

Nobody is disputing that. We are telling you that the compiler will fail to emit acquire/release memory barrier instructions on non-x86 platforms, and without those, you get no cross-thread guarantees.
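For comparison with the volatile loop in f(), here is a minimal sketch of the same kind of spin-wait written with C11 atomics, so that the compiler is asked for acquire/release ordering explicitly. The names `payload`, `ready`, `publish` and `consume` are made up for illustration.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical shared state: a payload plus a flag that publishes it. */
static int payload;        /* plain, non-atomic data */
static atomic_bool ready;  /* static storage: starts out false */

/* Producer: write the data, then publish with a release store, so the
 * payload write cannot be reordered after the flag becomes visible. */
void publish(int value)
{
    payload = value;
    atomic_store_explicit(&ready, true, memory_order_release);
}

/* Consumer: spin until the flag is set. The acquire load pairs with the
 * release store, so the payload read below sees the published value. */
int consume(void)
{
    while (!atomic_load_explicit(&ready, memory_order_acquire))
        ;
    return payload;
}
```

In real use publish() and consume() would run on different threads. On x86 the generated loads and stores typically look the same as the volatile version, while on weakly ordered architectures the compiler has to emit whatever barrier or ordered instructions the target requires.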

Beingessner: Rust's Unsafe Pointer Types Need An Overhaul

Posted Mar 22, 2022 18:27 UTC (Tue) by khim (subscriber, #9252) [Link]

> Again, "volatile" is to ask the *compiler* not to optimize access to memory in the generated binary code.

And that's not enough. Not even on x86. You know that, right?

The Intel 8086 included the lock prefix from the very beginning! And you cannot force the compiler to use it with volatile. End of story.

Yes, with C89 you had no choice but to use assembler with some volatile sprinkled here and there. C11 offers atomics, which provide much more concise and usable semantics.

Don't use volatile except in kernel, please. It's not needed and harmful.

Beingessner: Rust's Unsafe Pointer Types Need An Overhaul

Posted Mar 22, 2022 20:13 UTC (Tue) by jlombera (guest, #155698) [Link]

Sorry, I think we just keep talking about different things. You keep bringing atomics, reordering and serialization issues to the discussion. I already conceded those might be required in addition to (or instead of) "volatile" depending on what you are trying to achieve (e.g. serialization), but those are not the use case of "volatile".

Beingessner: Rust's Unsafe Pointer Types Need An Overhaul

Posted Mar 23, 2022 0:20 UTC (Wed) by excors (subscriber, #95769) [Link]

Could you give specific examples of use cases with shared memory between threads where volatile is sufficient?

The main thing I can think of is e.g. performance counters, where one thread updates them and another thread periodically reads them, and where you don't care if it reads slightly stale values (but not worse than a few usecs) or reads each counter in an unpredictable order. In that case, you do need something like volatile (to ensure the first thread doesn't hold the counter in a register for many usecs) but you don't need any further synchronisation guarantees. You also need atomic reads/writes, which I don't think volatile guarantees, but in practice it's probably okay if it's an aligned word-sized value.

Probably you could also do a simple form of mailboxes, where the producer thread does "while (m != 0) {}; m = 42;" and the consumer thread does "while (m == 0) {}; do_work(m); m = 0;", where (I think? but not certain) there are hopefully enough implicit dependencies that it will always behave as expected on any CPU. (But that won't work if you want to share more than a single word, because the mailbox message won't be synchronised with any other memory access.)

Those seem very niche cases, though. And you can easily do them with C++/C11 atomics using memory_order_relaxed (which adds no synchronisation barriers but does guarantee atomicity, like a more well-behaved volatile). I'm not aware of any drawbacks of memory_order_relaxed over volatile, and the benefit is it can be combined with acquire/release accesses (to the same variable or to others) for cases where synchronisation is important (which is nearly all cases involving shared memory).
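The performance-counter case can be sketched with relaxed atomics roughly as follows. This is only an illustration under assumed names (`packets_seen`, `count_packet`, `read_counter`), not code from any real project.

```c
#include <stdatomic.h>

/* Hypothetical performance counter shared between threads. */
static atomic_ulong packets_seen;

/* Writer side: atomic increment with no ordering against other memory. */
void count_packet(void)
{
    atomic_fetch_add_explicit(&packets_seen, 1, memory_order_relaxed);
}

/* Reader side: may observe a slightly stale value, never a torn one. */
unsigned long read_counter(void)
{
    return atomic_load_explicit(&packets_seen, memory_order_relaxed);
}
```

The read-modify-write in count_packet() also stays correct with several writer threads. With a single writer, a relaxed load followed by a relaxed store would be enough.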

Beingessner: Rust's Unsafe Pointer Types Need An Overhaul

Posted Mar 23, 2022 4:17 UTC (Wed) by marcH (subscriber, #57642) [Link]

> so it's only "free" the same way the ice is "free" with a $5 coke in a restaurant

BTW https://queue.acm.org/detail.cfm?id=3212479

> The cache coherency protocol is one of the hardest parts of a modern CPU to make both fast and correct. Most of the complexity involved comes from supporting a language in which data is expected to be both shared and mutable as a matter of course.

Beingessner: Rust's Unsafe Pointer Types Need An Overhaul

Posted Mar 22, 2022 9:05 UTC (Tue) by tialaramex (subscriber, #21167) [Link]

> the processor takes care of maintaining cache coherence

The CPU doesn't promise sequential consistency, because doing so would make it (much) slower. So now your program doesn't have sequential consistency. This is inherently a very difficult environment in which to write programs at all, but C and C++ don't bother you with that trouble because both languages have the same rule about sequential consistency: If your program doesn't exhibit sequential consistency it instead has Undefined Behaviour and they wash their hands of you entirely.

Again, you can write "volatile" on some more variables and maybe you get lucky and on the CPU you're working with the extra spills cause a cache flush, or forces an extra wait cycle somewhere and it happens to mask the bug. And then maybe somebody buys a CPU with more L1 cache, or a different cache policy and now the mysterious bug is back. You are using the wrong tool for the job.

Beingessner: Rust's Unsafe Pointer Types Need An Overhaul

Posted Mar 22, 2022 15:45 UTC (Tue) by jlombera (guest, #155698) [Link]

Sure, you might need to resort to explicit memory fencing when you want to ensure sequential consistency, but this is not what volatile is about.

Beingessner: Rust's Unsafe Pointer Types Need An Overhaul

Posted Mar 22, 2022 22:38 UTC (Tue) by edeloget (subscriber, #88392) [Link]

> > Your C program typically doesn't know about nor can control how cache gets flushed and sync'd.

> The processor knows, though. All the compiler needs to do is not to register-optimize and emit
> memory access instructions instead, the processor takes care of maintaining cache coherence.

That would not work.

First of all, the obvious: it would fail on the multi-processor (physically separated) case because one thing a processor knows is not always known by the other guy. That would require N-to-N communication between the processors - and it would be a nightmare on systems where you have many nodes (up to 6x64 cores) that can share the same physical memory, such as the Chinese Sunway supercomputer.

And then it would be reeeeeeeeaaaaaaaaaaly slow. The reason why cache works this way is that the processor doesn't even try to find out if the underlying memory has changed (on a load) or if it should change (on a store). Because it does not check anything, it's fast. If you start to factor in multiple checks then you'll hit a performance wall quite soon.

That's exactly why we do this only when performing an atomic operation: we are willing to pay the performance cost in exchange for the information. This is not something we want to do on every load or store. And that's exactly why processors don't do it unless we explicitly tell them to do it. The application (either the OS or a user space program) knows when it shall make an atomic load or store. The processor cannot know it in advance and unless you make an explicit pledge the compiler cannot know either.

Beingessner: Rust's Unsafe Pointer Types Need An Overhaul

Posted Mar 22, 2022 8:47 UTC (Tue) by metan (subscriber, #74107) [Link]

Actually volatile must be used for any global variable modified from a signal handler. The most common pattern is to have volatile sig_atomic_t global flag set by the signal handler. I bet this is the most common use for volatile in userspace.
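A minimal sketch of that pattern, using only standard `<signal.h>` calls; the helper names `install_flag_handler` and `signal_seen` are made up for illustration:

```c
#include <signal.h>

/* Flag written by the handler and read by the main program. */
static volatile sig_atomic_t got_signal = 0;

/* Handler: does nothing except set the flag. */
static void on_signal(int signo)
{
    (void)signo;
    got_signal = 1;
}

/* Hypothetical helper: install the handler for signo; 0 on success. */
int install_flag_handler(int signo)
{
    return signal(signo, on_signal) == SIG_ERR ? -1 : 0;
}

/* Returns nonzero once the handler has run. */
int signal_seen(void)
{
    return got_signal != 0;
}
```

A main loop would typically poll `signal_seen()` between units of work. Without volatile, the compiler would be free to read the flag once and keep the value in a register.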

Beingessner: Rust's Unsafe Pointer Types Need An Overhaul

Posted Mar 22, 2022 13:25 UTC (Tue) by tialaramex (subscriber, #21167) [Link]

Fair. AIUI POSIX promises only that this will work for volatile sig_atomic_t and there's no promise this will work for variables with another type even though in practice int will work on real hardware.

This all pre-dates a formal memory model, but it is promised in POSIX and so you are indeed welcome to rely on it on a POSIX system. Like making errno work the way the standard says it should, on modern systems this involves a considerable amount of extra lifting for your compiler and C library, but that work is done and so yes you might as well rely on it.

There's a lot of low-level code out there actually banging on MMIO far from any POSIX system and MMIO is, in fact by my understanding where volatile starts out (first C compilers are too naive to eliminate duplicate stores/ loads, as the optimiser improves it elides enough apparently useless loads and stores that now the device driver doesn't work, volatile qualifier tells the compiler not to optimise the loads and stores and now the device drivers work properly again), so if I was a betting man I might take the other side of your bet.

Beingessner: Rust's Unsafe Pointer Types Need An Overhaul

Posted Mar 22, 2022 12:51 UTC (Tue) by wtarreau (subscriber, #51152) [Link]

> The standard says that register is merely a hint [which today your compiler almost certainly ignores], that it be a good idea to put this variable in a register and serves no other purpose despite the restriction.

I know but there are few cases where it's still used. Trying to get the pointer from a variable declared as register will be instantly refused (which is great). Declaring a global variable with register (you're forced to indicate what register) will allow the compiler to optimize some operations because it knows the variable cannot change.
But I agree these are almost exceptions to the general rule that the compiler doesn't care much anymore.

> Meanwhile volatile serves only one clear purpose, you can use it to perform explicit stores and loads from a memory address, likely because you are doing MMIO.

The primary usage is for signals, even before MMIO. Userland code needs to use volatile and is certainly not fiddling with MMIO in general.

Beingessner: Rust's Unsafe Pointer Types Need An Overhaul

Posted Mar 22, 2022 18:00 UTC (Tue) by njs (guest, #40338) [Link]

> Seems like the language is starting to face the real world of computers, in which absolutely nothing distinguishes a pointer from an integer, both are used and stored interchangeably in registers, it's only how they are _used_ that makes us consider one is arbitrarily a pointer or an integer.

I think it's exactly the opposite: at the language level, C/C++/unsafe Rust all say that you're allowed to convert back and forth between integers and pointers, because the language designers had the same intuition you did – that's how the machine actually works, so it'll be fine.

But there are two problems:

- that's not actually how all machines work (like this exotic CHERI thing, or old-school segmented architectures)

- more importantly, even on common ISAs like x86 and ARM, it turns out that if you want a decent compiler, your front end needs to target a higher-level virtual machine where pointers *aren't* just integers. Of course they'll eventually get lowered to integers, but if you do that too early then it destroys your ability to do optimizations. So the status quo right now is that all compilers *actually* treat integers and pointers as fundamentally different, and they do it using a bunch of ad hoc heuristics that were never written down and the compiler engineers have been gradually realizing are actually incoherent and busted, even if they *mostly* work in practice.

So the problem is: how do we change the language and the compiler so that the code is efficient *and* the compiler rigorously implements the language semantics *and* the language semantics are understandable without a phd. And this means the language semantics need to treat pointers and integers as fundamentally different, while still giving enough tools to do all the weird pointer tricks you need in real systems.

Beingessner: Rust's Unsafe Pointer Types Need An Overhaul

Posted Mar 22, 2022 2:45 UTC (Tue) by atnot (subscriber, #124910) [Link]

I think the fact that it is even feasible to discuss changes like this is testament to the benefits a somewhat more restrictive but also more rigorously defined system can have. It's very hard to imagine, for example, the likes of C pointers being reworked to give them better formal properties.

Beingessner: Rust's Unsafe Pointer Types Need An Overhaul

Posted Mar 22, 2022 5:04 UTC (Tue) by jhoblitt (subscriber, #77733) [Link]

It is also relevant that there are not billions of lines of Rust code which have been "working" for decades. Languages tend to become ossified as the user base grows.

Beingessner: Rust's Unsafe Pointer Types Need An Overhaul

Posted Mar 22, 2022 13:01 UTC (Tue) by tialaramex (subscriber, #21167) [Link]

Avoiding this problem as much as possible is the goal of Rust's Editions system, even for Unsafe, and that's called out in this article. Rust has seen the last of big sweeping changes like garbage collection, but Editions enable it to be more agile than languages like C or C++ without the trauma associated with something like Python 3.x

Importantly Editions respect time's arrow. You write new code the new way, and your old code is unchanged. 10M lines or 10B lines doesn't matter, you aren't required to touch any of it. But Editions change how people think about what's possible and that means both that more adventurous changes are considered (knowing Editions might make the change practicable), and so often changes which wouldn't have been conceived at all without editions, ultimately turn out not to be incompatible and so the benefits accrue to everybody, not just on new editions.

Beingessner: Rust's Unsafe Pointer Types Need An Overhaul

Posted Mar 22, 2022 16:40 UTC (Tue) by jhoblitt (subscriber, #77733) [Link]

Supporting multiple different language versions is not a Rust innovation. I don't know what language can claim the original invention but perl 5 had `use 5.024_001;` in the 90s.

Beingessner: Rust's Unsafe Pointer Types Need An Overhaul

Posted Mar 22, 2022 19:01 UTC (Tue) by khim (subscriber, #9252) [Link]

Technically nothing in Rust is a Rust innovation, and most of the ideas it uses were already old when it was conceived. Heck, it was presented to the world with the words "technology from the past come to save the future from itself"!

But most compiled “mainstream” languages are based on ideas so ancient that even these pretty old and well-tested ideas look like some kind of radical revelation to C/C++/ObjectPascal/etc. developers (Swift took some of these ideas, though).

Beingessner: Rust's Unsafe Pointer Types Need An Overhaul

Posted Mar 22, 2022 7:16 UTC (Tue) by wahern (subscriber, #37304) [Link]

> It's very hard to imagine, for example, the likes of C pointers being reworked to give them better formal properties.

The very topic of discussion is literally evidence to the contrary. C already has intptr_t precisely to avoid the very problem Rust now has. (Relatedly, intptr_t is *optional* in Standard C, understanding that there might be some systems where data pointer to integer conversions aren't supportable; and Standard C doesn't support function pointer to integer conversions at all.) Moreover, FreeBSD has already been ported to CHERI, so the claim that "real-world" C code is too riddled with non-standard pointer to integer conversions isn't very persuasive, particularly relative to Rust.

IIRC, porting the entire POSIX API to CHERI required only two significant changes: dlsym and signals. Both are areas where POSIX (much like Rust) required assumptions that Standard C doesn't. There was some ugliness related to memcpy, but Rust takes memcpy abuse to an entirely different level.

While C is far from the ideal language for a memory capability system, it certainly was more prepared for it than Rust. It's not surprising, though, as Rust was largely designed to work around the lack of ABI- or ISA-enforced memory protections, whereas that possibility has always been at the back of the minds of C committee members. If you assume those things aren't on the horizon (and it's still not a given that they will see commercial success, let alone ubiquity), playing fast-and-loose with pointer types under the hood is an easy simplification. If common platforms implemented CHERI or something like it, Rust probably would have never been conceived in the first place.

Beingessner: Rust's Unsafe Pointer Types Need An Overhaul

Posted Mar 22, 2022 17:24 UTC (Tue) by tialaramex (subscriber, #21167) [Link]

The point which you claim has been contradicted is that it's hard to imagine - "The likes of C pointers being reworked to give them better formal properties"

But you don't present any evidence of such a reworking, only that people have managed to run some C software on CHERI which of course you'd expect since CHERI has been under development for some time explicitly to run C software. Here's an excerpt from Cambridge's description of CHERI, "The CHERI memory-protection features allow historically memory-unsafe programming languages such as C and C++ to be adapted to provide strong, compatible, and efficient protection against many currently widely exploited vulnerabilities". Nothing in there about formal properties, no proposals to the ISO committee, instead they are being pragmatic, what choice do they have after decades of C programmers' resolute disinterest?

> If common platforms implemented CHERI or something like it, Rust probably would have never been conceived in the first place.

If "something like it" managed to make the vague semantics of C's logical machine better match the reality of a modern computer by adjusting the computer instead, perhaps you'd even be right. Maybe if this had happened in the 1990s, the elevator Graydon was annoyed by in 2005 would have actually worked.

CHERI is a long way from this fantasy; many grave C problems are orthogonal to CHERI but are completely solved in (safe) Rust. Which doesn't make CHERI a bad idea, it just highlights that Graydon's problem wasn't something along the lines of "there's this one thing about C I don't like, so I guess I will write an entirely new programming language" but rather that systematically none of the useful lessons of past decades of programming language theory had been adopted into systems programming languages people actually use.

Beingessner: Rust's Unsafe Pointer Types Need An Overhaul

Posted Mar 22, 2022 19:06 UTC (Tue) by excors (subscriber, #95769) [Link]

> Nothing in there about formal properties, no proposals to the ISO committee, instead they are being pragmatic, what choice do they have after decades of C programmers resolute disinterest.

They appear to have plenty about formal properties at https://www.cl.cam.ac.uk/research/security/ctsrd/cheri/ch... , including papers like "Exploring C Semantics and Pointer Provenance" which adds CHERI-like semantics to a subset of C, based on (if I understand it correctly) the Cerberus tool which carefully transforms C programs into a 'Core' language that makes the memory model explicit and has well-defined operational semantics. The Core code can then be analysed for pointer provenance violations etc. And https://www.cl.cam.ac.uk/~pes20/cerberus/ lists many proposals submitted to ISO by that research group. (Of course they're still a long way from a complete semantics for C, despite working on this for well over a decade with many PhDs, so it's far from a solved (or even solvable) problem in general.)

(Hmm, actually Cerberus seems to be somewhat more relaxed than CHERI, because it doesn't require you to use intptr_t. See https://cerberus.cl.cam.ac.uk/?short/2eaa24 , select "Model > Integer provenance (PVI)", "Search > Random", and it complains of undefined behaviour when dereferencing the pointer, because it gets understandably confused about provenance. But comment out lines 9-10 (which are a no-op in regular C) and it works okay, because it can still track provenance through the cast to long and back. If you step "Forward" enough times then you can see the allocation number associated with each pointer.)

Beingessner: Rust's Unsafe Pointer Types Need An Overhaul

Posted Mar 22, 2022 19:30 UTC (Tue) by NYKevin (subscriber, #129325) [Link]

C arguably solves this problem by ignoring it. The standard already says it's illegal to even evaluate an invalid not-null not-one-past-the-end pointer, regardless of whether you dereference it, and integer types are similarly allowed to have trap representations, so you just declare "all the weird CHERI stuff" to be UB or implementation details that "portable" code can't rely on, and then you've "ported" C to CHERI by making CHERI look like just another architecture. The most you get out of this is "If your program otherwise would have performed certain very specific kinds of UB, on CHERI it will *probably* trap at runtime instead." "Probably" because who knows what the optimizer will do. Contrast this with the much stronger guarantees you get out of safe Rust, where it will fail at compile time, every time.

Beingessner: Rust's Unsafe Pointer Types Need An Overhaul

Posted Mar 22, 2022 13:37 UTC (Tue) by uecker (guest, #157556) [Link]

There is a proposed technical specification which formulates this for C and comes with precise formal semantics.

http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2676.pdf

Beingessner: Rust's Unsafe Pointer Types Need An Overhaul

Posted Mar 22, 2022 19:05 UTC (Tue) by tialaramex (subscriber, #21167) [Link]

That paper does indeed acknowledge the necessity of provenance in C (and C++). But what they're doing is patching the hole in these standards, a hole made about twenty years ago, whereas the Rust article is about something a little different: that hole never existed in Rust. In _unsafe_ Rust the nomicon already tells you that if you violate the provenance rules, that's undefined behaviour. Unlike safe Rust, but just like C, unsafe Rust doesn't _prevent_ you from picking some random bytes and insisting they're definitely a pointer to something. However, you now have incoherent nonsense, so don't do that (on CHERI you fault immediately in unsafe Rust or C if you do this).

Aria proposes that usize, Rust's built-in unsigned integer type that's typically 64-bit on a modern computer, should formally be the same size as an address, _not_ the same size as a pointer as it is defined today. As a side effect, something desirable (to me at least, but I believe others too) falls out: while we're acknowledging that a pointer isn't just an integer with intent, we abolish the (ab)use of `as` casts to turn one into the other. Instead the programmer is expected to write what they meant, e.g. ptr.with_addr(address) gets you a pointer (maybe 129 bits) made from an address (maybe 64 bits) plus your promise that what you are doing is OK. Did you lie? Same rules as before: now your program is meaningless.
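To make the `with_addr` idea concrete, here is a minimal sketch of the tagged-pointer case from the article, written without any integer-to-pointer `as` cast. The free function `with_addr` below is a hypothetical stand-in for the proposed method, and `demo` is an illustrative name; neither comes from the article. The helper is built from wrapping byte arithmetic on the original pointer, so the result is always derived from a pointer the program already holds.

```rust
/// Hypothetical stand-in for the proposed `ptr.with_addr(address)`:
/// returns a pointer derived from `ptr` (so it keeps whatever the
/// compiler knows about where `ptr` came from) but located at `addr`.
fn with_addr<T>(ptr: *mut T, addr: usize) -> *mut T {
    // Pointer-to-integer only: we read the address, nothing more.
    let current = ptr as usize;
    // Signed byte distance from the current address to the wanted one.
    let delta = addr.wrapping_sub(current) as isize;
    // Move the original pointer by that many bytes; no integer is
    // ever cast back into a pointer.
    ptr.cast::<u8>().wrapping_offset(delta).cast::<T>()
}

/// Packs a one-bit tag into a pointer, masks it off again, and then
/// adds 10 through the cleaned pointer.
fn demo() -> u32 {
    let mut value: u32 = 42;
    let clean: *mut u32 = &mut value;

    // The low bit is free because `u32` is 4-byte aligned.
    let tagged = with_addr(clean, clean as usize | 0x1);

    // Mask the tag off, deriving the result from `tagged` rather than
    // conjuring a pointer out of a bare integer.
    let untagged = with_addr(tagged, tagged as usize & !0x1);

    // SAFETY: `untagged` has the same address as `clean` and is derived
    // from it, and `value` is still alive.
    unsafe { *untagged += 10 };
    value
}
```

Calling `demo()` yields 52. The tagged pointer is never dereferenced, only the one with the tag masked off again.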

The C proposal can't go around adding methods to pointers, not least because C doesn't have methods and if it did it wouldn't have them on pointers; the TS just changes the formal semantics of the language to acknowledge the practical need for provenance. Existing correct C will remain correct, the TS just says why it's correct (or rather, why other seemingly reasonable C that doesn't work is not correct).

Also, I expect that the committee will nod wisely and say that they don't have time to take this up right now, but please bring it back again next time, which is roughly what it has been doing since at least 2016. If your plan is to wait for them to fix C rather than learn a new language, don't figure on that happening any time soon.

Beingessner: Rust's Unsafe Pointer Types Need An Overhaul

Posted Mar 23, 2022 13:26 UTC (Wed) by uecker (guest, #157556) [Link]

I do not understand why you say the hole does not exist in unsafe Rust. The article seems to indicate there are still problems in unsafe Rust of the same nature as we have addressed in the TS for C.

C could of course as easily add a new way to combine a pointer with an address using some other syntax than a method. The problem with this is that it would break existing code (which for Rust somehow seems OK). The other problem is that in most cases where you need to convert an integer to a pointer you do not have the pointer available, so you simply can not use ptr.with_addr(address). If you had the pointer you could also just do ptr + offset, which is the same as ptr + (addr - base_addr), so I do not see how ptr.with_addr(address) solves the same problem.
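The equivalence claimed here, that `ptr + offset` reaches the same place as `ptr + (addr - base_addr)` whenever the base pointer is at hand, can be checked with a small sketch. It is written in Rust to match the article's own example; the array contents and the function name `demo_offsets` are purely illustrative assumptions.

```rust
/// Reaches the last element of a small array two ways: by a plain
/// element offset from the base pointer, and by rebuilding the pointer
/// from a bare target address as `ptr + (addr - base_addr)`.
fn demo_offsets() -> (u16, u16, bool) {
    let data: [u16; 4] = [10, 20, 30, 40];
    let base: *const u16 = data.as_ptr();

    // Route 1: ordinary pointer arithmetic, `ptr + offset` (in elements).
    let by_offset = base.wrapping_add(3);

    // Route 2: pretend only the integer address of the target survived,
    // then rebuild the pointer from `base` plus the byte distance.
    let base_addr = base as usize;
    let target_addr = base_addr + 3 * std::mem::size_of::<u16>();
    let by_addr = base
        .cast::<u8>()
        .wrapping_add(target_addr - base_addr)
        .cast::<u16>();

    // SAFETY: both pointers are derived from `data`, stay inside it,
    // and are suitably aligned for `u16`.
    let (a, b) = unsafe { (*by_offset, *by_addr) };
    (a, b, by_offset == by_addr)
}
```

Both routes read 40 and the two pointers compare equal. The sketch only covers the case where the base pointer is available; it says nothing about the harder case raised in the comment, where only the integer is left.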

C has a lot of issues, but it also has many advantages: Widely supported, long-term stability, many existing tools, fast compilation, low complexity, emerging formal semantics, etc. And yes, it will take a long time fixing its many issues. It will also take a long time before Rust is ready (the long compile times and lack of stability rule it out for me at this time) and it is already too complex for my taste.

Beingessner: Rust's Unsafe Pointer Types Need An Overhaul

Posted Mar 23, 2022 15:10 UTC (Wed) by farnz (subscriber, #17727) [Link]

There are two important differences between Rust and C that make breaking changes in handling raw pointers more palatable on the Rust side:

  1. Use of raw pointers in Rust is already gated behind unsafe, and Rust style says to keep the use of unsafe to a minimum. This allows you to use data in papers like this ACM paper on Unsafe Rust to judge the maximum blast radius of changes - the data supports the idea that at most 25% of Rust code could be affected (as around 75% of published crates contain no Unsafe Rust), and that 3% of Rust code would be a good estimate of the amount of code affected by changes to raw pointers (about 10% of Unsafe Rust deals with raw pointers). In contrast, because of the nature of C, it's harder to tell how much C code is likely to be affected by any given change to pointer semantics.
  2. Rust's module system allows you to have maintained legacy code in an older edition and modern Rust in the same binary - I can link Rust 2015 code with Rust 2027 code, and the compiler will give the Rust 2015 code the semantics that went with Rust 2015, while giving me modern semantics for Rust 2027 code. #include in C means that I can't clearly delineate code that has modern semantics from code that doesn't, because some code has to have the "right" semantics whether it's #included into a compilation unit that has C99 semantics or whether it's #included into a compilation unit that has C27 semantics.

Of these, I think the former is the hard one to overcome; fixing the latter is something that can be done by a sufficiently smart C standard committee and compiler implementation team, while the former is about gathering statistics easily on which code might be affected.

Beingessner: Rust's Unsafe Pointer Types Need An Overhaul

Posted Mar 22, 2022 19:10 UTC (Tue) by khim (subscriber, #9252) [Link]

It's one of a half-dozen attempts. It's not yet fully finished and it's not clear if it even can be finished and adopted.

Rust's issue lies precisely with the fact that there is no C or C++ memory model which can be used to write code which would then actually be compiled (yes, there is a memory model specified by the standard, but we know that compilers are happy to break certain valid programs based on that memory model; examples are actually in the proposal you are linking).

If there had been some memory model which matched what the actual compilers are doing, unsafe Rust would have just used that. But there is nothing, just a DR260 resolution which prompts compiler developers to develop something and include it in the standard… and lots of handwaving.

Beingessner: Rust's Unsafe Pointer Types Need An Overhaul

Posted Mar 23, 2022 13:26 UTC (Wed) by uecker (guest, #157556) [Link]

It is an attempt which addresses provenance and has been agreed on to become a technical specification. If compilers support it (they already do to a large extent, modulo some known optimizer bugs and some cases where they already do not follow the standard) and no serious objections come up, this will likely be adopted to be the C semantics for provenance.
