Beingessner: Rust's Unsafe Pointer Types Need An Overhaul
source link: https://lwn.net/Articles/888693/
Rust currently says this code is totally cool and fine:
    // Masking off a tag someone packed into a pointer:
    let mut addr = my_ptr as usize;
    addr = addr & !0x1;
    let new_ptr = addr as *mut T;
    *new_ptr += 10;
This is some pretty bog-standard code for messing with tagged pointers, what’s wrong with that? [...]
For this to possibly work with Pointer Provenance and Alias Analysis, that stuff must pervasively infect all integers on the assumption that they might be pointers. This is a huge pain in the neck for people who are trying to actually formally define Rust’s memory model, and for people who are trying to build sanitizers for Rust that catch UB. And I assure you it’s just as much a headache for all the LLVM and C(++) people too.
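As a hedged illustration of the direction the article argues for, the same tag masking can be written without round-tripping the pointer through a bare integer. This is only a sketch: it assumes a Rust toolchain where the strict-provenance pointer methods `addr` and `map_addr` are stable (1.84 or later), and the helper names `set_tag` and `clear_tag` are hypothetical, not taken from the article.

```rust
// Sketch only: assumes Rust 1.84+ (stable `addr` / `map_addr`).
// `set_tag` and `clear_tag` are hypothetical helper names.

/// Sets the lowest address bit while keeping the pointer's provenance.
fn set_tag<T>(p: *mut T) -> *mut T {
    p.map_addr(|a| a | 0x1)
}

/// Clears the lowest address bit while keeping the pointer's provenance.
fn clear_tag<T>(p: *mut T) -> *mut T {
    p.map_addr(|a| a & !0x1)
}

fn main() {
    // u32 is 4-byte aligned, so the low bit of its address is free for a tag.
    let mut value: u32 = 32;
    let raw: *mut u32 = &mut value;

    let tagged = set_tag(raw);
    assert_eq!(tagged.addr() & 0x1, 1);

    // The untagged pointer still carries the provenance of `raw`,
    // so dereferencing it is allowed.
    let untagged = clear_tag(tagged);
    unsafe { *untagged += 10 };
    assert_eq!(value, 42);
}
```

The point of `map_addr` is that the compiler never sees an integer turn into a pointer: only the address part changes, so integers do not have to be treated as potential pointers.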
Beingessner: Rust's Unsafe Pointer Types Need An Overhaul
Posted Mar 21, 2022 16:23 UTC (Mon) by smoogen (subscriber, #97) [Link]
```
I think about unsafe pointers in Rust a lot.
I wrote this all in one sitting and I really need dinner.
Head empty only pointers.
```
Beingessner: Rust's Unsafe Pointer Types Need An Overhaul
Posted Mar 21, 2022 19:01 UTC (Mon) by wtarreau (subscriber, #51152) [Link]
IMHO that's what "volatile" and "register" address in C (and not in the most elegant way, admittedly). "volatile" may be aliased by anything and will always be reloaded when read. "register" may never be aliased at all and the compiler will happily optimise their accesses.
Ideally we'd need a simplified mechanism in a language to indicate that certain pointers may alias only their own type, nothing at all or everything, and that they may be aliased by the same factors. With this, developers could choose their constructs without having to worry about what the compiler does behind the scenes (exactly like they do in assembly). Having to pretend that something is a register to prevent it from being aliased is annoying and limited since you cannot take its address to pass it anywhere. But if we could say "this never aliases anything" some constructs could be more easily optimized. Maybe some scopes would be useful (sort of aliasing barriers for certain variables).
Beingessner: Rust's Unsafe Pointer Types Need An Overhaul
Posted Mar 21, 2022 19:11 UTC (Mon) by acarno (subscriber, #123476) [Link]
http://www.ada-auth.org/standards/rm12_w_tc1/html/RM-3-10...
Beingessner: Rust's Unsafe Pointer Types Need An Overhaul
Posted Mar 22, 2022 12:42 UTC (Tue) by wtarreau (subscriber, #51152) [Link]
Beingessner: Rust's Unsafe Pointer Types Need An Overhaul
Posted Mar 21, 2022 19:31 UTC (Mon) by pm215 (subscriber, #98099) [Link]
Beingessner: Rust's Unsafe Pointer Types Need An Overhaul
Posted Mar 21, 2022 20:37 UTC (Mon) by NYKevin (subscriber, #129325) [Link]
Officially, volatile gets you exactly nothing in terms of the abstract machine semantics and the strict aliasing rule. In practice, if all of the aliasing variables are volatile, it's unlikely that most "reasonable" compilers will have issues, but it's still UB and so the entire code path is still considered poisoned. It's possible that the compiler assumes, for example, that the code path in which the aliased write happens is never executed while the other alias exists, and therefore makes incorrect simplifying assumptions about the overall flow of control.
The purpose of volatile is to control memory-mapped I/O and other hardware that does "magic" stuff to your memory/address space. It is not to enact an end-run-around the strict aliasing rule.
Beingessner: Rust's Unsafe Pointer Types Need An Overhaul
Posted Mar 21, 2022 21:05 UTC (Mon) by walters (subscriber, #7396) [Link]
Beingessner: Rust's Unsafe Pointer Types Need An Overhaul
Posted Mar 21, 2022 22:40 UTC (Mon) by tialaramex (subscriber, #21167) [Link]
It's true that you can't alias variables with storage class register because you can't take their address, although this was a relatively late addition (K&R C does not have this rule), but there are no rules about aliasing volatile at all, neither allowing nor forbidding.
The standard says that register is merely a hint [which today your compiler almost certainly ignores] that it would be a good idea to put this variable in a register, and that it serves no other purpose despite the restriction.
Meanwhile volatile serves only one clear purpose, you can use it to perform explicit stores and loads from a memory address, likely because you are doing MMIO. This is tricky to get right but since C doesn't provide any intrinsics for this purpose it's the only way that's even a little bit portable. All other uses of volatile are platform specific (in the good cases) or just voodoo / cargo cult C, sprinkled on by people who are hoping maybe the bug goes away if they write volatile in more places.
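For comparison, Rust exposes this as explicit functions rather than a type qualifier: `std::ptr::read_volatile` and `std::ptr::write_volatile`. A minimal sketch, in which an ordinary local variable stands in for a device register (an assumption made so the example runs anywhere) and `set_bit` is a hypothetical helper:

```rust
use std::ptr;

/// Sets one bit in a register using a volatile read followed by a volatile write.
/// `set_bit` is a hypothetical helper, not a standard API.
///
/// # Safety
/// `reg` must be properly aligned and valid for reads and writes.
unsafe fn set_bit(reg: *mut u32, bit: u32) {
    // Each volatile call is a distinct access the compiler may not elide or merge.
    let current = unsafe { ptr::read_volatile(reg) };
    unsafe { ptr::write_volatile(reg, current | (1 << bit)) };
}

fn main() {
    // Assumption: a plain variable plays the role of a memory-mapped register.
    let mut fake_register: u32 = 0;
    unsafe { set_bit(&mut fake_register, 3) };
    assert_eq!(fake_register, 0b1000);
}
```

As in C, these calls say nothing about atomicity or ordering between threads; they only keep the individual loads and stores in the generated code.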
> But if we could say "this never aliases anything" some constructs could be more easily optimized.
Which is why (safe) Rust gets to go very fast. But attempting to retro-fit this to a language like C is impractical.
Beingessner: Rust's Unsafe Pointer Types Need An Overhaul
Posted Mar 21, 2022 22:53 UTC (Mon) by NYKevin (subscriber, #129325) [Link]
> Which is why (safe) Rust gets to go very fast. But attempting to retro-fit this to a language like C is impractical.
To be fair, C does have the restrict keyword. But that's more or less the opposite of register or Rust's borrow checker (i.e. instead of the type system preventing aliasing from happening and promising the programmer that it has done so, the programmer prevents aliasing from happening and promises the type system that they have done so), and this arguably makes it less useful in more complicated cases.
Beingessner: Rust's Unsafe Pointer Types Need An Overhaul
Posted Mar 22, 2022 9:21 UTC (Tue) by roc (subscriber, #30627) [Link]
Beingessner: Rust's Unsafe Pointer Types Need An Overhaul
Posted Mar 22, 2022 13:04 UTC (Tue) by immibis (subscriber, #105511) [Link]
Who am I kidding.
Beingessner: Rust's Unsafe Pointer Types Need An Overhaul
Posted Mar 22, 2022 12:53 UTC (Tue) by wtarreau (subscriber, #51152) [Link]
Kevin, could you please explain "restrict" to me? I started seeing it a few years ago in includes and man pages, and all the info I've read on it was incomprehensible to me. I've always been interested in strong typing (and am using const a lot). I'd like to know if "restrict" may bring me anything at all or if I shouldn't care.
Beingessner: Rust's Unsafe Pointer Types Need An Overhaul
Posted Mar 22, 2022 14:05 UTC (Tue) by farnz (subscriber, #17727) [Link]
"restrict" is a promise from the programmer to the compiler, which is why it's a pain to understand.
Using the following names for exposition:
    int foo[16];
    int * foo_ptr = &foo[0];
    int * restrict foo_restrict = foo;
With foo_ptr, the programmer makes no promises about aliasing. There can be a second pointer to any element of foo, and you can use foo[2] and *(foo_ptr + 2) interchangeably.
"restrict" makes a promise to the compiler about using overlapping names, and hence a promise that no aliasing is used for as long as the "restrict" pointer is alive. For as long as foo_restrict is alive, you promise not to access foo directly, or via foo_ptr, and you promise that if you use *(foo_restrict + 4), you have not accessed foo[4] any other way since foo_restrict was initialized, and that you will not access it any other way (e.g. via foo[4], or *(foo_ptr + 4)) until the lifetime of foo_restrict ends.
The usual concrete example is memcpy versus memmove; the inputs to memcpy are "restrict" pointers, because if you do memcpy(foo, bar, 16 * sizeof(foo[0]));, you promise the compiler that until memcpy returns, *(foo + 0) through *(foo + 15) cannot be accessed via *(bar + offset). memmove, on the other hand, permits that overlap, so its input pointers cannot be marked restrict.
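For readers coming from the Rust side of this thread, a rough analogy may help: a `&mut` reference carries a similar no-aliasing promise, except that the compiler checks it instead of trusting the programmer. The sketch below is illustrative only; `add_twice` is a hypothetical function, while `split_at_mut`, `copy_from_slice` and `copy_within` are standard slice methods that roughly play the memcpy and memmove roles.

```rust
/// `dst` is an exclusive reference, so the compiler may assume `src` does not
/// alias it and is free to read `*src` only once. (Hypothetical example function.)
fn add_twice(dst: &mut i32, src: &i32) {
    *dst += *src;
    *dst += *src;
}

fn main() {
    let mut x = 1;
    let y = 5;
    add_twice(&mut x, &y);
    assert_eq!(x, 11);
    // add_twice(&mut x, &x); // rejected at compile time: `x` is already mutably borrowed

    let mut buf = [1, 2, 3, 4, 5];

    // memcpy-like: source and destination must be provably disjoint,
    // so the buffer is first split into two non-overlapping halves.
    let (left, right) = buf.split_at_mut(2);
    left.copy_from_slice(&right[..2]);
    assert_eq!(buf, [3, 4, 3, 4, 5]);

    // memmove-like: overlapping ranges within one slice are allowed.
    buf.copy_within(0..4, 1);
    assert_eq!(buf, [3, 3, 4, 3, 4]);
}
```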
Beingessner: Rust's Unsafe Pointer Types Need An Overhaul
Posted Mar 22, 2022 14:55 UTC (Tue) by wtarreau (subscriber, #51152) [Link]
I think I'm seeing a few cases where that could help, especially when some asm() statements are used and the
compiler cannot figure that some values cannot have changed there. At least now I know what to look for and
how to experiment.
Thank you!
Beingessner: Rust's Unsafe Pointer Types Need An Overhaul
Posted Mar 23, 2022 4:11 UTC (Wed) by marcH (subscriber, #57642) [Link]
Beingessner: Rust's Unsafe Pointer Types Need An Overhaul
Posted Mar 23, 2022 10:34 UTC (Wed) by wtarreau (subscriber, #51152) [Link]
What I previously found was this: https://gcc.gnu.org/onlinedocs/gcc-11.2.0/gcc/Restricted-... but it wasn't very clear to me. Of course there's nothing wrong in it, it's just that when the use cases are unclear to you they can remain unclear after reading the doc.
Beingessner: Rust's Unsafe Pointer Types Need An Overhaul
Posted Mar 21, 2022 23:29 UTC (Mon) by jlombera (guest, #155698) [Link]
When accessing/modifying shared memory between processes/threads, volatile is sometimes the right thing to do to ensure stores/loads to/from memory. Thus it's not limited to MMIO.
Beingessner: Rust's Unsafe Pointer Types Need An Overhaul
Posted Mar 22, 2022 1:26 UTC (Tue) by tialaramex (subscriber, #21167) [Link]
On other platforms, with other compilers, you get what you asked for, not what you expected. Maybe you get lucky and maybe you don't. Maybe if you get unlucky you can write "volatile" in a few extra places and now it works. Voodoo.
Beingessner: Rust's Unsafe Pointer Types Need An Overhaul
Posted Mar 22, 2022 1:47 UTC (Tue) by jlombera (guest, #155698) [Link]
Beingessner: Rust's Unsafe Pointer Types Need An Overhaul
Posted Mar 22, 2022 3:34 UTC (Tue) by mrugiero (guest, #153040) [Link]
For MMIO it works because the OS can mark a page as cache disabled so it goes straight to "memory" (which really is a mapped device).
Beingessner: Rust's Unsafe Pointer Types Need An Overhaul
Posted Mar 22, 2022 4:16 UTC (Tue) by jlombera (guest, #155698) [Link]
The processor knows, though. All the compiler needs to do is not to register-optimize and emit memory access instructions instead, the processor takes care of maintaining cache coherence.
Beingessner: Rust's Unsafe Pointer Types Need An Overhaul
Posted Mar 22, 2022 4:45 UTC (Tue) by NYKevin (subscriber, #129325) [Link]
Beingessner: Rust's Unsafe Pointer Types Need An Overhaul
Posted Mar 22, 2022 5:02 UTC (Tue) by jlombera (guest, #155698) [Link]
Beingessner: Rust's Unsafe Pointer Types Need An Overhaul
Posted Mar 22, 2022 11:29 UTC (Tue) by tialaramex (subscriber, #21167) [Link]
You can take a look at a toy example with Godbolt, and see for yourself what the compiler actually tells your CPU to do, on x86 (and x86-64) you get Acquire/Release semantics (but not full consistency) "for free" (in fact everybody is paying for these semantics, all the time on this platform so it's only "free" the same way the ice is "free" with a $5 coke in a restaurant) but on other platforms if you don't see the CPU being told to do this work it's not doing the work. Maybe you get away with it, and maybe you don't. You are gambling every time.
Beingessner: Rust's Unsafe Pointer Types Need An Overhaul
Posted Mar 22, 2022 16:14 UTC (Tue) by jlombera (guest, #155698) [Link]
Again, "volatile" is to ask the *compiler* not to optimize access to memory in the generated binary code.
>You can take a look at a toy example with Godbolt, and see for yourself what the compiler actually tells your CPU to do
Sure, feel free to play with this (very contrived) example in godbolt.org (sorry for the formatting, I couldn't find a way to make this work as a plain text comment):
```
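/* With volatile, *x_p must be re-read from memory on every iteration. */
void f(volatile int *x_p) {
    while (!*x_p)
        ;
}

/* Without volatile, the compiler may hoist the load out of the loop. */
void g(int *x_p) {
    while (!*x_p)
        ;
}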
```
Feel free to play with different compilers, optimization levels, even different archs. You'll see that in every case, in the loop in f() *x_p is read from memory in every iteration, whereas for g(), different kinds of optimizations are performed.
Beingessner: Rust's Unsafe Pointer Types Need An Overhaul
Posted Mar 22, 2022 17:13 UTC (Tue) by NYKevin (subscriber, #129325) [Link]
Nobody is disputing that. We are telling you that the compiler will fail to emit acquire/release memory barrier instructions on non-x86 platforms, and without those, you get no cross-thread guarantees.
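To make the distinction concrete, here is a hedged sketch of what "acquire/release" buys you, written with Rust's `std::sync::atomic` types (C11's `<stdatomic.h>` offers the same orderings). The function name `publish_and_read` and the value 42 are illustrative assumptions, not something from the thread.

```rust
use std::sync::atomic::{AtomicBool, AtomicU32, Ordering};
use std::sync::Arc;
use std::thread;

/// One thread publishes a value and raises a flag; the caller waits for the
/// flag and then reads the value. (Hypothetical example function.)
fn publish_and_read() -> u32 {
    let data = Arc::new(AtomicU32::new(0));
    let ready = Arc::new(AtomicBool::new(false));

    let producer = {
        let data = Arc::clone(&data);
        let ready = Arc::clone(&ready);
        thread::spawn(move || {
            data.store(42, Ordering::Relaxed);
            // Release: everything written before this store becomes visible
            // to a thread that observes `true` with an Acquire load.
            ready.store(true, Ordering::Release);
        })
    };

    // Acquire: pairs with the Release store above.
    while !ready.load(Ordering::Acquire) {
        std::hint::spin_loop();
    }
    let seen = data.load(Ordering::Relaxed);

    producer.join().unwrap();
    seen
}

fn main() {
    // The Release/Acquire pair guarantees the consumer sees the published value.
    assert_eq!(publish_and_read(), 42);
}
```

A volatile flag would keep the loop re-reading memory, but it would not give the ordering guarantee between the flag and the data on weakly ordered CPUs, which is the point being made above.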
Beingessner: Rust's Unsafe Pointer Types Need An Overhaul
Posted Mar 22, 2022 18:27 UTC (Tue) by khim (subscriber, #9252) [Link]
> Again, "volatile" is to ask the *compiler* not to optimize access to memory in the generated binary code.
And that's not enough. Not even on x86. You know that, right?
Intel 8086 included lock prefix from the very beginning! And you can not force compiler to use it with volatile. End of story.
Yes, with C89 you had no choice but to use assembler with some volatile sprinkled here and there. C11 offers atomics which provide much more concise and usable semantic.
Don't use volatile except in kernel, please. It's not needed and harmful.
Beingessner: Rust's Unsafe Pointer Types Need An Overhaul
Posted Mar 22, 2022 20:13 UTC (Tue) by jlombera (guest, #155698) [Link]
Beingessner: Rust's Unsafe Pointer Types Need An Overhaul
Posted Mar 23, 2022 0:20 UTC (Wed) by excors (subscriber, #95769) [Link]
The main thing I can think of is e.g. performance counters, where one thread updates them and another thread periodically reads them, and where you don't care if it reads slightly stale values (but not worse than a few usecs) or reads each counter in an unpredictable order. In that case, you do need something like volatile (to ensure the first thread doesn't hold the counter in a register for many usecs) but you don't need any further synchronisation guarantees. You also need atomic reads/writes, which I don't think volatile guarantees, but in practice it's probably okay if it's an aligned word-sized value.
Probably you could also do a simple form of mailboxes, where the producer thread does "while (m != 0) {}; m = 42;" and the consumer thread does "while (m == 0) {}; do_work(m); m = 0;", where (I think? but not certain) there are hopefully enough implicit dependencies that it will always behave as expected on any CPU. (But that won't work if you want to share more than a single word, because the mailbox message won't be synchronised with any other memory access.)
Those seem very niche cases, though. And you can easily do them with C++/C11 atomics using memory_order_relaxed (which adds no synchronisation barriers but does guarantee atomicity, like a more well-behaved volatile). I'm not aware of any drawbacks of memory_order_relaxed over volatile, and the benefit is it can be combined with acquire/release accesses (to the same variable or to others) for cases where synchronisation is important (which is nearly all cases involving shared memory).
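A small sketch of the performance-counter case using relaxed atomics, written in Rust, whose `Ordering::Relaxed` corresponds to `memory_order_relaxed`. It is a variation on the scenario above (several writers and one final read), and `count_events` is a hypothetical function name.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::thread;

/// Several workers bump a shared counter with relaxed atomics.
/// Relaxed gives atomicity (no lost or torn updates) but no ordering with
/// respect to other memory. (Hypothetical example function.)
fn count_events(workers: usize, per_worker: u64) -> u64 {
    let counter = AtomicU64::new(0);

    // Scoped threads may borrow `counter` directly; all are joined at scope exit.
    thread::scope(|s| {
        for _ in 0..workers {
            s.spawn(|| {
                for _ in 0..per_worker {
                    counter.fetch_add(1, Ordering::Relaxed);
                }
            });
        }
    });

    counter.load(Ordering::Relaxed)
}

fn main() {
    // Every increment is atomic, so none are lost.
    assert_eq!(count_events(4, 1_000), 4_000);
}
```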
Beingessner: Rust's Unsafe Pointer Types Need An Overhaul
Posted Mar 23, 2022 4:17 UTC (Wed) by marcH (subscriber, #57642) [Link]
BTW https://queue.acm.org/detail.cfm?id=3212479
> The cache coherency protocol is one of the hardest parts of a modern CPU to make both fast and correct. Most of the complexity involved comes from supporting a language in which data is expected to be both shared and mutable as a matter of course.
Beingessner: Rust's Unsafe Pointer Types Need An Overhaul
Posted Mar 22, 2022 9:05 UTC (Tue) by tialaramex (subscriber, #21167) [Link]
The CPU doesn't promise sequential consistency, because doing so would make it (much) slower. So now your program doesn't have sequential consistency. This is inherently a very difficult environment in which to write programs at all, but C and C++ don't bother you with that trouble because both languages have the same rule about sequential consistency: If your program doesn't exhibit sequential consistency it instead has Undefined Behaviour and they wash their hands of you entirely.
Again, you can write "volatile" on some more variables and maybe you get lucky and on the CPU you're working with the extra spills cause a cache flush, or force an extra wait cycle somewhere, and it happens to mask the bug. And then maybe somebody buys a CPU with more L1 cache, or a different cache policy, and now the mysterious bug is back. You are using the wrong tool for the job.
Beingessner: Rust's Unsafe Pointer Types Need An Overhaul
Posted Mar 22, 2022 15:45 UTC (Tue) by jlombera (guest, #155698) [Link]
Beingessner: Rust's Unsafe Pointer Types Need An Overhaul
Posted Mar 22, 2022 22:38 UTC (Tue) by edeloget (subscriber, #88392) [Link]
> The processor knows, though. All the compiler needs to do is not to register-optimize and emit
> memory access instructions instead, the processor takes care of maintaining cache coherence.
That would not work.
First of all, the obvious: it would fail on the multi-processor (physically separated) case because one thing a processor knows is not always known by the other guy. That would require N-to-N communication between the processors - and it would be a nightmare on systems where you have many nodes (up to 6x64 cores) that can share the same physical memory, such as the Chinese Sunway supercomputer.
And then it would be reeeeeeeeaaaaaaaaaaly slow. The reason why cache works this way is that the processor doesn't even try to find out if the underlying memory has changed (on a load) or if it should change (on a store). Because it does not check anything, it's fast. If you start to factor in multiple checks then you'll hit a performance wall quite soon.
That's exactly why we do this only when performing an atomic operation: we are willing to pay the performance cost in exchange for the information. This is not something we want to do on every load or store. And that's exactly why processors don't do it unless we explicitly tell them to do it. The application (either the OS or a user space program) knows when it shall make an atomic load or store. The processor cannot know it in advance and unless you make an explicit pledge the compiler cannot know either.
Beingessner: Rust's Unsafe Pointer Types Need An Overhaul
Posted Mar 22, 2022 8:47 UTC (Tue) by metan (subscriber, #74107) [Link]
Beingessner: Rust's Unsafe Pointer Types Need An Overhaul
Posted Mar 22, 2022 13:25 UTC (Tue) by tialaramex (subscriber, #21167) [Link]
This all pre-dates a formal memory model, but it is promised in POSIX and so you are indeed welcome to rely on it on a POSIX system. Like making errno work the way the standard says it should, on modern systems this involves a considerable amount of extra lifting for your compiler and C library, but that work is done and so yes you might as well rely on it.
There's a lot of low-level code out there actually banging on MMIO far from any POSIX system, and MMIO is in fact, by my understanding, where volatile starts out (the first C compilers are too naive to eliminate duplicate stores/loads; as the optimiser improves it elides enough apparently useless loads and stores that now the device driver doesn't work; the volatile qualifier tells the compiler not to optimise the loads and stores and now the device drivers work properly again), so if I was a betting man I might take the other side of your bet.
Beingessner: Rust's Unsafe Pointer Types Need An Overhaul
Posted Mar 22, 2022 12:51 UTC (Tue) by wtarreau (subscriber, #51152) [Link]
I know but there are few cases where it's still used. Trying to get the pointer from a variable declared as register will be instantly refused (which is great). Declaring a global variable with register (you're forced to indicate what register) will allow the compiler to optimize some operations because it knows the variable cannot change.
But I agree these are almost exceptions to the general rule that the compiler doesn't care much anymore.
> Meanwhile volatile serves only one clear purpose, you can use it to perform explicit stores and loads from a memory address, likely because you are doing MMIO.
The primary usage is for signals, even before MMIO. Userland code needs to use volatile and is certainly not fiddling with MMIO in general.
Beingessner: Rust's Unsafe Pointer Types Need An Overhaul
Posted Mar 22, 2022 18:00 UTC (Tue) by njs (guest, #40338) [Link]
I think it's exactly the opposite: at the language level, C/C++/unsafe Rust all say that you're allowed to convert back and forth between integers and pointers, because the language designers had the same intuition you did – that's how the machine actually works, so it'll be fine.
But there are two problems:
- that's not actually how all machines work (like this exotic CHERI thing, or old-school segmented architectures)
- more importantly, even on common ISAs like x86 and ARM, it turns out that if you want a decent compiler, your front end needs to target a higher-level virtual machine where pointers *aren't* just integers. Of course they'll eventually get lowered to integers, but if you do that too early then it destroys your ability to do optimizations. So the status quo right now is that all compilers *actually* treat integers and pointers as fundamentally different, and they do it using a bunch of ad hoc heuristics that were never written down and the compiler engineers have been gradually realizing are actually incoherent and busted, even if they *mostly* work in practice.
So the problem is: how do we change the language and the compiler so that the code is efficient *and* the compiler rigorously implements the language semantics *and* the language semantics are understandable without a phd. And this means the language semantics need to treat pointers and integers as fundamentally different, while still giving enough tools to do all the weird pointer tricks you need in real systems.
Beingessner: Rust's Unsafe Pointer Types Need An Overhaul
Posted Mar 22, 2022 2:45 UTC (Tue) by atnot (subscriber, #124910) [Link]
Beingessner: Rust's Unsafe Pointer Types Need An Overhaul
Posted Mar 22, 2022 5:04 UTC (Tue) by jhoblitt (subscriber, #77733) [Link]
Beingessner: Rust's Unsafe Pointer Types Need An Overhaul
Posted Mar 22, 2022 13:01 UTC (Tue) by tialaramex (subscriber, #21167) [Link]
Importantly Editions respect time's arrow. You write new code the new way, and your old code is unchanged. 10M lines or 10B lines doesn't matter, you aren't required to touch any of it. But Editions change how people think about what's possible and that means both that more adventurous changes are considered (knowing Editions might make the change practicable), and so often changes which wouldn't have been conceived at all without editions, ultimately turn out not to be incompatible and so the benefits accrue to everybody, not just on new editions.
Beingessner: Rust's Unsafe Pointer Types Need An Overhaul
Posted Mar 22, 2022 16:40 UTC (Tue) by jhoblitt (subscriber, #77733) [Link]
Beingessner: Rust's Unsafe Pointer Types Need An Overhaul
Posted Mar 22, 2022 19:01 UTC (Tue) by khim (subscriber, #9252) [Link]
Technically nothing in Rust is a Rust innovation, and most of the ideas it uses were already old when it was conceived. Heck, it was presented to the world with the words "technology from the past come to save the future from itself"!
But most compiled “mainstream” languages are based on ideas so ancient that even these pretty old and well-tested ideas look like some kind of radical revelation to C/C++/ObjectPascal/etc. developers (Swift took some of these ideas, though).
Beingessner: Rust's Unsafe Pointer Types Need An Overhaul
Posted Mar 22, 2022 7:16 UTC (Tue) by wahern (subscriber, #37304) [Link]
The very topic of discussion is literally evidence to the contrary. C already has intptr_t precisely to avoid the very problem Rust now has. (Relatedly, intptr_t is *optional* in Standard C, understanding that there might be some systems where data pointer to integer conversions aren't supportable; and Standard C doesn't support function pointer to integer conversions at all.) Moreover, FreeBSD has already been ported to CHERI, so claims that "real-world" C code is too riddled with non-standard pointer to integer conversions isn't very persuasive, particularly relative to Rust.
IIRC, porting the entire POSIX API to CHERI required only two significant changes: dlsym and signals. Both are areas where POSIX (much like Rust) required assumptions that Standard C doesn't. There was some ugliness related to memcpy, but Rust takes memcpy abuse to an entirely different level.
While C is far from the ideal language for a memory capability system, it certainly was more prepared for it than Rust. It's not surprising, though, as Rust was largely designed to work around the lack of ABI- or ISA-enforced memory protections, whereas that possibility has always been at the back of the minds of C committee members. If you assume those things aren't on the horizon (and it's still not a given that they will see commercial success, let alone ubiquity), playing fast-and-loose with pointer types under the hood is an easy simplification. If common platforms implemented CHERI or something like it, Rust probably would have never been conceived in the first place.
Beingessner: Rust's Unsafe Pointer Types Need An Overhaul
Posted Mar 22, 2022 17:24 UTC (Tue) by tialaramex (subscriber, #21167) [Link]
But you don't present any evidence of such a reworking, only that people have managed to run some C software on CHERI, which of course you'd expect since CHERI has been under development for some time explicitly to run C software. Here's an excerpt from Cambridge's description of CHERI: "The CHERI memory-protection features allow historically memory-unsafe programming languages such as C and C++ to be adapted to provide strong, compatible, and efficient protection against many currently widely exploited vulnerabilities". Nothing in there about formal properties, no proposals to the ISO committee; instead they are being pragmatic. What choice do they have after decades of C programmers' resolute disinterest?
> If common platforms implemented CHERI or something like it, Rust probably would have never been conceived in the first place.
If "something like it" managed to make the vague semantics of C's logical machine better match the reality of a modern computer by adjusting the computer instead, perhaps you'd even be right. Maybe if this had happened in the 1990s, the elevator Graydon was annoyed by in 2005 would have actually worked.
CHERI is a long way from this fantasy; many grave C problems are orthogonal to CHERI but are completely solved in (safe) Rust. Which doesn't make CHERI a bad idea, it just highlights that Graydon's problem wasn't something along the lines of "there's this one thing about C I don't like, so I guess I will write an entirely new programming language" but rather that systematically none of the useful lessons of past decades of programming language theory had been adopted into systems programming languages people actually use.
Beingessner: Rust's Unsafe Pointer Types Need An Overhaul
Posted Mar 22, 2022 19:06 UTC (Tue) by excors (subscriber, #95769) [Link]
They appear to have plenty about formal properties at https://www.cl.cam.ac.uk/research/security/ctsrd/cheri/ch... , including papers like "Exploring C Semantics and Pointer Provenance" which adds CHERI-like semantics to a subset of C, based on (if I understand it correctly) the Cerberus tool which carefully transforms C programs into a 'Core' language that makes the memory model explicit and has well-defined operational semantics. The Core code can then be analysed for pointer provenance violations etc. And https://www.cl.cam.ac.uk/~pes20/cerberus/ lists many proposals submitted to ISO by that research group. (Of course they're still a long way from a complete semantics for C, despite working on this for well over a decade with many PhDs, so it's far from a solved (or even solvable) problem in general.)
(Hmm, actually Cerberus seems to be somewhat more relaxed than CHERI, because it doesn't require you to use intptr_t. See https://cerberus.cl.cam.ac.uk/?short/2eaa24 , select "Model > Integer provenance (PVI)", "Search > Random", and it complains of undefined behaviour when dereferencing the pointer, because it gets understandably confused about provenance. But comment out lines 9-10 (which are a noop in regular C) and it works okay, because it can still track provenance through the cast to long and back. If you step "Forward" enough times then you can see the allocation number associated with each pointer.)
Beingessner: Rust's Unsafe Pointer Types Need An Overhaul
Posted Mar 22, 2022 19:30 UTC (Tue) by NYKevin (subscriber, #129325) [Link]
Beingessner: Rust's Unsafe Pointer Types Need An Overhaul
Posted Mar 22, 2022 13:37 UTC (Tue) by uecker (guest, #157556) [Link]
for C and comes with precise formal semantics.
Beingessner: Rust's Unsafe Pointer Types Need An Overhaul
Posted Mar 22, 2022 19:05 UTC (Tue) by tialaramex (subscriber, #21167) [Link]
Aria proposes that usize - Rust's built in unsigned integer type that's typically 64-bit on a modern computer - should formally be the same size as an address _not_ the same size as a pointer as it is defined today. As a side effect, something desirable (to me at least, but I believe others too) falls out, while we're acknowledging that a pointer isn't just an integer with intent, we abolish the (ab)use of as casts to turn one into the other. Instead the programmer is expected to write what they meant, e.g. ptr.with_addr(address) gets you a pointer (maybe 129-bits) made from an address (maybe 64-bits) plus your promise that what you are doing is OK. Did you lie? Same rules as before, now your program is meaningless.
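A minimal sketch of what that looks like in practice, assuming a toolchain where the proposed methods have since been stabilized (Rust 1.84 or later); `element_at` is a hypothetical helper used only for illustration.

```rust
// Sketch only: assumes Rust 1.84+ (stable `addr` / `with_addr`).

/// Builds a pointer to `address` that inherits the provenance of `base`.
/// The caller promises that `address` lies inside the allocation `base`
/// points into. (`element_at` is a hypothetical helper.)
fn element_at(base: *mut u8, address: usize) -> *mut u8 {
    base.with_addr(address)
}

fn main() {
    let mut data = [10u8, 20, 30, 40];
    let base: *mut u8 = data.as_mut_ptr();

    // Plain integer arithmetic on the address only.
    let target: usize = base.addr() + 2;

    // Address plus provenance from `base` gives a usable pointer again.
    let p = element_at(base, target);
    unsafe { *p = 99 };

    assert_eq!(data, [10, 20, 99, 40]);
}
```

The integer `target` is never cast to a pointer on its own; the provenance always comes from an existing pointer, which is the promise the programmer is making.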
The C proposal can't go around adding methods to pointers, not least because C doesn't have methods and if it did it wouldn't have them on pointers, it just changes the formal semantics of the language to acknowledge the practical need for provenance. Existing correct C will remain correct, the TS just says why it's correct (or rather, why other seemingly reasonable C that doesn't work is not correct).
Also I expect that the committee will nod wisely and say that they don't have time to take this up right now, but please bring it back again next time, which is roughly what it has been doing since at least 2016, if your plan is to wait for them to fix C rather than learn a new language, don't figure on that happening any time soon.
Beingessner: Rust's Unsafe Pointer Types Need An Overhaul
Posted Mar 23, 2022 13:26 UTC (Wed) by uecker (guest, #157556) [Link]
C could of course just as easily add a new way to combine a pointer with an address using some other syntax than a method. The problem with this is that it would break existing code (which for Rust somehow seems OK). The other problem is that in most cases where you need to convert an integer to a pointer you do not have the pointer available, so you simply cannot use ptr.with_addr(address). If you had the pointer you could also just do ptr + offset, which is the same as ptr + (addr - base_addr), so I do not see how ptr.with_addr(address) solves the same problem.
C has a lot of issues, but it also has many advantages: widely supported, long-term stability, many existing tools, fast compilation, low complexity, emerging formal semantics, etc. And yes, it will take a long time to fix its many issues. It will also take a long time before Rust is ready (the long compile times and lack of stability rule it out for me at this time) and it is already too complex for my taste.
Beingessner: Rust's Unsafe Pointer Types Need An Overhaul
Posted Mar 23, 2022 15:10 UTC (Wed) by farnz (subscriber, #17727) [Link]
There are two important differences between Rust and C that make breaking changes in handling raw pointers more palatable on the Rust side:
- Use of raw pointers in Rust is already gated behind unsafe, and Rust style says to keep the use of unsafe to a minimum. This allows you to use data in papers like this ACM paper on Unsafe Rust to judge the maximum blast radius of changes - the data supports the idea that at most 25% of Rust code could be affected (as around 75% of published crates contain no Unsafe Rust), and that 3% of Rust code would be a good estimate of the amount of code affected by changes to raw pointers (about 10% of Unsafe Rust deals with raw pointers). In contrast, because of the nature of C, it's harder to tell how much C code is likely to be affected by any given change to pointer semantics.
- Rust's module system allows you to have maintained legacy code in an older edition and modern Rust in the same binary - I can link Rust 2015 code with Rust 2027 code, and the compiler will give the Rust 2015 code the semantics that went with Rust 2015, while giving me modern semantics for Rust 2027 code. #include in C means that I can't clearly delineate code that has modern semantics from code that doesn't, because some code has to have the "right" semantics whether it's #included into a compilation unit that has C99 semantics or whether it's #included into a compilation unit that has C27 semantics.
Of these, I think the former is the hard one to overcome; fixing the latter is something that can be done by a sufficiently smart C standard committee and compiler implementation team, while the former is about gathering statistics easily on which code might be affected.
Beingessner: Rust's Unsafe Pointer Types Need An Overhaul
Posted Mar 22, 2022 19:10 UTC (Tue) by khim (subscriber, #9252) [Link]
It's one of half-dozen attempts. It's not yet fully finished and it's not clear if it even can be finished and adopted.
Rust's issue lies precisely in the fact that there is no C or C++ memory model which can be used to write code which would then actually be compiled (yes, there is a memory model specified by the standard, but we know that compilers are happy to break certain valid programs based on that memory model; examples are actually in the proposal you are linking).
If there had been some memory model which matched what the actual compilers are doing, unsafe Rust would have just used that. But there is nothing, just the DR260 resolution, which prompts compiler developers to develop something and include it in the standard… and lots of handwaving.
Beingessner: Rust's Unsafe Pointer Types Need An Overhaul
Posted Mar 23, 2022 13:26 UTC (Wed) by uecker (guest, #157556) [Link]