Github core: disable `ptr::swap_nonoverlapping_one`'s block optimization on SPIR...

eddyb commented on Mar 11

SPIR-V primarily supports what it calls the "Logical addressing model" (and AFAIK for graphical shaders it's the only option), and what that implies is that there is no "memory" to uniformly address at some byte/word level, and that you can't really talk about values having a "raw representation" in terms of sequences of bytes. Therefore, the "block"-wise swapping optimization employed by ptr::swap_nonoverlapping_one (where a "block" is 32 bytes, currently), is fundamentally incompatible with SPIR-V "memory".
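To make the incompatibility concrete, here is a minimal sketch (not the actual `core` source) contrasting a whole-value swap with a block-wise one. The names `swap_whole`, `swap_blocks` and `BLOCK` are hypothetical; only the 32-byte block size is taken from the description above.

```rust
use std::mem::{self, MaybeUninit};
use std::ptr;

/// Block size in bytes (32, per the description above; the name is made up).
const BLOCK: usize = 32;

/// Whole-value swap: every memory access is a typed load or store of `T`,
/// which is the only kind of access a logical addressing model can express.
pub fn swap_whole<T>(x: &mut T, y: &mut T) {
    let x: *mut T = x;
    let y: *mut T = y;
    // SAFETY: both pointers come from distinct `&mut T`, so they are valid,
    // aligned and non-overlapping.
    unsafe {
        let tmp = ptr::read(x);
        ptr::copy_nonoverlapping(y, x, 1);
        ptr::write(y, tmp);
    }
}

/// Block-wise swap: reinterprets both values as raw bytes and moves them
/// through a fixed-size scratch buffer, `BLOCK` bytes at a time.
pub fn swap_blocks<T>(x: &mut T, y: &mut T) {
    let len = mem::size_of::<T>();
    // These casts are the step that assumes byte-addressable memory.
    let x = x as *mut T as *mut u8;
    let y = y as *mut T as *mut u8;
    let mut block = MaybeUninit::<[u8; BLOCK]>::uninit();
    let tmp = block.as_mut_ptr() as *mut u8;
    let mut offset = 0;
    while offset < len {
        // The last chunk may be shorter than a full block.
        let n = BLOCK.min(len - offset);
        // SAFETY: `offset + n <= len`, `n <= BLOCK`, and the three regions
        // (x, y, scratch buffer) never overlap.
        unsafe {
            ptr::copy_nonoverlapping(x.add(offset), tmp, n);
            ptr::copy_nonoverlapping(y.add(offset), x.add(offset), n);
            ptr::copy_nonoverlapping(tmp, y.add(offset), n);
        }
        offset += n;
    }
}
```

On a CPU target both functions leave the two values in the same final state. The difference is that `swap_blocks` only works if a `T` can be viewed as a sequence of bytes, and its on-stack temporary is a byte block rather than a `T`.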

As such, Rust-GPU's rustc_codegen_spirv backend cannot currently allow the use of ptr::swap_nonoverlapping_one - but that comes at a great price, since it's the building block of mem::{swap,replace}, and those in turn are used by e.g. Option::take and Range's Iterator implementation (the latter blocking the use of for i in 0..n loops).
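The dependency chain can be sketched as follows. This is an illustrative approximation, not the real `core` code (which may differ between versions): `take_sketch`, `CountUp` and `sum_below` are hypothetical stand-ins for `Option::take`, a `0..n` range and a `for` loop over it.

```rust
use std::mem;

/// Hypothetical stand-in for `Option::take`: move the old value out and
/// leave `None` behind, expressed through `mem::replace`.
pub fn take_sketch<T>(slot: &mut Option<T>) -> Option<T> {
    mem::replace(slot, None)
}

/// Hypothetical stand-in for a `0..n` style range.
pub struct CountUp {
    pub start: u32,
    pub end: u32,
}

impl Iterator for CountUp {
    type Item = u32;

    fn next(&mut self) -> Option<u32> {
        if self.start < self.end {
            let next = self.start + 1;
            // Yield the old `start` while storing the incremented one.
            Some(mem::replace(&mut self.start, next))
        } else {
            None
        }
    }
}

/// A `for` loop over the range calls `next`, and therefore `mem::replace`,
/// once per iteration.
pub fn sum_below(n: u32) -> u32 {
    let mut total = 0;
    for i in (CountUp { start: 0, end: n }) {
        total += i;
    }
    total
}
```

If the backend rejects the swap primitive underneath `mem::replace`, everything in this sketch becomes uncompilable, including the plain `for` loop.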

There are 4 options I can see in terms of supporting ptr::swap_nonoverlapping_one in rustc_codegen_spirv:

1. legalize the block-wise swap loop back into swapping whole values, for SPIR-V
   - this is made borderline impossible by the fact that the size of the state "on the stack" is a block, and has to be expanded back to the appropriate size of the value being swapped, so in practice this would have to effectively pattern-match on the exact shape of the block-wise swapping algorithm, as a roundabout way of "patching core::ptr on the fly"
2. (this PR) disable the block-wise swap optimization altogether when #[cfg(target_arch = "spirv")]
   - I've tested it and it does in fact allow compiling for i in 0..n loops, which was my primary motivation
   - main downside IMO is the fact that core now acknowledges an out-of-tree backend
     - as a counterpoint, any attempt to compile Rust to SPIR-V would run into this problem, one way or another
3. only enable the block-wise swap optimization on targets where it's been empirically proven to be an improvement
   - would avoid any surprises in terms of potentially-broken/inefficient codegen, in general
   - however, it may be universally applicable (thanks to caches), even if the optimal block size could differ
4. move low-level swapping into an intrinsic, where the backend can choose any optimization approach it wants
   - this also has an impact on MIR optimizations (cc @rust-lang/wg-mir-opt) - which currently cannot hope to make sense of e.g. Option::take despite it being effectively _0 = *_1; *_1 = None; return;
   - long-term this is my preferred approach, and I can start working on it if that's desired, but I wanted to confirm that this swapping optimization is the final blocker for Rust-GPU supporting e.g. range for loops
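As a rough illustration of the second option (the one this PR takes), the gating could look like the sketch below. It is an assumption-laden approximation, not the actual diff: `uses_block_path`, `swap_one_sketch` and `BLOCK_SIZE` are hypothetical names, and the byte-level `ptr::swap_nonoverlapping` call merely stands in for the block-wise loop.

```rust
use std::mem;
use std::ptr;

/// Block size in bytes (32, per the description; the name is hypothetical).
const BLOCK_SIZE: usize = 32;

/// Hypothetical helper: is the byte-level block path taken for `T`?
/// It is never taken on SPIR-V, and otherwise only for values that are at
/// least one block in size (the size threshold is an assumption here).
// "spirv" is not a built-in `target_arch` value, hence the lint allowance.
#[allow(unexpected_cfgs)]
pub fn uses_block_path<T>() -> bool {
    !cfg!(target_arch = "spirv") && mem::size_of::<T>() >= BLOCK_SIZE
}

/// Hypothetical stand-in for `ptr::swap_nonoverlapping_one`.
pub fn swap_one_sketch<T>(x: &mut T, y: &mut T) {
    let x: *mut T = x;
    let y: *mut T = y;
    if uses_block_path::<T>() {
        // Byte-level path: stands in for the block-wise loop.
        // SAFETY: `x` and `y` come from distinct `&mut T` and each covers
        // `size_of::<T>()` bytes.
        unsafe {
            ptr::swap_nonoverlapping(x as *mut u8, y as *mut u8, mem::size_of::<T>());
        }
    } else {
        // Typed path: whole-value loads and stores only.
        // SAFETY: same validity and non-overlap argument as above.
        unsafe {
            let tmp = ptr::read(x);
            ptr::copy_nonoverlapping(y, x, 1);
            ptr::write(y, tmp);
        }
    }
}
```

With this shape, a SPIR-V build only ever sees the typed path, so the backend never has to recognize or undo the byte-level loop.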

r? @nagisa cc @rust-lang/libs
