Tracking issue for WebAssembly SIMD support · Issue #74372 · rust-lang/rust

source link: https://github.com/rust-lang/rust/issues/74372

alexcrichton commented on Jul 15, 2020 (edited):

I'm opening this as a tracking issue for the SIMD intrinsics in the {std,core}::arch::wasm32 module. Eventually we're going to want to stabilize these intrinsics for the WebAssembly target, so I think it's good to have a canonical place to talk about them! I'm also going to update the #![unstable] annotations to point to this issue to direct users here if they want to use these intrinsics.

The WebAssembly SIMD proposal is currently in "phase 3". I would say that we probably don't want to consider stabilizing these intrinsics until the proposal has at least reached "phase 4", where it's being standardized, because changes to the proposal are still happening over time (small ones at this point, though). As a brief overview, the WebAssembly SIMD proposal adds a new type, v128, and a suite of instructions to perform data processing with this type. The intention is that this maps readily onto a lot of architectures, so usage of SIMD can be fast in lots of places.

For Rust stabilization purposes the code for all these intrinsics lives in the rust-lang/stdarch git repository. All code lives in crates/core_arch/src/wasm32/simd128.rs. I've got a large refactoring and sync queued up for that module, so I'm going to be writing this issue with the assumption that it will land mostly as designed there.

Currently the design principles for the SIMD intrinsics are:

  • Like the existing memory_size, memory_grow and unreachable intrinsics, most intrinsics are named after the instruction they represent. There is generally a 1:1 mapping between new instructions added to WebAssembly and intrinsics in the module.
  • The type signature of each intrinsic is intended to match the textual description of the corresponding instruction
  • Each intrinsic has #[target_feature(enable = "simd128")] which forces them all to be unsafe
  • Some gotchas for specific intrinsics are:
    • v128.const is exposed through a suite of const functions, one for each vector type (but not unsigned, just signed integers). Additionally the arguments are not actually required to be constant, so it's expected that the compiler will make the best choice about how to generate a runtime vector.
    • Instructions using lane indices, such as v8x16_shuffle and *_{extract,replace}_lane use const generics to represent constant arguments. This is different from x86_64 which uses the older #[rustc_args_required_const] attribute.
    • Shuffles are provided for v16x8, v32x4, and v64x2 as conveniences instead of only providing v8x16_shuffle. All of them are implemented in terms of the v8x16.shuffle instruction, however.
  • There is a singular v128 type, not a type for each size of vector that intrinsics operate with
  • The extract_lane intrinsics return the value type associated with the intrinsic name; they do not all return i32, unlike the actual WebAssembly instructions. This means that we do not have extract_lane_s and extract_lane_u intrinsics, because the compiler will select the appropriate one depending on the context. (A brief usage sketch of these conventions follows this list.)
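
To make the conventions above concrete, here is a minimal usage sketch of my own (not from the issue; dot4 and example are just illustrative names), written against the names and const-generic lane indices described in this list. Exact names and signatures may still shift before stabilization:

```rust
use std::arch::wasm32::*;

// Every intrinsic carries #[target_feature(enable = "simd128")], so callers
// are unsafe today and typically enable the feature themselves.
#[target_feature(enable = "simd128")]
unsafe fn dot4(a: v128, b: v128) -> f32 {
    // Lane-wise multiply, then pull each lane out. The lane index is a const
    // generic (not a #[rustc_args_required_const] argument), and extract_lane
    // returns the value type named by the intrinsic (f32 here).
    let p = f32x4_mul(a, b);
    f32x4_extract_lane::<0>(p)
        + f32x4_extract_lane::<1>(p)
        + f32x4_extract_lane::<2>(p)
        + f32x4_extract_lane::<3>(p)
}

#[target_feature(enable = "simd128")]
unsafe fn example() -> f32 {
    // There is only the single v128 type; f32x4_splat builds one from a scalar.
    dot4(f32x4_splat(1.0), f32x4_splat(2.0))
}
```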

It's important to note that clang has an implementation of these intrinsics in the wasm_simd128.h header. The current design of the Rust wasm32 module is different in that:

  • The prefix wasm_* isn't used.
  • Only one datatype, v128, is exposed instead of types for each size/kind of vector
  • Naming can be different depending on the intrinsic. For example clang has wasm_i16x8_load_8x8 and wasm_u16x8_load_8x8 while Rust has i16x8_load8x8_s and i16x8_load8x8_u.

Most of these differences are largely stylistic, but some of them are conveniences (like other forms of shuffles) which might be nice to expose in Rust as well. All the conveniences still compile down to one instruction; the difference is only in how users spell out in code how that instruction is generated. I believe it should be possible for such conveniences to live outside the standard library as well, however.
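
As a sketch of that last point (my own illustration, not an existing crate; f32x4_add_scalar is a hypothetical name): a wrapper built on the intrinsics still lowers to the same individual instructions, so conveniences like it don't have to live in the standard library:

```rust
use std::arch::wasm32::*;

/// Hypothetical convenience: add a scalar to every f32 lane of a v128.
/// This lowers to f32x4.splat followed by f32x4.add, exactly what writing
/// the raw intrinsics by hand would produce.
#[target_feature(enable = "simd128")]
unsafe fn f32x4_add_scalar(v: v128, x: f32) -> v128 {
    f32x4_add(v, f32x4_splat(x))
}
```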

How SIMD will be used

If the SIMD proposal were to move to phase 4 today, I think we're in a really good spot for stabilization. #74320 is a pretty serious bug we will want to fix before full stabilization, but I don't believe the fix will be hard to land in LLVM (I've already talked with some folks on that side).

Other than that, SIMD-in-wasm is different from other platforms in that a binary with SIMD will refuse to run on engines that do not have SIMD support. In that sense there is no runtime feature detection available to SIMD consumers (at least not natively).

After rust-lang/stdarch#874 lands, programs will simply use #[target_feature(enable = "...")] or RUSTFLAGS and everything should work. The SIMD intrinsics will always be exposed from the standard library (but the standard library itself will not use them) and available to users. If programs don't use the intrinsics then SIMD won't get emitted; otherwise the binary will use v128.
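
For example (a sketch of the expected workflow; the cfg-gated fallback and the add4 helper are my own illustration, not something the issue prescribes), building with RUSTFLAGS="-C target-feature=+simd128" enables the feature crate-wide, and since there is no runtime detection the choice between SIMD and scalar code has to happen at compile time:

```rust
use std::arch::wasm32::*;

// With RUSTFLAGS="-C target-feature=+simd128" this SIMD path is compiled in;
// without it, the scalar fallback below is used and no v128 instructions are
// emitted into the binary.
#[cfg(target_feature = "simd128")]
fn add4(a: &[f32; 4], b: &[f32; 4]) -> [f32; 4] {
    unsafe {
        // read_unaligned/write_unaligned sidestep v128's 16-byte alignment
        // for these 4-byte-aligned arrays; the addition itself is the
        // f32x4.add instruction via the intrinsic.
        let va = (a.as_ptr() as *const v128).read_unaligned();
        let vb = (b.as_ptr() as *const v128).read_unaligned();
        let sum = f32x4_add(va, vb);
        let mut out = [0.0f32; 4];
        (out.as_mut_ptr() as *mut v128).write_unaligned(sum);
        out
    }
}

#[cfg(not(target_feature = "simd128"))]
fn add4(a: &[f32; 4], b: &[f32; 4]) -> [f32; 4] {
    [a[0] + b[0], a[1] + b[1], a[2] + b[2], a[3] + b[3]]
}
```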

Open Questions

A set of things we'll need to settle on before stabilizing (and this will likely expand over time) is:

  • Handle the differences between Clang and Rust. This could take a number of forms, such as accepting the differences or trying to unify the two. Either way, unlike x86, the standard itself does not provide (nor do I think it will) a standard convention for how to expose these instructions in languages.
  • Audit and confirm the pointer types in the various *_load_* and *_store_* intrinsics. I'm primarily unsure about the pointer argument types of the instructions that load 64 bits (8x8, 16x4, ...).
  • Figure out if the usage of const generics is ok for v8x16_shuffle and the lane management instructions.
  • Confirm that the deviation of not having i8x16_extract_lane_s is ok (e.g. having i8x16_extract_lane return i8 is all we need); same for i16x8.
  • Consider relaxing the #[target_feature] "requires unsafe" rules for these WebAssembly intrinsics. Intrinsics like f32x4_splat have no fundamental reason they need to be unsafe; the only reason they're unsafe is that #[target_feature] is used on them to ensure that SIMD instructions are generated in LLVM (see the sketch after this list).
  • Consider switching *_{any,all}_true to returning a bool
  • A general audit of intrinsic names and signatures to ensure they match the specification.
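
To illustrate that unsafety point (a sketch of the situation as it stands today, not a proposed API; splat_safe is a hypothetical wrapper): f32x4_splat only builds a value and cannot itself violate memory safety, yet a safe wrapper currently has to assert the target feature on its own and write out the unsafe block:

```rust
use std::arch::wasm32::*;

// f32x4_splat is unsafe solely because #[target_feature(enable = "simd128")]
// is applied to it. This wrapper is only compiled (and only sound) when the
// whole program is built with simd128 enabled, e.g. via
// RUSTFLAGS="-C target-feature=+simd128".
#[cfg(target_feature = "simd128")]
fn splat_safe(x: f32) -> v128 {
    unsafe { f32x4_splat(x) }
}
```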
