
Tracking issue for stable SIMD in Rust

source link: https://github.com/rust-lang/rust/issues/48556

Tracking issue for stable SIMD in Rust #48556

alexcrichton opened this issue on Feb 27, 2018 · 73 comments

Comments

Member

alexcrichton commented on Feb 27, 2018

edited

This is a tracking issue for RFC 2325, adding SIMD support to stable Rust. There's a number of components here, including:

The initial implementation of this is being added in #48513 and the next steps would be:


Known issues

Member

scottmcm commented on Feb 27, 2018

My one request for the bikeshed (which the current PR already does and may be obvious, but I'll write it down anyway): Please ensure they're not all in the same module as things like undefined_behaviour and [un]likely, so that those rust-defined things don't get lost in the sea of vendor intrinsics.

Member

cuviper commented on Feb 27, 2018

What will be the story for external LLVM? (lacking MCSubtargetInfo::getFeatureTable())

Member

Author

alexcrichton commented on Feb 27, 2018

@scottmcm certainly! I'd imagine that if we ever stabilized Rust-related intrinsics they'd not go into the same module (they probably wouldn't even be platform-specific).

@cuviper currently it's an unresolved question, so if it doesn't get fixed it means that using an external LLVM would basically mean that #[cfg(target_feature = ...)] would always expand to false (or the equivalent thereof)
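For readers unfamiliar with the mechanism under discussion, here is a minimal sketch of a compile-time feature check. Only the `target_feature` cfg key comes from the thread; the function names are hypothetical and chosen for illustration.

```rust
/// Returns true if the `sse2` target feature was enabled when this crate
/// was compiled (for example via `-C target-feature=+sse2` or because it
/// is part of the target's baseline).
pub fn compiled_with_sse2() -> bool {
    cfg!(target_feature = "sse2")
}

/// Hypothetical helper: names the code path selected at compile time.
/// If the compiler cannot report target features, every such check
/// evaluates to false and the "scalar" branch is always taken.
pub fn compiled_code_path() -> &'static str {
    if cfg!(target_feature = "avx2") {
        "avx2"
    } else {
        "scalar"
    }
}
```

Both checks are resolved at compile time, so they say nothing about the CPU the binary eventually runs on.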

Contributor

hanna-kruppe commented on Feb 27, 2018

edited

I'd imagine that if we ever stabilized Rust-related intrinsics they'd not go into the same module (they probably wouldn't even be platform-specific).

One option raised in the RFC thread (that I personally quite like) was stabilizing std::intrinsics (only the module), keep the stable rust intrinsics in that module (they can already be imported from that location due to a long-standing bug in stability checking) and put these new platform-specific intrinsics in submodules. IIUC this would also satisfy @scottmcm's request.

To be explicit, under that plan the rustdoc page for std::intrinsics would look like this:


Modules

  • x86_64
  • arm

Functions

  • copy
  • copy_nonoverlapping
  • drop_in_place

Member

Author

alexcrichton commented on Mar 3, 2018

Another naming idea I've just had. Right now the feature detection macro is is_target_feature_enabled!, but since it's so target specific it may be more apt to call it is_x86_target_feature_enabled!. This'll make it a pain to call on x86/x86_64 though which could be a bummer.

Contributor

nox commented on Mar 5, 2018

Why keep all the leading underscores for the intrinsics? Surely even if we keep the same names as what the vendors chose, we can still remove those signs, right?

Member

BurntSushi commented on Mar 5, 2018

The point is to expose vendor APIs. The vendor APIs have underscores. Therefore, ours do too.

Contributor

nox commented on Mar 5, 2018

It is debatable that those underscores are actually part of the name. They only have one because C has no modules and namespacing, AFAICT.

Contributor

nox commented on Mar 5, 2018

I would be happy dropping the topic if it was discussed at length already, but I couldn't find any discussion specific to the leading underscores.

Member

BurntSushi commented on Mar 5, 2018

@nox rust-lang/stdarch#212 --- My comment above is basically a summary of that. I probably won't say much else on the topic.

Contributor

Centril commented on Mar 5, 2018

@nox, @BurntSushi Continuing the discussion from there... since it hasn't been mentioned before:

Leading _ for identifiers in rust often means "this is not important" - so just taking the names directly from the vendors may wrongly give this impression.

Member

Author

alexcrichton commented on Mar 6, 2018

@nox @Centril the recurring theme of stabilizing SIMD in Rust is "it's not our job to make this nice". Any attempt made to make SIMD different than what the vendors define has ended with uncomfortable questions and intrinsics that end up being left out. To that end the driving force for SIMD intrinsics in Rust is to get anything compiling on stable.

Crates like faster are explicitly targeted at making SIMD usage easy, fast, and ergonomic. The standard library's intrinsics are not intended to be widely used, nor to be used for "intro level" problems. Leveraging the SIMD intrinsics is quite unsafe (due to target feature detection/selection) and can come at a high cost if used incorrectly.

Overall, again, the goal is not to enable ergonomic SIMD in Rust right now, but to enable any SIMD in Rust. Following exactly what the vendors say is the easiest way for us to guarantee that all Rust programs will always have access to vendor intrinsics.
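The safety concern around feature detection and selection can be sketched as follows, assuming the names that were eventually stabilized (`#[target_feature(enable = ...)]` and `is_x86_feature_detected!`; the macro was still called `is_target_feature_enabled!` at this point in the thread). The functions `sum`, `sum_avx2` and `sum_scalar` are hypothetical.

```rust
/// Compiled with AVX2 enabled, so the compiler may use AVX2 instructions
/// here. Calling it on a CPU without AVX2 is undefined behavior, which is
/// why the function is `unsafe`.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
#[target_feature(enable = "avx2")]
unsafe fn sum_avx2(values: &[i32]) -> i32 {
    values.iter().fold(0i32, |acc, &v| acc.wrapping_add(v))
}

/// Portable fallback with the same wrapping semantics.
fn sum_scalar(values: &[i32]) -> i32 {
    values.iter().fold(0i32, |acc, &v| acc.wrapping_add(v))
}

/// Safe entry point: the AVX2 version is called only after a run-time check.
pub fn sum(values: &[i32]) -> i32 {
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        if is_x86_feature_detected!("avx2") {
            // SAFETY: the CPU was just confirmed to support AVX2.
            return unsafe { sum_avx2(values) };
        }
    }
    sum_scalar(values)
}
```

Forgetting the run-time check, or checking for the wrong feature, still compiles, which is the kind of misuse the comment warns about.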

Contributor

hanna-kruppe commented on Mar 6, 2018

I agree that the leading underscores are a C artifact, not a vendor choice (the C standard reserves identifiers of this form, so that's what C compilers use for intrinsics). Removing them is neither "trying to make it nicer/more ergonomic" (it's really only a minor aesthetic difference) nor involves any per-intrinsic judgement calls. It's a dead simple mechanical translation for a difference in language rules, almost as much as __m128 _mm_foo(); is mechanically translated to fn _mm_foo() -> __m128;.
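To illustrate the mechanical translation, here is a small sketch using real SSE intrinsics as they are exposed in `std::arch::x86_64`, with the vendor names (leading underscores included) kept verbatim. The wrapper `add_one_to_lanes` is a hypothetical name, and the non-x86_64 branch is only a stand-in so the example compiles elsewhere.

```rust
/// Adds 1.0 to each of four f32 lanes.
/// C declaration:  __m128 _mm_add_ps(__m128 a, __m128 b);
/// Rust signature: fn _mm_add_ps(a: __m128, b: __m128) -> __m128
pub fn add_one_to_lanes(input: [f32; 4]) -> [f32; 4] {
    #[cfg(target_arch = "x86_64")]
    {
        use std::arch::x86_64::{__m128, _mm_add_ps, _mm_loadu_ps, _mm_set1_ps, _mm_storeu_ps};

        let mut out = [0.0f32; 4];
        // SAFETY: SSE is part of the x86_64 baseline, and both pointers
        // refer to four valid f32 values; the unaligned load/store variants
        // place no alignment requirement on them.
        unsafe {
            let lanes: __m128 = _mm_loadu_ps(input.as_ptr());
            let result: __m128 = _mm_add_ps(lanes, _mm_set1_ps(1.0));
            _mm_storeu_ps(out.as_mut_ptr(), result);
        }
        out
    }
    #[cfg(not(target_arch = "x86_64"))]
    {
        // Scalar stand-in for targets without these intrinsics.
        input.map(|x| x + 1.0)
    }
}
```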

Member

Author

alexcrichton commented on Mar 6, 2018

@rkruppe do we have a rock solid guarantee that no vendor will ever in the future add the same name without underscores?

Contributor

Centril commented on Mar 6, 2018

@alexcrichton

@rkruppe do we have a rock solid guarantee that no vendor will ever in the future add the same name without underscores?

Can't speak for CPU vendors, but the probability seems very very low. Why would they add an intrinsic where the difference is only an underscore..? Further, as Rust's influence grows, they might not do this simply because of Rust.

Contributor

hanna-kruppe commented on Mar 6, 2018

A name like mm_foo (no leading underscore at all) is not reserved in the C language, so it can't be used for compiler-supplied extensions without breaking legal C programs. There are a few theoretical possibilities for a vendor to nevertheless create intrinsics without leading underscores:

  • they could expose it only in C++ (with namespacing) -- or, for that matter, another language that isn't C
  • they could break legal C programs (very unlikely, and I'll eat my hat if GCC or Clang developers accept this)
  • A future version of C adds some way of doing namespacing, and people start using it for intrinsics

All extremely unlikely. The first one seems like the only one that doesn't sound like science fiction to me, and if that happens we'd have other problems anyway (such intrinsics may use function overloading and other features Rust doesn't have).

Contributor

alexreg commented on Mar 6, 2018

It is debatable that those underscores are actually part of the name. They only have one because C has no modules and namespacing, AFAICT.

This. The whole point is that the underscore-leading names were chosen so as to specifically not clash with user-defined functions. Which means they should never be using non-underscore names. It's against well-established C conventions. Hence, we should just rename them to follow Rust conventions, with no real chance there will be any name clash in the future, providing the vendors stay sane and respect C conventions.

Member

Author

alexcrichton commented on Mar 6, 2018

@Centril "probability seems very very low" is what I would say as well, but we're talking about stability of functions in the standard library, so "low probability" won't cut it unfortunately.

@rkruppe I definitely agree, yeah, but "extremely unlikely" to me says "follow the vendor spec to the letter and we can figure out ergonomics later".

Member

Author

alexcrichton commented on Mar 6, 2018

Another point worth mentioning for staying exactly to the upstream spec is that I believe it actually boosts learnability. You'll have instant familiarity with any SIMD/intrinsic code written in C, of which there's already quite a lot!

If we stray from the beaten path then we'll have to have a section of the documentation which is very clear about defining the mappings between intrinsic names and what we actually expose in Rust.

Contributor

pythoneer commented on Mar 6, 2018

I don't think renaming (removing the leading underscore or any other alteration) is useful. This is simply not the goal and only introduces pain points. I cannot think of a reason other than "I like that more" to justify it. It only introduces the possibility of naming clashes, and "very very unlikely" is not convincing because we can prevent this 100% by not doing it at all.

I think the best choice is to follow the vendor naming scheme as closely as possible, and I think we should even break compatibility if we ever introduce an error in the "public API", rather than doing some renaming like _mm_intr_a to _mm_intr_a2 and starting to diverge from the exact naming scheme introduced by the vendor.

Contributor

nox commented on Mar 6, 2018

@alexcrichton But as @rkruppe said, removing the leading underscore isn't about ergonomics, it's about not porting C defects to Rust blindly.

Contributor

nox commented on Mar 6, 2018

Sorry for the double post, but I also want to add that arguing that a vendor may release an unprefixed intrinsic with the same name as a prefixed one is to me as hypothetical as arguing that bool may not be a single byte on some platform we would like to support.

Contributor

pythoneer commented on Mar 6, 2018

@nox but why stop at the _? We could also fully rename the functions, turning ps and pd into f32 and f64, which would be "more Rust". It's somewhat arbitrary to remove just the leading underscore. We could argue back and forth about what is ergonomics and what isn't, but I don't think there is a clear line that everybody would agree on.

Contributor

nox commented on Mar 6, 2018

@pythoneer Because the name is what the vendor decided, with a leading underscore because of nondescript limitations of C.

Contributor

pythoneer commented on Mar 6, 2018

@nox and the explicit goal of stdsimd is to expose this (however defective) vendor-defined interface.

Contributor

alexreg commented on Mar 6, 2018

@nox and the explicit goal of stdsimd is to expose this (however defective) vendor-defined interface.

Interface, sure, but not necessarily the naming conventions!

Contributor

gnzlbg commented on May 8, 2018

edited

I don't know if it's too late to still tune things here, but the original RFC had two features that were changed during the discussion over there:

  • the submitted RFC put all intrinsics in std::arch::*, the revised RFC in std::arch::{arch_name}.
  • the submitted RFC used is_feature_detected! for run-time feature detection, the revised RFC uses is_{arch_name}_feature_detected!

The RFC was accepted before those changes were made. The changes were made in the RFC at the end of February, implemented at the beginning of March, and the FCP went through mid April. Right now we have ~2 months of experience with these changes.

In any case, going through the RFC, I cannot pinpoint any concrete argument about why:

  • the intrinsics of each architecture should be in a different std::arch::{arch_name} module,
  • the architecture name should be part of the is_..._feature_detected! macros.

In particular, std::arch only contains one single module, the one of the current architecture, and that's it. Also, there is only one is_..._feature_detected! macro re-exported, the one of the current architecture.

These last-minute changes make it more painful than necessary to write code even for x86, where one has to:

#[target_feature(enable = "sse3")]
unsafe fn foo() {
    #[cfg(target_arch = "x86")] use core::arch::x86::*;
    #[cfg(target_arch = "x86_64")] use core::arch::x86_64::*;
    /* ... */
}

all over the place, or at the top level, to avoid having to do this all over the place. Things don't get better when targeting multiple architectures. What before was horrible:

#[cfg_attr(any(target_arch = "x86", target_arch = "x86_64"), target_feature(enable = "sse4.2"))]
#[cfg_attr(any(target_arch = "arm", target_arch = "aarch64"), target_feature(enable = "neon"))]
unsafe fn foo() {
    use core::arch::*;

    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))] {
        if is_feature_detected!("avx2") { ... } else { ... }
    }
    #[cfg(any(target_arch = "arm", target_arch = "aarch64"))] {
        if is_feature_detected!("crypto") { ... } else { ... }
    }
}

now is worse:

#[cfg_attr(any(target_arch = "x86", target_arch = "x86_64"), target_feature(enable = "sse4.2"))]
#[cfg_attr(any(target_arch = "arm", target_arch = "aarch64"), target_feature(enable = "neon"))]
unsafe fn foo() {
    #[cfg(target_arch = "x86")] use core::arch::x86::*;
    #[cfg(target_arch = "x86_64")] use core::arch::x86_64::*;
    #[cfg(target_arch = "arm")] use core::arch::arm::*;
    #[cfg(target_arch = "aarch64")] use core::arch::aarch64::*;

    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))] {
        if is_x86_feature_detected!("avx2") { ... } else { ... }
    }
    #[cfg(target_arch = "arm")] {
        if is_arm_feature_detected!("crypto") { ... } else { ... }
    }
    #[cfg(target_arch = "aarch64")] {
        if is_aarch64_feature_detected!("crypto") { ... } else { ... }
    }
}

This is particularly worrying if we want to add new "feature sets" for ergonomics like simd128 and simd256 since before the changes the above would just become:

#[target_feature(enable = "simd128")]
unsafe fn foo() {
    use core::arch::*;
    if is_feature_detected!("crypto") { ... } else { ... }
}

I remember that they sounded like a potentially good idea to me back then, so I did not give them more thought (I was more in the "I want SIMD now" mood). But now that the love story has faded and I've had the chance to use them a couple of times, I've clashed against them every single time:

Anyways, can somebody summarize why making those two changes was a good idea?

In particular for the first change of putting the intrinsics in std::arch::{arch_name}, AFAIK we are never going to add more modules to std::arch because that would mean that the current code is being compiled for two archs at the same time, and in that case, one arch shouldn't be able to access the intrinsics of the other anyways. For the run-time feature detection macros, the benefits are smaller (but still there), since each arch has different intrinsics. But one idiom I would like to use is:

#[cfg(target_arch = "arm")]
#[target_feature(enable = "simd128")]
unsafe fn bar() { ... }

#[cfg(target_arch = "aarch64")]
#[target_feature(enable = "simd128")]
unsafe fn bar() { ... }

#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "simd128")]
unsafe fn bar() { ... }

fn foo() {
    if is_feature_detected!("simd128") { bar() } else { fallback() }
}

and the named macros wouldn't allow that.


There are two ways of fixing this in a backwards compatible way:

  • re-exporting all of std::arch::{arch_name}::* via, e.g., std::arch::current::*
  • adding a is_feature_detected!("...") macro that dispatches to the named ones depending on the architecture.

So I don't think we should block landing this on these ergonomic issues. In any case, I don't feel I understand the real reasons behind the change, so maybe adding these conveniences defeats their purpose.
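The two backwards-compatible conveniences listed above can be approximated in user code today. The following is only a sketch under stated assumptions: `current` and `is_feature_detected!` are hypothetical names taken from the proposal, not standard library items, and only x86, x86_64 and AArch64 are covered because their detection macros are available on stable.

```rust
/// Hypothetical stand-in for the proposed `std::arch::current` module:
/// re-exports the intrinsics of whichever architecture is being compiled.
pub mod current {
    #[cfg(target_arch = "x86")]
    pub use std::arch::x86::*;
    #[cfg(target_arch = "x86_64")]
    pub use std::arch::x86_64::*;
    #[cfg(target_arch = "aarch64")]
    pub use std::arch::aarch64::*;
}

/// Hypothetical unified macro that forwards to the architecture-named one.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
macro_rules! is_feature_detected {
    ($feature:tt) => {
        is_x86_feature_detected!($feature)
    };
}

#[cfg(target_arch = "aarch64")]
macro_rules! is_feature_detected {
    ($feature:tt) => {
        std::arch::is_aarch64_feature_detected!($feature)
    };
}

/// Feature names remain architecture-specific, so callers still need a
/// `cfg` to pick a name that exists on the current architecture.
pub fn baseline_simd_detected() -> bool {
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        is_feature_detected!("sse2")
    }
    #[cfg(target_arch = "aarch64")]
    {
        is_feature_detected!("neon")
    }
    #[cfg(not(any(target_arch = "x86", target_arch = "x86_64", target_arch = "aarch64")))]
    {
        false
    }
}
```

The last function shows the limit of the convenience: without portable feature names such as the hypothetical simd128, the unified macro saves little typing.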


cc @alexcrichton @rkruppe @eddyb @hsivonen @BurntSushi @Ericson2314 (those who had opinions about this in the RFC)

Member

Author

alexcrichton commented on May 8, 2018

@gnzlbg this was something I personally forgot about in the original RFC. In the standard library, anything that isn't portable is currently required, as a matter of style, to show the non-portable part in the path you use to reach it. For example, Windows-specific functionality lives at std::os::windows. Following suit for SIMD, it was natural to place architecture-specific intrinsics in submodules of std::arch, as a warning that what you're using is not portable and is specific to one platform.

The name of the macro follows the same rationale: it ensures that you aren't tricked into thinking it can be invoked in a portable context, and instead states explicitly that it's not portable.
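As a sketch of the convention being referenced, here is how platform-specific functionality already shows up in import paths elsewhere in the standard library. The function name `raw_stdin_handle` is hypothetical; the traits are the existing `std::os::unix::io::AsRawFd` and `std::os::windows::io::AsRawHandle`.

```rust
/// Unix: the non-portable part (`unix`) is visible in the import path.
#[cfg(unix)]
pub fn raw_stdin_handle() -> i64 {
    use std::os::unix::io::AsRawFd;
    std::io::stdin().as_raw_fd() as i64
}

/// Windows: the same idea, with `windows` in the path.
#[cfg(windows)]
pub fn raw_stdin_handle() -> i64 {
    use std::os::windows::io::AsRawHandle;
    std::io::stdin().as_raw_handle() as i64
}
```

The `std::arch::x86_64` module and `is_x86_feature_detected!` apply the same rule to CPU architectures instead of operating systems.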

Contributor

parched commented on May 9, 2018

In the standard library, anything that isn't portable is currently required, as a matter of style, to show the non-portable part in the path you use to reach it. For example, Windows-specific functionality lives at std::os::windows. Following suit for SIMD, it was natural to place architecture-specific intrinsics in submodules of std::arch, as a warning that what you're using is not portable and is specific to one platform.

Is this something that will be covered with the new portability lint? Also, by that rationale, should everything in std::arch be in target feature submodules?

Member

Author

alexcrichton commented on May 9, 2018

@parched ideally, yes! If that exists we could perhaps consider moving everything wholesale to different modules.

Contributor

gnzlbg commented on May 9, 2018

we could perhaps consider moving everything wholesale to different modules.

For x86/x86_64 this should be easily doable since we already do this internally in stdsimd. For other platforms we can do this on a best-effort basis.

Contributor

vks commented on May 23, 2018

core::simd::FromBits still points to this issue. Shouldn't it point to an open issue?

Contributor

gnzlbg commented on May 29, 2018

So should we do the changes? (add is_x86_64_feature_detected, expose the feature submodules instead of all intrinsics directly, ...) We don't have much time to do this if we want to, and I could do this on Friday this week.

Member

Author

alexcrichton commented on May 30, 2018

Er sorry I misread, I think. I do not think we should change anything. Perhaps one day intrinsics can live directly in std::arch and be easier to use with the portability lint, but we don't have the portability lint.

Is there any word on when we can stabilize intrinsics like https://doc.rust-lang.org/core/arch/x86_64/fn.cmpxchg16b.html ?
I am running into some issues implementing some lockfree algorithms without it.

Contributor

comex commented on Aug 7, 2020

Would stabilizing AtomicU128 (theoretically tracked in #32976) satisfy your use case, or is there some reason you specifically need the x86 intrinsic?

xacrimon commented on Aug 7, 2020

edited

That would do it as long as it has weak compare-and-exchange or compare-and-swap. I really just need a 128-bit compare-and-swap to fit a pointer and a refcount. How is that implemented on archs like SPARC and PPC that don't support it that easily? LL/SC?
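The use case described here can be sketched with the x86_64 intrinsic linked earlier in the thread. This assumes a toolchain where `std::arch::x86_64::cmpxchg16b` is usable (it was unstable when this was written). `Slot`, `pack`, `unpack` and `compare_exchange_128` are hypothetical names, and the demo is single-threaded; a real lock-free structure would also need a sound `Sync` story.

```rust
use std::cell::UnsafeCell;
use std::sync::atomic::Ordering;

/// 16-byte aligned storage; `cmpxchg16b` requires this alignment.
#[repr(align(16))]
pub struct Slot(pub UnsafeCell<u128>);

impl Slot {
    pub fn new(value: u128) -> Self {
        Slot(UnsafeCell::new(value))
    }
}

/// Packs a 64-bit pointer value and a 64-bit refcount into one word.
pub fn pack(ptr_bits: u64, refcount: u64) -> u128 {
    ((refcount as u128) << 64) | (ptr_bits as u128)
}

/// Inverse of `pack`: returns (pointer bits, refcount).
pub fn unpack(word: u128) -> (u64, u64) {
    (word as u64, (word >> 64) as u64)
}

/// Returns the previous value, or None if 128-bit CAS is unavailable.
/// The swap succeeded exactly when the returned value equals `old`.
#[cfg(target_arch = "x86_64")]
pub fn compare_exchange_128(slot: &Slot, old: u128, new: u128) -> Option<u128> {
    if !is_x86_feature_detected!("cmpxchg16b") {
        return None;
    }
    // SAFETY: the feature was detected above, and the pointer is valid
    // and 16-byte aligned because of `#[repr(align(16))]`.
    let previous = unsafe {
        std::arch::x86_64::cmpxchg16b(slot.0.get(), old, new, Ordering::SeqCst, Ordering::SeqCst)
    };
    Some(previous)
}

/// Stand-in for other architectures in this sketch.
#[cfg(not(target_arch = "x86_64"))]
pub fn compare_exchange_128(_slot: &Slot, _old: u128, _new: u128) -> Option<u128> {
    None
}
```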

Contributor

Amanieu commented on Aug 7, 2020

AtomicU128 will only be available on targets that support it. AFAIK that's only x86_64 and AArch64.

Ah, it could theoretically be implemented with double-width LL/SC on other architectures, I think. Is that a possible thing to do?

Contributor

Amanieu commented on Aug 7, 2020

Only AArch64 has 2x64-bit LL/SC.

Contributor

aloucks commented on Aug 28, 2020

edited

Are the half-precision x86/64 functions intended to remain unstable? The compiler errors and the documentation point to this issue, but it was closed quite a while ago along with the stabilization PR.

EDIT: I also noticed that the f16c feature isn't reported in CARGO_CFG_TARGET_FEATURE in the stable compiler when it's explicitly requested: RUSTFLAGS="-C target-cpu=x86-64 -C target-feature=+sse3,+sse4.1,+avx,+f16c" cargo test. However, it does show up in nightly.
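For anyone wanting to reproduce this observation, here is a minimal sketch of how a build script might inspect the variable. It relies on Cargo passing `CARGO_CFG_TARGET_FEATURE` to build scripts as a comma-separated list; the helper names are hypothetical.

```rust
use std::env;

/// Returns true if `feature` appears in a comma-separated feature list
/// such as "sse3,sse4.1,avx,f16c".
pub fn has_target_feature(cfg_value: &str, feature: &str) -> bool {
    cfg_value.split(',').any(|name| name.trim() == feature)
}

/// Meant to be called from build.rs; outside a build script the variable
/// is normally unset and this returns false.
pub fn f16c_enabled_for_target() -> bool {
    env::var("CARGO_CFG_TARGET_FEATURE")
        .map(|value| has_target_feature(&value, "f16c"))
        .unwrap_or(false)
}
```

Matching whole list entries rather than substrings avoids treating a longer feature name as a hit for a shorter one.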

Contributor

Amanieu commented on Sep 1, 2020

I think someone just needs to send a stabilization PR for that feature. But first we need to ensure that all the intrinsics covered by the f16c feature are properly implemented.

Any updates on stabilizing the F16C instructions?

Contributor

Amanieu commented 22 days ago

@novacrazy I don't think there's anything blocking F16C intrinsics, feel free to send a stabilization PR for them.

Member

frewsxcv commented 15 days ago

There are four occurrences of #[unstable(feature = "stdsimd", issue = "48556")] in the codebase (this issue number is 48556). This seems to conflict with the fact that this issue is closed. Should these occurrences be referencing a different issue? See also: #76412

Contributor

Amanieu commented 9 days ago

I'm going to reopen this issue. SIMD was only stabilized on x86/x86_64, not on other architectures.

