Tracking issue for stable SIMD in Rust
source link: https://github.com/rust-lang/rust/issues/48556
Tracking issue for stable SIMD in Rust #48556
alexcrichton opened this issue on Feb 27, 2018 · 73 comments
Comments
Member
scottmcm commented on Feb 27, 2018
My one request for the bikeshed (which the current PR already does and may be obvious, but I'll write it down anyway): please ensure they're not all in the same module as things like `undefined_behaviour` and `[un]likely`, so that those rust-defined things don't get lost in the sea of vendor intrinsics.
Member
cuviper commented on Feb 27, 2018
What will be the story for external LLVM? (lacking `MCSubtargetInfo::getFeatureTable()`)
Member
Author
alexcrichton commented on Feb 27, 2018
@scottmcm certainly! I'd imagine that if we ever stabilized Rust-related intrinsics they'd not go into the same module (they probably wouldn't even be platform-specific).
@cuviper currently it's an unresolved question, so if it doesn't get fixed it means that using an external LLVM would basically mean that `#[cfg(target_feature = ...)]` would always expand to `false` (or the equivalent thereof).
> I'd imagine that if we ever stabilized Rust-related intrinsics they'd not go into the same module (they probably wouldn't even be platform-specific).

One option raised in the RFC thread (that I personally quite like) was stabilizing `std::intrinsics` (only the module), keeping the stable Rust intrinsics in that module (they can already be imported from that location due to a long-standing bug in stability checking) and putting these new platform-specific intrinsics in submodules. IIUC this would also satisfy @scottmcm's request.
To be explicit, under that plan the rustdoc page for `std::intrinsics` would look like this:
Modules
- `x86_64`
- `arm`

Functions
- `copy`
- `copy_nonoverlapping`
- `drop_in_place`
Member
Author
alexcrichton commented on Mar 3, 2018
Another naming idea I've just had. Right now the feature detection macro is `is_target_feature_enabled!`, but since it's so target-specific it may be more apt to call it `is_x86_target_feature_enabled!`. This'll make it a pain to call on x86/x86_64 though, which could be a bummer.
Contributor
nox commented on Mar 5, 2018
Why keep all the leading underscores for the intrinsics? Surely even if we keep the same names as what the vendors chose, we can still remove those signs, right?
Member
BurntSushi commented on Mar 5, 2018
The point is to expose vendor APIs. The vendor APIs have underscores. Therefore, ours do too.
Contributor
nox commented on Mar 5, 2018
It is debatable that those underscores are actually part of the name. They only have one because C has no modules and namespacing, AFAICT.
Contributor
nox commented on Mar 5, 2018
I would be happy to drop the topic if it was discussed at length already, but I couldn't find any discussion specific to the leading underscores.
Member
BurntSushi commented on Mar 5, 2018
@nox rust-lang/stdarch#212 --- My comment above is basically a summary of that. I probably won't say much else on the topic.
Contributor
Centril commented on Mar 5, 2018
@nox, @BurntSushi Continuing the discussion from there... since it hasn't been mentioned before:
Leading `_` for identifiers in Rust often means "this is not important", so just taking the names directly from the vendors may wrongly give this impression.
Member
Author
alexcrichton commented on Mar 6, 2018
@nox @Centril the recurring theme of stabilizing SIMD in Rust is "it's not our job to make this nice". Any attempt made to make SIMD different than what the vendors define has ended with uncomfortable questions and intrinsics that end up being left out. To that end the driving force for SIMD intrinsics in Rust is to get anything compiling on stable.
Crates like `faster` are explicitly targeted at making SIMD usage easy, fast, and ergonomic. The standard library's intrinsics are not intended to be widely used, nor used for "intro level" problems. Leveraging the SIMD intrinsics is quite unsafe (due to target feature detection/selection) and can come at a high cost if used incorrectly.
Overall, again, the goal is to not enable ergonomic SIMD in Rust right now, but any SIMD in Rust. Following exactly what the vendors say is the easiest way for us to guarantee that all Rust programs will always have access to vendor intrinsics.
Contributor
hanna-kruppe commented on Mar 6, 2018
I agree that the leading underscores are a C artifact, not a vendor choice (the C standard reserves identifiers of this form, so that's what C compilers use for intrinsics). Removing them is neither "trying to make it nicer/more ergonomic" (it's really only a minor aesthetic difference) nor involves any per-intrinsic judgement calls. It's a dead simple mechanical translation for a difference in language rules, almost as much as `__m128 _mm_foo();` is mechanically translated to `fn _mm_foo() -> __m128;`.
Member
Author
alexcrichton commented on Mar 6, 2018
@rkruppe do we have a rock solid guarantee that no vendor will ever in the future add the same name without underscores?
Contributor
Centril commented on Mar 6, 2018
> @rkruppe do we have a rock solid guarantee that no vendor will ever in the future add the same name without underscores?

Can't speak for CPU vendors, but the probability seems very, very low. Why would they add an intrinsic where the difference is only an underscore? Further, as Rust's influence grows, they might not do this simply because of Rust.
Contributor
hanna-kruppe commented on Mar 6, 2018
A name like `mm_foo` (no leading underscore at all) is not reserved in the C language, so it can't be used for compiler-supplied extensions without breaking legal C programs. There are a few theoretical possibilities for a vendor to nevertheless create intrinsics without leading underscores:
- they could expose it only in C++ (with namespacing) -- or, for that matter, another language that isn't C
- they could break legal C programs (very unlikely, and I'll eat my hat if GCC or Clang developers accept this)
- a future version of C adds some way of doing namespacing, and people start using it for intrinsics
All extremely unlikely. The first one seems like the only one that doesn't sound like science fiction to me, and if that happens we'd have other problems anyway (such intrinsics may use function overloading and other features Rust doesn't have).
Contributor
alexreg commented on Mar 6, 2018
> It is debatable that those underscores are actually part of the name. They only have one because C has no modules and namespacing, AFAICT.

This. The whole point is that the underscore-leading names were chosen specifically so as not to clash with user-defined functions, which means vendors should never use non-underscore names; that would break well-established C conventions. Hence, we should just rename them to follow Rust conventions, with no real chance of a name clash in the future, provided the vendors stay sane and respect C conventions.
Member
Author
alexcrichton commented on Mar 6, 2018
@Centril "probability seems very very low" is what I would say as well, but we're talking about stability of functions in the standard library, so "low probability" won't cut it unfortunately.
@rkruppe I definitely agree, yeah, but "extremely unlikely" to me says "follow the vendor spec to the letter and we can figure out ergonomics later".
Member
Author
alexcrichton commented on Mar 6, 2018
Another point worth mentioning for staying exactly to the upstream spec is that I believe it actually boosts learnability. You'll have instant familiarity with any SIMD/intrinsic code written in C, of which there's already quite a lot!
If we stray from the beaten path then we'll have to have a section of the documentation which is very clear about defining the mappings between intrinsic names and what we actually expose in Rust.
Contributor
pythoneer commented on Mar 6, 2018
I don't think renaming (removing the leading underscore or any other alteration) is useful. This is simply not the goal and only introduces pain points. I cannot think of a reason other than "I like that more" to justify it. It only introduces the possibility of naming clashes, and "very very unlikely" is not convincing when we can prevent this 100% by not doing it at all.
I think it's the best choice to follow the vendor naming schema as closely as possible, and I think we should even break compatibility if we ever introduce an error in the "public API", rather than doing some renaming like `_mm_intr_a` to `_mm_intr_a2` and starting to diverge from the exact naming schema introduced by the vendor.
Contributor
nox commented on Mar 6, 2018
@alexcrichton But as @rkruppe said, removing the leading underscore isn't about ergonomics, it's about not porting C defects to Rust blindly.
Contributor
nox commented on Mar 6, 2018
Sorry for the double post, but I also want to add that arguing that a vendor may release an unprefixed intrinsic with the same name as a prefixed one is to me as hypothetical as arguing that `bool` may not be a single byte on some platform we would like to support.
Contributor
pythoneer commented on Mar 6, 2018
@nox but why stop at the `_`? We could also fully rename the functions, turning `ps` and `pd` into `f32` and `f64`, which would be something "more Rust". It's somewhat arbitrary to just remove the leading underscore. And we could argue back and forth about what is ergonomics and what isn't, but I don't think there is a clear line to draw that everybody agrees on.
Contributor
nox commented on Mar 6, 2018
@pythoneer Because the name is what the vendor decided, with a leading underscore because of nondescript limitations of C.
Contributor
pythoneer commented on Mar 6, 2018
@nox and the explicit goal of stdsimd is to expose this (however defect) vendor defined interface.
Contributor
alexreg commented on Mar 6, 2018
> @nox and the explicit goal of stdsimd is to expose this (however defect) vendor defined interface.

Interface, sure, but not necessarily the naming conventions!
I don't know if it's too late to still tune things here, but the original RFC had two features that were changed during the discussion over there:
- the submitted RFC put all intrinsics in `std::arch::*`, the revised RFC in `std::arch::{arch_name}`;
- the submitted RFC used `is_feature_detected!` for run-time feature detection, the revised RFC uses `is_{arch_name}_feature_detected!`.
The RFC was accepted before those changes were made. The changes were made in the RFC at the end of February, implemented at the beginning of March, and the FCP went through mid-April. Right now we have ~2 months of experience with these changes.
In any case, going through the RFC, I cannot pinpoint any concrete argument about why:
- the intrinsics of each architecture should be in a different `std::arch::{arch_name}` module,
- the architecture name should be part of the `is_..._feature_detected!` macros.
In particular, `std::arch` only contains one single module, the one of the current architecture, and that's it. Also, there is only one `is_..._feature_detected!` macro re-exported, the one of the current architecture.
These last-minute changes make it more painful than necessary to write code even for `x86`, where one has to write:
```rust
#[target_feature(enable = "sse3")]
unsafe fn foo() {
    #[cfg(target_arch = "x86")]
    use core::arch::x86::*;
    #[cfg(target_arch = "x86_64")]
    use core::arch::x86_64::*;
    /* ... */
}
```
all over the place, or at the top level, to avoid having to do this all over the place. Things don't get better when targeting multiple architectures. What before was horrible:
```rust
#[cfg_attr(any(target_arch = "x86", target_arch = "x86_64"),
           target_feature(enable = "sse4.2"))]
#[cfg_attr(any(target_arch = "arm", target_arch = "aarch64"),
           target_feature(enable = "neon"))]
unsafe fn foo() {
    use core::arch::*;
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        if is_feature_detected!("avx2") { /* ... */ } else { /* ... */ }
    }
    #[cfg(any(target_arch = "arm", target_arch = "aarch64"))]
    {
        if is_feature_detected!("crypto") { /* ... */ } else { /* ... */ }
    }
}
```
now is worse:
```rust
#[cfg_attr(any(target_arch = "x86", target_arch = "x86_64"),
           target_feature(enable = "sse4.2"))]
#[cfg_attr(any(target_arch = "arm", target_arch = "aarch64"),
           target_feature(enable = "neon"))]
unsafe fn foo() {
    #[cfg(target_arch = "x86")]
    use core::arch::x86::*;
    #[cfg(target_arch = "x86_64")]
    use core::arch::x86_64::*;
    #[cfg(target_arch = "arm")]
    use core::arch::arm::*;
    #[cfg(target_arch = "aarch64")]
    use core::arch::aarch64::*;
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        if is_x86_feature_detected!("crypto") { /* ... */ } else { /* ... */ }
    }
    #[cfg(target_arch = "arm")]
    {
        if is_arm_feature_detected!("crypto") { /* ... */ } else { /* ... */ }
    }
    #[cfg(target_arch = "aarch64")]
    {
        if is_aarch64_feature_detected!("crypto") { /* ... */ } else { /* ... */ }
    }
}
```
This is particularly worrying if we want to add new "feature sets" for ergonomics, like `simd128` and `simd256`, since before the changes the above would just become:
```rust
#[target_feature(enable = "simd128")]
unsafe fn foo() {
    use core::arch::*;
    if is_feature_detected!("crypto") { /* ... */ } else { /* ... */ }
}
```
I remember that to me they sounded like a potentially good idea back then, so I did not give them more thought (I was more in the "I want SIMD now" mood). But now that the love story has faded and I've had the chance to use them a couple of times, I've clashed against them every single time:
- inside `coresimd`: in the `std` docs, in the portable vector types, in run-time feature detection, and many more...
- inside `is_sorted`: here, amongst others...
- again last night while porting aobench to Rust SIMD
Anyways, can somebody summarize why those two changes were a good idea?
In particular, for the first change of putting the intrinsics in `std::arch::{arch_name}`: AFAIK we are never going to add more modules to `std::arch`, because that would mean that the current code is being compiled for two archs at the same time, and in that case one arch shouldn't be able to access the intrinsics of the other anyway. For the run-time feature detection macros the benefits are smaller (but still there), since each arch has different intrinsics. But one idiom I would like to use is:
```rust
#[cfg(target_arch = "arm")]
#[target_feature(enable = "simd128")]
unsafe fn bar() { /* ... */ }

#[cfg(target_arch = "aarch64")]
#[target_feature(enable = "simd128")]
unsafe fn bar() { /* ... */ }

#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "simd128")]
unsafe fn bar() { /* ... */ }

fn foo() {
    if is_feature_detected!("simd128") {
        unsafe { bar() }
    } else {
        fallback()
    }
}
```
and the named macros wouldn't allow that.
There are two ways of fixing this in a backwards compatible way:
- re-exporting all of `std::arch::{arch_name}::*` via, e.g., `std::arch::current::*`;
- adding an `is_feature_detected!("...")` macro that dispatches to the named ones depending on the architecture.
So I don't think we should block landing this on these ergonomic issues. In any case, I don't feel I understand the real reasons behind the change, so maybe adding these conveniences defeats their purpose.
cc @alexcrichton @rkruppe @eddyb @hsivonen @BurntSushi @Ericson2314 (those who had opinions about this in the RFC)
Member
Author
alexcrichton commented on May 8, 2018
@gnzlbg this was something I forgot about in the original RFC personally. In the standard library, anything that isn't portable currently stylistically requires the "non-portable part of it" to appear in the path where you `use` it. For example, Windows-specific functionality is at `std::os::windows`. Following suit for SIMD, it was natural to place architecture-specific intrinsics in submodules of `std::arch`, as a warning that what you're using is indeed not portable and specific to only one platform.
The name of the macro was the same rationale, ensuring that you aren't tricked to thinking it can be invoked in a portable context but rather explicitly specifying that it's not portable.
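For reference, the stabilized shape this rationale leads to pairs the architecture-named macro with `#[target_feature]`. A small sketch with illustrative function names (not std APIs):

```rust
// Run-time dispatch between an AVX2 path and a portable fallback.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
#[target_feature(enable = "avx2")]
unsafe fn sum_avx2(xs: &[u32]) -> u32 {
    // The compiler may use AVX2 inside this function; a real kernel would
    // call core::arch intrinsics here.
    xs.iter().sum()
}

fn sum(xs: &[u32]) -> u32 {
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        if is_x86_feature_detected!("avx2") {
            // Safe: we just verified at run time that AVX2 is available.
            return unsafe { sum_avx2(xs) };
        }
    }
    xs.iter().sum() // portable fallback
}

fn main() {
    assert_eq!(sum(&[1, 2, 3, 4]), 10);
}
```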
Contributor
parched commented on May 9, 2018
> In the standard library anything that isn't portable currently stylistically requires the "non-portable part of it" to appear in the path where you `use` it. For example Windows-specific functionality is at `std::os::windows`. Following suit for SIMD, architecture-specific intrinsics were natural to place in submodules of `std::arch` as a warning that what you're using is indeed not portable and specific to only one platform.

Is this something that will be covered by the new portability lint? Also, by that rationale, should everything in `std::arch` be in target-feature submodules?
Member
Author
alexcrichton commented on May 9, 2018
@parched ideally, yes! If that exists we could perhaps consider moving everything wholesale to different modules.
Contributor
gnzlbg commented on May 9, 2018
> we could perhaps consider moving everything wholesale to different modules.
For `x86`/`x86_64` this should be easily doable, since we already do this internally in `stdsimd`. For other platforms we can do this on a best-effort basis.
Contributor
vks commented on May 23, 2018
`core::simd::FromBits` still points to this issue. Shouldn't it point to an open issue?
Contributor
gnzlbg commented on May 29, 2018
So should we make the changes (add `is_x86_64_feature_detected!`, expose the feature submodules instead of all intrinsics directly, ...)? We don't have much time to do this if we want to, and I could do it on Friday this week.
Member
Author
alexcrichton commented on May 30, 2018
Er, sorry, I misread, I think. I do not think we should change anything. Perhaps one day intrinsics can live directly in `std::arch` and be easier to use with the portability lint, but we don't have the portability lint yet.
Is there any word on when we can stabilize intrinsics like https://doc.rust-lang.org/core/arch/x86_64/fn.cmpxchg16b.html?
I am running into some issues implementing some lock-free algorithms without it.
Contributor
comex commented on Aug 7, 2020
Would stabilizing `AtomicU128` (theoretically tracked in #32976) satisfy your use case, or is there some reason you specifically need the x86 intrinsic?
xacrimon commented on Aug 7, 2020 •
That would do it, as long as it has weak compare-and-exchange or compare-and-swap. I really just need a 128-bit compare-and-swap to fit a pointer and refcount. How is that implemented on archs like SPARC and PPC that don't support it that easily? LL/SC?
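The pointer-plus-refcount scheme described here can be illustrated at half width with the stable `AtomicU64` (an illustrative sketch only; the actual use case packs a 64-bit pointer and a 64-bit count, which is exactly what needs `AtomicU128`/`cmpxchg16b`):

```rust
use std::sync::atomic::{AtomicU64, Ordering};

// Pack a 32-bit "pointer" (here just an index) and a 32-bit refcount into
// one word so both can be updated by a single compare-and-swap.
fn pack(index: u32, count: u32) -> u64 {
    ((index as u64) << 32) | count as u64
}

fn unpack(word: u64) -> (u32, u32) {
    ((word >> 32) as u32, word as u32)
}

// Atomically bump the refcount while keeping the index intact, retrying on
// contention; returns the new (index, count) pair.
fn increment_count(cell: &AtomicU64) -> (u32, u32) {
    let mut cur = cell.load(Ordering::Acquire);
    loop {
        let (index, count) = unpack(cur);
        let next = pack(index, count + 1);
        match cell.compare_exchange_weak(cur, next, Ordering::AcqRel, Ordering::Acquire) {
            Ok(_) => return (index, count + 1),
            Err(observed) => cur = observed, // someone else won; retry with the new value
        }
    }
}

fn main() {
    let cell = AtomicU64::new(pack(7, 0));
    assert_eq!(increment_count(&cell), (7, 1));
    assert_eq!(increment_count(&cell), (7, 2));
}
```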
Contributor
Amanieu commented on Aug 7, 2020
`AtomicU128` will only be available on targets that support it. AFAIK that's only x86_64 and AArch64.
Ah, it could theoretically be implemented with double-width LL/SC on other architectures, I think. Is that a possible thing to do?
Contributor
Amanieu commented on Aug 7, 2020
Only AArch64 has 2x64-bit LL/SC.
Are the half-precision x86/x86_64 functions intended to remain unstable? The compiler errors and the documentation point to this issue, but it was closed quite a while ago along with the stabilization PR.
EDIT: I also noticed that the `f16c` feature isn't reported in `CARGO_CFG_TARGET_FEATURE` in the stable compiler when it's explicitly requested: `RUSTFLAGS="-C target-cpu=x86-64 -C target-feature=+sse3,+sse4.1,+avx,+f16c" cargo test`. However, it does show up in nightly.
Contributor
Amanieu commented on Sep 1, 2020
I think someone just needs to send a stabilization PR for that feature. But first we need to ensure that all the intrinsics covered by the `f16c` feature are properly implemented.
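For context on what these intrinsics compute: F16C's `VCVTPH2PS` (exposed as `_mm_cvtph_ps` and friends) converts IEEE 754 binary16 values to binary32. A software sketch of the scalar conversion, assuming the standard bit layouts:

```rust
// Scalar software equivalent of the F16C half -> float conversion.
// Assumes IEEE 754 binary16 input bits and binary32 output.
fn f16_to_f32(h: u16) -> f32 {
    let sign = ((h as u32) & 0x8000) << 16; // sign bit moves to bit 31
    let exp = ((h >> 10) & 0x1f) as u32;    // 5-bit exponent
    let frac = (h & 0x3ff) as u32;          // 10-bit fraction
    let bits = if exp == 0 {
        if frac == 0 {
            sign // signed zero
        } else {
            // Subnormal half: renormalize into a normal f32.
            let p = 31 - frac.leading_zeros(); // index of the top set bit (0..=9)
            sign | ((p + 103) << 23) | ((frac << (23 - p)) & 0x007f_ffff)
        }
    } else if exp == 0x1f {
        sign | 0x7f80_0000 | (frac << 13) // infinity or NaN
    } else {
        sign | ((exp + 112) << 23) | (frac << 13) // normal: rebias 15 -> 127
    };
    f32::from_bits(bits)
}

fn main() {
    assert_eq!(f16_to_f32(0x3C00), 1.0);
    assert_eq!(f16_to_f32(0xC000), -2.0);
    assert!(f16_to_f32(0x7C00).is_infinite());
    assert_eq!(f16_to_f32(0x0001), 2f32.powi(-24)); // smallest subnormal
}
```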
Any updates on stabilizing the F16C instructions?
Contributor
Amanieu commented 22 days ago
@novacrazy I don't think there's anything blocking F16C intrinsics, feel free to send a stabilization PR for them.
Member
frewsxcv commented 15 days ago
There are four occurrences of `#[unstable(feature = "stdsimd", issue = "48556")]` in the codebase (this issue number is 48556). This seems to conflict with the fact that this issue is closed. Should these occurrences reference a different issue? See also: #76412
Contributor
Amanieu commented 9 days ago
I'm going to reopen this issue. SIMD was only stabilized on x86/x86_64, not on other architectures.