

Implement dynamic byte-swizzle prototype by workingjubilee · Pull Request #334 ·...
source link: https://github.com/rust-lang/portable-simd/pull/334

Conversation
Contributor
This is meant to be an example implementation to test a Rust intrinsic against; that intrinsic will eventually replace it. The interface is fairly direct and doesn't address more nuanced or interesting permutations one can do, never mind on types other than bytes.
The ultimate goal is for direct LLVM support for this.
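As a rough illustration of the operation being prototyped, here is a minimal scalar sketch, assuming the "out of bounds selects 0" semantics from the doc comment quoted later in this thread. It uses plain arrays rather than the crate's `Simd` type, and the name `swizzle_dyn_scalar` is hypothetical, not the PR's API.

```rust
/// Scalar model of a dynamic byte swizzle (hypothetical helper, not the PR's code):
/// `out[i] = bytes[idxs[i]]`, with any out-of-range index producing 0.
pub fn swizzle_dyn_scalar<const N: usize>(bytes: [u8; N], idxs: [u8; N]) -> [u8; N] {
    let mut out = [0u8; N];
    for (o, &i) in out.iter_mut().zip(idxs.iter()) {
        // `get` returns None for an out-of-bounds index, which maps to 0.
        *o = bytes.get(usize::from(i)).copied().unwrap_or(0);
    }
    out
}
```

A model like this is the kind of thing a property test can compare a vectorized implementation against, lane by lane.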
Contributor
Author
The API isn't perfect as we will want to figure out something so that, if we do indeed want to support this function beyond an N of 128~256, we can have this use 16-bit indices instead like RVV allows. However, that's not necessarily the biggest concern, and this is too important as a functionality to worry about "what if it needs two functions, one for each index type?" Then it needs two functions! We will endure.
Are 16 bit indices as universal as 8 bit?
{
/// Swizzle a vector of bytes according to the index vector.
/// Indices within range select the appropriate byte.
/// Indices "out of bounds" instead select 0.
Maybe add a note that this really needs build-std to work correctly
Contributor
Author
Well, it kinda doesn't, does it? The vectors are generic, so this will get instantiated at compile time. What it needs is to be combined with target_feature configuration, either dynamic multiversioning or compile-time versioning or whatnot.
The cfgs will depend on the features std is built with, unfortunately
Contributor
Author
Oooh good point hmmmmmm...
...I guess I could make this dynamically multiversioned lol.
Contributor
Author
That part, at least, will be fixed upon promoting this into an intrinsic.
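To make the two options in this exchange concrete, here is a hedged sketch contrasting a compile-time `cfg!` check with dynamic multiversioning via runtime feature detection. It is not the PR's code: the names `swizzle16`, `swizzle16_scalar`, `swizzle16_ssse3`, and `COMPILED_WITH_SSSE3` are hypothetical, and it uses `std::arch` intrinsics on 16-byte arrays. In a user crate the `cfg!` reflects that crate's own build flags; for code compiled as part of std it would reflect how std was built, which is the concern raised above.

```rust
/// Compile-time versioning: fixed when this crate is compiled.
pub const COMPILED_WITH_SSSE3: bool = cfg!(target_feature = "ssse3");

/// Portable fallback: out-of-range indices select 0.
pub fn swizzle16_scalar(bytes: [u8; 16], idxs: [u8; 16]) -> [u8; 16] {
    let mut out = [0u8; 16];
    for (o, &i) in out.iter_mut().zip(idxs.iter()) {
        *o = bytes.get(usize::from(i)).copied().unwrap_or(0);
    }
    out
}

#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
#[target_feature(enable = "ssse3")]
unsafe fn swizzle16_ssse3(bytes: [u8; 16], idxs: [u8; 16]) -> [u8; 16] {
    #[cfg(target_arch = "x86")]
    use std::arch::x86::*;
    #[cfg(target_arch = "x86_64")]
    use std::arch::x86_64::*;
    // SAFETY: the caller guarantees SSSE3; the pointers cover 16 valid bytes.
    unsafe {
        let b = _mm_loadu_si128(bytes.as_ptr() as *const __m128i);
        let i = _mm_loadu_si128(idxs.as_ptr() as *const __m128i);
        // PSHUFB zeroes a lane only when bit 7 of its index is set, so force
        // bit 7 on for indices 16..=127 (indices >= 128 already have it).
        let oob = _mm_cmpgt_epi8(i, _mm_set1_epi8(15));
        let r = _mm_shuffle_epi8(b, _mm_or_si128(i, oob));
        let mut out = [0u8; 16];
        _mm_storeu_si128(out.as_mut_ptr() as *mut __m128i, r);
        out
    }
}

/// Dynamic multiversioning: pick an implementation at run time.
pub fn swizzle16(bytes: [u8; 16], idxs: [u8; 16]) -> [u8; 16] {
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        if std::is_x86_feature_detected!("ssse3") {
            // SAFETY: SSSE3 support was just detected.
            return unsafe { swizzle16_ssse3(bytes, idxs) };
        }
    }
    swizzle16_scalar(bytes, idxs)
}
```

The runtime check costs a branch per call in this naive form; a real multiversioned design would likely hoist the detection out of hot loops.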
Contributor
Author
Hmm. Honestly, 16-bit indices are only used, afaik, by RISC-V's Vector extension. So I think saying "nah, use a target intrinsic for that" would be fair. It's mostly "if we find a way to seamlessly transition to larger indices, that would be cool".
afaict avx512 supports 16-bit indexes for SimpleV supports 8/16/32/64-bit indexes iirc.
Contributor
Author
The AVX512 instruction, VPERMI2{W,D,Q} doesn't really matter to the abstract operation we're defining because that instruction uses indices that have the same size as the type, because it overwrites the index vector with the results (subject to the mask). And that's to be expected for AVX512F because without the AVX512BW extension, you can't do byte-level operations at all. But we don't really care about that because destructive update on the indices is a pretty unusual pattern, a form using that instruction should be yielded by combining it with a masked store (or the intrinsic, obv), and what is actually relevant for what we're doing is whether a u8 will be big enough.
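The "is a u8 big enough" question comes down to how many lanes an index can address. A small sketch, with a hypothetical `swizzle_dyn_u16` that is not part of the PR, shows what a 16-bit-index variant might look like in scalar form and where the 8-bit limit falls.

```rust
/// Number of distinct lanes each index width can address.
pub const MAX_LANES_U8: usize = u8::MAX as usize + 1;
pub const MAX_LANES_U16: usize = u16::MAX as usize + 1;

/// Hypothetical wide-index variant (an assumption for illustration only):
/// same out-of-range-is-0 semantics, but indices are u16, so lanes past 255
/// become reachable.
pub fn swizzle_dyn_u16<const N: usize>(bytes: [u8; N], idxs: [u16; N]) -> [u8; N] {
    let mut out = [0u8; N];
    for (o, &i) in out.iter_mut().zip(idxs.iter()) {
        // Out-of-range indices fall through to 0.
        *o = bytes.get(usize::from(i)).copied().unwrap_or(0);
    }
    out
}
```

With `u8` indices, a vector of more than 256 bytes would have lanes no index could select, which is why the wider form would only matter for very long vectors.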
As funny as it would be to package a CPUID implementation to handle the which-AVX-version stuff, I think I am gonna skip it for now. I could have completely skipped implementing this "up here" at the "tip", but I wanted to have full testing in our suite against proptest, first, which is very good at finding counterfactuals.
// This is ordering sensitive, and LLVM will order these how you put them.
// Most AVX2 impls use ~5 "ports", and only 1 or 2 are capable of permutes.
// But the "compose" step will lower to ops that can also use at least 1 other port.
// So this tries to break up permutes so composition flows through "open" ports.
// Comparative benches should be done on multiple AVX2 CPUs before reordering this
Contributor
Author
Having all this commentary here isn't strictly necessary but I'm going to transplant more-or-less the same remarks into rustc (and maybe into LLVM???) later, so writing this down matters.
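The interleaving those code comments describe can be sketched with `std::arch` intrinsics as below. This is an illustration under assumptions, not the code in `swizzle_dyn.rs`: the function names are hypothetical, it works on 32-byte arrays, and whether this statement order actually helps port pressure is exactly what the comments say should be benchmarked.

```rust
/// Scalar reference: out-of-range indices select 0.
pub fn swizzle32_scalar(bytes: [u8; 32], idxs: [u8; 32]) -> [u8; 32] {
    let mut out = [0u8; 32];
    for (o, &i) in out.iter_mut().zip(idxs.iter()) {
        *o = bytes.get(usize::from(i)).copied().unwrap_or(0);
    }
    out
}

#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
unsafe fn swizzle32_avx2(bytes: [u8; 32], idxs: [u8; 32]) -> [u8; 32] {
    use std::arch::x86_64::*;
    // SAFETY: the caller guarantees AVX2; the pointers cover 32 valid bytes.
    unsafe {
        let b = _mm256_loadu_si256(bytes.as_ptr() as *const __m256i);
        let i = _mm256_loadu_si256(idxs.as_ptr() as *const __m256i);
        // Permute 1: duplicate the top half so VPSHUFB's 4-bit per-lane
        // index can reach bytes 16..=31.
        let hihi = _mm256_permute2x128_si256::<0x11>(b, b);
        let hi_shuf = _mm256_shuffle_epi8(hihi, i);
        // Compose 1: keep lanes whose index is < 32, zero the rest. The
        // compare is signed, but indices >= 128 were already zeroed by VPSHUFB.
        let lt32 = _mm256_cmpgt_epi8(_mm256_set1_epi8(32), i);
        let compose = _mm256_and_si256(hi_shuf, lt32);
        // Permute 2: duplicate the bottom half for bytes 0..=15.
        let lolo = _mm256_permute2x128_si256::<0x00>(b, b);
        let lo_shuf = _mm256_shuffle_epi8(lolo, i);
        // Compose 2: lanes with index < 16 take the low-half result.
        let lt16 = _mm256_cmpgt_epi8(_mm256_set1_epi8(16), i);
        let r = _mm256_blendv_epi8(compose, lo_shuf, lt16);
        let mut out = [0u8; 32];
        _mm256_storeu_si256(out.as_mut_ptr() as *mut __m256i, r);
        out
    }
}

/// Uses the AVX2 path when it is detected at run time, else the scalar one.
pub fn swizzle32(bytes: [u8; 32], idxs: [u8; 32]) -> [u8; 32] {
    #[cfg(target_arch = "x86_64")]
    {
        if std::is_x86_feature_detected!("avx2") {
            // SAFETY: AVX2 support was just detected.
            return unsafe { swizzle32_avx2(bytes, idxs) };
        }
    }
    swizzle32_scalar(bytes, idxs)
}
```

Each permute is followed by a compare and a compose step rather than issuing both permutes back to back, which mirrors the "break up permutes" idea in the comments.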
crates/core_simd/src/swizzle_dyn.rs