Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement special swizzles for masks and remove {to,from}_bitmask_vector #423

Merged
merged 3 commits into from
Jun 5, 2024

Conversation

calebzulawski
Copy link
Member

Bitmask vectors are a bit of a compromise, to allow bitmasks to be useful on vectors with length >64. They are wasteful, since they are Simd<u8, N>, 8x bigger than they need to be in order to work well with generics.

Additionally, in #422, I'm finding that bitmask vector codegen doesn't work for non-powers-of-two, and even with a workaround, seems to have incorrect codegen on a handful of architectures.

In this PR I add an extract swizzle, and implement all of the special swizzles for masks, which I believe allows us to remove bitmask vectors. With a hypothetical 128-element vector, you could do:

let bitmasks: [u64; 2] = [mask.to_bitmask(), mask.extract::<64, 64>().to_bitmask()];

Copy link
Member

@programmerjake programmerjake left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good enough, though I think we should re-add conversions from arbitrary sized Mask from/to integers to make >64 element vectors more ergonomic, maybe when Rust finally gains uint<N> types.

@calebzulawski calebzulawski merged commit 8c31005 into master Jun 5, 2024
64 checks passed
@calebzulawski calebzulawski deleted the bitmask-again-again-again branch June 5, 2024 23:46
@RalfJung
Copy link
Member

RalfJung commented Jun 8, 2024

Additionally, in #422, I'm finding that bitmask vector codegen doesn't work for non-powers-of-two, and even with a workaround, seems to have incorrect codegen on a handful of architectures.

Is this a bug in the code, or in codegen? If the code behaves correctly with Miri but gives the wrong result with codegen then that seems like a critical bug to me that needs investigation.

unsafe {
// Compute the bitmask
let mut bytes: <LaneCount<N> as SupportedLaneCount>::BitMask =
core::intrinsics::simd::simd_bitmask(self.0);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This intrinsic still has some other uses... are those working fine with non-power-of-2?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Answered here:

We still use the intrinsic, but currently (not yet synced to rust-lang/rust) we only use integer return types

@RalfJung
Copy link
Member

RalfJung commented Jun 8, 2024

looks good enough, though I think we should re-add conversions from arbitrary sized Mask from/to integers to make >64 element vectors more ergonomic, maybe when Rust finally gains uint<N> types.

Are vectors with more than 64 elements a thing? Miri currently supports simd_bitmask and simd_select_bitmask only for vectors up to size 64, which I thought was sufficient since this crate also only goes up to 64.

@RalfJung
Copy link
Member

RalfJung commented Jun 8, 2024

Additionally, in #422, I'm finding that bitmask vector codegen doesn't work for non-powers-of-two, and even with a workaround, seems to have incorrect codegen on a handful of architectures.

Did things break only on big-endian or also on little-endian?
If the trouble was big-endian-only, it could be related to the behavior of simd_bitmask not matching what to_bitmask_vector expected -- see rust-lang/rust#126171.

@programmerjake
Copy link
Member

Are vectors with more than 64 elements a thing?

yes, e.g. RISC-V V theoretically supports vectors with thousands of elements. NEC's SX-Aurora supports f64x64 so presumably supports f32x128 too (though idk for sure).

Miri currently supports simd_bitmask and simd_select_bitmask only for vectors up to size 64, which I thought was sufficient since this crate also only goes up to 64.

portable-simd only supports 64 elements only because that's where we stopped for now since we currently have to implement some things for each size manually, the long term plan is for the max size to be much larger. This is kinda like how Rust used to only implement Debug for arrays with <= 32 elements, but only because the compiler wasn't good enough yet.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants