

Add `default_union_representation` lint by jubnzv · Pull Request #8289 · rust-la...
source link: https://github.com/rust-lang/rust-clippy/pull/8289
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.

Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @camsteffen (or someone else) soon.
Please see the contribution instructions for more information.
I think if we have a lint that unilaterally enforces repr(C)
for unions (or other types), it would have to be restriction.
Well, this lint is implemented as suggested in #8235. It enforces the user to explicitly set any #[repr]
attribute for the unions, to make sure he understands their memory layout.
I personally don't like in this check that the user will have to set #[allow(unspecified_layout_union)]
on each union which have the default layout, because it is not possible to explicitly set #[repr(Rust)
in the code. But how can it be implemented better?
I personally don't like in this check that the user will have to set #[allow(unspecified_layout_union)] on each union which have the default layout
Agreed.
But how can it be implemented better?
Other than making it a restriction lint, I don't know.
Do you have a particular motivation for adding this lint? Maybe that would help us decide how the lint should work. Or, if you're just looking for an issue to get started with contributing to Clippy, maybe I could suggest a different issue to start with?
Or, if you're just looking for an issue to get started with contributing to Clippy, maybe I could suggest a different issue to start with?
Yes, I'm just looking for tasks to start contributing to Clippy. I would really appreciate, if you can advice some issues.
Maybe there is something that would be more useful for the project right now?
Do you have a particular motivation for adding this lint? Maybe that would help us decide how the lint should work.
I think, this lint could be useful in two kinds of projects:
- Embedded systems, in particular, the implementation of various communication protocols. If they use structures to describe packages, that are transmitted in the binary format, we always need to know their exact layout.
- Something with FFI and interaction with C code.
Most likely, in other projects, this lint will almost always be globally suppressed.
I haven't written such things on Rust yet, so this is just my guess. Probably, @5225225 can provide a better rationale for this.
My reasoning behind it is I see people assuming that unions are an okay way to transmute between two types, even without a #[repr(C)]
annotation.
Which is wrong, you must never read from a variant that hasn't been written to at least once (and if your type has validity invariants, this statement might be stronger, you must never read from a variant that isn't the most recently written one, if the two variants overlap but not fully).
Though now that I look at it, simply warning on all unions that have no #[repr(C)]
might be wrong (Since that would call MaybeUninit and other unions like it wrong).
What might be a more useful lint is to look for all unions that have at least 2 non-ZST variants. Since that would allow the definition of MaybeUninit, but not allow
union FloatOrInt { a: f64, b: u64, }
(which is not correct)
One bug that this lint would have caught is https://github.com/ImageOptim/libimagequant/blob/e99d92ad9abc08c8c912f8d3dc19bc054afb468e/src/hist.rs#L78-L82 , where the union here must be laid out overlapping, and if you don't, it's UB.
I'm generally wary of lints along the lines of "there's nothing wrong with this code, but did you realize...". I just don't think that's what lints are for. Lints should detect code that is deterministically wrong or sub-optimal.
What might be a more useful lint is to look for all unions that have at least 2 non-ZST variants.
I think you mean "fields" instead of "variants"? This would eliminate some cases from the lint, but largely the problem still remains.
Yes, I'm just looking for tasks to start contributing to Clippy. I would really appreciate, if you can advice some issues.
Maybe there is something that would be more useful for the project right now?
In general the ones labelled good-first-issue are good (but I know you already tried that ). Some specific ones that look good to me: #8282, #8212, #8215.
Maybe others will have a different view. @rust-lang/clippy?
I'm generally wary of lints along the lines of "there's nothing wrong with this code, but did you realize...". I just don't think that's what lints are for. Lints should detect code that is deterministically wrong or sub-optimal.
Well, there could be something wrong with the code. It depends how the union is used.
In the PR above for libimagequant, there was definitely a bug there. I'm not sure clippy is powerful enough to determine stuff like "Oh, you wrote to this field of the union, and then you read from a different one, that's UB" (especially if it was cross function, as it might be).
It could maybe detect union fields that are never written to/have a pointer taken, but are read from? That might catch some cases, but feels like it might have too many false negatives.
And in any case, there's plenty of existing lints for "there might not be something wrong with this code, but there probably is something wrong here.", as I understand it, that's what suspicious
is for. Though this lint might belong in pedantic
.
Searching down the list of non-repr(C) unions found by doing https://github.com/search?o=desc&p=1&q=%22pub+union%22+language%3Arust&s=indexed&type=Code , it does find a few cases that are definitely unsound, and some that are probably fine
Non-repr(C) unions that have multiple non-ZST fields are rare, but I found these 4 cases
1
is definitely unsound, simply create a register with the int
field init, and then call .get_f32()
2
looks sound, it's externally tagged as to what variant is active. So this would be a false positive.3
.... looks unsound but not for this reason, they're using a non-#[repr(C)]
thing in what looks to be FFI. So.... half positive? We're right to lint here, but for the wrong reasons.4
looks like a similar case to 2
, externally tagged.
So 1.5/4 on being correct, but also this lint really wouldn't fire that often. 2.5/5 if we count the issue I fixed.
What might be a more useful lint is to look for all unions that have at least 2 non-ZST variants.
I agree with this point. Unions with a single non-ZST field have layout the same as the layout of that field. I updated PR to make the lint only checks unions with at least two non-ZST fields.
I also improved the lint to make it warn about any repr
other than C
, because we don't actually know the layout of the union, if it uses the default (#[repr(Rust)]
) layout.
In general the ones labelled good-first-issue are good (but I know you already tried that smile). Some specific ones that look good to me:
Thank you! I managed to localize #8282 in the Clippy code. It takes some time to fix it, because I'm new to the rustc
internals.
By the way, the unsafe code guidelines is not what is promised.
It's what they think should be promised.
It says this both at the top of that page, and in the introduction.
It is currently legal for a compiler to add padding into a single field type of any kind, including unions.
I'd also add some tests for repr(transparent) unions (both with / without ZST fields, as well as multiple non-ZSTs)
granted, you can't have a repr transparent union with multiple non-ZST fields, but we shouldn't lint if they do that.
Maybe others will have a different view.
I didn't read the whole thread, but I see parallels with the recently discussed non_send_fields_in_send_ty
lint. You have to write unsafe
code in order to do the thing the lint complains about.
In order to get the UB described in the lint description, you have to read from the union. And in order to do that you have to write unsafe
code, which already is a barrier.
So all this lint would do is pointing to the documentation of unions in Rust. I agree with @camsteffen that this is not really the point of lints. If we would add this, it would have to be restriction
, because having a union
without a repr
attr is totally fine by itself, which means that this lint would restrict the language.
which means that this lint would restrict the language.
Oh!
I disagree with putting this lint in restriction
, seeing as it does find soundness bugs.
And you probably wouldn't have written the bugs if you knew to look out for them, and in order to look out for them using this lint if it was in restriction
, you'd have to explicitly name this lint. (As in, the only case the lint would be useful is if you're in a union
heavy crate and you know you messed this up in the past, but even then, simply being aware of it makes it pretty easy to spot)
After all, other lints for things that are often perfectly sound but might indicate a bug exist in pedantic
, like cast_ptr_alignment
I disagree with putting this lint in
restriction
, seeing as it does find soundness bugs.
Soundness bugs are indeed serious bugs. But severity is not enough. It has to be a likely bug in order to be pedantic
or suspicious
. To say that union
by itself is "likely a bug" is a far reach.
We get a lot of opportunities to compromise on "drawing the line" like this. If we don't take a hard stance, Clippy gets very noisy very quickly.
And you probably wouldn't have written the bugs if you knew to look out for them
This is what documentations is for, not lints. Lints are for telling you when you've already made a detectable mistake. And like flip1995 said the coder is already given a warning with unsafe blocks.
After all, other lints for things that are often perfectly sound but might indicate a bug exist in
pedantic
, likecast_ptr_alignment
That lint detects a code pattern that is suspicious in its own right. But if the lint isn't correct at least for a majority of cases (preferably a very large majority) in practice, then it shouldn't be pedantic
or suspicious
.
Okay, suggestion:
We land this lint as-is, in restriction
. Still only warning on more likely to be incorrect usages (not linting for literally "you used a union", but a union with multiple non-ZST fields which aren't forced to overlap by a repr(C))
And then a new lint is made, with correctness level, looking for non-repr C unions that have a field being read from, that definitely couldn't have been init at that point in the code. Probably leaving it open ended as to exactly what it lints on, but only linting on stuff that's definitely UB (depending on layout)
(Say, you make a union specifying field a
, you pass a pointer to field b
to some FFI function, and then you read field c
, all in the same fn. Or you have a union that had any field that is not public, is never written to/has an offset taken/has a pointer taken, but is read from)
And in the mean time, I go fix the union
docs warning people about this, because their example code has UB, and is doing the exact thing this lint complains about.
I disagree with putting this lint in
restriction
, seeing as it does find soundness bugs.
My issue with this lint is, that it doesn't find unsoundness bugs, but rather tells you that there might be or might not be unsoundness in your code. It doesn't even tell you where there might be unsoundness, it just spells out the documentation for you again and sends you off on a journey to find the unsoundness yourself.
And then a new lint is made, with correctness level, looking for non-repr C unions that have a field being read from, that definitely couldn't have been init at that point in the code. Probably leaving it open ended as to exactly what it lints on, but only linting on stuff that's definitely UB (depending on layout)
That would also be my suggestion. We shouldn't lint on the union type, but on the union read. Only linting on something that is "definitely UB" might be very hard or even impossible (if it were possible, shouldn't the compiler error on this?). But I can see a lint in pedantic
that lints all reads of non-repr(C)
unions.
I'd also add some tests for repr(transparent) unions (both with / without ZST fields, as well as multiple non-ZSTs)
granted, you can't have a repr transparent union with multiple non-ZST fields, but we shouldn't lint if they do that.
Thank you, that's a good idea. I added tests for #[repr(transparent)]
.
If we would add this, it would have to be restriction, because having a union without a repr attr is totally fine by itself, which means that this lint would restrict the language.
Well, that sounds reasonable, because we don't actually check if we the use of these unions in unsafe blocks. I updated the level of the lint to restriction
.
The only thing that confuses me, is why the #[repr(Rust)]
is present in the documentation, but cannot be explicitly specified by the user. Is this an intentional design decision?
Okay so I think we're on the same page that this lint would have to be restriction. But now I have to ask is this lint worth it as a restriction lint? If the motivation for the lint were "I want to enforce that all unions in my crate have a non-default repr", then this lint would make perfect sense. But rather the motivation seems to be "let's make an attempt to catch unsoundness bugs". And we know that restriction lints are not so good for catching bugs. The former motivation sounds plausible, but I don't want to blindly assume that use case exists.
The only thing that confuses me, is why the
#[repr(Rust)]
is present in the documentation, but cannot be explicitly specified by the user. Is this an intentional design decision?
That confused me too so it should probably be clarified. :) But yes I'm pretty sure it's intentional - it's like pseudo code for documentation purposes only.
But now I have to ask is this lint worth it as a restriction lint? If the motivation for the lint were "I want to enforce that all unions in my crate have a non-default repr", then this lint would make perfect sense.
I see the motivation for this lint as: "I want to ensure that all unions have a defined layout, so I can't get unexpected behavior in unsafe blocks". We have already seen the examples of potential errors that could be fixed with this lint.
I have a pretty low bar for restriction lints like this. However, the lint documentation should not claim, that it finds unsoundness bugs, but rather that it enforces a repr
for unions (which side effect may be to fix unsoundness bugs). I think the current documentation conveys this quite well.
If some people want a lint that restricts the language for them, like this lint does AND the lint helps with finding real world problems, I think it is fine to include it in Clippy.
Let's go ahead with this then. Just some minor feedback.
clippy_lints/src/unspecified_layout_union.rs
Outdated Show resolved
clippy_lints/src/unspecified_layout_union.rs
Outdated Show resolved
clippy_lints/src/unspecified_layout_union.rs
Outdated Show resolved
clippy_lints/src/unspecified_layout_union.rs
Outdated Show resolved
clippy_lints/src/unspecified_layout_union.rs
Outdated Show resolved
clippy_lints/src/unspecified_layout_union.rs
Outdated Show resolved
clippy_lints/src/unspecified_layout_union.rs
Outdated Show resolved
clippy_lints/src/unspecified_layout_union.rs
Outdated Show resolved
}
}
fn has_c_repr_attr(cx: &LateContext<'_>, hir_id: HirId) -> bool {
I think the lint should allow any repr
, not just repr(C)
.
Even if the union has some alignment attribute (like align
or packed
), it still has no any guarantees about the layout, if the default Rust
representation is used. I think, those cases more likely will contain bugs, because the user may want to set a specific layout using only these attributes.
Maybe we should leave this as it is in order to find such errors?
Gotcha. Let's just add a comment somewhere that C is the only non-default representation for unions.
We have the following description:
What it does
Displays a warning when a union is declared with the default representation (without a
#[repr(C)]
attribute)
Should we explain this in more detail?
I think it's okay but you're welcome to expand that.
Well, I'm not sure how best to expand this so as not to annoy the user with unnecessary repetitions. I think, it's enough, because if the user wants to restrict the language, they at least understand which layouts are possible in Rust.
changed the title
Add unspecified_layout_union
lint
Add default_union_representation
lint
Looks good! Please squash commits.
Commit b7000b2 has been approved by
camsteffen
Recommend
About Joyk
Aggregate valuable and interesting links.
Joyk means Joy of geeK