3

Add `default_union_representation` lint by jubnzv · Pull Request #8289 · rust-la...

 3 years ago
source link: https://github.com/rust-lang/rust-clippy/pull/8289
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.
neoserver,ios ssh client

Copy link

Contributor

jubnzv commented on Jan 15

edited

Closes #8235

changelog: Added a new lint [default_union_representation]

Copy link

Collaborator

rust-highfive commented on Jan 15

Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @camsteffen (or someone else) soon.

Please see the contribution instructions for more information.

Copy link

Collaborator

camsteffen commented on Jan 15

I think if we have a lint that unilaterally enforces repr(C) for unions (or other types), it would have to be restriction.

Copy link

Contributor

Author

jubnzv commented on Jan 15

Well, this lint is implemented as suggested in #8235. It enforces the user to explicitly set any #[repr] attribute for the unions, to make sure he understands their memory layout.

I personally don't like in this check that the user will have to set #[allow(unspecified_layout_union)] on each union which have the default layout, because it is not possible to explicitly set #[repr(Rust) in the code. But how can it be implemented better?

Copy link

Collaborator

camsteffen commented on Jan 16

I personally don't like in this check that the user will have to set #[allow(unspecified_layout_union)] on each union which have the default layout

Agreed.

But how can it be implemented better?

Other than making it a restriction lint, I don't know.

Do you have a particular motivation for adding this lint? Maybe that would help us decide how the lint should work. Or, if you're just looking for an issue to get started with contributing to Clippy, maybe I could suggest a different issue to start with?

Copy link

Contributor

Author

jubnzv commented on Jan 16

edited

Or, if you're just looking for an issue to get started with contributing to Clippy, maybe I could suggest a different issue to start with?

Yes, I'm just looking for tasks to start contributing to Clippy. I would really appreciate, if you can advice some issues.
Maybe there is something that would be more useful for the project right now?

Do you have a particular motivation for adding this lint? Maybe that would help us decide how the lint should work.

I think, this lint could be useful in two kinds of projects:

  • Embedded systems, in particular, the implementation of various communication protocols. If they use structures to describe packages, that are transmitted in the binary format, we always need to know their exact layout.
  • Something with FFI and interaction with C code.

Most likely, in other projects, this lint will almost always be globally suppressed.

I haven't written such things on Rust yet, so this is just my guess. Probably, @5225225 can provide a better rationale for this.

Copy link

Contributor

5225225 commented on Jan 16

edited

My reasoning behind it is I see people assuming that unions are an okay way to transmute between two types, even without a #[repr(C)] annotation.

Which is wrong, you must never read from a variant that hasn't been written to at least once (and if your type has validity invariants, this statement might be stronger, you must never read from a variant that isn't the most recently written one, if the two variants overlap but not fully).

Though now that I look at it, simply warning on all unions that have no #[repr(C)] might be wrong (Since that would call MaybeUninit and other unions like it wrong).

What might be a more useful lint is to look for all unions that have at least 2 non-ZST variants. Since that would allow the definition of MaybeUninit, but not allow

union FloatOrInt {
    a: f64,
    b: u64,
}

(which is not correct)

One bug that this lint would have caught is https://github.com/ImageOptim/libimagequant/blob/e99d92ad9abc08c8c912f8d3dc19bc054afb468e/src/hist.rs#L78-L82 , where the union here must be laid out overlapping, and if you don't, it's UB.

Copy link

Collaborator

camsteffen commented on Jan 16

edited

I'm generally wary of lints along the lines of "there's nothing wrong with this code, but did you realize...". I just don't think that's what lints are for. Lints should detect code that is deterministically wrong or sub-optimal.

What might be a more useful lint is to look for all unions that have at least 2 non-ZST variants.

I think you mean "fields" instead of "variants"? This would eliminate some cases from the lint, but largely the problem still remains.

Yes, I'm just looking for tasks to start contributing to Clippy. I would really appreciate, if you can advice some issues.
Maybe there is something that would be more useful for the project right now?

In general the ones labelled good-first-issue are good (but I know you already tried that smile). Some specific ones that look good to me: #8282, #8212, #8215.

Maybe others will have a different view. @rust-lang/clippy?

Copy link

Contributor

5225225 commented on Jan 16

I'm generally wary of lints along the lines of "there's nothing wrong with this code, but did you realize...". I just don't think that's what lints are for. Lints should detect code that is deterministically wrong or sub-optimal.

Well, there could be something wrong with the code. It depends how the union is used.

In the PR above for libimagequant, there was definitely a bug there. I'm not sure clippy is powerful enough to determine stuff like "Oh, you wrote to this field of the union, and then you read from a different one, that's UB" (especially if it was cross function, as it might be).

It could maybe detect union fields that are never written to/have a pointer taken, but are read from? That might catch some cases, but feels like it might have too many false negatives.

And in any case, there's plenty of existing lints for "there might not be something wrong with this code, but there probably is something wrong here.", as I understand it, that's what suspicious is for. Though this lint might belong in pedantic.

Searching down the list of non-repr(C) unions found by doing https://github.com/search?o=desc&p=1&q=%22pub+union%22+language%3Arust&s=indexed&type=Code , it does find a few cases that are definitely unsound, and some that are probably fine

Non-repr(C) unions that have multiple non-ZST fields are rare, but I found these 4 cases

1 is definitely unsound, simply create a register with the int field init, and then call .get_f32()
2 looks sound, it's externally tagged as to what variant is active. So this would be a false positive.
3.... looks unsound but not for this reason, they're using a non-#[repr(C)] thing in what looks to be FFI. So.... half positive? We're right to lint here, but for the wrong reasons.
4 looks like a similar case to 2, externally tagged.

So 1.5/4 on being correct, but also this lint really wouldn't fire that often. 2.5/5 if we count the issue I fixed.

Copy link

Contributor

Author

jubnzv commented on Jan 17

What might be a more useful lint is to look for all unions that have at least 2 non-ZST variants.

I agree with this point. Unions with a single non-ZST field have layout the same as the layout of that field. I updated PR to make the lint only checks unions with at least two non-ZST fields.

I also improved the lint to make it warn about any repr other than C, because we don't actually know the layout of the union, if it uses the default (#[repr(Rust)]) layout.

In general the ones labelled good-first-issue are good (but I know you already tried that smile). Some specific ones that look good to me:

Thank you! I managed to localize #8282 in the Clippy code. It takes some time to fix it, because I'm new to the rustc internals.

Copy link

Contributor

5225225 commented on Jan 17

By the way, the unsafe code guidelines is not what is promised.

It's what they think should be promised.

It says this both at the top of that page, and in the introduction.

It is currently legal for a compiler to add padding into a single field type of any kind, including unions.

Copy link

Contributor

5225225 commented 29 days ago

edited

I'd also add some tests for repr(transparent) unions (both with / without ZST fields, as well as multiple non-ZSTs)

granted, you can't have a repr transparent union with multiple non-ZST fields, but we shouldn't lint if they do that.

Copy link

Member

flip1995 commented 29 days ago

Maybe others will have a different view.

I didn't read the whole thread, but I see parallels with the recently discussed non_send_fields_in_send_ty lint. You have to write unsafe code in order to do the thing the lint complains about.

In order to get the UB described in the lint description, you have to read from the union. And in order to do that you have to write unsafe code, which already is a barrier.

So all this lint would do is pointing to the documentation of unions in Rust. I agree with @camsteffen that this is not really the point of lints. If we would add this, it would have to be restriction, because having a union without a repr attr is totally fine by itself, which means that this lint would restrict the language.

Copy link

Collaborator

camsteffen commented 29 days ago

which means that this lint would restrict the language.

Oh! exploding_head

Copy link

Contributor

5225225 commented 29 days ago

I disagree with putting this lint in restriction, seeing as it does find soundness bugs.

And you probably wouldn't have written the bugs if you knew to look out for them, and in order to look out for them using this lint if it was in restriction, you'd have to explicitly name this lint. (As in, the only case the lint would be useful is if you're in a union heavy crate and you know you messed this up in the past, but even then, simply being aware of it makes it pretty easy to spot)

After all, other lints for things that are often perfectly sound but might indicate a bug exist in pedantic, like cast_ptr_alignment

Copy link

Collaborator

camsteffen commented 28 days ago

I disagree with putting this lint in restriction, seeing as it does find soundness bugs.

Soundness bugs are indeed serious bugs. But severity is not enough. It has to be a likely bug in order to be pedantic or suspicious. To say that union by itself is "likely a bug" is a far reach.

We get a lot of opportunities to compromise on "drawing the line" like this. If we don't take a hard stance, Clippy gets very noisy very quickly.

And you probably wouldn't have written the bugs if you knew to look out for them

This is what documentations is for, not lints. Lints are for telling you when you've already made a detectable mistake. And like flip1995 said the coder is already given a warning with unsafe blocks.

After all, other lints for things that are often perfectly sound but might indicate a bug exist in pedantic, like cast_ptr_alignment

That lint detects a code pattern that is suspicious in its own right. But if the lint isn't correct at least for a majority of cases (preferably a very large majority) in practice, then it shouldn't be pedantic or suspicious.

Copy link

Contributor

5225225 commented 28 days ago

Okay, suggestion:

We land this lint as-is, in restriction. Still only warning on more likely to be incorrect usages (not linting for literally "you used a union", but a union with multiple non-ZST fields which aren't forced to overlap by a repr(C))

And then a new lint is made, with correctness level, looking for non-repr C unions that have a field being read from, that definitely couldn't have been init at that point in the code. Probably leaving it open ended as to exactly what it lints on, but only linting on stuff that's definitely UB (depending on layout)

(Say, you make a union specifying field a, you pass a pointer to field b to some FFI function, and then you read field c, all in the same fn. Or you have a union that had any field that is not public, is never written to/has an offset taken/has a pointer taken, but is read from)


And in the mean time, I go fix the union docs warning people about this, because their example code has UB, and is doing the exact thing this lint complains about.

Copy link

Member

flip1995 commented 28 days ago

I disagree with putting this lint in restriction, seeing as it does find soundness bugs.

My issue with this lint is, that it doesn't find unsoundness bugs, but rather tells you that there might be or might not be unsoundness in your code. It doesn't even tell you where there might be unsoundness, it just spells out the documentation for you again and sends you off on a journey to find the unsoundness yourself.

And then a new lint is made, with correctness level, looking for non-repr C unions that have a field being read from, that definitely couldn't have been init at that point in the code. Probably leaving it open ended as to exactly what it lints on, but only linting on stuff that's definitely UB (depending on layout)

That would also be my suggestion. We shouldn't lint on the union type, but on the union read. Only linting on something that is "definitely UB" might be very hard or even impossible (if it were possible, shouldn't the compiler error on this?). But I can see a lint in pedantic that lints all reads of non-repr(C) unions.

Copy link

Contributor

Author

jubnzv commented 27 days ago

edited

I'd also add some tests for repr(transparent) unions (both with / without ZST fields, as well as multiple non-ZSTs)
granted, you can't have a repr transparent union with multiple non-ZST fields, but we shouldn't lint if they do that.

Thank you, that's a good idea. I added tests for #[repr(transparent)].

If we would add this, it would have to be restriction, because having a union without a repr attr is totally fine by itself, which means that this lint would restrict the language.

Well, that sounds reasonable, because we don't actually check if we the use of these unions in unsafe blocks. I updated the level of the lint to restriction.

The only thing that confuses me, is why the #[repr(Rust)] is present in the documentation, but cannot be explicitly specified by the user. Is this an intentional design decision?

Copy link

Collaborator

camsteffen commented 26 days ago

Okay so I think we're on the same page that this lint would have to be restriction. But now I have to ask is this lint worth it as a restriction lint? If the motivation for the lint were "I want to enforce that all unions in my crate have a non-default repr", then this lint would make perfect sense. But rather the motivation seems to be "let's make an attempt to catch unsoundness bugs". And we know that restriction lints are not so good for catching bugs. The former motivation sounds plausible, but I don't want to blindly assume that use case exists.

The only thing that confuses me, is why the #[repr(Rust)] is present in the documentation, but cannot be explicitly specified by the user. Is this an intentional design decision?

That confused me too so it should probably be clarified. :) But yes I'm pretty sure it's intentional - it's like pseudo code for documentation purposes only.

Copy link

Contributor

Author

jubnzv commented 22 days ago

edited

But now I have to ask is this lint worth it as a restriction lint? If the motivation for the lint were "I want to enforce that all unions in my crate have a non-default repr", then this lint would make perfect sense.

I see the motivation for this lint as: "I want to ensure that all unions have a defined layout, so I can't get unexpected behavior in unsafe blocks". We have already seen the examples of potential errors that could be fixed with this lint.

Copy link

Member

flip1995 commented 22 days ago

I have a pretty low bar for restriction lints like this. However, the lint documentation should not claim, that it finds unsoundness bugs, but rather that it enforces a repr for unions (which side effect may be to fix unsoundness bugs). I think the current documentation conveys this quite well.

If some people want a lint that restricts the language for them, like this lint does AND the lint helps with finding real world problems, I think it is fine to include it in Clippy.

Copy link

Collaborator

@camsteffen camsteffen left a comment

Let's go ahead with this then. Just some minor feedback.

}

}

fn has_c_repr_attr(cx: &LateContext<'_>, hir_id: HirId) -> bool {

Copy link

Collaborator

@camsteffen camsteffen 22 days ago

I think the lint should allow any repr, not just repr(C).

Copy link

Contributor

Author

@jubnzv jubnzv 22 days ago

edited

Even if the union has some alignment attribute (like align or packed), it still has no any guarantees about the layout, if the default Rust representation is used. I think, those cases more likely will contain bugs, because the user may want to set a specific layout using only these attributes.

Maybe we should leave this as it is in order to find such errors?

Copy link

Collaborator

@camsteffen camsteffen 22 days ago

Gotcha. Let's just add a comment somewhere that C is the only non-default representation for unions.

Copy link

Contributor

Author

@jubnzv jubnzv 22 days ago

We have the following description:

What it does

Displays a warning when a union is declared with the default representation (without a #[repr(C)] attribute)

Should we explain this in more detail?

Copy link

Collaborator

@camsteffen camsteffen 22 days ago

I think it's okay but you're welcome to expand that.

Copy link

Contributor

Author

@jubnzv jubnzv 21 days ago

Well, I'm not sure how best to expand this so as not to annoy the user with unnecessary repetitions. I think, it's enough, because if the user wants to restrict the language, they at least understand which layouts are possible in Rust.

jubnzv

changed the title Add unspecified_layout_union lint

Add default_union_representation lint

22 days ago

Copy link

Collaborator

@camsteffen camsteffen left a comment

Looks good! Please squash commits.

Copy link

Collaborator

camsteffen commented 19 days ago

edited

@jubnzv and @5225225 thank you for your patience as we worked through this.

@bors r+

(edit: oops I had tagged the wrong person)

Copy link

Contributor

bors commented 19 days ago

pushpin Commit b7000b2 has been approved by camsteffen

Copy link

Contributor

bors commented 19 days ago

hourglass Testing commit b7000b2 with merge 7ceffde...

bors

merged commit 7ceffde into

rust-lang:master 19 days ago

8 checks passed

About Joyk


Aggregate valuable and interesting links.
Joyk means Joy of geeK