3

Github Split a func into cold/hot parts, reducing binary size by sivadeilra · Pu...

 3 years ago
source link: https://github.com/rust-lang/rust/pull/80042
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.

Conversation

I noticed that the Size::bits function is called in many places,
and is inlined into them. On x86_64-pc-windows-msvc, this function
is inlined 527 times, and compiled separately (non-inlined) 3 times.

Each of those inlined calls contains code that panics. This commit
moves the panic! call into a separate function and marks that
function with #[cold].

This reduces binary size by 24 KB. Not much, but it's something.
Changes like this often reduce pressure on instruction-caches,
since it reduces the amount of code that is inlined into hot code
paths. Or more precisely, it removes cold code from hot cache lines.

Collaborator

rust-highfive commented on Dec 15, 2020

Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @oli-obk (or someone else) soon.

If any changes to this PR are deemed necessary, please add them as extra commits. This ensures that the reviewer can see what has changed since they last reviewed the code. Due to the way GitHub handles out-of-date commits, this should also make it reasonably obvious what issues have or haven't been addressed. Large or tricky changes may require several passes of review and changes.

Please see the contribution instructions for more information.

Member

camelid commented on Dec 15, 2020

Does this reduce the size of the rustc binary only or every binary?

Author

sivadeilra commented on Dec 15, 2020

Does this reduce the size of the rustc binary only or every binary?

It reduces the size of the rustc_driver-xxx.dll binary. If you're asking about the intermediate .rlib files, then I assume it is going to reduce the size of the ones affected (rustc_target, rustc_middle, rustc_mir, rustc_codegen_llvm, mainly.) But I'm more interested in the size of the final output than in the size of the intermediate files.

Contributor

oli-obk commented on Dec 16, 2020

Collaborator

rust-timer commented on Dec 16, 2020

Awaiting bors try build completion.

Contributor

bors commented on Dec 16, 2020

hourglass Trying commit bc8faf0 with merge 145cc5b...

Contributor

bors commented on Dec 16, 2020

sunny Try build successful - checks-actions
Build commit: 145cc5b (145cc5b50bad8a74379045e8cadab5a126bcae1d)

Collaborator

rust-timer commented on Dec 16, 2020

Queued 145cc5b with parent 99baddb, future comparison URL.

@rustbot label: +S-waiting-on-perf

Collaborator

rust-timer commented on Dec 16, 2020

Finished benchmarking try commit (145cc5b): comparison url.

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. Please note that if the perf results are neutral, you should likely undo the rollup=never given below by specifying rollup- to bors.

Importantly, though, if the results of this run are non-neutral do not roll this PR up -- it will mask other regressions or improvements in the roll up.

@bors rollup=never
@rustbot label: +S-waiting-on-review -S-waiting-on-perf

Contributor

bjorn3 commented on Dec 16, 2020

The difference seems to be within noise.

Author

sivadeilra commented on Dec 16, 2020

I'm not expecting a big result from this. But eliminating 500 duplicates of cold panic code seems worth it, right?

Contributor

oli-obk commented on Dec 16, 2020

But eliminating 500 duplicates of cold panic code seems worth it, right?

Indeed. Do you think we'll get the same benefit even when keeping the original checked_mul and unwrap_or_else logic and just using the new function within the unwrap closure?

Author

sivadeilra commented on Dec 18, 2020

But eliminating 500 duplicates of cold panic code seems worth it, right?

Indeed. Do you think we'll get the same benefit even when keeping the original checked_mul and unwrap_or_else logic and just using the new function within the unwrap closure?

Possibly. I can check the generated assembly. I've seen checked_mul generate fairly efficient code, so I'm not worried about that part. Mainly it's the panicking and formatting that can be surprisingly expensive. Not just to execute, but just the code itself can be surprisingly large, which can be painful when it's duplicated many times.

Ideally, we would notice that a particular basic block always leads to a #[cold] path, and would "outline" that BB (prevent it from being inlined, and so prevent it from being dup'd). That would solve the general case. However, that's not something that rustc does automatically at this point. If it is ever added, then a lot of hand-tuning like this could be removed.

Author

sivadeilra commented on Dec 18, 2020

I tried switching the code back to using checked_mul. The generated assembly code is a bit worse; it uses a mul instruction, rather than shifting, and a jo to branch on overflow.

Maybe a better approach would be for the Size struct to contain a u64 that counts the number of bits, not the number of bytes, in a size. That way, the bytes function would just be fn bytes(self) -> u64 { self.bits >> 3 }. No conditionals at all. The constructor (Size::new) would test for overflow. Which would basically never happen -- we'll never have a legit type whose Size is 0x3fff_ffff_ffff_ffff in bytes, that is, a size that would overflow when expressed as a bit-count but not overflow when expressed as a byte-count.

Contributor

oli-obk commented on Dec 18, 2020

Hmm... it seems a bit wrong to store the bits if we can't actually encode the bit size, but must always encode a byte size.

One thing we could try is to make the Size struct use rustc_layout_scalar_valid_range_end to declare that the maximum value must be smaller than u64::max_value() >> 3 and see whether llvm can figure out that taking the raw field will thus allow us to multiply by 8 without causing any problems. This would also have the fun side effect of allowing the free bits of Size to be used in enum optimizations across the compiler. Additionally, we now need to check that the size is smaller than u64::max_value() >> 3 at Size construction time. Not sure if that gains us anything to checking it at the use site.

What do you think of that idea? We can also open a compiler MCP (major change proposal) to make sure that we get the input of the team on this, since it technically could be observed by users, but I don't think in any case in which some code currently compiles and won't compile after this change.

Contributor

bjorn3 commented on Dec 18, 2020

One thing we could try is to make the Size struct use rustc_layout_scalar_valid_range_end to declare that the maximum value must be smaller than u64::max_value() >> 3 and see whether llvm can figure out that taking the raw field will thus allow us to multiply by 8 without causing any problems.

As far as I know LLVM is not told about rustc_layout_scalar_valid_range_end.

Author

sivadeilra commented on Dec 19, 2020

I think using rustc_layout_scalar_valid_range_end is interesting. Option<Size> is used in a few places. Not enough that I think the benefits are huge, but it's always good to enable more layout optimizations.

I think the key thing is to move the validation to the constructor of Size. Everything else is just "how do we represent this". It seems very unlikely that any code relies on constructing a very large Size and then carefully avoids calling my_size.bits() on it; the calls to bits() greatly outnumber the calls to bytes().

Me personally, I think this is below the threshold of the MCP process; the compiler team has much bigger issues facing them. If we make this change (verifying the lower maximum value for Size::from_*), and it turns out that something does depend on it, then we'll get a deterministic failure. That seems to fit with the philosophy of the nightly / beta release process, rather than the MCP process. We can commit it to master, and let it go through the usual stabilization process. If it causes problems, which I think is very unlikely, then it will be easy to undo this change, since it affects only the internals of a type, and not its publicly-visible API.

Contributor

oli-obk commented on Dec 21, 2020

I agree on all points. Do you want to implement this as a replacement for this PR or do you have other steps in mind?

Author

sivadeilra commented 25 days ago

I agree on all points. Do you want to implement this as a replacement for this PR or do you have other steps in mind?

I'll implement this and submit an update to this PR. Sorry for the delay in responding -- holidays, etc.

Author

sivadeilra commented 23 days ago

@oli-obk I think this is ready now.

Contributor

oli-obk commented 20 days ago

Thanks!

@bors r+ rollup

Contributor

bors commented 20 days ago

pushpin Commit 4721b65 has been approved by oli-obk


About Joyk


Aggregate valuable and interesting links.
Joyk means Joy of geeK