Github Split a func into cold/hot parts, reducing binary size by sivadeilra · Pu...
source link: https://github.com/rust-lang/rust/pull/80042
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.
Conversation
I noticed that the Size::bits function is called in many places,
and is inlined into them. On x86_64-pc-windows-msvc, this function
is inlined 527 times, and compiled separately (non-inlined) 3 times.
Each of those inlined calls contains code that panics. This commit
moves the panic!
call into a separate function and marks that
function with #[cold]
.
This reduces binary size by 24 KB. Not much, but it's something.
Changes like this often reduce pressure on instruction-caches,
since it reduces the amount of code that is inlined into hot code
paths. Or more precisely, it removes cold code from hot cache lines.
Collaborator
rust-highfive commented on Dec 15, 2020
Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @oli-obk (or someone else) soon.
If any changes to this PR are deemed necessary, please add them as extra commits. This ensures that the reviewer can see what has changed since they last reviewed the code. Due to the way GitHub handles out-of-date commits, this should also make it reasonably obvious what issues have or haven't been addressed. Large or tricky changes may require several passes of review and changes.
Please see the contribution instructions for more information.
Member
camelid commented on Dec 15, 2020
Does this reduce the size of the rustc
binary only or every binary?
Author
sivadeilra commented on Dec 15, 2020
Does this reduce the size of the
rustc
binary only or every binary?
It reduces the size of the rustc_driver-xxx.dll
binary. If you're asking about the intermediate .rlib
files, then I assume it is going to reduce the size of the ones affected (rustc_target
, rustc_middle
, rustc_mir
, rustc_codegen_llvm
, mainly.) But I'm more interested in the size of the final output than in the size of the intermediate files.
Contributor
oli-obk commented on Dec 16, 2020
Collaborator
rust-timer commented on Dec 16, 2020
Awaiting bors try build completion.
Contributor
bors commented on Dec 16, 2020
Contributor
bors commented on Dec 16, 2020
Try build successful - checks-actions
Build commit: 145cc5b (145cc5b50bad8a74379045e8cadab5a126bcae1d
)
Collaborator
rust-timer commented on Dec 16, 2020
Queued 145cc5b with parent 99baddb, future comparison URL.
@rustbot label: +S-waiting-on-perf
Collaborator
rust-timer commented on Dec 16, 2020
Finished benchmarking try commit (145cc5b): comparison url.
Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. Please note that if the perf results are neutral, you should likely undo the rollup=never given below by specifying rollup-
to bors.
Importantly, though, if the results of this run are non-neutral do not roll this PR up -- it will mask other regressions or improvements in the roll up.
@bors rollup=never
@rustbot label: +S-waiting-on-review -S-waiting-on-perf
Contributor
bjorn3 commented on Dec 16, 2020
The difference seems to be within noise.
Author
sivadeilra commented on Dec 16, 2020
I'm not expecting a big result from this. But eliminating 500 duplicates of cold panic code seems worth it, right?
Contributor
oli-obk commented on Dec 16, 2020
But eliminating 500 duplicates of cold panic code seems worth it, right?
Indeed. Do you think we'll get the same benefit even when keeping the original checked_mul
and unwrap_or_else
logic and just using the new function within the unwrap closure?
Author
sivadeilra commented on Dec 18, 2020
But eliminating 500 duplicates of cold panic code seems worth it, right?
Indeed. Do you think we'll get the same benefit even when keeping the original
checked_mul
andunwrap_or_else
logic and just using the new function within the unwrap closure?
Possibly. I can check the generated assembly. I've seen checked_mul
generate fairly efficient code, so I'm not worried about that part. Mainly it's the panicking and formatting that can be surprisingly expensive. Not just to execute, but just the code itself can be surprisingly large, which can be painful when it's duplicated many times.
Ideally, we would notice that a particular basic block always leads to a #[cold]
path, and would "outline" that BB (prevent it from being inlined, and so prevent it from being dup'd). That would solve the general case. However, that's not something that rustc does automatically at this point. If it is ever added, then a lot of hand-tuning like this could be removed.
Author
sivadeilra commented on Dec 18, 2020
I tried switching the code back to using checked_mul
. The generated assembly code is a bit worse; it uses a mul
instruction, rather than shifting, and a jo
to branch on overflow.
Maybe a better approach would be for the Size
struct to contain a u64
that counts the number of bits, not the number of bytes, in a size. That way, the bytes
function would just be fn bytes(self) -> u64 { self.bits >> 3 }
. No conditionals at all. The constructor (Size::new
) would test for overflow. Which would basically never happen -- we'll never have a legit type whose Size
is 0x3fff_ffff_ffff_ffff in bytes, that is, a size that would overflow when expressed as a bit-count but not overflow when expressed as a byte-count.
Contributor
oli-obk commented on Dec 18, 2020
Hmm... it seems a bit wrong to store the bits if we can't actually encode the bit size, but must always encode a byte size.
One thing we could try is to make the Size
struct use rustc_layout_scalar_valid_range_end
to declare that the maximum value must be smaller than u64::max_value() >> 3
and see whether llvm can figure out that taking the raw
field will thus allow us to multiply by 8 without causing any problems. This would also have the fun side effect of allowing the free bits of Size
to be used in enum optimizations across the compiler. Additionally, we now need to check that the size is smaller than u64::max_value() >> 3
at Size
construction time. Not sure if that gains us anything to checking it at the use site.
What do you think of that idea? We can also open a compiler MCP (major change proposal) to make sure that we get the input of the team on this, since it technically could be observed by users, but I don't think in any case in which some code currently compiles and won't compile after this change.
Contributor
bjorn3 commented on Dec 18, 2020
One thing we could try is to make the Size struct use rustc_layout_scalar_valid_range_end to declare that the maximum value must be smaller than u64::max_value() >> 3 and see whether llvm can figure out that taking the raw field will thus allow us to multiply by 8 without causing any problems.
As far as I know LLVM is not told about rustc_layout_scalar_valid_range_end
.
Author
sivadeilra commented on Dec 19, 2020
I think using rustc_layout_scalar_valid_range_end
is interesting. Option<Size>
is used in a few places. Not enough that I think the benefits are huge, but it's always good to enable more layout optimizations.
I think the key thing is to move the validation to the constructor of Size
. Everything else is just "how do we represent this". It seems very unlikely that any code relies on constructing a very large Size
and then carefully avoids calling my_size.bits()
on it; the calls to bits()
greatly outnumber the calls to bytes()
.
Me personally, I think this is below the threshold of the MCP process; the compiler team has much bigger issues facing them. If we make this change (verifying the lower maximum value for Size::from_*
), and it turns out that something does depend on it, then we'll get a deterministic failure. That seems to fit with the philosophy of the nightly / beta release process, rather than the MCP process. We can commit it to master
, and let it go through the usual stabilization process. If it causes problems, which I think is very unlikely, then it will be easy to undo this change, since it affects only the internals of a type, and not its publicly-visible API.
Contributor
oli-obk commented on Dec 21, 2020
I agree on all points. Do you want to implement this as a replacement for this PR or do you have other steps in mind?
Author
sivadeilra commented 25 days ago
I agree on all points. Do you want to implement this as a replacement for this PR or do you have other steps in mind?
I'll implement this and submit an update to this PR. Sorry for the delay in responding -- holidays, etc.
Author
sivadeilra commented 23 days ago
@oli-obk I think this is ready now.
Contributor
oli-obk commented 20 days ago
Thanks!
@bors r+ rollup
Contributor
bors commented 20 days ago
Commit 4721b65 has been approved by oli-obk
Recommend
About Joyk
Aggregate valuable and interesting links.
Joyk means Joy of geeK