6

Turn quadratic time on number of impl blocks into linear time

 3 years ago
source link: https://github.com/rust-lang/rust/pull/78317
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.

Contributor

est31 commented on Oct 24

edited

Previously, if you had a lot of inherent impl blocks on a type like:

struct Foo;

impl Foo { fn foo_1() {} }
// ...
impl Foo { fn foo_100_000() {} }

The compiler would be very slow at processing it, because
an internal algorithm would run in O(n^2), where n is the number
of impl blocks. Now, we add a new algorithm that allocates but
is faster asymptotically.

Comparing rustc nightly with a local build of rustc as of this PR (results in seconds):

N real time before real time after 4_000 0.57 0.46 8_000 1.31 0.84 16_000 3.56 1.69 32_000 10.60 3.73

I've tuned up the numbers to make the effect larger than the startup noise of rustc, but the asymptotic difference should hold for smaller n as well.

Note: current state of the PR omits error messages if there are other errors present already. For now, I'm mainly interested in a perf run to study whether this issue is present at all. Please queue one for this PR. Thanks!

Collaborator

rust-highfive commented on Oct 24

r? @matthewjasper

(rust_highfive has picked a reviewer for you, use r? to override)

Contributor

bugadani left a comment

Just out of curiousity, how common is it to have 20+ impl blocks in real world code?

// faster asymptotic runtime.

if impls.len() < 20 {

for (i, &impl1_def_id) in impls.iter().enumerate() {

for &impl2_def_id in &impls[(i + 1)..] {

bugadani on Oct 24

Contributor

I guess this is how it is and there is no reason to change, but wouldn't cloning the iterator for the inner loop be nicer? There's no need to complicate the iteration when we only need to look ahead, I think.
Am I missing something?

est31 on Oct 24

Author

Contributor

IDK for me as someone coming from C/C++, iterator magic is more complicated than doing things with indices.

bugadani on Oct 24

Contributor

I also have a background in C and I hate working with indices when I can avoid them, they are too much a footgun :)

Contributor

Author

est31 commented on Oct 24

Just out of curiousity, how common is it to have 20+ impl blocks in real world code?

Probably quite rare, however note that impl blocks might be distributed across multiple files. IIRC servo has something like this. I might misremember though. There might be no such case in the perf's testsuite at all. I just stumbled onto this code and it felt bad. Anyways, the swap I add above should have an impact for smaller number of impl blocks already. Maybe two perf runs are needed? IDK.

Contributor

bugadani commented on Oct 24

I guess rustc has a ton of impl TyCtxt (I counter 21 + a macro).

Collaborator

rust-timer commented on Oct 24

Awaiting bors try build completion

Contributor

bors commented on Oct 24

hourglass Trying commit c17fe5b with merge 925eba0...

Contributor

Author

est31 commented on Oct 24

I guess rustc has a ton of impl TyCtxt (I counter 21 + a macro).

A single location... won't be much of an impact I guess especially considering the fact that for 20 the effect is still low, at least compared to what the rest of the compiler is doing.

Also note that I pulled the 20 out of my nose. It's been chosen on gut feelings alone. If there is an impact, it might be improved. If there's none, maybe it was chosen wrongly.

Contributor

bors commented on Oct 24

sunny Try build successful - checks-actions
Build commit: 925eba0 (925eba0a75e6e25806ed3a4a654610329bd00254)

Collaborator

rust-timer commented on Oct 24

Contributor

Author

est31 commented on Oct 24

@jonas-schievink nice, someone had the same bright idea as me! I'd attribute the regression in that PR to cases where the brute-force search is better. The most extreme instance of this is probably two impl blocks, a short one followed by a very long one. If the order is right, the current O(n^2) search runs once over one of the short impl block, while looking up into the larger block. The linear algorithm of your PR would put both impl blocks into the hashmap and cause a lot of bureaucracy. In fact, a single long impl block would favour the O(n^2) search even better as.

This gives me lots of ideas to improve the algorithm...

The perf run of your PR shows (next to the regressions) a 2% improvement on syn-opt which is IMO a great achievement. Ten 2% improvements constitute a 19% improvement, 30 such improvements cut the time in half. Very encouraging!

Collaborator

rust-timer commented on Oct 24

Finished benchmarking try commit (925eba0): comparison url.

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. Please note that if the perf results are neutral, you should likely undo the rollup=never given below by specifying rollup- to bors.

Importantly, though, if the results of this run are non-neutral do not roll this PR up -- it will mask other regressions or improvements in the roll up.

@bors rollup=never

Contributor

Author

est31 commented on Oct 24

Up to 3.1% improvements for incr-unchanged of packed-simd-check... very nice.

Member

Mark-Simulacrum commented on Oct 24

I would not look at incremental benchmarks on PRs like this - they're not likely to be super meaningful as an expression of improvement (or regression) because they are skipping parts of the build, and unless we think this optimization permits skipping more (seems unlikely), the percent win there is just larger because the base is smaller.

It looks like this is about a 1% win in practice instruction counts wise, but appears to be a regression on wall times (e.g. rustc middle regressed roughly 2 seconds, it looks like).

I think we are least don't have solid evidence this is a win on the code examples tested by perf. I'm not sure if perhaps switching algorithms when we detect n > 30 or something like that makes sense?

Member

jonas-schievink commented on Oct 24

Maybe test the impact on a crate like https://github.com/stm32-rs/stm32-rs-nightlies/tree/master/stm32f4? I usually use those when working on the overlap checks.

Contributor

Author

est31 commented on Oct 24

edited

Oh right it's red down there... a bit unfortunate :/.

I think I should split up the PR as a next step. It's doing two separate optimizations, better measure them separately.

In any case, the improvement for packed-simd is real, as the crate_inherent_impls_overlap_check query used to take 1.96% of the build, now takes 0.1%.

@Mark-Simulacrum Do you know whether there is a way to show the query times for rustc-middle and other red crates?

Member

Mark-Simulacrum commented on Oct 24

We don't currently record self profile data for rustc compilation.

Contributor

Author

est31 commented on Oct 24

FYI if you're wondering, NodeRef of the alloc crate has 26 inherent impl blocks.

Contributor

Author

est31 commented on Oct 24

Maybe test the impact on a crate like https://github.com/stm32-rs/stm32-rs-nightlies/tree/master/stm32f4? I usually use those when working on the overlap checks.

Wow that crate has approximately a thousand inherent impl blocks per device that you can enable, and in total there are 11 features... If you enable them all you get 14043 impl blocks...

So it's definitely a good real case stress test for overlap checking. It also has a ton of name overlap, making the namespace checks very relevant. I think it's a good benchmark, thanks for suggesting it!

Contributor

Author

est31 commented on Oct 25

Note that the title of this PR as well as of #69009 can be misunderstood, as the entire impl overlap check in the worst case still runs in O(n^2). The worst case happens in the instance of stm32f4 where you have a) tons of impl blocks on the same generic struct but b) parameterized with different types so that they don't overlap. However, c) the impl blocks all contain the same function names, so the early check (that both my and @jonas-schievink 's PRs turn into linear time) registers an overlap, and gives the pair of impl blocks to the code that checks for a type wise overlap of the impl blocks. That function takes two impl blocks and its code is a bit more involved than the heuristics in the file changed by our PRs...

That being said, the earlier code has been changed from quadratic to linear time, and often there is no overlap. There's probably some way to speed up things for stm32f4 in particular by adding more heuristics to keep time down, but there will likely always be ways to construct quadratic time instances... need to look more into things...

est31

force-pushed the

est31:linear_in_impl_count

branch from ee55a93 to 73a7d93

23 days ago

est31

marked this pull request as ready for review

23 days ago

Contributor

Author

est31 commented 23 days ago

The PR is ready for review!

I have done some smaller fixes that made it a bit slower than the version I ran the benchmarks on, e.g. I added sorting to make the output not depend on hash set iterator ordering. I don't want to run all benchmarks yet again (as they are currently quite manual), but this is a table about a subset of the benchmarks:

features components cargo +local-bare check cargo +local check stm32f401,stm32f405 2335,968 30.915s 30.495s stm32f401,stm32f405,stm32f407,stm32f410,
stm32f411 5424,2182 1m33.891s 1m28.858s stm32f401,stm32f405,stm32f407,stm32f410,
stm32f411,stm32f412,stm32f413,stm32f427 9404,3765 3m31.394s 3m18.193s

Also updated the table of the PR description.

r? @matthewjasper

Contributor

Author

est31 commented 22 days ago

@rustbot modify labels: +S-waiting-on-review -S-waiting-on-author

Contributor

Author

est31 commented 12 days ago

@matthewjasper friendly ping!

Member

jyn514 commented 11 days ago

Just out of curiousity, how common is it to have 20+ impl blocks in real world code?

I would expect this to be the main reason rustc is slow on diesel and stm32: rust-lang/rustc-perf#807 (comment)

Contributor

Author

est31 commented 11 days ago

@jyn514 this PR addresses the crate_inherent_impls_overlap_check pass. In your linked comment I couldn't see that pass appear at all, so I assume that it won't improve diesel's compile time.

Contributor

matthewjasper commented 9 days ago

@bors r+

Contributor

bors commented 9 days ago

pushpin Commit 73a7d93 has been approved by matthewjasper

Contributor

bors commented 9 days ago

hourglass Testing commit 73a7d93 with merge 3ce7ca6...

Collaborator

rust-log-analyzer commented 9 days ago

The job dist-s390x-linux failed! Check out the build log: (web) (plain)

Click to see the possible cause of the failure (guessed by this bot)
[TIMING] StdLink { compiler: Compiler { stage: 0, host: TargetSelection { triple: "x86_64-unknown-linux-gnu", file: None } }, target_compiler: Compiler { stage: 0, host: TargetSelection { triple: "x86_64-unknown-linux-gnu", file: None } }, target: TargetSelection { triple: "x86_64-unknown-linux-gnu", file: None } } -- 0.000
[TIMING] Std { target: TargetSelection { triple: "x86_64-unknown-linux-gnu", file: None }, compiler: Compiler { stage: 0, host: TargetSelection { triple: "x86_64-unknown-linux-gnu", file: None } } } -- 88.479
Building LLVM for x86_64-unknown-linux-gnu
running: "cmake" "/checkout/src/llvm-project/llvm" "-G" "Ninja" "-DLLVM_ENABLE_ASSERTIONS=OFF" "-DLLVM_TARGETS_TO_BUILD=AArch64;ARM;Hexagon;MSP430;Mips;NVPTX;PowerPC;RISCV;Sparc;SystemZ;WebAssembly;X86" "-DLLVM_EXPERIMENTAL_TARGETS_TO_BUILD=AVR" "-DLLVM_INCLUDE_EXAMPLES=OFF" "-DLLVM_INCLUDE_TESTS=OFF" "-DLLVM_INCLUDE_DOCS=OFF" "-DLLVM_INCLUDE_BENCHMARKS=OFF" "-DLLVM_ENABLE_TERMINFO=OFF" "-DLLVM_ENABLE_LIBEDIT=OFF" "-DLLVM_ENABLE_BINDINGS=OFF" "-DLLVM_ENABLE_Z3_SOLVER=OFF" "-DLLVM_PARALLEL_COMPILE_JOBS=16" "-DLLVM_TARGET_ARCH=x86_64" "-DLLVM_DEFAULT_TARGET_TRIPLE=x86_64-unknown-linux-gnu" "-DLLVM_ENABLE_ZLIB=ON" "-DLLVM_ENABLE_LIBXML2=OFF" "-DLLVM_VERSION_SUFFIX=-rust-1.50.0-nightly" "-DCMAKE_INSTALL_MESSAGE=LAZY" "-DCMAKE_C_COMPILER_LAUNCHER=sccache" "-DCMAKE_CXX_COMPILER_LAUNCHER=sccache" "-DCMAKE_C_COMPILER=cc" "-DCMAKE_CXX_COMPILER=c++" "-DCMAKE_ASM_COMPILER=cc" "-DCMAKE_C_FLAGS=-ffunction-sections -fdata-sections -fPIC -m64" "-DCMAKE_CXX_FLAGS=-ffunction-sections -fdata-sections -fPIC -m64 -static-libstdc++" "-DCMAKE_INSTALL_PREFIX=/checkout/obj/build/x86_64-unknown-linux-gnu/llvm" "-DCMAKE_ASM_FLAGS= -ffunction-sections -fdata-sections -fPIC -m64" "-DCMAKE_BUILD_TYPE=Release"
CMake Error: The source directory "/checkout/src/llvm-project/llvm" does not exist.
Specify --help for usage, or press the help button on the CMake GUI.
command did not execute successfully, got: exit code: 1


build script failed, must exit now', /cargo/registry/src/github.com-1ecc6299db9ec823/cmake-0.1.44/src/lib.rs:885:5
 finished in 0.011 seconds
failed to run: /checkout/obj/build/bootstrap/debug/bootstrap dist --host s390x-unknown-linux-gnu --target s390x-unknown-linux-gnu
Build completed unsuccessfully in 0:01:29

Contributor

bors commented 9 days ago

broken_heart Test failed - checks-actions

Member

jyn514 commented 9 days ago

Building LLVM for x86_64-unknown-linux-gnu
running: "cmake" "/checkout/src/llvm-project/llvm" "-G" "Ninja" "-DLLVM_ENABLE_ASSERTIONS=OFF" "-DLLVM_TARGETS_TO_BUILD=AArch64;ARM;Hexagon;MSP430;Mips;NVPTX;PowerPC;RISCV;Sparc;SystemZ;WebAssembly;X86" "-DLLVM_EXPERIMENTAL_TARGETS_TO_BUILD=AVR" "-DLLVM_INCLUDE_EXAMPLES=OFF" "-DLLVM_INCLUDE_TESTS=OFF" "-DLLVM_INCLUDE_DOCS=OFF" "-DLLVM_INCLUDE_BENCHMARKS=OFF" "-DLLVM_ENABLE_TERMINFO=OFF" "-DLLVM_ENABLE_LIBEDIT=OFF" "-DLLVM_ENABLE_BINDINGS=OFF" "-DLLVM_ENABLE_Z3_SOLVER=OFF" "-DLLVM_PARALLEL_COMPILE_JOBS=16" "-DLLVM_TARGET_ARCH=x86_64" "-DLLVM_DEFAULT_TARGET_TRIPLE=x86_64-unknown-linux-gnu" "-DLLVM_ENABLE_ZLIB=ON" "-DLLVM_ENABLE_LIBXML2=OFF" "-DLLVM_VERSION_SUFFIX=-rust-1.50.0-nightly" "-DCMAKE_INSTALL_MESSAGE=LAZY" "-DCMAKE_C_COMPILER_LAUNCHER=sccache" "-DCMAKE_CXX_COMPILER_LAUNCHER=sccache" "-DCMAKE_C_COMPILER=cc" "-DCMAKE_CXX_COMPILER=c++" "-DCMAKE_ASM_COMPILER=cc" "-DCMAKE_C_FLAGS=-ffunction-sections -fdata-sections -fPIC -m64" "-DCMAKE_CXX_FLAGS=-ffunction-sections -fdata-sections -fPIC -m64 -static-libstdc++" "-DCMAKE_INSTALL_PREFIX=/checkout/obj/build/x86_64-unknown-linux-gnu/llvm" "-DCMAKE_ASM_FLAGS= -ffunction-sections -fdata-sections -fPIC -m64" "-DCMAKE_BUILD_TYPE=Release"
CMake Error: The source directory "/checkout/src/llvm-project/llvm" does not exist.
Specify --help for usage, or press the help button on the CMake GUI.
command did not execute successfully, got: exit code: 1

cc @rust-lang/infra, I've seen this on a few different PRs now. It seems to be spurious, retrying will usually fix it.

@bors retry

Contributor

bors commented 9 days ago

hourglass Testing commit 73a7d93 with merge c609b2e...

Contributor

bors commented 9 days ago

sunny Test successful - checks-actions
Approved by: matthewjasper
Pushing c609b2e to master...

bors

merged commit c609b2e into rust-lang:master 9 days ago

11 checks passed

rustbot

added this to the 1.50.0 milestone

9 days ago

Member

pietroalbini commented 9 days ago

I've seen this on a few different PRs now. It seems to be spurious, retrying will usually fix it.

That seems to be a spurious network failure while downloading LLVM:

2020-12-20T19:34:57.8161875Z curl: (18) transfer closed with outstanding read data remaining
2020-12-20T19:34:57.8199484Z + [[ 1 -lt 5 ]]
2020-12-20T19:34:57.8199934Z + sleep 1
2020-12-20T19:34:58.8224385Z + (( n++ ))
2020-12-20T19:34:58.8225299Z Command failed. Attempt 2/5:
2020-12-20T19:34:58.8227627Z + echo 'Command failed. Attempt 2/5:'
2020-12-20T19:34:58.8228520Z + true
2020-12-20T19:34:58.8232046Z + sh -c 'rm -f download-src-llvm-project.tar.gz &&         curl -f -sSL -o download-src-llvm-project.tar.gz https://github.com/rust-lang/llvm-project/archive/8d78ad13896b955f630714f386a95ed91b237e3d.tar.gz'
2020-12-20T19:35:03.1438299Z curl: (18) transfer closed with outstanding read data remaining
2020-12-20T19:35:03.1464331Z + [[ 2 -lt 5 ]]
2020-12-20T19:35:03.1464857Z + sleep 2
2020-12-20T19:35:05.1479280Z + (( n++ ))
2020-12-20T19:35:05.1481078Z + echo 'Command failed. Attempt 3/5:'
2020-12-20T19:35:05.1481971Z + true
2020-12-20T19:35:05.1482774Z Command failed. Attempt 3/5:
2020-12-20T19:35:05.1489474Z + sh -c 'rm -f download-src-llvm-project.tar.gz &&         curl -f -sSL -o download-src-llvm-project.tar.gz https://github.com/rust-lang/llvm-project/archive/8d78ad13896b955f630714f386a95ed91b237e3d.tar.gz'
2020-12-20T19:35:14.8627824Z curl: (18) transfer closed with outstanding read data remaining
2020-12-20T19:35:14.8652110Z + [[ 3 -lt 5 ]]
2020-12-20T19:35:14.8652926Z + sleep 3
2020-12-20T19:35:17.8666398Z + (( n++ ))
2020-12-20T19:35:17.8666991Z Command failed. Attempt 4/5:
2020-12-20T19:35:17.8668227Z + echo 'Command failed. Attempt 4/5:'
2020-12-20T19:35:17.8668735Z + true
2020-12-20T19:35:17.8670734Z + sh -c 'rm -f download-src-llvm-project.tar.gz &&         curl -f -sSL -o download-src-llvm-project.tar.gz https://github.com/rust-lang/llvm-project/archive/8d78ad13896b955f630714f386a95ed91b237e3d.tar.gz'
2020-12-20T19:35:21.0742337Z curl: (18) transfer closed with outstanding read data remaining
2020-12-20T19:35:21.0771810Z + [[ 4 -lt 5 ]]
2020-12-20T19:35:21.0772636Z + sleep 4
2020-12-20T19:35:25.0784632Z + (( n++ ))
2020-12-20T19:35:25.0785167Z Command failed. Attempt 5/5:
2020-12-20T19:35:25.0786668Z + echo 'Command failed. Attempt 5/5:'
2020-12-20T19:35:25.0787170Z + true
2020-12-20T19:35:25.0789147Z + sh -c 'rm -f download-src-llvm-project.tar.gz &&         curl -f -sSL -o download-src-llvm-project.tar.gz https://github.com/rust-lang/llvm-project/archive/8d78ad13896b955f630714f386a95ed91b237e3d.tar.gz'
2020-12-20T19:35:31.2565957Z curl: (18) transfer closed with outstanding read data remaining
2020-12-20T19:35:31.2597156Z + [[ 5 -lt 5 ]]
2020-12-20T19:35:31.2598171Z The command has failed after 5 attempts.

About Joyk


Aggregate valuable and interesting links.
Joyk means Joy of geeK