Turn quadratic time on number of impl blocks into linear time
source link: https://github.com/rust-lang/rust/pull/78317
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.
Previously, if you had a lot of inherent impl blocks on a type like:
struct Foo; impl Foo { fn foo_1() {} } // ... impl Foo { fn foo_100_000() {} }
The compiler would be very slow at processing it, because
an internal algorithm would run in O(n^2), where n is the number
of impl blocks. Now, we add a new algorithm that allocates but
is faster asymptotically.
Comparing rustc nightly with a local build of rustc as of this PR (results in seconds):
N real time before real time after 4_000 0.57 0.46 8_000 1.31 0.84 16_000 3.56 1.69 32_000 10.60 3.73I've tuned up the numbers to make the effect larger than the startup noise of rustc, but the asymptotic difference should hold for smaller n as well.
Note: current state of the PR omits error messages if there are other errors present already. For now, I'm mainly interested in a perf run to study whether this issue is present at all. Please queue one for this PR. Thanks!
Collaborator
rust-highfive commented on Oct 24
(rust_highfive has picked a reviewer for you, use r? to override)
Contributor
bugadani left a comment
Just out of curiousity, how common is it to have 20+ impl blocks in real world code?
// faster asymptotic runtime.
if impls.len() < 20 {
for (i, &impl1_def_id) in impls.iter().enumerate() {
for &impl2_def_id in &impls[(i + 1)..] {
Just out of curiousity, how common is it to have 20+ impl blocks in real world code?
Probably quite rare, however note that impl blocks might be distributed across multiple files. IIRC servo has something like this. I might misremember though. There might be no such case in the perf's testsuite at all. I just stumbled onto this code and it felt bad. Anyways, the swap I add above should have an impact for smaller number of impl blocks already. Maybe two perf runs are needed? IDK.
Collaborator
rust-timer commented on Oct 24
Awaiting bors try build completion
I guess rustc has a ton of impl TyCtxt (I counter 21 + a macro).
A single location... won't be much of an impact I guess especially considering the fact that for 20 the effect is still low, at least compared to what the rest of the compiler is doing.
Also note that I pulled the 20 out of my nose. It's been chosen on gut feelings alone. If there is an impact, it might be improved. If there's none, maybe it was chosen wrongly.
Try build successful - checks-actions
Build commit: 925eba0 (925eba0a75e6e25806ed3a4a654610329bd00254
)
Collaborator
rust-timer commented on Oct 24
@jonas-schievink nice, someone had the same bright idea as me! I'd attribute the regression in that PR to cases where the brute-force search is better. The most extreme instance of this is probably two impl blocks, a short one followed by a very long one. If the order is right, the current O(n^2) search runs once over one of the short impl block, while looking up into the larger block. The linear algorithm of your PR would put both impl blocks into the hashmap and cause a lot of bureaucracy. In fact, a single long impl block would favour the O(n^2) search even better as.
This gives me lots of ideas to improve the algorithm...
The perf run of your PR shows (next to the regressions) a 2% improvement on syn-opt which is IMO a great achievement. Ten 2% improvements constitute a 19% improvement, 30 such improvements cut the time in half. Very encouraging!
Collaborator
rust-timer commented on Oct 24
Finished benchmarking try commit (925eba0): comparison url.
Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. Please note that if the perf results are neutral, you should likely undo the rollup=never given below by specifying rollup-
to bors.
Importantly, though, if the results of this run are non-neutral do not roll this PR up -- it will mask other regressions or improvements in the roll up.
@bors rollup=never
Member
Mark-Simulacrum commented on Oct 24
I would not look at incremental benchmarks on PRs like this - they're not likely to be super meaningful as an expression of improvement (or regression) because they are skipping parts of the build, and unless we think this optimization permits skipping more (seems unlikely), the percent win there is just larger because the base is smaller.
It looks like this is about a 1% win in practice instruction counts wise, but appears to be a regression on wall times (e.g. rustc middle regressed roughly 2 seconds, it looks like).
I think we are least don't have solid evidence this is a win on the code examples tested by perf. I'm not sure if perhaps switching algorithms when we detect n > 30 or something like that makes sense?
Member
jonas-schievink commented on Oct 24
Maybe test the impact on a crate like https://github.com/stm32-rs/stm32-rs-nightlies/tree/master/stm32f4? I usually use those when working on the overlap checks.
Oh right it's red down there... a bit unfortunate :/.
I think I should split up the PR as a next step. It's doing two separate optimizations, better measure them separately.
In any case, the improvement for packed-simd is real, as the crate_inherent_impls_overlap_check
query used to take 1.96% of the build, now takes 0.1%.
@Mark-Simulacrum Do you know whether there is a way to show the query times for rustc-middle and other red crates?
Member
Mark-Simulacrum commented on Oct 24
We don't currently record self profile data for rustc compilation.
Maybe test the impact on a crate like https://github.com/stm32-rs/stm32-rs-nightlies/tree/master/stm32f4? I usually use those when working on the overlap checks.
Wow that crate has approximately a thousand inherent impl blocks per device that you can enable, and in total there are 11 features... If you enable them all you get 14043 impl blocks...
So it's definitely a good real case stress test for overlap checking. It also has a ton of name overlap, making the namespace checks very relevant. I think it's a good benchmark, thanks for suggesting it!
Note that the title of this PR as well as of #69009 can be misunderstood, as the entire impl overlap check in the worst case still runs in O(n^2)
. The worst case happens in the instance of stm32f4
where you have a) tons of impl blocks on the same generic struct but b) parameterized with different types so that they don't overlap. However, c) the impl blocks all contain the same function names, so the early check (that both my and @jonas-schievink 's PRs turn into linear time) registers an overlap, and gives the pair of impl blocks to the code that checks for a type wise overlap of the impl blocks. That function takes two impl blocks and its code is a bit more involved than the heuristics in the file changed by our PRs...
That being said, the earlier code has been changed from quadratic to linear time, and often there is no overlap. There's probably some way to speed up things for stm32f4
in particular by adding more heuristics to keep time down, but there will likely always be ways to construct quadratic time instances... need to look more into things...
Contributor
Author
est31 commented 23 days ago
The PR is ready for review!
I have done some smaller fixes that made it a bit slower than the version I ran the benchmarks on, e.g. I added sorting to make the output not depend on hash set iterator ordering. I don't want to run all benchmarks yet again (as they are currently quite manual), but this is a table about a subset of the benchmarks:
features components cargo +local-bare check cargo +local check stm32f401,stm32f405 2335,968 30.915s 30.495s stm32f401,stm32f405,stm32f407,stm32f410,stm32f411 5424,2182 1m33.891s 1m28.858s stm32f401,stm32f405,stm32f407,stm32f410,
stm32f411,stm32f412,stm32f413,stm32f427 9404,3765 3m31.394s 3m18.193s
Also updated the table of the PR description.
Contributor
Author
est31 commented 22 days ago
@rustbot modify labels: +S-waiting-on-review -S-waiting-on-author
Contributor
Author
est31 commented 12 days ago
@matthewjasper friendly ping!
Member
jyn514 commented 11 days ago
Just out of curiousity, how common is it to have 20+ impl blocks in real world code?
I would expect this to be the main reason rustc is slow on diesel and stm32: rust-lang/rustc-perf#807 (comment)
Contributor
Author
est31 commented 11 days ago
@jyn514 this PR addresses the crate_inherent_impls_overlap_check
pass. In your linked comment I couldn't see that pass appear at all, so I assume that it won't improve diesel's compile time.
Contributor
matthewjasper commented 9 days ago
@bors r+
Contributor
bors commented 9 days ago
Commit 73a7d93 has been approved by matthewjasper
Contributor
bors commented 9 days ago
Collaborator
rust-log-analyzer commented 9 days ago
The job dist-s390x-linux
failed! Check out the build log: (web) (plain)
[TIMING] StdLink { compiler: Compiler { stage: 0, host: TargetSelection { triple: "x86_64-unknown-linux-gnu", file: None } }, target_compiler: Compiler { stage: 0, host: TargetSelection { triple: "x86_64-unknown-linux-gnu", file: None } }, target: TargetSelection { triple: "x86_64-unknown-linux-gnu", file: None } } -- 0.000
[TIMING] Std { target: TargetSelection { triple: "x86_64-unknown-linux-gnu", file: None }, compiler: Compiler { stage: 0, host: TargetSelection { triple: "x86_64-unknown-linux-gnu", file: None } } } -- 88.479
Building LLVM for x86_64-unknown-linux-gnu
running: "cmake" "/checkout/src/llvm-project/llvm" "-G" "Ninja" "-DLLVM_ENABLE_ASSERTIONS=OFF" "-DLLVM_TARGETS_TO_BUILD=AArch64;ARM;Hexagon;MSP430;Mips;NVPTX;PowerPC;RISCV;Sparc;SystemZ;WebAssembly;X86" "-DLLVM_EXPERIMENTAL_TARGETS_TO_BUILD=AVR" "-DLLVM_INCLUDE_EXAMPLES=OFF" "-DLLVM_INCLUDE_TESTS=OFF" "-DLLVM_INCLUDE_DOCS=OFF" "-DLLVM_INCLUDE_BENCHMARKS=OFF" "-DLLVM_ENABLE_TERMINFO=OFF" "-DLLVM_ENABLE_LIBEDIT=OFF" "-DLLVM_ENABLE_BINDINGS=OFF" "-DLLVM_ENABLE_Z3_SOLVER=OFF" "-DLLVM_PARALLEL_COMPILE_JOBS=16" "-DLLVM_TARGET_ARCH=x86_64" "-DLLVM_DEFAULT_TARGET_TRIPLE=x86_64-unknown-linux-gnu" "-DLLVM_ENABLE_ZLIB=ON" "-DLLVM_ENABLE_LIBXML2=OFF" "-DLLVM_VERSION_SUFFIX=-rust-1.50.0-nightly" "-DCMAKE_INSTALL_MESSAGE=LAZY" "-DCMAKE_C_COMPILER_LAUNCHER=sccache" "-DCMAKE_CXX_COMPILER_LAUNCHER=sccache" "-DCMAKE_C_COMPILER=cc" "-DCMAKE_CXX_COMPILER=c++" "-DCMAKE_ASM_COMPILER=cc" "-DCMAKE_C_FLAGS=-ffunction-sections -fdata-sections -fPIC -m64" "-DCMAKE_CXX_FLAGS=-ffunction-sections -fdata-sections -fPIC -m64 -static-libstdc++" "-DCMAKE_INSTALL_PREFIX=/checkout/obj/build/x86_64-unknown-linux-gnu/llvm" "-DCMAKE_ASM_FLAGS= -ffunction-sections -fdata-sections -fPIC -m64" "-DCMAKE_BUILD_TYPE=Release"
CMake Error: The source directory "/checkout/src/llvm-project/llvm" does not exist.
Specify --help for usage, or press the help button on the CMake GUI.
command did not execute successfully, got: exit code: 1
build script failed, must exit now', /cargo/registry/src/github.com-1ecc6299db9ec823/cmake-0.1.44/src/lib.rs:885:5
finished in 0.011 seconds
failed to run: /checkout/obj/build/bootstrap/debug/bootstrap dist --host s390x-unknown-linux-gnu --target s390x-unknown-linux-gnu
Build completed unsuccessfully in 0:01:29
Contributor
bors commented 9 days ago
Test failed - checks-actions
Member
jyn514 commented 9 days ago
Building LLVM for x86_64-unknown-linux-gnu
running: "cmake" "/checkout/src/llvm-project/llvm" "-G" "Ninja" "-DLLVM_ENABLE_ASSERTIONS=OFF" "-DLLVM_TARGETS_TO_BUILD=AArch64;ARM;Hexagon;MSP430;Mips;NVPTX;PowerPC;RISCV;Sparc;SystemZ;WebAssembly;X86" "-DLLVM_EXPERIMENTAL_TARGETS_TO_BUILD=AVR" "-DLLVM_INCLUDE_EXAMPLES=OFF" "-DLLVM_INCLUDE_TESTS=OFF" "-DLLVM_INCLUDE_DOCS=OFF" "-DLLVM_INCLUDE_BENCHMARKS=OFF" "-DLLVM_ENABLE_TERMINFO=OFF" "-DLLVM_ENABLE_LIBEDIT=OFF" "-DLLVM_ENABLE_BINDINGS=OFF" "-DLLVM_ENABLE_Z3_SOLVER=OFF" "-DLLVM_PARALLEL_COMPILE_JOBS=16" "-DLLVM_TARGET_ARCH=x86_64" "-DLLVM_DEFAULT_TARGET_TRIPLE=x86_64-unknown-linux-gnu" "-DLLVM_ENABLE_ZLIB=ON" "-DLLVM_ENABLE_LIBXML2=OFF" "-DLLVM_VERSION_SUFFIX=-rust-1.50.0-nightly" "-DCMAKE_INSTALL_MESSAGE=LAZY" "-DCMAKE_C_COMPILER_LAUNCHER=sccache" "-DCMAKE_CXX_COMPILER_LAUNCHER=sccache" "-DCMAKE_C_COMPILER=cc" "-DCMAKE_CXX_COMPILER=c++" "-DCMAKE_ASM_COMPILER=cc" "-DCMAKE_C_FLAGS=-ffunction-sections -fdata-sections -fPIC -m64" "-DCMAKE_CXX_FLAGS=-ffunction-sections -fdata-sections -fPIC -m64 -static-libstdc++" "-DCMAKE_INSTALL_PREFIX=/checkout/obj/build/x86_64-unknown-linux-gnu/llvm" "-DCMAKE_ASM_FLAGS= -ffunction-sections -fdata-sections -fPIC -m64" "-DCMAKE_BUILD_TYPE=Release"
CMake Error: The source directory "/checkout/src/llvm-project/llvm" does not exist.
Specify --help for usage, or press the help button on the CMake GUI.
command did not execute successfully, got: exit code: 1
cc @rust-lang/infra, I've seen this on a few different PRs now. It seems to be spurious, retrying will usually fix it.
@bors retry
Contributor
bors commented 9 days ago
Contributor
bors commented 9 days ago
Test successful - checks-actions
Approved by: matthewjasper
Pushing c609b2e to master...
Member
pietroalbini commented 9 days ago
I've seen this on a few different PRs now. It seems to be spurious, retrying will usually fix it.
That seems to be a spurious network failure while downloading LLVM:
2020-12-20T19:34:57.8161875Z curl: (18) transfer closed with outstanding read data remaining
2020-12-20T19:34:57.8199484Z + [[ 1 -lt 5 ]]
2020-12-20T19:34:57.8199934Z + sleep 1
2020-12-20T19:34:58.8224385Z + (( n++ ))
2020-12-20T19:34:58.8225299Z Command failed. Attempt 2/5:
2020-12-20T19:34:58.8227627Z + echo 'Command failed. Attempt 2/5:'
2020-12-20T19:34:58.8228520Z + true
2020-12-20T19:34:58.8232046Z + sh -c 'rm -f download-src-llvm-project.tar.gz && curl -f -sSL -o download-src-llvm-project.tar.gz https://github.com/rust-lang/llvm-project/archive/8d78ad13896b955f630714f386a95ed91b237e3d.tar.gz'
2020-12-20T19:35:03.1438299Z curl: (18) transfer closed with outstanding read data remaining
2020-12-20T19:35:03.1464331Z + [[ 2 -lt 5 ]]
2020-12-20T19:35:03.1464857Z + sleep 2
2020-12-20T19:35:05.1479280Z + (( n++ ))
2020-12-20T19:35:05.1481078Z + echo 'Command failed. Attempt 3/5:'
2020-12-20T19:35:05.1481971Z + true
2020-12-20T19:35:05.1482774Z Command failed. Attempt 3/5:
2020-12-20T19:35:05.1489474Z + sh -c 'rm -f download-src-llvm-project.tar.gz && curl -f -sSL -o download-src-llvm-project.tar.gz https://github.com/rust-lang/llvm-project/archive/8d78ad13896b955f630714f386a95ed91b237e3d.tar.gz'
2020-12-20T19:35:14.8627824Z curl: (18) transfer closed with outstanding read data remaining
2020-12-20T19:35:14.8652110Z + [[ 3 -lt 5 ]]
2020-12-20T19:35:14.8652926Z + sleep 3
2020-12-20T19:35:17.8666398Z + (( n++ ))
2020-12-20T19:35:17.8666991Z Command failed. Attempt 4/5:
2020-12-20T19:35:17.8668227Z + echo 'Command failed. Attempt 4/5:'
2020-12-20T19:35:17.8668735Z + true
2020-12-20T19:35:17.8670734Z + sh -c 'rm -f download-src-llvm-project.tar.gz && curl -f -sSL -o download-src-llvm-project.tar.gz https://github.com/rust-lang/llvm-project/archive/8d78ad13896b955f630714f386a95ed91b237e3d.tar.gz'
2020-12-20T19:35:21.0742337Z curl: (18) transfer closed with outstanding read data remaining
2020-12-20T19:35:21.0771810Z + [[ 4 -lt 5 ]]
2020-12-20T19:35:21.0772636Z + sleep 4
2020-12-20T19:35:25.0784632Z + (( n++ ))
2020-12-20T19:35:25.0785167Z Command failed. Attempt 5/5:
2020-12-20T19:35:25.0786668Z + echo 'Command failed. Attempt 5/5:'
2020-12-20T19:35:25.0787170Z + true
2020-12-20T19:35:25.0789147Z + sh -c 'rm -f download-src-llvm-project.tar.gz && curl -f -sSL -o download-src-llvm-project.tar.gz https://github.com/rust-lang/llvm-project/archive/8d78ad13896b955f630714f386a95ed91b237e3d.tar.gz'
2020-12-20T19:35:31.2565957Z curl: (18) transfer closed with outstanding read data remaining
2020-12-20T19:35:31.2597156Z + [[ 5 -lt 5 ]]
2020-12-20T19:35:31.2598171Z The command has failed after 5 attempts.
Recommend
About Joyk
Aggregate valuable and interesting links.
Joyk means Joy of geeK