[perf] cache type info for ParamEnv #123058

Conversation

Contributor

This is an attempt to mitigate some of the perf regressions in #122553 (comment), but seems worth to test and land separately, since it is mostly unrelated to that PR.

rustbot

added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

labels

Mar 25, 2024

compiler/rustc_middle/src/ty/list.rs

Outdated

Comment on lines

191 to 214

		let data_ptr = ptr::addr_of!(self.skel.data).cast::<T>();
		// SAFETY: `data_ptr` has the same provenance as `self` and can therefore
		// access the `self.len` elements stored at `self.data`.
		// Note that we specifically don't reborrow `&self.skel.data`, because that
		// would give us a pointer with provenance over 0 bytes.
		unsafe { slice::from_raw_parts(data_ptr, self.skel.len) }

Contributor

Author

This was UB before.

Member

lukas-code reacted with heart emoji

This comment has been minimized.

rustbot

added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label

Mar 25, 2024

Contributor

⌛ Trying commit 4a99cdc with merge be2a772...

Contributor

💥 Test timed out

lukas-code reacted with thumbs down emoji

Contributor

Author

uuh idk what bors is on about, the artifact looks like it got built? Let's try this:

☀️ Try build successful - checks-actions
Build commit: be2a772 (be2a772ea7aa3f1114b45c8fb9dd2a2466a5c152)

This comment has been minimized.

Contributor

Author

well that worked i guess?

Collaborator

Finished benchmarking commit (be2a772): comparison URL.

Overall result: ❌✅ regressions and improvements - ACTION NEEDED

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please fix the regressions and do another perf run. If the next run shows neutral or positive results, the label will be automatically removed.

@bors rollup=never
@rustbot label: -S-waiting-on-perf +perf-regression

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

mean	range	count
Regressions ❌ (primary)	0.4%	[0.3%, 0.6%]	4
Regressions ❌ (secondary)	0.5%	[0.5%, 0.5%]	3
Improvements ✅ (primary)	-1.7%	[-5.0%, -0.2%]	20
Improvements ✅ (secondary)	-0.3%	[-0.4%, -0.2%]	4
All ❌✅ (primary)	-1.3%	[-5.0%, 0.6%]	24

Max RSS (memory usage)

Results

Cycles

Results

Binary size

This benchmark run did not return any relevant results for this metric.

Bootstrap: 672.58s -> 671.55s (-0.15%)
Artifact size: 315.08 MiB -> 315.13 MiB (0.01%)

Contributor

Author

The regression in token-stream-stress looks like it's probably noise:

The regressions in bitmaps might be legit, but to me it seems like the improvements outweigh the regressions here.

r? @lcnr or compiler

Contributor

☔ The latest upstream changes (presumably #123098) made this pull request unmergeable. Please resolve the merge conflicts.

Contributor

☔ The latest upstream changes (presumably #123327) made this pull request unmergeable. Please resolve the merge conflicts.

compiler/rustc_middle/src/ty/list.rs

Outdated

		#[repr(C)]
		pub struct ListWithCachedTypeInfo<T> {
		skel: ListWithCachedTypeInfoSkeleton<T>,
		opaque: OpaqueListContents,

Contributor

why is that moved out of ListSkeleton?

Contributor

Author

Because accessing a field with an extern type tail will always panic, because we don't know the align of the extern type and therefore the padding before the field. So basically if we just put List as the tail of ListWithCachedTypeInfo, then any attempt to access the list will panic.

Contributor

that does not seem right to me 🤔 it should be totally fine to access an unsized field which contains an extern type as long as you never try to do a place-to-value conversion. I don't think we should ever copy ListSkeleton to the stack. Which use of ListSkeleton causes issues if you keep the current setup of having OpaqueListContents as a field of it?

https://play.rust-lang.org/?version=nightly&mode=debug&edition=2021&gist=a47ebf1e7a6ce3908b3ab3fcdd6ecc8d

Contributor

Author

Hmmm apparently there is a special case if the field with an extern type tail is the only field. Which i guess makes sense since there can't be any padding in that case. If you change the outer type to have some extra data, your example will panic: https://play.rust-lang.org/?version=nightly&mode=debug&edition=2021&gist=92bafecde22b9fd72a3ddb9d9018bc68

I don't think we should ever copy ListSkeleton to the stack

I agree. If my code does that it's not intended.

Contributor

lcnr

left a comment

r=me after explaining the changes in ty/list.rs to me 🤔

Contributor

lcnr

left a comment

🤔 the panic with extern types is annoying, but I understand the issue now.

Looking at this again, we now duplicate the unsafe code for List which is somewhat intricate and I am a bit worried that it ends up getting out of sync. Could we instead have a single type:

struct RawList<H, T> {
    header: H,
    length: usize,
    data: [T; 0],
    opaque: OpaqueListConten,
}

This should avoid both the "extern type alignment" issue and aloows us to remove the code duplication.

Sorry for not thinking of that during the first review round 😅

lukas-code reacted with thumbs up emoji

Contributor

@bors r+ rollup=never

Contributor

📌 Commit 94d61d8 has been approved by lcnr

It is now in the queue for this repository.

bors

added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion.

and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties.

labels

Apr 6, 2024

Contributor

⌛ Testing commit 94d61d8 with merge 087ae97...

Contributor

☀️ Test successful - checks-actions
Approved by: lcnr
Pushing 087ae97 to master...

Collaborator

Finished benchmarking commit (087ae97): comparison URL.

Overall result: ✅ improvements - no action needed

@rustbot label: -perf-regression

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

mean	range	count
Regressions ❌ (primary)	-	-	0
Regressions ❌ (secondary)	-	-	0
Improvements ✅ (primary)	-0.9%	[-4.9%, -0.2%]	47
Improvements ✅ (secondary)	-	-	0
All ❌✅ (primary)	-0.9%	[-4.9%, -0.2%]	47

lukas-code reacted with hooray emoji

Reviewers

lcnr

lcnr left review comments

Assignees

lcnr

Labels

merged-by-bors This PR was explicitly merged by bors. S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Projects

None yet

Milestone

1.79.0

Development

Successfully merging this pull request may close these issues.

None yet

6 participants

[perf] cache type info for ParamEnv by lukas-code · Pull Request #123058 · rust-...

[perf] cache type info for ParamEnv #123058

Conversation

This comment has been minimized.

This comment has been minimized.

Overall result: ❌✅ regressions and improvements - ACTION NEEDED

Instruction count

Max RSS (memory usage)

Cycles

Binary size

Overall result: ✅ improvements - no action needed

Instruction count

Max RSS (memory usage)

Cycles

Binary size

Recommend

债务压顶流言难止，万科找上了“不差钱”的GIC

比亚迪的电、华为的「魂」，外资车企找到卖车「新姿势」

一路狂奔的百度智能云，正在改变云的模样

高科技不可少 “老套路”也要用

Docker in Action: Downloading Images, Building Containers, and Exposing Ports

赵柏基与张勇，失落的“安达信黑帮”精英们

杜比面向所有高端电影院线提供杜比视界和杜比全景声，拓展卓越观影体验

Amazon Prime US customer numbers jump 8% to 180 million, an all-time high | Tech...

N1发布九年后，小牛电动如何用N系新品再创两轮电动车“第一次”？

暴雪国服：完整保留用户数据；马云 5 年首发文：认清自己，向大公司病开刀；图灵奖最...

About Joyk