Description

This PR adds a specialization SpecFromElem for () which allows to significantly reduce vec![(), N] time in debug builds (specifically, tests) turning it from observable O(n) to O(1).

Observing the change

The problem this PR aims to fix explicitly is slow vec![(), N] on big Ns which may appear in tests (see Background section for more details).

The minimal example to see the problem:

#![feature(test)]

extern crate test;

#[cfg(test)]
mod tests {
    const HUGE_SIZE: usize = i32::MAX as usize + 1;

    #[bench]
    fn bench_vec_literal(b: &mut test::Bencher) {
        b.iter(|| vec![(); HUGE_SIZE]);
    }

    #[bench]
    fn bench_vec_repeat(b: &mut test::Bencher) {
        b.iter(|| [(); 1].repeat(HUGE_SIZE));
    }
}

Output

cargo +nightly  -- --report-time -Zunstable-options
   Compiling huge-zst-vec-literal-bench v0.1.0 (/home/progrm_jarvis/RustroverProjects/huge-zst-vec-literal-bench)
    Finished  [unoptimized + debuginfo] target(s)  0.31s
     Running unittests src/lib.rs (target/debug/deps/huge_zst_vec_literal_bench-e43b1ef287ba8b36)

running 2 tests
 tests::bench_vec_repeat  ... ok 0.000s
 tests::bench_vec_literal ... ok 14.382s

 result: ok. 2 passed 0 failed 0 ignored 0 measured 0 filtered out finished  14.38s

   Doc-tests huge-zst-vec-literal-bench

running 0 tests

 result: ok. 0 passed 0 failed 0 ignored 0 measured 0 filtered out finished  0.00s

Important

This problem is only observable in Debug (unoptimized) builds, while Release (optimized) ones do not observe this problem. It still is worth fixing it, IMO, since the original scenario observes the problem in tests for which optimizations are disabled by default and it seems unreasonable to override this for the whole project while the problem is very local.

Background

While working on a crate for a custom data format which has an i32::MAX limit on its list's sizes, I wrote the following test to ensure that this invariant is upheld:

#[test]
fn lists_should_have_i32_size() {
    assert!(
        RawNbtList::try_from(vec![(); i32::MAX as usize]).is_ok(),
        "lists should permit the size up to {}",
        i32::MAX
    );
    assert!(
        RawNbtList::try_from(vec![(); i32::MAX as usize + 1]).is_err(),
        "lists should have the size of at most {}",
        i32::MAX
    );
}

Soon I discovered that this takes ≈3−−4s per assertion on my machine, almost all of which is spent on vec![..].
While this would be logical for a non-ZST vector (which would require actual O(n) allocation), here () was used intentionally considering that for ZSTs size-changing operations should anyway be O(1) (at least from allocator perspective). Yet, this "overhead" is logical if we consider that in general case clone() (which is used by Vec literal) may have a non-trivial implementation and thus each element has to actually be visited (even if they occupy no space).

In my specific case, the following (contextual) equivalent solved the issue:

#[test]
fn lists_should_have_i32_size() {
    assert!(
        RawNbtList::try_from([(); 1].repeat(i32::MAX as usize)).is_ok(),
        "lists should permit the size up to {}",
        i32::MAX
    );
    assert!(
        RawNbtList::try_from([(); 1].repeat(i32::MAX as usize + 1)).is_err(),
        "lists should have the size of at most {}",
        i32::MAX
    );
}

which works since repeat explicitly uses T: Copy and so does not have to think about non-trivial Clone.

But it still may be counter-intuitive for users to observe such long time on the "canonical" vec literal thus the PR.

Generic solution

This change is explicitly non-generic. Initially I considered it possible to implement in generically, but this would require the specialization to have the following type requirements:

✅ the type must be a ZST: easily done via

if core::mem::size_of::<T>() == 0 {
  todo!("specialization")
}

use core::mem::SizedTypeProperties;
if T::IS_ZST {
  todo!("specialization")
}

:white_check_mark: the type must implement Copy`: implementable non-conflictable via a separate specialization:

trait IsCopyZst: Sized {
  fn is_copy_zst() -> bool;
}
impl<T> IsCopyZst for T {
  fn is_copy_zst() -> bool {
      false
  }
}
impl<T: Copy> IsCopyZst for T {
  fn is_copy_zst() -> bool {
      Self::IS_ZST
  }
}

❌ the type should have a trivial Clone implementation: since vec![t; n] is specified to use clone(), we can only use this "performance optimization" when we are guaranteed that clone() does nothing except for copying.

The latter is the real blocker for a generic fix since I am unaware of any way to get this information in a compiler-guaranteed way.

While there may be a fix for this (my friend @CertainLach has suggested a potential solution by an perma-unstable fn in Clone like is_trivially_cloneable() defaulting to false and only overridable by rustc on derive), this is surely out of this PRs scope.

feat: specialize `SpecFromElem` for `()` by JarvisCraft · Pull Request #118094 ·...

Description

Observing the change

Background

Generic solution

Recommend

求助分析：老人投资血本无归，望建议

Institute for Progress

实探联想天津产业园!5G+智造深入生产制造核心环节

速览Android14全新升级

editor/code: add option to suppress internal error notifications by jprochazk ·...

Add allow-by-default lint for unit bindings by jieyouxu · Pull Request #112380 ·...

Zorin OS 17 正在重新定义 Linux 发行版的视觉体验

余承东怒怼懂车帝“坑人”，究竟为何_原创_科技频道首页_财经网 - CAIJING.COM.CN

Why Norway is rethinking its reliance on electric cars - Vox

AI 招聘平台 HelloBoss 获数百万美元 Pre-A 轮融资，云启资本领投

About Joyk