Nearly Divisionless Random Integer Generation On Various Systems

It is common in software to need random integers within a range of values. For example, you may need to pick an object at random in an array. Random shuffling algorithms require such random integers.

Typically, you generate a regular integers (say a 64-bit integer). From these integers, you find a way to get integers without a range. O’Neill showed that this apparently unimportant operation can be more expensive than generating the original random integers .

Naively, you could take the random integer and compute the remainder of the division by the size of the interval. It works because the remainder of the division by D is always smaller than D. Yet it introduces a statistical bias. You can do better without being slower. The conventional techniques supported in Java and Go require at least one or two integer division per integer generated. Unfortunately, division instructions are relatively expensive.

If you are willing to suffer a slight statistical bias, you can generate a floating-point values in [0,1) and multiply the result by the size of the interval. It avoids the division but introduces other overhead.

There is a better approach that requires no division most of the time :

uint64_t nearlydivisionless ( uint64_t s ) {
  uint64_t x = random64 () ;
  __uint128_t m = ( __uint128_t ) x * ( __uint128_t ) s;
  uint64_t l = ( uint64_t ) m;
  if (l < s) {
    uint64_t t = -s % s;
    while (l < t) {
      x = random64 () ;
      m = ( __uint128_t ) x * ( __uint128_t ) s;
      l = ( uint64_t ) m;
    }
  }
  return m >> 64;
}

Let us suppose that I shuffle an array of size 1000. I generate 64-bit integers (or floating-point numbers) which I convert to indexes. I use C++ but I reimplement the algorithm used in Java. Let me look at the number of nanoseconds per input key being shuffled on a Skylake processor (Intel):

Java’s approach 7.30 ns Floating point approach (biased) 6.23 ns Nearly division-free 1.91 ns

So the division-free approach is much better.

Is this result robust to the hardware? Let us try on Cannon Lake processor where the division instruction is faster…

Java’s approach 3.75 ns Floating point approach (biased) 8.24 ns Nearly division-free 2.53 ns

The division-free approach is less beneficial because of the faster division, but the gains are still there.

What about an ARM processor? Let us try on an Ampere Skylark.

Java’s approach 20.67 ns Floating point approach (biased) 14.73 ns Nearly division-free 8.24 ns

Again, the division-free approach wins.

Practically speaking, avoiding integer divisions is a good way to generate faster code.

I make my code available . To ease reproducibility, I have used docker containers: you should be able to reproduce my results.

Further reading: Fast Random Integer Generation in an Interval , ACM Transactions on Modeling and Computer Simulation 29 (1), 2019

Recommend

《纽约客》上一篇长文，写最近古生物学上的一个重大发现，人类由此获得一批化石证据，...

Backer

科学证明了！玩手机使人焦虑玩手机使人上瘾

科学松鼠会 » 生活疑问：中性笔笔芯后边的那一段是啥？

世界上最轻的固体——气凝胶，外观上呈现出近乎透明的浅蓝色，如同凝固的烟和被困住的云...

翻车警告！分享下我今年 5 月在网上配近视镜经历（店家也在 V2ex）

『互联网架构』软件架构-解密电商系统-交易分库分表（75）

PROTON | NEW python framework to facilitate rapid server-side development

这几天刚刚发生的事，什么信号？孙宏斌让所有北京开发商惊掉了下巴。

智能门锁的千亿市场，为何迟迟不来？

About Joyk