35

Efficiently Checking if a Number Has Two Consecutive Bits Set | Zeeshan Qureshi

 5 years ago
source link: https://til.zqureshi.in/posts/2018/08/27/efficiently-checking-if-a-number-has-two-consecutive-bits-set/?amp%3Butm_source=golang
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.
Last week someone I mentor asked for help with a programming challenge thatinvolved some bit twiddling. The first part of the problem wasA number is known as special bit number if its binary representation contains atleast twoconsecutive 1’s or set bits. For example 7 wih binary representation 111 is a specialbit number. Similarly 3 (11) is also a special bit number.The naive implementation was quite simple but got me thinking if there were better options.1. Right Shift and CounterThis is the most straightforward approach where we check the least significantbit (LSB) and increment a counter if it is 1. As soon as the counter hits 2we exit or reset to 0 if the bit is not set.func isSpecial(n uint32) bool { for wasLastBitOne := false; n > 0; n = n >> 1 { isCurrentBitOne := n%2 == 1 if isCurrentBitOne && wasLastBitOne { return true } wasLastBitOne = isCurrentBitOne } return false}Now let’s write a quick benchmark to see how it performs.func BenchmarkIsSpecial(b *testing.B) { for n := 0; n < b.N; n++ { isSpecial(uint32(n)) }}And the results are: you can perform about a 100 million of these per second.> go test -bench 'BenchmarkIsSpecial$'goos: darwingoarch: amd64pkg: github.com/zqureshi/specialBenchmarkIsSpecial-8 200000000 9.46 ns/opPASSok github.com/zqureshi/special 2.861s2. Speedup via Lookup TableWe are working with 32-bit unsigned integers and notice that checking an 8-bitlong subsequence is no different than checking the whole number. For an 8-bitsequence we have 256 (2**8) possible combinations and can precompute a lookuptable of all possible results.var lookupTable = [256]bool{}func init() { fmt.Println("Recomputing lookup table...") for i := uint32(0); i < 256; i++ { lookupTable[i] = isSpecial(i) }}func isSpecialLookup(n uint32) bool { if lookupTable[3] == false { panic("Lookup table not initialized!") } return lookupTable[uint8(n)] || lookupTable[uint8(n>>7)] || lookupTable[uint8(n>>14)] || lookupTable[uint8(n>>21)] || lookupTable[uint8(n>>24)]}An interesting thing to note here is that we cannot just check four bit ranges0-7, 8-15, 16-23, 28-31 because we will miss numbers that have consecutivebits on the boundary i.e. bit 7 and 8, therefore we must always checkoverlapping ranges 0-7, 7-14, 14-21, 21-28, 24-31. In the last comparison ofbits 24-31 we re-check bits 24-28 but it is easier to just do so than specialcase it and add extra logic.And now let’s also benchmark this.func BenchmarkIsSpecialLookup(b *testing.B) { for n := 0; n < b.N; n++ { isSpecialLookup(uint32(n)) }}> go test -bench '.'Recomputing lookup table...goos: darwingoarch: amd64pkg: github.com/zqureshi/specialBenchmarkIsSpecial-8 200000000 9.85 ns/opBenchmarkIsSpecialLookup-8 1000000000 2.60 ns/opPASSok github.com/zqureshi/special 5.817sWe get an almost 4x speedup using this approach as we are doing lessercomparisons and memory writes but also due to the fact that the whole lookuptable fits in 32 bytes of memory which sits cosily in a 64 bytecache line.3. Larger Lookup TableWe could extend the previous approach and precompute 65536 (2**16) values which wouldallow us to cover the whole number in just 3 comparisons 0-15, 15-30, 16-31.func isSpecialLookup16(n uint32) bool { if lookupTable[3] == false { panic("Lookup table not initialized!") } return lookupTable[uint16(n)] || lookupTable[uint16(n>>15)] || lookupTable[uint16(n>>16)]}And benchmarkfunc BenchmarkIsSpecialLookup16(b *testing.B) { for n := 0; n < b.N; n++ { isSpecialLookup16(uint32(n)) }}Recomputing lookup table...goos: darwingoarch: amd64pkg: github.com/zqureshi/specialBenchmarkIsSpecial-8 200000000 9.57 ns/opBenchmarkIsSpecialLookup-8 1000000000 2.62 ns/opBenchmarkIsSpecialLookup16-8 1000000000 2.31 ns/opPASSok github.com/zqureshi/special 8.325sA 12% improvement in runtime but a 256x blowup in table size (32 bytes vs8192 bytes). While this performs better in this synthetic benchmark it mightnot perform as well in production because of cache line and page misses.4. Finding Something BetterAt this point I was quite happy with the progress but was wondering if there was some intrinsic property of numbers that could be exploited to make this even faster. I landed upon this HackerRank post and a collection of bit twiddling hacks from Stanford’s graphics department.The solution was quite ingenious, if you take any binary number it will haveruns of set bits followed by one or more unset bits. Now if you left shift thenumber, you move the 0 left and now if you bitwise and it with the originalnumber you have effectively cleared one set bit from a run. If you repeat thisprocess you will clear another bit, and another until the number is 0.A worked out example will demonstrate this better.14 = 111014 << 1 = 11100a & b = 0110012 = 110012 << 1 = 11000a & b = 010008 = 10008 << 1 = 10000a & b = 00000We can then simplify this algorithm to derive that if a number has at least twoconsecutive bits set, then left shift followed by bitwise and will result in anon-zero value. Let’s work through this.12 = 110012 << 1 = 11000a & b = 0100010 = 101010 << 1 = 10100a & b = 00000So now the code simply becomesfunc isSpecialLeftShift(n uint32) bool { return (n & (n << 1)) > 0}and benchmarkfunc BenchmarkIsSpecialLeftShift(b *testing.B) { for n := 0; n < b.N; n++ { isSpecialLeftShift(uint32(n)) }}Recomputing lookup table...goos: darwingoarch: amd64pkg: github.com/zqureshi/specialBenchmarkIsSpecial-8 200000000 9.55 ns/opBenchmarkIsSpecialLookup-8 1000000000 2.60 ns/opBenchmarkIsSpecialLookup16-8 1000000000 2.32 ns/opBenchmarkIsSpecialLeftShift-8 2000000000 0.27 ns/opPASSok github.com/zqureshi/special 8.871sThis is a 35x improvement over our naive approach and only involves 3 instructions, which means it is also a candidate for inlining.5. EpilogueOne thing that we didn’t talk about was how did I verify that my implementations were correct? I implemented the naive version and manually tested it on various inputs. Then for every other implementation I iterated through the first billion integers and reconciled the optimized against the naive.func main() { for n := uint32(0); n < 1000000000; n++ { result := isSpecial(n) if isSpecialLookup(n) != result { fmt.Printf("Error: %v == %b isSpecial: %v, isSpecialLookup: %v\n", n, n, result, isSpecialLookup(n)) break } if isSpecialLookup16(n) != result { fmt.Printf("Error: %v == %b isSpecial: %v, isSpecialLookup16: %v\n", n, n, result, isSpecialLookup16(n)) break } if isSpecialLeftShift(n) != result { fmt.Printf("Error: %v == %b isSpecial: %v, isSpecialLeftShift: %v\n", n, n, result, isSpecialLeftShift(n)) break } }}

About Joyk


Aggregate valuable and interesting links.
Joyk means Joy of geeK