A Fast X86 Implementation of Select [pdf]
以下为 快照 页面，建议前往来源网站查看，会有更好的阅读体验。
Abstract: Rank and select are fundamental operations in succinct data structures, that is, data structures whose space consumption approaches the information-theoretic optimal. The performance of these primitives is central to the overall performance of succinct data structures.
Traditionally, the select operation is the harder to implement efficiently, and most prior implementations of select on machine words use 50--80 machine instructions. (In contrast, rank on machine words can be implemented in only a handful of instructions on machines that support POPCOUNT.) However, recently Pandey et al. gave a new implementation of machine-word select that uses only four x86 machine instructions; two of which were introduced in Intel's Haswell CPUs.
In this paper, we investigate the impact of this new implementation of machine-word select on the performance of general bit-vector-select. We first compare Pandey et al.'s machine-word select to the state-of-the-art implementations of Zhou et al. (which is not specific to Haswell) and Gog et al. (which uses some Haswell-specific instructions). We exhibit a speedup of 2X to 4X.
We then study the impact of plugging Pandey et al.'s machine-word select into two state-of-the-art bit-vector-select implementations. Both Zhou et al.'s and Gog et al.'s select implementations perform a single machine-word select operation for each bit-vector select. We replaced the machine-word select with the new implementation and compared performance. Even though there is only a single machine- word select operation, we still obtained speedups of 20% to 68%. We found that the new select not only reduced the number of instructions required for each bit-vector select, but also improved CPU instruction cache performance and memory-access parallelism.
README.md Brainfuck implementation in x86_64 assembly This is an implementation of the Bra...
Outsourcing Pharmaceutical Industry Conceptualization Implementation PDF 7850c9982
How millisecond-level hot reloading of a React WebApp can be implemented ..
Android-x86 Open Source Project
x86 finds its way into your iPhone Introduction In one of my several lives, I’m supposed to be a vulnerability researcher working on baseband exploitation. As every vulnerability researcher knows, be...
In this article we take a look at how the operands of x86 instructions are encoded. Review Let’s quickly review information from the earlierx86 addressing article. x86...
程序员 - @userlol - 家里每次断电对树莓派都是致命的，受制于 microSD 卡 IO 本来就不好，加上最近一次没断电情况下莫名其妙的 dmesg 中出了文件系统数据损坏的消息，然后还原备份，结果发现近几个月的备份都是坏的
For the first “real” post in this blog, we’ll build an x86 assembler in less than 256 lines of C code. Obviously, it won’t implement every x86 instruction, but it will implement a surprisingly useful subset: data moveme...
lilith A POSIX-like x86-64 kernel and userspace written in Crystal. Building lilith needs to be compile...
Page not found · GitHub Pages File not found The site configured at this address does not contain the requested file. If this is your...