

GitHub - scandum/fluxsort: A stable adaptive partitioning comparison sort.
source link: https://github.com/scandum/fluxsort
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.

Intro
This document describes a partitioning stable adaptive comparison-based sort named fluxsort. Benchmarks and a visualization are available at the bottom.
Analyzer
Fluxsort starts out with an analyzer that detects fully sorted arrays and sorts reverse order arrays using n comparisons. It also obtains a measure of presortedness and switches to quadsort if the array is less than 25% random.
Partitioning
Partitioning is performed in a top-down manner similar to quicksort. Fluxsort obtains the pseudomedian of 9 for partitions smaller than 1024 elements, and the pseudomedian of 15 otherwise. The median element obtained will be referred to as the pivot. Partitions that grow smaller than 24 elements are sorted with quadsort.
After obtaining a pivot the array is parsed from start to end. Elements smaller than the pivot are copied in-place to the start of the array, elements greater than the pivot are copied to swap memory. The partitioning routine is called recursively on the two partitions in main and swap memory.
Recursively partitioning through both swap and main memory is accomplished through the ptx pointer, which despite being simple in implementation is likely a novel technique since the logic behind it is a bit of a brain-twister.
Worst case handling
To avoid run-away recursion fluxsort switches to quadsort for both partitions if one partition is less than 1/16th the size of the other partition. On a distribution of random unique values the chance of a false positive is 1 in 65,536 for the pseudomedian of 9 and 1 in 16,777,216 for the pseudomedian of 15.
Combined with the analyzer fluxsort starts out with this makes the existence of killer patterns unlikely, other than a 2x performance slowdown by triggering the use of quadsort prematurely.
Branchless optimizations
Fluxsort uses a branchless comparison optimization similar to the method described in "BlockQuicksort: How Branch Mispredictions don't affect Quicksort" by Stefan Edelkamp and Armin Weiss.
Median selection uses a novel branchless comparison technique that selects the pseudomedian of 9 using between 8 and 12 (10.66 average) comparisons, and the pseudomedian of 15 using between 14 and 25 (21.33 average) comparisons.
These optimizations do not work as well when the comparisons themselves are branched and the largest performance increase is on 32 and 64 bit integers.
Memory
Fluxsort allocates n elements of swap memory, which is shared with quadsort. Recursion requires log n stack memory.
Data Types
The C implementation of fluxsort supports long doubles and 8, 16, 32, and 64 bit data types. By using pointers it's possible to sort any other data type, like strings.
Interface
Fluxsort uses the same interface as qsort, which is described in man qsort.
┌───────────────────────┐┌───────────────────────┐ │comparisons ││swap memory │ ┌───────────────┐├───────┬───────┬───────┤├───────┬───────┬───────┤┌──────┐┌─────────┐┌─────────┐ │name ││min │avg │max ││min │avg │max ││stable││partition││adaptive │ ├───────────────┤├───────┼───────┼───────┤├───────┼───────┼───────┤├──────┤├─────────┤├─────────┤ │fluxsort ││n │n log n│n log n││1 │n │n ││yes ││yes ││yes │ ├───────────────┤├───────┼───────┼───────┤├───────┼───────┼───────┤├──────┤├─────────┤├─────────┤ │quadsort ││n │n log n│n log n││1 │n │n ││yes ││no ││yes │ ├───────────────┤├───────┼───────┼───────┤├───────┼───────┼───────┤├──────┤├─────────┤├─────────┤ │quicksort ││n │n log n│n² ││1 │1 │1 ││no ││yes ││no │ ├───────────────┤├───────┼───────┼───────┤├───────┼───────┼───────┤├──────┤├─────────┤├─────────┤ │introsort ││n log n│n log n│n log n││1 │1 │1 ││no ││yes ││no │ └───────────────┘└───────┴───────┴───────┘└───────┴───────┴───────┘└──────┘└─────────┘└─────────┘
Porting
People wanting to port fluxsort might want to have a look at twinsort, which is a simplified implementation of quadsort. Fluxsort itself is relatively simple.
Visualization
In the visualization below four tests are performed on 512 elements: Random, Generic, Random Half, and Ascending Tiles. Partitions greater than 48 elements use the pseudomedian of 15 to select the pivot.
Benchmarks
The following benchmark was on WSL gcc version 7.5.0 (Ubuntu 7.5.0-3ubuntu1~18.04) using the wolfsort benchmark. The source code was compiled using g++ -O3 -w -fpermissive bench.c. The bar graph shows the best run out of 100 on 100,000 32 bit integers. Comparisons for fluxsort and std::stable_sort are inlined.
data tableThe following benchmark was on WSL gcc version 7.4.0 (Ubuntu 7.4.0-1ubuntu1~18.04.1). The source code was compiled using gcc -O3 bench.c. The bar graph shows the best run out of 100 on 100,000 32 bit integers. Comparisons for fluxsort and qsort are not inlined. The stdlib qsort() in the benchmark is a mergesort variant.
data tableRecommend
About Joyk
Aggregate valuable and interesting links.
Joyk means Joy of geeK