2

Go’s new sorting algorithm: pdqsort

 1 year ago
source link: https://itnext.io/gos-new-sorting-algorithm-pdqsort-822053d7801b
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.

Go’s new sorting algorithm: pdqsort

1*kpB-GqHZzrcjnJjkOUDqWQ.jpeg
Photo by Markus Spiske on Unsplash

I’ve always found Go’s standard library very approachable to read. Parts of the standard library include concepts that are self-contained and do not require too much prior context to dive into. When I read that the sorting algorithm for Go has changed¹ to something called “pdqsort”, I thought it would be nice to go take a look and learn about it.

In this post, we’ll do that by looking into both the previous sorting algorithm and the new one.

So far, Go relied on Quicksort in its standard library. You can already go ahead and check its implementation here but I’ll also provide relevant links as we move along.

In its core, Quicksort algorithm works as follows:

  1. select a pivot²
  2. arrange items that are less³ than the pivot to be on the “left side” of the pivot and the ones that are greater to the “right side”
  3. for both sides of the pivot, apply the same logic recursively from #1

After these steps, we should end up with a sorted collection. This works well in many datasets and we get a nice sorting algorithm with O(n logn)⁴ average complexity. However, there are situations that it does not perform as well and ends up having a quadratic complexity. This means it has a worst case performance of O(n ^ 2)⁵. These situations that Quicksort performs poorly mostly relate to sorted or almost sorted⁶ datasets.

Although I mentioned that Go used Quicksort initially, the Go standard library employed a few tricks to avoid falling into these situations where Quicksort would yield its worst-case performance or at a comparable level.

Reading the source code, some of these are as follows:

  • if the collection size is “small enough”⁷, use⁸ Shellsort
  • if the depth of recursion is too big, switch to⁹ Heapsort
  • try to choose a good pivot¹⁰

As you may have already noticed, Go’s Quicksort implementation was not the simple form of the algorithm but in fact a hybrid called Introsort.

Now, let’s talk about the new sorting algorithm “pdqsort” — Pattern Defeating Quicksort¹¹. Reading the name, it sounds like it’s still Quicksort. The description of the change outlines where it yielded better performance than the previous version in some of the worst case scenarios¹².

Let’s start here and see what has changed & stayed.

We still use to Insertion sort¹³ when the collection size is small and fall back to Heapsort in certain situations. The method used to choose the pivot is also similar to the previous implementation.

Diverging from the previous implementation, pdqsort attempts to avoid the worst-case scenario by moving things around when it encounters different patterns:

  • if we are dealing with a partition that can’t be partitioned partitioned in a balanced way¹⁴, we’ll move things around to avoid such imbalanced partitions in the future. In addition, instead of falling back to Heapsort purely based on the recursion depth, the number of imbalanced partitions is used to switch to the fallback.
  • handle parts of the collection that may already be sorted or sorted in reverse
  • treat elements that are equal to the pivot to avoid unnecessary swaps in the future

There is a lot more than this in the article that introduces the pdqsort algorithm. These are the main differences I could spot by reading the source code and the article. The implementation of the new algorithm seems to have more moving parts, although the code is organized in a way that makes it easy to follow.

This was helpful for me to understand what has changed. I hope this provides a good overview of the change and what it attempts to solve.

1. As of Go 1.18, this is still not released and is expected to land in a future version

2. Pivot is the item to be kept in-place while arranging other items in a given collection

3. The notion of ordering may differ based on the items that are being sorted. These may be integers or more complex data structures.

4. See this for the explanation of time complexity

5. Order of n square.

6. That may contain many repeated elements — see “Dutch national flag problem

7. Less than 13

8. See the relevant piece of code here

9. See here the how the depth of recursion to switch to Heapsort is calculated

10. See this explaining the difficulties and different approaches for choosing a good pivot. Most of these can be found in the Go’s source code as well: avoiding integer overflow, picking the “ninther” as the pivot

11. See the relevant article here for a more thorough explanation of the algorithm.

12. The author of the change included a comparison which suggests it’s about 10 times faster than the previous algorithm for sorted slices and never significantly slower than the previous version.

13. Although the previous algorithm used Shellsort, they are very similar algorithms.

14. This is done by checking either side (see 1 & 2) of the pivot to see if we ended up with 2 partitions of which one is much smaller than the other


About Joyk


Aggregate valuable and interesting links.
Joyk means Joy of geeK