28

4 Ways To Speed Up Sorting Card Decks

 3 years ago
source link: https://towardsdatascience.com/4-ways-to-speed-up-sorting-card-decks-34b8b62b5bb5?gi=d4b9c1a62690
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.

Machine Learning | Data Science | Software Engineering

4 Ways To Speed Up Sorting Card Decks

Time Complexities, Code and Intuitive Explanations for Common Sorting Algorithms

Jun 13 ·7min read

YBfYN36.jpg!web

Photo by Olav Ahrens Røtne on Unsplash

Have you ever tried to sort a deck of cards by hand? You probably, intuitively, used the insertion sort algorithm. The following article will explain why this algorithm works and how long it takes. The good news is that you do end up with a sorted set, but, the bad news is that it can, unfortunately, take a while! The following article tells you why.

Computational Efficiency is key in Software Engineering and in that, a common time sink is routine tasks involving sorting. As an example, the following domains are directly impacted by the ability to sort something quickly:

  1. Searching : searching is much more efficient on a sorted set
  2. Selecting : selecting the n’th element is much more efficient on a sorted set
  3. Duplicates: identifying duplicates is much quicker on a sorted set
  4. Distributional Analysis : much quicker on a sorted set.

Moreover, as we enter an age of Big Data, we need to make sure we’re using the right sorting algorithm to efficiently manage our data to help us, as researchers and practitioners, get to conclusions quicker.

Trust me, it can take a while to sort millions of rows if you use the wrong algorithm.

Now before we get into the specific of each of these algorithms, we first need to decide on a metric to compare the speed of each algorithm, or otherwise, the worst case scenario. Lets therefore first go over the concept of Time Complexity.

Time Complexity and Big O Notation

In Layman terms, the computational complexity is the amount of time it takes to run an algorithm. Often, this is represented in what’s called “Big O Notation”, which is a method of writing the limiting (or worst case) behaviour of an algorithm.

So for example, if you wanted to count the number of items in a list, you could just go through the list one by one and this would take n steps. Even for a list that’s monumentally large, this algorithm would take n steps. Therefore the notation, in this case, would be O(n) .

Here are some other common examples:

O(n²) : This is easier to explain with an example. Say you’re checking if any element in your list is duplicated, then you have to compare every element in your list to the chosen element (iteratively), thus, you’re doing n checks across n items, thus, n*n = n²

O(log n) : Say you have a list, you cut it in half, and again, and again, until you only have one item left.

The following chart helps you visualise the speeds of different time complexities, for which, you can see O(log(n)) is quicker than O(n) , which is quicker than O(n²)

qyINBzE.png!web

Number of Steps in different time scales [ source ]

Now we’ve covered the concept of Time Complexity and Big O Notation, let’s move onto our algorithms.

Bubble Sort

The O.G. (maybe a bit of a stretch…) of sorting algorithms is the Bubble Sort. It’s the most commonly known sorting algorithm that you’ll probably have been asked about it in an interview or two.

iaUfyye.jpg!web

Photo by Braedon McLeod on Unsplash

The algorithm takes repeated steps through a list and compares adjacent items, swapping them if they’re in the wrong order. The algorithm keeps looping through the list until the list is ordered. Now depending on how you aim to order your list (highest to lowest or vice versa), this can naturally lead to differences in the time complexity of the problem.

Now assuming everything is in exactly the wrong order, then for every change you make, you have to check every other item in your list, so ultimately, if you are in this unfortunate setting, a Bubble Sort algorithm would take you roughly О(n²) . Therefore, bubble sort is not a practical sorting algorithm.

Let’s move onto something a bit better.

Example code at the end

Merge Sort

Here we begin to get a bit fancier. Merge Sort works by splitting an unsorted list into separate groups (by half) recursively until there is one element per group. Then, you compare each element and regroup them, repeating until the list is merge and sorted.

This sounds a bit complicated, but Swfung8 has created a great gif below to show how it works in practise here:

NJzaiyA.gif
Credit to Swfung8 — Own work, CC BY-SA 3.0, https://commons.wikimedia.org/w/index.php?curid=14961648

Now the overall time complexity of Merge sort is O(nLogn) because the splitting phase takes roughly O(logn) time with the regrouping requiring n steps for each sub-group, coming together to take O(nlogn) , which is much faster than Bubble Sort.

Note if we want to get even fancier, we can also talk about the space complexity of a problem (which is the amount of space a problem requires). For Merge Sort, this is O(n) . This means that this algorithm takes a lot of space and may slow down operations for the last data points.

Example code at the end

Quick Sort

Like Merge Sort, QuickSort is a Divide and Conquer algorithm, but it’s famous for being about 2 or 3 times quicker. It picks an element as pivot and then partitions the list around the pivot point. The sub-arrays are then sorted recursively which can be done in-place, requiring small additional amounts of memory to perform the sorting.

I3Ib2eF.gif

Credit to : RolandH, CC BY-SA 3.0, https://commons.wikimedia.org/w/index.php?curid=1965827

In most standard implementations it performs significantly faster than merge sort and rarely reaches its worst case complexity of O(n²) , however, generally speaking, it tends to be quicker than O(nlogn) .

Example code at the end

Insertion Sort

Now Insertion Sort deserves a quick shout out here because it’s actually reflective on how people sort cards out manually. On each iteration, people would remove one card from the deck, then find where the card belongs within another sorted deck and inserts it in its place, repeating this until the deck of cards is sorted.

VBFvyqZ.gif
Insert Sort: [ source ]

The time complexity of this problem, as you can guess, is quite inefficient as at the worst case, for every step, you have to go through an entirely separate list, and so you can, therefore, reach the worst case quadratic time complexity of O(n²) .

Note that this code has an extremely useful property that because it’s so simple, it can be coded very efficiently, and also, it works ‘offline’, in that it can sort a list as it receives it. However, it still is pretty slow so I’d recommend using quick sort above this.

Example code at the end

So as you can see, the story kind of develops from Bubble Sort to the widely used Quick Sort algorithm, with Insertion Sort showing us the benefits of a code-efficient and intuitive construction, despite it being relatively time inefficient.

A lot of these algorithms are preprogrammed into your language, for example, the sort function in pandas (in Python) uses Quick Sort, so I wouldn’t worry about it too much. Although, if you expect your data is already sorted (in a particular way), then selectively choosing a sorting algorithm you use can definitely improve its efficiency.

Either way, by taking a few extra minutes to think about the problem you’re trying to solve, you can definitely make headway in making quicker and more efficient programs =]

Thanks for reading again!! Let me know if you have any questions and I’ll be happy to help.

Keep up to date with my latest work here !

Code

Bubble Sort

def bubble_sort(array):
    n = len(array)
    for i in range(n):
         for j in range(n - i - 1):
             if array[j] > array[j + 1]:
                 array[j], array[j + 1] = array[j + 1], array[j]
                 already_sorted = False
         if already_sorted:
             break
     return array

Merge Sort

Assuming the list has a length greater than 1.

def merge_list(left_list,right_list):
    result=[] 
    i,j=0,0
    while i<len(left_list) and j<len(right_list):
        if left_list[i] < right[j]:
            result.append(left_list[i])
            i+=1
        else:
            result.append(right_list[j])
            j+=1
    result.extend(left_list[i:]) 
    result.extend(right_list[j:])
    return result

def merge_sort(data):
    middle=len(data)//2
    left_data=merge_sort(data[:middle])
    right_data=merge_sort(data[middle:])
    return merge_list(left_data,right_data)

Quick Sort

def quciksort(list):
    less = []
    equal = []
    greater = []
    pivot = list[0]
    for x in list:
        if x < pivot:
            list.append(x)
        elif x == pivot:
            equal.append(x)
        elif x > pivot:
            greater.append(x)
    return sort(less)+equal+sort(greater)

Insertion Sort

def insertionSort(list):
   for i in range(1, len(list)):
       current = list[i]
       while i>0 and list[i-1]>current:
           list[i] = list[i-1]
           i = i-1
           list[i] = current
   return list

About Joyk


Aggregate valuable and interesting links.
Joyk means Joy of geeK