Essential Algorithms: The Merge Sort

Written by joshuaecampbell | Published 2020/02/29
Tech Story Tags: algorithms | programming | time-complexity | complexity | software-development | python | arrays | beginners

TLDR: Merge Sort is one of the most reliably consistent and well-performing sorting algorithms for sorting large datasets, performing almost 40% fewer comparisons at its worst than Quick Sort does on average. The Merge Sort algorithm takes a divide-and-conquer approach recursively: it breaks the array into smaller and smaller ones by calling itself, until each one has only one element inside of it. Then, it compares two numbers at a time, puts them into order, and merges them back together as it returns from its recursive calls. Merge Sort performs all of the operations necessary to completely sort the list every time it is called.

Every programmer needs to know their algorithms and data structures. When studying them, you need to be sure you understand exactly what an algorithm does, what its time and space complexities are and why they are that way, and be able not only to code it, but to perform it by hand. This is what Essential Algorithms is all about.
Welcome to the first installment of Essential Algorithms, where today we'll be going over the Merge Sort. Merge Sort is one of the most reliably consistent and well-performing sorting algorithms for sorting large datasets, performing almost 40% fewer comparisons at its worst than Quick Sort does on average. So let's dig in and see what this algorithm is.

The Merge Sort Algorithm

To start, let's take a simple array of numbers that are out of order, which we want to sort:
var arr = [27, 369, 63, 126, 252, 990, 54, 18]
The Merge Sort algorithm can take these numbers and put them in order, small to big or big to small, in a consistent and efficient manner. Merge Sort takes a divide-and-conquer approach recursively; that is, it breaks the array into smaller and smaller ones by calling itself, until each one has only one element inside of it. Then, it compares two numbers at a time, puts them into order, and returns them to itself as it backs out of its recursive calls. So, our above array turns into the following sub-arrays:
[[27, 369, 63, 126],
[252, 990, 54, 18]]
And again into:
[[[27, 369], [63, 126]], [[252, 990], [54, 18]]]
And one final time:
[[[[27], [369]], [[63], [126]]], [[[252], [990]], [[54], [18]]]]
With the elements sorted into sub-arrays, each pair needs to be evaluated, so that when merged together, the left value is smaller than the right one. In this example, only the last pair of single-element arrays has a larger value on the left, so [54], [18] would be merged as [18, 54]:
[[[27, 369], [63, 126]], [[252, 990], [18, 54]]]
With the smallest element held at index 0 of the most deeply nested sub-arrays, the sub-arrays can now be merged, in order, in an efficient manner. To do so, compare the very first element of both arrays. Then, take the smaller of the two and move it from the sub-array you are comparing into the array you are going to return.
Thus, comparing the arrays [27, 369] and [63, 126], we need only look at the first item of each. Since 27 < 63, 27 would be taken out of the first array and added to the one we will return. This leaves us comparing [369] and [63, 126]. We need only look at the first elements again, and see that 63 is next, so our returning array is [27, 63] and we still have [369] and [126] to compare. We then add 126, are left only with 369, so we add that. In all, the steps look like this:
[27, 369] [63,126]
[]

[369] [63,126]
[27]

[369] [126]
[27, 63]

[369]
[27, 63, 126]

[27, 63, 126, 369]
After the second iteration, the arrays look like this:
[[27, 63, 126, 369], [18, 54, 252, 990]]
The arrays are then merged again in the exact same way, comparing 27 to 18, then 27 to 54, then 63 and 54, so on, until the entire array is sorted and returned.
When I mentioned earlier that each pair would need to be evaluated, and only [54], [18] would need to swap in the first step, both of those values were held in their own arrays. This means that one can create a single function to compare the numbers in all instances, by comparing the element at index 0 of both arrays, and removing the lesser item when adding it to the returning array. This runs until one of the arrays is empty, and the remaining elements of the other array are then appended to the end of the returning array.
The arrays will be merged together like that, recursively, until there is only one array left, and then it will be completely sorted. In a way, you can break it down into two fundamental steps:
  1. Merging 2 arrays: Take two arrays. Compare the first element of the two of them. The smaller of the elements should be removed from that array, and added to the end of the array we are returning. When all of the elements are removed from one array, just add the rest of the other one to the sorted array and return it.
  2. Divide and conquer: Take an array. Find the middle of it. Pass everything to the left of it into the divide and conquer function, and pass everything to the right of it to the divide and conquer function (so the function has been fed both halves of it), until there is only one. Then, start merging the 2 arrays.
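These two fundamental steps translate almost directly into code. Here is a compact Python sketch of the idea (fuller, commented versions appear in the Code Examples section):

```python
def merge_sort(unsorted):
    # Base case: a list of 0 or 1 elements is already sorted.
    if len(unsorted) <= 1:
        return unsorted
    mid = len(unsorted) // 2              # step 2: divide...
    left = merge_sort(unsorted[:mid])     # ...and conquer each half
    right = merge_sort(unsorted[mid:])
    return merge(left, right)             # step 1: merge the halves

def merge(left, right):
    result = []
    while left and right:
        # Move the smaller front element into the result.
        result.append(left.pop(0) if left[0] <= right[0] else right.pop(0))
    # One side is now empty; the other is already sorted, so append it whole.
    return result + left + right
```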

Time Complexity and Efficiency

Merge Sort, much like Heap Sort, has a time complexity of O(n*log(n)) in both its best and worst case scenarios. The reason this doesn't change is that it performs all of the operations necessary to completely sort the list whenever it is called; it does not matter how sorted the list is when Merge Sort gets it. It can be already sorted, or as unsorted as possible, but Merge Sort will still have to perform O(log(n)) levels of division and O(n) comparisons at each level.
This contrasts with other algorithms like Bubble Sort, which iterate over the list repeatedly until it is sorted. If given an ideal, already sorted list, Merge Sort's O(n*log(n)) is significantly worse than Bubble Sort's O(n). However, the more unsorted a list is, the more passes Bubble Sort must make over the entire array, so Bubble Sort's average and worst-case complexity is O(n^2), making it terribly inefficient in those cases.
But why is Merge Sort's complexity quasilinear, O(n*log(n))? The O(n) factor comes from the algorithm requiring time proportionate to the length of the array: every element must be compared and moved during the merges. The O(log(n)) factor comes from the number of divisions of the array that take place, just like the levels of a binary tree. In our example of 8 elements, we had to break the arrays down through lengths of 8 => 4 => 2 => 1 before we could begin to compare them. That's 3 divisions, and 2^3 is 8. By comparison, if we used 16 elements, we'd need to divide 16 => 8 => 4 => 2 => 1, which is 4 divisions, and 2^4 is 16! So the number of operations needed to split the arrays grows logarithmically with respect to the length of the input. Because the work grows at both O(n) and O(log(n)), we get the O(n*log(n)) time complexity.
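You can check the logarithmic growth of the divisions yourself with a few lines of Python; this little helper (my own, not part of the algorithm) counts how many halvings a list of length n needs:

```python
import math

def divisions_needed(n):
    """Count how many times a list of length n must be halved
    before every piece holds a single element."""
    count = 0
    while n > 1:
        n = math.ceil(n / 2)   # the larger half after an uneven split
        count += 1
    return count

# 8 elements take 3 divisions (2^3 = 8); 16 take 4 (2^4 = 16).
print(divisions_needed(8), divisions_needed(16), divisions_needed(1024))
# prints: 3 4 10
```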
Merge Sort, when not sorted in place and implemented efficiently with space in mind, has a space complexity of O(n), meaning the memory required to execute grows proportionally to the size of the input: for every element of input, space for one element must be allocated to hold the output.

Code Examples

Enough overview, let's look at a few examples. Both examples will use the same array as above. Reading the above overview, we can break down what we need to do into two basic steps:
  1. We need to recursively break the array into smaller sub arrays
  2. We need to merge the sub arrays together, sorting it as we fill it.
With these 2 main steps, we can now write 2 different functions to implement the Merge Sort algorithm. Bear in mind, these examples are not optimized for anything except code length and readability. Many examples of the Merge Sort I found when I was first learning contained several extra parameters and variables; while they worked and used space efficiently, they were harder to read when first starting out.

Example 1: Python

The first function we will write is the one that breaks the supplied array into sub-arrays and calls our other function. Since this will be the parent function of the algorithm, I'll name it merge_sort(), and it will be passed an unsorted list as its only argument. It will handle breaking the arrays recursively and passing them to the function that handles the actual merging and sorting, merge().
Since we want to break the unsorted list into 2 separate lists, we need to calculate the midpoint of the list. Dividing the list length in half works, but in Python 3 the / operator performs floating-point division, so use // to specify integer division. We can declare mid = len(unsorted)//2. With our calculated midpoint, we can then pass the halves of the array back into merge_sort().
def merge_sort(unsorted):
    mid = len(unsorted)//2
    left = merge_sort(unsorted[:mid])
    right = merge_sort(unsorted[mid:])  
Now, this works fine as long as the arrays are longer than 1 element, but once we get down to one, we'll start getting infinite recursion and errors. So, we need to tell the program to only do this while there's more than one element in the unsorted list passed to the function.
When an array or list of 2 or more elements is passed to the function, we want it to return to us the sorted list, which will be our next function. If only one element is passed, we want it to simply return that one element. So, we can wrap this all in an if/else statement, and add return statements, and this function will be complete:
def merge_sort(unsorted):
    if len(unsorted) > 1:
        mid = len(unsorted)//2
        left = merge_sort(unsorted[:mid])
        right = merge_sort(unsorted[mid:])  
        result = merge(left, right)
        return result
    else:
        return unsorted
Calling the merge_sort() function before calling our merge() function ensures that all of the divisions will be done before we start merging. With this in place, we can start to build the merge() function.
The merge() function will take 2 arguments, the left and the right arrays. We'll make an empty sorted_list to hold the returning results, which we'll populate with pop(0) from the left and right lists. We will use a few while loops to perform this task.
While both the left and right lists still contain elements, we need to compare their first elements. The smaller of the two will be popped from the front of its list into the sorted_list we will return.
def merge(left, right):
    sorted_list = []
    while len(left) > 0 and len(right) > 0:
        if left[0] <= right[0]:
            sorted_list.append(left.pop(0))
        else:
            sorted_list.append(right.pop(0))
So what happens when we are left with one list that is empty and one that isn't? Well, nothing. Since we don't know which list will run out first, or how long the other will be, after this loop completes we need to add two more loops to check whether the left and right lists are still populated, and append their remaining elements to the sorted_list before returning. This leaves us with the following merge() function:
def merge(left, right):
    sorted_list = []
    while len(left) > 0 and len(right) > 0:
        if left[0] <= right[0]:
            sorted_list.append(left.pop(0))
        else:
            sorted_list.append(right.pop(0))
    while len(left) > 0:
        sorted_list.append(left.pop(0))
    while len(right) > 0:
        sorted_list.append(right.pop(0))
    return sorted_list
So now, our entire mergesort.py file looks like this:
def merge_sort(unsorted):
    if len(unsorted) > 1:
        mid = len(unsorted)//2
        left = merge_sort(unsorted[:mid])
        right = merge_sort(unsorted[mid:])  
        result = merge(left, right)
        return result
    else:
        return unsorted

def merge(left, right):
    sorted_list = []
    while len(left) > 0 and len(right) > 0:
        if left[0] <= right[0]:
            sorted_list.append(left.pop(0))
        else:
            sorted_list.append(right.pop(0))
    while len(left) > 0:
        sorted_list.append(left.pop(0))
    while len(right) > 0:
        sorted_list.append(right.pop(0))
    return sorted_list
We can now test it out and see that it works.
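For instance, a quick sanity check might look like this (the two functions from above are repeated so the snippet runs on its own):

```python
def merge_sort(unsorted):
    if len(unsorted) > 1:
        mid = len(unsorted)//2
        left = merge_sort(unsorted[:mid])
        right = merge_sort(unsorted[mid:])
        return merge(left, right)
    else:
        return unsorted

def merge(left, right):
    sorted_list = []
    while len(left) > 0 and len(right) > 0:
        if left[0] <= right[0]:
            sorted_list.append(left.pop(0))
        else:
            sorted_list.append(right.pop(0))
    while len(left) > 0:
        sorted_list.append(left.pop(0))
    while len(right) > 0:
        sorted_list.append(right.pop(0))
    return sorted_list

# Our example array from the start of the article.
print(merge_sort([27, 369, 63, 126, 252, 990, 54, 18]))
# [18, 27, 54, 63, 126, 252, 369, 990]
```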
And there you go! An easy-to-read, short implementation in Python.

Example 2: JavaScript

Again, in this example we write our first merge_sort() function to break the unsorted array into smaller sub-arrays, calling itself recursively until all of the divisions have been done, before calling the merge() function.
Since we need to break this into 2 separate arrays, we again need to calculate the midpoint. JavaScript performs floating-point division for odd lengths, so we'll use Math.floor() to round down, like Python's // operator does. So, we can set a variable for our midpoint, make our left and right arrays, and pass them back into merge_sort() before calling merge().
const merge_sort = (function(unsorted){
    var mid = Math.floor(unsorted.length/2)
    var left = unsorted.slice(0, mid)
    var right = unsorted.slice(mid)
})
This works just fine with arrays of more than one element, but will cause infinite recursion and errors about undefined variables when there is only one item. We will have to wrap this in an if statement, performing this block of code only if unsorted has more than one element, and simply returning the element if there's only one. We also need to call our merge() function on the left and right arrays, and return its result when unsorted has more than one element.
const merge_sort = (function(unsorted){
    if(unsorted.length > 1){
        var mid = Math.floor(unsorted.length/2)
        var left = merge_sort(unsorted.slice(0, mid))
        var right = merge_sort(unsorted.slice(mid))
        return merge(left, right)
    } else {
        return unsorted
    }
})
Now we can start on our merge() function. We'll initialize an array to hold our sorted results, and while the left and right arrays both contain elements, we'll compare the first element of the two. Whichever element is smaller will be shifted out of its array and pushed into the results. To keep the code short and compact with such a simple if/else statement, I opted to use a one-line ternary operation. If you're not familiar with ternary operations, they're one-line if-statements structured like so:
(condition to evaluate) ? (do this if true) : (do this if false)
The condition is written first, followed by the ?. The next expression is what to do if the condition is true, followed by a :, and then what to do if it is false. I prefer these in JavaScript and don't like them in Python because I find them easier to read this way. In Python, the syntax is (do this if true) if (condition to evaluate) else (do this if false), leaving the condition in the middle and the desired operations on each side. Personally, I don't find that as easy to read. It's one of those Python quirks I don't like, much like split() vs join() syntax.
So, our first while loop will contain the ternary operation to move the elements between the arrays appropriately, and we'll need to add 2 more while loops to check both the left and right arrays for remaining elements after the first loop, and add them to the sorted array.
const merge = (function(left, right){
    var sorted = []
    while(left.length > 0 && right.length > 0){
        left[0] <= right[0] ? sorted.push(left.shift()) : sorted.push(right.shift())
    }
    while(left.length > 0){
        sorted.push(left.shift())
    }
    while(right.length > 0){
        sorted.push(right.shift())
    }
    return sorted
})
Our complete mergesort.js file now looks like this:
const merge_sort = (function(unsorted){
    if(unsorted.length > 1){
        var mid = Math.floor(unsorted.length/2)
        var left = merge_sort(unsorted.slice(0, mid))
        var right = merge_sort(unsorted.slice(mid))
        return merge(left, right)
    } else {
        return unsorted
    }
})

const merge = (function(left, right){
    var sorted = []
    while(left.length > 0 && right.length > 0){
        left[0] <= right[0] ? sorted.push(left.shift()) : sorted.push(right.shift())
    }
    while(left.length > 0){
        sorted.push(left.shift())
    }
    while(right.length > 0){
        sorted.push(right.shift())
    }
    return sorted
})
We can now test it in the browser console to make sure it works.

Time Complexity Revisited

Now that we have two examples of the algorithm implemented in 2 different languages, we can come back and analyze it again to understand how and why Merge Sort's complexity is O(n*log(n)). Both of the examples above create two functions to perform the 2 major components of the algorithm. Both implementations use the parent function merge_sort() to recursively divide the lists it is passed in half. Because of this, the depth of the recursion grows with the logarithm of the size of the list passed in, so we can say the dividing done by merge_sort() contributes a factor of O(log(n)).
On the other hand, the merge() function called by merge_sort() has the task of taking two inputs and combining them into a single output. The amount of time this takes is determined entirely by the size of the inputs passed to it, so we can say the merge() function has a complexity of O(n). At each of the O(log(n)) levels of recursion, the merge() calls collectively process all n elements, giving us our final complexity of O(n*log(n)).
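If you want to see that growth empirically, one way is to instrument the merge step with a comparison counter. This is a hypothetical, stripped-down instrumented version written for this measurement, not the code from the examples above:

```python
import math
import random

def merge_sort(unsorted, stats):
    if len(unsorted) <= 1:
        return unsorted
    mid = len(unsorted) // 2
    return merge(merge_sort(unsorted[:mid], stats),
                 merge_sort(unsorted[mid:], stats), stats)

def merge(left, right, stats):
    result = []
    while left and right:
        stats["comparisons"] += 1   # one comparison per merged element
        result.append(left.pop(0) if left[0] <= right[0] else right.pop(0))
    return result + left + right

random.seed(42)
for n in [256, 1024, 4096]:
    stats = {"comparisons": 0}
    merge_sort([random.random() for _ in range(n)], stats)
    # The measured comparison count stays a little below n*log2(n).
    print(n, stats["comparisons"], round(n * math.log2(n)))
```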
The Merge Sort algorithm is consistently efficient, which is why it is used as the default sorting algorithm in languages like Java and Perl, and forms part of the hybrid Timsort used by Python. Its time complexity is the same in the best and worst cases, a good property when considering performance and optimization, and because the algorithm divides its input in half recursively, it can be highly parallelized if one has access to multiple CPUs at once.
So there you have it, the Merge Sort. Hopefully you now have a solid grasp of how to perform it, and understand what O(n*log(n)) actually means on an intuitive level. If you enjoyed this or learned something, share it!

Published by HackerNoon on 2020/02/29