2
votes

I know the basic concept of the merge sort algorithm but when it comes to implementing it via recursion I am having trouble grasping how it works. From what I understand, the merge sort function splits our current array into two halves and using recursion we keep doing this until we are left with 1 element for each side.

If our array is {38, 27, 43, 3, 9, 82, 10} then our recursion will start by calling itself using the subarray (left side of the original array) and repeat the process each time, halving the array and storing the left most side until we reach 1 element:

38 27 43 3 9 82 10
38 27 43 3 <-split
<---first subroutine/recursion
38 27 <-split
<---second subroutine/recursion
38 <---only 1 element left so we return the value back to the first subroutine that called

Then in our second subroutine we move on to the next line: right = merge_sort(right) which again calls itself to split the subarray and storing the right most side:

38 27 <-split
<---second subroutine/recursion
   27
<---only 1 element left so we return the value back to the first subroutine that called

Then in our second subroutine we move on to the next line: result = merge(left, right) which calls the merge function to sort our left and right arrays that are just 38 and 27. The merge function sorts our two values based on which is smaller and then it adds the first one to an array although I'm not sure which array. (I need specification on this; shouldn't we have a new array every time we merge two previous arrays?) Then the merge function returns the "result" to another result variable in our merge sort function from having called the merge function. I am assuming this result is the new array that has 38 and 27 sorted in order. Then it looks like we are returning that result again to whatever called the merge sort function but I am confused because wouldn't that end everything? What about the first subroutine that paused for the left side recursion? I'm not sure what happens to:

38 27 43 3
      43 3
      43
and
      43 3
         3

Pseudo-code:

 function merge_sort(m)
    if length(m) ≤ 1
        return m
    var list left, right, result


    var integer middle = length(m) / 2
    for each x in m up to middle
         add x to left
    for each x in m after middle
         add x to right
    left = merge_sort(left)
    right = merge_sort(right)
    result = merge(left, right)
    return result

Following writing merge_sort function, then it is required to merge both the left and right lists created above. There are several variants for the merge() function; one possibility is this:

function merge(left,right)
    var list result
    while length(left) > 0 or length(right) > 0
        if length(left) > 0 and length(right) > 0
            if first(left) ≤ first(right)
                append first(left) to result
                left = rest(left)
            else
                append first(right) to result
                right = rest(right)
        else if length(left) > 0
            append first(left) to result
            left = rest(left)             
        else if length(right) > 0
            append first(right) to result
            right = rest(right)
    end while
    return result

http://www.princeton.edu/~achaney/tmve/wiki100k/docs/Merge_sort.html

1

1 Answers

3
votes

I'm not sure whether it is what you're looking for, but you can simplify your merge loop by replacing or with and in the main condition:

while length(left) > 0 and length(right) > 0
    if first(left) ≤ first(right)
        append first(left) to result
        left = rest(left)
    else
        append first(right) to result
        right = rest(right)
end while

# You know that one of left and right is empty
# Copy the rest of the data from the other
while length(left) > 0
    append first(left) to result
    left = rest(left)             
end while
while length(right) > 0
    append first(right) to result
    right = rest(right)
end while

Yes, there are three loops, but only one of the last two is ever executed.


Working C99 code based closely on pseudo-code

Thus code uses C99 variable-length arrays (an optional feature in C11). If compiled with -DDEBUG, you'll get extensive tracing while the program is running. If compiled without, you only get the input (unsorted) and output (sorted) arrays printed. I needed it to diagnose a stupid typo (an r_pos where an l_pos was clearly required). Note the general techniques:

  1. Document entry and exit from functions
  2. Create a diagnostic print function (here dump_array() with one argument a 'tag' (to identify which call is being used) and the other arguments the data structure to be printed.
  3. Call the diagnostic print function at suitable points.
  4. Make it easy to enable or disable diagnostics.

For production quality code, my diagnostic print functions also take a FILE *fp argument and write to the given file; I cheated and used stdout here. The extra generality means the function can be used to write to stderr or a log file as well as, or instead of, stdout.

Space management

The merge_sort() code copies the complete input array into two smaller arrays (left and right) and then sorts the smaller arrays (recursion) and merges the sorted smaller arrays into the input array. This happens at each of log N levels of recursion. Some empirical testing shows that the space used is approximately 2N items — it is O(N) space usage.

Shouldn't we have a new array every time we merge two previous arrays?

In a functional programming language, you would have new arrays. In C, you use the input array as the output array too. The code copies the original input array into separate smaller arrays, sorts those smaller arrays, and merges the sorted smaller arrays into the original array.

My other question is what procedure in the code allows us to go back to before the recursion where we split the left side of our array so we can work on the right side to get 43 a 3 in order to merge them as well.

The splitting process creates a copy of the input array (so the information in the original data is temporarily superfluous). The merging process copies the (now sorted) split arrays back into the original array. (Largely repeating myself.)

Source

#include <stddef.h>

extern void merge_sort(int *array, size_t arrlen);

/* Debug */
#ifdef DEBUG
static void dump_array(const char *tag, int *array, size_t len);
static void enter_func(const char *func);
static void exit_func(const char *func);
#else
#define dump_array(t, a, l) ((void)0)
#define enter_func(f)       ((void)0)
#define exit_func(f)        ((void)0)
#endif

/*
function merge(left, right)
   var list result
    while length(left) > 0 and length(right) > 0
        if first(left) ≤ first(right)
            append first(left) to result
            left = rest(left)
        else
            append first(right) to result
            right = rest(right)
    end while

    # You know that one of left and right is empty
    # Copy the rest of the data from the other
    while length(left) > 0
        append first(left) to result
        left = rest(left)             
    end while
    while length(right) > 0
        append first(right) to result
        right = rest(right)
    end while
    return result
end function
*/

static void merge(int *left, size_t l_len, int *right, size_t r_len, int *output)
{
    size_t r_pos = 0;
    size_t l_pos = 0;
    size_t o_pos = 0;
    enter_func(__func__);
    dump_array("Left:", left, l_len);
    dump_array("Right:", right, r_len);
    while (r_pos < r_len && l_pos < l_len)
    {
        if (right[r_pos] < left[l_pos])
            output[o_pos++] = right[r_pos++];
        else
            output[o_pos++] = left[l_pos++];
    }
    while (r_pos < r_len)
        output[o_pos++] = right[r_pos++];
    while (l_pos < l_len)
        output[o_pos++] = left[l_pos++];
    dump_array("Output:", output, r_len + l_len);
    exit_func(__func__);
}

/*
function merge_sort(m)
    if length(m) ≤ 1
        return m
    var list left, right, result

    var integer middle = length(m) / 2
    for each x in m up to middle
        add x to left
    for each x in m after middle
        add x to right
    left = merge_sort(left)
    right = merge_sort(right)
    result = merge(left, right)
    return result
*/

void merge_sort(int *array, size_t len)
{
    if (len <= 1)
        return;
    int left[(len+1)/2];
    int l_pos = 0;
    int right[(len+1)/2];
    int r_pos = 0;
    size_t mid = len / 2;

    enter_func(__func__);
    dump_array("Input:", array, len);
    for (size_t i = 0; i < mid; i++)
        left[l_pos++] = array[i];
    for (size_t i = mid; i < len; i++)
        right[r_pos++] = array[i];
    dump_array("Left:", left, l_pos);
    dump_array("Right:", right, r_pos);
    merge_sort(left, l_pos);
    merge_sort(right, r_pos);
    merge(left, l_pos, right, r_pos, array);
    dump_array("Result:", array, len);
    exit_func(__func__);
}

/* Test code */
#include <stdio.h>

#ifdef DEBUG
static void enter_func(const char *func)
{
    printf("-->> %s\n", func);
}

static void exit_func(const char *func)
{
    printf("<<-- %s\n", func);
}
#endif

/* dump_array is always used */
#undef dump_array

static void dump_array(const char *tag, int *array, size_t len)
{
    printf("%-8s", tag);
    for (size_t i = 0; i < len; i++)
        printf(" %2d", array[i]);
    putchar('\n');
}

int main(void)
{
    int array[] = { 38, 27, 43, 3, 9, 82, 10 };
    size_t arrlen = sizeof(array) / sizeof(array[0]);

    dump_array("Before:", array, arrlen);
    merge_sort(array, arrlen);
    dump_array("After:", array, arrlen);
    return 0;
}

Sample outputs

Non-debugging

Before:  38 27 43  3  9 82 10
After:    3  9 10 27 38 43 82

Debugging

Before:  38 27 43  3  9 82 10
-->> merge_sort
Input:   38 27 43  3  9 82 10
Left:    38 27 43
Right:    3  9 82 10
-->> merge_sort
Input:   38 27 43
Left:    38
Right:   27 43
-->> merge_sort
Input:   27 43
Left:    27
Right:   43
-->> merge
Left:    27
Right:   43
Output:  27 43
<<-- merge
Result:  27 43
<<-- merge_sort
-->> merge
Left:    38
Right:   27 43
Output:  27 38 43
<<-- merge
Result:  27 38 43
<<-- merge_sort
-->> merge_sort
Input:    3  9 82 10
Left:     3  9
Right:   82 10
-->> merge_sort
Input:    3  9
Left:     3
Right:    9
-->> merge
Left:     3
Right:    9
Output:   3  9
<<-- merge
Result:   3  9
<<-- merge_sort
-->> merge_sort
Input:   82 10
Left:    82
Right:   10
-->> merge
Left:    82
Right:   10
Output:  10 82
<<-- merge
Result:  10 82
<<-- merge_sort
-->> merge
Left:     3  9
Right:   10 82
Output:   3  9 10 82
<<-- merge
Result:   3  9 10 82
<<-- merge_sort
-->> merge
Left:    27 38 43
Right:    3  9 10 82
Output:   3  9 10 27 38 43 82
<<-- merge
Result:   3  9 10 27 38 43 82
<<-- merge_sort
After:    3  9 10 27 38 43 82