Thursday, March 12, 2015

Merge Sort for Delphi Revisited

A reader named dangph did me a great favor and created a complete test case for my MergeSort unit. He had already commented on my StackOverflow answer.

Well, he found that the algorithm was not stable after all. And he is correct. BTW, thank you!
So I did a little test to discover why it was not stable. Some hidden bug or a complete logic failure? Seems that the book algorithm has a bug... The author uses an "improved" Insertion sort algorithm when the element set is small enough. Take a look at the original code:

  // find the smallest element in the list
  IndexOfMin := FirstIndex;
  for i := FirstIndex to LastIndex do begin
    if (CompareFunc(ptrList^[i], ptrList^[IndexOfMin]) < 0) then begin
      IndexOfMin := i;
    end;
  end;
  if (FirstIndex <> IndexOfMin) then begin
    Temp := ptrList^[FirstIndex];
    ptrList^[FirstIndex] := ptrList^[IndexOfMin];
    ptrList^[IndexOfMin] := Temp;
  end;

Explanation for that, from the book:

A much better optimization is to scan the entire list for the smallest item and swap it into the first position (in essence, performing the first cycle of selection sort).

 But this approach just breaks the stability. Take a look at some real data used when debugging this algorithm. Please note the value column: the first number is the Key and the second number is the Index. The Index keeps the original position of this element in the list, when it was added/created:

This first picture is a snapshot of some part of the list, before entering that part of the code (from Index=50 to Index=62). Please note elements 50 and 60. Element 50 is the first of the list, and element 60 is the smallest element on the list. The piece of code above will swap them (put the smallest element of that part of the list in the first position, skipping the first cycle of selection sort). And this is the result:


Now, please note that elements 50 and 60 were indeed swapped. But this just broke the stability of the merge sort algorithm because, as you can see, there is another element (53) that has a bigger original index (it was originally ahead) than the element in position 50. In order to make this stable, Element 60 should be swapped with element 50, which in turn should be swapped with element 53.

To avoid all this trouble, I decided to use a simple insertion sort in place of the improved insertion sort of the book. Now the code passes all test cases! :-)

Please find the fixed algorithm and the test case project at Bitbucket. Link here.

No comments: