Thursday, March 12, 2015

Merge Sort for Delphi Revisited

A reader named dangph did me a great favor and created a complete test case for my MergeSort unit. He had already commented on my StackOverflow answer.

Well, he found that the algorithm was not stable after all. And he is correct. BTW, thank you!
So I did a little test to discover why it was not stable. Some hidden bug or a complete logic failure? Seems that the book algorithm has a bug... The author uses an "improved" Insertion sort algorithm when the element set is small enough. Take a look at the original code:

  // find the smallest element in the list
  IndexOfMin := FirstIndex;
  for i := FirstIndex to LastIndex do begin
    if (CompareFunc(ptrList^[i], ptrList^[IndexOfMin]) < 0) then begin
      IndexOfMin := i;
    end;
  end;
  if (FirstIndex <> IndexOfMin) then begin
    Temp := ptrList^[FirstIndex];
    ptrList^[FirstIndex] := ptrList^[IndexOfMin];
    ptrList^[IndexOfMin] := Temp;
  end;

Explanation for that, from the book:

A much better optimization is to scan the entire list for the smallest item and swap it into the first position (in essence, performing the first cycle of selection sort).

 But this approach just breaks the stability. Take a look at some real data used when debugging this algorithm. Please note the value column: the first number is the Key and the second number is the Index. The Index keeps the original position of this element in the list, when it was added/created:

This first picture is a snapshot of some part of the list, before entering that part of the code (from Index=50 to Index=62). Please note elements 50 and 60. Element 50 is the first of the list, and element 60 is the smallest element on the list. The piece of code above will swap them (put the smallest element of that part of the list in the first position, skipping the first cycle of selection sort). And this is the result:


Now, please note that elements 50 and 60 were indeed swapped. But this just broke the stability of the merge sort algorithm because, as you can see, there is another element (53) that has a bigger original index (it was originally ahead) than the element in position 50. In order to make this stable, Element 60 should be swapped with element 50, which in turn should be swapped with element 53.

To avoid all this trouble, I decided to use a simple insertion sort in place of the improved insertion sort of the book. Now the code passes all test cases! :-)

Please find the fixed algorithm and the test case project at Bitbucket. Link here.

Tuesday, March 10, 2015

So, .NET code is not slower than native Delphi code?

Just to prove once again that .NET IS slower than Delphi generated code in LOTS of scenarios:

In this post I created a few implementations for a StringReplace() replacement and found that my custom FastStringReplace() implementation was the fastest, correct?

Now, I want to check if it is REALLY fast. Why not compare with something like .NET super-duper-optimized-code?

In .NET, they told me somewhere that I could not do much more than string.Replace() method. So I did a simple WinForms application in C# using VS 2013 + .NET 4.5, using build configuration. The code is really simple:

for (int i=1; i <= 1000000; i++) {
    sourceString = originalString;
    s = sourceString.Replace(searchPattern, replacePattern);
}

You can test all the 6 test cases I did in my previous Delphi project. I won't give you all the numbers, just one:

Test case 1, 1000000 iterations:

C# code: 6.2 seconds

Delphi code: 0.98 seconds. Let's round it: 1.0 second.

Then, my Delphi native code runs 600% faster than C# code, correct?

So... There is no algorithm involved, no hidden compiler setting, no tricky cache thing, nothing. I just have one string and want to replace some parts of it and copy the new string to another variable.

But C# programmers will still notice that in .NET strings are immutable and this is creating a new string and blah, blah, blah. But Delphi code also creates a new string, because the whole content of the old string is copied to a new string and returned as the result. Of course the Delphi string is not immutable and there is a clear advantage here, but don't blame me! Blame Microsoft, Anders Hejlsberg, Obama, financial crisis...
Both C# and Delphi code are pretty basic and can be found in any software out there written in those languages.
Then another C# programmer I know told me: "Oh, but you didn't compare Delphi's standard StringReplace(), you used a custom code". Yes that's true. And that's the beauty behind non-managed, native languages like Delphi. I can use whatever I want, including assembly code. I'm not limited to classes and methods available in some framework. In fact I patched my Delphi program at runtime, and I don't even need to replace the standard StringReplace() calls with the FastStringReplace().

Also, in StackOverflow I found some "tips" to make the C# code faster: use StringBuilder class instead of string.Replace() method. But, creating a StringBuilder instance inside the loop makes no big difference.


Thursday, March 5, 2015

Fastest StringReplace for Delphi?

Not sure. At least this is the fastest pure pascal, unicode compatible, StringReplace() alternative I've ever tested. I wrote this based on some very old code I found in my HDD and I just don't know where it came from.
I also use a Pos() replacement for x64 (this came directly from FastCode project), because Delphi x64 Pos() function is DOG SLOW!!!

If someone has a faster StringReplace() implementation, written in pure pascal, I would be glad to know it :-)

This is my final conclusion:

In my tests it is at least 20% faster in small strings and up to 500% faster in most cases.
Some corner cases tests showed that this version can be 100 times (yes!) faster than standard StringReplace(), e.g. Length(OldPattern) = Length(NewPattern) and huge number of occurrences of OldPattern in the string. It also kicks TStringBuilder.Replace() butt in all tested scenarios.

Not bad at all, isn't it?

The project code is in my BitBucket repository here

Take a look at these results for instance. In this case we have:

  • A long string (S)
  • Length(SearchPattern) = 1
  • Length(ReplacePattern) > Length(SearchPattern)
  • Multiple occurrences of the SearchPattern within the string S
The results are quite impressive:
  • Standard StringReplace(): 0,7350 secs
  • StringReplace1: 0,2650 secs
  • StringReplace2: 0,2970 secs
  • StringReplace3: 2,1410 secs
  • StringReplace4: 0,4060 secs
  • FastStringReplace: 0,1090 secs
FastStringReplace is my custom implementation. It is more than 100% faster than the second place (StringReplace1, based on TStringBuilder.Replace() method). The most important: It is 7 times faster than standard StringReplece()! This difference gets bigger and bigger if S grows.
In some specific cases, FastStringReplace() may be slower than one of the implementations, but not by much (10% or less). You also have to notice that:

  • TStringBuilder can't handle case insensitive search/replace, so it is not a real replacement for StringReplace() without reviewing your code completely.
  • StringReplace4 implementation beats FastStringReplace in really small strings, but has the same limitation of TStringBuilder.
  • Standard StringReplace() is fast for small strings but dog slow if S is long. In really long strings (32 Kb or more), StringReplace() performance is almost prohibitive.

Ops! I almost forgot! This is also part of IntraWeb from version 14.0.38 and up.

About the license.... this code is FREE and there is no boring license. If you make any improvements (or fixes) please let me know. :-)