Tuesday, February 7, 2012

Investigating string concatenation performance in Delphi

I got really puzzled when I read Arnaud Bouchez's blog post "Delphi doesn't like multi-core CPUs (or the contrary)". Delphi's generated code should be fast, shouldn't it? Maybe it is very fast in a single-threaded program, or even in a multi-threaded program running on a single-core computer. As Arnaud points out, the asm LOCK prefix is used to ensure exclusive access to a memory address. On a multi-core CPU, the other cores may have to wait while a LOCKed instruction executes.
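To make the LOCK prefix concrete, here is a minimal sketch (not taken from Arnaud's post). Two threads increment one shared counter through the Windows InterlockedIncrement API, which performs a LOCK-prefixed increment on x86. The LOCK guarantees that no update is lost, and it is also what forces the cores to coordinate on that memory location. The sketch assumes Delphi XE2 or later (unit scope names), and Iterations is an arbitrary value.

```pascal
program LockPrefixDemo;

{$APPTYPE CONSOLE}

// Sketch for Delphi XE2 or later on Windows.
uses
  Winapi.Windows, System.Classes;

const
  Iterations = 1000000; // arbitrary value chosen for this sketch

var
  Counter: Integer = 0; // shared by both threads

type
  TIncrementThread = class(TThread)
  protected
    procedure Execute; override;
  end;

procedure TIncrementThread.Execute;
var
  I: Integer;
begin
  for I := 1 to Iterations do
    InterlockedIncrement(Counter); // atomic, LOCK-prefixed increment on x86
end;

var
  T1, T2: TIncrementThread;
begin
  T1 := TIncrementThread.Create(False);
  T2 := TIncrementThread.Create(False);
  T1.WaitFor;
  T2.WaitFor;
  T1.Free;
  T2.Free;
  Writeln(Counter); // 2000000: no increment is lost
end.
```

Replacing InterlockedIncrement(Counter) with a plain Inc(Counter) removes the LOCK, and the final count can then come out lower than expected.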

What about Delphi?

Quoting the original post:
String types and dynamic arrays just use the same LOCKed asm instruction everywhere, i.e. for every access which may lead into a write to the string.
So, if you are writing to a string, there are a few LOCK instructions being generated by the compiler, every single time. Even if your computer is a super-duper multi-core machine, it will behave - during these instructions - if they were a single-core CPU.
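The string case can be observed directly. This sketch assumes Delphi 2009 or later, where StringRefCount is declared in the System unit. It shows that assigning a string only bumps a shared reference count, and that writing to a shared string first makes a private copy. In the Delphi RTL those reference-count updates are done with LOCKed instructions.

```pascal
program RefCountDemo;

{$APPTYPE CONSOLE}

// Sketch for Delphi 2009 or later (StringRefCount lives in the System unit).
var
  A, B: string;
begin
  A := 'string 1';
  A := A + ' + string 2';     // A now holds a heap string with reference count 1
  Writeln(StringRefCount(A)); // 1
  B := A;                     // no copy: the RTL atomically increments the count
  Writeln(StringRefCount(A)); // 2
  B[1] := 'S';                // writing to a shared string first makes B a unique copy
  Writeln(StringRefCount(A)); // 1
  Writeln(StringRefCount(B)); // 1
  Writeln(A);                 // string 1 + string 2
  Writeln(B);                 // String 1 + string 2
end.
```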

What about string concatenation in Delphi?

Well, if you do a lot of string concatenation, then a large part of the time is spent writing to strings, isn't it? Maybe, during string concatenation, your cores are being LOCKed and your software is not using all the CPU power at its disposal... So I decided to create a simple test case to check whether this is true, and how much it affects performance.

The test

I've created two different thread types that do the same work in their Execute method (a simple string concatenation inside a tight loop). The first uses the standard string type:
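uses
  System.Classes; // TThread (the unit is named Classes before Delphi XE2)

const
  MaxCount = 10000000; // assumed value: the loop count is not stated in this post

type
  TMyThread_String = class(TThread)
  private
    FStr: string;
    FCount: Integer; // loop counter used by Execute
  protected
    procedure Execute; override;
  end;

procedure TMyThread_String.Execute;
begin
  FCount := 0;
  repeat
    FStr := 'string 1';
    FStr := FStr + ' + string 2'; // builds the concatenated string
    Inc(FCount);
  until FCount = MaxCount;
end;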
The second thread type uses a TStrBuilder class (my own TStringBuilder implementation):
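type
  TMyThread_StrBuilder = class(TThread)
  private
    FStr: TStrBuilder; // the author's own builder class (not shown in this post)
    FCount: Integer;
  protected
    procedure Execute; override;
  public
    constructor Create(CreateSuspended: Boolean);
    destructor Destroy; override;
  end;

constructor TMyThread_StrBuilder.Create(CreateSuspended: Boolean);
begin
  FStr := TStrBuilder.Create; // assumes a parameterless constructor; created before the thread can run
  inherited Create(CreateSuspended);
end;

destructor TMyThread_StrBuilder.Destroy;
begin
  inherited Destroy; // TThread.Destroy waits for Execute to finish
  FStr.Free;
end;

procedure TMyThread_StrBuilder.Execute;
begin
  FCount := 0;
  repeat
    FStr.Clear;
    FStr.Append('string 1').Append(' + string 2');
    Inc(FCount);
  until FCount = MaxCount;
end;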
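The post does not show TStrBuilder itself. As a hypothetical illustration of why a builder can avoid most of the locking, here is a minimal class with the same Clear/Append shape, named TSimpleStrBuilder to make clear it is not the author's code. Clear only resets a length counter, and Append copies characters into a buffer that is reused across iterations, so after the first iteration the loop does no heap allocation and no string reference counting. It assumes Delphi 2009 or later (for the ToString override).

```pascal
program SimpleStrBuilderDemo;

{$APPTYPE CONSOLE}

// Hypothetical sketch: NOT the author's TStrBuilder, just a builder with the
// same Clear/Append shape. Needs Delphi 2009 or later (TObject.ToString).
type
  TSimpleStrBuilder = class
  private
    FData: array of Char; // reusable buffer, grown only when too small
    FLength: Integer;     // number of characters currently in use
  public
    procedure Clear;
    function Append(const Value: string): TSimpleStrBuilder;
    function ToString: string; override;
  end;

procedure TSimpleStrBuilder.Clear;
begin
  FLength := 0; // keep the buffer so later appends can reuse it
end;

function TSimpleStrBuilder.Append(const Value: string): TSimpleStrBuilder;
var
  NewLength: Integer;
begin
  NewLength := FLength + Length(Value);
  if NewLength > Length(FData) then
    SetLength(FData, NewLength * 2); // grow geometrically to amortize reallocations
  if Value <> '' then
    Move(Value[1], FData[FLength], Length(Value) * SizeOf(Char));
  FLength := NewLength;
  Result := Self; // enables chaining: Append('a').Append('b')
end;

function TSimpleStrBuilder.ToString: string;
begin
  SetString(Result, PChar(Pointer(FData)), FLength);
end;

var
  SB: TSimpleStrBuilder;
  I: Integer;
begin
  SB := TSimpleStrBuilder.Create;
  try
    for I := 1 to 3 do
    begin
      SB.Clear;
      SB.Append('string 1').Append(' + string 2');
    end;
    Writeln(SB.ToString); // string 1 + string 2
  finally
    SB.Free;
  end;
end.
```

Passing the literals as const string parameters does not touch their reference counts, which is the point of this design: only the final ToString call creates a new string.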
Both thread types do exactly the same work, so I put both to the test. First I created two instances of TMyThread_String and started them in parallel (on a dual-core machine). Then I repeated the test using TMyThread_StrBuilder.
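A minimal driver for this test might look like the following sketch. It starts two TMyThread_String instances at once, waits for both, and reports the elapsed time with TStopwatch; for the second run, replace the thread class with TMyThread_StrBuilder. It assumes Delphi XE2 or later, and MaxCount is an assumed value because the post does not state the loop count.

```pascal
program ConcatBenchmark;

{$APPTYPE CONSOLE}

// Sketch for Delphi XE2 or later (System.Diagnostics provides TStopwatch).
uses
  System.SysUtils, System.Classes, System.Diagnostics;

const
  MaxCount = 10000000; // assumed: the post does not give the loop count

type
  TMyThread_String = class(TThread)
  private
    FStr: string;
    FCount: Integer;
  protected
    procedure Execute; override;
  end;

procedure TMyThread_String.Execute;
begin
  FCount := 0;
  repeat
    FStr := 'string 1';
    FStr := FStr + ' + string 2';
    Inc(FCount);
  until FCount = MaxCount;
end;

var
  T1, T2: TMyThread_String;
  Watch: TStopwatch;
begin
  Watch := TStopwatch.StartNew;
  // Two instances run at the same time, one per core on a dual-core machine.
  T1 := TMyThread_String.Create(False);
  T2 := TMyThread_String.Create(False);
  try
    T1.WaitFor;
    T2.WaitFor;
  finally
    T1.Free;
    T2.Free;
  end;
  Watch.Stop;
  Writeln(Format('Elapsed: %.1f seconds', [Watch.ElapsedMilliseconds / 1000]));
end.
```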

The results

Impressive results! These are the Task Manager CPU charts:

Running with TMyThread_String:


Running with TMyThread_StrBuilder:


Note that the TStrBuilder test uses 100% of my CPU, while the test with the standard string type can't use all the available resources, even though my PC is otherwise idle.

More results

The time spent on the string concatenation is even more impressive:

Running with TMyThread_String: 26.4 seconds
Running with TMyThread_StrBuilder: 7.3 seconds. About 3.6 times faster!

The ShortString case

Following Arnaud's advice, I also tried the ShortString type: I created a third thread type that uses ShortString concatenation instead of the standard string type. The results are essentially the same as with the standard string type: the same CPU usage chart (ranging from 60% to 70%) and a slightly worse time (28.5 seconds).
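The third thread type is not shown in the post. A plausible, hypothetical reconstruction only swaps the field type to ShortString and keeps the loop unchanged; the sketch below runs one instance and prints the final value. MaxCount is again an assumed value, and the sketch does not by itself explain the measured result.

```pascal
program ShortStringThreadDemo;

{$APPTYPE CONSOLE}

// Hypothetical reconstruction of the third thread type (not shown in the post).
// Sketch for Delphi XE2 or later.
uses
  System.Classes;

const
  MaxCount = 10000000; // assumed: the post does not give the loop count

type
  TMyThread_ShortString = class(TThread)
  private
    FStr: ShortString; // fixed 256-byte buffer stored inside the object
    FCount: Integer;
  protected
    procedure Execute; override;
  public
    property Str: ShortString read FStr;
  end;

procedure TMyThread_ShortString.Execute;
begin
  FCount := 0;
  repeat
    FStr := 'string 1';
    FStr := FStr + ' + string 2';
    Inc(FCount);
  until FCount = MaxCount;
end;

var
  T: TMyThread_ShortString;
begin
  T := TMyThread_ShortString.Create(False);
  try
    T.WaitFor;
    Writeln(T.Str); // string 1 + string 2
  finally
    T.Free;
  end;
end.
```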

Conclusion

My simple test suggests that if you do heavy string concatenation (writing) in a multi-threaded environment (for instance: COM+ servers, IntraWeb applications, multi-threaded services, etc.), you should consider avoiding standard string types and using another approach, such as a TStringBuilder class.
Please note:
  • I did not use the TStringBuilder class from Delphi's own RTL, but my own TStrBuilder class. AFAIK, the standard TStringBuilder has the same problem, because it uses a standard string internally...
  • I need to investigate a little further to find out why ShortStrings are as bad as standard strings (or even worse).
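To check that suspicion, the same loop can be run with the RTL's own TStringBuilder (declared in System.SysUtils). The sketch below is only a starting point for that comparison; it has not been measured here, and MaxCount is an assumed value.

```pascal
program RtlStringBuilderDemo;

{$APPTYPE CONSOLE}

// Sketch for Delphi XE2 or later; TStringBuilder lives in System.SysUtils.
uses
  System.SysUtils, System.Classes;

const
  MaxCount = 10000000; // assumed: the post does not give the loop count

type
  TMyThread_RtlBuilder = class(TThread)
  private
    FResult: string;
    FCount: Integer;
  protected
    procedure Execute; override;
  public
    property ResultText: string read FResult;
  end;

procedure TMyThread_RtlBuilder.Execute;
var
  SB: TStringBuilder;
begin
  SB := TStringBuilder.Create;
  try
    FCount := 0;
    repeat
      SB.Clear;
      SB.Append('string 1').Append(' + string 2');
      Inc(FCount);
    until FCount = MaxCount;
    FResult := SB.ToString; // one string conversion after the loop
  finally
    SB.Free;
  end;
end;

var
  T1, T2: TMyThread_RtlBuilder;
begin
  T1 := TMyThread_RtlBuilder.Create(False);
  T2 := TMyThread_RtlBuilder.Create(False);
  try
    T1.WaitFor;
    T2.WaitFor;
    Writeln(T1.ResultText); // string 1 + string 2
  finally
    T1.Free;
    T2.Free;
  end;
end.
```

Timing this program the same way as the other two runs would show whether the RTL class behaves like the standard string type or like TStrBuilder.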
More on this later!