
This tutorial explains how to find performance bottlenecks in Delphi code and apply the correct algorithm to fix them. It also shows how to improve your algorithms before taking you through parallel programming.

The article is an excerpt from a book written by Primož Gabrijelčič, titled Delphi High Performance.

Never access UI from a background thread

Let’s start with the biggest source of hidden problems: manipulating a user interface from a background thread. This is, surprisingly, quite a common problem, all the more so because all Delphi resources on multithreaded programming simply say never to do that. Still, the advice doesn’t seem to reach some programmers, and they will always try to find an excuse to manipulate a user interface from a background thread.

Indeed, there may be a situation where VCL or FireMonkey may be manipulated from a background thread, but you’ll be treading on thin ice if you do that. Even if your code works with the current Delphi, nobody can guarantee that changes in graphical libraries introduced in future Delphis won’t break your code. It is always best to cleanly decouple background processing from a user interface.

Let’s look at an example which nicely demonstrates the problem. The ParallelPaint demo has a simple form, with eight TPaintBox components and eight threads. Each thread runs the same drawing code and draws a pattern into its own TPaintBox. As every thread accesses only its own Canvas, and no other user interface components, a naive programmer would therefore assume that drawing into paintboxes directly from background threads would not cause problems. A naive programmer would be very much mistaken.

If you run the program, you will notice that although the code paints constantly into some of the paint boxes, others stop being updated after some time. You may even get a Canvas does not allow drawing exception. It is impossible to tell in advance which threads will continue painting and which will not.

The following image shows an example of the output. The first two paint boxes in the first row, and the last one in the last row, were no longer being updated when I grabbed the image:

The lines are drawn in the DrawLine method. It does nothing special, just sets the color for that line and draws it. Still, that is enough to break the user interface when this is called from multiple threads at once, even though each thread uses its own Canvas:

procedure TfrmParallelPaint.DrawLine(canvas: TCanvas; p1, p2: TPoint; color: TColor);
begin
  Canvas.Pen.Color := color;
  Canvas.MoveTo(p1.X, p1.Y);
  Canvas.LineTo(p2.X, p2.Y);
end;

Is there a way around this problem? Indeed there is. Delphi’s TThread class implements a method, Queue, which executes some code in the main thread.

Queue takes a procedure or anonymous method as a parameter and sends it to the main thread. After a short time, the code is then executed in the main thread. It is impossible to tell how much time will pass before the code is executed, but that delay will typically be very short, on the order of milliseconds. As it accepts an anonymous method, we can use the magic of variable capturing and write the corrected code, as shown here:

procedure TfrmParallelPaint.QueueDrawLine(canvas: TCanvas; p1, p2: TPoint; color: TColor);
begin
  TThread.Queue(nil,
    procedure
    begin
      Canvas.Pen.Color := color;
      Canvas.MoveTo(p1.X, p1.Y);
      Canvas.LineTo(p2.X, p2.Y);
    end);
end;
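To see the mechanism in isolation, here is a minimal console sketch that is not part of the ParallelPaint demo. It assumes a Delphi version that offers TThread.CreateAnonymousThread and the anonymous-method overload of Queue; the program and variable names are made up for illustration. Because a console program has no message loop, the sketch calls CheckSynchronize by hand to let the main thread run whatever has been queued:

```delphi
program QueueSketch; // hypothetical name, illustration only

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Classes;

var
  worker: TThread;
  ranInMainThread: Boolean;

begin
  ranInMainThread := False;

  worker := TThread.CreateAnonymousThread(
    procedure
    begin
      // This part runs in the background thread. The inner anonymous
      // method is only handed over to the main thread, not executed here.
      TThread.Queue(nil,
        procedure
        begin
          // This part runs later, in the main thread.
          ranInMainThread :=
            TThread.CurrentThread.ThreadID = MainThreadID;
        end);
    end);
  worker.FreeOnTerminate := False; // we want to WaitFor and Free it ourselves
  worker.Start;
  worker.WaitFor;

  // A console application has no message loop, so we process any
  // still-pending queued calls manually.
  CheckSynchronize;

  Writeln('Queued code ran in main thread: ', ranInMainThread);
  worker.Free;
end.
```

In a VCL or FireMonkey application the explicit CheckSynchronize call is not needed, as the framework's message loop takes care of it.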

In older Delphis you don’t have such a nice Queue method but only a version of Synchronize that accepts a normal method. If you have to use this method, you cannot count on anonymous method mechanisms to handle parameters. Rather, you have to copy them to fields and then Synchronize a parameterless method operating on these fields. The following code fragment shows how to do that:

procedure TfrmParallelPaint.SynchronizedDraw;
begin
  FCanvas.Pen.Color := FColor;
  FCanvas.MoveTo(FP1.X, FP1.Y);
  FCanvas.LineTo(FP2.X, FP2.Y);
end;

procedure TfrmParallelPaint.SyncDrawLine(canvas: TCanvas; p1, p2: TPoint; color: TColor);
begin
  FCanvas := canvas;
  FP1 := p1;
  FP2 := p2;
  FColor := color;
  TThread.Synchronize(nil, SynchronizedDraw);
end;

If you run the corrected program, the final result should always be similar to the following image, with all eight TPaintBox components showing a nicely animated image:


Simultaneous reading and writing

The next situation I regularly see when looking at badly written parallel code is simultaneous reading from and writing to a shared data structure, such as a list. The SharedList program demonstrates how things can go wrong when you share a data structure between threads. Actually, scratch that; it shows how things will go wrong if you do that.

This program creates a shared list, FList: TList<Integer>. Then it creates one background thread which runs the method ListWriter and multiple background threads, each running the ListReader method. Indeed, you can run the same code in multiple threads. This is a perfectly normal behavior and is sometimes extremely useful.

The ListReader method is incredibly simple. It just reads all the elements in a list and does that over and over again. As I’ve mentioned before, the code in my examples makes sure that problems in multithreaded code really do occur, but because of that, my demo code most of the time also looks terribly stupid. In this case, the reader just reads and reads the data because that’s the best way to expose the problem:

procedure TfrmSharedList.ListReader;
var
  i, j, a: Integer;
begin
  for i := 1 to CNumReads do
    for j := 0 to FList.Count - 1 do
      a := FList[j];
end;

The ListWriter method is a bit different. It loops as well, but it sleeps a little inside each loop iteration.

After the Sleep, the code either adds to the list or deletes from it. Again, this is designed so that the problem is quick to appear:

procedure TfrmSharedList.ListWriter;
var
  i: Integer;
begin
  for i := 1 to CNumWrites do
  begin
    Sleep(1);
    if FList.Count > 10 then
      FList.Delete(Random(10))
    else
      FList.Add(Random(100));
  end;
end;

If you start the program in a debugger, and click on the Shared lists button, you’ll quickly get an EArgumentOutOfRangeException exception. A look at the stack trace will show that it appears in the line a := FList[j];.

In retrospect, this is quite obvious. The code in ListReader starts the inner for loop and reads FList.Count. At that time, FList has 11 elements, so Count is 11. At the end of the loop, the code tries to read FList[10], but in the meantime ListWriter has deleted one element and the list now has only 10 elements. Accessing element [10] therefore raises an exception.

We’ll return to this topic later, in the section about Locking. For now you should just keep in mind that sharing data structures between threads causes problems.
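As a preview, and only as a sketch under stated assumptions, the following console program shows one common way to make the reader and writer safe: both take the same lock before touching the list. It is not the SharedList demo itself. The use of TCriticalSection, the global variables, and the small loop counts are choices made for this illustration, and the later section on locking may well take a different approach:

```delphi
program SharedListLocked; // hypothetical name, illustration only

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Classes, System.SyncObjs,
  System.Generics.Collections;

const
  CNumReads = 1000; // assumed values, chosen so the sketch finishes quickly
  CNumWrites = 100;

var
  FList: TList<Integer>;
  FLock: TCriticalSection;
  reader, writer: TThread;

procedure ListReader;
var
  i, j, a: Integer;
begin
  for i := 1 to CNumReads do
  begin
    // Hold the lock for the whole pass so Count cannot change under us.
    FLock.Acquire;
    try
      for j := 0 to FList.Count - 1 do
        a := FList[j];
    finally
      FLock.Release;
    end;
  end;
end;

procedure ListWriter;
var
  i: Integer;
begin
  for i := 1 to CNumWrites do
  begin
    Sleep(1);
    FLock.Acquire;
    try
      if FList.Count > 10 then
        FList.Delete(Random(10))
      else
        FList.Add(Random(100));
    finally
      FLock.Release;
    end;
  end;
end;

begin
  FList := TList<Integer>.Create;
  FLock := TCriticalSection.Create;
  try
    reader := TThread.CreateAnonymousThread(procedure begin ListReader; end);
    writer := TThread.CreateAnonymousThread(procedure begin ListWriter; end);
    reader.FreeOnTerminate := False;
    writer.FreeOnTerminate := False;
    reader.Start;
    writer.Start;
    reader.WaitFor;
    writer.WaitFor;
    reader.Free;
    writer.Free;
    Writeln('Finished without exception, Count = ', FList.Count);
  finally
    FLock.Free;
    FList.Free;
  end;
end.
```

The price is that the reader and the writer can no longer run at the same time while they are inside the locked region, which is the usual trade-off with locking.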

Sharing a variable

OK, so rule number two is “Shared structures bad”. What about sharing a simple variable? Nothing can go wrong there, right? Wrong! There are actually multiple ways something can go wrong.

The program IncDec demonstrates one of the bad things that can happen. The code contains two methods: IncValue and DecValue. The former increments a shared field, FValue: integer, some number of times, and the latter decrements it the same number of times:

procedure TfrmIncDec.IncValue;
var
  i: integer;
  value: integer;
begin
  for i := 1 to CNumRepeat do begin
    value := FValue;
    FValue := value + 1;
  end;
end;

procedure TfrmIncDec.DecValue;
var
  i: integer;
  value: integer;
begin
  for i := 1 to CNumRepeat do begin
    value := FValue;
    FValue := value - 1;
  end;
end;

A click on the Inc/Dec button sets the shared value to 0, runs IncValue, then DecValue, and logs the result:

procedure TfrmIncDec.btnIncDec1Click(Sender: TObject);
begin
  FValue := 0;
  IncValue;
  DecValue;
  LogValue;
end;

I know you can all tell what FValue will hold at the end of this program. Zero, of course. But what will happen if we run IncValue and DecValue in parallel? That is, actually, hard to predict!

A click on the Multithreaded button does almost the same, except that it runs IncValue and DecValue in parallel. How exactly that is done is not important at the moment (but feel free to peek into the code if you’re interested):

procedure TfrmIncDec.btnIncDec2Click(Sender: TObject);
begin
  FValue := 0;
  RunInParallel(IncValue, DecValue);
  LogValue;
end;

Running this version of the code may still sometimes put zero in FValue, but that will be extremely rare. You most probably won’t be able to see that result unless you are very lucky. Most of the time, you’ll just get a seemingly random number in the range -10,000,000 to 10,000,000 (10,000,000 being the value of the CNumRepeat constant).

In the following image, the first number is a result of the single-threaded code, while all the rest were calculated by the parallel version of the algorithm:

To understand what’s going on, you should know that Windows (and all other operating systems) does many things at once. At any given time, there are hundreds of threads running in different programs and they are all fighting for the limited number of CPU cores. As our program is the active one (has focus), its threads will get most of the CPU time, but still they’ll sometimes be paused for some amount of time so that other threads can run.

Because of that, it can easily happen that IncValue reads the current value of FValue into value (let’s say that the value is 100) and is then paused. DecValue reads the same value and then runs for some time, decrementing FValue. Let’s say that it gets it down to -20,000. (That is just a number without any special meaning.)

After that, the IncValue thread is awakened. It should increment the value to -19,999, but instead of that it adds 1 to 100 (stored in value), gets 101, and stores that into FValue. Ka-boom! In each repetition of the program, this will happen at different times and will cause a different result to be calculated.

You may complain that the problem is caused by the two-stage increment and decrement, but you’d be wrong. I dare you—go ahead, change the code so that it will modify FValue with Inc(FValue) and Dec(FValue) and it still won’t work correctly.
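For contrast, here is a hedged sketch of a variant that does produce zero every time. It replaces the plain read-modify-write with the interlocked operations from System.SyncObjs, which perform the increment or decrement as one indivisible step. This is not the IncDec demo from the book; the console wrapper, the global FValue, and the smaller CNumRepeat are assumptions made to keep the example short and self-contained:

```delphi
program IncDecInterlocked; // hypothetical name, illustration only

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Classes, System.SyncObjs;

const
  CNumRepeat = 1000000; // assumed; smaller than in the article

var
  FValue: Integer;
  incThread, decThread: TThread;

procedure IncValue;
var
  i: Integer;
begin
  for i := 1 to CNumRepeat do
    TInterlocked.Increment(FValue); // atomic read-modify-write
end;

procedure DecValue;
var
  i: Integer;
begin
  for i := 1 to CNumRepeat do
    TInterlocked.Decrement(FValue); // atomic read-modify-write
end;

begin
  FValue := 0;
  incThread := TThread.CreateAnonymousThread(procedure begin IncValue; end);
  decThread := TThread.CreateAnonymousThread(procedure begin DecValue; end);
  incThread.FreeOnTerminate := False;
  decThread.FreeOnTerminate := False;
  incThread.Start;
  decThread.Start;
  incThread.WaitFor;
  decThread.WaitFor;
  incThread.Free;
  decThread.Free;
  // No update can be lost, so the increments and decrements cancel out.
  Writeln('FValue = ', FValue);
end.
```

Interlocked operations are noticeably slower than a plain Inc or Dec, so they are a fix for correctness, not a free one.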

Well, I hear you say, so I shouldn’t even modify one variable from two threads at the same time? I can live with that. But surely, it is OK to write into a variable from one thread and read from another?

The answer, as you can probably guess given the general tendency of this section, is again—no, you may not. There are some situations where this is OK (for example, when a variable is only one byte long) but, in general, even simultaneous reading and writing can be a source of weird problems.

The ReadWrite program demonstrates this problem. It has a shared buffer, FBuf: Int64, and a pointer variable used to read and modify the data, FPValue: PInt64. At the beginning, the buffer is initialized to an easily recognized number and a pointer variable is set to point to the buffer:

FPValue := @FBuf;
FPValue^ := $7777777700000000;

The program runs two threads. One just reads from the location and stores all the read values into a list. This list is created with its Sorted and Duplicates properties set in a way that prevents it from storing duplicate values:

procedure TfrmReadWrite.Reader;
var
  i: integer;
begin
  for i := 1 to CNumRepeat do
    FValueList.Add(FPValue^);
end;

The second thread repeatedly writes two values into the shared location:

procedure TfrmReadWrite.Writer;
var
  i: integer;
begin
  for i := 1 to CNumRepeat do begin
    FPValue^ := $7777777700000000;
    FPValue^ := $0000000077777777;
  end;
end;

At the end, the contents of the FValueList list are logged on the screen. We would expect to see only two values—$7777777700000000 and $0000000077777777. In reality, we see four, as the following screenshot demonstrates:

The reason for that strange result is that Intel processors in 32-bit mode can’t write a 64-bit number (which is what an Int64 is) in one step. In other words, reading and writing 64-bit numbers in 32-bit code is not atomic.

When multithreading programmers talk about something being atomic, they want to say that an operation will execute in one indivisible step. Any other thread will either see a state before the operation or a state after the operation, but never some undefined intermediate state.

How do the values $7777777777777777 and $0000000000000000 appear in the test application? Let’s say that FPValue^ contains $7777777700000000. The code then starts writing $0000000077777777 into FPValue^ by first storing $77777777 into the bottom four bytes. After that it starts writing $00000000 into the upper four bytes of FPValue^, but in the meantime Reader reads the value and gets $7777777777777777.

In a similar way, Reader will sometimes see $0000000000000000 in FPValue^.

We’ll look into a way to solve this situation immediately, but in the meantime, you may wonder—when is it okay to read/write from/to a variable at the same time? Sadly, the answer is—it depends. Not even just on the CPU family (Intel and ARM processors behave completely differently), but also on a specific architecture used in a processor. For example, older and newer Intel processors may not behave the same in that respect.

You can always depend on access to byte-sized data being atomic, but that is all. Access (reads and writes) to larger quantities of data (words, integers) is atomic only if the data is correctly aligned. You can access word-sized data atomically if it is word aligned, and integer data if it is double-word aligned. If the code was compiled in 64-bit mode, you can also atomically access Int64 data if it is quad-word aligned.

When you are not using data packing (such as packed records) the compiler will take care of alignment and data access should automatically be atomic. You should, however, still check the alignment in code, if nothing else to prevent stupid programming errors.
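A minimal sketch of such an alignment check follows. The IsAligned helper and the two record types are hypothetical names invented for this illustration. The idea is simply that an address is aligned for a given size when it is a multiple of that size:

```delphi
program AlignmentCheck; // hypothetical name, illustration only

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

type
  // Without "packed", the compiler inserts padding before Value.
  TNaturalRec = record
    Flag: Byte;
    Value: Integer;
  end;

  // With "packed", Value starts right after Flag, at offset 1.
  TPackedRec = packed record
    Flag: Byte;
    Value: Integer;
  end;

// Returns True when the address is a multiple of the given size.
function IsAligned(p: Pointer; size: NativeUInt): Boolean;
begin
  Result := (NativeUInt(p) mod size) = 0;
end;

var
  naturalRec: TNaturalRec;
  packedRec: TPackedRec;

begin
  Writeln('Natural record, Value aligned: ',
    IsAligned(@naturalRec.Value, SizeOf(Integer)));
  Writeln('Packed record, Value aligned:  ',
    IsAligned(@packedRec.Value, SizeOf(Integer)));
  Writeln('Offset of Value in packed record: ',
    NativeUInt(@packedRec.Value) - NativeUInt(@packedRec));
end.
```

In real code, the same test would more typically sit inside an Assert placed next to the code that relies on atomic access, so that a later change to the data layout is caught in a debug build.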

If you want to write and read larger amounts of data, modify the data, or if you want to work on shared data structures, correct alignment will not be enough. You will need to introduce synchronization into your program.
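For the ReadWrite example discussed earlier, one possible form of such synchronization is sketched here. It assumes the TInterlocked.Read and TInterlocked.Exchange overloads for Int64 from System.SyncObjs. It is a console rewrite made for this illustration, not the author's demo, and it counts unexpected values instead of collecting them in a list:

```delphi
program ReadWriteInterlocked; // hypothetical name, illustration only

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Classes, System.SyncObjs;

const
  CNumRepeat = 1000000; // assumed; chosen so the sketch finishes quickly
  CValueA: Int64 = $7777777700000000;
  CValueB: Int64 = $0000000077777777;

var
  FBuf: Int64;
  unexpectedReads: Integer;
  readerThread, writerThread: TThread;

procedure Reader;
var
  i: Integer;
  v: Int64;
begin
  for i := 1 to CNumRepeat do
  begin
    v := TInterlocked.Read(FBuf); // atomic 64-bit read, also in 32-bit code
    if (v <> CValueA) and (v <> CValueB) then
      TInterlocked.Increment(unexpectedReads);
  end;
end;

procedure Writer;
var
  i: Integer;
begin
  for i := 1 to CNumRepeat do
  begin
    TInterlocked.Exchange(FBuf, CValueA); // atomic 64-bit write
    TInterlocked.Exchange(FBuf, CValueB);
  end;
end;

begin
  FBuf := CValueA;
  unexpectedReads := 0;
  readerThread := TThread.CreateAnonymousThread(procedure begin Reader; end);
  writerThread := TThread.CreateAnonymousThread(procedure begin Writer; end);
  readerThread.FreeOnTerminate := False;
  writerThread.FreeOnTerminate := False;
  readerThread.Start;
  writerThread.Start;
  readerThread.WaitFor;
  writerThread.WaitFor;
  readerThread.Free;
  writerThread.Free;
  // Every read sees one complete value or the other, never a mix of halves.
  Writeln('Unexpected values read: ', unexpectedReads);
end.
```

This only protects a single 64-bit value. For whole data structures, a lock or another higher-level synchronization mechanism is still needed.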

If you found this post useful, do check out the book Delphi High Performance to learn more about the intricacies of how to perform High-performance programming with Delphi.
