Wednesday, 27 February 2013

Parallel Programming in .Net

Introduction to TPL(Task Parallel Library)

I have to admit that I’m not an expert in multithreading or parallel computing. However, people often ask me about easy introductions and beginner’s samples for new features. And I have an enormous advantage over most newbies in this area – I can ask people who developed this library about what I’m doing wrong and what to do next. By the way, if you want to ask someone about what to do next with your parallel program, I'd recommend you to go to this forum  Parallel Extensions to the .NET Framework Forum.

I have a simple goal this time. I want to parallelize a long-running console application and add a responsive WPF UI. By the way, I’m not going to concentrate too much on measuring performance. I’ll try to show the most common caveats, but in most cases just seeing that the application runs faster is good enough for me.
Now, let the journey begin. Here’s my small program that I want to parallelize. The SumRootN method returns the sum of the nth root of all integers from one to 10 million, where n is a parameter. In the Main method, I call this method for roots from 2 through 19. I’m using the Stopwatch class to check how many milliseconds the program takes to run

using System.Threading.Tasks;
using System.Threading;
using System.Diagnostics;
using System;

class Program
{

    static void Main(string[] args)
    {
        var watch = Stopwatch.StartNew();
        for (int i = 2; i < 20; i++)
        {
            var result = SumRootN(i);
            Console.WriteLine("root {0} : {1} ", i, result);
        }
        Console.WriteLine(watch.ElapsedMilliseconds);
        Console.ReadLine();
    }

    public static double SumRootN(int root)
    {
        double result = 0;
        for (int i = 1; i < 10000000; i++)
        {
            result += Math.Exp(Math.Log(i) / root);
        }
        return result;
    }
}
On my 3-GHz dual-core 64-bit computer with 4 GB of RAM the program takes about 18 seconds to run.
Since I’m using a for loop, the Parallel.For method is the easiest way to add parallelism. All I need to do is replace

for (int i = 2; i < 20; i++)
{
    var result = SumRootN(i);
    Console.WriteLine("root {0} : {1} ", i, result);
}

Parallel.For(2, 20, (i) =>
{
    var result = SumRootN(i);
    Console.WriteLine("root {0} : {1} ", i, result);
});
Notice how little the code changed. I supplied start and end indices (same as I did in the simple loop) and a delegate in the form of a lambda expression. I didn’t have to change anything else, and now my little program takes about 9 seconds.
When you use the Parallel.For method, the .NET Framework automatically manages the threads that service the loop, so you don’t need to do this yourself. But remember that running code in parallel on two processors does not guarantee that the code will run exactly twice as fast. Nothing comes for free; although you don’t need to manage threads yourself, the .NET Framework still uses them behind the scenes. And of course this leads to some overhead. In fact, if your operation is simple and fast and you run a lot of short parallel cycles, you may get much less benefit from parallelization than you might expect.
Another thing you probably noticed when you run the code is that now you don’t see the results in the proper order: Instead of seeing increasing roots, you see quite a different picture. But let’s pretend that we just need results, without any specific order. In this blog post, I’m going to leave this problem unresolved.
Now it’s time to take things one step further. I don’t want to write a console application; I want some UI. So I’m switching to Windows Presentation Foundation (WPF). I have created a small window that has only one Start button, one text block to display results, and one label to show elapsed time.
The event handler for the sequential execution looks pretty simple:

private void start_Click(object sender, RoutedEventArgs e)
{
    textBlock1.Text = "";
    label1.Content = "Milliseconds: ";

    var watch = Stopwatch.StartNew();
    for (int i = 2; i < 20; i++)
    {
        var result = SumRootN(i);
        textBlock1.Text += "root " + i.ToString() + " " +
                           result.ToString() + Environment.NewLine;

    }
    var time = watch.ElapsedMilliseconds;
    label1.Content += time.ToString();
}

Compile and run the application to make sure that everything works fine. As you might notice, UI is frozen and the text block does not update until all of the computations are done. This is a good demonstration of why WPF recommends never executing long-running operations in the UI thread.
Let’s change the for loop to the parallel one:

Parallel.For(2, 20, (i) =>
{
    var result = SumRootN(i);
    textBlock1.Text += "root " + i.ToString() + " " +
                        result.ToString() + Environment.NewLine;

});

Click the button…and…get an InvalidOperationException that says “The calling thread cannot access this object because a different thread owns it.”
What happened? Well, as I mentioned earlier, the Task Parallel Library still uses threads. When you call the Parallel.For method, the .NET Framework starts new threads automatically. I didn’t have problems with the console application because the Console class is thread safe. But in WPF, UI components can be safely accessed only by a dedicated UI thread. Since Parallel.For uses worker threads besides the UI thread, it’s unsafe to manipulate the text block directly in the parallel loop body. If you use, let’s say, Windows Forms, you might have different problems, but problems nonetheless (another exception or even an application crash).
Luckily, WPF provides an API that solves this problem. Most controls have a special Dispatcher object that enables other threads to interact with the UI thread by sending asynchronous messages to it. So our parallel loop should actually look like this:

Parallel.For(2, 20, (i) =>
{
    var result = SumRootN(i);
    this.Dispatcher.BeginInvoke(new Action(() =>
        textBlock1.Text += "root " + i.ToString() + " " +
                           result.ToString() + Environment.NewLine)
         , null);
});

In the above code, I’m using the Dispatcher to send a delegate to the UI thread. The delegate will be executed when the UI thread is idle. If UI is busy doing something else, the delegate will be put into a queue. But remember that this type of interaction with the UI thread may slow down your application.
Now I have our parallel WPF application running on my computer almost twice fast. But what about this freezing UI? Don’t all modern applications have responsive UI? And if Parallel.For starts new threads, why is the UI thread still blocked?
The reason is that Parallel.For tries to exactly imitate the behavior of the normal for loop, so it blocks the further code execution until it finishes all its work.
Let’s take a short pause here. If you already have an application that works and satisfies all your requirements, and you want to simply speed it up by using parallel processing, it might be enough just to replace some of the loops with Parallel.For or Parallel.ForEach. But in many cases you need more advanced tools.
To make the UI responsive, I am going to use tasks, which is a new concept introduced by the Task Parallel Library. A task represents an asynchronous operation that is often run on a separate thread. The .NET Framework optimizes load balancing and also provides a nice API for managing tasks and making asynchronous calls between them. To start an asynchronous operation, I’ll use the Task.Factory.StartNew method.
So I’ll delete the Parallel.For and replace it with the following code, once again trying to change as little as possible.
for (int i = 2; i < 20; i++)
{
    var t = Task.Factory.StartNew(() =>
    {
        var result = SumRootN(i);
        this.Dispatcher.BeginInvoke(new Action(() =>
            textBlock1.Text += "root " + i.ToString() + " " +
               result.ToString() + Environment.NewLine)
            ,null);

    });
}
Compile, run… Well, UI is responsive. I can move and resize the window while the program calculates the results. But I have two problems now:
1. My program tells me that it took 0 milliseconds to execute.
2. The program calculates the method only for root 20 and shows me a list of identical results.
Let’s start with the last one. C# experts can shout it out: closure! Yes, i is used in a loop, so when a thread starts working, i’s value has already changed. Since i is equal to 20 when the loop is exited, this is the value that is always passed to newly created tasks.
Problems with closure like this one are common when you deal with lots of delegates in the form of lambda expressions (which is almost inevitable with asynchronous programming), so watch out for it. The solution is really easy. Just copy the value of the loop variable into a variable declared within the loop. Then use this local variable instead of the loop variable

for (int i = 2; i < 20; i++)
{
    int j = i;
    var t = Task.Factory.StartNew(() =>
    {
        var result = SumRootN(j);
        this.Dispatcher.BeginInvoke(new Action(() =>
             textBlock1.Text += "root " + j.ToString() + " " +
                                 result.ToString() + Environment.NewLine)
             , null);
    });
}
Now let’s move to the second problem: The execution time isn’t measured. I perform the tasks asynchronously, so nothing blocks the code execution. The program starts the tasks and goes to the next line, which is reading time and displaying it. And it doesn’t take that long, so I get 0 on my timer.
Sometimes it’s OK to move on without waiting for the threads to finish their jobs. But sometimes you need to get a signal that the work is done, because it affects your workflow. A timer is a good example of the second scenario.
To get my time measurement, I have to wrap code that reads the timer value into yet another method from the Task Parallel Library: TaskFactory.ContinueWhenAll. It does exactly what I need: It waits for all the threads in an array to finish and then executes the delegate. This method works on arrays only, so I need to store all the tasks somewhere to be able to wait for them all to finish.
Here’s what my final code looks like:
public partial class MainWindow : Window
{
    public MainWindow()
    {
        InitializeComponent();
    }
    public static double SumRootN(int root)
    {
        double result = 0;
        for (int i = 1; i < 10000000; i++)
        {
            result += Math.Exp(Math.Log(i) / root);
        }
        return result;
    }

    private void start_Click(object sender, RoutedEventArgs e)
    {
        textBlock1.Text = "";
        label1.Content = "Milliseconds: ";

        var watch = Stopwatch.StartNew();
        List<Task> tasks = new List<Task>();
        for (int i = 2; i < 20; i++)
        {
            int j = i;
            var t = Task.Factory.StartNew(() =>
            {
                var result = SumRootN(j);
                this.Dispatcher.BeginInvoke(new Action(() =>
                     textBlock1.Text += "root " + j.ToString() + " " +
                                         result.ToString() +
                                         Environment.NewLine)
                , null);
            });
            tasks.Add(t);
        }

        Task.Factory.ContinueWhenAll(tasks.ToArray(),
              result =>
              {
                  var time = watch.ElapsedMilliseconds;
                  this.Dispatcher.BeginInvoke(new Action(() =>
                      label1.Content += time.ToString()));
              });

    }
}
Finally, everything works as expected: I have a list of results, the UI does not freeze, and the elapsed time displays correctly. The code definitely looks quite different from what I started with, but surprisingly it’s not that long, and I was able to reuse most of it (thanks to lambda expression syntax).
Again, I tried to cover common problems that most beginners in parallel programming and multithreading are likely to encounter, without getting too deep into details