Asynchrony in Action

0
82
16 min read

(For more resources related to this topic, see here.)

Asynchrony

When we talk about C# 5.0, the primary topic of conversation is the new asynchronous programming features. What does asynchrony mean? Well, it can mean a few different things, but in our context, it is simply the opposite of synchronous. When you break up execution of a program into asynchronous blocks, you gain the ability to execute them side-by-side, in parallel.

As you can see in the following diagram, executing multiple ac-tions concurrently can bring various positive qualities to your programs:

Parallel execution can bring performance improvements to the execution of a program. The best way to put this into context is by way of an example, an example that has been experienced all too often in the world of desktop software.

Let’s say you have an application that you are developing, and this software should fulfill the following requirements:

  1. When the user clicks on a button, initiate a call to a web service.

  2. Upon completion of the web service call, store the results into a database.

  3. Finally, bind the results and display them to the user.

There are a number of problems with the naïve way of implementing this solution. The first is that many developers write code in such a way that the user interface will be completely unresponsive while we are waiting to receive the results of these web service calls. Then, once the results finally arrive, we continue to make the user wait while we store the results in a database, an operation that the user does not care about in this case.

The primary vehicle for mitigating these kinds of problems in the past has been writing multithreaded code. This is of course nothing new, as multi-threaded hardware has been around for many years, along with software capabilities to take advantage of this hardware. Most of the programming languages did not provide a very good abstraction layer on top of this hardware, often letting (or requiring) you program directly against the hardware threads.

Thankfully, Microsoft introduced a new library to simplify the task of writing highly concurrent programs, which is explained in the next section.

Task Parallel Library

The Task Parallel Library (TPL) was introduced in .NET 4.0 (along with C# 4.0). Firstly, it is a huge topic and could not have been examined properly in such a small space. Secondly, it is highly relevant to the new asynchrony features in C# 5.0, so much so that they are the literal foundation upon which the new features are built. So, in this section, we will cover the basics of the TPL, along with some of the background information about how and why it works.

TPL introduces a new type, the Task type, which abstracts away the concept of something that must be done into an object. At first glance, you might think that this abstraction already exists in the form of the Thread class. While there are some similarities between Task and Thread, the implementations have quite different implications.

With a Thread class, you can program directly against the lowest level of parallelism supported by the operating system, as shown in the following code:

Thread thread = new Thread(new ThreadStart(() => { Thread.Sleep(1000); Console.WriteLine("Hello, from the Thread"); })); thread.Start(); Console.WriteLine("Hello, from the main thread"); thread.Join();

In the previous example, we create a new Thread class, which when started will sleep for a second and then write out the text Hello, from the Thread. After we call thread.Start(), the code on the main thread immediately continues and writes Hello, from the main thread. After a second, we see the text from the background thread printed to the screen.

In one sense, this example of using the Thread class shows how easy it is to branch off the execution to a background thread, while allowing execution of the main thread to continue, unimpeded. However, the problem with using the Thread class as your “concurrency primitive” is that the class itself is an indication of the implementation, which is to say, an operating system thread will be created. As far as abstractions go, it is not really an abstraction at all; your code must both manage the lifecycle of the thread, while at the same time dealing with the task the thread is executing.

If you have multiple tasks to execute, spawning multiple threads can be disastrous, because the operating system can only spawn a finite number of them. For performance intensive applications, a thread should be considered a heavyweight resource, which means you should avoid using too many of them, and keep them alive for as long as possible. As you might imagine, the designers of the .NET Framework did not simply leave you to program against this without any help. The early versions of the frameworks had a mechanism to deal with this in the form of the ThreadPool, which lets you queue up a unit of work, and have the thread pool manage the lifecycle of a pool of threads. When a thread becomes available, your work item is then executed. The following is a simple example of using the thread pool:

int[] numbers = { 1, 2, 3, 4 }; foreach (var number in numbers) { ThreadPool.QueueUserWorkItem(new WaitCallback(o => { Thread.Sleep(500); string tabs = new String('t', (int)o); Console.WriteLine("{0}processing #{1}", tabs, o); }), number); }

This sample simulates multiple tasks, which should be executed in parallel. We start with an array of numbers, and for each number we want to queue a work item that will sleep for half a second, and then write to the console. This works much better than trying to manage multiple threads yourself because the pool will take care of spawning more threads if there is more work. When the configured limit of concurrent threads is reached, it will hold work items until a thread becomes available to process it. This is all work that you would have done yourself if you were using threads directly.

However, the thread pool is not without its complications. First, it offers no way of synchronizing on completion of the work item. If you want to be notified when a job is completed, you have to code the notification yourself, whether by raising an event, or using a thread synchronization primitive, such as ManualResetEvent. You also have to be careful not to queue too many work items, or you may run into system limitations with the size of the thread pool.

With the TPL, we now have a concurrency primitive called Task. Consider the following code:

Task task = Task.Factory.StartNew(() => { Thread.Sleep(1000); Console.WriteLine("Hello, from the Task"); }); Console.WriteLine("Hello, from the main thread"); task.Wait();

Upon first glance, the code looks very similar to the sample using Thread, but they are very different. One big difference is that with Task, you are not committing to an implementation. The TPL uses some very interesting algorithms behind the scenes to manage the workload and system resources, and in fact, allows you customize those algorithms through the use of custom schedulers and synchronization contexts. This allows you to control the parallel execution of your programs with a high degree of control.

Dealing with multiple tasks, as we did with the thread pool, is also easier because each task has synchronization features built-in. To demonstrate how simple it is to quickly parallelize an arbitrary number of tasks, we start with the same array of integers, as shown in the previous thread pool example:

int[] numbers = { 1, 2, 3, 4 };

Because Task can be thought of as a primitive type that represents an asynchronous task, we can think of it as data. This means that we can use things such as Linq to project the numbers array to a list of tasks as follows:

var tasks = numbers.Select(number => Task.Factory.StartNew(() => { Thread.Sleep(500); string tabs = new String('t', number); Console.WriteLine("{0}processing #{1}", tabs, number); }));

And finally, if we wanted to wait until all of the tasks were done before continuing on, we could easily do that by calling the following method:

Task.WaitAll(tasks.ToArray());

Once the code reaches this method, it will wait until every task in the array completes before continuing on. This level of control is very convenient, especially when you consider that, in the past, you would have had to depend on a number of different synchronization techniques to achieve the very same result that was accomplished in just a few lines of TPL code.

With the usage patterns that we have discussed so far, there is still a big disconnect between the process that spawns a task, and the child process. It is very easy to pass values into a background task, but the tricky part comes when you want to retrieve a value and then do something with it. Consider the following requirements:

  1. Make a network call to retrieve some data.

  2. Query the database for some configuration data.

  3. Process the results of the network data, along with the configuration data.

The following diagram shows the logic:

Both the network call and query to the database can be done in parallel. With what we have learned so far about tasks, this is not a problem. However, acting on the results of those tasks would be slightly more complex, if it were not for the fact that the TPL provides support for exactly that scenario.

There is an additional kind of Task, which is especially useful in cases like this called Task<T>. This generic version of a task expects the running task to ultimately return a value, whenever it is finished. Clients of the task can access the value through the .Result property of the task. When you call that property, it will return immediately if the task is completed and the result is available. If the task is not done, however, it will block execution in the current thread until it is.

Using this kind of task, which promises you a result, you can write your programs such that you can plan for and initiate the parallelism that is required, and handle the response in a very logical manner. Look at the following code:

varwebTask = Task.Factory.StartNew(() => { WebClient client = new WebClient(); return client.DownloadString("http://bing.com"); }); vardbTask = Task.Factory.StartNew(() => { // do a lengthy database query return new { WriteToConsole=true }; }); if (dbTask.Result.WriteToConsole) { Console.WriteLine(webTask.Result); } else { ProcessWebResult(webTask.Result); }

In the previous example, we have two tasks, the webTask, and dbTask, which will execute at the same time. The webTask is simply downloading the HTML from http://bing.com Accessing things over the Internet can be notoriously flaky due to the dynamic nature of accessing the network so you never know how long that is going to take. With the dbTask task, we are simulating accessing a database to return some stored settings. Although in this simple example we are just returning a static anonymous type, database access will usually access a different server over the network; again, this is an I/O bound task just like downloading something over the Internet.

Rather than waiting for both of them to execute like we did with Task.WaitAll, we can simply access the .Result property of the task. If the task is done, the result will be returned and execution can continue, and if not, the program will simply wait until it is.

This ability to write your code without having to manually deal with task synchronization is great because the fewer concepts a programmer has to keep in his/her head, the more resources he/she can devote to the program.

If you are curious about where this concept of a task that returns a value comes from, you can look for resources pertaining to “Futures”, and “Promises” at:

http://en.wikipedia.org/wiki/Promise_%28programming%29

At the simplest level, this is a construct that “promises” to give you a result in the “future”, which is exactly what Task<T> does.

Task composability

Having a proper abstraction for asynchronous tasks makes it easier to coordinate multiple asynchronous activities. Once the first task has been initiated, the TPL allows you to compose a number of tasks together into a cohesive whole using what are called continuations. Look at the following code:

Task<string> task = Task.Factory.StartNew(() => { WebClient client = new WebClient(); return client.DownloadString("http://bing.com"); }); task.ContinueWith(webTask => { Console.WriteLine(webTask.Result); });

Every task object has the .ContinueWith method, which lets you chain another task to it. This continuation task will begin execution once the first task is done. Unlike the previous example, where we relied on the .Result method to wait until the task was done—thus potentially holding up the main thread while it completed—the continuation will run asynchronously. This is a better approach for composing tasks because you can write tasks that will not block the UI thread, which results in very responsive applications.

Task composability does not stop at providing continuations though, the TPL also provides considerations for scenarios, where a task must launch a number of subtasks. You have the ability to control how completion of those child tasks affects the parent task. In the following example, we will start a task, which will in turn launch a number of subtasks:

int[] numbers = { 1, 2, 3, 4, 5, 6 }; varmainTask = Task.Factory.StartNew(() => { // create a new child task foreach (intnum in numbers) { int n = num; Task.Factory.StartNew(() => { Thread.SpinWait(1000); int multiplied = n * 2; Console.WriteLine("Child Task #{0}, result {1}", n, multiplied); }); } }); mainTask.Wait(); Console.WriteLine("done");

Each child task will write to the console, so that you can see how the child tasks behave along with the parent task. When you execute the previous program, it results in the following output:

Child Task #1, result 2 Child Task #2, result 4 done Child Task #3, result 6 Child Task #6, result 12 Child Task #5, result 10 Child Task #4, result 8

Notice how even though you have called the .Wait() method on the outer task before writing done, the execution of the child task continues a bit longer after the task is concluded. This is because, by default, child tasks are detached, which means their execution is not tied to the task that launched it.

An unrelated, but important bit in the previous example code, is you will notice that we assigned the loop variable to an intermediary variable before using it in the task.

int n = num; Task.Factory.StartNew(() => { int multiplied = n * 2;

This is actually related to the way closures work, and is a common misconception when trying to “pass in” values in a loop. Because the closure actually creates a reference to the value, rather than copying the value in, using the loop value will end up changing every time the loop iterates, and you will not get the behavior you expect.

As you can see, an easy way to mitigate this is to set the value to a local variable before passing it into the lambda expression. That way, it will not be a reference to an integer that changes before it is used.

You do however have the option to mark a child task as Attached, as follows:

Task.Factory.StartNew( () =>DoSomething(), TaskCreationOptions.AttachedToParent);

The TaskCreationOptions enumeration has a number of different options. Specifically in this case, the ability to attach a task to its parent task means that the parent task will not complete until all child tasks are complete.

Other options in TaskCreationOptions let you give hints and instructions to the task scheduler. From the documentation, the following are the descriptions of all these options:

  • None: This specifies that the default behavior should be used.

  • PreferFairness: This is a hint to a TaskScheduler class to schedule a task in as fair a manner as possible, meaning that tasks scheduled sooner will be more likely to be run sooner, and tasks scheduled later will be more likely to be run later.

  • LongRunning: This specifies that a task will be a long-running, coarsegrained operation. It provides a hint to the TaskScheduler class that oversubscription may be warranted.

  • AttachedToParent: This specifies that a task is attached to a parent in the task hierarchy.

  • DenyChildAttach: This specifies that an exception of the type InvalidOperationException will be thrown if an attempt is made to attach a child task to the created task.

  • HideScheduler: This prevents the ambient scheduler from being seen as the current scheduler in the created task. This means that operations such as StartNew or ContinueWith that are performed in the created task, will see Default as the current scheduler.

The best part about these options, and the way the TPL works, is that most of them are merely hints. So you can suggest that a task you are starting is long running, or that you would prefer tasks scheduled sooner to run first, but that does not guarantee this will be the case. The framework will take the responsibility of completing the tasks in the most efficient manner, so if you prefer fairness, but a task is taking too long, it will start executing other tasks to make sure it keeps using the available resources optimally.

Error handling with tasks

Error handling in the world of tasks needs special consideration. In summary, when an exception is thrown, the CLR will unwind the stack frames looking for an appropriate try/catch handler that wants to handle the error. If the exception reaches the top of the stack, the application crashes.

With asynchronous programs, though, there is not a single linear stack of execution. So when your code launches a task, it is not immediately obvious what will happen to an exception that is thrown inside of the task. For example, look at the following code:

Task t = Task.Factory.StartNew(() => { throw new Exception("fail"); });

This exception will not bubble up as an unhandled exception, and your application will not crash if you leave it unhandled in your code. It was in fact handled, but by the task machinery. However, if you call the .Wait() method, the exception will bubble up to the calling thread at that point. This is shown in the following example:

try { t.Wait(); } catch (Exception ex) { Console.WriteLine(ex.Message); }

When you execute that, it will print out the somewhat unhelpful message One or more errors occurred, rather than the fail message that is the actual message contained in the exception. This is because unhandled exceptions that occur in tasks will be wrapped in an AggregateException exception, which you can handle specifically when dealing with task exceptions. Look at the following code:

catch (AggregateException ex) { foreach (var inner in ex.InnerExceptions) { Console.WriteLine(inner.Message); } }

If you think about it, this makes sense, because of the way that tasks are composable with continuations and child tasks, this is a great way to represent all of the errors raised by this task. If you would rather handle exceptions on a more granular level, you can also pass a special TaskContinuationOptions parameter as follows:

Task.Factory.StartNew(() => { throw new Exception("Fail"); }).ContinueWith(t => { // log the exception Console.WriteLine(t.Exception.ToString()); }, TaskContinuationOptions.OnlyOnFaulted);

This continuation task will only run if the task that it was attached to is faulted (for example, if there was an unhandled exception). Error handling is, of course, something that is often overlooked when developers write code, so it is important to be familiar with the various methods of handling exceptions in an asynchronous world.

LEAVE A REPLY

Please enter your comment!
Please enter your name here