Friday, 27 March 2015

Why asynchronous server side code can be a good thing and lessons re-learned (in .net 4.5)

Background

I thought I had a solid understanding of asynchronous code, why it was useful and when to use it, despite the fact I don’t actually get called upon to write code this way very often.
But…

Having taken some time recently to re-establish my understanding, with a view to being able to better advise my colleagues, I discovered along the way that I didn't know quite as much as I thought I did. Despite the advances in the language and the lower barrier to entry in .NET 4.5, there’s still a lot the developer should know before attempting to write asynchronous server side code.

Furthermore, I was reminded that writing asynchronous server side code is a conscious design decision. It is in effect an optimization and shouldn't necessarily be the default starting point; the pros and cons need to be carefully considered. You shouldn't code in optimizations before you need them, and it’s a common mistake (and maybe human nature) to over-engineer code for performance and scalability when in reality there’s no real requirement to do so.

Such mistakes can be costly. Whilst asynchronous code is a lot simpler than it used to be and the barrier to entry is undeniably lower, with great power comes great responsibility. Asynchronous code is still harder to write and harder to maintain, and it requires you to understand enough to be able to support an application written this way once it’s out in the wild.

My experience has long centred on a classic architecture which combines a UI tier (browser, usually client side JavaScript and mark-up) calling back to the web server to update part of a page or for a whole page submission. This paradigm is roughly the same for basic MVC as it is for Web Forms pages.

The web server side may then defer to a composition of services (typically internal services, mostly implemented with Windows Communication Foundation, WCF, over basic HTTP) which in turn either interact with the databases or defer to third party services for data we wish to consume.

Typically the services have latency because they’re IO bound, not CPU bound (there’s no intensive computational work taking place). As such it isn't unusual for some services to take more than 1000 ms to respond, in some cases taking up to 7000 ms where complex data aggregation is performed at the back end.

It’s understandable why .NET 4.5 and the magic async/await keyword pair would be attractive and potentially useful as implementations get refreshed and re-factored.

When to be asynchronous?

When looking at a new application or new requirements to an existing application it’s worth keeping the following in mind…
To quote from Stephen Toub here, who is able to put it far more succinctly than I: “There are two primary benefits I see to asynchrony: scalability and offloading (e.g. responsiveness, parallelism). Which of these benefits matters to you is typically dictated by the kind of application you’re writing.”

An interesting behaviour of the ThreadPool

A really great introduction to async/await exists in Stephen Cleary’s MSDN magazine article (Stephen Cleary has literally written the book on .net asynchronous programming – “Concurrency in C# Cookbook” – available on Amazon and others). In this great article Stephen mentions an interesting behaviour of the CLR thread pool.
Now, I’m not talking about the fact that threads are finite and that asynchronous code makes better use of the managed thread pool by allowing threads that would otherwise be blocked to be re-used, which greatly improves scalability.
Though this is true…

I’m talking about a lesser known fact (at least to me), which is that the thread pool implementation (quite deliberately) throttles the rate at which threads can be injected into the pool.

At the time of writing the default minimum number of worker threads is one per processor core, so 4 on a 4-core machine like my test server. This behaviour can adversely affect your application if it is subjected to burst loads. The caller's timer has started and the request has been made, but it can’t run until it’s allocated a thread. This shows up as latency for the caller who, if things get really bad, may start to receive HTTP Error 503 (Service Unavailable).

According to MSDN (see managed thread pool, “Thread Injection” section), once the minimum has been reached threads are added at a rate of roughly one every 500 ms (about 2 per second).
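As a rough sketch of what that rate means in practice, the snippet below reads the current thread pool minimums and estimates how long the pool would take to grow to a given size. The class and method names are my own for illustration, and the 500 ms interval is simply the approximate figure quoted above, not something the CLR guarantees.

```csharp
using System;
using System.Threading;

public static class ThreadPoolRampUp
{
    // Rough estimate of the seconds needed for the pool to grow from
    // minThreads to targetThreads, assuming one thread is injected about
    // every injectionIntervalMs once the minimum has been reached.
    public static double EstimateRampUpSeconds(
        int minThreads, int targetThreads, double injectionIntervalMs = 500)
    {
        int extraThreads = Math.Max(0, targetThreads - minThreads);
        return extraThreads * injectionIntervalMs / 1000.0;
    }

    public static void Main()
    {
        int minWorker, minIo;
        // Real API: reports the pool's current minimum thread counts.
        ThreadPool.GetMinThreads(out minWorker, out minIo);
        Console.WriteLine("Min worker threads: {0}, min IO threads: {1}", minWorker, minIo);

        // 196 extra threads at ~2 per second is roughly 98 seconds.
        Console.WriteLine("Ramp-up from 4 to 200 threads: ~{0} s",
            EstimateRampUpSeconds(4, 200));
    }
}
```

In other words, a burst of 200 blocking requests against a pool starting at 4 threads could, on this simple model, take well over a minute before every request has a thread of its own.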

As it turns out WCF, which unsurprisingly also requires a thread pool thread on which to run a created instance, can also exhibit this behaviour. See MS support advice here. This can lead to your service scaling up slowly under burst load, which will negatively impact the caller.

At this point you may be asking the question: why not simply increase the minimum number of threads from 4 to the amount that we think we require?
In certain circumstances you might elect to do so, though Stephen points out in his article (linked above) that there are very good reasons why you may not wish to. They are the same reasons the default is low, namely efficient use of resources.

It’s worth remembering that, specifically in IIS and for ASP.NET applications, certain environment defaults are set by the machine.config autoConfig="true" flag.
They can only be overridden in the machine.config processModel section. An observation here: changes to these values affect every application on the machine!
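For orientation, a processModel override might look something like the fragment below. This is only a sketch: the values are illustrative rather than the ones I used, and as I understand the documentation the min/max thread attributes are expressed per CPU, so check the behaviour on your own framework and IIS versions before relying on it.

```xml
<!-- machine.config fragment (illustrative values only) -->
<system.web>
  <!-- Thread counts here are per CPU, so on a 4-core server
       minWorkerThreads="50" would give a minimum of 200 worker threads. -->
  <processModel autoConfig="true"
                minWorkerThreads="50"
                minIoThreads="50" />
</system.web>
```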

This old but often updated article from Thomas Marquardt has been a really helpful reference point for me when troubleshooting live issues. It helps explain the dynamic settings and some of the background from earlier .NET / IIS versions. Note that Thomas (in one of the last updates to that article) also points out the "gotcha" in behaviour with the standard minimum thread count and burst loads.

An example

The back-end Service

In order to see some of this for myself I created a simple WCF .NET 4.5 service in Visual Studio.

It exposes two simple methods

[ServiceContract]
public interface IService1
{
    [OperationContract]
    string EchoWithDelay(string value1, int value2, bool value3);

    [OperationContract]
    Task<string> EchoWithEfficientDelay(string value1, int value2, bool value3);

}

Both methods echo back the calling arguments as a concatenated string and both introduce an artificial delay.
The first “EchoWithDelay” using
System.Threading.Thread.Sleep(5000);

The second method is marked “async” and returns a task of type “string”. Its implementation is more efficient, using the .NET 4.5…

await Task.Delay(TimeSpan.FromMilliseconds(5000));

Note they both introduce a delay of 5 seconds (and echo back the input as a string) but do nothing much else.
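Put together, the two implementations could look roughly like the sketch below. It is a plain console stand-in rather than my actual service: the WCF attributes and hosting are left out so it runs on its own, and the class name, the constructor argument for the delay and the exact format of the echoed string are assumptions made for illustration.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for the service implementation (no WCF hosting).
public class Service1
{
    private readonly int _delayMs;

    // The configurable delay is an assumption of this sketch so it can be
    // shortened when experimenting; the service in the post always waits 5000 ms.
    public Service1(int delayMs = 5000)
    {
        _delayMs = delayMs;
    }

    // Blocking version: the thread running the call is held for the whole delay.
    public string EchoWithDelay(string value1, int value2, bool value3)
    {
        Thread.Sleep(_delayMs);
        return Echo(value1, value2, value3);
    }

    // Awaitable version: no thread is held while the timer is pending.
    public async Task<string> EchoWithEfficientDelay(string value1, int value2, bool value3)
    {
        await Task.Delay(TimeSpan.FromMilliseconds(_delayMs));
        return Echo(value1, value2, value3);
    }

    // Concatenates the arguments; the separator is an arbitrary choice.
    private static string Echo(string value1, int value2, bool value3)
    {
        return string.Format("{0} {1} {2}", value1, value2, value3);
    }
}

public static class Program
{
    public static void Main()
    {
        var service = new Service1(delayMs: 100);
        Console.WriteLine(service.EchoWithDelay("hello", 1, true));
        // Blocking on Result only because a C# 5 console Main cannot be async.
        Console.WriteLine(service.EchoWithEfficientDelay("hello", 1, true).Result);
    }
}
```

Both calls return the same string after the same pause; the only difference is whether a thread is tied up while waiting.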
At this point it’s worth being aware that WCF has its own throttling (and in fact if you host under IIS using WAS then remember there’s an additional level of throttling outside of WCF for WAS itself).

I believe the defaults described here and here for WCF 4 are still relevant at the time of writing. Like the IIS autoConfig settings, they’re dynamic, based on the number of processors. The multiplier for concurrent calls is 16, so a 4-core machine would allow 64 concurrent calls. See here for details on turning the relevant performance counters on!
I hosted my WCF service on a spare Windows Server 2008 R2 machine, a very old 4-core Dell desktop in the corner (note that any kind of load testing against client versions of Windows is ill advised, as they have a built-in concurrency limit). I created a dedicated app pool and website in IIS 7.x for it to live on.



A consumer

Next I created a simple MVC application without authentication and added some public methods (to call via HTTP GET) each method wrapping a method on my new shiny “downstream” service, just described!
The comments speak for themselves; I included the last method as a demo that older APM style “begin” “end” methods can still be wrapped as a task using “Task.Factory.FromAsync(…)”

// This code runs synch on the same thread pool thread which is held for the duration
public ActionResult CallServiceSync()
{
    // All this code runs synch on thread pool thread "A"
    // the thread is held whilst the long running service operation completes
    var result = _service.EchoWithDelay("hello", 1, true);
    ViewBag.Message = "Sync " + result;
    return View();
}
 
// This code runs synch on the same thread pool thread which is held for the duration
public ActionResult CallServiceAsyncBad()
{
    // All this code runs synch on thread pool thread "A"
    var t = _service.EchoWithDelayAsync("hello", 1, true);
    // the thread is still held whilst the long running service operation completes
    // worse than that, blocking on the task adds context switching overhead
    t.Wait();
 
    ViewBag.Message = "Bad Async " + t.Result;
    return View();
}
 
// The thread pool thread that started the method is not held for the entire duration
// This code is thread pool efficient as the service callback triggers resuming the code on the initial (captured) context
public async Task<ActionResult> CallServiceAsync()
{
    Debug.WriteLine("1 thread id - " + Thread.CurrentThread.ManagedThreadId);
    Debug.WriteLine("1 Culture - " + GetCurrentCulture());
 
    // The first part of the code runs synch on thread pool thread "A"
    // call IO bound long running service, thread may be released here
    // call via newer TAP pattern
    var result = await _service.EchoWithDelayAsync("hello", 1, true);//.ConfigureAwait(continueOnCapturedContext: false);
 
    // this code runs later, asynch on new thread pool thread, same context, captured from above!
    ViewBag.Message = "Async " + result;
 
    Debug.WriteLine("2 thread id - " + Thread.CurrentThread.ManagedThreadId);
    Debug.WriteLine("2 Culture - " + GetCurrentCulture());
 
    return View();
}
 
 
// The thread pool thread that started the method is not held for the entire duration
// This code is thread pool efficient as the service callback triggers resuming the code on the initial (captured) context
public async Task<ActionResult> CallServiceEfficientAsync()
{
    Debug.WriteLine("1 thread id - " + Thread.CurrentThread.ManagedThreadId);
    Debug.WriteLine("1 Culture - " + GetCurrentCulture());
 
    // The first part of the code runs synch on thread pool thread "A"
    // call IO bound long running service, thread may be released here
    // call via newer TAP pattern
    var result = await _service.EchoWithEfficientDelayAsync("hello", 1, true);
 
    // this code runs later, asynch on new thread pool thread, same context, captured from above!
    ViewBag.Message = "Efficient wait Async " + result;
 
    Debug.WriteLine("2 thread id - " + Thread.CurrentThread.ManagedThreadId);
    Debug.WriteLine("2 Culture - " + GetCurrentCulture());
 
    return View();
}
 
 
// The thread pool thread that started the method is not held for the entire duration
// This code is thread pool efficient as the service callback triggers resuming the code on the initial (captured) context
public async Task<ActionResult> CallServiceAsyncLegacy()
{
    // The first part of the code runs synch on thread pool thread "A"
    // call IO bound long running service, thread may be released here
    // Wrap legacy APM pattern methods in a task which is awaitable
    var result = await Task<string>.Factory.FromAsync<string, int, bool>(_service1.BeginEchoWithDelay,
        _service1.EndEchoWithDelay, "hello", 1, true, null); // last argument is the async state object, null here
 
    // this code runs later, asynch on new thread pool thread, same context, captured from above!
    ViewBag.Message = "Legacy Async " + result;
    return View();
}

With that done I deployed the web app to my “test server” (the aforementioned ancient 4 core Dell in the corner) under its own web site with its own app pool.



The client (aka load test rig)

Finally, in a separate Visual Studio instance (and I have Ultimate which allows me to do this, lucky me…) I created some web tests, one for each GET method on my controller (as above), and a VS load test to consume each one.
Each load test was the same: simulate a sudden increase in users, 20 per second up to a limit of 200 users, all simultaneously hitting my server method for a duration of 2 minutes.
Nothing too clever here; the load test controller and agent were all run from my Dell laptop (Windows 7 with an i7 CPU and plenty of RAM).

The tests

[Note: The counters I used were fairly limited. There are a lot available, but I only wanted a view on executing requests, number of requests and time to execute, which can be found in the “ASP.NET Apps v4…” category (for my web app) and the “ServiceModelService 4.0.0.0” category (for my service).]

Test 1

GETs to the first “CallServiceSync” controller method (as in the snippet above), which wraps a synchronous call to my “mock” service with its 5 second delay, were pretty poor.
My average response time was 35 seconds, though all calls completed.
I figured this was probably the ramp-up on the thread pool. I could see that instances of my service were being created roughly in line with the maximum rate at which threads can be injected into the thread pool (unsurprisingly, 1 thread is required for 1 WCF instance). So I altered the machine.config minimum thread settings (in processModel) on my test server. I went from 4 to, well, 200!

Test 2

The next test to “CallServiceSync” (the synchronous wrapper to my service method) was much improved now the thread pool was created with enough threads for my test up front, but still not fantastic; the average response time, this time around was 13.3 seconds.
Something else was also feeling wrong.

I suspected it might be the number of allowed concurrent outbound connections. The usual default is 2, but in IIS with autoConfig it’s 12 times the number of CPUs (I believe this is still correct at the time of writing), so on my 4-core test box this would be 48. But in fact, looking at the WCF counters versus the counters for my web app…

…I could see that whilst the service calls were returning OK, there were fewer of them than there were requests waiting in my MVC app. I figured that the default [inbound] throttling needed tweaking for my “pressure” test to satisfy my 200-odd concurrent users. So I added this to my service’s behaviour.

          <serviceThrottling maxConcurrentCalls="250"
                          maxConcurrentInstances="250"
                          maxConcurrentSessions="250" />

...To override the per CPU defaults. That helped things along nicely...

Test 3

This test was still to “CallServiceSync” (my synchronous wrapper to my service method with its 5 second delay) but this time with the 200 thread starting size for my thread pool(s) and with WCF accepting 250 concurrent calls. The result was a duration average of 5.8 seconds.

A lot better.

The problem is, setting a high minimum (starting) number of thread pool threads probably isn't the right thing to do in this case. Not least because of the additional resources AND because it’s machine-wide. In production the server would likely be multi-tenant; would all my services be expecting this “burst” load, or this pressure?

Test 4

OK, this time I reverted to the minimum of 4 starting threads but kept the higher concurrency setting (of 250 calls) in WCF.

The average response time is back to around 35 seconds. Interestingly (but unsurprisingly) I get the same result if I run the load test against my “CallServiceAsyncBad” controller method. This method tries to be async by starting the service call as a task, but then waits on the result synchronously. In fact it’s worse than a plain synchronous call, because doing so incurs overhead in the context switching required.

public ActionResult CallServiceAsyncBad()
{
    var t = _service.EchoWithDelayAsync("hello", 1, true);
    t.Wait();
    ViewBag.Message = "Bad Async " + t.Result;
    return View();
}

Test 5

Now to try the same test with my “proper” async method, using the async option provided when I added my service via the WCF generated service proxy.
Still default minimum threads (starting 4) and the increased concurrency limit for my service (250), for 200 concurrent users, for the same duration.

public async Task<ActionResult> CallServiceAsync()
{
    var result = await _service.EchoWithDelayAsync("hello", 1, true);
    ViewBag.Message = "Async " + result;
    return View();
}

…And… an average call time over the test of 10.5 seconds… Hmmmm… Interestingly it shows as a bell curve: the first couple of calls are pretty quick (around 5.5 seconds), then the latency increases until, 40 seconds into the test, the average call response time is 24 seconds?! From there to the end of the 2 minute period it slowly drops back to about 5.5 seconds (ish).
Still... Hmmmm.
The answer seems to be with my mock/test service. The “Thread.Sleep()”, which is after all pretty much the only thing going on service side, is the prime culprit: it holds a thread pool thread on the service for the full 5 seconds of every call.
Time to run the test again, this time with the method that wraps “Task.Delay()”, which is the same pause but awaitable (asynchronously implemented).

public async Task<ActionResult> CallServiceEfficientAsync()
{
    var result = await _service.EchoWithEfficientDelayAsync("hello", 1, true);
    ViewBag.Message = "Efficient wait Async " + result;
    return View();
}

That’s better, this time the average response time was 5.3 seconds! Remember this is with the default minimum threads (starting at 4) and just the service tweaked to accept the load.
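The effect can be reproduced in miniature without IIS or WCF. The console sketch below is an illustration under my own assumptions (40 simulated “requests”, a 1 second pause, and class and method names invented for the purpose); it is not the load test described above. It fires a burst of blocking calls and a burst of awaitable calls at the thread pool and times each burst.

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class BurstDemo
{
    const int Requests = 40;   // size of the simulated burst (assumption)
    const int DelayMs = 1000;  // simulated IO latency per request (assumption)

    // Each "request" holds a thread pool thread for the whole delay.
    public static Task BlockingRequest()
    {
        return Task.Run(() => Thread.Sleep(DelayMs));
    }

    // Each "request" holds no thread while the delay is pending.
    public static Task AsyncRequest()
    {
        return Task.Delay(DelayMs);
    }

    // Starts the whole burst at once and returns the time until every request has finished.
    public static TimeSpan Measure(Func<Task> request)
    {
        var stopwatch = Stopwatch.StartNew();
        Task[] burst = Enumerable.Range(0, Requests).Select(_ => request()).ToArray();
        Task.WaitAll(burst);
        stopwatch.Stop();
        return stopwatch.Elapsed;
    }

    public static void Main()
    {
        Console.WriteLine("Blocking burst: {0:F1} s", Measure(BlockingRequest).TotalSeconds);
        Console.WriteLine("Async burst:    {0:F1} s", Measure(AsyncRequest).TotalSeconds);
    }
}
```

The exact timings depend on the core count and the thread pool minimums of the machine it runs on, but on a box with few cores I would expect the blocking burst to take several times longer than the async one, because most of its “requests” queue up waiting for threads to be injected.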

Conclusions…

1. Unless it really doesn't matter (and for most serious applications it does), do load testing. Start early on and keep going, automate it if you can. Always!
2. Don’t pre-emptively tune. Tune when you need to, once you understand what, where and by how much.
3. If you’re going to use asynchronous code and async/await for scalability then it’s got to be from top to bottom. If your architecture has your web app dependent on services and those services are IO bound, the “asynchronicity” needs to extend there too.
4. Asynchronous code for scalability only really works if the waiting itself is efficient; waiting on a callback from the framework, OS or driver level is efficient, blocking a thread is not.
5. Consider carefully whether your service method code needs to be asynchronous or not and, if so, why. If the code is CPU heavy, maybe it’s better to have the UI wrap the call in its own task (or offload it) to ensure the UI remains responsive whilst your method is left to run synchronously service side.
6. Understand the impact the thread pool’s thread injection rate (approx. 2 threads a second) can have on your application’s performance, especially if you expect burst loads... but... be very wary about changing the thread pool minimums. Do remember the thread injection "throttling" affects all .NET apps that use a managed thread pool.
7. Do understand that asynchronous code needs to be introduced for a reason; be able to justify the additional complexity and be prepared to be asked to support it! (Try not to over-engineer.)
8. Read up!

References and further reading





Don’t block on Asynchronous code and Don’t block in Asynchronous code (Stephen Cleary) – this is a relatively well covered topic on blogs and on Stack Overflow, and an easy trap to fall into.

Best Practices in Asynchronous Programming (Stephen Cleary) – namely why you really need asynchronous code all the way from top to bottom.


The excellent - Should I expose asynchronous wrappers for synchronous methods (Stephen Toub – MSFT) – covers reason for asynchronous code including scalability and offloading and when to avoid (i.e. CPU bound operations).




