Background
I thought I had a solid understanding of asynchronous code: why it was useful and when to use it, despite the fact that I’m not often called on to write code this way.
But…
Having recently taken some time to re-learn and re-establish my understanding, with a view to better advising my colleagues, I discovered that I didn't know quite as much as I thought I did. Despite the advances in the language and the lower barrier to entry in .NET 4.5, there’s still a lot a developer should know before attempting to write asynchronous server-side code.
I was also reminded that writing asynchronous server-side code is a conscious design decision. It is, in effect, an optimization, and it shouldn't necessarily be the default starting point; the pros and cons need to be carefully considered. You shouldn't code in optimizations before you need them, and it’s a common mistake (and maybe human nature) to over-engineer code for performance and scalability when there’s no real requirement to do so.
Such mistakes can be costly. Whilst asynchronous code is a lot simpler than it used to be, and the barrier to entry is undeniably lower, with great power comes great responsibility. Asynchronous code is still harder to write and harder to maintain, and it requires you to understand enough to support an application written this way once it’s out in the wild.
My experience has long centred on a classic architecture in which a UI tier (a browser, usually running client-side JavaScript and mark-up) calls back to the web server to update part of a page or to submit a whole page. This paradigm is roughly the same for basic MVC as it is for Web Forms pages.
The web server may then defer to a composition of services. These are typically internal services, mostly implemented with Windows Communication Foundation (WCF) over basic HTTP. They in turn either interact with databases or defer to third-party services for the data we wish to consume.
Typically the services have latency because they’re IO bound, not CPU bound (there’s no intensive computational work taking place). It isn't unusual for some services to take more than 1,000 ms to respond, and in some cases up to 7,000 ms where complex data aggregation is performed at the back end.
It’s understandable why .NET 4.5 and the magic async/await keyword pair would be attractive, and potentially useful, as implementations get refreshed and refactored.
When to be asynchronous?
When looking at a new application, or at new requirements for an existing application, it’s worth keeping the following in mind…
To quote Stephen Toub here, who puts it far more succinctly than I can: “There are two primary benefits I see to asynchrony: scalability and offloading (e.g. responsiveness, parallelism). Which of these benefits matters to you is typically dictated by the kind of application you’re writing.”
An interesting behaviour of the ThreadPool
A really great introduction to async/await can be found in Stephen Cleary’s MSDN Magazine article. Stephen Cleary has literally written the book on .NET asynchronous programming, “Concurrency in C# Cookbook”, which is available on Amazon and elsewhere. In that article Stephen mentions an interesting behaviour of the CLR thread pool.
Now, I’m not talking about the fact that threads are finite. Nor am I talking about the fact that asynchronous code makes better use of the managed thread pool by releasing threads that would otherwise sit blocked so they can be re-used, which greatly improves scalability.
Though this is true…
I’m talking about a lesser-known fact (at least to me): the thread pool implementation quite deliberately throttles the rate at which new threads are injected into the pool.
At the time of writing, the default minimum number of worker threads equals the number of processors, which on my 4-core test server is 4. Beyond that minimum, new threads are added only gradually. This behaviour can adversely affect your application if it is subjected to burst loads. The caller’s timer has started and the request has been made, but the request can’t run until it’s allocated a thread. This shows up as latency for the caller. If things get really bad, the caller may start to receive HTTP 503 (Service Unavailable) errors.
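To see the injection throttling outside IIS and WCF, a small console-style sketch can queue a burst of blocking work items that is larger than the pool’s minimum. It then records how long each item waited before a thread picked it up. The class and method names are hypothetical, and Thread.Sleep simply stands in for a blocking downstream call. Exact timings will depend on the runtime version and the machine.

```csharp
using System.Diagnostics;
using System.Threading;

// Hypothetical sketch, not the author's load test: queue a burst of blocking work
// items and record how long each waited before a thread-pool thread picked it up.
// Items beyond the pool's minimum usually wait while the pool injects more threads.
public static class BurstDemo
{
    public static double[] QueueWaitMilliseconds(int burstSize, int blockMilliseconds)
    {
        var waits = new double[burstSize];
        var allDone = new CountdownEvent(burstSize);

        for (int i = 0; i < burstSize; i++)
        {
            int index = i; // copy the loop variable for the closure
            long queuedAt = Stopwatch.GetTimestamp();
            ThreadPool.QueueUserWorkItem(_ =>
            {
                // Time between queuing the item and a pool thread starting it.
                long startedAt = Stopwatch.GetTimestamp();
                waits[index] = (startedAt - queuedAt) * 1000.0 / Stopwatch.Frequency;

                Thread.Sleep(blockMilliseconds); // stands in for a blocking service call
                allDone.Signal();
            });
        }

        allDone.Wait(); // block the caller until every item has finished
        return waits;
    }
}
```

For example, call `ThreadPool.GetMinThreads(out int min, out _)` and then `BurstDemo.QueueWaitMilliseconds(min * 3, 1000)`. This should typically show roughly the first `min` items starting almost immediately, with later items waiting noticeably longer while threads are injected.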
As it turns out, WCF can exhibit this behaviour too, since it unsurprisingly also requires a thread pool thread on which to run a service instance. See the MS support advice here. This can lead to your service scaling up slowly under burst load, which will negatively impact the caller.
At this point you may be asking: why not simply increase the minimum number of threads from 4 to the number we think we require?
In certain circumstances you might elect to do so. However, Stephen points out in his article (linked above) that there are very good reasons why you may not wish to. They are the same reasons the default is low, namely efficient use of resources.
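If you do decide to raise the minimum, there is also a programmatic route. ThreadPool.SetMinThreads changes the minimum for the current process only (in IIS, a single worker process) rather than machine-wide. The helper below is a hypothetical sketch: it raises only the worker-thread minimum and returns the previous value so it can be restored. The same resource trade-offs still apply.

```csharp
using System;
using System.Threading;

// Hypothetical helper: raise the worker-thread minimum for the current process only,
// leaving the I/O completion-port minimum unchanged.
public static class ThreadPoolMinimums
{
    // Returns the previous worker minimum so the caller can restore it later.
    public static int RaiseMinWorkerThreads(int desiredMinimum)
    {
        ThreadPool.GetMinThreads(out int currentWorker, out int currentIo);
        ThreadPool.GetMaxThreads(out int maxWorker, out _);

        // SetMinThreads returns false (and changes nothing) if asked for more than the maximum.
        int target = Math.Min(desiredMinimum, maxWorker);
        if (target > currentWorker && !ThreadPool.SetMinThreads(target, currentIo))
        {
            throw new InvalidOperationException($"SetMinThreads rejected {target} worker threads.");
        }
        return currentWorker;
    }

    // Put the worker minimum back to a previously recorded value.
    public static void Restore(int previousWorkerMinimum)
    {
        ThreadPool.GetMinThreads(out _, out int currentIo);
        ThreadPool.SetMinThreads(previousWorkerMinimum, currentIo);
    }
}
```

A common place for such a call in an ASP.NET application would be start-up code. Measuring first, as the tests below do, is the safer order.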
It’s worth remembering that, specifically in IIS and for ASP.NET applications, certain environment defaults are set by the autoConfig="true" flag on the machine.config processModel element. They can only be overridden in the machine.config processModel section. An observation here: changes to these values affect every ASP.NET application on the server!
This old but often-updated article from Thomas Marquardt has been a really helpful reference point for me when troubleshooting live issues. It explains the dynamic settings and some of the background from earlier .NET / IIS versions. Note that Thomas (in one of the last updates to that article) also points out the "gotcha" in behaviour with the standard minimum thread count and burst loads.
An example
The back-end Service
In order to see some of this for myself, I created a simple WCF .NET 4.5 service in Visual Studio.
It exposes two simple methods:
using System.ServiceModel;
using System.Threading.Tasks;

[ServiceContract]
public interface IService1
{
    [OperationContract]
    string EchoWithDelay(string value1, int value2, bool value3);

    [OperationContract]
    Task<string> EchoWithEfficientDelay(string value1, int value2, bool value3);
}
Both methods echo back the calling arguments as a
concatenated string and both introduce an artificial delay.
The first, “EchoWithDelay”, uses
System.Threading.Thread.Sleep(5000);
The second method is marked “async” and returns a Task<string>. Its implementation is more efficient, using .NET 4.5’s
await Task.Delay(TimeSpan.FromMilliseconds(5000));
Note they both introduce a delay of 5 seconds (and echo back
the input as a string) but do nothing much else.
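For reference, here is a sketch of how the two operations might be implemented. It leaves out the WCF attributes so it runs as a plain class. The echo format and the configurable delay (defaulting to the post’s 5 seconds) are assumptions made so the example is testable.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Sketch only: the WCF attributes are omitted, and the echo format and the
// injectable delay are assumptions (the post hard-codes 5 seconds).
public class EchoService
{
    private readonly TimeSpan _delay;

    public EchoService(TimeSpan? delay = null)
    {
        _delay = delay ?? TimeSpan.FromSeconds(5);
    }

    // Holds the calling thread for the whole delay.
    public string EchoWithDelay(string value1, int value2, bool value3)
    {
        Thread.Sleep(_delay);
        return Format(value1, value2, value3);
    }

    // Returns the thread during the delay; a timer completes the task later.
    public async Task<string> EchoWithEfficientDelay(string value1, int value2, bool value3)
    {
        await Task.Delay(_delay);
        return Format(value1, value2, value3);
    }

    private static string Format(string value1, int value2, bool value3)
        => $"{value1} {value2} {value3}";
}
```

Both calls take the same wall-clock time. The difference only shows under load: EchoWithDelay holds a thread for the whole pause, while EchoWithEfficientDelay gives it back until the timer fires.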
At this point it’s worth being aware that WCF has its own throttling. In fact, if you host under IIS using WAS, remember there’s an additional level of throttling outside of WCF for WAS itself.
I believe the defaults for WCF 4 described here and here are still relevant at the time of writing. Like the IIS autoConfig settings, they’re dynamic, based on the number of processors. The multiplier for concurrent calls is 16, so a 4-core machine would allow 64 concurrent calls. See here for details on turning the relevant performance counters on!
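The throttling arithmetic can be written down directly. This sketch assumes the WCF 4 ServiceThrottlingBehavior defaults as documented: 16 concurrent calls and 100 concurrent sessions per processor, with instances defaulting to the sum of the two. The class is hypothetical, so check the documentation for your framework version.

```csharp
// Hypothetical helper encoding the WCF 4 per-processor throttling defaults
// (assumed from the documentation; verify for your framework version).
public static class WcfThrottlingDefaults
{
    public static int MaxConcurrentCalls(int processors) => 16 * processors;

    public static int MaxConcurrentSessions(int processors) => 100 * processors;

    // Instances default to the sum of the calls and sessions limits.
    public static int MaxConcurrentInstances(int processors)
        => MaxConcurrentCalls(processors) + MaxConcurrentSessions(processors);
}
```

On the 4-core test server this gives 64 concurrent calls, well below the 200 simultaneous users in the load tests.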
A consumer
Next I created a simple MVC application without authentication and added some public action methods (to call via HTTP GET). Each one wraps a method on my shiny new “downstream” service, just described!
The comments speak for themselves. I included the last method to demonstrate that older APM-style “Begin”/“End” methods can still be wrapped as a task using “Task.Factory.FromAsync(…)”.
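The APM wrapping used in the last controller method isn’t specific to WCF. As a self-contained sketch, Stream.BeginRead/EndRead can stand in for the generated BeginEchoWithDelay/EndEchoWithDelay pair. The class and method names are hypothetical.

```csharp
using System.IO;
using System.Text;
using System.Threading.Tasks;

// Hypothetical sketch: wrap an APM Begin/End pair in an awaitable Task<int>.
public static class ApmWrapper
{
    // Reads up to `count` bytes with a single APM-style read, awaited as a task.
    public static async Task<string> ReadOnceAsync(Stream stream, int count)
    {
        var buffer = new byte[count];
        // Same shape as the controller: three typed arguments, then the optional state (null).
        int read = await Task<int>.Factory.FromAsync<byte[], int, int>(
            stream.BeginRead, stream.EndRead, buffer, 0, buffer.Length, null);
        return Encoding.UTF8.GetString(buffer, 0, read);
    }
}
```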
// This code runs synch on the same thread pool thread which is held for the duration
public ActionResult CallServiceSync()
{
// All this code runs synch on thread pool thread "A"
// the thread is held whilst the long running service operation completes
var result =_service.EchoWithDelay("hello", 1, true);
ViewBag.Message = "Sync " + result;
return View();
}
// This code runs synch on the same thread pool thread which is held for the duration
public ActionResult CallServiceAsyncBad()
{
// All this code runs synch on thread pool thread "A"
var t = _service.EchoWithDelayAsync("hello", 1, true);
// the thread is still held whilst the long running service operation completes;
// worse than that, blocking on the task adds context switching overhead for no benefit
t.Wait();
ViewBag.Message = "Bad Async " + t.Result;
return View();
}
// The thread pool thread that started the method is not held for the entire duration
// This code is thread pool efficient as the service callback triggers resuming the code on the initial (captured) context
public async Task<ActionResult> CallServiceAsync()
{
Debug.WriteLine("1 thread id - " + Thread.CurrentThread.ManagedThreadId);
Debug.WriteLine("1 Culture - " + GetCurrentCulture());
// The first part of the code runs synch on thread pool thread "A"
// call IO bound long running service, thread may be released here
// call via newer TAP pattern
var result = await _service.EchoWithDelayAsync("hello", 1, true);//.ConfigureAwait(continueOnCapturedContext: false);
// this code runs later, asynch on a (possibly different) thread pool thread, same context, captured from above!
ViewBag.Message = "Async " + result;
Debug.WriteLine("2 thread id - " + Thread.CurrentThread.ManagedThreadId);
Debug.WriteLine("2 Culture - " + GetCurrentCulture());
return View();
}
// The thread pool thread that started the method is not held for the entire duration
// This code is thread pool efficient as the service callback triggers resuming the code on the initial (captured) context
public async Task<ActionResult> CallServiceEfficientAsync()
{
Debug.WriteLine("1 thread id - " + Thread.CurrentThread.ManagedThreadId);
Debug.WriteLine("1 Culture - " + GetCurrentCulture());
// The first part of the code runs synch on thread pool thread "A"
// call IO bound long running service, thread may be released here
// call via newer TAP pattern
var result = await _service.EchoWithEfficientDelayAsync("hello", 1, true);
// this code runs later, asynch on a (possibly different) thread pool thread, same context, captured from above!
ViewBag.Message = "Efficient wait Async " + result;
Debug.WriteLine("2 thread id - " + Thread.CurrentThread.ManagedThreadId);
Debug.WriteLine("2 Culture - " + GetCurrentCulture());
return View();
}
// The thread pool thread that started the method is not held for the entire duration
// This code is thread pool efficient as the service callback triggers resuming the code on the initial (captured) context
public async Task<ActionResult> CallServiceAsyncLegacy()
{
// The first part of the code runs synch on thread pool thread "A"
// call IO bound long running service, thread may be released here
// Wrap legacy APM pattern methods in a task which is awaitable
var result = await Task<string>.Factory.FromAsync<string, int, bool>(_service1.BeginEchoWithDelay,
_service1.EndEchoWithDelay, "hello", 1, true, null); // last argument is optional async state, null here
// this code runs later, asynch on a (possibly different) thread pool thread, same context, captured from above!
ViewBag.Message = "Legacy Async " + result;
return View();
}
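The comments above describe the continuation resuming with the captured context, possibly on a different thread. A console app has no ASP.NET SynchronizationContext, but a sketch like this can still show the thread identity and culture either side of an await. GetCurrentCulture isn’t shown in the post, so this assumes it reads CultureInfo.CurrentCulture. Culture flowing across await in a console app relies on .NET Framework 4.6+ or .NET Core behaviour, whereas in ASP.NET 4.5 the synchronization context restores it.

```csharp
using System.Globalization;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical sketch: record thread id and culture before and after an await.
public static class ContinuationDemo
{
    public static async Task<(int ThreadBefore, int ThreadAfter, string CultureBefore, string CultureAfter)>
        ObserveAsync(CultureInfo culture)
    {
        CultureInfo.CurrentCulture = culture;
        int threadBefore = Thread.CurrentThread.ManagedThreadId;
        string cultureBefore = CultureInfo.CurrentCulture.Name;

        // In a console app there is no SynchronizationContext, so the continuation
        // runs on whichever pool thread completes the timer.
        await Task.Delay(50);

        int threadAfter = Thread.CurrentThread.ManagedThreadId;
        string cultureAfter = CultureInfo.CurrentCulture.Name;
        return (threadBefore, threadAfter, cultureBefore, cultureAfter);
    }
}
```

Awaiting `ContinuationDemo.ObserveAsync(new CultureInfo("fr-FR"))` will typically report different thread ids but the same culture name before and after.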
With that done, I deployed the web app to my “test server” (the aforementioned ancient 4-core Dell in the corner) under its own web site with its own app pool.
The client (aka load test rig)
Finally, in a separate Visual Studio instance (I have Ultimate, which allows me to do this, lucky me…), I created some web tests, one for each GET method on my controller (as above). I also created a VS load test to consume each one.
Each load test was the same: a sudden increase in users, 20 per second up to a limit of 200 users, all simultaneously hitting my server method for a duration of 2 minutes.
Nothing too clever here: the load test controller and agent all run from my Dell laptop (Windows 7, with an i7 CPU and plenty of RAM).
The tests
[Note: The counters I used were fairly limited. There are a lot of them, but I only wanted a view on executing requests, number of requests and execution time. These can be found in the “ASP.NET Apps v4…” category (for my web app) and the “ServiceModelService 4.0.0.0” category (for my service).]
Test 1
GETs to the first “CallServiceSync” controller method (as in the snippet above) performed pretty poorly. That method wraps a synchronous call to my “mock” service with its 5 second delay.
My average response time was 35 seconds, though all calls completed.
I figured this was probably the ramp-up of the thread pool. I could see that instances of my service were being created roughly in line with the prescribed maximum rate at which threads can be injected into the thread pool (unsurprisingly, one thread is required per WCF instance). So I altered the machine.config minimum thread settings (in processModel) on my test server. I went from 4 to, well, 200!
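A back-of-the-envelope estimate helps explain Test 1. It assumes the injection rate of roughly 2 threads a second quoted in the conclusions (the real rate varies by runtime and load). It also assumes each request holds one thread for the full 5 second delay. Under those assumptions, the hypothetical helpers below estimate how long the pool needs to grow from 4 to 200 threads and the best throughput possible at a given thread count.

```csharp
// Back-of-the-envelope estimates; the inputs are assumptions, not measurements.
public static class InjectionEstimate
{
    // Seconds for the pool to grow from its minimum to the target at a fixed injection rate.
    public static double SecondsToReach(int minThreads, int targetThreads, double threadsPerSecond)
        => targetThreads <= minThreads ? 0 : (targetThreads - minThreads) / threadsPerSecond;

    // Upper bound on completed requests per second when each request blocks
    // one thread for blockSeconds (Little's law: throughput = concurrency / time in system).
    public static double MaxRequestsPerSecond(int threads, double blockSeconds)
        => threads / blockSeconds;
}
```

With those assumptions, growing from 4 to 200 threads takes about 98 seconds, most of the 2 minute test. Meanwhile, 4 blocked threads can complete at most 0.8 requests a second, against a demand that 200 threads could serve at 40 a second.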
Test 2
The next test, to “CallServiceSync” (the synchronous wrapper for my service method), was much improved now that the thread pool was created with enough threads for my test up front. It was still not fantastic, though: the average response time this time around was 13.3 seconds.
Something else was also feeling wrong.
I suspected the number of allowed concurrent outbound connections. The default is usually 2, but in IIS with autoConfig it’s 12 × the number of CPUs (I believe this is still correct at the time of writing), so on my 4-core test box it would be 48. But in fact, looking at the WCF counters versus the counters for my web app…
…I could see that whilst the service calls were returning OK, fewer of them were in progress than there were requests waiting in my MVC app. I figured that the default [inbound] throttling needed tweaking for my “pressure” test, to satisfy my 200-odd concurrent users. So I added this to my service’s behaviour:
<serviceThrottling maxConcurrentCalls="250"
maxConcurrentInstances="250"
maxConcurrentSessions="250" />
...to override the per-CPU defaults. That helped things along nicely...
Test 3
This test was still to “CallServiceSync” (my synchronous wrapper for the service method with its 5 second delay). This time, though, it ran with the 200-thread starting size for my thread pool(s) and with WCF accepting 250 concurrent calls. The result was an average duration of 5.8 seconds.
A lot better.
The problem is that setting a high minimum (starting) number of thread pool threads probably isn't the right thing to do in this case. Not least because of the additional resources, and because it’s machine-wide. In production the server would likely be multi-tenant: would all my services be expecting this “burst” load, or this pressure?
Test 4
OK, reverting to the minimum of 4 starting threads but keeping the higher concurrency setting (of 250 calls) in WCF.
The average response time is back to around 35 seconds. Interestingly (but unsurprisingly), I get the same result if I load test GETs to my “CallServiceAsyncBad” controller method. This method tries to be async by running (or offloading) the call to the service using a task, but then waits on the result synchronously. In fact it’s worse than that, because doing so incurs the overhead of the context switching required.
public ActionResult CallServiceAsyncBad()
{
var t = _service.EchoWithDelayAsync("hello", 1, true);
t.Wait();
ViewBag.Message = "Bad Async " + t.Result;
return View();
}
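The cost of sync over async can be reproduced in miniature without WCF. This hypothetical sketch starts a batch of pool work items that either block on a delay (like CallServiceAsyncBad) or await it (like CallServiceAsync). With more calls than free pool threads, the blocking version typically takes longer because each call holds a thread. Some newer runtimes inject threads faster when they detect blocking on tasks, so treat any timings as indicative only.

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical sketch: compare blocking on a delay with awaiting it.
public static class SyncOverAsyncDemo
{
    public static async Task<TimeSpan> RunAsync(int calls, int delayMs, bool block)
    {
        var clock = Stopwatch.StartNew();
        var tasks = Enumerable.Range(0, calls).Select(_ => block
            ? Task.Run(() => Task.Delay(delayMs).Wait()) // holds a pool thread for delayMs
            : Task.Run(() => Task.Delay(delayMs)));      // thread returns to the pool at once
        await Task.WhenAll(tasks);
        return clock.Elapsed;
    }
}
```

For example, compare `RunAsync(100, 1000, block: true)` with `RunAsync(100, 1000, block: false)` on a machine with few cores. The awaiting version should finish in little more than one delay.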
Test 5
Now to try the same test with my “proper” async method, using the async option provided when I added my service via the WCF-generated service proxy.
Still with the default minimum threads (starting at 4) and the increased concurrency limit for my service (250), for 200 concurrent users, for the same duration.
public async Task<ActionResult> CallServiceAsync()
{
var result = await _service.EchoWithDelayAsync("hello", 1, true);
ViewBag.Message = "Async " + result;
return View();
}
…And… an average call time over the test of 10.5 seconds…
Hmmmm… Interestingly, it shows as a bell curve. The first couple of calls are pretty quick (around 5.5 seconds), then the latency increases; by the 40 second mark the average call response time is 24 seconds?! Then from 40 seconds to the end of the 2 minute period it slowly drops back to about 5.5 seconds (ish).
Still... Hmmmm.
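The answer seems to lie with my mock/test service. The “Thread.Sleep()”, which is after all pretty much the only thing going on service side, is the prime culprit. Each call still blocks a thread pool thread on the service for 5 seconds, so the service itself now hits the same thread injection throttling.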
Time to run the test again, this time with the method that wraps “Task.Delay()”, which is the same pause but awaitable (asynchronously implemented).
public async Task<ActionResult> CallServiceEfficientAsync()
{
var result = await _service.EchoWithEfficientDelayAsync("hello", 1, true);
ViewBag.Message = "Efficient wait Async " + result;
return View();
}
That’s better: this time the average response time was 5.3 seconds! Remember this is with the default minimum threads (starting at 4) and just the service tweaked to accept the load.
Conclusions…
1. Unless it really doesn't matter (and for most serious applications it does), do load testing. Start early and keep going; automate it if you can. Always!
2. Don’t pre-emptively tune. Tune when you need to, once you understand what, where and by how much.
3. If you’re going to use asynchronous code, and async/await for scalability, then it’s got to be from top to bottom. If your architecture has your web app dependent on services, and those services are IO bound, the “asynchronicity” needs to extend there too.
4. Asynchronous code for scalability only really works if it’s efficient code. Waiting on a callback from the framework, the OS or the driver level is efficient.
5. Consider carefully whether your service method code needs to be asynchronous or not, and if so, why. If the code is CPU heavy, maybe it’s better to have the UI wrap the call in its own task (or offload it) to keep the UI responsive. Your method can then be left to run synchronously service side.
6. Understand the impact the thread pool’s thread injection rate (approx. 2 threads a second) can have on your application’s performance, especially if you expect burst loads... but be very wary about changing the thread pool minimums. Do remember that the thread injection "throttling" affects all .NET apps that use a managed thread pool.
7. Do understand that introducing asynchronous code needs to be for a reason. Be able to justify the additional complexity, and be prepared to be asked to support it! (Try not to over-engineer.)
8. Read up!
References and further reading