Friday, 27 March 2015

Why asynchronous server side code can be a good thing and lessons re-learned (in .net 4.5)

Background

I thought I had a solid understanding of asynchronous code, why it was useful and when to use it, despite the fact that I don’t actually get called on to write code this way very often.
But…

Having taken some time recently to re-establish my understanding, with a view to being able to better advise my colleagues, I discovered along the way that I didn't know quite as much as I thought I did. Despite the advances in the language and the lower barrier to entry in .NET 4.5, there’s still a lot the developer should know before attempting to write asynchronous server side code.

Furthermore, I was reminded that writing asynchronous server side code is a conscious design decision. It is in effect an optimization and shouldn't necessarily be the default starting point; the pros and cons need to be carefully considered. You shouldn't code in optimizations before you need them, and it’s a common mistake (and maybe human nature) to over-engineer code for performance and scalability when in reality there’s no real requirement to do so.

Such mistakes can be costly. Whilst asynchronous code is a lot simpler than it used to be and the barrier to entry is undeniably lower, with great power comes great responsibility. Asynchronous code is still harder to write and harder to maintain, and you need to understand enough to be able to support an application written this way once it’s out in the wild.

My experience has long centred on a classic architecture in which the UI tier (browser, usually client side JavaScript & mark-up) calls back to the web server, either to update part of a page or for a whole page submission. This paradigm is roughly the same for basic MVC as it is for web forms pages.

The web server side may then defer to a composition of services (typically internal services, mostly implemented with Windows Communication Foundation, WCF, over basic HTTP), which in turn either interact with the databases or defer to third party services for data we wish to consume.

Typically the services have latency because they’re I/O bound, not CPU bound (there’s no intensive computational work taking place). As such it isn't unusual for some services to take more than 1000 ms to respond, in some cases taking up to 7000 ms where complex data aggregation is performed at the back end.

It’s understandable why .NET 4.5 and the magic async/await keyword pair would be attractive and potentially useful as implementations get refreshed and re-factored.

When to be asynchronous?

When looking at a new application, or new requirements for an existing application, it’s worth keeping the following in mind.
To quote Stephen Toub, who is able to put it far more succinctly than I: “There are two primary benefits I see to asynchrony: scalability and offloading (e.g. responsiveness, parallelism). Which of these benefits matters to you is typically dictated by the kind of application you’re writing.”

An interesting behaviour of the ThreadPool

A really great introduction to async/await exists in Stephen Cleary’s MSDN magazine article (Stephen Cleary has literally written the book on .net asynchronous programming – “Concurrency in C# Cookbook” – available on Amazon and others). In this great article Stephen mentions an interesting behaviour of the CLR thread pool.
Now, I’m not talking about the fact that threads are finite and that asynchronous code makes better use of the managed thread pool by allowing threads that would otherwise be blocked to be re-used, which greatly improves scalability.
Though this is true…

I’m talking about a lesser known fact (at least to me), which is that the thread pool implementation (quite deliberately) throttles the rate at which threads can be injected into the pool.

At the time of writing the default starting minimum number of threads on my 4 core test server is 4 (the default scales with the number of processors). This behaviour can adversely affect your application if it is subjected to burst loads. From the caller's point of view the clock is already running: the request has been made, but it can’t run until it’s allocated a thread. This manifests itself as latency within the caller, who may, if things get really bad, start to receive HTTP Error 503 (Service Unavailable).

According to MSDN (see managed thread pool - “Thread Injection” section) threads can be added at a rate of roughly one every 500 ms (about 2 per second).
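To get a feel for what that rate means under burst load, here is a minimal back-of-envelope sketch. It assumes a fixed injection rate of 2 threads per second, which is only the approximate figure quoted above (the real pool is adaptive), and the class and method names are my own, purely for illustration. Growing from 4 threads to 200 at that rate works out at (200 - 4) / 2 = 98 seconds.

```csharp
using System;
using System.Threading;

public static class ThreadPoolRampUp
{
    // Rough estimate of the seconds needed to grow from minThreads to
    // neededThreads, assuming a fixed injection rate (an assumption; the
    // real thread pool adapts its injection rate at runtime).
    public static double SecondsToReach(int minThreads, int neededThreads, double threadsPerSecond = 2.0)
    {
        if (threadsPerSecond <= 0)
            throw new ArgumentOutOfRangeException(nameof(threadsPerSecond));

        int missing = Math.Max(0, neededThreads - minThreads);
        return missing / threadsPerSecond;
    }

    // Reads the minimum worker and I/O completion thread counts that the
    // current process is actually running with.
    public static (int Worker, int Io) CurrentMinimums()
    {
        ThreadPool.GetMinThreads(out int worker, out int io);
        return (worker, io);
    }
}
```

Calling `ThreadPoolRampUp.CurrentMinimums()` inside the application itself is a cheap way to confirm which minimums are really in effect before reasoning about ramp up times.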

As it turns out WCF – which unsurprisingly also requires a thread pool thread on which to run a created instance, can also exhibit this behaviour. See MS support advice here. This can lead to your service scaling up slowly under burst load, which will negatively impact the caller.

At this point you may be asking: why not simply increase the minimum number of threads from 4 to the number we think we require?
In certain circumstances you might elect to do so, though Stephen points out in his article (linked above) that there are very good reasons why you may not wish to: the same reasons the default is low, namely efficient use of resources.

It’s worth remembering that, specifically in IIS and for ASP.NET applications, certain environment defaults are set through the machine.config “autoConfig=true” flag.
They can only be overridden in the machine.config processModel section. An observation here: changes to these values affect every application on the machine!

This old but often updated article from Thomas Marquardt has been a really helpful reference point for me when troubleshooting live issues. It helps explain the dynamic settings and some of the background from earlier .NET / IIS versions. Note that Thomas (in one of the last updates to that article) also points out the "gotcha" in behaviour with the standard minimum thread count and burst loads.

An example

The back-end Service

In order to see some of this for myself I created a simple WCF .NET 4.5 service in Visual Studio.

It exposes two simple methods

[ServiceContract]
public interface IService1
{
    [OperationContract]
    string EchoWithDelay(string value1, int value2, bool value3);

    [OperationContract]
    Task<string> EchoWithEfficientDelay(string value1, int value2, bool value3);

}

Both methods echo back the calling arguments as a concatenated string and both introduce an artificial delay.
The first, “EchoWithDelay”, blocks using
System.Threading.Thread.Sleep(5000);

The second method is marked “async” and returns a task of type “string”; the implementation is more efficient, using the .NET 4.5…

await Task.Delay(TimeSpan.FromMilliseconds(5000));
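For completeness, here is a minimal sketch of how the two implementations might look side by side. It is not the actual service code: the WCF attributes are left out so it compiles on its own, the delay is passed in through a constructor (my addition, so it can be tried with something shorter than 5 seconds), and the space separated echo format is an assumption.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public class EchoService
{
    private readonly TimeSpan _delay;

    // The delay is configurable here purely for illustration.
    public EchoService(TimeSpan delay)
    {
        _delay = delay;
    }

    // Blocks the calling thread for the whole delay.
    public string EchoWithDelay(string value1, int value2, bool value3)
    {
        Thread.Sleep(_delay);
        return Format(value1, value2, value3);
    }

    // Returns the thread to the pool while the timer runs,
    // then resumes to build the result.
    public async Task<string> EchoWithEfficientDelay(string value1, int value2, bool value3)
    {
        await Task.Delay(_delay);
        return Format(value1, value2, value3);
    }

    // Assumed echo format: the three arguments separated by spaces.
    private static string Format(string value1, int value2, bool value3)
    {
        return value1 + " " + value2 + " " + value3;
    }
}
```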

Note they both introduce a delay of 5 seconds (and echo back the input as a string) but do nothing much else.
At this point it’s worth being aware that WCF has its own throttling (and in fact if you host under IIS using WAS then remember there’s an additional level of throttling outside of WCF for WAS itself).

I believe the defaults described here and here for WCF 4 are still relevant at the time of writing. Like the IIS autoConfig settings, they’re dynamic, based on the number of processors. The multiplier for concurrent calls is 16, so a 4 core machine would allow for 64 concurrent calls. See here for details on turning the relevant performance counters on!
I hosted my WCF service on a spare Windows 2008 R2 server (note that any kind of load testing on client Windows is ill advised as there’s a built in concurrency limit for client versions of Windows), a very old 4 core Dell desktop in the corner. I created a unique app pool and website on IIS 7.x for it to live on.



A consumer

Next I created a simple MVC application without authentication and added some public methods (to call via HTTP GET) each method wrapping a method on my new shiny “downstream” service, just described!
The comments speak for themselves; I included the last method as a demo that older APM style “Begin”/“End” methods can still be wrapped as a task using “Task.Factory.FromAsync(…)”.

// This code runs synch on the same thread pool thread which is held for the duration
public ActionResult CallServiceSync()
{
    // All this code runs synch on thread pool thread "A"
    // the thread is held whilst the long running service operation completes
    var result = _service.EchoWithDelay("hello", 1, true);
    ViewBag.Message = "Sync " + result;
    return View();
}
 
// This code runs synch on the same thread pool thread which is held for the duration
public ActionResult CallServiceAsyncBad()
{
    // All this code runs synch on thread pool thread "A"
    var t = _service.EchoWithDelayAsync("hello", 1, true);
    // the thread is still held whilst the long running service operation completes
    // worse than that, blocking on the task adds context switching overhead
    t.Wait();
 
    ViewBag.Message = "Bad Async " + t.Result;
    return View();
}
 
// The thread pool thread that started the method is not held for the entire duration
// This code is thread pool efficient as the service callback triggers resuming the code on the initial (captured) context
public async Task<ActionResult> CallServiceAsync()
{
    Debug.WriteLine("1 thread id - " + Thread.CurrentThread.ManagedThreadId);
    Debug.WriteLine("1 Culture - " + GetCurrentCulture());
 
    // The first part of the code runs synch on thread pool thread "A"
    // call IO bound long running service, thread may be released here
    // call via newer TAP pattern
    var result = await _service.EchoWithDelayAsync("hello", 1, true);//.ConfigureAwait(continueOnCapturedContext: false);
 
    // this code runs later, asynch on new thread pool thread, same context, captured from above!
    ViewBag.Message = "Async " + result;
 
    Debug.WriteLine("2 thread id - " + Thread.CurrentThread.ManagedThreadId);
    Debug.WriteLine("2 Culture - " + GetCurrentCulture());
 
    return View();
}
 
 
// The thread pool thread that started the method is not held for the entire duration
// This code is thread pool efficient as the service callback triggers resuming the code on the initial (captured) context
public async Task<ActionResult> CallServiceEfficientAsync()
{
    Debug.WriteLine("1 thread id - " + Thread.CurrentThread.ManagedThreadId);
    Debug.WriteLine("1 Culture - " + GetCurrentCulture());
 
    // The first part of the code runs synch on thread pool thread "A"
    // call IO bound long running service, thread may be released here
    // call via newer TAP pattern
    var result = await _service.EchoWithEfficientDelay("hello", 1, true);
 
    // this code runs later, asynch on new thread pool thread, same context, captured from above!
    ViewBag.Message = "Efficient wait Async " + result;
 
    Debug.WriteLine("2 thread id - " + Thread.CurrentThread.ManagedThreadId);
    Debug.WriteLine("2 Culture - " + GetCurrentCulture());
 
    return View();
}
 
 
// The thread pool thread that started the method is not held for the entire duration
// This code is thread pool efficient as the service callback triggers resuming the code on the initial (captured) context
public async Task<ActionResult> CallServiceAsyncLegacy()
{
    // The first part of the code runs synch on thread pool thread "A"
    // call IO bound long running service, thread may be released here
    // Wrap legacy APM pattern methods in a task which is awaitable
    var result = await Task<string>.Factory.FromAsync<string, int, bool>(_service1.BeginEchoWithDelay,
        _service1.EndEchoWithDelay, "hello", 1, true, null); // last argument is optional async state, null here
 
    // this code runs later, asynch on new thread pool thread, same context, captured from above!
    ViewBag.Message = "Legacy Async " + result;
    return View();
}
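The “FromAsync” wrapping used in the last method can be tried in isolation, without a WCF proxy. The sketch below is only an illustration of the same pattern: it swaps the service's Begin/End pair for the Begin/End pair on a plain Stream, and the helper name is made up.

```csharp
using System;
using System.IO;
using System.Text;
using System.Threading.Tasks;

public static class ApmWrapping
{
    // Reads a stream to the end by wrapping the legacy APM pair
    // BeginRead/EndRead in an awaitable task, chunk by chunk.
    public static async Task<string> ReadAllTextAsync(Stream stream, int bufferSize = 1024)
    {
        var buffer = new byte[bufferSize];
        var collected = new MemoryStream();

        while (true)
        {
            // Same shape as the controller code: begin method, end method,
            // the three call arguments, then the (unused) async state.
            int read = await Task<int>.Factory.FromAsync<byte[], int, int>(
                stream.BeginRead, stream.EndRead, buffer, 0, buffer.Length, null);

            if (read == 0)
            {
                break; // end of stream
            }

            collected.Write(buffer, 0, read);
        }

        return Encoding.UTF8.GetString(collected.ToArray());
    }
}
```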

With that done I deployed the web app to my “test server” (the aforementioned ancient 4 core Dell in the corner) under its own web site with its own app pool.



The client (aka load test rig)

Finally, in a separate Visual Studio (and I have Ultimate which allows me to do this, lucky me…) I created some web tests, one for each GET method on my controller (as above), and a VS load test to consume each one.
Each load test was the same: simulate a sudden increase in users, 20 per second up to a limit of 200 users, all simultaneously hitting my server method, for a duration of 2 mins.
Nothing too clever here; the load test controller and agent are all run from my Dell laptop (Windows 7 with an i7 CPU and plenty of RAM).

The tests

[Note: The counters I used were fairly limited. There are a lot available, but I only wanted a view of executing requests, number of requests and time to execute. These can be found in the “ASP.NET Apps v4…” category (for my web app) and the “ServiceModelService 4.0.0.0” category (for my service).]

Test 1

GETs to the first “CallServiceSync” controller method (as snip above), which was wrapping a synchronous call to my “mock” service with its 5 second delay, were pretty poor.
My average response time was 35 seconds, though all calls completed.
I figured this was probably the ramp up on the thread pool: I could see that instances of my service were being created roughly in line with the prescribed maximum rate at which threads can be injected into the thread pool (unsurprisingly, 1 thread is required for 1 WCF instance). So I altered the machine.config minimum thread settings (in processModel) on my test server. I went from 4 to, well, 200!

Test 2

The next test to “CallServiceSync” (the synchronous wrapper to my service method) was much improved now that the thread pool was created with enough threads for my test up front, but still not fantastic; the average response time this time around was 13.3 seconds.
Something else was also feeling wrong.

I suspected it’d be the number of allowed concurrent outbound connections. Usually this is 2, but in IIS through autoConfig it’s 12x the number of CPUs; I believe this is still correct at the time of writing. So, on my 4 core test box this would be 48. But in fact, looking at the WCF counters versus the counters for my web app…

…I could see that whilst the service calls were returning OK, there were fewer of them than there were requests waiting in my MVC app. I figured that the default [inbound] throttling needed tweaking for my “pressure” test to satisfy my 200-odd concurrent users. So I added this to my service’s behaviour.

          <serviceThrottling maxConcurrentCalls="250"
                          maxConcurrentInstances="250"
                          maxConcurrentSessions="250" />

...To override the per CPU defaults. That helped things along nicely...
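The per CPU defaults mentioned so far are easy to put side by side with the load being applied. This is a minimal sketch using only the two multipliers quoted in this post (16 for WCF concurrent calls, 12 for outbound connections under autoConfig); the class and method names are made up for illustration.

```csharp
using System;

public static class ThrottleDefaults
{
    // Multipliers as quoted in the post; treat them as the post's figures,
    // not as values read from the running system.
    public const int WcfCallsPerProcessor = 16;
    public const int OutboundConnectionsPerProcessor = 12;

    public static int DefaultMaxConcurrentCalls(int processors)
    {
        return WcfCallsPerProcessor * processors;
    }

    public static int DefaultOutboundConnections(int processors)
    {
        return OutboundConnectionsPerProcessor * processors;
    }

    // True when the number of concurrent users exceeds the given limit,
    // i.e. some callers will have to queue.
    public static bool IsExceeded(int limit, int concurrentUsers)
    {
        return concurrentUsers > limit;
    }
}
```

For the 4 core test box this gives 64 concurrent calls and 48 outbound connections, both well short of the 200 users in the load test, which is why the explicit 250 limit was needed.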

Test 3

This test was still to “CallServiceSync” (my synchronous wrapper to my service method with its 5 second delay) but this time with the 200 thread starting size for my thread pool(s) and with WCF accepting 250 concurrent calls. The result was a duration average of 5.8 seconds.

A lot better.

The problem is, setting a high minimum (starting) number of thread pool threads probably isn't the right thing to do in this case. Not least because of the additional resources AND because it's machine wide; in production the server would likely be multi-tenant. Would all my services be expecting this “burst” load, or this pressure?

Test 4

OK, reverting back to the minimum of 4 starting threads but keeping the higher concurrency setting (of 250 calls) in WCF.

The average response time is back to around 35 seconds. Interestingly (but unsurprisingly) this is the same result if I try the load test GET'ing my “CallServiceAsyncBad” controller method. This method tries to be async by running (or offloading) the call to the service using a task, but then waits on the result synchronously. In fact it’s worse than that, because doing so incurs overhead in the context switching required.

public ActionResult CallServiceAsyncBad()
  {
      var t = _service.EchoWithDelayAsync("hello", 1, true);
      t.Wait();
      ViewBag.Message = "Bad Async " + t.Result;
      return View();
  }

Test 5

Now to try the same test with my “proper” async method, using the async option provided when I added my service via the WCF generated service proxy.
Still default minimum threads (starting 4) and the increased concurrency limit for my service (250), for 200 concurrent users, for the same duration.

public async Task<ActionResult> CallServiceAsync()
{
    var result = await _service.EchoWithDelayAsync("hello", 1, true);
    ViewBag.Message = "Async " + result;
    return View();
}

…And… An average call time over the test of 10.5 seconds… Hmmmm… Interestingly it shows as a bell curve: the first couple of calls are pretty quick (around 5.5 seconds), then the latency increases, and by 40 seconds into the test the average call response time is 24 seconds?! Then from 40 seconds to the end of the 2 min period it slowly drops back to about 5.5 seconds (ish).
Still..Hmmmm.
The answer seems to be with my mock/test service. The “Thread.Sleep()”, which is after all pretty much the only thing going on service side, is the prime culprit.
Time to run the test again, this time with the method that wraps a “Task.Delay()”, which is the same pause but awaitable (asynchronously implemented).

public async Task<ActionResult> CallServiceEfficientAsync()
{
    var result = await _service.EchoWithEfficientDelay("hello", 1, true);
    ViewBag.Message = "Efficient wait Async " + result;
    return View();
}

That’s better, this time the average response time was 5.3 seconds! Remember this is with the default minimum threads (starting at 4) and just the service tweaked to accept the load.
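The difference between the two kinds of wait can be observed without WCF or IIS at all. The following is a minimal sketch, not the load test itself: it starts a batch of simulated calls that either block a thread pool thread with Thread.Sleep or await Task.Delay, and times each batch. The class and method names are hypothetical.

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class DelayComparison
{
    // Every simulated call holds a thread pool thread for the whole delay,
    // so the batch can only finish as fast as the pool can supply threads.
    public static async Task<TimeSpan> BlockingAsync(int calls, int delayMs)
    {
        var stopwatch = Stopwatch.StartNew();
        Task[] tasks = Enumerable.Range(0, calls)
            .Select(_ => Task.Run(() => Thread.Sleep(delayMs)))
            .ToArray();
        await Task.WhenAll(tasks);
        return stopwatch.Elapsed;
    }

    // Every simulated call is just a timer; no thread is held while waiting.
    public static async Task<TimeSpan> AwaitableAsync(int calls, int delayMs)
    {
        var stopwatch = Stopwatch.StartNew();
        Task[] tasks = Enumerable.Range(0, calls)
            .Select(_ => Task.Delay(delayMs))
            .ToArray();
        await Task.WhenAll(tasks);
        return stopwatch.Elapsed;
    }
}
```

Run with, say, 200 calls and a 5000 ms delay on a pool left at its default minimums, I'd expect the blocking batch to take noticeably longer than the awaitable one, though the exact figures will depend on the machine.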

Conclusions…

1. Unless it really doesn't matter (and for most serious applications it does), do load testing. Start early on and keep going, automate it if you can. Always!
2. Don’t pre-emptively tune. Tune when you need to once you understand what and where and by how much.
3. If you’re going to use asynchronous code and async/await for scalability, then it’s got to be from top to bottom. If your architecture has your web app dependent on services, and those services are I/O bound, the “asynchronicity” needs to extend there too.
4. Asynchronous code for scalability only really works if it’s efficient code. Waiting on a callback from the framework, OS or driver level is efficient; blocking a thread while you wait is not.
5. Consider carefully whether your service method code needs to be asynchronous or not, and if so, why. If the code is CPU heavy, maybe it’s better to have the UI wrap the call in its own task (or offload) to ensure the UI remains responsive whilst your method is left to run synchronously service side.
6. Understand the impact the thread pool thread injection rate (approx. 2 threads a second) can have on your application performance, especially if you expect burst loads... but... be very wary about changing the thread pool minimums. Do remember the thread injection "throttling" affects all .NET apps that use a managed thread pool.
7. Do understand that introducing asynchronous code needs to be for a reason: be able to justify the additional complexity and be prepared to be asked to support it! (Try not to over-engineer.)
8. Read up!

References and further reading





Don’t block on Asynchronous code and Don’t block in Asynchronous code (Stephen Cleary) – this is a relatively well covered topic in blogs and on SO, and an easy trap to fall into.

Best Practices in Asynchronous Programming (Stephen Cleary) – namely why you really need asynchronous code all the way from top to bottom.


The excellent - Should I expose asynchronous wrappers for synchronous methods (Stephen Toub – MSFT) – covers the reasons for asynchronous code, including scalability and offloading, and when to avoid it (i.e. CPU bound operations).





Friday, 13 March 2015

Windbg


In over 90% of the more serious support cases I've been involved in during my years at my current employer, where I've not understood a serious server condition, taking a mini dump file from the misbehaving process and analysing it via windbg has (surprisingly quickly!) provided a lead to the correct answer…

You need not be a windbg expert. It’s a program that can be quite intimidating to use, and some will simply shrug and say it’s “too hard”… This shouldn't be the case; don’t be intimidated. Whilst you can do a lot of clever things with windbg, in the majority of cases a basic understanding of Windows processes and the tooling, coupled with a few relatively simple commands from the available .NET extension libraries, is often as much as it takes to figure out a rough lead.

Third party support is useful, but the engineers won't know your code or your business domain - you are very often going to be the quickest way to Root-Cause-Analysis.

Application Monitoring frameworks (such as the fantastic NewRelic) can also be leveraged to correlate real time application metrics against the mini dump files.

Quite often it's worthwhile generating not just one but 3 or 4 dump files from the offending process at a regular interval, for example 30 seconds. This can be scripted.

For creating mini dumps or crash dumps (and this could be via script) see here.

The 3 extensions most worth spending some time researching:
  1. SOS (Son of Strike) – the windbg extension for .NET managed code commands
  2. Superset of SOS (recommended) --> PSSCor2 – for .NET 2+ and PSSCor4 – for .NET 4
  3. NetExt (>.net45x) – standalone .NET extensions for web apps and WCF; wish I’d known about this in times gone by!!! This is NOT dependent on SOS or PSSCor. See here for a use case.

The go-to place for beginners wanting to ramp up quickly on SOS and Windbg is “If broken it is, fix it you should”, ex PSS engineer Tess Ferrandez’s (old but still relevant) blog.
Related is an old hanselminutes debugging 101 with Ms Ferrandez.
The netext people also have a neat pdf tutorial, well worth a look.
More advanced: http://www.windbg.org/
Or Google…. 

Monday, 9 March 2015

Azure API Management - APIM, consuming a SOAP WCF service over HTTP


Update 26/March/20 - In the intervening years Azure APIM has come a long way - it's now quite common to see customers of APIM façading SOAP web services, using both Liquid and XSLT to transform legacy web services into a more modern RESTful experience.
Please do check out the Azure APIM docs.

***

At the time of writing:

I've been looking at the excellent API Management feature of Azure. I understand this functionality has come about through an acquisition (Apiphany).

I think Azure APIM could offer some great benefits. 

After becoming familiar with the product through the documentation I was still a little confused about how or in fact whether a WCF-SOAP service was supported as the back-end service. The APIM documentation focuses very much on a REST based example back-end service.

(Note: I assume the reader's familiarity with APIM and a previous walk through of the documented examples).

To add to my confusion one of the feedback comments for the APIM product appeared to suggest SOAP was not in fact YET supported: feedback-for-api-management-suggestions 

However logic suggests it should work: a SOAP envelope submission over HTTP is via a POST, and there are really only three prerequisites:

1. Valid SOAP envelope XML which contains an XML body which the target Web service will be able to de-serialize. 
2. A matching SOAPAction request header
3. A Content-Type header of "text/xml"
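The three prerequisites above can be sketched as a plain HTTP request, built here with HttpClient's request types rather than a WCF proxy. This is an illustration under assumptions: the envelope shape assumes the default WCF contract namespace (http://tempuri.org/) and SOAP 1.1 as used by basic HTTP binding, and the class and method names are made up.

```csharp
using System.Net.Http;
using System.Text;
using System.Xml.Linq;

public static class SoapRequest
{
    // Action for the first operation of the test contract shown later in this post.
    public const string SoapAction = "http://tempuri.org/IService1/HelloWorldOperation1";

    // 1. A SOAP envelope whose body the target service can de-serialize.
    public static string BuildEnvelope(string value1, int value2, bool value3)
    {
        XNamespace soap = "http://schemas.xmlsoap.org/soap/envelope/";
        XNamespace svc = "http://tempuri.org/"; // assumed default WCF namespace

        var envelope = new XElement(soap + "Envelope",
            new XAttribute(XNamespace.Xmlns + "s", soap),
            new XElement(soap + "Body",
                new XElement(svc + "HelloWorldOperation1",
                    new XElement(svc + "value1", value1),
                    new XElement(svc + "value2", value2),
                    new XElement(svc + "value3", value3))));

        return envelope.ToString(SaveOptions.DisableFormatting);
    }

    public static HttpRequestMessage Create(string serviceUrl, string envelopeXml)
    {
        var request = new HttpRequestMessage(HttpMethod.Post, serviceUrl)
        {
            // 3. Content-Type of text/xml.
            Content = new StringContent(envelopeXml, Encoding.UTF8, "text/xml")
        };

        // 2. A matching SOAPAction request header (quoted, as SOAP 1.1 expects).
        request.Headers.TryAddWithoutValidation("SOAPAction", "\"" + SoapAction + "\"");
        return request;
    }
}
```

The resulting message could then be sent with `HttpClient.SendAsync` against the service URL.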

One further consideration is how the consumer obtains the contract for the service. It's an implementation detail, but it's possible to define an http and/or https endpoint from which the consumer of the service can acquire the service definition in the form of a WSDL. From .net v4.5 onward a "singleWSDL" option is supported which I believe to be generally preferable and better for interop.

There were a couple of goals:

1. Firstly I wanted to look at options around surfacing some existing internally facing WCF implemented basic-http (SOAP) services for the future

2. I wanted policy and the end user experience to centre around both endpoint and operation.

In my test I published a simple test WCF service to a public facing server, I didn't add security to this service - wishing to keep it simple initially - but the service was restricted to TLS from the front end load balancer out. So any test would be over https.

The URL for my test back-end service was something like this

https://ourdomain.com/helloworld/service1.svc?singlewsdl

The service had two methods:

[ServiceContract]
public interface IService1
{
    [OperationContract]
    string HelloWorldOperation1(string value1, int value2, bool value3);
 
    [OperationContract]
    CompositeType HelloWorldOperation2(CompositeType composite);
}

(Composite type wrapped the same basic types passed in operation 1 as params)

After creating an API instance on Azure within our account I added some users, a product and a new API!

1. Publisher Portal, API Management, APIs: Settings

The settings of the API were very basic. Out of curiosity I did try to import the WSDL definition from file, after playing with the encoding to get that right (it didn't like the initial file we copied out of Visual Studio). However, I quickly got an error about the WSDL not being understood.
It could have been me, so it might work; I didn't spend more than a couple of minutes trying. Certainly at the time of writing WADL and Swagger were the only two radio button options for describing the definition content type.

Web API Name = "Hello World" 
Description =
Connection = Directly (in a real scenario I'd explore VPN between APIM and back-end, which is beyond this post)

Web Service URL = https://ourdomain.com/helloworld/service1.svc
Web API URL Suffix = service1Endpoint
Https = [ticked]

This was the resulting Web API URL

https://[ourazuredomain]helloworld.azure-api.net/service1Endpoint

2. Publisher Portal, API Management, APIs: Operations

Here's the interesting bit, APIM really doesn't do much until you add at least one operation and doesn't seem to surface at all right now if you publish an API without any (which is logical).

This is the bit I got stuck with, what to put in the operation section for a SOAP endpoint method?

What I began by doing was adding a single POST method to cover all endpoint methods, thus:

Http Verb = "POST" (I want to post my soap envelope)
Url Template = "/"
Rewrite URL template = ""
Display Name = "All operations"

But I want to be in a position to add a POST operation to the API for each endpoint method, and this endpoint encapsulates ALL the methods. It's the SOAPAction at the back-end service which determines which method gets invoked on the endpoint, and the request XML within the SOAP body must then match!

So I went back to the Web Service URL from step 1 and changed it to this:

Web Service URL = https://ourdomain.com/helloworld

(note I've lopped off the "/service1.svc" endpoint part)

Then I went back and edited my API method to look like this:-

Http Verb = "POST" 
Url Template = "/operation1"
Rewrite URL template = "/service1.svc"
Display Name = "Operation 1"

The key point here is that we now specify a URL segment to represent the method but we ALWAYS rewrite it to the service endpoint.

The reason for doing this is so that I can specify distinct operations.

But now I've got a further problem - there's nothing that can enforce the fact that I need the right payload for this operation to go to the right method....Yet...

Next I added an additional operation in for the second web service method, identical to that above except for the URL Template ("/operation2") and the Display name "Operation 2".

Finally I added a third method, a GET this time, to allow my consumers a chance to see the WSDL

Http Verb = "GET" 
Url Template = "/GetWsdl"
Rewrite URL template = "/service1.svc?wsdl" 
Display Name = "Get WSDL"

(Note: My test server was net4.0 otherwise it'd have been ?singleWsdl)
The important point is that my GET to the exposed URL segment "/GetWsdl" will always return the WSDL for this endpoint, as it's rewritten as "/service1.svc?wsdl", i.e. with the wsdl URL query param.

Finally, one more step before I publish: I need a way to associate the SOAPAction header with the correct operation, and I need to ensure I'm sending and receiving with a content type of "text/xml" and not "text/plain".

3. Publisher Portal, API Management, Policies: Policy Scope

I added a policy using the wizard for each of the endpoint operations defined in the steps above, the XML ended up looking like this:

GetWsdl
  


<policies>
 <inbound>
  <set-header exists-action="override" name="content-type">
   <value>text/xml</value>
  </set-header>
  <base></base>
  <rewrite-uri template="/service1.svc?wsdl" />
 </inbound>
 <outbound>
  <set-header exists-action="override" name="content-type">
   <value>text/xml</value>
  </set-header>
  <base></base>
 </outbound>
</policies>





Operation1
  


<policies>
 <inbound>
  <set-header exists-action="override" name="content-type">
   <value>text/xml</value>
  </set-header>
  <set-header exists-action="override" name="SOAPAction">
   <value>http://tempuri.org/IService1/HelloWorldOperation1</value>
  </set-header>
  <base></base>
  <rewrite-uri template="/service1.svc" />
 </inbound>
 <outbound>
  <set-header exists-action="override" name="content-type">
   <value>text/xml</value>
  </set-header>
  <base></base>
 </outbound>
</policies>




Operation2
  

<policies>
 <inbound>
  <base></base>
  <set-header exists-action="override" name="content-type">
   <value>text/xml</value>
  </set-header>
  <set-header exists-action="override" name="SOAPAction">
   <value>http://tempuri.org/IService1/HelloWorldOperation2</value>
  </set-header>
  <rewrite-uri template="/service1.svc" />
 </inbound>
 <outbound>
  <set-header exists-action="override" name="content-type">
   <value>text/xml</value>
  </set-header>
  <base></base>
 </outbound>
</policies>




Given the fact the policies are hierarchical I could have optimized this by moving the content type header up a level.

The next most important part is the SOAPAction for the POST operations.

This ensures the right action goes with the right operation. There's nothing stopping the end user from sending a different action and request content, but note the policy is set to OVERRIDE whatever the user is sending, which means they'd get an error if they sent the wrong payload to the wrong URL.
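To make the effect of the three policies concrete, here is a small model of the routing they describe. It is only an illustration of the behaviour, not APIM code or an APIM API: the types and names are made up, and the table simply mirrors the operations and policies defined above.

```csharp
using System;
using System.Collections.Generic;

public sealed class BackendCall
{
    public BackendCall(string path, string soapAction)
    {
        Path = path;
        SoapAction = soapAction;
    }

    public string Path { get; }
    public string SoapAction { get; }   // null for the WSDL GET
    public string ContentType { get { return "text/xml"; } }
}

public static class FacadeRouting
{
    private const string ActionBase = "http://tempuri.org/IService1/";

    // Façade operation (verb + URL template) -> what reaches the back end.
    private static readonly Dictionary<string, BackendCall> Routes =
        new Dictionary<string, BackendCall>(StringComparer.OrdinalIgnoreCase)
        {
            ["POST /operation1"] = new BackendCall("/service1.svc", ActionBase + "HelloWorldOperation1"),
            ["POST /operation2"] = new BackendCall("/service1.svc", ActionBase + "HelloWorldOperation2"),
            ["GET /GetWsdl"] = new BackendCall("/service1.svc?wsdl", null),
        };

    // The caller's own SOAPAction is deliberately ignored: the policy
    // overrides it with the action tied to the operation's URL.
    public static BackendCall Resolve(string verb, string urlTemplate, string callerSoapAction)
    {
        BackendCall call;
        if (!Routes.TryGetValue(verb + " " + urlTemplate, out call))
        {
            throw new KeyNotFoundException("No operation defined for " + verb + " " + urlTemplate);
        }

        return call;
    }
}
```

So a POST to "/operation1" carrying the SOAPAction for operation 2 still reaches the back end as operation 1, and a body built for operation 2 would then fail to de-serialize.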

Clearly there's one quite big problem with all of this: when the user imports the WSDL into a .NET application it's going to want to do so under a single endpoint, whereas APIM will want a distinct URL per method. So there's a mismatch.

One approach could be to define the operations with the differentiator as a query string argument (of the SOAPAction name) instead of the URL segment. Here's how operation1 might look:

Http Verb = "POST" 
Url Template = "/?op=HelloWorldOperation1"
Rewrite URL template = "/service1.svc"
Display Name = "Operation 1"


Later on, instead of being called (along with the subscription key) as:

https://[ourazuredomain]helloworld.azure-api.net/service1Endpoint/operation1/?subscription-key=[ourkey]

It would be


https://[ourazuredomain]helloworld.azure-api.net/service1Endpoint/?op=HelloWorldOperation1&subscription-key=[ourkey]


Now the caller only has to manage the one URL in their web.config (let's assume they're using .net for now).

However this raises yet another point, which is how does the customer ensure that the subscription key gets added in the first place so the APIM façade allows the call through... 

We could ditch the subscription key, it's optional within APIM, but it's one of the things I'd want to use.

As it turns out, the subscription key doesn't have to be passed as a query string param, it can instead be passed via a request header.
The subscription key request header is "ocp-apim-subscription-key".

This might make more sense for a consuming customer, who could configure a set outbound header against their caller/client app to be attached for outgoing requests to our API.
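For a consumer calling the façade over plain HTTP, attaching the key as a header could be as small as the sketch below. The header name is the one given above; the extension class and method are hypothetical, and a WCF client would need its own equivalent for setting outgoing HTTP headers.

```csharp
using System.Net.Http;

public static class ApimRequestExtensions
{
    public const string SubscriptionKeyHeader = "ocp-apim-subscription-key";

    // Adds (or replaces) the subscription key header so the key
    // does not need to appear in the query string.
    public static HttpRequestMessage WithSubscriptionKey(this HttpRequestMessage request, string subscriptionKey)
    {
        request.Headers.Remove(SubscriptionKeyHeader);
        request.Headers.TryAddWithoutValidation(SubscriptionKeyHeader, subscriptionKey);
        return request;
    }
}
```

With that in place the caller's configured URL stays free of the key, and the key itself can live in configuration alongside it.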

In Summary
The exercise has proved we can relatively easily façade a WCF HTTP SOAP service through APIM and go on to craft a neat developer portal experience, but it doesn't feel quite right yet. APIM is currently REST centric and requires a unique URL per operation. This is at odds with SOAP, which uses a single URL and a SOAPAction to route to the correct operation for the contract the endpoint URL represents.
The steps detailed above put the onus on the APIM façade consumer to modify the outbound call from client to service [façade], and that doesn't feel right. I also posted a comment here: feedback-azure-api-management-suggestions-soap-support