Friday, 27 February 2015

Multi-threaded Windows service

We recently had a requirement to host an internal Windows service that would continuously process small but relatively long-running packets of work (for example, 15-90 seconds each) obtained from a queue.

We also wanted a degree of parallelism for efficiency, as the service would be hosted on a relatively powerful multi-core Windows server.

Each packet of work would represent a business-critical process, so we were keen to ensure that the design of the service was robust. Stopping the service for any reason should not cause worker threads to be terminated abruptly, and there should be sufficient scope for error handling and compensation within the service.

We did not have a requirement to cluster an individual service instance. It quickly became apparent that we needed to carefully consider how the Windows Service Control Manager (SCM) interacts with our service.

What started off as a relatively trivial bit of code ended up being slightly more complex. I couldn't find many good examples, so here's an idea of how the orchestrating service's start and stop logic could be implemented, in case it is useful.

Note: This is an example and not production-quality code. All comments welcome.

 1. Starting
  • The first goal with this example is to ensure that the service's OnStart() method returns quickly (it isn't blocked).
  • We also start a "foreground thread" to manage the background threads. A foreground thread keeps the process alive until it finishes, whereas background threads are stopped without warning once the last foreground thread has exited.

In our service example we can start by defining some low-level constructs to help us manage the threads and provide a wait event to trigger stopping (this could equally have been implemented as a TPL cancellation token). This code goes at the root of the service class inheriting from ServiceBase.

  private const int ThreadCount = 3;
  private readonly CountdownEvent _threadCounter = new CountdownEvent(1);
  private readonly ManualResetEvent _stopWork = new ManualResetEvent(false);
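
The cancellation-token alternative mentioned above could be sketched roughly as follows. StopSignal is a hypothetical helper, not part of the original service; the idea is only that a token's WaitHandle can sit in the same WaitHandle[] that is later passed to WaitHandle.WaitAny, so the rest of the design would not need to change.

```csharp
using System;
using System.Threading;

// Hypothetical helper (an assumption, not from the original service): wraps a
// CancellationTokenSource so it can stand in for the _stopWork event.
public sealed class StopSignal : IDisposable
{
    private readonly CancellationTokenSource _cts = new CancellationTokenSource();

    // Token for worker code that wants to poll or register callbacks.
    public CancellationToken Token { get { return _cts.Token; } }

    // Becomes signalled once Stop() is called; usable with WaitHandle.WaitAny.
    public WaitHandle Handle { get { return _cts.Token.WaitHandle; } }

    public bool IsStopRequested { get { return _cts.IsCancellationRequested; } }

    public void Stop() { _cts.Cancel(); }

    public void Dispose() { _cts.Dispose(); }
}
```

With this in place, OnStop() would call Stop() instead of _stopWork.Set(), and workers could additionally observe Token to abandon work early if that is ever wanted.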

Next we define the overridden OnStart method:
  protected override void OnStart(string[] args)
  {
   Debug.WriteLine("OnStart() - Max background threads allowed ={0}" , ThreadCount);

   var thread = new Thread(OrchestrateBackgroundTasks) { Name = "Foreground", IsBackground = false };
   var pool = new Semaphore(ThreadCount, ThreadCount + 1);
   thread.Start(new WaitHandle[] { pool, _stopWork }); // pass argument object here

   Debug.WriteLine("OnStart() - complete");
  }

Note that the method doesn't block: the thread is defined as foreground (not background) and set to work, and the thread that entered OnStart() then returns.

 2. Stopping
  • The second goal is to ensure that the OnStop() method has a way of signalling the orchestrating (foreground) thread that it must wind up without starting any more workers.
  • The OnStop() method should also wait for any currently running background threads to complete gracefully.
  • It should also keep the SCM (Service Control Manager) informed periodically.
  • It shouldn't run forever; there should be a time-out.
  protected override void OnStop()
  {
   Debug.WriteLine("OnStop()");
   // signal stop
   _stopWork.Set();
   // Wait on any still-active threads. Comment this out if waiting inside
   // OrchestrateBackgroundTasks() instead; however, the SCM could then try
   // to tear down the service instance before the work completes, as it
   // won't receive regular feedback.
   WaitOnRunningTasksToComplete();
   Debug.WriteLine("OnStop() - complete");
  }


The OnStop implementation signals the stop event, then blocks waiting for running tasks to complete. Only when all background threads have completed gracefully, or the time-out has been reached, will this method return.

The OrchestrateBackgroundTasks method, which is run on our foreground thread, receives two objects:

1. The "pool" represented here by a Semaphore
2. A reference to the stop event

A new task is scheduled whenever the pool has a free slot, so no more tasks run at once than the pool allows. The task continuation is used to handle / log worker errors and to release the slot so that a new task can be started.

Once the stop event is set, the loop ends and the foreground thread exits this procedure.

  private void OrchestrateBackgroundTasks(object args)
  {
   // expects the WaitHandle[] { pool, stop } passed from OnStart()
   var handles = args as WaitHandle[];
   if (handles == null || handles.Length != 2)
   {
    throw new ArgumentException("Expected WaitHandle[] { pool, stop }", "args");
   }
   var pool = (Semaphore)handles[0];

   Debug.WriteLine("DoWork()");

   // wait on pool, or stop
   while (WaitHandle.WaitAny(handles) == 0)
   {
    // WaitAny returns the lowest signalled index, so a free pool slot (index 0)
    // wins over a stop request (index 1); re-check stop before starting more work
    if (_stopWork.WaitOne(0))
    {
     break;
    }

    // Increment the worker thread counter before starting the task, so that a
    // fast task cannot signal the counter down to zero ahead of AddCount()
    var taskNumber = _threadCounter.CurrentCount;
    _threadCounter.AddCount();

    // maintain pool
    var task = Task.Factory.StartNew(LongRunningBackgroundTask, taskNumber);

    // once the background task is complete, handle any error and
    // signal the pool so a new worker task can be added (above)
    task.ContinueWith(t =>
    {
     // this code only runs when the background task completes
     if (t.IsFaulted)
     {
      // faulted
      // log t.Exception etc (reading it also marks the exception as observed)
     }

     Debug.WriteLine("DoWork() - A background Task completed");
     // release from pool and signal complete
     pool.Release();
     _threadCounter.Signal();
    });
   }

   // Indicate that the main thread is exiting.
   Debug.WriteLine("DoWork() signals main thread complete");
   _threadCounter.Signal();

   // Note:
   // Could wait *here once stop has been signalled
   // rather than calling WaitOnRunningTasksToComplete() from OnStop()
   // ...by adding this line
   //_threadCounter.Wait();
   // However, within WaitOnRunningTasksToComplete() we can add code to
   // report back to the Service Control Manager (SCM), which in long
   // running scenarios helps Windows understand the service is waiting and hasn't died.

   Debug.WriteLine("DoWork() - complete");
  }
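
The worker method LongRunningBackgroundTask is referenced above but not shown. A minimal stand-in for experimenting might look like the sketch below. Everything in it is an assumption: the SampleWorker class, the WorkDuration field and the simulated delay are made up for illustration, and only the Action<object> shape and the debug messages are taken from the post. In the service itself this would be a private method on the service class rather than a static one.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

public static class SampleWorker
{
    // Assumed duration, shortened from the 15-90 seconds mentioned in the post.
    public static TimeSpan WorkDuration = TimeSpan.FromSeconds(2);

    // Hypothetical stand-in for the real worker. The signature matches
    // Action<object>, so it can be passed to Task.Factory.StartNew together
    // with a state object (here: the task number).
    public static void LongRunningBackgroundTask(object state)
    {
        var id = (int)state;
        Debug.WriteLine("LongRunningTask() - New #{0}", id);

        // Simulate a packet of work; a real worker would dequeue and
        // process an item here, and may throw if processing fails.
        Thread.Sleep(WorkDuration);

        Debug.WriteLine("LongRunningTask() - #{0} complete", id);
    }
}
```

Throwing from inside the worker is a simple way to exercise the IsFaulted branch of the continuation.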



The WaitOnRunningTasksToComplete method polls the background thread counter until it reaches zero or the overall time-out expires, whichever happens sooner. Periodically it reports back to the SCM via ServiceBase.RequestAdditionalTime(), which tells the SCM that the service is still finishing work and has not hung.
This helps with the end-user experience.

Note: Thread.Sleep is used here because we want the thread that entered OnStop() to be blocked.

  private void WaitOnRunningTasksToComplete()
  {
   Debug.WriteLine("Cleanup()");
   const int timeout = 5000; // 5 secs per wait
   const int spinTotal = 5;  // at most 5 waits, i.e. 25 secs overall
   var spinCounter = 0;

   while (_threadCounter.CurrentCount > 0 && spinCounter < spinTotal)
   {
    Debug.WriteLine("Cleanup() - blocking {0}ms", timeout);
    Thread.Sleep(timeout);// block thread
    if (_threadCounter.CurrentCount > 0)
    {
     Debug.WriteLine("Cleanup() - Work outstanding, ask SCM for {0}ms additional time", timeout);
     base.RequestAdditionalTime(timeout);
    }
    spinCounter++;
   }
   if (_threadCounter.CurrentCount > 0)
   {
    Debug.WriteLine("Cleanup() - complete, work outstanding");
   }
   else
   {
    Debug.WriteLine("Cleanup() - complete");
   }
  }
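
One possible variation, sketched here under assumptions rather than taken from the original service, is to wait on the CountdownEvent itself with a time-out. The calling thread is still blocked, but the wait ends as soon as the count reaches zero instead of always lasting a full interval. ShutdownWait, WaitForWorkers and onStillBusy are hypothetical names; in a service the callback is where base.RequestAdditionalTime(...) would go.

```csharp
using System;
using System.Threading;

public static class ShutdownWait
{
    // Waits for 'counter' to reach zero in slices of 'slice', at most 'maxSlices' times.
    // 'onStillBusy' runs after each slice that ends with work still outstanding.
    // Returns true if all work finished, false if the overall time-out was hit.
    public static bool WaitForWorkers(CountdownEvent counter, TimeSpan slice, int maxSlices, Action onStillBusy)
    {
        for (var i = 0; i < maxSlices; i++)
        {
            // Blocks the calling thread, but returns early once the count hits zero.
            if (counter.Wait(slice))
            {
                return true;
            }
            onStillBusy();
        }
        return counter.IsSet;
    }
}
```

Called from OnStop() this might read: ShutdownWait.WaitForWorkers(_threadCounter, TimeSpan.FromSeconds(5), 5, () => RequestAdditionalTime(5000));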

There may be a nicer way to implement this using the TPL, and it could certainly be engineered to be more reusable. The TPL wraps a lot of the lower-level constructs and has the advantage of being easier to read and maintain.
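
As a rough idea of what a TPL-based version could look like, here is a sketch using SemaphoreSlim, a CancellationTokenSource and Task.WhenAll. It is not the code from this service: TplOrchestrator and its members are hypothetical, it assumes .NET 4.5 or later for async/await, and it runs on thread-pool (background) threads, so it relies on Stop() blocking until the work has drained.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical TPL variant of the orchestrator (an assumption, not the original code).
public sealed class TplOrchestrator
{
    private readonly SemaphoreSlim _pool;
    private readonly CancellationTokenSource _stop = new CancellationTokenSource();
    private readonly Action<int> _work;
    private Task _loop;

    public TplOrchestrator(int maxParallel, Action<int> work)
    {
        _pool = new SemaphoreSlim(maxParallel, maxParallel);
        _work = work;
    }

    // Returns immediately; the scheduling loop runs on the thread pool.
    public void Start()
    {
        _loop = Task.Run(() => RunAsync());
    }

    // Signals stop, then blocks up to 'timeout'. True if all running work finished.
    public bool Stop(TimeSpan timeout)
    {
        _stop.Cancel();
        return _loop.Wait(timeout);
    }

    private async Task RunAsync()
    {
        var running = new List<Task>();
        var id = 0;
        try
        {
            while (true)
            {
                // Waits for a free slot; throws OperationCanceledException once stop is signalled.
                await _pool.WaitAsync(_stop.Token);
                var taskId = ++id;
                running.Add(Task.Run(() =>
                {
                    try { _work(taskId); }
                    catch (Exception) { /* log and compensate here */ }
                    finally { _pool.Release(); }
                }));
                running.RemoveAll(t => t.IsCompleted);
            }
        }
        catch (OperationCanceledException)
        {
            // Stop requested: fall through and let in-flight work drain.
        }
        await Task.WhenAll(running);
    }
}
```

OnStart() would call Start(), and OnStop() would call Stop() with a time-out. Reporting progress to the SCM is left out of the sketch and would still need a sliced wait.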


Here's the debug output for a Start, followed by a Stop...

OnStart() - Max background threads allowed =3
OnStart() - complete
DoWork()
LongRunningTask() - New #1
LongRunningTask() - New #2
LongRunningTask() - New #3
LongRunningTask() - #2 complete
LongRunningTask() - #1 complete
DoWork() - A background Task completed
LongRunningTask() - New #4
LongRunningTask() - #3 complete
DoWork() - A background Task completed
LongRunningTask() - New #3
DoWork() - A background Task completed
LongRunningTask() - New #4
OnStop()
DoWork() signals main thread complete
DoWork() - complete
Cleanup()
Cleanup() - blocking 5000ms
Cleanup() - Work outstanding, ask SCM for 5000ms additional time
Cleanup() - blocking 5000ms
LongRunningTask() - #4 complete
LongRunningTask() - #3 complete
DoWork() - A background Task completed
LongRunningTask() - #4 complete
DoWork() - A background Task completed
DoWork() - A background Task completed
Cleanup() - complete
OnStop() - complete 


Comments:

Hostilio X. Macias said...

I know this is old but do you have an example of LongRunningBackgroundTask?

Unknown said...

I second the request to provide an example of the LongRunningBackgroundTask in the presentation. It would make the whole sample complete and easy to play with. This solution is still viable for those who need to set up a service and don't want it torn down while background tasks are finishing. Thanks, Sid
