Scheduling Jobs in a Web Farm

  • Question

• Hello,
    I have a farm of .NET servers. The application also needs scheduled
    jobs, such as sending reports and notifications periodically.
    The question is: what is the best way to schedule these jobs?
    I want the same set of software to run on all machines, so that if
    one machine goes down, another is able to run the scheduled job.
    The options I am considering:

    1) Making them SQL Server jobs (which fail over in a cluster). But
    these jobs need resources available on the .NET servers, and I do
    not think that opening backward connectivity from the database to the
    application would be the best way.

    2) Keeping some kind of lock in the database and scheduling the job on
    all the machines in the Windows scheduler. The one that wakes up first
    gets the lock and executes the job; the others just exit.

    3) Same as 2, but using a Windows service instead of a scheduled task.
    In this case, I am unable to find a good way of letting users schedule
    the job. Is there a library that can read a Unix-style cron setting
    and run the programs accordingly?

    Is there a better way of doing this? Are there any clustered schedulers
    available for Windows?
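    To make option 2 concrete, here is a rough sketch of the "first one
    wins" lock. It uses Python and SQLite purely as a stand-in; in SQL
    Server one could use an atomic INSERT or an application lock instead,
    and all the table and job names below are illustrative:

    ```python
    import sqlite3

    def try_acquire(conn, job_name, run_date):
        """Attempt to claim this run of the job. The UNIQUE constraint
        guarantees that only one scheduler instance succeeds."""
        try:
            conn.execute(
                "INSERT INTO job_lock (job_name, run_date) VALUES (?, ?)",
                (job_name, run_date),
            )
            conn.commit()
            return True           # we won the race: run the job
        except sqlite3.IntegrityError:
            return False          # another machine has it: just exit

    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE job_lock (job_name TEXT, run_date TEXT, "
        "UNIQUE (job_name, run_date))"
    )

    # Two machines wake up for the same run; only the first succeeds.
    first = try_acquire(conn, "nightly_report", "2006-07-15")
    second = try_acquire(conn, "nightly_report", "2006-07-15")
    print(first, second)  # True False
    ```

    The point is that the database, not the schedulers, arbitrates the
    race, so adding or removing machines needs no coordination.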



    Saturday, July 15, 2006 7:54 PM


All replies

  • You may want to look at Grid/utility computing

    e.g.  or



    Sunday, July 16, 2006 6:42 PM
  • I checked the same with some Java architects. They typically use one of the Java schedulers (Flux or Quartz). The advantage is that you need to change the schedule setting only once and it takes effect on all machines. (Otherwise, modifications have to be made on every machine.)
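    For what it's worth, the cron-style settings these schedulers read can be matched with very little code. A minimal, illustrative sketch in Python (five-field expressions with `*`, numbers, and comma lists only; this is not any real library's API):

    ```python
    from datetime import datetime

    def field_matches(field, value):
        """Match one cron field: '*', a number, or a list like '0,30'."""
        if field == "*":
            return True
        return value in {int(part) for part in field.split(",")}

    def cron_matches(expr, when):
        """expr: 'minute hour day month weekday' (cron: Sunday = 0)."""
        minute, hour, day, month, weekday = expr.split()
        return (field_matches(minute, when.minute)
                and field_matches(hour, when.hour)
                and field_matches(day, when.day)
                and field_matches(month, when.month)
                and field_matches(weekday, (when.weekday() + 1) % 7))

    # "30 2 * * *" means every day at 02:30.
    t = datetime(2006, 7, 17, 2, 30)   # a Monday
    print(cron_matches("30 2 * * *", t))   # True
    print(cron_matches("0 2 * * *", t))    # False
    ```

    A service could evaluate this once a minute against the stored schedule string, which keeps the user-facing setting in one place (the database) rather than on every machine.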


    Monday, July 17, 2006 2:39 PM
  • I know of Flux (it is also used for workflow and BPM). Since it also supports web services, you can probably use it with .NET as well.

    The reason I mentioned grid computing was that you also wanted "the same set of software to run on all machines - so that if one machine goes down, the other is able to run the scheduled job."



    Tuesday, July 18, 2006 7:53 AM
  • Hi Arnon,

    The key requirement here is not distribution. The key requirement is a clean architecture for Windows applications running on web farms, where we expect everything to be clustered.

    Grid computing might be overkill, but it is an interesting area to explore. I have now learned that if I have a cluster OS (either Windows or Veritas providing it), then "failed-over" scheduled jobs are available.

    I have yet to find a .NET scheduler that is cluster aware. Is anyone in the community aware of one?

    Wednesday, July 19, 2006 3:44 PM
  • You can take any service and run it in a cluster as a "Generic Service" to make it fail over when the cluster node fails.

    You can try any .NET scheduler (e.g. ) and see how it behaves when run as such (a generic service).

    Again (last time, I promise :) ), note that grids provide a more comprehensive solution - see, for example, the following blurb on the Microsoft "compute cluster" scheduler:


    Windows Compute Cluster Server 2003 includes both a command-line job scheduler and the Compute Cluster Manager that let users schedule jobs, allocate resources needed for the job, and change the tasks and properties associated with the job.

    The CLI supports a variety of languages, including Perl, Fortran, C/C++, C#, and Java. Jobs can be single task or multiple tasks and can specify the number of processors required for the job and whether those processors are needed exclusively or can be shared with other jobs/tasks.

    The important distinguishing features of the scheduler include:

    ·         Error Recovery. This feature provides automatic retry of failed tasks and jobs and automatic routing around unresponsive nodes. Automatic detection of nodes that become responsive is also provided.

    ·         Automated Cleanup. Each process associated with a job or task is tracked and proactively shut down on all compute nodes at the conclusion of the job or task, preventing “run away” processes on the compute nodes.

    ·         Security. Each job or task runs in the context of the submitting user and maintains security throughout the process.




    Wednesday, July 19, 2006 6:52 PM
  • Hi Panshu,


    I’m going to rephrase your problem just to make sure that I understand it and to highlight the points that I think are relevant.


    What you need and what you have:


    1. You need one instance of a job to execute at specific times.

    2. The job MUST execute.

    3. You have multiple machines (compute resources) available to execute the job, with the job software installed on each one.

    4. You only want one instance of the job to run at the scheduled time.




    You have scheduling software installed on each compute node, set up to run this one job.


    1. You are trying to make sure that only one compute node really gets to run the job.

    2. You have no way to guarantee that the job gets executed.


    Panshu, I see your approach as a bottom-up solution. You are coming at this problem from the point of view of trying to control and coordinate the compute nodes. I think this architecture will be hard to maintain and control. If you flip the architecture upside down and come at it with a top-down solution, I think the problem will be simplified.


    Let’s say you solved this problem with the Digipede Network, which provides a top-down solution. You would have the Windows Scheduler running somewhere; it would act as a job-submission client. At the specified time, the Windows Scheduler submits a job to the Digipede Server. Each compute node on your grid runs a Digipede Agent that periodically checks in with the Digipede Server to see if there is any work to be done. The Digipede Agent running on the compute node decides whether it can do the work. This means that if you lose a machine on your grid, another machine will pick up the work.


    Let’s just say the job was started on one machine and, for some reason, that machine fails. Because the Digipede Server acts like a traffic cop, it will notice that the job is taking longer than expected and will put it back into the queue for another compute node to pick up. Thus your job gets executed even if a compute node fails.
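    Roughly, that "traffic cop" behaviour amounts to a lease-based queue: a job handed to an agent is requeued if the agent does not report completion within a timeout. An illustrative Python sketch (all names are made up for illustration; this is not the actual Digipede API):

    ```python
    import time

    class JobQueue:
        """Toy dispatcher: hands a job to one agent and requeues it if
        the agent does not complete it within a timeout (a lease)."""

        def __init__(self, timeout_s):
            self.timeout_s = timeout_s
            self.pending = []      # jobs waiting for an agent
            self.leased = {}       # job -> time the lease started

        def submit(self, job):
            self.pending.append(job)

        def checkout(self, now=None):
            """Called by an agent polling for work."""
            now = now if now is not None else time.time()
            self._requeue_expired(now)
            if not self.pending:
                return None
            job = self.pending.pop(0)
            self.leased[job] = now
            return job

        def complete(self, job):
            self.leased.pop(job, None)

        def _requeue_expired(self, now):
            for job, started in list(self.leased.items()):
                if now - started > self.timeout_s:
                    del self.leased[job]       # node presumed dead
                    self.pending.append(job)   # another agent can take it

    q = JobQueue(timeout_s=60)
    q.submit("nightly_report")
    job = q.checkout(now=0)        # node A takes the job... then fails
    later = q.checkout(now=120)    # node B polls after the lease expired
    print(job, later)  # nightly_report nightly_report
    ```

    The guarantee comes from the server owning the queue, not from the nodes coordinating with each other.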


    If you don’t want to worry about the plumbing, or about figuring out how to guarantee the completion of the job, then a grid solution may well not be overkill. It may in fact be the cleanest and most reliable solution.


    I am a Digipede employee and I’ve worked with the system a lot. From the information you provided, I think that the Digipede Network provides a cleaner and more maintainable solution. There is no cost to try it out, and it should be very straightforward for you to test this scenario; it shouldn’t take you more than an hour. Just go to and request the Developer Edition - it’s free. I also recommend reading our white paper on using CCS and Digipede together.


    When I first read this post, I didn’t think of it as a grid problem, because I thought that you wanted to schedule work to run on EACH compute node. That type of problem is better served by a batch scheduler, of which I don’t know any to recommend. Thanks, Arnon, for suggesting a grid - that caused me to look at this in more depth.


    Kim Greenlee

    Wednesday, July 19, 2006 7:53 PM
  • Thanks, Arnon, for persisting with the grid computing option. I am trying to get used to the idea that it is for real. Thanks, Kim, for reconfirming the faith.

    I looked at the Digipede overview, and it seems to provide real and simple solutions.

    I used to think that grid computing would involve distributing classes on the fly and a complex runtime, eliminating the need for installutil etc. I participated in an experimental project some 8 years back, but the restrictions imposed on the client object were so many that it was good only for "self-contained" classes, without any external dependency - like interfaces, databases, etc. However, that would be a topic for another post.

    Wednesday, July 19, 2006 8:49 PM