Continuous webjob keeps running at 5 min intervals

  • Question

  • Hi

    I am running a continuous webjob service that a web application uses for background work. All new jobs are placed on a queue in a blob storage account, and the webjob polls that queue.

    It had been working fine until a couple of days ago, when it suddenly started re-running old jobs. The job failed or did not finish. That's fine: I'm in mid-development, so it's to be expected. The problem is that the webjob keeps trying to run the failed job again and again. This is not the immediate retry, where it runs 5 times. For several days now it has started a rerun every 5 minutes. (Actually I think it's every 10 minutes, but with two different failed jobs.)

    The queue feeding the webjob is empty, but the poison queue keeps filling up with every failed attempt.

    To stop the madness I've tried republishing the application and the webjob from Visual Studio, stopping and restarting the webjob and the application in Azure, and deleting the webjob and republishing it. No luck.

    Any thoughts on what could be causing this and/or how I can stop it?

    This is what the webjob log looks like:

    (I tried to include an image of the log, but wasn't allowed to. The log clearly shows that the webjob runs every 5 minutes and either fails or never finishes.)

    Wednesday, January 25, 2017 2:37 PM

Answers

  • So you have a queue triggered function with [QueueTrigger] attribute applied to the trigger parameter? Can you share some of your function code?

    The queue trigger works by polling the queue, and when it finds a message it invokes your function. Before it dispatches to your function, it sets a "visibility timeout" on the message for a default period of 10 minutes. That makes the message invisible in the queue to any other consumers while the invocation is working on it. If your function completes successfully, the message is removed from the queue. If your function fails, the message is released back to the queue for immediate reprocessing. However, if something brings the JobHost down before it can run this logic (e.g. the host crashes, or you kill the host during development), the message will appear in the queue again (because it hasn't been successfully processed) once the 10-minute visibility timeout expires. You can see the code for all this here.

    It sounds like this might be what is happening to you. Is your host crashing, or are you otherwise killing it during development while it is processing? The 10-minute magic number you mentioned initially seems to point to this.


    Mathew Charles [MSFT]


    Friday, January 27, 2017 3:34 AM
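To make the message lifecycle described in the answer above concrete, here is a small in-memory model of it. This is a sketch under stated assumptions, not the WebJobs SDK (which is .NET) and not a client for Azure Storage. The names `SimulatedQueue`, `poll_once`, and the outcome strings `"ok"`, `"fail"`, and `"crash"` are hypothetical. The 10-minute visibility timeout comes from the answer, and the 5-attempt limit before a message goes to the poison queue is assumed from the "runs 5 times" retry mentioned in the question.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Assumed defaults (from the thread): 10-minute visibility timeout and
# 5 attempts before a failing message is moved to the poison queue.
VISIBILITY_TIMEOUT = 10 * 60  # seconds
MAX_DEQUEUE_COUNT = 5


@dataclass(eq=False)  # compare messages by identity, not by field values
class Message:
    body: str
    dequeue_count: int = 0
    visible_at: float = 0.0  # simulated time at which the message is visible again


@dataclass
class SimulatedQueue:
    messages: List[Message] = field(default_factory=list)
    poison: List[Message] = field(default_factory=list)

    def next_visible(self, now: float) -> Optional[Message]:
        for msg in self.messages:
            if msg.visible_at <= now:
                return msg
        return None


def poll_once(queue: SimulatedQueue, now: float, outcome: str) -> Optional[Message]:
    """Dequeue one visible message and apply the outcome: 'ok', 'fail' or 'crash'."""
    msg = queue.next_visible(now)
    if msg is None:
        return None  # the queue looks empty: nothing is visible right now
    msg.dequeue_count += 1
    # Hide the message from other consumers while the function runs.
    msg.visible_at = now + VISIBILITY_TIMEOUT
    if outcome == "ok":
        queue.messages.remove(msg)  # success: the message is deleted
    elif outcome == "fail":
        if msg.dequeue_count >= MAX_DEQUEUE_COUNT:
            queue.messages.remove(msg)  # out of retries: move to the poison queue
            queue.poison.append(msg)
        else:
            msg.visible_at = now  # released for an immediate retry
    # 'crash': the host dies before any cleanup runs, so the message stays
    # invisible and silently reappears once the visibility timeout expires.
    return msg


if __name__ == "__main__":
    q = SimulatedQueue([Message("job-1")])
    poll_once(q, now=0, outcome="crash")           # host killed mid-invocation
    print(poll_once(q, now=300, outcome="ok"))     # None: message still invisible
    msg = poll_once(q, now=600, outcome="fail")    # the "old" job runs again
    print(msg.dequeue_count)                       # 2
```

The `crash` branch shows the behavior from the question: while the message is invisible, the queue appears empty, and then 10 minutes later the message comes back with no new enqueue.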

All replies

  • When you say "queue in a blob storage account" are you saying that your job is a QueueTrigger or BlobTrigger function?

    Mathew Charles [MSFT]

    Wednesday, January 25, 2017 8:06 PM
    As this is a bit new to me, I'm not entirely sure... But I guess it's QueueTriggered. The webjob polls a queue on an Azure Storage account, and whenever it finds a new message on the queue it does its thing.
    Thursday, January 26, 2017 3:52 PM
  • Yes, that sounds exactly like my problem, spot on actually.

    I was not aware that messages are put back on the queue (made visible again) if the function fails. This is not really how I want it to work in this instance. I guess I could find a way to delete the message whenever something goes wrong, or handle it in some other manner.

    The function fails in some cases during testing. It's supposed to process user-supplied Excel files, and sometimes these files are missing vital information or the format is off. These cases will be handled, but it was strange and a bit frustrating when the message seemingly spontaneously kept popping back up on the queue. Now I know why.

    Thank you very much sir.

    Friday, January 27, 2017 7:26 AM
  • Generally, most people want to ensure that all messages are successfully processed at least once. That's why it works the way it does.

    There are options if you want to change this behavior. For example, in your function you can identify your error cases and swallow the errors in those cases (i.e. return successfully from the function) when you want the message to be deleted.


    Mathew Charles [MSFT]

    Saturday, January 28, 2017 12:26 AM
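The "swallow known errors" option from the last reply can be sketched as follows. In a real C# WebJobs function, this would be a try/catch inside the `[QueueTrigger]` method. The Python below only models the idea. `InvalidWorkbookError`, `process_workbook`, and `handle_message` are hypothetical names, and the rule that a file is valid only if it ends in `.xlsx` is a placeholder for real validation of the user's Excel files.

```python
from typing import Callable, List, Tuple


class InvalidWorkbookError(Exception):
    """Hypothetical: the uploaded Excel file is missing data or badly formatted."""


def process_workbook(name: str) -> str:
    # Hypothetical stand-in for the real parsing/validation logic.
    if not name.endswith(".xlsx"):
        raise InvalidWorkbookError(f"unsupported or malformed file: {name}")
    return f"processed {name}"


def handle_message(
    name: str,
    process: Callable[[str], str],
    rejected: List[Tuple[str, str]],
) -> None:
    """Body of a queue-triggered function (modeled).

    Returning normally tells the host the message was handled, so it is deleted.
    Raising tells the host to release the message for another attempt.
    """
    try:
        process(name)
    except InvalidWorkbookError as err:
        # Permanent, expected failure: retrying will not fix a bad file, so
        # record it (e.g. to report back to the user) and swallow the error.
        rejected.append((name, str(err)))
    # Any other exception (network, storage, bugs) propagates and is retried.
```

Only errors known to be permanent are swallowed. Transient failures still raise, so they keep the at-least-once retry and poison-queue behavior described earlier in the thread.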