SQL Agent jobs hanging

  • Question

  • Periodically, one or two of my SQL jobs (which run DTS packages and are scheduled to run daily) will hang. Notifications are set up to send email to certain personnel and to write to the event log if the job fails.

    The problem is that, because the job hangs rather than fails, no notification is sent, and it sometimes takes days before anyone notices that the job is no longer running. Once it is discovered, I stop the job, and it then resumes running on its daily schedule without problems.

    How do I set up a job notification for a hung job? Or better yet, is there a way to kill the job after it runs for a set period of time, as I can with a scheduled task?

    Any help will be appreciated.

    • Moved by Lukasz Pawlowski -- MS (Microsoft employee) Friday, February 19, 2010 5:07 PM: SQL Agent and SSIS are not part of Notification Services, moving where an expert might be able to help. (From: SQL Server Notification Services)
    Wednesday, May 9, 2007 4:27 PM

Answers

  • We resolved it. First, the background: the client had not provided us with the administrator ID for the server. We had a separate local ID which was given admin rights. Steps that we took to resolve it:

    1. Rebooted the server (worked for a few days, but then back to the same issue)
    2. Ensured enough disk space on the local drives
    3. Shrank the mdf and ldf files of tempdb and msdb
    4. Increased the maxsize of msdb to unlimited from a previously set value
    5. Set the SQL Server Agent service to run as the user ID that we had, instead of the local system account
    6. Gave complete permissions of all groups to our particular Windows user ID (not the best approach)
    Monday, January 31, 2011 5:26 PM

All replies

  • I'm having the same issue in SQL Server 2005 with jobs that execute SSIS packages.  The jobs run fine for a week or so, then I find that four or five (of the ten or so jobs) are hung in "executing" status.  They seem to hang indefinitely (some have been "executing" for hours with no end).  The schedules of the hung jobs are all different, varying from every 10 minutes to nightly.  The packages perform completely different tasks as well.  I can't find any common thread among the jobs that hang, other than that they all execute SSIS packages.

     

    I've tried manually stopping the jobs and restarting the Agent and SQL Server, but the jobs hang again.  The only thing that fixes the issue is rebooting the box, and then the jobs hang again in a week or so.  Could some sort of memory leak be consuming resources throughout the week and eventually causing the jobs to hang?  I just rebooted the box, and the sqlagent90.exe process is currently using about 7 MB of memory.  I'll keep an eye on it.  Any other suggestions?

     

    I've thought of creating another job that stops jobs that are hung, but what's to say that this job won't get hung as well?  Plus this seems like a band-aid fix...

     

    I don't recall having these problems until installing SQL Server 2005 SP2.  Could this be related?  I've searched like crazy and still can't find a resolution to this.  It's becoming a big PITA...

     

    Anyway, any suggestions would be very much appreciated!

    Tuesday, June 26, 2007 5:46 AM
  • We have just had the same thing happen to us with SSIS packages we migrated from DTS. I found a couple of old posts on other sites about this, but never an answer. If anyone finds out anything, please post it. We are still trying to figure it out. We are running the package from a workstation; it copies data from a 2000 database to a 2005 database. We think it may have to do with requiring transactions, and it seems to hang when trying to commit data. We are the only ones logged in to this test database, so no other users are using the table we are copying into.
    Wednesday, June 27, 2007 9:51 PM
  • We have a similar issue. Once in a while, all the jobs that start after a particular time hang indefinitely. We resolve this by restarting SQL Server Agent. So, instead of rebooting the server, try restarting the Agent and see if that fixes it.

     

    We are on SQL Server 2005 SP1. What version & SP are others on? I am hoping eventually when we apply SP2 this would get fixed.

     

    Thanks

    Raj

    Tuesday, July 10, 2007 1:56 PM
  • Ditto,

     

    We have been having this issue as well since installing SP2.  The jobs just seem to go away, and we have many, many jobs.  Could this be linked-server related?

    Thursday, July 12, 2007 5:16 PM
  • I can't remember having this issue prior to installing SP2... No linked servers, either.

     

    Restarting the Agent seems to correct the issue, but only for a short time.  For now I am going to try restarting the Agent each week to see if this helps while I try and find/fix the real issue.  If this doesn't help, I will have to resort to Microsoft paid support.

    Wednesday, July 25, 2007 9:51 PM
  • This is getting ridiculous.

     

    This morning I found that four jobs were hung up in executing status since last night, preventing them from executing per their schedules.  I checked the Application, Security, and System event logs, as well as those of the SQL Server and SQL Agent.  Nothing remarkable.  I see the entries where the jobs begin executing, but nothing else - no errors/warnings pertaining to the jobs in question.

     

    I stopped the four jobs and after about an hour or so, one was hung again (this job executes every 10 minutes).

     

    I read somewhere that the growth of tempdb could be a source of concern, so I bumped up tempdb's data and log file sizes to 500 MB (previously 25 MB) with 20% autogrowth (previously 10%).  This was done last week, so I'm guessing that it did nothing to alleviate the problem. There is about 2 GB of free space on the drive that houses tempdb.  What else can I do to troubleshoot this?

    Tuesday, August 7, 2007 6:03 PM
  • I'm having this issue and have not updated to SP2 yet.  Currently on SP1

    Tuesday, August 14, 2007 5:10 PM
  • How is disk space on the drive where tempdb and msdb reside?  I've increased free space from 2GB to about 5GB and have not had a problem since. *fingers crossed*

     

    My guess is that the tempdb would consume the free space and the agent could no longer write to the log, causing the jobs to hang...we'll see if my assumption holds true.

     

    Rocco

    Tuesday, August 14, 2007 7:49 PM
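
If low disk space on the tempdb/msdb drive is the suspect, a scheduled check can warn before the space runs out. The following is only a minimal Python sketch: the function names and the 5 GB threshold are illustrative assumptions, not anything prescribed by SQL Server.

```python
import shutil

def free_space_gb(path):
    """Return free space, in GiB, on the volume that contains `path`."""
    return shutil.disk_usage(path).free / 1024 ** 3

def is_low_on_space(path, min_free_gb):
    """True when the volume holding `path` has less than `min_free_gb` free."""
    return free_space_gb(path) < min_free_gb
```

For example, `is_low_on_space("D:\\", 5)` could drive an e-mail alert, where `D:\` stands in for whichever drive holds the tempdb and msdb files on your server.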
  • Do you still have this issue? We are struggling with it and want to see if increasing the free space really helped.

     

    Thanks

    Anil

     

    Wednesday, August 29, 2007 2:57 PM
  • I will report the following in hopes that it will help others with similar issues.

     

    We have some legacy jobs that run as scheduled SQL Server jobs (some daily, some monthly, etc.) executing DTS's which are typically composed of .BAT files that run various legacy programs.  Many of these jobs / packages are not even accessing SQL DB's; they are processing text files or MF COBOL indexed files, etc.

     

    As some of these files are on other servers (some being remote from the SQL server running the job), the batch files have "NET USE" commands in them that either DELETE (clear) any existing mapping / connection or establish a new mapping / connection in order to run a program.

     

    These SQL jobs began to mysteriously hang for no apparent reason.  They would hang when run as a SQL Server Agent "Job" (regardless of whether run on the automatic schedule or by manual initiation).  Once this problem started, they would hang every time.  But they would NOT hang if manually run as a DTS Local Package at the DTS Local Package level.

     

    It appears that we may have solved the problem by inserting "wait" time delay commands at various points in the batch files - either preceding, in between, or following the "NET USE" commands and the programs that preceded or followed them - such that there is a bit of extra time delay between establishing a connection with a NET USE command and the program which follows it (and which will try to access the [remote] file), and/or also between the program that creates and then closes/releases a file and a NET USE that then may try to DELETE the connection.

     

    Here is the "wait" command / method we used:

     

    PING 1.1.1.1 -n 1 -w 60000

     

    And here is the explanation and credit:

     

    http://www.robvanderwoude.com/wait.html

     

    Please post a reply if this helps you and solves your problem. 

    Friday, September 21, 2007 5:24 PM
  • Follow up to above:

     

    For some reason (change to network, routers, etc.), it is now immediately rejecting the PING 1.1.1.1, after working fine for several months.  I changed it to PING 1.0.0.1 and it is working again (performing a "wait").

     

    LegacySupport

     

    Thursday, December 20, 2007 6:27 PM
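
A fixed PING delay depends on the address staying unreachable, as the follow-up above shows. One alternative, which is not what the poster did, is to poll until the remote path is actually reachable and give up after a timeout. A minimal Python sketch, with a hypothetical function name and default values:

```python
import os
import time

def wait_for_path(path, timeout_s=60.0, poll_s=1.0):
    """Poll until `path` exists; return True if it appeared, False on timeout."""
    deadline = time.monotonic() + timeout_s
    while True:
        if os.path.exists(path):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_s)
```

A script could call `wait_for_path(r"\\server\share\input.dat")` (a made-up UNC path) right after the NET USE step and fail fast with a clear error if it returns False, instead of hanging.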
  • Well, for notifications I can only think of yet another job checking sysjobhistory.

    However, on a different tack, a hanging job is often related to an operating system command meeting a situation which requires a user response. File overwrite warnings are a good example. Some minor changes here can remove the root source of the problems.

    HaggardPete





    Tuesday, January 8, 2008 2:55 PM
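
The "yet another job" idea can be sketched as a small watchdog. This is an assumption-laden Python sketch, not a tested monitoring solution: it assumes SQL Server 2005 or later, where msdb.dbo.sysjobactivity records jobs that are still running (as far as I know, sysjobhistory is only written once a step finishes). The job names in the usage note are made up.

```python
from datetime import datetime, timedelta

# T-SQL a watchdog could run against msdb (SQL Server 2005 and later).
# sysjobactivity keeps one row per job per Agent session; a job that is
# still running has a start_execution_date but no stop_execution_date.
RUNNING_JOBS_SQL = """
SELECT j.name, a.start_execution_date
FROM msdb.dbo.sysjobactivity AS a
JOIN msdb.dbo.sysjobs AS j ON j.job_id = a.job_id
WHERE a.session_id = (SELECT MAX(session_id) FROM msdb.dbo.syssessions)
  AND a.start_execution_date IS NOT NULL
  AND a.stop_execution_date IS NULL;
"""

def find_hung_jobs(running, now, max_runtime):
    """Return (name, runtime) for each running job older than `max_runtime`.

    `running` is an iterable of (job_name, start_time) rows, for example the
    result of RUNNING_JOBS_SQL fetched through whatever driver you use.
    """
    return [(name, now - started)
            for name, started in running
            if now - started > max_runtime]
```

With rows such as `("NightlyLoad", datetime(2007, 8, 6, 23, 0))`, a call like `find_hung_jobs(rows, datetime.now(), timedelta(hours=2))` lists the jobs to report by e-mail or to stop with `EXEC msdb.dbo.sp_stop_job @job_name = N'NightlyLoad'`. Running the watchdog outside SQL Server Agent (for example as a Windows scheduled task) avoids the concern that the watchdog job hangs along with the others.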
  • Has anyone found the answer yet? This is driving me crazy.
    Wednesday, February 10, 2010 4:31 PM
  • Any additional information would be helpful to me, as well!

    Thursday, February 25, 2010 7:29 PM
  • I ran into similar issues during the past week. I suggest looking for a detailed log in the Windows Application event log. If you call cmd.exe from SSIS code, that might be what causes the error.
    Monday, March 15, 2010 4:05 AM
  • Is SQL Server Agent running under the LocalSystem account? I had the same issue, and changing the account that the Agent service runs as solved it. Now I'm trying to find out why it fails when the Agent is started with the LocalSystem account.

    Any suggestions?

    Tuesday, May 18, 2010 2:45 PM
  • True enough HaggardPete that a message box or other pop-up error window displayed when a DTS is run from the scheduler will hang a job.  However with that ruled out, at least from the perspective of running the jobs manually, jobs are still hanging. 

    The other thread I found interesting is from contributor Rocco in regards to space issues in the MSDB.  I shrunk the log file, increased the size by 20% and changed the way it allocates more space.  Jobs began to run again. 

    Initially, hanging jobs were more of an inconvenience that seemed to self-correct when the job was cancelled manually.  However, for the past several weeks it has been a consistent problem, one that requires immediate attention by Microsoft.

    Upon further reflection, my guess is that most jobs hang because of pop-up errors and because of how jobs handle msdb space issues. Since some pop-up errors occur for reasons that cannot be recreated when running the job manually, it would be good if Microsoft provided a way to redirect pop-up information to a log file when jobs are run through the scheduler, since a scheduled job has no way to respond. If some jobs hang because of deadlocks or msdb space issues, that needs to be handled too.

     

     

    Wednesday, June 9, 2010 1:00 PM
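
For console prompts (not GUI message boxes, which this cannot dismiss), a wrapper can make sure the command never waits for keyboard input and that everything it prints ends up in a log file. A minimal Python sketch; the function name and log path are hypothetical:

```python
import subprocess

def run_unattended(cmd, log_path):
    """Run `cmd` with no console input and append its output to `log_path`.

    stdin is connected to the null device, so a console prompt such as
    "Overwrite file? (Y/N)" reads end-of-file instead of waiting forever.
    Returns the process exit code.
    """
    with open(log_path, "a", encoding="utf-8") as log:
        result = subprocess.run(cmd, stdin=subprocess.DEVNULL,
                                stdout=log, stderr=subprocess.STDOUT)
    return result.returncode
```

A job step would then call something like `run_unattended(["legacy.bat"], r"C:\logs\legacy.log")`, where both names are placeholders, and treat a non-zero return value as a failure so that the normal failure notification fires.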
  • I don't think the marked answer here is the answer to our problems...

    I am having this problem with our SQL Server 2005 SP3 as well. SQL Server Agent shows the jobs as executing, but the job history has stopped: for a job scheduled to run daily, if today is 10/4/2012, the last history entry is from 9/26/2012. So the question is, why does SQL Server Agent suddenly hang?

    Thursday, October 4, 2012 1:06 AM
  • Use the "call" command, and then for the wait time use the ping command, just as LegacySupport mentioned above.

    So for your program to work, you would do:

    step n:

    call "programname.exe" /anyfollowingcommand

    ping 1.0.0.1 -n 1 -w 900000 (15 minutes, for example; -w is in milliseconds)

    When using "call", the program is started and the next step in the job runs without waiting for it to return a confirmation or an error. If you know roughly how long the program will run, you can set the ping timeout after -w accordingly.

    What I noticed is that the program in question still runs in the background, so you might want to add a line to kill the service of the program before going any further.

    Hope this helps.

    • Proposed as answer by Block800 Tuesday, December 18, 2012 4:25 PM
    Tuesday, December 18, 2012 4:24 PM
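
If the goal is to stop a program that runs past a set time, rather than to wait a fixed time and move on, a wrapper with a hard timeout is another option. This is a minimal Python sketch and not the poster's approach; the function name is hypothetical.

```python
import subprocess

def run_with_timeout(cmd, timeout_s):
    """Run `cmd`; return its exit code, or None if it was killed on timeout.

    subprocess.run() kills the child process and re-raises TimeoutExpired
    when the timeout elapses, so the child itself is not left running.
    """
    try:
        return subprocess.run(cmd, timeout=timeout_s).returncode
    except subprocess.TimeoutExpired:
        return None
```

For example, `run_with_timeout(["programname.exe", "/function"], 900)` allows 15 minutes. Note that processes started by the program itself may survive the kill, so those may still need separate cleanup.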
  • It still hung in CmdExec, so I used PowerShell instead. The syntax goes like this:

    & 'c:\programname.exe' /function

    No more hanging.

    Tuesday, December 18, 2012 5:18 PM
  • I'm having this exact same issue.  Is this still an issue in SQL2008? 

    I'm not sure how to fix this. The job I'm running executes an SSIS package, which in turn executes many other SSIS packages.  I'm not a DBA, so I don't know much about PowerShell or how to go about running the job through that instead of through the server jobs.

    Any help would be greatly appreciated.

    Friday, March 15, 2013 5:26 PM