BizTalk stops processing messages

  • Question

  • About 100 000 messages have been loaded for processing in a BizTalk server. It processed over 80 000 in 2 days but then stopped doing anything. Performance counters show no throttling, and there are no errors or warnings in the logs. Restarting the hosts doesn't help.

    Messages are stuck in the "queued" state and there are a few orchestrations in the active state (but their messages are still "queued"). If I suspend and then resume one of these orchestration instances, BizTalk resumes processing all messages for a while. A few thousand messages later it gets stuck again.

    I don't understand what it could be.

    Monday, February 6, 2012 1:50 PM

Answers

All replies

  • This sounds awfully like throttling.  Are you checking all throttling counters on all hosts?

    What does your OS for your app servers and database servers say for resources?  Is there any memory or CPU pressure?


    If this is helpful or answers your question - please mark accordingly.
    Because I get points for it which gives my life purpose (also, it helps other people find answers quickly)
    Monday, February 6, 2012 2:05 PM
  • To start with, can you please share the values of the following performance counters (since all the other counter values look healthy):

    BizTalk: Message Agent/Process Memory Usage and Process Memory Usage Threshold

    • A few things you can do: check how many connections there are on SQL Server by running the sp_who command in SQL Server Query Analyzer. Also check the result for any blocked SPID (look for a non-zero value in the blk column of the result pane). If there is one, find out which SPID is blocking and run kill <<SPID VALUE>> (please take extra care and check what this SPID is doing before running kill in production).
    • Another thing you can try is restarting the WMI service on the servers where the orchestration host and the send port (send handler host) run.
    • Also check that there is sufficient disk space available on your BizTalk SQL Server.
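
    The blocking check described above can be sketched in Python. This is only an illustration: the rows are assumed to be dictionaries carrying the spid and blk columns of sp_who output, the sample data is made up, and the helper names find_blocked and root_blockers are hypothetical.

```python
# Hypothetical helpers: the row layout and the sample rows are assumptions,
# not real sp_who output from this environment.

def find_blocked(rows):
    """Return (blocked_spid, blocking_spid) pairs for rows with a non-zero blk."""
    pairs = []
    for row in rows:
        # blk is a padded character column in sp_who, so strip before converting.
        blk = int(str(row["blk"]).strip() or 0)
        if blk != 0:
            pairs.append((int(row["spid"]), blk))
    return pairs


def root_blockers(rows):
    """SPIDs that block other sessions but are not blocked themselves."""
    pairs = find_blocked(rows)
    blocked = {spid for spid, _ in pairs}
    blockers = {blk for _, blk in pairs}
    return sorted(blockers - blocked)


sample = [
    {"spid": 51, "status": "sleeping", "blk": "0"},
    {"spid": 52, "status": "suspended", "blk": "51"},
    {"spid": 53, "status": "suspended", "blk": "52"},
]

print(find_blocked(sample))
print(root_blockers(sample))
```

    The session at the head of a blocking chain is the one worth investigating first; whether it is safe to kill still has to be decided by a person.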

    Please mark the post answered your question as answer, and mark other helpful posts as helpful, it'll help other users who are visiting your thread for the similar problem,
    • Edited by Avi08 Monday, February 6, 2012 2:26 PM
    Monday, February 6, 2012 2:12 PM
  • "Message delivery throttling state" and "Message publishing  throttling state" counters are 0 for all hosts.

    The BizTalk server has 16 GB of RAM. BizTalk is 32-bit only, but it uses at most 200 MB per process, so that should be OK. Average CPU load is about 40%.

    The SQL server also has 16 GB, of which only 3 GB are used. Its CPU load is 100% now, while BizTalk is working.



    • Edited by Mutari Monday, February 6, 2012 2:19 PM
    Monday, February 6, 2012 2:18 PM
  • Process Memory Usage Threshold is varying from 500 to 1000.  Process Memory Usage - from 73 to 422. 

    There are 120 connections, but no blocking. It's still working right now, though, so I guess I should look for blocking again when it gets stuck.

    There is plenty of free space on the servers, nearly 20 GB on each.

    I'm not sure I understand your point about the WMI services: how can they affect BizTalk processing? Also, it seems that ports are not involved here: there are a few hosts for different tasks, and the stuck instances belong to the "middle" hosts.

    Monday, February 6, 2012 2:34 PM
  • Hi Mutari,

    Process Memory Usage Threshold should not fluctuate; it should be a constant value. Anyway, actual usage looks within the limit, so that won't impact processing. Keep a watch on SQL blocking: it can cause this kind of weird issue, as I have encountered before.

    On WMI, can you briefly explain what the orchestration does?


    • Edited by Avi08 Monday, February 6, 2012 3:09 PM
    Monday, February 6, 2012 2:42 PM
  • One orchestration is a sequential convoy: it receives messages based on the InternalID custom context property. Through a .NET component it makes some calls to a DB located on the same SQL server as the BizTalk DBs. Basically it writes a log entry to one table and queries/writes the message ID in another. The component is a static class where every method looks like: lock(syncRoot) { return something from DB / write something to DB }. After all that work is done, it submits the message through a direct binding on a req-resp port, with correlation, to the MessageBox for another orchestration.

    One of the other orchestrations picks up the message by its type. It makes calls to the Axapta .NET Business Connector and to a WCF send port bound to CRM. After the message is processed, a response message is sent with the original message ID to correlate back to the first orchestration.
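
    A static class that wraps every DB call in the same lock lets only one caller into the database at a time, however many orchestration instances are running. The Python sketch below is an analogy, not the actual .NET component: sync_root stands in for the static syncRoot object, time.sleep stands in for the real query, and the worker counts are arbitrary.

```python
import threading
import time

sync_root = threading.Lock()    # stands in for the static syncRoot object
state_lock = threading.Lock()   # protects the two counters below
in_flight = 0
max_in_flight = 0


def db_call(delay=0.01):
    """Simulated DB round trip performed while holding the global lock."""
    global in_flight, max_in_flight
    with sync_root:
        with state_lock:
            in_flight += 1
            max_in_flight = max(max_in_flight, in_flight)
        time.sleep(delay)  # placeholder for the real query or write
        with state_lock:
            in_flight -= 1


def worker(calls):
    for _ in range(calls):
        db_call()


def run(workers=5, calls_per_worker=3):
    """Start the workers and return (peak concurrency, elapsed seconds)."""
    threads = [threading.Thread(target=worker, args=(calls_per_worker,))
               for _ in range(workers)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return max_in_flight, time.perf_counter() - start


peak, elapsed = run()
print(peak, round(elapsed, 2))
```

    Peak concurrency inside the lock never exceeds one, so total time grows with the number of calls rather than with the slowest worker. If a single call under the lock stalls, every other caller waits behind it.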

    Monday, February 6, 2012 3:01 PM
  • I would suggest keeping a watch on SQL locking when this issue resurfaces, and sharing your results.
    • Edited by Avi08 Monday, February 6, 2012 3:10 PM
    Monday, February 6, 2012 3:08 PM
  • It may be worth checking where things are seizing up.  Perhaps it's some blocking code (like the lock or such) which puts all the Orchestrations in a waiting or dead-lock state.

    Do you have any tracing through your .NET code?


    Monday, February 6, 2012 3:18 PM
  • This can happen if your BizTalk cleanup jobs are not running and the MessageBox DB is getting full. Please check that all SQL jobs on your SQL server are running, especially the MessageBox cleanup job.

    If the jobs are not started, start them and then restart the BizTalk host instances. You can also check the available space in the MessageBox DB by looking at its properties. When you start the SQL Agent jobs you will notice that the available space increases.

    Hope that was helpful.

     

    Mazin


    Regards, Mazin - MCTS BizTalk Server 2006
    Monday, February 6, 2012 3:36 PM
  • I would recommend setting up PAL on both the SQL servers and the BTS servers, so you can read out the results when you have the issue.

    Best regards

    Tord Glad Nordahl
    Bouvet ASA, Norway
    http://www.BizTalkAdmin.com | @tordeman

    Please indicate ”Mark as Answer” if this post has answered the question.

    Monday, February 6, 2012 10:22 PM
  • Hi,

    I agree with Tord about PAL. I would suggest just one more thing: before running PAL, run the Best Practices Analyzer and MessageBox Viewer.

    Let us know the observations once you run these tools.


    Thanks With Regards,
    Shailesh Kawade
    MCTS BizTalk Server
    Please Mark This As Answer If This Helps You.
    http://shaileshbiztalk.blogspot.com/

    Tuesday, February 7, 2012 4:08 AM
  • As I understand it, this situation is in production, so using PAL or MBV is problematic.

    I had such problems when there was a big payload and several instance subscriptions (convoys). BizTalk works suboptimally in these cases. In several cases the stuck instances started working again after several hours of idling.

    One trick that was very helpful: setting Ordered Delivery on the send ports.

    Tuning the throttling settings helped very little.


    Leonid Ganeline [BizTalk MVP] BizTalkien: Advanced Questions: have fun - test your knowledge

    Tuesday, February 7, 2012 9:03 PM
    Moderator
  • It got stuck again over the weekend. The convoy orchestrations just don't want to publish messages.

    I used MBV tool and got 3 critical warnings: 

    DTA Tables

    DTA Orphaned Instances (Incompleted Instances in DTA but not in Msgbox)

    97594 (Large number can impact DTA Size and so perfs) - one possible cause is described in KB 978796 - Contact Microsoft CSS for more info !!

    BizTalk Jobs

    BizTalk Job 'Backup BizTalk Server (BizTalkMgmtDb)'

    Disabled + No History Found (Job to backup BizTalk DBs - SQL Tn log can increase very fast if not running frequently) ! !!

    Servers Versions

    Windows 2008 installed on VM-CRM-BIZTALK

    BizTalk Servers with version < 2009 are not supported on Windows 2008 !!

    I deleted the orphaned instances, but nothing changed. The MsgBox DB size is 1.5 GB and the logs are 1.5 GB too; there are no events about the transaction log on the SQL server.

    About the Windows version: can it cause such problems? I don't see why it should. I didn't mention my version before: BizTalk 2006 R2 with CU3.

    I ran sp_who - no locks.

    I'm going to run PAL now.


    • Edited by Mutari Monday, February 13, 2012 9:57 AM
    Monday, February 13, 2012 9:48 AM
  • Hi Mutari,

    Did you check the job "MessageBox_Message_Cleanup_BizTalkMsgBoxDb"?
    Did you find any errors? If yes, can you please post the error?


    Raj, http://rajwebjunky.blogspot.com

    Monday, February 13, 2012 10:13 AM
  • Hi Buddu,

    According to Job Activity Monitor it runs constantly without errors (the job itself is disabled, I assume another one calls it).

    Monday, February 13, 2012 10:20 AM
  • Which version of BizTalk and which OS?

    I had a similar error with BizTalk 2006, which I solved and blogged about here.


    Raj, http://rajwebjunky.blogspot.com

    Monday, February 13, 2012 10:25 AM
  • I'm looking at the PAL report from the SQL server now. There is something strange that I can't understand yet:

    Condition | \LogicalDisk(*)\Free Megabytes | Min | Avg | Max | Hourly Trend | Std Deviation | 10% of Outliers Removed | 20% of Outliers Removed | 30% of Outliers Removed
    Less than 500 MB of free disk space | VM-CRM-BIZTALK-/HarddiskVolume1 | 71 | 71 | 71 | 0 | 0 | 71 | 71 | 71
    OK | VM-CRM-BIZTALK-/_Total | 25,594 | 25,594 | 25,594 | 0 | 0 | 25,594 | 25,594 | 25,594
    OK | VM-CRM-BIZTALK-/C: | 25,523 | 25,523 | 25,523 | 0 | 0 | 25,523 | 25,523 | 25,523

    Monday, February 13, 2012 10:25 AM
  • Hi Mutari,

    It clearly says that a disk on your VM has less than 500 MB free.

    Please ensure that you have plenty of space in your drive.


    Thanks With Regards,
    Shailesh Kawade
    MCTS BizTalk Server
    http://shaileshbiztalk.blogspot.com/

    Monday, February 13, 2012 10:30 AM
  • I looked using Disk Management. There are two partitions: C:, which has 25 GB free, and "System Reserved", which is 100 MB in total and has 70 MB free.
    • Edited by Mutari Monday, February 13, 2012 10:38 AM
    Monday, February 13, 2012 10:32 AM
  • Another thing it says:

    More than 80% of Pool Paged Kernel Memory Used


    • Edited by Mutari Monday, February 13, 2012 11:36 AM
    Monday, February 13, 2012 10:38 AM
  • Okay Mutari,

    First of all, you do have some critical issues with your environment. If this is production, I would ask Microsoft or a consultancy nearby to come and perform a health check of your environment. This is a deep dive, and it should locate any issues with your BizTalk installation.

    I would advise you to do the following:

    Check the performance monitor and locate these counters

    BizTalk : MessageBox: General Counters -> MsgBoxName -> Spool Size

    BizTalk:Message Agent -> Message Delivery Throttling State -> All hosts

    BizTalk:Message Agent -> Message Publishing Throttling State -> All hosts
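
    When reading the two throttling state counters, any non-zero value names a specific cause. The Python sketch below maps the values to their meanings as I recall them from the BizTalk:Message Agent counter documentation; verify them against the documentation for your BizTalk version. The host names and readings are hypothetical.

```python
# State values recalled from the BizTalk:Message Agent counter documentation;
# check them against the docs for your BizTalk version before relying on them.
DELIVERY_STATES = {
    0: "not throttling",
    1: "imbalanced message delivery rate",
    3: "high in-process message count",
    4: "process memory pressure",
    5: "system memory pressure",
    9: "high thread count",
    10: "user override on delivery",
}

PUBLISHING_STATES = {
    0: "not throttling",
    2: "imbalanced message publishing rate",
    4: "process memory pressure",
    5: "system memory pressure",
    6: "database growth",
    8: "high database session count",
    9: "high thread count",
    11: "user override on publishing",
}


def describe(samples):
    """samples maps host name -> (delivery_state, publishing_state)."""
    findings = []
    for host, (delivery, publishing) in sorted(samples.items()):
        if delivery != 0:
            findings.append(
                (host, "delivery", DELIVERY_STATES.get(delivery, "unknown state")))
        if publishing != 0:
            findings.append(
                (host, "publishing", PUBLISHING_STATES.get(publishing, "unknown state")))
    return findings


# Hypothetical host names and readings, for illustration only.
samples = {"ReceiveHost": (0, 0), "ProcessingHost": (4, 0), "SendHost": (0, 6)}
for finding in describe(samples):
    print(finding)
```

    An empty result means every host reported 0 on both counters, which is the situation described earlier in this thread.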

    The BizTalk backup job needs to be run with a few special things in mind (some people make their own, but I advise you to use the regular one). BizTalk sets a transaction mark in the databases to make sure that disaster recovery will work and that the databases are synchronized. How to configure the Backup BizTalk Server Job

    You are also running BizTalk in an unsupported environment; if this is a production environment, move it to a supported platform. Also check the size of all databases: what are they?

    If this isn't a production environment, I would advise you to clean up your MessageBox, install any missing CUs, and try again. You can clean up the MessageBox by performing the following stored procedures. Can you please update this thread with the values from the performance monitor?

    Also take a look at the warnings you got from MessageBox Viewer.

    Also take a look at the following Wiki article:

    BizTalk Health Check

    Best regards

    Tord Glad Nordahl
    Bouvet ASA, Norway
    http://www.BizTalkAdmin.com |@tordeman



    Monday, February 13, 2012 10:40 AM
  • Hi Tord,

    Throttling is 0 and Spool is 26000. There is CU3 (it is latest as far as I know).

    Monday, February 13, 2012 10:54 AM
  • Okay,

    Good to see that your environment is not throttling. You do have a big spool size. Be aware that debatching inside an orchestration will increase the spool size: all messages coming into or going out of BizTalk pass through the spool table. So a spool table that is too big will create delays; however, an amount like that should not cause a full stop.

    Your PAL report said you had too little disk space on the SQL server. Can you improve this? If you run out of disk space, BizTalk will stop processing messages. You should also use tools like SCOM to warn you about this.

    Remember to put the data and log files of the BizTalk databases on separate disks.

    Are you processing messages at the moment or is it still a full stop? Is this production?

    When you say messages are stopped in the queued state, do you mean "Active" or "Ready to run"?

    Best regards

    Tord Glad Nordahl
    Bouvet ASA, Norway
    http://www.BizTalkAdmin.com |@tordeman


    Monday, February 13, 2012 11:44 AM
  • It is the future production environment. It hasn't gone live yet, but this is the exact environment that is supposed to go live. It's not critical that it's stopped now. I want to find the reason so that it doesn't happen when they really start using it.

    Disk C:\ on SQL Server, where database files located, has plenty of free space, it's some small "system partition" that is 100 mb and has 70 mb free.

    BizTalk is not doing anything still.

    Almost all service instances are "Ready to run", but 5 are "Active" (they have messages attached to them in state "Queued (Awaiting processing)"). The picture is exactly the same after hosts restart.


    • Edited by Mutari Monday, February 13, 2012 12:02 PM
    Monday, February 13, 2012 12:00 PM
  • I would then investigate issues between the SQL and BizTalk servers, DTCPing and Pathping to look for network issues and packet loss.

    Messages that are "Ready to run" have not yet been written to the MessageBox entirely, and work on them has not started. Many "Ready to run" messages without any throttling often indicate network-related issues.

    I bet some network gurus at your company can assist you with this troubleshooting.

    Best regards

    Tord Glad Nordahl
    Bouvet ASA, Norway
    http://www.BizTalkAdmin.com |@tordeman


    Monday, February 13, 2012 12:06 PM
  • DTCPing and Pathping work just fine without loss.

    The messages are in the DB; I can see them in [BizTalkMsgBoxDb].[dbo].[BizTalkServerApplication3Q]. If only I knew what its columns mean.

    • Edited by Mutari Monday, February 13, 2012 12:25 PM
    Monday, February 13, 2012 12:25 PM
  • Look what's happening:

    I suspend one of those 5 Active instances through the BizTalk Admin Console and restart the corresponding host. Suddenly plenty of instances (100+) go into the Suspended state with the error "The instance completed without consuming all of its messages". And BizTalk starts processing messages again.

    Does this tell you anything? Is there anything else I can dig into in BizTalk?

    I'm looking into my convoy orchestration. I noticed it has no transaction type; maybe I should set it to long-running? Could that possibly result in such behavior?
    • Edited by Mutari Monday, February 13, 2012 12:41 PM
    Monday, February 13, 2012 12:31 PM
  • Those are called zombies ;)

    This usually happens when an orchestration completes without consuming all of its messages (messages that arrive after the orchestration has finished are suspended with that error message).

    So you are right, your convoy orchestration might be messing up BizTalk.
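
    The zombie condition can be illustrated with a toy model in Python. This is not BizTalk code: the ConvoyInstance class, its batch_size parameter, and the message names are all hypothetical, and real routing happens through MessageBox subscriptions rather than an in-memory list.

```python
class ConvoyInstance:
    """Toy model of one convoy orchestration instance (not BizTalk code)."""

    def __init__(self, batch_size):
        self.batch_size = batch_size  # messages the instance intends to consume
        self.queue = []               # messages already routed to this instance
        self.consumed = []
        self.completed = False

    def deliver(self, message):
        # The subscription matches until the instance completes,
        # so messages keep being routed to it.
        self.queue.append(message)

    def run(self):
        # Consume up to batch_size messages, then finish.
        while self.queue and len(self.consumed) < self.batch_size:
            self.consumed.append(self.queue.pop(0))
        self.completed = True
        # Anything left was routed but never consumed: these are the zombies.
        zombies, self.queue = self.queue, []
        return zombies


instance = ConvoyInstance(batch_size=3)
for n in range(1, 6):
    instance.deliver(f"msg-{n}")

zombies = instance.run()
print(instance.consumed)
print(zombies)
```

    In this model the two messages left in the queue when the instance finishes correspond to the instances suspended with "The instance completed without consuming all of its messages".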

    Best regards

    Tord Glad Nordahl
    Bouvet ASA, Norway
    http://www.BizTalkAdmin.com |@tordeman


    Monday, February 13, 2012 1:36 PM
  • Zombies:

    see these articles

    BizTalk: Instance Subscription and Convoys: Details

    BizTalk: Suspend shape and Convoy

    Don't count on Microsoft helping you. You are working on an unsupported configuration, which could definitely limit Microsoft's ability to reproduce your problem.

    Possibly you have an error in your workflow that only appears in rare circumstances, for example with some combinations of instance subscriptions, which in turn could occur when many thousands of messages are waiting in the queue. I'm not saying this is your case; I'm saying it is one of the possible cases.


    Leonid Ganeline [BizTalk MVP] BizTalkien: Advanced Questions: have fun - test your knowledge

    Monday, February 13, 2012 4:57 PM
    Moderator
  • I still don't get it. Of course I get zombies if I manually stop a convoy. But until I intervene I don't get any zombie-related errors. And even if I did get zombies, that's no reason to stop processing all other messages. But it just stops.

    I see the same pattern every time:

    • 5 stuck active instances
    • I suspend one of them, restart the host, resume it, and processing of all messages resumes

    There has to be something in it. Why always 5? Why does resuming one message trigger processing of all messages?

    Thursday, February 16, 2012 2:59 PM
  • It sounds like you have some application error. To rule out the possibility that it is environment related, create a new application. Make it loop on a file share with simple pass-through messages (an XML file with a few lines of dummy text so it has some size). Disable your old application.

    Let the application run for 10 minutes or so and see if BizTalk stops. If it doesn't, the problem is somewhere in your application(s).

    It can be:

    • Network-related errors
    • Developed bugs ;)
    • timeouts from other systems
    • etc

    Try moving the application you have problem with to another environment where you know it works, and try to verify it there.

    Best of luck!

    Best regards

    Tord Glad Nordahl
    Bouvet ASA, Norway
    http://www.BizTalkAdmin.com |@tordeman


    Friday, February 17, 2012 7:31 AM
  • Hi Mutari,

    A few suggestions on this issue:

    1) In the discussion above you reported a couple of issues highlighted in the PAL reports for the SQL server and the BizTalk server. Did you fix them? If not, I would suggest focusing on fixing them. Issues are often interlinked, and fixing these might solve your main issue as well.

    2) I would also suggest running the Best Practices Analyzer and looking at the results.

    3) You point out that after a host instance restart these messages are resumed and everything is fine after that. Restarting the host instance means that memory and threads are released.

    I would suggest trying the following:
    Increase the thread count: refer to the section "Define CLR hosting thread values for BizTalk host instances" at the link below
    http://msdn.microsoft.com/en-us/library/dd722826(v=bts.10).aspx

    Increase the allocated host memory: this can be done from the host settings, under Advanced Options -> Throttling settings -> Process memory usage. The default value is 25%; you can increase it to a higher value, say 50%, and check the results.

    Let us know the results.


    Thanks With Regards,
    Shailesh Kawade
    MCTS BizTalk Server
    http://shaileshbiztalk.blogspot.com/

    Friday, February 17, 2012 8:58 AM
  • Shailesh, as he states, he has no throttling issues, so changing the thresholds won't help him here; he is better off leaving the thresholds as they are. I suspect the application itself and not the BizTalk setup.

    Best regards

    Tord Glad Nordahl
    Bouvet ASA, Norway
    http://www.BizTalkAdmin.com |@tordeman


    Friday, February 17, 2012 9:07 AM
  • Hi Tord,

    If there is no throttling then the memory setting should not be changed, but I think thread starvation issues are sometimes not reported directly by throttling.

    I agree that a code review also has to be done. But I assume that has been done or started already; my suggestions are on top of that, in case he does not find any issues with the code.

    I would also emphasize fixing the issues reported by PAL on both servers (BizTalk and SQL) and running the BPA.


    Thanks With Regards,
    Shailesh Kawade
    MCTS BizTalk Server
    http://shaileshbiztalk.blogspot.com/




    Friday, February 17, 2012 9:15 AM
  • Hi Mutari,

    I was reading this thread, and I understand your current farm is going to be the future production farm?

    I would suggest following the Microsoft recommendations for creating and configuring a production environment.

    Please see the following two documents:

    Performance optimization guide

    Microsoft BizTalk Server Operations Guide

    I guess putting the production SQL files (data/log) on the C: drive should be avoided. It is better to consider these things before moving to production.


    HTH,Thanks, Naushad (MCC/MCTS) http://alamnaushad.wordpress.com |@naushadalam

    If this is helpful or answers your question - please mark accordingly! Please "Vote As Helpful" if this was useful while resolving your question!

    Wednesday, February 22, 2012 9:04 PM
    Moderator
  • This thread is a bit old, but we had a similar issue in our environment. By default, the autogrowth of all the BizTalk databases (SQL) is set to 1 MB. If you handle a lot of messages, this kind of issue can occur when a database gets full and needs to grow. 1 MB per growth step is not enough if you are handling a lot of messages.

    Symptom: messages get stuck as active or in a similar state while waiting to be handled.

    We immediately increased the setting in the DB to 100 MB per growth step. Under normal circumstances the cleanup job will free enough space, but if the load gets too high the database will fill up.

    Perhaps this will solve someone else's headache :-)
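
    The difference between the two growth settings is easy to see as arithmetic. The Python sketch below counts how many fixed-size growth steps are needed to add a given amount of space; the 1536 MB figure is a hypothetical amount chosen for illustration, not a measurement from any environment in this thread.

```python
import math


def growth_events(extra_mb, increment_mb):
    """Number of fixed-size autogrow steps needed to add extra_mb of space."""
    return math.ceil(extra_mb / increment_mb)


extra_mb = 1536  # hypothetical: the data file has to grow by 1.5 GB under load

for increment_mb in (1, 100):
    print(increment_mb, growth_events(extra_mb, increment_mb))
```

    With a 1 MB increment the file has to grow 1536 times to absorb that load; with 100 MB it grows 16 times. Each growth step pauses writers while the file is extended, so fewer and larger steps mean fewer stalls.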


    Wednesday, November 21, 2018 12:51 PM
  • The recommended configuration is no less than 100 MB per step
    https://docs.microsoft.com/en-us/biztalk/technical-guides/post-configuration-database-optimizations2

    /Peter


    When asking a question please be as thoroughly as possible this will make it easier to assist you http://www.catb.org/esr/faqs/smart-questions.html

    Wednesday, November 21, 2018 1:13 PM