BizTalk Server - IIS traffic blocked

  • Question

  • Dear all,

    We've encountered an issue several times over the past months and have been unable to pinpoint the root cause. At a certain point in time, our end-users experience a lot of timeouts for about one minute when consuming web services exposed by BizTalk through IIS.

    Looking at the IIS logs, we clearly see the timeouts (high / abnormal durations):

    2017-02-03 12:36:16 10.14.240.238 POST /BTWWWaliServices/DeliverDistribution.svc - 80 AG\TWWSWP 10.14.201.11 - - 200 0 0 51309
    2017-02-03 12:36:16 10.14.240.238 POST /BTWWWaliServices/DeliverDistribution.svc - 80 AG\TWWSWP 10.14.201.11 - - 200 0 0 78571
    2017-02-03 12:36:16 10.14.240.238 POST /BTTAMadCobolLegacyService/CobolLegacyService.svc - 443 - 10.14.201.11 java - 200 0 0 72335
    2017-02-03 12:36:16 10.14.240.238 POST /BTWWWaliServices/DeliverDistribution.svc - 80 - 10.14.201.11 - - 401 5 0 16972

    When looking at the BizTalk tracking for a specific service, which is called continuously throughout the day without pause, there's an obvious one-minute gap in which the service's tracking data is missing. This is the case for all services exposed in IIS through the isolated host. It seems as if the requests arrive at the message box a minute too late before they're properly processed. We can't find any evidence of throttling on the isolated hosts, nor any events or warnings in the event viewer that indicate a problem.

    When this occurs, it's usually one minute during which no traces are to be found and timeouts occur. After that, everything resolves automatically and processing continues as expected. This doesn't happen often; usually we've had it once every one to two months, but today it happened three times in just eight hours.

    Polling intervals for the isolated hosts are set to 50 ms. We have 4 servers actively processing. There are no memory or CPU issues to be found on the BizTalk virtual machines, nor anything out of the ordinary on the SQL instance. I'm at a loss and have no clue as to why this happens... All other in-process hosts keep processing without problems. During normal processing, the duration of such a flow averages 200 - 300 ms. It's clear the request is being blocked somewhere before arriving at the message box, but I can't find out why.

    Has anyone here encountered such an issue before, where requests from IIS to the BizTalk message box seem to have a lot of latency without a clear cause?

    Thanks in advance for any assistance!

    Friday, February 3, 2017 2:31 PM

All replies

  • Hi,

    I believe the delay is in publishing the message to the BizTalk MessageBox database. Have you tried running the MBV (MessageBox Viewer) tool or BHM (BizTalk Health Monitor)? I suspect the spool count is way too high on your MessageBox. Check that and run MBV and BHM.

    If it is high, then check that the BizTalk purge and archive jobs are functional and not failing.
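
    If the spool count needs a quick check, a minimal sketch in Python could look like this (a sketch only; it assumes pyodbc, ODBC Driver 17 for SQL Server, integrated security, and read access to the message box database, and the server name is a placeholder):

    # Minimal sketch: count the rows in the BizTalkMsgBoxDb Spool table.
    # Assumptions: pyodbc installed, ODBC Driver 17 for SQL Server present,
    # and the account running this can read BizTalkMsgBoxDb.
    import pyodbc

    conn = pyodbc.connect(
        "DRIVER={ODBC Driver 17 for SQL Server};"
        "SERVER=YOUR_SQL_SERVER;"        # placeholder: your SQL instance
        "DATABASE=BizTalkMsgBoxDb;"
        "Trusted_Connection=yes;"
    )
    row = conn.cursor().execute(
        "SELECT COUNT(*) FROM dbo.Spool WITH (NOLOCK)"
    ).fetchone()
    print(f"Spool row count: {row[0]}")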


    Regards, PK. Please mark the reply as answer or vote it up, as deemed fit.


    Friday, February 3, 2017 7:57 PM
  • Hi Steven

    Are all 4 servers running the isolated host instance, and is it set up that way in the load balancer?

    To verify that throttling is not happening, did you collect Perfmon counters?
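
    If it helps, one way to capture the throttling counters for a while could look like this (a sketch only; it assumes the standard BizTalk:Message Agent counters are installed on the node and uses a placeholder output path):

    # Minimal sketch: sample the BizTalk throttling state counters with typeperf
    # (built into Windows) and write them to a CSV for later review.
    import subprocess

    counters = [
        r"\BizTalk:Message Agent(*)\Message delivery throttling state",
        r"\BizTalk:Message Agent(*)\Message publishing throttling state",
    ]
    subprocess.run(
        ["typeperf", *counters,
         "-si", "5",                          # sample every 5 seconds
         "-sc", "720",                        # 720 samples = one hour
         "-f", "CSV",
         "-o", r"C:\PerfLogs\throttling.csv", # placeholder output path
         "-y"],                               # overwrite an existing file
        check=True,
    )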


    Thanks Arindam

    Saturday, February 4, 2017 8:13 AM
    Moderator
  • Hi Steven,

    As suggested by Pushpendra, please check the size of the spool table during the run. This might be the reason for the delay and timeouts within the BizTalk environment. It is always good to keep the spool table size within a minimal range.


    If this answers your question please mark it accordingly. If this post is helpful, please vote as helpful by clicking the upward arrow mark next to my reply

    Saturday, February 4, 2017 11:23 PM
  • Hey,

    Thanks for the information. Unfortunately we don't log that table's size all the time. Currently the spool table averages around 5,000 messages but remains stable. I'm not sure whether this counts as a high amount or not, though it should be fine as long as it remains stable.

    In any case, we did have a lot of problems this past week with the BizTalk backup job. It was failing constantly due to lack of disk space. In turn, this caused the 'purge and archive' job to fail, and as a result BizTalk dropped. At that moment, the spool table was indeed exploding towards 45k - 50k messages.

    I would assume we had similar growth during those delays, but I can't verify that now. We will start monitoring the size of this table so that if it occurs again, we can correlate.
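
    As a starting point for that monitoring, something along these lines could log the spool size once a minute (a sketch only; it assumes the BizTalk:Message Box:General Counters perf object is installed and uses a placeholder output path):

    # Minimal sketch: log the message box Spool Size counter every minute so
    # spikes can later be correlated with the one-minute gaps.
    import subprocess

    # Without -sc, typeperf keeps sampling until stopped with Ctrl+C.
    subprocess.run(
        ["typeperf",
         r"\BizTalk:Message Box:General Counters(*)\Spool Size",
         "-si", "60",                         # one sample per minute
         "-f", "CSV",
         "-o", r"C:\PerfLogs\spool_size.csv", # placeholder output path
         "-y"],                               # overwrite an existing file
        check=True,
    )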

    Thanks for the tip!

    Tuesday, February 7, 2017 9:46 AM
  • Yes, all 4 servers are running the isolated host instance and are properly load balanced behind a F5 load balancer.

    Performance counters weren't being collected at the time, so I can't verify throttling with them, other than that we should have noticed an entry in the event viewer. I don't believe throttling was active at the time of the issue.

    Tuesday, February 7, 2017 9:47 AM
  • Hi,

    Have you tried using a monitoring tool to get insight while processing? We provide a free version of our monitoring platform AIMS (www.aimsinnovation.com/aims-free) where you can easily access all performance data from servers, hosts, ports and orchestrations in close to real time. Data for all components is also available for the past 24 hours, so you can easily chart and correlate different performance data to pinpoint the issue.

    Thanks,

    Marius

    Wednesday, March 15, 2017 8:50 AM