BizTalk 2009 SQL WCF Adapter Losing Messages? AKA TypedPolling Not Working

  • Question

  • Can anybody help me with this amazingly frustrating issue? Firstly, the basics. I'm using BizTalk 2009 Standard Edition hosted on Windows Server 2008 Standard (64 bit). The BizTalk databases are stored on a separate server, running SQL Server 2008 on Windows Server 2008 Enterprise (64 bit).

    I have configured a WCF SQL adapter to poll (using TypedPolling) a SQL 2008 database, which in turn activates a relatively simple orchestration. The problem is that the orchestration does not get executed at the specified polling intervals. I have several other similar polling adapters in my application that work as expected. Generally it will run correctly once or twice after enabling/disabling the receive location or restarting the host instance. But after that it is completely erratic. If I have it set to run every hour (3600 secs), it might complete successfully once or twice a day at seemingly random times. On other days it might run seven or eight times during a day.

    I've spent heaps of time reading blogs and changing various settings on my receive location and have really run out of ideas. I've profiled SQL Server and it seems that the WCF SQL adapter does hit the database at each polling period as specified, using both the polledDataAvailableStatement and the pollingStatement. Data is always returned by these queries (a small amount, around 30 records), but it just doesn't seem to get to the message box. No errors are ever reported in the Event Log.

    I use a stored procedure for determining if data is available and another for returning the data. They are both effectively the same statement; one returns a count whilst the other returns the data. The queries use cross-server joins, as you can see below. Although I do no updates, it seems that I have to run the receive location with UseAmbientTransaction turned on. Turning it off just results in a SQL Timeout exception. I don't understand why this is and would be grateful if someone could explain.

    Data Available (contained in a stored proc):

    SELECT  COUNT(*)
        FROM    [LMNZLSQL002\AX].[LMNZ_AX_LIVE].[dbo].[CRM_CUSTTRANS_ITG] WITH (NOLOCK)
        WHERE    (TRANSTYPE = 8 OR TRANSTYPE = 13)  
        AND SETTLEAMOUNTCUR > 0
        AND CUSTFEEINVOICETRANSACTION30005 IS NOT NULL
        AND CUSTFEEINVOICETRANSACTION30005 <> ''      
        AND CUSTFEEINVOICETRANSACTION30005 IN
        (
            SELECT F.FeeId
            FROM [dbo].[Fees] F WITH (NOLOCK)      
            WHERE F.FeeStatus = 3        --Unpaid
            AND F.Payer IS NOT NULL
        )  


    Select Data (contained in a stored proc):

    SELECT  ACCOUNTNUM, CUSTFEEINVOICETRANSACTION30005, AMOUNTCUR, SETTLEAMOUNTCUR
        FROM    [LMNZLSQL002\AX].[LMNZ_AX_LIVE].[dbo].[CRM_CUSTTRANS_ITG] WITH (NOLOCK)
        WHERE    (TRANSTYPE = 8 OR TRANSTYPE = 13)  
        AND SETTLEAMOUNTCUR > 0
        AND CUSTFEEINVOICETRANSACTION30005 IS NOT NULL
        AND CUSTFEEINVOICETRANSACTION30005 <> ''      
        AND CUSTFEEINVOICETRANSACTION30005 IN
        (
            SELECT F.FeeId
            FROM [dbo].[Fees] F WITH (NOLOCK)
            WHERE F.FeeStatus = 3        --Unpaid
            AND F.Payer IS NOT NULL
        )  
        ORDER BY ACCOUNTNUM,CUSTFEEINVOICETRANSACTION30005


    At one time I suspected that the issue might be related to database locking. So, as you can see, I've added NOLOCK locking hints to rule this out. I'm also using a Service Behaviour (SqlAdapterInboundTransactionBehaviour) declaring a transaction level of ReadUncommitted. Obviously I'd like to move away from the possibility of dirty reads. Also, I've checked the SQL activity monitors when I expect the process to run and can see no locks holding it up, and I can freely execute the stored procedures from SQL Management Studio at these times.

    I'm not too experienced at monitoring BizTalk, but I wondered if some kind of throttling was occurring. So I've moved the receive location (and the orchestration that doesn't always get initiated) to its own host instance and have done some monitoring via the performance counters in perfmon. I haven't spotted any throttling going on, but maybe I wasn't looking at the correct counters (I mainly used the Message Agent items).

    Here's a summary of the receive location properties:

    Transport: WCF-Custom
    Receive Handler: AXPaymentsToFeesEngine (this host instance only hosts this receive location and the orchestration it should be activating).
    Receive pipeline: XmlReceive

    General tab:
    EndpointAddress: mssql://lmnzlsql003/CRM/FeesEngine?InboundId=PaidInAXUnpaidInFeesEngine
    Endpoint Identity - all default values

    Binding Tab:
    Binding Type: sqlBinding
    allowIdentityInsert: False
    batchSize: 20
    chunkSize: 4194304  
    enableBizTalkCompatibilityMode: True
    enablePerformanceCounters: False
    encrypt: False
    inboundOperationType: TypedPolling
    maxConnectionPoolSize: 100
    notificationStatement: Not specified
    notifyListenersOnStart: True
    polledDataAvailableStatement: EXEC [dbo].[bts_PollForAXFeePaymentsNotPresentInFeesEngine_IsDataAvailable]
    polledIntervalInSeconds: 3600
    pollingStatement: EXEC [dbo].[bts_GetFeesMarkedUnpaidThatHaveBeenPaidInAX]
    pollWhileDataFound: False
    UseAmbientTransaction: True
    useDatabaseNameInXsdNamespace: False
    workstationId: Not specified
    xmlStoredProcedureRootNodeName: Not specified
    xmlStoredProcedureRootNodeNamespace: Not specified
    All timeouts: 5 mins.

    Behaviour Tab
    ServiceBehaviour - sqlAdapterInboundTransactionBehaviour, ReadUncommitted, timeout mins

    Other Tab:
    Credentials: None
    Preserve message order: False

    Messages Tab:
    Inbound BizTalk message body: Body
    Error handling: All disabled


    Thanks in advance,

    Mark
    Wednesday, June 9, 2010 1:01 AM

Answers

  • Personally, I like to use a SQL agent job coupled with TypedPolling to kick off an event. The accuracy in the time the event occurs is much higher and you have so much more control over the schedules. Just specifying "every hour" by 3600 seconds is not going to give you much control over when. I use a jobs table in SQL and just switch a bit flag from the SQL agent script and then have my typed polling watch for any bit flag changes in my table.

    If you have other typedPolling occurring, I would wonder if any of these might occur at the same time and lead to a race condition which affects the execution order of the polling. This could result in blocking or execution priority problems. I recommend reducing the amount of time it takes to execute the polling statements: try to make these simpler statements (use an actual inner join rather than a subquery) so that you minimize the execution blocking.

    In my polling statements I use "select top 1 * from MyTable where bitFlag = 1" to have a very quick check. When you have a lot of typedPolling occurring, you should keep the window of time to do the check as small as possible. You could do a "select top 1" rather than a "select count(*)", which would be quite a bit faster. A count will need to check every single row.
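
    As a rough illustration of the two suggestions above (a join instead of the IN subquery, and an existence test instead of a full count), the data-available check from the original question might be rewritten as below. This is an untested sketch: the table and column names come from the question, while the aliases T and F and the DataAvailable column name are made up here. It assumes FeeId and CUSTFEEINVOICETRANSACTION30005 are directly comparable, as the original IN clause already implies. The NOLOCK hints are left out; add them back if you rely on them.

    ```sql
    -- Sketch only: returns 1 when at least one matching row exists, else 0.
    -- The value lands in the first cell, where the COUNT(*) version returns its number.
    SELECT CASE WHEN EXISTS
    (
        SELECT 1
        FROM [LMNZLSQL002\AX].[LMNZ_AX_LIVE].[dbo].[CRM_CUSTTRANS_ITG] AS T
        INNER JOIN [dbo].[Fees] AS F
            ON F.FeeId = T.CUSTFEEINVOICETRANSACTION30005
        WHERE T.TRANSTYPE IN (8, 13)
          AND T.SETTLEAMOUNTCUR > 0
          AND T.CUSTFEEINVOICETRANSACTION30005 <> ''   -- also excludes NULLs
          AND F.FeeStatus = 3                          -- Unpaid
          AND F.Payer IS NOT NULL
    ) THEN 1 ELSE 0 END AS DataAvailable;
    ```

    Whether this is actually faster depends on the plan the linked-server query gets, so it is worth comparing both versions in SSMS before changing the receive location.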

    Thanks,


    If this answers your question, please use the "Answer" button to say so | Ben Cline
    Wednesday, June 9, 2010 2:14 PM
    Moderator
  • Hi Mark,

    Although the problem is present with default settings for Notification mode, read carefully: you will run into the same issue when using sqlAdapterInboundTransactionBehavior in Polling mode. Worse yet, setting receiveTimeout to max in the behavior won't help in this case. So you definitely want to remove this behavior from your endpoint.

    All that said, we do have a scaled-out production solution that heavily uses both polling and notification with this very adapter. So it can be done. Once you understand the problem you can take it under control, and I recommended how.

    Like Ben mentioned, I'd be concerned with the long polling interval. I'm not sure what timers are used in the adapter. There was a third-party BizTalk Scheduled Task Adapter that got extremely unreliable on long intervals just because of some timer/threading issues. We could not use it in production. While one hour does not sound very long, just do some thorough testing to see if it is a factor. Does the same happen with shorter polling intervals? Did you look at the performance counters to see if there are any clues?

    Best regards,

    Paul


    http://geekswithblogs.net/paulp/
    Thursday, June 10, 2010 8:01 PM
    Answerer

All replies

  • Locking is not likely an issue here; you can remove the NOLOCK hints as well as sqlAdapterInboundTransactionBehavior. You may have run into one of the WCF adapter bugs, or maybe not. You will have to do some debugging to find out.

    First, check if you observe the symptoms I described here http://geekswithblogs.net/paulp/archive/2010/05/17/139876.aspx and if so use the recommended workaround. If the receive adapter and the orchestration are in the same host instance, move the adapter to a dedicated handler (that's what you should do anyway). The reason is that if the adapter exhibits a thread leak, it will use the threads up and affect the orchestration.

    Second, I don't understand how you prevent the same records from getting into the recordset again. Do you have some other process that removes them or changes status?


    http://geekswithblogs.net/paulp/
    Wednesday, June 9, 2010 4:48 AM
    Answerer
  • Hi Paul.

    Thanks for the advice. Having read your post at the supplied link, it makes me wonder if the bugs in WCF will undermine my current architecture, which is fairly reliant on the typed polling approach.

    So you think that the only way to keep the adapter reliable is to restart the host instances regularly? This seems crazy. Do you know of any place that lists all known issues with the SQL WCF adapter? Does Microsoft have any plans to fix these issues?

    Knowing that there is a resource leak on the adapter makes it a good suggestion to separate the adapter into its own instance; I'll be sure to do this.

    In answer to your second question, the data being selected is subject to change at random intervals so I actually want to select it every time, hence the lack of a 'status tracker'.

    Thanks again,

    Mark
    Thursday, June 10, 2010 5:16 AM
  • Cheers Ben.

    Firstly, thanks for the tip on shortening the query execution times. That's an excellent suggestion that I'll apply to the other polling adapters that I have.

    I was also wondering if there were some race conditions/locking going on (hence the locking hints in the queries) but haven't been able to find any. And of course I agree that the scheduling is very weak when you can only specify the polling interval.

    Please can you elaborate on your use of a SQL job in conjunction with the polling? I'm not sure I've understood your suggestion entirely. If I use a job to set a flag, how will this help with the reliability of polling? If the adapter is (seemingly) failing to do its job, in as much as it's not returning data when expected, how will this help? And doesn't this also imply that I'll have to have a much shorter polling interval to be able to detect the changes? Which would seem to increase the probability of resource leaks occurring, as suggested by Paul in a previous post...

    Thank you for taking the time to assist,

    Mark

    Thursday, June 10, 2010 5:29 AM
  • Paul's post mentioned a recommendation for the notification mode of typed polling but you are not using the notification statement. As for restarting the host instances, I would consider this a good step in a regular maintenance plan if you can afford any downtime. This should not be a required step.

    The race conditions I think are in the thread execution controls that are in the adapter code or the way that BizTalk prioritizes the threads - not in your database code. I am just suggesting some things that can attempt to minimize the possibility of a thread switch or a timeout. Basically I think what is happening is that the logic to determine whether data has changed is taking too long and is getting prioritized down in the thread pool. The reason it is taking more than 3600 seconds is that other threads are blocking and the time interval slips. Having a custom host/adapter handler will help with this as Paul mentioned. My other suggestions were ways to improve the SQL performance.

    It also sounds like the adapter is not checking for changes in as timely a manner as you would like. Checking for changes is an approach for near real-time responses. Using a SQL job would be a more reliable way to ensure that the checking happens; it would be slightly less real-time, but you would have more control over the execution. So for this trade-off I would change the process to query the database for changed data in an orchestration rather than from the port itself.

    A SQL job would help you move the checking for changes out of a time sensitive window over to an orchestration where you can take as long as you want to check for changes.
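
    The flag-table arrangement described in this thread (a SQL Agent job flips a bit, and typed polling watches for it) could look roughly like the following. Everything here is a hypothetical sketch: the dbo.PollingJobs table, its columns, and the job name are invented for illustration, and the statements have not been tested against the adapter.

    ```sql
    -- Hypothetical flag table: one row per scheduled trigger.
    CREATE TABLE dbo.PollingJobs
    (
        JobName sysname  NOT NULL PRIMARY KEY,
        IsDue   bit      NOT NULL DEFAULT (0),
        LastSet datetime NULL
    );

    INSERT INTO dbo.PollingJobs (JobName) VALUES (N'AXPaymentsToFeesEngine');

    -- SQL Agent job step: the Agent schedule (for example hourly) decides when this runs.
    UPDATE dbo.PollingJobs
    SET IsDue = 1, LastSet = GETDATE()
    WHERE JobName = N'AXPaymentsToFeesEngine';

    -- Candidate polledDataAvailableStatement: a single-row lookup by primary key.
    SELECT COUNT(*)
    FROM dbo.PollingJobs
    WHERE JobName = N'AXPaymentsToFeesEngine' AND IsDue = 1;

    -- Candidate pollingStatement: clear the flag and return the row that was due,
    -- in one statement, so the flag cannot be read and reset by two separate steps.
    UPDATE dbo.PollingJobs
    SET IsDue = 0
    OUTPUT inserted.JobName, inserted.LastSet
    WHERE JobName = N'AXPaymentsToFeesEngine' AND IsDue = 1;
    ```

    With this shape the receive location only produces a small trigger message; the heavier cross-server query would then run from the orchestration, as suggested above. The polling interval would need to be short enough to notice the flag soon after the job sets it.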

    Thanks,


    If this answers your question, please use the "Answer" button to say so | Ben Cline
    Thursday, June 10, 2010 1:33 PM
    Moderator
  • Thanks everyone for your input on this. None of the suggestions actually seemed to solve our specific issue, and we ended up raising a support call with Microsoft. They got us to use some tracing tools that are not publicly available, and it turns out that there was a problem with the underlying WCF communications channel, which was not visible via the usual Windows event logs. They suggested a solution that, although very counter-intuitive, seems to have resolved our problem: set the ReceiveTimeout property of the receive adapter to 24.20:31:23.6470000 and restart the service host.

    Here's the underlying exception:

    [0]1E2C.21C4::07/07/2010-4:52:25.804 [CSharp]:[Wcf] BtsErrorHandler.HandleError called with Exception: System.ServiceModel.CommunicationObjectAbortedException: The communication object, System.ServiceModel.Channels.ServiceChannel, cannot be used for communication because it has been Aborted.

       at System.ServiceModel.Channels.CommunicationObject.ThrowIfDisposedOrImmutable()

       at System.ServiceModel.Channels.CommunicationObject.Open(TimeSpan timeout)

       at System.ServiceModel.Channels.CommunicationObject.Open()

       at System.ServiceModel.Dispatcher.ChannelHandler.InitializeServiceChannel(ServiceChannel channel)

       at System.ServiceModel.Dispatcher.ChannelHandler.GetSessionChannel(Message message, EndpointDispatcher& endpoint, Boolean& addressMatched)

       at System.ServiceModel.Dispatcher.ChannelHandler.EnsureChannelAndEndpoint(RequestContext request)

       at System.ServiceModel.Dispatcher.ChannelHandler.TryRetrievingInstanceContext(RequestContext request).
    Monday, August 2, 2010 2:03 AM
  • This time without the dodgy formatting :o)

     

    Thanks everyone for your input on this. None of the suggestions actually seemed to solve our specific issue, and we ended up raising a support call with Microsoft. They got us to use some tracing tools that are not publicly available, and it turns out that there was a problem with the underlying WCF communications channel, which was not visible via the usual Windows event logs. They suggested a solution that, although very counter-intuitive, seems to have resolved our problem: set the ReceiveTimeout property of the receive adapter to 24.20:31:23.6470000 and restart the service host.

    Here's the underlying exception:

    [0]1E2C.21C4::07/07/2010-4:52:25.804 [CSharp]:[Wcf] BtsErrorHandler.HandleError called with Exception: System.ServiceModel.CommunicationObjectAbortedException: The communication object, System.ServiceModel.Channels.ServiceChannel, cannot be used for communication because it has been Aborted.

       at System.ServiceModel.Channels.CommunicationObject.ThrowIfDisposedOrImmutable()

       at System.ServiceModel.Channels.CommunicationObject.Open(TimeSpan timeout)

       at System.ServiceModel.Channels.CommunicationObject.Open()

       at System.ServiceModel.Dispatcher.ChannelHandler.InitializeServiceChannel(ServiceChannel channel)

       at System.ServiceModel.Dispatcher.ChannelHandler.GetSessionChannel(Message message, EndpointDispatcher& endpoint, Boolean& addressMatched)

       at System.ServiceModel.Dispatcher.ChannelHandler.EnsureChannelAndEndpoint(RequestContext request)
       at System.ServiceModel.Dispatcher.ChannelHandler.TryRetrievingInstanceContext(RequestContext request).

     

     

    Monday, August 2, 2010 2:46 AM
  • We are having a similar issue as described above and I already had set my receiveTimeout to 24.20:31:23.6470000 but eventually the process gets this error for our time interval:

    The adapter "WCF-Custom" raised an error message. Details "System.ObjectDisposedException: Cannot access a disposed object.

    Object name: 'TransactionScope'.

    at System.Transactions.TransactionScope.Complete()

    at System.ServiceModel.Dispatcher.TransactionRpcFacet.ThreadLeave()

    at System.ServiceModel.Dispatcher.TransactionBehavior.ClearCallContext(MessageRpc& rpc)

    at System.ServiceModel.Dispatcher.ImmutableDispatchRuntime.ProcessMessage7(MessageRpc& rpc)

    at System.ServiceModel.Dispatcher.MessageRpc.Process(Boolean isOperationContextSet)".

    For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp.

    Here is my polledDataAvailableStatement:

    select count(*) from outbound with (nolock) where processed='N' and trans in ('0130', '0210', '0310', '0330', '0530', '0535', '0550', '0650', '0660', '0670', '1360')

    Here is my pollingStatement:

    select trans, batch, text, processed, seqnbr, update_date, update_user_id, update_pid from outbound where processed='N' and trans in ('0130', '0210', '0310', '0330', '0530', '0535', '0550', '0650', '0660', '0670', '1360') order by seqnbr; update outbound set processed='P' where processed = 'N' and trans in ('0130', '0210', '0310', '0330', '0530', '0535', '0550', '0650', '0660', '0670', '1360');

    Notice the update statement. The polledDataAvailableStatement returns its result set immediately because I have NOLOCK on it; however, when it comes time to get the data, the polling statement blocks on the SELECT alone. I know this because I took the SELECT and ran it in SSMS and the query just hung. When I stopped the receive location, it was freed. It never got as far as the UPDATE, because the timeout was occurring every minute, which is our interval. It seems to act up when there is a lot of activity on the table we are selecting from.
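
    One variation that is sometimes tried with a "select, then mark as processed" polling statement is to do the marking first in a single UPDATE and return exactly the rows that statement touched. The sketch below is untested against the adapter and rests on assumptions not stated in the post: that seqnbr is a unique integer key on outbound, and that the @claimed table variable name is free to use. Note that the processed column comes back as 'P' rather than 'N' in this version.

    ```sql
    -- Assumption: seqnbr uniquely identifies a row in outbound.
    DECLARE @claimed TABLE (seqnbr int PRIMARY KEY);

    -- Mark the pending rows and remember which ones were marked.
    UPDATE outbound
    SET processed = 'P'
    OUTPUT inserted.seqnbr INTO @claimed (seqnbr)
    WHERE processed = 'N'
      AND trans IN ('0130', '0210', '0310', '0330', '0530', '0535',
                    '0550', '0650', '0660', '0670', '1360');

    -- Return only the rows claimed above, in the original order.
    SELECT o.trans, o.batch, o.[text], o.processed, o.seqnbr,
           o.update_date, o.update_user_id, o.update_pid
    FROM outbound AS o
    INNER JOIN @claimed AS c ON c.seqnbr = o.seqnbr
    ORDER BY o.seqnbr;
    ```

    This does not by itself explain the blocking described above, but it removes the window in which a row inserted between the SELECT and the UPDATE could be marked 'P' without ever being returned.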

    I have tried everything, like changing the interval and setting the behavior to ReadCommitted.

    Does anyone have any suggestions?

    Thanks in advance.

    Friday, October 15, 2010 11:44 AM
  • Hi all,

    We are seeing the same issue with a custom-developed adapter, which is based on the WCF LOB Adapter SDK SP2.

    Since we could debug our own code, we saw that all the polling etc. was working like a charm and BizTalk was also calling the correct method on the InboundHandler (TryReceive), but the message just vanished afterwards - it was nowhere in BizTalk and I'm almost certain that there is not just a bug with the WCF-SQL Adapter but the WCF-Custom Adapter in general.

    The same issue can happen with the WCF-SAP adapter. Messages (IDOCs) will have left the SAP system but never reach the processes in BizTalk.

    We really need the BizTalk Product Group to fix the issue. (It's valid for 2006 R2 and 2009 - we have at least 10 customer installations, where we saw the issue)

    We already tried the ReceiveTimeout suggestions etc., but in the end that's a "most-of-the-time-working" work-around. We should never be forced to restart host instances on a regular basis!

    Regards,

    Leo

     


    Please mark it as Answer if this answers your question.
    Sunday, October 17, 2010 11:29 AM
  • Hi all,

    We are having the same issue as described above...

    Did anyone come up with a solution already? Because this is rather annoying (both for the developers and the customer).

    Thanks!
    Christophe

    Wednesday, November 10, 2010 9:58 AM
  • Hi there

    WCF-SQL adapter losing messages? I've got the same issue.

    Messages are getting flagged in the source systems, but never reach the destination systems. No warning, no error, no MSDTC issues...

    What is the opinion of the Microsoft guys?

    Thx

    Wednesday, November 24, 2010 6:30 PM
  • I believe Leo Martens is right. I think this is a general problem with the WCF LOB Adapter. We have an adapter developed using WCF LOB Adapter SDK SP2 for BizTalk 2009, and have faced the same issue, where we occasionally lose messages when receiving.

     

    We have spent a lot of time on this, and initially could not find any trace of the lost message after it left our custom code and got submitted to the WCF framework (via the public bool TryReceive() method exposed by the adapter). When we turned on full WCF logging we finally found a relevant error in the WCF logs: “The communication object, System.ServiceModel.Channels.ServiceChannel, cannot be used for communication because it has been Aborted.”

     

    So, now we probably know the reason, but it does not help us very much. I have added an implementation of IErrorHandler and IServiceBehavior and added this behavior to the adapter. This gives us the possibility to catch the WCF error and log it in our logs (event log etc.), but does not fix the problem with lost messages. The WCF LOB adapter does not handle problems/errors in the WCF channel without losing the messages being committed to the faulted WCF channel. Yes, we can reduce the probability of this error occurring by modifying the different Timeout properties, and I guess that, combined with a scheduled task restarting either the port or the host, we can reduce the probability of the error occurring to a minimum. But this is not an acceptable solution. BizTalk is supposed to be a reliable message platform and never lose a message, not just “very seldom”.

     

    I guess the best solution would be for Microsoft to update the way TryReceive() is implemented in the WCF LOB adapter by testing the state of the channel before submitting the message, and at the very least trickle the error up to the user, making us able to suspend/resume/log/whatever when this occurs.

     

    I have tried to dabble a bit in extending the IServiceBehavior implementations we used to catch the WCF error and log it. The theory being that we could add an event handler to detect when the channel state has either a Faulted or Closed state and then close/abort the channel and create a new channel. But, so far we have not been successful. If anyone has any theories/experiences with this then please elaborate. I feel a bit stuck.

     

    If it can be of use to any of you, here is the code to catch and log errors occurring in WCF, which by default just “vanish”.

     

    using System;
    using System.Collections.ObjectModel;
    using System.Diagnostics;
    using System.ServiceModel;
    using System.ServiceModel.Channels;
    using System.ServiceModel.Description;
    using System.ServiceModel.Dispatcher;

    public class MyErrorHandler : IErrorHandler, IServiceBehavior
    {
        // Log the exception, which would otherwise just vanish.
        public bool HandleError(Exception error)
        {
            Trace.WriteLine("ERROR : WCF : " + error.ToString());
            // Indicates whether the fault is handled or not.
            return false;
        }

        // Turn the exception into a fault message.
        public void ProvideFault(Exception error, MessageVersion version, ref Message msg)
        {
            FaultException faultException = new FaultException(error.Message);
            MessageFault messageFault = faultException.CreateMessageFault();
            msg = Message.CreateMessage(version, messageFault, faultException.Action);
        }

        #region IServiceBehavior implementation
        // Attach the error handler to every channel dispatcher of the service host.
        public void ApplyDispatchBehavior(ServiceDescription serviceDescription, ServiceHostBase serviceHostBase)
        {
            IErrorHandler errorHandler = new MyErrorHandler();

            foreach (ChannelDispatcherBase channelDispatcherBase in serviceHostBase.ChannelDispatchers)
            {
                ChannelDispatcher channelDispatcher = channelDispatcherBase as ChannelDispatcher;
                // The "as" cast yields null for other dispatcher types; skip those.
                if (channelDispatcher != null)
                {
                    channelDispatcher.ErrorHandlers.Add(errorHandler);
                }
            }
        }

        public void AddBindingParameters(ServiceDescription serviceDescription, ServiceHostBase serviceHostBase,
            Collection<ServiceEndpoint> endpoints, BindingParameterCollection bindingParameters)
        {
        }

        public void Validate(ServiceDescription serviceDescription, ServiceHostBase serviceHostBase)
        {
        }
        #endregion
    }

    You must also add a class deriving from BehaviorExtensionElement:

    using System;
    using System.ServiceModel.Configuration;

    public class MyErrorHandlerBehavior : BehaviorExtensionElement
    {
        // The behavior type this configuration element creates.
        public override Type BehaviorType
        {
            get { return typeof(MyErrorHandler); }
        }

        protected override object CreateBehavior()
        {
            return new MyErrorHandler();
        }
    }



    Add some config to your machine.config files:

    <system.serviceModel>
    	<extensions>
    		<behaviorExtensions>
    			<add name="MyErrorLogging" type="MyErrorHandlerBehavior, MyLOBAdapter, Version=1.0.0.0, Culture=neutral, PublicKeyToken=1234567898765"/>
    		</behaviorExtensions>
    		<bindingElementExtensions/>			
    		<bindingExtensions/>
    	</extensions>
    	<commonBehaviors>
    		<endpointBehaviors/>
    		<serviceBehaviors>
    			<MyErrorLogging/>
    		</serviceBehaviors>
    	</commonBehaviors>
    </system.serviceModel>
    And finally, restart the BizTalk hosts and the BizTalk admin console. You should now be able to add the behavior to the service behaviors in the configuration of your WCF-Custom adapter.

     

    Monday, November 29, 2010 10:33 AM
  • Is Microsoft planning to release a fix for this issue? This really is a business blocker for several customers of ours.

    Grtz

    Monday, November 29, 2010 11:52 AM
  • I would report it to MS Support to see if they can release a hotfix. These typed polling issues are very hard to nail down through a forum because you need to know a lot about what is happening in the database (among other things).

    MS Support does not charge money for bugs and issues that require hotfixes.

    Thanks,


    If this answers your question, please use the "Answer" button to say so | Ben Cline
    Tuesday, November 30, 2010 3:51 PM
    Moderator
  • If you need to restart, it might be a thread leak. Could you test with this fix and a low receive timeout (a bit more than the maximum expected, e.g. 5-15 minutes)?
    The thread count increases quickly when you use WCF Adapter receive locations in BizTalk Server 2006 R2 or in BizTalk Server 2009
    http://support.microsoft.com/?id=2300507
    "This problem occurs because the time-out value of WCF does not synchronize with the receive time-out when there is no data to poll. Additionally, WCF opens a new channel and waits for messages on the new channel. This may cause a memory leak situation."

    Thanks
    Niklas E

    Thursday, December 2, 2010 1:31 PM
  • We have opened a support case with MS on this issue. Hopefully we will receive a confirmation verifying that this is a bug, and then later a hotfix. I will try to keep this thread updated with the progress on this.
    • Proposed as answer by Bhaswar Thursday, May 26, 2011 12:57 PM
    Monday, December 6, 2010 6:26 PM
  • Hi Pal,

    We are facing a similar issue with the BizTalk 2010 WCF SQL adapter, where messages are being lost. Please let us know if a resolution was found for this.


    Bhaswar
    Thursday, May 26, 2011 1:01 PM
  • Please install BizTalk Adapter Pack 2010 CU1 and see if it helps

    http://support.microsoft.com/kb/2539794

    http://support.microsoft.com/kb/2522459

    and make sure you have UseAmbientTransaction set to True.

     

    You can enable verbose WCF tracing (see above in the thread and the MSDN documentation on troubleshooting the adapter); otherwise I suggest you open a support case if you need more advanced log gathering and analysis on the BizTalk and SQL side.

     

    Cheers

    Niklas


    Thursday, May 26, 2011 7:53 PM
  • Hi.

    We did not really get anywhere with our support case. We gathered a lot of log data, but the analysis did not really turn up anything useful.

    We could catch the faulted state of the WCF channel using the above-mentioned logger, but we were unable to resume the message that was being sent into the faulted WCF channel.

    My experience suggests that this only happens when the WCF port has been idle for longer than the WCF timeout, but I have not found any solution to make sure the scenario will be handled if it occurs. A workaround could be to have a scheduled restart of the host running your receive port. Not ideal, but in my experience this solved our problem.

    Tuesday, August 30, 2011 12:37 PM
  • Unfortunately this adapter pack update does not solve this issue. Does anyone have any other suggestions to solve this issue? We are constantly losing messages in a production environment because of the WCF-SQL adapter. 
    Wednesday, November 9, 2011 1:16 PM
  • Did you ever get a resolution? We are having the same problem in BT2010 with none of the CUs applied in Prod. CU3 for the adapter packs caused an issue in our QA environment, so we are hesitant to deploy it.

    Thanks,
    Neal Walters

    Thursday, May 8, 2014 1:28 PM
  • Hi,

    Having exactly the same problem on a BizTalk Server 2013 R2 platform with the WCF-Oracle adapter. BizTalk randomly loses messages, which is not acceptable...

    After activating WCF logs, we have the following exception : AdapterInputChannel.TryReceive: Abort has been called, closing message.

    Any news from Microsoft?

    Anyway, thanks all for your help, it's been a long time we are facing this issue...

    Tuesday, December 5, 2017 10:35 AM