Intermittent Login Failures: Login timeOuts

    Question

  • Hello Folks,

    We are seeing some odd connectivity issues. After analysing them from both the client and the server side, I am posting here in the hope that someone can point me in the right direction.

    We run a gaming website whose Java code is deployed on JBoss servers; the backend is SQL Server 2008 R2 SP1.

    Because this is a free-to-play website, users come and go, but we see a lot of 'Login timeOut' exceptions on the client side.

    We can reproduce the issue even after a reboot. I have verified that it is not a query performance problem: the query runs fast on the backend and there are no parameter-sniffing issues.

    I read the connectivity ring buffer (RING_BUFFER_CONNECTIVITY) and see the same pattern over and over, but I cannot get any further with it.

    Here is a sample XML record from the output; 99 percent of the entries look the same:

    <Record id="634" type="RING_BUFFER_CONNECTIVITY" time="11674052938">
      <ConnectivityTraceRecord>
        <RecordType>LoginTimers</RecordType>
        <Spid>0</Spid>
        <SniConnectionId>9FA55CDC-8080-4952-8A35-C0F2EBF47051</SniConnectionId>
        <SniConsumerError>17830</SniConsumerError>
        <SniProvider>7</SniProvider>
        <State>11</State>
        <RemoteHost>10.133.0.240</RemoteHost>
        <RemotePort>29853</RemotePort>
        <LocalHost>10.136.0.175</LocalHost>
        <LocalPort>1433</LocalPort>
        <RecordTime>10/31/2013 12:53:25.823</RecordTime>
        <TdsBuffersInformation>
          <TdsInputBufferError>10054</TdsInputBufferError>
          <TdsOutputBufferError>0</TdsOutputBufferError>
          <TdsInputBufferBytes>0</TdsInputBufferBytes>
        </TdsBuffersInformation>
        <LoginTimers>
          <TotalLoginTimeInMilliseconds>0</TotalLoginTimeInMilliseconds>
          <LoginTaskEnqueuedInMilliseconds>0</LoginTaskEnqueuedInMilliseconds>
          <NetworkWritesInMilliseconds>0</NetworkWritesInMilliseconds>
          <NetworkReadsInMilliseconds>0</NetworkReadsInMilliseconds>
          <SslProcessingInMilliseconds>0</SslProcessingInMilliseconds>
          <SspiProcessingInMilliseconds>0</SspiProcessingInMilliseconds>
          <LoginTriggerAndResourceGovernorProcessingInMilliseconds>0</LoginTriggerAndResourceGovernorProcessingInMilliseconds>
        </LoginTimers>
      </ConnectivityTraceRecord>
      <Stack>
        <frame id="0">0X000000000206C45B</frame>
        <frame id="1">0X0000000002069246</frame>
        <frame id="2">0X000000000206DECE</frame>
        <frame id="3">0X0000000001325B1C</frame>
        <frame id="4">0X0000000000D20758</frame>
        <frame id="5">0X0000000000CDB450</frame>
        <frame id="6">0X0000000000CDB116</frame>
        <frame id="7">0X0000000000CDAF5B</frame>
        <frame id="8">0X0000000000E144FA</frame>
        <frame id="9">0X0000000000E147DD</frame>
        <frame id="10">0X000000000125C0CD</frame>
        <frame id="11">0X0000000000E153D2</frame>
        <frame id="12">0X0000000074A037D7</frame>
        <frame id="13">0X0000000074A03894</frame>
        <frame id="14">0X000000007741652D</frame>
        <frame id="15">0X000000007754C521</frame>
      </Stack>
    </Record>

     

    In brief:

    <SniConsumerError>17830</SniConsumerError>
    <TdsInputBufferError>10054</TdsInputBufferError>

    Errors 17830 and 10054 occur repeatedly. I am not sure whether this is due to the network card. After the server is rebooted, traffic takes time to build up, so I am not sure whether it is related to high NIC usage.
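
    For anyone who wants to check how uniform the entries really are, here is a minimal Python sketch that pulls the triage fields out of each record and tallies them per remote host. It assumes the records have already been exported from the ring buffer as XML strings; the function names (summarize, tally) and the "reset_before_login_bytes" flag are illustrative, not part of any SQL Server tooling. The code descriptions are from memory: 10054 is the Winsock error WSAECONNRESET, and 17830 is the SQL Server error logged when a network error occurs while a connection is being established. Reading "10054 with zero input bytes and zero login time" as "the connection was reset before any login bytes were read" is one plausible interpretation, not a confirmed diagnosis.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Meanings of the two codes seen in the record above (described from memory).
KNOWN_CODES = {
    "17830": "SQL Server: network error while establishing a connection",
    "10054": "Winsock WSAECONNRESET: connection reset by the remote host",
}


def summarize(record_xml: str) -> dict:
    """Extract the fields useful for triage from one <Record> element."""
    trace = ET.fromstring(record_xml).find("ConnectivityTraceRecord")
    tds_error = trace.findtext("TdsBuffersInformation/TdsInputBufferError")
    tds_bytes = int(trace.findtext("TdsBuffersInformation/TdsInputBufferBytes"))
    total_ms = int(trace.findtext("LoginTimers/TotalLoginTimeInMilliseconds"))
    return {
        "remote_host": trace.findtext("RemoteHost"),
        "sni_consumer_error": trace.findtext("SniConsumerError"),
        "tds_input_error": tds_error,
        "tds_input_bytes": tds_bytes,
        "total_login_ms": total_ms,
        # Hypothetical flag: reset seen, nothing read, no login time recorded.
        "reset_before_login_bytes": (
            tds_error == "10054" and tds_bytes == 0 and total_ms == 0
        ),
    }


def tally(records) -> Counter:
    """Count records per (remote host, SNI error, TDS input error)."""
    return Counter(
        (s["remote_host"], s["sni_consumer_error"], s["tds_input_error"])
        for s in map(summarize, records)
    )


# Trimmed copy of the record posted above (Stack frames and unused fields omitted).
SAMPLE = """<Record id="634" type="RING_BUFFER_CONNECTIVITY" time="11674052938">
  <ConnectivityTraceRecord>
    <RecordType>LoginTimers</RecordType>
    <SniConsumerError>17830</SniConsumerError>
    <RemoteHost>10.133.0.240</RemoteHost>
    <TdsBuffersInformation>
      <TdsInputBufferError>10054</TdsInputBufferError>
      <TdsOutputBufferError>0</TdsOutputBufferError>
      <TdsInputBufferBytes>0</TdsInputBufferBytes>
    </TdsBuffersInformation>
    <LoginTimers>
      <TotalLoginTimeInMilliseconds>0</TotalLoginTimeInMilliseconds>
    </LoginTimers>
  </ConnectivityTraceRecord>
</Record>"""

summary = summarize(SAMPLE)
print(summary)
print(KNOWN_CODES.get(summary["sni_consumer_error"], "unknown code"))
print(tally([SAMPLE]))
```

    If the tally shows the same (host, 17830, 10054) tuple dominating, that points at the path between that particular client and the server rather than at the database engine as a whole.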

    On my Windows Server 2008 machine, TCP Chimney is set to 'automatic', in case this information is needed.

    Kindly help me troubleshoot this further.

    Thanks

    Chandan Jha

    Thursday, October 31, 2013 1:10 PM

All replies

  • Can someone please reply?

    Thanks

    Chandan

    Thursday, October 31, 2013 7:26 PM
  • Unfortunately, that is a generic Java error and does not help diagnose the issue in any way. There are hundreds of possible causes for this error, most of which have nothing to do with a problem in SQL Server. It is impossible to guess why you are getting login timeouts.

    I would start here:

    http://www.cubrid.org/blog/dev-platform/understanding-jdbc-internals-and-timeout-configuration/

    Thursday, October 31, 2013 8:27 PM
  • Try disabling the TCP Chimney Offload feature.

    1. Launch regedit.exe
    2. Edit DWORD EnableTCPChimney under HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters, set data value to 0
    3. Edit DWORD EnableRSS under HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters, set data value to 0
    4. Edit DWORD EnableTCPA under HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters, set data value to 0
    5. Restart the server
    Thursday, October 31, 2013 10:31 PM
  • Thanks for your remarks. I agree that it may not be a DB engine issue at all, but I am more worried about the errors I am seeing in the connectivity ring buffer, and there is very little documentation on how to interpret those results.

    Friday, November 01, 2013 6:06 AM
  • Thanks for your answer. I had read about disabling this feature, which is why I mentioned its configuration on my server when asking the question, even though I am not sure what the feature does or why it has to be disabled. I can try disabling it, but any change needs to be backed by some proof when I seek approval from management :-)

    Friday, November 01, 2013 6:09 AM
  • The TCP chimney is only used in specific circumstances, and only if your network card supports it. It has generally been linked to network issues on Windows Server 2008: it results in dropped packets and retries, which your network admin should be able to see.

    To see if it is even being used, please see the section "How to determine whether TCP Chimney Offload is working" here:

    http://support.microsoft.com/kb/951037

    It is unclear whether this is your problem. However, turning it off (which does not require a reboot) should not cause any problems, and you can then see whether the problem recurs.
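
    As a rough way to get evidence before changing anything, the check can be scripted. The sketch below assumes that on the Windows server `netstat -t` prints one row per TCP connection with five whitespace-separated columns (Proto, Local Address, Foreign Address, State, Offload State), and it simply counts the values in the last column. The function name and the sample text are illustrative; the sample rows are made up to exercise the parser and are not output captured from the poster's server.

```python
from collections import Counter


def count_offload_states(netstat_output: str) -> Counter:
    """Count TCP connections per offload state in `netstat -t` style text."""
    states = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Assumed row layout: Proto, Local Address, Foreign Address,
        # State, Offload State. Header and blank lines are skipped.
        if len(fields) == 5 and fields[0] == "TCP":
            states[fields[4]] += 1
    return states


# Made-up sample text, for illustration only.
EXAMPLE = """
  Proto  Local Address       Foreign Address      State        Offload State
  TCP    10.136.0.175:1433   10.133.0.240:29853   ESTABLISHED  InHost
  TCP    10.136.0.175:1433   10.133.0.240:29854   ESTABLISHED  InHost
"""

print(count_offload_states(EXAMPLE))
```

    On the server itself, the input would come from something like subprocess.run(["netstat", "-t"], capture_output=True, text=True).stdout. If no connection is reported as offloaded, the chimney feature is probably not in play for these logins.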

    Friday, November 01, 2013 1:37 PM
  • I will go through the documentation and experiment a little with the chimney feature. But I am really surprised that, although the connectivity ring buffer was introduced long ago, the in-depth documentation needed to analyse its results is still missing. I can see a lot of TdsInputBuffer errors, but for now I am living with them. I do not have a network expert on my team, and nobody would want to replace a NIC without solid proof.

    Thanks

    Chandan

    Saturday, November 02, 2013 7:18 AM