none
SS2005 to SS2008 upgrade: sqlservr crashes with various Access Violations when querying Linked Server

    Domanda

  • Environment

    We have everything in production on Windows Server 2003, SS 2005 and a Third party ODBC driver for a legacy, ETL process that issues OPENQUERY selects against a Linked Server.

     

    We are upgrading to Windows Server 2008 R2, SS 2008 and have the following environment:

    Windows Server 2008 R2 Standard – Service Pack 1 (64-bit) (6.1.7601)

    Windows Firewall is disabled.

    No third party antivirus or anti-spyware software is installed.

     

    Third party 32-bit ODBC Driver with a 32-bit System DSN configured as a Linked Server

    HWEDACSP is the configured DSN connecting to a remote server.

     

    SQL Server 2008, 32 Standard SP3 CU 3

    The server is running as LocalSystem.

    Right now only service running is MSSQLSERVER.

    Shared Memory, Named Pipes and TCP/IP are configured with Configuration Manager and no aliases are defined.

    The SPNs are registered appropriately at start-up

    [MSSQLSvc/FASDMAIN.xxx.xxx]

    [MSSQLSvc/FASDMAIN.xxx.xxx:1433]

    SQL server and the DSN server are in separate domains.

    The Linked server is configured as:

     

    EXEC master.dbo.sp_addlinkedserver @server = N’LIVE’, @srvproduct=N'iWay 7.1 ODBC Driver', @provider=N'MSDASQL', @datasrc=N'HWEDACSP', @provstr=N'MSDASQL', @catalog=N'TEST'

     

    EXEC master.dbo.sp_addlinkedsrvlogin @rmtsrvname=N’LIVE’,@useself=N'False',@locallogin=NULL,@rmtuser=N'########',@rmtpassword='########'

     

    Data Access, RPC and RPC OUT are enabled.

    Connect Timeout is set to 0.

    Query Timeout is set to 45.

    The MSDASQL provider is configured with “Allow inprocess” checked, everything else is off.

    We can ping the DSN’s IP address and hostname.

    We can telnet to the IP address or hostname.

    All logins to SQL Server during testing are as sa.

    Problem

    When issuing OPENQUERY statements against the Linked Server from SSMS, Wireshark shows communication is established to HWEDACSP and it responds with data. So, there isn’t an authentication or authorization problem. Now, here’s where the fun begins. After sending several queries and receiving data, SQL Server crashes with messages like the following in exception.log before the script completes:

     

    03/20/12 15:04:16 spid 52 Exception 0xc0000005 EXCEPTION_ACCESS_VIOLATION writing address 0724C000 at 0x76518DD2

     

    SQL Server does not produce a SQLDump.

    Nothing is recorded in ERRORLOG.

     

    Note that while debugging this issue the behavior did vary some based on the number of OPENQUERY statements. Running small scripts can succeed, even if they are executed multiple times. Long scripts that contain multiple copies of an original script can fail at the same location as the original or later. When sqlservr fails varies, but I can pretty easily get it to fail.

     

    With Debug Diagnostics Tool 1.2, we configured a crash rule for sqlserv.exe to watch for First Chance Exceptions on Access Violations. The configuration captures up to 10 Full Userdumps.

     

    Lately, when the error occurs, we receive the 10 First Chance dumps followed by a Second Chance exception (C0000374) on a strcat() in the Third Party ODBC driver.

     

    The First Chance exceptions are like:

     

    rpcrt4!NDRCContextBinding+13 in C:\Windows\SysWOW64\rpcrt4.dll from Microsoft Corporation has caused an access violation exception (0xC0000005) when trying to read from memory location 0x00000003 on thread 46

     

    The First Chance dump stack traces are like:

     

    Function

    rpcrt4!NDRCContextBinding+13

    rpcrt4!RpcSsDestroyClientContext+30

    sspicli!DisconnectRpcConnection+62

    sqlservr!CSECAutoImpersonateForDQ::~CSECAutoImpersonateForDQ+101

    sqlservr!COledbConnect::CreateSession+12fc

    sqlservr!COledbConnect::GetSession+325

    sqlservr!COledbRangeRowset::GatherRmtSchema+17a

    sqlservr!COledbRangeRowset::DoNormalize+40

    sqlservr!COledbRange::Normalize+59

    sqlservr!CAlgTableMetadata::FPartialBind+575

    sqlservr!CAlgTableMetadata::Bind+22f

    Etc.

    ntdll!_RtlUserThreadStart+1b

     

     

    The Second Chance exception is:

     

    ntdll!RtlReportCriticalFailure+57 in C:\Windows\SysWOW64\ntdll.dll from Microsoft Corporation has caused an unknown exception (0xc0000374) on thread 38

     

    The Second Chance dump stack trace is like:

     

    This thread is blocked by an unhandled exception

     

    Function

    ntdll!RtlReportCriticalFailure+57

    Etc.

    ntdll!RtlDispatchException+127

    ntdll!KiUserExceptionDispatcher+f

    msvcrt!strcat+89

    edaodb3x!SQLPutData+18b

    odbc32!ToAnsi_SQLPrepare+91

    odbc32!TraceSQLPrepareW+27a

    odbc32!SQLPrepareW+7d

    msdasql!CHstmtNode::Prepare+56

    msdasql!CImpICommandPrepare::Prepare+32e

    sqlservr!CallProtectorImpl::CallWithSEH<IWrapInterface<ICommandPrepare>::CallTraitsFpuPre,long,MethodCallBinder_1<long,ICommandPrepare,long (__stdcall ICommandPrepare::*)(unsigned long),unsigned long> const >+54

    sqlservr!CallProtectorImpl::CallExternalFull<IWrapInterface<ICommandPrepare>::CallTraitsFpuPre,long,MethodCallBinder_1<long,ICommandPrepare,long (__stdcall ICommandPrepare::*)(unsigned long),unsigned long> const >+bb

    sqlservr!IWrapInterface<ICommandPrepare>::HrCallExternal<MethodCallBinder_1<long,ICommandPrepare,long (__stdcall ICommandPrepare::*)(unsigned long),unsigned long> >+40

    sqlservr!CWrapICommandPrepare::Prepare+e2

    sqlservr!COledbRangeRowset::GatherColumnSchemaFromQuery+97

    Etc.

    ntdll!_RtlUserThreadStart+1b



    Recovered stack for thread 38

     

    Function

    msvcrt!strcat+89

    edaodb3x!SQLPutData+18b

    odbc32!ToAnsi_SQLPrepare+91

    odbc32!TraceSQLPrepareW+27a

    odbc32!SQLPrepareW+7d

    Etc.

    ntdll!_RtlUserThreadStart+1b

     

    For the stack traces above, SQL Server Tracing shows that the last successful OLEDB Call Event to the MSDASQL Provider was the Method IDBProperties::GetProperties.

     

    The first, First Chance exception for this run was at 09_33_41AM__913. This is before the Third parties ODBC trace utility and Wireshark show as the beginning of the communication. The SYN is at 09:33:45. This seems to correspond to about the third or fourth First Chance exception.

     

    The last, First Chance exception for this run was at 09_33_56AM__878.This seems to correspond to the second IDBCreateSession::CreateSession call in the SQL Server trace.

     

    The Second Chance exception for this run was at 09_34_01AM__266. From the Third parties ODBC trace utility log and Wireshark capture; twenty statements were executed from the script in this timeframe.

     

    The Third parties ODBC trace utility stops a bit earlier in the communication. It stops after the last query was sent. But, it does not receive the reply.

     

    Wireshark shows our server, FASDMAIN (XX.XX.XX.47), sending the last query; and then it receives a reply from HWEDACSP (XX.XX.XX.194). FASDMAIN sends the appropriate ACK and then a RST to close the connection:

     

    1624           15:50:11.740065            XX.XX.XX.47    XX.XX.XX.194  TCP      263       55790 > 49221 [PSH, ACK] Seq=5332 Ack=9886 Win=65280 Len=209

    1625           15:50:11.778581            XX.XX.XX.194  XX.XX.XX.47    TCP      352       49221 > 55790 [PSH, ACK] Seq=9886 Ack=5541 Win=130848 Len=298

    1630           15:50:11.979968            XX.XX.XX.47    XX.XX.XX.194  TCP      54         55790 > 49221 [ACK] Seq=5541 Ack=10184 Win=65024 Len=0

    1649           15:50:12.760122            XX.XX.XX.47    XX.XX.XX.194  TCP      54         55790 > 49221 [RST, ACK] Seq=5541 Ack=10184 Win=0 Len=0

     

    It looks like the crash is happening between ids 1624 and 1625. During this time, there is no other communication from FASDMAIN on the wire.

     

    On this run the statement that was last sent is:

     

                      SELECT

                            SUBSTRING(DESC_TBLE_KEYINFO,1,1),

                            TITLE_DESC_TBLE

                      FROMOPENQUERY(LIVE,

                            'SELECT

                                  DESC_TBLE_KEYINFO,

                                  TITLE_DESC_TBLE

                             FROM XXXXXX.DESCTSTA

                             WHERE TBLE_ID=''24''')

     

    The next statement would be:

     

                      EXECUTE ('SELECT TBLE_ID FROM OPENQUERY(LIVE, '+

                                  '''SELECT TBLE_ID '+

                                  'FROM XXXXXX.DESCTSTA

                                   WHERE TBLE_ID=''''25'''' '')')   

     

    If you isolate the query, of course, it executes fine.

     

    You can test the Linked Server, and it will succeed. If might throw a few First Chance Exceptions; but the server stays up.

     

     

    The Second Chance Exception is clearly in the Third party ODBC Driver. Not much can be done about it without a new version from the vendor.

     

    Note: I’ve actually seen the server go down without capturing a Second Chance exception!

     

    The First Chance Exception is different. SS’s OLE DB is trying to create a session and SSPI seems to disconnect the effort.  So, why does this occur?

     

    My understanding is that SSPI is used by SS for the login. But we’re way past the authentication and authorization phase in the communication with HWEDACSP. 20 – 30 queries are executed against HWEDACSP before SS dies. The Linked Server is not setup to use Kerberos; it uses a simple Challenge/Response.

     

    Along the way, we’ve tried/done:

    Applied the latest service packs we could find

    (2637692) = RPC threads may stop responding = has 6.1.7601.21849

    (2619234) = RCP over HTTP = has 6.1.7601.21855

    Updated the Network Drivers

    Verified that the network TCP Chimney, RSS and TCPA configurations didn’t apply

    Verified TCP/IP Offload didn’t apply

    Verified enabling Service Broker didn’t apply

    Verified DEP isn’t getting in the way

    Hit the books/postings

     

    Any input would be greatly appreciated!

    lunedì 2 aprile 2012 17:28

Risposte

  • Hi Brian Freeman,

    >>With the Second Chance exception clearly in the Third Party's driver, I was thinking that it was the cause.

    Thank you for your update. Based on my research if you have setup a linked server by using the MSDASQL provider, the issue you have might be caused by a 3rd party DLL. There was an access violation of a call from pgoe1023.dll in this case.

    Exception Access Violation in SQL, please refer to here.

    Please follow a workaround to run this DLL outside of the SQL Server process, for more information please refer to this blog: OLE DB provider “MSDASQL” - The provider reported an unexpected catastrophic failure

    If you are using SSIS, any problem you have please link to SQL Server Integration Services forums 


    Regards, Amber zhang

    • Contrassegnato come risposta Brian Freeman lunedì 16 aprile 2012 16:15
    mercoledì 11 aprile 2012 05:54
  • Hi,

    Here's an update/recap.

    The solution wound up being a combination of configuring the "iWay 7.1 ODBC Driver" to use Cautious Memory Allocation and running the MSDASQL driver out of process.

    For our configuration, merging the following registry file (.reg) configured the driver:

    Windows Registry Editor Version 5.00 ODBC_OPTB bit: 00000000000000000000000100000000 = Cautious Memory Allocation

    [HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\ODBC\ODBC.INI\HWEDACSP] "ODBC_OPTB"="01000000000001000000000100000000" [HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\ODBC\ODBC.INI\HWEDACST] "ODBC_OPTB"="01000000000001000000000100000000" [HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\ODBC\ODBCINST.INI\EDAODBC] "ODBC_OPTB"="01000000000001000000000100000000"

    The HWEDACSP and HWEDACST entries are the servers our DSNs point too. The setting on EDAODBC did the trick.

    We are now stable on both SS 2008 x86 and SS 2012 x86.

    Thanks again Amber,

    Brian



    lunedì 16 aprile 2012 16:13

Tutte le risposte

  • Hi Brain Freeman,

    Regarding to your description and some error message you provided, the exception seems to be due to an operation which the ODBC driver has initiated.  According to the RPC driver (Rpcrt4.dll) exception you provided, which might be related to an RPC call that involves the callback functions incorrectly and then caused by a memory leak on the RPC client. Additionally, an access violation occurs when the RPC client tries to use the buffer again.
    For more information and  Resolution, please refer to these KB as below:
    1. Access Violation in RPCRT4.DLL When Pickling Buffered RPC Data
    2. An access violation exception occurs in the "NdrClientCall2" function in Windows Vista or in Windows Server 2008

    According the First-chance exception, there are lots of things that can cause this but not actually bugs, it is a just a first notification form Debugger, you maybe deal with this issue in the latter code. Please refer to this KB article about First-chance exception:

    Meanwhile you can also skip this Access Violation exception in the debugger by unchecking Thrown checkbox under Tools > Exceptions > Win32 Exceptions > c000005 Access Violation. And the access vialation will be not thrown again.

    For more information, please refer to  these threads:
    1. NTDLL.dll access violation
    2. First-chance exception at 0x7c9194fe (ntdll.dll) in my application

    As you mentioned there is no SQL error logged, according to the exception which might be related to IIS issue. Please link to this forums.


    Regards, Amber zhang

    mercoledì 4 aprile 2012 06:26
  • Thanks Amber!

    I'll read through the references you provided.

    And yes, the First Chance exception might not be an issue. I'm hoping to get Reflector or similar in here to look at the code; but where I'm at blocks the download. I'll have to work around that ;)

    With the Second Chance exception clearly in the Third Party's driver, I was thinking that it was the cause. Hopefully negotiations with them will progress.

    I'll follow up with any updates, and thanks again!

    Brian

    mercoledì 4 aprile 2012 15:23
  • I've been able to debug the issue a bit more and it is clear to me that the Third party driver, MSDASQL and SS 2008 are not happy together.

    I'm still confused as to why when MSDASQL is trying to create a session, SSPI seems to disconnect the effort; but I am willing to abandon it given additional findings:

    I  started looking at the DLLs and ultimately comparing ODBC support in MSDASQL and the Third Party driver. Along the way I found reference to a “Cautious Memory Allocation” option in their UNIX documentation. The Windows documentation shows it grayed out in the options dialog. Go figure.

    A long story short, after hacking the Registry settings there are no crashes for scripts that transfer a "small" amount of data. SS stays up and all is well. The First Chance Access Violations in RPCRT4 do happen; but only on the initial connection to HWEDACSP after a SS restart. For successive queries/scripts they do not occur.

    For scripts that transfer larger amounts of data, 25,000 rows and up; SS is able to catch the driver dieing. Most of the dumps have been Access Violations. There was even one EXCEPTION_PRIV_INSTRUCTION caused by the driver in sqlserv parser::checkIntegerStr.

    Interestingly, with the Third Parties ODBC Test tool, all of these queries succeed - even when ran through the same session. At this point, I'm not sure if the test tool is going through the DSN or bypassing it and ODBC all together. The breakdown does not appear to be at the driver/OS level. It is somewhere in the Driver/ODBC provider/SS communication.

    The driver is Copyright 2001 and last built in 2007. It requires MDAC 2.1 and we're running it against MDAC 2.82.3959.0 successfully on Server 2003.

    I don't know of anything else I can try on the ODBC Provider/SS side of things, any ideas?

    One thought I had was to go up to 2012. I've heard that SSIS has a different ODBC code-base. Can anyone confirm that?

    Thanks,

    Brian

    lunedì 9 aprile 2012 18:18
  • Hi Brian Freeman,

    >> Windows Server 2008 R2 Standard – Service Pack 1 (64-bit) (6.1.7601)
    >>Third party 32-bit ODBC Driver with a 32-bit System DSN configured as a Linked Server
    >>Third party driver, MSDASQL and SS 2008 are not happy together

    Regarding to your description, it seems that 64-bit SQL Server cannot connect to a linked server via a 32-bit ODBC driver.
    Please refer to the following article which discusses how to connect to a SQL server on 64-bit SQL Server in detail.

    Alternatively, you need to do is download a 64-bit ODBC driver and connect via 64-bit MSDASQL. You can download from here: 64-Bit OLEDB Provider for ODBC (MSDASQL)

    Based on my research 64-bit OLEDB Provider for ODBC (MSDASQL) Is Now Available For Windows Server 2003, for more information please read here.


    Regards, Amber zhang

    martedì 10 aprile 2012 08:09
  • Thanks Amber,

    I've gone through those articles and don't believe they apply with this configuration.

    We're running:

    >> SQL Server 2008, 32 Standard SP3 CU 3

    So, it's 32 bit SS to 32 bit MSDASQL to 32 bit driver. We ran into the 64 bit SS to 32 bit driver issue and they dropped down to 32 bit SS.  Ultimately, I'd like to see SS 2012 64 bit and SSIS 2012 to do the ETL; but we're here now...

    I've been trying to confirm that we're picking up the right MSDASQL libraries. They're down in: C:\Program Files (x86)\Common Files\System\Ole DB, right?

    I need more caffeine, first ;)

    martedì 10 aprile 2012 15:56
  • Ok, got swamped there for a bit.


    From a previous crash, it looks like we're picking up the right files.

    LoadedModule[88]=C:\Program Files (x86)\Common Files\System\Ole DB\msdasql.dll
    LoadedModule[90]=C:\Program Files (x86)\Common Files\System\Ole DB\MSDASQLR.DLL

    martedì 10 aprile 2012 22:06
  • Hi Brian Freeman,

    >>With the Second Chance exception clearly in the Third Party's driver, I was thinking that it was the cause.

    Thank you for your update. Based on my research if you have setup a linked server by using the MSDASQL provider, the issue you have might be caused by a 3rd party DLL. There was an access violation of a call from pgoe1023.dll in this case.

    Exception Access Violation in SQL, please refer to here.

    Please follow a workaround to run this DLL outside of the SQL Server process, for more information please refer to this blog: OLE DB provider “MSDASQL” - The provider reported an unexpected catastrophic failure

    If you are using SSIS, any problem you have please link to SQL Server Integration Services forums 


    Regards, Amber zhang

    • Contrassegnato come risposta Brian Freeman lunedì 16 aprile 2012 16:15
    mercoledì 11 aprile 2012 05:54
  • Thanks Amber!

    That might have just done it!

    I hadn't tried operating MSDASQL out of process since adding the “Cautious Memory Allocation” option to the driver. I just tried and was able to run through our complete ETL process!

    BTW: Yes, the Access Violation's were occurring in edaodb3x, one of the Third Parties DLLs. With the addition of the “Cautious Memory Allocation” option to their driver, SQL Server was staying up and catching the violations, catastrophic failures and the like.

    We did receive an update from the vendor late yesterday; but looking at the versions on the files, there hasn't been an update to this drivers DLLs. The update is simply a new build.

    Thanks for the push to try it out of process again.

    I'm off to do more testing.

    Brian

    mercoledì 11 aprile 2012 16:23
  • Hi,

    Here's an update/recap.

    The solution wound up being a combination of configuring the "iWay 7.1 ODBC Driver" to use Cautious Memory Allocation and running the MSDASQL driver out of process.

    For our configuration, merging the following registry file (.reg) configured the driver:

    Windows Registry Editor Version 5.00 ODBC_OPTB bit: 00000000000000000000000100000000 = Cautious Memory Allocation

    [HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\ODBC\ODBC.INI\HWEDACSP] "ODBC_OPTB"="01000000000001000000000100000000" [HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\ODBC\ODBC.INI\HWEDACST] "ODBC_OPTB"="01000000000001000000000100000000" [HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\ODBC\ODBCINST.INI\EDAODBC] "ODBC_OPTB"="01000000000001000000000100000000"

    The HWEDACSP and HWEDACST entries are the servers our DSNs point too. The setting on EDAODBC did the trick.

    We are now stable on both SS 2008 x86 and SS 2012 x86.

    Thanks again Amber,

    Brian



    lunedì 16 aprile 2012 16:13