none
"Failed to read BLOB column" | "Failed to read column data" errors by Snapshot Agent

    Question

  • Hi all,

    I have (or had, rather) the following tranasctional replication topology:

    Publisher 

    • SQL Server 2008 R2 Enterprise Edition (64-bit) (10.50.1777)
    • 16 logical cores
    • 98GB RAM
    • Windows 2008 R2 Enterprise Edition (Service Pack 1)

    Distributor/Subscriber

    • SQL Server 2008 R2 Standard Edition (64-bit) (10.50.2789)
    • 4 vCPUs (VMware vSphere 4.1)
    • 16GB RAM
    • Windows 2008 R2 Datacenter Edition (Service Pack 1)

    Published Database

    • OLTP Database
    • 400GB in size
    • FULL recovery model
    • One Publication with around 65 articles (all tables with the exception of 1 view)

    Subscribed Database

    • SIMPLE recovery model

    Purpose is to offload the SAS reports from the OLTP environment.

    Initially I set this up with 60 articles and EVERYTHING WORKED FINE. No problems whatsoever. (Some of the replicated tables were 10GBs in size.)

    Business then identified an additional 5 tables that they needed to be replicated. So I modified the publication to reflect the new 5 tables and tried to reinitialise the subscription and perform a manual snapshot on the weekend. (One of the tables has 1.6 million records, the rest of them have only 100-200 records in them.)

    And this when the problems started.

    Basically the snaphot would not complete with Snapshot Agent reporting "Failed to read BLOB column" errors (on the tblAccountMedia table which is 64GB in size). This table was part of the initial publication where everything was working.

    In any case I decided to completely remove replication and set it up again from scratch. So I removed the distributor, publisher and subscriber roles as well as the publication and distribution database.

    So when I tried to setup transactional replication again I get the following errors from the Snapshot Agent (which I run manually):

    • "Failed to read BLOB column" 

    As a complete "stab in the dark" I reconfigured the Snapshot Agent's profile to:

    • -BcpBatchSize: 1000000 (Default is 100000)
    • -HistoryVerboseLevel: 3 (Default is 2)
    • -QueryTimeOut: 18000 (Default is 1800)

    So now I am getting the following errors:

    • "Failed to read BLOB column"
    • "Failed to read column data" (This is new, but perhaps it's a reflect of the -HistoryVerboseLevel switch.)

    Specifically the errors are:

    Error messages:

    Message: Failed to read BLOB column
    Stack: at Microsoft.SqlServer.Replication.Snapshot.SqlServer.NativeBcpOutProvider.ThrowNativeBcpOutException(CConnection* pNativeConnectionWrapper)
    at Microsoft.SqlServer.Replication.Snapshot.SqlServer.NativeBcpOutProvider.BcpOut(String strBcpObjectName, String strBcpObjectOwner, String strBaseBcpObjectName, Boolean fUnicodeConversion, String strDataFile, String strLoadOrderingHint, String strWhereClause, Boolean useTableLockHint, Int32 bcpFileFormatVersion)
    at Microsoft.SqlServer.Replication.Snapshot.SqlServer.BcpOutThreadProvider.DoWork(WorkItem workItem)
    at Microsoft.SqlServer.Replication.WorkerThread.NonExceptionBasedAgentThreadProc()
    at Microsoft.SqlServer.Replication.AgentCore.BaseAgentThread.AgentThreadProcWrapper() (Source: MSSQLServer, Error number: 0)
    Get help: http://help/0

    Message: Protocol error in TDS stream
    Stack: (Source: MSSQLServer, Error number: 0)
    Get help: http://help/0

    Message: Protocol error in TDS stream
    Stack: (Source: MSSQLServer, Error number: 0)
    Get help: http://help/0

    Message: Protocol error in TDS stream
    Stack: (Source: MSSQLServer, Error number: 0)
    Get help: http://help/0

    Message: Unspecified error
    Stack: (Source: MSSQLServer, Error number: 0)
    Get help: http://help/0

     And:

    Error messages:

    Message: Failed to read column data
    Stack: at Microsoft.SqlServer.Replication.Snapshot.SqlServer.NativeBcpOutProvider.ThrowNativeBcpOutException(CConnection* pNativeConnectionWrapper)
    at Microsoft.SqlServer.Replication.Snapshot.SqlServer.NativeBcpOutProvider.BcpOut(String strBcpObjectName, String strBcpObjectOwner, String strBaseBcpObjectName, Boolean fUnicodeConversion, String strDataFile, String strLoadOrderingHint, String strWhereClause, Boolean useTableLockHint, Int32 bcpFileFormatVersion)
    at Microsoft.SqlServer.Replication.Snapshot.SqlServer.BcpOutThreadProvider.DoWork(WorkItem workItem)
    at Microsoft.SqlServer.Replication.WorkerThread.NonExceptionBasedAgentThreadProc()
    at Microsoft.SqlServer.Replication.AgentCore.BaseAgentThread.AgentThreadProcWrapper() (Source: MSSQLServer, Error number: 0)
    Get help: http://help/0

    Message: Protocol error in TDS stream
    Stack: (Source: MSSQLServer, Error number: 0)
    Get help: http://help/0
    Message: Protocol error in TDS stream
    Stack: (Source: MSSQLServer, Error number: 0)
    Get help: http://help/0
    Message: Protocol error in TDS stream
    Stack: (Source: MSSQLServer, Error number: 0)
    Get help: http://help/0
    Message: Failed to read from server
    Stack: (Source: MSSQLServer, Error number: 0)
    Get help: http://help/0

    Does anyone have any suggestions on what to do here?

    (Here are some pretty pictures...)

     


    Monday, November 14, 2011 2:19 AM

All replies

  • Hi SQL Ranger,

    At first glance it looks like it is failing during the BCP process.

    We need to enable verbose agent logging for the Snapshot Agent to determine exactly where it is failing.  For the Snapshot Agent job, add the parameters -OutputVerboseLevel 2 -Output C:\TEMP\snapshotagent.log.  Then re-run the Snapshot Agent and collect the log.

    Also, are you able to manually bcp out the 5 tables you added to your publication by any chance?


    Hope this helps.
    www.sqlrepl.com

    Monday, November 14, 2011 2:46 AM
  • Thanks,

    Yes, that would seem to be the case that BCP is the culprit.

    It seems to be failing for tables that did work in the past.

    Not for the new tables. In fact I just BCP'ed those 5 tables fine. They are tiny.

    So I have reconfigured the job as you suggested. Will have to wait until after-hours to run it, as it blocks processes during business hours.

    There is a possibility that this is related to a bug fix in SQL Server 2008 R2 RTM Cumulative Update 10 (Build 10.50.1807) (http://support.microsoft.com/kb/2591746)

    Specifically I am assuming "FIX: High memory usage when you run Replication Snapshot Agent in SQL Server 2008 or in SQL Server 2008 R2" (http://support.microsoft.com/kb/2606301)

    But this will not be ported to SQL Server 2008 R2 SP1 Cumulative Update 4 until mid-December.

    Meanwhile I need to exhaust other avenues in case this does not solve our problem, or it is unrelated.

    I will report the output of the Snapshot Agent when it is generated.

    Monday, November 14, 2011 3:41 AM
  • Right.  I have also seen this Connect item as well and was wondering if it was at all related.

    The OP for the Connect item mentioned that he had some success after several retries.

    If still no luck, one option would be to apply SP2, then CU6, if you're able to.  You may want to confirm with CSS that this is the issue you're facing before going this route.


    Hope this helps.
    www.sqlrepl.com

    Monday, November 14, 2011 3:51 AM
  •  

    Thanks again Brandon.


    I am learning very quickly about, what I perceive to be, the“flakiness” replication tools. (It would be good if all the replication tools offered the capability of generating the T-SQL, as they have “hung” a number of times.)

     

    Have gotten some very weird error messages which were COMPLETELY unrelated to the root cause of the problem.

     

    In any case, I did manage to get this to work last night.

     

    So the Snapshot Agent did generate the schema and BCP files for all 65 tables (120GB in total).

     

    However, for whatever reason only 37 of the tables have been replicated at the subscriber database.

     

    How this happened I do not know! Replication thinks everything is fine!


    My battle with Replication continues… 

    Tuesday, November 15, 2011 7:59 AM
  • Hi all,

    By way of keeping people in the loop of what has happened to help out people in the future this is what has happened:

    The next night I completely uninstalled the Replication architecture and set it again from scratch.

    This time round the snapshot worked on the 4th retry.

    So, sure... my problem might be related to:

    • Bug in product that will be fixed in SQL Server 2008 R2 SP1 CU4
    • Disk subsystem on VMware environment not being quick enough for 120GB snapshot to be generated
    • Network I/O problems (very doubtful in my mind)

    But I am suspecting more of a contention problem. BCP was not able to complete due to a locking problem, timed out and through this generic error.

    So everything seems to be working fine. Except one thing...

    The "Undistributed Commands" tab for the subscription is growing even thought everything is being replicated fine. So I need to investigate that further. Assume it's related to the "Immediate_Sync" option of the publication....

     

     



    Wednesday, November 16, 2011 6:26 AM
  • Hi SQL Ranger,

    I believe a fix for your issue has been released:  http://support.microsoft.com/kb/2606301

    And it has been included in SQL Server 2008 R2 SP1 CU4:  http://support.microsoft.com/kb/2633146
    Brandon Williams (blog | twitter)
    Monday, January 30, 2012 7:15 PM