locked
Bulk Data Transfer resulting in Server Crash. RRS feed

  • Question

  • We have been having this problem since past few weeks.

    When ever we try to copy the data from one server to another (one or both mirrored), the database becomes unresponsive or get extremly slow and all the services on that server crash! This continues to happen 5-10min after the data copy job is finished/ forcefully terminated. After the said 5-10 mins, the database comes back live and all the services resume.

    For example, today, I was trying to copy 2 big tables (700,000+ records each) at one go using the 'SQL Server Import Export Wizard' when the crash occurred! The problem could not be network related as the servers in question are located on a very close network and the network speed is 1 GB/sec!

    In addition, this just doesnt happen while bulk data transfer, but also while running Data Transformation commands on huge data sets - using locks does not help.

    All this seems to be mirroring related, as the above mishap does not occur when the databases are not mirrored or held on the pause mode.

    There is no error recorded in the system logs, but our server monitoring tool 'Whats up Gold' gives out an error code of 80004005

    The following are system configurations:

    *System
    Microsoft Windows Server 2003 R2, Standard X64 edition, Service pack 2.

    *Computer
    Intel (R) Xeon(R) CPU, 2.50 GHz, 32 GB RAM

    * SQL Version:
    Microsoft SQL Server 2005 - 9.00.4035.00 (X64)   Nov 24 2008 16:17:31   Copyright (c) 1988-2005 Microsoft Corporation  Standard Edition (64-bit) on Windows NT 5.2 (Build 3790: Service Pack 2)

    * The server are mirrored with witness.

    *The mirror mode: High safety with automatic failover (synchronus) 

    *The autogrowth on the databases is 1MB, unrestricted growth.

    I have tried looking for an answer, but no avail. Any leads that could lead to the solution will be greatly appreciated. In case you need any additional info to diagnoze the problem, I'll glady provide it.

    Thanks.
    Monday, April 20, 2009 11:54 PM

All replies

  • Mirroring likely isn't the cause of the crash - but it will consume constant resources on the server, so it may contribute to the resource drought you're seeing when you copy data (especially given synchronous mirroring).

    A couple of questions:
      - Are your data AND LOG files sufficiently large to allow the bulk transfer to finish without autogrowing?  Autogrowth is a very expensive operation, and 1MB autogrowth is just far too small.
      - Have you tried batching the transfer?  Say, 50k rows at a time?

    Note that if you "forcefully terminate" the data transfer, the server will continue to experience resource pressure because of the time required to roll back any open transactions.
    Aaron Alton | thehobt.blogspot.com
    Tuesday, April 21, 2009 4:12 AM
  •  "- Are your data AND LOG files sufficiently large to allow the bulk transfer to finish without autogrowing? "

    I am not sure if the current settings are sufficiently large. But, I can give you the settings for one of our mirrored database to give you an idea.

    the .mdf file - 15,952,320 kb
          .ldf   file - 996,544 kb

    In the database properties

          DATA File:
          Initial size = 15579 MB
          Data Autogrowth =  1MB unrestricted

          LOG file:
          Initial size = 974 MB
          Autogrowth = By 10 percent, restricted growth to 2097152

    "Autogrowth is a very expensive operation, and 1MB autogrowth is just far too small."

    On the basis of the info provided above, what do you think, the autogrowth be set to?

    "- Have you tried batching the transfer?  Say, 50k rows at a time?"
    Actually, no. I was in a rush to get the data in.

    I have a qn in this regard.
    Which server faces more overload - the one data is being copied from or the one the data copied into? In other words, which operation is more labour intensive for the server, data pulling or data pushing?

    Also, following is the error log sent be the server monitoring tool, when the said incident occurs.


    --------------------------------------------------------------------
    MonSql:Up=[ MonSql:CPU_MAX(OK) MonSql:MEMORY_MIN(OK)

    MonSql:DISK_READ_MAX(OK) MonSql:DISK_WRITE_MAX(OK)
    MonSql:DISK_FREE_SPACE_MIN(OK) MonSql:SYSTEM_MAX(OK)
    MonSql:MSSQLSERVER(Running) MonSql:SQLServerAgent(Running)
    MonSql:MSDTC(Running) MonSql:OLAP_ENABLED(Running)
    MonSql:INTEGRATION_ENABLED(Running) MonSql:FULLTEXTSEARCH_ENABLED(Running)
    MonSql:BROWSER_ENABLED(Running) MonSql:SQLWRITER_ENABLED(Running)]MonSql:Down=[ MonSql:Alerts(Error=80004005 ErrorMessage=Unspecified error Description=Timeout expired)] MonSql:UpSinceLast=[] MonSql:DownSinceLast=[ MonSql:Alerts(Error=80004005 ErrorMessage=Unspecified error Description=Timeout expired)] MonSql:CPU_MAX=[OK] MonSql:MEMORY_MIN=[OK] MonSql:DISK_READ_MAX=[OK] MonSql:DISK_WRITE_MAX=[OK] MonSql:DISK_FREE_SPACE_MIN=[OK] MonSql:SYSTEM_MAX=[OK] MonSql:BUFFERS_READ_MAX=[Not checked] MonSql:BUFFERS_WRITE_MAX=[Not checked] MonSql:CACHE_HITRATIO_MIN=[Not checked] MonSql:CACHE_USECOUNT_MIN=[Not checked] MonSql:LOCKS_NUMWAIT_MAX=[Not checked] MonSql:LOCKS_AVGWAIT_MAX=[Not checked] MonSql:TRANSACTIONS_ACTIVE_MAX=[Not checked] MonSql:TRANSACTIONS_PERSEC_MAX=[Not checked] MonSql:USERS_LOGINPERSEC_MAX=[Not checked] MonSql:USERS_CONNECTIONS_MAX=[Not checked] MonSql:MSSQLSERVER=[Running] MonSql:SQLServerAgent=[Running] MonSql:MSDTC=[Running] MonSql:MSSearch=[Not checked]

    MonSql:Alerts=[Error=80004005

    ErrorMessage=Unspecified error

    Description=Timeout expired]

    MonSql:OLAP_ENABLED=[Running]

    MonSql:REPORTS_ENABLED=[Not checked]

    MonSql:INTEGRATION_ENABLED=[Running]

    MonSql:FULLTEXTSEARCH_ENABLED=[Running]

    MonSql:BROWSER_ENABLED=[Running]

    MonSql:ADHELPER_ENABLED=[Not checked]

    MonSql:SQLWRITER_ENABLED=[Running]

    MonSql:[SQLServer:Locks \ Lock Requests/sec]=[Not checked] MonSql:[SQLServer:Locks \ Lock Timeouts (timeout > 0)/sec]=[Not checked] MonSql:[SQLServer:Locks \ Lock Timeouts/sec]=[Not checked] MonSql:[SQLServer:Locks \ Lock Wait Time (ms)]=[Not checked] MonSql:[SQLServer:Locks \ Lock Waits/sec]=[Not checked] MonSql:[SQLServer:Locks \ Number of Deadlocks/sec]=[Not checked]
    -------------------------------------------------------------------------

     

     

     

     







     

    Tuesday, April 21, 2009 3:54 PM