locked
DBCC OLEDB Bottleneck? RRS feed

  • Question

  • Is there anything I can do for a OLEDB wait type?  

    After restarting my instance, I started a DBCC checkdb on a 200 GB database that uses the spatial data type.  A couple of tables have 250 million rows.  There are no indexes on computed columns or filtered indexes.  It's been running about 2 hours and the percent complete is about 2%.  The estimated completion date (4 days from now) has been increasing by about and hour a minute.  It might be weeks before it completes.  (I'm using a db restored on my desktop - 6-core Zeon, 16 GB, and 2 SSDs - with SQL 2012 Developer SP2.  I enabled trace flags 2562 and 2549 enabled.  Compatibility on the db was changed to 2012 from 2008.) 

    I looked at the wait stats and see the OLEDB wait type at almost 100%.  When I look at the activity monitor "Resource Waits", the only wait I see is Network IO - about 1 sec per sec.  "Data File IO" is at 0.0 MB/sec with response times normally at 0 ms, but can be 2 ms from time to time.  CPU use is about 10%.  My machine feels idle. 

    An AVG scan recently started and just finish.  It did not seem to slow anything down or change the waits.  But that makes me wonder.  Can a real-time virus scanner be a source of the OLEDB waits?  Would it also show up as network io wait? 

    Any advice that can help me find the bottleneck would be appreciated.  It would great if I could move the bottleneck to CPU or disk. 

    Thanks.


    Randy in Marin

    Thursday, June 19, 2014 8:31 PM

All replies

  • Hi,
    DBCC Checkdb is resource intensive and should be run when system is under minimum load or during maintenance window. But what you said seems strange checkdb does takes time but this much is not acceptable. Is your system heavily accessed when you started Checkdb?

    OLEDB wait types are related to Linked servers I have environment where lot of distributes and Linked server query runs and this wait type was prominent but I don't think Checkdb has anything to do with OLEDB wait type.

    Are you sure no query is accessing when you ran checkdb ? Any query that uses linked server ?


    Please mark this reply as answer if it solved your issue or vote as helpful if it helped so that other forum members can benefit from it.

    My TechNet Wiki Articles

    Thursday, June 19, 2014 9:42 PM
  • My system is not busy with anything else and seems to be almost idle with the dbcc checkdb.  DBCC is all that I run on the instance.  I want my system to be pounded by the dbcc if possible so that it can complete more quickly.  After a few hours, the estimated completion time is now about a week away. 

    I don't have linked servers or a monitor using DMVs that would cause the OLEDB wait.  The OLEDB is a known wait for DBCC Checkdb.  I don't know what to do with it. 

    http://www.sqlskills.com/blogs/paul/wait-statistics-or-please-tell-me-where-it-hurts/

    http://sqlsailor.com/2012/04/05/oledb-wait-and-dbcc-an-interesting-discussion/

    http://www.sqlskills.com/blogs/paul/dbcc-checkdb-performance-and-computed-column-indexes/

    The last link is why I mentioned indexes with computed columns. 


    Randy in Marin

    Thursday, June 19, 2014 11:10 PM
  • BTW, we did try to run this in production, but it would not finish.  Running on my desktop is faster.  I'm able to dedicate all it's resources to the DBCC.  Perhaps I need to look outside of SQL.  If OLEDB is central to DMV usage, then perhaps OLEDB activity is something that SQL can't provide detail about.  Would it be like a monitor that monitors itself?  Are there performance counters that can give detail about OLEDB or DBCC related activities?  I really don't know what to do with this. 

    Randy in Marin

    Thursday, June 19, 2014 11:28 PM
  • Randy,

    Thanks for informing me about this behavior I was not aware. The third article you posted ( which is by Paul Randal) says this is default behavior with Coumputed columns. Or a Non clustered index with computed columns. If you see( which I guess you must have noted) he said that

    And I found 6 nonclustered indexes with computed columns in, on some of the largest tables in the database. I then disabled those nonclustered indexes and re-ran the DBCC CHECKDB tests.

    17-18 minutes per run.

    Can you try this please. Just trying to help you here I am as Novice in this as you are. Internals with checkdb completely lies with Microsoft and one should thank Paul for bringing it out 


    Please mark this reply as answer if it solved your issue or vote as helpful if it helped so that other forum members can benefit from it.

    My TechNet Wiki Articles


    Friday, June 20, 2014 8:47 AM
  • We had many issues in our environment wherein Anti-Virus scan or third party backup has caused so many issues for sql server overall performance, hopefully you have put proper exclusion\inclusion lists for anti-virus scan.

    Also, how about db recovery is that happens faster and it is not suffering from too many VLF issues.

    If yes then please setup auto-growth option properly size like 256 MB for log and data may by around 150-200 MB.

    Check also, any log-backup is not going on during DBCC checkdb as it is suggested to execute during no such operations are performed.

    Also you have good free space on data\log disk for database.


    Santosh Singh



    Friday, June 20, 2014 9:05 AM
  • If CheckDB takes a lot of time it might be that it hit corruption.

    If you have a second instance available you may want to offload the work onto a different server. But still it will take time.

    What you could also do is to do only partly checks and maybe this gives you an indication if you really are in trouble with that database.

    For example you could do separately:

    DBCC CHECKALLOC --on the database DBCC CHECKTABLE --on certain critical tables to see if they are the problem DBCC CHECKCATALOG --on the database

    DBCC CHECKDB WITH PHYSICAL_ONLY -- on the database. only physical checks


    What you wan also do is run with option ESTIMATEONLY beforehand. But of course the estimate may still be way off in case of corruption.

    In the end you will have to do the complete check anyway though


    Andreas Wolter (Blog | Twitter)
    MCM - Microsoft Certified Master SQL Server 2008
    MCSM - Microsoft Certified Solutions Master Data Platform, SQL Server 2012
    www.andreas-wolter.com | www.SarpedonQualityLab.com


    Friday, June 20, 2014 10:11 AM
  • Though you dont have indexes on computed columns, could you try running check db with NOINDEX option? It could be not just computed columns but a non-clustered index on some other columns which may be causing the issue.

    http://sqlblog.com/blogs/argenis_fernandez/archive/2013/06/16/dbcc-checkdb-on-vvldb-and-latches-or-my-pain-is-your-gain.aspx

    Worth a try !!


    Regards, Ashwin Menon My Blog - http:\\sqllearnings.com

    Friday, June 20, 2014 10:27 AM
  • I did check for both computed columns and filtered indexes and found none.  However, perhaps it is something caused by the spatial index.  We ran a table check on a table that finished without issue - it took over half a day. 

    The current check is 10% done.  Now the expected finish date is July 8th.  I was hoping to get a full run to verify that corruption is not present.  I don't think it's corruption, however. 

    I think I will let it run for now.  When I get back to work on Monday, I'll request that AVG be disabled.  (It's not something I can do.)  With luck, it will turn out that OLEDB is doing lots of something that looks like file access and the AVG driver is in the middle. 

    Has anybody used the process utilities from sysinternals?  I have not,  but I was wondering if they could check if a AVG driver was introducing a delay. 


    Randy in Marin

    Saturday, June 21, 2014 4:36 PM
  • Ignore everything that Santosh and Shanky are saying - nonsense.

    What are the wait types of the CHECKDB threads when you look with sys.dm_os_waiting_tasks?

    With your DOP and the DB size, it should have been checked in under an hour. Spatial indexes don't cause slowdowns unless you're saying WITH EXTENDED_LOGICAL_CHECKS. Even with all the perf issues I've blogged about in the last year, 200GB shouldn't take more than 3-4 hours.


    • Proposed as answer by Kalman Toth Sunday, June 22, 2014 12:15 PM
    • Unproposed as answer by Paul RandalMVP Sunday, June 22, 2014 1:53 PM
    Saturday, June 21, 2014 9:19 PM
  • Ignore everything that Santosh and Shanky are saying - nonsense.

    What are the wait types of the CHECKDB threads when you look with sys.dm_os_waiting_tasks?

    With your DOP and the DB size, it should have been checked in under an hour. Spatial indexes don't cause slowdowns unless you're saying WITH EXTENDED_LOGICAL_CHECKS. Even with all the perf issues I've blogged about in the last year, 200GB shouldn't take more than 3-4 hours.


    That seems a fake profile of Paul Randal...below is the Original One

    http://social.msdn.microsoft.com/profile/paul%20randal/?ws=usercard-mini


    Please mark this reply as answer if it solved your issue or vote as helpful if it helped so that other forum members can benefit from it.

    My TechNet Wiki Articles

    Saturday, June 21, 2014 9:40 PM
  • Probably right. But, the two profiles are mixed up:

    http://social.msdn.microsoft.com/Profile/paulrandal/activity

    MODERATOR: take a look at this issue. Thanks.


    Kalman Toth Database & OLAP Architect SQL Server 2014 Design & Programming
    New Book / Kindle: Exam 70-461 Bootcamp: Querying Microsoft SQL Server 2012







    • Edited by Kalman Toth Saturday, June 21, 2014 11:13 PM
    Saturday, June 21, 2014 10:28 PM
  • It looks like he has more than one profile. Both profiles are the real Paul Randal.


    Dan Guzman, SQL Server MVP, http://www.dbdelta.com

    Saturday, June 21, 2014 11:22 PM
  • It looks like he has more than one profile. Both profiles are the real Paul Randal.


    Dan Guzman, SQL Server MVP, http://www.dbdelta.com

    Ohh k I was not aware you can keep more than one..why would one require more than one. Anyways its good for Randy I hope he gets solution.

    Please mark this reply as answer if it solved your issue or vote as helpful if it helped so that other forum members can benefit from it.

    My TechNet Wiki Articles

    Sunday, June 22, 2014 8:55 AM
  • Ohh k I was not aware you can keep more than one..why would one require more than one. Anyways its good for Randy I hope he gets solution.

    There could be all sorts of reasons. I have two, because at some point I was no longer to access my original profile. If you check my name, there are two spaces between the first name and the last name.

    And, yes, I can testify that it is the real Paul. I asked him to take a look at this thread, because the situation seemed abnormal to me and the advice given appeared to be off the mark. I did not say anything myself, as I did not have anything to say.


    Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se
    Sunday, June 22, 2014 9:37 AM
  • Ohh k I was not aware you can keep more than one..why would one require more than one. Anyways its good for Randy I hope he gets solution.

    There could be all sorts of reasons. I have two, because at some point I was no longer to access my original profile. If you check my name, there are two spaces between the first name and the last name.

    And, yes, I can testify that it is the real Paul. I asked him to take a look at this thread, because the situation seemed abnormal to me and the advice given appeared to be off the mark. I did not say anything myself, as I did not have anything to say.


    Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se

    Thank You Erland...:)

    Please mark this reply as answer if it solved your issue or vote as helpful if it helped so that other forum members can benefit from it.

    My TechNet Wiki Articles

    Sunday, June 22, 2014 10:28 AM
  • Ignore everything that Santosh and Shanky are saying - nonsense.

    What are the wait types of the CHECKDB threads when you look with sys.dm_os_waiting_tasks?

    With your DOP and the DB size, it should have been checked in under an hour. Spatial indexes don't cause slowdowns unless you're saying WITH EXTENDED_LOGICAL_CHECKS. Even with all the perf issues I've blogged about in the last year, 200GB shouldn't take more than 3-4 hours.


    That seems a fake profile of Paul Randal...below is the Original One

    http://social.msdn.microsoft.com/profile/paul%20randal/?ws=usercard-mini


    Please mark this reply as answer if it solved your issue or vote as helpful if it helped so that other forum members can benefit from it.

    My TechNet Wiki Articles

    You wish. Are you the real Shanky?

    Even if it were a fake profile, you advice was still the typical knee-jerk reaction to OLEDB waits and of no relevance to CHECKDB running really slowly. I suggest you read my articles at http://sqlperformance.com/2014/02/sql-performance/knee-jerk-performance-troubleshooting and http://www.sqlskills.com/blogs/paul/common-wait-stats-24-hours/ which both explain the more common causes of OLEDB waits these days.

    And now back to the real issue on this thread...


    CEO, SQLskills.com (http://www.sqlskills.com/blogs/paul), SQL MVP, Microsoft Regional Director, Contributing Editor of TechNet and SQL Server Magazines, Author of 2005 DBCC CHECKDB/repair. Consulting/Training: http://www.sqlskills.com/

    Sunday, June 22, 2014 1:47 PM
  • You wish. Are you the real Shanky?

    Even if it were a fake profile, you advice was still the typical knee-jerk reaction to OLEDB waits and of no relevance to CHECKDB running really slowly. I suggest you read my articles at http://sqlperformance.com/2014/02/sql-performance/knee-jerk-performance-troubleshooting and http://www.sqlskills.com/blogs/paul/common-wait-stats-24-hours/ which both explain the more common causes of OLEDB waits these days.

    And now back to the real issue on this thread...



    Thank you Paul for clarifying. Like I mentioned in my post I have not faced(may be not cared to run wait analysis) OLEDB wait types when running checkdb so post was outcome of that experience. I do read your articles on Checkdb( perhaps the only one giving deep knowledge)  and they are very helpful. Regarding your post at first sight I though it was different profile as I was not aware one can create many profiles and was guessing why owner of SQL Server Disaster recovery forum( and once Program manager in MS) would create one, and Erland cleared that. My reaction was just one to keep forum spam free hope you have not taken that in other way.

    Thank you very much for article. And I would say Randy is lucky to have you looking at  his thread. :)


    Please mark this reply as answer if it solved your issue or vote as helpful if it helped so that other forum members can benefit from it.

    My TechNet Wiki Articles


    Sunday, June 22, 2014 1:58 PM
  • Like I mentioned in my post I have not faced(may be not cared to run wait analysis) OLEDB wait types when running checkdb so post was outcome of that experience.


    Then I would respectfully ask that you confine yourself to offering advice in situations where you have experience of the problem being discussed and understand the possibilities involved. Otherwise you're just causing confusion for the OP and wasting people's time. Thanks


    CEO, SQLskills.com (http://www.sqlskills.com/blogs/paul), SQL MVP, Microsoft Regional Director, Contributing Editor of TechNet and SQL Server Magazines, Author of 2005 DBCC CHECKDB/repair. Consulting/Training: http://www.sqlskills.com/

    Sunday, June 22, 2014 2:28 PM
  • Randy,

    Did you check and confirm the wait types as asked?

    Did the issues resolved for you or not yet?

    Your confirmation will help here to check things.

    Anyway, its always good to start with wait type analysis and so any other process execution should be effecting your overall Sql Server response time. Some time it helps by setting proper MAXDOP, as requirement varies from time to time as schedule job execution and adhoc request will need its own time to complete as these are different requirement types, and so use resources to get its execution slot and so response time.

    Sometime situation and things do behave differently then what it actually some times.


    Santosh Singh

    Sunday, June 22, 2014 5:32 PM
  • Can you stop spouting nonsense please and just wait for him to reply?

    CEO, SQLskills.com (http://www.sqlskills.com/blogs/paul), SQL MVP, Microsoft Regional Director, Contributing Editor of TechNet and SQL Server Magazines, Author of 2005 DBCC CHECKDB/repair. Consulting/Training: http://www.sqlskills.com/

    Monday, June 23, 2014 4:03 AM
  • Hi Paul.  I suppose if anybody can help with this, it will be you.  Here is a quick status. 

    1. I have not had any AVG exclusions defined yet. 

    2. The job aborted over the weekend with "TCP Provider: The network connection was aborted by the local system.    Communication link failure.  Process Exit Code 1236."  This is a work station, so it could have been network maintenance.  I did have shared memory disabled at one time.  I just insured that shared memory is enabled - it was - and rebooted. 

    3. I just took a step back for a reality check.  I loaded a different production database to my desktop that takes 45 GB.  It normally takes 7 minutes for the checkdb to run on our production server.  (Production is a VM using a new SAN.)  I updated a script I copied from a post by Brent Ozar.  A procedure stores the waits when called.  I call it before and after the CheckDB. 

    I just finished running the CheckDb on the 45 GB database.  All cores were busy - about 60 to 90% most of the time.  Also the disk IO was normally 300-500 MB/sec.  The Checkdb finished in 1 minute 17 seconds.  (I'm impressed with my desktop.  I'm not feeling impressed with our VM and new SAN.)  Here are the wait stat changes. 

    Final Time              Duration(Sec) wait_type                           WaitTime (Sec) Number Waits Avg ms/Wait

    ----------------------- ------------- ----------------------------------- -------------- ------------ -----------

    2014-06-23 10:40:16.850 77            CXPACKET                                     944.5       410558      2.301

    2014-06-23 10:40:16.850 77            OLEDB                                        675.7     13238393      0.051

    2014-06-23 10:40:16.850 77            LATCH_EX                                     117.9      1007824      0.117

    2014-06-23 10:40:16.850 77            IO_COMPLETION                                 99.8        55397      1.801

    2014-06-23 10:40:16.850 77            SOS_SCHEDULER_YIELD                           47.1        97552      0.483

    2014-06-23 10:40:16.850 77            PAGEIOLATCH_SH                                15.5         1543     10.049

    2014-06-23 10:40:16.850 77            WRITELOG                                       1.1          113      9.912

    2014-06-23 10:40:16.850 77            MSQL_XP                                        0.5           12     40.750

    2014-06-23 10:40:16.850 77            LATCH_SH                                       0.5           43     12.442

    2014-06-23 10:40:16.850 77            PREEMPTIVE_OS_WRITEFILE                        0.3           12     27.750

    2014-06-23 10:40:16.850 77            ASYNC_IO_COMPLETION                            0.2            3     63.000

    2014-06-23 10:40:16.850 77            PREEMPTIVE_OS_AUTHENTICATIONOPS                0.2          166      1.361

    2014-06-23 10:40:16.850 77            SLEEP_BPOOL_FLUSH                              0.1           47      2.681

    2014-06-23 10:40:16.850 77            PREEMPTIVE_OS_DELETESECURITYCONTEXT            0.1           27      2.593

    2014-06-23 10:40:16.850 77            PREEMPTIVE_OS_DEVICEIOCONTROL                  0.0            4      1.250

    2014-06-23 10:40:16.850 77            PREEMPTIVE_OS_DISCONNECTNAMEDPIPE              0.0            1      0.000

    4. I started the checkdb on the "TERRAIN" database that I has been having the checkdb issue.  It's been running about 7 minutes.  Here are the wait stat changes so far. 

    Final Time              Duration(Sec) wait_type                           WaitTime (Sec) Number Waits Avg ms/Wait

    ----------------------- ------------- ----------------------------------- -------------- ------------ -----------

    2014-06-23 11:19:01.290 420           OLEDB                                        442.2       643131      0.688

    2014-06-23 11:19:01.290 420           CXPACKET                                      30.8        14751      2.086

    2014-06-23 11:19:01.290 420           LATCH_EX                                      13.4        32123      0.418

    2014-06-23 11:19:01.290 420           PAGEIOLATCH_SH                                 2.6         3722      0.712

    2014-06-23 11:19:01.290 420           WRITELOG                                       1.4          368      3.679

    2014-06-23 11:19:01.290 420           SOS_SCHEDULER_YIELD                            1.2        56869      0.021

    2014-06-23 11:19:01.290 420           ASYNC_IO_COMPLETION                            0.4            6     58.500

    2014-06-23 11:19:01.290 420           BACKUPIO                                       0.1        13523      0.008

    2014-06-23 11:19:01.290 420           BACKUPTHREAD                                   0.1           82      0.622

    2014-06-23 11:19:01.290 420           LATCH_SH                                       0.1           94      1.138

    2014-06-23 11:19:01.290 420           MSQL_XP                                        0.1           84      0.679

    2014-06-23 11:19:01.290 420           PREEMPTIVE_OS_AUTHENTICATIONOPS                0.1          868      0.063

    5. Here is the current output from sys.dm_os_waiting_tasks.  It varies a bit every time I run the query. 

    wait_duration_ms     wait_type

    -------------------- ------------------------------

    8299276              FT_IFTSHC_MUTEX

    8299276              CHECKPOINT_QUEUE

    8299276              ONDEMAND_TASK_QUEUE

    8297318              BROKER_TRANSMITTER

    8297318              BROKER_TRANSMITTER

    8297237              KSOURCE_WAKEUP

    5903516              BROKER_EVENTHANDLER

    652889               CLR_AUTO_EVENT

    652889               CLR_AUTO_EVENT

    630863               DISPATCHER_QUEUE_SEMAPHORE

    202447               SP_SERVER_DIAGNOSTICS_SLEEP

    22202                XE_DISPATCHER_WAIT

    9490                 FT_IFTS_SCHEDULER_IDLE_WAIT

    3157                 SQLTRACE_INCREMENTAL_FLUSH_SLE

    942                  SLEEP_TASK

    442                  REQUEST_FOR_DEADLOCK_SEARCH

    442                  XE_TIMER_EVENT

    144                  HADR_FILESTREAM_IOMGR_IOCOMPLE

    112                  LAZYWRITER_SLEEP

    94                   LOGMGR_QUEUE

    68                   DIRTY_PAGE_POLL

    42                   OLEDB

    Thank, Randy


    Randy in Marin

    Monday, June 23, 2014 6:48 PM
  • It's been running over an hour and almost 2% done.  The OLEDB wait now dominates.  I will ask to have AVG disabled just in case something about this is different. 

    Final Time              Duration(Sec) wait_type                           WaitTime (Sec) Number Waits Avg ms/Wait

    ----------------------- ------------- ----------------------------------- -------------- ------------ -----------

    2014-06-23 12:22:58.053 4257          OLEDB                                       4276.8      1416257      3.020

    2014-06-23 12:22:58.053 4257          CXPACKET                                      31.3        14857      2.110

    2014-06-23 12:22:58.053 4257          LATCH_EX                                      13.4        32161      0.418

    2014-06-23 12:22:58.053 4257          WRITELOG                                       6.9         2932      2.367

    2014-06-23 12:22:58.053 4257          PAGEIOLATCH_SH                                 3.8         3813      1.001

    2014-06-23 12:22:58.053 4257          SOS_SCHEDULER_YIELD                            2.4       229066      0.010

    2014-06-23 12:22:58.053 4257          ASYNC_IO_COMPLETION                            1.3           24     54.542

    2014-06-23 12:22:58.053 4257          PREEMPTIVE_OS_AUTHENTICATIONOPS                0.6         8226      0.070

    2014-06-23 12:22:58.053 4257          BACKUPIO                                       0.5        41036      0.011

    2014-06-23 12:22:58.053 4257          MSQL_XP                                        0.5          774      0.609


    Randy in Marin

    Monday, June 23, 2014 7:29 PM
  • Hey Randy,

    Can you run this script a few times (this will give us useful info):

    SELECT
        [owt].[session_id],
        [owt].[exec_context_id],
        [owt].[wait_duration_ms],
        [owt].[wait_type],
        [owt].[blocking_session_id],
        [owt].[resource_description],
        CASE [owt].[wait_type]
            WHEN N'CXPACKET' THEN
                RIGHT ([owt].[resource_description],
                CHARINDEX (N'=', REVERSE ([owt].[resource_description])) - 1)
            ELSE NULL
        END AS [Node ID],
        [es].[program_name],
        [est].text,
        [er].[database_id],
        [eqp].[query_plan],
        [er].[cpu_time]
    FROM sys.dm_os_waiting_tasks [owt]
    INNER JOIN sys.dm_exec_sessions [es] ON
        [owt].[session_id] = [es].[session_id]
    INNER JOIN sys.dm_exec_requests [er] ON
        [es].[session_id] = [er].[session_id]
    OUTER APPLY sys.dm_exec_sql_text ([er].[sql_handle]) [est]
    OUTER APPLY sys.dm_exec_query_plan ([er].[plan_handle]) [eqp]
    WHERE
        [es].[is_user_process] = 1
    ORDER BY
        [owt].[session_id],
        [owt].[exec_context_id];
    GO

    And also run my wait stats and latch stats DMV queries? I don't know what the other scripts you're running are doing so I'd like to see the results in the format etc I know.

    Is the Terrain database being checked on the same system? What DOP is the system running?


    Monday, June 23, 2014 8:35 PM
  • Hi Paul,  

    The 45 GB "Court" and 200 GB "Terrain" checkdbs were both run on my desktop.  The behavior is very different.  I looked at the table sizes.  The largest Court table is 9 GB.  There are three Terrain tables over 20 GB.  I have 16 GB RAM.  Perhaps it has something to do with the table size vs RAM? 

    Here is the output of the query above. 

    session_id exec_context_id wait_duration_ms     wait_type                                                    blocking_session_id resource_description                                                                                                                                                                                                                                             Node ID                                                                                                                                                                                                                                                          program_name                                                                                                                     text                                                                                                                                                                                                                                                             database_id query_plan                                                                                                                                                                                                                                                       cpu_time
    ---------- --------------- -------------------- ------------------------------------------------------------ ------------------- ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- -------------------------------------------------------------------------------------------------------------------------------- ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- ----------- ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- -----------
    65         0               128                  OLEDB                                                        NULL                NULL                                                                                                                                                                                                                                                             NULL                                                                                                                                                                                                                                                             Microsoft SQL Server Management Studio - Query                                                                                   DBCC CHECKDB ([TERRAIN_TEST]) WITH NO_INFOMSGS, ALL_ERRORMSGS, DATA_PURITY                                                                                                                                                                                       9           NULL                                                                                                                                                                                                                                                             9148509
    
    (1 row(s) affected)
    
    
    Here is the output of your wait stats query from http://www.sqlskills.com/blogs/paul/wait-statistics-or-please-tell-me-where-it-hurts/

    WaitType                                                     Wait_S                                  Resource_S                              Signal_S                                WaitCount            Percentage                              AvgWait_S                               AvgRes_S                                AvgSig_S
    ------------------------------------------------------------ --------------------------------------- --------------------------------------- --------------------------------------- -------------------- --------------------------------------- --------------------------------------- --------------------------------------- ---------------------------------------
    OLEDB                                                        10755.78                                10755.78                                0.00                                    28475396             75.57                                   0.0004                                  0.0004                                  0.0000
    CXPACKET                                                     1879.42                                 1222.97                                 656.46                                  831836               13.20                                   0.0023                                  0.0015                                  0.0008
    PREEMPTIVE_OS_WRITEFILEGATHER                                316.72                                  316.72                                  0.00                                    11                   2.23                                    28.7925                                 28.7925                                 0.0000
    LATCH_EX                                                     291.22                                  253.38                                  37.84                                   1964729              2.05                                    0.0001                                  0.0001                                  0.0000
    BACKUPTHREAD                                                 184.59                                  184.59                                  0.00                                    540                  1.30                                    0.3418                                  0.3418                                  0.0000
    ASYNC_IO_COMPLETION                                          181.93                                  181.85                                  0.07                                    44                   1.28                                    4.1347                                  4.1330                                  0.0017
    
    (6 row(s) affected)

    Here is the output of the latch stats query from http://www.sqlskills.com/blogs/paul/most-common-latch-classes-and-what-they-mean/

    LatchClass                                                   Wait_S                                  WaitCount            Percentage                              AvgWait_S
    ------------------------------------------------------------ --------------------------------------- -------------------- --------------------------------------- ---------------------------------------
    DBCC_CHECK_AGGREGATE                                         279.02                                  1955689              95.23                                   0.0001
    
    (1 row(s) affected)
    

    Thanks, Randy


    Randy in Marin

    Monday, June 23, 2014 9:08 PM
  • I don't know of anything definitive that would cause the behavior and symptoms you're seeing.

    Can you turn off 2562 and 2549 please? They're not applicable to the system you're running on and I suspect 2562 may actually be detrimental for the larger database as 2562 is forcing the whole database to be checked as one 'batch' rather than being split up into smaller batches.

    Restart without the two trace flags and see how it goes.

    If that's not the root cause, then I'd check the large tables individually until you find the one that takes an inordinate amount of time compared to the others, then we can look into its schema to see what's different.

    One other thing - have you tried the same comparison of databases on a 2012 SP1 system? Just wondering if there's something in SP2 that's causing an issue for the larger database.

    Thanks

    Tuesday, June 24, 2014 4:12 PM
  • It was also slow before I applied SP2 and the flags.  (I applied SP2 and used the flags in hopes of a faster CHECKDB.)  I will turn off the flags and try again.  It might be a little while before I can respond - so many servers, so little time. 


    Randy in Marin

    Tuesday, June 24, 2014 6:00 PM
  • I thought I posted some results yesterday, but they are not here. Darn, that was a lot of work.  I'll summarize.  It was still slow. I created a query to run DBCC CHECKTABLE on each table. The tables without spatial indexes are done first.  They are checked quickly - even the 40 GB table.  The CXPACKET and OLEDB are the main waits with CXPACKET slightly larger.  Some of these tables use a image data type to store GIS point data. 

    The tables with spatial indexes (they have a column of the geometry type) are done second.  These tables also have a constraint of the form ([SHAPE].[STSrid]=(2872)).  These do not run very fast.  Performance gets worse with more space/records.  The primary OLEDB wait swamps everything else. 

    It's still running on the last table with a spatial index.  It will be a long time for that table.  I will try again with NOINDEX to see if the tables can be checked.  If so, then that might indicate the spatial index is the source of the performance issue. 

    Here are the run times of the CHECKTABLE calls.  The records/sec and MB/sec just use the database record count and total size with the run time to get feel for how fast it's working - or not working. 

    BTW, I also had AVG disable for a while without any difference. 

    Row  IndexType TotalSpaceGB RowCount    CheckTableSec Records/sec MB/sec
    ---- --------- ------------ ----------- ------------- ----------- -------
     80                0.273        1030914             1    1030914  279.602
     81                1.077        7367710             3    2455903  367.549
     82                1.414        9677533             3    3225844  482.779
     83                1.451        7367710             5    1473542  297.261
     84                1.906        9677533             5    1935506  390.422
     85                2.049       14019848             6    2336641  349.695
     86                2.510       17175852             7    2453693  367.200
     87                2.762       14019848             8    1752481  353.521
     88                3.383       17175852             9    1908428  384.963
     89                7.050       48240943            18    2680052  401.059
     90                9.503       48240943            26    1855420  374.256
     91               19.211      249898908            54    4627757  364.306
     92               39.785      249898908           146    1711636  279.038
     93   Spatial      0.000              5             1          5    0.016
     94   Spatial      0.002             15             0         NA       NA
     95   Spatial      0.018          33391             2      16695    9.180
     96   Spatial      0.255        1030914            55      18743    4.740
     97   Spatial      0.433        7367710           904       8150    0.491
     98   Spatial      0.569        9677533          1454       6655    0.401
     99   Spatial      0.824       14019848          2796       5014    0.302
    100   Spatial      1.010       17175852          4040       4251    0.256
    101   Spatial      2.836       48240943         28353       1701    0.102
    102   Spatial     26.891      249898908 NULL                  NA       NA

    Final Time              Duration(Sec) wait_type                           WaitTime (Sec) Number Waits Avg ms/Wait
    ----------------------- ------------- ----------------------------------- -------------- ------------ -----------
    2014-06-25 14:34:06.643 45282         OLEDB                                      45269.9      3476014     13.024
    2014-06-25 14:34:06.643 45282         WRITELOG                                      79.6        33045      2.409
    2014-06-25 14:34:06.643 45282         PREEMPTIVE_OS_AUTHENTICATIONOPS                6.0        95355      0.063
    2014-06-25 14:34:06.643 45282         SOS_PHYS_PAGE_CACHE                            4.7        20472      0.231
    PS Sorry about the format.  The columns line up fine when in edit, but not once I submit.



    Randy in Marin


    Wednesday, June 25, 2014 9:42 PM
  • PS Sorry about the format.  The columns line up fine when in edit, but not once I submit.

    The format looks good in my text-only news reader where I have a monospaced font. So copy-and-pasting into monospaced window should help.

    I'm refraining from commenting on the issue as such - this definitely out of my league.


    Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se
    Wednesday, June 25, 2014 9:58 PM
  • Thanks for sharing things with us!!

    With Due to respect to all, below are statements, I need to make. I don't know who all have done what all in the past as computer industry is for all and I take myself always as learner. My goal is to share what I know as per my best awareness and it has nothing to do hurting\disrespecting anyone!

    Nice to note that CXPACKET are coming and that usually comes with adjustment for MAXDOP.

    Also you will need to disable the AV scan for longer time and then conclude.

    See, I don't know, how your environment setup is and normally on msdn people don't share all stuffs in one go.

    I need to ask you something or some queries to validate\check then there is little option for me to know.

    There are some basic facts:

    1. Let say  you don't have AVG scan defined for Exclusion\Inclusion_ Can you say it will never\not effect snapshot file got created for DBCC CHECKDB operation, which will be in one of your Databases data\log folders where scan continuously happens. This is something even MS says to setup first as best practice. This has nothing to do with CHECKDB alone, its a basic thing and needs to be setup when you setup your environment for the first time.

    2. I had mentioned about VLF, which seems to be not suitable here and I take that back.

    3. MAXDOP: It is coming now in your observation.

    Keep learning and share with me too!!


    Santosh Singh

    Thursday, June 26, 2014 4:14 AM
  • Hi Santosh.  There are two distinct results depending upon if a spatial type is present.  DBCC CHECK performance is great when I see CXPACKET as the top wait.  This is what I see with tables without a spatial data type. 

    AVG on or off does not appear to affect performance on tables without a spatial data type.  It might be that something is different about the spatial type or index that will invoke a specific AVG issue.  I watched the performance very carefully when AVG was off.  There was no difference in the counters, CPU, and waits.  It could still be AVG, but I have my doubts.   

    I checked VLF counts before starting this thread.  There are about 60 VLFs with sizes in the range 8-32 MB.  I came across an interesting post (http://www.sqlskills.com/blogs/kimberly/transaction-log-vlfs-too-many-or-too-few/) that indicated that you can have too many or few VLFs.  Besides, if this was the issue, I think I would have seen latency on the disks.  Disks are almost idle during the check when a spatial type is present. 

    With OLEDB as almost the only wait present when checking a table with a spatial index, I don't think MAXDOP is the next thing to try.  The last table is being checked at about 10% per day.  The CPU counter indicates that it's using about 1 logical core total most of the time.  The performance monitor shows and average CPU use of 10%.  (6 cores with hyper-threading enabled.)  The total elapsed time is slightly larger than the CPU time on the DBCC process.  I see a couple of logical cores running about 50-80% most of the time.  A few others are working at 10-20%.  Several are idle. 

    At this point, I don't have anything better to try after running the checks with NOINDEX.  If I have time, I will try disabling HT, setting a MAXDOP, and disabling AVG for the entire run. 


    Randy in Marin

    Thursday, June 26, 2014 3:57 PM
  • At this point, I don't have anything better to try after running the checks with NOINDEX.  If I have time, I will try disabling HT, setting a MAXDOP, and disabling AVG for the entire run. 

    A drastic step would be to drop only the spatial index to narrow down there the problem is. If that has the expected effect - CHECKTABLE completes in time, next step is to recreate it and try again. If the index was corrupted, CHECKTABLE may complete faster.

    OF course, this is mere speculation on my part, and Paul may have better ideas.

    My gut feeling that playing with MAXDOP and HT is a waste of time. And I don't really think that the antivirus has anything to do with it, as long as you have put the database files on the exclusion list. (Unless the antivirus finds it ways to the alternate stream which has the snapshot that DBCC uses.)


    Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se
    Thursday, June 26, 2014 10:16 PM
  • Thanks Erland.  It's 25% done with the large table.  I'm incline to let it continue (5-10% per day until perhaps 7/2) for one real pass to insure all is well with a recent migration.  I'm behind on other work, but I will come back to this and try the check on the base table and index separately.  I will also try to drop/create the index.  I was told that this index will take about a day to create. 

    We are also going to check the use of the data with our client.  It is a 5 foot elevation contour that might not be needed or might have too much resolution.  

    Me: Dr. SQL, it hurts when I use that data. 

    Dr. SQL: Well, then don't use that data. 


    Randy in Marin

    Friday, June 27, 2014 4:12 PM
  • I'm behind on other work,

    I'm not surprised to hear that, given how much time you have put into this!

    Please keep us updated!


    Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se
    Friday, June 27, 2014 8:15 PM
  • My DBCC CHECKTABLE eventually failed with a memory error.  I did not set a memory limit on the instance and was running a VM plus a few other things.  The timing was logged, but I do not know the result of the check on the last table - good or bad.  70 KB/sec for 27 GB is not going to work for us. 

    Row  Table                TotalSpaceGB RowCount    CheckTableSec Records/sec MB/sec
    ---- -------------------- ------------ ----------- ------------- ----------- -------
    101  dbo.FEMA25               2.836       48240943         28353       1701    0.102
    102  dbo.CONTOUR_DENSIFIE    26.891      249898908        401743        622    0.069

    I removed the spatial indexes and the check constraints without improvement.  So, it appears that the presence of the geometry data type is the trigger for the slow performance.  I checked the datalengh of the geometry column on the table with worst performing DBCC.  It is only 30 bytes for all 250 million records.  The row length is about 100 bytes.  Perhaps there is some single threaded spatial related CLR call or such taking 1-2 ms per record? 

    Has anybody else run a DBCC on a table using the geometry data type with a high record count? 

    BTW, here are typical results on the same system for tables without a spatial type.  About a factor of 5000 faster in the number of records or MB per second. 

    Row  Table                TotalSpaceGB RowCount    CheckTableSec Records/sec MB/sec
    ---- -------------------- ------------ ----------- ------------- ----------- -------
     81  dbo.s948                 1.077        7367710             1    7367710  1102.65
     82  dbo.s946                 1.414        9677533             2    4838766  724.168
     83  dbo.f948                 1.451        7367710             3    2455903  495.435
     84  dbo.f946                 1.906        9677533             4    2419383  488.027
     85  dbo.s947                 2.049       14019848             3    4673282  699.391
     86  dbo.s949                 2.510       17175852             3    5725284  856.799
     87  dbo.f947                 2.762       14019848             6    2336641  471.362
     88  dbo.f949                 3.383       17175852             7    2453693  494.952
     89  dbo.s986                 7.050       48240943             9    5360104  802.117
     90  dbo.f986                 9.503       48240943            18    2680052  540.592
     91  dbo.s951                19.211      249898908            52    4805748  378.317
     92  dbo.f951                39.785      249898908           119    2099990  342.349

    Thanks


    Randy in Marin

    Tuesday, July 8, 2014 10:43 PM
  • Another thing I noticed, using the Process Monitor from Sysinternals, is that there is an IO pattern where the IO is done in spurts separated by a delay.  The spurt is about 50-60 reads done within .2 seconds.  The delay is about 8 seconds between spurts.  

    Randy in Marin

    Tuesday, July 8, 2014 11:37 PM
  • Looks like it's case-opening time to me...


    Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se
    Thursday, July 10, 2014 9:25 PM
  • Yup - I concur. Looks like a bug in spatial processing, possibly on 2012.

    CEO, SQLskills.com (http://www.sqlskills.com/blogs/paul), SQL MVP, Microsoft Regional Director, Contributing Editor of TechNet and SQL Server Magazines, Author of 2005 DBCC CHECKDB/repair. Consulting/Training: http://www.sqlskills.com/

    Thursday, July 10, 2014 9:45 PM
  • Okay, I'll report it.  No time today.  Thanks for your advice. 

    Randy in Marin

    Thursday, July 10, 2014 11:58 PM
  • A status update for the curious.  I changed the service account to avoid locking pages in memory.  (DBCC would stop after a memory error.)  It's been running 3 days and the estimated finish date is 7/14/2014.  I looked at the sql process and noticed that it generates about 800000 page faults per second.  I did not know that paging could be so high.  No hard faults.  Even so, it's not something you see every day. 

    PS I have not submitted the bug yet.  I will. 


    Randy in Marin

    Thursday, July 17, 2014 11:48 PM
  • FYI: The production server DBCC finished after 12 days.  No errors reported. 

    Randy in Marin

    Monday, July 21, 2014 11:52 PM
  • A slight necro of this thread - sorry.

    Paul's discussion with Ryan about CHECKDB's expression evaluator (http://www.sqlskills.com/blogs/paul/dbcc-checkdb-performance-and-computed-column-indexes/), and the involvement of spatial data, compels me to attempt a contribution :).

    While I see removing the spatial index did not improve the situation, the way in which a spatial index is used is interesting, in that there are two or more filters used to satisfy a spatial index. The primary filter of a spatial index uses a grid (a good picture is at http://blogs.msdn.com/b/isaac/archive/2007/12/01/spatial-indexing-part-3-faster-primary-filtering.aspx). However, because spatial data uses floating point arithmetic (in part due to spatial's use of COS, SIN, TAN), a secondary filter (which is a CLR function) must be applied to near misses or near hits (after the primary filter has been applied). The secondary filter's CLR function can be seen in most query plans (that harness a spatial index), plus there are 1 or 2 additional CLR functions which determine the highest (parent) level cell that will satisfy the spatial index (see the graphic of the levels, under the "Deepest-Cell Rule" section of  http://msdn.microsoft.com/en-us/library/bb895265.aspx). Stll. removing the spatiall index (while retaining the spatial data) did not help, so I am not sure this paragraph was worth writing :).

    The other matter that comes to mind is CHECKDB's DATA_PURITY "option", which can be (or at least was) disabled by trace flag 2566 - see http://support.microsoft.com/kb/945770/en-ushttp://support.microsoft.com/kb/945770/en-us ( I can only hope someone at MS one day gets it into their 'head' that publishing most traceflg is no less dangerous than giving customers cars that can exceed all known speed limits...:). For spatial data to be 'pure', the spatial type must have .STIsValid = 1 (see BOL). STIsValid must adhere to the Open Geospatial Consortium's definitions of valid spatial data. One example: The series of points that define the vertices of a polygon must both begin and end with the same point. 2nd example: A linestring must not cross over itself more than once (this is documented somewhere in BOL, if memory serves me right). My 'point' is that .STIsValid must be implemented in CLR, and it must be costly. Assuming DATA_PURITY triggers .STIsValid, not only would each spatial value be checked, each *point* within each spatial value would be checked (which is worse than row by agonizing row, with CLR/OLEDB waits thrown in 'just for good measure')....

    Other than the references, the above is a guess. But, you might want to try the quasi-documented dbcc traceon(2566,-1) followed by a CHECKTABLE, and see if that helps. 






    Wednesday, August 6, 2014 8:56 AM