Asked by:
DBCC OLEDB Bottleneck?

Question
-
Is there anything I can do for a OLEDB wait type?
After restarting my instance, I started a DBCC checkdb on a 200 GB database that uses the spatial data type. A couple of tables have 250 million rows. There are no indexes on computed columns or filtered indexes. It's been running about 2 hours and the percent complete is about 2%. The estimated completion date (4 days from now) has been increasing by about and hour a minute. It might be weeks before it completes. (I'm using a db restored on my desktop - 6-core Zeon, 16 GB, and 2 SSDs - with SQL 2012 Developer SP2. I enabled trace flags 2562 and 2549 enabled. Compatibility on the db was changed to 2012 from 2008.)
I looked at the wait stats and see the OLEDB wait type at almost 100%. When I look at the activity monitor "Resource Waits", the only wait I see is Network IO - about 1 sec per sec. "Data File IO" is at 0.0 MB/sec with response times normally at 0 ms, but can be 2 ms from time to time. CPU use is about 10%. My machine feels idle.
An AVG scan recently started and just finish. It did not seem to slow anything down or change the waits. But that makes me wonder. Can a real-time virus scanner be a source of the OLEDB waits? Would it also show up as network io wait?
Any advice that can help me find the bottleneck would be appreciated. It would great if I could move the bottleneck to CPU or disk.
Thanks.
Randy in Marin
Thursday, June 19, 2014 8:31 PM
All replies
-
Hi,
DBCC Checkdb is resource intensive and should be run when system is under minimum load or during maintenance window. But what you said seems strange checkdb does takes time but this much is not acceptable. Is your system heavily accessed when you started Checkdb?OLEDB wait types are related to Linked servers I have environment where lot of distributes and Linked server query runs and this wait type was prominent but I don't think Checkdb has anything to do with OLEDB wait type.
Are you sure no query is accessing when you ran checkdb ? Any query that uses linked server ?
Please mark this reply as answer if it solved your issue or vote as helpful if it helped so that other forum members can benefit from it.
My TechNet Wiki ArticlesThursday, June 19, 2014 9:42 PM -
My system is not busy with anything else and seems to be almost idle with the dbcc checkdb. DBCC is all that I run on the instance. I want my system to be pounded by the dbcc if possible so that it can complete more quickly. After a few hours, the estimated completion time is now about a week away.
I don't have linked servers or a monitor using DMVs that would cause the OLEDB wait. The OLEDB is a known wait for DBCC Checkdb. I don't know what to do with it.
http://www.sqlskills.com/blogs/paul/wait-statistics-or-please-tell-me-where-it-hurts/
http://sqlsailor.com/2012/04/05/oledb-wait-and-dbcc-an-interesting-discussion/
http://www.sqlskills.com/blogs/paul/dbcc-checkdb-performance-and-computed-column-indexes/
The last link is why I mentioned indexes with computed columns.
Randy in Marin
Thursday, June 19, 2014 11:10 PM -
BTW, we did try to run this in production, but it would not finish. Running on my desktop is faster. I'm able to dedicate all it's resources to the DBCC. Perhaps I need to look outside of SQL. If OLEDB is central to DMV usage, then perhaps OLEDB activity is something that SQL can't provide detail about. Would it be like a monitor that monitors itself? Are there performance counters that can give detail about OLEDB or DBCC related activities? I really don't know what to do with this.
Randy in Marin
Thursday, June 19, 2014 11:28 PM -
Randy,
Thanks for informing me about this behavior I was not aware. The third article you posted ( which is by Paul Randal) says this is default behavior with Coumputed columns. Or a Non clustered index with computed columns. If you see( which I guess you must have noted) he said that
And I found 6 nonclustered indexes with computed columns in, on some of the largest tables in the database. I then disabled those nonclustered indexes and re-ran the DBCC CHECKDB tests.
17-18 minutes per run.
Can you try this please. Just trying to help you here I am as Novice in this as you are. Internals with checkdb completely lies with Microsoft and one should thank Paul for bringing it out
Please mark this reply as answer if it solved your issue or vote as helpful if it helped so that other forum members can benefit from it.
My TechNet Wiki Articles- Edited by Shanky_621MVP Friday, June 20, 2014 10:33 AM
Friday, June 20, 2014 8:47 AM -
We had many issues in our environment wherein Anti-Virus scan or third party backup has caused so many issues for sql server overall performance, hopefully you have put proper exclusion\inclusion lists for anti-virus scan.
Also, how about db recovery is that happens faster and it is not suffering from too many VLF issues.
If yes then please setup auto-growth option properly size like 256 MB for log and data may by around 150-200 MB.
Check also, any log-backup is not going on during DBCC checkdb as it is suggested to execute during no such operations are performed.
Also you have good free space on data\log disk for database.
Santosh Singh
- Edited by Santosh Chandel Friday, June 20, 2014 9:07 AM
Friday, June 20, 2014 9:05 AM -
If CheckDB takes a lot of time it might be that it hit corruption.
If you have a second instance available you may want to offload the work onto a different server. But still it will take time.
What you could also do is to do only partly checks and maybe this gives you an indication if you really are in trouble with that database.
For example you could do separately:
DBCC CHECKALLOC --on the database DBCC CHECKTABLE --on certain critical tables to see if they are the problem DBCC CHECKCATALOG --on the database
DBCC CHECKDB WITH PHYSICAL_ONLY -- on the database. only physical checks
What you wan also do is run with option ESTIMATEONLY beforehand. But of course the estimate may still be way off in case of corruption.
In the end you will have to do the complete check anyway though
Andreas Wolter (Blog | Twitter)
MCM - Microsoft Certified Master SQL Server 2008
MCSM - Microsoft Certified Solutions Master Data Platform, SQL Server 2012
www.andreas-wolter.com | www.SarpedonQualityLab.com- Edited by Andreas.WolterMicrosoft employee Friday, June 20, 2014 10:14 AM added DBCC CHECKDB WITH PHYSICAL_ONLY
Friday, June 20, 2014 10:11 AM -
Though you dont have indexes on computed columns, could you try running check db with NOINDEX option? It could be not just computed columns but a non-clustered index on some other columns which may be causing the issue.
Regards, Ashwin Menon My Blog - http:\\sqllearnings.com
Friday, June 20, 2014 10:27 AM -
I did check for both computed columns and filtered indexes and found none. However, perhaps it is something caused by the spatial index. We ran a table check on a table that finished without issue - it took over half a day.
The current check is 10% done. Now the expected finish date is July 8th. I was hoping to get a full run to verify that corruption is not present. I don't think it's corruption, however.
I think I will let it run for now. When I get back to work on Monday, I'll request that AVG be disabled. (It's not something I can do.) With luck, it will turn out that OLEDB is doing lots of something that looks like file access and the AVG driver is in the middle.
Has anybody used the process utilities from sysinternals? I have not, but I was wondering if they could check if a AVG driver was introducing a delay.
Randy in Marin
Saturday, June 21, 2014 4:36 PM -
Ignore everything that Santosh and Shanky are saying - nonsense.
What are the wait types of the CHECKDB threads when you look with sys.dm_os_waiting_tasks?
With your DOP and the DB size, it should have been checked in under an hour. Spatial indexes don't cause slowdowns unless you're saying WITH EXTENDED_LOGICAL_CHECKS. Even with all the perf issues I've blogged about in the last year, 200GB shouldn't take more than 3-4 hours.
- Proposed as answer by Kalman Toth Sunday, June 22, 2014 12:15 PM
- Unproposed as answer by Paul RandalMVP Sunday, June 22, 2014 1:53 PM
Saturday, June 21, 2014 9:19 PM -
Ignore everything that Santosh and Shanky are saying - nonsense.
What are the wait types of the CHECKDB threads when you look with sys.dm_os_waiting_tasks?
With your DOP and the DB size, it should have been checked in under an hour. Spatial indexes don't cause slowdowns unless you're saying WITH EXTENDED_LOGICAL_CHECKS. Even with all the perf issues I've blogged about in the last year, 200GB shouldn't take more than 3-4 hours.
That seems a fake profile of Paul Randal...below is the Original One
http://social.msdn.microsoft.com/profile/paul%20randal/?ws=usercard-mini
Please mark this reply as answer if it solved your issue or vote as helpful if it helped so that other forum members can benefit from it.
My TechNet Wiki ArticlesSaturday, June 21, 2014 9:40 PM -
Probably right. But, the two profiles are mixed up:
http://social.msdn.microsoft.com/Profile/paulrandal/activity
MODERATOR: take a look at this issue. Thanks.
Kalman Toth Database & OLAP Architect SQL Server 2014 Design & Programming
New Book / Kindle: Exam 70-461 Bootcamp: Querying Microsoft SQL Server 2012
- Edited by Kalman Toth Saturday, June 21, 2014 11:13 PM
Saturday, June 21, 2014 10:28 PM -
It looks like he has more than one profile. Both profiles are the real Paul Randal.
Dan Guzman, SQL Server MVP, http://www.dbdelta.com
Saturday, June 21, 2014 11:22 PM -
It looks like he has more than one profile. Both profiles are the real Paul Randal.
Dan Guzman, SQL Server MVP, http://www.dbdelta.com
Please mark this reply as answer if it solved your issue or vote as helpful if it helped so that other forum members can benefit from it.
My TechNet Wiki ArticlesSunday, June 22, 2014 8:55 AM -
Ohh k I was not aware you can keep more than one..why would one require more than one. Anyways its good for Randy I hope he gets solution.
There could be all sorts of reasons. I have two, because at some point I was no longer to access my original profile. If you check my name, there are two spaces between the first name and the last name.
And, yes, I can testify that it is the real Paul. I asked him to take a look at this thread, because the situation seemed abnormal to me and the advice given appeared to be off the mark. I did not say anything myself, as I did not have anything to say.
Erland Sommarskog, SQL Server MVP, esquel@sommarskog.seSunday, June 22, 2014 9:37 AM -
Ohh k I was not aware you can keep more than one..why would one require more than one. Anyways its good for Randy I hope he gets solution.
There could be all sorts of reasons. I have two, because at some point I was no longer to access my original profile. If you check my name, there are two spaces between the first name and the last name.
And, yes, I can testify that it is the real Paul. I asked him to take a look at this thread, because the situation seemed abnormal to me and the advice given appeared to be off the mark. I did not say anything myself, as I did not have anything to say.
Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se
Thank You Erland...:)Please mark this reply as answer if it solved your issue or vote as helpful if it helped so that other forum members can benefit from it.
My TechNet Wiki ArticlesSunday, June 22, 2014 10:28 AM -
Ignore everything that Santosh and Shanky are saying - nonsense.
What are the wait types of the CHECKDB threads when you look with sys.dm_os_waiting_tasks?
With your DOP and the DB size, it should have been checked in under an hour. Spatial indexes don't cause slowdowns unless you're saying WITH EXTENDED_LOGICAL_CHECKS. Even with all the perf issues I've blogged about in the last year, 200GB shouldn't take more than 3-4 hours.
That seems a fake profile of Paul Randal...below is the Original One
http://social.msdn.microsoft.com/profile/paul%20randal/?ws=usercard-mini
Please mark this reply as answer if it solved your issue or vote as helpful if it helped so that other forum members can benefit from it.
My TechNet Wiki ArticlesYou wish. Are you the real Shanky?
Even if it were a fake profile, you advice was still the typical knee-jerk reaction to OLEDB waits and of no relevance to CHECKDB running really slowly. I suggest you read my articles at http://sqlperformance.com/2014/02/sql-performance/knee-jerk-performance-troubleshooting and http://www.sqlskills.com/blogs/paul/common-wait-stats-24-hours/ which both explain the more common causes of OLEDB waits these days.
And now back to the real issue on this thread...
CEO, SQLskills.com (http://www.sqlskills.com/blogs/paul), SQL MVP, Microsoft Regional Director, Contributing Editor of TechNet and SQL Server Magazines, Author of 2005 DBCC CHECKDB/repair. Consulting/Training: http://www.sqlskills.com/
Sunday, June 22, 2014 1:47 PM -
You wish. Are you the real Shanky?
Even if it were a fake profile, you advice was still the typical knee-jerk reaction to OLEDB waits and of no relevance to CHECKDB running really slowly. I suggest you read my articles at http://sqlperformance.com/2014/02/sql-performance/knee-jerk-performance-troubleshooting and http://www.sqlskills.com/blogs/paul/common-wait-stats-24-hours/ which both explain the more common causes of OLEDB waits these days.
And now back to the real issue on this thread...
Thank you Paul for clarifying. Like I mentioned in my post I have not faced(may be not cared to run wait analysis) OLEDB wait types when running checkdb so post was outcome of that experience. I do read your articles on Checkdb( perhaps the only one giving deep knowledge) and they are very helpful. Regarding your post at first sight I though it was different profile as I was not aware one can create many profiles and was guessing why owner of SQL Server Disaster recovery forum( and once Program manager in MS) would create one, and Erland cleared that. My reaction was just one to keep forum spam free hope you have not taken that in other way.
Thank you very much for article. And I would say Randy is lucky to have you looking at his thread. :)
Please mark this reply as answer if it solved your issue or vote as helpful if it helped so that other forum members can benefit from it.
My TechNet Wiki Articles- Edited by Shanky_621MVP Sunday, June 22, 2014 1:59 PM
Sunday, June 22, 2014 1:58 PM -
Like I mentioned in my post I have not faced(may be not cared to run wait analysis) OLEDB wait types when running checkdb so post was outcome of that experience.
Then I would respectfully ask that you confine yourself to offering advice in situations where you have experience of the problem being discussed and understand the possibilities involved. Otherwise you're just causing confusion for the OP and wasting people's time. Thanks
CEO, SQLskills.com (http://www.sqlskills.com/blogs/paul), SQL MVP, Microsoft Regional Director, Contributing Editor of TechNet and SQL Server Magazines, Author of 2005 DBCC CHECKDB/repair. Consulting/Training: http://www.sqlskills.com/
Sunday, June 22, 2014 2:28 PM -
Randy,
Did you check and confirm the wait types as asked?
Did the issues resolved for you or not yet?
Your confirmation will help here to check things.
Anyway, its always good to start with wait type analysis and so any other process execution should be effecting your overall Sql Server response time. Some time it helps by setting proper MAXDOP, as requirement varies from time to time as schedule job execution and adhoc request will need its own time to complete as these are different requirement types, and so use resources to get its execution slot and so response time.
Sometime situation and things do behave differently then what it actually some times.
Santosh Singh
Sunday, June 22, 2014 5:32 PM -
Can you stop spouting nonsense please and just wait for him to reply?
CEO, SQLskills.com (http://www.sqlskills.com/blogs/paul), SQL MVP, Microsoft Regional Director, Contributing Editor of TechNet and SQL Server Magazines, Author of 2005 DBCC CHECKDB/repair. Consulting/Training: http://www.sqlskills.com/
Monday, June 23, 2014 4:03 AM -
Hi Paul. I suppose if anybody can help with this, it will be you. Here is a quick status.
1. I have not had any AVG exclusions defined yet.
2. The job aborted over the weekend with "TCP Provider: The network connection was aborted by the local system. Communication link failure. Process Exit Code 1236." This is a work station, so it could have been network maintenance. I did have shared memory disabled at one time. I just insured that shared memory is enabled - it was - and rebooted.
3. I just took a step back for a reality check. I loaded a different production database to my desktop that takes 45 GB. It normally takes 7 minutes for the checkdb to run on our production server. (Production is a VM using a new SAN.) I updated a script I copied from a post by Brent Ozar. A procedure stores the waits when called. I call it before and after the CheckDB.
I just finished running the CheckDb on the 45 GB database. All cores were busy - about 60 to 90% most of the time. Also the disk IO was normally 300-500 MB/sec. The Checkdb finished in 1 minute 17 seconds. (I'm impressed with my desktop. I'm not feeling impressed with our VM and new SAN.) Here are the wait stat changes.
Final Time Duration(Sec) wait_type WaitTime (Sec) Number Waits Avg ms/Wait
----------------------- ------------- ----------------------------------- -------------- ------------ -----------
2014-06-23 10:40:16.850 77 CXPACKET 944.5 410558 2.301
2014-06-23 10:40:16.850 77 OLEDB 675.7 13238393 0.051
2014-06-23 10:40:16.850 77 LATCH_EX 117.9 1007824 0.117
2014-06-23 10:40:16.850 77 IO_COMPLETION 99.8 55397 1.801
2014-06-23 10:40:16.850 77 SOS_SCHEDULER_YIELD 47.1 97552 0.483
2014-06-23 10:40:16.850 77 PAGEIOLATCH_SH 15.5 1543 10.049
2014-06-23 10:40:16.850 77 WRITELOG 1.1 113 9.912
2014-06-23 10:40:16.850 77 MSQL_XP 0.5 12 40.750
2014-06-23 10:40:16.850 77 LATCH_SH 0.5 43 12.442
2014-06-23 10:40:16.850 77 PREEMPTIVE_OS_WRITEFILE 0.3 12 27.750
2014-06-23 10:40:16.850 77 ASYNC_IO_COMPLETION 0.2 3 63.000
2014-06-23 10:40:16.850 77 PREEMPTIVE_OS_AUTHENTICATIONOPS 0.2 166 1.361
2014-06-23 10:40:16.850 77 SLEEP_BPOOL_FLUSH 0.1 47 2.681
2014-06-23 10:40:16.850 77 PREEMPTIVE_OS_DELETESECURITYCONTEXT 0.1 27 2.593
2014-06-23 10:40:16.850 77 PREEMPTIVE_OS_DEVICEIOCONTROL 0.0 4 1.250
2014-06-23 10:40:16.850 77 PREEMPTIVE_OS_DISCONNECTNAMEDPIPE 0.0 1 0.000
4. I started the checkdb on the "TERRAIN" database that I has been having the checkdb issue. It's been running about 7 minutes. Here are the wait stat changes so far.
Final Time Duration(Sec) wait_type WaitTime (Sec) Number Waits Avg ms/Wait
----------------------- ------------- ----------------------------------- -------------- ------------ -----------
2014-06-23 11:19:01.290 420 OLEDB 442.2 643131 0.688
2014-06-23 11:19:01.290 420 CXPACKET 30.8 14751 2.086
2014-06-23 11:19:01.290 420 LATCH_EX 13.4 32123 0.418
2014-06-23 11:19:01.290 420 PAGEIOLATCH_SH 2.6 3722 0.712
2014-06-23 11:19:01.290 420 WRITELOG 1.4 368 3.679
2014-06-23 11:19:01.290 420 SOS_SCHEDULER_YIELD 1.2 56869 0.021
2014-06-23 11:19:01.290 420 ASYNC_IO_COMPLETION 0.4 6 58.500
2014-06-23 11:19:01.290 420 BACKUPIO 0.1 13523 0.008
2014-06-23 11:19:01.290 420 BACKUPTHREAD 0.1 82 0.622
2014-06-23 11:19:01.290 420 LATCH_SH 0.1 94 1.138
2014-06-23 11:19:01.290 420 MSQL_XP 0.1 84 0.679
2014-06-23 11:19:01.290 420 PREEMPTIVE_OS_AUTHENTICATIONOPS 0.1 868 0.063
5. Here is the current output from sys.dm_os_waiting_tasks. It varies a bit every time I run the query.
wait_duration_ms wait_type
-------------------- ------------------------------
8299276 FT_IFTSHC_MUTEX
8299276 CHECKPOINT_QUEUE
8299276 ONDEMAND_TASK_QUEUE
8297318 BROKER_TRANSMITTER
8297318 BROKER_TRANSMITTER
8297237 KSOURCE_WAKEUP
5903516 BROKER_EVENTHANDLER
652889 CLR_AUTO_EVENT
652889 CLR_AUTO_EVENT
630863 DISPATCHER_QUEUE_SEMAPHORE
202447 SP_SERVER_DIAGNOSTICS_SLEEP
22202 XE_DISPATCHER_WAIT
9490 FT_IFTS_SCHEDULER_IDLE_WAIT
3157 SQLTRACE_INCREMENTAL_FLUSH_SLE
942 SLEEP_TASK
442 REQUEST_FOR_DEADLOCK_SEARCH
442 XE_TIMER_EVENT
144 HADR_FILESTREAM_IOMGR_IOCOMPLE
112 LAZYWRITER_SLEEP
94 LOGMGR_QUEUE
68 DIRTY_PAGE_POLL
42 OLEDB
Thank, Randy
Randy in Marin
Monday, June 23, 2014 6:48 PM -
It's been running over an hour and almost 2% done. The OLEDB wait now dominates. I will ask to have AVG disabled just in case something about this is different.
Final Time Duration(Sec) wait_type WaitTime (Sec) Number Waits Avg ms/Wait
----------------------- ------------- ----------------------------------- -------------- ------------ -----------
2014-06-23 12:22:58.053 4257 OLEDB 4276.8 1416257 3.020
2014-06-23 12:22:58.053 4257 CXPACKET 31.3 14857 2.110
2014-06-23 12:22:58.053 4257 LATCH_EX 13.4 32161 0.418
2014-06-23 12:22:58.053 4257 WRITELOG 6.9 2932 2.367
2014-06-23 12:22:58.053 4257 PAGEIOLATCH_SH 3.8 3813 1.001
2014-06-23 12:22:58.053 4257 SOS_SCHEDULER_YIELD 2.4 229066 0.010
2014-06-23 12:22:58.053 4257 ASYNC_IO_COMPLETION 1.3 24 54.542
2014-06-23 12:22:58.053 4257 PREEMPTIVE_OS_AUTHENTICATIONOPS 0.6 8226 0.070
2014-06-23 12:22:58.053 4257 BACKUPIO 0.5 41036 0.011
2014-06-23 12:22:58.053 4257 MSQL_XP 0.5 774 0.609
Randy in Marin
Monday, June 23, 2014 7:29 PM -
Hey Randy,
Can you run this script a few times (this will give us useful info):
SELECT
[owt].[session_id],
[owt].[exec_context_id],
[owt].[wait_duration_ms],
[owt].[wait_type],
[owt].[blocking_session_id],
[owt].[resource_description],
CASE [owt].[wait_type]
WHEN N'CXPACKET' THEN
RIGHT ([owt].[resource_description],
CHARINDEX (N'=', REVERSE ([owt].[resource_description])) - 1)
ELSE NULL
END AS [Node ID],
[es].[program_name],
[est].text,
[er].[database_id],
[eqp].[query_plan],
[er].[cpu_time]
FROM sys.dm_os_waiting_tasks [owt]
INNER JOIN sys.dm_exec_sessions [es] ON
[owt].[session_id] = [es].[session_id]
INNER JOIN sys.dm_exec_requests [er] ON
[es].[session_id] = [er].[session_id]
OUTER APPLY sys.dm_exec_sql_text ([er].[sql_handle]) [est]
OUTER APPLY sys.dm_exec_query_plan ([er].[plan_handle]) [eqp]
WHERE
[es].[is_user_process] = 1
ORDER BY
[owt].[session_id],
[owt].[exec_context_id];
GO
And also run my wait stats and latch stats DMV queries? I don't know what the other scripts you're running are doing so I'd like to see the results in the format etc I know.
Is the Terrain database being checked on the same system? What DOP is the system running?
Monday, June 23, 2014 8:35 PM -
Hi Paul,
The 45 GB "Court" and 200 GB "Terrain" checkdbs were both run on my desktop. The behavior is very different. I looked at the table sizes. The largest Court table is 9 GB. There are three Terrain tables over 20 GB. I have 16 GB RAM. Perhaps it has something to do with the table size vs RAM?
Here is the output of the query above.
session_id exec_context_id wait_duration_ms wait_type blocking_session_id resource_description Node ID program_name text database_id query_plan cpu_time ---------- --------------- -------------------- ------------------------------------------------------------ ------------------- ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- -------------------------------------------------------------------------------------------------------------------------------- ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- ----------- ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- ----------- 65 0 128 OLEDB NULL NULL NULL Microsoft SQL Server Management Studio - Query DBCC CHECKDB ([TERRAIN_TEST]) WITH NO_INFOMSGS, ALL_ERRORMSGS, DATA_PURITY 9 NULL 9148509 (1 row(s) affected)
Here is the output of your wait stats query from http://www.sqlskills.com/blogs/paul/wait-statistics-or-please-tell-me-where-it-hurts/
WaitType Wait_S Resource_S Signal_S WaitCount Percentage AvgWait_S AvgRes_S AvgSig_S ------------------------------------------------------------ --------------------------------------- --------------------------------------- --------------------------------------- -------------------- --------------------------------------- --------------------------------------- --------------------------------------- --------------------------------------- OLEDB 10755.78 10755.78 0.00 28475396 75.57 0.0004 0.0004 0.0000 CXPACKET 1879.42 1222.97 656.46 831836 13.20 0.0023 0.0015 0.0008 PREEMPTIVE_OS_WRITEFILEGATHER 316.72 316.72 0.00 11 2.23 28.7925 28.7925 0.0000 LATCH_EX 291.22 253.38 37.84 1964729 2.05 0.0001 0.0001 0.0000 BACKUPTHREAD 184.59 184.59 0.00 540 1.30 0.3418 0.3418 0.0000 ASYNC_IO_COMPLETION 181.93 181.85 0.07 44 1.28 4.1347 4.1330 0.0017 (6 row(s) affected)
Here is the output of the latch stats query from http://www.sqlskills.com/blogs/paul/most-common-latch-classes-and-what-they-mean/
LatchClass Wait_S WaitCount Percentage AvgWait_S ------------------------------------------------------------ --------------------------------------- -------------------- --------------------------------------- --------------------------------------- DBCC_CHECK_AGGREGATE 279.02 1955689 95.23 0.0001 (1 row(s) affected)
Thanks, Randy
Randy in Marin
Monday, June 23, 2014 9:08 PM -
I don't know of anything definitive that would cause the behavior and symptoms you're seeing.
Can you turn off 2562 and 2549 please? They're not applicable to the system you're running on and I suspect 2562 may actually be detrimental for the larger database as 2562 is forcing the whole database to be checked as one 'batch' rather than being split up into smaller batches.
Restart without the two trace flags and see how it goes.
If that's not the root cause, then I'd check the large tables individually until you find the one that takes an inordinate amount of time compared to the others, then we can look into its schema to see what's different.
One other thing - have you tried the same comparison of databases on a 2012 SP1 system? Just wondering if there's something in SP2 that's causing an issue for the larger database.
Thanks
Tuesday, June 24, 2014 4:12 PM -
It was also slow before I applied SP2 and the flags. (I applied SP2 and used the flags in hopes of a faster CHECKDB.) I will turn off the flags and try again. It might be a little while before I can respond - so many servers, so little time.
Randy in Marin
Tuesday, June 24, 2014 6:00 PM -
I thought I posted some results yesterday, but they are not here. Darn, that was a lot of work. I'll summarize. It was still slow. I created a query to run DBCC CHECKTABLE on each table. The tables without spatial indexes are done first. They are checked quickly - even the 40 GB table. The CXPACKET and OLEDB are the main waits with CXPACKET slightly larger. Some of these tables use a image data type to store GIS point data.
The tables with spatial indexes (they have a column of the geometry type) are done second. These tables also have a constraint of the form ([SHAPE].[STSrid]=(2872)). These do not run very fast. Performance gets worse with more space/records. The primary OLEDB wait swamps everything else.
It's still running on the last table with a spatial index. It will be a long time for that table. I will try again with NOINDEX to see if the tables can be checked. If so, then that might indicate the spatial index is the source of the performance issue.
Here are the run times of the CHECKTABLE calls. The records/sec and MB/sec just use the database record count and total size with the run time to get feel for how fast it's working - or not working.
BTW, I also had AVG disable for a while without any difference.
Row IndexType TotalSpaceGB RowCount CheckTableSec Records/sec MB/sec ---- --------- ------------ ----------- ------------- ----------- ------- 80 0.273 1030914 1 1030914 279.602 81 1.077 7367710 3 2455903 367.549 82 1.414 9677533 3 3225844 482.779 83 1.451 7367710 5 1473542 297.261 84 1.906 9677533 5 1935506 390.422 85 2.049 14019848 6 2336641 349.695 86 2.510 17175852 7 2453693 367.200 87 2.762 14019848 8 1752481 353.521 88 3.383 17175852 9 1908428 384.963 89 7.050 48240943 18 2680052 401.059 90 9.503 48240943 26 1855420 374.256 91 19.211 249898908 54 4627757 364.306 92 39.785 249898908 146 1711636 279.038 93 Spatial 0.000 5 1 5 0.016 94 Spatial 0.002 15 0 NA NA 95 Spatial 0.018 33391 2 16695 9.180 96 Spatial 0.255 1030914 55 18743 4.740 97 Spatial 0.433 7367710 904 8150 0.491 98 Spatial 0.569 9677533 1454 6655 0.401 99 Spatial 0.824 14019848 2796 5014 0.302 100 Spatial 1.010 17175852 4040 4251 0.256 101 Spatial 2.836 48240943 28353 1701 0.102 102 Spatial 26.891 249898908 NULL NA NA
Final Time Duration(Sec) wait_type WaitTime (Sec) Number Waits Avg ms/Wait ----------------------- ------------- ----------------------------------- -------------- ------------ ----------- 2014-06-25 14:34:06.643 45282 OLEDB 45269.9 3476014 13.024 2014-06-25 14:34:06.643 45282 WRITELOG 79.6 33045 2.409 2014-06-25 14:34:06.643 45282 PREEMPTIVE_OS_AUTHENTICATIONOPS 6.0 95355 0.063 2014-06-25 14:34:06.643 45282 SOS_PHYS_PAGE_CACHE 4.7 20472 0.231
PS Sorry about the format. The columns line up fine when in edit, but not once I submit.
Randy in Marin
- Edited by Randy in Marin Wednesday, June 25, 2014 9:47 PM
Wednesday, June 25, 2014 9:42 PM -
PS Sorry about the format. The columns line up fine when in edit, but not once I submit.
The format looks good in my text-only news reader where I have a monospaced font. So copy-and-pasting into monospaced window should help.
I'm refraining from commenting on the issue as such - this definitely out of my league.
Erland Sommarskog, SQL Server MVP, esquel@sommarskog.seWednesday, June 25, 2014 9:58 PM -
Thanks for sharing things with us!!
With Due to respect to all, below are statements, I need to make. I don't know who all have done what all in the past as computer industry is for all and I take myself always as learner. My goal is to share what I know as per my best awareness and it has nothing to do hurting\disrespecting anyone!
Nice to note that CXPACKET are coming and that usually comes with adjustment for MAXDOP.
Also you will need to disable the AV scan for longer time and then conclude.
See, I don't know, how your environment setup is and normally on msdn people don't share all stuffs in one go.
I need to ask you something or some queries to validate\check then there is little option for me to know.
There are some basic facts:
1. Let say you don't have AVG scan defined for Exclusion\Inclusion_ Can you say it will never\not effect snapshot file got created for DBCC CHECKDB operation, which will be in one of your Databases data\log folders where scan continuously happens. This is something even MS says to setup first as best practice. This has nothing to do with CHECKDB alone, its a basic thing and needs to be setup when you setup your environment for the first time.
2. I had mentioned about VLF, which seems to be not suitable here and I take that back.
3. MAXDOP: It is coming now in your observation.
Keep learning and share with me too!!
Santosh Singh
Thursday, June 26, 2014 4:14 AM -
Hi Santosh. There are two distinct results depending upon if a spatial type is present. DBCC CHECK performance is great when I see CXPACKET as the top wait. This is what I see with tables without a spatial data type.
AVG on or off does not appear to affect performance on tables without a spatial data type. It might be that something is different about the spatial type or index that will invoke a specific AVG issue. I watched the performance very carefully when AVG was off. There was no difference in the counters, CPU, and waits. It could still be AVG, but I have my doubts.
I checked VLF counts before starting this thread. There are about 60 VLFs with sizes in the range 8-32 MB. I came across an interesting post (http://www.sqlskills.com/blogs/kimberly/transaction-log-vlfs-too-many-or-too-few/) that indicated that you can have too many or few VLFs. Besides, if this was the issue, I think I would have seen latency on the disks. Disks are almost idle during the check when a spatial type is present.
With OLEDB as almost the only wait present when checking a table with a spatial index, I don't think MAXDOP is the next thing to try. The last table is being checked at about 10% per day. The CPU counter indicates that it's using about 1 logical core total most of the time. The performance monitor shows and average CPU use of 10%. (6 cores with hyper-threading enabled.) The total elapsed time is slightly larger than the CPU time on the DBCC process. I see a couple of logical cores running about 50-80% most of the time. A few others are working at 10-20%. Several are idle.
At this point, I don't have anything better to try after running the checks with NOINDEX. If I have time, I will try disabling HT, setting a MAXDOP, and disabling AVG for the entire run.
Randy in Marin
Thursday, June 26, 2014 3:57 PM -
At this point, I don't have anything better to try after running the checks with NOINDEX. If I have time, I will try disabling HT, setting a MAXDOP, and disabling AVG for the entire run.
A drastic step would be to drop only the spatial index to narrow down there the problem is. If that has the expected effect - CHECKTABLE completes in time, next step is to recreate it and try again. If the index was corrupted, CHECKTABLE may complete faster.
OF course, this is mere speculation on my part, and Paul may have better ideas.
My gut feeling that playing with MAXDOP and HT is a waste of time. And I don't really think that the antivirus has anything to do with it, as long as you have put the database files on the exclusion list. (Unless the antivirus finds it ways to the alternate stream which has the snapshot that DBCC uses.)
Erland Sommarskog, SQL Server MVP, esquel@sommarskog.seThursday, June 26, 2014 10:16 PM -
Thanks Erland. It's 25% done with the large table. I'm incline to let it continue (5-10% per day until perhaps 7/2) for one real pass to insure all is well with a recent migration. I'm behind on other work, but I will come back to this and try the check on the base table and index separately. I will also try to drop/create the index. I was told that this index will take about a day to create.
We are also going to check the use of the data with our client. It is a 5 foot elevation contour that might not be needed or might have too much resolution.
Me: Dr. SQL, it hurts when I use that data.
Dr. SQL: Well, then don't use that data.
Randy in Marin
Friday, June 27, 2014 4:12 PM -
-
My DBCC CHECKTABLE eventually failed with a memory error. I did not set a memory limit on the instance and was running a VM plus a few other things. The timing was logged, but I do not know the result of the check on the last table - good or bad. 70 KB/sec for 27 GB is not going to work for us.
Row Table TotalSpaceGB RowCount CheckTableSec Records/sec MB/sec ---- -------------------- ------------ ----------- ------------- ----------- ------- 101 dbo.FEMA25 2.836 48240943 28353 1701 0.102 102 dbo.CONTOUR_DENSIFIE 26.891 249898908 401743 622 0.069
I removed the spatial indexes and the check constraints without improvement. So, it appears that the presence of the geometry data type is the trigger for the slow performance. I checked the datalengh of the geometry column on the table with worst performing DBCC. It is only 30 bytes for all 250 million records. The row length is about 100 bytes. Perhaps there is some single threaded spatial related CLR call or such taking 1-2 ms per record?
Has anybody else run a DBCC on a table using the geometry data type with a high record count?
BTW, here are typical results on the same system for tables without a spatial type. About a factor of 5000 faster in the number of records or MB per second.
Row Table TotalSpaceGB RowCount CheckTableSec Records/sec MB/sec ---- -------------------- ------------ ----------- ------------- ----------- ------- 81 dbo.s948 1.077 7367710 1 7367710 1102.65 82 dbo.s946 1.414 9677533 2 4838766 724.168 83 dbo.f948 1.451 7367710 3 2455903 495.435 84 dbo.f946 1.906 9677533 4 2419383 488.027 85 dbo.s947 2.049 14019848 3 4673282 699.391 86 dbo.s949 2.510 17175852 3 5725284 856.799 87 dbo.f947 2.762 14019848 6 2336641 471.362 88 dbo.f949 3.383 17175852 7 2453693 494.952 89 dbo.s986 7.050 48240943 9 5360104 802.117 90 dbo.f986 9.503 48240943 18 2680052 540.592 91 dbo.s951 19.211 249898908 52 4805748 378.317 92 dbo.f951 39.785 249898908 119 2099990 342.349
Thanks
Randy in Marin
Tuesday, July 8, 2014 10:43 PM -
Another thing I noticed, using the Process Monitor from Sysinternals, is that there is an IO pattern where the IO is done in spurts separated by a delay. The spurt is about 50-60 reads done within .2 seconds. The delay is about 8 seconds between spurts.
Randy in Marin
Tuesday, July 8, 2014 11:37 PM -
-
Yup - I concur. Looks like a bug in spatial processing, possibly on 2012.
CEO, SQLskills.com (http://www.sqlskills.com/blogs/paul), SQL MVP, Microsoft Regional Director, Contributing Editor of TechNet and SQL Server Magazines, Author of 2005 DBCC CHECKDB/repair. Consulting/Training: http://www.sqlskills.com/
Thursday, July 10, 2014 9:45 PM -
Okay, I'll report it. No time today. Thanks for your advice.
Randy in Marin
Thursday, July 10, 2014 11:58 PM -
A status update for the curious. I changed the service account to avoid locking pages in memory. (DBCC would stop after a memory error.) It's been running 3 days and the estimated finish date is 7/14/2014. I looked at the sql process and noticed that it generates about 800000 page faults per second. I did not know that paging could be so high. No hard faults. Even so, it's not something you see every day.
PS I have not submitted the bug yet. I will.
Randy in Marin
Thursday, July 17, 2014 11:48 PM -
FYI: The production server DBCC finished after 12 days. No errors reported.
Randy in Marin
Monday, July 21, 2014 11:52 PM -
A slight necro of this thread - sorry.
Paul's discussion with Ryan about CHECKDB's expression evaluator (http://www.sqlskills.com/blogs/paul/dbcc-checkdb-performance-and-computed-column-indexes/), and the involvement of spatial data, compels me to attempt a contribution :).
While I see removing the spatial index did not improve the situation, the way in which a spatial index is used is interesting, in that there are two or more filters used to satisfy a spatial index. The primary filter of a spatial index uses a grid (a good picture is at http://blogs.msdn.com/b/isaac/archive/2007/12/01/spatial-indexing-part-3-faster-primary-filtering.aspx). However, because spatial data uses floating point arithmetic (in part due to spatial's use of COS, SIN, TAN), a secondary filter (which is a CLR function) must be applied to near misses or near hits (after the primary filter has been applied). The secondary filter's CLR function can be seen in most query plans (that harness a spatial index), plus there are 1 or 2 additional CLR functions which determine the highest (parent) level cell that will satisfy the spatial index (see the graphic of the levels, under the "Deepest-Cell Rule" section of http://msdn.microsoft.com/en-us/library/bb895265.aspx). Stll. removing the spatiall index (while retaining the spatial data) did not help, so I am not sure this paragraph was worth writing :).
The other matter that comes to mind is CHECKDB's DATA_PURITY "option", which can be (or at least was) disabled by trace flag 2566 - see http://support.microsoft.com/kb/945770/en-ushttp://support.microsoft.com/kb/945770/en-us ( I can only hope someone at MS one day gets it into their 'head' that publishing most traceflg is no less dangerous than giving customers cars that can exceed all known speed limits...:). For spatial data to be 'pure', the spatial type must have .STIsValid = 1 (see BOL). STIsValid must adhere to the Open Geospatial Consortium's definitions of valid spatial data. One example: The series of points that define the vertices of a polygon must both begin and end with the same point. 2nd example: A linestring must not cross over itself more than once (this is documented somewhere in BOL, if memory serves me right). My 'point' is that .STIsValid must be implemented in CLR, and it must be costly. Assuming DATA_PURITY triggers .STIsValid, not only would each spatial value be checked, each *point* within each spatial value would be checked (which is worse than row by agonizing row, with CLR/OLEDB waits thrown in 'just for good measure')....
Other than the references, the above is a guess. But, you might want to try the quasi-documented dbcc traceon(2566,-1) followed by a CHECKTABLE, and see if that helps.
- Edited by SoHelpMeCodd Wednesday, August 6, 2014 9:07 AM
Wednesday, August 6, 2014 8:56 AM