Answered by:
DBCC SHRINKFILE takes too long to run, LOB doesn't seem to be the problem

Question
-
I have a large data file (PRIMARY filegroup) with a total size of ~350GB, out of which ~118GB is used (only 37%).
I want to shrink this file as much as possible, since the database was turned into a read-only archive, and is therefore not expected to grow anymore.
When a big SHRINKFILE is executed, it takes too long to run, so we cancelled it (after about 8 hours), and I started shrinking in small chunks.
This actually worked well but I seem to have "hit a wall" when I reached ~314GB.
SQL Server doesn't seem to want to shrink even a tiny chunk of 10 MB.
It just "hangs" at 99.99% (based on sys.dm_exec_requests) and stays there.
I managed to make "progress" by shrinking the file in microscopic chunks of 1 MB. Every such chunk takes about 2 minutes to shrink. I got ~196000 more chunks like this to go, and if every one of them takes 2 minutes, it's gonna take me ~272 days to finish! That's madness!I found several sources here and there that suggest that this issue happens due to SQL trying to move LOB pages during the shrink:
http://www.sqlservercentral.com/Forums/Topic1372418-1550-1.aspx
http://www.sqlskills.com/blogs/paul/why-lob-data-makes-shrink-run-slooooowly-t-sql-tuesday-006/
However, unlike the people in those other posts, I don't have tables with LOB or Overflow-Data!
I run the following query to make sure:
SELECT p.object_id, OBJECT_SCHEMA_NAME(p.object_id) table_schema, OBJECT_NAME(p.object_id) table_name, p.index_id, p.partition_number, au.allocation_unit_id, au.type_desc, au.total_pages, au.used_pages, au.data_pages FROM sys.system_internals_allocation_units au JOIN sys.partitions p ON au.container_id = p.partition_id WHERE type_desc <> 'IN_ROW_DATA' AND total_pages > 0 ORDER BY table_schema, table_name, p.index_id, p.partition_number, au.type;
But the only result that comes up is the system table sys.sysobjvalues with 90 total pages of LOB_DATA.
Based on this post by Paul Randal: http://www.sqlskills.com/blogs/paul/why-lob-data-makes-shrink-run-slooooowly-t-sql-tuesday-006/
This puny little table shouldn't be bothering me so much, because Paul says:When a text page is moved by data file shrink, a table scan must be performed for each LOB value to find the ‘owning’ data record so the pointer can be updated. (If the LOB value is from an INCLUDEd column in a nonclustered index, then a similar scan must be performed of the nonclustered index. If the LOB value is lower down in the text tree for a >8000 byte LOB value, all text pages for that table or index must be scanned.)
Which means that, at most, only the owning table of a LOB page should be the one to be scanned.
Since sysobjvalues is the only one with LOB data, and it's a tiny table, even a full table scan on it shouldn't bother me.
I have a couple tables of my own with NVARCHAR(MAX) columns but none of them has overflowed data. But even if I do somehow, both of these tables are tiny as well! (less than 1MB each), so a full scan on both shouldn't be a problem either! I have several huge tables in the database as well, but none of them have LOB or MAX columns.So, this suggests one or more of the following:
1. Paul Randal was mistaken when he said that only the owning table/index will be scanned.
2. The issue is not LOB data, but something else entirely.So far I managed to shrink about 10MB and I keep going, each 1MB taking around 2 minutes.
But this situation is completely absurd and unacceptable.I'm stomped. Anyone got any ideas?
Thanks!
Eitan Blumin; SQL Server Consultant - Madeira SQL Server Services
- Edited by EitanBlumin Wednesday, October 22, 2014 12:11 PM
Wednesday, October 22, 2014 12:07 PM
Answers
-
For a heap with lots of indexes, the reason is quite apparent: all those NC indexes needs to be updated, and this is done through the query processor, see this blog post from Paul Randall:
http://blogs.msdn.com/b/sqlserverstorageengine/archive/2007/03/29/how-does-your-schema-affect-your-shrink-run-time.aspxBut even if there are no indexes, the code is still taking that slower way.
Paul also has a comment about this in this thread:
https://social.msdn.microsoft.com/Forums/sqlserver/en-US/ccfd6304-a43b-4a9e-b441-f38e46dc2975/why-does-shrinkfile-run-slower-on-a-heap-vs-a-clustered-index?forum=sqldatabaseengine
Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se- Marked as answer by EitanBlumin Monday, October 27, 2014 12:05 PM
Sunday, October 26, 2014 4:33 PM -
Eitan,
I don't have anything handy to test with right now (or the time), but since shrink is fully logged it may be possible to check the log and see what it is going slow on (what it's touching in the log). Just a thought, like I said I haven't tested this as it was just a fleeting thought.
- Marked as answer by EitanBlumin Monday, October 27, 2014 12:08 PM
Thursday, October 23, 2014 2:52 AMAnswerer -
Okay guys, some good news!
First of all, thanks again to Sean for his suggestion to look inside the transaction log!
Indeed that specific table that I noticed was filling the transaction log, is the largest table in my database. Its size stands at about ~65GB. Like I said earlier, it has no LOB or Overflow columns. But one interesting fact about it is - it had absolutely no indexes! It was a heap, but it was also partitioned. I didn't know a case like this could cause such problems with shrink? It's weird but it also kinda makes sense in a way.
Anyways, I ran a script to create a partitioned clustered index on that table. That took pretty much all day long due to the table size (almost 7 hours). But once that was finished, shrinking the data file was a breeze!
At least... Until the shrink started "hanging" again because now it started dealing with my second largest table (~55GB). Like the first one, this one is also a partitioned heap without any indexes. So I expected this to happen. I guess the solution this time will be the same: Create a clustered index.
Does this make sense to you? A partitioned heap table causing such heavy workload to SHRINK, but when it's moved to a clustered index, suddenly it flies?
Eitan Blumin; SQL Server Consultant - Madeira SQL Server Services
- Edited by EitanBlumin Sunday, October 26, 2014 3:51 PM
- Marked as answer by EitanBlumin Monday, October 27, 2014 12:05 PM
Sunday, October 26, 2014 3:50 PM -
Well, my understanding of what he says, is that they go to the Query Processor to update the non-clustered indexes one by one, why it should not matter in your case, since you did not have any non-clustered indexes. But I guess it could be read in either way, and without access to the source code, this may be as close as we get.
Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se- Marked as answer by EitanBlumin Monday, October 27, 2014 12:05 PM
Sunday, October 26, 2014 9:02 PM
All replies
-
-
Thanks Erland!
Yeah I already took that option into consideration as well.
I suppose that if no other solution comes up until after the weekend, then we'll go in that route.It is a very bizarre situation though.
No ideas what I could do to investigate it further?Eitan Blumin; SQL Server Consultant - Madeira SQL Server Services
Wednesday, October 22, 2014 10:30 PM -
Eitan,
I don't have anything handy to test with right now (or the time), but since shrink is fully logged it may be possible to check the log and see what it is going slow on (what it's touching in the log). Just a thought, like I said I haven't tested this as it was just a fleeting thought.
- Marked as answer by EitanBlumin Monday, October 27, 2014 12:08 PM
Thursday, October 23, 2014 2:52 AMAnswerer -
-
And, oh, did you run this query to make sure:
SELECT * FROM sys.system_internals_allocation_units au
WHERE NOT EXISTS (SELECT *
FROM sys.partitions p WHERE p.partition_id = au.container_id)
Erland Sommarskog, SQL Server MVP, esquel@sommarskog.seHi Erland,
Yes, I've tried it and nothing came up.
Eitan,
I don't have anything handy to test with right now (or the time), but since shrink is fully logged it may be possible to check the log and see what it is going slow on (what it's touching in the log). Just a thought, like I said I haven't tested this as it was just a fleeting thought.
Hi Sean,
Thank you for the suggestion. I looked into it and noticed that it was doing all of the work on one particular table. I'll try to rebuild that specific table or maybe move it into a different filegroup and see how that helps.
I'll post an update here as soon as something develops.
Eitan Blumin; SQL Server Consultant - Madeira SQL Server Services
Sunday, October 26, 2014 8:26 AM -
Okay guys, some good news!
First of all, thanks again to Sean for his suggestion to look inside the transaction log!
Indeed that specific table that I noticed was filling the transaction log, is the largest table in my database. Its size stands at about ~65GB. Like I said earlier, it has no LOB or Overflow columns. But one interesting fact about it is - it had absolutely no indexes! It was a heap, but it was also partitioned. I didn't know a case like this could cause such problems with shrink? It's weird but it also kinda makes sense in a way.
Anyways, I ran a script to create a partitioned clustered index on that table. That took pretty much all day long due to the table size (almost 7 hours). But once that was finished, shrinking the data file was a breeze!
At least... Until the shrink started "hanging" again because now it started dealing with my second largest table (~55GB). Like the first one, this one is also a partitioned heap without any indexes. So I expected this to happen. I guess the solution this time will be the same: Create a clustered index.
Does this make sense to you? A partitioned heap table causing such heavy workload to SHRINK, but when it's moved to a clustered index, suddenly it flies?
Eitan Blumin; SQL Server Consultant - Madeira SQL Server Services
- Edited by EitanBlumin Sunday, October 26, 2014 3:51 PM
- Marked as answer by EitanBlumin Monday, October 27, 2014 12:05 PM
Sunday, October 26, 2014 3:50 PM -
Sunday, October 26, 2014 4:32 PM
-
For a heap with lots of indexes, the reason is quite apparent: all those NC indexes needs to be updated, and this is done through the query processor, see this blog post from Paul Randall:
http://blogs.msdn.com/b/sqlserverstorageengine/archive/2007/03/29/how-does-your-schema-affect-your-shrink-run-time.aspxBut even if there are no indexes, the code is still taking that slower way.
Paul also has a comment about this in this thread:
https://social.msdn.microsoft.com/Forums/sqlserver/en-US/ccfd6304-a43b-4a9e-b441-f38e46dc2975/why-does-shrinkfile-run-slower-on-a-heap-vs-a-clustered-index?forum=sqldatabaseengine
Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se- Marked as answer by EitanBlumin Monday, October 27, 2014 12:05 PM
Sunday, October 26, 2014 4:33 PM -
But even if there are no indexes, the code is still taking that slower way.
Paul also has a comment about this in this thread:
https://social.msdn.microsoft.com/Forums/sqlserver/en-US/ccfd6304-a43b-4a9e-b441-f38e46dc2975/why-does-shrinkfile-run-slower-on-a-heap-vs-a-clustered-index?forum=sqldatabaseengine
Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se
Eitan Blumin; SQL Server Consultant - Madeira SQL Server Services
- Edited by EitanBlumin Sunday, October 26, 2014 7:47 PM there were NO indexes at all. clustered or otherwise.
Sunday, October 26, 2014 7:43 PM -
Well, my understanding of what he says, is that they go to the Query Processor to update the non-clustered indexes one by one, why it should not matter in your case, since you did not have any non-clustered indexes. But I guess it could be read in either way, and without access to the source code, this may be as close as we get.
Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se- Marked as answer by EitanBlumin Monday, October 27, 2014 12:05 PM
Sunday, October 26, 2014 9:02 PM -
Yow! Well, good to know.
I might add a couple of more or less obvious comments, that this is also in some part a glitch (let's call it) in SQL Server's scalability, such inefficiencies can be glanced over as long as databases are small but become issues when databases are large. So that if something like this is left in it could be documented of course, but better yet the engine could issue a warning if it hits one, if it hits a big heap in a shrinkfile.
Thanks to all for an interesting discussion, and to the Pauls for posting the explanations.
Josh
Monday, October 27, 2014 12:12 AM -
Well, my understanding of what he says, is that they go to the Query Processor to update the non-clustered indexes one by one, why it should not matter in your case, since you did not have any non-clustered indexes. But I guess it could be read in either way, and without access to the source code, this may be as close as we get.
Erland Sommarskog, SQL Server MVP, esquel@sommarskog.seMy guess is that the performance hit could be caused due to the table being partitioned. Obviously, SQL needs some way to know which records are located in each partition, so some kind of RID mapping must exist somewhere. I bet that's what caused the performance problem... It's quite similar to the nonclustered-indexes scenario.
I wonder if this problem would have occured had the tables were not partitioned? Hmm... Interesting. I think it's worth a lab test.
Eitan Blumin; SQL Server Consultant - Madeira SQL Server Services
Monday, October 27, 2014 12:04 PM -
My guess is that the performance hit could be caused due to the table being partitioned. Obviously, SQL needs some way to know which records are located in each partition, so some kind of RID mapping must exist somewhere. I bet that's what caused the performance problem... It's quite similar to the nonclustered-indexes scenario.
Could be.
I wonder if this problem would have occured had the tables were not partitioned? Hmm... Interesting. I think it's worth a lab test.
Well, if you can't think no other better why to spend your time, who am I to stop you? :-)
Erland Sommarskog, SQL Server MVP, esquel@sommarskog.seMonday, October 27, 2014 3:55 PM