I will be as detailed as possible.
We have a high OLTP Publisher (SQL 2005 Ent.) publishing to 5 subscribers (SQL 2005 Std.). The DB on one of the subscribers was 50GB more in size than other 4 DB's. After little bit of investigation found that one of the table alone which was 500MB in total size on all other servers was 50.5GB on one. Rebuild the index fixed the issue and the size came back to 500MB.
We are noticing that the size of this table increasing on a daily basis. Comparing the PK & NCIX indexes we found under INDEX PROPERTIES-FRAGMENTATION the ghost rows which is 0 in all servers is 40,000,000 on the effected server.
- Trace 661 is not turned to ON
- sp_configure compare is same on all servers
- MSFT has a fix for 2005 SP1 which does not apply in our case (http://support.microsoft.com/kb/932115)
- There are more than 2-3 million changes every hour
- The table has 5 INT & one DATETIME columns i.e. no BLOB or VARCHAR(max) columns
- The clustered index on this table is on a INT column i.e. all the 5 servers have the same schema (Table,PK,NCIX, FillFactor etc).
Updated Information: after referring to "http://www.sqlskills.com/blogs/paul/post/inside-the-storage-engine-ghost-cleanup-in-depth.aspx"
Running update stats on the table does help in decreasing the number of ghost rows, however its temporary i.e. the rowcount reduces by 100K from 40 million and then remains there. I tried running update stats again and it reduced by another 100K however remains at that number even after waiting for 30 minutes.
I will need to run update stats thousands of times to clear all the records which is not what i prefer. I also checked that auto update stats is enabled on the DB
MSDN is just a social support site, and Microsoft employees only post on rare occassions.
I do remember running into the ghost orphan records in SQL 2005, and even though we installed the hot fix and update, we kept having MORE ghost problems after that. I've often seen this from Microsoft bug reporting, they tell you the bug occurs on Thursday afternoon, but they don't say ONLY! And when they issue a fix for Thursday afternoon, it does fix it ONLY for Thursday afternoons.
The only fix I believe we found was to periodically rebuild the whole table into a different filegroup, and our table was big, too.
Sorry for the late update, Better than not updating i guess.
The ghost record not getting deleted issue is resolved. We opened a case with Microsoft and below are a brief description of what we did. Please do not be surprised by the actual FIX.
1. We checked the Index stats on one of the effected table i.e. lets call it Dbo.Employee using the below query where we found that the number of ghost rows was constantly increasing with each replication run. The record count on the table was 3.5 million where in ghost records were 175 million.
- select * from sys.dm_db_index_Physical_Stats(7,2050767687,NULL,NULL,'DETAILED')
- select * from sys.dm_db_index_Operational_Stats(7,2050767687,NULL,NULL)
2. We also observed that updating the stats on the table reduces the ghost records count by 200-500k temporarily and then does not do any help (NO LUCK)
3. We ran the below free space system SP to clean up space (NO LUCK)
- EXEC sp_clean_db_free_space @dbname = N'OurDBName'
4. We enabled flags 662, 3605 which writes the log of ghost record cleanup to the database log file & found that for this DB (id=7) the process was always purging the same page over & over again. Below is what we saw. The process was always purging or trying to purge the same page from the DB
2012-07-24 12:03:52.240 spid55 Purged page Dbid 7, File 1, Page 2076
2012-07-24 12:03:53.440 spid17s Purging page Dbid 7, File 1, Page 2076
2012-07-24 12:03:57.460 spid57 Purged page Dbid 7, File 1, Page 2076
2012-07-24 12:04:02.460 spid57 Purged page Dbid 7, File 1, Page 2076
2012-07-24 12:04:07.260 spid55 Purged page Dbid 7, File 1, Page 2076
2012-07-24 12:04:08.440 spid17s Purging page Dbid 7, File 1, Page 2076
2012-07-24 12:04:12.290 spid55 Purged page Dbid 7, File 1, Page 2076
2012-07-24 12:04:12.490 spid57 Purged page Dbid 7, File 1, Page 2076
2012-07-24 12:04:17.320 spid55 Purged page Dbid 7, File 1, Page 2076
2012-07-24 12:04:17.520 spid57 Purged page Dbid 7, File 1, Page 2076
5. We then checkd the information in the page using the below command and the header PFS page we foung that the specific page 2076 was not listed as having any any ghost recods. Below are the copmmands used & how the header page would look.
- DBCC PAGE(7,1,2076,3)
- DBCC PAGE(7,1,1,3) (For Header Page)
(1:1737) - = ALLOCATED 0_PCT_FULL
(1:1738) - (1:1740) = NOT ALLOCATED 0_PCT_FULL Has Ghost
(1:1741) - = ALLOCATED 0_PCT_FULL
(1:1742) - (1:1959) = NOT ALLOCATED 0_PCT_FULL Has Ghost
(1:1960) - (1:1967) = ALLOCATED 0_PCT_FULL
(1:1968) - (1:2069) = NOT ALLOCATED 0_PCT_FULL Has Ghost
(1:2070) - = ALLOCATED 0_PCT_FULL Mixed Ext
(1:2071) - = ALLOCATED 100_PCT_FULL Mixed Ext
(1:2072) - = ALLOCATED 0_PCT_FULL Mixed Ext
(1:2073) - = ALLOCATED 100_PCT_FULL Mixed Ext
(1:2074) - (1:2075) = ALLOCATED 0_PCT_FULL IAM Page Mixed Ext
(1:2076) - = ALLOCATED 0_PCT_FULL Mixed Ext
(1:2077) - = ALLOCATED 0_PCT_FULL IAM Page Mixed Ext
(1:2078) - (1:2097) = NOT ALLOCATED 0_PCT_FULL Has Ghost
(1:2099) - = NOT ALLOCATED 0_PCT_FULL
6. We then ran DBCC CHECKDB for look for error. (NO LUCK). CheckDB came up clean.
7. We took the call to restart the SQL as a last resort. This fixed the issue