none
MsMerge_genhistory has alot of rows with pubid = null

    Question

  • I have a merge replication and I am worried that the cleanup of metadata might not be enough. I have a retention period of 60 days and I can see thet the metadatacleanup-job do remove rows i msmege_genhistory that are older but only for rows that has the right guid in pubid. most of the rows, about 1,6 million, has the value NULL in pubid and I cannot figure out why. does anybody know why there is so many null-values?

    This is causing the stored procedure sp_msenumgenerations90 to be very slow and sometimes the whole system is locking.

    Tuesday, September 26, 2017 2:44 PM

Answers

  • What you are referring to is defunct subscriptions. Have you tried to drop these subscriptions? sp_dropmergepullsubscription should help you here. You will likely find that some metadata will not be purged. Be very careful about purging it manually.

    • Marked as answer by Dratskapoo Friday, October 6, 2017 2:49 PM
    Thursday, October 5, 2017 3:08 PM
    Moderator

All replies

  • If you are doing republishing pubid may be not null to reflect the upstream publisher.

    You may need to re-index/defrag or find covering indexes to fix the index enumeration issue. Also you might need to limit the number of concurrent syncs if blocking is your problem.

    Are you filtering? Issues like this could be the result of complex filtering.

    Tuesday, September 26, 2017 2:50 PM
    Moderator
  • Thank you Hilary for the reply!

    I do not republish, I only have one publication that has about 250 subscribers that are windows ce clients with sql server compact.

    I do have a job that rebuilds my indexes every night, and everything has been running fine for the past year (I dropped and recreated the replication a year ago because of some issues with some new tables). A few weeks ago when two clients where downloading new databases i started to see some locking and it was clear that it was sp_msenumgenerations90 that was the problem, CPU spiked and buffer I/O waits was over 2000 ms/s. I was able to see in the execution plan that there was a sort-operation occurring that was spilling over to tempdb so i created an index that was covering the tables, and then rebuilding it to include columns since there was a lot of columns in the sp that was in play. After that i could see in the new execution plan that the query used my new index and gets enough memory and is not spilling to tempdb. However the sort after my index takes 97% of execution and it sorts on all the columns in my index + expr1013. I added that index Friday and the synctime went down from 1h to 30min. Today it started to lock again and now the execution plan looks completely different with a sort in another place that is spilling to tempdb. I rebuilt my index and it changed the execution plan back to what it was before. even with my index CPU is running high on big syncs and wait CXPACKET is showing in activity monitor.

    Yes, i do use filtering, mostly on HOST to only replicate tables that is interesting for the client. I also have some join-filters. I have been working with this system for two years (it been live since 2008) and before that i did not even know about replication. I am originally a .Net developer and don't have a lot of db-experience so I have to ask what would be a complex filter?

    One thing that i want to add is that there is 1627862 rows in msMerge_genhistory and 1604342 of them har pubid = null and is not getting cleaned up, i can see that the ones with my pubid (guid) has coldate that is within my retention period but the ones with null spans to the date when the publication was recreated.


    • Edited by Dratskapoo Thursday, September 28, 2017 8:27 AM
    Tuesday, September 26, 2017 6:20 PM
  • A small update of what I have found. we use SQL server 2012 sp2 and from some atricles i have read that the metadata cleanup is modified in this version. now it does not only look at retention but also "active generations" that it get from looking at subscriptions in sysmergesubscriptions -table and checking last generation for every active subscriber and excludes them from the metadata cleanup. My problem is that all my subscribers has status 1 (active) in sysmergesubscribers and therefore most of my old generation does not get cleaned

    We use anonymous subscriptions and when a handheld device breaks down or if the database gets corrupted in the field the user can make a new subscription from the application. this means we do have a lot of rows in sysmergesubscriptions that gets "left behind" and for some reason they seem to not get set to expired. Do anybody now why they are not marked expired even though they are not synced whitin the retention period? 

    Thursday, October 5, 2017 6:54 AM
  • What you are referring to is defunct subscriptions. Have you tried to drop these subscriptions? sp_dropmergepullsubscription should help you here. You will likely find that some metadata will not be purged. Be very careful about purging it manually.

    • Marked as answer by Dratskapoo Friday, October 6, 2017 2:49 PM
    Thursday, October 5, 2017 3:08 PM
    Moderator
  • Thank you again Hilary,

    I will try to drop the subscriptions i know is not active and I think this will be the solution. But I have to add some more information that i have discovered in case someone else run in to this. 

    The query that evaluates active generations in the new clean up routine does also exclude subscriptions with the flag cleanedup_unsent_changes. all subscriptions that has not synced within the retention period in my sysmergesubscription has this flag set. So old (expired) subscriptions does not play a part for me. The problem is actually the defunct subscriptions that has only initiated a new subscription but not synced (and is still within the retention period). they have a null value in the column last_local_sentgen and the cleanup will fetch "generation BETWEEN isnull(oldsent,0)+1 AND newsent" (oldsent is last_local_sentgen and newsent is sentgen). With a value of null in last_local_sentgen and sentgen is 2000000 this will add all generations from 1 to 2000000 as excluded by the cleanup and this seems to be why it is not cleaning up old generations in my setup.

    I will add a new post when i have tried your solution Hilary and I am hopeful that it will get me back on track. Thank you again for your help.


    • Edited by Dratskapoo Friday, October 6, 2017 7:26 AM fix
    Friday, October 6, 2017 7:25 AM
  • Thank you for your help Hilary, I have carried out some test in my test environment and I can see that the cleanup now actually cleans the old rows that where piling up!

    I just want to add that the sp you supplied, sp_dropmergepullsubscription, was not an option for me since the subscribers do not exist anymore so i ended up running sp_dropmergesubscription instead on the distibutor and that removed the defunced subscriptions that was causing the cleanup to exclude every generation from the cleaning.

    Friday, October 6, 2017 2:56 PM
  • Hi Hilary, What can be the impact of purging manually. What we have to care while purging? If I am purging data more then 6 month which has pubid null what it can impact?
    Wednesday, May 15, 2019 12:15 PM
  • IIRC is a subscriber has not checked in for an extened time period the meta data will hang around. IIRC for filtered publications it is 2 times the retention period + 25 hours to handle time zones.

    Purging this manually may cause meta data cleanup errors which are show stopping and will require a re-initialization.

    Wednesday, May 15, 2019 1:13 PM
    Moderator