locked
Test handling SQL replication errors. RAISERROR seems to not work with MSSQL_REPL errors numbers RRS feed

  • Question

  • Folks,

    I'm not specifically after assistance with the error I'm dealing with although if someone has a clue to fix the actual issue that would be great. My question relates to errors raised by Replication faults and simulating these failures.

    Below is the error I get every few weeks for a particular publication\subscription.

    • Error messages:
    • The merge process failed because it detected a mismatch between the replication metadata of the two replicas, such that some changes could be lost leading to non-convergence. This could be due to the subscriber not having synchronized within the retention period, or because of one of the replicas being restored to a backup older than retention period, or because of the publisher performing more aggressive cleanup on articles of type download-only and articles with partition_options = 3. (Source: MSSQL_REPL, Error number: MSSQL_REPL-2147199363)
      Get help: http://help/MSSQL_REPL-2147199363
    • The common generation watermark is invalid at this replica since it does not exist or metadata for changes not yet propagated may have been cleaned up. (Source: MSSQLServer, Error number: 21800)
      Get help: http://help/21800

    I know how to repair it but the suggestions about preventing it are limited. i suspect the "aggressive cleanup on articles of type download-only" but I'm just not sure how to limit the cleanup.

    Anyway I'm putting an SP together to re-initialize the subscriber when the above error is detected. In testing the process I used the RAISERROR method to test.

    My issue is simulating the first MSSQL_REPL-2147199363.

    For example 

    RAISERROR(21800 , 16, -1) WITH LOG

    works perfectly including the full text of the error message.

    Msg 21800, Level 16, State 1, Line 1
    The common generation watermark is invalid at this replica since it does not exist or metadata for changes not yet propagated may have been cleaned up.

    However this is what I get when I try to use the first error code

    RAISERROR(-2147199363 , 16, -1) WITH LOG
    Msg 18054, Level 16, State 1, Line 1
    Error -2147199363, severity 16, state 1 was raised, but no message with that error number was found in sys.messages. If error is larger than 50000, make sure the user-defined message is added using sp_addmessage.
    

    When you look at the original errors it's clear they come from different sources. The -2147199363 error is even prefixed with MSSQL_REPL.

    (Source: MSSQL_REPL, Error number: MSSQL_REPL-2147199363)

    (Source: MSSQLServer, Error number: 21800)

    I was hoping to get the generic error message text and identify the parameters of the error message that allow me to identify exactly which publication\subscription is causing the issue and pass these values as parameters to the SP for re-initialisation.

    Has anyone got any experience with these large negative error codes and MSSQL_REPL specific error messages?

    regards

    Andrew

    Wednesday, March 28, 2018 4:51 AM

All replies

  • The problem is that you are looking at errors raised by .net code and assuming that they are the same error number as what is generated within SQL.

    They are completely different and there is no record of them within SQL Server sys.messages table.

    You can find 21800 here:

    select * from sys.messages where message_id=21800 and language_id=1033

    You can also poll msrepl_errors in the distribution database to find a record of their occurence. Perhaps you could tap into this table to try go accomplish your goal.

    Wednesday, March 28, 2018 2:07 PM
    Answerer
  • Thanks Hilary.

    I've managed to work out how to respond to this event. It's not general like I'd like but at least the system can respond.

    Wednesday, April 4, 2018 2:19 AM