locked
Recurring 1418 error while setting up DB Mirroring using certificates... RRS feed

  • Question

  • Dear valued member's,

     

    This issue has been troubling me for quite some time now. I am trying to setup database mirroring using certificates between two computers in a non-trusted domain. Both the computers are connected via Wireless connection(through a wireless router) and I am able to ping and telnet on ports both ways from both the computers using their IP addresses as well as FQDN's. The details are mentioned below:

     

    Machine 1( Desktop with Windows XP Home edition connected to the internet via ethernet with SQL Server 2008 Enterprise edition with named instance name MSCertiDesk). I have configured this as the mirror.

    Machine 2( Laptop with Windows XP Professional edition connected to the internet via wireless router with SQL Server 2008 Enterprise edition with named instance name MSCertiLap). This is my principal.

     

    I have systematically executed all the commands as enlisted in Microsoft's Book's Online recommendation for setting up mirroring using certificates. I have also checked for the following:

     

    1) I am able to ping and telnet on endpoint ports in both directions using IP addresses(static IP) as well as FQDN's on both machines.

    2) The ports being used on both machines are not blocked/not behind any firewalls and are listening all the time(verified using netstat -ano)

    3) I executed the query select state_desc from sys.database_mirroring_endpoints on both principal and mirror servers and the value is 'STARTED'.

    4) I have restored one full and one transaction log backup of principal database onto mirror server with norecovery.

    5) I have even edited the host files on both the machines and appropriately done an IP address and hostname mappings of both principal and mirror machines.

    6) I have configured both principal and mirror correctly for inbound and outbound connections by i) creating master key ii) creating certificate iii) creating endpoint  and setting authorization as certificate iv) backing up the certificate and copying onto other machine. v) creating login vi) creating user vii) mapping the user with the certificate viii) grant connect on endpoint to login and so on and so forth.

    7) Both the services(MS SQL Server and SQL Server Agent) on both machines are running under the Local System Account.

    8) I issue the ALTER DATABASE SAMPLEDB SET PARTNER='TCP://Laptop.Local:5022' first on the mirror.

    9) Everything works perfectly fine until here no problems whatsoever. However when I issue the alter command on principal which is

    ALTER DATABASE SAMPLEDB SET PARTNER='TCP://Desktop.Local:9999',

    the query executes for a while and then fails with the error:

     

    Error 1418: The server network address 'TCP://Desktop.Local:9999' cannot be reached or does not exists.Check the network address name and that the endpoints are operational.

     

    I have reached the end of my wits trying to figure out what the problem is. I have even checked the SQL Server error logs and Windows event logs to see if it captures any error. Apparently there are no logs in SSMS. However on the application event log of Mirror machine. I see the following :

     

    Database Mirroring session is unable to listen on 9999 as it is being used by another process.

     

    Could this possibly be the reason why the principal is unable to get the FQDN of the mirror ? Secondly when I created a user on the principal and try to map it to the certificate sent from mirror, I got a warning saying the start date of the certificate is ahead of the actual date. I then queried select start_date from sys.certificates and viewed the start date for the certificate I created which was same as the end_date. Could this be something to look at as well ?

    There could be n number of reasons why the Principal is unable to resolve the mirror's FQDN. I have really given up.

     

    Any inputs in this regard will be highly appreciated. Thanks in advance.

     

    Satvik S


    satviks
    Sunday, November 21, 2010 2:47 PM

All replies

  • Try "netstat -aon" from command prompt and see if someone else has grabbed this port?

    check http://blogs.msdn.com/b/suhde/archive/2009/07/13/step-by-step-guide-to-configure-database-mirroring-between-sql-server-instances-in-a-workgroup.aspx

     


    Balmukund Lakhani | Please mark solved if I've answered your question, vote for it as helpful to help other user's find a solution quicker
    --------------------------------------------------------------------------------
    This posting is provided "AS IS" with no warranties, and confers no rights.
    --------------------------------------------------------------------------------
    My Blog: http://blogs.msdn.com/blakhani
    Team Blog: http://blogs.msdn.com/sqlserverfaq
    Monday, November 22, 2010 12:53 AM
  • Could you please also post us the endpoint definition. I have on some occasions seens such problems if the endpoints are set to require encryption.
    Monday, November 22, 2010 7:24 AM
  • @Prince: Thanks for your response. I am unable to understand quite well as to what you mean by definiton of the endpoint. Do you want to see the create endpoint statement ? Yes I have used ENCRYPTION=AES ALGORITHM while creating the endpoint as in Books Online. @Balmukund: Thanks much for your feedback. I tried to work around a little bit using tcp ports. However the problem persists. Is there a way to block all processes utilizing port # 9999 and assign it only to listen to database mirroring sessions ? I am unable to resolve the issue. The error 1418 is persistant. Thanks, Satvik
    satviks
    Monday, November 22, 2010 9:17 AM
  • Hi Satvik!

    Try recreating the endpoints without encryption. This is not the solution in the end, but just to be on the safe side... The easiest way to do that is to right click the endpoint and click "Script Endpoint As" -> "Drop And Create To" -> New Query Editor Window. Then change to ENCRYPTION=DISABLED. And make sure you do it on both sides. (You should be able to do that on the fly, no impact on anything else.)

    The definition I wanted is exactly the script that is created with the way I mentioned above.

    Lucifer

    Monday, November 22, 2010 10:56 AM
  • Thanks Lucifer,

     

    I will recreate the endpoints with encryption set to disabled. Let's see if it works that way. I will post the results for sure.

     

     

    Satvik !!

     


    satviks
    Tuesday, November 23, 2010 2:26 AM
  • Hi satvik20,

    Any progress?


    Regards,
    Tom Li
    Tuesday, November 23, 2010 10:44 AM
  • Hi Tom,

     

    I tried setting up mirroring with encryption disabled. I still get the 1418 Error. I am not sure what is amiss ?

     

    Regards,

     

    Satvik


    satviks
    Wednesday, November 24, 2010 3:39 AM
  • Try one more thing please: Again, it's not a real solution but a test for a hunch: Try to issue a "GRANT CONNECT ON ENDPOINT::<Endpoint_Name> TO Public" on both servers. This will solve nearly all permission based issues which will get us a clou in which direction to investigate further. (If it works there is something wrong with the users, if it doesn't there is a problem on network, firewall or something else.)

    Wednesday, November 24, 2010 7:11 AM
  • Thanks Lucifer. The workaround that you suggested did not work. Now I am planning to install Windows Server 2003 on the principal and set up a domain. I'll then set up mirroring as sufficient trust will not give rise to the 1418 Error.

     

    Thanks guys for your help. I really appreciate it.

     

    Satvik


    satviks
    Thursday, November 25, 2010 3:41 AM
  • Hello Guys,

    In continuation to the thread above, I have now Windows Server 2003 r2 Enterprise edition running on the mirror. I have setup a domain and have added my principal to the domain. Now both the computers are in a trusted domain. I have edited the TCP/IP settings on the principal and added the static ip of the mirror in its alternate DNS address text box so that both computers can access each other. I am able to access both computers via ping and able to connect on ports on both the machines. Firewalls on both machines have been turned off temporarily. Here's the interesting thing :

     

    Even after setting up domain, adding logins on both servers and making the services(sql server and sql agent) run under the domain user account, when I configure security for mirroring, it fails with the Error 1418 : Alter failded for database a. The network server address 'TCP://xxx.xxx.x.x:5022' cannot be reached. Check that the endpoints are operational.

     

    The only option left in this case was to check the SQL Server Logs. On the mirror it says the following:

    The database mirroring session is listening on all connections.

    So no problem.

    On the principal two significant messages that are a cause of concern are :

    1) The TCP port 5022 is already in use.

    2) The Database Mirroring session is unable to listen on port 5022 as it is in use by another process.

    I atatched SQL Profiler to both the instances and tried to monitor the trace for Broker and Audit Login events for Database Mirroring. It didnt have anything significant to show except that the alter database statement doesnt have an end time as it fails on the principal.

    I also did a netstat -n, netstat -abn etc commands to verify what process is using port 5022 but didnt get any result.

    I am really confused that if mirror can reach the principal's FQDN, then why doesnt it work the other way round. I thought, setting up a domain and using a single account to set up mirroring using Windows authentication would have an upper hand as compared to certificates as we would be dealing with trusted domains.

    I have gone through some of the earlier threads and it says that sometimes SQL Server logs are deceiving and donot reflect the correct picture. However rest of the setting is absolutely fine. Backups have been restored with norecovery on the mirror and services have been started on both machines under the domain account. The login exists on both machines in master db. The state of the endpoints is started on both machines and grant permission has been given to login to connect on endpoints.

     

    I will really appreciate if you guys can share your opinions as to how to troubleshoot this issue.

     

    Thanks,

     

    Satvik

     


    satviks
    Monday, November 29, 2010 3:46 AM
  • Additionally, I also tried to stop the process on the mirror and principal using

    Kill 58 --> which is the process id shown in SQL Server Logs. But no luck by doing so.

    Inputs will be highly appreciated.

    --Satvik


    satviks
    Monday, November 29, 2010 3:54 AM
  • Hi Satvik!

    I assume that both your endpoints are started and you can telnet to port 5022? Taking your error message I would a) recheck telnet on that ports b) if telnet goes through I would stop both endpoints and retry telnet to see if you are really connecting to the endpoints or maybe something else... And try both, IP and FQDN. Also again issue a GRANT CONNECT to the endpoints for the authorization user. And script out both endpoints to compare if the settings are the same.

    Let me know how things are developing.

    Lucifer

    Monday, November 29, 2010 6:39 AM
  • Most common error for this issue is that transaction log is not restored in norecovery mode after full backup restore of mirror.
    I am available online at blakhani (at) koolmail (dot) in for next 4 hours. Feel free to ping me.
    Balmukund Lakhani | Please mark solved if I've answered your question, vote for it as helpful to help other user's find a solution quicker
    --------------------------------------------------------------------------------
    This posting is provided "AS IS" with no warranties, and confers no rights.
    --------------------------------------------------------------------------------
    My Blog: http://blogs.msdn.com/blakhani
    Team Blog: http://blogs.msdn.com/sqlserverfaq
    Monday, November 29, 2010 7:02 AM
  • Hi Balmukund!

    Just curious: Wouldn't the "wrong" restore result in a different error message? (Most likely in a "database is in wrong state" or "database has not been rolled forward enough")

    Lucifer

    Monday, November 29, 2010 8:04 AM
  • Hi Balmukund!
    Just curious: Wouldn't the "wrong" restore result in a different error message? (Most likely in a "database is in wrong state" or "database has not been rolled forward enough")
    Lucifer

    Unfortunately, we show this message even if the restore is not proper.


    Balmukund Lakhani | Please mark solved if I've answered your question, vote for it as helpful to help other user's find a solution quicker
    --------------------------------------------------------------------------------
    This posting is provided "AS IS" with no warranties, and confers no rights.
    --------------------------------------------------------------------------------
    My Blog: http://blogs.msdn.com/blakhani
    Team Blog: http://blogs.msdn.com/sqlserverfaq
    Monday, November 29, 2010 8:12 AM
  • Hi Balmukund,

     

    Thanks for your feedback. This is the reason I mentioned in my previous post that I have read somewhere that SQL Logs are deceiving. I am 100 % sure that I have restored both a full backup and a transaction log backup of the principal db onto mirror with NORECOVERY. Doesn't look like this might be the issue unless restores have to be made with some hints like with no INIT etc as pointed out by Lucifer. I have used both the wizard as well as scripts to do the restore. Please let me know if there is an order to do so. The scripts I used are as follows:

     

    Backup on principal :

    BACKUP DATABASE Emp_D TO DISK=N'\\SS_DESK\SQLBackups\Emp.bak'

    BACKUP LOG Emp_D TO DISK=N'\\SS_DESK\SQLBackups\Emp.log'

     

    Restore on the Mirror :

     

    RESTORE DATABASE Emp_D FROM DISK='C:\SQLBackups\Emp.bak' WITH NORECOVERY

    RESTORE LOG Emp_D FROM DISK='C:\SQLBackups\Emp.log' WITH NORECOVERY.

     

    All the 4 commands have been immediately executed. There are no users connecting to the Principal as all sessions are closed and the transaction log backup is the most recent after the full database backup. Am I missing some hints or something ?

     

    Your feedback will be appreciated..

     

    Satvik


    satviks
    Monday, November 29, 2010 8:44 AM
  • Hi Satvik,
    Where are you located? India? Ping me at my messenger [ blakhani (at) koolmail (dot) in ]and I can provide some more pointers.

    Balmukund Lakhani | Please mark solved if I've answered your question, vote for it as helpful to help other user's find a solution quicker
    --------------------------------------------------------------------------------
    This posting is provided "AS IS" with no warranties, and confers no rights.
    --------------------------------------------------------------------------------
    My Blog: http://blogs.msdn.com/blakhani
    Team Blog: http://blogs.msdn.com/sqlserverfaq
    Monday, November 29, 2010 9:02 AM
  • Hi Satvik!

    Have you checked the telnet connections as I proposed earlier? I am curious to see where the problem starts.

    Lucifer

    Monday, November 29, 2010 9:04 AM
  • Hi Satvik!

    Have you checked the telnet connections as I proposed earlier? I am curious to see where the problem starts.

    Lucifer


    Yeah. Make sure you are able to telnet to the port.
    Balmukund Lakhani | Please mark solved if I've answered your question, vote for it as helpful to help other user's find a solution quicker
    --------------------------------------------------------------------------------
    This posting is provided "AS IS" with no warranties, and confers no rights.
    --------------------------------------------------------------------------------
    My Blog: http://blogs.msdn.com/blakhani
    Team Blog: http://blogs.msdn.com/sqlserverfaq
    Monday, November 29, 2010 9:07 AM
  • Hi Lucifer/Balmukund,

     

    Yes I have telnet on the ports and I am able to connect to the ports. I have double checked this.

    @Lucifer: I didnt quite get when you asked me to stop the endpoint and then script it. I am able to telnet on both the ports that I am using for mirroring on principal and mirror machines. Even I am curious to know as to what more needs to be done and where I am lacking :).

     

    @Balmukund: I am in India but most of the sites have been firewalled at my workplace. All that I can access right now is this website. Please provide me with your hotmail/gmail id. We can chat if you do prefer to be online some 7pm onwards IST. I can take screenshots on my system and maybe send you across as zip files.

     

    Satvik


    satviks
    Monday, November 29, 2010 9:15 AM
  • Hi Lucifer/Balmukund,

     

    Yes I have telnet on the ports and I am able to connect to the ports. I have double checked this.

    @Lucifer: I didnt quite get when you asked me to stop the endpoint and then script it. I am able to telnet on both the ports that I am using for mirroring on principal and mirror machines. Even I am curious to know as to what more needs to be done and where I am lacking :).

     

    @Balmukund: I am in India but most of the sites have been firewalled at my workplace. All that I can access right now is this website. Please provide me with your hotmail/gmail id. We can chat if you do prefer to be online some 7pm onwards IST. I can take screenshots on my system and maybe send you across as zip files.

     

    Satvik


    satviks
    Monday, November 29, 2010 9:16 AM
  • My hotmail id is - blakhani (at) koomail (dot) in
    If you can let me know your contact info, I would love to help you offline


    Balmukund Lakhani | Please mark solved if I've answered your question, vote for it as helpful to help other user's find a solution quicker
    --------------------------------------------------------------------------------
    This posting is provided "AS IS" with no warranties, and confers no rights.
    --------------------------------------------------------------------------------
    My Blog: http://blogs.msdn.com/blakhani
    Team Blog: http://blogs.msdn.com/sqlserverfaq
    Monday, November 29, 2010 10:06 AM
  • Thanks Balmukund,

     

    I will send the correspondence details on your hotmail id. I will also post further developments on my progress in regards to the setup for benefit of all of the forum readers. I hope we will be in a position to close this thread asap.

    Thanks all.

     

    --Satvik


    satviks
    Monday, November 29, 2010 10:22 AM
  • Hi Satvik!

    When you issue an "ALTER ENDPOINT <endpoint name> STATE=STOPPED" on both servers and try to reconnect with telnet, does it still work? With this you basically tell the endpoint to stop listening, which should close the ports and get you an error. (And of course a STATE=STARTED will revert that after testing)

    The second thing is to script the endpoint definition (Right click the endpoint in Management Studio, Script Endpoint -> Create To -> New Query window) which will give you all details about it. If you do this on both sides and compare the scripts they should look very close to identical.

    Lucifer

    Monday, November 29, 2010 1:01 PM
  • Hi Lucifer, An interesting revelation has occurred. 1)Since the sql server logs and windows even log said " The Database mirroring endpoint transport protocol cannot listen on port 5022 as another process is already using it." What I did for troubleshooting the above point is , I run -netstat -a -n -o on both principal and mirror. To my surprise an entry with PID 5022 is never returned I then did tasklist /svc /FI "PID eq 5022" to return the exact name of the process running on 5022. As expected no result returned as PID doesnt reflect in the netstat command. From this I guess we can completely rule out the possibility of any other process being consuming port 5022. 2) I edited the host files on both the machines to do a mapping of IP address to FQDN's. To my surprise I can ping on both machines using FQDN's and IP's. But when I do a telnet from principal to mirror, I get this Coudlnot connect to "hostname" on 5022. I am able to telnet on FQDN from mirror to principal though. Here we stand now : Telnet [ip address] [MIRRORSERVERNAME] 5022 doesnot work. but Telnet [ip address] [PRINCIPALSERVERNAME] 5022 works well. another point : Telnet [ip address] [MIRRORSERVERNAME] 1433 works but when i configure security and put mirrors FQDN as 'TCP://MIRRORSERVERNAME:1433', it doesnt work. Conclusion: Telnet command from principal to mirror doesnt work on any port except for 1433. How can I go about troubleshooting this. Inputs plz experts. Thanks, Satvik
    satviks
    Monday, November 29, 2010 3:21 PM
  • Hi again!

    Well, mirroring via 1433 won't work... So let's continue working on the main problem. As I understand the principal is unable to connect to the mirror on port 5022.

    For one: I don't really understand what you are doing with the PID... This has nothing to do with a network port. Try running following command:

    netstat -a -n -o | find ":5022"

    Also try recreating the mirroring endpoint on the mirror server with a different port.

    Fact remains... as long as your telnet to the mirroring endpoint does not work there is no point in trying anything in SQL Server...

    Monday, November 29, 2010 3:39 PM
  • Hi Lucifer,

     

    Thanks for your feedback. How can I ensure that I am able to connect to a certain port from principal to mirror. I tried telnet on all possible ports like 7022,9999 etc and connection doesnt succeed. The firewall is turned off. What could possibly be the reason ? all the protocols such as tcp,named pipes etc on mirror are enabled.

     

    Is there a workaround ?

     

    s


    satviks
    Monday, November 29, 2010 4:06 PM
  • Hi Satvik!

    You can only connect to a port if it is listening... So unless you first created something to wait for a connection Telnet will never succeed. There is no workaround to that, this is very basic network understanding...

    To solve the problem at hand: Do it one at a time... First just try to recreate the endpoint on another port. Easiest way is to righ-click the endpoint in management studio, Script Endpoint -> Drop and Create To -> New Query Editor Window. Then change the port in the endpoint to something else (for example 4001) and execute the script. Try Telneting to that new port afterwards. If this doesn't work try telneting from the mirrorhost to itself. ("Telnet localhost 4001" on the mirror) This is just meant to check if the endpoint is reacting at all.

    Let me know what the results of all steps are.

    Lucifer

    • Proposed as answer by whatsUpYO Sunday, June 5, 2011 7:41 AM
    Tuesday, November 30, 2010 6:24 AM
  • ok After reading all the things what happen here ... I have a same problem setting up mirroring with certificates can anybody help me with this ... I did every thing like ping ... netstat but still getting error message . Satvik how did you solve that problem ?? let me know I need some help ..

    Thank you

     

    Sunday, June 5, 2011 7:45 AM
  • I had the same problem, after already mirrored many databases and nearly went crazy.

    My solution:
    the actual problem is the endpoint. So I decided to setup mirroring with the gui (wizard), and it worked!!
    Then I removed mirroring again and tried again by script - no chance - same error.

    with the profiler I find out, that the wizard is doing a grant connect to the endpoint:
       grant connect on endpoint::mirroring to[domain\user]
    With this information, I tried to set up mirroring with the sa instead of an account with windows authentication.
    Then even without this grant, the mirroring was successful.

    My Summary: 
    - the error occurs only if you did something wrong in the first initalization of the mirroring
      (or you removed and changed an older mirroring implementation)
    - the problem is the endpoint
    - setup the mirroring with the wizard (if it does not work)
    - change the user you're working with  (in my case I used the sa)
    - on an other day, I thougt changing the port number helped. going to 5023.

    I hope this will be a help for many other, since I couldn't find a other solution.
    yours Ben

     

     

     

     

     

     

     

    Friday, December 23, 2011 12:23 PM
  • 6) I have configured both principal and mirror correctly for inbound and outbound connections by i) creating master key ii) creating certificate iii) creating endpoint  and setting authorization as certificate iv) backing up the certificate and copying onto other machine. v) creating login vi) creating user vii) mapping the user with the certificate viii) grant connect on endpoint to login and so on and so forth.

     

     

    Try the following adjustment;  after you backup of the certificate drop it from the instance you originally created it on and restore if from the backup.  The oddity is a lack of a privet key to decrypt it...

    Please let me know if this fixes the issue.

     

    Thursday, January 5, 2012 4:41 PM