Windows Authentication fails during rolling reboot of all domain controllers

Question
-
Last week our domain administrators decided to reboot all the domain controllers in the domain containing our production SharePoint SQL Server 2005 cluster (running on a Windows Server 2003 cluster). They only rebooted one at a time. This caused a flood of errors in SharePoint stating that it could not connect to SQL Server. It also filled the SQL error log with entries like the ones below.
I thought it was obvious that the reboot of the domain controllers was the problem. But now I am being asked why SQL Server caused the SharePoint outage. The argument is that "The domain controllers were only rebooted one at a time" and "Other applications like Exchange didn't drop connections".
I have verified that SharePoint is connecting to the SQL Server cluster using Kerberos. My questions are:
1. Is it to be expected that a reboot of a domain controller will cause these errors in SQL Server?
2. Can this behavior be changed?
Any help you can provide would be great.
Thanks
John
06/12/2009 15:22:50,Logon,Unknown,SSPI handshake failed with error code 0x80090311 while establishing a connection with integrated security; the connection has been closed. [CLIENT: xxx.xxx.xxx.xxx]
06/12/2009 15:22:50,Logon,Unknown,Error: 17806<c/> Severity: 20<c/> State: 2.
06/12/2009 15:22:50,Logon,Unknown,Login failed for user ''. The user is not associated with a trusted SQL Server connection. [CLIENT: xxx.xxx.xxx.xxx]
06/12/2009 15:22:50,Logon,Unknown,Error: 18452<c/> Severity: 14<c/> State: 1.
06/12/2009 15:22:50,Logon,Unknown,SSPI handshake failed with error code 0x80090311 while establishing a connection with integrated security; the connection has been closed. [CLIENT: xxx.xxx.xxx.xxx]
06/12/2009 15:22:50,Logon,Unknown,Error: 17806<c/> Severity: 20<c/> State: 2.
06/12/2009 15:22:49,Logon,Unknown,Login failed for user ''. The user is not associated with a trusted SQL Server connection. [CLIENT: xxx.xxx.xxx.xxx]
06/12/2009 15:22:49,Logon,Unknown,Error: 18452<c/> Severity: 14<c/> State: 1.
06/12/2009 15:22:49,Logon,Unknown,SSPI handshake failed with error code 0x80090311 while establishing a connection with integrated security; the connection has been closed. [CLIENT: xxx.xxx.xxx.xxx]
Tuesday, June 16, 2009 8:12 PM
Answers
-
All you said is correct. The fact is that SQL Server asks the SSPI layer to authenticate a user (once a request arrives), and the SSPI layer failed to contact a DC. I'm not sure how the SSPI layer falls back from one DC to another; it's probably not working as you expected. Even if it can fall back gracefully, are you sure you always have at least one DC up and running while you reboot the other DCs?
I would say you should expect those errors when you reboot a DC.
Thanks,
This posting is provided "AS IS" with no warranties, and confers no rights.
- Marked as answer by John C Gordon (Microsoft employee), Thursday, June 18, 2009 9:35 PM
Wednesday, June 17, 2009 1:00 AM
All replies
-
Did you restart your SharePoint server and the machine after the DC reboot?
Error 0x80090311 means: No authority could be contacted for authentication.
It means SQL Server could not contact a DC. Maybe the address of the DC changed?
Thanks,
This posting is provided "AS IS" with no warranties, and confers no rights.
Tuesday, June 16, 2009 9:12 PM -
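For anyone tallying these failures from an exported error log, here is a minimal Python sketch (not something posted by the original participants). It pulls the SSPI status code out of each "SSPI handshake failed" line and maps the two codes that appear in this thread to their winerror.h names. The function name `summarize_sspi_failures` and the lookup table are illustrative assumptions; codes not in the table are reported as raw hex.

```python
import re
from collections import Counter

# SSPI status codes seen in this thread, with their winerror.h names.
# The table is deliberately small; extend it as needed.
SSPI_CODES = {
    0x80090311: "SEC_E_NO_AUTHENTICATING_AUTHORITY",  # no authority could be contacted
    0x8009030C: "SEC_E_LOGON_DENIED",                 # the logon attempt failed
}

# Matches the message text SQL Server writes for a failed SSPI handshake.
SSPI_PATTERN = re.compile(
    r"SSPI handshake failed with error code (0x[0-9A-Fa-f]{8})"
)


def summarize_sspi_failures(lines):
    """Count SSPI handshake failures per status code in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        match = SSPI_PATTERN.search(line)
        if match:
            code = int(match.group(1), 16)
            # Fall back to the raw hex value for codes not in the table.
            counts[SSPI_CODES.get(code, hex(code))] += 1
    return counts
```

Passing it the lines of an exported log (for example, `summarize_sspi_failures(open(path))`, where `path` is wherever you saved the export) gives a per-code count that can be lined up against the DC reboot times.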
Nothing was bounced; SharePoint reconnected and the errors stopped after about a minute. I guess my question is: does rebooting a domain controller, even when there are other domain controllers in the domain, interrupt SQL Server's ability to authenticate Windows accounts?
As I understand it, SQL Server relies on the Windows SSPI API to perform the authentication. It uses the Negotiate SSP, which tries Kerberos first and then falls back to NTLM if Kerberos does not work. If everything is chugging along, Negotiate is able to authenticate via Kerberos, and a domain controller is rebooted, is it expected behavior for connections to fail or drop?
Thanks
John
Wednesday, June 17, 2009 12:40 AM -
I have had the same issue today on a production site - we have 3 DCs, and are using Domain authentication from our IIS7 app pools to a SQL 2008 server.
Foolishly I thought that I could reboot one DC without it being an issue, but immediately all of the connections failed and the logs were filled with:
SSPI handshake failed with error code 0x8009030c while establishing a connection with integrated security; the connection has been closed. [CLIENT: 192.168.55.25]
and
Login failed. The login is from an untrusted domain and cannot be used with Windows authentication. [CLIENT: 192.168.55.25]
In this instance, how can I ever reboot a DC without causing these problems? What is the point in having 3 if authentication doesn't fail over gracefully to a different DC?
Wednesday, May 12, 2010 5:12 PM -
Hello there,
We just had exactly the same problem with 2 DCs: one of them was rebooted, and we got the same "SSPI handshake failed" error.
If somebody could have a look at it, that would be appreciated.
Wednesday, May 19, 2010 12:47 PM -
I am having this issue as well and was searching this forum for a solution. Did anyone ever determine a reason or possible fix for this issue?
Why is there a delay or failure in authentication when 1 DC is rebooted, but others are still online?
Thanks,
Tuesday, February 1, 2011 4:41 PM -
I am encountering the same issue and am still working on a workaround and the root cause.
I am thinking of pointing the member servers to another DC that will not be rebooted, using the command below, but I have not verified this yet.
nltest /SC_RESET:<DomainName>[\<DcName>] - Reset secure channel for <Domain> on <ServerName> to <DcName>
eden
Tuesday, March 15, 2011 7:27 AM
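The nltest idea above could be scripted roughly as follows. This is a sketch only: the approach was posted as unverified, the helper names are hypothetical, and `CONTOSO` and `DC02` are placeholder names. It builds the `nltest /SC_RESET` argument list separately so the command can be inspected before anything is run on a member server.

```python
import subprocess


def build_sc_reset_command(domain, dc_name=None):
    """Build the nltest argument list for resetting the secure channel.

    With dc_name given, this asks for the channel to be re-established
    against that specific DC (domain\\dc); otherwise nltest picks a DC.
    """
    target = domain if dc_name is None else f"{domain}\\{dc_name}"
    return ["nltest", f"/SC_RESET:{target}"]


def reset_secure_channel(domain, dc_name=None):
    """Run nltest on the local Windows machine (needs administrative rights)."""
    return subprocess.run(
        build_sc_reset_command(domain, dc_name),
        capture_output=True,
        text=True,
        check=False,  # inspect returncode/stdout instead of raising
    )


if __name__ == "__main__":
    # Placeholder names; print the command rather than running it.
    print(" ".join(build_sc_reset_command("CONTOSO", "DC02")))
```

Whether moving the secure channel ahead of a planned reboot actually prevents the SSPI failures is untested in this thread, so treat the result as something to verify in a non-production environment first.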