ARR Web farm Back-end Health check failing with Schannel error

  • Question

  • User1125611757 posted

    I am using ARR v3 installed on IIS 10 to load balance a number of websites. Most of these sites are in an IIS 10 web farm with three nodes; there is also a PowerBI farm and another IIS 10 farm that currently has only one node but will be expanded later. Each host name is configured as a separate web farm with a separate health check. The three-node farm is configured using shared configuration.

    Each of the farms is in its own OU, receiving the same group policy as the other farm members. All three farms have the FIPS-compliant algorithm policy applied, and all are updated to KB4462928.

    End to end TLS is required, SSL offload is not an option.

    The two IIS farms will ultimately hold several websites; one of them will eventually have 50 or more.

    I am having problems with the health checks coming out of ARR. It appears that ARR can only handle SNI being enabled on two websites on two servers. When I enable SNI on more than that, any additional nodes become unhealthy. When verifying the health test, the error is:

    Result: Fail, Details: Cannot connect to the server. The HTTP status code is 0 and the error code is 80072f8f. 

    The health check also creates 4 entries in the system event log for SCHANNEL error 36871, "A fatal error occurred while creating a TLS client credential. The internal error state is 10013."

    Giving network service read permissions to C:\ProgramData\Microsoft\Crypto\RSA\MachineKeys does not resolve the problem.

    If I turn SNI off on one of the sites/servers where the health check was failing, leaving two of the three nodes using SNI and one node without, the health check becomes healthy. I can also make any node that reported unhealthy with SNI healthy again just by toggling SNI on and off on each node, but more than two at a time will never be detected as healthy by ARR. Any node that ARR reports as unhealthy at any time also tests healthy when I go straight to that particular web server in a web browser.

    Are there limitations on using SNI with back-end services in ARR? It seems odd that only two will work at a time. We are presently using ARR as a workaround to avoid working on a badly misconfigured corporate NetScaler that has been worked on by multiple generations of IT staff, until that appliance gets replaced in a year. I think I could work around the problem until then with either a mix of using the same cert for all sites and disabling SNI, or using non-standard ports, but I really don't like either of these.

    Is it possible there are updates needed for ARR that I can download manually? My department is only delegated permissions to some of our OUs and some of our servers, and I don't control updates other than manually downloading patches to install. Windows Update could be configured to not check for ARR updates.

    Wednesday, November 14, 2018 5:26 PM

All replies

  • User-2064283741 posted

    Like many, I offload my SSL and do not have an environment to hand to test your scenario, but I'll try to set something up.

    As a workaround (for now), can you not do the health check over HTTP and have all other communication HTTPS to HTTPS? Or do you have to reject all HTTP comms on the backend?
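
    To try the HTTP workaround against one node before changing the farm, a minimal Python sketch of a plain-HTTP health probe follows. It is not how ARR implements its health test. The `200-399` acceptable-status range is an assumption about the health test setting, and the function names and any IP or host name passed in are hypothetical.

```python
import http.client


def parse_acceptable(spec: str) -> list[tuple[int, int]]:
    """Parse a status spec such as "200-399,404" into inclusive (low, high) ranges."""
    ranges = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        low, _, high = part.partition("-")
        ranges.append((int(low), int(high or low)))
    return ranges


def is_acceptable(status: int, spec: str = "200-399") -> bool:
    """Return True when the status code falls inside any range of the spec."""
    return any(low <= status <= high for low, high in parse_acceptable(spec))


def http_health_check(ip, host_header, path="/", port=80, timeout=5.0, spec="200-399"):
    """GET the path from one back-end node over plain HTTP, sending an explicit Host header.

    Status 0 is reported when no HTTP response was received at all.
    """
    conn = http.client.HTTPConnection(ip, port, timeout=timeout)
    try:
        conn.request("GET", path, headers={"Host": host_header})
        status = conn.getresponse().status
    except (OSError, http.client.HTTPException) as exc:
        return {"healthy": False, "status": 0, "error": f"{type(exc).__name__}: {exc}"}
    finally:
        conn.close()
    return {"healthy": is_acceptable(status, spec), "status": status}
```

    For example, `http_health_check("10.0.0.11", "site1.example.com")` (both values hypothetical) would probe one node by IP while still selecting the site by host name.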

    I would look at what those SCHANNEL errors are. I would crack out Wireshark and see what is happening.

    Wednesday, November 14, 2018 7:25 PM
  • User-2064283741 posted

    Do you have a generic SSL cert for non-SNI traffic?

    That is, do you only have SNI certs, so that all traffic has to use SNI and HTTPS traffic to that backend IP without SNI will fail? Or do you have a site there with a non-SNI SSL binding that it can resolve to?
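
    One way to answer that from the ARR server is to open a TLS connection to a back-end IP twice, once with SNI and once without, and compare which certificate (if any) comes back. The Python sketch below is only a diagnostic aid under stated assumptions: certificate validation is deliberately disabled so the handshake result can be inspected, the function names are hypothetical, and it uses OpenSSL rather than Schannel, so it will not reproduce the Schannel 36871 error itself.

```python
import hashlib
import socket
import ssl


def make_context() -> ssl.SSLContext:
    """Client context with validation turned off (diagnostics only, not for production)."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    ctx.check_hostname = False  # must be disabled before verify_mode can be CERT_NONE
    ctx.verify_mode = ssl.CERT_NONE
    return ctx


def probe(ip, port=443, sni=None, timeout=5.0):
    """Attempt a TLS handshake with the given SNI name (None sends no SNI extension)."""
    ctx = make_context()
    try:
        with socket.create_connection((ip, port), timeout=timeout) as raw:
            with ctx.wrap_socket(raw, server_hostname=sni) as tls:
                der = tls.getpeercert(binary_form=True) or b""
                return {
                    "ok": True,
                    "sni": sni,
                    "tls_version": tls.version(),
                    # Fingerprint of the certificate the server chose for this handshake.
                    "cert_sha256": hashlib.sha256(der).hexdigest(),
                }
    except OSError as exc:  # ssl.SSLError is a subclass of OSError
        return {"ok": False, "sni": sni, "error": f"{type(exc).__name__}: {exc}"}
```

    Comparing `probe("10.0.0.11", sni="site1.example.com")` with `probe("10.0.0.11")` (hypothetical IP and host name) shows whether the node has a usable binding for clients that send no SNI: a failed handshake or a different fingerprint on the second call suggests it does not.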

    Wednesday, November 14, 2018 7:46 PM
  • User1125611757 posted

    That's a good idea for the health check workaround. I can control access to it with Windows Firewall, and I don't have any port 80 bindings on ARR, so no client can connect. It isn't ideal, but it is still a step forward in the environment I am working in, and I think we could do just the checks that way temporarily.

    I just found a portable packet sniffer, since I don't want to install anything on these servers. I am going to give that a go now and see what I can find.

    Wednesday, November 14, 2018 8:04 PM
  • User1125611757 posted

    Unfortunately, I am going to have to drop this for now. I intended to use the port 80 health check in the interim, just to get the farm ready for the developers, and to work through the SNI issue out of band, but the projects I am on aren't going to allow me that, at least for the moment. I am going to try to return to it when a few things get done and will report back. Your suggestion for the port 80 health check was a perfect band-aid, however, and got me out of the way of the developers, so I thank you for that!

    Friday, November 16, 2018 4:37 PM
  • User-72702933 posted

    Hi SomeoneElse42,

    Below are the possible causes for the issue.

    Server Name Indication (SNI)

    This is an extension to the TLS protocol by which a client indicates which hostname it is attempting to connect to at the start of the handshake.

    HTTP status 0 is not a standard HTTP status code. It indicates that no HTTP response was received because the client could not connect to the server, for example because the connection failed or timed out.

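    Error code 80072f8f is often seen as a Windows Update error, but it is not specific to Windows Update: it is the HRESULT form of WinHTTP error 12175 (ERROR_WINHTTP_SECURE_FAILURE), which indicates that the secure (TLS) connection could not be established.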
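
    The error code can be taken apart by hand. The short Python sketch below splits 80072f8f into the standard HRESULT fields; the helper name and the one-entry lookup table are illustrative, and the table covers only the code seen in this thread.

```python
def decode_hresult(value: int) -> dict:
    """Split a 32-bit HRESULT into its failure bit, facility, and code fields."""
    value &= 0xFFFFFFFF
    return {
        "failure": bool(value >> 31),       # bit 31: severity (1 = failure)
        "facility": (value >> 16) & 0x7FF,  # bits 16-26: 7 is FACILITY_WIN32
        "code": value & 0xFFFF,             # bits 0-15: the underlying Win32 error
    }


# Illustrative lookup for the WinHTTP error range (only the entry relevant here).
WINHTTP_ERRORS = {12175: "ERROR_WINHTTP_SECURE_FAILURE"}

fields = decode_hresult(0x80072F8F)
print(fields, WINHTTP_ERRORS.get(fields["code"], "unknown"))
```

    The facility of 7 marks this as a wrapped Win32 error, and code 12175 falls in the WinHTTP range, which points at the TLS handshake between ARR and the back end rather than at Windows Update.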

    Server IIS Schannel event 36871, "A fatal error occurred while creating a TLS client credential. The internal error state is 10013":

    Using IIS Crypto, I discovered that SSL 3.0 was enabled on my machine. I disabled it system-wide and the error went away, and OneDrive began to work.

    Best Regards,

    Brando

    Wednesday, November 21, 2018 2:56 AM
  • User170552381 posted

    Here is a thread about a very similar error:

    https://forums.iis.net/t/1239423.aspx?TLS+1+2+issues+on+ARR

    Tuesday, February 18, 2020 7:26 PM