Thursday, July 21, 2011 6:15 AMHello everyone,
The Application connects to a SQL server on the back-end. This SQL database is in a Clustered environment.
When the SQL database is patched, the database automatically fails over from Node A to Node B. Applications which connect to the cluster should not experience any outage.
But whenever the patching is done, the connection to the SQL Server is lost and we get the following exception in the logs:
2011-06-19 04:05:54,983 ERROR WebContainer : 50 dataaccess.ProductDAOImpl com.ibatis.common.jdbc.exception.NestedSQLException:
--- The error occurred while applying a parameter map.
--- Check the ProductFeature.getProductIDs-InlineParameterMap.
--- Check the statement (query failed).
--- Cause: java.lang.ClassCastException: java.net.SocketException incompatible with java.sql.SQLException
If the application is restarted, it starts working again.
Please help me out.
Friday, July 22, 2011 3:21 PMAnswerer
You are partly correct when you say "Applications which connect to the cluster should not experience any outage". What I mean by this is that when you do a rolling patch in SQL Server on a clustered environment you should patch the nodes that aren't actively running that resource group which would indeed not give you an outage. However directly before that comment you say "... the database automatically fails over ...". So this leads me to believe you patched the node the resources were currently running on which would in fact be disruptive.
When the resource group fails to another node, all connections are forcefully terminated, resource groups are migrated and resources inside it are onlined in correct order. When SQL Server starts back up it will go through redo/undo and then allow connections.
Again, failing to another node is disruptive. When patching you should patch the nodes that aren't actively running the instance and make sure to not allow the instnace to fail to those nodes while the patching is going on. When there is a downtime you should fail the resource group over and patch the final node or nodes (in the case of multi-instnace clusters).
The reason why the application works after the application is terminated and started back up is because it is making a connection to sql server after it has failed to the new node and is no longer attempting to hold on to an invalid connection from the old node. This is a design problem in the application not checking and handling connection errors.
Hope this helps,
Tuesday, July 26, 2011 9:16 AM
The right approach to apply patches on SQL Server in clustered environment is to patch the secondary node ( the node which is not owning the resources). Once the seconadary (passive) node gets patched, do the manual failover and make this secondary as active node. During this failover the connection will be lost & will reconnect as soon as secondary node becomes active. Now you can apply the patch on this node which has become passive node.
Regards, Vishal Srivastava