Driver version
main branch; I understand older versions are affected in the same way
SQL Server version
...
Client Operating System
Linux
JAVA/JVM version
11.0.19
Table schema
...
Problem description
This can be considered a follow-on from #2130.
XAER_RMERR is erroneously returned from xa_commit when the network between client and server is disabled and then re-enabled between xa_prepare and xa_commit.
In summary:
xa_prepare
network down
// wait a bit, otherwise the driver may hang in xa_commit (it seems not to have realised the connection is down before it tries to read from the socket)
network up
xa_commit // this returns XAER_RMERR rather than XAER_RMFAIL
According to the XA specification, an XAER_RMERR returned from xa_commit means the caller should expect the branch to have been rolled back. In this case, however, the driver cannot know that. Moreover, the Xid is still available for recovery and can be committed later, which confirms the branch was not rolled back.
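To make the difference between the two codes concrete, here is a minimal sketch of how a caller might react to each of them after xa_commit. The class `CommitOutcome`, the `Action` enum and the `classify` method are names I made up for illustration; they are not part of the driver or of any transaction manager, and a real transaction manager handles more codes than this.

```java
import javax.transaction.xa.XAException;

public class CommitOutcome {

    // Hypothetical set of reactions, for illustration only.
    enum Action {
        ASSUME_ROLLED_BACK,   // XAER_RMERR: the branch is expected to have been rolled back
        RETRY_AFTER_RECOVERY, // XAER_RMFAIL: the resource manager is unavailable, the branch stays in doubt
        OTHER                 // any other code, not covered by this sketch
    }

    // Maps the errorCode of an XAException thrown by commit() to a reaction.
    static Action classify(int xaErrorCode) {
        switch (xaErrorCode) {
            case XAException.XAER_RMERR:
                return Action.ASSUME_ROLLED_BACK;
            case XAException.XAER_RMFAIL:
                return Action.RETRY_AFTER_RECOVERY;
            default:
                return Action.OTHER;
        }
    }

    public static void main(String[] args) {
        System.out.println(classify(XAException.XAER_RMERR));
        System.out.println(classify(XAException.XAER_RMFAIL));
    }
}
```

With the behaviour described in this issue, a caller following this logic would treat the branch as rolled back even though it can still be recovered and committed later.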
Expected behavior
Driver should return XAER_RMFAIL
Actual behavior
Driver returns XAER_RMERR
Error message/stack trace
Complete error message and stack trace.
Any other details that can be helpful
Locally, I managed to resolve the issue by adding a constant representing "Connection timed out" to the enum (https://github.com/microsoft/mssql-jdbc/blob/main/src/main/java/com/microsoft/sqlserver/jdbc/SQLServerXAResource.java#L949). This could explain the need for the delay before trying the xa_commit, but as I said, if the network is reconnected too quickly the driver seems to hang in the xa_commit.
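The idea behind that local change can be sketched roughly as follows. This is not the driver's code: the class `RmFailMapping`, the enum `ConnectionLossMessage`, its constants other than the "Connection timed out" text, and the method `toXaErrorCode` are all hypothetical names and values chosen for illustration. It only shows the principle of treating known connection-loss messages as XAER_RMFAIL instead of XAER_RMERR.

```java
import java.util.Locale;
import javax.transaction.xa.XAException;

public class RmFailMapping {

    // Hypothetical list of message fragments that indicate a lost connection.
    // Only "Connection timed out" comes from this issue; the others are assumed examples.
    enum ConnectionLossMessage {
        CONNECTION_CLOSED("connection is closed"),
        CONNECTION_RESET("connection reset"),
        CONNECTION_TIMED_OUT("connection timed out");

        private final String fragment;

        ConnectionLossMessage(String fragment) {
            this.fragment = fragment;
        }

        boolean matches(String lowerCaseMessage) {
            return lowerCaseMessage.contains(fragment);
        }
    }

    // Returns XAER_RMFAIL when the message looks like a connection loss,
    // and XAER_RMERR otherwise (including when there is no message at all).
    static int toXaErrorCode(String message) {
        if (message != null) {
            String lower = message.toLowerCase(Locale.ROOT);
            for (ConnectionLossMessage m : ConnectionLossMessage.values()) {
                if (m.matches(lower)) {
                    return XAException.XAER_RMFAIL;
                }
            }
        }
        return XAException.XAER_RMERR;
    }

    public static void main(String[] args) {
        System.out.println(toXaErrorCode("Connection timed out (Read failed)") == XAException.XAER_RMFAIL);
        System.out.println(toXaErrorCode("Some unrelated failure") == XAException.XAER_RMERR);
    }
}
```

Matching on message text is fragile, so this should be read as an illustration of the mapping rather than a proposal for how the driver ought to detect the condition.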
Here is a piece of code that can help to show the problem. It should be configured with system properties to connect to a URL, with a username and a password. It tries to protect a little against potentially destructive commits by checking that the property WARNcommitRecoveredXidsWithMatchingGtrid is set to true before committing branches that may have been created by other programs. At the point where "Toggle the network access to the server" is printed out, the network should be disconnected, and the pause before reconnecting should be observed, before pressing enter to allow the program to continue. The RMFAIL case handling is also useful: it allowed me to see that, with the change I made (but still with some kind of pause before allowing the test to continue), RMFAIL is returned correctly:
import javax.sql.XAConnection;
import javax.transaction.xa.XAException;
import javax.transaction.xa.XAResource;
import javax.transaction.xa.Xid;

import com.microsoft.sqlserver.jdbc.SQLServerXADataSource;

// The class name is arbitrary; it only wraps the test so that it compiles.
public class SimpleTestCase {
    public static void main(String[] args) throws Exception {
        SQLServerXADataSource ds = new SQLServerXADataSource();
        ds.setURL(System.getProperty("URL"));
        ds.setConnectRetryCount(0);
        XAConnection xaConnection = null;
        boolean failure = false;
        boolean gotRMFAIL = false;
        try {
            xaConnection = ds.getXAConnection(System.getProperty("username"), System.getProperty("password"));
            XAResource xar = xaConnection.getXAResource();
            // Make sure there are no Xids from a previous run that created "SimpleTestCase" gtrid
            Xid[] xids = xar.recover(XAResource.TMSTARTRSCAN);
            for (Xid xid : xids) {
                String gtrid = new String(xid.getGlobalTransactionId());
                if (gtrid.equals("SimpleTestCase")) {
                    // Only commit leftover branches when explicitly allowed to
                    if (Boolean.parseBoolean(System.getProperty("WARNcommitRecoveredXidsWithMatchingGtrid"))) {
                        xar.commit(xid, false);
                    }
                }
            }
            xar.recover(XAResource.TMENDRSCAN);
            Xid xid = new Xid() {
                @Override
                public int getFormatId() {
                    return 1;
                }

                @Override
                public byte[] getGlobalTransactionId() {
                    return "SimpleTestCase".getBytes();
                }

                @Override
                public byte[] getBranchQualifier() {
                    return new byte[0];
                }
            };
            xar.start(xid, XAResource.TMNOFLAGS);
            xar.end(xid, XAResource.TMSUCCESS);
            xar.prepare(xid);
            // Disconnect the network, wait, reconnect, then press enter
            System.out.println("Toggle the network access to the server");
            System.in.read();
            try {
                xar.commit(xid, false);
            } catch (XAException xae) {
                if (xae.errorCode == XAException.XAER_RMERR) {
                    System.err.println("the error code should not be XAER_RMERR");
                    xae.printStackTrace();
                    failure = true;
                } else if (xae.errorCode == XAException.XAER_RMFAIL) {
                    gotRMFAIL = true;
                }
            }
        } finally {
            try {
                if (xaConnection != null) {
                    xaConnection.close();
                }
            } finally {
                try {
                    // Tidy up: reconnect and commit the branch if it can still be recovered
                    xaConnection = ds.getXAConnection(System.getProperty("username"), System.getProperty("password"));
                    XAResource xar = xaConnection.getXAResource();
                    Xid[] xids = xar.recover(XAResource.TMSTARTRSCAN);
                    boolean foundAndCommittedXid = false;
                    for (Xid xid : xids) {
                        String gtrid = new String(xid.getGlobalTransactionId());
                        if (gtrid.equals("SimpleTestCase")) {
                            xar.commit(xid, false);
                            foundAndCommittedXid = true;
                        }
                    }
                    if (failure) {
                        if (foundAndCommittedXid) {
                            System.err.println("Given MsSQL reported an RMERR, it is unexpected that the Xid could eventually be found and committed, so that should be considered a failure too");
                            failure = true;
                        }
                    } else if (gotRMFAIL) {
                        if (!foundAndCommittedXid) {
                            System.err.println("Receiving RMFAIL, we would expect to be able to recover the branch");
                            failure = true;
                        }
                    }
                    xar.recover(XAResource.TMENDRSCAN);
                } finally {
                    if (xaConnection != null) {
                        xaConnection.close();
                    }
                }
            }
        }
        System.out.println("Did the test fail?: " + failure);
        System.exit(failure ? -1 : 0);
    }
}
JDBC trace logs
...