JDBC driver erroneously throwing XAER_RMERR out of xa_commit #2159

@tomjenkinson

Driver version

main branch; I understand older versions to be affected in the same way.

SQL Server version

...

Client Operating System

Linux

JAVA/JVM version

11.0.19

Table schema

...

Problem description

This can be considered a follow-on from #2130.

XAER_RMERR is erroneously returned from xa_commit when the network between client and server is disabled and then re-enabled between xa_prepare and xa_commit.

In summary:

xa_prepare
network down
// wait a bit, because otherwise the driver may hang in the xa_commit (something like it has not realised the connection is down before trying to read from the socket)
network up
xa_commit // this returns XAER_RMERR rather than XAER_RMFAIL

According to the XA specification, an XAER_RMERR returned from xa_commit should be handled by assuming the branch has been rolled back; however, in this case the driver cannot know this. Moreover, the Xid remains available for recovery and can be committed later, which confirms the branch was not rolled back.
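To make the difference between the two codes concrete, here is a minimal sketch of how a caller might react to each one, following the reading of the specification given above. The class, enum and method names are hypothetical and are not taken from any transaction manager or from the driver; only the `XAException` constants come from the standard `javax.transaction.xa` API.

```java
import javax.transaction.xa.XAException;

/** Hypothetical helper: how a caller might react to an error code thrown by XAResource.commit. */
final class CommitErrorHandling {

    enum Action {
        /** Resource manager unavailable, outcome unknown: keep the Xid and retry via recovery. */
        KEEP_XID_AND_RETRY,
        /** Treated here as "branch was rolled back", per the reading of the spec given above. */
        ASSUME_ROLLED_BACK,
        /** Any other code is left to the caller. */
        OTHER
    }

    static Action actionFor(int errorCode) {
        switch (errorCode) {
            case XAException.XAER_RMFAIL:
                return Action.KEEP_XID_AND_RETRY;
            case XAException.XAER_RMERR:
                return Action.ASSUME_ROLLED_BACK;
            default:
                return Action.OTHER;
        }
    }
}
```

With a mapping like this, an XAER_RMERR reported for a branch that is in fact still prepared would lead the caller to stop tracking an Xid that still needs to be committed, which is why the distinction matters here.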

Expected behavior

Driver should return XAER_RMFAIL

Actual behavior

Driver returns XAER_RMERR

Error message/stack trace

Complete error message and stack trace.

Any other details that can be helpful

Locally, I managed to resolve the issue by adding a constant representing "Connection timed out" to the enum at https://github.com/microsoft/mssql-jdbc/blob/main/src/main/java/com/microsoft/sqlserver/jdbc/SQLServerXAResource.java#L949. That message could also explain the need for the delay before trying the xa_commit, but as noted above, if the network is reconnected too quickly the driver seems to hang in the xa_commit.
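The idea behind that local change can be sketched roughly as follows. This is a simplified, hypothetical illustration and not the driver's actual code: the enum name, the `matches` and `xaErrorCodeFor` methods, and the "Connection reset" entry are assumptions made for the example; only the "Connection timed out" text comes from the change described above.

```java
import javax.transaction.xa.XAException;

/** Hypothetical sketch, not the driver's code: message fragments treated as connection loss. */
enum ConnectionLossMessage {
    CONNECTION_TIMED_OUT("Connection timed out"), // the constant described above
    CONNECTION_RESET("Connection reset");         // illustrative extra entry (assumption)

    private final String fragment;

    ConnectionLossMessage(String fragment) {
        this.fragment = fragment;
    }

    /** True if the exception message contains any known connection-loss fragment. */
    static boolean matches(String message) {
        if (message == null) {
            return false;
        }
        for (ConnectionLossMessage m : values()) {
            if (message.contains(m.fragment)) {
                return true;
            }
        }
        return false;
    }

    /** Connection loss maps to XAER_RMFAIL; everything else stays XAER_RMERR. */
    static int xaErrorCodeFor(String message) {
        return matches(message) ? XAException.XAER_RMFAIL : XAException.XAER_RMERR;
    }
}
```

Matching on message text is fragile, so this only illustrates the shape of the workaround rather than recommending it as the final fix.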

Here is a piece of code that helps to show the problem. It is configured through system properties (URL, username and password). It tries to protect a little against potentially destructive commits: branches that may have been created by other programs are only committed if the property WARNcommitRecoveredXidsWithMatchingGtrid is set to true. When "Toggle the network access to the server" is printed, disconnect the network, pause, reconnect, and then press enter to let the program continue. The RMFAIL case handling is also useful: with the change I made (and still with some pause before allowing the test to continue), it let me see that RMFAIL is then returned correctly:

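        // Fragment from the body of main(String[] args) throws Exception. Imports needed:
        // com.microsoft.sqlserver.jdbc.SQLServerXADataSource, javax.sql.XAConnection,
        // javax.transaction.xa.XAException, javax.transaction.xa.XAResource, javax.transaction.xa.Xid
        SQLServerXADataSource ds = new SQLServerXADataSource();
        ds.setURL(System.getProperty("URL"));
        ds.setConnectRetryCount(0);

        XAConnection xaConnection = null;
        boolean failure = false; // set when the driver misreports the outcome
        boolean gotRMFAIL = false;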
        try {
            xaConnection = ds.getXAConnection(System.getProperty("username"), System.getProperty("password"));
            XAResource xar = xaConnection.getXAResource();

            // Make sure there are no Xids from a previous run that created "SimpleTestCase" gtrid
            Xid[] xids = xar.recover(XAResource.TMSTARTRSCAN);
            for (Xid xid : xids) {
                String gtrid = new String(xid.getGlobalTransactionId());
                if (gtrid.equals("SimpleTestCase")) {
                    if (Boolean.parseBoolean(System.getProperty("WARNcommitRecoveredXidsWithMatchingGtrid"))) {
                        xar.commit(xid, false);
                    }
                }
            }
            xar.recover(XAResource.TMENDRSCAN);

            Xid xid = new Xid() {
                @Override
                public int getFormatId() {
                    return 1;
                }

                @Override
                public byte[] getGlobalTransactionId() {
                    return "SimpleTestCase".getBytes();
                }

                @Override
                public byte[] getBranchQualifier() {
                    return new byte[0];
                }
            };
            xar.start(xid, XAResource.TMNOFLAGS);
            xar.end(xid, XAResource.TMSUCCESS);
            xar.prepare(xid);


            System.out.println("Toggle the network access to the server");
            System.in.read();

            try {
                xar.commit(xid, false);
            } catch (XAException xae) {
                if (xae.errorCode == XAException.XAER_RMERR) {
                    System.err.println("the error code should not be XAER_RMERR");
                    xae.printStackTrace();
                    failure = true;
                } else if (xae.errorCode == XAException.XAER_RMFAIL ){
                    gotRMFAIL = true;
                }
            }
        } finally {
            try {
                if (xaConnection != null) {
                    xaConnection.close();
                }
            } finally {
                try {
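                    // Tidy up: reconnect and commit any branch left in the prepared state
                    xaConnection = ds.getXAConnection(System.getProperty("username"), System.getProperty("password"));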
                    XAResource xar = xaConnection.getXAResource();
                    Xid[] xids = xar.recover(XAResource.TMSTARTRSCAN);
                    boolean foundAndCommitedXid = false;
                    for (Xid xid : xids) {
                        String gtrid = new String(xid.getGlobalTransactionId());
                        if (gtrid.equals("SimpleTestCase")) {
                            xar.commit(xid, false);
                            foundAndCommitedXid = true;
                        }
                    }
                    if (failure) {
                        if (foundAndCommitedXid) {
                            System.err.println("Given that the driver reported XAER_RMERR, it is unexpected that the Xid could later be found and committed, so that should be considered a failure too");
                            failure = true;
                        }
                    } else if (gotRMFAIL){
                        if (!foundAndCommitedXid) {
                            System.err.println("After receiving XAER_RMFAIL we would expect to be able to recover the branch");
                            failure = true;
                        }
                    }
                    xar.recover(XAResource.TMENDRSCAN);
                } finally {
                    if (xaConnection != null) {
                        xaConnection.close();
                    }
                }
            }
        }

        System.out.println("Did the test fail?: " + failure);
        System.exit(failure ? -1 : 0);

JDBC trace logs

...
