| author | Alan Conway <aconway@apache.org> | 2014-08-28 21:47:44 +0000 |
|---|---|---|
| committer | Alan Conway <aconway@apache.org> | 2014-08-28 21:47:44 +0000 |
| commit | 9ae659723b21d9d1c547cf85bf9aba0019b081d5 (patch) | |
| tree | 85b5f3312fae5e481c921d92012536fce9fa5fff /qpid/cpp/include | |
| parent | 74fabf0e6bbbb07db6d9ac54af5738a60124b68d (diff) | |
QPID-5975: HA extra/missing messages when running qpid-txtest2 in a loop with failover.
This is partly not-a-bug, but there was also a client error-handling issue, which
has been corrected.
qpid-txtest2 initializes a queue with messages at the start and drains the
queues at the end. These operations are *not transactional*. Therefore
duplicates are expected if there is a failover during initialization or
draining. When duplicates were observed, there was indeed a failover at one of
these times.
Making these operations transactional is not enough for the test to pass: it
then fails with "no messages to fetch". This is explained as follows:
If there is a failover during a transaction, TransactionAborted is raised. The
client assumes the transaction was rolled back and re-plays it. However, if the
failover occurs at a critical point *after* the client has sent commit
but *before* it has received a response, then the client *does not know*
whether the transaction was committed or rolled-back on the new primary.
Re-playing in this case can duplicate the transaction. Each transaction moves
messages from one queue to another so as long as transactions are atomic the
total number of messages will not change. However, if transactions are
duplicated, a transactional session may try to move more messages than exist on
the queue, hence "no messages to fetch". For example if thread 1 moves N
messages from q1 to q2, and thread 2 tries to move N+M messages back, then
thread 2 will fail.
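The failure mode described above can be reproduced in a toy model. The sketch below is not the qpid-txtest2 code and does not use the Qpid API: `Queue`, `transfer` and `replayAfterLostCommitReply` are hypothetical names, and the queues are plain in-memory deques. It assumes the commit took effect on the broker but the reply was lost, so the client replays the same transfer.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <stdexcept>

namespace replay_sketch {

// Hypothetical stand-in for a broker queue: just an in-memory deque.
using Queue = std::deque<int>;

// Move n messages from `from` to `to` as one unit.
// Throws if fewer than n messages are available, like the test's fetch failure.
inline void transfer(Queue& from, Queue& to, std::size_t n) {
    if (from.size() < n) throw std::runtime_error("no messages to fetch");
    for (std::size_t i = 0; i < n; ++i) {
        to.push_back(from.front());
        from.pop_front();
    }
}

struct Result {
    std::size_t movedToQ2;  // messages that actually ended up on q2
    bool fetchFailed;       // true if the replay ran out of messages
};

// Model: the first transfer commits on the broker, the reply is lost,
// and the client (wrongly assuming a rollback) replays the transfer.
inline Result replayAfterLostCommitReply(std::size_t initial, std::size_t n) {
    assert(n <= initial);
    Queue q1, q2;
    for (std::size_t i = 0; i < initial; ++i) q1.push_back(static_cast<int>(i));

    transfer(q1, q2, n);  // committed, but the client never sees the response

    bool fetchFailed = false;
    try {
        transfer(q1, q2, n);  // blind replay of the same transaction
    } catch (const std::runtime_error&) {
        fetchFailed = true;
    }
    // Each transfer is atomic, so the total message count never changes.
    assert(q1.size() + q2.size() == initial);
    return Result{q2.size(), fetchFailed};
}

}  // namespace replay_sketch
```

In this model, with 10 messages and a transfer of 3 the replay silently moves 6 messages although the client intended 3; with 5 messages and a transfer of 3 the replay fails with "no messages to fetch".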
This problem has been corrected as follows: C++ and Python clients now raise the
following exceptions:
- TransactionAborted: The transaction has definitely been rolled back due to a
connection failure before commit or a broker error (e.g. a store error) during commit.
It can safely be replayed.
- TransactionUnknown: The transaction outcome is unknown because the connection
failed at the critical time. There's no simple automatic way to know what
happened without examining the state of the broker queues.
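A client might treat the two exceptions differently along these lines. This is only a sketch under stated assumptions: the exception types here are local stand-ins that mirror the names in qpid/messaging/exceptions.h so that the example compiles without the Qpid headers, and `runTransaction`, `Outcome` and `Result` are hypothetical helpers, not part of the client API. In a real client the `work` callback would do the transactional sends/receives and call Session::commit().

```cpp
#include <cassert>
#include <functional>
#include <stdexcept>
#include <string>

namespace tx_sketch {

// Stand-ins for the qpid::messaging exception hierarchy (assumption:
// declared locally only so that this sketch is self-contained).
struct TransactionError : std::runtime_error {
    explicit TransactionError(const std::string& m) : std::runtime_error(m) {}
};
struct TransactionAborted : TransactionError {
    explicit TransactionAborted(const std::string& m) : TransactionError(m) {}
};
struct TransactionUnknown : TransactionError {
    explicit TransactionUnknown(const std::string& m) : TransactionError(m) {}
};

enum class Outcome { Committed, Unknown, GaveUp };

struct Result {
    Outcome outcome;
    int attempts;  // how many times `work` was invoked
};

// Runs `work(attempt)` until it commits, replaying only when the
// transaction is known to have been rolled back.
inline Result runTransaction(const std::function<void(int)>& work, int maxAttempts) {
    assert(maxAttempts > 0);
    for (int attempt = 0; attempt < maxAttempts; ++attempt) {
        try {
            work(attempt);  // transactional operations followed by commit
            return Result{Outcome::Committed, attempt + 1};
        } catch (const TransactionAborted&) {
            // Definitely rolled back: safe to replay on the next iteration.
        } catch (const TransactionUnknown&) {
            // Outcome unknown: a replay could duplicate the transaction,
            // so stop and let the application inspect the broker queues.
            return Result{Outcome::Unknown, attempt + 1};
        }
    }
    return Result{Outcome::GaveUp, maxAttempts};
}

}  // namespace tx_sketch
```

The design point is that only TransactionAborted is retried automatically; TransactionUnknown is surfaced to the caller, because the client alone cannot tell whether the commit took effect.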
Unfortunately, with this fix qpid-txtest2 is no longer a useful test for TX
failover, because it regularly raises TransactionUnknown and there's not much we
can do with that.
A better test of TX atomicity with failover is to run a pair of
qpid-send/qpid-receive with fail-over and verify that the number of
enqueues/dequeues and message depth are a multiple of the transaction size. See
the JIRA for such a test. (Note these tests also sometimes raise
TransactionUnknown, but it doesn't matter since all we are checking is that
messages go on and off the queues in multiples of the TX size.)
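The check itself is simple arithmetic and could look roughly like this. `QueueStats` and `consistentWithAtomicTx` are hypothetical names; how the counters are read from the broker is not shown and is not taken from the JIRA test.

```cpp
#include <cassert>
#include <cstdint>

namespace atomicity_sketch {

// Hypothetical snapshot of one queue's counters after the test run.
struct QueueStats {
    std::uint64_t enqueues;  // total messages enqueued
    std::uint64_t dequeues;  // total messages dequeued
    std::uint64_t depth;     // messages currently on the queue
};

// If every transaction moves exactly txSize messages and transactions are
// atomic, each counter must be a whole multiple of txSize.
inline bool consistentWithAtomicTx(const QueueStats& s, std::uint64_t txSize) {
    assert(txSize > 0);
    return s.enqueues % txSize == 0 &&
           s.dequeues % txSize == 0 &&
           s.depth % txSize == 0;
}

}  // namespace atomicity_sketch
```

A partially applied transaction would leave at least one counter off by a remainder, which is what this predicate detects; a duplicated but complete transaction would not be detected, which is why TransactionUnknown does not affect this check.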
Note: the original bug also reported seeing missing messages from
qpid-txtest2. I don't have a good explanation for that but since the
qpid-send/receive test shows that transactions are atomic I am going to let that
go for now.
git-svn-id: https://svn.apache.org/repos/asf/qpid/trunk@1621211 13f79535-47bb-0310-9956-ffa450edef68
Diffstat (limited to 'qpid/cpp/include')
| -rw-r--r-- | qpid/cpp/include/qpid/messaging/Session.h | 3 |
| -rw-r--r-- | qpid/cpp/include/qpid/messaging/exceptions.h | 16 |
2 files changed, 16 insertions, 3 deletions
diff --git a/qpid/cpp/include/qpid/messaging/Session.h b/qpid/cpp/include/qpid/messaging/Session.h
index 94522e4c13..999af7c65b 100644
--- a/qpid/cpp/include/qpid/messaging/Session.h
+++ b/qpid/cpp/include/qpid/messaging/Session.h
@@ -65,7 +65,8 @@ class QPID_MESSAGING_CLASS_EXTERN Session : public qpid::messaging::Handle<Sessi
     /**
      * Commits the sessions transaction.
      *
-     * @exception TransactionAborted if the original session is lost
+     * @exception TransactionAborted if the transaction was rolled back due to an error.
+     * @exception TransactionUnknown if the connection was lost and the transaction outcome is unknown.
      * forcing an automatic rollback.
      */
     QPID_MESSAGING_EXTERN void commit();
diff --git a/qpid/cpp/include/qpid/messaging/exceptions.h b/qpid/cpp/include/qpid/messaging/exceptions.h
index d5527cdd63..391eb11db9 100644
--- a/qpid/cpp/include/qpid/messaging/exceptions.h
+++ b/qpid/cpp/include/qpid/messaging/exceptions.h
@@ -180,14 +180,16 @@ struct QPID_MESSAGING_CLASS_EXTERN SessionClosed : public SessionError
     QPID_MESSAGING_EXTERN SessionClosed();
 };
 
+/** Base class for transactional errors */
 struct QPID_MESSAGING_CLASS_EXTERN TransactionError : public SessionError
 {
     QPID_MESSAGING_EXTERN TransactionError(const std::string&);
 };
 
 /**
- * Thrown on Session::commit() if reconnection results in the
- * transaction being automatically aborted.
+ * The transaction was automatically rolled back. This could be due to an error
+ * on the broker, such as a store failure, or a connection failure during the
+ * transaction
  */
 struct QPID_MESSAGING_CLASS_EXTERN TransactionAborted : public TransactionError
 {
@@ -195,6 +197,16 @@ struct QPID_MESSAGING_CLASS_EXTERN TransactionAborted : public TransactionError
 };
 
 /**
+ * The outcome of the transaction on the broker, commit or roll-back, is not
+ * known. This occurs when the connection fails after we sent the commit but
+ * before we received a result.
+ */
+struct QPID_MESSAGING_CLASS_EXTERN TransactionUnknown : public TransactionError
+{
+    QPID_MESSAGING_EXTERN TransactionUnknown(const std::string&);
+};
+
+/**
  * Thrown to indicate that the application attempted to do something
  * for which it was not authorised by its peer.
  */
