Active-Passive Messaging Clusters

Overview

The High Availability (HA) module provides active-passive, hot-standby messaging clusters for fault-tolerant message delivery.

In an active-passive cluster only one broker, known as the primary, is active and serving clients at a time. The other brokers are standing by as backups. Changes on the primary are replicated to all the backups so they are always up-to-date or "hot". Backup brokers reject client connection attempts, to enforce the requirement that clients only connect to the primary.

If the primary fails, one of the backups is promoted to take over as the new primary. Clients fail-over to the new primary automatically. If there are multiple backups, the other backups also fail-over to become backups of the new primary.

This approach relies on an external cluster resource manager to detect failures, choose the new primary and handle network partitions. Rgmanager is supported initially, but others may be supported in the future.

Avoiding message loss

In order to avoid message loss, the primary broker delays acknowledgment of a message received from a client until the message has been replicated to and acknowledged by all of the backup brokers, or has been consumed from the primary queue.

This ensures that all acknowledged messages are safe: they have either been consumed or backed up to all backup brokers. Messages that are consumed before they are replicated do not need to be replicated. This reduces the workload when replicating a queue with active consumers.

Clients keep unacknowledged messages in a buffer until they are acknowledged by the primary. (You can control the maximum number of messages in the buffer by setting the client's capacity. For details of how to set the capacity in client code see "Using the Qpid Messaging API" in Programming in Apache Qpid.) If the primary fails, clients will fail-over to the new primary and re-send all their unacknowledged messages. Clients must use "at-least-once" reliability to enable re-send of unacknowledged messages. This is the default behavior; no options need be set to enable it. For details of client addressing options see "Using the Qpid Messaging API" in Programming in Apache Qpid.

If the primary crashes, all the acknowledged messages will be available on the backup that takes over as the new primary. The unacknowledged messages will be re-sent by the clients. Thus no messages are lost.

Note that this means it is possible for messages to be duplicated. In the event of a failure it is possible for a message to be received by the backup that becomes the new primary and also re-sent by the client. The application must take steps to identify and eliminate duplicates.

When a new primary is promoted after a fail-over it is initially in "recovering" mode. In this mode, it delays acknowledgment of messages on behalf of all the backups that were connected to the previous primary. This protects those messages against a failure of the new primary until the backups have a chance to connect and catch up.

Not all messages need to be replicated to the backup brokers. If a message is consumed and acknowledged by a regular client before it has been replicated to a backup, then it doesn't need to be replicated.

HA Broker States

Joining
    Initial state of a new broker that has not yet connected to the primary.

Catch-up
    A backup broker that is connected to the primary and catching up on queues and messages.

Ready
    A backup broker that is fully caught-up and ready to take over as primary.

Recovering
    The newly-promoted primary, waiting for backups to connect and catch up.

Active
    The active primary broker with all backups connected and caught-up.
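
The requirement above that the application identify and eliminate duplicates could be met in several ways. The following is a minimal sketch, assuming the sending application puts its own unique ID in every message; the names DuplicateFilter, process_once and max_ids are hypothetical and are not part of any Qpid API. It remembers a bounded number of recent IDs and skips any message whose ID has already been seen.

```python
from collections import OrderedDict


class DuplicateFilter:
    """Remember recently seen message IDs so re-sent messages can be dropped.

    Hypothetical helper: assumes the sender assigns a unique ID per message.
    """

    def __init__(self, max_ids=10000):
        self.max_ids = max_ids
        self._seen = OrderedDict()  # message ID -> None, oldest first

    def is_new(self, message_id):
        """Return True the first time an ID is seen, False for a duplicate."""
        if message_id in self._seen:
            return False
        self._seen[message_id] = None
        if len(self._seen) > self.max_ids:
            self._seen.popitem(last=False)  # forget the oldest ID
        return True


def process_once(messages, handler, dup_filter):
    """Call handler for each (message_id, body) pair not already seen.

    Returns the number of messages actually handled.
    """
    handled = 0
    for message_id, body in messages:
        if dup_filter.is_new(message_id):
            handler(body)
            handled += 1
    return handled
```

The bound on remembered IDs is a design trade-off: it keeps memory use fixed, but a duplicate that arrives after its ID has been forgotten would be processed again, so the bound should comfortably exceed the number of messages a client can have unacknowledged at fail-over.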

Limitations

There are some known limitations in the current implementation. These will be fixed in future versions.

Transactional changes to queue state are not replicated atomically. If the primary crashes during a transaction, it is possible that the backup could contain only part of the changes introduced by a transaction.

Configuration changes (creating or deleting queues, exchanges and bindings) are replicated asynchronously. Management tools used to make changes will consider the change complete when it is complete on the primary; it may not yet be replicated to all the backups.

Federation links to the primary will fail over correctly. Federated links from the primary will be lost in fail-over; they will not be re-connected to the new primary. It is possible to work around this by replacing the qpidd-primary start-up script with a script that re-creates federation links when the primary is promoted.

Virtual IP Addresses

Some resource managers (including rgmanager) support virtual IP addresses. A virtual IP address is an IP address that can be relocated to any of the nodes in a cluster. The resource manager associates this address with the primary node in the cluster, and relocates it to the new primary when there is a failure. This simplifies configuration as you can publish a single IP address rather than a list.

A virtual IP address can be used by clients to connect to the primary. The following sections will explain how to configure virtual IP addresses for clients or brokers.

Configuring the Brokers

The broker must load the ha module; it is loaded by default. The following broker options are available for the HA module.

Broker Options for High Availability Messaging Cluster

ha-cluster yes|no
    Set to "yes" to have the broker join a cluster.

ha-queue-replication yes|no
    Enable replication of specific queues without joining a cluster, see .

ha-brokers-url URL
    The URL used by cluster brokers to connect to each other. The URL should contain a comma separated list of the broker addresses, rather than a virtual IP address.

    The full format of the URL is given by this grammar:

        url = ["amqp:"][ user ["/" password] "@" ] addr ("," addr)*
        addr = tcp_addr / rdma_addr / ssl_addr / ...
        tcp_addr = ["tcp:"] host [":" port]
        rdma_addr = "rdma:" host [":" port]
        ssl_addr = "ssl:" host [":" port]

ha-public-url URL
    The URL is advertised to clients as the "known-hosts" for fail-over. It can be a list or a single virtual IP address. A virtual IP address is recommended.

    Using this option you can put client and broker traffic on separate networks, which is recommended.

    Note: When HA clustering is enabled the broker option known-hosts-url is ignored and overridden by the ha-public-url setting.

ha-replicate VALUE
    Specifies whether queues and exchanges are replicated by default. VALUE is one of: none, configuration, all. For details see .

ha-username USER
ha-password PASS
ha-mechanism MECHANISM
    Authentication settings used by HA brokers to connect to each other. If you are using authorization () then this user must have all permissions.

ha-backup-timeout SECONDS
    Maximum time that a recovering primary will wait for an expected backup to connect and become ready.

    Values specified as SECONDS can be a fraction of a second, e.g. "0.1" for a tenth of a second. They can also have an explicit unit, e.g. 10s, 10ms, 10us, 10ns.

link-maintenance-interval SECONDS
    Interval for the broker to check link health and re-connect links if need be. If you want brokers to fail over quickly you can set this to a fraction of a second, for example: 0.1.

link-heartbeat-interval SECONDS
    Heartbeat interval for replication links. The link will be assumed broken if there is no heartbeat for twice the interval.
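
To make the URL grammar given for ha-brokers-url more concrete, here is a rough sketch of a parser for it. The function name parse_broker_url is hypothetical and this is not how the broker itself parses URLs. It covers only the tcp, rdma and ssl address forms listed in the grammar, does not handle IPv6 literals, and returns None for a port that is not given rather than assuming a default.

```python
def parse_broker_url(url):
    """Split a broker URL into (user, password, [(protocol, host, port), ...]).

    Illustrative only: follows the grammar quoted in the option table.
    """
    # The "amqp:" scheme prefix is optional.
    rest = url[len("amqp:"):] if url.startswith("amqp:") else url

    # Optional credentials: user ["/" password] "@"
    user = password = None
    if "@" in rest:
        credentials, rest = rest.split("@", 1)
        user, _, password = credentials.partition("/")
        password = password or None

    addresses = []
    for addr in rest.split(","):
        protocol = "tcp"  # tcp is the default when no prefix is given
        for prefix in ("tcp", "rdma", "ssl"):
            if addr.startswith(prefix + ":"):
                protocol, addr = prefix, addr[len(prefix) + 1:]
                break
        host, _, port = addr.partition(":")
        if not host:
            raise ValueError("missing host in %r" % url)
        addresses.append((protocol, host, int(port) if port else None))
    return user, password, addresses
```

For example, a URL naming three brokers (the host names node1, node2 and node3 are placeholders) yields one address tuple per broker, which is the shape ha-brokers-url expects: a list of broker addresses rather than a single virtual IP address.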

To configure an HA cluster you must set at least ha-cluster and ha-brokers-url.
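
As an illustration of this minimum, the sketch below checks a dictionary of option names and values before they are written to a broker configuration. The helper check_ha_options and the dictionary representation are assumptions made for this example; the broker performs its own validation and may apply rules not shown here.

```python
REQUIRED_OPTIONS = ("ha-cluster", "ha-brokers-url")
REPLICATE_VALUES = ("none", "configuration", "all")


def check_ha_options(options):
    """Return a list of problems found in a dict of HA option names to values.

    Hypothetical helper: checks only rules stated in the option descriptions.
    """
    problems = []
    # Both of these must be set to configure an HA cluster.
    for name in REQUIRED_OPTIONS:
        if not options.get(name):
            problems.append("missing required option: " + name)
    # ha-cluster takes yes or no.
    if options.get("ha-cluster") not in (None, "yes", "no"):
        problems.append("ha-cluster must be yes or no")
    # ha-replicate, if given, takes one of three values.
    replicate = options.get("ha-replicate")
    if replicate is not None and replicate not in REPLICATE_VALUES:
        problems.append(
            "ha-replicate must be one of: " + ", ".join(REPLICATE_VALUES)
        )
    return problems
```

An empty list means the settings satisfy the checks coded here; it does not guarantee that the broker will accept them.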

The Cluster Resource Manager

Broker fail-over is managed by a cluster resource manager. An integration with rgmanager is provided, but it is possible to integrate with other resource managers.

The resource manager is responsible for starting the qpidd broker on each node in the cluster. The resource manager then promotes one of the brokers to be the primary. The other brokers connect to the primary as backups, using the URL provided in the ha-brokers-url configuration option.

Once connected, the backup brokers synchronize their state with the primary. When a backup is synchronized, or "hot", it is ready to take over if the primary fails. Backup brokers continually receive updates from the primary in order to stay synchronized.

If the primary fails, backup brokers go into fail-over mode. The resource manager must detect the failure and promote one of the backups to be the new primary. The other backups connect to the new primary and synchronize their state with it.

The resource manager is also responsible for protecting the cluster from split-brain conditions resulting from a network partition. A network partition divides a cluster into two sub-groups which cannot see each other. Usually a quorum voting algorithm is used that disables nodes in the inquorate sub-group.
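
The idea behind quorum voting can be sketched in a few lines. This example assumes the simplest rule, one vote per node with a strict majority required; real resource managers may weight votes or use additional tie-breakers, and the function names here are hypothetical.

```python
def has_quorum(visible_nodes, cluster_size):
    """Return True if a sub-group holds a strict majority of the cluster.

    Assumes one vote per node; this is an illustration, not rgmanager's rule.
    """
    return 2 * len(set(visible_nodes)) > cluster_size


def quorate_group(groups, cluster_size):
    """Return the one sub-group with quorum, or None if no group has it."""
    for group in groups:
        if has_quorum(group, cluster_size):
            return group
    return None
```

With a strict majority at most one sub-group can be quorate, so two sides of a partition can never both promote a primary. An even split of a four-node cluster leaves neither side with quorum, which is one reason clusters with an odd number of nodes are common.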

Configuring rgmanager as resource manager

This section assumes that you are already familiar with setting up and configuring clustered services using cman and rgmanager. It will show you how to configure an active-passive, hot-standby qpidd HA cluster with rgmanager.

You must provide a cluster.conf file to configure cman and rgmanager. Here is an example cluster.conf file for a cluster of 3 nodes named node1, node2 and node3. We will go through the configuration step-by-step.