Active-Passive Messaging Clusters

Overview

The High Availability (HA) module provides active-passive, hot-standby messaging clusters for fault-tolerant message delivery. In an active-passive cluster only one broker, known as the primary, is active and serving clients at a time. The other brokers stand by as backups. Changes on the primary are replicated to all the backups so they are always up-to-date, or "hot". Backup brokers reject client connection attempts, to enforce the requirement that clients only connect to the primary. If the primary fails, one of the backups is promoted to take over as the new primary. Clients fail over to the new primary automatically. If there are multiple backups, the other backups also fail over to become backups of the new primary. This approach relies on an external cluster resource manager to detect failures, choose the new primary and handle network partitions. Initially rgmanager is supported; other resource managers may be supported in the future.
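The rule that only the primary accepts clients can be illustrated with a small model of a client walking a list of brokers. This is a hedged sketch, not Qpid code. Broker, ConnectionRejected and connect_to_primary are hypothetical names, and a real client would rely on its client library's own fail-over handling. The model shows backups refusing the connection until the client reaches whichever broker is currently primary.

```python
from dataclasses import dataclass


class ConnectionRejected(Exception):
    """Raised when a broker refuses a client connection (hypothetical)."""


@dataclass
class Broker:
    address: str
    role: str  # "primary", "backup" or "down" in this simplified model

    def accept_client(self) -> str:
        # Only the primary serves clients; backups (and, here, failed
        # brokers) refuse the connection.
        if self.role != "primary":
            raise ConnectionRejected(f"{self.address} is not the primary")
        return self.address


def connect_to_primary(brokers: list[Broker]) -> str:
    """Return the address of the first broker that accepts the client."""
    for broker in brokers:
        try:
            return broker.accept_client()
        except ConnectionRejected:
            continue  # try the next address in the list
    raise ConnectionError("no primary is currently available")
```

If node2 is primary, the client ends up on node2. If node2 then fails and the resource manager promotes node3, calling connect_to_primary again with the same list reaches node3, which is the client fail-over the overview describes.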

Avoiding message loss

In order to avoid message loss, the primary broker delays acknowledgement of messages received from clients until the message has been replicated and acknowledged by all of the backup brokers, or has been consumed from the primary queue. This ensures that all acknowledged messages are safe: they have either been consumed or backed up to all backup brokers. Messages that are consumed and acknowledged by a regular client before they are replicated do not need to be replicated, which reduces the workload when replicating a queue with active consumers.

Clients keep unacknowledged messages in a buffer until they are acknowledged by the primary. You can control the maximum number of messages in the buffer by setting the client's capacity. For details of how to set the capacity in client code see "Using the Qpid Messaging API" in Programming in Apache Qpid. If the primary fails, clients will fail over to the new primary and re-send all their unacknowledged messages. Clients must use "at-least-once" reliability to enable re-sending of unacknowledged messages. This is the default behaviour; no options need to be set to enable it. For details of client addressing options see "Using the Qpid Messaging API" in Programming in Apache Qpid.

If the primary crashes, all the acknowledged messages will be available on the backup that takes over as the new primary, and the unacknowledged messages will be re-sent by the clients. Thus no messages are lost. Note that this means messages may be duplicated: in the event of a failure, a message may be received by the backup that becomes the new primary and also re-sent by the client. The application must take steps to identify and eliminate duplicates.

When a new primary is promoted after a fail-over it is initially in "recovering" mode. In this mode, it delays acknowledgement of messages on behalf of all the backups that were connected to the previous primary. This protects those messages against a failure of the new primary until the backups have a chance to connect and catch up.

HA Broker States

Stand-alone: The broker is not part of a HA cluster.
Joining: A newly started broker, not yet connected to any existing primary.
Catch-up: A backup broker that is connected to the primary and downloading existing state (queues, messages etc.).
Ready: A backup broker that is fully caught up and ready to take over as primary.
Recovering: A newly promoted primary, waiting for backups to connect and catch up. Clients can connect but they are stalled until the primary is active.
Active: The active primary broker, with all backups connected and caught up.
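The acknowledgement rule and the Recovering-to-Active transition can be sketched as a small in-memory model. This is a hypothetical illustration. BrokerState, PrimaryAckTracker and their methods are invented names, and the model ignores queues, persistence and networking. A client message is acknowledged only once every expected backup has confirmed it, or once it has been consumed from the primary. A recovering primary becomes active once all expected backups are ready.

```python
from enum import Enum


class BrokerState(Enum):
    """The HA broker states listed above."""
    STANDALONE = "stand-alone"
    JOINING = "joining"
    CATCH_UP = "catch-up"
    READY = "ready"
    RECOVERING = "recovering"
    ACTIVE = "active"


class PrimaryAckTracker:
    """Hypothetical model of when a primary may acknowledge client messages."""

    def __init__(self, expected_backups, recovering=False):
        # Backups that must confirm replication before a client is acked.
        # For a newly promoted primary these are the previous primary's backups.
        self.expected = set(expected_backups)
        self.ready = set()
        self.state = BrokerState.RECOVERING if recovering else BrokerState.ACTIVE
        self.pending = {}  # message id -> backups that have not confirmed yet
        self.acked = []    # message ids acknowledged to clients, in order

    def receive(self, msg_id):
        """A client sends a message; hold the ack until it is safe."""
        self.pending[msg_id] = set(self.expected)
        self._maybe_ack(msg_id)

    def backup_confirmed(self, backup, msg_id):
        """A backup reports that it has replicated msg_id."""
        if msg_id in self.pending:
            self.pending[msg_id].discard(backup)
            self._maybe_ack(msg_id)

    def consumed(self, msg_id):
        """A message consumed from the primary queue needs no replication."""
        if msg_id in self.pending:
            self.pending[msg_id].clear()
            self._maybe_ack(msg_id)

    def backup_ready(self, backup):
        """A backup has caught up; leave recovering mode once all are ready."""
        self.ready.add(backup)
        if self.state is BrokerState.RECOVERING and self.expected <= self.ready:
            self.state = BrokerState.ACTIVE

    def _maybe_ack(self, msg_id):
        # Acknowledge only when no expected backup is still outstanding.
        if not self.pending[msg_id]:
            del self.pending[msg_id]
            self.acked.append(msg_id)
```

Because an unacknowledged message stays in the client's buffer, a crash while a message is still in pending leads to a re-send. This is why the application may see duplicates.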

Limitations

There are some known limitations in the current implementation. These will be fixed in future versions.

Transactional changes to queue state are not replicated atomically. If the primary crashes during a transaction, it is possible that the backup could contain only part of the changes introduced by the transaction.

Configuration changes (creating or deleting queues, exchanges and bindings) are replicated asynchronously. Management tools used to make changes will consider the change complete when it is complete on the primary, even though it may not yet have been replicated to all the backups.

Federation links to the primary will fail over correctly. Federated links from the primary will be lost in fail-over; they will not be re-connected to the new primary. It is possible to work around this by replacing the qpidd-primary start-up script with a script that re-creates federation links when the primary is promoted.

Virtual IP Addresses

Some resource managers (including rgmanager) support virtual IP addresses. A virtual IP address is an IP address that can be relocated to any of the nodes in a cluster. The resource manager associates this address with the primary node in the cluster, and relocates it to the new primary when there is a failure. This simplifies configuration, as you can publish a single IP address rather than a list. A virtual IP address can be used by clients to connect to the primary. The following sections explain how to configure virtual IP addresses for clients or brokers.

Configuring the Brokers

The broker must load the ha module; it is loaded by default. Broker management is required for HA to operate, and it is enabled by default: the option mgmt-enable must not be set to "no". Incorrect security settings are a common cause of problems when getting started, see .

The following broker options are available for the HA module.

Options for High Availability Messaging Cluster

ha-cluster yes|no
    Set to "yes" to have the broker join a cluster.

ha-queue-replication yes|no
    Enable replication of specific queues without joining a cluster, see .

ha-brokers-url URL
    The URL used by cluster brokers to connect to each other. The URL should contain a comma-separated list of the broker addresses, rather than a virtual IP address.

ha-public-url URL
    This option is only needed for backwards compatibility if you have been using the amq.failover exchange. This exchange is now obsolete; it is recommended to use a virtual IP address instead. If set, this URL is advertised by the amq.failover exchange and overrides the broker option known-hosts-url.

ha-replicate VALUE
    Specifies whether queues and exchanges are replicated by default. VALUE is one of: none, configuration, all. For details see .

ha-username USER
ha-password PASS
ha-mechanism MECHANISM
    Authentication settings used by HA brokers to connect to each other, see .

ha-backup-timeout SECONDS
    Maximum time that a recovering primary will wait for an expected backup to connect and become ready.

link-maintenance-interval SECONDS
    HA uses federation links to connect from backup to primary. Backup brokers check the link to the primary on this interval and re-connect if need be. Default 2 seconds. Set it lower for faster failover, e.g. 0.1 seconds. Setting it too low will result in excessive link-checking on the backups.

link-heartbeat-interval SECONDS
    HA uses federation links to connect from backup to primary. If no heartbeat is received for twice this interval, the primary will consider that backup dead (e.g. if the backup is hung or partitioned). This interval is also used as the time-out for broker status checks, so it may take up to this interval for rgmanager to detect a hung or partitioned broker. Clients sending messages may be held up during this time. Default 120 seconds: you will probably want to set this to a lower value, e.g. 10. If it is set too low, rgmanager may consider a slow broker to have failed and kill it.

URL format: the full format of the URL is given by this grammar:

    url = ["amqp:"][ user ["/" password] "@" ] addr ("," addr)*
    addr = tcp_addr / rdma_addr / ssl_addr / ...
    tcp_addr = ["tcp:"] host [":" port]
    rdma_addr = "rdma:" host [":" port]
    ssl_addr = "ssl:" host [":" port]

SECONDS values: values specified as SECONDS can be a fraction of a second, e.g. "0.1" for a tenth of a second. They can also have an explicit unit, e.g. 10s (seconds), 10ms (milliseconds), 10us (microseconds), 10ns (nanoseconds).
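The SECONDS value format and the "twice the interval" heartbeat rule can be made concrete with a small parser. This is a sketch under assumptions. parse_seconds and backup_considered_dead are hypothetical helpers, not broker code, and the parser accepts only plain numbers (taken as seconds) and the four units listed above.

```python
import re

# Multipliers to seconds for the explicit units allowed in SECONDS values.
_UNITS = {"s": 1.0, "ms": 1e-3, "us": 1e-6, "ns": 1e-9}
_PATTERN = re.compile(r"^\s*(\d+(?:\.\d*)?|\.\d+)\s*(s|ms|us|ns)?\s*$")


def parse_seconds(value: str) -> float:
    """Convert a SECONDS option value such as "0.1" or "10ms" to seconds."""
    match = _PATTERN.match(value)
    if match is None:
        raise ValueError(f"invalid duration: {value!r}")
    number, unit = match.groups()
    # A bare number has no unit and is interpreted as seconds.
    return float(number) * _UNITS[unit or "s"]


def backup_considered_dead(last_heartbeat: float, now: float,
                           heartbeat_interval: str) -> bool:
    """True if no heartbeat arrived within twice link-heartbeat-interval."""
    return now - last_heartbeat > 2 * parse_seconds(heartbeat_interval)
```

With the default link-heartbeat-interval of 120 seconds, a silent backup would only be declared dead after 240 seconds. With the suggested value of 10, that drops to 20 seconds.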

To configure a HA cluster you must set at least ha-cluster and ha-brokers-url.
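A minimal pre-flight check for these settings can be sketched by parsing ha-brokers-url with the URL grammar given for that option. This is a hypothetical helper, not part of qpidd. It assumes the options have already been read into a plain dict keyed by option name. It handles only the tcp, rdma and ssl address forms from the grammar and does not support IPv6 literals. The single-address warning is a heuristic for the advice to list individual brokers rather than a virtual IP.

```python
from typing import NamedTuple, Optional


class Address(NamedTuple):
    protocol: str
    host: str
    port: Optional[int]


def parse_brokers_url(url: str):
    """Split a URL into (user, password, [Address, ...]) per the grammar."""
    rest = url[len("amqp:"):] if url.startswith("amqp:") else url
    user = password = None
    if "@" in rest:
        # Hosts never contain "@", so the last "@" ends the credentials.
        creds, _, rest = rest.rpartition("@")
        user, _, pw = creds.partition("/")
        password = pw or None
    addresses = []
    for addr in rest.split(","):
        protocol = "tcp"  # tcp_addr: the "tcp:" prefix is optional
        for prefix in ("tcp:", "rdma:", "ssl:"):
            if addr.startswith(prefix):
                protocol, addr = prefix[:-1], addr[len(prefix):]
                break
        host, sep, port = addr.partition(":")
        if not host:
            raise ValueError(f"empty host in {url!r}")
        addresses.append(Address(protocol, host, int(port) if sep else None))
    return user, password, addresses


def check_ha_options(options: dict) -> list:
    """Return a list of problems found in a dict of broker options."""
    problems = []
    if options.get("ha-cluster") != "yes":
        problems.append("ha-cluster must be set to yes")
    url = options.get("ha-brokers-url")
    if not url:
        problems.append("ha-brokers-url must be set")
    elif len(parse_brokers_url(url)[2]) < 2:
        # Heuristic only: a single address may be a virtual IP address,
        # but ha-brokers-url should list the individual brokers.
        problems.append("ha-brokers-url should list each broker address")
    if options.get("mgmt-enable") == "no":
        problems.append("mgmt-enable must not be set to no")
    return problems
```

For example, {"ha-cluster": "yes", "ha-brokers-url": "node1,node2,node3"} passes with no problems reported.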

The Cluster Resource Manager

Broker fail-over is managed by a cluster resource manager. An integration with rgmanager is provided, but it is possible to integrate with other resource managers.

The resource manager is responsible for starting the qpidd broker on each node in the cluster. The resource manager then promotes one of the brokers to be the primary. The other brokers connect to the primary as backups, using the URL provided in the ha-brokers-url configuration option.

Once connected, the backup brokers synchronize their state with the primary. When a backup is synchronized, or "hot", it is ready to take over if the primary fails. Backup brokers continually receive updates from the primary in order to stay synchronized.

If the primary fails, backup brokers go into fail-over mode. The resource manager must detect the failure and promote one of the backups to be the new primary. The other backups connect to the new primary and synchronize their state with it.

The resource manager is also responsible for protecting the cluster from split-brain conditions resulting from a network partition. A network partition divides a cluster into two sub-groups which cannot see each other. Usually a quorum voting algorithm is used that disables nodes in the inquorate sub-group.
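The quorum rule and the promotion step can be sketched as follows. This is a simplified, hypothetical model, not how rgmanager is implemented. It assumes one vote per node and prefers a backup in the Ready state. The alphabetical tie-break is arbitrary; a real resource manager applies its own policy.

```python
from typing import Optional


def is_quorate(partition_votes: int, total_votes: int) -> bool:
    """A sub-group is quorate only if it holds a strict majority of votes."""
    return 2 * partition_votes > total_votes


def choose_new_primary(partition: list[str], total_nodes: int,
                       states: dict[str, str]) -> Optional[str]:
    """Return the node to promote, or None if nothing should be promoted."""
    # Assumption: one vote per node, so the vote count is the node count.
    if not is_quorate(len(partition), total_nodes):
        return None  # inquorate sub-group: stays disabled to avoid split-brain
    # Prefer backups that are fully caught up ("ready"); the alphabetical
    # tie-break is arbitrary and only for illustration.
    ready = sorted(node for node in partition if states.get(node) == "ready")
    return ready[0] if ready else None
```

The strict majority matters. In a 4-node cluster split 2/2, neither half is quorate, so neither half promotes a primary and split-brain is avoided.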

Configuring with rgmanager as resource manager

This section assumes that you are already familiar with setting up and configuring clustered services using cman and rgmanager. It shows you how to configure an active-passive, hot-standby qpidd HA cluster with rgmanager.

Once all components are installed, it is important to take the following steps:

    chkconfig rgmanager on
    chkconfig cman on
    chkconfig qpidd off

The qpidd service must be off in chkconfig because rgmanager will start and stop qpidd. If the normal system init process also attempts to start and stop qpidd, it can cause rgmanager to lose track of qpidd processes. The symptom when this happens is that clustat shows a qpidd service to be stopped when in fact there is a qpidd process running. The qpidd log will show errors like this:

    critical Unexpected error: Daemon startup failed: Cannot lock /var/lib/qpidd/lock: Resource temporarily unavailable

You must provide a cluster.conf file to configure cman and rgmanager. Here is an example cluster.conf file for a cluster of 3 nodes named node1, node2 and node3. We will go through the configuration step-by-step.

status_poll_interval is the interval in seconds at which the resource manager checks the status of managed services. This affects how quickly the manager will detect failed services.