Active-passive Messaging Clusters
Overview
The High Availability (HA) module provides
active-passive, hot-standby
messaging clusters to provide fault tolerant message delivery.
In an active-passive cluster only one broker, known as the
primary, is active and serving clients at a time. The other
brokers are standing by as backups. Changes on the primary
are replicated to all the backups so they are always up-to-date or "hot". Backup
brokers reject client connection attempts, to enforce the requirement that clients
only connect to the primary.
If the primary fails, one of the backups is promoted to take over as the new
primary. Clients fail-over to the new primary automatically. If there are multiple
backups, the other backups also fail-over to become backups of the new primary.
This approach relies on an external cluster resource manager
to detect failures, choose the new primary and handle network partitions. Rgmanager is supported
initially, but others may be supported in the future.
Avoiding message loss
In order to avoid message loss, the primary broker delays
acknowledgment of messages received from clients until the message has
been replicated to and acknowledged by all of the back-up brokers. This means that
all acknowledged messages are safely stored on all the backup
brokers.
Clients keep unacknowledged messages in a buffer
You can control the maximum number of messages in the buffer by setting the
client's capacity. For details of how to set the capacity
in client code see "Using the Qpid Messaging API" in
Programming in Apache Qpid.
until they are acknowledged by the primary. If the primary fails, clients will
fail-over to the new primary and re-send all their
unacknowledged messages.
Clients must use "at-least-once" reliability to enable re-send of unacknowledged
messages. This is the default behavior, no options need be set to enable it. For
details of client addressing options see "Using the Qpid Messaging API"
in Programming in Apache Qpid.
So if the primary crashes, all the acknowledged
messages will be available on the backup that takes over as the new
primary. The unacknowledged messages will be
re-sent by the clients. Thus no messages are lost.
Note that this means it is possible for messages to be
duplicated. In the event of a failure it is possible for a
message to received by the backup that becomes the new primary
and re-sent by the client. The application must take steps
to identify and eliminate duplicates.
When a new primary is promoted after a fail-over it is initially in
"recovering" mode. In this mode, it delays acknowledgment of messages
on behalf of all the backups that were connected to the previous
primary. This protects those messages against a failure of the new
primary until the backups have a chance to connect and catch up.
Not all messages need to be replicated to the back-up brokers. If a
message is consumed and acknowledged by a regular client before it has
been replicated to a backup, then it doesn't need to be replicated.
Status of a HA broker
Joining
Initial status of a new broker that has not yet connected to the primary.
Catch-up
A backup broker that is connected to the primary and catching up
on queues and messages.
Ready
A backup broker that is fully caught-up and ready to take over as
primary.
Recovering
The newly-promoted primary, waiting for backups to connect and catch up.
Active
The active primary broker with all backups connected and caught-up.
Limitations
There are a some known limitations in the current implementation. These
will be fixed in furture versions.
Transactional changes to queue state are not replicated atomically. If
the primary crashes during a transaction, it is possible that the
backup could contain only part of the changes introduced by a
transaction.
Configuration changes (creating or deleting queues, exchanges and
bindings) are replicated asynchronously. Management tools used to
make changes will consider the change complete when it is complete
on the primary, it may not yet be replicated to all the backups.
Federated links from the primary will be lost
in fail over, they will not be re-connected to the new
primary. Federation links to the primary will
fail over.
Virtual IP Addresses
Some resource managers (including rgmanager) support
virtual IP addresses. A virtual IP address is an IP
address that can be relocated to any of the nodes in a cluster. The
resource manager associates this address with the primary node in the
cluster, and relocates it to the new primary when there is a failure. This
simplifies configuration as you can publish a single IP address rather
than a list.
A virtual IP address can be used by clients and backup brokers to connect
to the primary. The following sections will explain how to configure
virtual IP addresses for clients or brokers.
Configuring the Brokers
The broker must load the ha module, it is loaded by
default. The following broker options are available for the HA module.
Broker Options for High Availability Messaging Cluster
Options for High Availability Messaging Cluster
ha-cluster yes|no
Set to "yes" to have the broker join a cluster.
ha-queue-replication yes|no
Enable replication of specific queues without joining a cluster, see .
ha-brokers-url URL
The URL
The full format of the URL is given by this grammar:
url = ["amqp:"][ user ["/" password] "@" ] addr ("," addr)*
addr = tcp_addr / rmda_addr / ssl_addr / ...
tcp_addr = ["tcp:"] host [":" port]
rdma_addr = "rdma:" host [":" port]
ssl_addr = "ssl:" host [":" port]'
used by cluster brokers to connect to each other. The URL should
contain a comma separated list of the broker addresses, rather than a
virtual IP address.
ha-public-url URL
The URL is advertised to
clients as the "known-hosts" for fail-over. It can be a list or
a single virtual IP address. A virtual IP address is recommended.
Using this option you can put client and broker traffic on
separate networks, which is recommended.
Note: When HA clustering is enabled the broker option
known-hosts-url is ignored and over-ridden by
the ha-public-url setting.
ha-replicate VALUE
Specifies whether queues and exchanges are replicated by default.
VALUE is one of: none,
configuration, all.
For details see .
ha-username USER
ha-password PASS
ha-mechanism MECH
Authentication settings used by HA brokers to connect to each other.
If you are using authorization
()
then this user must have all permissions.
ha-backup-timeout SECONDS
Maximum time that a recovering primary will wait for an expected
backup to connect and become ready.
link-maintenance-interval SECONDS
Interval for the broker to check link health and re-connect links if need
be. If you want brokers to fail over quickly you can set this to a
fraction of a second, for example: 0.1.
link-heartbeat-interval SECONDS
Heartbeat interval for replication links. The link will be assumed broken
if there is no heartbeat for twice the interval.
To configure a HA cluster you must set at least ha-cluster and
ha-brokers-url.
The Cluster Resource Manager
Broker fail-over is managed by a cluster resource
manager. An integration with rgmanager is
provided, but it is possible to integrate with other resource managers.
The resource manager is responsible for starting the qpidd broker
on each node in the cluster. The resource manager then promotes
one of the brokers to be the primary. The other brokers connect to the primary as
backups, using the URL provided in the ha-brokers-url configuration
option.
Once connected, the backup brokers synchronize their state with the
primary. When a backup is synchronized, or "hot", it is ready to take
over if the primary fails. Backup brokers continually receive updates
from the primary in order to stay synchronized.
If the primary fails, backup brokers go into fail-over mode. The resource
manager must detect the failure and promote one of the backups to be the
new primary. The other backups connect to the new primary and synchronize
their state with it.
The resource manager is also responsible for protecting the cluster from
split-brain conditions resulting from a network partition. A
network partition divide a cluster into two sub-groups which cannot see each other.
Usually a quorum voting algorithm is used that disables nodes
in the inquorate sub-group.
Configuring rgmanager as resource manager
This section assumes that you are already familiar with setting up and configuring
clustered services using cman and
rgmanager. It will show you how to configure an active-passive,
hot-standby qpidd HA cluster with rgmanager.
You must provide a cluster.conf file to configure
cman and rgmanager. Here is
an example cluster.conf file for a cluster of 3 nodes named
node1, node2 and node3. We will go through the configuration step-by-step.
]]>
There is a failoverdomain for each node containing just that
one node. This lets us stipulate that the qpidd service should always run on all
nodes.
The resources section defines the qpidd
script used to start the qpidd service. It also defines the
qpid-primary script which does not
actually start a new service, rather it promotes the existing
qpidd broker to primary status.
The resources section also defines a pair of virtual IP
addresses on different sub-nets. One will be used for broker-to-broker
communication, the other for client-to-broker.
To take advantage of the virtual IP addresses, qpidd.conf
should contain these lines:
ha-cluster=yes
ha-brokers-url=20.0.20.200
ha-public-url=20.0.10.200
This configuration specifies that backup brokers will use 20.0.20.200
to connect to the primary and will advertise 20.0.10.200 to clients.
Clients should connect to 20.0.10.200.
The service section defines 3 qpidd
services, one for each node. Each service is in a restricted fail-over
domain containing just that node, and has the restart
recovery policy. The effect of this is that rgmanager will run
qpidd on each node, restarting if it fails.
There is a single qpidd-primary-service using the
qpidd-primary script which is not restricted to a
domain and has the relocate recovery policy. This means
rgmanager will start qpidd-primary on one of the nodes
when the cluster starts and will relocate it to another node if the
original node fails. Running the qpidd-primary script
does not start a new broker process, it promotes the existing broker to
become the primary.
Broker Administration Tools
Normally, clients are not allowed to connect to a backup broker. However
management tools are allowed to connect to a backup brokers. If you use
these tools you must not add or remove messages from
replicated queues, nor create or delete replicated queues or exchanges as
this will disrupt the replication process and may cause message loss.
qpid-ha allows you to view and change HA configuration settings.
The tools qpid-config, qpid-route and
qpid-stat will connect to a backup if you pass the flag ha-admin on the
command line.
Controlling replication of queues and exchanges
By default, queues and exchanges are not replicated automatically. You can change
the default behavior by setting the ha-replicate configuration
option. It has one of the following values:
all: Replicate everything automatically: queues,
exchanges, bindings and messages.
configuration: Replicate the existence of queues,
exchange and bindings but don't replicate messages.
none: Don't replicate anything, this is the default.
You can over-ride the default for a particular queue or exchange by passing the
argument qpid.replicate when creating the queue or exchange. It
takes the same values as ha-replicate
Bindings are automatically replicated if the queue and exchange being bound both
have replication all or configuration, they
are not replicated otherwise.
You can create replicated queues and exchanges with the
qpid-config management tool like this:
qpid-config add queue myqueue --replicate all
To create replicated queues and exchanges via the client API, add a
node entry to the address like this:
"myqueue;{create:always,node:{x-declare:{arguments:{'qpid.replicate':all}}}}"
There are some built-in exchanges created automatically by the broker, these
exchangs are never replicated. The built-in exchanges are the default (nameless)
exchange, the AMQP standard exchanges (amq.direct, amq.topic, amq.fanout and
amq.match) and the management exchanges (qpid.management, qmf.default.direct and
qmf.default.topic)
Note that if you bind a replicated queue to one of these exchanges, the
binding wil not be replicated, so the queue will not
have the binding after a fail-over.
Client Connection and Fail-over
Clients can only connect to the primary broker. Backup brokers
automatically reject any connection attempt by a client.
Clients are configured with the URL for the cluster (details below for
each type of client). There are two possibilities
The URL contains multiple addresses, one for each broker in the cluster.
The URL contains a single virtual IP address
that is assigned to the primary broker by the resource manager.
Only if the resource manager supports virtual IP
addresses
In the first case, clients will repeatedly re-try each address in the URL
until they successfully connect to the primary. In the second case the
resource manager will assign the virtual IP address to the primary broker,
so clients only need to re-try on a single address.
When the primary broker fails, clients re-try all known cluster addresses
until they connect to the new primary. The client re-sends any messages
that were previously sent but not acknowledged by the broker at the time
of the failure. Similarly messages that have been sent by the broker, but
not acknowledged by the client, are re-queued.
TCP can be slow to detect connection failures. A client can configure a
connection to use a heartbeat to detect connection
failure, and can specify a time interval for the heartbeat. If heartbeats
are in use, failures will be detected no later than twice the heartbeat
interval. The following sections explain how to enable heartbeat in each
client.
See "Cluster Failover" in Programming in Apache
Qpid for details on how to keep the client aware of cluster
membership.
Suppose your cluster has 3 nodes: node1,
node2 and node3 all using the
default AMQP port, and you are not using a virtual IP address. To connect
a client you need to specify the address(es) and set the
reconnect property to true. The
following sub-sections show how to connect each type of client.
C++ clients
With the C++ client, you specify multiple cluster addresses in a single URL
The full grammar for the URL is:
url = ["amqp:"][ user ["/" password] "@" ] addr ("," addr)*
addr = tcp_addr / rmda_addr / ssl_addr / ...
tcp_addr = ["tcp:"] host [":" port]
rdma_addr = "rdma:" host [":" port]
ssl_addr = "ssl:" host [":" port]'
You also need to specify the connection option
reconnect to be true. For example:
qpid::messaging::Connection c("node1,node2,node3","{reconnect:true}");
Heartbeats are disabled by default. You can enable them by specifying a
heartbeat interval (in seconds) for the connection via the
heartbeat option. For example:
qpid::messaging::Connection c("node1,node2,node3","{reconnect:true,heartbeat:10}");
Python clients
With the python client, you specify reconnect=True
and a list of host:port addresses as
reconnect_urls when calling
Connection.establish or
Connection.open
connection = qpid.messaging.Connection.establish("node1", reconnect=True, reconnect_urls=["node1", "node2", "node3"])
Heartbeats are disabled by default. You can
enable them by specifying a heartbeat interval (in seconds) for the
connection via the 'heartbeat' option. For example:
connection = qpid.messaging.Connection.establish("node1", reconnect=True, reconnect_urls=["node1", "node2", "node3"], heartbeat=10)
Java JMS Clients
In Java JMS clients, client fail-over is handled automatically if it is
enabled in the connection. You can configure a connection to use
fail-over using the failover property:
connectionfactory.qpidConnectionfactory = amqp://guest:guest@clientid/test?brokerlist='tcp://localhost:5672'&failover='failover_exchange'
This property can take three values:
Fail-over Modes
failover_exchange
If the connection fails, fail over to any other broker in the cluster.
roundrobin
If the connection fails, fail over to one of the brokers specified in the brokerlist.
singlebroker
Fail-over is not supported; the connection is to a single broker only.
In a Connection URL, heartbeat is set using the idle_timeout property, which is an integer corresponding to the heartbeat period in seconds. For instance, the following line from a JNDI properties file sets the heartbeat time out to 3 seconds:
connectionfactory.qpidConnectionfactory = amqp://guest:guest@clientid/test?brokerlist='tcp://localhost:5672',idle_timeout=3
Security.
You can secure your cluster using the authentication and authorization features
described in .
Backup brokers connect to the primary broker and subscribe for management
events and queue contents. You can specify the identity used to connect
to the primary with the following options:
Security options for High Availability Messaging Cluster
Security options for High Availability Messaging Cluster
ha-username USER
ha-password PASS
ha-mechanism MECH
Authentication settings used by HA brokers to connect to each other.
If you are using authorization
()
then this user must have all permissions.
This identity is also used to authorize actions taken on the backup broker to replicate
from the primary, for example to create queues or exchanges.
Integrating with other Cluster Resource Managers
To integrate with a different resource manager you must configure it to:
Start a qpidd process on each node of the cluster.
Restart qpidd if it crashes.
Promote exactly one of the brokers to primary.
Detect a failure and promote a new primary.
The qpid-ha command allows you to check if a broker is primary,
and to promote a backup to primary.
To test if a broker is the primary:
qpid-ha -b broker-address status --expect=primary
This command will return 0 if the broker at broker-address
is the primary, non-0 otherwise.
To promote a broker to primary:
qpid-ha -b broker-address promote
qpid-ha --help gives information on other commands and options available.
You can also use qpid-ha to manually examine and promote brokers. This
can be useful for testing failover scenarios without having to set up a full resource manager,
or to simulate a cluster on a single node. For deployment, a resource manager is required.
Replicating specific queues
In addition to the automatic replication performed in a cluster, you can
set up replication for specific queues between arbitrary brokers, even if
the brokers are not members of a cluster. The command:
qpid-ha replicate QUEUE REMOTE-BROKER
sets up replication of QUEUE on REMOTE-BROKER to QUEUE on the current broker.
Set the configuration option
ha-queue-replication=yes on both brokers to enable this
feature on non-cluster brokers. It is automatically enabled for brokers
that are part of a cluster.
Note that this feature does not provide automatic fail-over, for that you
need to run a cluster.