| author | Justin Ross <jross@apache.org> | 2016-04-19 23:11:13 +0000 |
|---|---|---|
| committer | Justin Ross <jross@apache.org> | 2016-04-19 23:11:13 +0000 |
| commit | da7718ef463775acc7d6fbecf2d64c1bbfc39fd8 (patch) | |
| tree | 6da761b56ed0433b68f755927a180d615f7fb5b3 /qpid/cpp/docs/book/src/cpp-broker/Active-Passive-Cluster.xml | |
| parent | eb1e7851a50c6a7901c73eb42d639516c0e3ba43 (diff) | |
| download | qpid-python-da7718ef463775acc7d6fbecf2d64c1bbfc39fd8.tar.gz | |
QPID-7207: Remove files and components that are obsolete or no longer in use; move doc and packaging pieces to the cpp subtree
git-svn-id: https://svn.apache.org/repos/asf/qpid/trunk@1740032 13f79535-47bb-0310-9956-ffa450edef68
Diffstat (limited to 'qpid/cpp/docs/book/src/cpp-broker/Active-Passive-Cluster.xml')
| -rw-r--r-- | qpid/cpp/docs/book/src/cpp-broker/Active-Passive-Cluster.xml | 1229 |
1 files changed, 1229 insertions, 0 deletions
diff --git a/qpid/cpp/docs/book/src/cpp-broker/Active-Passive-Cluster.xml b/qpid/cpp/docs/book/src/cpp-broker/Active-Passive-Cluster.xml new file mode 100644 index 0000000000..461b75d320 --- /dev/null +++ b/qpid/cpp/docs/book/src/cpp-broker/Active-Passive-Cluster.xml @@ -0,0 +1,1229 @@ +<?xml version="1.0" encoding="utf-8"?> +<!-- + +Licensed to the Apache Software Foundation (ASF) under one +or more contributor license agreements. See the NOTICE file +distributed with this work for additional information +regarding copyright ownership. The ASF licenses this file +to you under the Apache License, Version 2.0 (the +"License"); you may not use this file except in compliance +with the License. You may obtain a copy of the License at + +http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, +software distributed under the License is distributed on an +"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +KIND, either express or implied. See the License for the +specific language governing permissions and limitations +under the License. + +--> + +<section id="chapter-ha"> + <title>Active-Passive Messaging Clusters</title> + + <section id="ha-overview"> + <title>Overview</title> + <para> + The High Availability (HA) module provides + <firstterm>active-passive</firstterm>, <firstterm>hot-standby</firstterm> + messaging clusters for fault-tolerant message delivery. + </para> + <para> + In an active-passive cluster only one broker, known as the + <firstterm>primary</firstterm>, is active and serving clients at a time. The other + brokers are standing by as <firstterm>backups</firstterm>. Changes on the primary + are replicated to all the backups so they are always up-to-date, or "hot". Backup + brokers reject client connection attempts to enforce the requirement that clients + connect only to the primary. + </para> + <para> + If the primary fails, one of the backups is promoted to take over as the new + primary.
Clients fail-over to the new primary automatically. If there are multiple + backups, the other backups also fail-over to become backups of the new primary. + </para> + <para> + This approach relies on an external <firstterm>cluster resource manager</firstterm> + to detect failures, choose the new primary and handle network partitions. <ulink + url="https://fedorahosted.org/cluster/wiki/RGManager">rgmanager</ulink> is supported + initially, but others may be supported in the future. + </para> + <section id="ha-at-least-once"> + <title>Avoiding message loss</title> + <para> + In order to avoid message loss, the primary broker <emphasis>delays + acknowledgement</emphasis> of messages received from clients until the + message has been replicated and acknowledged by all of the back-up + brokers, or has been consumed from the primary queue. + </para> + <para> + This ensures that all acknowledged messages are safe: they have either + been consumed or backed up to all backup brokers. Messages that are + consumed <emphasis>before</emphasis> they are replicated do not need to + be replicated. This reduces the work load when replicating a queue with + active consumers. + </para> + <para> + Clients keep <emphasis>unacknowledged</emphasis> messages in a buffer + <footnote> + <para> + You can control the maximum number of messages in the buffer by setting the + client's <literal>capacity</literal>. For details of how to set the capacity + in client code see "Using the Qpid Messaging API" in + <citetitle>Programming in Apache Qpid</citetitle>. + </para> + </footnote> + until they are acknowledged by the primary. If the primary fails, clients will + fail-over to the new primary and <emphasis>re-send</emphasis> all their + unacknowledged messages. + <footnote> + <para> + Clients must use "at-least-once" reliability to enable re-send of unacknowledged + messages. This is the default behaviour, no options need be set to enable it. 
For + details of client addressing options see "Using the Qpid Messaging API" + in <citetitle>Programming in Apache Qpid</citetitle>. + </para> + </footnote> + </para> + <para> + If the primary crashes, all the <emphasis>acknowledged</emphasis> + messages will be available on the backup that takes over as the new + primary. The <emphasis>unacknowledged</emphasis> messages will be + re-sent by the clients. Thus no messages are lost. + </para> + <para> + Note that this means it is possible for messages to be + <emphasis>duplicated</emphasis>. In the event of a failure it is possible for a + message to be received by the backup that becomes the new primary + <emphasis>and</emphasis> re-sent by the client. The application must take steps + to identify and eliminate duplicates. + </para> + <para> + When a new primary is promoted after a fail-over it is initially in + "recovering" mode. In this mode, it delays acknowledgement of messages + on behalf of all the backups that were connected to the previous + primary. This protects those messages against a failure of the new + primary until the backups have a chance to connect and catch up. + </para> + <para> + Not all messages need to be replicated to the back-up brokers. If a + message is consumed and acknowledged by a regular client before it has + been replicated to a backup, then it doesn't need to be replicated. + </para> + <variablelist id="ha-broker-states"> + <title>HA Broker States</title> + <varlistentry> + <term>Stand-alone</term> + <listitem> + <para> + Broker is not part of an HA cluster. + </para> + </listitem> + </varlistentry> + <varlistentry> + <term>Joining</term> + <listitem> + <para> + Newly started broker, not yet connected to any existing primary. + </para> + </listitem> + </varlistentry> + <varlistentry> + <term>Catch-up</term> + <listitem> + <para> + A backup broker that is connected to the primary and downloading + existing state (queues, messages, etc.).
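The duplicate-elimination requirement described above can be illustrated with a short sketch. This is not Qpid API code: the consumer class and the message-id keys are hypothetical, showing one way an application might discard a message that was both received by the new primary and re-sent by the client after fail-over.

```python
# Sketch of application-level duplicate elimination after fail-over.
# (Illustrative only; message ids here are assumed application properties,
# not part of the Qpid client API.)
class DeduplicatingConsumer:
    def __init__(self):
        self.seen = set()      # ids of messages already processed
        self.processed = []    # payloads accepted exactly once

    def on_message(self, msg_id, payload):
        if msg_id in self.seen:
            return False       # duplicate re-sent after fail-over; drop it
        self.seen.add(msg_id)
        self.processed.append(payload)
        return True

consumer = DeduplicatingConsumer()
consumer.on_message("m1", "hello")
consumer.on_message("m2", "world")
consumer.on_message("m1", "hello")   # client re-send after fail-over: ignored
```

A real application would bound the `seen` set (for example by expiring old ids) rather than letting it grow without limit.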
+ </para> + </listitem> + </varlistentry> + <varlistentry> + <term>Ready</term> + <listitem> + <para> + A backup broker that is fully caught-up and ready to take over as + primary. + </para> + </listitem> + </varlistentry> + <varlistentry> + <term>Recovering</term> + <listitem> + <para> + Newly-promoted primary, waiting for backups to connect and catch up. + Clients can connect but they are stalled until the primary is active. + </para> + </listitem> + </varlistentry> + <varlistentry> + <term>Active</term> + <listitem> + <para> + The active primary broker with all backups connected and caught-up. + </para> + </listitem> + </varlistentry> + </variablelist> + </section> + <section id="limitations"> + <title>Limitations</title> + <para> + There are some known limitations in the current implementation. These + will be fixed in future versions. + </para> + <itemizedlist> + <listitem> + <para> + Transactional changes to queue state are not replicated atomically. If + the primary crashes during a transaction, it is possible that the + backup could contain only part of the changes introduced by a + transaction. + </para> + </listitem> + <listitem> + <para> + Configuration changes (creating or deleting queues, exchanges and + bindings) are replicated asynchronously. Management tools used to + make changes will consider the change complete when it is complete + on the primary; it may not yet be replicated to all the backups. + </para> + </listitem> + <listitem> + <para> + Federation links <emphasis>to</emphasis> the primary will fail over + correctly. Federated links <emphasis>from</emphasis> the primary + will be lost in fail-over; they will not be re-connected to the new + primary. It is possible to work around this by replacing the + <literal>qpidd-primary</literal> start-up script with a script that + re-creates federation links when the primary is promoted.
+ </para> + </listitem> + </itemizedlist> + </section> + </section> + + <section id="ha-virtual-ip"> + <title>Virtual IP Addresses</title> + <para> + Some resource managers (including <command>rgmanager</command>) support + <firstterm>virtual IP addresses</firstterm>. A virtual IP address is an IP + address that can be relocated to any of the nodes in a cluster. The + resource manager associates this address with the primary node in the + cluster, and relocates it to the new primary when there is a failure. This + simplifies configuration as you can publish a single IP address rather + than a list. + </para> + <para> + A virtual IP address can be used by clients to connect to the primary. The + following sections will explain how to configure virtual IP addresses for + clients or brokers. + </para> + </section> + + <section id="ha-broker-config"> + <title>Configuring the Brokers</title> + <para> + The broker must load the <filename>ha</filename> module, it is loaded by + default. The following broker options are available for the HA module. + </para> + <note> + <para> + Broker management is required for HA to operate, it is enabled by + default. The option <literal>mgmt-enable</literal> must not be set to + "no" + </para> + </note> + <note> + <para> + Incorrect security settings are a common cause of problems when + getting started, see <xref linkend="ha-security"/>. + </para> + </note> + <table frame="all" id="ha-broker-options"> + <title>Broker Options for High Availability Messaging Cluster</title> + <tgroup align="left" cols="2" colsep="1" rowsep="1"> + <colspec colname="c1"/> + <colspec colname="c2"/> + <thead> + <row> + <entry align="center" nameend="c2" namest="c1"> + Options for High Availability Messaging Cluster + </entry> + </row> + </thead> + <tbody> + <row> + <entry> + <literal>ha-cluster <replaceable>yes|no</replaceable></literal> + </entry> + <entry> + Set to "yes" to have the broker join a cluster. 
+ </entry> + </row> + <row> + <entry> + <literal>ha-queue-replication <replaceable>yes|no</replaceable></literal> + </entry> + <entry> + Enable replication of specific queues without joining a cluster; see <xref linkend="ha-queue-replication"/>. + </entry> + </row> + <row> + <entry> + <literal>ha-brokers-url <replaceable>URL</replaceable></literal> + </entry> + <entry> + <para> + The URL + <footnote id="ha-url-grammar"> + <para> + The full format of the URL is given by this grammar: + <programlisting> +url = ["amqp:"][ user ["/" password] "@" ] addr ("," addr)* +addr = tcp_addr / rdma_addr / ssl_addr / ... +tcp_addr = ["tcp:"] host [":" port] +rdma_addr = "rdma:" host [":" port] +ssl_addr = "ssl:" host [":" port] + </programlisting> + </para> + </footnote> + used by cluster brokers to connect to each other. The URL should + contain a comma-separated list of the broker addresses, rather than a + virtual IP address. + </para> + </entry> + </row> + <row> + <entry><literal>ha-public-url <replaceable>URL</replaceable></literal> </entry> + <entry> + <para> + This option is only needed for backwards compatibility if you + have been using the <literal>amq.failover</literal> exchange. + This exchange is now obsolete; it is recommended to use a + virtual IP address instead. + </para> + <para> + If set, this URL is advertised by the + <literal>amq.failover</literal> exchange and overrides the + broker option <literal>known-hosts-url</literal>. + </para> + </entry> + </row> + <row> + <entry><literal>ha-replicate </literal><replaceable>VALUE</replaceable></entry> + <entry> + <para> + Specifies whether queues and exchanges are replicated by default. + <replaceable>VALUE</replaceable> is one of: <literal>none</literal>, + <literal>configuration</literal>, <literal>all</literal>. + For details see <xref linkend="ha-replicate-values"/>.
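The URL grammar given in the footnote above can be exercised with a small parser. This is an illustrative sketch, not the broker's actual parser; the default port of 5672 (the standard AMQP port) and the returned tuple shape are assumptions made for the example.

```python
# Minimal parser for the broker URL grammar shown above (a sketch).
# url = ["amqp:"][ user ["/" password] "@" ] addr ("," addr)*
def parse_broker_url(url):
    if url.startswith("amqp:"):
        url = url[len("amqp:"):]
    user = password = None
    if "@" in url:
        creds, url = url.split("@", 1)
        user, _, password = creds.partition("/")
        password = password or None
    addrs = []
    for addr in url.split(","):
        proto = "tcp"                      # tcp is the default transport
        for p in ("tcp", "rdma", "ssl"):
            if addr.startswith(p + ":"):
                proto, addr = p, addr[len(p) + 1:]
                break
        host, _, port = addr.partition(":")
        addrs.append((proto, host, int(port) if port else 5672))
    return user, password, addrs

print(parse_broker_url("amqp:node1,node2:5673,ssl:node3"))
```

For example, `parse_broker_url("amqp:node1,node2:5673,ssl:node3")` yields three addresses, with `node1` defaulting to tcp on port 5672.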
+ </para> + </entry> + </row> + <row> + <entry> + <para><literal>ha-username <replaceable>USER</replaceable></literal></para> + <para><literal>ha-password <replaceable>PASS</replaceable></literal></para> + <para><literal>ha-mechanism <replaceable>MECHANISM</replaceable></literal></para> + </entry> + <entry> + Authentication settings used by HA brokers to connect to each other, + see <xref linkend="ha-security"/> + </entry> + </row> + <row> + <entry><literal>ha-backup-timeout<replaceable>SECONDS</replaceable></literal> + <footnote id="ha-seconds-spec"> + <para> + Values specified as <replaceable>SECONDS</replaceable> can be a + fraction of a second, e.g. "0.1" for a tenth of a second. + They can also have an explicit unit, + e.g. 10s (seconds), 10ms (milliseconds), 10us (microseconds), 10ns (nanoseconds) + </para> + </footnote> + </entry> + <entry> + <para> + Maximum time that a recovering primary will wait for an expected + backup to connect and become ready. + </para> + </entry> + </row> + <row> + <entry> + <literal>link-maintenance-interval <replaceable>SECONDS</replaceable></literal> + <footnoteref linkend="ha-seconds-spec"/> + </entry> + <entry> + <para> + HA uses federation links to connect from backup to primary. + Backup brokers check the link to the primary on this interval + and re-connect if need be. Default 2 seconds. Set lower for + faster failover, e.g. 0.1 seconds. Setting too low will result + in excessive link-checking on the backups. + </para> + </entry> + </row> + <row> + <entry> + <literal>link-heartbeat-interval <replaceable>SECONDS</replaceable></literal> + <footnoteref linkend="ha-seconds-spec"/> + </entry> + <entry> + <para> + HA uses federation links to connect from backup to primary. + If no heart-beat is received for twice this interval the primary will consider that + backup dead (e.g. if backup is hung or partitioned.) 
+ This interval is also used to time-out for broker status checks, + it may take up to this interval for rgmanager to detect a hung or partitioned broker. + Clients sending messages may be held up during this time. + Default 120 seconds: you will probably want to set this to a lower value e.g. 10. + If set too low rgmanager may consider a slow broker to have failed and kill it. + </para> + </entry> + </row> + </tbody> + </tgroup> + </table> + <para> + To configure a HA cluster you must set at least <literal>ha-cluster</literal> and + <literal>ha-brokers-url</literal>. + </para> + </section> + + <section id="ha-rm"> + <title>The Cluster Resource Manager</title> + <para> + Broker fail-over is managed by a <firstterm>cluster resource + manager</firstterm>. An integration with <ulink + url="https://fedorahosted.org/cluster/wiki/RGManager">rgmanager</ulink> is + provided, but it is possible to integrate with other resource managers. + </para> + <para> + The resource manager is responsible for starting the <command>qpidd</command> broker + on each node in the cluster. The resource manager then <firstterm>promotes</firstterm> + one of the brokers to be the primary. The other brokers connect to the primary as + backups, using the URL provided in the <literal>ha-brokers-url</literal> configuration + option. + </para> + <para> + Once connected, the backup brokers synchronize their state with the + primary. When a backup is synchronized, or "hot", it is ready to take + over if the primary fails. Backup brokers continually receive updates + from the primary in order to stay synchronized. + </para> + <para> + If the primary fails, backup brokers go into fail-over mode. The resource + manager must detect the failure and promote one of the backups to be the + new primary. The other backups connect to the new primary and synchronize + their state with it. 
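The heartbeat rule described for <literal>link-heartbeat-interval</literal> above (a peer is presumed dead once no heartbeat has arrived for twice the interval) can be sketched as a toy failure detector. This illustrates the timing rule only; it is not qpidd's implementation.

```python
# Toy failure detector implementing the "twice the heartbeat interval"
# rule described for link-heartbeat-interval (illustration, not qpidd code).
class HeartbeatMonitor:
    def __init__(self, interval):
        self.interval = interval   # heartbeat interval in seconds
        self.last_seen = {}        # peer -> timestamp of last heartbeat

    def heartbeat(self, peer, now):
        self.last_seen[peer] = now

    def dead_peers(self, now):
        # A peer is presumed dead after 2 * interval without a heartbeat.
        return sorted(p for p, t in self.last_seen.items()
                      if now - t > 2 * self.interval)

m = HeartbeatMonitor(10.0)
m.heartbeat("backup1", 0.0)
m.heartbeat("backup2", 15.0)
print(m.dead_peers(25.0))   # backup1 last heard from 25s ago -> presumed dead
```

With a 10-second interval, backup1 (silent for 25 seconds) exceeds the 20-second cutoff while backup2 (silent for 10 seconds) does not.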
+ </para> + <para> + The resource manager is also responsible for protecting the cluster from + <firstterm>split-brain</firstterm> conditions resulting from a network partition. A + network partition divides a cluster into two sub-groups which cannot see each other. + Usually a <firstterm>quorum</firstterm> voting algorithm is used that disables nodes + in the inquorate sub-group. + </para> + </section> + + <section id="ha-rm-config"> + <title>Configuring with <command>rgmanager</command> as resource manager</title> + <para> + This section assumes that you are already familiar with setting up and configuring + clustered services using <command>cman</command> and + <command>rgmanager</command>. It will show you how to configure an active-passive, + hot-standby <command>qpidd</command> HA cluster with <command>rgmanager</command>. + </para> + <note> + <para> + Once all components are installed it is important to take the following step: + <programlisting> +chkconfig rgmanager on +chkconfig cman on +chkconfig qpidd <emphasis>off</emphasis> + </programlisting> + </para> + <para> + The qpidd service must be <emphasis>off</emphasis> in + <literal>chkconfig</literal> because <literal>rgmanager</literal> will + start and stop <literal>qpidd</literal>. If the normal system init + process also attempts to start and stop qpidd, it can cause rgmanager to + lose track of qpidd processes. The symptom when this happens is that + <literal>clustat</literal> shows a <literal>qpidd</literal> service to + be stopped when in fact there is a <literal>qpidd</literal> process + running. The <literal>qpidd</literal> log will show errors like this: + <programlisting> +critical Unexpected error: Daemon startup failed: Cannot lock /var/lib/qpidd/lock: Resource temporarily unavailable + </programlisting> + </para> + </note> + <para> + You must provide a <literal>cluster.conf</literal> file to configure + <command>cman</command> and <command>rgmanager</command>.
Here is + an example <literal>cluster.conf</literal> file for a cluster of 3 nodes named + node1, node2 and node3. We will go through the configuration step-by-step. + </para> + <programlisting> + <![CDATA[ +<?xml version="1.0"?> +<!-- +This is an example of a cluster.conf file to run qpidd HA under rgmanager. +This example assumes a 3 node cluster, with nodes named node1, node2 and node3. + +NOTE: fencing is not shown, you must configure fencing appropriately for your cluster. +--> + +<cluster name="qpid-test" config_version="18"> + <!-- The cluster has 3 nodes. Each has a unique nodeid and one vote + for quorum. --> + <clusternodes> + <clusternode name="node1.example.com" nodeid="1"/> + <clusternode name="node2.example.com" nodeid="2"/> + <clusternode name="node3.example.com" nodeid="3"/> + </clusternodes> + + <!-- + Resource Manager configuration. + status_poll_interval is the interval in seconds that the resource manager checks the status + of managed services. This affects how quickly the manager will detect failed services. + --> + <rm status_poll_interval="1"> + <!-- + There is a failoverdomain for each node containing just that node. + This lets us stipulate that the qpidd service should always run on each node. + --> + <failoverdomains> + <failoverdomain name="node1-domain" restricted="1"> + <failoverdomainnode name="node1.example.com"/> + </failoverdomain> + <failoverdomain name="node2-domain" restricted="1"> + <failoverdomainnode name="node2.example.com"/> + </failoverdomain> + <failoverdomain name="node3-domain" restricted="1"> + <failoverdomainnode name="node3.example.com"/> + </failoverdomain> + </failoverdomains> + + <resources> + <!-- This script starts a qpidd broker acting as a backup. --> + <script file="/etc/init.d/qpidd" name="qpidd"/> + + <!-- This script promotes the qpidd broker on this node to primary. --> + <script file="/etc/init.d/qpidd-primary" name="qpidd-primary"/> + + <!-- + This is a virtual IP address for client traffic.
+ monitor_link="yes" means monitor the health of the NIC used for the VIP. + sleeptime="0" means don't delay when failing over the VIP to a new address. + --> + <ip address="20.0.20.200" monitor_link="yes" sleeptime="0"/> + </resources> + + <!-- There is a qpidd service on each node, it should be restarted if it fails. --> + <service name="node1-qpidd-service" domain="node1-domain" recovery="restart"> + <script ref="qpidd"/> + </service> + <service name="node2-qpidd-service" domain="node2-domain" recovery="restart"> + <script ref="qpidd"/> + </service> + <service name="node3-qpidd-service" domain="node3-domain" recovery="restart"> + <script ref="qpidd"/> + </service> + + <!-- There should always be a single qpidd-primary service, it can run on any node. --> + <service name="qpidd-primary-service" autostart="1" exclusive="0" recovery="relocate"> + <script ref="qpidd-primary"/> + <!-- The primary has the IP addresses for brokers and clients to connect. --> + <ip ref="20.0.20.200"/> + </service> + </rm> +</cluster> + ]]> + </programlisting> + + <para> + There is a <literal>failoverdomain</literal> for each node containing just that + one node. This lets us stipulate that the qpidd service should always run on all + nodes. + </para> + <para> + The <literal>resources</literal> section defines the <command>qpidd</command> + script used to start the <command>qpidd</command> service. It also defines the + <command>qpid-primary</command> script which does not + actually start a new service, rather it promotes the existing + <command>qpidd</command> broker to primary status. + </para> + <para> + The <literal>resources</literal> section also defines a virtual IP + address for clients: <literal>20.0.20.200</literal>. 
+ </para> + <para> + <filename>qpidd.conf</filename> should contain these lines: + </para> + <programlisting> +ha-cluster=yes +ha-brokers-url=20.0.20.1,20.0.20.2,20.0.20.3 + </programlisting> + <para> + The brokers connect to each other directly via the addresses + listed in <command>ha-brokers-url</command>. Note the client and broker + addresses are on separate sub-nets, this is recommended but not required. + </para> + <para> + The <literal>service</literal> section defines 3 <literal>qpidd</literal> + services, one for each node. Each service is in a restricted fail-over + domain containing just that node, and has the <literal>restart</literal> + recovery policy. The effect of this is that rgmanager will run + <command>qpidd</command> on each node, restarting if it fails. + </para> + <para> + There is a single <literal>qpidd-primary-service</literal> using the + <command>qpidd-primary</command> script which is not restricted to a + domain and has the <literal>relocate</literal> recovery policy. This means + rgmanager will start <command>qpidd-primary</command> on one of the nodes + when the cluster starts and will relocate it to another node if the + original node fails. Running the <literal>qpidd-primary</literal> script + does not start a new broker process, it promotes the existing broker to + become the primary. + </para> + + <section id="ha-rm-shutdown-node"> + <title>Shutting down qpidd on a HA node</title> + <para> + As explained above both the per-node <literal>qpidd</literal> service + and the re-locatable <literal>qpidd-primary</literal> service are + implemented by the same <literal>qpidd</literal> daemon. + </para> + <para> + As a result, stopping the <literal>qpidd</literal> service will not stop + a <literal>qpidd</literal> daemon that is acting as primary, and + stopping the <literal>qpidd-primary</literal> service will not stop a + <literal>qpidd</literal> process that is acting as backup. 
+ </para> + <para> + To shut down a node that is acting as primary, you need to shut down the + <literal>qpidd</literal> service <emphasis>and</emphasis> relocate the + primary: + </para> + <para> + <programlisting> +clusvcadm -d somenode-qpidd-service +clusvcadm -r qpidd-primary-service + </programlisting> + </para> + <para> + This will shut down the <literal>qpidd</literal> daemon on that node and + prevent the primary service from relocating back to the node + because the qpidd service is no longer running there. + </para> + </section> + </section> + + <section id="ha-broker-admin"> + <title>Broker Administration Tools</title> + <para> + Normally, clients are not allowed to connect to a backup broker. However, + management tools are allowed to connect to backup brokers. If you use + these tools you <emphasis>must not</emphasis> add or remove messages from + replicated queues, nor create or delete replicated queues or exchanges, as + this will disrupt the replication process and may cause message loss. + </para> + <para> + <command>qpid-ha</command> allows you to view and change HA configuration settings. + </para> + <para> + The tools <command>qpid-config</command>, <command>qpid-route</command> and + <command>qpid-stat</command> will connect to a backup if you pass the flag <command>ha-admin</command> on the + command line. + </para> + </section> + + <section id="ha-replicate-values"> + <title>Controlling replication of queues and exchanges</title> + <para> + By default, queues and exchanges are not replicated automatically. You can change + the default behaviour by setting the <literal>ha-replicate</literal> configuration + option. It has one of the following values: + <itemizedlist> + <listitem> + <para> + <firstterm>all</firstterm>: Replicate everything automatically: queues, + exchanges, bindings and messages.
+ </para> + </listitem> + <listitem> + <para> + <firstterm>configuration</firstterm>: Replicate the existence of queues, + exchange and bindings but don't replicate messages. + </para> + </listitem> + <listitem> + <para> + <firstterm>none</firstterm>: Don't replicate anything, this is the default. + </para> + </listitem> + </itemizedlist> + </para> + <para> + You can over-ride the default for a particular queue or exchange by passing the + argument <literal>qpid.replicate</literal> when creating the queue or exchange. It + takes the same values as <literal>ha-replicate</literal> + </para> + <para> + Bindings are automatically replicated if the queue and exchange being bound both + have replication <literal>all</literal> or <literal>configuration</literal>, they + are not replicated otherwise. + </para> + <para> + You can create replicated queues and exchanges with the + <command>qpid-config</command> management tool like this: + </para> + <programlisting> +qpid-config add queue myqueue --replicate all + </programlisting> + <para> + To create replicated queues and exchanges via the client API, add a + <literal>node</literal> entry to the address like this: + </para> + <programlisting> +"myqueue;{create:always,node:{x-declare:{arguments:{'qpid.replicate':all}}}}" + </programlisting> + <para> + There are some built-in exchanges created automatically by the broker, these + exchanges are never replicated. The built-in exchanges are the default (nameless) + exchange, the AMQP standard exchanges (<literal>amq.direct, amq.topic, amq.fanout</literal> and + <literal>amq.match</literal>) and the management exchanges (<literal>qpid.management, qmf.default.direct</literal> and + <literal>qmf.default.topic</literal>) + </para> + <para> + Note that if you bind a replicated queue to one of these exchanges, the + binding will <emphasis>not</emphasis> be replicated, so the queue will not + have the binding after a fail-over. 
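The binding-replication rules above reduce to a small predicate: a binding is replicated only when both the queue and the exchange have replication level <literal>all</literal> or <literal>configuration</literal>, and never when the exchange is one of the built-in exchanges. A sketch, assuming the string level names used by <literal>ha-replicate</literal>:

```python
# Sketch of the binding-replication rule described above.
# Built-in exchanges (default/nameless, AMQP standard, management) are
# never replicated, so neither are bindings to them.
BUILT_IN_EXCHANGES = {
    "",                       # the default (nameless) exchange
    "amq.direct", "amq.topic", "amq.fanout", "amq.match",
    "qpid.management", "qmf.default.direct", "qmf.default.topic",
}

def binding_replicated(exchange_name, queue_level, exchange_level):
    if exchange_name in BUILT_IN_EXCHANGES:
        return False
    replicating = {"all", "configuration"}
    return queue_level in replicating and exchange_level in replicating

print(binding_replicated("myexchange", "all", "configuration"))  # → True
print(binding_replicated("amq.topic", "all", "all"))             # → False
```

The second call shows the fail-over caveat from the text: even a fully replicated queue bound to <literal>amq.topic</literal> loses that binding after fail-over.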
+ </para> + </section> + + <section id="ha-failover"> + <title>Client Connection and Fail-over</title> + <para> + Clients can only connect to the primary broker. Backup brokers reject any + connection attempt by a client. Clients rejected by a backup broker will + automatically fail-over until they connect to the primary. + </para> + <para> + Clients are configured with the URL for the cluster (details below for + each type of client). There are two possibilities + <itemizedlist> + <listitem> + <para> + The URL contains multiple addresses, one for each broker in the cluster. + </para> + </listitem> + <listitem> + <para> + The URL contains a single <firstterm>virtual IP address</firstterm> + that is assigned to the primary broker by the resource manager. + This is the recommended configuration. + </para> + </listitem> + </itemizedlist> + In the first case, clients will repeatedly re-try each address in the URL + until they successfully connect to the primary. In the second case the + resource manager will assign the virtual IP address to the primary broker, + so clients only need to re-try on a single address. + </para> + <para> + When the primary broker fails, clients re-try all known cluster addresses + until they connect to the new primary. The client re-sends any messages + that were previously sent but not acknowledged by the broker at the time + of the failure. Similarly messages that have been sent by the broker, but + not acknowledged by the client, are re-queued. + </para> + <para> + TCP can be slow to detect connection failures. A client can configure a + connection to use a <firstterm>heartbeat</firstterm> to detect connection + failure, and can specify a time interval for the heartbeat. If heartbeats + are in use, failures will be detected no later than twice the heartbeat + interval. The following sections explain how to enable heartbeat in each + client. 
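The client re-try behaviour described above can be sketched as a loop that cycles through the known cluster addresses until one accepts the connection. The `connect_fn` callback is a stand-in for a real connection attempt; since backups reject clients, only the primary accepts.

```python
# Sketch of client fail-over: try each known address in turn until the
# primary accepts. connect_fn is a hypothetical stand-in for a real
# connection attempt (it returns True only for the current primary).
def connect_to_primary(addresses, connect_fn, max_cycles=10):
    for _ in range(max_cycles):
        for addr in addresses:
            if connect_fn(addr):    # backups reject; only the primary accepts
                return addr
    raise ConnectionError("no primary found")

# Simulate a cluster where node2 is currently the primary.
primary = "node2"
print(connect_to_primary(["node1", "node2", "node3"],
                         lambda a: a == primary))
```

With a virtual IP address the list collapses to a single entry, which is why that configuration is recommended.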
+ </para> + <para> + Note: the following sections explain how to configure clients with + multiple addresses, but if you are using a virtual IP address you only need + to configure that one address for clients; you don't need to list all the + addresses. + </para> + <para> + Suppose your cluster has 3 nodes: <literal>node1</literal>, + <literal>node2</literal> and <literal>node3</literal> all using the + default AMQP port, and you are not using a virtual IP address. To connect + a client you need to specify the address(es) and set the + <literal>reconnect</literal> property to <literal>true</literal>. The + following sub-sections show how to connect each type of client. + </para> + <section id="ha-clients"> + <title>C++ clients</title> + <para> + With the C++ client, you specify multiple cluster addresses in a single URL + <footnote> + <para> + The full grammar for the URL is: + </para> + <programlisting> +url = ["amqp:"][ user ["/" password] "@" ] addr ("," addr)* +addr = tcp_addr / rdma_addr / ssl_addr / ... +tcp_addr = ["tcp:"] host [":" port] +rdma_addr = "rdma:" host [":" port] +ssl_addr = "ssl:" host [":" port] + </programlisting> + </footnote>. + You also need to specify the connection option + <literal>reconnect</literal> to be true. For example: + </para> + <programlisting> +qpid::messaging::Connection c("node1,node2,node3","{reconnect:true}"); + </programlisting> + <para> + Heartbeats are disabled by default. You can enable them by specifying a + heartbeat interval (in seconds) for the connection via the + <literal>heartbeat</literal> option.
For example: + </para> + <programlisting> +qpid::messaging::Connection c("node1,node2,node3","{reconnect:true,heartbeat:10}"); + </programlisting> + </section> + <section id="ha-python-client"> + <title>Python clients</title> + <para> + With the python client, you specify <literal>reconnect=True</literal> + and a list of <replaceable>host:port</replaceable> addresses as + <literal>reconnect_urls</literal> when calling + <literal>Connection.establish</literal> or + <literal>Connection.open</literal> + </para> + <programlisting> +connection = qpid.messaging.Connection.establish("node1", reconnect=True, reconnect_urls=["node1", "node2", "node3"]) + </programlisting> + <para> + Heartbeats are disabled by default. You can + enable them by specifying a heartbeat interval (in seconds) for the + connection via the 'heartbeat' option. For example: + </para> + <programlisting> +connection = qpid.messaging.Connection.establish("node1", reconnect=True, reconnect_urls=["node1", "node2", "node3"], heartbeat=10) + </programlisting> + </section> + <section id="ha-jms-client"> + <title>Java JMS Clients</title> + <para> + In Java JMS clients, client fail-over is handled automatically if it is + enabled in the connection. You can configure a connection to use + fail-over using the <command>failover</command> property: + </para> + + <screen> + connectionfactory.qpidConnectionfactory = amqp://guest:guest@clientid/test?brokerlist='tcp://localhost:5672'&failover='failover_exchange' + </screen> + <para> + This property can take three values: + </para> + <variablelist> + <title>Fail-over Modes</title> + <varlistentry> + <term>failover_exchange</term> + <listitem> + <para> + If the connection fails, fail over to any other broker in the cluster. + </para> + + </listitem> + + </varlistentry> + <varlistentry> + <term>roundrobin</term> + <listitem> + <para> + If the connection fails, fail over to one of the brokers specified in the <command>brokerlist</command>. 
+ </para>
+
+ </listitem>
+
+ </varlistentry>
+ <varlistentry>
+ <term>singlebroker</term>
+ <listitem>
+ <para>
+ Fail-over is not supported; the connection is to a single broker only.
+ </para>
+
+ </listitem>
+
+ </varlistentry>
+
+ </variablelist>
+ <para>
+ In a Connection URL, heartbeat is set using the <command>heartbeat</command> property, which is an integer corresponding to the heartbeat period in seconds. For instance, the following line from a JNDI properties file sets the heartbeat timeout to 3 seconds:
+ </para>
+
+ <screen>
+ connectionfactory.qpidConnectionfactory = amqp://guest:guest@clientid/test?brokerlist='tcp://localhost:5672'&amp;heartbeat='3'
+ </screen>
+ </section>
+ </section>
+
+ <section id="ha-security">
+ <title>Security and Access Control</title>
+ <para>
+ This section outlines the HA-specific aspects of security configuration.
+ Please see <xref linkend="chap-Messaging_User_Guide-Security"/> for
+ more details on enabling authentication and setting up Access Control Lists.
+ </para>
+ <note>
+ <para>
+ Unless you disable authentication with <literal>auth=no</literal> in
+ your configuration, you <emphasis>must</emphasis> set the options below
+ and you <emphasis>must</emphasis> have an ACL file with at least the
+ entry described below.
+ </para>
+ <para>
+ Backups will be <emphasis>unable to connect to the primary</emphasis> if
+ the security configuration is incorrect.
See also <xref
+ linkend="ha-troubleshoot-security"/>.
+ </para>
+ </note>
+ <para>
+ When authentication is enabled, you must set the credentials used by HA
+ brokers with the following options:
+ </para>
+ <table frame="all" id="ha-security-options">
+ <title>HA Security Options</title>
+ <tgroup align="left" cols="2" colsep="1" rowsep="1">
+ <colspec colname="c1"/>
+ <colspec colname="c2"/>
+ <thead>
+ <row>
+ <entry align="center" nameend="c2" namest="c1">
+ HA Security Options
+ </entry>
+ </row>
+ </thead>
+ <tbody>
+ <row>
+ <entry><para><literal>ha-username</literal> <replaceable>USER</replaceable></para></entry>
+ <entry><para>User name for HA brokers. Note this must <emphasis>not</emphasis> include the <literal>@QPID</literal> suffix.</para></entry>
+ </row>
+ <row>
+ <entry><para><literal>ha-password</literal> <replaceable>PASS</replaceable></para></entry>
+ <entry><para>Password for HA brokers.</para></entry>
+ </row>
+ <row>
+ <entry><para><literal>ha-mechanism</literal> <replaceable>MECHANISM</replaceable></para></entry>
+ <entry>
+ <para>
+ Mechanism for HA brokers. Any mechanism you enable for
+ broker-to-broker communication can also be used by a client, so
+ do not use <literal>ha-mechanism=ANONYMOUS</literal> in a secure environment.
+ </para>
+ </entry>
+ </row>
+ </tbody>
+ </tgroup>
+ </table>
+ <para>
+ This identity is used to authorize federation links from backup to
+ primary. It is also used to authorize actions on the backup to replicate
+ primary state, for example creating queues and exchanges.
+ </para>
+ <para>
+ When authorization is enabled, you must have an Access Control List with the
+ following rule to allow HA replication to function.
+ Suppose
+ <literal>ha-username</literal>=<replaceable>USER</replaceable>:
+ </para>
+ <programlisting>
+acl allow <replaceable>USER</replaceable>@QPID all all
+ </programlisting>
+ </section>
+
+ <section id="ha-other-rm">
+ <title>Integrating with other Cluster Resource Managers</title>
+ <para>
+ To integrate with a different resource manager you must configure it to:
+ <itemizedlist>
+ <listitem><para>Start a qpidd process on each node of the cluster.</para></listitem>
+ <listitem><para>Restart qpidd if it crashes.</para></listitem>
+ <listitem><para>Promote exactly one of the brokers to primary.</para></listitem>
+ <listitem><para>Detect a failure and promote a new primary.</para></listitem>
+ </itemizedlist>
+ </para>
+ <para>
+ The <command>qpid-ha</command> command allows you to check whether a broker is
+ the primary, and to promote a backup to primary.
+ </para>
+ <para>
+ To test whether a broker is the primary:
+ </para>
+ <programlisting>qpid-ha -b <replaceable>broker-address</replaceable> status --expect=primary</programlisting>
+ <para>
+ This will return 0 if the broker at <replaceable>broker-address</replaceable> is the primary,
+ non-zero otherwise.
+ </para>
+ <para>
+ To promote a broker to primary:
+ <programlisting>qpid-ha --cluster-manager -b <replaceable>broker-address</replaceable> promote</programlisting>
+ </para>
+ <para>
+ Note that <literal>promote</literal> is considered a "cluster manager
+ only" command. Incorrect use of <literal>promote</literal> outside of the
+ cluster manager could create a cluster with multiple primaries. Such a
+ cluster will malfunction and lose data. "Cluster manager only" commands
+ are not accessible in <command>qpid-ha</command> without the
+ <literal>--cluster-manager</literal> option.
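A resource manager's monitoring agent typically drives these commands through their exit codes. The following is a minimal sketch of such a check in Python; `is_primary` is a hypothetical helper (not part of any Qpid tool), and the `runner` parameter exists only so the command invocation can be stubbed out in tests.

```python
import subprocess

def is_primary(broker, runner=subprocess.call):
    # qpid-ha exits 0 when the broker's state matches --expect,
    # non-zero otherwise; the resource manager only needs the exit code.
    cmd = ["qpid-ha", "-b", broker, "status", "--expect=primary"]
    return runner(cmd) == 0
```

A health-check script would call `is_primary("node1:5672")` and report the service healthy only when it returns `True`.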
+ </para>
+ <para>
+ To list the full set of commands, use:
+ </para>
+ <programlisting>
+qpid-ha --cluster-manager --help
+ </programlisting>
+ </section>
+
+ <section id="ha-store">
+ <title>Using a message store in a cluster</title>
+ <para>
+ If you use a persistent store for your messages then each broker in a
+ cluster will have its own store. If the entire cluster fails and is
+ restarted, the <emphasis>first</emphasis> broker that becomes primary will recover from its
+ store. All the other brokers will clear their stores and get an update
+ from the primary to ensure consistency.
+ </para>
+ </section>
+
+ <section id="ha-troubleshoot">
+ <title>Troubleshooting a cluster</title>
+ <para>
+ This section applies to clusters that are using rgmanager as the
+ cluster manager.
+ </para>
+ <section id="ha-troubleshoot-no-primary">
+ <title>No primary broker</title>
+ <para>
+ When you initially start an HA cluster, all brokers are in
+ <literal>joining</literal> mode. The brokers do not automatically select
+ a primary; they rely on the cluster manager <literal>rgmanager</literal>
+ to do so. If <literal>rgmanager</literal> is not running or is not
+ configured correctly, brokers will remain in the
+ <literal>joining</literal> state. See <xref linkend="ha-rm-config"/>.
+ </para>
+ </section>
+ <section id="ha-troubleshoot-security">
+ <title>Authentication and ACL failures</title>
+ <para>
+ If a broker is unable to establish a connection to another broker in the
+ cluster due to authentication or ACL problems, the logs may contain
+ errors like the following:
+ <programlisting>
+info SASL: Authentication failed: SASL(-13): user not found: Password verification failed
+ </programlisting>
+ <programlisting>
+warning Client closed connection with 320: User anonymous@QPID federation connection denied. Systems with authentication enabled must specify ACL create link rules.
+ </programlisting>
+ <programlisting>
+warning Client closed connection with 320: ACL denied anonymous@QPID creating a federation link.
+ </programlisting>
+ </para>
+ <para>
+ Set the HA security configuration and ACL file as described in <xref
+ linkend="ha-security"/>. Once the cluster is running and the primary is
+ promoted, run:
+ <programlisting>qpid-ha status --all</programlisting>
+ to make sure that the brokers are running as one cluster.
+ </para>
+ </section>
+ <section id="ha-troubleshoot-slow-recovery">
+ <title>Slow recovery times</title>
+ <para>
+ The following configuration settings affect recovery time. The
+ values shown are examples that give fast recovery on a lightly
+ loaded system. You should run tests to determine if the values are
+ appropriate for your system and load conditions.
+ </para>
+ <section id="ha-troubleshoot-cluster.conf">
+ <title>cluster.conf</title>
+ <programlisting>
+&lt;rm status_poll_interval="1"&gt;
+ </programlisting>
+ <para>
+ status_poll_interval is the interval in seconds at which the
+ resource manager checks the status of managed services. This
+ affects how quickly the manager will detect failed services.
+ </para>
+ <programlisting>
+&lt;ip address="20.0.20.200" monitor_link="yes" sleeptime="0"/&gt;
+ </programlisting>
+ <para>
+ This is a virtual IP address for client traffic.
+ monitor_link="yes" means monitor the health of the network interface
+ used for the VIP. sleeptime="0" means don't delay when
+ failing over the VIP to a new address.
+ </para>
+ </section>
+ <section id="ha-troubleshoot-qpidd.conf">
+ <title>qpidd.conf</title>
+ <programlisting>
+link-maintenance-interval=0.1
+ </programlisting>
+ <para>
+ The interval at which backup brokers check the link to the primary
+ and re-connect if need be. The default is 2 seconds. It can be set
+ lower for faster fail-over, but setting it too low will result in
+ excessive link-checking activity on the broker.
+ </para>
+ <programlisting>
+link-heartbeat-interval=5
+ </programlisting>
+ <para>
+ Heartbeat interval for federation links. The HA cluster uses
+ federation links between the primary and each backup. The
+ primary can take up to twice the heartbeat interval to detect a
+ failed backup. When a sender sends a message, the primary waits
+ for all backups to acknowledge before acknowledging to the
+ sender. A disconnected backup may cause the primary to block
+ senders until it is detected via heartbeat.
+ </para>
+ <para>
+ This interval is also used as the timeout for broker status
+ checks by rgmanager. It may take up to this interval for
+ rgmanager to detect a hung broker.
+ </para>
+ <para>
+ The default of 120 seconds is very high; you will probably want
+ to set this to a lower value. If set too low, under network
+ congestion or heavy load, a slow-to-respond broker may be
+ re-started by rgmanager.
+ </para>
+ </section>
+ </section>
+ <section id="ha-troubleshoot-total-cluster-failure">
+ <title>Total cluster failure</title>
+ <para>
+ Note: for definitions of the broker states <firstterm>joining</firstterm>,
+ <firstterm>catch-up</firstterm>, <firstterm>ready</firstterm>,
+ <firstterm>recovering</firstterm> and <firstterm>active</firstterm> see
+ <xref linkend="ha-broker-states"/>.
+ </para>
+ <para>
+ The cluster can only guarantee availability as long as there is at
+ least one active primary broker or ready backup broker left alive.
+ If all the brokers fail simultaneously, the cluster will fail and
+ non-persistent data will be lost.
+ </para>
+ <para>
+ While there is an active primary broker, clients can get service.
+ If the active primary fails, one of the "ready" backup
+ brokers will take over, recover and become active.
Note that a backup
+ can only be promoted to primary if it is in the "ready"
+ state (with the exception of the first primary in a new cluster,
+ where all brokers are in the "joining" state).
+ </para>
+ <para>
+ Given a stable cluster of N brokers with one active primary and
+ N-1 ready backups, the system can sustain up to N-1 failures in
+ rapid succession. The surviving broker will be promoted to active
+ and continue to give service.
+ </para>
+ <para>
+ However, at this point the system <emphasis>cannot</emphasis>
+ sustain a failure of the surviving broker until at least one of
+ the other brokers recovers, catches up and becomes a ready backup.
+ If the surviving broker fails before that, the cluster will fail in
+ one of two modes (depending on the exact timing of failures).
+ </para>
+ <section id="ha-troubleshoot-the-cluster-hangs">
+ <title>1. The cluster hangs</title>
+ <para>
+ All brokers are in joining or catch-up mode. rgmanager tries to
+ promote a new primary but cannot find any candidates and so
+ gives up.
<command>clustat</command> will show that the qpidd services are running
+ but the qpidd-primary service has stopped, something like
+ this:
+ </para>
+ <programlisting>
+Service Name                  Owner (Last)  State
+------- ----                  ----- ------  -----
+service:mrg33-qpidd-service   20.0.10.33    started
+service:mrg34-qpidd-service   20.0.10.34    started
+service:mrg35-qpidd-service   20.0.10.35    started
+service:qpidd-primary-service (20.0.10.33)  stopped
+ </programlisting>
+ <para>
+ Eventually all brokers become stuck in "joining" mode,
+ as shown by: <literal>qpid-ha status --all</literal>
+ </para>
+ <para>
+ At this point you need to restart the cluster in one of the
+ following ways:
+ <orderedlist>
+ <listitem><para>
+ Restart the entire cluster:
+ In <literal>luci:<replaceable>your-cluster</replaceable>:Nodes</literal>
+ click reboot to restart the entire cluster.
+ </para></listitem>
+ <listitem><para>
+ Stop and restart the cluster with
+ <literal>ccs --stopall; ccs --startall</literal>
+ </para></listitem>
+ <listitem><para>
+ Restart just the Qpid services: In <literal>luci:<replaceable>your-cluster</replaceable>:Service Groups</literal>
+ <orderedlist>
+ <listitem><para>Select all the qpidd (not qpidd-primary) services, click restart</para></listitem>
+ <listitem><para>Select the qpidd-primary service, click restart</para></listitem>
+ </orderedlist>
+ </para></listitem>
+ <listitem><para>
+ Stop the <literal>qpidd-primary</literal> and
+ <literal>qpidd</literal> services with <literal>clusvcadm</literal>,
+ then restart (qpidd-primary last).
+ </para></listitem>
+ </orderedlist>
+ </para>
+ </section>
+ <section id="ha-troubleshoot-the-cluster-reboots">
+ <title>2. The cluster reboots</title>
+ <para>
+ A new primary is promoted and the cluster is functional, but all
+ non-persistent data from before the failure is lost.
+ </para>
+ </section>
+ </section>
+ <section id="ha-troubleshoot-fencing-and-network-partitions">
+ <title>Fencing and network partitions</title>
+ <para>
+ A network partition is a network failure that divides the
+ cluster into two or more sub-clusters, where each broker can
+ communicate with brokers in its own sub-cluster but not with
+ brokers in other sub-clusters. This condition is also referred to
+ as a "split brain".
+ </para>
+ <para>
+ Nodes in one sub-cluster can't tell whether nodes in other
+ sub-clusters are dead or are still running but disconnected. We
+ cannot allow each sub-cluster to independently declare its own
+ qpidd primary and start serving clients, as the cluster will
+ become inconsistent. We must ensure only one sub-cluster continues
+ to provide service.
+ </para>
+ <para>
+ A <emphasis>quorum</emphasis> determines which sub-cluster
+ continues to operate, and <emphasis>power fencing</emphasis>
+ ensures that nodes in non-quorate sub-clusters cannot attempt to
+ provide service inconsistently. For more information see:
+ </para>
+ <para>
+ https://access.redhat.com/site/documentation/en-US/Red_Hat_Enterprise_Linux/6/html-single/High_Availability_Add-On_Overview/index.html,
+ chapters 2 (Quorum) and 4 (Fencing).
+ </para>
+ </section>
+ </section>
+</section>
