| Commit message | Author | Age | Files | Lines |
| |
References #500.
[#116521809]
|
While here, be more informative when a node rename fails.
|
This helps in CI where the connection tracking tables may not yet
be up-to-date at the time the testcase verifies them.
References #500.
[#116521809]
|
The connection tracking tables are not replicated because the table
tracking connections on node A logically exists only on node A.
The backup made during the rename of a node failed because it tried to
access a remote offline node to back up its connection tracking tables,
which cannot work. The solution is to not back up those tables.
This is correct because they are only relevant while the node is
running.
References #500.
[#116521809]
|
When the node is offline (`--offline` is set on the command line),
we must skip the emission of the `node_deleted` event because the
rabbit_event event handler is not set up (RabbitMQ is stopped).
Otherwise, the command fails with `badarg` in gen_event.
References #500.
[#116521809]
|
* Also solves deadlocks when leader aborts autoheal in node down
|
Use new rabbitmqctl features for monitoring
|
This will stop wasting network bandwidth for monitoring.
E.g. a 200-node OpenStack installation produces around 10k queues and
10k channels. Doing a single list_queues/list_channels in a cluster in this
environment results in 27k TCP packets and around 12 megabytes of
network traffic. Given that these calls happen ~10 times a minute with 3
controllers, it results in pretty significant overhead.
To enable those features you should have a RabbitMQ build containing the
following patches:
- https://github.com/rabbitmq/rabbitmq-server/pull/883
- https://github.com/rabbitmq/rabbitmq-server/pull/911
- https://github.com/rabbitmq/rabbitmq-server/pull/915
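For a rough sense of scale, the figures quoted above can be multiplied out. This is only a back-of-the-envelope sketch: it assumes that each of the 3 controllers issues its own ~10 calls per minute (the message does not say whether the rate is per controller or in total) and that every call costs the quoted 12 MB and 27k packets. The function and variable names are illustrative only.

```python
# Rough estimate of monitoring traffic from the figures in the message above.
MB_PER_CALL = 12            # "around 12 megabytes" per list_queues/list_channels
PACKETS_PER_CALL = 27_000   # "27k TCP packets" per call
CALLS_PER_MINUTE = 10       # "~10 times a minute"
CONTROLLERS = 3             # ASSUMPTION: the rate applies to each controller


def monitoring_overhead(mb_per_call, packets_per_call, calls_per_minute, controllers):
    """Return per-minute traffic and the equivalent sustained bit rate."""
    calls = calls_per_minute * controllers
    mb_per_minute = mb_per_call * calls
    return {
        "mb_per_minute": mb_per_minute,
        "packets_per_minute": packets_per_call * calls,
        # 1 byte = 8 bits, 60 seconds per minute
        "mbit_per_second": mb_per_minute * 8 / 60,
    }


overhead = monitoring_overhead(
    MB_PER_CALL, PACKETS_PER_CALL, CALLS_PER_MINUTE, CONTROLLERS
)
print(overhead)
```

If the ~10 calls per minute are instead the total across all controllers, every figure is three times smaller.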
|
[OCF HA] Change master score computation & split-brain detection logic
|
The previous split-brain logic worked as follows: each slave checked
that it was connected to the master. If the check failed, the slave
restarted. The fundamental flaw in that logic is that there is little
guarantee that the master is alive at that moment. Moreover, if the
master dies, it is very probable that during the next monitor check the
slaves will detect its death and restart, causing complete RabbitMQ
cluster downtime.
With the new approach the master node checks that the slaves are
connected to it and orders them to restart if they are not. The check is
performed after the master node's health check, meaning that at least
that node survives. Also, orders expire in one minute, and a freshly
started node ignores orders to restart for three minutes to give the
cluster time to stabilize.
Also corrected a problem that occurred when a node starts and is already
clustered. In that case the OCF script forgot to start the RabbitMQ app,
causing a subsequent restart. Now we ensure that the RabbitMQ app is
running.
The two newly introduced attributes, rabbit-start-phase-1-time and
rabbit-ordered-to-restart, are made private. To allow the master to set
a node's order to restart, the signatures of both
ocf_update_private_attr and ocf_get_private_attr are extended to accept
a node name.
Finally, a bug is fixed in ocf_get_private_attr. Unlike crm_attribute,
attrd_updater returns an empty string instead of "(null)" when an
attribute is not defined on the requested node but is defined on some
other node. The code now expects an empty string rather than "(null)".
This fixes the Fuel bugs
https://bugs.launchpad.net/fuel/+bug/1559136
https://bugs.launchpad.net/mos/+bug/1561894
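The restart-order rules described above (the master orders disconnected slaves to restart, orders expire after one minute, and a freshly started node ignores orders for three minutes) can be sketched roughly as follows. This is not the OCF script itself, which is a shell script working through Pacemaker attributes. The function names, the timestamp arguments, and the exact boundary behaviour are assumptions made for illustration.

```python
ORDER_TTL = 60        # seconds; "orders expire in one minute"
STARTUP_GRACE = 180   # seconds; fresh nodes ignore orders "for three minutes"


def nodes_to_order(slaves, connected_to_master):
    """Master side: slaves that are not connected to the master (hypothetical helper)."""
    return sorted(set(slaves) - set(connected_to_master))


def should_restart(now, order_time, start_time):
    """Slave side: decide whether a restart order must be obeyed.

    order_time is when the master issued the order (None if there is none);
    start_time is when this node started. Both are seconds since some epoch.
    """
    if order_time is None:
        return False                      # nobody ordered a restart
    if now - order_time > ORDER_TTL:
        return False                      # the order has expired
    if now - start_time < STARTUP_GRACE:
        return False                      # node is still in its grace period
    return True


print(nodes_to_order(["node-2", "node-3"], ["node-2"]))
print(should_restart(now=1000, order_time=990, start_time=0))
```

Because the master only issues orders after passing its own health check, the node deciding who restarts is known to be alive, which is the property the old slave-side check lacked.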
|
Right now we assign 1000 to the oldest nodes and 1 to the others. That
creates a problem when the master restarts and no node is promoted until
that node comes back. In that case the returning node will have a score
of 1, like all other slaves, and Pacemaker will choose to promote it
again. That node is clean and empty, and afterwards the other slaves
join it, wiping their data as well. As a result, we lose all the
messages.
The new algorithm actually ranks nodes rather than just selecting the
oldest one. It also maintains the invariant that if node A started later
than node B, then node A's score must be smaller than that of
node B. As a result, a freshly started node has no chance of being
selected in preference to an older node. If several nodes start
simultaneously, an older node among them might temporarily receive
a lower score than a younger one, but that is negligible.
Also remove any action on demote or demote notification: all of
these duplicate actions done in stop or stop notification. With these
removed, changing the master on a running cluster does not affect the
RabbitMQ cluster in any way: we just declare another node master and
that is it. This is important for the current change because the master
score might change after the initial cluster start-up, causing master
migration from one node to another.
This fix is a prerequisite for the fix to Fuel bugs
https://bugs.launchpad.net/fuel/+bug/1559136
https://bugs.launchpad.net/mos/+bug/1561894
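The ranking invariant above (a node that started later always gets a smaller score) can be illustrated with a minimal sketch. It assumes that all start times are known in one place, whereas the real OCF script computes scores per node through Pacemaker. The function name, the top score of 1000, and the tie-break by node name are assumptions for illustration only.

```python
def master_scores(start_times, top_score=1000):
    """Rank nodes by start time: the oldest node gets the highest score.

    start_times maps node name -> start timestamp (smaller means older).
    Ties are broken by node name so the result is deterministic.
    """
    ordered = sorted(start_times, key=lambda node: (start_times[node], node))
    return {node: top_score - rank for rank, node in enumerate(ordered)}


start_times = {"node-1": 100, "node-2": 250, "node-3": 50}
scores = master_scores(start_times)
print(scores)
```

With the old scheme a restarted master and every other slave would all score 1. Here the restarted node has the latest start time, so it ranks last and cannot be promoted ahead of a node that still holds data.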
|
https://github.com/Ayanda-D/rabbitmq-server into Ayanda-D-rabbitmq-server-914
|
dead pids from queue
|
from the gm process' neighbours
|
[OCF HA] Add ocf_get_private_attr function to RabbitMQ OCF script
|
The function is extracted from check_timeouts to be re-used later
in other parts of the script. Also, switch check_timeouts to use the
existing ocf_update_private_attr function.
|
Fix bashisms in rabbitmq OCF RA
|
Change "printf %b" so that it passes checkbashisms.
Signed-off-by: Bogdan Dobrelya <bdobrelia@mirantis.com>
|
Conflicts:
src/rabbit_control_main.erl
|
Discard any unexpected messages, such as late replies from gen_server
|
https://github.com/binarin/rabbitmq-server into binarin-rabbitmq-server-health-check-node-monitor
|
Tests + comment outlining the problem. The check itself is in separate
commit to `rabbitmq-common`.
|
Update iptables calls with --wait
|
If iptables is currently being called outside of the OCF script, the
iptables call will fail because it cannot get a lock. This change
updates the iptables call to include the -w (--wait) flag, which waits
until the lock can be acquired instead of exiting with an error.
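The effect of the change can be sketched as a small helper that adds the wait flag to an iptables command line. This only illustrates the idea: the actual OCF script is a shell script, and the helper name and the example rule (rejecting TCP port 5672) are hypothetical. The helper only builds the argument list and does not run iptables.

```python
def with_wait(iptables_argv):
    """Return a copy of an iptables argv with -w added right after the program name.

    If -w or --wait is already present, the argv is returned unchanged.
    """
    argv = list(iptables_argv)
    if "-w" in argv or "--wait" in argv:
        return argv
    return argv[:1] + ["-w"] + argv[1:]


# Hypothetical rule, used only to show where the flag ends up.
rule = ["iptables", "-I", "INPUT", "-p", "tcp", "--dport", "5672", "-j", "REJECT"]
print(" ".join(with_wait(rule)))
```

Without the flag, a concurrent iptables invocation that holds the xtables lock makes the call exit with an error. With it, the call blocks until the lock is free.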
|
Update iptables calls with --wait
|
If iptables is currently being called outside of the OCF script, the
iptables call will fail because it cannot get a lock. This change
updates the iptables call to include the -w (--wait) flag, which waits
until the lock can be acquired instead of exiting with an error.
|
Add support for listing only local queues
|