Commit message (Author, Age, Files, Lines)
* Remove dead code left from a previous incarnation of the patch (Jean-Sébastien Pédron, 2016-08-26, 3 files, -40/+5)
|     References #500. [#116521809]
* cluster_rename_SUITE: Lower the testsuite timeout (Jean-Sébastien Pédron, 2016-08-25, 1 file, -3/+9)
|     While here, be more informative when a node rename fails.
* per_vhost_connection_limit_SUITE: Sleep 500ms after connection open/close (Jean-Sébastien Pédron, 2016-08-25, 1 file, -176/+139)
|     This helps in CI, where the connection tracking tables may not yet be up to date at the time the testcase verifies them. References #500. [#116521809]
* rabbit_mnesia_rename: Backup local tables only (Jean-Sébastien Pédron, 2016-08-25, 1 file, -1/+7)
|     The connection tracking tables are not replicated, because the table tracking connections on node A logically exists only on node A. The backup made during the rename of a node failed because it tried to access a remote, offline node to back up that node's connection tracking tables, which obviously could not work. The solution is not to back up those tables. This is correct because they are only relevant while the node is running. References #500. [#116521809]
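One way to read "back up local tables only" is as a filter over table metadata: keep a table only if the node being renamed holds a copy of it, so node-local tables that live on other (possibly offline) nodes are never touched. The Python sketch below illustrates that filter under assumptions: the `table_copies` mapping and the `tracked_connection_on_node_*` table names are hypothetical stand-ins, not the Mnesia or rabbit_mnesia_rename API.

```python
from typing import Dict, List, Set


def tables_to_back_up(table_copies: Dict[str, Set[str]], local_node: str) -> List[str]:
    """Return, sorted by name, the tables that have a copy on local_node.

    table_copies maps a table name to the set of nodes holding a copy of it
    (a hypothetical stand-in for Mnesia table metadata).
    """
    return sorted(name for name, nodes in table_copies.items() if local_node in nodes)


if __name__ == "__main__":
    copies = {
        # Replicated table: every cluster member holds a copy.
        "rabbit_queue": {"rabbit@a", "rabbit@b"},
        # Hypothetical node-local connection tracking tables, one per node.
        "tracked_connection_on_node_rabbit@a": {"rabbit@a"},
        "tracked_connection_on_node_rabbit@b": {"rabbit@b"},
    }
    # rabbit@b is offline, so its tracking table is skipped when backing up on rabbit@a.
    print(tables_to_back_up(copies, "rabbit@a"))
```

Per the commit message, skipping the tracking tables altogether is also safe, since they only matter while the node is running; the filter above merely shows why the remote one can never be reached.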
* rabbit_mnesia:forget_cluster_node/2: Skip event if node is offline (Jean-Sébastien Pédron, 2016-08-25, 1 file, -4/+11)
|     When the node is offline (`--offline` is set on the command line), we must skip emitting the `node_deleted` event because the rabbit_event event handler is not set up (RabbitMQ is stopped). Otherwise, the command fails with `badarg` in gen_event. References #500. [#116521809]
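The guard described above amounts to "emit the event only if the event system is up". The Python sketch below models it; `EventBus` is a hypothetical stand-in for rabbit_event/gen_event that fails the way the commit describes when RabbitMQ is stopped, and `forget_cluster_node` only mirrors the shape of the Erlang function (a node plus a remove-when-offline flag), not its actual implementation.

```python
from typing import Dict, List, Set, Tuple


class EventBus:
    """Hypothetical stand-in for rabbit_event: unusable while RabbitMQ is stopped."""

    def __init__(self, running: bool) -> None:
        self.running = running
        self.emitted: List[Tuple[str, Dict[str, str]]] = []

    def notify(self, name: str, props: Dict[str, str]) -> None:
        if not self.running:
            # Mirrors the `badarg` raised in gen_event when no handler is set up.
            raise RuntimeError("badarg: event handler not set up")
        self.emitted.append((name, props))


def forget_cluster_node(node: str, remove_when_offline: bool,
                        members: Set[str], events: EventBus) -> None:
    """Remove node from the membership; emit node_deleted only when RabbitMQ is running."""
    members.discard(node)
    if not remove_when_offline:
        # Online case: the event handler exists, so the event can be emitted.
        events.notify("node_deleted", {"node": node})


if __name__ == "__main__":
    offline_members = {"rabbit@a", "rabbit@b"}
    forget_cluster_node("rabbit@b", True, offline_members, EventBus(running=False))
    print(sorted(offline_members))  # no exception: the event was skipped
```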
* Merge branch 'master' into rabbitmq-server-500-squashed (Michael Klishin, 2016-08-24, 4 files, -73/+248)
|\
| * Merge branch 'stable' (Michael Klishin, 2016-08-24, 3 files, -51/+138)
| |\
| | * Merge branch 'rabbitmq-server-928' into stable (Michael Klishin, 2016-08-24, 3 files, -51/+138)
| | |\
| | | * Naming, wording (Michael Klishin, 2016-08-24, 1 file, -7/+7)
| | | |
| | | * Handle late autoheal_finished message (Diana Corbacho, 2016-08-24, 2 files, -0/+17)
| | | |
| | | * Merge branch 'stable' into rabbitmq-server-928 (Michael Klishin, 2016-08-23, 1 file, -231/+209)
| | | |\
| | | |/
| | |/|
| | | * Improve tolerance to partial partitions in autoheal (Diana Corbacho, 2016-08-23, 3 files, -51/+121)
| | | |     * Also solves deadlocks when the leader aborts autoheal on node down
| * | | Merge branch 'stable' (Michael Klishin, 2016-08-23, 1 file, -22/+110)
| |\ \ \
| | |/ /
| | * | Merge pull request #916 from binarin/rabbitmq-server-new-shiny-ocf-health-check (Michael Klishin, 2016-08-23, 1 file, -22/+110)
| | |\ \
| | | | |     Use new rabbitmqctl features for monitoring
| | | * | Monitor rabbitmq from OCF with less overhead (Alexey Lebedeff, 2016-08-23, 1 file, -22/+110)
| | |/ /
| | | |     This will stop wasting network bandwidth on monitoring. For example, a 200-node OpenStack installation produces around 10k queues and 10k channels. A single cluster-wide list_queues/list_channels call in this environment results in 27k TCP packets and around 12 megabytes of network traffic. Given that these calls happen ~10 times a minute with 3 controllers, the overhead is significant.
| | | |     To enable these features, RabbitMQ must include the following patches:
| | | |     - https://github.com/rabbitmq/rabbitmq-server/pull/883
| | | |     - https://github.com/rabbitmq/rabbitmq-server/pull/911
| | | |     - https://github.com/rabbitmq/rabbitmq-server/pull/915
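To put the figures above in perspective, the sketch below multiplies them out. It assumes that each of the 3 controllers issues its own ~10 listings per minute and that every listing costs the quoted 27k packets and 12 MB; the commit message does not say exactly how the calls are split between controllers, so treat the totals as a rough estimate under that assumption.

```python
# Figures quoted in the commit message for a 200-node OpenStack installation.
PACKETS_PER_LISTING = 27_000
MEGABYTES_PER_LISTING = 12
LISTINGS_PER_MINUTE = 10   # "~10 times a minute"
CONTROLLERS = 3            # assumption: each controller polls independently


def per_minute(per_listing: float) -> float:
    """Scale a per-listing cost to a per-minute cost across all controllers."""
    return per_listing * LISTINGS_PER_MINUTE * CONTROLLERS


if __name__ == "__main__":
    mb = per_minute(MEGABYTES_PER_LISTING)
    packets = per_minute(PACKETS_PER_LISTING)
    print(f"{packets:,.0f} packets and {mb:.0f} MB per minute")
    print(f"about {mb * 60 / 1000:.1f} GB per hour")
```

Under these assumptions the monitoring alone costs 810,000 packets and 360 MB per minute, or roughly 21.6 GB per hour.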
* | | | Merge branch 'master' into rabbitmq-server-500-squashed (Michael Klishin, 2016-08-23, 6 files, -242/+203)
|\ \ \ \
| |/ / /
| * | | Merge branch 'stable' (Michael Klishin, 2016-08-23, 1 file, -222/+112)
| |\ \ \
| | |/ /
| | * | Merge pull request #929 from dmitrymex/start-sequence (Michael Klishin, 2016-08-22, 1 file, -222/+112)
| | |\ \
| | | |/
| | |/|     [OCF HA] Change master score computation & split-brain detection logic
| | | * [OCF HA] Enhance split-brain detection logic (Dmitry Mescheryakov, 2016-08-22, 1 file, -56/+64)
| | | |     The previous split-brain logic worked as follows: each slave checked that it was connected to the master, and if the check failed, the slave restarted. The fundamental flaw in that logic is that there is little guarantee that the master is alive at that moment. Moreover, if the master dies, it is very likely that the slaves will detect its death during the next monitor check and restart, causing complete RabbitMQ cluster downtime.
| | | |     With the new approach, the master node checks that the slaves are connected to it and orders them to restart if they are not. The check is performed after the master node's own health check, so at least that node survives. Also, restart orders expire after one minute, and a freshly started node ignores restart orders for three minutes to give the cluster time to stabilize.
| | | |     Also fixed a problem when a node starts and is already clustered: in that case the OCF script forgot to start the RabbitMQ app, causing a subsequent restart. Now we ensure that the RabbitMQ app is running.
| | | |     The two new attributes, rabbit-start-phase-1-time and rabbit-ordered-to-restart, are private. To allow the master to set a node's restart order, the signatures of both ocf_update_private_attr and ocf_get_private_attr are extended to accept a node name.
| | | |     Finally, a bug in ocf_get_private_attr is fixed. Unlike crm_attribute, attrd_updater returns an empty string instead of "(null)" when an attribute is not defined on the requested node but is defined on some other node. The code now expects an empty string rather than "(null)".
| | | |     This is a fix for Fuel bugs https://bugs.launchpad.net/fuel/+bug/1559136 and https://bugs.launchpad.net/mos/+bug/1561894
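The restart-order rules above can be modelled as two small functions: one run on the master after its own health check, and one each slave uses to decide whether to obey an order. The Python sketch below illustrates those rules under stated assumptions and is not the OCF script's shell code: the `orders` dictionary stands in for the private attribute rabbit-ordered-to-restart, `started_at` for rabbit-start-phase-1-time, and timestamps are plain seconds.

```python
from typing import Dict, Iterable, Set

ORDER_TTL_S = 60      # restart orders expire after one minute
START_GRACE_S = 180   # a freshly started node ignores orders for three minutes


def master_check(orders: Dict[str, float], slaves: Iterable[str],
                 connected: Set[str], now: float) -> None:
    """On the master, after its own health check passed: order disconnected slaves to restart."""
    for node in slaves:
        if node not in connected:
            orders[node] = now


def should_restart(orders: Dict[str, float], node: str,
                   started_at: float, now: float) -> bool:
    """On a slave: obey a restart order only if it is fresh and the grace period is over."""
    ordered_at = orders.get(node)
    if ordered_at is None:
        return False
    if now - ordered_at >= ORDER_TTL_S:
        return False  # the order has expired
    if now - started_at < START_GRACE_S:
        return False  # still stabilizing after start-up
    return True


if __name__ == "__main__":
    orders: Dict[str, float] = {}
    master_check(orders, ["rabbit@b", "rabbit@c"], connected={"rabbit@b"}, now=200)
    print(orders)  # only rabbit@c has been ordered to restart
    print(should_restart(orders, "rabbit@c", started_at=0, now=210))    # True
    print(should_restart(orders, "rabbit@c", started_at=100, now=210))  # False: grace period
    print(should_restart(orders, "rabbit@c", started_at=0, now=270))    # False: order expired
```

Putting the decision on the master means a dead master can no longer trigger a cluster-wide restart: with no master running the check, no orders are issued.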
| | | * [OCF HA] Rank master score based on start time (Dmitry Mescheryakov, 2016-08-22, 1 file, -166/+48)
| | |/
| | |     Currently we assign 1000 to the oldest nodes and 1 to the others. That creates a problem when the master restarts and no node is promoted until that node comes back. In that case the returning node has a score of 1, like all other slaves, and Pacemaker will choose to promote it again. That node is completely empty, and the other slaves then join it, wiping their data as well. As a result, we lose all the messages.
| | |     The new algorithm actually ranks nodes instead of just selecting the oldest one. It also maintains the invariant that if node A started later than node B, node A's score must be smaller than node B's. As a result, a freshly started node has no chance of being selected in preference to an older node. If several nodes start simultaneously, an older node among them might temporarily receive a lower score than a younger one, but that is negligible.
| | |     Also remove any action on demote or demote notification: all of these duplicate actions already done on stop or stop notification. With these removed, changing the master on a running cluster does not affect the RabbitMQ cluster in any way; we just declare another node master and that is it. This matters for the current change because the master score might change after the initial cluster start-up, causing the master to migrate from one node to another.
| | |     This fix is a prerequisite for the fix to Fuel bugs https://bugs.launchpad.net/fuel/+bug/1559136 and https://bugs.launchpad.net/mos/+bug/1561894
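The invariant above (a node that started later always gets a strictly smaller score) holds if each node's score is a fixed maximum minus the number of nodes that started strictly before it: if A started after B, every node counted for B is also counted for A, plus B itself. The Python sketch below shows that rule; the ceiling of 1000 is borrowed from the old scheme and the `start_times` mapping is hypothetical, so it illustrates the invariant rather than reproducing the OCF script's exact formula.

```python
from typing import Dict

MAX_SCORE = 1000  # assumption: reuse the old scheme's top score as the ceiling


def master_score(node: str, start_times: Dict[str, float]) -> int:
    """Score = MAX_SCORE minus the number of nodes that started strictly earlier."""
    mine = start_times[node]
    started_earlier = sum(1 for t in start_times.values() if t < mine)
    return MAX_SCORE - started_earlier


if __name__ == "__main__":
    starts = {"rabbit@a": 10.0, "rabbit@b": 20.0, "rabbit@c": 20.0, "rabbit@d": 30.0}
    for name in sorted(starts):
        print(name, master_score(name, starts))
```

Nodes that start at the same instant tie, and a freshly restarted master, having the latest start time, ranks below every node that kept running, so Pacemaker would prefer an older node over it.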
| * | Merge branch 'stable' (Diana Corbacho, 2016-08-19, 6 files, -18/+89)
| |\ \
| | |/
| | * Merge branch 'Ayanda-D-rabbitmq-server-914' into stable [rabbitmq_v3_6_6_milestone1] (Diana Corbacho, 2016-08-19, 5 files, -8/+74)
| | |\
| | | * Test GM crash when group is deleted while processing a DOWN message (Diana Corbacho, 2016-08-19, 1 file, -1/+37)
| | | |
| | | * Merge branch 'rabbitmq-server-914' of https://github.com/Ayanda-D/rabbitmq-server into Ayanda-D-rabbitmq-server-914 (Diana Corbacho, 2016-08-19, 4 files, -7/+37)
| | | |\
| | | | * Handle unexpected gm group alterations prior to removal of dead pids from queue (Ayanda Dube, 2016-08-15, 3 files, -6/+31)
| | | | |
| | | | * Adds check_membership/2 clause for handling non-existent gm group (Ayanda Dube, 2016-08-15, 1 file, -1/+3)
| | | | |
| | | | * Safely handle (and log) anonymous info messages, most likely from the gm process' neighbours (Ayanda Dube, 2016-08-15, 1 file, -1/+8)
| | | | |
| | * | | Merge pull request #926 from dmitrymex/get-private-attr (Michael Klishin, 2016-08-18, 1 file, -10/+15)
| | |\ \ \
| | | |/ /
| | |/| |     [OCF HA] Add ocf_get_private_attr function to RabbitMQ OCF script
| | | * | [OCF HA] Add ocf_get_private_attr function to RabbitMQ OCF script (Dmitry Mescheryakov, 2016-08-18, 1 file, -10/+15)
| | |/ /
| | | |     The function is extracted from check_timeouts so that it can be reused in other parts of the script. Also, switch check_timeouts to use the existing ocf_update_private_attr function.
| * | | Merge branch 'stable' (Michael Klishin, 2016-08-18, 1 file, -4/+4)
| |\ \ \
| | |/ /
| | * | Merge pull request #925 from bogdando/fix_bashisms (Michael Klishin, 2016-08-18, 1 file, -4/+4)
| | |\ \
| | | | |     Fix bashisms in rabbitmq OCF RA
| | | * | Fix bashisms in rabbitmq OCF RA (Bogdan Dobrelya, 2016-08-18, 1 file, -4/+4)
| | |/ /
| | | |     Change "printf %b" usage so that the script passes checkbashisms.
| | | |     Signed-off-by: Bogdan Dobrelya <bdobrelia@mirantis.com>
* | | | Merge branch 'master' into rabbitmq-server-500-squashed (Michael Klishin, 2016-08-18, 13 files, -66/+342)
|\ \ \ \
| |/ / /
| | | |     Conflicts: src/rabbit_control_main.erl
| * | | Merge branch 'stable' (Michael Klishin, 2016-08-16, 2 files, -2/+14)
| |\ \ \
| | |/ /
| | * | Merge pull request #923 from rabbitmq/rabbitmq-server-922 (Michael Klishin, 2016-08-16, 2 files, -2/+14)
| | |\ \
| | | | |     Discard any unexpected messages, such as late replies from gen_server
| | | * | Typo (Michael Klishin, 2016-08-16, 1 file, -1/+1)
| | | | |
| | | * | Discard any unexpected messages, such as late replies from gen_server (Diana Corbacho, 2016-08-16, 2 files, -2/+14)
| | |/ /
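In Erlang, this kind of fix is typically a catch-all `handle_info/2` clause that logs and drops anything it does not recognise, for instance a reply to a `gen_server:call` that arrives after the caller has already timed out. The Python sketch below is an analogue of that dispatch pattern, not RabbitMQ code; the tagged-tuple message shapes and the `tick` handler are hypothetical.

```python
import logging
from typing import Any, Callable, Dict

logger = logging.getLogger("handle_info_sketch")

Handler = Callable[[Any, dict], dict]


def make_handle_info(handlers: Dict[str, Handler]) -> Callable[[Any, dict], dict]:
    """Build a handle_info-style dispatcher that discards unrecognised messages."""

    def handle_info(msg: Any, state: dict) -> dict:
        # Known messages are tuples tagged with a string, e.g. ("tick",).
        tag = msg[0] if isinstance(msg, tuple) and msg and isinstance(msg[0], str) else None
        handler = handlers.get(tag) if tag is not None else None
        if handler is None:
            # Unexpected message (such as a late reply): log it and keep the state unchanged.
            logger.debug("discarding unexpected message: %r", msg)
            return state
        return handler(msg, state)

    return handle_info


if __name__ == "__main__":
    handle = make_handle_info({"tick": lambda msg, st: {**st, "ticks": st["ticks"] + 1}})
    state = {"ticks": 0}
    state = handle(("tick",), state)
    state = handle((object(), "late reply"), state)  # hypothetical late reply: dropped
    print(state)
```

Dropping instead of crashing keeps the process alive when a stray message, such as a reply whose caller gave up waiting, lands in its mailbox.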
| * | | Merge branch 'stable' (Michael Klishin, 2016-08-16, 2 files, -0/+30)
| |\ \ \
| | |/ /
| | * | Merge branch 'binarin-rabbitmq-server-health-check-node-monitor' into stable (Michael Klishin, 2016-08-16, 2 files, -0/+30)
| | |\ \
| | | * \ Merge branch 'rabbitmq-server-health-check-node-monitor' of https://github.com/binarin/rabbitmq-server into binarin-rabbitmq-server-health-check-node-monitor (Michael Klishin, 2016-08-16, 2 files, -0/+30)
| | | |\ \
| | |/ / /
| | | * | Check rabbit_node_monitor during health-check (Alexey Lebedeff, 2016-08-10, 2 files, -0/+30)
| | | | |     Tests + comment outlining the problem. The check itself is in a separate commit to `rabbitmq-common`.
| * | | | Merge branch 'stable' (Michael Klishin, 2016-08-16, 0 files, -0/+0)
| |\ \ \ \
| | |/ / /
| | * | | Merge pull request #920 from mwhahaha/iptables-stable (Michael Klishin, 2016-08-16, 1 file, -4/+4)
| | |\ \ \
| | | | | |     Update iptables calls with --wait
| | | * | | Update iptables calls with --wait (Alex Schultz, 2016-08-15, 1 file, -4/+4)
| | |/ / /
| | | | |     If iptables is being run concurrently outside of the OCF script, the script's iptables call fails because it cannot get the lock. This change adds the -w flag to the iptables calls, so that they wait until the lock can be acquired instead of exiting with an error.
| * | | | Merge pull request #919 from mwhahaha/iptables (Michael Klishin, 2016-08-16, 1 file, -4/+4)
| |\ \ \ \
| | | | | |     Update iptables calls with --wait
| | * | | | Update iptables calls with --wait (Alex Schultz, 2016-08-15, 1 file, -4/+4)
| |/ / / /
| | | | |     If iptables is being run concurrently outside of the OCF script, the script's iptables call fails because it cannot get the lock. This change adds the -w flag to the iptables calls, so that they wait until the lock can be acquired instead of exiting with an error.
| * | | | Merge branch 'stable' (Diana Corbacho, 2016-08-15, 1 file, -3/+2)
| |\ \ \ \
| | |/ / /
| | * | | Docs wording (Michael Klishin, 2016-08-15, 1 file, -3/+2)
| | | |/
| | |/|
| * | | Merge branch 'stable' (Diana Corbacho, 2016-08-15, 8 files, -59/+295)
| |\ \ \
| | |/ /
| | * | Merge pull request #911 from binarin/rabbitmq-server-851 (D Corbacho, 2016-08-15, 7 files, -44/+257)
| | |\ \
| | | | |     Add support for listing only local queues