| Commit message | Author | Age | Files | Lines |
Signed-off-by: Peter Lemenkov <lemenkov@gmail.com>
Use new rabbitmqctl features for monitoring
This will stop wasting network bandwidth for monitoring.
E.g. a 200-node OpenStack installation produces around 10k queues and
10k channels. A single cluster-wide list_queues/list_channels call in this
environment results in 27k TCP packets and around 12 megabytes of
network traffic. Given that these calls happen ~10 times a minute with 3
controllers, this adds up to pretty significant overhead.
To enable these features you should have a RabbitMQ build containing the
following patches:
- https://github.com/rabbitmq/rabbitmq-server/pull/883
- https://github.com/rabbitmq/rabbitmq-server/pull/911
- https://github.com/rabbitmq/rabbitmq-server/pull/915
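With those patches in place, monitoring can query each node only for the entities it hosts instead of issuing one cluster-wide listing. A hypothetical sketch (node names are made up; this is not the actual monitoring code):

```shell
# Ask each controller for its own queues only; the --local option added
# by the PRs above avoids the cluster-wide fan-out on every poll.
for node in rabbit@node-1 rabbit@node-2 rabbit@node-3; do
    rabbitmqctl -n "$node" list_queues --local name messages
done
```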
[OCF HA] Change master score computation & split-brain detection logic
The previous split-brain logic worked as follows: each slave checked
that it was connected to the master. If the check failed, the slave
restarted. The ultimate flaw in that logic is that there is little
guarantee the master is alive at that moment. Moreover, if the master
dies, it is very probable that during the next monitor check the slaves
will detect its death and restart, causing complete RabbitMQ cluster
downtime.
With the new approach the master node checks that the slaves are
connected to it and orders them to restart if they are not. The check
is performed after the master node's own health check, meaning that at
least that node survives. Also, orders expire in one minute, and a
freshly started node ignores orders to restart for three minutes, to
give the cluster time to stabilize.
Also corrected a problem when a node starts and is already clustered:
in that case the OCF script did not start the RabbitMQ app, causing a
subsequent restart. Now we ensure that the RabbitMQ app is running.
The two introduced attributes, rabbit-start-phase-1-time and
rabbit-ordered-to-restart, are made private. To allow the master to set
a node's order to restart, both the ocf_update_private_attr and
ocf_get_private_attr signatures are expanded to accept a node name.
Finally, a bug is fixed in ocf_get_private_attr. Unlike crm_attribute,
attrd_updater returns an empty string instead of "(null)" when an
attribute is not defined on the requested node but is defined on some
other node. The code was changed correspondingly to expect an empty
string, not "(null)".
This fixes the following Fuel bugs:
https://bugs.launchpad.net/fuel/+bug/1559136
https://bugs.launchpad.net/mos/+bug/1561894
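The "(null)" vs. empty-string distinction can be sketched with a small helper. The helper names and the attrd_updater output format below are assumptions for illustration, not code from the actual script:

```shell
# Parse an attrd_updater -Q line such as:  name="foo" host="n1" value="42"
# attrd_updater prints value="" (not "(null)") when the attribute is
# unset on the queried node, so the check must be against the empty string.
attr_value_from_query() {
    printf '%s\n' "$1" | sed -e 's/.*value="\(.*\)"/\1/'
}

is_attr_unset() {
    [ -z "$(attr_value_from_query "$1")" ]
}
```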
Right now we assign 1000 to the oldest node and 1 to the others. That
creates a problem when the master restarts: no node is promoted until
that node comes back. In that case the returning node will have a score
of 1, like all other slaves, and Pacemaker will select it for promotion
again. That node is clean and empty, and afterwards the other slaves
join it, wiping their data as well. As a result, we lose all the
messages.
The new algorithm actually ranks nodes instead of just selecting the
oldest one. It also maintains the invariant that if node A started
later than node B, then node A's score must be smaller than node B's.
As a result, a freshly started node has no chance of being selected in
preference to an older node. If several nodes start simultaneously, an
older node among them might temporarily receive a lower score than a
younger one, but that is negligible.
Also remove any action on demote or demote notification - all of these
duplicate the actions done on stop or stop notification. With these
removed, changing the master on a running cluster does not affect the
RabbitMQ cluster in any way - we just declare another node master and
that is it. This matters for the current change, because the master
score might change after initial cluster start-up, causing master
migration from one node to another.
This fix is a prerequisite for the fix to the following Fuel bugs:
https://bugs.launchpad.net/fuel/+bug/1559136
https://bugs.launchpad.net/mos/+bug/1561894
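The ranking invariant can be illustrated with a minimal sketch (node names, timestamps, and the base score of 1000 are illustrative only, not taken from the script):

```shell
# Read "node start_time" pairs on stdin, sort by start time, and assign
# strictly decreasing scores, so a node that started later always ends
# up with a lower score than any node that started earlier.
rank_scores() {
    sort -n -k2 | {
        score=1000
        while read -r node _start; do
            echo "$node $score"
            score=$((score - 1))
        done
    }
}

printf '%s\n' 'node-2 1100' 'node-1 1000' 'node-3 1200' | rank_scores
# → node-1 1000
#   node-2 999
#   node-3 998
```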
* Also solves deadlocks when the leader aborts autoheal on node down
https://github.com/Ayanda-D/rabbitmq-server into Ayanda-D-rabbitmq-server-914
dead pids from queue
from the gm process' neighbours
[OCF HA] Add ocf_get_private_attr function to RabbitMQ OCF script
The function is extracted from check_timeouts so that it can be re-used
later in other parts of the script. Also, switch check_timeouts to use the
existing ocf_update_private_attr function.
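A minimal sketch of what such a getter could look like, assuming attrd_updater's flags (-p for private, -Q for query) and its query output format; the real script's implementation may differ:

```shell
# Read a private node attribute and print its value. Parses attrd_updater
# query output of the form:  name="foo" host="n1" value="7"
ocf_get_private_attr() {
    attr_name="$1"
    attrd_updater -p -n "$attr_name" -Q 2>/dev/null \
        | sed -e 's/.*value="\(.*\)"/\1/'
}
```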
Fix bashisms in rabbitmq OCF RA
Change the "printf %b" usage so that it passes checkbashisms.
Signed-off-by: Bogdan Dobrelya <bdobrelia@mirantis.com>
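As background, the printf utility's %b directive is specified by POSIX and expands backslash escapes in its argument, which makes it the portable alternative to bash's `echo -e`. A tiny illustration (not the RA's actual code):

```shell
# %b expands the \n in the argument, so this prints two lines
# without relying on any bash-specific behaviour.
printf '%b\n' 'first line\nsecond line'
```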
Discard any unexpected messages, such as late replies from gen_server
https://github.com/binarin/rabbitmq-server into binarin-rabbitmq-server-health-check-node-monitor
Tests + a comment outlining the problem. The check itself is in a separate
commit to `rabbitmq-common`.
Update iptables calls with --wait
If iptables is invoked outside of the OCF script at the same time, the
script's iptables call will fail because it cannot get the xtables lock.
This change adds the -w (--wait) flag to the iptables call, so it waits
until the lock can be acquired instead of just exiting with an error.
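The change can be sketched as follows; the function name and the rule itself are hypothetical, and the sketch only prints the command it would run rather than executing it:

```shell
# Build an iptables command with -w (--wait) so it blocks on the xtables
# lock instead of failing when another process holds it. Echoed, not
# executed, since this is only an illustration.
build_block_rule() {
    port="$1"
    echo "iptables -w -I INPUT -p tcp --dport $port -j REJECT"
}

build_block_rule 5672
# → iptables -w -I INPUT -p tcp --dport 5672 -j REJECT
```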
Add support for listing only local queues
Partially implements https://github.com/rabbitmq/rabbitmq-server/issues/851
- Made the old `--online`/`--offline` options mutually exclusive with
  each other and with the new `--local` option
- Added documentation for both the old and the new options
- Fixed some ugly indentation in the generated usage (only the
  `set_policy` wrapped line remains unfixed)
- Added integration test suite for `rabbitmqctl list_queues`
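Hypothetical usage of the new option (the node name is made up):

```shell
# List only the queues hosted on this node; per the change above,
# --local cannot be combined with --online or --offline.
rabbitmqctl -n rabbit@node-1 list_queues --local name messages
```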
Fix longname-mode on hosts without detectable FQDN
Server startup and the CLI tools fail in longnames mode if Erlang is not
able to determine the host FQDN (one containing at least one dot).
E.g. this can happen when you want to assemble a cluster using only
IP addresses and you don't care about FQDNs at all.
It was also not possible to alleviate this situation using any options
from http://erlang.org/doc/apps/erts/inet_cfg.html
Fixes #890
Added resume after flow
Fix some type specs
Forgot to update specs in #868
Bump default VM atom table limit to 5M
See #895 for background and reasoning.
Fixes #895.
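For reference, the atom table limit is controlled by the Erlang emulator's +t flag, and RabbitMQ passes extra emulator flags through a standard environment variable; treat the exact value below as illustrative:

```shell
# Raise the Erlang VM atom table limit to 5M atoms for the server
# via RabbitMQ's additional-emulator-args mechanism.
export RABBITMQ_SERVER_ADDITIONAL_ERL_ARGS="+t 5000000"
```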
Don't die in case of faulty node
This patch fixes a TOCTOU issue introduced in the following commit:
* rabbitmq/rabbitmq-server@93b9e37c3ea0cade4e30da0aa1f14fa97c82e669
If the node was just removed from the cluster, then there is a small
window during which it is still listed locally as a member of the Mnesia
cluster. We retrieve the list of nodes by locally calling
```erlang
unsafe_rpc(Node, rabbit_mnesia, cluster_nodes, [running]).
```
However, retrieving the status of that particular failed node is no
longer possible and throws an exception. See the `alarms_by_node(Name)`
function, which simply calls `unsafe_rpc(Name, rabbit, status, [])` for
this node.
This `unsafe_rpc/4` function is basically a wrapper over
`rabbit_misc:rpc_call/4` which translates `{badrpc, nodedown}` into an
exception. The exception generated by the `alarms_by_node(Name)` call
surfaces at a very high level, so rabbitmqctl thinks that the entire
cluster is down, while printing a rather bizarre message:
```
Cluster status of node 'rabbit@overcloud-controller-0' ...
Error: unable to connect to node 'rabbit@overcloud-controller-0': nodedown

DIAGNOSTICS
===========

attempted to contact: ['rabbit@overcloud-controller-0']

rabbit@overcloud-controller-0:
  * connected to epmd (port 4369) on overcloud-controller-0
  * node rabbit@overcloud-controller-0 up, 'rabbit' application running

current node details:
- node name: 'rabbitmq-cli-31@overcloud-controller-0'
- home dir: /var/lib/rabbitmq
- cookie hash: PB31uPq3vzeQeZ+MHv+wgg==
```
See - it reports that it failed to connect to node
'rabbit@overcloud-controller-0' (because it catches the exception from
`alarms_by_node(Name)`), even though the attempt to connect to this node
was successful (the 'rabbit' application is running).
To fix this we should not throw an exception during the subsequent
calls (`[alarms_by_node(Name) || Name <- nodes_in_cluster(Node)]`), only
during the first one (`unsafe_rpc(Node, rabbit_mnesia, status, [])`).
Moreover, we don't need to change `nodes_in_cluster(Node)`, because it
is called locally. The only function which must use
`rabbit_misc:rpc_call/4` is `alarms_by_node(Name)`, because it is
executed remotely.
See this issue for further details and real world example:
* https://bugzilla.redhat.com/1356169
Signed-off-by: Peter Lemenkov <lemenkov@gmail.com>
Tune scheduling bind flags for Erlang VM