Commit message (Author, Age, Files, Lines)
* Make it possible for peer discovery backends to provide their own RSD range (Michael Klishin, 2018-04-11, 1 file, -2/+7)
|     References rabbitmq/rabbitmq-peer-discovery-k8s#23.
* Update erlang.mk (Jean-Sébastien Pédron, 2018-04-11, 1 file, -2/+35)
|
* Update rabbitmq-components.mk (Jean-Sébastien Pédron, 2018-04-11, 1 file, -5/+5)
|
* Update rabbitmq-components.mk (Jean-Sébastien Pédron, 2018-04-11, 1 file, -2/+2)
|
* Merge pull request #1575 from rabbitmq/rabbitmq-federation-73 (Jean-Sébastien Pédron, 2018-04-09, 1 file, -0/+4)
|\    rabbit_parameter_validation: support maps in proplist validator
| * rabbit_parameter_validation: support maps in proplist validator (Michael Klishin, 2018-04-09, 1 file, -0/+4)
|/    Part of rabbitmq/rabbitmq-federation#73, references rabbitmq/rabbitmq-federation#70, rabbitmq/rabbitmq-federation#67.
* Travis CI: Update config from rabbitmq-common (Jean-Sébastien Pédron, 2018-04-09, 1 file, -1/+1)
|
* Travis CI: Update config from rabbitmq-common (Jean-Sébastien Pédron, 2018-04-09, 1 file, -3/+6)
|
* Merge branch 'reduce-mnesia-contention-when-nodes-restart-master' (Michael Klishin, 2018-04-06, 4 files, -45/+65)
|\
| * Re-apply f2ab0b40f034cda6bca4294735b493f20550b93c (Michael Klishin, 2018-04-05, 1 file, -25/+1)
| |
| * Merge branch 'reduce-mnesia-contention-when-nodes-restart-v3.7.x' into reduce-mnesia-contention-when-nodes-restart-master (Michael Klishin, 2018-04-05, 5 files, -46/+90)
| |\
|/ /
| * Run binding deletions in a Mnesia transaction (Gerhard Lazu, 2018-03-27, 1 file, -5/+12)
| |   Otherwise, if there is more than 1 node that runs rabbit_node_monitor:on_node_down/1, there will be `{aborted,no_transaction}` errors.
| |
| |   {{aborted,no_transaction},
| |    [{mnesia,abort,1,[{file,"mnesia.erl"},{line,351}]},
| |     {rabbit_exchange_type_topic,'-remove_bindings/3-lc$^0/1-0-',1,
| |      [{file,"src/rabbit_exchange_type_topic.erl"},{line,78}]},
| |     {rabbit_exchange_type_topic,remove_bindings,3,
| |      [{file,"src/rabbit_exchange_type_topic.erl"},{line,78}]},
| |     {rabbit_binding,x_callback,4,[{file,"src/rabbit_binding.erl"},{line,570}]},
| |     {rabbit_binding,'-process_deletions/2-fun-0-',2,
| |      [{file,"src/rabbit_binding.erl"},{line,547}]},
| |     {dict,map_bucket,2,[{file,"dict.erl"},{line,481}]},
| |     {dict,map_bkt_list,2,[{file,"dict.erl"},{line,477}]},
| |     {dict,map_bkt_list,2,[{file,"dict.erl"},{line,477}]}]}
| |
| |   Partner-in-crime: @essen
| * Delete metrics for all deleted queues in a single operation (Gerhard Lazu, 2018-03-27, 1 file, -47/+40)
| |   Rather than calling `rabbit_core_metrics:delete_queue/1` for every queue, collect all deleted queues and delete all their metrics in a single operation.
| |
| |   We don't use a single Mnesia transaction to delete all objects related to a queue and this may be a problem, but we haven't run this version of the code long enough to know for sure. What should we be looking out for @michaelklishin?
| |
| |   For initial context, see #1513
| |
| |   Partner-in-crime: @essen
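The idea in the message above, replacing one table scan per deleted queue with a single scan covering all of them, can be sketched in Python. This is not RabbitMQ's Erlang code: the table layout (keys of the form `(queue, metric)`) and both function names are assumptions made for illustration, with a plain dict standing in for the ETS metrics table.

```python
from typing import Dict, Iterable, Tuple

Key = Tuple[str, str]  # hypothetical row key: (queue name, metric name)


def delete_metrics_per_queue(table: Dict[Key, int], deleted: Iterable[str]) -> int:
    """Scan the whole table once per deleted queue; return rows examined."""
    examined = 0
    for queue in deleted:
        for key in list(table):  # snapshot, so rows can be removed while looping
            examined += 1
            if key[0] == queue:
                del table[key]
    return examined


def delete_metrics_single_pass(table: Dict[Key, int], deleted: Iterable[str]) -> int:
    """Scan the table once for all deleted queues; return rows examined."""
    deleted_set = set(deleted)  # constant-time membership test per row
    examined = 0
    for key in list(table):
        examined += 1
        if key[0] in deleted_set:
            del table[key]
    return examined


def sample_table() -> Dict[Key, int]:
    """Three queues with two metric rows each (made-up data)."""
    return {(q, m): 0 for q in ("q1", "q2", "q3") for m in ("msgs", "consumers")}
```

Both functions leave the same rows behind; the difference is only in how many rows are examined, which is what dominates when the table is large and many queues are deleted at once.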
| * Group queue deletions on_node_down into 10 operations per transaction (Gerhard Lazu, 2018-03-27, 1 file, -18/+16)
| |   When many queues are being deleted, we believe that it's faster to have fewer Mnesia transactions and therefore group 10 queue deletions into a single Mnesia transaction. This number (10) is arbitrary, we didn't try with a different number. Creating 1 Mnesia transaction for every queue deletion feels like too many transactions, and having a single Mnesia transaction for all queue deletions is too few transactions. This felt like a sensible option.
| |
| |   We cannot determine if this is a good change because rabbit_core_metrics:queue_deleted/1 takes the most time and obscures all observations. According to qcachegrind, rabbit_misc:execute_mnesia_transaction/1 takes 1.8s while rabbit_core_metrics:queue_deleted/1 takes 132s, out of which ets:select/2 takes 131s.
| |
| |   How can we optimise rabbit_core_metrics:queue_deleted/1? We are thinking that rather than calling ets:select/2 twice for every queue, we should call it twice for all queues that need to be deleted. We don't know whether this is possible. Alternatively, we might look into ets:first/1 & ets:next/2 to iterate over the entire table ONCE with all the queues that have been deleted. Thoughts @dcorbacho @michaelklishin?
| |
| |   For initial context, see #1513
| |
| |   Partner-in-crime: @essen
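The grouping described above can be sketched as follows. This is a Python illustration, not the Erlang change itself: `chunks`, `delete_in_batches`, and the `run_transaction` callback are hypothetical names, and the default batch size of 10 is the arbitrary value the message mentions.

```python
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def chunks(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    batch: List[T] = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # final, possibly shorter, batch
        yield batch


def delete_in_batches(
    queues: Iterable[T],
    run_transaction: Callable[[List[T]], None],
    batch_size: int = 10,
) -> int:
    """Call `run_transaction` once per batch; return the number of calls."""
    transactions = 0
    for batch in chunks(queues, batch_size):
        run_transaction(batch)  # stands in for one Mnesia transaction
        transactions += 1
    return transactions
```

Because each batch is its own transaction, other work can be interleaved between batches, at the cost of the deletions no longer being atomic as a whole.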
| * Add back INTERNAL_USER info to on_node_down function (Gerhard Lazu, 2018-03-27, 1 file, -2/+4)
| |
| * Split single Mnesia transaction that runs on_node_down (Gerhard Lazu, 2018-03-27, 1 file, -25/+59)
| |   Rather than using a single Mnesia transaction to clean queues when a node goes down, split it into many transactions so that other Mnesia operations can make progress. This is especially important when nodes join the cluster, since Mnesia on the newly started node will not be able to synchronise if there is a long-running Mnesia transaction.
| |
| |   This commit is not complete, we need feedback on the comments left in the code before we can settle on a final version that can be merged.
| |
| |   For more context, see #1513
| |
| |   Partner-in-crime: @essen
| * Wait at most 5 secs for a node to reply to rabbit_node_monitor:partitions/0 (Gerhard Lazu, 2018-03-27, 1 file, -1/+2)
| |   When rabbit_node_monitor that runs on an alive node is busy cleaning up after another node goes down, the process might not respond for a long time. Since the Management API calls rabbit_node_monitor:partitions/0 and blocks until this function returns, the Management UI will fail to load for a really long time: clients will time out before this function returns.
| |
| |   This change will make rabbit_node_monitor:partitions/0 time out after 5 seconds so that the Management API returns in a timely manner and the Management UI blocks for an extra 5 seconds at most. As a result, node metrics may be outdated, which is not ideal, but at least the users will have some feedback and will be able to perform actions via the Management UI / API.
| |
| |   For more context, see #1513
| |
| |   Partner-in-crime: @essen
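The bounded wait described above can be sketched in Python. The real change concerns an Erlang call to rabbit_node_monitor:partitions/0; here `call_with_timeout` and `slow_partitions` are hypothetical names, and returning a fallback value on timeout is an assumption, since the message does not say what the caller receives.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_timeout(fn: Callable[[], T], timeout_s: float, default: T) -> T:
    """Run `fn` in a worker thread and wait at most `timeout_s` seconds.

    If no result arrives in time, return `default` instead of blocking.
    """
    pool = ThreadPoolExecutor(max_workers=1)
    future = pool.submit(fn)
    try:
        return future.result(timeout=timeout_s)
    except FutureTimeout:
        return default  # caller proceeds with possibly outdated data
    finally:
        # Do not wait for the worker; a busy callee may finish much later.
        pool.shutdown(wait=False)


def slow_partitions() -> list:
    """Stand-in for a busy process that answers late (made-up delay)."""
    time.sleep(0.3)
    return ["node-b"]
```

The trade-off matches the one in the message: the caller gets an answer within a bounded time, but that answer may not reflect the current state.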
| * Do not read exchange & queue before deleting, simply delete (Gerhard Lazu, 2018-03-27, 2 files, -12/+2)
| |   Doing a wread operation requires a table write lock. Since a delete requires a table write lock as well, we have 2 write locks instead of 1. When Mnesia is under load, this can result in many lock collisions and slow down the entire set of operations.
| |
| |   We think that it's better to leverage fast SSDs / NVMEs and defer the Mnesia disk log write optimisation to the filesystem.
| |
| |   For more context, see #1513
| |
| |   Partner-in-crime: @essen
| * Delete bindings from mnesia without full table scan (Gerhard Lazu, 2018-03-27, 1 file, -7/+2)
| |   mnesia:match_object/3 scans the entire table and can take many seconds on a loaded node. This is especially bad when there are many bindings which need to be deleted. If the object is in the table then delete it, otherwise carry on.
| |
| |   Prior to this change, it was observed that there is a high % of lock collisions in rabbit_topic_trie_binding table:
| |
| |   ```
| |   lock    id                         #tries   #collisions  collisions [%]  time [us]  duration [%]  histogram [log2(us)]
| |   -----   ---                        -------  ------------ --------------- ---------- ------------- ---------------------
| |   db_tab  rabbit_topic_trie_binding  258465   13627        5.2723          1389904    0.0370        | ...XXxxXx.......... |
| |   ```
| |
| |   mnesia:match_object/3 uses a table index if it exists, but in the case of rabbit_topic_trie_binding, there is no table index, so a full table scan used to be performed.
| |
| |   For more context, see #1513
| |
| |   Partner-in-crime: @essen
| * Improve background_gc docs in example config files (Michael Klishin, 2018-03-21, 2 files, -4/+16)
| |   To make it clearer that memory breakdown analysis must be performed. "Don't guess, collect data".
| |
| |   (cherry picked from commit bb36e18a0ab25117ca8ccf3aecd58bdd3985006b)
| * Make is_booted function compatible with pre-3.7.4 remote nodes (Daniil Fedotov, 2018-03-16, 1 file, -18/+9)
| |   (cherry picked from commit fc8abae47427eb9ae4347177cc23dd9d4e10ba35)
| * Check that default vhost is started on all nodes after restart (Daniil Fedotov, 2018-03-16, 1 file, -1/+2)
| |   (cherry picked from commit 92df428dd6f46aaf6ed69ffa9f79da37ed8aefe5)
| * Compile from scratch (Michael Klishin, 2018-03-16, 1 file, -5/+5)
| |   (cherry picked from commit 117ad73be2557e93bb9cc9bbe123337aaa98c192)
| * Include current node to the to-start list (Michael Klishin, 2018-03-16, 1 file, -4/+4)
| |   (cherry picked from commit 1ef32474b0bd414e157411df17744f5fc7e3e5a6)
| * Wording (Michael Klishin, 2018-03-16, 1 file, -1/+1)
| |   (cherry picked from commit a4cb92a27dd93903417e2de59abb9bb9dff77571)
| * Clarify (Michael Klishin, 2018-03-16, 1 file, -1/+5)
| |   (cherry picked from commit f14bf763ebdff5524d43e214ef9ced817939e088)
| * Test concurrent application start with no data. (Daniil Fedotov, 2018-03-16, 1 file, -1/+20)
| |   (cherry picked from commit 8b6d0ef7450754d47335e6fd5267236b957a0870)
| * Do not try to start a vhost supervisors on not fully booted nodes. (Daniil Fedotov, 2018-03-16, 2 files, -5/+24)
| |   Sometimes when several nodes are started at the same time, add_vhost can try to start a remote vhost supervisor on a node which does not have a rabbit_vhost_sup_sup process yet, resulting in a `{error,rabbit_vhost_sup_sup_not_running}` error.
| |
| |   Filter only fully booted nodes to start remote vhost supervisors.
| |
| |   (cherry picked from commit 52d0c1aed0d5b532765d6efd0ed92b393d821322)
| * Add special case in handle_other for normal TCP port exit (Luke Bakken, 2018-03-16, 2 files, -24/+48)
| |   Handle noport at epmd monitor startup
| |   Handle EXIT from TCP port more gracefully
| |   Ensure that Parent pid is matched
| |
| |   (cherry picked from commit e8d492b75e5c5f6c70f9ea8290a0c8a06362181a)
| * Wording, compile from scratch (Michael Klishin, 2018-03-15, 1 file, -5/+7)
| |   (cherry picked from commit b30ae2f90c27c52c4e9db5f3845d37e441c6e371)
| * Make error message when refusing to delete non-empty message less radical. (Daniil Fedotov, 2018-03-15, 1 file, -3/+3)
| |   (cherry picked from commit a82bdadc031bab0f607e5182a5b7325107386202)
| * Force-delete queues, which have no live master or slave processes. (Daniil Fedotov, 2018-03-15, 2 files, -2/+87)
| |   Fixes #1501
| |   [#155801556]
| |
| |   If a queue is configured to not be promoted (via ha-promote-on-shutdown: when-synced), queue.delete can hang. Make it check for process existence first and force-delete if no master or slave processes are running.
| |
| |   Do not force-delete if if_empty is set, since there is no way to check that the queue is empty.
| |
| |   (cherry picked from commit 3e7bd564bda36c1bbb9e3b59b61509d0982a88ec)
| * Merge pull request #1539 from rabbitmq/rabbitmq-server-1538 (Arnaud Cogoluègnes, 2018-03-14, 1 file, -14/+24)
| |\    Check process before getting amqp_params
* | \ Merge pull request #1567 from rabbitmq/revert-1527-reduce-mnesia-contention-when-nodes-restart-master (Michael Klishin, 2018-03-28, 4 files, -65/+45)
|\ \ \    Revert "Reduce lock contention when nodes restart (master)"
| * | | Revert "Reduce lock contention when nodes restart (master)" (Michael Klishin, 2018-03-28, 4 files, -65/+45)
|/ / /
* | | Merge pull request #1527 from rabbitmq/reduce-mnesia-contention-when-nodes-restart-master (Michael Klishin, 2018-03-28, 4 files, -45/+65)
|\ \ \    Reduce lock contention when nodes restart (master)
| * | | Run binding deletions in a Mnesia transaction (Gerhard Lazu, 2018-03-28, 1 file, -5/+12)
| | | |   Otherwise, if there is more than 1 node that runs rabbit_node_monitor:on_node_down/1, there will be `{aborted,no_transaction}` errors.
| | | |
| | | |   {{aborted,no_transaction},
| | | |    [{mnesia,abort,1,[{file,"mnesia.erl"},{line,351}]},
| | | |     {rabbit_exchange_type_topic,'-remove_bindings/3-lc$^0/1-0-',1,
| | | |      [{file,"src/rabbit_exchange_type_topic.erl"},{line,78}]},
| | | |     {rabbit_exchange_type_topic,remove_bindings,3,
| | | |      [{file,"src/rabbit_exchange_type_topic.erl"},{line,78}]},
| | | |     {rabbit_binding,x_callback,4,[{file,"src/rabbit_binding.erl"},{line,570}]},
| | | |     {rabbit_binding,'-process_deletions/2-fun-0-',2,
| | | |      [{file,"src/rabbit_binding.erl"},{line,547}]},
| | | |     {dict,map_bucket,2,[{file,"dict.erl"},{line,481}]},
| | | |     {dict,map_bkt_list,2,[{file,"dict.erl"},{line,477}]},
| | | |     {dict,map_bkt_list,2,[{file,"dict.erl"},{line,477}]}]}
| | | |
| | | |   Partner-in-crime: @essen
| * | | Delete metrics for all deleted queues in a single operation (Gerhard Lazu, 2018-03-28, 1 file, -47/+40)
| | | |   Rather than calling `rabbit_core_metrics:delete_queue/1` for every queue, collect all deleted queues and delete all their metrics in a single operation.
| | | |
| | | |   We don't use a single Mnesia transaction to delete all objects related to a queue and this may be a problem, but we haven't run this version of the code long enough to know for sure. What should we be looking out for @michaelklishin?
| | | |
| | | |   For initial context, see #1513
| | | |
| | | |   Partner-in-crime: @essen
| * | | Group queue deletions on_node_down into 10 operations per transaction (Gerhard Lazu, 2018-03-28, 1 file, -18/+16)
| | | |   When many queues are being deleted, we believe that it's faster to have fewer Mnesia transactions and therefore group 10 queue deletions into a single Mnesia transaction. This number (10) is arbitrary, we didn't try with a different number. Creating 1 Mnesia transaction for every queue deletion feels like too many transactions, and having a single Mnesia transaction for all queue deletions is too few transactions. This felt like a sensible option.
| | | |
| | | |   We cannot determine if this is a good change because rabbit_core_metrics:queue_deleted/1 takes the most time and obscures all observations. According to qcachegrind, rabbit_misc:execute_mnesia_transaction/1 takes 1.8s while rabbit_core_metrics:queue_deleted/1 takes 132s, out of which ets:select/2 takes 131s.
| | | |
| | | |   How can we optimise rabbit_core_metrics:queue_deleted/1? We are thinking that rather than calling ets:select/2 twice for every queue, we should call it twice for all queues that need to be deleted. We don't know whether this is possible. Alternatively, we might look into ets:first/1 & ets:next/2 to iterate over the entire table ONCE with all the queues that have been deleted. Thoughts @dcorbacho @michaelklishin?
| | | |
| | | |   For initial context, see #1513
| | | |
| | | |   Partner-in-crime: @essen
| * | | Add back INTERNAL_USER info to on_node_down function (Gerhard Lazu, 2018-03-28, 1 file, -2/+4)
| | | |
| * | | Split single Mnesia transaction that runs on_node_down (Gerhard Lazu, 2018-03-28, 1 file, -25/+59)
| | | |   Rather than using a single Mnesia transaction to clean queues when a node goes down, split it into many transactions so that other Mnesia operations can make progress. This is especially important when nodes join the cluster, since Mnesia on the newly started node will not be able to synchronise if there is a long-running Mnesia transaction.
| | | |
| | | |   This commit is not complete, we need feedback on the comments left in the code before we can settle on a final version that can be merged.
| | | |
| | | |   For more context, see #1513
| | | |
| | | |   Partner-in-crime: @essen
| * | | Wait at most 5 secs for a node to reply to rabbit_node_monitor:partitions/0 (Gerhard Lazu, 2018-03-28, 1 file, -1/+2)
| | | |   When rabbit_node_monitor that runs on an alive node is busy cleaning up after another node goes down, the process might not respond for a long time. Since the Management API calls rabbit_node_monitor:partitions/0 and blocks until this function returns, the Management UI will fail to load for a really long time: clients will time out before this function returns.
| | | |
| | | |   This change will make rabbit_node_monitor:partitions/0 time out after 5 seconds so that the Management API returns in a timely manner and the Management UI blocks for an extra 5 seconds at most. As a result, node metrics may be outdated, which is not ideal, but at least the users will have some feedback and will be able to perform actions via the Management UI / API.
| | | |
| | | |   For more context, see #1513
| | | |
| | | |   Partner-in-crime: @essen
| * | | Do not read exchange & queue before deleting, simply delete (Gerhard Lazu, 2018-03-28, 2 files, -12/+2)
| | | |   Doing a wread operation requires a table write lock. Since a delete requires a table write lock as well, we have 2 write locks instead of 1. When Mnesia is under load, this can result in many lock collisions and slow down the entire set of operations.
| | | |
| | | |   We think that it's better to leverage fast SSDs / NVMEs and defer the Mnesia disk log write optimisation to the filesystem.
| | | |
| | | |   For more context, see #1513
| | | |
| | | |   Partner-in-crime: @essen
| * | | Delete bindings from mnesia without full table scan (Gerhard Lazu, 2018-03-28, 1 file, -7/+2)
|/ / /    mnesia:match_object/3 scans the entire table and can take many seconds on a loaded node. This is especially bad when there are many bindings which need to be deleted. If the object is in the table then delete it, otherwise carry on.
| | |
| | |   Prior to this change, it was observed that there is a high % of lock collisions in rabbit_topic_trie_binding table:
| | |
| | |   ```
| | |   lock    id                         #tries   #collisions  collisions [%]  time [us]  duration [%]  histogram [log2(us)]
| | |   -----   ---                        -------  ------------ --------------- ---------- ------------- ---------------------
| | |   db_tab  rabbit_topic_trie_binding  258465   13627        5.2723          1389904    0.0370        | ...XXxxXx.......... |
| | |   ```
| | |
| | |   mnesia:match_object/3 uses a table index if it exists, but in the case of rabbit_topic_trie_binding, there is no table index, so a full table scan used to be performed.
| | |
| | |   For more context, see #1513
| | |
| | |   Partner-in-crime: @essen
* | | Remove a test that need reworking to be more predictable (Michael Klishin, 2018-03-27, 1 file, -25/+1)
| | |   It always passes locally and almost never in CI. We should consider testing the key code path used to seed the database in more isolation.
* | | Cuttlefish schema: wording (Michael Klishin, 2018-03-21, 1 file, -3/+3)
| | |   [ci skip]
* | | Improve background_gc docs in example config files (Michael Klishin, 2018-03-21, 2 files, -4/+16)
| | |   To make it clearer that memory breakdown analysis must be performed. "Don't guess, collect data".
* | | Merge pull request #1556 from rabbitmq/vhost-sup-race (Michael Klishin, 2018-03-16, 3 files, -5/+39)
|\ \ \    Do not try to start a vhost supervisors on not fully booted nodes.
| * | | Make is_booted function compatible with pre-3.7.4 remote nodes (Daniil Fedotov, 2018-03-16, 1 file, -18/+9)
| | | |
| * | | Check that default vhost is started on all nodes after restart (Daniil Fedotov, 2018-03-16, 1 file, -1/+2)
| | | |