| Commit message (Collapse) | Author | Age | Files | Lines |
| |
|
|
| |
References rabbitmq/rabbitmq-peer-discovery-k8s#23.
|
| | |
|
| | |
|
| | |
|
| |\
| |
| | |
rabbit_parameter_validation: support maps in proplist validator
|
| |/
|
|
|
| |
Part of rabbitmq/rabbitmq-federation#73, references rabbitmq/rabbitmq-federation#70,
rabbitmq/rabbitmq-federation#67.
|
| | |
|
| | |
|
| |\ |
|
| | | |
|
| | |\
|/ /
| |
| | |
reduce-mnesia-contention-when-nodes-restart-master
|
| | |
| | |
Otherwise, if there is more than 1 node that runs
rabbit_node_monitor:on_node_down/1, there will be
`{aborted,no_transaction}` errors:
{{aborted,no_transaction},
[{mnesia,abort,1,[{file,"mnesia.erl"},{line,351}]},
{rabbit_exchange_type_topic,'-remove_bindings/3-lc$^0/1-0-',1,
[{file,"src/rabbit_exchange_type_topic.erl"},
{line,78}]},
{rabbit_exchange_type_topic,remove_bindings,3,
[{file,"src/rabbit_exchange_type_topic.erl"},
{line,78}]},
{rabbit_binding,x_callback,4,[{file,"src/rabbit_binding.erl"},{line,570}]},
{rabbit_binding,'-process_deletions/2-fun-0-',2,
[{file,"src/rabbit_binding.erl"},{line,547}]},
{dict,map_bucket,2,[{file,"dict.erl"},{line,481}]},
{dict,map_bkt_list,2,[{file,"dict.erl"},{line,477}]},
{dict,map_bkt_list,2,[{file,"dict.erl"},{line,477}]}]}
Partner-in-crime: @essen
|
| | |
| | |
Rather than calling `rabbit_core_metrics:delete_queue/1` for every
queue, collect all deleted queues and delete all their metrics in a
single operation.
We don't use a single Mnesia transaction to delete all objects related
to a queue and this may be a problem, but we haven't run this version of
the code long enough to know for sure. What should we be looking out for
@michaelklishin?
For initial context, see #1513
Partner-in-crime: @essen
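A minimal sketch of the "single operation" idea, not the code from this commit. It assumes a hypothetical ETS table whose rows are keyed by `{QueueName, Other}`; the module name, table name and key shape are all made up for illustration. One `ets:select_delete/2` call carries one match-spec clause per deleted queue, so the table is traversed once rather than once per queue.

```erlang
-module(bulk_metrics_delete_sketch).
-export([delete_for_queues/2, demo/0]).

%% Delete every row whose key is {QueueName, _} for any of the given
%% queue names, using a single ets:select_delete/2 call.
%% The {QueueName, Other} key shape is an assumption of this sketch.
delete_for_queues(_Table, []) ->
    %% Nothing to delete; avoid passing an empty match spec.
    0;
delete_for_queues(Table, QueueNames) ->
    %% One match-spec clause per deleted queue. The body must be 'true'
    %% for ets:select_delete/2 to remove the matching row.
    MatchSpec = [{{{Q, '_'}, '_'}, [], [true]} || Q <- QueueNames],
    ets:select_delete(Table, MatchSpec).

%% Small self-contained demonstration on a throwaway table.
demo() ->
    T = ets:new(sketch_metrics, [set, public]),
    true = ets:insert(T, [{{q1, ch1}, 10}, {{q2, ch1}, 20}, {{q3, ch2}, 30}]),
    Deleted = delete_for_queues(T, [q1, q2]),
    Remaining = ets:tab2list(T),
    true = ets:delete(T),
    {Deleted, Remaining}.
```

`delete_for_queues/2` returns the number of rows removed, which a caller could log to confirm that the cleanup touched what it expected.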
|
| | |
| | |
When many queues are being deleted, we believe that it's faster to have
fewer Mnesia transactions and therefore group 10 queue deletions into a
single Mnesia transaction. This number (10) is arbitrary; we didn't try
a different number. Creating 1 Mnesia transaction for every queue
deletion feels like too many transactions, and having a single Mnesia
transaction for all queue deletions is too few. This felt
like a sensible option.
We cannot determine if this is a good change because
rabbit_core_metrics:queue_deleted/1 takes the most time and obscures all
observations. According to qcachegrind,
rabbit_misc:execute_mnesia_transaction/1 takes 1.8s while
rabbit_core_metrics:queue_deleted/1 takes 132s out of which ets:select/2
takes 131s.
How can we optimise rabbit_core_metrics:queue_deleted/1? We are
thinking that rather than calling ets:select/2 twice for every queue, we
should call it twice for all queues that need to be deleted. We don't
know whether this is possible. Alternatively, we might look into
ets:first/1 & ets:next/2 to iterate over the entire table ONCE with all
the queues that have been deleted. Thoughts, @dcorbacho @michaelklishin?
For initial context, see #1513
Partner-in-crime: @essen
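A minimal sketch of the batching described above, not the code from this commit. `chunk/2` and `run_in_batches/3` are hypothetical helper names; `TxFun` stands in for whatever wraps one batch in a transaction (in the broker that would presumably go through rabbit_misc:execute_mnesia_transaction/1, but here it is just a fun so the sketch runs on its own).

```erlang
-module(batch_sketch).
-export([chunk/2, run_in_batches/3]).

%% Split List into sublists of at most N elements, preserving order.
chunk(N, List) when is_integer(N), N > 0, is_list(List) ->
    chunk(N, List, []).

chunk(_N, [], Acc) ->
    lists:reverse(Acc);
chunk(N, List, Acc) ->
    {Batch, Rest} = take(N, List, []),
    chunk(N, Rest, [Batch | Acc]).

%% Take up to N elements from the front of the list.
take(0, Rest, Acc) -> {lists:reverse(Acc), Rest};
take(_N, [], Acc) -> {lists:reverse(Acc), []};
take(N, [H | T], Acc) -> take(N - 1, T, [H | Acc]).

%% Apply TxFun once per batch and return the per-batch results in order.
%% TxFun is an assumption: it receives the list of items in one batch.
run_in_batches(BatchSize, Items, TxFun) when is_function(TxFun, 1) ->
    [TxFun(Batch) || Batch <- chunk(BatchSize, Items)].
```

Keeping the batch size as a parameter makes it straightforward to try values other than 10 once the metrics cleanup no longer dominates the profile.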
|
| | | |
|
| | |
| | |
Rather than using a single Mnesia transaction to clean queues when a
node goes down, split it into many transactions so that other Mnesia
operations can make progress. This is especially important when nodes
join the cluster, since Mnesia on the newly started node will not be
able to synchronise if there is a long-running Mnesia transaction.
This commit is not complete: we need feedback on the comments left in
the code before we can settle on a final version that can be merged.
For more context, see #1513
Partner-in-crime: @essen
|
| | |
| | |
When the rabbit_node_monitor process on a live node is busy cleaning up
after another node goes down, it might not respond for a long
time.
Since the Management API calls rabbit_node_monitor:partitions/0 and
blocks until this function returns, the Management UI will fail to load
for a really long time: clients will time out before this function
returns.
This change makes rabbit_node_monitor:partitions/0 time out after 5
seconds so that the Management API returns in a timely manner and the
Management UI blocks for an extra 5 seconds at most. As a consequence,
node metrics may be outdated, which is not ideal, but
at least users will get some feedback and will be able to perform
actions via the Management UI / API.
For more context, see #1513
Partner-in-crime: @essen
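A minimal sketch of a bounded call, not the code from this commit. The module below is a hypothetical stand-in server that sleeps before replying; the point is the `gen_server:call/3` timeout and the `exit:{timeout, _}` clause. Returning `[]` when the call times out is an assumption made for the sketch, and the value the real function falls back to may differ.

```erlang
-module(call_timeout_sketch).
-behaviour(gen_server).
-export([start/1, partitions/2]).
-export([init/1, handle_call/3, handle_cast/2]).

%% Start a stand-in server that sleeps DelayMs before answering.
start(DelayMs) ->
    gen_server:start(?MODULE, DelayMs, []).

%% Ask the server for its partitions, giving up after TimeoutMs.
%% Falling back to [] on timeout is an assumption of this sketch.
partitions(Server, TimeoutMs) ->
    try
        gen_server:call(Server, partitions, TimeoutMs)
    catch
        exit:{timeout, _} -> []
    end.

init(DelayMs) ->
    {ok, DelayMs}.

%% Simulate a busy process by sleeping before the reply is sent.
handle_call(partitions, _From, DelayMs) ->
    timer:sleep(DelayMs),
    {reply, [some_node@host], DelayMs}.

handle_cast(_Msg, State) ->
    {noreply, State}.
```

The caller gets an answer within the timeout either way, at the cost of possibly reporting stale or empty partition data while the server is busy.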
|
| | |
| | |
Doing a wread operation requires a table write lock. Since a delete
requires a table write lock as well, we have 2 write locks instead of 1.
When Mnesia is under load, this can result in many lock collisions and
slow down the entire set of operations. We think that it's better to
leverage fast SSDs / NVMEs and defer the Mnesia disk log write
optimisation to the filesystem.
For more context, see #1513
Partner-in-crime: @essen
|
| | |
| | |
mnesia:match_object/3 scans the entire table and can take many seconds
on a loaded node. This is especially bad when there are many bindings
which need to be deleted. If the object is in the table then delete it,
otherwise carry on.
Prior to this change, we observed a high percentage of lock
collisions in the rabbit_topic_trie_binding table:
```
lock    id                         #tries   #collisions   collisions [%]   time [us]   duration [%]   histogram [log2(us)]
-----   ---                        -------  ------------  ---------------  ----------  -------------  ---------------------
db_tab  rabbit_topic_trie_binding  258465   13627         5.2723           1389904     0.0370         | ...XXxxXx.......... |
```
mnesia:match_object/3 uses a table index if it exists, but in the case
of rabbit_topic_trie_binding, there is no table index, so a full table
scan used to be performed.
For more context, see #1513
Partner-in-crime: @essen
|
| | |
| | |
To make it clearer that memory breakdown analysis must
be informed by data. "Don't guess, collect data".
(cherry picked from commit bb36e18a0ab25117ca8ccf3aecd58bdd3985006b)
|
| | |
| |
| |
| | |
(cherry picked from commit fc8abae47427eb9ae4347177cc23dd9d4e10ba35)
|
| | |
| |
| |
| | |
(cherry picked from commit 92df428dd6f46aaf6ed69ffa9f79da37ed8aefe5)
|
| | |
| |
| |
| | |
(cherry picked from commit 117ad73be2557e93bb9cc9bbe123337aaa98c192)
|
| | |
| |
| |
| | |
(cherry picked from commit 1ef32474b0bd414e157411df17744f5fc7e3e5a6)
|
| | |
| |
| |
| | |
(cherry picked from commit a4cb92a27dd93903417e2de59abb9bb9dff77571)
|
| | |
| |
| |
| | |
(cherry picked from commit f14bf763ebdff5524d43e214ef9ced817939e088)
|
| | |
| |
| |
| | |
(cherry picked from commit 8b6d0ef7450754d47335e6fd5267236b957a0870)
|
| | |
| | |
Sometimes, when several nodes are started at the same time, add_vhost
can try to start a remote vhost supervisor on a node that does
not have a rabbit_vhost_sup_sup process yet, resulting in an
`{error,rabbit_vhost_sup_sup_not_running}` error.
Start remote vhost supervisors only on fully booted nodes.
(cherry picked from commit 52d0c1aed0d5b532765d6efd0ed92b393d821322)
|
| | |
| | |
Handle noport at epmd monitor startup
Handle EXIT from TCP port more gracefully
Ensure that Parent pid is matched
(cherry picked from commit e8d492b75e5c5f6c70f9ea8290a0c8a06362181a)
|
| | |
| |
| |
| | |
(cherry picked from commit b30ae2f90c27c52c4e9db5f3845d37e441c6e371)
|
| | |
| |
| |
| | |
(cherry picked from commit a82bdadc031bab0f607e5182a5b7325107386202)
|
| | |
| | |
Fixes #1501
[#155801556]
If a queue is configured to not be promoted (via ha-promote-on-shutdown: when-synced),
queue.delete can hang. Make it check for process existence first and
force-delete if no master or slave processes are running.
Do not force-delete if if_empty is set, since there is no
way to check that the queue is empty.
(cherry picked from commit 3e7bd564bda36c1bbb9e3b59b61509d0982a88ec)
|
| | |\
| | |
| | | |
Check process before getting amqp_params
|
| |\ \ \
| | | |
| | | |
| | | |
| | | | |
rabbitmq/revert-1527-reduce-mnesia-contention-when-nodes-restart-master
Revert "Reduce lock contention when nodes restart (master)"
|
| |/ / / |
|
| |\ \ \
| | | |
| | | |
| | | |
| | | | |
rabbitmq/reduce-mnesia-contention-when-nodes-restart-master
Reduce lock contention when nodes restart (master)
|
| | | | |
| | | | |
Otherwise, if there is more than 1 node that runs
rabbit_node_monitor:on_node_down/1, there will be
`{aborted,no_transaction}` errors:
{{aborted,no_transaction},
[{mnesia,abort,1,[{file,"mnesia.erl"},{line,351}]},
{rabbit_exchange_type_topic,'-remove_bindings/3-lc$^0/1-0-',1,
[{file,"src/rabbit_exchange_type_topic.erl"},
{line,78}]},
{rabbit_exchange_type_topic,remove_bindings,3,
[{file,"src/rabbit_exchange_type_topic.erl"},
{line,78}]},
{rabbit_binding,x_callback,4,[{file,"src/rabbit_binding.erl"},{line,570}]},
{rabbit_binding,'-process_deletions/2-fun-0-',2,
[{file,"src/rabbit_binding.erl"},{line,547}]},
{dict,map_bucket,2,[{file,"dict.erl"},{line,481}]},
{dict,map_bkt_list,2,[{file,"dict.erl"},{line,477}]},
{dict,map_bkt_list,2,[{file,"dict.erl"},{line,477}]}]}
Partner-in-crime: @essen
|
| | | | |
| | | | |
Rather than calling `rabbit_core_metrics:delete_queue/1` for every
queue, collect all deleted queues and delete all their metrics in a
single operation.
We don't use a single Mnesia transaction to delete all objects related
to a queue and this may be a problem, but we haven't run this version of
the code long enough to know for sure. What should we be looking out for
@michaelklishin?
For initial context, see #1513
Partner-in-crime: @essen
|
| | | | |
| | | | |
When many queues are being deleted, we believe that it's faster to have
fewer Mnesia transactions and therefore group 10 queue deletions into a
single Mnesia transaction. This number (10) is arbitrary; we didn't try
a different number. Creating 1 Mnesia transaction for every queue
deletion feels like too many transactions, and having a single Mnesia
transaction for all queue deletions is too few. This felt
like a sensible option.
We cannot determine if this is a good change because
rabbit_core_metrics:queue_deleted/1 takes the most time and obscures all
observations. According to qcachegrind,
rabbit_misc:execute_mnesia_transaction/1 takes 1.8s while
rabbit_core_metrics:queue_deleted/1 takes 132s out of which ets:select/2
takes 131s.
How can we optimise rabbit_core_metrics:queue_deleted/1? We are
thinking that rather than calling ets:select/2 twice for every queue, we
should call it twice for all queues that need to be deleted. We don't
know whether this is possible. Alternatively, we might look into
ets:first/1 & ets:next/2 to iterate over the entire table ONCE with all
the queues that have been deleted. Thoughts, @dcorbacho @michaelklishin?
For initial context, see #1513
Partner-in-crime: @essen
|
| | | | | |
|
| | | | |
| | | | |
Rather than using a single Mnesia transaction to clean queues when a
node goes down, split it into many transactions so that other Mnesia
operations can make progress. This is especially important when nodes
join the cluster, since Mnesia on the newly started node will not be
able to synchronise if there is a long-running Mnesia transaction.
This commit is not complete: we need feedback on the comments left in
the code before we can settle on a final version that can be merged.
For more context, see #1513
Partner-in-crime: @essen
|
| | | | |
| | | | |
When the rabbit_node_monitor process on a live node is busy cleaning up
after another node goes down, it might not respond for a long
time.
Since the Management API calls rabbit_node_monitor:partitions/0 and
blocks until this function returns, the Management UI will fail to load
for a really long time: clients will time out before this function
returns.
This change makes rabbit_node_monitor:partitions/0 time out after 5
seconds so that the Management API returns in a timely manner and the
Management UI blocks for an extra 5 seconds at most. As a consequence,
node metrics may be outdated, which is not ideal, but
at least users will get some feedback and will be able to perform
actions via the Management UI / API.
For more context, see #1513
Partner-in-crime: @essen
|
| | | | |
| | | | |
Doing a wread operation requires a table write lock. Since a delete
requires a table write lock as well, we have 2 write locks instead of 1.
When Mnesia is under load, this can result in many lock collisions and
slow down the entire set of operations. We think that it's better to
leverage fast SSDs / NVMEs and defer the Mnesia disk log write
optimisation to the filesystem.
For more context, see #1513
Partner-in-crime: @essen
|
| |/ / /
| | | |
mnesia:match_object/3 scans the entire table and can take many seconds
on a loaded node. This is especially bad when there are many bindings
which need to be deleted. If the object is in the table then delete it,
otherwise carry on.
Prior to this change, we observed a high percentage of lock
collisions in the rabbit_topic_trie_binding table:
```
lock    id                         #tries   #collisions   collisions [%]   time [us]   duration [%]   histogram [log2(us)]
-----   ---                        -------  ------------  ---------------  ----------  -------------  ---------------------
db_tab  rabbit_topic_trie_binding  258465   13627         5.2723           1389904     0.0370         | ...XXxxXx.......... |
```
mnesia:match_object/3 uses a table index if it exists, but in the case
of rabbit_topic_trie_binding, there is no table index, so a full table
scan used to be performed.
For more context, see #1513
Partner-in-crime: @essen
|
| | | |
| | |
| | |
| | |
| | | |
It always passes locally and almost never in CI. We should consider
testing the key code path used to seed the database in more isolation.
|
| | | |
| | |
| | |
| | | |
[ci skip]
|
| | | |
| | |
| | |
| | |
| | | |
To make it clearer that memory breakdown analysis must
be informed by data. "Don't guess, collect data".
|
| |\ \ \
| | | |
| | | | |
Do not try to start vhost supervisors on nodes that are not fully booted.
|
| | | | | |
|
| | | | | |
|