Commit message log
Wait until node detects new cluster configuration
CI has failed with an Mnesia error where the rabbit_queue table
doesn't exist. The actual logs don't show any error on the remaining
node, so let's assume it is Mnesia detecting the other node going
down. This really shouldn't happen, but I can't reproduce it either.
unit_log_management_SUITE: Simplify code of `log_file_fails_to_initialise_during_startup`
Also, add more log messages to help us debug this testcase when it
fails.
We run some of them up to 20 times (!!!) to make sure that they succeed.
- They are not helping anyone in their current state
- I don't have enough context to be able to fix them
- I need to stay focused on the current task and cannot afford to context switch
- Feel free to fix them if they are important; otherwise leave them deleted
cc @michaelklishin @dumbbell
Signed-off-by: Gerhard Lazu <gerhard@lazu.co.uk>
(cherry picked from commit a835d3271680ad6db5663f504f08fd0db4ee21c2)
rabbit_feature_flags: Multiple fixes and optimizations to get rid of race conditions
We need to handle concurrent calls to this function to avoid any issues
with parallel read/modify/write operations.
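As a general illustration of why unserialized read/modify/write sequences lose updates, here is a minimal Python sketch (purely illustrative; the actual fix lives in the Erlang function this commit touches):

```python
import threading

counter = {"value": 0}
lock = threading.Lock()

def unsafe_increment():
    # Read/modify/write without coordination: two concurrent callers
    # can both read the same value, and one update is then lost.
    v = counter["value"]
    counter["value"] = v + 1

def safe_increment():
    # Serializing the whole read/modify/write sequence makes it safe.
    with lock:
        counter["value"] += 1
```

The same principle applies whatever the shared state is: either serialize the callers, or detect the conflict and retry.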
Before we query all the details needed to generate a new registry, we
save the version of the currently loaded one, using the `vsn` Erlang
module attribute added by the compiler.
When the new registry is ready to be loaded, we check the version of
the loaded one again: if it differs, a concurrent process reloaded the
registry. In this case, we restart the entire regen procedure,
including the query of fresh details. The goal is to avoid any loss of
information from the existing registry.
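This check-and-retry loop is a form of optimistic concurrency control. A minimal Python sketch of the idea (the helper names are hypothetical, not the actual `rabbit_feature_flags` API):

```python
def regen_registry(get_loaded_vsn, query_details, build_and_load):
    """Regenerate a registry, restarting from scratch whenever a
    concurrent process reloaded it while we were gathering data."""
    while True:
        # Save the version of the currently loaded registry.
        vsn_before = get_loaded_vsn()
        # Query all the details needed to generate the new registry
        # (slow: other processes may reload the registry meanwhile).
        details = query_details()
        # Just before loading, check the version again.
        if get_loaded_vsn() == vsn_before:
            build_and_load(details)
            return details
        # The version changed, so a concurrent reload happened:
        # restart the entire procedure with fresh details to avoid
        # losing any information from the existing registry.
```

In the Erlang implementation the version comes for free from the compiler-generated `vsn` module attribute.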
By calling code:delete() ourselves, we created a time window, between
that call and the following code:load_binary(), where no registry
module is loaded.
This meant that a concurrent access to the registry would trigger a
load of the initial, uninitialized registry module from disk. That
module would then trigger a reload itself, leading to a possible
deadlock.
In fact, in code:load_binary(), the code server already takes care of
replacing the module atomically, so we don't need to do anything
ourselves.
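The bug pattern is generic: an explicit "remove, then install" sequence opens a window where nothing is installed, whereas a single swap does not. A toy Python sketch (illustrative only; the real fix is simply to rely on the atomic replacement done by the code server in code:load_binary()):

```python
modules = {"registry": "registry_v1"}

def replace_with_window(name, new_code):
    # BAD: mirrors code:delete() + code:load_binary(). Between the
    # two statements, a concurrent reader finds no module at all.
    del modules[name]
    modules[name] = new_code

def replace_atomically(name, new_code):
    # GOOD: a single swap; readers see either the old module or the
    # new one, never an empty slot.
    modules[name] = new_code
```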
Now that the feature flags subsystem uses its own log category, we need
to configure the corresponding handler.
As of this commit, it shows two issues with the current implementation
of the feature flags registry reloading:
* There is a small time window during which no registry module is
  loaded while it is being replaced.
* The newer registry might be initialized with older data.
Follow-up commits will address them.
Why did I write this in French in the first place?
While here, fix a typo in the comment nearby.
... from unknown applications were discovered.
In particular, this was triggering more queries to remote nodes: more
time spent regenerating the same registry and more potential for errors.
Fixes race conditions on rabbit_fifo_int
A few testcases are time-dependent. Instead of waiting a
predetermined amount of time for the Ra events, this PR waits
for a specific number of events. This should remove most false
negatives detected on CI, even though we still have timeouts -
just much longer ones!
Fix flaky test - connection tracking is asynchronous
Such cases are best tested using other tools. These cases
are highly flaky and prevent pipeline progress. We believe
it's the tests that are timing-sensitive.
These may be replaced with e.g. additional Jepsen tests as needed
in the future.
Remove Ra segment_max_entries override
So that it uses Ra's internal default of 4096 instead, which is safer
for larger message sizes.
rabbitmq/handle-deadlocks-in-peer_discovery_classic_config_SUITE
peer_discovery_classic_config_SUITE: Handle deadlocks
... when nodes are waiting for each other to finish Mnesia
initialization.
So if the success condition is not met, we reset and restart all nodes
except the first one to trigger peer discovery again. We check the
success condition after that.
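Sketched generically in Python (the names are illustrative; the real logic lives in the suite's Erlang helpers):

```python
def converge_cluster(nodes, clustered_ok, reset_and_restart, max_attempts=5):
    """Retry peer discovery: if the nodes did not manage to form a
    cluster (e.g. they deadlocked waiting on each other's Mnesia
    initialization), reset and restart every node except the first
    one, then check the success condition again."""
    for attempt in range(max_attempts):
        if clustered_ok(nodes):
            return attempt
        for node in nodes[1:]:  # leave the seed node untouched
            reset_and_restart(node)
    raise RuntimeError("cluster never converged")
```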
The raft.* conf parameters only take effect at the next available
opportunity and won't affect in-flight data.
rabbitmq/work-around-cli-circular-dep-in-feature_flags_SUITE
feature_flags_SUITE: Work around CLI/rabbitmq-server circular dependency
We need to copy `rabbit` into the `my_plugins` plugins directory, not
because `my_plugin` depends on it, but because the CLI erroneously
depends on the broker.
This can't be fixed easily because it is a circular dependency
(i.e. the broker also depends on the CLI). So until a proper solution
is implemented, keep this second copy of the broker for the CLI to find.
rabbitmq/lrb-fix-flaky-peer_discovery_classic_config-test
Ensure randomized_startup_delay_range custom value is used
This ensures the test completes within the 90-second time limit.
https://pivotal.slack.com/archives/C055BSG8E/p1585840790221000
https://github.com/rabbitmq/rabbitmq-server/commit/609501c46d7e18a7ea103bfa0188e73c3c4fc951#commitcomment-38268475
rabbit_fifo: set timer when publish rejected
and no current leader is known, so that we can retry after a timeout
in single active consumer test
10s is not enough for CI.
clustering_management_SUITE: No need to stop node after start_app failure in `erlang_config`
... in `erlang_config`.
Since #2180, a failed `start_app` does not take the node down anymore.
Trying to restart the node just after was failing (because the node
was still there), but this remained unnoticed so far because the
return value of `start_node()` is not checked.
However, since
rabbitmq/rabbitmq-ct-helpers@c033d9272afaf3575505533c81f1c0c7cfcb6206,
the Make recipe which starts the node automatically stops it if the
start failed somewhere, in order to not leave an unwanted node around.
This means that after the failing
`rabbit_ct_broker_helpers:start_node()`, the node was effectively
stopped this time, causing the rest of the testcase to fail.
for up to N seconds
Depends on rabbitmq/rabbitmq-ct-helpers@98f1c4a8012c006965257f2875873bf9d08416bc
to make it clear that it is a mock-based unit test
Move rabbit_channel config value to config record
writer_gc_threshold is a static value and should be in the static
config record, not in the main channel record, which should only hold
mutable data fields.
as opposed to eacces
Increase wait timeouts in rabbit_fifo_int