Commit message  Author  Age  Files  Lines
* Merge pull request #2310 from rabbitmq/t-list-queues-online-and-offline  Michael Klishin  2020-04-10  1  -0/+5
|\ | | | | Wait until node detects new cluster configuration
| * Wait until node detects new cluster configuration  dcorbacho  2020-04-10  1  -0/+5
|/ | | | | | | CI has failed with an mnesia error where the rabbit_queue table doesn't exist. The actual logs don't show any error on the remaining node so let's assume that is mnesia detecting the other node going down. This really shouldn't happen, but I can't reproduce it either.
* Attempt to correct a test  Michael Klishin  2020-04-10  1  -3/+0
|
* Merge pull request #2307 from rabbitmq/fix-log_management_SUITE  Jean-Sébastien Pédron  2020-04-09  1  -41/+32
|\ | | | | unit_log_management_SUITE: Simplify code of `log_file_fails_to_initialise_during_startup`
| * unit_log_management_SUITE: Simplify code of `log_file_fails_to_initialise_during_startup`  Jean-Sébastien Pédron  2020-04-09  1  -41/+32
| | | | | | | | | | | | | | Also, add more log messages to help us debug this testcase when it fails.
* | Remove all ct-partition test flakes  Gerhard Lazu  2020-04-09  1  -480/+0
|/ | | | | | | | | | | | | | Some of them we run up to 20 times (!!!) to make sure that they succeed. - They are not helping anyone in the current state - I don't have enough context to be able to fix them - I need to stay focused on the current task, cannot afford to context switch - Feel free to fix it if it's important, otherwise leave it deleted cc @michaelklishin @dumbbell Signed-off-by: Gerhard Lazu <gerhard@lazu.co.uk> (cherry picked from commit a835d3271680ad6db5663f504f08fd0db4ee21c2)
* Merge pull request #2304 from rabbitmq/skip-reginit-if-no-ff-from-unknown-apps  Jean-Sébastien Pédron  2020-04-09  3  -45/+272
|\ | | | | rabbit_feature_flags: Multiple fixes and optimizations to get rid of race conditions
| * rabbit_feature_flags: Add a FIXME to try_to_write_enabled_feature_flags_list()  Jean-Sébastien Pédron  2020-04-09  1  -0/+2
| | | | | | | | | | We need to handle concurrent calls to this function to avoid any issues with parallel read/modify/write operations.
| * rabbit_feature_flags: Restart registry regen if it changed meanwhile  Jean-Sébastien Pédron  2020-04-09  1  -23/+70
| | | | | | | | | | | | | | | | | | | | | | | | Before we query all the details needed to generate a new registry, we save the version of the currently loaded one, using the `vsn` Erlang module attribute added by the compiler. When the new registry is ready to be loaded, we verify again the version of the loaded one: if it differs, it means a concurrent process reloaded the registry. In this case, we restart the entire regen procedure, including the query of fresh details. The goal is to avoid any loss of information from the existing registry.
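The restart-on-version-change idea described in this commit can be sketched as follows. This is an illustration in Python, not the Erlang code from the commit: the `Registry` class, `regenerate()` and `query_details` are hypothetical names, and an integer counter stands in for the `vsn` module attribute.

```python
import threading


class Registry:
    """Hypothetical stand-in for the currently loaded registry module."""

    def __init__(self):
        self._lock = threading.Lock()
        self.version = 0  # stand-in for the `vsn` module attribute
        self.data = {}

    def snapshot_version(self):
        with self._lock:
            return self.version

    def swap_if_unchanged(self, expected_version, new_data):
        """Load new_data only if nobody reloaded the registry meanwhile."""
        with self._lock:
            if self.version != expected_version:
                return False
            self.data = new_data
            self.version += 1
            return True


def regenerate(registry, query_details, max_attempts=10):
    """Redo the whole regen, including the query, until the swap succeeds.

    Returns the number of attempts that were needed.
    """
    for attempt in range(1, max_attempts + 1):
        seen = registry.snapshot_version()  # save the version first
        details = query_details()           # possibly slow, may race
        if registry.swap_if_unchanged(seen, details):
            return attempt
    raise RuntimeError("registry kept changing during regeneration")
```

Querying fresh details on every retry is the point of the design: data collected before a concurrent reload may already be stale, so it is discarded rather than merged.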
| * rabbit_feature_flags: Fix concurrent registry module reload  Jean-Sébastien Pédron  2020-04-09  1  -1/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | By calling code:delete() ourselves, we created a time window where there is no registry module loaded between that call and the following code:load_binary(). This meant that a concurrent access to the registry would trigger a load of the initial uninitialized registry module from disk. That module would then trigger a reload itself, leading to possible deadlock. In fact, in code:load_binary(), the code server already takes care of replacing the module atomically. We don't need to do anything.
| * rabbit_feature_flags: Log more details during registry reloading  Jean-Sébastien Pédron  2020-04-09  2  -7/+50
| |
| * feature_flags_SUITE: Fix Lager configuration  Jean-Sébastien Pédron  2020-04-09  1  -1/+6
| | | | | | | | | | Now that the feature flags subsystem uses its own log category, we need to configure the corresponding handler.
| * feature_flags_SUITE: New testcase to test concurrent registry loading  Jean-Sébastien Pédron  2020-04-09  2  -4/+117
| | | | | | | | | | | | | | | | | | | | As of this commit, it shows two issues with the current implementation of the feature flags registry reloading: * There is a small time window where there is no registry module loaded while it is replaced. * The newer registry might be initialized with older data. Follow-up commits will address them.
| * rabbit_feature_flags: Translate a comment to English  Jean-Sébastien Pédron  2020-04-09  1  -4/+4
| | | | | | | | | | | | Why did I write this in French in the first place? While here, fix a typo in the comment nearby.
| * rabbit_feature_flags: Don't re-initialize registry if no feature flags  Jean-Sébastien Pédron  2020-04-09  1  -5/+10
|/ | | | | | ... from unknown applications were discovered. In particular, this was triggering more queries to remote nodes: more time spent regenerating the same registry and more potential for errors.
* Merge pull request #2305 from rabbitmq/test-rabbit-fifo-int  Michael Klishin  2020-04-08  1  -73/+81
|\ | | | | Fixes race conditions on rabbit_fifo_int
| * Fixes race conditions on rabbit_fifo_int  dcorbacho  2020-04-07  1  -73/+81
| | | | | | | | | | | | | | | | A few testcases are time-dependent. Instead of waiting a predetermined amount of time for the ra events, this PR waits for a specific number of events. This should remove most false negatives detected on CI, even though we still have timeouts, albeit much longer ones.
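The difference between sleeping for a fixed time and waiting for a known number of events can be sketched as below. This is an illustration in Python, not the test code from the PR: `wait_for_events` is a hypothetical helper, and a `queue.Queue` stands in for the stream of ra events.

```python
import queue
import time


def wait_for_events(events, expected, timeout=30.0):
    """Block until `expected` events arrive or the overall deadline passes.

    `events` is a queue.Queue standing in for the event source. The timeout
    is only an upper bound: the call returns as soon as enough events are
    in, so it can be generous without slowing down the passing case.
    """
    deadline = time.monotonic() + timeout
    received = []
    while len(received) < expected:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            raise TimeoutError(
                f"got {len(received)} of {expected} events before the deadline"
            )
        try:
            received.append(events.get(timeout=remaining))
        except queue.Empty:
            raise TimeoutError(
                f"got {len(received)} of {expected} events before the deadline"
            ) from None
    return received
```

A test built this way fails only when the events never arrive, not when they arrive slightly later than a hard-coded sleep assumed.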
* | Merge pull request #2306 from rabbitmq/test-unit-connection-tracking  Michael Klishin  2020-04-08  1  -1/+16
|\ \ | |/ |/| Fix flaky test - connection tracking is asynchronous
| * Use rabbit_ct_helpers:await_condition/2  dcorbacho  2020-04-08  1  -18/+4
| |
| * Fix flaky test - connection tracking is asynchronous  dcorbacho  2020-04-08  1  -1/+30
|/
* Remove a few integration tests that rely on sigkill  Michael Klishin  2020-04-07  1  -5/+0
| | | | | | | | | Such cases are best tested using other tools. These cases are highly flaky and prevent pipeline progress. We believe it's the tests that are timing-sensitive. These may be replaced with e.g. additional Jepsen tests as needed in the future.
* Merge pull request #2300 from rabbitmq/remove-segment-max-entries-default  Michael Klishin  2020-04-07  1  -2/+0
|\ | | | | Remove Ra segment_max_entries override
| * Remove Ra segment_max_entries override  kjnilsson  2020-04-03  1  -2/+0
| | | | | | | | | | So that it uses Ra's internal default of 4096 instead which is safer for larger message sizes.
* | Merge pull request #2303 from rabbitmq/handle-deadlocks-in-peer_discovery_classic_config_SUITE  Michael Klishin  2020-04-07  1  -7/+20
|\ \ | | | | | | | | | | | | peer_discovery_classic_config_SUITE: Handle dead-locks
| * | Factor out common code, add multiple tries  Luke Bakken  2020-04-06  1  -32/+13
| | |
| * | peer_discovery_classic_config_SUITE: Handle dead-locks  Jean-Sébastien Pédron  2020-04-06  1  -6/+38
|/ / | | | | | | | | | | | | | | | | ... when nodes are waiting for each other to finish Mnesia initialization. So if the success condition is not met, we reset and restart all nodes except the first one to trigger peer discovery again. We check the success condition after that.
* | Remove overly cautious comment from sample config  kjnilsson  2020-04-06  1  -3/+0
| | | | | | | | | | The raft.* conf parameters only take effect at the next available opportunity and won't affect in-flight data.
* | Merge pull request #2302 from rabbitmq/work-around-cli-circular-dep-in-feature_flags_SUITE  Jean-Sébastien Pédron  2020-04-06  2  -6/+41
|\ \ | | | | | | | | | | | | feature_flags_SUITE: Work around CLI/rabbitmq-server circular dependency
| * | rabbitmq-env: Fix indentation  Jean-Sébastien Pédron  2020-04-06  1  -2/+2
| | |
| * | feature_flags_SUITE: Work around CLI/rabbitmq-server circular dependency  Jean-Sébastien Pédron  2020-04-06  1  -4/+39
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We need to copy `rabbit` in `my_plugins` plugins directory, not because `my_plugin` depends on it, but because the CLI erroneously depends on the broker. This can't be fixed easily because this is a circular dependency (i.e. the broker depends on the CLI). So until a proper solution is implemented, keep this second copy of the broker for the CLI to find it.
* | | Merge pull request #2301 from rabbitmq/lrb-fix-flaky-peer_discovery_classic_config-test  Michael Klishin  2020-04-04  2  -14/+26
|\ \ \ | |/ / |/| | | | | | | | Ensure randomized_startup_delay_range custom value is used
| * | Ensure randomized_startup_delay_range custom value is used  Luke Bakken  2020-04-03  2  -14/+26
|/ / | | | | | | | | | | | | | | This ensures the test completes within the 90sec time limit https://pivotal.slack.com/archives/C055BSG8E/p1585840790221000 https://github.com/rabbitmq/rabbitmq-server/commit/609501c46d7e18a7ea103bfa0188e73c3c4fc951#commitcomment-38268475
* | Merge pull request #2295 from rabbitmq/rabbit-fifo-fix  Gerhard Lazu  2020-04-02  1  -9/+9
|\ \ | |/ |/| rabbit_fifo: set timer when publish rejected
| * rabbit_fifo: set timer when publish rejected  kjnilsson  2020-04-01  1  -9/+9
| | | | | | | | and no current leader is known so that we can re-try after a timeout
* | Bump test timeouts  kjnilsson  2020-04-02  1  -9/+24
| | | | | | | | in single active consumer test
* | Reduce randomized startup delay range for this test  Michael Klishin  2020-04-01  1  -3/+6
| |
* | Wait for up to 90s in this test  Michael Klishin  2020-04-01  1  -2/+2
| |
* | Await cluster formation for 40s  Michael Klishin  2020-04-01  1  -3/+3
| | | | | | | | 10s is not enough for CI.
* | Merge pull request #2294 from rabbitmq/fix-clustering_management/erlang_config  Jean-Sébastien Pédron  2020-04-01  1  -12/+3
|\ \ | |/ |/| clustering_management_SUITE: No need to stop node after start_app failure in `erlang_config`
| * clustering_management_SUITE: No need to stop node after start_app failure  Jean-Sébastien Pédron  2020-04-01  1  -12/+3
|/ | | | | | | | | | | | | | | | | | | ... in `erlang_config`. Since #2180, a failed `start_app` does not take the node down anymore. Trying to restart the node just after has been failing since then (because the node is still there), but this remained unnoticed so far because the return value of `start_node()` is not checked. However, since rabbitmq/rabbitmq-ct-helpers@c033d9272afaf3575505533c81f1c0c7cfcb6206, the Make recipe which starts the node automatically stops it if the start failed somewhere. This is in order to not leave an unwanted node around. This means that after the failing `rabbit_ct_broker_helpers:start_node()`, the node was effectively stopped this time, causing the rest of the testcase to fail.
* More debug logging around peer discovery locking  Michael Klishin  2020-04-01  1  -2/+11
|
* Rename one more test suite  Michael Klishin  2020-03-31  1  -1/+1
|
* Rename one more test suite  Michael Klishin  2020-03-31  1  -1/+1
|
* Rename one more test suite  Michael Klishin  2020-03-31  1  -1/+1
|
* peer_discovery_classic_config_SUITE: re-evaluate cluster formation condition for up to N seconds  Michael Klishin  2020-03-31  1  -6/+12
| | | | | | Depends on rabbitmq/rabbitmq-ct-helpers@98f1c4a8012c006965257f2875873bf9d08416bc
* Rename a test suite  Michael Klishin  2020-03-31  1  -1/+1
| | | | to make it clear that it is a mock-based unit test one
* Merge pull request #2293 from rabbitmq/fix-rabbit-channel-record  Michael Klishin  2020-03-31  1  -13/+13
|\ | | | | Move rabbit_channel config value to config record
| * Move rabbit_channel config value to config record  kjnilsson  2020-03-31  1  -13/+13
| | | | | | | | | | | | writer_gc_threshold is a static value and should be in the static config record, not in the main channel record, which should only hold mutable data fields.
* | unit_log_management_SUITE: handle erofs returned on macOS  Michael Klishin  2020-03-31  1  -1/+7
| | | | | | | | as opposed to eacces
* | Merge pull request #2292 from rabbitmq/rabbit_fifo_int_tweaks  Michael Klishin  2020-03-31  1  -4/+6
|\ \ | |/ |/| Increase wait timeouts in rabbit_fifo_int