Commit message | Author | Age | Files | Lines
* GitHub Actions: Regen workflowsJean-Sébastien Pédron2020-04-272-2/+2
|
* GitHub Actions: Generate workflows using `make github-actions`Jean-Sébastien Pédron2020-04-273-419/+374
|
* Git: Ignore rabbitmq-deps.mk fileJean-Sébastien Pédron2020-04-271-1/+1
|
* Don't cache app modulesGerhard Lazu2020-04-252-1102/+0
| | | | | | | | | | | | | @dumbbell copy paste: When we re-run a workflow, the `ebin` cache, created by the initial run, is restored in `checks`. This causes rabbitmq_prelaunch to be cleaned when we run `make xref` and thus the make target fails. This is probably a bug in Erlang.mk or one of our plugins, but in the meantime, I suggest we don't cache this directory. Signed-off-by: Gerhard Lazu <gerhard@lazu.co.uk>
* Use shields.io for better badgesGerhard Lazu2020-04-251-3/+2
| | | | | | | | | | Remove the Travis CI build badge - it links to a build which is less comprehensive than the new GitHub Actions builds. We are likely to remove Travis completely, but would like to do that after we discuss it. cc @dumbbell Signed-off-by: Gerhard Lazu <gerhard@lazu.co.uk>
* Shorten text on build badgesGerhard Lazu2020-04-251-2/+2
| | | | | | | | | | | | Triggering another build and busting the deps & ebin app modules cache, because it makes no sense for xref to pass in OTP v22.3 but fail in OTP v21.3 - it's the same src! https://github.com/rabbitmq/rabbitmq-server/actions/runs/86189793 cc @dumbbell Signed-off-by: Gerhard Lazu <gerhard@lazu.co.uk>
* rabbit_runtime_parameters:set_global/3: add debug loggingMichael Klishin2020-04-241-0/+1
|
* Merge pull request #2277 from ↵Gerhard Lazu2020-04-232-117/+25
|\ | | | | | | | | rabbitmq/always-handle-config-files-with-cuttlefish Always handle config files with Cuttlefish
| * rabbit_config: Deprecate schema_dir/0Jean-Sébastien Pédron2020-04-231-9/+4
| | | | | | | | | | | | It is unused in RabbitMQ or tier-1 plugins, and the previously returned value made no sense since the switch to Cuttlefish as a library (as part of #2180).
| * rabbit_prelaunch_conf: Always handle config. files with CuttlefishJean-Sébastien Pédron2020-04-232-108/+21
|/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This has several benefits: 1. It simplifies the code, all configuration being handled by the same code path (no more condition on Erlang-term-based vs. Cuttlefish). `rabbit_config` shrinks quite a lot in the process. 2. We can use additional configuration files AND an Erlang-term-based configuration file. In other words, it is possible to use the same existing Erlang-term-based file and introduce Cuttlefish files when needed. It allows a user to run RabbitMQ with: RABBITMQ_CONFIG_FILE=/path/to/rabbitmq.config \ RABBITMQ_CONFIG_FILES=/path/to/conf.d/*.conf \ ./sbin/rabbitmq-server A developer can do the same with `make run-broker`: make run-broker \ RABBITMQ_CONFIG_FILES=/path/to/conf.d/*.conf In the example above, the main configuration file generated by rabbitmq-run.mk is an Erlang-term-based one. This is implemented by calling Cuttlefish with a (possibly empty) list of additional files and the Erlang-term-based file as the advanced configuration file. References #2180.
* Merge pull request #2323 from rabbitmq/rabbitmq-server-2322Michael Klishin2020-04-221-12/+21
|\ | | | | Run both authn and authz steps when rabbit_auth_backend_cache module …
| * Run both authn and authz steps when rabbit_auth_backend_cache module is usedLuke Bakken2020-04-221-12/+21
| | | | | | | | Fixes #2322
* | Fix GitHub Actions workflow badgesGerhard Lazu2020-04-221-1/+2
| | | | | | | | Signed-off-by: Gerhard Lazu <gerhard@lazu.co.uk>
* | Merge pull request #2270 from rabbitmq/github-actionsGerhard Lazu2020-04-226-10/+7820
|\ \ | | | | | | Run checks & tests in GitHub Actions on every push
| * | Run checks & tests in GitHub Actions on every pushGerhard Lazu2020-04-226-10/+7820
|/ / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | - Separate workflows for min & max supported OTP version - Multiple groups for clustering_management (run in parallel & finish quicker) - Store CT logs as artefacts on failure - Tests badge in README - Store in S3 the version of the RabbitMQ components used while testing We have decided to not run tests for RabbitMQ components that we depend on: rabbitmq-cli, rabbitmq-erlang-client & rabbitmq-common. rabbitmq-cli & rabbitmq-erlang-client depend on rabbitmq-server (recursive dependency) meaning that they will clone rabbitmq-server again, inside the deps dir. We will continue to run these in Concourse, until we merge all repositories into a single one. Let's be honest, it's a monolith! Signed-off-by: Gerhard Lazu <gerhard@lazu.co.uk>
* | Merge pull request #2320 from rabbitmq/rabbitmq-cli-408Michael Klishin2020-04-222-0/+174
|\ \ | |/ |/| Introduce rabbit_upgrade_preparation
| * Introduce rabbit_upgrade_preparationMichael Klishin2020-04-212-0/+174
| | | | | | | | Part of rabbitmq/rabbitmq-cli#408.
* | Update erlang.mkJean-Sébastien Pédron2020-04-211-16/+43
| |
* | Attempt to make unit_log_management_SUITE less flakyPhilip Kuryloski2020-04-212-14/+27
| | | | | | | | | | | | | | | | | | | | Rather than wait a fixed 2000ms, poll the test condition for up to 5000ms. Also switch from a raw message send in rabbit.erl to a gen_event:call/4 to the lager backend. I had hoped this would behave synchronously, which it does not appear to, but at least we now get a value back from the call.
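The switch from a fixed wait to polling described above can be sketched as follows. This is a minimal, language-neutral illustration in Python, not the code from the commit: the helper name `wait_until` and the 100 ms polling interval are assumptions; only the 5000 ms upper bound comes from the commit message.

```python
import time
from typing import Callable


def wait_until(condition: Callable[[], bool],
               timeout_ms: int = 5000,
               interval_ms: int = 100) -> bool:
    """Poll `condition` until it returns True or `timeout_ms` elapses.

    Hypothetical helper: returns True as soon as the condition holds,
    False if the deadline passes first.
    """
    deadline = time.monotonic() + timeout_ms / 1000
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        # Sleep briefly between checks instead of busy-waiting.
        time.sleep(interval_ms / 1000)
```

Compared with a fixed sleep, a test built this way finishes as soon as the condition holds and only pays the full timeout when it is actually going to fail.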
* | Merge pull request #2315 from rabbitmq/use-new-inet_tcp_proxy_distJean-Sébastien Pédron2020-04-214-22/+17
|\ \ | | | | | | Use new `inet_tcp_proxy_dist`
| * | rabbit_feature_flags: Improve timeout computation in ↵Jean-Sébastien Pédron2020-04-201-3/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | `mark_as_enabled_remotely()` The rounded value of `(timer:now_diff(T1, T0) div 1000)` was often 0, leading to no decrease in timeout, i.e. the equivalent of an infinite loop. Now, the division happens after the subtraction. Also, to avoid too much hammering, we sleep for one second between retries.
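The arithmetic behind this fix can be illustrated with a small sketch. It is written in Python rather than Erlang, and the function names and the exact shape of the expressions are assumptions made for illustration; only the idea (integer division before vs. after the subtraction, with elapsed time in microseconds and the timeout in milliseconds) comes from the commit message.

```python
def remaining_divide_first(timeout_ms: int, elapsed_us: int) -> int:
    # Assumed shape of the old computation: convert the elapsed time to
    # milliseconds first. Anything under 1000 microseconds truncates to 0,
    # so the timeout never decreases.
    return timeout_ms - elapsed_us // 1000


def remaining_subtract_first(timeout_ms: int, elapsed_us: int) -> int:
    # Assumed shape of the new computation: subtract in microseconds,
    # then convert the result back to milliseconds.
    return (timeout_ms * 1000 - elapsed_us) // 1000


# With a 5000 ms timeout and a 400 microsecond iteration:
print(remaining_divide_first(5000, 400))    # 5000 (no progress)
print(remaining_subtract_first(5000, 400))  # 4999
```

With the first form, a retry loop whose iterations each take less than a millisecond never reaches a zero timeout, which matches the "equivalent of an infinite loop" described above.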
| * | *_SUITE: Use the new API for inet_tcp_proxy_distJean-Sébastien Pédron2020-04-203-19/+10
|/ / | | | | | | See rabbitmq/inet_tcp_proxy#3 and rabbitmq/rabbitmq-ct-helpers#39.
* | *_SUITE: Don't setup dist module when not neededJean-Sébastien Pédron2020-04-204-12/+4
|/
* Merge pull request #2316 from rabbitmq/lrb-fixup-schema-syncMichael Klishin2020-04-151-1/+1
|\ | | | | Fix schema sync for parameters
| * Fix schema sync for parametersLuke Bakken2020-04-151-1/+1
|/ | | | | | Fixes the issue described here: https://github.com/rabbitmq/rabbitmq-schema-definition-sync/issues/2#issuecomment-613075591
* Remove a couple of highly timing-sensitive testsMichael Klishin2020-04-131-54/+0
| | | | | | | | | | | When asserting that connections to a down virtual host cannot be opened, we compete against a desired behavior: quick recovery of the virtual host process tree. Unlike with the "X eventually happens or we fail with a timeout" kind of scenarios, here the test must catch a very specific time window. Instead of exposing virtual host internals to the test so that it's more observable from the outside, we concluded that these tests are not critically important to have.
* dynamic_ha_SUITE: make some tests less timing-dependentMichael Klishin2020-04-131-14/+28
|
* Simplify a flaky test that overlaps with other tests and suitesMichael Klishin2020-04-131-45/+2
| | | | | and is quite coarse grained, meaning it is hard to keep track of node state for its duration.
* Merge pull request #2308 from rabbitmq/remove-flaky-testsMichael Klishin2020-04-116-235/+40
|\ | | | | Remove flaky tests
| * Remove tests from ct-simple_ha that fail frequentlyGerhard Lazu2020-04-091-51/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | After looking at the last 20 builds in the server-release:v3.9.x pipeline, 2 builds succeeded on the first run. Even though we retry each build 3 times, out of the last 50 builds, 28 passed & 22 failed. We (+@dumbbell) removed the tests which fail more often than they pass. If anyone wants to add them back, please rewrite them, as in the current form they simply don't work. Alternatively, they are hinting at a failure in the system which needs addressing. Both are out of the current scope. cc @michaelklishin Signed-off-by: Gerhard Lazu <gerhard@lazu.co.uk>
| * Remove publisher_confirms & confirm_nack from mirrored_queue testsGerhard Lazu2020-04-091-11/+34
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Conclusion of the following conversation: - Gerhard Lazu: what should we do about these recurring test failures? https://github.com/rabbitmq/rabbitmq-server/runs/548636403?check_suite_focus=true#step:6:185 == publisher_confirms_parallel_SUITE == publisher_confirms_parallel_SUITE > publisher_confirm_tests > mirrored_queue > publisher_confirms #1. {error,{noproc,{gen_server,call, [<0.393.0>, {wait_for_confirms,5000}, infinity]}}} publisher_confirms_parallel_SUITE > publisher_confirm_tests > mirrored_queue > confirm_nack #1. {error, {{badmatch,received_ack_instead_of_nack}, [{publisher_confirms_parallel_SUITE,confirm_nack,1, [{file,"test/publisher_confirms_parallel_SUITE.erl"}, {line,247}]}, {test_server,ts_tc,3,[{file,"test_server.erl"},{line,1562}]}, {test_server,run_test_case_eval1,6, [{file,"test_server.erl"},{line,1080}]}, {test_server,run_test_case_eval,9, [{file,"test_server.erl"},{line,1012}]}]}} - Karl Nilsson: I always suspected that test to be racy - Gerhard Lazu: In which case, I would like to delete it. Does someone want to fix it instead? - Karl Nilsson: How do we know the queue exits after the channel processing the publish but before the queue mirrors sending all acks the and ch returning with ack - Michael Klishin: If there is no way to make it not racy or something like Jepsen would make more sense to test this kind of thing, we can simply delete the test - Karl Nilsson: Great point Michael, the Jepsen suite should provide some coverage of this. - Karl Nilsson: I think we are safe to remove these tests, at least the ones using mirroring - Michael Klishin: Please go ahead and remove them Signed-off-by: Gerhard Lazu <gerhard@lazu.co.uk>
| * Remove rejects_survive test flakeGerhard Lazu2020-04-091-2/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | https://github.com/rabbitmq/rabbitmq-server/runs/522909848?check_suite_focus=true#step:4:512 simple_ha_SUITE > cluster_size_3 > overflow_reject_publish > rejects_survive_stop #1. {error,{error,received_both_acks_and_nacks}} - It's not helping anyone in the current state - I don't have enough context to be able to fix it - I need to stay focused on the current task, cannot afford to context switch - Feel free to fix it if it's important, otherwise leave it deleted cc @michaelklishin @dumbbell Signed-off-by: Gerhard Lazu <gerhard@lazu.co.uk>
| * Remove variable_queue_fold test flakeGerhard Lazu2020-04-091-38/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | https://github.com/rabbitmq/rabbitmq-server/runs/522627483?check_suite_focus=true#step:4:769 backing_queue_SUITE > backing_queue_tests > backing_queue_embed_limit_0 > variable_queue_default > variable_queue_fold #1. {error, {error, {case_clause,undefined}, [{file_handle_cache,'-partition_handles/1-fun-0-',2, [{file,"src/file_handle_cache.erl"},{line,770}]}, {file_handle_cache,get_or_reopen,1, [{file,"src/file_handle_cache.erl"},{line,709}]}, {file_handle_cache_stats,timer_tc,1, [{file,"src/file_handle_cache_stats.erl"},{line,63}]}, {file_handle_cache_stats,update,2, [{file,"src/file_handle_cache_stats.erl"},{line,49}]}, {file_handle_cache,with_handles,3, [{file,"src/file_handle_cache.erl"},{line,666}]}, {rabbit_msg_store,read_from_disk,2, [{file,"src/rabbit_msg_store.erl"},{line,1251}]}, {rabbit_msg_store,client_read3,3, [{file,"src/rabbit_msg_store.erl"},{line,674}]}, {rabbit_msg_store,safe_ets_update_counter,5, [{file,"src/rabbit_msg_store.erl"},{line,1306}]}]}} - It's not helping anyone in the current state - I don't have enough context to be able to fix it - I need to stay focused on the current task, cannot afford to context switch - Feel free to fix it if it's important, otherwise leave it deleted cc @michaelklishin @dumbbell Signed-off-by: Gerhard Lazu <gerhard@lazu.co.uk>
| * Remove calculate_client_local test flakeGerhard Lazu2020-04-091-10/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Gets aborted after 30 minutes: https://github.com/rabbitmq/rabbitmq-server/runs/522332298?check_suite_focus=true#step:4:479 queue_master_location_SUITE > cluster_size_3 > calculate_client_local #1. {skip, {failed, {queue_master_location_SUITE,init_per_testcase, {timetrap_timeout,1800000}}}} - It's not helping anyone in the current state - I don't have enough context to be able to fix it - I need to stay focused on the current task, cannot afford to context switch - Feel free to fix it if it's important, otherwise leave it deleted cc @michaelklishin @dumbbell Signed-off-by: Gerhard Lazu <gerhard@lazu.co.uk>
| * Remove mixed_dead_alive_queues_reject test flakeGerhard Lazu2020-04-091-60/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | https://github.com/rabbitmq/rabbitmq-server/runs/521986549#step:4:468 confirms_rejects_SUITE > parallel_tests > mixed_dead_alive_queues_reject #1. {error, {expecting_nack_got_ack, [{confirms_rejects_SUITE,mixed_dead_alive_queues_reject,1, [{file,"test/confirms_rejects_SUITE.erl"},{line,158}]}, {test_server,ts_tc,3,[{file,"test_server.erl"},{line,1562}]}, {test_server,run_test_case_eval1,6, [{file,"test_server.erl"},{line,1080}]}, {test_server,run_test_case_eval,9, [{file,"test_server.erl"},{line,1012}]}]}} - It's not helping anyone in the current state - I don't have enough context to be able to fix it - I need to stay focused on the current task, cannot afford to context switch - Feel free to fix it if it's important, otherwise leave it deleted cc @michaelklishin @dumbbell Signed-off-by: Gerhard Lazu <gerhard@lazu.co.uk>
| * Remove member_death test flakeGerhard Lazu2020-04-091-26/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | https://github.com/rabbitmq/rabbitmq-server/runs/521844674?check_suite_focus=true#step:4:468 gm_SUITE > member_death #1. {error,{thrown,timeout_waiting_for_death_3_2}} - It's not helping anyone in the current state - I don't have enough context to be able to fix it - I need to stay focused on the current task, cannot afford to context switch - Feel free to fix it if it's important, otherwise leave it deleted cc @michaelklishin @dumbbell Signed-off-by: Gerhard Lazu <gerhard@lazu.co.uk>
| * Remove dead_queue_rejects test flakeGerhard Lazu2020-04-091-37/+0
| | | | | | | | | | | | | | | | | | | | | | - It's not helping anyone in the current state - I don't have enough context to be able to fix it - I need to stay focused on the current task, cannot afford to context switch - Feel free to fix it if it's important, otherwise leave it deleted cc @michaelklishin @dumbbell Signed-off-by: Gerhard Lazu <gerhard@lazu.co.uk>
* | Merge pull request #2311 from rabbitmq/rabbitmq-cli-389-docsMichael Klishin2020-04-112-1/+27
|\ \ | | | | | | Add docs for check_if_node_is_mirror_sync_critical and check_if_node_…
| * | Add docs for check_if_node_is_mirror_sync_critical and ↵Luke Bakken2020-04-102-1/+27
|/ / | | | | | | | | | | check_if_node_is_quorum_critical Part of rabbitmq/rabbitmq-cli#389
* | Merge pull request #2310 from rabbitmq/t-list-queues-online-and-offlineMichael Klishin2020-04-101-0/+5
|\ \ | | | | | | Wait until node detects new cluster configuration
| * | Wait until node detects new cluster configurationdcorbacho2020-04-101-0/+5
|/ / | | | | | | | | | | | | CI has failed with an mnesia error where the rabbit_queue table doesn't exist. The actual logs don't show any error on the remaining node so let's assume that is mnesia detecting the other node going down. This really shouldn't happen, but I can't reproduce it either.
* | Attempt to correct a testMichael Klishin2020-04-101-3/+0
|/
* Merge pull request #2307 from rabbitmq/fix-log_management_SUITEJean-Sébastien Pédron2020-04-091-41/+32
|\ | | | | unit_log_management_SUITE: Simplify code of `log_file_fails_to_initialise_during_startup`
| * unit_log_management_SUITE: Simplify code of ↵Jean-Sébastien Pédron2020-04-091-41/+32
| | | | | | | | | | | | | | `log_file_fails_to_initialise_during_startup` Also, add more log messages to help us debug this testcase when it fails.
* | Remove all ct-partition test flakesGerhard Lazu2020-04-091-480/+0
|/ | | | | | | | | | | | | | Some of them we run up to 20 times (!!!) to make sure that they succeed. - They are not helping anyone in the current state - I don't have enough context to be able to fix them - I need to stay focused on the current task, cannot afford to context switch - Feel free to fix it if it's important, otherwise leave it deleted cc @michaelklishin @dumbbell Signed-off-by: Gerhard Lazu <gerhard@lazu.co.uk> (cherry picked from commit a835d3271680ad6db5663f504f08fd0db4ee21c2)
* Merge pull request #2304 from rabbitmq/skip-reginit-if-no-ff-from-unknown-appsJean-Sébastien Pédron2020-04-093-45/+272
|\ | | | | rabbit_feature_flags: Multiple fixes and optimizations to get rid of race conditions
| * rabbit_feature_flags: Add a FIXME to try_to_write_enabled_feature_flags_list()Jean-Sébastien Pédron2020-04-091-0/+2
| | | | | | | | | | We need to handle concurrent calls to this function to avoid any issues with parallel read/modify/write operations.
| * rabbit_feature_flags: Restart registry regen if it changed meanwhileJean-Sébastien Pédron2020-04-091-23/+70
| | | | | | | | | | | | | | | | | | | | | | | | Before we query all the details needed to generate a new registry, we save the version of the currently loaded one, using the `vsn` Erlang module attribute added by the compiler. When the new registry is ready to be loaded, we verify again the version of the loaded one: if it differs, it means a concurrent process reloaded the registry. In this case, we restart the entire regen procedure, including the query of fresh details. The goal is to avoid any loss of information from the existing registry.
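The "save the version, regenerate, re-check, restart if it changed" procedure described above is a form of optimistic concurrency control. The sketch below shows that pattern in Python under stated assumptions: the `Registry` class, its integer `version` counter, and the `on_built` hook are all hypothetical stand-ins. The real code compares the `vsn` attribute of the loaded Erlang module rather than a counter.

```python
import threading


class Registry:
    """Hypothetical stand-in for the currently loaded registry module."""

    def __init__(self):
        self._lock = threading.Lock()
        self.version = 0          # plays the role of the module's `vsn`
        self.flags = frozenset()

    def snapshot(self):
        """Return the current version together with the current content."""
        with self._lock:
            return self.version, self.flags

    def swap_if_unchanged(self, expected_version, new_flags):
        """Install `new_flags` only if nobody reloaded in the meantime."""
        with self._lock:
            if self.version != expected_version:
                return False
            self.flags = frozenset(new_flags)
            self.version += 1
            return True


def regen(registry, extra_flags, on_built=None):
    """Rebuild the registry, restarting from scratch on concurrent change."""
    while True:
        # 1. Remember which version the new content is derived from.
        version, flags = registry.snapshot()
        # 2. Build the new content from fresh details.
        new_flags = flags | set(extra_flags)
        if on_built is not None:
            on_built()  # hook used only to simulate a concurrent reload
        # 3. Load it only if the version is still the one we started from;
        #    otherwise loop and query everything again.
        if registry.swap_if_unchanged(version, new_flags):
            return new_flags
```

Because a failed check restarts from the snapshot step, whatever a concurrent process added is included in the next attempt instead of being overwritten.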
| * rabbit_feature_flags: Fix concurrent registry module reloadJean-Sébastien Pédron2020-04-091-1/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | By calling code:delete() ourselves, we created a time window where there is no registry module loaded between that call and the following code:load_binary(). This meant that a concurrent access to the registry would trigger a load of the initial uninitialized registry module from disk. That module would then trigger a reload itself, leading to possible deadlock. In fact, in code:load_binary(), the code server already takes care of replacing the module atomically. We don't need to do anything.
| * rabbit_feature_flags: Log more details during registry reloadingJean-Sébastien Pédron2020-04-092-7/+50
| |