Add optional prefix for RabbitMQ node FQDNs

This makes it possible to instantiate multiple rabbit clusters built
from prefix-based instances of rabbit nodes.
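The naming scheme can be illustrated with a minimal Python sketch. It assumes the prefix is simply prepended to the host part of an Erlang node name of the form `rabbit@<host>`. The helpers `rabbit_node_name` and `cluster_members` are hypothetical and are not part of the OCF agent. The sketch shows why two different prefixes give disjoint sets of node names on the same hosts. The prefixed host names would still have to resolve, for example via DNS or /etc/hosts, for the nodes to reach each other.

```python
def rabbit_node_name(host: str, prefix: str = "", base: str = "rabbit") -> str:
    """Build a node name of the form <base>@<prefix><host> (hypothetical helper)."""
    if "@" in host or "@" in prefix:
        raise ValueError("host and prefix must not contain '@'")
    # With an empty prefix this falls back to the usual rabbit@<host> form.
    return f"{base}@{prefix}{host}"


def cluster_members(hosts, prefix: str = ""):
    """Node names for one cluster whose members share the same prefix."""
    return [rabbit_node_name(h, prefix) for h in hosts]


if __name__ == "__main__":
    hosts = ["node-1", "node-2"]
    print(cluster_members(hosts, "msg-"))  # ['rabbit@msg-node-1', 'rabbit@msg-node-2']
    print(cluster_members(hosts, "rpc-"))  # ['rabbit@rpc-node-1', 'rabbit@rpc-node-2']
```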

Improve diagnostics in 'rabbitmq-server' script

While errors are detected with the '-e' shell option in this script,
the diagnostic messages leave a lot to be desired.
E.g. when trying to write the pid file to a full partition, the only
message in the log is:
sh: echo: I/O error
which is definitely insufficient.
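The idea behind the change can be illustrated in Python. The real script is shell, so this is only an analogy under that substitution. The pid-file write is wrapped so that a failure reports what was being written, where, and the OS reason (for example "No space left on device" on a full partition) instead of a bare I/O error. `write_pid_file` is a hypothetical name.

```python
import os


def write_pid_file(path: str, pid: int) -> None:
    """Write `pid` to `path`, reporting failures with context (hypothetical helper)."""
    try:
        os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
        # Leaving the with-block flushes and closes the file, so errors such
        # as ENOSPC that only surface on close are still caught below.
        with open(path, "w") as f:
            f.write(f"{pid}\n")
    except OSError as exc:
        # Say what failed, where, and why, instead of a bare "I/O error".
        raise RuntimeError(
            f"failed to write pid {pid} to pid file '{path}': "
            f"{exc.strerror or exc}"
        ) from exc
```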

Fix RabbitMQ OCF monitor detection of running master

When the monitor detected the node as OCF_RUNNING_MASTER, this status
could be lost while the monitor checks were still in progress.
* Rework prev_rc into rc_check to fix this.
* Also add an info log message when the node is detected as running master.
* Break out of the monitor check loop early if the monitor is about to
exit so that the resource is restarted by pacemaker.
* Do not recheck the master status and do not update the master score
if the monitor has already detected the node as OCF_RUNNING_MASTER.
At that point, a running and healthy master need not be checked
against other nodes' uptime: doing so is pointless and only makes the
monitor action take more time and resources to finish.
* Fail early if the monitor detected the node as OCF_RUNNING_MASTER but
the rabbit beam process is not running.
* For OCF_CHECK_LEVEL>20, exclude the current node from the check
loop, as it has already been checked.
Related Fuel bug:
https://launchpad.net/bugs/1531838
Signed-off-by: Bogdan Dobrelya <bdobrelia@mirantis.com>
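Below is a minimal Python sketch of the monitor flow described above. It assumes the standard OCF exit codes (0 for OCF_SUCCESS, 1 for OCF_ERR_GENERIC, 8 for OCF_RUNNING_MASTER) and models the per-node checks and the uptime/master-score step as plain callables. The function and parameter names are hypothetical. Each check result goes into a separate `rc_check`, so a check that merely succeeds can no longer overwrite a previously detected running-master status.

```python
import logging

OCF_SUCCESS = 0
OCF_ERR_GENERIC = 1
OCF_RUNNING_MASTER = 8

log = logging.getLogger("rabbitmq-ocf-sketch")


def monitor(is_master, beam_running, checks, update_master_score):
    """Return an OCF code; `checks` and `update_master_score` are callables."""
    rc = OCF_RUNNING_MASTER if is_master else OCF_SUCCESS
    if rc == OCF_RUNNING_MASTER:
        log.info("node detected as running master")
        if not beam_running:
            # Fail early: the node holds the master role, but beam is gone.
            return OCF_ERR_GENERIC
    for check in checks:
        # Keep each result in rc_check so rc (possibly OCF_RUNNING_MASTER)
        # is not overwritten by a check that merely succeeds.
        rc_check = check()
        if rc_check != OCF_SUCCESS:
            # Stop the loop early; the monitor exits and pacemaker restarts it.
            return rc_check
    if rc != OCF_RUNNING_MASTER:
        # Only non-masters compare uptimes and update the master score.
        update_master_score()
    return rc
```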

Fix 'rabbitmqctl rotate_logs' behaviour

When 'rabbitmqctl rotate_logs' is called without any parameters, it
clears the logs unconditionally. Given that this form is used in
logrotate config files, this can result in data loss.
This can be reproduced with the following scenario:
1) 'max_size' is set globally in the logrotate config.
2) One of the two rabbitmq logs is larger than that limit.
3) The daily logrotate run was already performed today, and now we
are calling it manually. In this case logrotate will copy only the file
that is larger than max_size, but 'rabbitmqctl rotate_logs' will
clear both of them, leading to data loss.
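The scenario can be modelled in a few lines of Python to show exactly which data is lost. The sketch assumes that, because the daily run already happened, the size limit is the only trigger for logrotate to copy a file. The `clear_all=False` branch is a hypothetical comparison that clears only the copied files. It is not a description of how the fix was implemented. The log file names are examples only.

```python
def simulate_manual_rotation(log_sizes, max_size, clear_all=True):
    """
    Model one manual logrotate run followed by 'rabbitmqctl rotate_logs'.

    log_sizes: mapping of log file name -> size in bytes.
    Returns (copied, lost): the files logrotate copied, and the bytes lost
    per file that was cleared without having been copied first.
    """
    # Daily rotation already happened, so only max_size triggers a copy.
    copied = {name for name, size in log_sizes.items() if size > max_size}
    # Unconditional clearing empties every log; clear_all=False models a
    # hypothetical rotate that only clears the files that were copied.
    cleared = set(log_sizes) if clear_all else copied
    lost = {name: log_sizes[name] for name in cleared - copied}
    return copied, lost


if __name__ == "__main__":
    logs = {"rabbit@node-1.log": 5_000_000, "rabbit@node-1-sasl.log": 2_000}
    print(simulate_manual_rotation(logs, max_size=1_000_000))
    # ({'rabbit@node-1.log'}, {'rabbit@node-1-sasl.log': 2000})
```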

Limit number of unique node names for rabbitmqctl

This prevents atom table overflow in a long-running broker.
Fixes #549
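Erlang atoms are never garbage-collected, and remote node names are atoms. A CLI node name that embeds the OS pid of each rabbitmqctl run therefore makes the broker's atom table grow without bound. The following Python sketch contrasts that with a hypothetical bounded pool of names. Neither the `rabbitmqctlN` format nor the helper names are taken from the actual change. A real implementation would also have to handle two concurrent invocations that pick the same name.

```python
import random


def ctl_node_name(host: str, limit: int, rng: random.Random) -> str:
    """Pick a rabbitmqctl node name from a fixed pool of `limit` names.

    Hypothetical scheme: however many times the CLI runs, the broker only
    ever sees at most `limit` distinct node names, so the number of atoms
    created for them is bounded.
    """
    if limit < 1:
        raise ValueError("limit must be at least 1")
    return f"rabbitmqctl{rng.randrange(limit)}@{host}"


def unbounded_ctl_node_name(host: str, os_pid: int) -> str:
    """Pid-based naming for comparison: a new name for every invocation."""
    return f"rabbitmqctl{os_pid}@{host}"


if __name__ == "__main__":
    rng = random.Random(0)
    bounded = {ctl_node_name("node-1", 10, rng) for _ in range(10_000)}
    unbounded = {unbounded_ctl_node_name("node-1", pid) for pid in range(10_000)}
    print(len(bounded), len(unbounded))
```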

Make number of Ranch acceptors configurable
|
| | |\ \
| |/ /
|/| | |
|
| |\ \ \
| | | |
| | | | |
OCF: Fuel bug 1529897
|
| | | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
* Fix get status() to catch beam state and output errors
* Fix action_stop() to force name-based mathcing then no
pidfile and the beam's unresponsive
* Fix proc_stop to use name based matching if no pidfile
found
* Fix proc_stop to retry sending the signal when using the name
based match as well
W/o this patch, the situation is possible when:
- beam's running and cannot process signals, but is reported "not running"
by the get_status(), while in fact it shall be reported as generic error
- which_applications() returned error, while its output is still
being parsed for the "what" match, while it shall not.
- action stop and proc_stop gives up then there is no pidfile and the beam's
running unresponsive.
The solution is to make get_status to return generic error and action
stop to use the rabbit process name matching for killing it.
Related Fuel bug:
https://bugs.launchpad.net/fuel/+bug/1529897
Signed-off-by: Bogdan Dobrelya <bdobrelia@mirantis.com>
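The corrected classification can be sketched in Python under a few stated assumptions. The OCF codes are 0, 1 and 7 for success, generic error and not running. The which_applications output is assumed to be an Erlang list of `{App, Description, Version}` tuples. A running beam whose output lacks the rabbit application is assumed to count as "not running". The function signature is hypothetical. The key point from the message is that a failed call yields a generic error and its output is never parsed.

```python
OCF_SUCCESS = 0
OCF_ERR_GENERIC = 1
OCF_NOT_RUNNING = 7


def get_status(beam_running: bool, rc: int, output: str, what: str = "rabbit") -> int:
    """Classify node state from a which_applications call (hypothetical model).

    beam_running: whether a beam process for the node exists.
    rc, output: exit code and stdout of the which_applications query.
    """
    if not beam_running:
        return OCF_NOT_RUNNING
    if rc != 0:
        # beam is alive but not answering: that is a generic error, and the
        # output of a failed call must not be searched for the application.
        return OCF_ERR_GENERIC
    # Assumed output format: a list of {App, Description, Version} tuples,
    # so a running application appears as "{rabbit," in the text.
    if f"{{{what}," in output:
        return OCF_SUCCESS
    return OCF_NOT_RUNNING
```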

Without this fix, the rabbit OCF cannot make proc_stop try
to kill a pid-less beam process by name matching, because
proc_kill()'s first parameter cannot be passed empty.
The fix is to pass the value "none" when the pid-less
process must be matched by service_name instead.
Also, fix proc_kill to handle multi-process pid files
as well (those containing many space-separated pids).
Related Fuel bugs:
https://launchpad.net/bugs/1529897
https://launchpad.net/bugs/1532723
Signed-off-by: Bogdan Dobrelya <bdobrelia@mirantis.com>
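The argument handling described above can be sketched in Python. Here the process table is a plain dict standing in for `ps` output, and name matching is a substring test on the command line. Both are assumptions, and the agent's real matching may differ. `kill_targets` is a hypothetical name. It only decides which pids would be signalled.

```python
def kill_targets(pid_arg: str, service_name: str, processes: dict) -> list:
    """Return the pids a proc_kill-like helper should signal (hypothetical model).

    pid_arg: pid file contents, one or more space-separated pids, or the
             literal "none" to request matching by process name instead.
    processes: stand-in for the process table, pid -> command line.
    """
    arg = pid_arg.strip()
    if not arg:
        # Mirrors the constraint in the message: the argument may not be empty.
        raise ValueError('pid argument is empty; pass "none" to match by name')
    if arg == "none":
        return sorted(p for p, cmd in processes.items() if service_name in cmd)
    # Multi-process pid files: every whitespace-separated token is a pid.
    wanted = {int(tok) for tok in arg.split()}
    return sorted(p for p in wanted if p in processes)
```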

After this change `rabbitmqctl cluster_status` will print information
about alarms raised across a cluster.

Set deleting exchange status

This makes sure that values that were set right before
node failure or restart are not retained.

* Avoids a race condition between declare and delete

Syntax and local vars usage fixes to OCF HA

Related Fuel bug:
https://launchpad.net/bugs/1529897
Signed-off-by: Bogdan Dobrelya <bdobrelia@mirantis.com>

Remove unneeded sleep for a graceful stop by PID

The sleep is not needed, according to the rabbitmqctl man page
(https://www.rabbitmq.com/man/rabbitmqctl.1.man.html):
"If a pid_file is specified, also waits for the process
specified there to terminate."
Related Fuel bug https://launchpad.net/bugs/1529897
Related PR
https://github.com/rabbitmq/rabbitmq-server/pull/523
Signed-off-by: Bogdan Dobrelya <bdobrelia@mirantis.com>

Adds handling for bindings not of type 'queue'

For example, exchange-type bindings.
References #521

Ensure rabbit node uptime is reset in the CIB for OCF resource

* Add ocf_run wrappers and info log messages for CIB attribute events
* Move "fast" CIB attribute updates before "heavy" operations like
start/stop/wait to keep the CIB consistent even if the timeouts
for those ops are exceeded
* Delete the master and start time attributes from the CIB in action_start
to ensure correct evaluation of rabbit node uptime in new master
elections for the corresponding pacemaker resources
* For post-demote notify and action_demote(), delete the master
attribute from the CIB as well.
* For post-start notify, update the start time in the CIB even when
the node is already clustered. Otherwise it would remain running
in the cluster without a registered start time, which badly affects
new master elections.
* Fix wrong log message when joining by a node
Related Fuel bugs:
https://bugs.launchpad.net/fuel/+bug/1530150
https://bugs.launchpad.net/fuel/+bug/1530296
Signed-off-by: Bogdan Dobrelya <bdobrelia@mirantis.com>
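Why a missing start time matters for elections can be shown with a small Python sketch. It assumes that the node with the longest uptime (earliest registered start time) wins, and that nodes without the attribute in the CIB are simply not candidates. Both are assumptions about the agent's behaviour, and `elect_master` is a hypothetical helper. Under these assumptions, a node that is really the oldest but never registered its start time can never be elected.

```python
from typing import Dict, Optional


def elect_master(start_times: Dict[str, Optional[float]]) -> Optional[str]:
    """Pick the node with the longest uptime among nodes with a start time.

    start_times: node name -> start timestamp read from the CIB, or None if
    the attribute was never registered (hypothetical model).
    """
    registered = {n: t for n, t in start_times.items() if t is not None}
    if not registered:
        return None
    # Earliest start time means longest uptime; break ties by node name.
    return min(registered, key=lambda n: (registered[n], n))
```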

Fix stop conditions for the rabbit OCF resource

* Fix get_status() unexpectedly reporting a generic error
instead of "not running"
* Add proc_stop and proc_kill functions
(TODO: these should eventually become external common OCF helpers)
* Rework stop_server_process()
- make it return SUCCESS/ERROR as expected
- grant "rabbitmqctl stop" a graceful termination window, and only
then ensure the beam process termination and pidfile removal as well
- return the actual status with get_status()
* Rework kill_rmq_and_remove_pid()
- use proc_stop to try to kill by pgrp with -TERM, then -KILL, or
by the beam process name match if there is no PID
- make it return SUCCESS/ERROR
* Fix action_stop()
- fail early based on the stop_server_process() results, without
additional rabbitmqctl invocations in the get_status() call
- replace the hard-coded sleep 10 with the graceful stop window in
stop_server_process()
- ensure the rabbit-start-time attribute is removed from the CIB before
trying to stop the server process
- issue the "stop: action end" log record before the actual end
* Add comments and make logs more informative
Related Fuel bug https://bugs.launchpad.net/fuel/+bug/1529897
Signed-off-by: Bogdan Dobrelya <bdobrelia@mirantis.com>
Co-authored-by: Alex Schultz <aschultz@mirantis.com>
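The TERM-then-KILL escalation with a grace window can be sketched in Python. Liveness checks, signal delivery, the clock and sleeping are injected so the logic can be exercised without real processes. A real version would map the names to `signal.SIGTERM`/`signal.SIGKILL` and call `os.kill` or `os.killpg`. The function name and default wait times are hypothetical and are not taken from the agent.

```python
import time


def proc_stop(pid, is_alive, send_signal, term_wait=15.0, kill_wait=5.0,
              poll=0.5, clock=time.monotonic, sleep=time.sleep):
    """Stop a process: TERM, wait up to term_wait, then KILL (hypothetical model).

    is_alive(pid) -> bool and send_signal(pid, name) are injected so the
    escalation logic can be exercised without real processes.
    Returns True once the process is gone, False if it survives KILL.
    """
    if not is_alive(pid):
        return True
    for sig, wait in (("TERM", term_wait), ("KILL", kill_wait)):
        send_signal(pid, sig)
        deadline = clock() + wait
        # Poll until the process exits or this signal's grace window ends.
        while clock() < deadline:
            if not is_alive(pid):
                return True
            sleep(poll)
        if not is_alive(pid):
            return True
    return False
```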

Adds exception handling to rabbit_vm:bytes/1

References #328

Fixes #328