| author | Francesco Mazzoli <francesco@rabbitmq.com> | 2012-05-15 18:04:32 +0100 |
|---|---|---|
| committer | Francesco Mazzoli <francesco@rabbitmq.com> | 2012-05-15 18:04:32 +0100 |
| commit | bdd62ebd05e37de6bf38eef43d03c4bbca63602f | |
| tree | d4ab45498714379e82bce2aafdc05b7e91d1a54f /src/rabbit.erl | |
| parent | 44db2ff0480de4ba4708f24fcdf2cc0c88003f26 | |
store more info about the cluster on disc, check other nodes before clustering
Now `cluster_nodes.config' no longer stores a "config", but all the
information we need about the cluster nodes. This file is updated
whenever a new node comes up. Moreover, this file should *always* be
present; it is set up in `rabbit:prepare/0'.
Now that we have this file, the various functions regarding the status
of the cluster (`all_clustered_nodes/0', `running_clustered_nodes/0', etc.)
can "work" even when mnesia is down (and when the node is a RAM node).
Assuming the status file is up to date with the state of the cluster (as
of when the node went down, if the node is down), a lot of operations
become easier.
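As a rough illustration of why a status file lets these queries work without mnesia, here is a minimal Erlang sketch. The three-element tuple mirrors the `{AllNodes, _, _}' shape that `rabbit_mnesia:read_cluster_nodes_status/0' returns in the diff below; reading the other two elements as disc nodes and running nodes, the module and function names, and the on-disk format (a single term readable by `file:consult/1') are all assumptions, not the actual `rabbit_mnesia' code.

```erlang
-module(cluster_status_sketch).
-export([write_status/2, read_status/1, clustered_nodes_from_file/1,
         nodes_to_contact/2]).

%% Hypothetical on-disk format: one Erlang term {AllNodes, DiscNodes,
%% RunningNodes}, written so file:consult/1 can read it back without mnesia.
write_status(File, {All, Disc, Running} = Status)
  when is_list(All), is_list(Disc), is_list(Running) ->
    file:write_file(File, io_lib:format("~p.~n", [Status])).

%% Read the status term back; any other file content is reported as an error.
read_status(File) ->
    case file:consult(File) of
        {ok, [{_All, _Disc, _Running} = Status]} -> {ok, Status};
        {ok, Other}                              -> {error, {bad_status_file, Other}};
        {error, _} = Err                         -> Err
    end.

%% All nodes recorded as part of the cluster, answered from disk only.
clustered_nodes_from_file(File) ->
    case read_status(File) of
        {ok, {All, _, _}} -> {ok, All};
        Err               -> Err
    end.

%% On restart (e.g. of a RAM node), the peers to contact are every recorded
%% node except ourselves, rather than the nodes originally given by the user.
nodes_to_contact(File, Self) ->
    case read_status(File) of
        {ok, {All, _, _}} -> {ok, All -- [Self]};
        Err               -> Err
    end.
```

Because the file holds a plain Erlang term, reading it needs nothing beyond `file:consult/1', which is what allows these answers while mnesia is stopped.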
I'd like a review at this point. It's probably best to diff directly
against default, since a lot of previous commits are no longer relevant.
The most noticeable changes compared to default:
* `join_cluster' works differently. The nodes provided are used to
"discover" the cluster, not to determine whether the node should be a
RAM node. The node will be a disc node by default, and can be initialized
as RAM with the `--ram' flag. The old `cluster' command is kept, adapted
to work with the new functions while retaining similar semantics (we
decide whether the node is disc or not by looking at the list of nodes
provided).
* `force_cluster' has been removed.
* The `join_cluster' operation will fail if:
- The node is currently the only disc node of its cluster
- We can't connect to any of the nodes provided
- The node is already clustered with the cluster of the nodes
provided
* On default, when RAM nodes restart they try to connect to the nodes
that were first given by the user, and are not aware of the changes that
might have occurred in the cluster while they were online. Since the
cluster status is now kept updated on disk, the RAM node will be aware of
changes in the cluster when restarted.
* Before starting up mnesia, the node contacts the nodes it thinks it's
clustered with, and if those nodes are no longer clustered with it, the
startup procedure fails. We fail only when we know for sure that
something is wrong; e.g. startup won't fail if it doesn't find any online
node.
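The rules in the list above are mostly decisions over lists of node names, so they can be sketched apart from the RPC and mnesia plumbing. The following Erlang module is a hypothetical illustration: the function names, argument shapes and error atoms are invented for this sketch and are not the functions in `rabbit_mnesia'. It also assumes the "only disc node" rule applies only when the cluster has other members, since a standalone node must still be able to join.

```erlang
-module(cluster_join_sketch).
-export([legacy_node_type/2, check_join/3, check_consistency/2]).

%% Old `cluster' semantics as described: the node is a disc node only if
%% it appears in the list of nodes it was given.
legacy_node_type(Node, GivenNodes) ->
    case lists:member(Node, GivenNodes) of
        true  -> disc;
        false -> ram
    end.

%% Join preconditions. Assumed (hypothetical) inputs:
%%   {AllNodes, DiscNodes} - the local node's current view of its cluster
%%   Discovered            - the AllNodes list returned by each seed node
%%                           that answered; [] if none could be contacted
check_join(Node, {AllNodes, DiscNodes}, Discovered) ->
    OnlyDiscNode = DiscNodes =:= [Node] andalso AllNodes =/= [Node],
    case {OnlyDiscNode, Discovered} of
        {true, _} ->
            {error, only_disc_node};
        {false, []} ->
            {error, no_reachable_nodes};
        {false, [RemoteNodes | _]} ->
            case lists:member(Node, RemoteNodes) of
                true  -> {error, already_clustered};
                false -> {ok, RemoteNodes}
            end
    end.

%% Startup check: Replies pairs each peer from the status file with either
%% {ok, PeerAllNodes} or an error when it could not be reached. Unreachable
%% peers are skipped; only peers that answered and no longer list us fail it.
check_consistency(Node, Replies) ->
    Inconsistent = [Peer || {Peer, {ok, PeerNodes}} <- Replies,
                            not lists:member(Node, PeerNodes)],
    case Inconsistent of
        [] -> ok;
        _  -> {error, {inconsistent_cluster, Inconsistent}}
    end.
```

Keeping these decisions free of RPCs means the peers can be queried elsewhere (for instance with `rpc:call/4') and the rules tested in isolation.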
Things to do:
* Implement `uncluster'/`leave_cluster' to kick out a node from a cluster
from another node - this is easy.
* Implement something like `change_node_type', since given how `join_cluster'
works it is not possible right now.
* Rewrite the tests related to clustering.
* Think hard about what can go wrong regarding the cluster status file and
the relevant functions. The stuff in `rabbit_upgrade' is particularly
worrying, and I need to make sure that things will work when upgrading
rabbitmq, by reading old files or upgrading them.
* Split `init_db/4' into various functions. We have much stronger assumptions
now; for example, we should never need to reset or wipe the schema in
there. In general it's an ugly function, especially the optional upgrade
part.
* Probably something else...
Diffstat (limited to 'src/rabbit.erl')
| -rw-r--r-- | src/rabbit.erl | 10 |
1 files changed, 7 insertions, 3 deletions
```diff
diff --git a/src/rabbit.erl b/src/rabbit.erl
index bff7af97d2..eff2fac282 100644
--- a/src/rabbit.erl
+++ b/src/rabbit.erl
@@ -285,6 +285,9 @@ split0([I | Is], [L | Ls]) -> split0(Is, Ls ++ [[I | L]]).
 
 prepare() ->
     ok = ensure_working_log_handlers(),
+    ok = rabbit_mnesia:ensure_mnesia_dir(),
+    ok = rabbit_mnesia:initialize_cluster_nodes_status(),
+    ok = rabbit_mnesia:check_cluster_consistency(),
     ok = rabbit_upgrade:maybe_upgrade_mnesia().
 
 start() ->
@@ -380,7 +383,7 @@ start(normal, []) ->
     end.
 
 stop(_State) ->
-    ok = rabbit_mnesia:record_running_nodes(),
+    ok = rabbit_mnesia:update_cluster_nodes_status(),
     terminated_ok = error_logger:delete_report_handler(rabbit_error_logger),
     ok = rabbit_alarm:stop(),
     ok = case rabbit_mnesia:is_clustered() of
@@ -509,11 +512,12 @@ sort_boot_steps(UnsortedSteps) ->
     end.
 
 boot_step_error({error, {timeout_waiting_for_tables, _}}, _Stacktrace) ->
+    {AllNodes, _, _} = rabbit_mnesia:read_cluster_nodes_status(),
     {Err, Nodes} =
-        case rabbit_mnesia:read_previously_running_nodes() of
+        case AllNodes -- [node()] of
             [] -> {"Timeout contacting cluster nodes. Since RabbitMQ was"
                    " shut down forcefully~nit cannot determine which nodes"
-                   " are timing out.~n"};
+                   " are timing out.~n", []};
             Ns -> {rabbit_misc:format(
                      "Timeout contacting cluster nodes: ~p.~n", [Ns]),
                    Ns}
```
