| Commit message | Author | Age | Files | Lines |
| |
PG::log, PG::ondisklog, PG::missing are moved from PG to a new PGLog
class and are made protected data members. It is a preliminary step
before writing unit tests to cover the methods that have side effects
on these data members and define a clean PGLog API. It improves
encapsulation and does not change any of the logic already in
place.
Possible issues:
* an additional level of indirection (PG->PGLog->IndexedLog instead of
PG->IndexedLog for instance) is introduced: is it optimized away?
* rewriting log.log into pg_log.get_log().log hurts readability
but should be optimized away and have no impact on performance
The guidelines followed for this patch are:
* const access to the data members is preserved, no attempt is made
to define accessors
* all non-const methods are in PGLog, no access to non-const methods of
PGLog::log, PGLog::ondisklog and PGLog::missing is provided
* when methods are moved from PG to PGLog the change to their
implementation is restricted to the minimum.
* the PG::OndiskLog and PG::IndexedLog sub classes are moved
to PGLog sub classes unmodified and remain public
A const version of the pg_log_t::find_entry method was added.
A const accessor is provided for PGLog::get_log, PGLog::get_missing,
PGLog::get_ondisklog but no non-const accessor.
Arguments are added to most of the methods moved from PG to PGLog so
that they can get access to PG data members such as info or log_oid.
The PGLog methods are sorted according to the data member they modify.
//////////////////// missing ////////////////////
* The pg_missing_t::{got,have,need,add,rm} methods are wrapped as
PGLog::missing_{got,have,need,add,rm}
//////////////////// log ////////////////////
* PGLog::get_tail, PGLog::get_head getters are created
* PGLog::set_tail, PGLog::set_head, PGLog::set_last_requested setters
are created
* PGLog::index, PGLog::unindex, PGLog::add wrappers,
PGLog::reset_recovery_pointers are created
* PGLog::clear_info_log replaces PG::clear_info_log
* PGLog::trim replaces PG::trim
//////////////////// log & missing ////////////////////
* PGLog::claim_log is created with code extracted from
PG::RecoveryState::Stray::react.
* PGLog::split_into is created with code extracted from
PG::split_into.
* PGLog::recover_got is created with code extracted from
ReplicatedPG::recover_got.
* PGLog::activate_not_complete is created with code extracted
from PG::activate
* PGLog::proc_replica_log is created with code extracted from
PG::proc_replica_log
* PGLog::merge_old_entry replaces PG::merge_old_entry
The remove_snap argument is used to collect hobject_t
* PGLog::rewind_divergent_log replaces PG::rewind_divergent_log
The remove_snap argument is used to collect hobject_t
A new PG::rewind_divergent_log method is added to call
remove_snap_mapped_object on each of the remove_snap
elements
* PGLog::merge_log replaces PG::merge_log
The remove_snap argument is used to collect hobject_t
A new PG::merge_log method is added to call
remove_snap_mapped_object on each of the remove_snap
elements
* PGLog::write_log is created with code extracted from PG::write_log. A
non-static version is created for convenience but is a simple
wrapper.
* PGLog::read_log replaces PG::read_log. A non-static version is
created for convenience but is a simple wrapper.
* PGLog::read_log_old replaces PG::read_log_old.
http://tracker.ceph.com/issues/5046 refs #5046
Signed-off-by: Loic Dachary <loic@dachary.org>
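The encapsulation described in this commit can be sketched roughly as follows. This is a minimal illustration, not the Ceph code: LogEntry, IndexedLog, Missing and PGLogSketch are hypothetical, heavily simplified stand-ins, and object names are plain strings rather than hobject_t. It shows the three ideas the message names: protected data members, const-only accessors, and a remove_snap argument that collects objects so that the caller (PG in the original) can act on them afterwards.

```cpp
#include <cassert>
#include <list>
#include <map>
#include <set>
#include <string>

// Hypothetical, simplified stand-ins for the Ceph types.
struct LogEntry {
  std::string oid;
  unsigned version;
};

struct IndexedLog {
  std::list<LogEntry> log;
  unsigned head = 0;
};

struct Missing {
  std::map<std::string, unsigned> need;  // oid -> version needed
};

class PGLogSketch {
protected:
  // Data members are only reachable through the methods below.
  IndexedLog log;
  Missing missing;

public:
  // Const accessors only: callers can read but never mutate.
  const IndexedLog& get_log() const { return log; }
  const Missing& get_missing() const { return missing; }

  // Thin wrappers around the mutating operations.
  void add(const LogEntry& e) {
    assert(e.version > log.head);  // entries arrive in version order
    log.log.push_back(e);
    log.head = e.version;
  }
  void missing_add(const std::string& oid, unsigned need) {
    missing.need[oid] = need;
  }
  void missing_rm(const std::string& oid) { missing.need.erase(oid); }

  // Drops entries newer than newhead. The affected objects are only
  // collected in remove_snap; acting on them is left to the caller.
  void rewind_divergent_log(unsigned newhead,
                            std::set<std::string>& remove_snap) {
    while (!log.log.empty() && log.log.back().version > newhead) {
      remove_snap.insert(log.log.back().oid);
      log.log.pop_back();
    }
    log.head = newhead;
  }
};
```

A caller would read through pg_log.get_log().log and, after rewind_divergent_log returns, iterate over remove_snap to perform whatever cleanup it owns, which mirrors the split between PGLog::rewind_divergent_log and the new PG::rewind_divergent_log described above.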
|
| |
If we are (wrongly) marked down, we need to go into the waiting-for-healthy
state and verify that our network interfaces are working before trying to
rejoin the cluster.
- make _is_healthy() check require positive proof of pings working
- do heartbeat checks and updates in this state
- reset the random peers every heartbeat_interval, in case we keep picking
bad ones
Signed-off-by: Sage Weil <sage@inktank.com>
|
|
|
|
|
|
|
|
|
|
|
| |
is_unhealthy() will assume a peer is healthy for some period after we
send our first ping attempt. is_healthy() is now a strict check that we
know the peer is healthy.
Switch the failure report check to use is_unhealthy(); use is_healthy()
everywhere else, including the waiting-for-healthy pre-boot checks.
Signed-off-by: Sage Weil <sage@inktank.com>
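The difference between the two checks can be sketched as below. This is an illustration under assumptions, not the OSD code: HeartbeatPeer, first_tx and last_rx are hypothetical names, times are plain seconds with 0 meaning "never", and cutoff stands for "now minus the grace period".

```cpp
#include <cassert>

// Hypothetical per-peer heartbeat record.
struct HeartbeatPeer {
  double first_tx = 0;  // when we first pinged this peer (0 = never)
  double last_rx = 0;   // when we last received a reply (0 = never)

  // Strict check: true only with positive proof, i.e. a reply newer
  // than the cutoff.
  bool is_healthy(double cutoff) const { return last_rx > cutoff; }

  // Lenient check: a peer only counts as unhealthy once a reply is
  // overdue. A peer that was pinged but has not answered yet gets the
  // benefit of the doubt until first_tx itself is older than the cutoff.
  bool is_unhealthy(double cutoff) const {
    if (first_tx == 0)
      return false;  // never pinged: no evidence either way
    // A reply can only follow a ping.
    assert(last_rx == 0 || last_rx >= first_tx);
    const double reference = last_rx > 0 ? last_rx : first_tx;
    return reference < cutoff;
  }
};
```

Note that the two predicates are not negations of each other: a freshly pinged peer is neither healthy nor unhealthy, which is why failure reports and pre-boot checks use different ones.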
|
|
|
|
|
|
| |
If a (say, random) peer goes down, filter it out.
Signed-off-by: Sage Weil <sage@inktank.com>
|
|
|
|
|
|
|
| |
We will soon be in this method for the waiting-for-healthy state. As
a consequence, we need to remove any down peers.
Signed-off-by: Sage Weil <sage@inktank.com>
|
|
|
|
| |
Signed-off-by: Sage Weil <sage@inktank.com>
|
|
|
|
|
|
|
|
| |
- always include our neighbors to ensure we have a fully-connected
graph
- include some random neighbors to get at least some min number of peers.
Signed-off-by: Sage Weil <sage@inktank.com>
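The two rules above can be sketched as follows. This is a simplified model, not the OSD implementation: pick_heartbeat_peers is a hypothetical helper, "neighbors" is assumed to mean the next and previous up OSD in a ring ordered by id, and the random source is passed in by the caller.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <random>
#include <set>
#include <vector>

// up_osds: ids of the OSDs that are currently up, in ascending order.
std::set<int> pick_heartbeat_peers(const std::vector<int>& up_osds,
                                   int whoami, std::size_t min_peers,
                                   std::mt19937& rng) {
  std::set<int> peers;
  auto it = std::find(up_osds.begin(), up_osds.end(), whoami);
  assert(it != up_osds.end());  // we must be in the up set ourselves
  const std::size_t n = up_osds.size();
  if (n < 2)
    return peers;  // nobody else to ping

  // Rule 1: always include both ring neighbors, so that the peer graph
  // stays connected no matter what the random picks are.
  const std::size_t me = static_cast<std::size_t>(it - up_osds.begin());
  peers.insert(up_osds[(me + 1) % n]);
  peers.insert(up_osds[(me + n - 1) % n]);

  // Rule 2: top up with random OSDs until min_peers is reached.
  std::vector<int> candidates;
  for (int osd : up_osds)
    if (osd != whoami && peers.count(osd) == 0)
      candidates.push_back(osd);
  std::shuffle(candidates.begin(), candidates.end(), rng);
  for (int osd : candidates) {
    if (peers.size() >= min_peers)
      break;
    peers.insert(osd);
  }
  return peers;
}
```

With fewer up OSDs than min_peers the function simply returns everyone else; the minimum is a target, not a guarantee.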
|
|
|
|
|
|
| |
For now we still only look at the internal heartbeats.
Signed-off-by: Sage Weil <sage@inktank.com>
|
|
|
|
|
|
| |
sub_want() returns true if this is a new sub; only renew then.
Signed-off-by: Sage Weil <sage@inktank.com>
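The "only renew if the subscription is new" pattern can be sketched like this. MonSubSketch, want_and_renew and the renewals counter are hypothetical names for illustration; only the idea that sub_want() reports whether anything changed comes from the message above.

```cpp
#include <map>
#include <string>

// Hypothetical subscription bookkeeping.
class MonSubSketch {
  std::map<std::string, unsigned> subs;  // what -> start epoch wanted

public:
  int renewals = 0;  // how many renew requests would have been sent

  // Returns true only if this call changed what we are subscribed to.
  bool sub_want(const std::string& what, unsigned start) {
    auto it = subs.find(what);
    if (it != subs.end() && it->second == start)
      return false;  // already subscribed with the same start
    subs[what] = start;
    return true;
  }

  void renew_subs() { ++renewals; }

  // Renew only when sub_want() reports a new subscription, so repeated
  // identical requests do not generate extra traffic.
  void want_and_renew(const std::string& what, unsigned start) {
    if (sub_want(what, start))
      renew_subs();
  }
};
```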
|
|
|
|
| |
Signed-off-by: Sage Weil <sage@inktank.com>
|
|
|
|
|
|
|
|
| |
This has a slight behavior change in that we ask the mon for the latest
osdmap if our internal heartbeat is failing. That isn't useful yet, but
will be shortly.
Signed-off-by: Sage Weil <sage@inktank.com>
|
|
|
|
|
|
| |
Fix bug introduced in 27381c0c6259ac89f5f9c592b4bfb585937a1cfc.
Signed-off-by: Sage Weil <sage@inktank.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Fix a few bugs introduced by 27381c0c6259ac89f5f9c592b4bfb585937a1cfc:
- check against both front and back cons; either one may have failed.
- close *both* front and back before reopening either. this is
overkill, but slightly simpler code.
- fix leak of con when marking down
- handle race against osdmap update and note_down_osd
Fixes: #5172
Signed-off-by: Sage Weil <sage@inktank.com>
|
|\
| |
| | |
Reviewed-by: Samuel Just <sam.just@inktank.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Send ping requests to both the front and back hb addrs for peer osds. If
the front hb addr is not present, do not send to it, and interpret a reply
on the back addr as coming from both. This handles the transition from old
to new OSDs seamlessly.
Note both the front and back rx times. Both need to be up to date in order
for the peer to be healthy.
Signed-off-by: Sage Weil <sage@inktank.com>
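A minimal sketch of the bookkeeping this implies, assuming a hypothetical DualHeartbeatPeer record with one receive time per channel; the names are illustrative and times are plain seconds.

```cpp
#include <cassert>

// Hypothetical record for a peer with front and back heartbeat channels.
struct DualHeartbeatPeer {
  enum class Channel { Front, Back };

  bool has_front = false;  // false for old peers without a front hb addr
  double last_rx_front = 0;
  double last_rx_back = 0;

  void note_reply(Channel ch, double now) {
    if (ch == Channel::Back) {
      last_rx_back = now;
      // Old peer: we never pinged a front addr, so a back reply is
      // interpreted as coming from both.
      if (!has_front)
        last_rx_front = now;
    } else {
      assert(has_front);  // a front reply implies a front addr exists
      last_rx_front = now;
    }
  }

  // Both channels must be up to date for the peer to be healthy.
  bool is_healthy(double cutoff) const {
    return last_rx_front > cutoff && last_rx_back > cutoff;
  }
};
```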
|
| |
| |
| |
| |
| |
| |
| | |
We used to only need to avoid 2 ports; now we need 3. Make it a set so we
don't have this problem later.
Signed-off-by: Sage Weil <sage@inktank.com>
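Why a set scales better than a fixed number of "avoid" arguments can be seen in a small sketch. pick_port is a hypothetical helper, not the messenger's bind logic; it only illustrates that the caller can pass any number of ports to skip.

```cpp
#include <cassert>
#include <set>

// Returns the first port in [range_start, range_end] that is not in
// avoid_ports, or -1 if every port in the range must be avoided.
int pick_port(int range_start, int range_end,
              const std::set<int>& avoid_ports) {
  assert(range_start <= range_end);
  for (int port = range_start; port <= range_end; ++port)
    if (avoid_ports.count(port) == 0)
      return port;
  return -1;
}
```

Going from two ports to three, or more later, changes only the contents of the set, not the signature.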
|
| |
| |
| |
| | |
Signed-off-by: Sage Weil <sage@inktank.com>
|
| |
| |
| |
| |
| |
| | |
We still aren't binding it to anything yet, or putting it in the OSDMap.
Signed-off-by: Sage Weil <sage@inktank.com>
|
| |
| |
| |
| |
| |
| | |
The hb_front messenger is not used yet.
Signed-off-by: Sage Weil <sage@inktank.com>
|
| |
| |
| |
| | |
Signed-off-by: Sage Weil <sage@inktank.com>
|
|\ \
| |/
|/| |
|
| |
| |
| |
| |
| |
| |
| | |
Fixes crash when the OSD has not successfully booted and gets a
SIGINT or SIGTERM.
Signed-off-by: Sage Weil <sage@inktank.com>
|
| |
| |
| |
| |
| | |
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: David Zafman <david.zafman@inktank.com>
|
| |
| |
| |
| | |
Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
|
|\ \
| |/
|/|
| | |
Reviewed-by: Sam Just <sam.just@inktank.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Do arithmetic so large intervals don't wrap
Fix log messages to reflect the change and improve output
Add message when skipping scrub due to load
fixes: #5049
Signed-off-by: David Zafman <david.zafman@inktank.com>
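The message does not say how the arithmetic was changed, so the following is only one plausible illustration of the failure mode, assuming timestamps are 32-bit second counters and the interval is configurable. scrub_due_wrapping and scrub_due are hypothetical names.

```cpp
#include <cstdint>

// Naive version: the sum is computed in 32 bits, so a large interval
// wraps around and the scrub looks overdue when it is not.
bool scrub_due_wrapping(std::uint32_t last_scrub_sec,
                        std::uint32_t interval_sec,
                        std::uint32_t now_sec) {
  return static_cast<std::uint32_t>(last_scrub_sec + interval_sec) <= now_sec;
}

// Widened version: convert to double before adding, so the sum cannot
// wrap for any 32-bit inputs.
bool scrub_due(std::uint32_t last_scrub_sec, double interval_sec,
               std::uint32_t now_sec) {
  return static_cast<double>(last_scrub_sec) + interval_sec <=
         static_cast<double>(now_sec);
}
```

With a last scrub at 1368000000 and an interval of 4000000000 seconds, the naive sum wraps to a small value and reports the scrub as due 100 seconds later, while the widened comparison does not.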
|
| |
| |
| |
| |
| |
| |
| |
| | |
Set initial values for last_scrub_stamp, last_deep_scrub_stamp
fixes: #5050, #5051
Signed-off-by: David Zafman <david.zafman@inktank.com>
|
|\ \
| |/
|/| |
|
| | |
Previously, we simply queued ops in the OpWQ without checking. The PG
would then check in do_request whether the message should wait for a new
map. Unfortunately, this has the side effect that any op requeued for
any reason must also requeue the waiting_for_map queue.
Now, we will check before queueing the op whether it must wait on a map.
To avoid contention, there is now a map_lock which must be held along
with the PG lock in order to update the osdmap_ref. The map_lock also
protects the waiting_for_map list and queueing PG ops at the back of
the OpWQ. A few details:
1) It is no longer necessary to requeue waiting_for_map in on_change()
since the other ops are queued at the front.
2) Once waiting_for_map is non-empty, all ops are delayed to simplify
ordering.
3) waiting_for_map may now be non-empty during split, so we must split
waiting_for_map along with waiting_for_active. This must be done
under the map_lock.
The bug which uncovered this involved an out of order op as follows:
client.4208.0:2378 (e252) arrives, object is degraded
client.4208.0:2379 (e253) arrives, waits for map
client.4208.0:2378 (e252) is requeued after recovery
client.4208.0:2379 (e253) is requeued on map arrival
client.4208.0:2379 is processed
client.4208.0:2378 is processed
Fixes: #4955
Signed-off-by: Samuel Just <sam.just@inktank.com>
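The ordering rule can be sketched with a toy queue. PGQueueSketch and its method names are hypothetical, a single mutex stands in for map_lock, and the PG lock is left out; the point is only that new ops are checked against the map before queueing, waiting ops are released to the back of the work queue, and requeued ops go to the front.

```cpp
#include <cassert>
#include <deque>
#include <mutex>

struct Op {
  int id;
  unsigned epoch;  // map epoch the op requires
};

class PGQueueSketch {
  std::mutex map_lock;            // guards everything below
  unsigned map_epoch;
  std::deque<Op> waiting_for_map;
  std::deque<Op> op_wq;

public:
  explicit PGQueueSketch(unsigned epoch) : map_epoch(epoch) {}

  // New ops: check before queueing. Once anything is waiting, all new
  // ops wait too, which keeps arrival order intact.
  void queue_op(const Op& op) {
    std::lock_guard<std::mutex> l(map_lock);
    if (!waiting_for_map.empty() || op.epoch > map_epoch)
      waiting_for_map.push_back(op);
    else
      op_wq.push_back(op);
  }

  // Ops requeued for other reasons (e.g. after recovery) go to the front.
  void requeue_front(const Op& op) {
    std::lock_guard<std::mutex> l(map_lock);
    op_wq.push_front(op);
  }

  // A new map releases waiting ops, in order, to the back of the queue.
  void advance_map(unsigned epoch) {
    std::lock_guard<std::mutex> l(map_lock);
    assert(epoch >= map_epoch);
    map_epoch = epoch;
    while (!waiting_for_map.empty() &&
           waiting_for_map.front().epoch <= map_epoch) {
      op_wq.push_back(waiting_for_map.front());
      waiting_for_map.pop_front();
    }
  }

  bool dequeue(Op& out) {
    std::lock_guard<std::mutex> l(map_lock);
    if (op_wq.empty())
      return false;
    out = op_wq.front();
    op_wq.pop_front();
    return true;
  }
};
```

Replaying the sequence from the bug report (2378 dequeued and blocked, 2379 waiting for e253, 2378 requeued, map e253 arrives) yields 2378 before 2379 in either order of the last two events, because requeues use the front and map releases use the back.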
|
| |
| |
| |
| | |
Signed-off-by: Samuel Just <sam.just@inktank.com>
|
|\ \
| | |
| | |
| | | |
Reviewed-by: Sam Just <sam.just@inktank.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Three reservation priorities: RECOVERY, BACKFILL_HIGH, BACKFILL_LOW
fixes: #4273
Signed-off-by: David Zafman <david.zafman@inktank.com>
|
| | |
| | |
| | |
| | | |
Signed-off-by: David Zafman <david.zafman@inktank.com>
|
|\ \ \
| | | |
| | | |
| | | | |
Reviewed-by: Sage Weil <sage@inktank.com>
|
| | | |
| | | |
| | | |
| | | | |
Signed-off-by: Samuel Just <sam.just@inktank.com>
|
| | | |
| | | |
| | | |
| | | | |
Signed-off-by: Samuel Just <sam.just@inktank.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
DeletingState now allows _create_lock_pg() to attempt to cancel
pg deletion.
PG::init() must mark the PG as backfill iff we stopped a deletion.
Signed-off-by: Samuel Just <sam.just@inktank.com>
|
| | | |
| | | |
| | | |
| | | | |
Signed-off-by: Samuel Just <sam.just@inktank.com>
|
| | | |
| | | |
| | | |
| | | | |
Signed-off-by: Samuel Just <sam.just@inktank.com>
|
| |/ /
| | |
| | |
| | | |
Signed-off-by: Samuel Just <sam.just@inktank.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
CID 1019627 (#1 of 1): Uninitialized scalar field (UNINIT_CTOR)
2. uninit_member: Non-static class member "next_notif_id" is not initialized in this constructor nor in any functions that it calls.
Signed-off-by: Sage Weil <sage@inktank.com>
|
| |/
|/|
| |
| |
| |
| |
| | |
CID 1019628 (#1 of 1): Uninitialized pointer field (UNINIT_CTOR)
2. uninit_member: Non-static class member "test_ops_hook" is not initialized in this constructor nor in any functions that it calls.
Signed-off-by: Sage Weil <sage@inktank.com>
|
|\ \
| |/
|/|
| | |
Fixes: #4927
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Previously, we failed to clear snap_collections, which caused split to
spawn a bunch of snap collections. In load_pgs, we now clear any such
snap collections and then the snap_collections field on the PG itself.
Related: #4927
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
|
| |
| |
| |
| |
| | |
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
|
|/
|
|
|
| |
Fixes: #4639
Signed-off-by: Sage Weil <sage@inktank.com>
|
|
|
|
|
|
|
|
|
| |
We need to go map-by-map to get the parents right in consume_map()
just as we must in load_pgs().
Fixes: #4884
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
|
|
|
|
|
|
|
|
|
|
| |
In load_pgs(), we previously assigned all children of the loaded pg
that were created between its stored epoch and the current osdmap to
have that pg as their parent. This is not correct: some of the children
may have been split in subsequent epochs from children split in earlier
epochs. Instead, do each map individually.
Signed-off-by: Samuel Just <sam.just@inktank.com>
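Why the per-map walk matters can be shown with a toy model of splitting. This is not Ceph's split code: assign_parents is a hypothetical helper, PGs are identified by a bare seed, pg_num is assumed to be a power of two in every epoch, and a child's parent is modelled as the child seed masked by the previous pg_num.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// pg_num_by_epoch[i] is the pg_num of the pool in the i-th map.
// Returns child seed -> direct parent seed.
std::map<unsigned, unsigned> assign_parents(
    const std::vector<unsigned>& pg_num_by_epoch) {
  std::map<unsigned, unsigned> parent;
  for (std::size_t i = 1; i < pg_num_by_epoch.size(); ++i) {
    const unsigned old_num = pg_num_by_epoch[i - 1];
    const unsigned new_num = pg_num_by_epoch[i];
    // Toy model only handles power-of-two pg_num.
    assert(old_num > 0 && (old_num & (old_num - 1)) == 0);
    // Each PG created in this epoch splits from a PG that already
    // existed in the previous epoch.
    for (unsigned child = old_num; child < new_num; ++child)
      parent[child] = child & (old_num - 1);
  }
  return parent;
}
```

Walking the maps 2, 4, 8 one at a time gives PG 6 the parent 2, because 2 itself was split from 0 in the earlier epoch. Jumping straight from 2 to 8 would assign PG 6 to parent 0, which is the mistake the message describes.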
|
|
|
|
|
|
|
| |
expand_pg_num() and load_pgs() may result in a pg with children
in pending_splits which also have children in pending_splits (etc).
Signed-off-by: Samuel Just <sam.just@inktank.com>
|
| |
There are two cases: 1) The parent pg has not yet initiated the split 2) The
parent pg has initiated the split.
Previously in case 1), _remove_pg left the entry for its children in the
in_progress_splits map, blocking subsequent peering attempts.
In case 1), we need to unblock requests on the child pgs for the parent on
parent removal. We don't need to bother waking requests since any requests
received prior to the remove_pg request are necessarily obsolete.
In case 2), we don't need to do anything: the child will complete the split on
its own anyway.
Thus, we now track pending_splits vs in_progress_splits. Children in
pending_splits are in case 1), those in in_progress_splits in case 2). split_pgs
bumps pgs from pending_splits to in_progress_splits atomically with respect to
_remove_pg since the parent pg lock is held in both places.
Fixes: #4813
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
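The two-map bookkeeping can be sketched as below. SplitTrackerSketch and its method names are hypothetical, PGs are bare integers, and a single mutex stands in for the parent pg lock that makes the move atomic in the original. Cancelling transitively is an assumption based on the earlier note that pending children may themselves have pending children.

```cpp
#include <cassert>
#include <map>
#include <mutex>
#include <set>
#include <vector>

class SplitTrackerSketch {
  std::mutex lock;
  std::map<int, int> pending_splits;  // child -> parent, case 1)
  std::set<int> in_progress_splits;   // children in case 2)

public:
  void mark_pending(int parent, int child) {
    std::lock_guard<std::mutex> l(lock);
    assert(in_progress_splits.count(child) == 0);
    pending_splits[child] = parent;
  }

  // The parent starts splitting: its children move from pending to
  // in progress in one step.
  void start_split(int parent) {
    std::lock_guard<std::mutex> l(lock);
    for (auto it = pending_splits.begin(); it != pending_splits.end();) {
      if (it->second == parent) {
        in_progress_splits.insert(it->first);
        it = pending_splits.erase(it);
      } else {
        ++it;
      }
    }
  }

  void complete_split(int child) {
    std::lock_guard<std::mutex> l(lock);
    in_progress_splits.erase(child);
  }

  // Parent removed before the split started: unblock its pending
  // children and, transitively, their pending children. Children
  // already in progress are left alone and finish on their own.
  void cancel_pending(int parent) {
    std::lock_guard<std::mutex> l(lock);
    std::vector<int> todo{parent};
    while (!todo.empty()) {
      const int p = todo.back();
      todo.pop_back();
      for (auto it = pending_splits.begin(); it != pending_splits.end();) {
        if (it->second == p) {
          todo.push_back(it->first);
          it = pending_splits.erase(it);
        } else {
          ++it;
        }
      }
    }
  }

  // True while requests for this pg must wait for a split.
  bool splitting(int pg) {
    std::lock_guard<std::mutex> l(lock);
    return pending_splits.count(pg) > 0 || in_progress_splits.count(pg) > 0;
  }
};
```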
|