| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
| |
Signed-off-by: Gary Lowell <gary.lowell@inktank.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We don't actually need to write out the pg map epoch on every
activate_map as long as:
a) the osd does not trim past the oldest pg map persisted
b) the pg does update the persisted map epoch from time
to time.
To that end, we now keep a reference to the last map persisted.
The OSD already does not trim past the oldest live OSDMapRef.
Second, handle_activate_map will trim if the difference between
the current map and the last_persisted_map is large enough.
Fixes: #4731
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
(cherry picked from commit 2c5a9f0e178843e7ed514708bab137def840ab89)
Conflicts:
src/common/config_opts.h
src/osd/PG.cc
- last_persisted_osdmap_ref gets set in the non-static
PG::write_info
Conflicts:
src/osd/PG.cc
|
|\
| |
| |
| | |
Reviewed-by: Samuel Just <sam.just@inktank.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
PG::log, PG::ondisklog, PG::missing are moved from PG to a new PGLog
class and are made protected data members. It is a preliminary step
before writing unit tests to cover the methods that have side effects
on these data members and define a clean PGLog API. It improves
encapsulation and does not change any of the logic already in
place.
Possible issues :
* an additional reference (PG->PGLog->IndexedLog instead of
PG->IndexedLog for instance) is introduced : is it optimized ?
* rewriting log.log into pg_log.get_log().log affects the readability
but should be optimized and have no impact on performances
The guidelines followed for this patch are:
* const access to the data members are preserved, no attempt is made
to define accessors
* all non const methods are in PGLog, no access to non const methods of
PGLog::log, PGLog::logondisk and PGLog::missing are provided
* when methods are moved from PG to PGLog the change to their
implementation is restricted to the minimum.
* the PG::OndiskLog and PG::IndexedLog sub classes are moved
to PGLog sub classes unmodified and remain public
A const version of the pg_log_t::find_entry method was added.
A const accessor is provided for PGLog::get_log, PGLog::get_missing,
PGLog::get_ondisklog but no non-const accessor.
Arguments are added to most of the methods moved from PG to PGLog so
that they can get access to PG data members such as info or log_oid.
The PGLog method are sorted according to the data member they modify.
//////////////////// missing ////////////////////
* The pg_missing_t::{got,have,need,add,rm} methods are wrapped as
PGLog::missing_{got,have,need,add,rm}
//////////////////// log ////////////////////
* PGLog::get_tail, PGLog::get_head getters are created
* PGLog::set_tail, PGLog::set_head, PGLog::set_last_requested setters
are created
* PGLog::index, PGLog::unindex, PGLog::add wrappers,
PGLog::reset_recovery_pointers are created
* PGLog::clear_info_log replaces PG::clear_info_log
* PGLog::trim replaces PG::trim
//////////////////// log & missing ////////////////////
* PGLog::claim_log is created with code extracted from
PG::RecoveryState::Stray::react.
* PGLog::split_into is created with code extracted from
PG::split_into.
* PGLog::recover_got is created with code extracted from
ReplicatedPG::recover_got.
* PGLog::activate_not_complete is created with code extracted
from PG::active
* PGLog:proc_replica_log is created with code extracted from
PG::proc_replica_log
* PGLog:write_log is created with code extracted from
PG::write_log
* PGLog::merge_old_entry replaces PG::merge_old_entry
The remove_snap argument is used to collect hobject_t
* PGLog::rewind_divergent_log replaces PG::rewind_divergent_log
The remove_snap argument is used to collect hobject_t
A new PG::rewind_divergent_log method is added to call
remove_snap_mapped_object on each of the remove_snap
elements
* PGLog::merge_log replaces PG::merge_log
The remove_snap argument is used to collect hobject_t
A new PG::merge_log method is added to call
remove_snap_mapped_object on each of the remove_snap
elements
* PGLog:write_log is created with code extracted from PG::write_log. A
non-static version is created for convenience but is a simple
wrapper.
* PGLog:read_log replaces PG::read_log. A non-static version is
created for convenience but is a simple wrapper.
* PGLog:read_log_old replaces PG::read_log_old.
http://tracker.ceph.com/issues/5046 refs #5046
Signed-off-by: Loic Dachary <loic@dachary.org>
|
| |
| |
| |
| | |
Signed-off-by: John Wilkins <john.wilkins@inktank.com>
|
| |
| |
| |
| | |
Signed-off-by: John Wilkins <john.wilkins@inktank.com>
|
| |
| |
| |
| | |
Signed-off-by: John Wilkins <john.wilkins@inktank.com>
|
| |
| |
| |
| | |
Signed-off-by: John Wilkins <john.wilkins@inktank.com>
|
| |
| |
| |
| | |
Signed-off-by: John Wilkins <john.wilkins@inktank.com>
|
| |
| |
| |
| | |
Signed-off-by: John Wilkins <john.wilkins@inktank.com>
|
| |
| |
| |
| | |
Signed-off-by: John Wilkins <john.wilkins@inktank.com>
|
| |
| |
| |
| | |
Signed-off-by: John Wilkins <john.wilkins@inktank.com>
|
| |
| |
| |
| | |
Signed-off-by: John Wilkins <john.wilkins@inktank.com>
|
|/
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The sysfs entries for snapshots went away a while ago, and this
script used them to verify sizes matched what was expected.
Instead, look at the mapped size of the snapshot in the places
that used to look for the image's snapshot sysfs files.
Also, switch over to using "udevadm settle" rather than a delay to
wait for udev to do its thing. Insert them at more appropriate
places--right after "rmd map" commands and before and after the
"rbd unmap" calls.
Stop doing the manual refresh calls as well. The osd will trigger
refreshes whenever the image size or shapshot context changes.
Finally, the cleanup routine is called initially, when there really
isn't expected to be anything to clean up. Change the rbd commands
to run there conditionally, only if the target of the command
already exists.
Signed-off-by: Alex Elder <elder@inktank.com>
|
|
|
|
|
|
|
|
| |
cur_ios, etc may not be zero due to an in progress
flush.
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
|
|
|
|
|
|
| |
These are just for the cinder configuration, nothing else changed.
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
|
|
|
|
|
|
|
| |
Added the -r option, which starts the radosgw and apache2 to access it
to the usage message.
Signed-off-by: Christophe Courtaut <christophe.courtaut@gmail.com>
|
|
|
|
|
|
|
| |
There's no guarantee the rbd module is loaded when this script is
run, so add a line that loads it if necessary.
Signed-off-by: Alex Elder <elder@inktank.com>
|
|\
| |
| | |
Reviewed-by: Samuel Just <sam.just@inktank.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
If we are (wrongly) marked down, we need to go into the waiting-for-healthy
state and verify that our network interfaces are working before trying to
rejoin the cluster.
- make _is_healthy() check require positive proof of pings working
- do heartbeat checks and updates in this state
- reset the random peers every heartbeat_interval, in case we keep picking
bad ones
Signed-off-by: Sage Weil <sage@inktank.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
is_unhealthy() will assume they are healthy for some period after we
send our first ping attempt. is_healthy() is now a strict check that we
know they are healthy.
Switch the failure report check to use is_unhealthy(); use is_healthy()
everywhere else, including the waiting-for-healthy pre-boot checks.
Signed-off-by: Sage Weil <sage@inktank.com>
|
| |
| |
| |
| |
| |
| | |
If a (say, random) peer goes down, filter it out.
Signed-off-by: Sage Weil <sage@inktank.com>
|
| |
| |
| |
| |
| |
| |
| | |
We will soon be in this method for the waiting-for-healthy state. As
a consequence, we need to remove any down peers.
Signed-off-by: Sage Weil <sage@inktank.com>
|
| |
| |
| |
| | |
Signed-off-by: Sage Weil <sage@inktank.com>
|
| |
| |
| |
| |
| |
| |
| |
| | |
- always include our neighbors to ensure we have a fully-connected
graph
- include some random neighbors to get at least some min number of peers.
Signed-off-by: Sage Weil <sage@inktank.com>
|
| |
| |
| |
| |
| |
| | |
For now we still only look at the internal heartbeats.
Signed-off-by: Sage Weil <sage@inktank.com>
|
| |
| |
| |
| |
| |
| | |
sub_want() returns true if this is a new sub; only renew then.
Signed-off-by: Sage Weil <sage@inktank.com>
|
| |
| |
| |
| | |
Signed-off-by: Sage Weil <sage@inktank.com>
|
| |
| |
| |
| |
| |
| |
| |
| | |
This has a slight behavior change in that we ask the mon for the latest
osdmap if our internal heartbeat is failing. That isn't useful yet, but
will be shortly.
Signed-off-by: Sage Weil <sage@inktank.com>
|
|\ \ |
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
If we use operator[] on a new int field its value is undefined; avoid
reading it or using |= et al until we initialize it.
Fixes: #4967
Backport: cuttlefish, bobtail
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: David Zafman <david.zafman@inktank.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
If we are in the SCAN state, stay there until the recovery finishes. Do
not jump to another state from file_eval().
Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit 0071b8e75bd3f5a09cc46e2225a018f6d1ef0680)
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
For a list-snaps operation on the snapdir, do not assume that the obc for the
head means the object exists. This fixes a race between a head deletion and
a list-snaps that wrongly returns ENOENT, triggered by the DiffItersateStress
test when thrashing OSDs.
Fixes: #5183
Backport: cuttlefish
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Samuel Just <sam.just@inktank.com>
|
|\ \ \
| | | |
| | | |
| | | |
| | | | |
Fixes: #4782
Reviewed-by: Sage Weil
|
| | | |
| | | |
| | | |
| | | | |
Signed-off-by: Samuel Just <sam.just@inktank.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
We may want to influence the caching behavior for other
reasons.
Signed-off-by: Samuel Just <sam.just@inktank.com>
|
| | | |
| | | |
| | | |
| | | | |
Signed-off-by: Samuel Just <sam.just@inktank.com>
|
| | | |
| | | |
| | | |
| | | | |
Signed-off-by: Samuel Just <sam.just@inktank.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
The object would have had to have been removed already. With
fd caching, this extra remove might check the wrong replay_guard
since the fd caching mechanism assumes that between any operation
on an hobject_t oid and a remove operation, all operations on that
hobject_t must refer to the same inode.
Signed-off-by: Samuel Just <sam.just@inktank.com>
|
| | | |
| | | |
| | | |
| | | | |
Signed-off-by: Samuel Just <sam.just@inktank.com>
|
| | | |
| | | |
| | | |
| | | | |
Signed-off-by: Samuel Just <sam.just@inktank.com>
|
| | | |
| | | |
| | | |
| | | | |
Signed-off-by: Samuel Just <sam.just@inktank.com>
|
| | | |
| | | |
| | | |
| | | | |
Signed-off-by: Samuel Just <sam.just@inktank.com>
|
| | | |
| | | |
| | | |
| | | | |
Signed-off-by: Samuel Just <sam.just@inktank.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Clear clears a key/value from the cache.
Signed-off-by: Samuel Just <sam.just@inktank.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
If we are in the SCAN state, stay there until the recovery finishes. Do
not jump to another state from file_eval().
Signed-off-by: Sage Weil <sage@inktank.com>
|
| | | |
| | | |
| | | |
| | | | |
Signed-off-by: Sage Weil <sage@inktank.com>
|
|\ \ \ \
| |_|_|/
|/| | |
| | | |
| | | |
| | | |
| | | | |
Reviewed-by: Sage Weil <sage@inktank.com>
Conflicts:
src/mds/MDCache.cc
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Also add a new config option "mds_open_remote_link_mode". The anchor
approach is used by default. If mode is non-zero, use the open-by-ino
function. In case open-by-ino function fails, if mode is 1, retry
using the anchor approach, otherwise trigger assertion.
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
When a recovering MDS enters reconnect stage, client sends reconnect
messages to it. The message lists open files, their path, and issued
caps. If an inode is not in the cache, the recovering MDS uses the
path client provides to determine if it's the inode's authority. If
not, the recovering MDS exports the inode's caps to other MDS. The
issue here is that the path client provides isn't always accuracy.
The fix is use recently added "open inode by ino" function to open
any missing cap inodes when the recovering MDS enters rejoin stage.
Send cache rejoin messages to other MDS after all caps' authorities
are determined.
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
|