path: root/src/osd/OSD.cc
Each entry below gives the commit message, author, date, files changed, and lines removed/added.
* move log, ondisklog, missing from PG to PGLog (Loic Dachary, 2013-05-30, 1 file, -5/+7)

  PG::log, PG::ondisklog, PG::missing are moved from PG to a new PGLog class and are made
  protected data members. It is a preliminary step before writing unit tests to cover the
  methods that have side effects on these data members and define a clean PGLog API. It
  improves encapsulation and does not change any of the logic already in place.

  Possible issues:
  * an additional reference (PG->PGLog->IndexedLog instead of PG->IndexedLog for instance)
    is introduced: is it optimized?
  * rewriting log.log into pg_log.get_log().log affects readability but should be
    optimized and have no impact on performance

  The guidelines followed for this patch are:
  * const access to the data members is preserved; no attempt is made to define accessors
  * all non-const methods are in PGLog; no access to non-const methods of PGLog::log,
    PGLog::ondisklog and PGLog::missing is provided
  * when methods are moved from PG to PGLog, the change to their implementation is
    restricted to the minimum
  * the PG::OndiskLog and PG::IndexedLog sub classes are moved to PGLog sub classes
    unmodified and remain public

  A const version of the pg_log_t::find_entry method was added. A const accessor is
  provided for PGLog::get_log, PGLog::get_missing, PGLog::get_ondisklog but no non-const
  accessor. Arguments are added to most of the methods moved from PG to PGLog so that
  they can get access to PG data members such as info or log_oid. The PGLog methods are
  sorted according to the data member they modify.

  //////////////////// missing ////////////////////
  * The pg_missing_t::{got,have,need,add,rm} methods are wrapped as
    PGLog::missing_{got,have,need,add,rm}

  //////////////////// log ////////////////////
  * PGLog::get_tail, PGLog::get_head getters are created
  * PGLog::set_tail, PGLog::set_head, PGLog::set_last_requested setters are created
  * PGLog::index, PGLog::unindex, PGLog::add wrappers and PGLog::reset_recovery_pointers
    are created
  * PGLog::clear_info_log replaces PG::clear_info_log
  * PGLog::trim replaces PG::trim

  //////////////////// log & missing ////////////////////
  * PGLog::claim_log is created with code extracted from PG::RecoveryState::Stray::react
  * PGLog::split_into is created with code extracted from PG::split_into
  * PGLog::recover_got is created with code extracted from ReplicatedPG::recover_got
  * PGLog::activate_not_complete is created with code extracted from PG::active
  * PGLog::proc_replica_log is created with code extracted from PG::proc_replica_log
  * PGLog::write_log is created with code extracted from PG::write_log
  * PGLog::merge_old_entry replaces PG::merge_old_entry
    The remove_snap argument is used to collect hobject_t
  * PGLog::rewind_divergent_log replaces PG::rewind_divergent_log
    The remove_snap argument is used to collect hobject_t
    A new PG::rewind_divergent_log method is added to call remove_snap_mapped_object on
    each of the remove_snap elements
  * PGLog::merge_log replaces PG::merge_log
    The remove_snap argument is used to collect hobject_t
    A new PG::merge_log method is added to call remove_snap_mapped_object on each of the
    remove_snap elements
  * PGLog::write_log is created with code extracted from PG::write_log. A non-static
    version is created for convenience but is a simple wrapper.
  * PGLog::read_log replaces PG::read_log. A non-static version is created for
    convenience but is a simple wrapper.
  * PGLog::read_log_old replaces PG::read_log_old.

  http://tracker.ceph.com/issues/5046 refs #5046

  Signed-off-by: Loic Dachary <loic@dachary.org>

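As an illustration of the encapsulation described above, here is a minimal C++ sketch of a PGLog-style wrapper: protected data members, const-only accessors, and wrapped mutators. The types used here (pg_log_entry_t, IndexedLog, pg_missing_t) are simplified stand-ins invented for the example, not the real Ceph definitions.

    // Minimal sketch of the encapsulation pattern; simplified stand-in types,
    // not the real Ceph pg_log_t / pg_missing_t.
    #include <list>
    #include <map>
    #include <string>

    struct pg_log_entry_t { std::string soid; unsigned version = 0; };

    struct IndexedLog {
        std::list<pg_log_entry_t> log;   // the entries themselves
        unsigned head = 0, tail = 0;     // versions bounding the log
    };

    struct pg_missing_t {
        std::map<std::string, unsigned> need;  // object -> version still needed
        void add(const std::string& oid, unsigned v) { need[oid] = v; }
        void got(const std::string& oid) { need.erase(oid); }
    };

    class PGLog {
    protected:
        IndexedLog log;        // moved here from PG::log
        pg_missing_t missing;  // moved here from PG::missing
    public:
        // const accessors only; all mutation goes through PGLog methods
        const IndexedLog& get_log() const { return log; }
        const pg_missing_t& get_missing() const { return missing; }

        unsigned get_head() const { return log.head; }
        void set_head(unsigned v) { log.head = v; }

        // pg_missing_t mutators are wrapped rather than exposed
        void missing_add(const std::string& oid, unsigned v) { missing.add(oid, v); }
        void missing_got(const std::string& oid) { missing.got(oid); }

        void add(const pg_log_entry_t& e) {
            log.log.push_back(e);
            log.head = e.version;
        }
    };
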
* osd: wait for healthy pings from peers in waiting-for-healthy state (Sage Weil, 2013-05-29, 1 file, -23/+72)

  If we are (wrongly) marked down, we need to go into the waiting-for-healthy state and
  verify that our network interfaces are working before trying to rejoin the cluster.

  - make _is_healthy() check require positive proof of pings working
  - do heartbeat checks and updates in this state
  - reset the random peers every heartbeat_interval, in case we keep picking bad ones

  Signed-off-by: Sage Weil <sage@inktank.com>

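A hedged sketch of the "positive proof" idea from the first bullet: a peer only counts as evidence of health once we have actually seen a ping reply from it after entering waiting-for-healthy. All names (HeartbeatPeer, entered_waiting) are invented for illustration and do not match the real OSD code.

    // Illustrative only: healthy requires a reply observed after we started waiting.
    #include <chrono>
    #include <map>

    using Clock = std::chrono::steady_clock;

    struct HeartbeatPeer {
        Clock::time_point first_tx{};   // when we first pinged this peer
        Clock::time_point last_rx{};    // last reply seen (default value == never)
    };

    struct HeartbeatState {
        std::map<int, HeartbeatPeer> peers;   // osd id -> peer state
        Clock::time_point entered_waiting{};  // when we entered waiting-for-healthy

        // Healthy only if some peer has replied since we started waiting; merely
        // "no failures observed" is not enough.
        bool is_healthy() const {
            for (const auto& kv : peers) {
                if (kv.second.last_rx > entered_waiting)
                    return true;   // positive proof: a reply after we started waiting
            }
            return false;
        }
    };
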
* osd: distinguish between definitely healthy and definitely not unhealthy (Sage Weil, 2013-05-29, 1 file, -1/+1)

  is_unhealthy() will assume they are healthy for some period after we send our first
  ping attempt. is_healthy() is now a strict check that we know they are healthy. Switch
  the failure report check to use is_unhealthy(); use is_healthy() everywhere else,
  including the waiting-for-healthy pre-boot checks.

  Signed-off-by: Sage Weil <sage@inktank.com>

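The distinction reads naturally as two predicates over the same ping state. The sketch below is an assumption-laden illustration (invented PeerPing struct and grace parameter), not the actual OSD implementation.

    // Strict vs lenient health predicates over the same per-peer ping record.
    #include <chrono>
    using Clock = std::chrono::steady_clock;

    struct PeerPing {
        Clock::time_point first_tx{};  // first ping we sent to this peer
        Clock::time_point last_rx{};   // last reply received (default: never)
    };

    // Strict: we have positive, recent proof that the peer answers pings.
    bool is_healthy(const PeerPing& p, Clock::time_point now,
                    std::chrono::seconds grace) {
        return p.last_rx != Clock::time_point{} && now - p.last_rx < grace;
    }

    // Lenient: only declare the peer unhealthy once a full grace period has
    // passed since our first ping attempt and we still have no fresh reply.
    // This is the one to use for deciding whether to report a failure.
    bool is_unhealthy(const PeerPing& p, Clock::time_point now,
                      std::chrono::seconds grace) {
        if (now - p.first_tx < grace)
            return false;              // still within the benefit of the doubt
        return !is_healthy(p, now, grace);
    }
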
* osd: remove down hb peers (Sage Weil, 2013-05-29, 1 file, -4/+10)

  If a (say, random) peer goes down, filter it out.

  Signed-off-by: Sage Weil <sage@inktank.com>

* osd: only add pg peers if active (Sage Weil, 2013-05-29, 1 file, -17/+19)

  We will soon be in this method for the waiting-for-healthy state. As a consequence, we
  need to remove any down peers.

  Signed-off-by: Sage Weil <sage@inktank.com>

* osd: factor out _remove_heartbeat_peer (Sage Weil, 2013-05-29, 1 file, -12/+18)

  Signed-off-by: Sage Weil <sage@inktank.com>

* osd: augment osd heartbeat peers with neighbors and randoms, to up some min (Sage Weil, 2013-05-29, 1 file, -16/+59)

  - always include our neighbors to ensure we have a fully-connected graph
  - include some random neighbors to get at least some min number of peers

  Signed-off-by: Sage Weil <sage@inktank.com>

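A possible shape of that selection logic, sketched with invented names (choose_heartbeat_peers, min_hb_peers); the real OSD works from the OSDMap and its own configuration options rather than a plain vector of ids.

    // Keep our neighbors in osd-id order, then top up with random up OSDs.
    #include <algorithm>
    #include <cstddef>
    #include <iterator>
    #include <random>
    #include <set>
    #include <vector>

    std::set<int> choose_heartbeat_peers(int whoami,
                                         const std::vector<int>& up_osds,  // sorted ids
                                         std::size_t min_hb_peers,
                                         std::mt19937& rng) {
        std::set<int> peers;
        auto it = std::find(up_osds.begin(), up_osds.end(), whoami);
        if (it == up_osds.end() || up_osds.size() < 2)
            return peers;

        // Neighbors in id order (wrapping around) keep the graph connected.
        auto prev = (it == up_osds.begin()) ? std::prev(up_osds.end()) : std::prev(it);
        auto next = (std::next(it) == up_osds.end()) ? up_osds.begin() : std::next(it);
        peers.insert(*prev);
        peers.insert(*next);

        // Top up with random up OSDs until we reach the minimum.
        std::vector<int> shuffled(up_osds);
        std::shuffle(shuffled.begin(), shuffled.end(), rng);
        for (int osd : shuffled) {
            if (peers.size() >= min_hb_peers)
                break;
            if (osd != whoami)
                peers.insert(osd);
        }
        return peers;
    }
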
* osd: move health checks into a single helper (Sage Weil, 2013-05-29, 1 file, -3/+13)

  For now we still only look at the internal heartbeats.

  Signed-off-by: Sage Weil <sage@inktank.com>

* osd: avoid duplicate mon requests for a new osdmap (Sage Weil, 2013-05-29, 1 file, -2/+2)

  sub_want() returns true if this is a new sub; only renew then.

  Signed-off-by: Sage Weil <sage@inktank.com>

* osd: tell peers that ping us if they are dead (Sage Weil, 2013-05-29, 1 file, -0/+7)

  Signed-off-by: Sage Weil <sage@inktank.com>

* osd: simplify is_healthy() check during boot (Sage Weil, 2013-05-29, 1 file, -7/+2)

  This has a slight behavior change in that we ask the mon for the latest osdmap if our
  internal heartbeat is failing. That isn't useful yet, but will be shortly.

  Signed-off-by: Sage Weil <sage@inktank.com>

* osd: fix note_down_osd (Sage Weil, 2013-05-28, 1 file, -1/+1)

  Fix bug introduced in 27381c0c6259ac89f5f9c592b4bfb585937a1cfc.

  Signed-off-by: Sage Weil <sage@inktank.com>

* osd: fix hb con failure handler (Sage Weil, 2013-05-28, 1 file, -11/+19)

  Fix a few bugs introduced by 27381c0c6259ac89f5f9c592b4bfb585937a1cfc:

  - check against both front and back cons; either one may have failed.
  - close *both* front and back before reopening either. This is overkill, but slightly
    simpler code.
  - fix leak of con when marking down
  - handle race against osdmap update and note_down_osd

  Fixes: #5172
  Signed-off-by: Sage Weil <sage@inktank.com>

* Merge pull request #312 from ceph/wip-osd-hb (Sage Weil, 2013-05-23, 1 file, -87/+173)

  Reviewed-by: Samuel Just <sam.just@inktank.com>

* osd: ping both front and back interfaces (Sage Weil, 2013-05-22, 1 file, -64/+123)

  Send ping requests to both the front and back hb addrs for peer osds. If the front hb
  addr is not present, do not send it and interpret a reply as coming from both. This
  handles the transition from old to new OSDs seamlessly.

  Note both the front and back rx times. Both need to be up to date in order for the peer
  to be healthy.

  Signed-off-by: Sage Weil <sage@inktank.com>

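A simplified sketch of the per-peer bookkeeping this implies: separate front and back receive stamps, a reply counting for both when the peer has no front addr, and a health check that requires both stamps to be fresh. The HeartbeatInfo layout below is illustrative only.

    // Track front/back reply times separately; both must be fresh to be healthy.
    #include <chrono>
    #include <optional>

    using Clock = std::chrono::steady_clock;

    struct HeartbeatInfo {
        std::optional<Clock::time_point> last_rx_front;  // unset until a front reply is seen
        std::optional<Clock::time_point> last_rx_back;

        enum class Channel { front, back };

        // Record a ping reply.  For legacy peers with no front hb addr we only
        // ping the back interface, and that reply counts for both channels.
        void note_reply(Channel c, Clock::time_point now, bool peer_has_front_addr) {
            if (c == Channel::back || !peer_has_front_addr)
                last_rx_back = now;
            if (c == Channel::front || !peer_has_front_addr)
                last_rx_front = now;
        }

        // Healthy only if *both* interfaces have replied since the cutoff.
        bool is_healthy(Clock::time_point cutoff) const {
            return last_rx_front && *last_rx_front >= cutoff &&
                   last_rx_back  && *last_rx_back  >= cutoff;
        }
    };
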
* msgr: take an arbitrary set of ports to avoid binding to (Sage Weil, 2013-05-22, 1 file, -5/+7)

  We used to only need to avoid 2 ports; now we need 3. Make it a set so we don't have
  this problem later.

  Signed-off-by: Sage Weil <sage@inktank.com>

* osd: bind front heartbeat messenger to public_addr (Sage Weil, 2013-05-22, 1 file, -3/+20)

  Signed-off-by: Sage Weil <sage@inktank.com>

* osd: send hb front addr to monitor at boot (Sage Weil, 2013-05-22, 1 file, -8/+10)

  We still aren't binding it to anything yet, or putting it in the OSDMap.

  Signed-off-by: Sage Weil <sage@inktank.com>

* osd: create front and back hb messenger instances (Sage Weil, 2013-05-22, 1 file, -12/+18)

  The hb_front messenger is not used yet.

  Signed-off-by: Sage Weil <sage@inktank.com>

* osd/OSDMap: hb_addr -> hb_back_addr (Sage Weil, 2013-05-22, 1 file, -4/+4)

  Signed-off-by: Sage Weil <sage@inktank.com>

* Merge branch 'next' (Sage Weil, 2013-05-23, 1 file, -15/+14)

* osd: skip mark-me-down message if osd is not up (Sage Weil, 2013-05-22, 1 file, -15/+14)

  Fixes crash when the OSD has not successfully booted and gets a SIGINT or SIGTERM.

  Signed-off-by: Sage Weil <sage@inktank.com>

* OSD: kill old split code, it's been dead for a while (Samuel Just, 2013-05-20, 1 file, -158/+0)

  Signed-off-by: Samuel Just <sam.just@inktank.com>
  Reviewed-by: David Zafman <david.zafman@inktank.com>

* osd/OSD.cc: remove unused variable (Danny Al-Gaaf, 2013-05-16, 1 file, -1/+0)

  Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>

* Merge branch 'wip-5049' (David Zafman, 2013-05-14, 1 file, -13/+18)

  Reviewed-by: Sam Just <sam.just@inktank.com>

* OSD: scrub interval checking (David Zafman, 2013-05-14, 1 file, -13/+13)

  Do arithmetic so large intervals don't wrap.
  Fix log messages to reflect the change and improve output.
  Add message when skipping scrub due to load.

  fixes: #5049
  Signed-off-by: David Zafman <david.zafman@inktank.com>

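The arithmetic point is easy to see with plain unsigned timestamps (the real code uses utime_t, so this is only an analogy): adding a very large configured interval to the last-scrub stamp can wrap around, while comparing the elapsed time against the interval cannot.

    // Wrap-avoidance: subtract first, then compare.
    #include <cstdint>

    // Overflow-prone form: last_scrub + max_interval may wrap for a huge interval,
    // making the scrub look overdue (or not due) at the wrong times.
    bool scrub_due_bad(uint32_t now, uint32_t last_scrub, uint32_t max_interval) {
        return last_scrub + max_interval < now;
    }

    // Safe form: the elapsed time cannot overflow as long as now >= last_scrub.
    bool scrub_due(uint32_t now, uint32_t last_scrub, uint32_t max_interval) {
        return now - last_scrub > max_interval;
    }
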
* OSD: Don't scrub newly created PGs until min interval (David Zafman, 2013-05-14, 1 file, -0/+5)

  Set initial values for last_scrub_stamp, last_deep_scrub_stamp.

  fixes: #5050, #5051
  Signed-off-by: David Zafman <david.zafman@inktank.com>

* Merge remote-tracking branch 'gh/next' (Sage Weil, 2013-05-13, 1 file, -7/+4)

* PG,OSD: delay ops for map prior to queueing in the OpWQ (Samuel Just, 2013-05-09, 1 file, -1/+1)

  Previously, we simply queued ops in the OpWQ without checking. The PG would then check
  in do_request whether the message should wait for a new map. Unfortunately, this has
  the side effect that any op requeued for any reason must also requeue the
  waiting_for_map queue.

  Now, we will check before queueing the op whether it must wait on a map. To avoid
  contention, there is now a map_lock which must be held along with the PG lock in order
  to update the osdmap_ref. The map_lock also protects the waiting_for_map list and
  queueing PG ops at the back of the OpWQ.

  A few details:
  1) It is no longer necessary to requeue waiting_for_map in on_change() since the other
     ops are queued at the front.
  2) Once waiting_for_map is non-empty, all ops are delayed to simplify ordering.
  3) waiting_for_map may now be non-empty during split, so we must split waiting_for_map
     along with waiting_for_active. This must be done under the map_lock.

  The bug which uncovered this involved an out-of-order op as follows:
    client.4208.0:2378 (e252) arrives, object is degraded
    client.4208.0:2379 (e253) arrives, waits for map
    client.4208.0:2378 (e252) is requeued after recovery
    client.4208.0:2379 (e253) is requeued on map arrival
    client.4208.0:2379 is processed
    client.4208.0:2378 is processed

  Fixes: #4955
  Signed-off-by: Samuel Just <sam.just@inktank.com>

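A stripped-down sketch of "check for a map before queueing", with invented types (MiniPG, OpRequest) and a plain std::mutex standing in for the map_lock; the real PG/OSD code paths are far richer, but the ordering idea is the same.

    // Ops needing a newer map go on waiting_for_map; everything happens under map_lock.
    #include <deque>
    #include <memory>
    #include <mutex>

    using epoch_t = unsigned;

    struct OpRequest {
        epoch_t map_epoch;   // epoch the client had when sending the op
    };

    class MiniPG {
        std::mutex map_lock;               // protects osdmap_epoch and both queues
        epoch_t osdmap_epoch = 0;
        std::deque<std::shared_ptr<OpRequest>> waiting_for_map;
        std::deque<std::shared_ptr<OpRequest>> op_wq;   // stand-in for the OpWQ

    public:
        void queue_op(std::shared_ptr<OpRequest> op) {
            std::lock_guard<std::mutex> l(map_lock);
            // Once anything is waiting, everything waits, to keep ordering simple.
            if (!waiting_for_map.empty() || op->map_epoch > osdmap_epoch) {
                waiting_for_map.push_back(std::move(op));
                return;
            }
            op_wq.push_back(std::move(op));   // safe to run against the current map
        }

        void handle_new_map(epoch_t new_epoch) {
            std::lock_guard<std::mutex> l(map_lock);
            osdmap_epoch = new_epoch;
            // Requeue waiters the new map now covers, preserving their order;
            // anything behind a still-waiting op keeps waiting.
            while (!waiting_for_map.empty() &&
                   waiting_for_map.front()->map_epoch <= osdmap_epoch) {
                op_wq.push_back(std::move(waiting_for_map.front()));
                waiting_for_map.pop_front();
            }
        }
    };
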
* OSD,PG: lock_with_map_lock_held() is the same as lock() (Samuel Just, 2013-05-08, 1 file, -6/+3)

  Signed-off-by: Samuel Just <sam.just@inktank.com>

* Merge branch 'wip-4273' (David Zafman, 2013-05-10, 1 file, -10/+1)

  Reviewed-by: Sam Just <sam.just@inktank.com>

* osd: prioritize recovery for degraded pgs (David Zafman, 2013-05-09, 1 file, -1/+1)

  Three reservation priorities: RECOVERY, BACKFILL_HIGH, BACKFILL_LOW.

  fixes: #4273
  Signed-off-by: David Zafman <david.zafman@inktank.com>

* Clean up defer_recovery() functions (David Zafman, 2013-05-06, 1 file, -9/+0)

  Signed-off-by: David Zafman <david.zafman@inktank.com>

* Merge branch 'wip_pg_res' (Samuel Just, 2013-05-09, 1 file, -70/+123)

  Reviewed-by: Sage Weil <sage@inktank.com>

* OSD: rename clear_temp to recursive_remove_collection() (Samuel Just, 2013-05-09, 1 file, -5/+5)

  Signed-off-by: Samuel Just <sam.just@inktank.com>

* osd: remove_dir use collection_list_partial (Samuel Just, 2013-05-09, 1 file, -19/+30)

  Signed-off-by: Samuel Just <sam.just@inktank.com>

* OSD: add pg deletion cancelation (Samuel Just, 2013-05-09, 1 file, -8/+48)

  DeletingState now allows _create_lock_pg() to attempt to cancel pg deletion. PG::init()
  must mark the PG as backfill iff we stopped a deletion.

  Signed-off-by: Samuel Just <sam.just@inktank.com>

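A hedged sketch of what a cancellable deletion state can look like: the removal worker re-checks the state before each batch of work, and the creation path may flip it to canceled. This is an illustration of the idea only; the actual DeletingState interface differs.

    // Illustrative cancellable deletion state, not the real DeletingState class.
    #include <mutex>

    class DeletionState {
    public:
        enum class Status { queued, deleting, canceled, done };

    private:
        std::mutex lock;
        Status status = Status::queued;

    public:
        // Called by the pg-creation path: succeeds only if the worker has not yet
        // finished; after this the worker will stop at its next check.
        bool try_cancel() {
            std::lock_guard<std::mutex> l(lock);
            if (status == Status::done)
                return false;          // too late, the collection is gone
            status = Status::canceled;
            return true;
        }

        // Called by the removal worker before each batch of deletes.
        bool start_or_continue_deleting() {
            std::lock_guard<std::mutex> l(lock);
            if (status == Status::canceled)
                return false;
            status = Status::deleting;
            return true;
        }

        void mark_done() {
            std::lock_guard<std::mutex> l(lock);
            if (status != Status::canceled)
                status = Status::done;
        }
    };
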
* OSD: don't rename pg collections, handle PGs in RemoveWQ (Samuel Just, 2013-05-09, 1 file, -33/+38)

  Signed-off-by: Samuel Just <sam.just@inktank.com>

* OSD: removal collections will be removed inline and not queued (Samuel Just, 2013-05-07, 1 file, -16/+3)

  Signed-off-by: Samuel Just <sam.just@inktank.com>

* OSD::clear_temp should clear snap mapper entries as well (Samuel Just, 2013-05-07, 1 file, -0/+10)

  Signed-off-by: Samuel Just <sam.just@inktank.com>

* osd: initialize OSDService::next_notif_id (Sage Weil, 2013-05-09, 1 file, -0/+1)

  CID 1019627 (#1 of 1): Uninitialized scalar field (UNINIT_CTOR)
  2. uninit_member: Non-static class member "next_notif_id" is not initialized in this
  constructor nor in any functions that it calls.

  Signed-off-by: Sage Weil <sage@inktank.com>

* osd: init test_ops_hook (Sage Weil, 2013-05-09, 1 file, -0/+1)

  CID 1019628 (#1 of 1): Uninitialized pointer field (UNINIT_CTOR)
  2. uninit_member: Non-static class member "test_ops_hook" is not initialized in this
  constructor nor in any functions that it calls.

  Signed-off-by: Sage Weil <sage@inktank.com>

* Merge branch 'wip_split_upgrade' into next (Samuel Just, 2013-05-08, 1 file, -17/+23)

  Fixes: #4927

* OSD: handle stray snap collections from upgrade bug (Samuel Just, 2013-05-08, 1 file, -2/+23)

  Previously, we failed to clear snap_collections, which causes split to spawn a bunch of
  snap collections. In load_pgs, we now clear any such snap collections and then clear
  the snap_collections field on the PG itself.

  Related: #4927
  Signed-off-by: Samuel Just <sam.just@inktank.com>
  Reviewed-by: Sage Weil <sage@inktank.com>

* OSD: snap collections can be ignored on split (Samuel Just, 2013-05-08, 1 file, -15/+0)

  Signed-off-by: Samuel Just <sam.just@inktank.com>
  Reviewed-by: Sage Weil <sage@inktank.com>

* osd: make class load errors louder (Sage Weil, 2013-05-06, 1 file, -1/+1)

  Fixes: #4639
  Signed-off-by: Sage Weil <sage@inktank.com>

* OSD: also walk maps individually for start_split in consume_map() (Samuel Just, 2013-05-02, 1 file, -33/+38)

  We need to go map-by-map to get the parents right in consume_map() just as we must in
  load_pgs().

  Fixes: 4884
  Signed-off-by: Samuel Just <sam.just@inktank.com>
  Reviewed-by: Greg Farnum <greg@inktank.com>

* OSD: load_pgs() should fill in start_split honestly (Samuel Just, 2013-05-01, 1 file, -5/+25)

  In load_pgs(), we previously assigned every child created between the loaded pg's
  stored epoch and the current osdmap to have that pg as their parent. This is not
  correct: some of the children may have been split in subsequent epochs from children
  split in earlier epochs. Instead, do each map individually.

  Signed-off-by: Samuel Just <sam.just@inktank.com>

* OSD: cancel_pending_splits needs to cancel all descendants (Samuel Just, 2013-05-01, 1 file, -0/+6)

  expand_pg_num() and load_pgs() may result in a pg with children in pending_splits
  which also have children in pending_splits (etc).

  Signed-off-by: Samuel Just <sam.just@inktank.com>

* OSD: clean up in progress split state on pg removal (Samuel Just, 2013-05-01, 1 file, -17/+83)

  There are two cases:
  1) The parent pg has not yet initiated the split
  2) The parent pg has initiated the split.

  Previously in case 1), _remove_pg left the entry for its children in the
  in_progress_splits map, blocking subsequent peering attempts. In case 1), we need to
  unblock requests on the child pgs for the parent on parent removal. We don't need to
  bother waking requests since any requests received prior to the remove_pg request are
  necessarily obsolete. In case 2), we don't need to do anything: the child will complete
  the split on its own anyway.

  Thus, we now track pending_splits vs in_progress_splits. Children in pending_splits
  are in state 1), in_progress_splits in state 2). split_pgs bumps pgs from
  pending_splits to in_progress_splits atomically with respect to _remove_pg since the
  parent pg lock is held in both places.

  Fixes: #4813
  Signed-off-by: Samuel Just <sam.just@inktank.com>
  Reviewed-by: Greg Farnum <greg@inktank.com>

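The two-map bookkeeping described above can be sketched as follows; identifiers (SplitTracker, pg_id) are invented for the example, and the real OSD keys these structures by pg_t and relies on the PG and OSD locks rather than a private mutex.

    // Pending vs in-progress split tracking with cancellation on parent removal.
    #include <map>
    #include <mutex>
    #include <set>

    using pg_id = unsigned;

    class SplitTracker {
        std::mutex lock;
        std::map<pg_id, pg_id> pending;      // child -> parent, split not yet started
        std::map<pg_id, pg_id> in_progress;  // child -> parent, split underway

    public:
        void note_pending_splits(pg_id parent, const std::set<pg_id>& children) {
            std::lock_guard<std::mutex> l(lock);
            for (pg_id c : children)
                pending[c] = parent;
        }

        // Called by the split path while the parent pg lock is held, so it cannot
        // race with cancel_splits_for_parent for the same parent.
        void begin_children(pg_id parent, const std::set<pg_id>& children) {
            std::lock_guard<std::mutex> l(lock);
            for (pg_id c : children) {
                pending.erase(c);
                in_progress[c] = parent;
            }
        }

        // Case 1) above: the parent is removed before the split started, so drop
        // the pending entries and stop blocking the children.  Case 2) entries in
        // in_progress are left alone; the child finishes the split on its own.
        void cancel_splits_for_parent(pg_id parent) {
            std::lock_guard<std::mutex> l(lock);
            for (auto it = pending.begin(); it != pending.end(); ) {
                if (it->second == parent)
                    it = pending.erase(it);
                else
                    ++it;
            }
        }

        // Requests for a child must wait while either map still references it.
        bool is_blocked(pg_id child) {
            std::lock_guard<std::mutex> l(lock);
            return pending.count(child) || in_progress.count(child);
        }
    };
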