path: root/src/osd/OSD.h
Commit message (Author, Age, Files, Lines)
* osd: wait for healthy pings from peers in waiting-for-healthy state (Sage Weil, 2013-05-29, 1 file, -0/+8)
|
| If we are (wrongly) marked down, we need to go into the waiting-for-healthy state and verify that our network interfaces are working before trying to rejoin the cluster.
|
| - make _is_healthy() check require positive proof of pings working
| - do heartbeat checks and updates in this state
| - reset the random peers every heartbeat_interval, in case we keep picking bad ones
|
| Signed-off-by: Sage Weil <sage@inktank.com>
* osd: distinguish between definitely healthy and definitely not unhealthy (Sage Weil, 2013-05-29, 1 file, -7/+11)
|
| is_unhealthy() will assume they are healthy for some period after we send our first ping attempt. is_healthy() is now a strict check that we know they are healthy.
|
| Switch the failure report check to use is_unhealthy(); use is_healthy() everywhere else, including the waiting-for-healthy pre-boot checks.
|
| Signed-off-by: Sage Weil <sage@inktank.com>
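The split between a strict and a lenient check can be sketched as follows. This is only an illustration of the idea described in the commit message, not the code from OSD.h: the struct name `HeartbeatPeer`, its fields, and the use of plain `double` timestamps with a `grace` window are all assumptions.

```cpp
#include <cassert>

// Hypothetical, simplified heartbeat peer record (not the real OSD.h type).
struct HeartbeatPeer {
  double first_tx = 0;  // time the first ping was sent (0 = never sent)
  double last_rx = 0;   // time the last reply was received (0 = never)

  // Strict check: true only with positive proof of a recent reply.
  bool is_healthy(double now, double grace) const {
    return last_rx != 0 && now - last_rx <= grace;
  }

  // Lenient check: a peer we only just started pinging is given the
  // benefit of the doubt until the grace window has elapsed.
  bool is_unhealthy(double now, double grace) const {
    if (last_rx != 0)
      return now - last_rx > grace;
    return first_tx != 0 && now - first_tx > grace;
  }
};
```

Under these assumptions a freshly pinged peer is neither healthy nor unhealthy, which is why a failure report and a pre-boot check need different predicates.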
* osd: factor out _remove_heartbeat_peer (Sage Weil, 2013-05-29, 1 file, -0/+1)
| | | | Signed-off-by: Sage Weil <sage@inktank.com>
* osd: move health checks into a single helper (Sage Weil, 2013-05-29, 1 file, -0/+1)
| | | | | | For now we still only look at the internal heartbeats. Signed-off-by: Sage Weil <sage@inktank.com>
* Merge remote-tracking branch 'gh/last' (Sage Weil, 2013-05-28, 1 file, -4/+8)
|\
| * Merge branch 'wip_scrub_tphandle' into next (Samuel Just, 2013-05-23, 1 file, -4/+8)
| |\ | | | | | | | | | | | | Fixes: #5159 Reviewed-by: Sage Weil <sage@inktank.com>
| | * OSD,PG: pass tphandle down to _scan_list (Samuel Just, 2013-05-23, 1 file, -4/+8)
| | | | | | | | | | | | Signed-off-by: Samuel Just <sam.just@inktank.com>
* | | osd: ping both front and back interfaces (Sage Weil, 2013-05-22, 1 file, -3/+15)
| | |
| | | Send ping requests to both the front and back hb addrs for peer osds. If the front hb addr is not present, do not send it and interpret a reply as coming from both. This handles the transition from old to new OSDs seamlessly.
| | |
| | | Note both the front and back rx times. Both need to be up to date in order for the peer to be healthy.
| | |
| | | Signed-off-by: Sage Weil <sage@inktank.com>
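The front/back rule above can be modelled roughly like this. The type `PeerLinks`, its members, and the `cutoff` parameter are hypothetical names chosen for the sketch; the real change works with messenger connections and utime_t values.

```cpp
#include <cassert>

// Hypothetical model of one heartbeat peer with two network paths.
struct PeerLinks {
  bool has_front = false;    // old OSDs advertise no front hb addr
  double last_rx_front = 0;  // last reply seen on the front interface
  double last_rx_back = 0;   // last reply seen on the back interface

  // One ping per known address.
  int pings_to_send() const { return has_front ? 2 : 1; }

  void handle_reply(bool via_front, double now) {
    if (!has_front) {
      // No front addr: treat the single reply as coming from both.
      last_rx_front = now;
      last_rx_back = now;
    } else if (via_front) {
      last_rx_front = now;
    } else {
      last_rx_back = now;
    }
  }

  // Both paths must have replied after the cutoff.
  bool healthy(double cutoff) const {
    return last_rx_front > cutoff && last_rx_back > cutoff;
  }
};
```

Counting a reply from an old peer on both paths is what lets mixed-version clusters keep working during an upgrade.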
* | | osd: create front and back hb messenger instances (Sage Weil, 2013-05-22, 1 file, -2/+5)
| | | | | | | | | | | | | | | | | | The hb_front messenger is not used yet. Signed-off-by: Sage Weil <sage@inktank.com>
* | | OSD: kill old split code, it's been dead for a while (Samuel Just, 2013-05-20, 1 file, -2/+0)
|/ / | | | | | | | | Signed-off-by: Samuel Just <sam.just@inktank.com> Reviewed-by: David Zafman <david.zafman@inktank.com>
* | OSD: We need to wait on CLEARING_DIR, not DELETED_DIR (Samuel Just, 2013-05-13, 1 file, -4/+4)
| | | | | | | | | | Signed-off-by: Samuel Just <sam.just@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com>
* | osd/OSD.h: fix try_stop_deletion (Danny Al-Gaaf, 2013-05-11, 1 file, -1/+1)
| |
| | Fix try_stop_deletion(): The comment above the while loop says "If we are in DELETING_DIR or DELETED_DIR", but the while loop checks for DELETING_DIR twice. Change one check to DELETED_DIR, otherwise one state gets missed.
| |
| | Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
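The bug described here is a duplicated comparison. A minimal sketch of the before and after predicates follows; the enum is an assumption (CLEARING_DIR, DELETING_DIR and DELETED_DIR appear in this log, while QUEUED and CANCELED are placeholders), and the helper names are hypothetical.

```cpp
#include <cassert>

// Illustrative status values; the enum layout is assumed, not copied.
enum Status { QUEUED, CLEARING_DIR, DELETING_DIR, DELETED_DIR, CANCELED };

// Before the fix: the second comparison repeats the first, so
// DELETED_DIR is never matched.
inline bool in_delete_phase_buggy(Status s) {
  return s == DELETING_DIR || s == DELETING_DIR;
}

// After the fix: both states named in the comment are covered.
inline bool in_delete_phase_fixed(Status s) {
  return s == DELETING_DIR || s == DELETED_DIR;
}
```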
* | Merge branch 'wip-4273' (David Zafman, 2013-05-10, 1 file, -1/+5)
|\ \ | | | | | | | | | Reviewed-by: Sam Just <sam.just@inktank.com>
| * | osd: prioritize recovery for degraded pgs (David Zafman, 2013-05-09, 1 file, -0/+5)
| | |
| | | Three Reservation priorities from RECOVERY, BACKFILL_HIGH, BACKFILL_LOW
| | |
| | | fixes: #4273
| | |
| | | Signed-off-by: David Zafman <david.zafman@inktank.com>
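One way to picture three reservation priorities is a set of FIFO queues served highest priority first. The sketch below assumes that RECOVERY outranks BACKFILL_HIGH, which outranks BACKFILL_LOW; the numeric values, the `Reserver` class, and its methods are hypothetical and are not the reserver used by the OSD.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <map>
#include <string>

// Assumed ordering: larger value means served first.
enum Priority { BACKFILL_LOW = 0, BACKFILL_HIGH = 1, RECOVERY = 2 };

// Hypothetical reservation queue: FIFO within a priority level.
class Reserver {
  std::map<int, std::deque<std::string>, std::greater<int>> queues;

public:
  void request(const std::string& pg, Priority p) {
    queues[p].push_back(pg);
  }

  // Pops the oldest request from the highest non-empty priority level.
  bool grant_next(std::string* out) {
    for (auto& kv : queues) {
      if (!kv.second.empty()) {
        *out = kv.second.front();
        kv.second.pop_front();
        return true;
      }
    }
    return false;
  }
};
```

With this ordering a degraded PG asking for a recovery slot is granted before any queued backfill, regardless of arrival order.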
| * | Clean up defer_recovery() functions (David Zafman, 2013-05-06, 1 file, -1/+0)
| |/ | | | | | | Signed-off-by: David Zafman <david.zafman@inktank.com>
* | osd/OSD.h: add missing unlock of osd_lock (Danny Al-Gaaf, 2013-05-11, 1 file, -0/+1)
| | | | | | | | | | | | CID 1019560 Missing unlock (CWE-667) Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
* | OSD: rename clear_temp to recursive_remove_collection() (Samuel Just, 2013-05-09, 1 file, -1/+1)
| | | | | | | | Signed-off-by: Samuel Just <sam.just@inktank.com>
* | PG: no need to wait on DeletingStateRef for flush (Samuel Just, 2013-05-09, 1 file, -12/+0)
| | | | | | | | Signed-off-by: Samuel Just <sam.just@inktank.com>
* | OSD: add pg deletion cancelation (Samuel Just, 2013-05-09, 1 file, -1/+77)
| | | | | | | | | | | | | | | | | | DeletingState now allows _create_lock_pg() to attempt to cancel pg deletion. PG::init() must mark the PG as backfill iff we stopped a deletion. Signed-off-by: Samuel Just <sam.just@inktank.com>
* | OSD: don't rename pg collections, handle PGs in RemoveWQ (Samuel Just, 2013-05-09, 1 file, -15/+15)
|/ | | | Signed-off-by: Samuel Just <sam.just@inktank.com>
* OSD: also walk maps individually for start_split in consume_map() (Samuel Just, 2013-05-02, 1 file, -0/+1)
| | | | | | | | | We need to go map-by-map to get the parents right in consume_map() just as we must in load_pgs(). Fixes: 4884 Signed-off-by: Samuel Just <sam.just@inktank.com> Reviewed-by: Greg Farnum <greg@inktank.com>
* OSD: cancel_pending_splits needs to cancel all descendants (Samuel Just, 2013-05-01, 1 file, -0/+1)
| | | | | | | expand_pg_num() and load_pgs() may result in a pg with children in pending_splits which also have children in pending_splits (etc). Signed-off-by: Samuel Just <sam.just@inktank.com>
* OSD: clean up in progress split state on pg removal (Samuel Just, 2013-05-01, 1 file, -4/+12)
|
| There are two cases:
| 1) The parent pg has not yet initiated the split
| 2) The parent pg has initiated the split.
|
| Previously in case 1), _remove_pg left the entry for its children in the in_progress_splits map blocking subsequent peering attempts.
|
| In case 1), we need to unblock requests on the child pgs for the parent on parent removal. We don't need to bother waking requests since any requests received prior to the remove_pg request are necessarily obsolete.
|
| In case 2), we don't need to do anything: the child will complete the split on its own anyway.
|
| Thus, we now track pending_splits vs in_progress_splits. Children in pending_splits are in state 1), in_progress_splits in state 2). split_pgs bumps pgs from pending_splits to in_progress_splits atomically with respect to _remove_pg since the parent pg lock is held in both places.
|
| Fixes: #4813
| Signed-off-by: Samuel Just <sam.just@inktank.com>
| Reviewed-by: Greg Farnum <greg@inktank.com>
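The two-set bookkeeping can be sketched as below. Only the names pending_splits and in_progress_splits come from the commit message; the `SplitTracker` type, its methods, and the use of plain ints as pg ids are assumptions, and locking is omitted.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Hypothetical tracker for child pgs that are waiting on a split.
struct SplitTracker {
  std::set<int> pending_splits;      // case 1: parent has not started yet
  std::set<int> in_progress_splits;  // case 2: parent has started

  void expect_split(int child) { pending_splits.insert(child); }

  // Would run with the parent pg lock held in the real code.
  void start_split(int child) {
    if (pending_splits.erase(child))
      in_progress_splits.insert(child);
  }

  void complete_split(int child) { in_progress_splits.erase(child); }

  // Parent removal: drop case 1 children, leave case 2 children alone
  // because they will finish the split on their own.
  void parent_removed(const std::vector<int>& children) {
    for (int c : children)
      pending_splits.erase(c);
  }

  bool splitting(int child) const {
    return pending_splits.count(child) || in_progress_splits.count(child);
  }
};
```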
* Merge branch 'wip-4201' into next (David Zafman, 2013-04-19, 1 file, -2/+5)
|\ | | | | | | Reviewed-by: Samuel Just <sam.just@inktank.com>
| * osd: Make clear_temp() public for use by remove (David Zafman, 2013-04-19, 1 file, -2/+1)
| | | | | | | | Signed-off-by: David Zafman <david.zafman@inktank.com>
| * osd: Add OSD::make_infos_oid() as common function to create oid (David Zafman, 2013-04-19, 1 file, -0/+4)
| | | | | | | | Signed-off-by: David Zafman <david.zafman@inktank.com>
* | osd/: optionally track every pg ref (Samuel Just, 2013-04-19, 1 file, -22/+57)
|/
| This involves three pieces:
|
| For intrusive_ptr type references, we use TrackedIntPtr instead. This uses get_with_id and put_with_id to associate an id and backtrace with each particular ref instance.
|
| For refs taken via direct calls to get() and put(), get and put now require a tag string. The PG tracks individual ref counts for each tag as well as the total.
|
| Finally, PGs register/unregister themselves on construction/destruction with OSDService. As a result, on shutdown, we can check for live pgs and determine where the references are held.
|
| This behavior is compiled out by default, but can be included with the --enable-pgrefdebugging flag.
|
| Signed-off-by: Samuel Just <sam.just@inktank.com>
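The second piece, per-tag counts kept alongside the total, might look like this in miniature. `TaggedRefs` and its members are hypothetical, and the sketch leaves out thread safety, backtraces, and the build-time switch mentioned above.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical ref counter that remembers who holds each reference.
class TaggedRefs {
  int total = 0;
  std::map<std::string, int> by_tag;  // live refs per tag

public:
  void get(const std::string& tag) {
    ++total;
    ++by_tag[tag];
  }

  void put(const std::string& tag) {
    auto it = by_tag.find(tag);
    assert(it != by_tag.end() && it->second > 0);  // unmatched put
    --total;
    if (--it->second == 0)
      by_tag.erase(it);
  }

  int refs() const { return total; }

  // Tags still holding references, e.g. for a leak report at shutdown.
  const std::map<std::string, int>& live_tags() const { return by_tag; }
};
```

Any tag left in `live_tags()` at shutdown points at the code path that forgot to drop its reference.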
* OSDService: add too_full_for_backfill (Samuel Just, 2013-03-21, 1 file, -0/+3)
| | | | Signed-off-by: Samuel Just <sam.just@inktank.com>
* OSD: notify mon prior to shutdown (Samuel Just, 2013-03-21, 1 file, -0/+18)
| | | | Signed-off-by: Samuel Just <sam.just@inktank.com>
* OSD: check for is_stopping after locking osd_lock or heartbeat_lock (Samuel Just, 2013-03-21, 1 file, -0/+8)
| | | | Signed-off-by: Samuel Just <sam.just@inktank.com>
* OSD: lookup_lock_raw_pg is dead (Samuel Just, 2013-03-21, 1 file, -1/+0)
| | | | Signed-off-by: Samuel Just <sam.just@inktank.com>
* OSD: rename timer to tick_timer (Samuel Just, 2013-03-21, 1 file, -1/+1)
| | | | | | | Only used for scheduling ticks - we should keep it that way. Signed-off-by: Samuel Just <sam.just@inktank.com>
* osd: data loss: low space handling (David Zafman, 2013-03-14, 1 file, -0/+9)
|
| Add check whether to allow writing ops based on failsafe full percentage
| Check for failsafe nearfull warning or full error message every heartbeat
| Use clock to limit messages to every 30 secs (osd_op_complaint_time)
|
| Feature: #4197
| Signed-off-by: David Zafman <david.zafman@inktank.com>
| Reviewed-by: Samuel Just <sam.just@inktank.com>
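The three behaviours listed above (gate writes on a failsafe full ratio, classify nearfull versus full on each heartbeat, and rate-limit the resulting messages) can be sketched together. The `FailsafeMonitor` type, its field names, and the ratios used in any example call are assumptions; only the 30 second interval is taken from the commit message.

```cpp
#include <cassert>

enum class Space { Ok, NearFull, Full };

// Hypothetical failsafe checker, meant to be called once per heartbeat.
struct FailsafeMonitor {
  double nearfull_ratio;
  double full_ratio;
  double complaint_interval;  // minimum seconds between messages
  double last_msg;            // time of last message, negative = never

  FailsafeMonitor(double nearfull, double full, double interval)
      : nearfull_ratio(nearfull), full_ratio(full),
        complaint_interval(interval), last_msg(-1.0) {}

  Space classify(double used_ratio) const {
    if (used_ratio >= full_ratio) return Space::Full;
    if (used_ratio >= nearfull_ratio) return Space::NearFull;
    return Space::Ok;
  }

  // Writes are refused only once the failsafe full ratio is reached.
  bool allow_write(double used_ratio) const {
    return classify(used_ratio) != Space::Full;
  }

  // Returns true when a warning or error should be logged now.
  bool should_complain(double used_ratio, double now) {
    if (classify(used_ratio) == Space::Ok) return false;
    if (last_msg >= 0 && now - last_msg < complaint_interval) return false;
    last_msg = now;
    return true;
  }
};
```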
* osd/: Integrate SnapMapper with OSD (Samuel Just, 2013-03-13, 1 file, -0/+7)
|
| - SnapTrimmer now uses SnapMapper to get the next object to trim
| - Entries for a snap are implicitly removed from SnapMapper when the last object is trimmed, so no need for the adjust_local_snaps logic.
| - Scrub now compares the object_info snaps set on the object attr with the version stored in the SnapMapper.
|
| Signed-off-by: Samuel Just <sam.just@inktank.com>
* OSD: lock not needed in ~DeletingState() (Samuel Just, 2013-03-13, 1 file, -1/+0)
| | | | | | | | | | | No further refs to the object can remain at this point. Furthermore, the callbacks might lock mutexes of their own. Backport: bobtail Fixes: #4378 Signed-off-by: Samuel Just <sam.just@inktank.com> Reviewed-by: Greg Farnum <greg@inktank.com>
* osd/OSD.h: prefer prefix ++operator for iterators (Danny Al-Gaaf, 2013-03-13, 1 file, -1/+1)
| | | | Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
* Watch/Notify: rework watch/notify (Samuel Just, 2013-02-20, 1 file, -13/+7)
| | | | Signed-off-by: Samuel Just <sam.just@inktank.com>
* osd: requeue pg waiters at the front of the finished queue (Sage Weil, 2013-02-19, 1 file, -2/+7)
|
| We could have a sequence like:
|
| - op1
| - notify
| - op2
|
| in the finished queue. Op1 gets put on waiting_for_pg, the notify creates the pg and requeues op1 (at the end), op2 is handled, and finally op1 is handled. That breaks ordering; see #2947.
|
| Instead, when we wake up a pg, queue the waiting messages at the front of the dispatch queue.
|
| Signed-off-by: Sage Weil <sage@inktank.com>
| Reviewed-by: Samuel Just <sam.just@inktank.com>
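The ordering fix amounts to splicing the waiters in ahead of whatever is already queued. A minimal sketch, using strings as stand-ins for ops and a hypothetical helper `requeue_front`:

```cpp
#include <cassert>
#include <deque>
#include <string>

// Hypothetical helper: put the woken-up waiters ahead of everything
// already queued, preserving the waiters' own relative order.
inline void requeue_front(std::deque<std::string>& finished,
                          const std::deque<std::string>& waiters) {
  finished.insert(finished.begin(), waiters.begin(), waiters.end());
}
```

In the scenario from the commit message, op1 is the waiter and op2 is still in the finished queue, so inserting at the front restores the op1, op2 order that appending would have reversed.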
* PG: place biginfo on the infos object next to the info and epoch keys (Samuel Just, 2013-02-13, 1 file, -1/+1)
| | | | | | This is simpler and possibly more efficient. Signed-off-by: Samuel Just <sam.just@inktank.com>
* osd/PG: store pg_info_t in leveldb (omap), purged_snaps separately (David Zafman, 2013-02-12, 1 file, -0/+1)
|
| Separate the purged_snaps portion of pg_info_t (the one that gets big).
|
| Feature #3891: osd: move purged_snaps out of info
|
| Add a separate dirty_big_info flag so that we only update the pginfo "biginfo" file if that state changes. This lets us avoid the cost in the general case, like a regular PG write.
|
| Add LEVELDBINFO feature
| Put info, biginfo in leveldb
| Move epoch to omap
|
| Feature #3892: osd: move pg info into leveldb
|
| Signed-off-by: Sage Weil <sage@inktank.com>
| Signed-off-by: David Zafman <david.zafman@inktank.com>
| Reviewed-by: Sage Weil <sage@inktank.com>
| Reviewed-by: Sam Just <sam.just@inktank.com>
* Merge branch 'wip-mds-encode-rebased' (Greg Farnum, 2013-02-11, 1 file, -1/+0)
|\ | | | | | | Reviewed-by: Sage Weil <sage@inktank.com>
| * osd: remove DecayCounter header (Greg Farnum, 2013-02-05, 1 file, -1/+0)
| | | | | | | | | | | | Neither the OSD nor the PG makes any use of this. Signed-off-by: Greg Farnum <greg@inktank.com>
* | src/osd/OSD.h: use empty() instead of size() (Danny Al-Gaaf, 2013-02-11, 1 file, -2/+2)
|/
| Fix warning for usage of *.size(). Use empty() since it should be preferred as it has, following the standard, a constant time complexity regardless of the container type. The same is not guaranteed for size().
|
| warning from cppchecker was:
| [osd/OSD.h:265]: (performance) Possible inefficient checking for 'last_scrub_pg' emptiness.
| [osd/OSD.h:274]: (performance) Possible inefficient checking for 'last_scrub_pg' emptiness.
|
| Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
* osd: create tool to extract pg info and pg log from filestore (David Zafman, 2013-01-30, 1 file, -2/+2)
|
| New application ceph-filestore-dump created that mounts filestore and can dump info or log in JSON when an OSD is not running.
|
| Feature: #3890
| Signed-off-by: David Zafman <david.zafman@inktank.com>
| Reviewed-by: Samuel Just <sam.just@inktank.com>
* osd: kill unused addr-based send_map() (Sage Weil, 2013-01-25, 1 file, -1/+0)
| | | | | | Not used, old API, bad. Signed-off-by: Sage Weil <sage@inktank.com>
* osd: share incoming maps via Connection*, not addrs (Sage Weil, 2013-01-25, 1 file, -2/+1)
| | | | | | | | Kill a set of parallel methods that are using the old addr/inst-based msgr APIs, and instead use Connection handles. This is much safer and gets us closer to killing the old msgr API. Signed-off-by: Sage Weil <sage@inktank.com>
* OSD: use TPHandle in peering_wq (Samuel Just, 2013-01-24, 1 file, -5/+12)
|
| Implement _process overload with TPHandle argument and use that to ping the hb map between pgs and between map epochs when advancing a pg. The thread will still time out if genuinely stuck at any point.
|
| Fixes: 3905
| Backport: bobtail
| Signed-off-by: Samuel Just <sam.just@inktank.com>
* osd: do not join cluster if not healthy (Sage Weil, 2013-01-22, 1 file, -0/+2)
| | | | | | | If our internal heartbeats are failing, do not send a boot message and try to join the cluster. Signed-off-by: Sage Weil <sage@inktank.com>
* Merge remote-tracking branch 'gh/wip-3833-b' (Sage Weil, 2013-01-22, 1 file, -11/+23)
|\ | | | | | | | | | | | | | | Conflicts: src/osd/OSD.cc src/osd/OSD.h Reviewed-by: Samuel Just <sam.just@inktank.com>
| * osd: dump op priority queue state via admin socket (Sage Weil, 2013-01-22, 1 file, -0/+5)
| | | | | | | | Signed-off-by: Sage Weil <sage@inktank.com>