path: root/src
Commit message [Author, Age, Files, Lines]
* osd: wait for healthy pings from peers in waiting-for-healthy state [Sage Weil, 2013-05-29, 2 files, -23/+80]
|  If we are (wrongly) marked down, we need to go into the waiting-for-healthy state and verify that our network interfaces are working before trying to rejoin the cluster.
|  - make _is_healthy() check require positive proof of pings working
|  - do heartbeat checks and updates in this state
|  - reset the random peers every heartbeat_interval, in case we keep picking bad ones
|  Signed-off-by: Sage Weil <sage@inktank.com>
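A minimal sketch of the "positive proof" rule from the first bullet, under stated assumptions: each heartbeat peer records when it last replied (`last_rx`, 0 meaning never), and the OSD counts as healthy only if it has at least one peer and every peer has replied within a grace window. The `HbPeer` type, `has_positive_proof()` and the all-peers rule are hypothetical illustrations, not the code in this commit.

```cpp
#include <cassert>
#include <map>

// Hypothetical per-peer heartbeat state (seconds; 0 means "never").
struct HbPeer {
  double last_tx = 0;  // when we last sent a ping
  double last_rx = 0;  // when we last received a reply
};

// Positive proof: report healthy only when we have peers and every one of
// them has actually replied within `grace` seconds. An empty peer set, or a
// peer that has never answered, counts as "not proven healthy".
bool has_positive_proof(const std::map<int, HbPeer>& peers, double now,
                        double grace) {
  if (peers.empty())
    return false;
  for (const auto& kv : peers) {
    const HbPeer& p = kv.second;
    if (p.last_rx == 0 || now - p.last_rx > grace)
      return false;
  }
  return true;
}
```

Treating an empty peer set as unproven is the point of positive proof: having no evidence of failure is not the same as having evidence that pings work.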
* osd: distinguish between definitely healthy and definitely not unhealthy [Sage Weil, 2013-05-29, 2 files, -8/+12]
|  is_unhealthy() will assume they are healthy for some period after we send our first ping attempt. is_healthy() is now a strict check that we know they are healthy.
|  Switch the failure report check to use is_unhealthy(); use is_healthy() everywhere else, including the waiting-for-healthy pre-boot checks.
|  Signed-off-by: Sage Weil <sage@inktank.com>
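The two predicates can be pictured as follows. This is a simplified, single-interface sketch with hypothetical field names (`first_tx`, `last_rx`), times in seconds, and `cutoff` meaning "now minus the heartbeat grace"; it is not the actual OSD code.

```cpp
#include <cassert>

// Hypothetical heartbeat bookkeeping for one peer (seconds; 0 means "never").
struct HeartbeatInfo {
  double first_tx = 0;  // when we sent our first ping
  double last_rx = 0;   // when we last got a reply

  // Strict: true only if we have actually seen a reply newer than cutoff.
  bool is_healthy(double cutoff) const {
    return last_rx > cutoff;
  }

  // Lenient: give the peer the benefit of the doubt until the grace period
  // that began with our first ping has expired.
  bool is_unhealthy(double cutoff) const {
    if (last_rx == 0)
      return first_tx != 0 && first_tx < cutoff;  // never replied, grace over
    return last_rx < cutoff;                      // replied, but too long ago
  }
};
```

Because `is_unhealthy()` stays false during the initial grace period, a peer that simply has not answered yet is not reported as failed, while `is_healthy()` still refuses to vouch for it.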
* osd: remove down hb peers [Sage Weil, 2013-05-29, 1 file, -4/+10]
|  If a (say, random) peer goes down, filter it out.
|  Signed-off-by: Sage Weil <sage@inktank.com>
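Filtering out down peers is a plain erase-while-iterating over the peer map. In this sketch the peer map is reduced to "osd id to last reply time", and `down_osds` is a hypothetical stand-in for asking the current OSDMap whether an OSD is down.

```cpp
#include <cassert>
#include <map>
#include <set>

// peers: osd id -> time of last heartbeat reply (seconds).
// down_osds: hypothetical stand-in for "the current map marks this OSD down".
void remove_down_peers(std::map<int, double>& peers,
                       const std::set<int>& down_osds) {
  for (auto it = peers.begin(); it != peers.end();) {
    if (down_osds.count(it->first))
      it = peers.erase(it);  // erase() returns the next valid iterator
    else
      ++it;
  }
}
```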
* osd: only add pg peers if active [Sage Weil, 2013-05-29, 1 file, -17/+19]
|  We will soon be in this method for the waiting-for-healthy state. As a consequence, we need to remove any down peers.
|  Signed-off-by: Sage Weil <sage@inktank.com>
* osd: factor out _remove_heartbeat_peer [Sage Weil, 2013-05-29, 2 files, -12/+19]
|  Signed-off-by: Sage Weil <sage@inktank.com>
* osd: augment osd heartbeat peers with neighbors and randoms, to up some min [Sage Weil, 2013-05-29, 3 files, -16/+84]
|  - always include our neighbors to ensure we have a fully-connected graph
|  - include some random neighbors to get at least some min number of peers.
|  Signed-off-by: Sage Weil <sage@inktank.com>
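One way to read the two bullets, sketched under assumptions: "neighbors" are taken to be the next and previous up OSD in id order (wrapping around), which chains all up OSDs into a ring and so keeps the heartbeat graph connected; random up OSDs are then added until `min_peers` is reached. `pick_hb_peers()` and its parameters are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <random>
#include <set>
#include <vector>

// Hypothetical peer selection for `whoami` from the sorted ids of up OSDs.
std::set<int> pick_hb_peers(int whoami, const std::vector<int>& up_sorted,
                            std::size_t min_peers, std::mt19937& rng) {
  std::set<int> peers;
  auto me = std::find(up_sorted.begin(), up_sorted.end(), whoami);
  if (me == up_sorted.end() || up_sorted.size() < 2)
    return peers;  // not up, or nobody else to ping
  const std::size_t n = up_sorted.size();
  const std::size_t i = static_cast<std::size_t>(me - up_sorted.begin());

  // Always include both ring neighbors so the peer graph stays connected.
  peers.insert(up_sorted[(i + 1) % n]);
  peers.insert(up_sorted[(i + n - 1) % n]);

  // Top up with random other OSDs until we reach the minimum.
  std::vector<int> others;
  for (int osd : up_sorted)
    if (osd != whoami && peers.count(osd) == 0)
      others.push_back(osd);
  std::shuffle(others.begin(), others.end(), rng);
  for (int osd : others) {
    if (peers.size() >= min_peers)
      break;
    peers.insert(osd);
  }
  return peers;
}
```

Picking the random extras from a shuffled candidate list avoids duplicates; re-running the selection with a fresh shuffle is one way to "reset the random peers" when earlier picks turn out to be bad.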
* osd: move health checks into a single helper [Sage Weil, 2013-05-29, 2 files, -3/+14]
|  For now we still only look at the internal heartbeats.
|  Signed-off-by: Sage Weil <sage@inktank.com>
* osd: avoid duplicate mon requests for a new osdmap [Sage Weil, 2013-05-29, 1 file, -2/+2]
|  sub_want() returns true if this is a new sub; only renew then.
|  Signed-off-by: Sage Weil <sage@inktank.com>
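A toy model of the deduplication, assuming `sub_want()` reports whether the subscription table actually changed. `SubTable` is a stand-in written for this example, not Ceph's `MonClient`; only the `sub_want()`/`renew_subs()` names echo the commit.

```cpp
#include <cassert>
#include <map>
#include <string>

// Toy subscription table (not Ceph's MonClient).
class SubTable {
 public:
  // Record interest in `what` starting at epoch `start`. Returns true only
  // if this changes what we are subscribed to.
  bool sub_want(const std::string& what, unsigned start) {
    auto it = subs_.find(what);
    if (it != subs_.end() && it->second == start)
      return false;  // identical request already outstanding
    subs_[what] = start;
    return true;
  }
  void renew_subs() { ++renewals_; }  // would send a subscribe message
  int renewals() const { return renewals_; }

 private:
  std::map<std::string, unsigned> subs_;
  int renewals_ = 0;
};

// Only contact the monitor when the subscription is actually new.
void request_osdmap(SubTable& mc, unsigned epoch) {
  if (mc.sub_want("osdmap", epoch))
    mc.renew_subs();
}
```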
* osd: tell peers that ping us if they are dead [Sage Weil, 2013-05-29, 1 file, -0/+7]
|  Signed-off-by: Sage Weil <sage@inktank.com>
* osd: simplify is_healthy() check during boot [Sage Weil, 2013-05-29, 1 file, -7/+2]
|  This has a slight behavior change in that we ask the mon for the latest osdmap if our internal heartbeat is failing. That isn't useful yet, but will be shortly.
|  Signed-off-by: Sage Weil <sage@inktank.com>
* mon: disable tdump by default [Sage Weil, 2013-05-28, 1 file, -1/+1]
|  Grr.
|  Signed-off-by: Sage Weil <sage@inktank.com>
* Merge remote-tracking branch 'gh/last' [Sage Weil, 2013-05-28, 10 files, -69/+104]
|\
| * HashIndex: sync top directory during start_split,merge,col_split [Samuel Just, 2013-05-28, 1 file, -3/+12]
| |  Otherwise, the links might be ordered after the in progress operation tag write. We need the in progress operation tag to correctly recover from an interrupted merge, split, or col_split.
| |  Fixes: #5180
| |  Backport: cuttlefish, bobtail
| |  Signed-off-by: Samuel Just <sam.just@inktank.com>
| |  Reviewed-by: Sage Weil <sage@inktank.com>
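The ordering concern rests on a POSIX detail: fsync() on a file does not by itself make a new directory entry durable; the directory must be fsync'ed too. A minimal helper for that step is sketched here (Linux/POSIX, errors reduced to a -1 return with errno preserved). Calling it on the top directory after creating the links and before writing the in-progress tag is the assumed use; the HashIndex code itself is not shown.

```cpp
#include <cassert>
#include <cerrno>
#include <fcntl.h>
#include <string>
#include <unistd.h>

// Flush a directory's entries (created/removed links) to stable storage.
// Returns 0 on success, or -1 with errno set on failure.
int sync_dir(const std::string& path) {
  int fd = ::open(path.c_str(), O_RDONLY | O_DIRECTORY);
  if (fd < 0)
    return -1;
  int r = ::fsync(fd);
  int saved_errno = errno;  // keep fsync's errno even if close() fails
  ::close(fd);
  errno = saved_errno;
  return r;
}
```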
| * Merge branch 'wip_scrub_tphandle' into next [Samuel Just, 2013-05-23, 4 files, -33/+66]
| |\
| | |  Fixes: #5159
| | |  Reviewed-by: Sage Weil <sage@inktank.com>
| | * PG: ping tphandle during omap loop as well [Samuel Just, 2013-05-23, 2 files, -0/+8]
| | |  Signed-off-by: Samuel Just <sam.just@inktank.com>
| | * PG: reset timeout in _scan_list for each object, read chunk [Samuel Just, 2013-05-23, 1 file, -0/+2]
| | |  Signed-off-by: Samuel Just <sam.just@inktank.com>
| | * OSD,PG: pass tphandle down to _scan_list [Samuel Just, 2013-05-23, 3 files, -33/+56]
| | |  Signed-off-by: Samuel Just <sam.just@inktank.com>
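The three PG commits apply one pattern: a long-running scan tells the thread-pool watchdog it is still alive after each unit of work. The sketch models the handle with a hypothetical `TPHandle` struct holding an injectable clock; the method name `reset_tp_timeout()` is assumed for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Toy stand-in for a thread-pool heartbeat handle: a watchdog would treat
// the worker as stuck if `last_reset` fell too far behind the clock.
struct TPHandle {
  std::function<double()> clock;
  double last_reset = 0;
  void reset_tp_timeout() { last_reset = clock(); }
};

// Scan objects, resetting the timeout once per object so that a long scan
// is not mistaken for a hung thread. `handle` may be null.
std::size_t scan_list(const std::vector<int>& objects, TPHandle* handle) {
  std::size_t scanned = 0;
  for (int obj : objects) {
    (void)obj;  // real work (stat, read chunks, omap loop) would go here
    if (handle)
      handle->reset_tp_timeout();
    ++scanned;
  }
  return scanned;
}
```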
| * | rgw: iterate usage entries from correct entry [Yehuda Sadeh, 2013-05-23, 1 file, -3/+16]
| | |  Fixes: #5152
| | |  When iterating through usage entries, and when user id was provided, we started at the user's first entry and not from the entry indexed by the request start time. This commit fixes the issue.
| | |  Backport: bobtail
| | |  Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
| | |  Reviewed-by: Greg Farnum <greg@inktank.com>
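The fix comes down to where the range scan starts. The sketch assumes a hypothetical key layout of `<user>_<10-digit epoch>` in a sorted map (not necessarily rgw's actual usage-log format, and assuming user ids contain no `_`), so seeking with `lower_bound()` on (user, start time), rather than on the user's first entry, skips records older than the requested window.

```cpp
#include <cassert>
#include <cstdio>
#include <map>
#include <string>
#include <vector>

// Hypothetical key: "<user>_<10-digit zero-padded epoch>", so one user's
// entries sort by time.
std::string usage_key(const std::string& user, unsigned long epoch) {
  char buf[16];
  std::snprintf(buf, sizeof(buf), "%010lu", epoch);
  return user + "_" + buf;
}

// Return the keys of `user`'s entries with start <= time < end.
std::vector<std::string> read_usage(const std::map<std::string, int>& log,
                                    const std::string& user,
                                    unsigned long start, unsigned long end) {
  std::vector<std::string> out;
  // Seek to (user, start), not to the user's first entry.
  auto it = log.lower_bound(usage_key(user, start));
  const std::string stop = usage_key(user, end);
  for (; it != log.end() && it->first < stop; ++it)
    out.push_back(it->first);
  return out;
}
```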
| * | mon: drop unnecessary conditionals [Sage Weil, 2013-05-23, 1 file, -5/+3]
| | |  Signed-off-by: Sage Weil <sage@inktank.com>
| * | Merge pull request #311 from ceph/wip-5102 [Sage Weil, 2013-05-23, 4 files, -30/+12]
| |\ \
| | | |  Reviewed-by: Sage Weil <sage@inktank.com>
| | * | mon: Paxos: get rid of the 'prepare_bootstrap()' mechanism [Joao Eduardo Luis, 2013-05-22, 4 files, -26/+4]
| | | |  We don't need it after all. If we are in the middle of some proposal, then we guarantee that said proposal is likely to be retried. If we haven't yet proposed, then it's forever more likely that a client will eventually retry the message that triggered this proposal.
| | | |  Basically, this mechanism attempted to fix a non-problem, and was in fact triggering some unforeseen issues that would have required increasing the code complexity for no good reason.
| | | |  Fixes: #5102
| | | |  Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
| | * | mon: Paxos: finish queued proposals instead of clearing the list [Joao Eduardo Luis, 2013-05-22, 1 file, -4/+6]
| | | |  By finishing these Contexts, we make sure the Contexts they enclose (to be called once the proposal goes through) will behave as they were initially planned: for instance, a C_Command() may retry the command if a -EAGAIN is passed to 'finish_contexts', while a C_Trimmed() will simply set 'going_to_trim' to false.
| | | |  This aims at fixing at least one bug in which Paxos will stop trimming if an election is triggered while a trim is queued but not yet finished. This happens because it is the C_Trimmed() context that is responsible for resetting 'going_to_trim' back to false. By clearing all the contexts on the proposal list instead of finishing them, we stay forever unable to trim Paxos again as 'going_to_trim' will stay True till the end of time as we know it.
| | | |  Fixes: #4895
| | | |  Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
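The difference between clearing and finishing the queue can be shown with a toy `Context` type (a `std::function<void(int)>`, standing in for Ceph's Context objects). `finish_contexts()` here is a simplified analogue of the function named in the commit: every queued callback runs with the given result, so a trim callback can reset its flag and a command callback can retry on -EAGAIN.

```cpp
#include <cassert>
#include <cerrno>
#include <functional>
#include <list>

// Toy completion callback: invoked once with a result code.
using Context = std::function<void(int)>;

// Run every queued callback with result `r`, leaving the queue empty.
// Unlike a bare clear(), each callback can undo state it owns.
void finish_contexts(std::list<Context>& ls, int r) {
  std::list<Context> pending;
  pending.swap(ls);  // callbacks may queue new work; don't iterate a live list
  for (auto& c : pending)
    c(r);
}
```

Swapping the list out before running the callbacks is a defensive choice, so that a callback which queues new work does not modify the list being iterated.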
| | * | mon: Paxos: finish_proposal() when we're finished recovering [Joao Eduardo Luis, 2013-05-22, 1 file, -0/+2]
| | | |  Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
* | | | Merge branch 'wip-5172' [Sage Weil, 2013-05-28, 1 file, -12/+20]
|\ \ \ \
| | | | |  Reviewed-by: Samuel Just <sam.just@inktank.com>
| * | | | osd: fix note_down_osd [Sage Weil, 2013-05-28, 1 file, -1/+1]
| | | | |  Fix bug introduced in 27381c0c6259ac89f5f9c592b4bfb585937a1cfc.
| | | | |  Signed-off-by: Sage Weil <sage@inktank.com>
| * | | | osd: fix hb con failure handler [Sage Weil, 2013-05-28, 1 file, -11/+19]
| | | | |  Fix a few bugs introduced by 27381c0c6259ac89f5f9c592b4bfb585937a1cfc:
| | | | |  - check against both front and back cons; either one may have failed.
| | | | |  - close *both* front and back before reopening either. this is overkill, but slightly simpler code.
| | | | |  - fix leak of con when marking down
| | | | |  - handle race against osdmap update and note_down_osd
| | | | |  Fixes: #5172
| | | | |  Signed-off-by: Sage Weil <sage@inktank.com>
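A sketch of the first two bullets, assuming each heartbeat peer owns a front and a back connection (front may be null) and that a reset on either one closes both. `Con`, `HbPeerCons` and `handle_hb_reset()` are hypothetical; reopening the connections and the leak fix are not modelled. Returning -1 when no peer owns the connection is one way to tolerate the osdmap race from the last bullet.

```cpp
#include <cassert>
#include <map>

// Hypothetical connection and per-peer connection pair.
struct Con {
  bool open = true;
};

struct HbPeerCons {
  Con* front = nullptr;  // null for peers without a front hb addr
  Con* back = nullptr;
};

// On a reset of `c`, find the peer owning it on *either* interface and
// close both connections. Returns the peer id, or -1 if no heartbeat peer
// owns `c` (e.g. an osdmap update already removed the peer).
int handle_hb_reset(std::map<int, HbPeerCons>& peers, Con* c) {
  if (!c)
    return -1;
  for (auto& kv : peers) {
    HbPeerCons& p = kv.second;
    if (p.front == c || p.back == c) {
      if (p.front) p.front->open = false;
      if (p.back) p.back->open = false;
      return kv.first;
    }
  }
  return -1;
}
```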
* | | | | Merge pull request #319 from dalgaaf/wip-da-pylint-3 [Sage Weil, 2013-05-28, 1 file, -5/+5]
|\ \ \ \ \
| | | | | |  Fix some smaller Python issues
| * | | | | ceph-disk: remove unnecessary semicolons [Danny Al-Gaaf, 2013-05-24, 1 file, -2/+2]
| | | | | |  Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
| * | | | | ceph-disk: cast output of _check_output() [Danny Al-Gaaf, 2013-05-24, 1 file, -1/+1]
| | | | | |  Cast output of _check_output() to str() to be able to use str.split().
| | | | | |  Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
| * | | | | ceph-disk: fix undefined variable [Danny Al-Gaaf, 2013-05-24, 1 file, -1/+1]
| | | | | |  Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
| * | | | | ceph-disk: add missing spaces around operator [Danny Al-Gaaf, 2013-05-24, 1 file, -2/+2]
| | | | | |  Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
* | | | | | Merge pull request #326 from dalgaaf/wip-da-CID-727978 [Sage Weil, 2013-05-28, 1 file, -0/+2]
|\ \ \ \ \ \
| | | | | | |  kv_flat_btree_async.cc: fix AioCompletion resource leak
| * | | | | | kv_flat_btree_async.cc: fix AioCompletion resource leak [Danny Al-Gaaf, 2013-05-28, 1 file, -0/+2]
| | | | | | |  Call AioCompletion::release() if the completion is no longer needed.
| | | | | | |  CID 727978 (#1-2 of 2): Resource leak (RESOURCE_LEAK) leaked_storage: Variable "obj_aioc" going out of scope leaks the storage it points to.
| | | | | | |  Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
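The kv_flat_btree_async.cc commits in this log each add an explicit `AioCompletion::release()` call. A different technique that makes such leaks harder to reintroduce is an RAII owner whose deleter calls `release()`. The sketch uses a hypothetical `FakeCompletion` stand-in which, like a librados completion, is freed through `release()` rather than `delete`, so it compiles without librados.

```cpp
#include <cassert>
#include <memory>

// Stand-in for a completion object freed via release(), not delete.
// Counts live instances so the example can be checked.
struct FakeCompletion {
  static int live;
  FakeCompletion() { ++live; }
  void release() { --live; delete this; }

 private:
  ~FakeCompletion() = default;  // force callers through release()
};
int FakeCompletion::live = 0;

// Deleter that calls release(), so the owner frees the completion on
// every exit path.
struct ReleaseDeleter {
  template <typename T>
  void operator()(T* c) const {
    if (c)
      c->release();
  }
};

template <typename T>
using completion_ptr = std::unique_ptr<T, ReleaseDeleter>;

// Hypothetical operation with an early-return error path.
bool do_op(bool fail_early) {
  completion_ptr<FakeCompletion> aioc(new FakeCompletion);
  if (fail_early)
    return false;  // release() still runs when aioc goes out of scope
  // ... submit the op, wait for it, check the return value ...
  return true;
}
```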
* | | | | | | Merge pull request #325 from dalgaaf/wip-da-CID-727980 [Sage Weil, 2013-05-28, 1 file, -0/+2]
|\ \ \ \ \ \ \
| | | | | | | |  kv_flat_btree_async.cc: fix AioCompletion resource leak
| * | | | | | | kv_flat_btree_async.cc: fix AioCompletion resource leak [Danny Al-Gaaf, 2013-05-28, 1 file, -0/+2]
| |/ / / / / /
| | | | | | |  Call AioCompletion::release() if the completion is no longer needed.
| | | | | | |  CID 727980 (#1-4 of 4): Resource leak (RESOURCE_LEAK) leaked_storage: Variable "aioc" going out of scope leaks the storage it points to.
| | | | | | |  Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
* | | | | | | Merge pull request #324 from dalgaaf/wip-da-CID-727979 [Sage Weil, 2013-05-28, 1 file, -0/+2]
|\ \ \ \ \ \ \
| |_|_|/ / / /
|/| | | | | |  kv_flat_btree_async.cc: fix AioCompletion resource leak
| * | | | | | kv_flat_btree_async.cc: fix AioCompletion resource leak [Danny Al-Gaaf, 2013-05-28, 1 file, -0/+2]
| |/ / / / /
| | | | | |  Call AioCompletion::release() if the completion is no longer needed.
| | | | | |  CID 727979 (#1-2 of 2): Resource leak (RESOURCE_LEAK) leaked_storage: Variable "a" going out of scope leaks the storage it points to.
| | | | | |  Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
* | | | | | osd/OSDMap: fix Incremental dump [Sage Weil, 2013-05-28, 1 file, -1/+3]
| | | | | |  The front hb addr entry may not be present.
| | | | | |  Signed-off-by: Sage Weil <sage@inktank.com>
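Guarding the optional entry looks roughly like this. `IncFragment` and its `new_hb_back_up`/`new_hb_front_up` maps are hypothetical simplifications of the incremental map, and the output format is invented for the example.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>

// Hypothetical fragment of an incremental map: back hb addrs are always
// recorded; front hb addrs only by OSDs that advertise one.
struct IncFragment {
  std::map<int, std::string> new_hb_back_up;
  std::map<int, std::string> new_hb_front_up;
};

// Dump one OSD's addresses, emitting each entry only if it is present
// instead of assuming the front entry exists.
std::string dump_up_osd(const IncFragment& inc, int osd) {
  std::ostringstream os;
  os << "osd." << osd;
  auto b = inc.new_hb_back_up.find(osd);
  if (b != inc.new_hb_back_up.end())
    os << " heartbeat_back_addr=" << b->second;
  auto f = inc.new_hb_front_up.find(osd);
  if (f != inc.new_hb_front_up.end())
    os << " heartbeat_front_addr=" << f->second;
  return os.str();
}
```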
* | | | | | Merge pull request #322 from guilhem/patch-1 [Sage Weil, 2013-05-28, 1 file, -0/+5]
|\ \ \ \ \ \
| |/ / / / /
|/| | | | |  Reviewed-by: Sage Weil <sage@inktank.com>
| * | | | | Remove mon socket in post-stop [Guilhem Lettron, 2013-05-28, 1 file, -0/+5]
| | | | | |  If ceph-mon segfaults, the socket file isn't removed. By adding a remove in post-stop, upstart cleans the run directory properly.
| | | | | |  Signed-off-by: Guilhem Lettron <guilhem@lettron.fr>
* | | | | | osdmaptool: fix cli tests [Sage Weil, 2013-05-27, 2 files, -9/+9]
| | | | | |  Now that the default pool flags have changed.
| | | | | |  Signed-off-by: Sage Weil <sage@inktank.com>
* | | | | | Merge pull request #321 from dalgaaf/wip-da-CID-727981 [Sage Weil, 2013-05-27, 1 file, -0/+1]
|\ \ \ \ \ \
| | | | | | |  kv_flat_btree_async.cc: fix AioCompletion resource leak
| * | | | | | kv_flat_btree_async.cc: fix AioCompletion resource leak [Danny Al-Gaaf, 2013-05-24, 1 file, -0/+1]
| | |/ / / /
| |/| | | |
| | | | | |  Call AioCompletion::release() if the completion is no longer needed to free the resources.
| | | | | |  CID 727981 (#3 of 3): Resource leak (RESOURCE_LEAK) leaked_storage: Variable "top_aioc" going out of scope leaks the storage it points to.
| | | | | |  Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
* | | | | | Merge pull request #320 from dalgaaf/wip-da-CID-727983 [Sage Weil, 2013-05-27, 1 file, -0/+1]
|\ \ \ \ \ \
| |_|/ / / /
|/| | | | |  kv_flat_btree_async.cc: fix resource leak
| * | | | | kv_flat_btree_async.cc: fix resource leak [Danny Al-Gaaf, 2013-05-24, 1 file, -0/+1]
| |/ / / /
| | | | |  Call AioCompletion::release() if the completion is no longer needed to free the resources.
| | | | |  CID 727983: Resource leak (RESOURCE_LEAK) leaked_storage: Variable "aioc" going out of scope leaks the storage it points to.
| | | | |  Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
* | | | | pg_pool_t: enable FLAG_HASHPSPOOL by default [Samuel Just, 2013-05-24, 3 files, -0/+8]
|/ / / /
| | | |  Fixes: #5160
| | | |  Signed-off-by: Samuel Just <sam.just@inktank.com>
| | | |  Reviewed-by: Greg Farnum <greg@inktank.com>
| | | |  Reviewed-by: Sage Weil <sage@inktank.com>
* | | | Merge pull request #312 from ceph/wip-osd-hb [Sage Weil, 2013-05-23, 15 files, -168/+336]
|\ \ \ \
| | | | |  Reviewed-by: Samuel Just <sam.just@inktank.com>
| * | | | osd: ping both front and back interfaces [Sage Weil, 2013-05-22, 3 files, -67/+142]
| | | | |  Send ping requests to both the front and back hb addrs for peer osds. If the front hb addr is not present, do not send it and interpret a reply as coming from both. This handles the transition from old to new OSDs seamlessly.
| | | | |  Note both the front and back rx times. Both need to be up to date in order for the peer to be healthy.
| | | | |  Signed-off-by: Sage Weil <sage@inktank.com>
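The front/back bookkeeping described above can be sketched with two reply timestamps per peer. `PeerRx`, `note_reply()` and the field names are hypothetical; `cutoff` is assumed to be "now minus the heartbeat grace".

```cpp
#include <cassert>

// Hypothetical per-peer reply times (seconds; 0 means "never").
struct PeerRx {
  bool has_front = false;  // peer advertised a front hb addr
  double last_rx_front = 0;
  double last_rx_back = 0;
};

// Record a ping reply. A peer without a front hb addr is only pinged on
// the back interface, so its reply counts for both.
void note_reply(PeerRx& p, bool on_front, double now) {
  if (!p.has_front) {
    p.last_rx_front = p.last_rx_back = now;
  } else if (on_front) {
    p.last_rx_front = now;
  } else {
    p.last_rx_back = now;
  }
}

// Healthy only if *both* interfaces have answered since `cutoff`.
bool is_healthy(const PeerRx& p, double cutoff) {
  return p.last_rx_front > cutoff && p.last_rx_back > cutoff;
}
```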
| * | | | msgr: add Messenger reference to Connection [Sage Weil, 2013-05-22, 4 files, -4/+7]
| | | | |  This allows us to get the messenger associated with a connection.
| | | | |  Signed-off-by: Sage Weil <sage@inktank.com>
| * | | | msgr: take an arbitrary set of ports to avoid binding to [Sage Weil, 2013-05-22, 6 files, -20/+24]
| | | | |  We used to only need to avoid 2 ports; now we need 3. Make it a set so we don't have this problem later.
| | | | |  Signed-off-by: Sage Weil <sage@inktank.com>
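A sketch of the interface change, assuming the caller scans a port range and skips excluded ports. `pick_port()` is hypothetical; a real messenger would also attempt bind() and move on if a port is already in use.

```cpp
#include <cassert>
#include <optional>
#include <set>

// Return the first port in [lo, hi] that is not in `avoid`, or nullopt if
// every port in the range is excluded. A set lets callers exclude any
// number of ports instead of a fixed two.
std::optional<int> pick_port(int lo, int hi, const std::set<int>& avoid) {
  for (int port = lo; port <= hi; ++port) {
    if (!avoid.count(port))
      return port;
  }
  return std::nullopt;
}
```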