Commit message | Author | Age | Files | Lines
* doc: update build prerequisiteswip-doc-prereqGary Lowell2013-05-311-2/+29
| | | | Signed-off-by: Gary Lowell <gary.lowell@inktank.com>
* PG: don't write out pg map epoch every handle_activate_mapSamuel Just2013-05-313-2/+18
| We don't actually need to write out the pg map epoch on every
| activate_map as long as:
| a) the osd does not trim past the oldest pg map persisted
| b) the pg does update the persisted map epoch from time to time.
|
| To that end, we now keep a reference to the last map persisted. The OSD
| already does not trim past the oldest live OSDMapRef. Second,
| handle_activate_map will trim if the difference between the current map
| and the last_persisted_map is large enough.
|
| Fixes: #4731
| Signed-off-by: Samuel Just <sam.just@inktank.com>
| Reviewed-by: Greg Farnum <greg@inktank.com>
| (cherry picked from commit 2c5a9f0e178843e7ed514708bab137def840ab89)
|
| Conflicts:
|   src/common/config_opts.h
|   src/osd/PG.cc
|     - last_persisted_osdmap_ref gets set in the non-static PG::write_info
|
| Conflicts:
|   src/osd/PG.cc
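The rule in the message above (skip the write unless the persisted epoch has fallen far enough behind) can be sketched as follows. This is a minimal illustration, not the code from the commit: `PersistTracker`, `max_lag`, and the plain integer epochs are hypothetical stand-ins, and it assumes that "trim" in the message refers to refreshing the persisted epoch.

```cpp
#include <cassert>
#include <cstdint>

using epoch_t = std::uint32_t;

// Hypothetical stand-in for the bookkeeping a PG would keep; not Ceph's PG class.
struct PersistTracker {
  epoch_t last_persisted_epoch = 0;  // epoch of the last map written out
  epoch_t max_lag;                   // assumed tunable: allowed distance behind

  explicit PersistTracker(epoch_t lag) : max_lag(lag) {}

  // True when the gap between the current map and the persisted one is too large.
  bool should_persist(epoch_t current_epoch) const {
    return current_epoch > last_persisted_epoch &&
           current_epoch - last_persisted_epoch > max_lag;
  }

  // Called on every activate_map; records a write only when the gap is large.
  // Returns true if the epoch was (notionally) written out.
  bool handle_activate_map(epoch_t current_epoch) {
    if (!should_persist(current_epoch))
      return false;
    last_persisted_epoch = current_epoch;
    return true;
  }
};
```

With `max_lag` set to 10, a PG whose last persisted epoch is 0 would skip the write for epochs 1 through 10 and write again at epoch 11.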
* Merge branch 'wip-5046'Samuel Just2013-05-3012-1225/+1472
|\ | | | | | | Reviewed-by: Samuel Just <sam.just@inktank.com>
| * move log, ondisklog, missing from PG to PGLogLoic Dachary2013-05-3012-1225/+1472
| | PG::log, PG::ondisklog, PG::missing are moved from PG to a new PGLog
| | class and are made protected data members.
| |
| | It is a preliminary step before writing unit tests to cover the methods
| | that have side effects on these data members and define a clean PGLog
| | API. It improves encapsulation and does not change any of the logic
| | already in place.
| |
| | Possible issues:
| |
| | * an additional reference (PG->PGLog->IndexedLog instead of
| |   PG->IndexedLog for instance) is introduced: is it optimized?
| | * rewriting log.log into pg_log.get_log().log affects the readability
| |   but should be optimized and have no impact on performance
| |
| | The guidelines followed for this patch are:
| |
| | * const access to the data members is preserved, no attempt is made to
| |   define accessors
| | * all non-const methods are in PGLog, no access to non-const methods of
| |   PGLog::log, PGLog::ondisklog and PGLog::missing is provided
| | * when methods are moved from PG to PGLog the change to their
| |   implementation is restricted to the minimum.
| | * the PG::OndiskLog and PG::IndexedLog sub classes are moved to PGLog
| |   sub classes unmodified and remain public
| |
| | A const version of the pg_log_t::find_entry method was added.
| | A const accessor is provided for PGLog::get_log, PGLog::get_missing,
| | PGLog::get_ondisklog but no non-const accessor.
| |
| | Arguments are added to most of the methods moved from PG to PGLog so
| | that they can get access to PG data members such as info or log_oid.
| |
| | The PGLog methods are sorted according to the data member they modify.
| |
| | //////////////////// missing ////////////////////
| |
| | * The pg_missing_t::{got,have,need,add,rm} methods are wrapped as
| |   PGLog::missing_{got,have,need,add,rm}
| |
| | //////////////////// log ////////////////////
| |
| | * PGLog::get_tail, PGLog::get_head getters are created
| | * PGLog::set_tail, PGLog::set_head, PGLog::set_last_requested setters
| |   are created
| | * PGLog::index, PGLog::unindex, PGLog::add wrappers,
| |   PGLog::reset_recovery_pointers are created
| | * PGLog::clear_info_log replaces PG::clear_info_log
| | * PGLog::trim replaces PG::trim
| |
| | //////////////////// log & missing ////////////////////
| |
| | * PGLog::claim_log is created with code extracted from
| |   PG::RecoveryState::Stray::react.
| | * PGLog::split_into is created with code extracted from
| |   PG::split_into.
| | * PGLog::recover_got is created with code extracted from
| |   ReplicatedPG::recover_got.
| | * PGLog::activate_not_complete is created with code extracted
| |   from PG::active
| | * PGLog::proc_replica_log is created with code extracted from
| |   PG::proc_replica_log
| | * PGLog::write_log is created with code extracted from
| |   PG::write_log
| | * PGLog::merge_old_entry replaces PG::merge_old_entry
| |   The remove_snap argument is used to collect hobject_t
| | * PGLog::rewind_divergent_log replaces PG::rewind_divergent_log
| |   The remove_snap argument is used to collect hobject_t
| |   A new PG::rewind_divergent_log method is added to call
| |   remove_snap_mapped_object on each of the remove_snap
| |   elements
| | * PGLog::merge_log replaces PG::merge_log
| |   The remove_snap argument is used to collect hobject_t
| |   A new PG::merge_log method is added to call
| |   remove_snap_mapped_object on each of the remove_snap
| |   elements
| | * PGLog::write_log is created with code extracted from PG::write_log.
| |   A non-static version is created for convenience but is a simple
| |   wrapper.
| | * PGLog::read_log replaces PG::read_log. A non-static version is
| |   created for convenience but is a simple wrapper.
| | * PGLog::read_log_old replaces PG::read_log_old.
| |
| | http://tracker.ceph.com/issues/5046 refs #5046
| |
| | Signed-off-by: Loic Dachary <loic@dachary.org>
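The remove_snap convention described for merge_log and rewind_divergent_log (the log class only collects object ids, and the PG wrapper then acts on each of them) might look roughly like the sketch below. All names here are hypothetical simplifications: `ObjectId` stands in for hobject_t, the container type is an assumption, and the "merge" body is a placeholder rather than the real log-merging logic.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for hobject_t.
using ObjectId = std::string;

// Plays the role of PGLog: it has no access to PG state, so it reports the
// objects that need snap-mapper cleanup through an output argument.
class LogSketch {
 public:
  void merge_log(const std::vector<ObjectId>& divergent,
                 std::set<ObjectId>& remove_snap) {
    // Placeholder logic: treat every divergent entry as needing cleanup.
    for (const auto& oid : divergent)
      remove_snap.insert(oid);
  }
};

// Plays the role of PG: a thin wrapper that delegates to the log object and
// then applies the PG-side effect to each collected object.
class PGSketch {
 public:
  std::vector<ObjectId> removed;  // records the cleanup calls, for inspection

  void merge_log(const std::vector<ObjectId>& divergent) {
    std::set<ObjectId> remove_snap;
    pg_log.merge_log(divergent, remove_snap);
    for (const auto& oid : remove_snap)
      remove_snap_mapped_object(oid);
  }

 private:
  LogSketch pg_log;

  void remove_snap_mapped_object(const ObjectId& oid) {
    removed.push_back(oid);
  }
};
```

Keeping the side effect in the wrapper is what lets the log class be exercised in unit tests without a full PG, which matches the motivation stated in the commit message.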
* | doc: Updated to reflect glossary usage.John Wilkins2013-05-301-12/+12
| | | | | | | | Signed-off-by: John Wilkins <john.wilkins@inktank.com>
* | doc: Updated title and syntax to reflect glossary usage.John Wilkins2013-05-301-23/+27
| | | | | | | | Signed-off-by: John Wilkins <john.wilkins@inktank.com>
* | doc: Updated to reflect glossary usage.John Wilkins2013-05-301-15/+16
| | | | | | | | Signed-off-by: John Wilkins <john.wilkins@inktank.com>
* | doc: Updated title to reflect glossary usage.John Wilkins2013-05-301-3/+3
| | | | | | | | Signed-off-by: John Wilkins <john.wilkins@inktank.com>
* | doc: Updated conf with ServerAlias for S3 subdomains.John Wilkins2013-05-301-0/+2
| | | | | | | | Signed-off-by: John Wilkins <john.wilkins@inktank.com>
* | doc: Updated object storage quick start for S3-style subdomains.John Wilkins2013-05-301-5/+52
| | | | | | | | Signed-off-by: John Wilkins <john.wilkins@inktank.com>
* | doc: Updated text with new glossary terms.John Wilkins2013-05-301-22/+24
| | | | | | | | Signed-off-by: John Wilkins <john.wilkins@inktank.com>
* | doc: Removed FAQ from the index.John Wilkins2013-05-301-1/+0
| | | | | | | | Signed-off-by: John Wilkins <john.wilkins@inktank.com>
* | doc: Removed FAQ doc. It's now in the wiki.John Wilkins2013-05-301-11/+0
| | | | | | | | Signed-off-by: John Wilkins <john.wilkins@inktank.com>
* | rbd/kernel.sh: quit looking for snapshot sysfs entriesAlex Elder2013-05-301-19/+16
|/  The sysfs entries for snapshots went away a while ago, and this script
|   used them to verify sizes matched what was expected. Instead, look at
|   the mapped size of the snapshot in the places that used to look for
|   the image's snapshot sysfs files.
|
|   Also, switch over to using "udevadm settle" rather than a delay to
|   wait for udev to do its thing. Insert them at more appropriate
|   places--right after "rbd map" commands and before and after the
|   "rbd unmap" calls.
|
|   Stop doing the manual refresh calls as well. The osd will trigger
|   refreshes whenever the image size or snapshot context changes.
|
|   Finally, the cleanup routine is called initially, when there really
|   isn't expected to be anything to clean up. Change the rbd commands to
|   run there conditionally, only if the target of the command already
|   exists.
|
|   Signed-off-by: Alex Elder <elder@inktank.com>
* os/WBThrottle: remove asserts in clear()Samuel Just2013-05-301-2/+0
| | | | | | | | cur_ios, etc may not be zero due to an in progress flush. Signed-off-by: Samuel Just <sam.just@inktank.com> Reviewed-by: Greg Farnum <greg@inktank.com>
* doc: note openstack changes for GrizzlyJosh Durgin2013-05-301-3/+10
| | | | | | These are just for the cinder configuration, nothing else changed. Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
* Added -r option to usageChristophe Courtaut2013-05-301-0/+1
| | | | | | | Added the -r option, which starts the radosgw and apache2 to access it to the usage message. Signed-off-by: Christophe Courtaut <christophe.courtaut@gmail.com>
* rbd/concurrent.sh: probe rbd module at startAlex Elder2013-05-301-0/+2
| | | | | | | There's no guarantee the rbd module is loaded when this script is run, so add a line that loads it if necessary. Signed-off-by: Alex Elder <elder@inktank.com>
* Merge pull request #331 from ceph/wip-osd-interfacecheckSage Weil2013-05-294-73/+230
|\ | | | | Reviewed-by: Samuel Just <sam.just@inktank.com>
| * osd: wait for healthy pings from peers in waiting-for-healthy stateSage Weil2013-05-292-23/+80
| | | | | | | | | | | | | | | | | | | | | | | | | | If we are (wrongly) marked down, we need to go into the waiting-for-healthy state and verify that our network interfaces are working before trying to rejoin the cluster. - make _is_healthy() check require positive proof of pings working - do heartbeat checks and updates in this state - reset the random peers every heartbeat_interval, in case we keep picking bad ones Signed-off-by: Sage Weil <sage@inktank.com>
| * osd: distinguish between definitely healthy and definitely not unhealthySage Weil2013-05-292-8/+12
| | | | | | | | | | | | | | | | | | | | | | is_unhealthy() will assume they are healthy for some period after we send our first ping attempt. is_healthy() is now a strict check that we know they are healthy. Switch the failure report check to use is_unhealthy(); use is_healthy() everywhere else, including the waiting-for-healthy pre-boot checks. Signed-off-by: Sage Weil <sage@inktank.com>
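The difference between the two predicates described above can be illustrated with a small sketch. It is an assumption-laden simplification, not the OSD's heartbeat code: `HeartbeatPeer`, its fields, and the use of plain `double` timestamps (with 0 meaning "never") are all hypothetical.

```cpp
#include <cassert>

// Hypothetical per-peer heartbeat record; 0 means "never happened".
struct HeartbeatPeer {
  double first_tx = 0;  // when we sent our first ping to this peer
  double last_rx = 0;   // when we last received a reply from this peer

  // Strict check: true only with positive proof, i.e. a reply within `grace`.
  bool is_healthy(double now, double grace) const {
    return last_rx > 0 && now - last_rx <= grace;
  }

  // Lenient check: the peer gets the benefit of the doubt for `grace`
  // seconds after the first ping, and is only reported once that expires
  // without a sufficiently recent reply.
  bool is_unhealthy(double now, double grace) const {
    if (first_tx == 0)
      return false;  // never pinged, so there is no evidence either way
    const double last_good = last_rx > 0 ? last_rx : first_tx;
    return now - last_good > grace;
  }
};
```

Right after the first ping a peer is neither healthy nor unhealthy under these definitions, which is why failure reports would use the lenient check while the pre-boot checks would use the strict one.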
| * osd: remove down hb peersSage Weil2013-05-291-4/+10
| | | | | | | | | | | | If a (say, random) peer goes down, filter it out. Signed-off-by: Sage Weil <sage@inktank.com>
| * osd: only add pg peers if activeSage Weil2013-05-291-17/+19
| | | | | | | | | | | | | | We will soon be in this method for the waiting-for-healthy state. As a consequence, we need to remove any down peers. Signed-off-by: Sage Weil <sage@inktank.com>
| * osd: factor out _remove_heartbeat_peerSage Weil2013-05-292-12/+19
| | | | | | | | Signed-off-by: Sage Weil <sage@inktank.com>
| * osd: augment osd heartbeat peers with neighbors and randoms, to up some minSage Weil2013-05-293-16/+84
| | | | | | | | | | | | | | | | - always include our neighbors to ensure we have a fully-connected graph - include some random neighbors to get at least some min number of peers. Signed-off-by: Sage Weil <sage@inktank.com>
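One way to read the two bullet points above is sketched below: start from the existing peers, always add the nearest up OSD on each side in id order, then top up with random up OSDs until a minimum is reached. The function name, the sorted `up_osds` vector, and the `min_peers` parameter are hypothetical; the real selection logic in the OSD may differ.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <random>
#include <set>
#include <vector>

// Sketch only. `up_osds` is assumed sorted ascending and to contain `whoami`.
std::set<int> build_heartbeat_peers(const std::set<int>& pg_peers,
                                    const std::vector<int>& up_osds,
                                    int whoami,
                                    std::size_t min_peers,
                                    std::mt19937& rng) {
  std::set<int> peers(pg_peers);
  peers.erase(whoami);  // never heartbeat ourselves

  // Always include the next and previous up OSD (wrapping around), so every
  // OSD is linked to its neighbors in id order.
  auto it = std::find(up_osds.begin(), up_osds.end(), whoami);
  if (it != up_osds.end() && up_osds.size() > 1) {
    const std::size_t n = up_osds.size();
    const std::size_t i = static_cast<std::size_t>(it - up_osds.begin());
    peers.insert(up_osds[(i + 1) % n]);
    peers.insert(up_osds[(i + n - 1) % n]);
  }

  // Top up with randomly chosen up OSDs until we reach the minimum, or run
  // out of candidates.
  std::vector<int> candidates;
  for (int osd : up_osds)
    if (osd != whoami && peers.count(osd) == 0)
      candidates.push_back(osd);
  std::shuffle(candidates.begin(), candidates.end(), rng);
  for (int osd : candidates) {
    if (peers.size() >= min_peers)
      break;
    peers.insert(osd);
  }
  return peers;
}
```

In a cluster with fewer up OSDs than `min_peers`, the result simply contains every other up OSD.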
| * osd: move health checks into a single helperSage Weil2013-05-292-3/+14
| | | | | | | | | | | | For now we still only look at the internal heartbeats. Signed-off-by: Sage Weil <sage@inktank.com>
| * osd: avoid duplicate mon requests for a new osdmapSage Weil2013-05-291-2/+2
| | | | | | | | | | | | sub_want() returns true if this is a new sub; only renew then. Signed-off-by: Sage Weil <sage@inktank.com>
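The pattern in this message (only renew when the subscription call reports a change) can be sketched as follows. `SubscriptionSketch` is a hypothetical toy, not the monitor client; only the idea that sub_want() returns true for a new subscription comes from the commit message.

```cpp
#include <cassert>
#include <map>
#include <string>

// Toy subscription table; names and signatures are illustrative assumptions.
class SubscriptionSketch {
 public:
  // Returns true only if this call changes what we are subscribed to.
  bool sub_want(const std::string& what, unsigned start) {
    auto it = wanted_.find(what);
    if (it != wanted_.end() && it->second == start)
      return false;  // already asked for exactly this
    wanted_[what] = start;
    return true;
  }

  // Stands in for sending a renewal request to the monitor.
  void renew_subs() { ++renewals_; }

  int renewals() const { return renewals_; }

 private:
  std::map<std::string, unsigned> wanted_;
  int renewals_ = 0;
};

// Ask for a new osdmap, but only contact the monitor if the request is new.
void request_osdmap(SubscriptionSketch& monc, unsigned epoch) {
  if (monc.sub_want("osdmap", epoch))
    monc.renew_subs();
}
```

Calling `request_osdmap` twice with the same epoch results in a single renewal, which is the duplicate request the commit avoids.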
| * osd: tell peers that ping us if they are deadSage Weil2013-05-291-0/+7
| | | | | | | | Signed-off-by: Sage Weil <sage@inktank.com>
| * osd: simplify is_healthy() check during bootSage Weil2013-05-291-7/+2
| | | | | | | | | | | | | | | | This has a slight behavior change in that we ask the mon for the latest osdmap if our internal heartbeat is failing. That isn't useful yet, but will be shortly. Signed-off-by: Sage Weil <sage@inktank.com>
* | Merge branch 'next'Sage Weil2013-05-292-1/+9
|\ \
| * | osd: initialize new_state field when we use itSage Weil2013-05-291-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If we use operator[] on a new int field its value is undefined; avoid reading it or using |= et al until we initialize it. Fixes: #4967 Backport: cuttlefish, bobtail Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: David Zafman <david.zafman@inktank.com>
| * | mds: stay in SCAN state in file_evalSage Weil2013-05-291-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | If we are in the SCAN state, stay there until the recovery finishes. Do not jump to another state from file_eval(). Signed-off-by: Sage Weil <sage@inktank.com> (cherry picked from commit 0071b8e75bd3f5a09cc46e2225a018f6d1ef0680)
| * | osd: do not assume head obc object exists when getting snapdirSage Weil2013-05-291-0/+5
| | | For a list-snaps operation on the snapdir, do not assume that the obc
| | | for the head means the object exists. This fixes a race between a head
| | | deletion and a list-snaps that wrongly returns ENOENT, triggered by the
| | | DiffIterateStress test when thrashing OSDs.
| | |
| | | Fixes: #5183
| | | Backport: cuttlefish
| | | Signed-off-by: Sage Weil <sage@inktank.com>
| | | Reviewed-by: Samuel Just <sam.just@inktank.com>
* | | Merge branch 'wip_osd_throttle'Samuel Just2013-05-2912-280/+720
|\ \ \ | | | | | | | | | | | | | | | | Fixes: #4782 Reviewed-by: Sage Weil
| * | | WBThrottle: add some comments and some assertsSamuel Just2013-05-292-0/+6
| | | | | | | | | | | | | | | | Signed-off-by: Samuel Just <sam.just@inktank.com>
| * | | WBThrottle: rename replica nocacheSamuel Just2013-05-292-9/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | We may want to influence the caching behavior for other reasons. Signed-off-by: Samuel Just <sam.just@inktank.com>
| * | | doc/dev/osd_internals: add wbthrottle.rstSamuel Just2013-05-281-0/+28
| | | | | | | | | | | | | | | | Signed-off-by: Samuel Just <sam.just@inktank.com>
| * | | WBThrottle: add perfcountersSamuel Just2013-05-282-0/+40
| | | | | | | | | | | | | | | | Signed-off-by: Samuel Just <sam.just@inktank.com>
| * | | ReplicatedPG::submit_push_complete don't remove the head objectSamuel Just2013-05-221-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The object would have had to have been removed already. With fd caching, this extra remove might check the wrong replay_guard since the fd caching mechanism assumes that between any operation on an hobject_t oid and a remove operation, all operations on that hobject_t must refer to the same inode. Signed-off-by: Samuel Just <sam.just@inktank.com>
| * | | FileStore: integrate WBThrottleSamuel Just2013-05-213-158/+9
| | | | | | | | | | | | | | | | Signed-off-by: Samuel Just <sam.just@inktank.com>
| * | | os/: Add WBThrottleSamuel Just2013-05-215-1/+384
| | | | | | | | | | | | | | | | Signed-off-by: Samuel Just <sam.just@inktank.com>
| * | | FileStore: add fd cacheSamuel Just2013-05-215-127/+228
| | | | | | | | | | | | | | | | Signed-off-by: Samuel Just <sam.just@inktank.com>
| * | | common/shared_cache.hpp: fix set_size()Samuel Just2013-05-211-1/+1
| | | | | | | | | | | | | | | | Signed-off-by: Samuel Just <sam.just@inktank.com>
| * | | doc/dev/osd_internals: add some info about throttlesSamuel Just2013-05-211-0/+21
| | | | | | | | | | | | | | | | Signed-off-by: Samuel Just <sam.just@inktank.com>
| * | | common/shared_cache.hpp: add clear()Samuel Just2013-05-211-0/+11
| | | | | | | | | | | | | | | | | | | | | | | | Clear clears a key/value from the cache. Signed-off-by: Samuel Just <sam.just@inktank.com>
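A per-key clear() on an LRU-style cache could look like the sketch below. This is a generic, simplified LRU written for illustration; it does not reproduce the structure of common/shared_cache.hpp, and `LruSketch` and its methods are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <map>
#include <memory>
#include <utility>

// Simplified LRU cache sketch; front of the list is the most recently used.
template <class K, class V>
class LruSketch {
  using Entry = std::pair<K, std::shared_ptr<V>>;

 public:
  explicit LruSketch(std::size_t max_size) : max_size_(max_size) {}

  // Insert or replace a value, evicting the least recently used entries
  // if the cache grows beyond max_size_.
  std::shared_ptr<V> add(const K& key, V value) {
    clear(key);
    auto ptr = std::make_shared<V>(std::move(value));
    lru_.emplace_front(key, ptr);
    index_[key] = lru_.begin();
    while (lru_.size() > max_size_) {
      index_.erase(lru_.back().first);
      lru_.pop_back();
    }
    return ptr;
  }

  // Return the cached value (marking it most recently used) or nullptr.
  std::shared_ptr<V> lookup(const K& key) {
    auto it = index_.find(key);
    if (it == index_.end())
      return nullptr;
    lru_.splice(lru_.begin(), lru_, it->second);
    return it->second->second;
  }

  // Drop a single key/value from the cache; a no-op if the key is absent.
  void clear(const K& key) {
    auto it = index_.find(key);
    if (it == index_.end())
      return;
    lru_.erase(it->second);
    index_.erase(it);
  }

  std::size_t size() const { return lru_.size(); }

 private:
  std::size_t max_size_;
  std::list<Entry> lru_;
  std::map<K, typename std::list<Entry>::iterator> index_;
};
```

Because values are handed out as `std::shared_ptr`, clearing a key only drops the cache's reference; callers that still hold the pointer keep a valid object.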
* | | | mds: stay in SCAN state in file_evalSage Weil2013-05-291-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | If we are in the SCAN state, stay there until the recovery finishes. Do not jump to another state from file_eval(). Signed-off-by: Sage Weil <sage@inktank.com>
* | | | Makefile: include new message header filesSage Weil2013-05-291-0/+2
| | | | | | | | | | | | | | | | Signed-off-by: Sage Weil <sage@inktank.com>
* | | | Merge remote-tracking branch 'yan/wip-mds'Sage Weil2013-05-2932-703/+1534
|\ \ \ \ | |_|_|/ |/| | | | | | | | | | | | | | | | | | | Reviewed-by: Sage Weil <sage@inktank.com> Conflicts: src/mds/MDCache.cc
| * | | mds: use "open-by-ino" function to open remote linkYan, Zheng2013-05-283-20/+39
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Also add a new config option "mds_open_remote_link_mode". The anchor approach is used by default. If mode is non-zero, use the open-by-ino function. In case open-by-ino function fails, if mode is 1, retry using the anchor approach, otherwise trigger assertion. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
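The mode handling described above reduces to a small decision function. The sketch below is a hypothetical rendering: the two lookups are passed in as callbacks, the `OpenPath` enum is invented for illustration, and a thrown `std::logic_error` stands in for the assertion so the behavior can be observed without aborting.

```cpp
#include <cassert>
#include <functional>
#include <stdexcept>

// Which lookup ended up resolving the remote link (illustrative only).
enum class OpenPath { Anchor, OpenByIno, Failed };

OpenPath open_remote_link(int mode,
                          const std::function<bool()>& open_by_ino,
                          const std::function<bool()>& open_by_anchor) {
  // Mode 0 is the default: use the anchor approach.
  if (mode == 0)
    return open_by_anchor() ? OpenPath::Anchor : OpenPath::Failed;

  // Any non-zero mode tries the open-by-ino function first.
  if (open_by_ino())
    return OpenPath::OpenByIno;

  // Mode 1 falls back to the anchor approach when open-by-ino fails.
  if (mode == 1)
    return open_by_anchor() ? OpenPath::Anchor : OpenPath::Failed;

  // Any other mode treats the failure as fatal (an assertion in the commit
  // message; an exception here so the sketch stays testable).
  throw std::logic_error("open-by-ino failed and fallback is disabled");
}
```

Under this reading, mode 1 is the conservative way to try the new function, since a failed open-by-ino lookup still ends with the anchor approach.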
| * | | mds: open missing cap inodesYan, Zheng2013-05-286-87/+185
| | | | When a recovering MDS enters reconnect stage, client sends reconnect
| | | | messages to it. The message lists open files, their path, and issued
| | | | caps. If an inode is not in the cache, the recovering MDS uses the
| | | | path client provides to determine if it's the inode's authority. If
| | | | not, the recovering MDS exports the inode's caps to other MDS. The
| | | | issue here is that the path client provides isn't always accurate.
| | | |
| | | | The fix is to use the recently added "open inode by ino" function to
| | | | open any missing cap inodes when the recovering MDS enters rejoin
| | | | stage. Send cache rejoin messages to other MDS after all caps'
| | | | authorities are determined.
| | | |
| | | | Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>