Commit message (Collapse)AuthorAgeFilesLines
* librbd: add explicit management of caches [historic/rbd-multi-cache]Sage Weil2012-05-052-0/+40
| | | | | | | | | | | | | | | | | | Allow librbd users to create in-memory cache pools, and open images using those caches. This lets you control the total amount of memory consumed for some number of open images. It also lets you individually control the writeback behavior for individual images (e.g., have some write-thru, some write-back). This doesn't let you specify max_dirty limits on a per-image basis, though, unless you give that image its own cache. Note that doing rbd_open on an image when 'rbd cache' is true is equivalent to creating a separate cache for that image using the 'rbd cache*' tunables. This API is meant to be used in lieu of the 'rbd cache*' options. Signed-off-by: Sage Weil <sage@newdream.net>
* librbd: move cache into separate CacheCtxSage Weil2012-05-051-56/+84
| | | | | | | Put the ObjectCacher and its lock into a separate CacheCtx object. This will eventually let us share it between images. Signed-off-by: Sage Weil <sage@newdream.net>
* objectcacher: specify the WritebackHandler for each ObjectSetSage Weil2012-05-055-34/+33
| | | | | | | | This will allow us to share a single cache between different users. Also use a pointer rather than a reference. Signed-off-by: Sage Weil <sage@newdream.net>
* objectcacher: make cache sizes explicitSage Weil2012-05-055-21/+40
| | | | | | | | | | | | | | | | | Make ObjectCacher users specify the cache size for each ObjectCacher instance. This avoids the confusing config namespace for the object cache (client_oc_*), and also will make it possible to eventually have cache sizes that vary between (say) RBD images. - drop unused client_oc_max_sync_write - add rbd_cache_max_size, max_dirty, target_dirty config values (these are the defaults for each image) We probably want to add librbd calls to specify the cache size on a per-image basis? Alternatively, we should make it possible to share a cache pool between multiple images in some explicit way. Signed-off-by: Sage Weil <sage@newdream.net>
* objectcacher: delete unused onfinish from flush_setSage Weil2012-05-051-0/+4
| | | | | | | Once upon a time the caller would do this, but none of those have survived, and this makes more sense. Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
* objectcacher: explicit write-thru modeSage Weil2012-05-051-16/+29
| | | | | | | | If the max_dirty config is 0, switch to write-thru mode, which will explicitly flush and wait on the range we just dirtied. Closes: #2335 Signed-off-by: Sage Weil <sage@newdream.net>
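The write-thru switch described above can be illustrated with a toy cache. This is only a sketch under stated assumptions: `TinyCache`, its `backend` dict, and its method names are hypothetical and are not the ObjectCacher API; the only behavior taken from the commit message is that `max_dirty == 0` means each write flushes the range it just dirtied before returning.

```python
class TinyCache:
    """Toy write cache (hypothetical; not the ObjectCacher interface)."""

    def __init__(self, backend, max_dirty):
        self.backend = backend      # dict standing in for the object store
        self.max_dirty = max_dirty  # 0 selects write-thru mode
        self.dirty = {}             # offset -> data not yet written back

    def write(self, offset, data):
        self.dirty[offset] = data
        if self.max_dirty == 0:
            # Write-thru: flush exactly what we just dirtied, then return.
            self.flush(offset)

    def flush(self, offset):
        # Move one dirty entry to the backing store.
        self.backend[offset] = self.dirty.pop(offset)
```

With `max_dirty=0` nothing is left dirty after `write()`; with a non-zero limit the data stays in `dirty` until something calls `flush()`.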
* common: add C_CondSage Weil2012-05-051-5/+34
| | | | | | Similar to C_SafeCond, but assume finisher already holds the relevant lock. Signed-off-by: Sage Weil <sage@newdream.net>
* objectcacher: use helper to get starting point in buffer mapSage Weil2012-05-052-58/+27
| | | | | | | A common pattern is to search for the first buffer intersecting or following an object offset. Use a helper for that. Signed-off-by: Sage Weil <sage@newdream.net>
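The search pattern named above (first buffer intersecting or following an offset) might look like the following sketch. It assumes buffers are kept as a list of `(start, length)` pairs sorted by start and non-overlapping; the function name and data layout are hypothetical and chosen only for illustration.

```python
import bisect


def first_buffer_at_or_after(buffers, offset):
    """Return the index of the first buffer that intersects or follows offset.

    buffers: list of (start, length), sorted by start, non-overlapping.
    Returns len(buffers) if every buffer ends at or before offset.
    """
    starts = [start for start, _ in buffers]
    # Index of the first buffer that starts strictly after offset.
    i = bisect.bisect_right(starts, offset)
    if i > 0:
        start, length = buffers[i - 1]
        if start + length > offset:
            # The preceding buffer still covers offset, so begin there.
            return i - 1
    return i
```

The one subtle step is looking one entry back: a buffer that starts before the offset can still overlap it.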
* objectcacher: flush range, setSage Weil2012-05-052-9/+79
| | | | | | | Add ability to flush a range of an object, or a vector of ObjectExtents. Flush any buffers that intersect the specified range, or the entire object if len==0. Signed-off-by: Sage Weil <sage@newdream.net>
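As a rough illustration of the range semantics above, the sketch below selects which dirty buffers a range flush would touch. The `(start, length)` representation and the function name are assumptions made for this example; only the rule itself (flush anything that intersects the range, or everything when `len == 0`) comes from the commit message.

```python
def buffers_to_flush(dirty, offset, length):
    """Pick the dirty buffers that intersect [offset, offset + length).

    dirty: list of (start, length) pairs for one object.
    length == 0 means "the entire object".
    """
    if length == 0:
        return list(dirty)
    end = offset + length
    # Two half-open ranges intersect when each starts before the other ends.
    return [(s, l) for s, l in dirty if s < end and s + l > offset]
```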
* objectcacher: wait directly from writex()Sage Weil2012-05-044-14/+17
| | | | | | | This gives us access to the original ObjectExtent (useful later), and simplifies the callers. Signed-off-by: Sage Weil <sage@newdream.net>
* objectcacher: don't wait for write waiters; wait after dirtyingSage Weil2012-05-044-20/+26
| | | | | | | | | | | | | | | | | | | | We do three things here: - Wait for the dirty limit to drop _after_ writing into the cache. This means that an active thread can always provide its dirty data to the cache for potential writing without waiting (a small win). It's also helpful later... (see below, and next commit) - Don't wait for other waiters. If another thread is dirtying 1MB and is waiting for it, don't wait for them too. This prevents two threads writing 1MB at a time with a limit of 1MB from serializing: both can dirty their 1MB and initiate a flush, and then once 1/2 of that has flushed, one of them will be allowed to proceed. - Update the flusher to add the dirty_waiting bytes to the amount to write so that the OPs will indeed be parallel. Signed-off-by: Sage Weil <sage@newdream.net>
* crush: update_item() should pass an error back to the callerSage Weil2012-05-041-3/+3
| | | | | | | | If you give it a nonsensical loc, it will fail check_item_loc() (false) and then error out on insert_item(). Reported-by: Sam Just <sam.just@inktank.com> Signed-off-by: Sage Weil <sage@newdream.net>
* crush: improve docs/comments for check_item_loc and insert_item semanticsSage Weil2012-05-041-1/+27
| | | | | | | We don't adjust the internal hierarchy structure (currently). This is a bit confusing, so describe the semantics in some detail. Signed-off-by: Sage Weil <sage@newdream.net>
* crush: comment and clean up checks for check_item_loc and insert_itemSage Weil2012-05-041-12/+13
| | | | | | | | - drop useless cur for check_item_loc - comment the checks we're doing so the code is understandable - use name_exists instead of broken get_item_id != 0 check Signed-off-by: Sage Weil <sage@newdream.net>
* Merge branch 'wip-crush-update'Sage Weil2012-05-0312-21/+429
|\ | | | | | | Reviewed-by: Greg Farnum <greg@inktank.com>
| * crushtool: another simple test for updateSage Weil2012-05-031-0/+3
| | | | | | | | | | | | If the weight doesn't change it should be a no-op. Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
| * crush: document return valuesSage Weil2012-05-031-6/+3
| | | | | | | | Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
| * crush: compare fixed-point weights in update_itemSage Weil2012-05-032-10/+18
| | | | | | | | | | | | | | This is less ugly than converting the quantized value back to a float and comparing that. Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
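To see why comparing the quantized values is cleaner, here is a small sketch. It assumes weights are stored as 16.16 fixed-point integers (so `0x10000` represents 1.0); the helper names are hypothetical.

```python
FIXED_ONE = 0x10000  # assumption: 16.16 fixed point, 1.0 == 0x10000


def to_fixed(weight):
    """Quantize a float weight to a fixed-point integer."""
    return int(weight * FIXED_ONE)


def weight_changed(stored_fixed, requested):
    """Compare in the fixed-point domain instead of converting back to float."""
    return stored_fixed != to_fixed(requested)
```

Converting the stored integer back to a float generally does not reproduce the requested value (0.1 quantizes to 6553, which reads back as roughly 0.09999), so a float comparison would report a change where there is none.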
| * crush: clean up check_item_loc() commentsSage Weil2012-05-032-3/+5
| | | | | | | | | | | | Thanks Greg! Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
| * mon: drop 'osd crush add ...'Sage Weil2012-05-021-47/+0
| | | | | | | | | | | | 'osd crush set ...' is better; use that instead. Signed-off-by: Sage Weil <sage@newdream.net>
| * vstart.sh: use 'osd crush set ...'Sage Weil2012-05-021-1/+1
| | | | | | | | Signed-off-by: Sage Weil <sage@newdream.net>
| * mon: 'osd crush set ...' do an add or updateSage Weil2012-05-021-0/+46
| | | | | | | | | | | | | | This operation will add/update/move the item to the specified location. It is idempotent and much more useful than 'osd crush add ...'. Signed-off-by: Sage Weil <sage@newdream.net>
| * crushtool: extend cli test to include --remove-item and --update-itemSage Weil2012-05-027-1/+249
| | | | | | | | Signed-off-by: Sage Weil <sage@newdream.net>
| * crushtool: add --update-item commandSage Weil2012-05-021-5/+26
| | | | | | | | | | | | | | Similar to --add-item, except it will move, rename, or reweight the item if it is already present in the map. Signed-off-by: Sage Weil <sage@newdream.net>
| * crush: do some docsSage Weil2012-05-022-8/+41
| | | | | | | | Signed-off-by: Sage Weil <sage@newdream.net>
| * crush: implement update_item()Sage Weil2012-05-022-1/+43
| | | | | | | | | | | | | | | | | | This is similar to insert_item(), except it will succeed if the item is already there, and will move an item to the specified location if it is not. It returns 0 for no change, 1 if a change was made. It also makes sure the weight and name match. Signed-off-by: Sage Weil <sage@newdream.net>
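The return convention described above (0 for no change, 1 if a change was made) can be sketched with a plain dict standing in for the CRUSH map. Everything here is hypothetical illustration, including the record layout; it is not the CrushWrapper signature.

```python
def update_item(crush_map, item, weight, name, loc):
    """Idempotent add/update/move on a toy map.

    crush_map: dict of item id -> {"weight", "name", "loc"} (hypothetical layout).
    Returns 0 if the entry already matches, 1 if it was added or changed.
    """
    wanted = {"weight": weight, "name": name, "loc": dict(loc)}
    if crush_map.get(item) == wanted:
        return 0  # already present with the same weight, name and location
    crush_map[item] = wanted
    return 1
```

Calling it twice with the same arguments returns 1 and then 0, which is what lets a caller repeat the operation safely.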
| * crush: add check_item_locSage Weil2012-05-022-0/+54
| | | | | | | | | | | | | | | | | | | | The check_item_loc() method will take an item and position and tell you if it matches the item's current location. The matching is identical to that used for insert_item, in that a specific location constraint match means success, even if a less specific one does not match (e.g., rack=wrongrack, host=correcthost will return true). Signed-off-by: Sage Weil <sage@newdream.net>
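The matching rule above, where the most specific constraint decides the result, might be sketched as follows. The type ordering, the `children` mapping, and the function body are assumptions for illustration only and do not reproduce the real implementation.

```python
TYPE_ORDER = ["host", "rack", "root"]  # hypothetical types, most specific first


def check_item_loc(children, item, loc):
    """Return True if item sits in the most specific bucket named in loc.

    children: dict of bucket name -> set of items directly inside it.
    loc: dict of type name -> bucket name, e.g. {"rack": "r1", "host": "h1"}.
    """
    for type_name in TYPE_ORDER:
        if type_name in loc:
            # Only the most specific constraint is checked; less specific
            # ones (such as a wrong rack) are ignored.
            return item in children.get(loc[type_name], set())
    return False
```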
| * crush: fix weights when removing itemsSage Weil2012-05-021-0/+1
| | | | | | | | | | | | | | Reweight an item to 0 before removing it, so that the parent weights are adjusted accordingly. Signed-off-by: Sage Weil <sage@newdream.net>
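A toy model shows why reweighting to 0 first keeps parent weights correct. It assumes each bucket's weight is the sum of its children and that a weight change propagates up as a delta; the dict-based tree and function names are hypothetical.

```python
def adjust_weight(weights, parents, item, new_weight):
    """Set item's weight and apply the difference to every ancestor."""
    delta = new_weight - weights[item]
    weights[item] = new_weight
    node = parents.get(item)
    while node is not None:
        weights[node] += delta
        node = parents.get(node)


def remove_item(weights, parents, item):
    """Reweight to 0 before removal so ancestors no longer count the item."""
    adjust_weight(weights, parents, item, 0)
    del weights[item]
    parents.pop(item, None)
```

Deleting the entry without the `adjust_weight` call would leave the host and root still carrying the removed item's weight.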
* | Merge branch 'wip-osd-uuid'Sage Weil2012-05-0314-68/+179
|\ \ | | | | | | | | | Reviewed-by: Greg Farnum <greg@inktank.com>
| * | mon: simplify 'osd create <uuid>' commandSage Weil2012-05-031-28/+27
| | | | | | | | | | | | | | | | | | Make the flow clearer for the three cases (exists, about to exist, new). Signed-off-by: Sage Weil <sage@newdream.net>
| * | osd: drop unused CEPH_OSDMAP*VERSION* #definesSage Weil2012-05-031-8/+0
| | | | | | | | | | | | | | | | | | It's easier to manage/rev/grok these inline. Signed-off-by: Sage Weil <sage@newdream.net>
| * | ceph-object-corpus: a few instances of the newly encoded typesSage Weil2012-05-021-0/+0
| | | | | | | | | | | | Signed-off-by: Sage Weil <sage@newdream.net>
| * | ceph-dencoder: ignore trailing goop after OSDMap and OSDMap::IncrementalSage Weil2012-05-022-4/+9
| |/ | | | | | | | | | | | | | | All users pass around bufferlists and avoid encoding these structures inline, but the dencoder tests are picky. Disable that for these types so that we can add new fields without noise. Signed-off-by: Sage Weil <sage@newdream.net>
| * vstart.sh: explicitly specify uuids during startupSage Weil2012-05-011-3/+4
| | | | | | | | | | | | This exercises all the new per-osd uuid code. Signed-off-by: Sage Weil <sage@newdream.net>
| * osd: --get-{osd,journal}-uuid synonyms for --get-{osd,journal}-fsidSage Weil2012-05-011-2/+2
| | | | | | | | Signed-off-by: Sage Weil <sage@newdream.net>
| * osd: allow uuid to be fed to mkfs with 'osd uuid' settingSage Weil2012-05-012-0/+5
| | | | | | | | | | | | E.g., ceph-osd --mkfs --osd-uuid <uuid> -i 123 ... Signed-off-by: Sage Weil <sage@newdream.net>
| * filestore: allow fsid to be fed in for mkfsSage Weil2012-05-013-1/+8
| | | | | | | | | | | | | | Mkfs currently always generates a new uuid. Allow the caller to feed one in. Signed-off-by: Sage Weil <sage@newdream.net>
| * mon: 'osd create <uuid>'Sage Weil2012-05-011-21/+18
| | | | | | | | | | | | | | | | | | | | | | Make the osd create command idempotent by providing a uuid. If you call it multiple times with the same (or some other existing) uuid you'll get back the osd id that is already using it. Drop support for 'osd create <id>', which was mostly useless and non-idempotent anyway. Signed-off-by: Sage Weil <sage@newdream.net>
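The idempotent behavior described above can be sketched like this. The dict of osd id to uuid and the lowest-free-id allocation are assumptions made for the example; only the "same uuid returns the existing id" rule is taken from the commit message.

```python
def osd_create(osd_uuids, uuid):
    """Return the osd id that uses uuid, allocating a new id only if needed.

    osd_uuids: dict of osd id -> uuid (hypothetical stand-in for the map).
    """
    # Linear search for an osd that already claims this uuid.
    for osd_id, existing in osd_uuids.items():
        if existing == uuid:
            return osd_id
    # Not found: take the lowest unused id (an assumption of this sketch).
    new_id = 0
    while new_id in osd_uuids:
        new_id += 1
    osd_uuids[new_id] = uuid
    return new_id
```

Retrying the call after a lost reply is therefore harmless: the second call hands back the same id instead of creating another osd.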
| * mon: fill in osd uuid in map on bootSage Weil2012-05-011-0/+7
| | | | | | | | | | | | | | | | We may want to make this more strict, so that if it is defined it has to match the map, and only fill it in when the map's uuid is still zeroed (for legacy clusters)... Signed-off-by: Sage Weil <sage@newdream.net>
| * osdmap: store a uuid for each osdSage Weil2012-05-012-2/+68
| | | | | | | | | | | | | | | | Rev the extended section of the map to store it. Dump it when the osd exists. Zero it out if an osd is destroyed. Provide some accessors to identify an osd given a uuid (linear search). Signed-off-by: Sage Weil <sage@newdream.net>
| * osd: make output less uglySage Weil2012-05-011-5/+5
| | | | | | | | Signed-off-by: Sage Weil <sage@newdream.net>
| * osd: create a 'ready' file on mkfs completionSage Weil2012-05-011-6/+15
| | | | | | | | Signed-off-by: Sage Weil <sage@newdream.net>
| * osd: use fsync+rename when writing meta files (during mkfs)Sage Weil2012-05-011-6/+29
| | | | | | | | | | | | | | It's overkill to do the dir fsync on each file, but not worth making efficient. Signed-off-by: Sage Weil <sage@newdream.net>
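The fsync+rename pattern mentioned above is a common way to make a small file appear atomically. The sketch below assumes a POSIX system (the directory fsync does not work on Windows); the function name and the `.tmp` suffix are hypothetical.

```python
import os


def write_meta(dirpath, name, data):
    """Write data to dirpath/name so readers see the old or new file, never a partial one."""
    tmp = os.path.join(dirpath, name + ".tmp")
    final = os.path.join(dirpath, name)
    with open(tmp, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())  # push file contents to stable storage
    os.rename(tmp, final)     # atomic replace on POSIX filesystems
    # fsync the directory so the rename itself is durable. Doing this per
    # file is the overkill the commit message refers to.
    dfd = os.open(dirpath, os.O_RDONLY)
    try:
        os.fsync(dfd)
    finally:
        os.close(dfd)
```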
* | Makefile: fix $shell_scripts substitutionSage Weil2012-05-031-1/+1
| | | | | | | | | | | | No spaces here, apparently! Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
* | thread: remove get_num_threads() staticSage Weil2012-05-032-33/+0
| | | | | | | | | | | | | | This looks in /proc to count threads. Kludgey and no longer needed. Signed-off-by: Sage Weil <sage.weil@dreamhost.com> Reviewed-by: Greg Farnum <greg@inktank.com>
* | global_init: do not count threads before daemonize()Sage Weil2012-05-031-7/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | We were verifying that there was only 1 thread (presumably main()) when we call daemonize. However, with the new logging code, we stop a thread right before the check, and /proc apparently updates asynchronously such that our attempt to count running threads gives us a bad answer. Just remove this kludgey check; we'll have to catch this class of bugs the hard way. Signed-off-by: Sage Weil <sage.weil@dreamhost.com> Reviewed-by: Greg Farnum <greg@inktank.com>
* | OpRequest: only show a small set of the oldest messages, instead of all.Joao Eduardo Luis2012-05-032-5/+38
| | | | | | | | | | Signed-off-by: Joao Eduardo Luis <jecluis@gmail.com> Reviewed-by: Greg Farnum <gregory.farnum@dreamhost.com>
* | rgw: update cache interface for put_obj_metaYehuda Sadeh2012-05-031-3/+4
| | | | | | | | | | | | | | | | | | This fixes issue #2381. The method interface was different than the one needed in order to override the one in RGWRados. Signed-off-by: Yehuda Sadeh <yehuda@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com>
* | doc: fix some underscoresSage Weil2012-05-031-2/+2
| | | | | | | | Signed-off-by: Sage Weil <sage@newdream.net>
* | Merge branch 'wip-doc-rebase-2'Sage Weil2012-05-0369-987/+29566
|\ \