author | Samuel Just <sam.just@inktank.com> | 2013-05-09 22:24:25 -0700 |
---|---|---|
committer | Samuel Just <sam.just@inktank.com> | 2013-05-09 22:24:31 -0700 |
commit | b353da6f682d223ba14812da0fe814eca72ad6f5 | |
tree | 07760f90fb83049be7123a6528544aa4d98e7f31 /doc | |
parent | c55c6abb6055ff7983dd7b1b7c7903e8b08e23b4 | |
parent | 01a07c1ee1ea2ef134f5fddf19518eb3c0349b53 | |
Merge branch 'wip_pg_res'
Reviewed-by: Sage Weil <sage@inktank.com>
Diffstat (limited to 'doc')
-rw-r--r-- | doc/dev/osd_internals/pg_removal.rst | 80 |
1 files changed, 50 insertions, 30 deletions
diff --git a/doc/dev/osd_internals/pg_removal.rst b/doc/dev/osd_internals/pg_removal.rst
index 1e0fb139152..4ac0d331b23 100644
--- a/doc/dev/osd_internals/pg_removal.rst
+++ b/doc/dev/osd_internals/pg_removal.rst
@@ -9,33 +9,53 @@ There are two ways for a pg to be removed from an OSD:
   1. MOSDPGRemove from the primary
   2. OSD::advance_map finds that the pool has been removed
 
-In either case, our general strategy for removing the pg is to atomically remove
-the metadata objects (pg->log_oid, pg->biginfo_oid) and rename the pg collections
-(temp, HEAD, and snap collections) into removal collections
-(see OSD::get_next_removal_coll). Those collections are then asynchronously
-removed. We do not do this inline because scanning the collections to remove
-the objects is an expensive operation. Atomically moving the directories out
-of the way allows us to proceed as if the pg is fully removed except that we
-cannot rewrite any of the objects contained in the removal directories until
-they have been fully removed. PGs partition the object space, so the only case
-we need to worry about is the same pg being recreated before we have finished
-removing the objects from the old one.
-
-OSDService::deleting_pgs tracks all pgs in the process of being deleted. Each
-DeletingState object in deleting_pgs lives while at least one reference to it
-remains. Each item in RemoveWQ carries a reference to the DeletingState for
-the relevant pg such that deleting_pgs.lookup(pgid) will return a null ref
-only if there are no collections currently being deleted for that pg.
-DeletingState allows you to register a callback to be called when the deletion
-is finally complete. See PG::start_flush. We use this mechanism to prevent
-the pg from being "flushed" until any pending deletes are complete. Metadata
-operations are safe since we did remove the old metadata objects and we
-inherit the osr from the previous copy of the pg.
-
-Similarly, OSD::osr_registry ensures that the OpSequencers for those pgs can
-be reused for a new pg if created before the old one is fully removed, ensuring
-that operations on the new pg are sequenced properly with respect to operations
-on the old one.
-
-OSD::load_pgs() rebuilds deleting_pgs and osr_registry when scanning the
-collections as it finds old removal collections not yet removed.
+In either case, our general strategy for removing the pg is to
+atomically set the metadata objects (pg->log_oid, pg->biginfo_oid) to
+backfill and asynronously remove the pg collections. We do not do
+this inline because scanning the collections to remove the objects is
+an expensive operation.
+
+OSDService::deleting_pgs tracks all pgs in the process of being
+deleted. Each DeletingState object in deleting_pgs lives while at
+least one reference to it remains. Each item in RemoveWQ carries a
+reference to the DeletingState for the relevant pg such that
+deleting_pgs.lookup(pgid) will return a null ref only if there are no
+collections currently being deleted for that pg. DeletingState allows
+you to register a callback to be called when the deletion is finally
+complete. See PG::start_flush. We use this mechanism to prevent the
+pg from being "flushed" until any pending deletes are complete.
+Metadata operations are safe since we did remove the old metadata
+objects and we inherit the osr from the previous copy of the pg.
+
+The DeletingState for a pg also carries information about the status
+of the current deletion and allows the deletion to be cancelled.
+The possible states are:
+
+  1. QUEUED: the PG is in the RemoveWQ
+  2. CLEARING_DIR: the PG's contents are being removed syncronously
+  3. DELETING_DIR: the PG's directories and metadata being queued for removal
+  4. DELETED_DIR: the final removal transaction has been queued
+  5. CANCELED: the deletion has been canceled
+
+In 1 and 2, the deletion can be canceled. Each state transition
+method (and check_canceled) returns false if deletion has been
+canceled and true if the state transition was successful. Similarly,
+try_stop_deletion() returns true if it succeeds in canceling the
+deletion. Additionally, try_stop_deletion() in the event that it
+fails to stop the deletion will not return until the final removal
+transaction is queued. This ensures that any operations queued after
+that point will be ordered after the pg deletion.
+
+_create_lock_pg must handle two cases:
+
+  1. Either there is no DeletingStateRef for the pg, or it failed to cancel
+  2. We succeeded in canceling the deletion.
+
+In case 1., we proceed as if there were no deletion occuring, except that
+we avoid writing to the PG until the deletion finishes. In case 2., we
+proceed as in case 1., except that we first mark the PG as backfilling.
+
+Similarly, OSD::osr_registry ensures that the OpSequencers for those
+pgs can be reused for a new pg if created before the old one is fully
+removed, ensuring that operations on the new pg are sequenced properly
+with respect to operations on the old one.
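The cancellation protocol described in the new text can be sketched in C++. This is a minimal illustration under stated assumptions, not Ceph's DeletingState: the five state names, check_canceled() and try_stop_deletion() come from the patch, while the transition method names (start_clearing, start_deleting, finish_deleting), the stop_requested_ flag, and the use of std::mutex with std::condition_variable are hypothetical choices made for the sketch. A cancel is modeled as a request that the remover honors at its next transition or check while in QUEUED or CLEARING_DIR; once the remover has reached DELETING_DIR, try_stop_deletion() blocks until DELETED_DIR and reports failure.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>

// Hypothetical sketch of a per-pg deletion state machine (not Ceph's code).
class DeletingState {
 public:
  enum Status { QUEUED, CLEARING_DIR, DELETING_DIR, DELETED_DIR, CANCELED };

  // QUEUED -> CLEARING_DIR. Returns false if the deletion was canceled.
  bool start_clearing() {
    std::lock_guard<std::mutex> l(lock_);
    if (cancel_if_requested()) return false;
    assert(status_ == QUEUED);
    set_status(CLEARING_DIR);
    return true;
  }

  // Polled by the remover between batches while in CLEARING_DIR.
  // Returns false if the deletion has been canceled, true to keep going.
  bool check_canceled() {
    std::lock_guard<std::mutex> l(lock_);
    assert(status_ == CLEARING_DIR);
    return !cancel_if_requested();
  }

  // CLEARING_DIR -> DELETING_DIR: the last point where a cancel is honored.
  bool start_deleting() {
    std::lock_guard<std::mutex> l(lock_);
    assert(status_ == CLEARING_DIR);
    if (cancel_if_requested()) return false;
    set_status(DELETING_DIR);
    return true;
  }

  // DELETING_DIR -> DELETED_DIR: the final removal transaction is queued.
  void finish_deleting() {
    std::lock_guard<std::mutex> l(lock_);
    assert(status_ == DELETING_DIR);
    set_status(DELETED_DIR);
  }

  // Returns true if the deletion was canceled. If it could not be canceled,
  // does not return until the final removal transaction has been queued.
  bool try_stop_deletion() {
    std::unique_lock<std::mutex> l(lock_);
    stop_requested_ = true;
    if (status_ == QUEUED) set_status(CANCELED);  // remover has not started
    // In CLEARING_DIR or DELETING_DIR the remover is active: wait until it
    // either observes the request (CANCELED) or finishes (DELETED_DIR).
    cond_.wait(l, [this] {
      return status_ == CANCELED || status_ == DELETED_DIR;
    });
    return status_ == CANCELED;
  }

  Status status() const {
    std::lock_guard<std::mutex> l(lock_);
    return status_;
  }

 private:
  // Caller holds lock_. Moves to CANCELED if a stop was requested.
  bool cancel_if_requested() {
    if (!stop_requested_) return false;
    set_status(CANCELED);
    return true;
  }

  // Caller holds lock_. Wakes any thread blocked in try_stop_deletion().
  void set_status(Status s) {
    status_ = s;
    cond_.notify_all();
  }

  mutable std::mutex lock_;
  std::condition_variable cond_;
  Status status_ = QUEUED;
  bool stop_requested_ = false;
};
```

Blocking in try_stop_deletion() until DELETED_DIR is what provides the ordering guarantee the patch describes: when the cancel fails, the final removal transaction is already queued, so anything the caller queues afterwards lands behind it.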
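The reference-counted bookkeeping behind OSDService::deleting_pgs (and, with a different value type, OSD::osr_registry) can be approximated by a map of weak pointers: an entry is reachable exactly as long as some holder, such as a queued RemoveWQ item, keeps a strong reference to it. The sketch below assumes that design; the class name WeakRegistry and its lookup_or_create method are hypothetical, and only lookup() mirrors a name used in the text.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <mutex>

// Hypothetical registry: a value stays registered only while some holder
// (e.g. a queued removal item) keeps a shared_ptr to it.
template <typename K, typename V>
class WeakRegistry {
 public:
  using Ref = std::shared_ptr<V>;

  // Returns the live value for key, or a null Ref if no holder remains.
  Ref lookup(const K& key) {
    std::lock_guard<std::mutex> l(lock_);
    auto it = contents_.find(key);
    if (it == contents_.end()) return Ref();
    Ref ref = it->second.lock();
    if (!ref) contents_.erase(it);  // prune the expired entry
    return ref;
  }

  // Returns the live value for key, creating a fresh one if none remains.
  Ref lookup_or_create(const K& key) {
    std::lock_guard<std::mutex> l(lock_);
    Ref ref;
    auto it = contents_.find(key);
    if (it != contents_.end()) ref = it->second.lock();
    if (!ref) {
      ref = std::make_shared<V>();
      contents_[key] = ref;  // store only a weak reference
    }
    return ref;
  }

 private:
  std::mutex lock_;
  std::map<K, std::weak_ptr<V>> contents_;
};
```

With V as a deletion-state type, lookup(pgid) returning null means no queued removal still holds the state, i.e. nothing is being deleted for that pg. With V as an op sequencer, a pg recreated while the old copy is still referenced gets the same object back, which is how operations on the new pg can stay ordered after those on the old one.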
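The two cases _create_lock_pg must handle reduce to a small decision. In this hedged sketch, CreatePlan and plan_pg_create are hypothetical names, and try_stop is a stand-in for calling try_stop_deletion() on the pg's DeletingStateRef (left empty when there is no such ref). Treating "proceed as in case 1." as meaning that case 2 also holds writes until the old deletion's pending work completes is an interpretation of the text, not something the patch spells out.

```cpp
#include <cassert>
#include <functional>

// Hypothetical summary of how pg creation proceeds given an old deletion.
struct CreatePlan {
  bool mark_backfill;        // case 2 only (presumably: contents partly cleared)
  bool wait_for_old_delete;  // avoid writing until pending deletes finish
};

// try_stop is empty when there is no DeletingStateRef for the pg; otherwise
// it stands in for try_stop_deletion() and returns true on a successful cancel.
CreatePlan plan_pg_create(const std::function<bool()>& try_stop) {
  if (!try_stop) return CreatePlan{false, false};  // case 1: no deletion at all
  bool canceled = try_stop();
  // Case 1 (cancel failed) and case 2 (canceled) both defer writes;
  // only case 2 also marks the pg as backfilling first.
  return CreatePlan{canceled, true};
}
```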