diff options
author | Samuel Just <sam.just@inktank.com> | 2013-05-09 12:13:46 -0700 |
---|---|---|
committer | Samuel Just <sam.just@inktank.com> | 2013-05-09 17:28:15 -0700 |
commit | 0ef9b1e04991e1749a37f57fc964c8f0e6d7edc3 (patch) | |
tree | 8b6be179c6b828deb9d541dd5d23a71df09880d3 | |
parent | 90f50c487acea3b0a395fdcc6ea6640c2a8e7d9d (diff) | |
download | ceph-0ef9b1e04991e1749a37f57fc964c8f0e6d7edc3.tar.gz |
osd_internals/pg_removal.rst: update for pg resurrection
Signed-off-by: Samuel Just <sam.just@inktank.com>
-rw-r--r-- | doc/dev/osd_internals/pg_removal.rst | 80 |
1 files changed, 50 insertions, 30 deletions
diff --git a/doc/dev/osd_internals/pg_removal.rst b/doc/dev/osd_internals/pg_removal.rst index 1e0fb139152..4ac0d331b23 100644 --- a/doc/dev/osd_internals/pg_removal.rst +++ b/doc/dev/osd_internals/pg_removal.rst @@ -9,33 +9,53 @@ There are two ways for a pg to be removed from an OSD: 1. MOSDPGRemove from the primary 2. OSD::advance_map finds that the pool has been removed -In either case, our general strategy for removing the pg is to atomically remove -the metadata objects (pg->log_oid, pg->biginfo_oid) and rename the pg collections -(temp, HEAD, and snap collections) into removal collections -(see OSD::get_next_removal_coll). Those collections are then asynchronously -removed. We do not do this inline because scanning the collections to remove -the objects is an expensive operation. Atomically moving the directories out -of the way allows us to proceed as if the pg is fully removed except that we -cannot rewrite any of the objects contained in the removal directories until -they have been fully removed. PGs partition the object space, so the only case -we need to worry about is the same pg being recreated before we have finished -removing the objects from the old one. - -OSDService::deleting_pgs tracks all pgs in the process of being deleted. Each -DeletingState object in deleting_pgs lives while at least one reference to it -remains. Each item in RemoveWQ carries a reference to the DeletingState for -the relevant pg such that deleting_pgs.lookup(pgid) will return a null ref -only if there are no collections currently being deleted for that pg. -DeletingState allows you to register a callback to be called when the deletion -is finally complete. See PG::start_flush. We use this mechanism to prevent -the pg from being "flushed" until any pending deletes are complete. Metadata -operations are safe since we did remove the old metadata objects and we -inherit the osr from the previous copy of the pg. - -Similarly, OSD::osr_registry ensures that the OpSequencers for those pgs can -be reused for a new pg if created before the old one is fully removed, ensuring -that operations on the new pg are sequenced properly with respect to operations -on the old one. - -OSD::load_pgs() rebuilds deleting_pgs and osr_registry when scanning the -collections as it finds old removal collections not yet removed. +In either case, our general strategy for removing the pg is to +atomically set the metadata objects (pg->log_oid, pg->biginfo_oid) to +backfill and asynronously remove the pg collections. We do not do +this inline because scanning the collections to remove the objects is +an expensive operation. + +OSDService::deleting_pgs tracks all pgs in the process of being +deleted. Each DeletingState object in deleting_pgs lives while at +least one reference to it remains. Each item in RemoveWQ carries a +reference to the DeletingState for the relevant pg such that +deleting_pgs.lookup(pgid) will return a null ref only if there are no +collections currently being deleted for that pg. DeletingState allows +you to register a callback to be called when the deletion is finally +complete. See PG::start_flush. We use this mechanism to prevent the +pg from being "flushed" until any pending deletes are complete. +Metadata operations are safe since we did remove the old metadata +objects and we inherit the osr from the previous copy of the pg. + +The DeletingState for a pg also carries information about the status +of the current deletion and allows the deletion to be cancelled. +The possible states are: + + 1. QUEUED: the PG is in the RemoveWQ + 2. CLEARING_DIR: the PG's contents are being removed syncronously + 3. DELETING_DIR: the PG's directories and metadata being queued for removal + 4. DELETED_DIR: the final removal transaction has been queued + 5. CANCELED: the deletion has been canceled + +In 1 and 2, the deletion can be canceled. Each state transition +method (and check_canceled) returns false if deletion has been +canceled and true if the state transition was successful. Similarly, +try_stop_deletion() returns true if it succeeds in canceling the +deletion. Additionally, try_stop_deletion() in the event that it +fails to stop the deletion will not return until the final removal +transaction is queued. This ensures that any operations queued after +that point will be ordered after the pg deletion. + +_create_lock_pg must handle two cases: + + 1. Either there is no DeletingStateRef for the pg, or it failed to cancel + 2. We succeeded in canceling the deletion. + +In case 1., we proceed as if there were no deletion occuring, except that +we avoid writing to the PG until the deletion finishes. In case 2., we +proceed as in case 1., except that we first mark the PG as backfilling. + +Similarly, OSD::osr_registry ensures that the OpSequencers for those +pgs can be reused for a new pg if created before the old one is fully +removed, ensuring that operations on the new pg are sequenced properly +with respect to operations on the old one. |