Commit message  [Author | Age | Files | Lines]
* osd/ReplcatedPG: maybe_handle_cache style (wip-cache)  [Sage Weil | 2013-10-22 | 1 | -16/+24]
|   Signed-off-by: Sage Weil <sage@inktank.com>
* osd/ReplicatedPG: skip promote for DELETE  [Sage Weil | 2013-10-22 | 2 | -0/+17]
|   If an op starts with DELETE there is no need to promote the old content from the base tier.
|   Signed-off-by: Sage Weil <sage@inktank.com>
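
The rule in the commit above can be sketched as a small predicate. This is only an illustration: `OpType` and `needs_promote()` are hypothetical names, not taken from the Ceph tree, and the real OSD inspects its own op codes.

```cpp
#include <cassert>
#include <vector>

// Hypothetical op kinds, for illustration only.
enum class OpType { Read, Write, Delete };

// Returns true if the object's old content should be copied up from the
// base tier before the ops run. When the first op is a DELETE the old
// content is about to be discarded, so the promote can be skipped.
inline bool needs_promote(const std::vector<OpType>& ops) {
  assert(!ops.empty());  // assumption: a request carries at least one op
  return ops.front() != OpType::Delete;
}
```

Only the first op is checked, matching the "starts with DELETE" wording; a request that reads first and deletes later would still promote in this sketch.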
* wip cache_flush  [Sage Weil | 2013-10-22 | 1 | -0/+15]
|
* osd/ReplicatedPG: implement cache_evict  [Sage Weil | 2013-10-22 | 3 | -6/+135]
|   Signed-off-by: Sage Weil <sage@inktank.com>
* librados: add an aio_operate that takes a write and flags  [Sage Weil | 2013-10-22 | 4 | -16/+33]
|   Until now you could only pass flags to read operations.
|   Signed-off-by: Sage Weil <sage@inktank.com>
* osd/osd_types: introduce helper for osd op flags -> string conversion  [Sage Weil | 2013-10-22 | 4 | -12/+44]
|   Signed-off-by: Sage Weil <sage@inktank.com>
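
A flags-to-string helper of the kind this commit introduces might look roughly like the sketch below. The flag names, bit values, and the `flags_to_string()` function are all assumptions made for illustration; the actual helper and constants in the Ceph tree are not reproduced here.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical flag bits; the real values are defined in Ceph's headers.
enum : uint32_t {
  FLAG_READ           = 1u << 0,
  FLAG_WRITE          = 1u << 1,
  FLAG_IGNORE_OVERLAY = 1u << 2,
};

// Render a flag bitmask as a '+'-joined list of names.
inline std::string flags_to_string(uint32_t flags) {
  struct Entry { uint32_t bit; const char* name; };
  static const Entry table[] = {
    {FLAG_READ, "read"},
    {FLAG_WRITE, "write"},
    {FLAG_IGNORE_OVERLAY, "ignore_overlay"},
  };
  std::string out;
  uint32_t known = 0;
  for (const Entry& e : table) {
    known |= e.bit;
    if (flags & e.bit) {
      if (!out.empty()) out += '+';
      out += e.name;
    }
  }
  // Bits not present in the table are reported once, not silently dropped.
  if (flags & ~known) {
    if (!out.empty()) out += '+';
    out += "unknown";
  }
  return out.empty() ? "none" : out;
}
```

Keeping the names in one table means log messages and debug dumps share a single spelling for each flag.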
* librados, osd: add IGNORE_OVERLAY flag  [Sage Weil | 2013-10-22 | 5 | -1/+12]
|   Add a flag that will make the OSD bypass the cache overlay logic. This is needed in order to handle operations like CACHE_EVICT and CACHE_FLUSH.
|   Signed-off-by: Sage Weil <sage@inktank.com>
* librados: add cache_flush(), cache_evict() methods  [Sage Weil | 2013-10-22 | 6 | -0/+51]
|   Not yet implemented by the OSD.
|   Signed-off-by: Sage Weil <sage@inktank.com>
* osd/ReplicatedPG: set object_info and snapset xattrs on promote  [Sage Weil | 2013-10-22 | 1 | -5/+15]
|   For the normal write path, prepare_transaction() handles this for us. In this case, we need to do it explicitly.
|   Signed-off-by: Sage Weil <sage@inktank.com>
* os/FileStore: fix getattr return value when using omap  [Sage Weil | 2013-10-22 | 1 | -1/+1]
|   The return value should be the length of the value, even when it was stored in omap.
|   Signed-off-by: Sage Weil <sage@inktank.com>
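
The contract described in this commit (return the value's length regardless of where the attribute is stored) can be shown with a toy store. `ToyStore` and its two maps are hypothetical stand-ins, not FileStore's real data structures.

```cpp
#include <cassert>
#include <cerrno>
#include <map>
#include <string>

// Toy stand-in for an object store that keeps small attrs inline and
// spills others to a key/value side store (called omap here).
struct ToyStore {
  std::map<std::string, std::string> xattrs;  // attrs stored inline
  std::map<std::string, std::string> omap;    // attrs stored in omap

  // Copies the value into *out. Returns the length of the value on
  // success, whichever map it came from, or -ENODATA if it is absent.
  int getattr(const std::string& name, std::string* out) const {
    auto it = xattrs.find(name);
    if (it == xattrs.end()) {
      it = omap.find(name);
      if (it == omap.end()) return -ENODATA;
    }
    *out = it->second;
    return static_cast<int>(out->size());
  }
};
```

Both lookup paths funnel into the same return statement, so the omap path cannot accidentally report a different value such as 0.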
* osd/ReplicatedPG: handle is_whiteout in do_osd_ops()  [Sage Weil | 2013-10-22 | 1 | -9/+10]
|   Most of the time we handle whiteouts by returning ENOENT before we even get this far. However, for a mixed read/write transaction (e.g., a guard) or certain ops (like create exclusive) we need to deal with the exists == true and whiteout flag set case explicitly.
|   Signed-off-by: Sage Weil <sage@inktank.com>
* osd/ReplicatedPG: clear whiteout when writing into cache tier  [Sage Weil | 2013-10-22 | 2 | -0/+22]
|   If we have a whiteout object and then write over it, clear the whiteout flag.
|   Signed-off-by: Sage Weil <sage@inktank.com>
* osd/ReplicatedPG: set whiteout in cache pool on delete  [Sage Weil | 2013-10-22 | 2 | -8/+27]
|   If we delete an object in the cache pool, set the whiteout flag instead of removing the on-disk object.
|   Signed-off-by: Sage Weil <sage@inktank.com>
* os/ObjectStore: fix RMATTRS encoding  [Sage Weil | 2013-10-22 | 1 | -1/+1]
|   Apparently nobody uses this!
|   Signed-off-by: Sage Weil <sage@inktank.com>
* ceph_test_rados_api_tier: verify delete creates whiteouts  [Sage Weil | 2013-10-22 | 1 | -0/+63]
|   Signed-off-by: Sage Weil <sage@inktank.com>
* osd/ReplicatedPG: ENOENT when deleting a whiteout  [Sage Weil | 2013-10-22 | 1 | -1/+1]
|   Signed-off-by: Sage Weil <sage@inktank.com>
* osd/ReplicatedPG: create whiteout on promote ENOENT  [Sage Weil | 2013-10-22 | 2 | -9/+34]
|   If we try to fetch an object from the base tier and it is not present, we can create a whiteout object.
|   Signed-off-by: Sage Weil <sage@inktank.com>
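
The whiteout behaviour described across the preceding commits (set on delete, cleared on write, ENOENT when reading or deleting a whiteout, created when a promote finds nothing in the base tier) can be modelled with a toy two-tier store. `ToyTier`, `CacheObject`, and every method name are hypothetical; flushing back to the base tier is not modelled, and unlike the real OSD this toy always promotes before a delete.

```cpp
#include <cassert>
#include <cerrno>
#include <map>
#include <string>

struct CacheObject {
  std::string data;
  bool whiteout = false;  // object is present in the cache but "deleted"
};

struct ToyTier {
  std::map<std::string, std::string> base;   // base tier contents
  std::map<std::string, CacheObject> cache;  // cache tier contents

  // Ensure the object has a cache entry. If the base tier does not have
  // it either, record that fact as a whiteout.
  CacheObject& promote(const std::string& oid) {
    auto it = cache.find(oid);
    if (it != cache.end()) return it->second;
    CacheObject obj;
    auto b = base.find(oid);
    if (b == base.end()) obj.whiteout = true;
    else obj.data = b->second;
    return cache.emplace(oid, obj).first->second;
  }

  int read(const std::string& oid, std::string* out) {
    CacheObject& o = promote(oid);
    if (o.whiteout) return -ENOENT;
    *out = o.data;
    return 0;
  }

  // Full overwrite: writing over a whiteout clears the flag.
  int write(const std::string& oid, const std::string& data) {
    CacheObject& o = cache[oid];
    o.data = data;
    o.whiteout = false;
    return 0;
  }

  // Delete marks a whiteout instead of erasing the cache entry;
  // deleting something that is already a whiteout is ENOENT.
  int remove(const std::string& oid) {
    CacheObject& o = promote(oid);
    if (o.whiteout) return -ENOENT;
    o.data.clear();
    o.whiteout = true;
    return 0;
  }
};
```

Keeping the entry around as a whiteout lets the cache answer later requests for that object with ENOENT without consulting the base tier again.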
* ceph_test_rados_api_tier: add simple promote-on-read test  [Sage Weil | 2013-10-22 | 1 | -0/+70]
|   Signed-off-by: Sage Weil <sage@inktank.com>
* ceph_test_rados_api_tier: rename tests  [Sage Weil | 2013-10-22 | 1 | -3/+3]
|   Signed-off-by: Sage Weil <sage@inktank.com>
* osd/ReplicatedPG: use simple_repop_{create,submit} for finish_promote  [Sage Weil | 2013-10-22 | 1 | -13/+6]
|   Signed-off-by: Sage Weil <sage@inktank.com>
* osd/ReplicatedPG: move r<0 handling into finish_promote()  [Sage Weil | 2013-10-22 | 2 | -19/+22]
|   Less logic in the header, and this will let us handle ENOENT with a whiteout.
|   Signed-off-by: Sage Weil <sage@inktank.com>
* Merge remote-tracking branch 'gh/wip-promote-copies' into wip-tier  [Sage Weil | 2013-10-22 | 10 | -190/+529]
|\
| * OSD: object_copy_data_t should take advantage of bufferlist-based getattrs  [Greg Farnum | 2013-10-17 | 3 | -6/+3]
| |   Now we don't need to do the silly bufferlist-bufferptr non-magic.
| |   Signed-off-by: Greg Farnum <greg@inktank.com>
| * ObjectStore: add a bufferlist-based getattrs() function  [Greg Farnum | 2013-10-17 | 1 | -0/+10]
| |   Signed-off-by: Greg Farnum <greg@inktank.com>
| * workunits: break down cache pool tests to be more precise; expand some  [Greg Farnum | 2013-10-17 | 1 | -22/+41]
| |   Signed-off-by: Greg Farnum <greg@inktank.com>
| * workunits: check errors propagate on cache pools in caching_redirects.sh  [Greg Farnum | 2013-10-17 | 1 | -0/+3]
| |   Signed-off-by: Greg Farnum <greg@inktank.com>
| * ReplicatedPG: promote: handle failed promotes  [Greg Farnum | 2013-10-17 | 1 | -2/+18]
| |   If we get an error back, reply to the client directly and remove the op which triggered promotion from our blocked op queue.
| |   Signed-off-by: Greg Farnum <greg@inktank.com>
| * ReplicatedPG: promote: add the OpRequest to the Callback  [Greg Farnum | 2013-10-17 | 2 | -3/+5]
| |   This way we can do stuff to it, and we're about to.
| |   Signed-off-by: Greg Farnum <greg@inktank.com>
| * ReplicatedPG: promote: move the promote functionality into finish_promote  [Greg Farnum | 2013-10-17 | 2 | -36/+43]
| |   This way we can have a couple of functions that handle each type of case, and let the PromoteCallback choose between them. That's much better than doing it all inline in the callback.
| |   Signed-off-by: Greg Farnum <greg@inktank.com>
| * ReplicatedPG: copy: don't return from finish_copyfrom  [Greg Farnum | 2013-10-17 | 2 | -5/+6]
| |   The return value is meaningless; nothing in this function can fail.
| |   Signed-off-by: Greg Farnum <greg@inktank.com>
| * ReplicatedPG: promote: first draft pass at doing object promotion  [Greg Farnum | 2013-10-17 | 2 | -6/+97]
| |   This is not yet at all complete -- among other things, it will retry forever on any object which doesn't exist in the underlying pool. But it demonstrates the approach reasonably clearly.
| |   Signed-off-by: Greg Farnum <greg@inktank.com>
| * ReplicatedPG: copy: switch out the CopyCallback interface  [Greg Farnum | 2013-10-17 | 2 | -45/+55]
| |   The tuple was already unwieldy with 4 members; I didn't want to add more. Instead, create a new CopyResults struct which contains all the object info and completion data, and pass the retval and a CopyResults* in the CopyCallbackResults tuple.
| * Objecter: expose the copy-get()'ed object's category  [Greg Farnum | 2013-10-17 | 3 | -5/+12]
| |   In the OSD, store the category in the CopyOp using this.
| |   Signed-off-by: Greg Farnum <greg@inktank.com>
| * osd: add category to object_copy_data_t  [Greg Farnum | 2013-10-17 | 3 | -0/+4]
| |   We don't bump the encoding version -- and stick it in the middle -- since it's still brand-new. For simplicity, we encode it unconditionally rather than trying to embed it alongside the attrs or with its own "complete" flag in the cursor.
| |   Signed-off-by: Greg Farnum <greg@inktank.com>
| * OSD: add back CEPH_OSD_OP_COPY_GET, and use it in the Objecter  [Greg Farnum | 2013-10-17 | 6 | -3/+41]
| |   This one is encoded with version information. We are not doing anything to control which op gets sent by the client, but after discussion with Sam we think this op isn't accessible enough to clients (right now it's only triggered by a client sending copy-from, which can only happen via ceph-test-rados) to require compatibility versioning.
| |   Signed-off-by: Greg Farnum <greg@inktank.com>
| * OSD: rename CEPH_OSD_OP_COPY_GET -> CEPH_OSD_OP_COPY_GET_CLASSIC  [Greg Farnum | 2013-10-17 | 6 | -8/+10]
| |   In order to introduce versioning of copy-get, we need to make it a different op that has the versioning infrastructure from the start.
| |   Signed-off-by: Greg Farnum <greg@inktank.com>
| * ReplicatedPG: copy: move the COPY_GET implementation into its own function  [Greg Farnum | 2013-10-17 | 2 | -75/+85]
| |   It was getting long, isn't terribly dependent on access to do_osd_ops() state, and will be easier to make generic as its own function.
| |   Signed-off-by: Greg Farnum <greg@inktank.com>
| * osd: Add a new object_copy_data_t, and use it in the OSD/Objecter  [Greg Farnum | 2013-10-17 | 5 | -31/+119]
| |   Right now this is very primitive, but we're about to extend it to deal with request versioning appropriately, and adding in some extra fields. Sadly we are doing a little extra copying in the Objecter as a result, but too bad -- being able to do updates will be worth it.
| |   Signed-off-by: Greg Farnum <greg@inktank.com>
| * ReplicatedPG: cache: don't handle cache if the obc is blocked  [Greg Farnum | 2013-10-17 | 1 | -0/+4]
| |   Right now the only way that can happen is if we're in the middle of a promote!
| |   Signed-off-by: Greg Farnum <greg@inktank.com>
| * ReplicatedPG: copy: add a C_KickBlockedObject  [Greg Farnum | 2013-10-17 | 1 | -0/+11]
| |   As the name says, you give it an obc and it kicks the block list when finish()ed.
| |   Signed-off-by: Greg Farnum <greg@inktank.com>
| * ReplicatedPG: add a Context *ondone to RepGathers  [Greg Farnum | 2013-10-17 | 2 | -6/+19]
| |   Make a few changes to make sure we trigger it when appropriate. We'll use this shortly for object promotion, and perhaps for other things in future.
| |   Signed-off-by: Greg Farnum <greg@inktank.com>
| * ReplicatedPG: copy: rename CopyOp::version -> user_version  [Greg Farnum | 2013-10-08 | 2 | -6/+6]
| |   This version is a user version, and since we're in the OSD we should call it such. (In particular, we may want to keep track of the internal version too when doing cache promotes.)
| |   Signed-off-by: Greg Farnum <greg@inktank.com>
| * ReplicatedPG: copy: do not let start_copy() return error codes  [Greg Farnum | 2013-10-08 | 2 | -7/+13]
| |   There's no failure it can actually run into, and handling error codes in some of its callers is going to be a pain. While we're here, document the parameters.
| |   Signed-off-by: Greg Farnum <greg@inktank.com>
* | librados, osd: list and get HitSets via librados  [Sage Weil | 2013-10-22 | 11 | -9/+322]
| |   Signed-off-by: Sage Weil <sage@inktank.com>
* | osd/ReplicatedPG: add basic HitSet tracking  [Sage Weil | 2013-10-22 | 5 | -10/+231]
| |   Signed-off-by: Sage Weil <sage@inktank.com>
* | osd: add hit_set_* parameters to pg_pool_t  [Sage Weil | 2013-10-22 | 2 | -4/+51]
| |   Add pool properties to control what type of HitSet we want to use, along with some (mostly generic) parameters.
| |   Signed-off-by: Sage Weil <sage@inktank.com>
* | librados: add wait_for_latest_map()  [Sage Weil | 2013-10-22 | 7 | -0/+71]
| |   There are times when we need to make sure the client has the latest osdmap, for example after sending a mon command modifying pool properties.
| |   Signed-off-by: Sage Weil <sage@inktank.com>
* | librados: expose methods for calculating object hash position  [Sage Weil | 2013-10-22 | 4 | -0/+28]
| |   Signed-off-by: Sage Weil <sage@inktank.com>
* | osdc/Objecter: expose methods for getting object hash position and pg  [Sage Weil | 2013-10-22 | 2 | -0/+21]
| |   Signed-off-by: Sage Weil <sage@inktank.com>
* | osd: capture hashing of objects to hash positions/pgs in pg_pool_t  [Sage Weil | 2013-10-22 | 3 | -12/+26]
| |   The hashing is dependent on pool properties; capture (more of) it in a method instead of having it in OSDMap.
| |   Signed-off-by: Sage Weil <sage@inktank.com>
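
To illustrate why the mapping from a hash position to a placement group depends on pool properties, here is a minimal sketch of a "stable modulo" fold driven by the pool's PG count. The function names are hypothetical, the object-name hash itself is omitted (a 32-bit hash position is taken as input), and nothing here is copied from the commit.

```cpp
#include <cassert>
#include <cstdint>

// Smallest (2^k - 1) mask that covers every PG index below pg_num.
inline uint32_t pg_mask(uint32_t pg_num) {
  assert(pg_num > 0);
  uint32_t m = 1;
  while (m < pg_num) m <<= 1;
  return m - 1;
}

// Fold x into [0, b) so that most inputs keep their bucket as b grows
// towards the next power of two.
inline uint32_t stable_mod(uint32_t x, uint32_t b, uint32_t bmask) {
  if ((x & bmask) < b) return x & bmask;
  return x & (bmask >> 1);
}

// Map a 32-bit hash position to a PG index for a pool with pg_num PGs.
inline uint32_t hash_to_pg(uint32_t hash_pos, uint32_t pg_num) {
  return stable_mod(hash_pos, pg_num, pg_mask(pg_num));
}
```

Because the result depends on `pg_num`, a helper like this sits naturally next to the pool description rather than in a map-wide class, which is the kind of move the commit message describes.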