Commit message | Author | Age | Files | Lines
* mds: do not allow GLAZYIO in mix->sync state [wip-mds-lazyio-cuttlefish-minimal] | Sage Weil | 2013-07-06 | 1 | -2/+2
| GLAZYIO is not allowed in SYNC, so we cannot allow it in the preceding gather state. I verified the other GLAZYIO rules look ok. We should make a validator to confirm that no gather state includes caps that its target state does not... or at least assert as much in eval_gather(). Backport: cuttlefish Signed-off-by: Sage Weil <sage@inktank.com> (cherry picked from commit b88938e5a646fbf175a7135e872bcb2d1afafbb8)
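The validator suggested in this message could look something like the minimal sketch below. The state names, the cap sets, and the `GATHER_TARGETS` table are illustrative assumptions, not the MDS's actual lock tables; only the rule being checked (a gather state must not allow caps that its target state forbids) comes from the commit message.

```python
# Hypothetical cap sets per lock state; the real MDS tables differ.
ALLOWED_CAPS = {
    "SYNC": {"GSHARED", "GCACHE", "GRD"},
    "MIX": {"GRD", "GWR", "GLAZYIO"},
    "MIX_SYNC": {"GRD"},  # gather state on the way from MIX to SYNC
}

# Hypothetical map from each gather state to the state it resolves into.
GATHER_TARGETS = {"MIX_SYNC": "SYNC"}


def find_violations(allowed, targets):
    """Return (gather_state, extra_caps) pairs where a gather state
    allows caps that its target state does not."""
    violations = []
    for gather, target in targets.items():
        extra = allowed[gather] - allowed[target]
        if extra:
            violations.append((gather, sorted(extra)))
    return violations
```

Running `find_violations()` once at startup, or asserting that it returns an empty list, would catch a table entry like the one fixed here.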
* client: set issue_seq (not seq) in cap release | Sage Weil | 2013-06-08 | 1 | -1/+1
| We regularly have been observing a stall where the MDS is blocked waiting for a cap revocation (Ls, in our case) and never gets a reply. We finally tracked down the sequence:
|
| - mds issues cap seq 1 to client
| - mds does revocation (seq 2)
| - client replies
| - much time goes by
| - client trims inode from cache, sends release with seq == 2
| - mds ignores release because its issue_seq is 1
| - mds later tries to revoke other caps
| - client discards message because it doesn't have the inode in cache
|
| The problem is simply that we are using seq instead of issue_seq in the cap release message. Note that the other release call site in encode_inode_release() is correct. That one is much more commonly triggered by short tests, as compared to this case where the inode needs to get pushed out of the client cache.
|
| Signed-off-by: Sage Weil <sage@inktank.com>
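The sequence above can be replayed with a toy model. The class and method names below are hypothetical, and the model assumes only what the message states: a revocation advances seq but not issue_seq, and the MDS honours a release only when it carries the current issue_seq.

```python
class MdsCap:
    """Toy model of the MDS side of one capability (illustrative only)."""

    def __init__(self):
        self.seq = 0        # bumped on every cap message (issue or revoke)
        self.issue_seq = 0  # seq of the most recent issue
        self.released = False

    def issue(self):
        self.seq += 1
        self.issue_seq = self.seq

    def revoke(self):
        self.seq += 1  # issue_seq is left unchanged

    def handle_release(self, release_seq):
        # A release is honoured only if it matches the current issue_seq.
        if release_seq == self.issue_seq:
            self.released = True
        return self.released
```

After `issue()` followed by `revoke()`, a release carrying `seq` (2) is ignored, while one carrying `issue_seq` (1) is accepted.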
* ceph-fuse: create finisher threads after fork() | Sage Weil | 2013-06-08 | 1 | -28/+32
| The ObjectCacher and MonClient classes both instantiate Finisher threads. We need to make sure they are created *after* the fork(2) or else the process will fail to join() them on shutdown, and the threads will not exist while fuse is doing useful work. Put CephFuse on the heap and move all this initialization into the child block, and make sure errors are passed back to the parent. Fix-proposed-by: Alexandre Marangone <alexandre.maragone@inktank.com> Signed-off-by: Sage Weil <sage@inktank.com> (cherry picked from commit 4fa5f99a40792341d247e51488c37301da3c4e4f)
* osd: do not include logbl in scrub map | Sage Weil | 2013-06-07 | 3 | -14/+6
| This is a potentially huge object/file, usually prefixed by a zeroed region on disk, that is not used by scrub at all. It dates back to f51348dc8bdd5071b7baaf3f0e4d2e0496618f08 (2008) and the original version of scrub. This *might* fix #4179. It is not a leak per se, but I observed 1GB scrub messages going over the wire. Maybe the allocations are causing fragmentation, or the sub_op queues are growing. Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Samuel Just <sam.just@inktank.com> (cherry picked from commit 0b036ecddbfd82e651666326d6f16b3c000ade18)
* rgw: handle deep uri resources | Yehuda Sadeh | 2013-06-07 | 1 | -0/+25
| In case of deep uri resources (ones created beyond a single level of hierarchy, e.g. auth/v1.0) we want to create a new empty handler for the path if no handler exists. E.g., for auth/v1.0 we need to have a handler for 'auth', otherwise the default S3 handler will be used, which we don't want. Signed-off-by: Yehuda Sadeh <yehuda@inktank.com> Reviewed-by: Greg Farnum <greg@inktank.com> (cherry picked from commit ad3934e335399f7844e45fcfd17f7802800d2cb3)
* rgw: fix get_resource_mgr() to correctly identify resource | Yehuda Sadeh | 2013-06-07 | 1 | -2/+2
| Fixes: #5262 The original test was not comparing the correct string, and in effect only checked whether a substring of the uri matched the resource. Signed-off-by: Yehuda Sadeh <yehuda@inktank.com> Reviewed-by: Greg Farnum <greg@inktank.com> (cherry picked from commit 8d55b87f95d59dbfcfd0799c4601ca37ebb025f5)
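As a rough illustration of the difference between a loose substring-style check and an exact resource match, consider the sketch below. The function names and the prefix-based matching are assumptions made for illustration; the actual get_resource_mgr() code is C++ and may compare its strings differently.

```python
def matches_loose(uri, resource):
    # Loose check: any uri that merely begins with the resource string
    # matches, even when the next character is not a path separator.
    return uri.startswith(resource)


def matches_resource(uri, resource):
    # Stricter check: the resource must either be the whole uri or be
    # followed by "/" so that it lines up with a full path component.
    return uri == resource or uri.startswith(resource + "/")
```

With the loose check, a request for `/authority` would be routed to the handler registered for `/auth`; the stricter check rejects it while still accepting `/auth/v1.0`.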
* rgw: add 'cors' to the list of sub-resources | Yehuda Sadeh | 2013-06-07 | 1 | -0/+1
| | | | | | | | | | | Fixes: #5261 Backport: cuttlefish Add 'cors' to the list of sub-resources, otherwise auth signing is wrong. Signed-off-by: Yehuda Sadeh <yehuda@inktank.com> Reviewed-by: Greg Farnum <greg@inktank.com> (cherry picked from commit 9a0a9c205b8c24ca9c1e05b0cf9875768e867a9e)
* mon: fix preforker exit behavior | Sage Weil | 2013-06-06 | 2 | -2/+7
| In 3c5706163b72245768958155d767abf561e6d96d we made exit() not actually exit so that the leak checking would behave for a non-forking case. That is only needed for the normal exit case; every other case expects exit() to actually terminate and not continue execution. Instead, make a signal_exit() method that signals the parent (if any) and then lets you return. exit() goes back to its usual behavior, fixing the many other calls in main(). Backport: cuttlefish Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Joao Eduardo Luis <joao.luis@inktank.com> (cherry picked from commit 92d085f7fd6224ffe5b7651c1f83b093f964b5cd)
* rados.py: correct some C types | Josh Durgin | 2013-06-06 | 1 | -7/+7
| | | | | | | | | | trunc was getting size_t instead of uint64_t, leading to bad results in 32-bit environments. Explicitly cast to the desired type everywhere, so it's clear the correct type is being used. Fixes: #5233 Signed-off-by: Josh Durgin <josh.durgin@inktank.com> (cherry picked from commit 6dd7d469000144b499af84bda9b735710bb5cec3)
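The effect of passing a 64-bit size through a narrower C type can be reproduced with ctypes alone. This sketch assumes nothing about librados itself: `c_uint32` merely stands in for size_t on a 32-bit build, and both helper names are hypothetical.

```python
import ctypes


def truncate_like_32bit(size):
    # c_uint32 stands in for size_t on a 32-bit build; ctypes does no
    # overflow checking, so the high bits are silently dropped.
    return ctypes.c_uint32(size).value


def as_uint64(size):
    # Explicit cast to the 64-bit type, as the commit does everywhere.
    return ctypes.c_uint64(size).value
```

For a size just above 4 GiB, `truncate_like_32bit(2**32 + 5)` yields 5, while `as_uint64(2**32 + 5)` preserves the full value.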
* v0.61.3 [v0.61.3] | Gary Lowell | 2013-06-05 | 2 | -1/+7
|
* os/LevelDBStore: only remove logger if non-null | Sage Weil | 2013-06-05 | 1 | -1/+2
| | | | | | Signed-off-by: Sage Weil <sage@inktank.com> (cherry picked from commit ce67c58db7d3e259ef5a8222ef2ebb1febbf7362) Fixes: #5255
* test_librbd: use correct type for varargs snap test | Josh Durgin | 2013-06-04 | 1 | -7/+9
| | | | | | | | uint64_t is passed in, but int was extracted. This fails on 32-bit builds. Fixes: #5220 Signed-off-by: Josh Durgin <josh.durgin@inktank.com> (cherry picked from commit 17029b270dee386e12e5f42c2494a5feffd49b08)
* os/LevelDBStore: fix merge loop | Sage Weil | 2013-06-04 | 1 | -6/+7
| | | | | | | | We were double-incrementing p, both in the for statement and in the body. While we are here, drop the unnecessary else's. Signed-off-by: Sage Weil <sage@inktank.com> (cherry picked from commit eb6d5fcf994d2a25304827d7384eee58f40939af)
* msgr: add get_messenger() to Connection | Sage Weil | 2013-06-02 | 1 | -0/+4
| | | | | | This was part of commit 27381c0c6259ac89f5f9c592b4bfb585937a1cfc. Signed-off-by: Sage Weil <sage@inktank.com>
* mon: start lease timer from peon_init() | Sage Weil | 2013-06-02 | 2 | -4/+16
| In the scenario:
|
| - leader wins, peons lose
| - leader sees it is too far behind on paxos and bootstraps
| - leader tries to sync with someone, waits for a quorum of the others
| - peons sit around forever waiting
|
| The problem is that they never time out because paxos never issues a lease, which is the normal timeout that lets them detect a leader failure. Avoid this by starting the lease timeout as soon as we lose the election. The timeout callback just does a bootstrap and does not rely on any other state.
|
| I see one possible danger here: there may be some "normal" cases where the leader takes a long time to issue its first lease that we currently tolerate, but won't with this new check in place. I hope that raising the lease interval/timeout or reducing the allowed paxos drift will make that a non-issue. If it is problematic, we will need a separate explicit "i am alive" from the leader while it is getting ready to issue the lease to prevent a live-lock.
|
| Backport: cuttlefish, bobtail
| Signed-off-by: Sage Weil <sage@inktank.com>
| Reviewed-by: Greg Farnum <greg@inktank.com>
| (cherry picked from commit f1ccb2d808453ad7ef619c2faa41a8f6e0077bd9)
* mon: discard messages from disconnected clients | Sage Weil | 2013-06-02 | 2 | -1/+13
| | | | | | | | | | If the client is not connected, discard the message. They will reconnect and resend anyway, so there is no point in processing it twice (now and later). Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Greg Farnum <greg@inktank.com> (cherry picked from commit fb3cd0c2a8f27a1c8d601a478fd896cc0b609011)
* msgr: add Messenger reference to Connection | Sage Weil | 2013-06-02 | 4 | -4/+7
| | | | | | | This allows us to get the messenger associated with a connection. Signed-off-by: Sage Weil <sage@inktank.com> (cherry picked from commit 92a558bf0e5fee6d5250e1085427bff22fe4bbe4)
* mon/Paxos: adjust trimming defaults up; rename options | Sage Weil | 2013-06-02 | 2 | -5/+7
| - trim more at a time (by an order of magnitude)
| - rename fields to paxos_trim_{min,max}; only trim when there are min items that are trimmable, and trim at most max items at a time.
| - adjust the paxos_service_trim_{min,max} values up by a factor of 2.
|
| Since we are compacting every time we trim, adjusting these up means less frequent compactions and less overall work for the monitor.
|
| Signed-off-by: Sage Weil <sage@inktank.com>
| Reviewed-by: Greg Farnum <greg@inktank.com>
| (cherry picked from commit 6b8e74f0646a7e0d31db24eb29f3663fafed4ecc)
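The min/max rule described above can be sketched as follows. The function name, its parameters, and the `keep` count of recent versions to retain are assumptions for illustration; only the "trim when at least min are trimmable, and at most max at a time" behaviour comes from the message.

```python
def plan_trim(first_committed, last_committed, keep, trim_min, trim_max):
    """Return the half-open (start, end) range of versions to trim,
    or None when fewer than trim_min versions are trimmable."""
    trimmable = (last_committed - keep) - first_committed
    if trimmable < trim_min:
        return None  # not worth a trim (and a compaction) yet
    count = min(trimmable, trim_max)
    return (first_committed, first_committed + count)
```

Raising `trim_min` makes trims, and therefore compactions, less frequent; `trim_max` bounds how much work a single trim does.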
* common/Preforker: fix warnings | Sage Weil | 2013-06-02 | 1 | -2/+3
| | | | | Signed-off-by: Sage Weil <sage@inktank.com> (cherry picked from commit a284c9ece85f11d020d492120be66a9f4c997416)
* fix test users of LevelDBStore | Sage Weil | 2013-06-02 | 6 | -6/+6
| | | | | | | Need to pass in cct. Signed-off-by: Sage Weil <sage@inktank.com> (cherry picked from commit 446e0770c77de5d72858dcf7a95c5b19f642cf98)
* mon: destroy MonitorDBStore before g_ceph_context | Sage Weil | 2013-06-02 | 1 | -8/+9
| Put it on the heap so that we can destroy it before the g_ceph_context cct that it references. This fixes a crash like
|
|  *** Caught signal (Segmentation fault) **
|  in thread 4034a80
|  ceph version 0.63-204-gcf9aa7a (cf9aa7a0037e56eada8b3c1bb59d59d0bfe7bba5)
|  1: ceph-mon() [0x59932a]
|  2: (()+0xfcb0) [0x4e41cb0]
|  3: (Mutex::Lock(bool)+0x1b) [0x6235bb]
|  4: (PerfCountersCollection::remove(PerfCounters*)+0x27) [0x6a0877]
|  5: (LevelDBStore::~LevelDBStore()+0x1b) [0x582b2b]
|  6: (LevelDBStore::~LevelDBStore()+0x9) [0x582da9]
|  7: (main()+0x1386) [0x48db16]
|  8: (__libc_start_main()+0xed) [0x658076d]
|  9: ceph-mon() [0x4909ad]
|
| Signed-off-by: Sage Weil <sage@inktank.com>
| (cherry picked from commit df2d06db6f3f7e858bdadcc8cd2b0ade432df413)
* mon: fix leak of health_monitor and config_key_service | Sage Weil | 2013-06-02 | 10 | -68/+27
| | | | | | | | Switch to using regular pointers here. The lifecycle of these services is very simple such that refcounting is overkill. Signed-off-by: Sage Weil <sage@inktank.com> (cherry picked from commit c888d1d3f1b77e62d1a8796992e918d12a009b9d)
* mon: return instead of exit(3) via preforker | Sage Weil | 2013-06-02 | 2 | -3/+3
| | | | | | | | This lets us run all the locally-scoped dtors so that leak checking will work. Signed-off-by: Sage Weil <sage@inktank.com> (cherry picked from commit 3c5706163b72245768958155d767abf561e6d96d)
* os/LevelDBStore: add perfcounters | Sage Weil | 2013-06-02 | 4 | -35/+96
| | | | | Signed-off-by: Sage Weil <sage@inktank.com> (cherry picked from commit 7802292e0a49be607d7ba139b44d5ea1f98e07e6)
* mon: make compaction bounds overlap | Sage Weil | 2013-06-02 | 3 | -6/+9
| | | | | | | | | | When we trim items N to M, compact over range (N-1) to M so that the items in the queue will share bounds and get merged. There is no harm in compacting over a larger range here when the lower bound is a key that doesn't exist anyway. Signed-off-by: Sage Weil <sage@inktank.com> (cherry picked from commit a47ca583980523ee0108774b466718b303bd3f46)
* os/LevelDBStore: merge adjacent ranges in compactionqueue | Sage Weil | 2013-06-02 | 2 | -9/+34
| | | | | | | | If we get behind and multiple adjacent ranges end up in the queue, merge them so that we fire off compaction on larger ranges. Signed-off-by: Sage Weil <sage@inktank.com> (cherry picked from commit f628dd0e4a5ace079568773edfab29d9f764d4f0)
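A minimal sketch of merging queued ranges is shown below. It assumes ranges are (start, end) pairs that count as adjacent when they share a bound, and it sorts the queue first for simplicity; the function name is hypothetical, and the real C++ code works on string keys and may walk the queue in a different order.

```python
def merge_ranges(queue):
    """Merge queued (start, end) key ranges that touch or overlap."""
    merged = []
    for start, end in sorted(queue):
        if merged and start <= merged[-1][1]:
            # Shares a bound with (or overlaps) the previous range:
            # extend that range instead of queueing a new one.
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            merged.append((start, end))
    return merged
```

Two queued ranges such as (9, 20) and (20, 30) collapse into a single compaction over (9, 30), while a disjoint range like (40, 50) stays separate.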
* mon: compact trimmed range, not entire prefix | Sage Weil | 2013-06-02 | 2 | -11/+12
| This will reduce the work that leveldb is asked to do by only triggering compaction of the keys that were just trimmed. We may want to further reduce the work by compacting less frequently, but this is at least a step in that direction. Signed-off-by: Sage Weil <sage@inktank.com> (cherry picked from commit 6da4b20ca53fc8161485c8a99a6b333e23ace30e)
* mon/MonitorDBStore: allow compaction of ranges | Sage Weil | 2013-06-02 | 1 | -15/+30
| Allow a transaction to describe the compaction of a range of keys. Do this in a backward compatible way, such that older code will interpret the compaction of a prefix + range as compaction of the entire prefix. This allows us to avoid introducing any new feature bits. Signed-off-by: Sage Weil <sage@inktank.com> (cherry picked from commit ab09f1e5c1305a64482ebbb5a6156a0bb12a63a4) Conflicts: src/mon/MonitorDBStore.h
* os/LevelDBStore: allow compaction of key ranges | Sage Weil | 2013-06-02 | 2 | -18/+27
| | | | | Signed-off-by: Sage Weil <sage@inktank.com> (cherry picked from commit e20c9a3f79ccfeb816ed634ca25de29fc5975ea8)
* os/LevelDBStore: do compact_prefix() work asynchronously | Sage Weil | 2013-06-02 | 3 | -2/+60
| | | | | | | | | | | We generally do not want to block while compacting a range of leveldb. Push the blocking+waiting off to a separate thread. (leveldb will do what it can to avoid blocking internally; no reason for us to wait explicitly.) This addresses part of #5176. Signed-off-by: Sage Weil <sage@inktank.com> (cherry picked from commit 4af917d4478ec07734a69447420280880d775fa2)
* qa: rsync test: exclude /usr/local | Sage Weil | 2013-06-01 | 1 | -4/+4
| | | | | | | | Some plana have non-world-readable crap in /usr/local/samba. Avoid /usr/local entirely for that and any similar landmines. Signed-off-by: Sage Weil <sage@inktank.com> (cherry picked from commit 82211f2197241c4f3d3135fd5d7f0aa776eaeeb6)
* mon: fix uninitialized fields in MMonHealth | Sage Weil | 2013-05-31 | 1 | -6/+3
| | | | | | Backport: cuttlefish Signed-off-by: Sage Weil <sage@inktank.com> (cherry picked from commit d7e2ab1451e284cd4273cca47eec75e1d323f113)
* PGLog: only add entry to caller_ops in add() if reqid_is_indexed() | Samuel Just | 2013-05-31 | 1 | -1/+2
| | | | | | Fixes: #5216 Signed-off-by: Samuel Just <sam.just@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com>
* PG: don't write out pg map epoch every handle_activate_map | Samuel Just | 2013-05-31 | 3 | -2/+18
| We don't actually need to write out the pg map epoch on every activate_map as long as:
|
| a) the osd does not trim past the oldest pg map persisted
| b) the pg does update the persisted map epoch from time to time.
|
| To that end, we now keep a reference to the last map persisted. The OSD already does not trim past the oldest live OSDMapRef. Second, handle_activate_map will trim if the difference between the current map and the last_persisted_map is large enough.
|
| Fixes: #4731
| Signed-off-by: Samuel Just <sam.just@inktank.com>
| Reviewed-by: Greg Farnum <greg@inktank.com>
| (cherry picked from commit 2c5a9f0e178843e7ed514708bab137def840ab89)
|
| Conflicts:
|   src/common/config_opts.h
|   src/osd/PG.cc
|   - last_persisted_osdmap_ref gets set in the non-static PG::write_info
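The idea of writing out the epoch only occasionally can be sketched as a small tracker. The class name, the `max_lag` threshold, and the write counter are hypothetical; the sketch only assumes that the epoch is persisted when the current map has moved far enough past the last persisted one.

```python
class PgEpochTracker:
    """Toy model of lazily persisting the pg map epoch."""

    def __init__(self, max_lag):
        self.max_lag = max_lag   # hypothetical threshold, in epochs
        self.last_persisted = 0  # epoch of the last map written out
        self.writes = 0          # number of writes performed

    def handle_activate_map(self, epoch):
        # Persist only when we have drifted far enough from the last write.
        if epoch - self.last_persisted > self.max_lag:
            self.last_persisted = epoch
            self.writes += 1
            return True
        return False
```

With `max_lag=10`, feeding epochs 1 through 30 produces two writes (at epochs 11 and 22) instead of thirty.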
* upstart: handle upper case in cluster name and id | Alexandre Marangone | 2013-05-31 | 4 | -4/+4
| | | | | Signed-off-by: Alexandre Marangone <alexandre.marangone@inktank.com> (cherry picked from commit 851619ab6645967e5d7659d9b0eea63d5c402b15)
* OSDMonitor: skip new pools in update_pools_status() and get_pools_health() | Samuel Just | 2013-05-31 | 1 | -0/+4
| New pools won't be full. mon->pgmon()->pg_map.pg_pool_sum[poolid] will implicitly create an entry for poolid, causing register_new_pgs() to assume that the newly created pgs in the new pool are in fact the result of a split, which prevents MOSDPGCreate messages from being sent out. Fixes: #4813 Backport: cuttlefish Signed-off-by: Samuel Just <sam.just@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com> (cherry picked from commit 0289c445be0269157fa46bbf187c92639a13db46)
* rgw: only append prefetched data if reading from head | Yehuda Sadeh | 2013-05-31 | 1 | -17/+17
| | | | | | | | | | | | Fixes: #5209 Backport: bobtail, cuttlefish If the head object wrongfully contains data, but according to the manifest we don't read from the head, we shouldn't copy the prefetched data. Also fix the length calculation for that data. Signed-off-by: Yehuda Sadeh <yehuda@inktank.com> Reviewed-by: Greg Farnum <greg@inktank.com> (cherry picked from commit c5fc52ae0fc851444226abd54a202af227d7cf17)
* rgw: don't copy object idtag when copying object | Yehuda Sadeh | 2013-05-31 | 1 | -0/+1
| Fixes: #5204 When copying an object we ended up also copying the original object idtag, which overrode the newly generated one. When refcount put is called with the wrong idtag the count doesn't go down. Signed-off-by: Yehuda Sadeh <yehuda@inktank.com> Reviewed-by: Greg Farnum <greg@inktank.com> (cherry picked from commit b1312f94edc016e604f1d05ccfe2c788677f51d1)
* debian: sync up postinst and prerm with latest | Sage Weil | 2013-05-30 | 6 | -5/+114
| - do not use invoke-rc.d for upstart
| - do not stop daemons on upgrade
| - misc other cleanups
|
| This corresponds to the state of master as of cf9aa7a.
|
| Signed-off-by: Sage Weil <sage@inktank.com>
* mon: Monitor: backup monmap using all ceph features instead of quorum's | Joao Eduardo Luis | 2013-05-30 | 1 | -1/+1
| When a monitor is freshly created and for some reason its initial sync is aborted, it will end up with an incorrect backup monmap. This monmap is incorrect in the sense that it will not contain the monitor's names as it will expect on the next run. This results from our using the quorum features to encode the monmap when backing it up, instead of CEPH_FEATURES_ALL. Fixes: #5203 Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com> (cherry picked from commit 626de387e617db457d6d431c16327c275b0e8a34)
* osd: do not assume head obc object exists when getting snapdir | Sage Weil | 2013-05-30 | 1 | -0/+5
| For a list-snaps operation on the snapdir, do not assume that the obc for the head means the object exists. This fixes a race between a head deletion and a list-snaps that wrongly returns ENOENT, triggered by the DiffIterateStress test when thrashing OSDs. Fixes: #5183 Backport: cuttlefish Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Samuel Just <sam.just@inktank.com> (cherry picked from commit 29e4e7e316fe3f3028e6930bb5987cfe3a5e59ab)
* osd: initialize new_state field when we use it | Sage Weil | 2013-05-29 | 1 | -1/+4
| | | | | | | | | | | If we use operator[] on a new int field its value is undefined; avoid reading it or using |= et al until we initialize it. Fixes: #4967 Backport: cuttlefish, bobtail Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: David Zafman <david.zafman@inktank.com> (cherry picked from commit 50ac8917f175d1b107c18ecb025af1a7b103d634)
* HashIndex: sync top directory during start_split,merge,col_split | Samuel Just | 2013-05-28 | 1 | -3/+12
| | | | | | | | | | | | Otherwise, the links might be ordered after the in progress operation tag write. We need the in progress operation tag to correctly recover from an interrupted merge, split, or col_split. Fixes: #5180 Backport: cuttlefish, bobtail Signed-off-by: Samuel Just <sam.just@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com> (cherry picked from commit 5bca9c38ef5187c7a97916970a7fa73b342755ac)
* mon: Paxos: get rid of the 'prepare_bootstrap()' mechanism | Joao Eduardo Luis | 2013-05-27 | 4 | -26/+4
| We don't need it after all. If we are in the middle of some proposal, then we guarantee that said proposal is likely to be retried. If we haven't yet proposed, then it's forever more likely that a client will eventually retry the message that triggered this proposal. Basically, this mechanism attempted to fix a non-problem, and was in fact triggering some unforeseen issues that would have required increasing the code complexity for no good reason. Fixes: #5102 Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com> (cherry picked from commit e15d29094503f279d444eda246fc45c09f5535c9)
* mon: Paxos: finish queued proposals instead of clearing the list | Joao Eduardo Luis | 2013-05-27 | 1 | -4/+6
| By finishing these Contexts, we make sure the Contexts they enclose (to be called once the proposal goes through) will behave as they were initially planned: for instance, a C_Command() may retry the command if a -EAGAIN is passed to 'finish_contexts', while a C_Trimmed() will simply set 'going_to_trim' to false. This aims at fixing at least a bug in which Paxos will stop trimming if an election is triggered while a trim is queued but not yet finished. This happens because it is the C_Trimmed() context that is responsible for resetting 'going_to_trim' back to false. By clearing all the contexts on the proposal list instead of finishing them, we stay forever unable to trim Paxos again as 'going_to_trim' will stay True till the end of time as we know it. Fixes: #4895 Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com> (cherry picked from commit 586e8c2075f721456fbd40f738dab8ccfa657aa8)
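The difference between clearing and finishing the queued contexts can be shown with a toy model. `Trimmer` and its methods are hypothetical stand-ins: `_trimmed` plays the role of C_Trimmed(), and `finish_pending` plays the role of finish_contexts with a result code.

```python
class Trimmer:
    """Toy model of a trim flag reset by a completion callback."""

    def __init__(self):
        self.going_to_trim = False
        self.pending = []  # callbacks queued along with a proposal

    def queue_trim(self):
        self.going_to_trim = True
        # Stand-in for C_Trimmed: resets the flag whatever the result.
        self.pending.append(self._trimmed)

    def _trimmed(self, result):
        self.going_to_trim = False

    def clear_pending(self):
        # Old behaviour: drop the callbacks without running them.
        self.pending.clear()

    def finish_pending(self, result):
        # New behaviour: run every callback, passing the result along.
        callbacks, self.pending = self.pending, []
        for callback in callbacks:
            callback(result)
```

After `queue_trim()`, calling `clear_pending()` leaves `going_to_trim` stuck at True, whereas `finish_pending(-11)` (a negative EAGAIN-style code) resets it so a later trim can be scheduled.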
* mon: Paxos: finish_proposal() when we're finished recovering | Joao Eduardo Luis | 2013-05-27 | 1 | -0/+2
| | | | | Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com> (cherry picked from commit 2ff23fe784245f3b86bc98e0434b21a5318e0a7b)
* Merge branch 'wip_scrub_tphandle' into cuttlefish | Samuel Just | 2013-05-23 | 4 | -33/+66
|\ | | | | | | | | Fixes: #5159 Reviewed-by: Sage Weil <sage@inktank.com>
| * PG: ping tphandle during omap loop as well | Samuel Just | 2013-05-23 | 2 | -0/+8
| | | | | | | | Signed-off-by: Samuel Just <sam.just@inktank.com>
| * PG: reset timeout in _scan_list for each object, read chunk | Samuel Just | 2013-05-23 | 1 | -0/+2
| | | | | | | | Signed-off-by: Samuel Just <sam.just@inktank.com>
| * OSD,PG: pass tphandle down to _scan_list | Samuel Just | 2013-05-23 | 3 | -33/+56
| | | | | | | | Signed-off-by: Samuel Just <sam.just@inktank.com>