Commit message (Author; Age; Files; Lines)
* ceph_mon_kvstore_fix: new rework [wip-4521-tool] (Joao Eduardo Luis; 2013-04-18; 1 file; -101/+188)
| Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
* ceph_mon_kvstore_fix: rework fix (Joao Eduardo Luis; 2013-04-18; 1 file; -77/+206)
| Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
* ceph_mon_kvstore_fix: add in-mem class to perform dry runs (Joao Eduardo Luis; 2013-04-18; 1 file; -7/+29)
| Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
* mon: MonitorDBStore: allow function override when inheriting the class (Joao Eduardo Luis; 2013-04-18; 1 file; -4/+4)
| Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
* mon: MonitorDBStore: use stringify() (Joao Eduardo Luis; 2013-04-18; 1 file; -15/+6)
| Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
* ceph_mon_kvstore_fix: add a dry run (Joao Eduardo Luis; 2013-04-18; 1 file; -2/+65)
| Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
* ceph_mon_kvstore_fix: force '--i-am-sure' parameter (Joao Eduardo Luis; 2013-04-18; 1 file; -1/+32)
| Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
* mon: ceph_mon_kvstore_fix: recreate full versions when possible (Joao Eduardo Luis; 2013-04-18; 1 file; -13/+123)
| Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
* Merge pull request #208 from dalgaaf/wip-4521-build (João Eduardo Luís; 2013-04-18; 2 files; -0/+2)
|\
| | add ceph_mon_kvstore_fix to RPM/Debian packaging
| | Reviewed-by: Joao Eduardo Luis <joao.luis@inktank.com>
| * ceph.install: add ceph_mon_kvstore_fix (Danny Al-Gaaf; 2013-04-18; 1 file; -0/+1)
| | Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
| * ceph.spec.in: add ceph_mon_kvstore_fix to ceph package (Danny Al-Gaaf; 2013-04-18; 1 file; -0/+1)
|/
| Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
* mon: add tool to fix store conversion bug manually (Joao Eduardo Luis; 2013-04-18; 3 files; -0/+305)
| This tool is to be used to fix the cause of #4521, and it will take an old-format store and convert all the osdmap's full versions to the new-format k/v store.
| Fixes: #4521
| Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
* osd/PG.cc: initialize PG::flushed in constructor (Danny Al-Gaaf; 2013-04-17; 1 file; -0/+1)
| Initialize PG::flushed in constructor with false as described in doc/dev/osd_internals/pg.rst.
| Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
| (cherry picked from commit fb840c8ff75b0c66dfeed48e8558542fe3da4c24)
* Merge pull request #215 from ceph/wip-leveldb-config (Sage Weil; 2013-04-17; 5 files; -7/+104)
|\
| | os: bring leveldbstore options up to date
| | Reviewed-by: Sage Weil <sage@inktank.com>
| * config: provide settings for the LevelDB stores we use (Greg Farnum; 2013-04-16; 3 files; -0/+26)
| | Now that we can set up the LevelDB options internally, provide config options on the OSD and the Monitor. We leave the OSD values at the defaults for now as they're performance-sensitive, but we set new values on the Monitor so that it can scale to large PGMaps. (Previously there were issues with large PGMaps taking forever to write; these changes to the use of compression and the default block and write buffers counteract them.)
| | Since we pass these variables through, users who are interested in doing so now can test and tune them more appropriately.
| | Reported-by: Jim Schutt <jaschut@sandia.gov>
| | Signed-off-by: Greg Farnum <greg@inktank.com>
| * os: bring leveldbstore options up to date (Greg Farnum; 2013-04-12; 2 files; -7/+78)
| | LevelDB has a lot of options which we don't implement right now. Add an options struct to the LevelDBStore which users can access as they wish in order to set values different from the defaults. This will let us set various size values, as well as turning on caching or bloom filter read optimizations.
| | Signed-off-by: Jim Schutt <jaschut@sandia.gov>
| | Signed-off-by: Greg Farnum <greg@inktank.com>
* | Fix policy handling for RESTful admin api. (caleb miles; 2013-04-17; 1 file; -1/+0)
| | Signed-off-by: caleb miles <caleb.miles@inktank.com>
* | qa: pull qemu-iotests from ceph.com mirror (Sage Weil; 2013-04-16; 1 file; -1/+1)
| | Signed-off-by: Sage Weil <sage@inktank.com>
* | Merge pull request #214 from ceph/wip-objectcacher-handler-ordered (Sage Weil; 2013-04-16; 8 files; -53/+121)
|\ \
| | | keep write responses to clones in order
| | | Reviewed-by: Sage Weil <sage@inktank.com>
| * | LibrbdWriteback: complete writes strictly in order (Josh Durgin; 2013-04-10; 2 files; -2/+73)
| | | RADOS returns writes to the same object in the same order. The ObjectCacher relies on this assumption to make sure previous writes are complete and maintain consistency. Reads, however, may be reordered with respect to each other.
| | | When writing to an rbd clone, reads to the parent must be performed when the object does not exist in the child yet. These reads may be reordered, resulting in the original writes being reordered. This breaks the assumptions of the ObjectCacher, causing an assert to fail.
| | | To fix this, keep a per-object queue of outstanding writes to an object in the LibrbdWriteback handler, and finish them in the order in which they were sent.
| | | Fixes: #4531
| | | Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
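The per-object queue described in the commit message above can be sketched roughly as follows. This is only an illustration in Python, not the LibrbdWriteback code itself (which is C++); the class and method names (`OrderedWriteCompleter`, `submit`, `complete`) are hypothetical and chosen for this example.

```python
from collections import defaultdict, deque


class OrderedWriteCompleter:
    """Finish writes to each object in the order they were issued (hypothetical sketch)."""

    def __init__(self):
        # object id -> queue of [tid, done, callback], kept in issue order
        self._queues = defaultdict(deque)

    def submit(self, oid, tid, callback):
        """Record an outstanding write to object `oid`."""
        self._queues[oid].append([tid, False, callback])

    def complete(self, oid, tid):
        """Mark one write as done, then fire every callback that is now unblocked."""
        queue = self._queues[oid]
        for entry in queue:
            if entry[0] == tid:
                entry[1] = True
                break
        # Only the head of the queue may finish; later writes wait behind it.
        while queue and queue[0][1]:
            _, _, callback = queue.popleft()
            callback()
        if not queue:
            del self._queues[oid]


# Two writes to the same object whose completions arrive out of order.
finished = []
completer = OrderedWriteCompleter()
completer.submit("obj", 1, lambda: finished.append(1))
completer.submit("obj", 2, lambda: finished.append(2))
completer.complete("obj", 2)  # arrives early, so it is held back
completer.complete("obj", 1)  # releases both callbacks, in issue order
```

Queues are kept per object, so a delayed write to one object does not hold up completions for another.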
| * | LibrbdWriteback: removed unused and undefined method (Josh Durgin; 2013-04-10; 1 file; -1/+0)
| | | Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
| * | LibrbdWriteback: use a tid_t for tids (Josh Durgin; 2013-04-10; 1 file; -1/+1)
| | | An int could be much smaller, leading to overflow and bad behavior.
| | | Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
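To illustrate the overflow concern in the commit above, the following sketch shows what happens when a large transaction id is squeezed into a 32-bit signed integer. It assumes that `int` is 32 bits wide and that `tid_t` is a wider unsigned type; both are assumptions for this example, and the helper name `as_c_int` is hypothetical.

```python
import ctypes


def as_c_int(tid):
    """Return the value a 32-bit signed C integer holds after being assigned `tid`.

    ctypes integer types do no overflow checking, so out-of-range values wrap,
    which mirrors the behavior of common platforms.
    """
    return ctypes.c_int32(tid).value


small_tid = 7
large_tid = 2**32 + 7  # fits in a 64-bit id, but not in a 32-bit int

wrapped_small = as_c_int(small_tid)
wrapped_large = as_c_int(large_tid)
# Both ids collapse to the same truncated value, so they can no longer be told apart.
collision = wrapped_small == wrapped_large
```

Under these assumptions two distinct ids become indistinguishable once truncated, which is the kind of bad behavior a wider id type avoids.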
| * | WritebackHandler: make read return nothing (Josh Durgin; 2013-04-10; 6 files; -30/+28)
| | | The tid returned by reads is ignored, and would make tracking writes internally more difficult by using the same id-space as them. Make read void and update all implementations.
| | | Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
| * | ObjectCacher: deduplicate final part of flush_set() (Josh Durgin; 2013-04-10; 2 files; -19/+19)
| | | Both versions of flush_set() did the same thing. Move it into a helper called from both.
| | | Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
* | | librbd: flush on diff_iterate (Sage Weil; 2013-04-16; 1 file; -0/+3)
| | | The diff_iterate() tests fail when caching is enabled because recent writes aren't visible to listsnaps. Flush from diff_iterate to ensure that they are. Someday, maybe, we might make diff_iterate() inspect the cache contents to make this more efficient, but for now that is not necessary.
| | | Signed-off-by: Sage Weil <sage@inktank.com>
* | | Merge branch 'next' of https://github.com/ceph/ceph into next (John Wilkins; 2013-04-16; 1 file; -11/+11)
|\ \ \
| * | | os/FileJournal: fix journal completion plug removal (Sage Weil; 2013-04-16; 1 file; -11/+11)
| | | | We plug completions when transitioning from a full to non-full journal to ensure that we do not complete items before we have a stable journal starting point that is past the committed_thru marker. However, the order of the header update and completion queueing means that we never remove the plug if the journalq is empty--the seq test is always false. The result is very slow osd requests that only commit when we do a full sync.
| | | | This bug was masked until recently by another issue, fixed in 170d4a3d794260476ecde1e5e2ee719b7cb3ffd1.
| | | | The simple fix is to reorder the completion queuing before we update the new header.
| | | | Signed-off-by: Sage Weil <sage@inktank.com>
| | | | Reviewed-by: Samuel Just <sam.just@inktank.com>
* | | | doc: Cherry-picked from master to next. Uses ceph-mds package during upgrade. (John Wilkins; 2013-04-16; 1 file; -1/+1)
| | | | Signed-off-by: John Wilkins <john.wilkins@inktank.com>
* | | | doc: Cherry-picked from master to next. Rewrite of CloudStack document. (John Wilkins; 2013-04-16; 1 file; -47/+113)
| | | | Signed-off-by: John Wilkins <john.wilkins@inktank.com>
* | | | doc: Cherry-picked from master to next. Updates config to use virtio. (John Wilkins; 2013-04-16; 1 file; -2/+5)
| | | | Signed-off-by: John Wilkins <john.wilkins@inktank.com>
* | | | doc: Cherry-picked from master to next. Reorders ceph osd create. (John Wilkins; 2013-04-16; 1 file; -9/+14)
| | | | Signed-off-by: John Wilkins <john.wilkins@inktank.com>
* | | | doc: Cherry picked from master to next. Adds comments on naming OSDs. (John Wilkins; 2013-04-16; 1 file; -1/+24)
|/ / /
| | | Signed-off-by: John Wilkins <john.wilkins@inktank.com>
* | | client: Fix inode remove from snaprealm race (Sam Lang; 2013-04-16; 1 file; -2/+2)
| | | This is a follow-on fix to b5ce4d0. Always remove the inode from the snaprealm's list of inodes_with_caps before the snaprealm ref is decremented (and the snaprealm potentially gets freed).
| | | Fixes #4694.
| | | Signed-off-by: Sam Lang <sam.lang@inktank.com>
| | | Reviewed-by: Greg Farnum <greg@inktank.com>
* | | librbd: use initialized data for DiffIterateDiscard test (Sage Weil; 2013-04-15; 1 file; -0/+1)
| | | Signed-off-by: Sage Weil <sage@inktank.com>
* | | librbd: print seed for all DiffIterate tests (Sage Weil; 2013-04-15; 1 file; -0/+8)
| | | This will aid debugging on failures, and give better coverage.
| | | Signed-off-by: Sage Weil <sage@inktank.com>
* | | Merge pull request #217 from alram/master (Sage Weil; 2013-04-15; 1 file; -1/+1)
|\ \ \
| | | | Fix: use absolute path with udev
| | | | Reviewed-by: Sage Weil <sage@inktank.com>
| * | | Fix: use absolute path with udev (Alexandre Marangone; 2013-04-15; 1 file; -1/+1)
| | | | Avoids the following: udevd[61613]: failed to execute '/lib/udev/bash' 'bash -c 'while [ ! -e /dev/mapper/....
| | | | Signed-off-by: Alexandre Marangone <alexandre.marangone@inktank.com>
* | | | qa: add workunit for running qemu-iotests (Josh Durgin; 2013-04-12; 1 file; -0/+22)
| |_|/
|/| |
| | | This uses the old stand-alone qemu-iotests repo so it works with the version of qemu in Ubuntu 12.04. The tests depend tightly on qemu version, so to use later tests we'd need to install corresponding versions of qemu.
| | | Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
* | | mds: output error number when failing to load an MDSTable (Greg Farnum; 2013-04-12; 1 file; -1/+1)
| | | Signed-off-by: Greg Farnum <greg@inktank.com>
* | | init-radosgw.sysv: New radosgw init file for rpm based systems (Gary Lowell; 2013-04-11; 3 files; -4/+96)
| | | Added init-radosgw.sysv file for rpm based systems, added it to the tarball list in the makefile, and updated the specfile to install it. Also added a dependency on ceph since it uses utility routines from that package (on Debian systems these are packaged in ceph-common). Incorporated review comments from Alex. (Bug #4571)
| | | Signed-off-by: Gary Lowell <gary.lowell@inktank.com>
| | | Reviewed-by: Alexandre Marangone <alexandre.marangone@inktank.com>
* | | Merge pull request #213 from ceph/wip-sessionmap-4644 (Sam Lang; 2013-04-11; 1 file; -2/+2)
|\ \ \
| | | | mds: fix session_info_t decoding
| | | | Reviewed-by: Sam Lang <sam.lang@inktank.com>
| * | | mds: fix session_info_t decoding (Yan, Zheng; 2013-04-10; 1 file; -2/+2)
| | |/
| |/|
| | | Commit 0bcf2ac081 changes session_info_t's format, but there is a typo in the code that decodes the old format. We also need to handle struct_v == 1, which had the same encoding but without the size guards (which is all handled by DECODE_START_LEGACY_COMPAT).
| | | Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
| | | Signed-off-by: Greg Farnum <greg@inktank.com>
* | | Merge pull request #212 from ceph/wip-4451 (Gregory Farnum; 2013-04-11; 6 files; -29/+58)
|\ \ \
| * | | mds: Delay export on missing inodes for reconnect (Sam Lang; 2013-04-11; 5 files; -17/+46)
| | | | The reconnect caps sent by the client on reconnect may not have inodes found in the inode cache until after clientreplay (when the client creates a new file, for example). Currently, we send an export for that cap to the client if we don't see an inode in the cache and path_is_mine() returns false (for example, if the client didn't send a path because the file was already unlinked). Instead, we want to delay handling of the reconnect cap until clientreplay completes.
| | | | This patch modifies handle_client_reconnect() so that we don't assume the cap isn't ours if we don't have an inode for it, but instead delay recovery for later. An export cap message is only sent if the inode exists and the cap isn't ours (non-auth) during reconnect. If any remaining recovered caps exist in the recovered list once the mds goes active, we send export messages at that point.
| | | | Also, after removing the path_is_mine check, MDCache::parallel_fetch_traverse_dir() needs to skip non-auth dirfrags.
| | | | Fixes #4451.
| | | | Signed-off-by: Sam Lang <sam.lang@inktank.com>
| | | | Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
| | | | Reviewed-by: Yan, Zheng <zheng.z.yan@intel.com>
| | | | Reviewed-by: Greg Farnum <greg@inktank.com>
| * | | client: Unify session close handling (Sam Lang; 2013-04-11; 1 file; -12/+12)
|/ / /
| | | If mds failure causes client reconnect while the client is unmounting, the client will send a session close request to the mds even if there are outstanding inodes in the cache waiting to receive flush_acks. This causes the mds to send back a session close message and the client closes the connection, so that when the mds tries to send flush acks back to the client, they get dropped, resulting in the client hanging on unmount.
| | | The pattern for this bug is:
| | | 1. mds restart
| | | 2. client sends session open request
| | | 3. client unmount sets unmounting flag and waits for flush_acks
| | | 4. mds sends session open reply
| | | 5. client sends session close request (because it's unmounting)
| | | 6. mds sends session close, client closes connection
| | | 7. mds tries to send flush_acks, but drops them because the connection is gone
| | | This patch unifies the session close handling so that the client only sends a session close in unmount once all flush acks have been received. If the mds restarts during session close, the reconnect logic will kick the session close waiter so that session close requests are re-sent for session close replies not yet received.
| | | Signed-off-by: Sam Lang <sam.lang@inktank.com>
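The rule stated in the commit above (send a session close during unmount only once all flush acks have arrived, and re-send it after an mds restart if the reply is still missing) can be sketched as a small state machine. This is a Python illustration under stated assumptions, not the Ceph client code; every name here (`UnmountingClient`, `outbox`, the handler methods) is hypothetical.

```python
class UnmountingClient:
    """Hypothetical model of the unified session close handling."""

    def __init__(self, dirty_inodes):
        self.awaiting_flush_ack = set(dirty_inodes)
        self.unmounting = False
        self.close_sent = False
        self.close_acked = False
        self.outbox = []  # messages the client would send to the mds

    def unmount(self):
        self.unmounting = True
        self._maybe_send_close()

    def handle_session_open_reply(self):
        # In the bug pattern, the close request went out at this point merely
        # because the client was unmounting. Here it goes through the same check.
        self._maybe_send_close()

    def handle_flush_ack(self, ino):
        self.awaiting_flush_ack.discard(ino)
        self._maybe_send_close()

    def handle_session_close_reply(self):
        self.close_acked = True

    def handle_mds_reconnect(self):
        # Re-send a close request whose reply has not been received yet.
        if self.close_sent and not self.close_acked:
            self.outbox.append("session_close")

    def _maybe_send_close(self):
        # Single place that decides whether a session close may be sent.
        if self.unmounting and not self.awaiting_flush_ack and not self.close_sent:
            self.close_sent = True
            self.outbox.append("session_close")


client = UnmountingClient({"ino1", "ino2"})
client.unmount()                    # flush acks still pending: nothing sent
client.handle_session_open_reply()  # still nothing sent
client.handle_flush_ack("ino1")
client.handle_flush_ack("ino2")     # last ack arrives: close request goes out
```

Routing every trigger through one check is the point of the sketch: no event handler can send the close request while flush acks are outstanding.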
* | | OSD: make pg upgrade logging quiet (Samuel Just; 2013-04-10; 1 file; -2/+7)
| | | Fixes: #4701
| | | Signed-off-by: Samuel Just <sam.just@inktank.com>
| | | Reviewed-by: Greg Farnum <greg@inktank.com>
* | | Merge branch 'wip_4654' into next (Samuel Just; 2013-04-10; 4 files; -10/+21)
|\ \ \
| | | | Fixes: #wip_4654
| | | | Reviewed-by: Greg Farnum <greg@inktank.com>
| * | | FileJournal: start_seq is seq+1 if journalq.empty() (Samuel Just; 2013-04-10; 1 file; -1/+1)
| | | | This is also the same as journaled_seq + 1 for writeahead journaling, but not for parallel journaling.
| | | | Signed-off-by: Samuel Just <sam.just@inktank.com>
| * | | FileJournal: fix off by one error in committed_thru (Samuel Just; 2013-04-10; 1 file; -1/+1)
| | | | journalq.front().first is the sequence number of the entry at journalq.front().second.
| | | | Signed-off-by: Samuel Just <sam.just@inktank.com>
| * | | Journal: commits may not include all journaled seqs (Samuel Just; 2013-04-10; 1 file; -3/+13)
| | | | At one point, a commit had to drain the FileStore op queue. This is no longer the case. Consequently, the journal may have to wait more than one commit for the filestore to create a stable commit point at a particular sequence. Handling this requires two changes:
| | | | 1) We cannot transition to FULL_WAIT until we receive a commit_start on a seq >= journaled_seq.
| | | | 2) We cannot remove the journal completion plug until we get a committed_thru on a seq >= header.start_seq at least as new as the oldest committed item in the journal.
| | | | If, on replay, the journal does not include fs_op_seq, we ignore it, which is fine since we won't have reported those entries committed!
| | | | Signed-off-by: Samuel Just <sam.just@inktank.com>