| Commit message (Collapse) | Author | Age | Files | Lines |
|
Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
|
Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
|
Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
|
Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
|
Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
|
Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
|
Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
|
Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
|
add ceph_mon_kvstore_fix to RPM/Debian packaging
Reviewed-by: Joao Eduardo Luis <joao.luis@inktank.com>
|
Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
|
Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
|
This tool is to be used to fix the cause of #4521, and it
will take an old-format store and convert all the osdmap's
full versions to the new-format k/v store.
Fixes: #4521
Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
|
Initialize PG::flushed in constructor with false as
described in doc/dev/osd_internals/pg.rst .
Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
(cherry picked from commit fb840c8ff75b0c66dfeed48e8558542fe3da4c24)
|
os: bring leveldbstore options up to date
Reviewed-by: Sage Weil <sage@inktank.com>
|
Now that we can set up the LevelDB options internally, provide
config options on the OSD and the Monitor. We leave the OSD values
at the defaults for now as they're performance-sensitive, but we
set new values on the Monitor so that it can scale to large PGMaps.
(Previously there were issues with large PGMaps taking forever to write;
these changes to the use of compression and the default block and
write buffers counteract them.)
Since we pass these variables through, users who are interested in
doing so now can test and tune them more appropriately.
Reported-by: Jim Schutt <jaschut@sandia.gov>
Signed-off-by: Greg Farnum <greg@inktank.com>
|
LevelDB has a lot of options which we don't implement right now. Add
an options struct to the LevelDBStore which users can access as they
wish in order to set values different from the defaults.
This will let us set various size values, as well as turning on
caching or bloom filter read optimizations.
Signed-off-by: Jim Schutt <jaschut@sandia.gov>
Signed-off-by: Greg Farnum <greg@inktank.com>
|
Signed-off-by: caleb miles <caleb.miles@inktank.com>
|
Signed-off-by: Sage Weil <sage@inktank.com>
|
keep write responses to clones in order
Reviewed-by: Sage Weil <sage@inktank.com>
|
RADOS returns writes to the same object in the same order. The
ObjectCacher relies on this assumption to make sure previous writes
are complete and maintain consistency. Reads, however, may be
reordered with respect to each other. When writing to an rbd clone,
reads to the parent must be performed when the object does not exist
in the child yet. These reads may be reordered, resulting in the
original writes being reordered. This breaks the assumptions of the
ObjectCacher, causing an assert to fail.
To fix this, keep a per-object queue of outstanding writes to an
object in the LibrbdWriteback handler, and finish them in the order in
which they were sent.
Fixes: #4531
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
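
The per-object ordering described above can be sketched roughly as follows. This is a minimal illustration, not the LibrbdWriteback code: the class name OrderedWriteTracker, its methods, and the id scheme are all hypothetical, and it assumes completion callbacks do not call back into the tracker.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical sketch: keeps one queue of outstanding writes per object
// and fires completion callbacks in submission order.
class OrderedWriteTracker {
 public:
  using Callback = std::function<void(int)>;

  // Record a write when it is sent; the returned id is passed to complete().
  uint64_t submit(const std::string& oid, Callback on_finish) {
    uint64_t id = next_id_++;
    queues_[oid].push_back(Pending{id, false, 0, std::move(on_finish)});
    return id;
  }

  // Mark one write as done, then finish writes from the front of the queue
  // for as long as the oldest outstanding write has completed.
  void complete(const std::string& oid, uint64_t id, int result) {
    auto it = queues_.find(oid);
    assert(it != queues_.end());
    std::deque<Pending>& q = it->second;
    for (Pending& p : q) {
      if (p.id == id) {
        p.done = true;
        p.result = result;
        break;
      }
    }
    while (!q.empty() && q.front().done) {
      Pending p = std::move(q.front());
      q.pop_front();
      p.on_finish(p.result);  // assumed not to re-enter the tracker
    }
    if (q.empty()) queues_.erase(it);
  }

 private:
  struct Pending {
    uint64_t id;
    bool done;
    int result;
    Callback on_finish;
  };
  std::map<std::string, std::deque<Pending>> queues_;
  uint64_t next_id_ = 1;
};
```

If two writes to the same object complete out of order, the later one is held until the earlier one finishes, so the caller still sees callbacks in the order the writes were sent.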
|
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
|
An int could be much smaller, leading to overflow and bad behavior.
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
|
The tid returned by reads is ignored, and because it shares an id-space
with writes it would make tracking writes internally more difficult. Make
read void and update all implementations.
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
|
Both versions of flush_set() did the same thing. Move it into a
helper called from both.
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
|
The diff_iterate() tests fail when caching is enabled because recent writes
aren't visible to listsnaps. Flush from diff_iterate to ensure that they
are. Someday, maybe, we might make diff_iterate() inspect the cache
contents to make this more efficient, but for now that is not necessary.
Signed-off-by: Sage Weil <sage@inktank.com>
|
We plug completions when transitioning from a full to non-full journal
to ensure that we do not complete items before we have a stable journal
starting point that is past the committed_thru marker. However, the order
of the header update and completion queueing means that we never remove
the plug if the journalq is empty--the seq test is always false. The
result is very slow osd requests that only commit when we do a full sync.
This bug was masked until recently by another issue, fixed in
170d4a3d794260476ecde1e5e2ee719b7cb3ffd1.
The simple fix is to reorder the completion queuing before we update the
new header.
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Samuel Just <sam.just@inktank.com>
|
Signed-off-by: John Wilkins <john.wilkins@inktank.com>
|
Signed-off-by: John Wilkins <john.wilkins@inktank.com>
|
Signed-off-by: John Wilkins <john.wilkins@inktank.com>
|
Signed-off-by: John Wilkins <john.wilkins@inktank.com>
|
Signed-off-by: John Wilkins <john.wilkins@inktank.com>
|
This is a follow on fix to b5ce4d0. Always remove the inode from the
snaprealm's list of inodes_with_caps before the snaprealm ref is
decremented (and the snaprealm potentially gets freed).
Fixes #4694.
Signed-off-by: Sam Lang <sam.lang@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
|
Signed-off-by: Sage Weil <sage@inktank.com>
|
This will aid debugging on failures, and give better coverage.
Signed-off-by: Sage Weil <sage@inktank.com>
|
Fix: use absolute path with udev
Reviewed-by: Sage Weil <sage@inktank.com>
|
Avoids the following: udevd[61613]: failed to execute '/lib/udev/bash'
'bash -c 'while [ ! -e /dev/mapper/....
Signed-off-by: Alexandre Marangone <alexandre.marangone@inktank.com>
|
This uses the old stand-alone qemu-iotests repo so it works with the
version of qemu in Ubuntu 12.04. The tests depend tightly on qemu
version, so to use later tests we'd need to install corresponding
versions of qemu.
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
|
Signed-off-by: Greg Farnum <greg@inktank.com>
|
Added init-radosgw.sys file for rpm based systems, added it to
the tarball list in the makefile, and updated the specfile to
install it. Also added a dependency on ceph since it uses
utility routines from that package (on Debian systems these are
packaged in ceph-common). Incorporated review comments from
Alex. (Bug #4571)
Signed-off-by: Gary Lowell <gary.lowell@inktank.com>
Reviewed-by: Alexandre Marangone <alexandre.marangone@inktank.com>
|
mds: fix session_info_t decoding
Reviewed-by: Sam Lang <sam.lang@inktank.com>
|
Commit 0bcf2ac081 changed session_info_t's format, but there is
a typo in the code that decodes the old format. We also need to
handle struct_v == 1, which had the same encoding but without
the size guards (which is all handled by DECODE_START_LEGACY_COMPAT).
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Signed-off-by: Greg Farnum <greg@inktank.com>
|
The reconnect caps sent by the client on reconnect may not have
inodes found in the inode cache until after clientreplay (when
the client creates a new file, for example). Currently, we send an
export for that cap to the client if we don't see an inode in the cache
and path_is_mine() returns false (for example, if the client didn't
send a path because the file was already unlinked).
Instead, we want to delay handling of the reconnect cap until
clientreplay completes.
This patch modifies handle_client_reconnect() so that we don't assume
the cap isn't ours if we don't have an inode for it, but instead delay
recovery for later. An export cap message is only sent if the inode exists
and the cap isn't ours (non-auth) during reconnect. If any remaining
recovered caps exist in the recovered list once the mds goes active, we
send export messages at that point.
Also, after removing the path_is_mine check,
MDCache::parallel_fetch_traverse_dir() needs to skip non-auth dirfrags.
Fixes #4451.
Signed-off-by: Sam Lang <sam.lang@inktank.com>
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Reviewed-by: Yan, Zheng <zheng.z.yan@intel.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
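
The per-cap decision described in this message can be summarized in a small sketch. The enum and function names here are hypothetical and do not come from the MDS code; they only restate the rule that an export is sent during reconnect only when the inode exists and the cap is non-auth.

```cpp
#include <cassert>

// Hypothetical outcome of examining one reconnect cap during reconnect.
enum class CapAction {
  Reconnect,   // inode is in the cache and we are auth for it
  SendExport,  // inode is in the cache but the cap is not ours (non-auth)
  Delay        // no inode yet: defer until clientreplay completes
};

// Classify a reconnect cap without assuming that a missing inode means
// the cap belongs to another mds.
inline CapAction classify_reconnect_cap(bool inode_in_cache, bool is_auth) {
  if (!inode_in_cache) {
    return CapAction::Delay;
  }
  return is_auth ? CapAction::Reconnect : CapAction::SendExport;
}

// Small usage example for the three cases.
inline void example_reconnect_caps() {
  assert(classify_reconnect_cap(false, false) == CapAction::Delay);
  assert(classify_reconnect_cap(true, true) == CapAction::Reconnect);
  assert(classify_reconnect_cap(true, false) == CapAction::SendExport);
}
```

Caps that are still in the delayed set once the mds goes active would then receive export messages, as the message describes.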
|
If mds failure causes client reconnect while the
client is unmounting, the client will send a session
close request to the mds even if there are outstanding
inodes in the cache waiting to receive flush_acks. This
causes the mds to send back a session close message and
the client closes the connection, so that when the mds tries
to send flush acks back to the client, they get dropped, resulting
in the client hanging on unmount. The pattern for this bug is:
1. mds restart
2. client sends session open request
3. client unmount sets unmounting flag and waits for flush_acks
4. mds sends session open reply
5. client sends session close request (because it's unmounting)
6. mds sends session close, client closes connection
7. mds tries to send flush_acks, but drops them because the connection
is gone
This patch unifies the session close handling so that the client
only sends a session close in unmount once all flush acks have been
received. If the mds restarts during session close, the reconnect
logic will kick the session close waiter so that session close requests
are re-sent for session close replies not yet received.
Signed-off-by: Sam Lang <sam.lang@inktank.com>
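
The gating rule introduced by this patch can be modeled as a simple predicate. This is a sketch under assumptions: UnmountState and its members are hypothetical names, and the real client tracks considerably more state per session.

```cpp
#include <cassert>
#include <cstdint>
#include <set>

// Hypothetical model of the client-side state relevant to unmount.
class UnmountState {
 public:
  void begin_unmount() { unmounting_ = true; }
  void add_pending_flush(uint64_t ino) { pending_flush_.insert(ino); }
  void handle_flush_ack(uint64_t ino) { pending_flush_.erase(ino); }

  // A session close request may be sent only while unmounting and only
  // once every outstanding flush ack has been received.
  bool may_send_session_close() const {
    return unmounting_ && pending_flush_.empty();
  }

 private:
  bool unmounting_ = false;
  std::set<uint64_t> pending_flush_;
};

// Small usage example: the close is held back until the ack arrives.
inline void example_unmount() {
  UnmountState s;
  s.add_pending_flush(42);
  s.begin_unmount();
  assert(!s.may_send_session_close());
  s.handle_flush_ack(42);
  assert(s.may_send_session_close());
}
```

With this check in place, step 5 of the sequence above cannot happen before the flush acks from step 7 have arrived.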
|
Fixes: #4701
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
|
Fixes: #wip_4654
Reviewed-by: Greg Farnum <greg@inktank.com>
|
This is also the same as journaled_seq + 1 for writeahead
journaling, but not for parallel journaling.
Signed-off-by: Samuel Just <sam.just@inktank.com>
|
journalq.front().first is the sequence number of the entry
at journalq.front().second.
Signed-off-by: Samuel Just <sam.just@inktank.com>
|
At one point, a commit had to drain the FileStore op
queue. This is no longer the case. Consequently, the
journal may have to wait more than one commit for the
filestore to create a stable commit point at a particular
sequence. Handling this requires two changes:
1) We cannot transition to FULL_WAIT until we receive
a commit_start on a seq >= journaled_seq.
2) We cannot remove the journal completion plug until we get
a committed_thru on a seq >= header.start_seq at least as
new as the oldest committed item in the journal. If on
replay, the journal does not include fs_op_seq, we ignore
it, which is fine since we won't have reported those
entries committed!
Signed-off-by: Samuel Just <sam.just@inktank.com>
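
The two conditions above can be written as predicates over sequence numbers. This is a minimal sketch, not the FileJournal code: the struct and function names are hypothetical, and only the comparisons are taken from the message.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical snapshot of the two journal values named in the message.
struct JournalState {
  uint64_t journaled_seq;     // journaled_seq from the message
  uint64_t header_start_seq;  // header.start_seq from the message
};

// Condition 1: only a commit_start on a seq >= journaled_seq permits the
// transition to FULL_WAIT.
inline bool can_enter_full_wait(const JournalState& j,
                                uint64_t commit_start_seq) {
  return commit_start_seq >= j.journaled_seq;
}

// Condition 2: only a committed_thru on a seq >= header.start_seq permits
// removing the journal completion plug.
inline bool can_remove_plug(const JournalState& j,
                            uint64_t committed_thru_seq) {
  return committed_thru_seq >= j.header_start_seq;
}

// Small usage example with arbitrary sequence numbers.
inline void example_journal_checks() {
  JournalState j{10, 7};
  assert(!can_enter_full_wait(j, 9));
  assert(can_enter_full_wait(j, 10));
  assert(!can_remove_plug(j, 6));
  assert(can_remove_plug(j, 7));
}
```

Because a commit no longer drains the FileStore op queue, either predicate may stay false across several commits before it finally holds.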
|