| Commit message | Author | Age | Files | Lines |
|
Fixes: #6088
Backport: bobtail, cuttlefish, dumpling
When posting an object it is possible to provide a key
name that refers to the original filename, however we
need to verify that in the end we don't end up with an
empty object name.
Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
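The check described in this message can be sketched as follows. This is a minimal Python illustration, not the radosgw code: the `${filename}` placeholder follows the S3 browser-upload POST convention, and `resolve_object_name` is a hypothetical helper introduced only for this example.

```python
def resolve_object_name(key: str, filename: str) -> str:
    """Substitute the uploaded filename into the key and reject empty results.

    Hypothetical helper: the real gateway code is C++ and is not shown here.
    """
    # A key such as "${filename}" refers to the original filename of the upload.
    name = key.replace("${filename}", filename)
    # After substitution the object name must not be empty.
    if not name:
        raise ValueError("object name is empty after substitution")
    return name
```

For example, a key of `uploads/${filename}` with filename `a.txt` resolves to `uploads/a.txt`, while a key of `${filename}` with an empty filename is rejected.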
|
ceph_rest_api.py: create own default for log_file
Reviewed-by: Sage Weil <sage@inktank.com>
|
common/config thinks the default log_file for non-daemons should be "".
Override that so that the default is
/var/log/ceph/{cluster}-{name}.{pid}.log
since ceph-rest-api is more of a daemon than a client.
Fixes: #6099
Backport: dumpling
Signed-off-by: Dan Mick <dan.mick@inktank.com>
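A minimal sketch of the override described above, assuming the path template quoted in this message. The function name `default_log_file` and its parameters are hypothetical and are not taken from ceph_rest_api.py.

```python
import os


def default_log_file(cluster: str, name: str, configured: str = "") -> str:
    """Return the configured log file, or a daemon-style default if it is empty.

    Hypothetical helper illustrating the fallback; not the actual implementation.
    """
    if configured:
        # An explicit setting always wins.
        return configured
    # common/config yields "" for non-daemons, so build the daemon-style default.
    return "/var/log/ceph/{cluster}-{name}.{pid}.log".format(
        cluster=cluster, name=name, pid=os.getpid()
    )
```

The explicit setting is checked first so that an operator-supplied log_file is never replaced by the default.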
|
Fix readdir_r invocation
Reviewed-by: Yehuda Sadeh <yehuda@inktank.com>
|
PATH_MAX isn't quite big enough.
Backport: dumpling, cuttlefish, bobtail
Signed-off-by: Sage Weil <sage@inktank.com>
|
The buffer needs to be big or else we'll walk all over the stack.
Backport: dumpling, cuttlefish, bobtail
Signed-off-by: Sage Weil <sage@inktank.com>
|
It is possible that we begin the paxos recovery with an uncommitted
value for, say, commit 100. During last/collect we discover 100 has been
committed already. But also, another node provides an uncommitted value
for 101 with the same pn. Currently, we refuse to learn it, because the
pn is not strictly greater than our current uncommitted pn... even though
it is the next last_committed+1 value that we need.
There are two possible fixes here:
- make this a >= as we can accept newer values from the same pn.
- discard our uncommitted value metadata when we commit the value.
Let's do both!
Fixes: #6090
Signed-off-by: Sage Weil <sage@inktank.com>
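The two fixes can be sketched together in a simplified model. This is an illustration under stated assumptions, not the monitor's Paxos code: the class `PaxosState` and its field and method names are hypothetical, and the real acceptance condition involves more checks than are shown here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PaxosState:
    """Hypothetical, simplified view of one node's recovery state."""
    last_committed: int
    uncommitted_v: int = 0
    uncommitted_pn: int = 0
    uncommitted_value: Optional[bytes] = None

    def learn_uncommitted(self, v: int, pn: int, value: bytes) -> bool:
        # Only the next version after last_committed is of interest.
        if v != self.last_committed + 1:
            return False
        # Fix 1: accept pn >= ours (not strictly >), since a newer value
        # may legitimately arrive with the same pn.
        if pn < self.uncommitted_pn:
            return False
        self.uncommitted_v = v
        self.uncommitted_pn = pn
        self.uncommitted_value = value
        return True

    def commit(self, v: int) -> None:
        self.last_committed = v
        # Fix 2: once the value is committed, drop its uncommitted metadata.
        if self.uncommitted_v == v:
            self.uncommitted_v = 0
            self.uncommitted_pn = 0
            self.uncommitted_value = None
```

In the scenario from this message, a node holding an uncommitted value for 100 commits it and can then learn the value for 101 even though it carries the same pn.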
|
Fixes: #6056
When removing a bucket metadata entry we first unlink the bucket
and then we remove the bucket entrypoint object. Originally
when unlinking the bucket we first overwrote the bucket entrypoint
entry marking it as 'unlinked'. However, this is not really needed
as we're just about to remove it. The original version triggered
a bug, as we needed to propagate the new header version first (which
we didn't do, so the subsequent bucket removal failed).
Reviewed-by: Greg Farnum <greg@inktank.com>
Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
|
Signed-off-by: Alfredo Deza <alfredo.deza@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
|
PGMonitor: pg dump_stuck should respect --format (plain works fine)
Reviewed-by: Sage Weil <sage@inktank.com>
|
Signed-off-by: Dan Mick <dan.mick@inktank.com>
|
Some distros lack ltp-kernel packages, and all we need is
fstress. This just modifies the shell script to download and compile
fstress from source and copy it to the right location if it doesn't
currently exist where it is expected. It is a very small, quick
compile, and currently only SLES and Debian do not have it already.
Reviewed-by: Sage Weil <sage@inktank.com>
Signed-off-by: Sandon Van Ness <sandon@inktank.com>
|
Moving the watch/notify init before the zone init,
as we might need to send a notification.
Reviewed-by: Sage Weil <sage@inktank.com>
Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
|
When the parent xattrs of active inodes that the mds attempts to open
during rejoin lack pool info (struct_v < 5), this field is filled in
with -1. That causes the mds to retry fetching a backtrace with a pool
number that matches the expected value. The retry fails, so the
err==-ENOENT branch is taken and pool 1 is retried, which succeeds, but
with pool -1, and so the mds keeps on bouncing between the two retry
cases forever.
This patch arranges for the mds to go along with pool -1 instead of
insisting that it be refetched, enabling it to complete recovery
instead of eating cpu, network bandwidth and metadata osd's resources
like there's no tomorrow, in what AFAICT is an infinite and very busy
loop.
This is not a new problem: I've had it even before upgrading from
Cuttlefish to Dumpling, I'd just never managed to track it down, and
force-unmounting the filesystem and then restarting the mds was an
easier (if inconvenient) work-around, particularly because it always
hit when the filesystem was under active, heavy-ish use (or there
wouldn't be much reason for caps recovery ;-)
There are two issues not addressed in this patch, however. One is
that nothing seems to proactively update the parent xattr when it is
found to be outdated, so it remains out of date forever. Not even
renaming top-level directories causes the xattrs to be recursively
rewritten. AFAICT that's a bug.
The other is that inodes that don't have a parent xattr (created by
even older versions of ceph) are reported as non-existing in the mds
rejoin message, because the absence of the parent xattr is signaled as
a missing inode ("failed to reconnect caps for missing inodes"). I
suppose this may cause more serious recovery problems.
I suppose a global pass over the filesystem tree updating parent
xattrs that are out-of-date would be desirable, if we find any parent
xattrs still lacking current information; it might make sense to
activate it as a background thread from the backtrace decoding
function, when it finds a parent xattr that's too out-of-date, or as a
separate client (ceph-fsck?).
Backport: dumpling, cuttlefish
Signed-off-by: Alexandre Oliva <oliva@gnu.org>
Reviewed-by: Zheng, Yan <zheng.z.yan@intel.com>
|
Signed-off-by: Sage Weil <sage@inktank.com>
|
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Reviewed-by: Sage Weil <sage@inktank.com>
|
The registering flag no longer exists, and registered was using the
wrong property due to a copy-paste error.
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
Reviewed-by: Sage Weil <sage.weil@inktank.com>
|
Plain Ops that haven't finished yet need to be resent if the osdmap
transitions from full or paused to unpaused. If these Ops are
triggered by LingerOps, they will be cancelled instead (since
should_resend = false), but the LingerOps that triggered them will not
be resent.
Fix this by checking the registered flag for all linger ops, and
resending any of them that aren't paused anymore.
Fixes: #6070
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
Reviewed-by: Sage Weil <sage.weil@inktank.com>
|
Fixes: #6046
We were initializing the watch-notify (through the cache
init) before reading the zone info which was much too
early, as we didn't have the control pool name yet. This also
simplifies init/cleanup a bit: the cache doesn't call watch/notify
init and cleanup directly, but rather states its need
through a virtual callback.
Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
|
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Reviewed-by: Yan, Zheng <zheng.z.yan@intel.com>
|
We do not currently try to merge rx buffers. Make that explicit, and
document in the code that it is not supported. (Otherwise the
last_read_tid values will get lost and read results won't get applied
to the cache properly.)
Signed-off-by: Sage Weil <sage@inktank.com>
|
Consider a sequence like:
 1- start read on 100~200
       100~200 state rx
 2- truncate to 200
       100~100 state rx
 3- start read on 200~200
       100~100 state rx
       200~200 state rx
 4- get 100~200 read result
Currently this makes us crash on
osdc/ObjectCacher.cc: 738: FAILED assert(bh->length() <= start+(loff_t)length-opos)
when processing the second 200~200 bufferhead (it is too big). The
larger issue, though, is that we should not be looking at this data at
all; it has been truncated away.
Fix this by marking each rx buffer with the read request that is sent to
fill it, and only fill it from that read request. Then the first reply
will fill the first 100~100 extent but not touch the other extent; the
second read will do that.
Signed-off-by: Sage Weil <sage@inktank.com>
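The tagging scheme above can be sketched in a few lines. This is a simplified model, not the ObjectCacher code: `BufferHead` here is a hypothetical stand-in with only the fields the example needs, and `apply_read_reply` is an illustrative helper.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class BufferHead:
    """Hypothetical stand-in for a cache buffer covering start~length."""
    start: int
    length: int
    state: str = "rx"
    last_read_tid: int = 0
    data: bytes = b""


def apply_read_reply(buffers: List[BufferHead], tid: int, offset: int, payload: bytes) -> None:
    """Fill only the rx buffers that were tagged with this read's tid."""
    for bh in buffers:
        # Skip buffers that are not waiting, or that wait for another read.
        if bh.state != "rx" or bh.last_read_tid != tid:
            continue
        lo = bh.start - offset
        hi = lo + bh.length
        # Skip buffers the reply does not fully cover.
        if lo < 0 or hi > len(payload):
            continue
        bh.data = payload[lo:hi]
        bh.state = "clean"
```

Replaying the sequence from this message, the reply to the first read (tid 1, 100~200) fills the truncated 100~100 buffer and leaves the 200~200 buffer, tagged with tid 2, in state rx until its own reply arrives.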
|
Signed-off-by: Sage Weil <sage@inktank.com>
|
client/fuse_ll.cc: In function 'void invalidate_cb(void*, vinodeno_t, int64_t, int64_t)':
warning: client/fuse_ll.cc:540: unused variable 'fino'
Signed-off-by: Sage Weil <sage@inktank.com>
|
In file included from json_spirit/json_spirit_writer.cpp:7:0:
json_spirit/json_spirit_writer_template.h: In function 'String_type json_spirit::non_printable_to_string(unsigned int)':
json_spirit/json_spirit_writer_template.h:37:50: warning: typedef 'Char_type' locally defined but not used [-Wunused-local-typedefs]
typedef typename String_type::value_type Char_type;
(Also, ha ha, this file uses \r\n.)
Signed-off-by: Sage Weil <sage@inktank.com>
|
Signed-off-by: Sage Weil <sage@inktank.com>
|
mon/PGMap: OSD byte counts 4x too large (conversion to bytes overzealous)
Reviewed-by: Joao Eduardo Luis <joao.luis@inktank.com>
|
Fixes: #6049
Signed-off-by: Dan Mick <dan.mick@inktank.com>
|
If we store any new state, we need to refresh the services, even if we
are still in the midst of Paxos recovery. This is because the
subscription path will share any committed state even when paxos is
still recovering. This prevents a race like:
- we have maps 10..20
- we drop out of quorum
- we are elected leader, paxos recovery starts
- we get one LAST with committed states that trim maps 10..15
- we get a subscribe for map 10..20
- we crash because 10 is no longer on disk because the PaxosService
is out of sync with the on-disk state.
Fixes: #6045
Backport: dumpling
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Joao Eduardo Luis <joao.luis@inktank.com>
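The race can be modelled with a toy service that caches the range of maps it believes are on disk. Everything in this sketch is hypothetical (the `Service` class, its methods, and the `refresh` flag); it only illustrates why the cached range must be refreshed whenever new state is stored.

```python
class Service:
    """Toy model of a service caching the first/last map epochs on disk."""

    def __init__(self, disk):
        self.disk = disk  # epoch -> map contents
        self.refresh()

    def refresh(self):
        # Re-read the committed range from the on-disk state.
        self.first = min(self.disk)
        self.last = max(self.disk)

    def store_state(self, trim_to, refresh=True):
        # Committed state learned during recovery may trim old maps.
        for epoch in [e for e in self.disk if e < trim_to]:
            del self.disk[epoch]
        if refresh:
            # The fix: refresh even while recovery is still in progress.
            self.refresh()

    def handle_subscribe(self, start):
        # Share maps from the requested epoch, clamped to what we think we have.
        begin = max(start, self.first)
        return [self.disk[e] for e in range(begin, self.last + 1)]
```

With maps 10..20 and a trim of 10..15, a refreshed service answers a subscribe for 10 starting at map 16, whereas a service that skipped the refresh still believes map 10 exists and fails when it tries to read it.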
|
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Joao Eduardo Luis <joao.luis@inktank.com>
|
This avoids duplicated code by using the helper created exactly for this
purpose.
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Joao Eduardo Luis <joao.luis@inktank.com>
|
This happens after we connect, which means we always get ENOSYS.
Instead, call parse_env inside the normal setup method, which has the
added benefit of being able to debug these tests.
Backport: dumpling
Signed-off-by: Sage Weil <sage@inktank.com>
|
Otherwise the log_oid will be non-empty and the next
boot will cause us to try to upgrade again.
Fixes: #6057
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
|
This is a debug check which may be causing excessive
cpu usage.
Reviewed-by: Sage Weil <sage@inktank.com>
Signed-off-by: Samuel Just <sam.just@inktank.com>
|
Fixes: #6052
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Dan Mick <dan.mick@inktank.com>
|
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Dan Mick <dan.mick@inktank.com>
|
Fix documentation issues
|
Marked the following keys as deprecated since v0.65:
- filestore flusher
- filestore flusher max fds
- filestore sync flush
Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
|
Fix names of cephx signature keys.
Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
|
config_opts: add two ceph-rest-api-only variables for convenience
Reviewed-by: Sage Weil <sage@inktank.com>
|
These aren't used by the C++ code at all, but in order for
rados_conf_get to find them, they need to be listed. They're
consumed by ceph_rest_api.
Signed-off-by: Dan Mick <dan.mick@inktank.com>
|
Signed-off-by: Sage Weil <sage@inktank.com>
|
Reviewed-by: Sage Weil <sage@inktank.com>
|
This is unlikely to be noticed by anybody, but it is a big change. Document
in the PendingReleaseNotes and bump up the librados minor version number
to 68.
Signed-off-by: Greg Farnum <greg@inktank.com>
|
The error message helpfully references the -m and -c CLI options for
specifying monitors, but this code can be invoked from non-core librados
client applications so that's unfortunately not kosher. Remove the
reference.
Fixes #5979.
Signed-off-by: Greg Farnum <greg@inktank.com>
|
rearrange erasure code documents
Reviewed-by: Samuel Just <sam.just@inktank.com>
|
Signed-off-by: Loic Dachary <loic@dachary.org>
|
Explains how objects are stored and used in erasure coded pools. It is
the result of discussions that occurred on the ceph-devel mailing list
around June 2013. The rationale behind each change can be found in the
archive of the mailing list: for instance, the coding of the chunk
number with the object, or the decision to decode using any K chunks
instead of trying to fetch the data chunks when possible because it
would allow simple concatenation when systematic codes are used.
http://tracker.ceph.com/issues/4929 refs #4929
Signed-off-by: Loic Dachary <loic@dachary.org>
|