| Commit message (Collapse) | Author | Age | Files | Lines |
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
|
This patch adds an "open-by-ino" helper. It uses the backtrace to find
an inode's path and open the inode. The algorithm looks like:
1. Check MDS peers. If any MDS has the inode in its cache, go to step 6.
2. Fetch the backtrace. If a backtrace was previously fetched and we get
the same backtrace again, return -EIO.
3. Traverse the path in the backtrace. If the inode is found, go to step 6;
if a non-auth dirfrag is encountered, go to the next step. If we fail to
find the inode in its parent dir, go to step 1.
4. Request MDS peers to traverse the path in the backtrace. If the inode
is found, go to step 6. If an MDS peer encounters a non-auth dirfrag, it
stops traversing. If any MDS peer fails to find the inode in its
parent dir, go to step 1.
5. Use the same algorithm to open the inode's parent. Go to step 3 if
that succeeds; go to step 1 if it fails.
6. Return the inode's auth MDS ID.
The algorithm has two main assumptions:
1. If an inode is in its auth MDS's cache, its on-disk backtrace
can be out of date.
2. If an inode is not in any MDS's cache, its on-disk backtrace
must be up to date.
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
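The six steps above can be read as a small state machine. The following is
only a minimal sketch of that control flow, not the MDS code: the five hook
functions (check_peers, fetch_backtrace, traverse_local, traverse_peers,
open_parent) and the outcome constants are hypothetical stand-ins, and the
fall-through from step 4 to step 5 when a peer stops at a non-auth dirfrag
is one reading of the description.

```python
import errno

# Hypothetical traversal outcomes; the commit message names no such constants.
FOUND, NON_AUTH, MISSING = "found", "non_auth", "missing"


def open_by_ino(ino, check_peers, fetch_backtrace,
                traverse_local, traverse_peers, open_parent):
    """Return the inode's auth MDS id (step 6) or -EIO (step 2)."""
    last_bt = None
    bt = None
    step = 1
    while True:
        if step == 1:
            # Step 1: ask the peers whether the inode is already cached.
            auth = check_peers(ino)
            if auth is not None:
                return auth
            step = 2
        elif step == 2:
            # Step 2: fetching the same backtrace twice means no progress.
            bt = fetch_backtrace(ino)
            if last_bt is not None and bt == last_bt:
                return -errno.EIO
            last_bt = bt
            step = 3
        elif step == 3:
            # Step 3: traverse the backtrace path locally.
            outcome, auth = traverse_local(bt)
            if outcome == FOUND:
                return auth
            step = 4 if outcome == NON_AUTH else 1
        elif step == 4:
            # Step 4: ask the peers to traverse the same path.
            outcome, auth = traverse_peers(bt)
            if outcome == FOUND:
                return auth
            step = 5 if outcome == NON_AUTH else 1
        else:
            # Step 5: open the parent with the same algorithm, then retry.
            step = 3 if open_parent(bt) else 1
```

With hooks that never find the inode and always return the same backtrace,
the loop goes 1, 2, 3, 1, 2 and then stops with -EIO, which is the
termination condition described in step 2.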
|
We may want to fetch a backtrace while the corresponding inode isn't
instantiated. MDCache::fetch_backtrace() will be used by a later
patch.
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
|
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
|
Unlink moves inodes to the stray dir; it's a special form of rename.
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
|
To queue a backtrace update, the current code allocates a BacktraceInfo
structure and adds it to the log segment's update_backtraces list. The
main issue with this approach is that BacktraceInfo is independent
of the inode, so it's very inconvenient to find pending backtrace updates
for given inodes. When exporting inodes from one MDS to another
MDS, we need to find and cancel all pending backtrace updates on the
source MDS.
This patch brings back the old backtrace handling code and adapts it
for the current backtrace format. The basic idea behind the old
code is: when an inode's backtrace becomes dirty, add the inode to the
log segment's dirty_parent_inodes list.
Compared to the current backtrace handling, another difference is
that the backtrace update is journalled in EMetaBlob::fullbit.
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
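A rough model of why tracking the inode itself helps, assuming a simplified
LogSegment and Inode (the classes, the dirty_parent_segment field and
export_inode are illustrative names, not the MDS types): because the inode
remembers which segment holds its pending update, cancelling on export needs
no search through a list of independent records.

```python
class LogSegment:
    def __init__(self):
        # Inodes whose backtrace is dirty and must be written for this segment.
        self.dirty_parent_inodes = set()


class Inode:
    def __init__(self, ino):
        self.ino = ino
        # Hypothetical back-pointer to the segment holding the pending update.
        self.dirty_parent_segment = None

    def mark_dirty_parent(self, segment):
        # An inode sits on at most one segment's list at a time.
        self.clear_dirty_parent()
        segment.dirty_parent_inodes.add(self)
        self.dirty_parent_segment = segment

    def clear_dirty_parent(self):
        if self.dirty_parent_segment is not None:
            self.dirty_parent_segment.dirty_parent_inodes.discard(self)
            self.dirty_parent_segment = None


def export_inode(inode):
    # Stand-in for the export path: cancel the pending update on the source.
    inode.clear_dirty_parent()
```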
|
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
|
The current way to journal a backtrace update is to set
EMetaBlob::update_bt to true. The problem is that an EMetaBlob can include
several inodes. If an EMetaBlob's update_bt is true, the journal replay
code has to queue backtrace updates for all inodes in the EMetaBlob.
This patch adds two new flags to class EMetaBlob::fullbit, making it
able to journal backtrace updates.
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
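To illustrate the difference between a blob-wide flag and a per-inode flag,
here is a small model. FullBit and its dirty_parent field are hypothetical
stand-ins; the patch adds two flags, and this sketch models only one of them.

```python
from dataclasses import dataclass


@dataclass
class FullBit:
    ino: int
    dirty_parent: bool = False  # hypothetical per-inode flag


def replay_blob_wide(fullbits, update_bt):
    # Old scheme: one flag for the whole blob, so every inode is queued.
    return [fb.ino for fb in fullbits] if update_bt else []


def replay_per_inode(fullbits):
    # New scheme: only inodes whose own flag is set are queued.
    return [fb.ino for fb in fullbits if fb.dirty_parent]
```

For a blob holding inodes 1, 2 and 3 where only inode 1 changed its
backtrace, the blob-wide replay queues all three and the per-inode replay
queues only inode 1.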
|
Prepare for adding new state parameters such as 'dirty_parent'.
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
|
When there is more than one active MDS, restarting an MDS triggers the
assertion "reconnected_snaprealms.empty()" quite often. If there
is no snapshot in the FS, the items left in reconnected_snaprealms
should be other MDSs' mdsdirs. I think that's harmless.
If there are snapshots in the FS, the assertion can probably catch
real bugs. But at present the snapshot feature is broken, and fixing it
is non-trivial. So replace the assertion with a warning.
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
|
No need to output the function's debug message to console.
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
|
If an MDiscover message is for discovering a base inode, want_base_dir
should be false and path should be empty.
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
|
For a replica, a filelock in LOCK_LOCK state doesn't allow the Fc cap. So
a filelock in LOCK_SYNC_LOCK/LOCK_EXCL_LOCK state shouldn't allow the Fc
cap either.
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
|
When an inode is freezing or frozen, we defer processing MClientCaps
messages and cap releases embedded in requests. The same deferral
logic should also cover MClientCapRelease messages.
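The deferral pattern described here can be sketched as follows. This is an
illustrative model only: the Inode class, its unfreeze_waiters queue and
handle_cap_release are assumed names, and "frozen" stands for both the
freezing and frozen states.

```python
from collections import deque


class Inode:
    def __init__(self):
        self.frozen = False
        self.unfreeze_waiters = deque()  # callbacks to run after unfreeze
        self.released = []               # cap ids processed so far

    def unfreeze(self):
        self.frozen = False
        # Run deferred work in arrival order.
        while self.unfreeze_waiters:
            self.unfreeze_waiters.popleft()()


def handle_cap_release(inode, cap_id):
    if inode.frozen:
        # Defer: retry the same handler once the inode is unfrozen.
        inode.unfreeze_waiters.append(
            lambda: handle_cap_release(inode, cap_id))
        return
    inode.released.append(cap_id)
```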
|
After sending a cache rejoin message, a replica needs to notify the auth
MDS when cap_wanted changes. But it can send the MInodeFileCaps message
only after receiving the auth MDS' rejoin ack.
Locker::request_inode_file_caps() has correct wait logic, but it skips
sending the MInodeFileCaps message if the auth MDS is still in rejoin
state.
The fix is to defer sending the MInodeFileCaps message until the auth MDS
is active. It makes the function's wait logic less tricky.
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
|
So the auth MDS can choose locks' states based on our cap_wanted.
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
|
CInode::mds_caps_wanted is used to keep track of caps wanted by non-auth
MDSs. The auth MDS checks it when choosing locks' states.
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
|
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
|
When failure of a peer is detected, MDCache::handle_mds_failure()
checks if there are requests waiting for slave replies from the
failed peer, and adds them to the "wait for active peer" list.
The "retry request" logic only covers slave requests sent before
MDCache::handle_mds_failure() is called. If a slave request was
sent while the peer isn't up, we wait for its reply forever.
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
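The commit message describes the gap rather than the exact fix. One plausible
way to close it, shown here purely as an assumption, is to check the peer's
state at send time and park the request on the "wait for active peer" list
instead of sending it. The MDS class and its method names are hypothetical.

```python
class MDS:
    def __init__(self):
        self.peer_up = {}             # peer id -> True once the peer is active
        self.sent = []                # (peer, request) pairs actually sent
        self.waiting_for_active = {}  # peer id -> requests parked for retry

    def send_slave_request(self, peer, request):
        if not self.peer_up.get(peer, False):
            # Peer is down or recovering: park instead of waiting forever.
            self.waiting_for_active.setdefault(peer, []).append(request)
            return False
        self.sent.append((peer, request))
        return True

    def handle_peer_active(self, peer):
        self.peer_up[peer] = True
        # Retry everything parked for this peer.
        for request in self.waiting_for_active.pop(peer, []):
            self.send_slave_request(peer, request)
```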
|
We should not wake up the unfreeze waiter while the inode is still
linked to a non-auth dirfrag.
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
|
I previously added code to handle a corner case of cache rejoin: an
entire subtree, together with the inode the subtree root belongs to,
was trimmed between sending the cache rejoin and receiving the rejoin ack.
In this case, we should send a cache expire message to the subtree's
auth MDS. But the code is completely broken, so remove it temporarily.
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
|
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
|
The current code uses the import state to detect obsolete import
discover/prep messages. It does not work for this case: cancel a subtree
import, import the same subtree again, and then the discover/prep message
for the first import gets dispatched.
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
|
For unlink/rename requests, the target dentry's linkage may change
before all locks are acquired. So we need to check whether the existing
stray dentry is valid.
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
|
An MDS may crash after journalling a slave commit, but before sending
the commit ack to the master. Later when the MDS restarts, it will not
send the commit ack to the master. So the master waits for the commit
ack forever. The fix is to remove the failed MDS from requests'
uncommitted slave lists. When the failed MDS recovers, its resolve message
will tell the master which slave requests are not committed. The master
will re-add the recovering MDS to requests' uncommitted slave lists if
necessary.
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
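The bookkeeping in this fix can be modelled roughly as below. MasterRequest,
handle_mds_failure and handle_resolve are simplified stand-ins for the real
structures, and request ids are plain strings for illustration.

```python
class MasterRequest:
    def __init__(self, slaves):
        # MDS ids that have not yet acked the commit.
        self.uncommitted_slaves = set(slaves)

    def fully_committed(self):
        return not self.uncommitted_slaves


def handle_mds_failure(requests, failed_mds):
    # Stop waiting for acks that the failed MDS will never send.
    for req in requests.values():
        req.uncommitted_slaves.discard(failed_mds)


def handle_resolve(requests, recovering_mds, uncommitted_reqids):
    # The resolve message lists slave requests that were not committed;
    # the master starts waiting for those again.
    for reqid in uncommitted_reqids:
        if reqid in requests:
            requests[reqid].uncommitted_slaves.add(recovering_mds)
```

In this model a request whose only outstanding slave has failed becomes
fully committed right away, and is put back into the waiting state only if
the recovering MDS reports it as uncommitted.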
|
We may add new waiters while the master is committing, so we should
take the waiters and wake them up when the master is committed.
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
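A small sketch of the "take the waiters at commit time" idea, assuming a
simplified MasterUpdate object (a hypothetical name) whose waiters are plain
callables:

```python
class MasterUpdate:
    def __init__(self):
        self.waiters = []
        self.committed = False

    def add_waiter(self, fn):
        # Waiters may still arrive while the commit is in flight.
        self.waiters.append(fn)

    def finish_commit(self):
        self.committed = True
        # Take the list only now, so late arrivals are included, and leave
        # an empty list behind before running any callback.
        waiters, self.waiters = self.waiters, []
        for fn in waiters:
            fn()
```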
|
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
|
We only journal the finish of exporting a subtree, so we shouldn't
consider export bounds as subtree roots.
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
|
If the underwater dentry is a remote link, we shouldn't mark the
inode clean.
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
|
This avoids creating bare dirfrags during journal replay.
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
|
Use new fuse package instead of fuse-utils
|
The fuse-utils package was deprecated a while ago.
Switch the primary dependency for fuse tools to use
the preferred 'fuse' package.
Signed-off-by: James Page <james.page@ubuntu.com>
|
Grr.
Signed-off-by: Sage Weil <sage@inktank.com>
|
Otherwise, the links might be ordered after the in-progress
operation tag write. We need the in-progress operation tag to
correctly recover from an interrupted merge, split, or col_split.
Fixes: #5180
Backport: cuttlefish, bobtail
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
|
Fixes: #5159
Reviewed-by: Sage Weil <sage@inktank.com>
|
Signed-off-by: Samuel Just <sam.just@inktank.com>
|
Signed-off-by: Samuel Just <sam.just@inktank.com>
|
Signed-off-by: Samuel Just <sam.just@inktank.com>
|
Fixes: #5152
When iterating through usage entries with a user id provided, we
started at the user's first entry and not at the entry indexed by
the request start time.
This commit fixes the issue.
Backport: bobtail
Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
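To make the start-position issue concrete, here is a sketch under an assumed
index layout: a sorted list of (user, epoch) keys. The real usage log index
is not described in the commit message, so both the layout and the
usage_entries helper are illustrative only.

```python
import bisect


def usage_entries(index, user, start_epoch, end_epoch):
    """Return the user's entries with start_epoch <= epoch < end_epoch.

    index is assumed to be a sorted list of (user, epoch) tuples.
    """
    # Seek to (user, start_epoch) rather than to the user's first entry.
    pos = bisect.bisect_left(index, (user, start_epoch))
    out = []
    for entry_user, epoch in index[pos:]:
        if entry_user != user or epoch >= end_epoch:
            break
        out.append((entry_user, epoch))
    return out
```

Seeking to the user's first entry instead would begin the scan at entries
older than the requested start time, which is the behaviour the commit
describes as the bug.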
|
Signed-off-by: Sage Weil <sage@inktank.com>
|
Reviewed-by: Sage Weil <sage@inktank.com>
|
We don't need it after all. If we are in the middle of some proposal,
then we guarantee that said proposal is likely to be retried. If we
haven't yet proposed, then it's ever more likely that a client will
eventually retry the message that triggered this proposal.
Basically, this mechanism attempted to fix a non-problem, and was in
fact triggering some unforeseen issues that would have required increasing
the code complexity for no good reason.
Fixes: #5102
Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
|
By finishing these Contexts, we make sure the Contexts they enclose (to be
called once the proposal goes through) will behave as they were initially
planned: for instance, a C_Command() may retry the command if a -EAGAIN
is passed to 'finish_contexts', while a C_Trimmed() will simply set
'going_to_trim' to false.
This aims at fixing at least one bug in which Paxos will stop trimming if
an election is triggered while a trim is queued but not yet finished. This
happens because it is the C_Trimmed() context that is responsible for
resetting 'going_to_trim' back to false. By clearing all the contexts on
the proposal list instead of finishing them, we stay forever unable to
trim Paxos again, as 'going_to_trim' will stay true till the end of time as
we know it.
Fixes: #4895
Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
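The difference between clearing and finishing the queued contexts can be
shown with a toy model. The Paxos class below and this Python
finish_contexts are simplified stand-ins written for illustration; only the
names 'going_to_trim', C_Trimmed and 'finish_contexts' come from the commit
message.

```python
import errno


class Paxos:
    def __init__(self):
        self.going_to_trim = False
        self.proposals = []  # queued completion callbacks

    def queue_trim(self):
        self.going_to_trim = True
        self.proposals.append(self._trimmed)

    def _trimmed(self, r):
        # Stand-in for C_Trimmed: reset the flag whatever the result code.
        self.going_to_trim = False


def finish_contexts(contexts, r):
    # Detach the queue first, then complete every context with result r.
    pending = list(contexts)
    contexts.clear()
    for ctx in pending:
        ctx(r)
```

Calling proposals.clear() after queue_trim() leaves going_to_trim set with
nothing left to reset it, while finish_contexts(proposals, -errno.EAGAIN)
runs the callback and resets the flag.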
|
Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
|
Reviewed-by: Samuel Just <sam.just@inktank.com>
|
Fix bug introduced in 27381c0c6259ac89f5f9c592b4bfb585937a1cfc.
Signed-off-by: Sage Weil <sage@inktank.com>
|
Fix a few bugs introduced by 27381c0c6259ac89f5f9c592b4bfb585937a1cfc:
- check against both front and back cons; either one may have failed.
- close *both* front and back before reopening either. This is
overkill, but slightly simpler code.
- fix leak of con when marking down
- handle race against osdmap update and note_down_osd
Fixes: #5172
Signed-off-by: Sage Weil <sage@inktank.com>
|
Fix some smaller Python issues
|