delta/ceph.git - github.com: ceph/ceph.git

	Commit message	Author	Age	Files	Lines
*	asdfwip-before	Sage Weil	2013-07-21	1	-0/+1
\|
*	mds: stay in SCAN state in file_eval	Sage Weil	2013-05-29	1	-0/+4
\| \| \| \| \| \| \|	If we are in the SCAN state, stay there until the recovery finishes. Do not jump to another state from file_eval(). Signed-off-by: Sage Weil <sage@inktank.com>
*	Makefile: include new message header files	Sage Weil	2013-05-29	1	-0/+2
\| \| \| \|	Signed-off-by: Sage Weil <sage@inktank.com>
*	Merge remote-tracking branch 'yan/wip-mds'	Sage Weil	2013-05-29	32	-703/+1534
\|\ \| \| \| \| \| \| \| \| \| \| \| \|	Reviewed-by: Sage Weil <sage@inktank.com> Conflicts: src/mds/MDCache.cc
\| *	mds: use "open-by-ino" function to open remote link	Yan, Zheng	2013-05-28	3	-20/+39
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Also add a new config option "mds_open_remote_link_mode". The anchor approach is used by default. If mode is non-zero, use the open-by-ino function. In case open-by-ino function fails, if mode is 1, retry using the anchor approach, otherwise trigger assertion. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
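The mode handling described in the message above can be sketched roughly as follows. This is an illustrative Python model, not the MDS code: only the option name `mds_open_remote_link_mode` comes from the commit message, while `open_via_anchor` and `open_by_ino` are hypothetical callables standing in for the two lookup approaches.

```python
def open_remote_link(ino, mode, open_via_anchor, open_by_ino):
    """Resolve a remote link according to mds_open_remote_link_mode.

    mode == 0: use the anchor approach (the default).
    mode != 0: try open-by-ino first; if that fails, mode 1 retries with
               the anchor approach and any other mode is treated as fatal.
    Both callables are hypothetical; each returns an inode object or None.
    """
    if mode == 0:
        return open_via_anchor(ino)
    inode = open_by_ino(ino)
    if inode is not None:
        return inode
    if mode == 1:
        # Fall back to the anchor approach.
        return open_via_anchor(ino)
    # Any other non-zero mode: the commit message says an assertion triggers.
    raise AssertionError(f"open-by-ino failed for ino {ino:#x}")
```

Passing the two lookups in as parameters keeps the sketch self-contained; in the real MDS both would presumably be asynchronous operations.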
\| *	mds: open missing cap inodes	Yan, Zheng	2013-05-28	6	-87/+185
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When a recovering MDS enters the reconnect stage, clients send reconnect messages to it. Each message lists open files, their paths, and issued caps. If an inode is not in the cache, the recovering MDS uses the path the client provides to determine whether it is the inode's authority. If not, the recovering MDS exports the inode's caps to another MDS. The issue here is that the path the client provides isn't always accurate. The fix is to use the recently added "open inode by ino" function to open any missing cap inodes when the recovering MDS enters the rejoin stage. Cache rejoin messages are sent to other MDSs after all caps' authorities are determined. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
\| *	mds: bump the protocol version	Yan, Zheng	2013-05-28	1	-1/+1
\| \| \| \| \| \| \| \|	Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
\| *	mds: open inode by ino	Yan, Zheng	2013-05-28	9	-7/+612
\| \|	This patch adds an "open-by-ino" helper. It uses the backtrace to find the inode's path and open the inode. The algorithm looks like:
\| \|	1. Check MDS peers. If any MDS has the inode in its cache, goto step 6.
\| \|	2. Fetch the backtrace. If a backtrace was previously fetched and the same backtrace is returned again, return -EIO.
\| \|	3. Traverse the path in the backtrace. If the inode is found, goto step 6; if a non-auth dirfrag is encountered, goto the next step. If the inode is not found in its parent dir, goto step 1.
\| \|	4. Request MDS peers to traverse the path in the backtrace. If the inode is found, goto step 6. If an MDS peer encounters a non-auth dirfrag, it stops traversing. If any MDS peer fails to find the inode in its parent dir, goto step 1.
\| \|	5. Use the same algorithm to open the inode's parent. Goto step 3 if it succeeds; goto step 1 if it fails.
\| \|	6. Return the inode's auth MDS ID.
\| \|	The algorithm has two main assumptions:
\| \|	1. If an inode is in its auth MDS's cache, its on-disk backtrace can be out of date.
\| \|	2. If an inode is not in any MDS's cache, its on-disk backtrace must be up to date.
\| \|	Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
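The retry loop in the algorithm above may be easier to follow as code. The following Python sketch is a deliberately reduced model covering only steps 1, 2, 3 and 6: the peer traversal (step 4) and the recursive open of the parent (step 5) are left out, and everything runs synchronously. The three callables are hypothetical stand-ins, not functions from the Ceph tree.

```python
import errno

_UNSET = object()  # sentinel: no backtrace has been fetched yet


def open_ino(ino, find_in_peer_caches, fetch_backtrace, traverse):
    """Return the auth MDS rank for `ino`, or -EIO if no progress is possible.

    Hypothetical callables (assumptions for this sketch):
      find_in_peer_caches(ino) -> auth MDS rank if some MDS has the inode
                                  cached, else None            (step 1)
      fetch_backtrace(ino)     -> on-disk backtrace as a list  (step 2)
      traverse(backtrace)      -> auth MDS rank if the inode was found by
                                  walking the path, else None  (step 3)
    """
    last_backtrace = _UNSET
    while True:
        # Step 1: check the MDS peers.
        rank = find_in_peer_caches(ino)
        if rank is not None:
            return rank  # step 6

        # Step 2: fetch the backtrace; getting the same one twice means
        # another pass over the same path cannot succeed.
        backtrace = fetch_backtrace(ino)
        if backtrace == last_backtrace:
            return -errno.EIO
        last_backtrace = backtrace

        # Step 3: traverse the path recorded in the backtrace.
        rank = traverse(backtrace)
        if rank is not None:
            return rank  # step 6
        # Not found in its parent dir: go back to step 1.
```

The `-EIO` check is what bounds the loop: under the second assumption in the commit message, an inode that is in no MDS cache has an up-to-date on-disk backtrace, so fetching an unchanged backtrace after a failed traversal indicates a real error rather than a stale path.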
\| *	mds: move fetch_backtrace() to class MDCache	Yan, Zheng	2013-05-28	4	-68/+45
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We may want to fetch backtrace while corresponding inode isn't instantiated. MDCache::fetch_backtrace() will be used by later patch. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
\| *	mds: remove old backtrace handling	Yan, Zheng	2013-05-28	5	-215/+14
\| \| \| \| \| \| \| \|	Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
\| *	mds: update backtraces when unlinking inodes	Yan, Zheng	2013-05-28	1	-4/+7
\| \| \| \| \| \| \| \| \| \| \| \|	Unlink moves inodes to the stray dir; it is a special form of rename. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
\| *	mds: bring back old style backtrace handling	Yan, Zheng	2013-05-28	9	-11/+180
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	To queue a backtrace update, the current code allocates a BacktraceInfo structure and adds it to the log segment's update_backtraces list. The main issue with this approach is that BacktraceInfo is independent of the inode, so it is very inconvenient to find pending backtrace updates for given inodes. When exporting inodes from one MDS to another, we need to find and cancel all pending backtrace updates on the source MDS. This patch brings back the old backtrace handling code and adapts it for the current backtrace format. The basic idea behind the old code is: when an inode's backtrace becomes dirty, add the inode to the log segment's dirty_parent_inodes list. Compared to the current backtrace handling, another difference is that the backtrace update is journalled in EMetaBlob::fullbit. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
\| *	mds: rename last_renamed_version to backtrace_version	Yan, Zheng	2013-05-28	5	-17/+21
\| \| \| \| \| \| \| \|	Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
\| *	mds: journal backtrace update in EMetaBlob::fullbit	Yan, Zheng	2013-05-28	2	-22/+53
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The current way to journal a backtrace update is to set EMetaBlob::update_bt to true. The problem is that an EMetaBlob can include several inodes. If an EMetaBlob's update_bt is true, the journal replay code has to queue backtrace updates for all inodes in the EMetaBlob. This patch adds two new flags to class EMetaBlob::fullbit so that it can journal backtrace updates. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
\| *	mds: reorder EMetaBlob::add_primary_dentry's parameters	Yan, Zheng	2013-05-28	6	-33/+33
\| \| \| \| \| \| \| \| \| \| \| \|	Prepare for adding new state parameters such as 'dirty_parent'. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
\| *	mds: warn on unconnected snap realms	Yan, Zheng	2013-05-28	1	-1/+14
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When there is more than one active MDS, restarting an MDS triggers the assertion "reconnected_snaprealms.empty()" quite often. If there is no snapshot in the FS, the items left in reconnected_snaprealms should be other MDS' mdsdir. I think it's harmless. If there are snapshots in the FS, the assertion probably can catch real bugs. But at present the snapshot feature is broken, and fixing it is non-trivial. So replace the assertion with a warning. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
\| *	mds: silence MDCache::trim_non_auth()	Yan, Zheng	2013-05-28	1	-20/+6
\| \| \| \| \| \| \| \| \| \| \| \|	No need to output the function's debug message to console. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
\| *	mds: fix check for base inode discovery	Yan, Zheng	2013-05-28	1	-1/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	If an MDiscover message is for discovering a base inode, want_base_dir should be false and path should be empty. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
\| *	mds: Fix replica's allowed caps for filelock in SYNC_LOCK state	Yan, Zheng	2013-05-28	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	For replica, filelock in LOCK_LOCK state doesn't allow Fc cap. So filelock in LOCK_SYNC_LOCK/LOCK_EXCL_LOCK state shouldn't allow Fc cap either. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
\| *	mds: defer releasing cap if necessary	Yan, Zheng	2013-05-28	2	-32/+58
\| \| \| \| \| \| \| \| \| \| \| \|	When an inode is freezing or frozen, we defer processing MClientCaps messages and cap releases embedded in requests. The same deferral logic should also cover MClientCapRelease messages.
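As a rough illustration of the deferral described above, the Python sketch below queues a cap release while an inode is freezing or frozen and replays it on unfreeze. The `Inode` class, its fields, and both functions are hypothetical simplifications for this example and do not mirror the actual CInode or Locker interfaces.

```python
class Inode:
    def __init__(self):
        self.freezing = False
        self.frozen = False
        self.unfreeze_waiters = []  # callbacks to run once the inode is unfrozen
        self.released = []          # cap releases that have actually been processed


def handle_cap_release(inode, release):
    """Process a cap release now, or queue it if the inode is freezing/frozen."""
    if inode.freezing or inode.frozen:
        # Defer: retry the same release once the inode is unfrozen.
        inode.unfreeze_waiters.append(lambda: handle_cap_release(inode, release))
        return False
    inode.released.append(release)
    return True


def unfreeze(inode):
    """Clear the freeze flags and replay every deferred release."""
    inode.freezing = inode.frozen = False
    waiters, inode.unfreeze_waiters = inode.unfreeze_waiters, []
    for retry in waiters:
        retry()
```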
\| *	mds: fix Locker::request_inode_file_caps()	Yan, Zheng	2013-05-28	1	-3/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	After sending a cache rejoin message, a replica needs to notify the auth MDS when cap_wanted changes. But it can send an MInodeFileCaps message only after receiving the auth MDS' rejoin ack. Locker::request_inode_file_caps() has correct wait logic, but it skips sending the MInodeFileCaps message if the auth MDS is still in rejoin state. The fix is to defer sending the MInodeFileCaps message until the auth MDS is active. This makes the function's wait logic less tricky. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
\| *	mds: notify auth MDS when cap_wanted changes	Yan, Zheng	2013-05-28	1	-2/+4
\| \| \| \| \| \| \| \| \| \| \| \|	So the auth MDS can choose locks' states based on our cap_wanted. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
\| *	mds: export CInode:mds_caps_wanted	Yan, Zheng	2013-05-28	2	-5/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	CInode:mds_caps_wanted is used to keep track of caps wanted by non-auth MDS. The auth MDS checks it when choosing locks' states. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
\| *	mds: export CInode::STATE_NEEDSRECOVER	Yan, Zheng	2013-05-28	4	-14/+21
\| \| \| \| \| \| \| \|	Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
\| *	mds: send slave request after target MDS is active	Yan, Zheng	2013-05-28	3	-13/+53
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When failure of a peer is detected, MDCache::handle_mds_failure() checks if there are requests waiting for slave replies from the failed peer, and adds them to the "wait for active peer" list. The "retry request" logic only covers slave requests sent before MDCache::handle_mds_failure() is called. If a slave request was sent while the peer wasn't up, we would wait for its reply forever. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
\| *	mds: unfreeze inode after rename rollback finishes	Yan, Zheng	2013-05-28	2	-27/+40
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	We should not wake up the unfreeze waiter while the inode is still linked to a non-auth dirfrag. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
\| *	mds: remove buggy cache rejoin code	Yan, Zheng	2013-05-28	1	-20/+11
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	I previously added code to handle a corner case of cache rejoin: an entire subtree, together with the inode the subtree root belongs to, was trimmed between sending the cache rejoin and receiving the rejoin ack. In this case, we should send a cache expire message to the subtree's auth MDS. But the code is completely broken, so remove it temporarily. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
\| *	mds: fix typo in Server::do_rename_rollback	Yan, Zheng	2013-05-28	1	-2/+2
\| \| \| \| \| \| \| \|	Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
\| *	mds: fix import cancel race	Yan, Zheng	2013-05-28	2	-32/+34
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The current code uses the import state to detect obsolete import discover/prep messages. This does not work in the following case: a subtree import is cancelled, the same subtree is imported again, and then the discover/prep message for the first import gets dispatched. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
\| *	mds: fix straydn race	Yan, Zheng	2013-05-28	4	-19/+39
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	For an unlink/rename request, the target dentry's linkage may change before all locks are acquired, so we need to check whether the existing stray dentry is valid. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
\| *	mds: fix slave commit tracking	Yan, Zheng	2013-05-28	3	-23/+36
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	An MDS may crash after journalling a slave commit but before sending the commit ack to the master. When the MDS later restarts, it will not send the commit ack, so the master waits for it forever. The fix is to remove the failed MDS from requests' uncommitted slave lists. When the failed MDS recovers, its resolve message tells the master which slave requests are not committed, and the master re-adds the recovering MDS to requests' uncommitted slave lists if necessary. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
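The bookkeeping change described above can be modelled in a few lines. This Python sketch is only an illustration of the idea: the class and function names are invented for the example, and a request is reduced to a set of slave ranks whose commit ack is still outstanding.

```python
class MasterRequest:
    """Hypothetical stand-in for a master request awaiting slave commit acks."""

    def __init__(self, reqid, slaves):
        self.reqid = reqid
        self.uncommitted_slaves = set(slaves)  # ranks we still expect an ack from

    def done(self):
        return not self.uncommitted_slaves


def handle_commit_ack(req, rank):
    req.uncommitted_slaves.discard(rank)


def handle_mds_failure(requests, failed_rank):
    # The fix: stop waiting on a failed MDS, which may have journalled the
    # commit already and will then never send the ack after restarting.
    for req in requests.values():
        req.uncommitted_slaves.discard(failed_rank)


def handle_resolve(requests, rank, uncommitted_reqids):
    # The recovering MDS reports which slave requests it has not committed;
    # re-add it to exactly those requests.
    for reqid in uncommitted_reqids:
        req = requests.get(reqid)
        if req is not None:
            req.uncommitted_slaves.add(rank)
```

With this split, a request whose slave had already committed before crashing completes as soon as the failure is noticed, while a request reported in the resolve message keeps waiting for a fresh ack.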
\| *	mds: fix uncommitted master wait	Yan, Zheng	2013-05-28	2	-15/+10
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	We may add a new waiter while the master is committing, so we should take the waiters and wake them up when the master is committed. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
\| *	mds: adjust subtree auth if import aborts in PREPPED state	Yan, Zheng	2013-05-28	1	-1/+7
\| \| \| \| \| \| \| \|	Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
\| *	mds: don't stop at export bounds when journaling dir context	Yan, Zheng	2013-05-28	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	We only journal the finish of exporting subtree, so we shouldn't consider export bounds as subtree root. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
\| *	mds: fix underwater dentry cleanup	Yan, Zheng	2013-05-28	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	If the underwater dentry is a remote link, we shouldn't mark the inode clean. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
\| *	mds: journal new subtrees created by rename	Yan, Zheng	2013-05-28	2	-1/+12
\| \| \| \| \| \| \| \| \| \| \| \|	This avoids creating bare dirfrags during journal replay. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
* \|	Merge pull request #329 from javacruft/wip-fuse-deps	Sage Weil	2013-05-29	1	-2/+2
\|\ \ \| \| \| \| \| \|	Use new fuse package instead of fuse-utils
\| * \|	Use new fuse package instead of fuse-utils	James Page	2013-05-29	1	-2/+2
\|/ / \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The fuse-utils package was deprecated a while ago. Switch the primary dependency for fuse tools to use the preferred 'fuse' package. Signed-off-by: James Page <james.page@ubuntu.com>
* \|	mon: disable tdump by default	Sage Weil	2013-05-28	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \|	Grr. Signed-off-by: Sage Weil <sage@inktank.com>
* \|	Merge remote-tracking branch 'gh/last'	Sage Weil	2013-05-28	12	-70/+111
\|\ \
\| * \|	v0.63	Gary Lowell	2013-05-28	2	-1/+7
\| \| \|
\| * \|	HashIndex: sync top directory during start_split,merge,col_split	Samuel Just	2013-05-28	1	-3/+12
\| \|/ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Otherwise, the links might be ordered after the in progress operation tag write. We need the in progress operation tag to correctly recover from an interrupted merge, split, or col_split. Fixes: #5180 Backport: cuttlefish, bobtail Signed-off-by: Samuel Just <sam.just@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com>
\| *	Merge branch 'wip_scrub_tphandle' into next	Samuel Just	2013-05-23	4	-33/+66
\| \|\ \| \| \| \| \| \| \| \| \| \| \| \|	Fixes: #5159 Reviewed-by: Sage Weil <sage@inktank.com>
\| \| *	PG: ping tphandle during omap loop as well	Samuel Just	2013-05-23	2	-0/+8
\| \| \| \| \| \| \| \| \| \| \| \|	Signed-off-by: Samuel Just <sam.just@inktank.com>
\| \| *	PG: reset timeout in _scan_list for each object, read chunk	Samuel Just	2013-05-23	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \|	Signed-off-by: Samuel Just <sam.just@inktank.com>
\| \| *	OSD,PG: pass tphandle down to _scan_list	Samuel Just	2013-05-23	3	-33/+56
\| \| \| \| \| \| \| \| \| \| \| \|	Signed-off-by: Samuel Just <sam.just@inktank.com>
\| * \|	rgw: iterate usage entries from correct entry	Yehuda Sadeh	2013-05-23	1	-3/+16
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Fixes: #5152 When iterating through usage entries, and when user id was provided, we started at the user's first entry and not from the entry indexed by the request start time. This commit fixes the issue. Backport: bobtail Signed-off-by: Yehuda Sadeh <yehuda@inktank.com> Reviewed-by: Greg Farnum <greg@inktank.com>
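The difference between starting at the user's first entry and starting at the entry indexed by the request start time can be shown with a small sorted index. This Python sketch assumes, purely for illustration, that usage entries are kept sorted by `(user, epoch)`; the real rgw key layout is not given in the commit message, and `iter_usage` is a made-up name.

```python
from bisect import bisect_left


def iter_usage(entries, user, start, end):
    """Yield (epoch, payload) for `user` with start <= epoch < end.

    `entries` is a list of (user, epoch, payload) tuples sorted by
    (user, epoch). This layout is an assumption made for the sketch.
    """
    keys = [(u, t) for u, t, _ in entries]
    # Seek straight to the first entry at or after (user, start) instead of
    # beginning at the user's very first entry.
    i = bisect_left(keys, (user, start))
    while i < len(entries):
        u, t, payload = entries[i]
        if u != user or t >= end:
            break
        yield t, payload
        i += 1
```

For example, with entries for `alice` at epochs 1, 5 and 9, a request for the range 5 to 10 yields only the entries at 5 and 9.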
\| * \|	mon: drop unnecessary conditionals	Sage Weil	2013-05-23	1	-5/+3
\| \| \| \| \| \| \| \| \| \| \| \|	Signed-off-by: Sage Weil <sage@inktank.com>
\| * \|	Merge pull request #311 from ceph/wip-5102	Sage Weil	2013-05-23	4	-30/+12
\| \|\ \ \| \| \| \| \| \| \| \|	Reviewed-by: Sage Weil <sage@inktank.com>
\| \| * \|	mon: Paxos: get rid of the 'prepare_bootstrap()' mechanism	Joao Eduardo Luis	2013-05-22	4	-26/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We don't need it after all. If we are in the middle of some proposal, then we guarantee that said proposal is likely to be retried. If we haven't yet proposed, then it's forever more likely that a client will eventually retry the message that triggered this proposal. Basically, this mechanism attempted to fix a non-problem, and was in fact triggering some unforeseen issues that would have required increasing the code complexity for no good reason. Fixes: #5102 Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>