Commit message  Author  Age  Files  Lines
* wip livenessinfo, misc [wip-osd-readhole]  Sage Weil  2012-11-30  3  -42/+46
|
* wip  Sage Weil  2012-11-30  6  -8/+88
|
* osd: include up_from in MOSDPing  Sage Weil  2012-11-30  2  -5/+14
| This identifies *which* instance of osd.NNN we are.
| Signed-off-by: Sage Weil <sage@inktank.com>
* more pg_interval_t changes... end_stamp, primary_up_from  Sage Weil  2012-11-30  2  -9/+18
|
* osd: HeartbeatSession -> HeartbeatClientSession  Sage Weil  2012-11-30  2  -4/+4
| This session is associated with the client-side (outgoing ping request) Connections.
| Signed-off-by: Sage Weil <sage@inktank.com>
* osd: add start_stamp to pg_interval_t  Sage Weil  2012-11-30  2  -2/+8
| Signed-off-by: Sage Weil <sage@inktank.com>
* hb msgr trainwreck  Sage Weil  2012-11-30  3  -15/+38
|
* Merge remote-tracking branch 'gh/wip-osd-msgr'  Sage Weil  2012-11-30  7  -147/+298
|\
| * OSDService: make messengers private  Samuel Just  2012-11-30  3  -24/+35
| | Signed-off-by: Samuel Just <sam.just@inktank.com>
| * osd/: make OSDService messenger helpers return ConnectionRef  Samuel Just  2012-11-30  5  -54/+54
| | Signed-off-by: Samuel Just <sam.just@inktank.com>
| * Revert "osd: fix leak of heartbeat con on reset"  Sage Weil  2012-11-29  1  -2/+0
| | This reverts commit b31a99abda75b9170a5805b02944a0c0c78245b7.
| * osd: move next_osdmap under separate lock  Sage Weil  2012-11-29  2  -13/+29
| | It doesn't actually interfere with publish_lock or the current osdmap ref.
| | Document what is going on. Always precede publish_map() with one or more pre_publish_map() calls.
| | Signed-off-by: Sage Weil <sage@inktank.com>
| * osd: fix leak of heartbeat con on reset  Sage Weil  2012-11-29  1  -0/+2
| | If we replace our old con, drop the reference.
| | Signed-off-by: Sage Weil <sage@inktank.com>
| * osd: use safe con helpers for scrub  Sage Weil  2012-11-29  1  -38/+40
| | Note that if we don't get a con our behavior largely does not matter, since we know we are about to get a Reset event anyway.
| | Signed-off-by: Sage Weil <sage@inktank.com>
| * osd: use safe con helpers from do_{infos,queries,notifies}  Sage Weil  2012-11-29  1  -12/+15
| | Ensure we don't reopen connections to down OSDs.
| | Signed-off-by: Sage Weil <sage@inktank.com>
| * osd: make _share_map_outgoing() use a Connection  Sage Weil  2012-11-29  3  -23/+38
| | Signed-off-by: Sage Weil <sage@inktank.com>
| * osd: fix Connection leaks  Sage Weil  2012-11-29  1  -0/+3
| | Messenger::get_connection() returns a reference. Put it.
| | Signed-off-by: Sage Weil <sage@inktank.com>
| * osd: add Connection-based send_map(), send_incremental_map()  Sage Weil  2012-11-29  2  -0/+36
| | Signed-off-by: Sage Weil <sage@inktank.com>
| * osd: use OSDService send_message helper from PG context  Sage Weil  2012-11-29  3  -52/+52
| | Use the OSDService helper to send messages to peers. This ensures that if we are on an older OSDMap the messages don't actually get sent to down OSDs that handle_osd_map has done mark_down() on.
| | Signed-off-by: Sage Weil <sage@inktank.com>
| * osd: simplify active_committed  Sage Weil  2012-11-29  2  -9/+7
| | Way back in 4b3bb5ab37a05fa001d59f24da7d9c30d650321b we changed this to pass an entity_inst_t down to fix a race. The refactor of the PG map handling made this unnecessary; remove it.
| | The PG's OSDMap is now coherent with respect to the PG when we take the lock, which is all that is needed here.
| | Signed-off-by: Sage Weil <sage@inktank.com>
| * osd: use safe OSDService msgr helpers for heartbeats  Sage Weil  2012-11-29  2  -4/+13
| | Get connections via the OSDService helper.
| | Signed-off-by: Sage Weil <sage@inktank.com>
| * osd: helpers to blacklist messages to down osds  Sage Weil  2012-11-29  2  -3/+61
| | There is a race between handle_osd_map -> note_down_osd() and PG threads:
| |  - handle_osd_map -> note_down_osd marks down an osd for epoch N
| |  - a pg thread with epoch <N sends a message to the (old) peer, reopening the msgr connection
| |  - nobody cleans up
| | Introduce a pre_publish_map() OSDService method and helpers for sending messages to peers. Pass in the epoch we are working from, and drop the message on the floor if the target OSD has since been marked down.
| | See #3548.
| | Signed-off-by: Sage Weil <sage@inktank.com>
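The epoch check described in this commit can be pictured with a small sketch. Everything here is hypothetical (the `DownTracker` and `Sender` types and their members are illustrative stand-ins, not Ceph's OSDService API); it only shows the idea of dropping a message when the target was marked down after the sender's epoch.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for the published map state: for each OSD id,
// the epoch at which it was marked down (absent means never marked down).
struct DownTracker {
    std::map<int, unsigned> down_at;

    // Called from the map-handling path when an OSD is marked down.
    void note_down_osd(int osd, unsigned epoch) { down_at[osd] = epoch; }

    // True if `osd` was marked down in an epoch newer than `from_epoch`.
    bool marked_down_since(int osd, unsigned from_epoch) const {
        auto it = down_at.find(osd);
        return it != down_at.end() && it->second > from_epoch;
    }
};

struct Sender {
    DownTracker tracker;
    std::vector<std::string> sent;  // messages actually handed to the messenger

    // Send `msg` to `osd` on behalf of a thread working from `from_epoch`.
    // Returns false and drops the message if the peer has since gone down,
    // so a stale thread never reopens a connection to it.
    bool send_message_osd(int osd, const std::string& msg, unsigned from_epoch) {
        if (tracker.marked_down_since(osd, from_epoch))
            return false;
        sent.push_back(msg);
        return true;
    }
};
```

With osd 3 marked down at epoch 10, a thread still on epoch 9 has its message dropped, while a thread that has caught up to epoch 10 is left to make its own decision.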
* | Merge remote-tracking branch 'gh/wip-mds-ls2'  Sage Weil  2012-11-30  5  -18/+27
|\ \
| | | Reviewed-by: Greg Farnum <greg@inktank.com>
| * | mds: assert segments not empty in get_current_segment()  Sage Weil  2012-11-29  2  -2/+8
| | | Only one caller can tolerate no segments; make a new peek_current_segment() for them.
| | | Motivated by paranoia tracking down a crash during client unmount, but it wasn't this.
| | | Signed-off-by: Sage Weil <sage@inktank.com>
| * | mds: be explicit about MDRequest killed state  Sage Weil  2012-11-29  2  -7/+8
| | | Set the killed flag and use that instead of inferring things from the session xlist.
| | | Signed-off-by: Sage Weil <sage@inktank.com>
| * | mds: drop redundant mdr->committing = true  Sage Weil  2012-11-29  1  -3/+0
| | | journal_and_reply() does this.
| | | Signed-off-by: Sage Weil <sage@inktank.com>
| * | mds: fix request_kill()  Sage Weil  2012-11-29  2  -6/+11
| | | Only request_cleanup() if the request isn't already committing. If it is, wait for it to commit before we clean up.
| | | It might fix all of #3531, #3210, #1947, and #1548. Maybe.
| | | Signed-off-by: Sage Weil <sage@inktank.com>
* | | doc: update kernel recs  Sage Weil  2012-11-30  1  -1/+4
|/ /
| | Mention which stable kernels we recommend.
| | Signed-off-by: Sage Weil <sage@inktank.com>
* | client: only dump cache on umount if we time out  Sage Weil  2012-11-29  1  -5/+8
| | We don't want to dump the cache every time an item is trimmed and the mount_cond gets signaled; this can make umount crazy-slow when logging is turned up.
| | Instead, only dump if we wait 5 seconds without making any progress on shrinking the cache.
| | Signed-off-by: Sage Weil <sage@inktank.com>
* | rgw: treat lack of swift token as anonymous user access  Yehuda Sadeh  2012-11-29  1  -0/+7
| | Fixes: #3534
| | If a swift token hasn't been provided, set user as anonymous.
| | Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
* | Merge branch 'next'  Sage Weil  2012-11-29  26  -134/+359
|\ \
| | | Conflicts: src/rgw/rgw_admin.cc
| * \ Merge remote-tracking branch 'gh/wip_next_bugs' into next  Sage Weil  2012-11-29  7  -27/+64
| |\ \
| | * | PG: scrubber.end should be exactly a boundary  Samuel Just  2012-11-29  2  -1/+10
| | | | Let scrubber.end be (foo, HEAD, 10), where foo is the oid, HEAD is the snap, and 10 is the hash, and let scrubber.begin similarly be (bar, 5, 1). After choosing to scan [(bar, 5, 1), (foo, HEAD, 10)), we block writes on that interval.
| | | | 1) A write might then come in for foo (which isn't blocked) and create a new snap (foo, 400, 10), which happens to fall in the interval. This will result in a crash in _scrub() when it attempts to compare clones, since it will get (foo, 400, 10) but not the head object (foo, HEAD, 10).
| | | | 2) Alternately, the write from 1) has already happened. When we scan the log, we find that 34'10 and 34'11 are the clone operation creating (foo, 400, 10) and the modify on (foo, HEAD, 10) respectively. Both primary and replica will wait for last_update_applied to be 34'10 before scanning, but last_update_applied will in fact skip to 34'11 since 34'10 and 34'11 happened in the same transaction. This can result in IO hanging on the scrubber interval.
| | | | Instead, we ensure that scrubber.end is exactly a hash boundary (the min hobject_t with the specified hash). No such object can exist since we don't create objects with empty oids, so no writes can occur on that object.
| | | | Signed-off-by: Samuel Just <sam.just@inktank.com>
| | * | ReplicatedPG: remove from snap_collections even without objects to trim  Samuel Just  2012-11-29  1  -5/+11
| | | | Also, make sure to write_info after updating snap_collections.
| | | | Signed-off-by: Samuel Just <sam.just@inktank.com>
| | * | OSD: get_or_create_pg return null if pool is gone  Samuel Just  2012-11-29  1  -0/+2
| | | | Signed-off-by: Samuel Just <sam.just@inktank.com>
| | * | OSD: history.last_epoch_started should start at 0  Samuel Just  2012-11-29  1  -9/+0
| | | | history.last_epoch_started marks a lower bound on the last epoch at which the pg went active. As with info.last_epoch_started, it should be 0 prior to the first activation.
| | | | Signed-off-by: Samuel Just <sam.just@inktank.com>
| | * | PG: maintain osd local last_epoch_started for find_best_info  Samuel Just  2012-11-29  4  -9/+30
| | | | In order to proceed with peering, we need an osd with a log including the last commit sent to a client. This translates to the oldest last_update from the infos of the most recent acting set to go active.
| | | | history.last_epoch_started gives us a lower bound on the last time the entire acting set persisted authoritative logs/infos. However, it doesn't indicate anything about the info/log on the osd which sent it. Thus, we will maintain an osd local info.last_epoch_started to determine which osds were actually active (and thus have the required log entries).
| | | | The max info.last_epoch_started in the prior set gives us an upper bound on the last interval during which writes occurred. The min last_update among the infos with that last_epoch_started must therefore be an upper bound on the oldest operation which clients consider committed. Any osd with an info.last_update past that version must be sufficient.
| | | | The observed bug was that an empty pg info with a last_epoch_started at the most recent interval pushed min_last_update_acceptable to eversion_t(). There were two down osds, but peering proceeded since the backfill peer did survive. However, its info was later disregarded because it was incomplete. An empty osd was then chosen as the best_info since its last_update was equal to min_last_update_acceptable. This caused the contents of the pg to be lost.
| | | | Signed-off-by: Samuel Just <sam.just@inktank.com>
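The selection rule in this commit message can be sketched as follows. This is a simplified illustration, not the real find_best_info(): `InfoSketch` and both functions are hypothetical, and versions are flattened to a single integer instead of an eversion_t.

```cpp
#include <algorithm>
#include <cassert>
#include <climits>
#include <vector>

// Hypothetical, simplified pg info as reported by one osd.
struct InfoSketch {
    int osd;
    unsigned last_epoch_started;  // osd-local: last epoch this osd went active
    unsigned long last_update;    // newest log entry this osd holds
};

// Min last_update among the infos that share the max last_epoch_started.
// Returns ULONG_MAX when `infos` is empty.
unsigned long min_last_update_acceptable(const std::vector<InfoSketch>& infos) {
    unsigned max_les = 0;
    for (const auto& i : infos)
        max_les = std::max(max_les, i.last_epoch_started);
    unsigned long min_lu = ULONG_MAX;
    for (const auto& i : infos)
        if (i.last_epoch_started == max_les)
            min_lu = std::min(min_lu, i.last_update);
    return min_lu;
}

// OSDs whose log reaches at least the acceptable version.
std::vector<int> acceptable_osds(const std::vector<InfoSketch>& infos) {
    const unsigned long bound = min_last_update_acceptable(infos);
    std::vector<int> out;
    for (const auto& i : infos)
        if (i.last_update >= bound)
            out.push_back(i.osd);
    return out;
}
```

In this sketch, an empty info that is wrongly credited with the most recent last_epoch_started drags the bound down to 0, so every osd (including the empty one) looks acceptable. That mirrors the failure the commit message describes.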
| | * | hobject_t: make max private  Samuel Just  2012-11-29  2  -3/+11
| | | | Signed-off-by: Samuel Just <sam.just@inktank.com>
| * | | Merge remote-tracking branch 'gh/wip-mon-osd-create-fix' into next  Sage Weil  2012-11-29  2  -1/+10
| |\ \ \
| | * | | mon: Monitor: don't allow '+' or '-' prefixed values on parse_pos_long()  Joao Eduardo Luis  2012-11-29  1  -0/+5
| | | | | Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
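A minimal sketch of the kind of check this commit's subject describes, assuming a helper that returns -1 on malformed input. `parse_pos_long_sketch` is a hypothetical name and does not reproduce the monitor's actual parse_pos_long().

```cpp
#include <cassert>
#include <cctype>
#include <cerrno>
#include <cstdlib>

// Hypothetical stand-in: parse a non-negative decimal long,
// returning -1 on any malformed input.
long parse_pos_long_sketch(const char* s) {
    if (s == nullptr || *s == '\0')
        return -1;
    // strtol() accepts a leading sign and leading whitespace, so "+5" or
    // "-0" would otherwise parse. Require the first character to be a digit.
    if (!std::isdigit(static_cast<unsigned char>(*s)))
        return -1;
    errno = 0;
    char* end = nullptr;
    long v = std::strtol(s, &end, 10);
    // Reject overflow, trailing garbage, and (defensively) negative results.
    if (errno == ERANGE || end == s || *end != '\0' || v < 0)
        return -1;
    return v;
}
```

Checking the first character before calling strtol() keeps the sign handling explicit instead of relying on the sign of the parsed value, which would let "-0" and "+5" through.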
| | * | | mon: OSDMonitor: return -EINVAL on not-a-uuid during 'osd create'  Joao Eduardo Luis  2012-11-29  1  -1/+5
| | | | | Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
| * | | | radosgw-admin: close storage before exit  Yehuda Sadeh  2012-11-29  1  -1/+12
| | | | | Fixes: #3560
| | | | | This will remove watches off notification objects.
| | | | | Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
| * | | | client: Fix for #3490 and config option to test  Sam Lang  2012-11-29  2  -1/+6
| | | | | If the mds revokes our cache cap, and we follow the _read_sync() path, on a zero-byte file the osd returns ENOENT. We need to replace ENOENT with a return of 0 in this case.
| | | | | Signed-off-by: Sam Lang <sam.lang@inktank.com>
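The result mapping described in this commit can be sketched as below. `normalize_read_result` is a hypothetical helper, not the Client code; it only shows the translation of ENOENT into a zero-length read.

```cpp
#include <cassert>
#include <cerrno>

// Hypothetical helper: `r` is the result of a synchronous object read,
// either a byte count or a negative errno. Per the commit message, the osd
// answers -ENOENT for a zero-byte file, which the caller should see as
// "0 bytes read" rather than as an error.
long normalize_read_result(long r) {
    if (r == -ENOENT)
        return 0;  // empty file: report zero bytes
    return r;      // pass through byte counts and all other errors
}
```

Other negative errno values are passed through unchanged, so genuine I/O failures remain visible to the caller.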
| * | | | test/libcephfs: Test reading an empty file  Sam Lang  2012-11-29  1  -0/+28
| | | | | This tests a bug (#3490) in the Client::_read_sync codepath, and should be run with conf->client_read_sync_always set to true.
| | | | | Signed-off-by: Sam Lang <sam.lang@inktank.com>
| * | | | Merge branch 'wip-mon-store-errorcheck' into next  Greg Farnum  2012-11-29  9  -97/+125
| |\ \ \ \
| | | | | | Reviewed-by: Joao Luis <joao.luis@inktank.com>
| | * | | | mon: add WARN_UNUSED_RESULT to the MonitorStore functions that return error codes  Greg Farnum  2012-11-28  1  -4/+6
| | | | | | Signed-off-by: Greg Farnum <greg@inktank.com>
| | * | | | mon: remove the silly write_bl_ss write_bl_ss_impl distinction  Greg Farnum  2012-11-28  2  -8/+1
| | | | | | It was introduced at the same time as all these unchecked return codes, but I can't tell why.
| | | | | | Signed-off-by: Greg Farnum <greg@inktank.com>
| | * | | | mon: convert store users with unchecked return codes to just assert on issues  Greg Farnum  2012-11-28  3  -58/+46
| | | | | | This will make them much more noticeable and reduce the odds of something writing data which assumes the previous op succeeded.
| | | | | | Signed-off-by: Greg Farnum <greg@inktank.com>
| | * | | | mon: update Paxos::read()'s successful read check  Greg Farnum  2012-11-28  1  -1/+1
| | | | | | It was returning success if it got back an error code; don't do that!
| | | | | | Signed-off-by: Greg Farnum <greg@inktank.com>
| | * | | | mon: add new get_bl_[sn|ss]_safe functions  Greg Farnum  2012-11-28  8  -19/+35
| | | | | | These functions are like the non-safe versions, but assert that there were no disk errors and have void return types. Change a bunch of callers who weren't checking the return code to use these variants instead.
| | | | | | (Unfortunately we can't make them default safe because several of the callers depend on getting back the length, and are perfectly happy with ENOENT producing a 0 return value.)
| | | | | | Signed-off-by: Greg Farnum <greg@inktank.com>
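The split between the length-returning getters and the asserting "safe" variants might look roughly like this. `StoreSketch`, `get_bl`, and `get_bl_safe` are hypothetical stand-ins on an in-memory map, not the MonitorStore signatures.

```cpp
#include <cassert>
#include <cerrno>
#include <map>
#include <string>

// Hypothetical in-memory store standing in for the monitor's on-disk store.
struct StoreSketch {
    std::map<std::string, std::string> data;
    bool disk_error = false;  // set to simulate an I/O failure

    // Non-safe variant: returns the value length, 0 if the key is missing
    // (ENOENT treated as empty), or a negative errno on disk error.
    int get_bl(const std::string& key, std::string& out) const {
        if (disk_error)
            return -EIO;
        auto it = data.find(key);
        if (it == data.end()) {
            out.clear();
            return 0;
        }
        out = it->second;
        return static_cast<int>(out.size());
    }

    // Safe variant: void return; asserts that no disk error occurred, so
    // callers that never checked the return code fail loudly instead.
    void get_bl_safe(const std::string& key, std::string& out) const {
        int r = get_bl(key, out);
        assert(r >= 0);
        (void)r;  // silence unused-variable warnings when NDEBUG is set
    }
};
```

Callers that need the length, or that are content with a missing key reading as 0, keep using the non-safe form; everyone else gets an assertion instead of a silently ignored error.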