author    Alan Conway <aconway@apache.org>  2013-08-30 14:11:53 +0000
committer Alan Conway <aconway@apache.org>  2013-08-30 14:11:53 +0000
commit    27d31ba355acfef3ec66c23e48864e88a358014b (patch)
tree      460cdf169eeacc80c71cb604169afb742703e08c /qpid/cpp
parent    72f706cac68f5e7146ca5facb0e72e9d35b4332b (diff)
NO-JIRA: Remove obsolete (never implemented) cluster design docs.
git-svn-id: https://svn.apache.org/repos/asf/qpid/trunk@1518975 13f79535-47bb-0310-9956-ffa450edef68
Diffstat (limited to 'qpid/cpp')
 qpid/cpp/design_docs/new-cluster-design.txt | 298 ----------------
 qpid/cpp/design_docs/new-cluster-plan.txt   | 340 ----------------
 2 files changed, 0 insertions(+), 638 deletions(-)
diff --git a/qpid/cpp/design_docs/new-cluster-design.txt b/qpid/cpp/design_docs/new-cluster-design.txt
deleted file mode 100644
index d29ecce445..0000000000
--- a/qpid/cpp/design_docs/new-cluster-design.txt
+++ /dev/null
@@ -1,298 +0,0 @@
--*-org-*-
-# Licensed to the Apache Software Foundation (ASF) under one
-# or more contributor license agreements. See the NOTICE file
-# distributed with this work for additional information
-# regarding copyright ownership. The ASF licenses this file
-# to you under the Apache License, Version 2.0 (the
-# "License"); you may not use this file except in compliance
-# with the License. You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing,
-# software distributed under the License is distributed on an
-# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-# KIND, either express or implied. See the License for the
-# specific language governing permissions and limitations
-# under the License.
-
-* A new design for Qpid clustering.
-
-** Issues with old cluster design
-
-See [[./old-cluster-issues.txt]]
-
-** A new cluster design.
-
-1. Clearly defined interface between broker code and cluster plug-in.
-
-2. Replicate queue events rather than client data.
- - Only requires consistent enqueue order.
-  - Events need only be serialized per-queue, allowing concurrency between queues.
- - Allows for replicated and non-replicated queues.
-
-3. Use a lock protocol to agree order of dequeues: only the broker
-   holding the lock can acquire & dequeue. No longer relies on
- identical state and lock-step behavior to cause identical dequeues
- on each broker.
-
-4. Use multiple CPG groups to process different queues in
- parallel. Use a fixed set of groups and hash queue names to choose
- the group for each queue.
-
-*** Requirements
-
-The cluster must provide these delivery guarantees:
-
-- client sends transfer: message must be replicated and not lost even if the local broker crashes.
-- client acquires a message: message must not be delivered on another broker while acquired.
-- client accepts message: message is forgotten, will never be delivered or re-queued by any broker.
-- client releases message: message must be re-queued on cluster and not lost.
-- client rejects message: message must be dead-lettered or discarded and forgotten.
-- client disconnects/broker crashes: acquired but not accepted messages must be re-queued on cluster.
-
-Each guarantee takes effect when the client receives a *completion*
-for the associated command (transfer, acquire, reject, accept).
-
-*** Broker receiving messages
-
-On receiving a message transfer, in the connection thread we:
-- multicast a message-received event.
-- enqueue and complete the transfer when it is self-delivered.
-
-Other brokers enqueue the message when they receive the message-received event.
-
-Enqueues are queued up with other queue operations to be executed in the
-thread context associated with the queue.
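-
-A minimal sketch of this flow (C++; event and function names here are
-illustrative, not the prototype's actual API):
-
-    // Connection thread: multicast, but do not enqueue locally yet.
-    void BrokerContext::routed(Queue& q, const Message& msg) {
-        multicast(MessageReceivedEvent(q.getName(), msg));
-    }
-
-    // CPG deliver thread, runs on every member including the sender.
-    void BrokerContext::deliver(const MessageReceivedEvent& event) {
-        Queue& q = findQueue(event.queueName);
-        q.enqueue(event.message);            // all members enqueue here
-        if (event.sender == self())          // self-delivery:
-            completeTransfer(event.message); // complete the client's transfer
-    }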
-
-*** Broker sending messages: moving queue ownership
-
-Each queue is *owned* by at most one cluster broker at a time. Only
-that broker may acquire or dequeue messages. The owner multicasts
-notification of messages it acquires/dequeues to the cluster.
-Periodically the owner hands over ownership to another interested
-broker, providing time-shared access to the queue among all interested
-brokers.
-
-We assume the same IO-driven dequeuing algorithm as the standalone
-broker with one modification: queues can be "locked". A locked queue
-is not available for dequeuing messages and will be skipped by the
-output algorithm.
-
-At any given time only those queues owned by the local broker will be
-unlocked.
-
-As messages are acquired/dequeued from unlocked queues by the IO threads
-the broker multicasts acquire/dequeue events to the cluster.
-
-When an unlocked queue has no more consumers with credit, or when a
-time limit expires, the broker relinquishes ownership by multicasting
-a release-queue event, allowing another interested broker to take
-ownership.
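-
-A sketch of the hand-over, hung off the prototype's QueueContext class
-(method and event names are illustrative):
-
-    // Only the owner leaves the queue unlocked for IO-thread dequeues.
-    void QueueContext::becomeOwner() {
-        unlock();                  // IO threads may now acquire/dequeue
-        startOwnershipTimer();     // bound our turn at the queue
-    }
-
-    // Called when consumers run out of credit or the time limit expires.
-    void QueueContext::relinquish() {
-        lock();                    // stop local dequeues first
-        multicast(ReleaseQueueEvent(queue.getName()));
-        // Another interested broker takes ownership on delivery.
-    }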
-
-*** Asynchronous completion of accept
-
-In acknowledged mode a message is not forgotten until it is accepted,
-to allow for requeue on rejection or crash. The accept should not be
-completed till the message has been forgotten.
-
-On receiving an accept the broker:
-- dequeues the message from the local queue
-- multicasts an "accept" event
-- completes the accept asynchronously when the dequeue event is self delivered.
-
-NOTE: The message store does not currently implement asynchronous
-completion of accept; this is a bug.
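-
-A sketch of the deferred completion (illustrative names; QueuedMessage
-stands in for whatever carries the queue position):
-
-    void BrokerContext::accept(Queue& q, QueuedMessage& msg) {
-        q.dequeue(msg);                         // remove from local queue
-        multicast(DequeueEvent(q.getName(), msg.position));
-        pendingAccepts.add(msg.position, msg);  // complete later
-    }
-
-    // CPG deliver thread: the accept completes only once the dequeue
-    // event has been self-delivered, i.e. seen by the whole cluster.
-    void BrokerContext::deliver(const DequeueEvent& event) {
-        if (event.sender == self())
-            pendingAccepts.complete(event.position);
-    }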
-
-*** Multiple CPG groups.
-
-The old cluster was bottlenecked by processing everything in a single
-CPG deliver thread.
-
-The new cluster uses a set of CPG groups, one per core. Queue names
-are hashed to give group indexes, so statistically queues are likely
-to be spread over the set of groups.
-
-Operations on a given queue always use the same group, so we keep
-order within each queue, while operations on different queues can use
-different groups. This gives greater throughput sending to CPG and
-multiple handler threads to process CPG messages.
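-
-A self-contained sketch of the name-to-group mapping. Since the mapping
-is effectively part of the cluster protocol (every member must agree),
-a stable hash such as FNV-1a is shown rather than boost::hash (see the
-hash-function task in new-cluster-plan.txt):
-
-    #include <string>
-    #include <stdint.h>
-
-    const uint32_t GROUP_COUNT = 8;  // fixed set, e.g. one per core
-
-    uint32_t hashName(const std::string& s) {  // 32-bit FNV-1a
-        uint32_t h = 2166136261u;
-        for (size_t i = 0; i < s.size(); ++i) {
-            h ^= (unsigned char)s[i];
-            h *= 16777619u;
-        }
-        return h;
-    }
-
-    // All operations on a queue use the same group, preserving
-    // per-queue order while different queues proceed in parallel.
-    uint32_t groupForQueue(const std::string& queueName) {
-        return hashName(queueName) % GROUP_COUNT;
-    }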
-
-** Inconsistent errors.
-
-An inconsistent error means that after multicasting an enqueue, accept
-or dequeue, some brokers succeed in processing it and others fail.
-
-The new design eliminates most sources of inconsistent errors in the
-old broker: connections, sessions, security, management etc. Only
-store journal errors remain.
-
-The new inconsistent error protocol is similar to the old one with one
-major improvement: brokers do not have to stall processing while an
-error is being resolved.
-
-** Updating new members
-
-When a new member (the updatee) joins a cluster it needs to be brought
-up to date with the rest of the cluster. An existing member (the
-updater) sends an "update".
-
-In the old cluster design the update is a snapshot of the entire
-broker state. To ensure consistency of the snapshot both the updatee
-and the updater "stall" at the start of the update, i.e. they stop
-processing multicast events and queue them up for processing when the
-update is complete. This creates a back-log of work to get through,
-which leaves them lagging behind the rest of the cluster till they
-catch up (which is not guaranteed to happen in bounded time).
-
-With the new cluster design only exchanges, queues, bindings and
-messages need to be replicated.
-
-We update individual objects (queues and exchanges) independently.
-- create queues first, then update all queues and exchanges in parallel.
-- multiple updater threads, per queue/exchange.
-
-Queue updater:
-- marks the queue position at the sync point
-- sends messages starting from the sync point working towards the head of the queue.
-- send "done" message.
-
-Queue updatee:
-- enqueues received from CPG: add to back of queue as normal.
-- dequeues received from CPG: apply if found, else save to check at end of update.
-- messages from updater: add to the *front* of the queue.
-- update complete: apply any saved dequeues.
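-
-A sketch of the updatee's dequeue bookkeeping (illustrative names;
-savedDequeues is a std::set<SequenceNumber> member):
-
-    // A dequeue from CPG may refer to a message the updater has not
-    // sent yet, so remember misses and re-apply them at the end.
-    void UpdateeQueue::cpgDequeue(const SequenceNumber& position) {
-        if (!queue.remove(position))
-            savedDequeues.insert(position);
-    }
-
-    void UpdateeQueue::updateComplete() {
-        std::set<SequenceNumber>::const_iterator i;
-        for (i = savedDequeues.begin(); i != savedDequeues.end(); ++i)
-            queue.remove(*i);  // apply dequeues that arrived early
-        savedDequeues.clear();
-    }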
-
-Exchange updater:
-- updater: send snapshot of exchange as it was at the sync point.
-
-Exchange updatee:
-- queue exchange operations after the sync point.
-- when snapshot is received: apply saved operations.
-
-Note:
-- Updater is active throughout, no stalling.
-- Consuming clients actually reduce the size of the update.
-- Updatee stalls clients until the update completes.
- (Note: May be possible to avoid updatee stall as well, needs thought)
-
-** Internal cluster interface
-
-The new cluster interface is similar to the MessageStore interface, but
-provides more detail (message positions) and some additional call
-points (e.g. acquire).
-
-The cluster interface captures these events:
-- wiring changes: queue/exchange declare/bind
-- message enqueued/acquired/released/rejected/dequeued.
-- transactional events.
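-
-A sketch of what such an interface might declare (the prototype's real
-interface lives in cpp/src/qpid/cluster/exp/; these signatures are
-illustrative only):
-
-    class Cluster {
-      public:
-        virtual ~Cluster() {}
-        // Wiring changes.
-        virtual void create(Queue&) = 0;
-        virtual void destroy(Queue&) = 0;
-        virtual void bind(Queue&, Exchange&, const std::string& key) = 0;
-        // Message life cycle, with positions for replication.
-        virtual void enqueue(Queue&, const Message&) = 0;
-        virtual void acquire(Queue&, const SequenceNumber&) = 0;
-        virtual void release(Queue&, const SequenceNumber&) = 0;
-        virtual void reject(Queue&, const SequenceNumber&) = 0;
-        virtual void dequeue(Queue&, const SequenceNumber&) = 0;
-        // Transactional events would be captured similarly.
-    };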
-
-** Maintainability
-
-This design gives us more robust code with clear and explicit interfaces.
-
-The cluster depends on specific events clearly defined by an explicit
-interface. Provided the semantics of this interface are not violated,
-the cluster will not be broken by changes to broker code.
-
-The cluster no longer requires identical processing of the entire
-broker stack on each broker. It is not affected by the details of how
-the broker allocates messages. It is independent of the
-protocol-specific state of connections and sessions and so is
-protected from future protocol changes (e.g. AMQP 1.0).
-
-A number of specific ways the code will be simplified:
-- drop code to replicate management model.
-- drop timer workarounds for TTL, management, heartbeats.
-- drop "cluster-safe assertions" in broker code.
-- drop connections, sessions, management from cluster update.
-- drop security workarounds: cluster code now operates after message decoding.
-- drop connection tracking in cluster code.
-- simpler inconsistent-error handling code, no need to stall.
-
-** Performance
-
-The standalone broker processes _connections_ concurrently, so CPU
-usage increases as you add more connections.
-
-The new cluster processes _queues_ concurrently, so CPU usage increases as you
-add more queues.
-
-In both cases, CPU usage peaks when the number of "units of
- concurrency" (connections or queues) goes above the number of cores.
-
-When all consumers on a queue are connected to the same broker the new
-cluster uses the same message allocation threading/logic as a
-standalone broker, with a little extra asynchronous book-keeping.
-
-If a queue has multiple consumers connected to multiple brokers, the
-new cluster time-shares the queue which is less efficient than having
-all consumers on a queue connected to the same broker.
-
-** Flow control
-The new design does not queue up CPG-delivered messages; they are
-processed immediately in the CPG deliver thread. This means that CPG's
-flow control is sufficient for qpid.
-
-** Live upgrades
-
-Live upgrade refers to the ability to upgrade a cluster while it is
-running, with no downtime. Each broker in the cluster is shut down,
-and then re-started with a new version of the broker code.
-
-To achieve this:
-- The cluster protocol XML file has a new element <version number=N> attached
- to each method. This is the version at which the method was added.
-- New versions can only add methods, existing methods cannot be changed.
-- The cluster handshake for new members includes the protocol version
- at each member.
-- Each cpg message starts with header: version,size. Allows new encodings later.
-- Brokers ignore messages of higher version.
-
-- The cluster's version is the lowest version among its members.
-- A newer broker can join an older cluster. When it does, it must restrict
-  itself to speaking the older protocol version.
-- When the cluster version increases (because the lowest version member has left)
- the remaining members may move up to the new version.
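-
-A sketch of the per-message header (field widths are illustrative):
-
-    #include <stdint.h>
-
-    struct CpgMessageHeader {
-        uint16_t version;  // encoding version of the body
-        uint32_t size;     // body size in bytes, allows skipping
-    };
-
-    // Receivers ignore bodies encoded at a higher version than they
-    // understand; size lets them skip to the next message.
-    bool shouldDecode(const CpgMessageHeader& h, uint16_t myVersion) {
-        return h.version <= myVersion;
-    }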
-
-
-* Design debates
-** Active/active vs. active/passive
-
-An active-active cluster can be used in an active-passive mode. In
-this mode we would like the cluster to be as efficient as a strictly
-active-passive implementation.
-
-An active/passive implementation allows some simplifications over active/active:
-- drop Queue ownership and locking
-- don't need to replicate message acquisition.
-- can do immediate local enqueue and still guarantee order.
-
-Active/passive introduces a few extra requirements:
-- Exactly one broker has to take over if primary fails.
-- Passive members must refuse client connections.
-- On failover, clients must re-try all known addresses till they find the active member.
-
-Active/active benefits:
-- A broker failure only affects the subset of clients connected to that broker.
-- Clients can switch to any other broker on failover
-- Backup brokers are immediately available on failover.
-- As long as a client can connect to any broker in the cluster, it can be served.
-
-Active/passive benefits:
-- Don't need to replicate message allocation, can feed consumers at top speed.
-
-Active/passive drawbacks:
-- All clients on one node so a failure affects every client in the
- system.
-- After a failure there is a "reconnect storm" as every client
- reconnects to the new active node.
-- After a failure there is a period where no broker is active, until
- the other brokers realize the primary is gone and agree on the new
- primary.
-- Clients must find the single active node, may involve multiple
- connect attempts.
-- No service if a partition separates a client from the active broker,
- even if the client can see other brokers.
-
-
diff --git a/qpid/cpp/design_docs/new-cluster-plan.txt b/qpid/cpp/design_docs/new-cluster-plan.txt
deleted file mode 100644
index 042e03f177..0000000000
--- a/qpid/cpp/design_docs/new-cluster-plan.txt
+++ /dev/null
@@ -1,340 +0,0 @@
--*-org-*-
-
-# Licensed to the Apache Software Foundation (ASF) under one
-# or more contributor license agreements. See the NOTICE file
-# distributed with this work for additional information
-# regarding copyright ownership. The ASF licenses this file
-# to you under the Apache License, Version 2.0 (the
-# "License"); you may not use this file except in compliance
-# with the License. You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing,
-# software distributed under the License is distributed on an
-# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-# KIND, either express or implied. See the License for the
-# specific language governing permissions and limitations
-# under the License.
-
-* Status of implementation
-
-Meaning of priorities:
-[#A] Essential for basic functioning.
-[#B] Required for first release.
-[#C] Can be addressed in a later release.
-
-The existing prototype is bare bones, to run performance benchmarks:
-- Implements the publish and consumer locking protocol.
-- Deferred delivery and asynchronous completion of messages.
-- Optimizes the case where all consumers are on the same node.
-- No new member updates, no failover updates, no transactions, no persistence etc.
-
-Prototype code is on branch qpid-2920-active, in cpp/src/qpid/cluster/exp/
-
-** Similarities to existing cluster.
-
-/Active-active/: the new cluster can be a drop-in replacement for the
-old; existing tests & customer deployment configurations are still
-valid.
-
-/Virtual synchrony/: Uses corosync to co-ordinate activity of members.
-
-/XML controls/: Uses XML to define the primitives multicast to the
-cluster.
-
-** Differences with existing cluster.
-
-/Report rather than predict consumption/: brokers explicitly tell each
-other which messages have been acquired or dequeued. This removes the
-major cause of bugs in the existing cluster.
-
-/Queue consumer locking/: to avoid duplicates, only one broker can
-acquire or dequeue messages at a time - the one holding the
-consume-lock on the queue. If multiple brokers are consuming from the
-same queue the lock is passed around to time-share access to the queue.
-
-/Per-queue concurrency/: uses a fixed-size set of CPG groups (reflecting
-the concurrency of the host) to allow concurrent processing on
-different queues. Queues are hashed onto the groups.
-
-* Completed tasks
-** DONE [#A] Minimal POC: publish/acquire/dequeue protocol.
- CLOSED: [2011-10-05 Wed 16:03]
-
-Defines the broker::Cluster interface and call points.
-Initial interface committed.
-
-Main classes:
-- Core: central object holding cluster classes together (replaces cluster::Cluster).
-- BrokerContext: implements the broker::Cluster interface.
-- QueueContext: attached to a broker::Queue, holds cluster status.
-- MessageHolder: holds local messages while they are being enqueued.
-
-Implements multiple CPG groups for better concurrency.
-
-** DONE [#A] Large message replication.
- CLOSED: [2011-10-05 Wed 17:22]
-Multicast using fixed-size (64k) buffers, allowing fragmentation of messages across buffers (frame by frame).
-
-* Design Questions
-** [[Queue sequence numbers vs. independent message IDs]]
-
-Current prototype uses queue+sequence number to identify message. This
-is tricky for updating new members as the sequence numbers are only
-known on delivery.
-
-Independent message IDs that can be generated and sent as part of the
-message simplify this and potentially allow performance benefits by
-relaxing total ordering. However they require additional map lookups
-that hurt performance.
-
-- [X] Prototype independent message IDs, check performance.
-Throughput was worse by 30% in the contended case, 10% in the uncontended case.
-
-* Tasks to match existing cluster
-** TODO [#A] Review old cluster code for more tasks. 1
-** TODO [#A] Put cluster enqueue after all policy & other checks.
-
-gsim: we do the policy check after multicasting the enqueue, so we
-could have an inconsistent outcome.
-
-aconway: Multicast should be after enqueue and any other code that may
-decide to send/not send the message.
-
-gsim: while later is better, is moving it that late the right thing?
-That will mean for example that any dequeues triggered by the enqueue
-(e.g. ring queue or lvq) will happen before the enqueue is broadcast.
-
-** TODO [#A] Defer and async completion of wiring commands. 5
-Testing requirement: many tests assume wiring changes are visible
-across the cluster once the wiring command completes.
-
-Name clashes: avoid race if same name queue/exchange declared on 2
-brokers simultaneously.
-
-Ken async accept, never merged: http://svn.apache.org/viewvc/qpid/branches/qpid-3079/
-
-Clashes with non-replicated: see [[Allow non-replicated]] below.
-
-** TODO [#A] Defer & async completion for explicit accept.
-
-Explicit accept currently ignores the consume lock. Defer and complete
-it when the lock is acquired.
-
-** TODO [#A] Update to new members joining. 10.
-
-Need to resolve [[Queue sequence numbers vs. independent message IDs]]
-first.
-- implicit sequence numbers are trickier to replicate to a new member.
-
-Update individual objects (queues and exchanges) independently.
-- create queues first, then update all queues and exchanges in parallel.
-- multiple updater threads, per queue/exchange.
-- updater sends messages to special exchange(s) (not using extended AMQP controls)
-
-Queue updater:
-- marks the queue position at the sync point
-- sends messages starting from the sync point working towards the head of the queue.
-- send "done" message.
-Note: updater remains active throughout, consuming clients actually reduce the
-size of the update.
-
-Queue updatee:
-- enqueues received from CPG: add to back of queue as normal.
-- dequeues received from CPG: apply if found, else save to check at end of update.
-- messages from updater: add to the *front* of the queue.
-- update complete: apply any saved dequeues.
-
-Exchange updater:
-- updater: send snapshot of exchange as it was at the sync point.
-
-Exchange updatee:
-- queue exchange operations after the sync point.
-- when snapshot is received: apply saved operations.
-
-Updater remains active throughout.
-Updatee stalls clients until the update completes.
-
-Updating queue/exchange/binding objects is via the same encode/decode
-that is used by the store. Updatee to use recovery interfaces to
-recover?
-
-** TODO [#A] Failover updates to client. 2
-Implement the amq.failover exchange to notify clients of membership.
-** TODO [#A] Passing all existing cluster tests. 5
-
-The new cluster should be a drop-in replacement for the old, so it
-should be able to pass all the existing tests.
-
-** TODO [#B] Initial status protocol. 3
-Handshake to give status of each broker member to new members joining.
-Status includes
-- cluster protocol version.
-- persistent store state (clean, dirty)
-- make it extensible, so additional state can be added in new protocols
-
-Clean store if last man standing or clean shutdown.
-Need to add multicast controls for shutdown.
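-
-A sketch of an extensible status body (illustrative; a real encoding
-would use the cluster's framing):
-
-    #include <stdint.h>
-
-    struct InitialStatus {
-        uint32_t protocolVersion;  // cluster protocol version
-        uint8_t  storeState;       // 0 = none, 1 = clean, 2 = dirty
-        // Extensible: new fields are appended, and older members
-        // ignore trailing bytes they do not understand.
-    };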
-
-** TODO [#B] Persistent cluster startup. 4
-
-Based on existing code:
-- Dirty/clean store state exchanged in the initial status.
-- Only one broker recovers from store, others update.
-** TODO [#B] Replace boost::hash with our own hash function. 1
-The hash function is effectively part of the interface so
-we need to be sure it doesn't change underneath us.
-
-** TODO [#B] Management model. 3
-Alerts for inconsistent message loss.
-
-** TODO [#B] Management methods that modify queues. 5
-
-Replicate management methods that modify queues - e.g. move, purge.
-Target broker may not have all messages on other brokers for purge/destroy.
-- Queue::purge() - wait for lock, purge local, mcast dequeues.
-- Queue::move() - wait for lock, move msgs (mcasts enqueues), mcast dequeues.
-- Queue::destroy() - messages to alternate exchange on all brokers.
-- Queue::get() - ???
-Need to add callpoints & mcast messages to replicate these?
-** TODO [#B] TX transaction support. 5
-Extend broker::Cluster interface to capture transaction context and completion.
-Running brokers exchange TX information.
-New broker update includes TX information.
-
- // FIXME aconway 2010-10-18: As things stand the cluster is not
- // compatible with transactions
- // - enqueues occur after routing is complete
- // - no call to Cluster::enqueue, should be in Queue::process?
- // - no transaction context associated with messages in the Cluster interface.
- // - no call to Cluster::accept in Queue::dequeueCommitted
-
-Injecting holes into a queue:
-- Multicast a 'non-message' that just consumes one queue position.
-- Used to reserve a message ID (position) for a non-committed message.
-- Also could allow non-replicated messages on a replicated queue if required.
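-
-A sketch of such a hole event (illustrative names):
-
-    // A 'non-message' that consumes exactly one queue position.
-    struct HoleEvent {
-        std::string queueName;
-    };
-
-    // On delivery every member advances the queue's sequence counter
-    // without enqueuing anything, reserving that position.
-    void BrokerContext::deliver(const HoleEvent& event) {
-        findQueue(event.queueName).advancePosition();
-    }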
-
-** TODO [#B] DTX transaction support. 5
-Extend broker::Cluster interface to capture transaction context and completion.
-Running brokers exchange DTX information.
-New broker update includes DTX information.
-
-** TODO [#B] Replicate message groups?
-Message groups may require additional state to be replicated.
-** TODO [#B] Replicate state for Fairshare?
-gsim: fairshare would need explicit code to keep it in sync across
-nodes; that may not be required however.
-** TODO [#B] Timed auto-delete queues?
-gsim: may need specific attention?
-** TODO [#B] Async completion of accept. 4
-When this is fixed in the standalone broker, it should be fixed for cluster.
-
-** TODO [#B] Network partitions and quorum. 2
-Re-use existing implementation.
-
-** TODO [#B] Review error handling, put in a consistent model. 4.
-- [ ] Review all asserts, for possible throw.
-- [ ] Decide on fatal vs. non-fatal errors.
-
-** TODO [#B] Implement inconsistent error handling policy. 5
-What to do if a message is enqueued successfully on some broker(s),
-but fails on other(s) - e.g. due to store limits?
-- fail on local broker = possible message loss.
-- fail on non-local broker = possible duplication.
-
-We have more flexibility now; we don't *have* to crash
-- but we've lost some of our redundancy guarantee - how do we inform the user?
-
-Options to respond to inconsistent error:
-- stop broker
-- reset broker (exec a new qpidd)
-- reset queue
-- log critical
-- send management event
-
-Most important is to inform of the risk of message loss.
-Current favourite: reset queue + log critical + management event.
-Configurable choices?
-
-** TODO [#C] Allow non-replicated exchanges, queues. 5
-
-3 levels set in declare arguments:
-- qpid.replicate=no - nothing is replicated.
-- qpid.replicate=wiring - queues/exchanges are replicated but not messages.
-- qpid.replicate=yes - queues, exchanges and messages are replicated.
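-
-A sketch of interpreting the declare argument (the enum and the default
-are illustrative; the default is an open question below):
-
-    enum ReplicateLevel { REPLICATE_NONE, REPLICATE_WIRING, REPLICATE_ALL };
-
-    ReplicateLevel replicateLevel(const qpid::framing::FieldTable& args) {
-        std::string s = args.getAsString("qpid.replicate");
-        if (s == "no")     return REPLICATE_NONE;
-        if (s == "wiring") return REPLICATE_WIRING;
-        // Treat "yes" (and, for now, anything else) as fully replicated.
-        return REPLICATE_ALL;
-    }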
-
-Wiring use case: it's OK to lose some messages (up to the max depth of
-the queue) but the queue/exchange structure must be highly available
-so clients can resume communication after failover.
-
-Configurable default? Default same as old cluster?
-
-Need to:
-- save replication status to the store (in arguments).
-- support in management tools.
-
-Avoid name clashes between replicated/non-replicated: multicast
-local-only names as well; all brokers keep a map and refuse to create
-clashes.
-
-** TODO [#C] Handling immediate messages in a cluster. 2
-Include remote consumers in the decision to deliver an immediate message.
-* Improvements over existing cluster
-** TODO [#C] Remove old cluster hacks and workarounds.
-The old cluster has workarounds in the broker code that can be removed.
-- [ ] drop code to replicate management model.
-- [ ] drop timer workarounds for TTL, management, heartbeats.
-- [ ] drop "cluster-safe assertions" in broker code.
-- [ ] drop connections, sessions, management from cluster update.
-- [ ] drop security workarounds: cluster code now operates after message decoding.
-- [ ] drop connection tracking in cluster code.
-- [ ] simpler inconsistent-error handling code, no need to stall.
-
-** TODO [#C] Support for live upgrades.
-
-Allow brokers in a running cluster to be replaced one-by-one with a new version.
-(see new-cluster-design for design notes.)
-
-The old cluster protocol was unstable because any changes in broker
-state caused changes to the cluster protocol. The new design should be
-much more stable.
-
-Points to implement in anticipation of live upgrade:
-- Prefix each CPG message with a version number and length.
- Version number determines how to decode the message.
-- Brokers ignore messages that have a higher version number than they understand.
-- Protocol version XML element in cluster.xml, on each control.
-- Initial status protocol to include protocol version number.
-
-New member updates: use the store encode/decode for updates, with the
-same backward-compatibility strategy as the store. This allows for
-adding new elements to the end of structures but not changing or
-removing existing elements.
-
-NOTE: Any change to the association of CPG group names and queues will
-break compatibility. How to work around this?
-
-** TODO [#C] Refactoring of common concerns.
-
-There are a bunch of things that act as "Queue observers" with intercept
-points in similar places.
-- QueuePolicy
-- QueuedEvents (async replication)
-- MessageStore
-- Cluster
-
-Look for ways to capitalize on the similarity & simplify the code.
-
-In particular QueuedEvents (async replication) strongly resembles
-cluster replication, but over TCP rather than multicast.
-
-** TODO [#C] Support for AMQP 1.0.
-
-* Testing
-** TODO [#A] Pass all existing cluster tests.
-Requires [[Defer and async completion of wiring commands.]]
-** TODO [#A] New cluster tests.
-Stress tests & performance benchmarks focused on changes in new cluster:
-- concurrency by queues rather than connections.
-- different handling of shared queues when consumers are on different brokers.