author     Justin Ross <jross@apache.org>  2016-04-20 00:02:02 +0000
committer  Justin Ross <jross@apache.org>  2016-04-20 00:02:02 +0000
commit     a835fb2724824dcd8a470fb51424cedeb6b38f62 (patch)
tree       48e5d8591c0029ac500330bf87b78bf9a99ed238 /qpid/cpp/docs/design
parent     da7718ef463775acc7d6fbecf2d64c1bbfc39fd8 (diff)
download   qpid-python-a835fb2724824dcd8a470fb51424cedeb6b38f62.tar.gz
QPID-7207: Rename and relocate files inside the cpp subtree
git-svn-id: https://svn.apache.org/repos/asf/qpid/trunk@1740034 13f79535-47bb-0310-9956-ffa450edef68
Diffstat (limited to 'qpid/cpp/docs/design')
-rw-r--r--  qpid/cpp/docs/design/CONTENTS                                  31
-rw-r--r--  qpid/cpp/docs/design/DispatchHandle.odg                       bin  0 -> 12481 bytes
-rw-r--r--  qpid/cpp/docs/design/broker-acl-work.txt                      156
-rw-r--r--  qpid/cpp/docs/design/ha-transactions.md                       197
-rw-r--r--  qpid/cpp/docs/design/log-model-category-for-correlation.txt   131
-rw-r--r--  qpid/cpp/docs/design/new-ha-design.txt                        304
-rw-r--r--  qpid/cpp/docs/design/old-cluster-issues.txt                    81
-rw-r--r--  qpid/cpp/docs/design/overview.txt                              97
-rw-r--r--  qpid/cpp/docs/design/windows_clfs_store_design.txt            258
9 files changed, 1255 insertions, 0 deletions
diff --git a/qpid/cpp/docs/design/CONTENTS b/qpid/cpp/docs/design/CONTENTS
new file mode 100644
index 0000000000..cc3b868e0e
--- /dev/null
+++ b/qpid/cpp/docs/design/CONTENTS
@@ -0,0 +1,31 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+
+This directory contains documentation about the C++ source
+that is expressed in formats that do not fit comfortably
+within C++ source files.
+
+As with all documentation, including comments, it may become
+outmoded with respect to the code.
+
+If you find external code documentation useful in your work -- if it
+helps you save some time -- please return some of that time
+in the form of effort to keep the documentation updated.
+
+
diff --git a/qpid/cpp/docs/design/DispatchHandle.odg b/qpid/cpp/docs/design/DispatchHandle.odg
new file mode 100644
index 0000000000..c08b3a4e1a
--- /dev/null
+++ b/qpid/cpp/docs/design/DispatchHandle.odg
Binary files differ
diff --git a/qpid/cpp/docs/design/broker-acl-work.txt b/qpid/cpp/docs/design/broker-acl-work.txt
new file mode 100644
index 0000000000..e587dc5198
--- /dev/null
+++ b/qpid/cpp/docs/design/broker-acl-work.txt
@@ -0,0 +1,156 @@
+-*-org-*-
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+The broker is accumulating ACL features and additions. This document describes the features and some of the strategies and decisions made along the way.
+
+These changes are not coordinated with the Java Broker.
+
+Queue Limit Property Settings
+=============================
+
+Customer Goal: Prevent users from making queues too small or too big
+in memory and on disk.
+
+* Add property limit settings to CREATE QUEUE Acl rules.
+
+User Option            Acl Limit Property        Units
+---------------------  ------------------------  --------------------------------------------
+qpid.max_size          queuemaxsizelowerlimit    bytes
+                       queuemaxsizeupperlimit    bytes
+qpid.max_count         queuemaxcountlowerlimit   messages
+                       queuemaxcountupperlimit   messages
+qpid.file_size         filemaxsizelowerlimit     pages (64KB per page)
+                       filemaxsizeupperlimit     pages (64KB per page)
+qpid.file_count        filemaxcountlowerlimit    files
+                       filemaxcountupperlimit    files
+qpid.max_pages_loaded  pageslowerlimit           pages
+                       pagesupperlimit           pages
+qpid.page_factor       pagefactorlowerlimit      integer (multiple of the platform-defined page size)
+                       pagefactorupperlimit      integer (multiple of the platform-defined page size)
+
+
+* Change rule match behavior to accommodate limit settings
+
+** For normal properties, a mismatch causes the Acl rule processor to move on to the next rule. Property limit settings do not cause a rule mismatch.
+** When a property limit check is violated, the effect is to demote an allow rule into a deny rule. Property limit checks are ignored in deny rules.
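+
+For illustration only (the user, queue name, and values here are hypothetical;
+the limit property names are the ones listed above), a CREATE QUEUE rule with
+limit properties might look like:
+
+  acl allow bob@QPID.COM create queue name=bob-work-* queuemaxsizelowerlimit=65536 queuemaxsizeupperlimit=1048576 queuemaxcountupperlimit=5000
+  acl deny all create queue
+  acl allow all all
+
+If bob@QPID.COM declares a queue whose qpid.max_size falls outside the range
+65536..1048576, the allow rule is demoted to a deny as described above.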
+
+Routingkey Wildcard Match
+=========================
+
+Customer Goal: Allow users to bind, unbind, access, and publish with wildcards in the routingkey property. A single trailing * wildcard match is insufficient.
+
+* Acl rule processing uses the broker's topic exchange match logic when matching any exchange rule with a "routingkey" property.
+
+* Acl rule writers get to use the same rich matching logic that the broker uses when, for instance, it decides which topic exchange binding keys satisfy an incoming message's routing key.
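+
+For example (hypothetical rules; '*' matches exactly one dotted field and '#'
+matches zero or more, as in topic exchange binding keys):
+
+  acl allow sam@QPID.COM bind exchange name=amq.topic routingkey=news.#
+  acl allow sam@QPID.COM publish exchange name=amq.topic routingkey=news.uk.*
+  acl deny all publish exchange name=amq.topic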
+
+User Name and Domain Name Symbol Substitution
+=============================================
+
+Customer Goal: Create rules that allow users to access resources only when the user's name is embedded in the resource name.
+
+* The Acl rule processor defines keywords which are substituted with the user's user and domain name.
+
+* User name substitution is allowed in the Acl file anywhere that text is supplied for a property value.
+
+In the following table an authenticated user bob@QPID.COM has his substitution keywords expanded.
+
+| Keyword | Expansion |
+|---------------+--------------|
+| ${userdomain} | bob_QPID_COM |
+| ${user} | bob |
+| ${domain} | QPID_COM |
+
+* User names are normalized by changing asterisk '*' and period '.' to underscores. This allows substitution to work with routingkey specifications.
+
+* The Acl processor matches ${userdomain} before matching either ${user} or ${domain}. Rules that specify ${user}_${domain} will never match.
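+
+As a hypothetical example, the following rules let each user manage only a
+work queue carrying their own (normalized) user name:
+
+  acl allow all create queue name=${user}-work
+  acl allow all consume queue name=${user}-work
+  acl allow all delete queue name=${user}-work
+  acl deny all create queue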
+
+Resource Quotas
+===============
+
+The Acl module provides broker command line switches to limit users' access to queues and connections.
+
+| Command Line Option | Specified Quota | Default |
+|------------------------------+--------------------------+---------|
+| --max-connections-per-user N | connections by user name | 0 |
+| --max-connections-per-IP N | connections by host name | 0 |
+| --max-queues-per-user N | queues by user name | 0 |
+
+* Allowed values for N are 0..65535
+
+* An option value of zero (0) disables that limit check.
+
+* Connections per-user are counted using the authenticated user name. The user may be logged in from any location but resource counts are aggregated under the user's name.
+
+* Connections per-IP are identified by the <broker-ip>:<broker-port>-<client-ip>:<client-port> tuple. This is the same string used by broker management to index connections.
+
+** With this scheme hosts may be identified by several names such as localhost, 127.0.0.1, or ::1. A separate counted set of connections is allowed for each name.
+
+** Connections per-ip are counted regardless of the credentials provided with each connection. A user may be allowed 20 connections but if the per-ip limit is 5 then that user may connect from any single host only five times.
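+
+An illustrative broker invocation using these switches (the values are made
+up; --acl-file is shown only to indicate that the Acl module is loaded):
+
+  qpidd --acl-file /etc/qpid/broker.acl \
+        --max-connections-per-user 10 \
+        --max-connections-per-IP 5 \
+        --max-queues-per-user 20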
+
+Acl Management Interface
+========================
+
+* Acl Lookup Query Methods
+
+The Acl module provides two QMF management methods that allow users to query the Acl authorization interface.
+
+ Method: Lookup
+ Argument Type Direction Unit Description
+ ========================================================
+ userId long-string I
+ action long-string I
+ object long-string I
+ objectName long-string I
+ propertyMap field-table I
+ result long-string O
+
+ Method: LookupPublish
+ Argument Type Direction Unit Description
+ =========================================================
+ userId long-string I
+ exchangeName long-string I
+ routingKey long-string I
+ result long-string O
+
+The Lookup method is a general query for any action, object, and set of properties.
+The LookupPublish method is the optimized, per-message fastpath query.
+
+In both methods the result is one of: allow, deny, allow-log, or deny-log.
+
+Example:
+
+The upstream Jira https://issues.apache.org/jira/browse/QPID-3918 has several attachment files that demonstrate how to use the query feature.
+
+ acl-test-01.rules.acl is the Acl file to run in the qpidd broker.
+ acl-test-01.py is the test script that queries the Acl.
+ acl-test-01.log is what the console prints when the test script runs.
+
+The script performs 355 queries using the Acl Lookup query methods.
+
+* Management Properties and Statistics
+
+The following properties and statistics have been added to reflect command line settings in effect and Acl quota denial activity.
+
+Element Type Access Unit Notes Description
+==================================================================================================
+maxConnections uint16 ReadOnly Maximum allowed connections
+maxConnectionsPerIp uint16 ReadOnly Maximum allowed connections
+maxConnectionsPerUser uint16 ReadOnly Maximum allowed connections
+maxQueuesPerUser uint16 ReadOnly Maximum allowed queues
+connectionDenyCount uint64 Number of connections denied
+queueQuotaDenyCount uint64 Number of queue creations denied
diff --git a/qpid/cpp/docs/design/ha-transactions.md b/qpid/cpp/docs/design/ha-transactions.md
new file mode 100644
index 0000000000..bfa5456a2c
--- /dev/null
+++ b/qpid/cpp/docs/design/ha-transactions.md
@@ -0,0 +1,197 @@
+<!--
+ Licensed to the Apache Software Foundation (ASF) under one
+ or more contributor license agreements. See the NOTICE file
+ distributed with this work for additional information
+ regarding copyright ownership. The ASF licenses this file
+ to you under the Apache License, Version 2.0 (the
+ "License"); you may not use this file except in compliance
+ with the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing,
+ software distributed under the License is distributed on an
+ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ KIND, either express or implied. See the License for the
+ specific language governing permissions and limitations
+ under the License.
+-->
+
+# Design note: HA with Transactions.
+
+Clients can use transactions (TX or DTX) with the current HA module but:
+
+- the results are not guaranteed to be committed atomically on the backups.
+- the backups do not record the transaction in the store for possible recovery.
+
+## Requirements
+
+1. Atomic: Transaction results must be atomic on primary and backups.
+
+2. Consistent: Backups and primary must agree on whether the transaction committed.
+
+3. Isolated: Concurrent transactions must not interfere with each other.
+
+4. Durable: Transactional state is written to the store.
+
+5. Recovery: If a cluster crashes while a DTX is in progress, it can be
+   re-started and participate in DTX recovery with a transaction co-ordinator
+   (currently via JMS).
+
+## TX and DTX
+
+Both TX and DTX transactions require a DTX-like `prepare` phase to synchronize
+the commit or rollback across multiple brokers. This design note applies to both.
+
+For DTX the transaction is identified by the `xid`. For TX the HA module generates
+a unique identifier.
+
+## Intercepting transactional operations
+
+Introduce 2 new interfaces on the Primary to intercept transactional operations:
+
+ TransactionObserverFactory {
+ TransactionObserver create(...);
+ }
+
+ TransactionObserver {
+ publish(queue, msg);
+ accept(accumulatedAck, unacked) # NOTE: translate queue positions to replication ids
+ prepare()
+ commit()
+ rollback()
+ }
+
+The primary will register a `TransactionObserverFactory` with the broker to hook
+into transactions. On the primary, transactional events are processed as normal
+and additionally are passed to the `TransactionObserver` which replicates the
+events to the backups.
+
+## Replicating transaction content
+
+The primary creates a specially-named queue for each transaction (the tx-queue).
+
+TransactionObserver operations:
+
+- publish(queue, msg): push enqueue event onto tx-queue
+- accept(accumulatedAck, unacked): push dequeue event onto tx-queue
+  (NOTE: must translate queue positions to replication ids)
+
+The tx-queue is replicated like a normal queue with some extensions for transactions.
+
+- Backups create a `TransactionReplicator` (extends `QueueReplicator`)
+ - `QueueReplicator` builds up a `TxBuffer` or `DtxBuffer` of transaction events.
+- Primary has a `TxReplicatingSubscription` (extends `ReplicatingSubscription`).
+
+
+Using a tx-queue has the following benefits:
+
+- Already have the tools to replicate it.
+- Handles async fan-out of transactions to multiple Backups.
+- Keeps the tx events in linear order even if the tx spans multiple queues.
+- Keeps tx data available for new backups until the tx is closed.
+- By closing the queue (see next section) it's easy to establish which backups are
+  in/out of the transaction.
+
+## DTX Prepare
+
+Primary receives dtx.prepare:
+
+- "closes" the tx-queue
+ - A closed queue rejects any attempt to subscribe.
+ - Backups subscribed before the close are in the transaction.
+- Puts prepare event on tx-queue, and does local prepare
+- Returns ok if outcome of local and all backup prepares is ok, fail otherwise.
+
+*TODO*: this means we need asynchronous completion of the prepare control
+so that we complete only when all backups have responded (or time out).
+
+Backups receiving prepare event do a local prepare. Outcome is communicated to
+the TxReplicatingSubscription on the primary as follows:
+
+- ok: Backup acknowledges prepare event message on the tx-queue
+- fail: Backup cancels tx-queue subscription
+
+## DTX Commit/Rollback
+
+Primary receives dtx.commit/rollback
+
+- Primary puts commit/rollback event on the tx-queue and does local commit/rollback.
+- Backups commit/rollback as instructed and unsubscribe from tx-queue.
+- tx-queue auto deletes when last backup unsubscribes.
+
+## TX Commit/Rollback
+
+Primary receives tx.commit/rollback
+
+- Do prepare phase as for DTX.
+- Do commit/rollback phase as for DTX.
+
+## De-duplication
+
+When tx commits, each broker (backup & primary) pushes tx messages to the local queue.
+
+On the primary, ReplicatingSubscriptions will see the tx messages on the local
+queue. We need to avoid re-sending them to backups that already have the
+messages from the transaction.
+
+Each ReplicatingSubscription has a "skip set" of messages already on the backup;
+tx messages are added to the skip set before the primary commits to the local queue.
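+
+A minimal sketch of the idea (hypothetical names and types, not the actual
+broker classes):
+
+    // Before the primary commits the tx to the local queue, the replication
+    // ids of the tx messages are added to the skip set, so messages already
+    // on the backup are not re-sent by the ReplicatingSubscription.
+    #include <set>
+    #include <stdint.h>
+
+    typedef uint64_t ReplicationId;
+
+    class SkipSet {
+      public:
+        void add(ReplicationId id) { ids.insert(id); }
+        // Returns true if the message still needs to be sent to the backup.
+        bool shouldReplicate(ReplicationId id) {
+            return ids.erase(id) == 0;  // present => already on backup, skip once
+        }
+      private:
+        std::set<ReplicationId> ids;
+    };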
+
+## Failover
+
+Backups abort all open tx if the primary fails.
+
+## Atomicity
+
+Keep tx atomic when backups catch up while a tx is in progress.
+A `ready` backup should never contain a proper subset of messages in a transaction.
+
+Scenario:
+
+- Guard placed on Q for an expected backup.
+- Transaction with messages A,B,C on Q is prepared, tx-queue is closed.
+- Expected backup connects and is declared ready due to guard.
+- Transaction commits, primary publishes A, B to Q and crashes before adding C.
+- Backup is ready so promoted to new primary with a partial transaction A,B.
+
+*TODO*: Not happy with the following solution.
+
+Solution: Primary delays `ready` status till full tx is replicated.
+
+- When a new backup joins it is considered 'blocked' by any TX in progress.
+  - Create a TxTracker per queue, per backup, at prepare time, before the local commit.
+  - The TxTracker holds the set of enqueues & dequeues in the tx.
+  - The ReplicatingSubscription removes enqueues & dequeues from the TxTracker as they are replicated.
+- The backup starts to catch up as normal but is not granted ready status until:
+  - it catches up on its guard (as per normal catch-up) AND
+  - all TxTrackers are clear.
+
+NOTE: this needs thoughtful locking to correctly handle
+
+- multiple queues per tx
+- multiple tx per queue
+- multiple TxTrackers per queue, per backup
+- synchronizing backups moving in/out of a transaction with respect to setting trackers.
+
+*TODO*: A blocking TX eliminates the benefit of the queue guard for expected backups.
+
+- It won't be ready till it has replicated all guarded positions up to the tx.
+- Could severely delay backups becoming ready after fail-over if queues are long.
+
+## Recovery
+
+*TODO*
+
+## Testing
+
+New tests:
+
+- TX transaction tests in python.
+- TX failover stress test (c.f. `test_failover_send_receive`)
+
+Existing tests:
+
+- JMS transaction tests (DTX & TX)
+- `qpid/tests/src/py/qpid_tests/broker_0_10/dtx.py,tx.py` run against HA broker.
+
diff --git a/qpid/cpp/docs/design/log-model-category-for-correlation.txt b/qpid/cpp/docs/design/log-model-category-for-correlation.txt
new file mode 100644
index 0000000000..280f53bb9d
--- /dev/null
+++ b/qpid/cpp/docs/design/log-model-category-for-correlation.txt
@@ -0,0 +1,131 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+
+This document describes the new logging entries written for
+"QPID-4079 C++ Broker needs log messages to track object life cycles for auditing".
+
+Please see https://issues.apache.org/jira/browse/QPID-4079 for an overview.
+
+The basic features are simple:
+
+* A new log category, [Model], is added and only the new log entries use it.
+
+* At 'debug' log level are log entries that mirror the corresponding management
+ events. Debug level statements include user names, remote host information, and
+ other references using the user-specified names for the referenced objects.
+
+* At 'trace' log level are log entries that track the construction and destruction
+ of managed resources. Trace level statements identify the objects using the
+ internal management keys. The trace statement for each deleted object includes the
+ management statistics for that object.
+
+Enabling the Model log
+
+* Use the switch '--log-enable trace+:Model' to receive both flavors of log entries.
+* Use the switch '--log-enable debug+:Model' for a less verbose log.
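+
+For example, an illustrative broker command line (the log file path is made up)
+that keeps notice-level logging for the other categories and adds the Model
+records to the same log:
+
+  qpidd --log-enable notice+ --log-enable trace+:Model --log-to-file /var/log/qpidd.log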
+
+Managed Objects in the logs
+
+All managed objects are included in the trace log.
+The debug log has information for:
+ Connection, Queue, Exchange, Binding, Subscription
+
+The following section lists actual log file data sorted and paired with the
+corresponding management Event captured with qpid-printevents.
+
+1. Connection
+
+Create connection
+event: Fri Jul 13 17:46:23 2012 org.apache.qpid.broker:clientConnect rhost=[::1]:5672-[::1]:34383 user=anonymous
+debug: 2012-07-13 13:46:23 [Model] debug Create connection. user:anonymous rhost:[::1]:5672-[::1]:34383
+trace: 2012-07-13 13:46:23 [Model] trace Mgmt create connection. id:[::1]:5672-[::1]:34383
+
+Delete connection
+event: Fri Jul 13 17:46:23 2012 org.apache.qpid.broker:clientDisconnect rhost=[::1]:5672-[::1]:34383 user=anonymous
+debug: 2012-07-13 13:46:23 [Model] debug Delete connection. user:anonymous rhost:[::1]:5672-[::1]:34383
+trace: 2012-07-13 13:46:29 [Model] trace Mgmt delete connection. id:[::1]:5672-[::1]:34383
+ Statistics: {bytesFromClient:1451, bytesToClient:892, closing:False, framesFromClient:25, framesToClient:21, msgsFromClient:1, msgsToClient:1}
+
+2. Session
+
+Create session
+event: TBD
+debug: TBD
+trace: 2012-07-13 13:46:09 [Model] trace Mgmt create session. id:18f52c22-efc5-4c2f-bd09-902d2a02b948:0
+
+Delete session
+event: TBD
+debug: TBD
+trace: 2012-07-13 13:47:13 [Model] trace Mgmt delete session. id:18f52c22-efc5-4c2f-bd09-902d2a02b948:0
+ Statistics: {TxnCommits:0, TxnCount:0, TxnRejects:0, TxnStarts:0, clientCredit:0, unackedMessages:0}
+
+
+3. Exchange
+
+Create exchange
+event: Fri Jul 13 17:46:34 2012 org.apache.qpid.broker:exchangeDeclare disp=created exName=myE exType=topic durable=False args={} autoDel=False rhost=[::1]:5672-[::1]:34384 altEx= user=anonymous
+debug: 2012-07-13 13:46:34 [Model] debug Create exchange. name:myE user:anonymous rhost:[::1]:5672-[::1]:34384 type:topic alternateExchange: durable:F
+trace: 2012-07-13 13:46:34 [Model] trace Mgmt create exchange. id:myE
+
+
+Delete exchange
+event: Fri Jul 13 18:19:33 2012 org.apache.qpid.broker:exchangeDelete exName=myE rhost=[::1]:5672-[::1]:37199 user=anonymous
+debug: 2012-07-13 14:19:33 [Model] debug Delete exchange. name:myE user:anonymous rhost:[::1]:5672-[::1]:37199
+trace: 2012-07-13 14:19:42 [Model] trace Mgmt delete exchange. id:myE
+ Statistics: {bindingCount:0, bindingCountHigh:0, bindingCountLow:0, byteDrops:0, byteReceives:0, byteRoutes:0, msgDrops:0, msgReceives:0, msgRoutes:0, producerCount:0, producerCountHigh:0, producerCountLow:0}
+
+
+4. Queue
+
+Create queue
+event: Fri Jul 13 18:19:35 2012 org.apache.qpid.broker:queueDeclare disp=created durable=False args={} qName=myQ autoDel=False rhost=[::1]:5672-[::1]:37200 altEx= excl=False user=anonymous
+debug: 2012-07-13 14:19:35 [Model] debug Create queue. name:myQ user:anonymous rhost:[::1]:5672-[::1]:37200 durable:F owner:0 autodelete:F alternateExchange:
+trace: 2012-07-13 14:19:35 [Model] trace Mgmt create queue. id:myQ
+
+Delete queue
+event: Fri Jul 13 18:19:37 2012 org.apache.qpid.broker:queueDelete user=anonymous qName=myQ rhost=[::1]:5672-[::1]:37201
+debug: 2012-07-13 14:19:37 [Model] debug Delete queue. name:myQ user:anonymous rhost:[::1]:5672-[::1]:37201
+trace: 2012-07-13 14:19:42 [Model] trace Mgmt delete queue. id:myQ
+ Statistics: {acquires:0, bindingCount:0, bindingCountHigh:0, bindingCountLow:0, byteDepth:0, byteFtdDepth:0, byteFtdDequeues:0, byteFtdEnqueues:0, bytePersistDequeues:0, bytePersistEnqueues:0, byteTotalDequeues:0, byteTotalEnqueues:0, byteTxnDequeues:0, byteTxnEnqueues:0, consumerCount:0, consumerCountHigh:0, consumerCountLow:0, discardsLvq:0, discardsOverflow:0, discardsPurge:0, discardsRing:0, discardsSubscriber:0, discardsTtl:0, flowStopped:False, flowStoppedCount:0, messageLatencyAvg:0, messageLatencyCount:0, messageLatencyMax:0, messageLatencyMin:0, msgDepth:0, msgFtdDepth:0, msgFtdDequeues:0, msgFtdEnqueues:0, msgPersistDequeues:0, msgPersistEnqueues:0, msgTotalDequeues:0, msgTotalEnqueues:0, msgTxnDequeues:0, msgTxnEnqueues:0, releases:0, reroutes:0, unackedMessages:0, unackedMessagesHigh:0, unackedMessagesLow:0}
+
+5. Binding
+
+Create binding
+event: Fri Jul 13 17:46:45 2012 org.apache.qpid.broker:bind exName=myE args={} qName=myQ user=anonymous key=myKey rhost=[::1]:5672-[::1]:34385
+debug: 2012-07-13 13:46:45 [Model] debug Create binding. exchange:myE queue:myQ key:myKey user:anonymous rhost:[::1]:5672-[::1]:34385
+trace: 2012-07-13 13:46:23 [Model] trace Mgmt create binding. id:org.apache.qpid.broker:exchange:,org.apache.qpid.broker:queue:myQ,myQ
+
+Delete binding
+event: Fri Jul 13 17:47:06 2012 org.apache.qpid.broker:unbind user=anonymous exName=myE qName=myQ key=myKey rhost=[::1]:5672-[::1]:34386
+debug: 2012-07-13 13:47:06 [Model] debug Delete binding. exchange:myE queue:myQ key:myKey user:anonymous rhost:[::1]:5672-[::1]:34386
+trace: 2012-07-13 13:47:09 [Model] trace Mgmt delete binding. id:org.apache.qpid.broker:exchange:myE,org.apache.qpid.broker:queue:myQ,myKey
+ Statistics: {msgMatched:0}
+
+6. Subscription
+
+Create subscription
+event: Fri Jul 13 18:19:28 2012 org.apache.qpid.broker:subscribe dest=0 args={} qName=b78b1818-7a20-4341-a253-76216b40ab4a:0.0 user=anonymous excl=False rhost=[::1]:5672-[::1]:37198
+debug: 2012-07-13 14:19:28 [Model] debug Create subscription. queue:b78b1818-7a20-4341-a253-76216b40ab4a:0.0 destination:0 user:anonymous rhost:[::1]:5672-[::1]:37198 exclusive:F
+trace: 2012-07-13 14:19:28 [Model] trace Mgmt create subscription. id:org.apache.qpid.broker:session:b78b1818-7a20-4341-a253-76216b40ab4a:0,org.apache.qpid.broker:queue:b78b1818-7a20-4341-a253-76216b40ab4a:0.0,0
+
+Delete subscription
+event: Fri Jul 13 18:19:28 2012 org.apache.qpid.broker:unsubscribe dest=0 rhost=[::1]:5672-[::1]:37198 user=anonymous
+debug: 2012-07-13 14:19:28 [Model] debug Delete subscription. destination:0 user:anonymous rhost:[::1]:5672-[::1]:37198
+trace: 2012-07-13 14:19:32 [Model] trace Mgmt delete subscription. id:org.apache.qpid.broker:session:b78b1818-7a20-4341-a253-76216b40ab4a:0,org.apache.qpid.broker:queue:b78b1818-7a20-4341-a253-76216b40ab4a:0.0,0
+ Statistics: {delivered:1}
diff --git a/qpid/cpp/docs/design/new-ha-design.txt b/qpid/cpp/docs/design/new-ha-design.txt
new file mode 100644
index 0000000000..df6c7242eb
--- /dev/null
+++ b/qpid/cpp/docs/design/new-ha-design.txt
@@ -0,0 +1,304 @@
+-*-org-*-
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+* An active-passive, hot-standby design for Qpid clustering.
+
+This document describes an active-passive approach to HA based on
+queue browsing to replicate message data.
+
+See [[./old-cluster-issues.txt]] for issues with the old design.
+
+** Active-active vs. active-passive (hot-standby)
+
+An active-active cluster allows clients to connect to any broker in
+the cluster. If a broker fails, clients can fail-over to any other
+live broker.
+
+A hot-standby cluster has only one active broker at a time (the
+"primary") and one or more brokers on standby (the "backups"). Clients
+are served only by the primary; clients that connect to a backup are
+redirected to the primary. The backups are kept up-to-date in real
+time by the primary; if the primary fails, a backup is elected to be
+the new primary.
+
+The main problem with active-active is co-ordinating consumers of the
+same queue on multiple brokers such that there are no duplicates in
+normal operation. There are 2 approaches:
+
+Predictive: each broker predicts which messages others will take. This
+was the main weakness of the old design, so it is not appealing.
+
+Locking: brokers "lock" a queue in order to take messages. This is
+complex to implement, and it is not straightforward to determine the
+best strategy for passing the lock. In tests to date it results in
+very high latencies (10x the standalone broker).
+
+Hot-standby removes this problem. Only the primary can modify queues,
+so it just has to tell the backups what it is doing; there's no
+locking.
+
+The primary can enqueue messages and replicate asynchronously -
+exactly like the store does, but it "writes" to the replicas over the
+network rather than writing to disk.
+
+** Replicating browsers
+
+The unit of replication is a replicating browser. This is an AMQP
+consumer that browses a remote queue via a federation link and
+maintains a local replica of the queue. As well as browsing the remote
+messages as they are added the browser receives dequeue notifications
+when they are dequeued remotely.
+
+On the primary broker, incoming message transfers are completed only when
+all of the replicating browsers have signaled completion. Thus a completed
+message is guaranteed to be on the backups.
+
+** Failover and Cluster Resource Managers
+
+We want to delegate the failover management to an existing cluster
+resource manager. Initially this is rgmanager from Cluster Suite, but
+other managers (e.g. PaceMaker) could be supported in future.
+
+Rgmanager takes care of starting and stopping brokers and informing
+brokers of their roles as primary or backup. It ensures there's
+exactly one primary broker running at any time. It also tracks quorum
+and protects against split-brain.
+
+Rgmanager can also manage a virtual IP address so clients can just
+retry on a single address to fail over. Alternatively we will also
+support configuring a fixed list of broker addresses when qpid is run
+outside of a resource manager.
+
+** Replicating configuration
+
+New queues and exchanges and their bindings also need to be replicated.
+This is done by a QMF client that registers for configuration changes
+on the remote broker and mirrors them in the local broker.
+
+** Use of CPG (openais/corosync)
+
+CPG is not required in this model; an external cluster resource
+manager takes care of membership and quorum.
+
+** Selective replication
+
+In this model it's easy to support selective replication of individual queues via
+configuration.
+
+Explicit exchange/queue qpid.replicate argument:
+- none: the object is not replicated
+- configuration: queues, exchanges and bindings are replicated but messages are not.
+- all: configuration and messages are replicated.
+
+The default (none/configuration/all) is configurable.
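+
+For illustration, the replication level might be set per queue either with
+qpid-config or in the address used to create the queue (sketched from memory;
+the exact option and argument spellings should be checked against the tools):
+
+  qpid-config add queue important-work --replicate all
+
+  "important-work; {create: always, node: {x-declare: {arguments: {'qpid.replicate': all}}}}"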
+
+** Inconsistent errors
+
+The new design eliminates most sources of inconsistent errors in the
+old design (connections, sessions, security, management etc.) and
+eliminates the need to stall the whole cluster till an error is
+resolved. We still have to handle inconsistent store errors when store
+and cluster are used together.
+
+We have three (configurable) options for handling inconsistent errors.
+On a backup that fails to store a message from the primary we can:
+- Abort the backup broker allowing it to be re-started.
+- Raise a critical error on the backup broker but carry on with the message lost.
+- Reset and re-try replication for just the affected queue.
+
+We will provide some configurable options in this regard.
+
+** New backups connecting to primary.
+
+When the primary fails, one of the backups becomes primary and the
+others connect to the new primary as backups.
+
+The backups can take advantage of the messages they already have
+backed up, the new primary only needs to replicate new messages.
+
+To keep the N-way guarantee the primary needs to delay completion on
+new messages until all the back-ups have caught up. However if a
+backup does not catch up within some timeout, it is considered dead
+and its messages are completed so the cluster can carry on with N-1
+members.
+
+
+** Broker discovery and lifecycle.
+
+The cluster has a client URL that can contain a single virtual IP
+address or a list of real IP addresses for the cluster.
+
+In backup mode, brokers reject normal client connections
+so clients will fail over to the primary. HA admin tools mark their
+connections so they are allowed to connect to backup brokers.
+
+Clients discover the primary by re-trying connection to all addresses in the client URL
+until they successfully connect to the primary. In the case of a
+virtual IP they re-try the same address until it is relocated to the
+primary. In the case of a list of IPs the client tries each in
+turn. Clients do multiple retries over a configured period of time
+before giving up.
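+
+A sketch of what this looks like from a C++ client using the qpid::messaging
+API (the host names are made up and the option names are the commonly
+documented reconnect options; treat this as illustrative, not definitive):
+
+  #include <qpid/messaging/Connection.h>
+
+  int main() {
+      using qpid::messaging::Connection;
+      // Keep retrying the first address and also try the other cluster
+      // addresses until the primary is found or the timeout expires.
+      Connection connection(
+          "broker1.example.com:5672",
+          "{reconnect: true,"
+          " reconnect_urls: ['broker2.example.com:5672', 'broker3.example.com:5672'],"
+          " reconnect_timeout: 60}");
+      connection.open();
+      // ... create sessions, senders and receivers as usual ...
+      connection.close();
+      return 0;
+  }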
+
+Backup brokers discover the primary in the same way as clients. There
+is a separate broker URL for brokers since they often will connect
+over a different network. The broker URL has to be a list of real
+addresses rather than a virtual address.
+
+** Interaction with rgmanager
+
+rgmanager interacts with qpid via 2 service scripts: backup &
+primary. These scripts interact with the underlying qpidd
+service. rgmanager picks the new primary when the old primary
+fails. In a partition it also takes care of killing inquorate brokers.
+
+*** Initial cluster start
+
+rgmanager starts the backup service on all nodes and the primary service on one node.
+
+On the backup nodes qpidd is in the connecting state. The primary node goes into
+the primary state. Backups discover the primary, connect and catch up.
+
+*** Failover
+
+The primary broker or node fails. Backup brokers see the disconnect and
+start trying to re-connect to the new primary.
+
+rgmanager notices the failure and starts the primary service on a new node.
+This tells qpidd to go to primary mode. Backups re-connect and catch up.
+
+The primary can only be started on nodes where there is a ready backup service.
+If the backup is catching up, it's not eligible to take over as primary.
+
+*** Failback
+
+A cluster of N brokers has suffered a failure, so only N-1 brokers
+remain. We want to start a new broker (possibly on a new node) to
+restore redundancy.
+
+If the new broker has a new IP address, the sysadmin pushes a new URL
+to all the existing brokers.
+
+The new broker starts in connecting mode. It discovers the primary,
+connects and catches up.
+
+*** Failure during failback
+
+A second failure occurs before the new backup B completes its catch
+up. The backup B refuses to become primary by failing the primary
+start script if rgmanager chooses it, so rgmanager will try another
+(hopefully caught-up) backup to become primary.
+
+*** Backup failure
+
+If a backup fails it is re-started. It connects and catches up from scratch
+to become a ready backup.
+
+** Interaction with the store.
+
+Needs more detail:
+
+We want backup brokers to be able to use their stored messages on restart
+so they don't have to download everything from the primary.
+This requires a HA sequence number to be stored with the message
+so the backup can identify which messages are in common with the primary.
+
+This will work very similarly to the way live backups can use in-memory
+messages to reduce the download.
+
+We need to determine which broker is chosen as the initial primary based on the
+currency of the stores, probably using stored generation numbers and status flags.
+Ideally this is automated with rgmanager, but some manual intervention might be required.
+
+* Current Limitations
+
+(In no particular order at present)
+
+For message replication (missing items have been fixed):
+
+LM3 - Transactional changes to queue state are not replicated atomically.
+
+LM4 - (No worse than store) Acknowledgements are confirmed to clients before the message
+has been dequeued from replicas or indeed from the local store if that is asynchronous.
+
+LM6 - persistence: In the event of a total cluster failure there are
+no tools to automatically identify the "latest" store.
+
+LM7 - persistence: In the event of a persistent broker being
+re-started (due to failure or admin) it should be able to use its
+stored messages to reduce the download required from the
+primary. This means storing message IDs persistently.
+
+For configuration propagation:
+
+LC2 - Queue and exchange propagation is entirely asynchronous. There
+are three cases to consider here for queue creation:
+
+(a) where queues are created through the addressing syntax supported
+by the messaging API, they should be recreated if needed on failover, and
+message replication, if required, is dealt with separately;
+
+(b) where queues are created using configuration tools by an
+administrator or by a script they can query the backups to verify the
+config has propagated and commands can be re-run if there is a failure
+before that;
+
+(c) where applications are more complex programs in which
+queues/exchanges are created using QMF or directly via the 0-10 APIs, the
+completion of the command will not guarantee that the command has been
+carried out on other nodes.
+
+I.e. case (a) doesn't require anything (apart from LM5 in some cases),
+case (b) can be addressed in a simple manner through tooling, but case
+(c) would require changes to the broker to allow a client to simply
+determine when the command has fully propagated.
+
+LC4 - It is possible on failover that the new primary did not
+previously receive a given QMF event while a backup did (sort of an
+analogous situation to LM1 but without an easy way to detect or remedy
+it).
+
+LC6 - The events and query responses are not fully synchronized.
+
+ In particular it *is* possible to not receive a delete event but
+ for the deleted object to still show up in the query response
+ (meaning the deletion is 'lost' to the update).
+
+    It is also possible for a create event to be received as well
+ as the created object being in the query response. Likewise it
+ is possible to receive a delete event and a query response in
+ which the object no longer appears. In these cases the event is
+ essentially redundant.
+
+ It is not possible to miss a create event and yet not to have
+ the object in question in the query response however.
+
+LC7 - Federated links from the primary will be lost in failover; they will not be re-connected on
+the new primary. Federation links to the primary can fail over.
+
+LC9 - The "last man standing" feature of the old cluster is not available.
+
+* Benefits compared to previous cluster implementation.
+
+- Allows per queue/exchange control over what is replicated.
+- Does not depend on openais/corosync, does not require multicast.
+- Can be integrated with different resource managers: for example rgmanager, PaceMaker, Veritas.
+- Can be ported to/implemented in other environments: e.g. Java, Windows
+- Disaster Recovery is just another backup, no need for separate queue replication mechanism.
+- Can take advantage of resource manager features, e.g. virtual IP addresses.
+- Fewer inconsistent errors, and those that remain (store failures) can be handled without killing brokers.
+- Improved performance.
diff --git a/qpid/cpp/docs/design/old-cluster-issues.txt b/qpid/cpp/docs/design/old-cluster-issues.txt
new file mode 100644
index 0000000000..c552a67c9a
--- /dev/null
+++ b/qpid/cpp/docs/design/old-cluster-issues.txt
@@ -0,0 +1,81 @@
+-*-org-*-
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+* Issues with the old design.
+
+The cluster is based on virtual synchrony: each broker multicasts
+events and the events from all brokers are serialized and delivered in
+the same order to each broker.
+
+In the current design raw byte buffers from client connections are
+multicast, serialized and delivered in the same order to each broker.
+
+Each broker has a replica of all queues, exchanges, bindings and also
+all connections & sessions from every broker. The cluster code treats the
+broker as a "black box": it "plays" the client data into the
+connection objects and assumes that, given the same input, each
+broker will reach the same state.
+
+A new broker joining the cluster receives a snapshot of the current
+cluster state, and then follows the multicast conversation.
+
+** Maintenance issues.
+
+The entire state of each broker is replicated to every member:
+connections, sessions, queues, messages, exchanges, management objects
+etc. Any discrepancy in the state that affects how messages are
+allocated to consumers can cause an inconsistency.
+
+- Entire broker state must be faithfully updated to new members.
+- Management model also has to be replicated.
+- All queues are replicated; there can be no unreplicated queues (e.g. for management).
+
+Events that are not deterministically predictable from the client
+input data stream can cause inconsistencies. In particular, the use of
+timers/timestamps requires cluster workarounds to synchronize.
+
+A member that encounters an error which is not encountered by all other
+members is considered inconsistent and will shut itself down. Such
+errors can come from any area of the broker code, e.g. different
+ACL files can cause inconsistent errors.
+
+The following areas required workarounds to work in a cluster:
+
+- Timers/timestamps in broker code: management, heartbeats, TTL
+- Security: cluster must replicate *after* decryption by security layer.
+- Management: not initially included in the replicated model, source of many inconsistencies.
+
+It is very easy for someone adding a feature or fixing a bug in the
+standalone broker to break the cluster by:
+- adding new state that needs to be replicated in cluster updates.
+- doing something in a timer or other non-connection thread.
+
+It's very hard to test for such breaks. We need a looser coupling
+and a more explicitly defined interface between cluster and standalone
+broker code.
+
+** Performance issues.
+
+Virtual synchrony delivers all data from all clients in a single
+stream to each broker. The cluster must play this data through the full
+broker code stack: connections, sessions etc. in a single thread
+context in order to get identical behavior on each broker. The cluster
+has a pipelined design to get some concurrency but this is a severe
+limitation on scalability in multi-core hosts compared to the
+standalone broker which processes each connection in a separate thread
+context.
diff --git a/qpid/cpp/docs/design/overview.txt b/qpid/cpp/docs/design/overview.txt
new file mode 100644
index 0000000000..f44aa8a5af
--- /dev/null
+++ b/qpid/cpp/docs/design/overview.txt
@@ -0,0 +1,97 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+
+Qpid C++ AMQP implementation
+=============================
+
+= Project layout =
+
+For the build system design, see the comment at the start of the Makefile.
+
+Project contains:
+ * Client library (lib/libqpid_client): src/qpid/client
+ * Broker library (lib/libqpid_broker): src/qpid/broker
+ * Common classes
+ * src/qpid/framing: wire encoding/decoding
+ * src/qpid/sys: io, threading etc
+ * src/qpid/Exception.cpp, QpidError.cpp: Exception classes.
+ * Qpid Daemon (bin/qpidd): src/qpidd.cpp
+
+Unit tests in test/unit: each *Test.cpp builds a CppUnit plugin.
+
+Client tests in test/client: each *.cpp builds a test executable.
+
+Test utilities: test/include
+
+= Client Design =
+
+The client module is primarily concerned with presenting the
+functionality offered by AMQP to users through a simple API that
+nevertheless allows all the protocol functionality to be exploited.
+[Note: it is currently nothing like complete in this regard!]
+
+The code in the client module is concerned with the logic of the AMQP
+protocol and interacts with the lower level transport issues through
+the InputHandler and OutputHandler abstractions defined in
+common/framing. It uses these in conjunction with the Connector
+interface, defined in common/io, for establishing a connection to the
+broker and interacting with it through the sending and receiving of
+messages represented by AMQFrame (defined in common/framing).
+
+The Connector implementation is responsible for connection set up,
+threading strategy and getting data on and off the wire. It delegates
+to the framing module for encode/decode operations. The interface
+between the io and the framing modules is primarily through the Buffer
+and AMQFrame classes.
+
+A Buffer allows 'raw' data to be read or written in terms of the AMQP
+defined 'types' (octet, short, long, long long, short string, long
+string, field table etc.). AMQP is defined in terms of frames with
+specific bodies, and the frames (as well as these different bodies) are
+defined in terms of these 'types'. The AMQFrame class allows a frame
+to be decoded by reading from the supplied buffer, or it allows a
+particular frame to be constructed and then encoded by writing to the
+supplied buffer. The io layer can then access the raw data that
+'backs' the buffer to either put it on the wire or to populate it from
+the wire.
+
+One minor exception to this is the protocol initiation. AMQP defines
+a protocol 'header' that is not a frame and is sent by a client to
+initiate a connection. The Connector allows (indeed requires) such a
+header to be passed in to initialise the connection (the Acceptor, when
+defined, will allow an InitiationHandler to be set allowing the broker
+to hook into the connection initiation). In order to remove
+duplication, the ProtocolInitiation class and the AMQFrame class both
+implement an AMQDataBlock class that defines the encode and decode
+methods. This allows both types to be treated generically for the
+purposes of encoding. In decoding, the context determines which type
+is expected and should be used for decoding (this is only relevant to
+the broker).
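+
+The shape of that abstraction is roughly as follows (a simplified sketch;
+the real declarations in src/qpid/framing differ in detail):
+
+  #include <stdint.h>
+
+  class Buffer;   // wraps raw bytes; reads/writes the AMQP-defined 'types'
+
+  class AMQDataBlock {
+  public:
+      virtual ~AMQDataBlock() {}
+      virtual void encode(Buffer& buffer) const = 0;  // write self into buffer
+      virtual bool decode(Buffer& buffer) = 0;        // read self from buffer
+      virtual uint32_t encodedSize() const = 0;       // bytes needed to encode
+  };
+
+  // Both AMQFrame and ProtocolInitiation derive from AMQDataBlock and can be
+  // treated generically for the purposes of encoding.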
+
+
+
+
+ --------api--------
+ Client Impl ...............uses.....
+input handler --> --------- --------- <-- output handler .
+ A | .
+ | | framing utils
+ | V .
+ ------------------- <-- connector .
+ IO Layer ................uses....
diff --git a/qpid/cpp/docs/design/windows_clfs_store_design.txt b/qpid/cpp/docs/design/windows_clfs_store_design.txt
new file mode 100644
index 0000000000..944d957083
--- /dev/null
+++ b/qpid/cpp/docs/design/windows_clfs_store_design.txt
@@ -0,0 +1,258 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+
+Design for Hybrid SQL/CLFS-Based Store in Qpid
+==============================================
+
+CLFS (Common Log File System) is a new facility in recent Windows versions.
+CLFS is an ARIES-compliant log intended to support high performance and
+transactional applications. CLFS is available in Windows Server 2003R2 and
+higher, as well as Windows Vista and Windows 7.
+
+There is currently an all-SQL store in Qpid. The new hybrid SQL-CLFS store
+moves the message, messages-mapping to queues, and transaction aspects
+of the SQL store into CLFS logs. Records of queues, exchanges, bindings,
+and configurations will remain in SQL. The main goal of this change is
+to yield higher performance on the time-critical messaging operations.
+CLFS and, therefore, the new hybrid store, is not available on Windows XP
+and Windows Server prior to 2003R2; these platforms will need to run the
+all-SQL store.
+
+Note for future consideration: it is possible to maintain all durable
+objects in CLFS, which would remove the need for SQL completely. It would
+require added log handling as well as the logic to ensure referential
+integrity between exchanges and queues via bindings as SQL does today.
+Also, the CLFS store counts on the SQL-stored queue records being correct
+when recovering messages; if a message operation in the log refers to a queue
+ID that's unknown, the CLFS store assumes the queue was deleted in the
+previous broker session and the log wasn't updated. That sort of assumption
+would need to be revisited if all content moves to a log.
+
+CLFS Capabilities
+-----------------
+
+This section explains some of the key CLFS concepts that are important
+in order to understand the designed use of CLFS for the store. It is
+not a complete explanation and is not feature-complete. Please see the
+CLFS documentation at MSDN for complete details
+(http://msdn.microsoft.com/en-us/library/bb986747%28v=VS.85%29.aspx).
+
+CLFS provides logs; each log can be dedicated or multiplexed. A multiplexed
+log has multiple streams of independent log records; a dedicated log has
+only one stream. Each log uses containers to hold the actual data; a log
+requires a minimum of two containers, each of which must be at least 512KB.
+Thus, the smallest log possible is 1MB. They can, of course, be larger, but
+with 1 MB as minimum size for a log, they shouldn't be used willy-nilly.
+The maximum number of streams per log is approximately 100.
+
+As records are written to the log CLFS assigns Log Sequence Numbers (LSNs).
+The first valid LSN in a log stream is called the Base, or Tail. CLFS
+can automatically reclaim and reuse container space for the log as the
+base LSN is moved when records are no longer needed. When a log is multiplexed,
+a stream which doesn't move its tail can prevent CLFS from reclaiming space
+and cause the log to grow indefinitely. Thus, mixing streams which don't
+update (and, thus, move their tails) with streams that are very dynamic in
+a single log will probably cause the log to continue to expand even though
+much of the space will be unused.
+
+CLFS provides three LSN types that are used to chain records together:
+
+- Next: This is a forward sequence maintained by CLFS itself by the order
+ records are put into the stream.
+- Undo-next, Undo-prev: These are backward-looking chains that are used
+ to link a new record to some previous record(s) in the same stream.
+
+Also note that although log files are simply located in the file system,
+easily locatable, streams within a log are not easily known or listable
+outside of some application-specific recording of the stream names somewhere.
+
+Log Usage
+---------
+
+There are two logs in use.
+
+- Message: Each message will be represented by a chain of log records. All
+ messages will be intermixed in the same dedicated stream. Each portion of
+ a message content (sometimes they are written in multiple chunks) as well
+ as each operation involving a message (enqueue, dequeue, etc.) will be
+ in a log record chained to the others related to the same message.
+
+- Transaction: Each transaction, local and distributed, will be represented
+ by a chain of log records. The record content will denote the transaction
+ as local or distributed.
+
+Both transaction and message logs use the LSN of the first record for a
+given object (message or transaction) as the persistence ID for that object.
+The LSN is a CLFS-maintained, always-increasing value that is 64 bits long,
+the same as a persistence ID.
+
+Log records that relate to a transaction or message previously logged use the
+log record undo-prev LSN to indicate which transaction/message the record
+relates to.
+
+Message Log Records
+-------------------
+
+Message log records will be one of the following types:
+
+- Message-Start: the first (and possibly only) section of message content
+- Message-Chunk: second and succeeding message content chunks
+- Message-Delete: marks the end of the message's lifetime
+- Message-Enqueue: records the message's placement on a queue
+- Message-Dequeue: records the message's removal from a queue
+
+The LSN of the Message-Start record is the persistence ID for the message.
+The log record undo-prev LSN is used to link each subsequent record for that
+message to the Message-Start record.
+
+A message's sequence of log records is extended for each operation on that
+message, until the message is deleted whereupon a Message-Delete record is
+written. When the Message-Delete is written, the log's base LSN can be moved
+up to the next earliest message if the deleted one opens up a set of
+records at the tail of the log that are no longer needed. To help maintain
+the order and know when the base can be moved, the store keeps message
+information in a STL map whose key is the message ID (Message-Start LSN).
+Thus, the first entry in the map is the earliest ID/LSN in use.
+During recovery, messages still residing in the log can be ignored when the
+record sequence for the message ends with Message-Delete. Similarly, there
+may be log records for messages that are deleted; in this case the previous
+LSN won't be one that's still within the log and, therefore, there won't have
+been a Message Start record recovered and the record can be ignored.
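+
+A sketch of that bookkeeping (illustrative only; the type and function names
+are hypothetical):
+
+  // Messages are keyed by persistence ID, i.e. the LSN of their Message-Start
+  // record. Because std::map is ordered, the first entry is always the
+  // earliest LSN still in use - the candidate for the log's new base (tail).
+  #include <map>
+  #include <stdint.h>
+
+  struct MessageInfo { /* queues holding the message, transaction refs, ... */ };
+
+  typedef std::map<uint64_t, MessageInfo> MessageMap;  // key: Message-Start LSN
+
+  // After writing a Message-Delete, the base LSN may be advanced to:
+  uint64_t earliestLsnInUse(const MessageMap& messages, uint64_t currentBase) {
+      return messages.empty() ? currentBase : messages.begin()->first;
+  }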
+
+Transaction Log Records
+-----------------------
+
+Transaction log records will be one of the following types:
+
+- Dtx-Start: Start of a distributed transaction
+- Tx-Start: Start of a local transaction
+- End: End of the transaction
+- Rollback: Marks that the transaction is rolled back
+- Prepare: Marks the dtx as prepared
+- Commit: Marks the transaction as committed
+- Delete: Notes that the transaction is no longer valid
+
+Transactions are also identified by the LSN of the start (Dtx-Start or
+Tx-Start) record. Successive records associated with the same transaction
+are linked backwards using the undo-prev LSN.
+
+The association between messages and transactions is maintained in the
+message log; if the message enqueue/dequeue operation is part of a transaction,
+the operation includes a transaction ID. The transaction log maintains the
+state of the transaction itself. Thus, each operation (enqueue, dequeue,
+prepare, rollback, commit) is a single log record.
+
+A few notes:
+- The transactions need to be recovered and sorted out prior to recovering
+ the messages. The message recovery needs to know if a enqueue/dequeue
+ associated with a transaction can be discarded or should be acted on.
+
+- Transaction IDs need to remain valid as long as any messages exist that
+ refer to them. This prevents the problem of trying to recover a message
+ with a transaction ID that doesn't exist - was it finalized? was it aborted?
+ Reference to a missing transaction ID can be ignored with assurance that
+ the message was deleted further along or the transaction would still be there.
+
+- Transaction IDs needing to be valid requires that a refcount be kept on each
+ transaction at run time. As messages are deleted, the transaction set can
+ be notified that the message is gone. To enforce this, Message objects have
+ a boost::shared_ptr to each Transaction they're associated with. When the
+  Message is destroyed, the refcounts on its Transactions go down too. When a
+  Transaction is destroyed, it is finished, so it writes its Delete record to the log.
+
+In-Memory Objects
+-----------------
+
+The store holds the message and transaction relationships in memory. CLFS is
+a backing store for that information so it can be reliably reconstructed in
+the event of a failure. This is a change from the SQL-only store where all
+of the information is maintained in SQL and none is kept in memory. The
+CLFS-using store is designed for high-throughput operation where it is assumed
+that messages will transit the broker (and, therefore, the store) quickly.
+
+- Message list: this is a map of persistence ID (message LSN) to a list of
+ queues where the message is located and an indication that there is
+ (or isn't) a transaction involved and in which direction (enqueue/dequeue)
+ so a dequeued message doesn't get deleted while a transacted enqueue is
+ pending.
+
+- Transaction list: also probably a map of id/LSN to a transaction object.
+ The transaction object needs to keep a list of messages/queues that are
+ impacted as well as the transaction state and Xid (for dtx).
+
+- Right now log records are written as needed with no preallocation or
+ reservation. It may be better to pre-reserve records in some cases, such
+ as a transaction prepare where the space for commit or rollback may be
+ reserved at the same time. This may be the only case where losing a
+ record may be an issue - needs some more thought.
+
+Recovery
+--------
+
+During recovery the store needs to verify that each recovered message's queues
+exist. If there is a failure after a queue's deletion is final, but before the
+messages are recorded as dequeued (and possibly deleted), the remainder of those
+dequeues (and possibly the deletion of the message) needs to be handled during
+recovery by not restoring the messages for the broker and also logging their
+deletion. The store could also skip logging the deletion and let the normal
+tail maintenance eventually move up over the old message entries. Since the
+invalid messages won't be kept in the message map, their IDs won't be taken
+into account when maintaining the tail - the tail will move up over them as
+soon as enough messages come and go.
+
+Plugin Options
+--------------
+
+The command-line options added by the CLFS plugin are:
+
+ --connect The SQL connect string for the SQL parts; same as the
+ SQL plugin.
+ --catalog The SQL database (catalog) name; same as the SQL plugin.
+ --store-dir The directory to store the logs in. Defaults to the
+ broker --data-dir value. If --no-data-dir specified,
+ --store-dir must be.
+ --container-size The size of each container in the log, in bytes. The
+ minimum size is 512K (smaller sizes will be rounded up).
+ Additionally, the size will be rounded up to a multiple
+ of the sector size on the disk holding the log. Once
+ the log is created, each newly added container will
+ be the same size as the initial container(s). Default
+ is 1MB.
+ --initial-containers The number of containers to populate a new log with
+ if a new log is created. Ignored if the log exists.
+ Default is 2.
+ --max-write-buffers The maximum number of write buffers that the plugin can
+ use before CLFS automatically flushes the log to disk.
+ Lower values flush more often; higher values have
+ higher performance. Default is 10.
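+
+An illustrative invocation combining these options (the values and the path
+are made up; the SQL --connect and --catalog options would normally be given
+as well):
+
+  qpidd --store-dir C:\qpid\store --container-size 1048576 --initial-containers 4 --max-write-buffers 16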
+
+ Maybe need an option to hold messages of a certain size in memory? I think
+ maybe the broker proper holds the message content, so the store need not.
+
+Testing
+-------
+
+More tests will need to be written to stress the log container extension
+capability and ensure that moving the base LSN works properly and the store
+doesn't continually grow the log without bounds.
+
+Note that running "qpid-perftest --durable yes" stresses the log extension
+and tail maintenance. It doesn't get run as a normal regression test but should
+be run when playing with the container/tail maintenance logic to ensure it's
+not broken.