From aa779adc487929fb6732437b41904681d7479eba Mon Sep 17 00:00:00 2001
From: "Stephen D. Huston"
Date: Fri, 5 Nov 2010 22:02:57 +0000
Subject: Add design doc for new Windows hybrid SQL-CLFS store.

git-svn-id: https://svn.apache.org/repos/asf/qpid/trunk/qpid@1031841 13f79535-47bb-0310-9956-ffa450edef68
---
 cpp/design_docs/windows_clfs_store_design.txt | 239 ++++++++++++++++++++++++++
 1 file changed, 239 insertions(+)
 create mode 100644 cpp/design_docs/windows_clfs_store_design.txt

(limited to 'cpp')

diff --git a/cpp/design_docs/windows_clfs_store_design.txt b/cpp/design_docs/windows_clfs_store_design.txt
new file mode 100644
index 0000000000..76ae419b40
--- /dev/null
+++ b/cpp/design_docs/windows_clfs_store_design.txt
@@ -0,0 +1,239 @@
+Design for Hybrid SQL/CLFS-Based Store in Qpid
+==============================================
+
+CLFS (Common Log File System) is a new facility in recent Windows versions.
+CLFS is an ARIES-compliant log intended to support high performance and
+transactional applications. CLFS is available in Windows Server 2003R2 and
+higher, as well as Windows Vista and Windows 7.
+
+There is currently an all-SQL store in Qpid. The new hybrid SQL-CLFS store
+moves the message, message-to-queue mapping, and transaction aspects
+of the SQL store into CLFS logs. Records of queues, exchanges, bindings,
+and configurations will remain in SQL. The main goal of this change is
+to yield higher performance on the time-critical messaging operations.
+CLFS and, therefore, the new hybrid store are not available on Windows XP
+and Windows Server prior to 2003R2; these platforms will need to run the
+all-SQL store.
+
+Note for future consideration: it is possible to maintain all durable
+objects in CLFS, which would remove the need for SQL completely. It would
+require added log handling as well as the logic to ensure referential
+integrity between exchanges and queues via bindings as SQL does today.
+Also, the CLFS store counts on the SQL-stored queue records being correct
+when recovering messages; if a message operation in the log refers to a queue
+ID that's unknown, the CLFS store assumes the queue was deleted in the
+previous broker session and the log wasn't updated. That sort of assumption
+would need to be revisited if all content moves to a log.
+
+CLFS Capabilities
+-----------------
+
+This section explains some of the key CLFS concepts that are important
+in order to understand the designed use of CLFS for the store. It is
+not a complete explanation and is not feature-complete. Please see the
+CLFS documentation at MSDN for complete details
+(http://msdn.microsoft.com/en-us/library/bb986747%28v=VS.85%29.aspx).
+
+CLFS provides logs; each log can be dedicated or multiplexed. A multiplexed
+log has multiple streams of independent log records; a dedicated log has
+only one stream. Each log uses containers to hold the actual data; a log
+requires a minimum of two containers, each of which must be at least 512KB.
+Thus, the smallest log possible is 1MB. Logs can, of course, be larger, but
+with 1MB as the minimum size for a log, they shouldn't be used willy-nilly.
+The maximum number of streams per log is approximately 100.
+
+As records are written to the log, CLFS assigns Log Sequence Numbers (LSNs).
+The first valid LSN in a log stream is called the Base, or Tail. CLFS
+can automatically reclaim and reuse container space for the log as the
+base LSN is moved when records are no longer needed. When a log is multiplexed,
+a stream which doesn't move its tail can prevent CLFS from reclaiming space
+and cause the log to grow indefinitely. Thus, mixing streams which don't
+update (and, thus, don't move their tails) with streams that are very dynamic
+in a single log will probably cause the log to continue to expand even though
+much of the space will be unused.
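
The effect of a stalled stream in a multiplexed log can be illustrated with a
toy model. This is a sketch, not CLFS code: Lsn is a plain 64-bit integer
standing in for the real CLFS_LSN structure, oldestBase is a hypothetical
helper, and the model assumes base LSNs are directly comparable across
streams, which is a simplification.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Assumption: a plain integer stands in for the real CLFS_LSN structure.
using Lsn = std::uint64_t;

// Hypothetical helper: given the base (tail) LSN of every stream sharing one
// multiplexed log, return the oldest one. In this toy model, no space at or
// after that LSN can be reclaimed, however far the other streams have moved.
Lsn oldestBase(const std::vector<Lsn>& streamBases) {
    assert(!streamBases.empty());
    return *std::min_element(streamBases.begin(), streamBases.end());
}
```

With bases of 900, 15, and 870, the stream stuck at 15 bounds what can be
reclaimed, even though the other two streams have long since moved on.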
+
+CLFS provides three LSN types that are used to chain records together:
+
+- Next: This is a forward sequence maintained by CLFS itself in the order
+  records are put into the stream.
+- Undo-next, Undo-prev: These are backward-looking chains that are used
+  to link a new record to some previous record(s) in the same stream.
+
+Also note that although log files are ordinary, easily located files in the
+file system, the streams within a log are not easily discovered or listed
+unless the application records the stream names somewhere.
+
+Log Usage
+---------
+
+There are two logs in use.
+
+- Message: Each message will be represented by a chain of log records. All
+  messages will be intermixed in the same dedicated stream. Each portion of
+  a message's content (content is sometimes written in multiple chunks) as
+  well as each operation involving a message (enqueue, dequeue, etc.) will be
+  in a log record chained to the others related to the same message.
+
+- Transaction: Each transaction, local and distributed, will be represented
+  by a chain of log records. The record content will denote the transaction
+  as local or distributed.
+
+Both transaction and message logs use the LSN of the first record for a
+given object (message or transaction) as the persistence ID for that object.
+The LSN is a CLFS-maintained, always-increasing value that is 64 bits long,
+the same as a persistence ID.
+
+Log records that relate to a transaction or message previously logged use the
+log record undo-prev LSN to indicate which transaction/message the record
+relates to.
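
The chaining scheme described above can be sketched with an in-memory
stand-in for a log. Everything here is illustrative: ToyLog, Record, and
kNoLsn are hypothetical names, Lsn is a plain integer rather than the real
CLFS_LSN, and a real store would append through the CLFS API instead of a
vector.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

using Lsn = std::uint64_t;   // assumption: stand-in for CLFS_LSN
const Lsn kNoLsn = 0;        // hypothetical marker for "no earlier record"

struct Record {
    Lsn lsn;       // assigned by the log when the record is appended
    Lsn undoPrev;  // LSN of the object's first record, or kNoLsn if first
};

// Toy in-memory log that hands out always-increasing LSNs.
class ToyLog {
public:
    // Append a record linked back to undoPrev and return its new LSN.
    Lsn append(Lsn undoPrev) {
        assert(undoPrev < next_);  // can only link back to an earlier record
        Lsn lsn = next_++;
        records_.push_back(Record{lsn, undoPrev});
        return lsn;
    }

    // The persistence ID of the message or transaction a record belongs to:
    // the record's own LSN if it starts a chain, else its undo-prev LSN.
    static Lsn persistenceId(const Record& r) {
        return r.undoPrev == kNoLsn ? r.lsn : r.undoPrev;
    }

    const std::vector<Record>& records() const { return records_; }

private:
    Lsn next_ = 1;
    std::vector<Record> records_;
};
```

Appending a start record with kNoLsn and then an operation record linked to
the returned LSN yields two records that resolve to the same persistence ID.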
+
+Message Log Records
+-------------------
+
+Message log records will be one of the following types:
+
+- Message-Start: the first (and possibly only) section of message content
+- Message-Chunk: second and succeeding message content chunks
+- Message-Delete: marks the end of the message's lifetime
+- Message-Enqueue: records the message's placement on a queue
+- Message-Dequeue: records the message's removal from a queue
+
+The LSN of the Message-Start record is the persistence ID for the message.
+The log record undo-prev LSN is used to link each subsequent record for that
+message to the Message-Start record.
+
+A message's sequence of log records is extended for each operation on that
+message, until the message is deleted, whereupon a Message-Delete record is
+written. When the Message-Delete is written, the log's base LSN can be moved
+up to the next earliest message if the deleted one opens up a set of
+records at the tail of the log that are no longer needed. To help maintain
+the order and know when the base can be moved, the store keeps message
+information in an STL map whose key is the message ID (Message-Start LSN).
+Thus, the first entry in the map is the earliest ID/LSN in use.
+During recovery, messages still residing in the log can be ignored when the
+record sequence for the message ends with Message-Delete. Similarly, there
+may be log records for messages that are deleted; in this case the undo-prev
+LSN won't be one that's still within the log and, therefore, there won't have
+been a Message-Start record recovered and the record can be ignored.
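
The role of the ordered map in tail maintenance can be sketched as follows.
This is a minimal illustration under stated assumptions, not the store's
actual code: MessageIndex and MessageInfo are hypothetical names, and Lsn is
a plain integer standing in for the real CLFS_LSN.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

using Lsn = std::uint64_t;  // assumption: stand-in for CLFS_LSN

// Hypothetical per-message bookkeeping; only the queue list is modeled.
struct MessageInfo {
    std::vector<std::uint64_t> queueIds;
};

// Messages keyed by persistence ID (the Message-Start LSN). std::map keeps
// keys sorted, so begin() is always the oldest message still in use.
class MessageIndex {
public:
    void add(Lsn id) {
        assert(messages_.count(id) == 0);
        messages_[id] = MessageInfo();
    }

    // Remove a deleted message. Returns true when it was the oldest entry,
    // which is the only case where the log base may be able to move up.
    bool remove(Lsn id) {
        auto it = messages_.find(id);
        if (it == messages_.end()) return false;
        bool wasOldest = (it == messages_.begin());
        messages_.erase(it);
        return wasOldest;
    }

    // Oldest LSN still needed, or ifEmpty when no messages remain.
    Lsn oldestInUse(Lsn ifEmpty) const {
        return messages_.empty() ? ifEmpty : messages_.begin()->first;
    }

private:
    std::map<Lsn, MessageInfo> messages_;
};
```

Deleting a message from the middle of the map leaves the oldest entry
unchanged; only deleting the first entry yields a new candidate base, namely
whatever key is now first.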
+
+Transaction Log Records
+-----------------------
+
+Transaction log records will be one of the following types:
+
+- Dtx-Start: Start of a distributed transaction
+- Tx-Start: Start of a local transaction
+- End: End of the transaction
+- Rollback: Marks that the transaction is rolled back
+- Prepare: Marks the dtx as prepared
+- Commit: Marks the transaction as committed
+- Delete: Notes that the transaction is no longer valid
+
+Transactions are also identified by the LSN of the start (Dtx-Start or
+Tx-Start) record. Successive records associated with the same transaction
+are linked backwards using the undo-prev LSN.
+
+The association between messages and transactions is maintained in the
+message log; if the message enqueue/dequeue operation is part of a transaction,
+the operation includes a transaction ID. The transaction log maintains the
+state of the transaction itself. Thus, each operation (enqueue, dequeue,
+prepare, rollback, commit) is a single log record.
+
+A few notes:
+- The transactions need to be recovered and sorted out prior to recovering
+  the messages. The message recovery needs to know if an enqueue/dequeue
+  associated with a transaction can be discarded or should be acted on.
+
+- Transaction IDs need to remain valid as long as any messages exist that
+  refer to them. This prevents the problem of trying to recover a message
+  with a transaction ID that doesn't exist - was it finalized? was it aborted?
+  Reference to a missing transaction ID can be ignored with assurance that
+  the message was deleted further along or the transaction would still be there.
+
+- Keeping transaction IDs valid requires that a refcount be kept on each
+  transaction at run time. As messages are deleted, the transaction set can
+  be notified that the message is gone. To enforce this, Message objects have
+  a boost::shared_ptr to each Transaction they're associated with. When the
+  Message is destroyed, refs to Transactions go down too.
+  When a Transaction is destroyed it is done, so its Delete is logged.
+
+In-Memory Objects
+-----------------
+
+The store holds the message and transaction relationships in memory. CLFS is
+a backing store for that information so it can be reliably reconstructed in
+the event of a failure. This is a change from the SQL-only store where all
+of the information is maintained in SQL and none is kept in memory. The
+CLFS-using store is designed for high-throughput operation where it is assumed
+that messages will transit the broker (and, therefore, the store) quickly.
+
+- Message list: this is a map of persistence ID (message LSN) to a list of
+  queues where the message is located and an indication that there is
+  (or isn't) a transaction involved and in which direction (enqueue/dequeue)
+  so a dequeued message doesn't get deleted while a transacted enqueue is
+  pending.
+
+- Transaction list: also probably a map of id/LSN to a transaction object.
+  The transaction object needs to keep a list of messages/queues that are
+  impacted as well as the transaction state and Xid (for dtx).
+
+- Right now log records are written as needed with no preallocation or
+  reservation. It may be better to pre-reserve records in some cases, such
+  as a transaction prepare where the space for commit or rollback may be
+  reserved at the same time. This may be the only case where losing a
+  record may be an issue - needs some more thought.
+
+Recovery
+--------
+
+During recovery, the store needs to verify that recovered messages' queues
+exist; if there's a failure after a queue's deletion is final but before the
+messages are recorded as dequeued (and possibly deleted), the remaining
+dequeues (and possible message deletions) need to be handled during recovery
+by not restoring the messages to the broker and by logging their deletion.
+The store could also skip logging the deletion and let the normal tail
+maintenance eventually move up over the old message entries.
+Since the invalid messages won't be kept in the message map, their IDs won't be
+taken into account when maintaining the tail - the tail will move up over them
+as soon as enough messages come and go.
+
+Plugin Options
+--------------
+
+The command-line options added by the CLFS plugin are:
+
+  --connect             The SQL connect string for the SQL parts; same as the
+                        SQL plugin.
+  --catalog             The SQL database (catalog) name; same as the SQL plugin.
+  --store-dir           The directory to store the logs in. Defaults to the
+                        broker --data-dir value. If --no-data-dir is specified,
+                        --store-dir must be specified as well.
+  --container-size      The size of each container in the log, in bytes. The
+                        minimum size is 512K (smaller sizes will be rounded up).
+                        Additionally, the size will be rounded up to a multiple
+                        of the sector size on the disk holding the log. Once
+                        the log is created, each newly added container will
+                        be the same size as the initial container(s). Default
+                        is 1MB.
+  --initial-containers  The number of containers to populate a new log with
+                        if a new log is created. Ignored if the log exists.
+                        Default is 2.
+  --max-write-buffers   The maximum number of write buffers that the plugin can
+                        use before CLFS automatically flushes the log to disk.
+                        Lower values flush more often; higher values give
+                        higher performance. Default is 10.
+
+  Maybe an option is needed to hold messages of a certain size in memory? I
+  think the broker proper holds the message content, so the store need not.
+
+Testing
+-------
+
+More tests will need to be written to stress the log container extension
+capability and ensure that moving the base LSN works properly and the store
+doesn't continually grow the log without bounds.
+
+Note that running "qpid-perftest --durable yes" stresses the log extension
+and tail maintenance. It doesn't get run as a normal regression test but should
+be run when changing the container/tail maintenance logic to ensure it's
+not broken.
-- 
cgit v1.2.1