| author | Matthew Sackman <matthew@lshift.net> | 2010-01-21 14:21:43 +0000 |
|---|---|---|
| committer | Matthew Sackman <matthew@lshift.net> | 2010-01-21 14:21:43 +0000 |
| commit | 91fda7dfb7aec70a24de0419e9717a2758589086 | |
| tree | 7fe72b84fc2a4943978febae4a30f22c450eb5e4 /src | |
| parent | 0e6a36cab0dd953d61d42616b35128f84d39066f | |
Added documentation for qi
Diffstat (limited to 'src')
| -rw-r--r-- | src/rabbit_queue_index.erl | 70 |
1 file changed, 70 insertions, 0 deletions
```diff
diff --git a/src/rabbit_queue_index.erl b/src/rabbit_queue_index.erl
index 46a6e008ec..cc868598db 100644
--- a/src/rabbit_queue_index.erl
+++ b/src/rabbit_queue_index.erl
@@ -40,6 +40,76 @@
 -define(CLEAN_FILENAME, "clean.dot").
 
 %%----------------------------------------------------------------------------
+
+%% The queue index is responsible for recording the order of messages
+%% within a queue on disk.
+
+%% Because the queue can decide at any point to send a queue entry to
+%% disk, you cannot rely on publishes appearing in order. The only
+%% thing you can rely on is a message being published, then delivered,
+%% then ack'd.
+
+%% In order to be able to clean up ack'd messages, we write to segment
+%% files. These files have a fixed maximum size: ?SEGMENT_ENTRY_COUNT
+%% publishes, delivers and acknowledgements. They are numbered, and so
+%% it is known that the 0th segment contains messages 0 ->
+%% ?SEGMENT_ENTRY_COUNT - 1, the 1st segment contains messages
+%% ?SEGMENT_ENTRY_COUNT -> 2*?SEGMENT_ENTRY_COUNT - 1 and so on. As
+%% such, in the segment files, we only refer to message sequence ids
+%% by their LSBs, as SeqId rem ?SEGMENT_ENTRY_COUNT. This gives them a
+%% fixed size.
+
+%% However, transient messages which are not sent to disk at any point
+%% will cause gaps to appear in segment files. Therefore, we delete a
+%% segment file whenever the number of publishes equals the number of
+%% acks (note that although it is not fully enforced, it is assumed
+%% that a message will never be ack'd before it is delivered, thus
+%% this test also implies equality with the number of delivers). In
+%% practice, this does not cause disk churn in the pathological case
+%% because of the journal and caching (see below).
+
+%% Because publishes, delivers and acks can occur all over, we wish to
+%% avoid lots of seeking. Therefore we have a fixed-size journal to
+%% which all actions are appended. When the number of entries in this
+%% journal reaches ?MAX_JOURNAL_ENTRY_COUNT, the journal entries are
+%% scattered out to their relevant files, and the journal is truncated
+%% to zero size. Note that entries in the journal must carry the full
+%% sequence id, thus the format of entries in the journal is different
+%% to that in the segments.
+
+%% The journal is also kept fully in memory, pre-segmented: the state
+%% contains a dict from segment numbers to state-per-segment. Actions
+%% are stored directly in this state. Thus at the point of flushing
+%% the journal, firstly, no reading from disk is necessary, and
+%% secondly, if the known number of acks and publishes are equal,
+%% given the known state of the segment file combined with the
+%% journal, no writing needs to be done to the segment file either (in
+%% fact it is deleted if it exists at all). This is safe given that
+%% the set of acks is a subset of the set of publishes. When it's
+%% necessary to sync messages because of transactions, it's only
+%% necessary to fsync on the journal: when entries are distributed
+%% from the journal to segment files, those segments appended to are
+%% fsync'd prior to the journal being truncated.
+
+%% It is very common to need to access two particular segments very
+%% frequently: one for publishes, and one for deliveries and acks.
+%% Because of this, and because of the poor performance of the Erlang
+%% dict module, we cache the per-segment state for the two most
+%% recently used segments in the state; this provides a substantial
+%% performance improvement.
+
+%% This module is also responsible for scanning the queue index files
+%% and seeding the message store on start up.
+
+%% Note that in general, the representation of a message as the tuple
+%% {('no_pub'|{MsgId, IsPersistent}), ('del'|'no_del'),
+%% ('ack'|'no_ack')} is richer than strictly necessary for most
+%% operations. However, for startup, and to ensure the safe and
+%% correct combination of journal entries with entries read from the
+%% segment on disk, this richer representation vastly simplifies and
+%% clarifies the code.
+
+%%----------------------------------------------------------------------------
+
 %% ---- Journal details ----
 
 -define(MAX_JOURNAL_ENTRY_COUNT, 262144).
```
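The fixed-size encoding of sequence ids described in the comments above (a segment number plus the remainder SeqId rem ?SEGMENT_ENTRY_COUNT) amounts to a simple div/rem split. The sketch below is an illustrative Python model, not code from the module; the entry count is a placeholder, not RabbitMQ's actual value:

```python
SEGMENT_ENTRY_COUNT = 16384  # placeholder; the real value is ?SEGMENT_ENTRY_COUNT

def seq_id_to_segment_and_rel(seq_id):
    # The segment file stores only the relative id, which is bounded by
    # SEGMENT_ENTRY_COUNT and therefore has a fixed on-disk size.
    return divmod(seq_id, SEGMENT_ENTRY_COUNT)

def reconstruct_seq_id(segment, rel_seq_id):
    # Inverse mapping, needed when scanning segment files on startup to
    # recover the full sequence id from the segment number and the LSBs.
    return segment * SEGMENT_ENTRY_COUNT + rel_seq_id
```

Note how segment 0 covers ids 0 to SEGMENT_ENTRY_COUNT - 1 and segment 1 starts exactly at SEGMENT_ENTRY_COUNT, matching the ranges in the comment.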
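The segment-deletion rule (drop a segment file once its publish and ack counts match) reduces to a small decision per segment at journal-flush time. This is a hedged sketch with invented names, not the module's API:

```python
def segment_flush_action(pub_count, ack_count):
    """Decide what the journal flush does to one segment file.

    Acks are always a subset of publishes, so equal counts mean every
    message recorded in this segment has been ack'd and the file holds
    no live entries: it can simply be deleted (if it exists at all).
    """
    assert 0 <= ack_count <= pub_count
    if pub_count == ack_count:
        return "delete_segment"
    # Otherwise the journalled entries are appended to the segment file,
    # which is fsync'd before the journal itself is truncated.
    return "append_and_fsync"
```

Because the journal is pre-segmented in memory, this decision needs no disk reads: the counts are already known per segment.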
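The {pub, del, ack} triple described at the end of the comments makes combining a journal entry with a segment entry a pointwise merge: the journal is strictly newer, so any field it sets wins. A sketch of that merge in Python, with None standing in for 'no_pub' and booleans for the del/ack atoms (all names here are illustrative, not the module's):

```python
def combine_entry(segment_entry, journal_entry):
    # Each entry is a (pub, delivered, acked) triple: pub is None
    # ('no_pub') or a (msg_id, is_persistent) pair; delivered and acked
    # are booleans standing in for 'del'/'no_del' and 'ack'/'no_ack'.
    seg_pub, seg_del, seg_ack = segment_entry
    jnl_pub, jnl_del, jnl_ack = journal_entry
    return (jnl_pub if jnl_pub is not None else seg_pub,
            seg_del or jnl_del,   # a delivery recorded anywhere stands
            seg_ack or jnl_ack)   # likewise for an ack
```

For example, a message published and delivered per the segment file but ack'd only in the journal merges to a fully ack'd entry, which is exactly the case that lets a flushed segment be deleted.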
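The two-segment cache mentioned above behaves like a tiny most-recently-used cache keyed by segment number: one slot tends to serve publishes, the other deliveries and acks. This Python sketch only illustrates the eviction behaviour; the real module keeps this state inside its own state record, and SegmentCache/load_fn are invented names:

```python
from collections import OrderedDict

class SegmentCache:
    """Illustrative two-entry MRU cache for per-segment state."""

    def __init__(self, load_fn, capacity=2):
        self.load_fn = load_fn        # loads per-segment state on a miss
        self.capacity = capacity      # the module caches two segments
        self.cache = OrderedDict()    # insertion order tracks recency

    def get(self, segment_num):
        if segment_num in self.cache:
            # Hit: mark as most recently used, no load needed.
            self.cache.move_to_end(segment_num)
        else:
            if len(self.cache) >= self.capacity:
                # Evict the least recently used segment's state.
                self.cache.popitem(last=False)
            self.cache[segment_num] = self.load_fn(segment_num)
        return self.cache[segment_num]
```

With only two slots, alternating between a "publish" segment and a "deliver/ack" segment never evicts either, which is the access pattern the comment says dominates.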
