| author | Matthew Sackman <matthew@lshift.net> | 2010-01-19 17:45:47 +0000 |
|---|---|---|
| committer | Matthew Sackman <matthew@lshift.net> | 2010-01-19 17:45:47 +0000 |
| commit | 873e3dfccca03eefcc781c388c2cc8ee6f951947 (patch) | |
| tree | 6bf08718e1b47ce1e4af1f096dc9dfb7b975e1d7 /src | |
| parent | 51c879d3069024a8264a61dd4037f914d10cb2fe (diff) | |
| download | rabbitmq-server-git-873e3dfccca03eefcc781c388c2cc8ee6f951947.tar.gz | |
documentation
Diffstat (limited to 'src')
| -rw-r--r-- | src/rabbit_msg_store.erl | 64 |
1 files changed, 47 insertions, 17 deletions
diff --git a/src/rabbit_msg_store.erl b/src/rabbit_msg_store.erl
index ef31efadcd..1b35637057 100644
--- a/src/rabbit_msg_store.erl
+++ b/src/rabbit_msg_store.erl
@@ -113,8 +113,9 @@
 
 %% The components:
 %%
-%% MsgLocation: this is an ets table which contains:
+%% MsgLocation: this is a mapping from MsgId to #msg_location{}:
 %% {MsgId, RefCount, File, Offset, TotalSize}
+%% By default, it's in ets, but it's also pluggable.
 %% FileSummary: this is an ets table which contains:
 %% {File, ValidTotalSize, ContiguousTop, Left, Right}
 %%
@@ -126,7 +127,7 @@
 %% eldest file.
 %%
 %% We need to keep track of which messages are in which files (this is
-%% the MsgLocation table); how much useful data is in each file and
+%% the MsgLocation mapping); how much useful data is in each file and
 %% which files are on the left and right of each other. This is the
 %% purpose of the FileSummary table.
 %%
@@ -136,26 +137,31 @@
 %% valid data right at the start of each file. These are needed for
 %% garbage collection.
 %%
-%% When we discover that either a file is now empty or that it can be
-%% combined with the useful data in either its left or right file, we
-%% compact the two files together. This keeps disk utilisation high
-%% and aids performance.
+%% When we discover that a file is now empty, we delete it. When we
+%% discover that it can be combined with the useful data in either its
+%% left or right neighbour, and overall, across all the files, we have
+%% ((the amount of garbage) / (the sum of all file sizes)) >
+%% ?GARBAGE_FRACTION, we start a garbage collection run concurrently,
+%% which will compact the two files together. This keeps disk
+%% utilisation high and aids performance. We deliberately do this
+%% lazily in order to prevent doing GC on files which are soon to be
+%% emptied (and hence deleted) soon.
 %%
-%% Given the compaction between two files, the left file is considered
-%% the ultimate destination for the good data in the right file. If
-%% necessary, the good data in the left file which is fragmented
-%% throughout the file is written out to a temporary file, then read
-%% back in to form a contiguous chunk of good data at the start of the
-%% left file. Thus the left file is garbage collected and
-%% compacted. Then the good data from the right file is copied onto
-%% the end of the left file. MsgLocation and FileSummary tables are
-%% updated.
+%% Given the compaction between two files, the left file (i.e. elder
+%% file) is considered the ultimate destination for the good data in
+%% the right file. If necessary, the good data in the left file which
+%% is fragmented throughout the file is written out to a temporary
+%% file, then read back in to form a contiguous chunk of good data at
+%% the start of the left file. Thus the left file is garbage collected
+%% and compacted. Then the good data from the right file is copied
+%% onto the end of the left file. MsgLocation and FileSummary tables
+%% are updated.
 %%
 %% On startup, we scan the files we discover, dealing with the
 %% possibilites of a crash have occured during a compaction (this
 %% consists of tidyup - the compaction is deliberately designed such
 %% that data is duplicated on disk rather than risking it being lost),
-%% and rebuild the ets tables (MsgLocation, FileSummary).
+%% and rebuild the FileSummary ets table and MsgLocation mapping.
 %%
 %% So, with this design, messages move to the left. Eventually, they
 %% should end up in a contiguous block on the left and are then never
@@ -215,7 +221,8 @@
 %%
 %% Messages are reference-counted. When a message with the same id is
 %% written several times we only store it once, and only remove it
-%% from the store when it has been removed the same number of times.
+%% from the store when it has been removed the same number of
+%% times.
 %%
 %% The reference counts do not persist. Therefore the initialisation
 %% function must be provided with a generator that produces ref count
@@ -228,6 +235,29 @@
 %% are read from several processes they are read back as the same
 %% binary object rather than multiples of identical binary
 %% objects.
+%%
+%% Reads can be performed directly by clients without calling to the
+%% server. This is safe because multiple file handles can be used to
+%% read files. However, locking is used by the concurrent GC to make
+%% sure that reads are not attempted from files which are in the
+%% process of being garbage collected.
+%%
+%% The server automatically defers reads, removes and contains calls
+%% that occur which refer to files which are currently being
+%% GC'd. Contains calls are only deferred in order to ensure they do
+%% not overtake removes.
+%%
+%% The current file to which messages are being written has a
+%% write-back cache. This is written to immediately by the client and
+%% can be read from by the client too. This means that there are only
+%% ever writes made to the current file, thus eliminating delays due
+%% to flushing write buffers in order to be able to safely read from
+%% the current file. The one exception to this is that on start up,
+%% the cache is not populated with msgs found in the current file, and
+%% thus in this case only, reads may have to come from the file
+%% itself. The effect of this is that even if the msg_store process is
+%% heavily overloaded, clients can still write and read messages with
+%% very low latency and not block at all.
 
 %%----------------------------------------------------------------------------
 %% public API
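
The lazy GC rule described in the new comments (delete an empty file; otherwise compact only when the file can be combined with a neighbour and the overall garbage fraction exceeds ?GARBAGE_FRACTION) can be sketched roughly as below. This is an illustrative sketch, not code from this commit: the module name, the function names, the 0.5 threshold, and the assumption that garbage equals total file size minus valid data are all hypothetical.

```erlang
-module(gc_trigger_sketch).
-export([should_gc/2, decide/4]).

%% Hypothetical threshold: the real ?GARBAGE_FRACTION value is not
%% shown in the diff above.
-define(GARBAGE_FRACTION, 0.5).

%% should_gc(SumValid, SumFileSize) -> boolean()
%% Assumes garbage is the sum of all file sizes minus the valid data.
should_gc(_SumValid, 0) ->
    false;
should_gc(SumValid, SumFileSize) when SumFileSize > 0 ->
    Garbage = SumFileSize - SumValid,
    (Garbage / SumFileSize) > ?GARBAGE_FRACTION.

%% decide(ValidInFile, CanCombine, SumValid, SumFileSize)
%%     -> delete | combine | noop
%% An empty file is deleted outright. A non-empty file is compacted
%% only if it can be combined with a neighbour AND the overall
%% garbage fraction is above the threshold; otherwise nothing is done.
decide(0, _CanCombine, _SumValid, _SumFileSize) ->
    delete;
decide(_ValidInFile, true, SumValid, SumFileSize) ->
    case should_gc(SumValid, SumFileSize) of
        true  -> combine;
        false -> noop
    end;
decide(_ValidInFile, false, _SumValid, _SumFileSize) ->
    noop.
```

With these assumptions, `gc_trigger_sketch:decide(10, true, 40, 100)` picks `combine` because 60 of 100 bytes are garbage, while the same file with 90 valid bytes overall is left alone.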
