summaryrefslogtreecommitdiff
path: root/src
diff options
context:
space:
mode:
authorMatthew Sackman <matthew@lshift.net>2009-04-22 14:30:23 +0100
committerMatthew Sackman <matthew@lshift.net>2009-04-22 14:30:23 +0100
commit631a97db8902208a83225393ec00b685c0d9ee23 (patch)
treebe300e74b690038bb31912893c5c1cc27c0c6f4b /src
parent30bc61f26b3a2c2e6ecca02ba3905455b014c2e3 (diff)
downloadrabbitmq-server-git-631a97db8902208a83225393ec00b685c0d9ee23.tar.gz
copy + paste job extending essay
Diffstat (limited to 'src')
-rw-r--r--src/rabbit_disk_queue.erl54
1 files changed, 54 insertions, 0 deletions
diff --git a/src/rabbit_disk_queue.erl b/src/rabbit_disk_queue.erl
index 0ae890d967..7e2d8d12bf 100644
--- a/src/rabbit_disk_queue.erl
+++ b/src/rabbit_disk_queue.erl
@@ -150,6 +150,60 @@
%% set to be a disk_only_table in order to ensure that we are not RAM
%% constrained.
+%% So, with this design, messages move to the left. Eventually, they
+%% should end up in a contiguous block on the left and are then never
+%% rewritten. But this isn't quite the case. If in a file there is one
+%% message that is being ignored, for some reason, and messages in the
+%% file to the right and in the current block are being read all the
+%% time then it will repeatedly be the case that the good data from
+%% both files can be combined and will be written out to a new
+%% file. Whenever this happens, our shunned message will be rewritten.
+%%
+%% So, provided that we combine messages in the right order,
+%% (i.e. left file, bottom to top, right file, bottom to top),
+%% eventually our shunned message will end up at the bottom of the
+%% left file. The compaction/combining algorithm is smart enough to
+%% read in good data from the left file that is scattered throughout
+%% (i.e. C and D in the below diagram), then truncate the file to just
+%% above B (i.e. truncate to the limit of the good contiguous region
+%% at the start of the file), then write C and D on top and then write
+%% E, F and G from the right file on top. Thus contiguous blocks of
+%% good data at the bottom of files are not rewritten.
+%%
+%% --------- --------- ---------
+%% | X | | G | | G |
+%% --------- --------- ---------
+%% | D | | X | | F |
+%% --------- --------- ---------
+%% | X | | X | | E |
+%% --------- --------- ---------
+%% | C | | F | ===> | D |
+%% --------- --------- ---------
+%% | X | | X | | C |
+%% --------- --------- ---------
+%% | B | | X | | B |
+%% --------- --------- ---------
+%% | A | | E | | A |
+%% --------- --------- ---------
+%% left right left
+%%
+%% From this reasoning, we do have a bound on the number of times the
+%% message is rewritten. From when it is inserted, there can be no
+%% files inserted between it and the head of the queue, and the worst
+%% case is that everytime it is rewritten, it moves one position lower
+%% in the file (for it to stay at the same position requires that
+%% there are no holes beneath it, which means truncate would be used
+%% and so it would not be rewritten at all). Thus this seems to
+%% suggest the limit is the number of messages ahead of it in the
+%% queue, though it's likely that that's pessimistic, given the
+%% requirements for compaction/combination of files.
+%%
+%% The other property is that we have is the bound on the lowest
+%% utilisation, which should be 50% - worst case is that all files are
+%% fractionally over half full and can't be combined (equivalent is
+%% alternating full files and files with only one tiny message in
+%% them).
+
%% ---- PUBLIC API ----
start_link(FileSizeLimit, ReadFileHandlesLimit) ->