| -rw-r--r-- | src/rabbit_disk_queue.erl | 54 |
1 files changed, 54 insertions, 0 deletions
diff --git a/src/rabbit_disk_queue.erl b/src/rabbit_disk_queue.erl
index 0ae890d967..7e2d8d12bf 100644
--- a/src/rabbit_disk_queue.erl
+++ b/src/rabbit_disk_queue.erl
@@ -150,6 +150,60 @@
 %% set to be a disk_only_table in order to ensure that we are not RAM
 %% constrained.
 
+%% So, with this design, messages move to the left. Eventually, they
+%% should end up in a contiguous block on the left and are then never
+%% rewritten. But this isn't quite the case. If in a file there is one
+%% message that is being ignored, for some reason, and messages in the
+%% file to the right and in the current block are being read all the
+%% time then it will repeatedly be the case that the good data from
+%% both files can be combined and will be written out to a new
+%% file. Whenever this happens, our shunned message will be rewritten.
+%%
+%% So, provided that we combine messages in the right order,
+%% (i.e. left file, bottom to top, right file, bottom to top),
+%% eventually our shunned message will end up at the bottom of the
+%% left file. The compaction/combining algorithm is smart enough to
+%% read in good data from the left file that is scattered throughout
+%% (i.e. C and D in the below diagram), then truncate the file to just
+%% above B (i.e. truncate to the limit of the good contiguous region
+%% at the start of the file), then write C and D on top and then write
+%% E, F and G from the right file on top. Thus contiguous blocks of
+%% good data at the bottom of files are not rewritten.
+%%
+%%   ---------    ---------         ---------
+%%   |   X   |    |   G   |         |   G   |
+%%   ---------    ---------         ---------
+%%   |   D   |    |   X   |         |   F   |
+%%   ---------    ---------         ---------
+%%   |   X   |    |   X   |         |   E   |
+%%   ---------    ---------         ---------
+%%   |   C   |    |   F   |   ===>  |   D   |
+%%   ---------    ---------         ---------
+%%   |   X   |    |   X   |         |   C   |
+%%   ---------    ---------         ---------
+%%   |   B   |    |   X   |         |   B   |
+%%   ---------    ---------         ---------
+%%   |   A   |    |   E   |         |   A   |
+%%   ---------    ---------         ---------
+%%     left         right             left
+%%
+%% From this reasoning, we do have a bound on the number of times the
+%% message is rewritten. From when it is inserted, there can be no
+%% files inserted between it and the head of the queue, and the worst
+%% case is that every time it is rewritten, it moves one position lower
+%% in the file (for it to stay at the same position requires that
+%% there are no holes beneath it, which means truncate would be used
+%% and so it would not be rewritten at all). Thus this seems to
+%% suggest the limit is the number of messages ahead of it in the
+%% queue, though it's likely that that's pessimistic, given the
+%% requirements for compaction/combination of files.
+%%
+%% The other property that we have is the bound on the lowest
+%% utilisation, which should be 50% - worst case is that all files are
+%% fractionally over half full and can't be combined (equivalent is
+%% alternating full files and files with only one tiny message in
+%% them).
+
 %% ---- PUBLIC API ----
 
 start_link(FileSizeLimit, ReadFileHandlesLimit) ->
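
The combine step described in the added comment can be illustrated with a small model. This is only a sketch under stated assumptions, not the code in rabbit_disk_queue.erl: the module name compaction_sketch, the functions combine/2 and demo/0, and the representation of a file as a bottom-first list of {msg, Id} and hole slots are all hypothetical and chosen for illustration. It shows that the contiguous good prefix of the left file (A and B in the diagram) is kept in place, while only the scattered good data from the left file and the good data from the right file are rewritten.

```erlang
-module(compaction_sketch).
-export([combine/2, demo/0]).

%% Hypothetical model: a file is a list of slots, bottom slot first.
%% {msg, Id} is good data; hole is dead data (an X in the diagram).

%% combine(Left, Right) -> {NewLeft, Rewritten}
combine(Left, Right) ->
    %% The contiguous good region at the bottom of the left file stays
    %% where it is; the file would be truncated just above it.
    {Kept, Rest} = lists:splitwith(fun is_msg/1, Left),
    %% Good data scattered above the first hole is read in before the
    %% truncate and written back on top, bottom to top.
    Scattered = lists:filter(fun is_msg/1, Rest),
    %% Good data from the right file follows, also bottom to top.
    FromRight = lists:filter(fun is_msg/1, Right),
    Rewritten = Scattered ++ FromRight,
    {Kept ++ Rewritten, Rewritten}.

is_msg({msg, _}) -> true;
is_msg(hole) -> false.

%% The two files from the diagram, listed bottom to top.
demo() ->
    Left  = [{msg, a}, {msg, b}, hole, {msg, c}, hole, {msg, d}, hole],
    Right = [{msg, e}, hole, hole, {msg, f}, hole, hole, {msg, g}],
    combine(Left, Right).
```

Calling compaction_sketch:demo() on the diagram's input yields a new left file holding A to G in order, with only C, D, E, F and G counted as rewritten. That matches the claim that contiguous good data at the bottom of a file is not rewritten.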

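
The 50% utilisation bound in the added comment can be checked with a little arithmetic. The following is a sketch under assumptions rather than the module's actual logic: the module name utilisation_sketch and both functions are hypothetical, the combine criterion (the good data of two files must fit within FileSizeLimit) is inferred from the comment, and utilisation is measured relative to FileSizeLimit per file.

```erlang
-module(utilisation_sketch).
-export([can_combine/3, utilisation/2]).

%% Assumed criterion: two files can be combined only if their good
%% data together fits in a single file.
can_combine(GoodLeft, GoodRight, FileSizeLimit) ->
    GoodLeft + GoodRight =< FileSizeLimit.

%% Fraction of the space holding good data, given the number of good
%% bytes in each file and assuming every file may grow to the limit.
utilisation(GoodSizes, FileSizeLimit) ->
    lists:sum(GoodSizes) / (length(GoodSizes) * FileSizeLimit).
```

With a limit of 1024 bytes, files holding 513 good bytes each cannot be combined pairwise, and utilisation([513, 513, 513, 513], 1024) stays just above 0.5. The alternating case from the comment, utilisation([1024, 1, 1024, 1], 1024), gives the same figure, since a full file and a file with one tiny message cannot be combined either.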