| Commit message | Author | Age | Files | Lines |
queues, give 2 queues the same length such that they can't both fit in memory, and slowly trickle in messages. As each one gets a message, it'll force the other one out to disk (the other one will be in either the hibernating or the lowrate group). This is bad. Therefore, adjusted the conditions under which we bring a queue back in from disk to exclude queues that are either hibernating or low rate (don't forget, even a list_queues will wake up a queue and cause it to report memory). If you have two fast queues then neither of them will be in the low-rate or hibernating groups, so neither will be a candidate for eviction, and the problem doesn't exist there; instead, if they need more memory and can't fit in RAM, they'll evict themselves to disk rather than anyone else.
Also realised that a million queues isn't unreasonable, so the minimum number of tokens in the system should be more like 1e7, if not higher.
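The eviction rule described above can be sketched in a few lines. This is a hypothetical Python model, purely illustrative (the real code is Erlang); the group names and token counts are assumptions:

```python
# Minimal sketch of the rule above. "group" models the queue_mode_manager's
# classification of the reporting queue; tokens model the memory it wants.

def may_return_to_ram(group, tokens_needed, tokens_free):
    # A hibernating or low-rate queue that merely woke up to report memory
    # (e.g. because of a list_queues) must stay on disk, otherwise two
    # same-sized sleepy queues endlessly evict each other via the disk.
    if group in ("hibernating", "lowrate"):
        return False
    # A fast queue comes back in only if it fits without evicting anyone;
    # if it doesn't fit, it sends itself to disk rather than anyone else.
    return tokens_needed <= tokens_free
```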
that I failed to spot last night, but apparently came to me during my dreams. I have no idea how the tests managed to pass last night...
is on disk or not. It does not use any sequence numbers, nor does it try to correlate queue position with sequence numbers in the disk_queue. Therefore, there is absolutely no reason for the disk_queue to have all the complexity associated with being able to cope with non-contiguous sequence ids. Thus all removed. This has made the disk_queue a good bit simpler, and slightly faster in a few cases too. All tests pass.
mixed_queue:to_disk_only_mode. This function moves the next N messages from the front of the queue to the back, and is MUCH more efficient than calling phantom_deliver and then requeue_with_seqs. This means that a queue which has been sent to disk, then converted back to mixed mode, had some minor work done, and then been sent back to disk takes almost no time in transitions beyond the first transition. The test of this:
1) declare durable queue
2) send 100,000 persistent messages to it
3) send 100,000 non-persistent messages to it
4) send 100,000 persistent messages to it
5) now pin it to disk - it'll make two calls to requeue_next_n and should be rather quick as it's only the middle 100,000 messages that actually have to be written, the other 200,000 don't even get sent between the disk_queue and mixed_queue in either direction. A total of 100,003 calls are necessary for this transition: 2 requeue_next_n, 100,000 tx_publish, 1 tx_commit
6) now unpin it from disk and list the queues to wake it up. The transition to mixed_mode is one call, zero reads, and instantaneous
7) now repin it to disk. The mixed queue knows everything is still on disk, so it makes one call to requeue_next_n with N = 300,000. The disk_queue sees this is the whole queue and so doesn't need to do any work at all and so is instant.
All tests pass.
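The effect of the transition above can be modelled with a deque. This is a hypothetical Python illustration, not the real Erlang code: runs of messages already on disk cost a single requeue_next_n each (modelled as rotating the run to the back), only in-RAM messages cost a tx_publish, and one tx_commit closes the batch:

```python
from collections import deque

# "queue" is a deque of (msg, on_disk) pairs; "log" records the calls that
# would be made to the disk_queue during the mixed -> disk transition.

def to_disk_only_mode(queue, log):
    n, i = len(queue), 0
    while i < n:
        if queue[0][1]:                         # a run already on disk
            run = 0
            while i + run < n and queue[0][1]:
                queue.rotate(-1)                # front-to-back, order kept
                run += 1
            log.append(("requeue_next_n", run)) # one call for the whole run
            i += run
        else:
            msg, _ = queue.popleft()
            queue.append((msg, True))           # written out via tx_publish
            log.append(("tx_publish", msg))
            i += 1
    log.append(("tx_commit",))
    return len(log)
```

On a disk/RAM/disk layout like the 300,000-message test above, this gives exactly 2 requeue_next_n calls, one tx_publish per in-RAM message, and 1 tx_commit.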
a process comes out of hibernation, then does under 10 seconds' work before hibernating again, then it'll only issue a memory report when it goes back into hibernation, so it'll always claim to the queue_mode_manager that it's hibernating. Thus now, when hibernating, or when receiving the report_memory message, we set state so that when the next normal message comes in, we always send a memory report after that message. This ensures that when a process wakes up and does some real work, the queue_mode_manager will be informed.
Applied this, and the ability to hibernate to the disk_queue too. Plus some minor refactoring and better state field names. All tests pass, and the disk_queue really does hibernate with the binary backoff as I wanted.
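The flag logic above can be sketched as a tiny state machine (hypothetical Python model of the Erlang gen_server state; the method names are illustrative):

```python
# After hibernating, or after handling report_memory, remember that the next
# normal message must be followed by a fresh memory report, so that real
# post-wake work is always seen by the queue_mode_manager.

class MemoryReporter:
    def __init__(self):
        self.report_after_next = False
        self.reports = []

    def pre_hibernate(self):
        self.reports.append("hibernate")
        self.report_after_next = True    # next real message => report

    def handle_report_memory(self):
        self.reports.append("timer")
        self.report_after_next = True

    def handle(self, _msg):
        # ... the real message handling would happen here ...
        if self.report_after_next:
            self.reports.append("post-work")
            self.report_after_next = False
```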
queue_mode_manager rather than directly talking to the queues. This means the queues and the queue manager can't disagree on the mode a queue should be in.
mixed_queue so that it does batching. This means that it won't just flood the disk_queue with a billion messages, thus exhausting memory. Instead it does batching and uses tx_commit to demarcate the batches. This means the conversion happens as quickly as possible and does not exhaust memory. Dropped the memory alarms to 0.8. This is a good idea because converting queues between modes transiently takes a fair chunk of memory, and leaving the alarms up at 0.95 was proving too high, making the mode transitions exhaust RAM and swap to buggery.
However, there is a problem when going to disk_mode in the mixed queue where messages in the queue are already on disk. A million calls to phantom_deliver is not a good idea, and locks a CPU core at 100% for a very long time.
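The batching idea can be sketched as follows. This is a hypothetical Python model (the batch size, function names, and the FakeDiskQueue stand-in are all assumptions; the real interface is the Erlang disk_queue):

```python
# Publish in bounded batches and use a synchronous tx_commit to demarcate
# each batch, so the disk_queue's mailbox can never grow without bound.

def convert_to_disk(messages, disk_queue, batch_size=1000):
    pending = 0
    for msg in messages:
        disk_queue.tx_publish(msg)      # async cast: cheap
        pending += 1
        if pending == batch_size:
            disk_queue.tx_commit()      # sync call: acts as a barrier
            pending = 0
    if pending:
        disk_queue.tx_commit()

class FakeDiskQueue:
    """Stand-in for the real disk_queue, tracking mailbox growth."""
    def __init__(self):
        self.commits = 0
        self.inflight = 0
        self.peak_inflight = 0           # worst-case mailbox depth

    def tx_publish(self, _msg):
        self.inflight += 1
        self.peak_inflight = max(self.peak_inflight, self.inflight)

    def tx_commit(self):
        self.commits += 1
        self.inflight = 0                # the barrier drains the mailbox
```

Without the barrier the peak mailbox depth equals the whole queue length; with it, the peak is capped at batch_size.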
gen_server2 to R13B1 - this was a change originally made by matthias to ensure that messages cast to remote nodes are sent in order.
b) Add guards and slightly relax name/1 so that it works in R11B5. All tests pass in R11B5, and manual testing of the binary backoff hibernation shows that it works too.
Adjusted amqqueue_process to use it. Added documentation. Tested thoroughly with explicit test module (not added), and full test suite, which all passed. Existing tests further up in this bug similarly pass and demonstrate code is functioning correctly.
slip badly behind the shipped version.
Obviously, converting a mixed queue to disk does take some time, and the values are deliberately set low to save memory because, on this transition, the disk_queue mailbox will go insane and eat lots of memory very quickly. But it seems about the right balance. I'll add documentation next.
is no need for the emergency tokens, nor any need for the weird doubling. So it's basically got much simpler.
We hold two queues, one of hibernating queues (ordered by when they hibernated) and another priority_queue of lowrate queues (ordered by the amount of memory allocated to them). We evict to disk from the hibernated and then the lowrate queues in their relevant orders. Seems to work. Oh and disk_queue is now managed by the tokens too.
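The two pools can be sketched like this (hypothetical Python model; note one assumption: the message only says the low-rate pool is ordered by the amount of memory allocated, so this sketch evicts the largest allocation first):

```python
import heapq
from collections import deque

# Hibernating queues are evicted oldest-first; then low-rate queues are
# evicted in order of memory held (largest-first here, an assumption).

class TokenPool:
    def __init__(self, free):
        self.free = free
        self.hibernating = deque()   # (name, tokens), oldest first
        self.lowrate = []            # heap keyed on -tokens

    def hibernated(self, name, tokens):
        self.hibernating.append((name, tokens))

    def went_lowrate(self, name, tokens):
        heapq.heappush(self.lowrate, (-tokens, name))

    def grab(self, needed):
        evicted = []
        while self.free < needed and self.hibernating:
            name, held = self.hibernating.popleft()
            self.free += held        # send the oldest sleeper to disk
            evicted.append(name)
        while self.free < needed and self.lowrate:
            neg, name = heapq.heappop(self.lowrate)
            self.free += -neg
            evicted.append(name)
        if self.free >= needed:
            self.free -= needed
            return True, evicted
        return False, evicted
```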
a) when we're not hibernating, every 10 seconds
b) immediately prior to hibernating
c) as soon as we stop hibernating
mixed mode, we may as well do it really lazily and not bother with any communication with the disk_queue. We just have a token in the queue which indicates how many messages we are expecting to get from the disk queue. This makes disk -> mixed almost instantaneous. It also means that performance is not initially brilliant. Maybe we need some way for the queue to know that both it and the disk_queue are idle, and to decide to prefetch. Even batching could work well. It's an endless trade-off between getting operations to happen quickly and being able to get good performance. Dunno what the third thing is, probably not necessary, as you can't even have both of those, let alone pick 2 from 3!
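The lazy token can be sketched as follows (hypothetical Python model; the class and method names are illustrative, and FakeDiskQueue stands in for the real disk_queue):

```python
from collections import deque

# Instead of pulling messages back from the disk_queue on the mode switch,
# the mixed queue just records how many it is owed, making the transition
# O(1); messages are fetched only when actually delivered.

class MixedQueue:
    def __init__(self):
        self.in_ram = deque()
        self.owed_from_disk = 0          # the "token" described above

    def from_disk_mode(self, disk_len):
        self.owed_from_disk = disk_len   # almost instantaneous: no I/O

    def deliver(self, disk_queue):
        if self.in_ram:
            return self.in_ram.popleft()
        if self.owed_from_disk:
            self.owed_from_disk -= 1
            return disk_queue.fetch()    # lazy read: hurts the initial rate
        return None

class FakeDiskQueue:
    """Stand-in for the disk_queue holding the persisted messages."""
    def __init__(self, msgs):
        self.msgs = deque(msgs)

    def fetch(self):
        return self.msgs.popleft()
```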
apply_after, not apply_interval, and then, after reporting memory use, don't set a new timer going (but do set a new timer going on every other message (other than timeouts)). This means that if nothing is going on after a memory report, the process can wait as long as it needs to before the hibernate timeout fires.
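The timer discipline reads like this in sketch form (hypothetical Python model; the real code uses Erlang's timer:apply_after):

```python
# A one-shot timer drives memory reports. After the report fires we
# deliberately do NOT rearm it, but any other message does rearm it, so an
# idle process ends up with no pending timer and its hibernate timeout
# can finally fire.

class ReportTimer:
    def __init__(self):
        self.armed = False

    def on_message(self, msg):
        if msg == "report_memory":       # the one-shot timer fired
            self.armed = False           # report sent; leave it unarmed
            return "reported"
        self.armed = True                # real traffic: rearm apply_after
        return "handled"

    def can_hibernate(self):
        return not self.armed            # nothing pending to wake us up
```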
10 seconds, which means the memory_report timer will always fire and reset the timeout - thus the queue process will never hibernate.
reply and noreply functions. This means that the now() value there includes computation relating to the last message in. This is maybe not desirable, but the alternative is to wrap all of handle_cast, handle_call and handle_info. Nevertheless, testing shows this works:
in the erlang client:
Conn = amqp_connection:start("guest", "guest", "localhost").
Chan = lib_amqp:start_channel(Conn).
[begin Q = list_to_binary(integer_to_list(R)), Q = lib_amqp:declare_queue(Chan, Q) end || R <- lists:seq(1,1000)].
Props = (amqp_util:basic_properties()).
[begin Q = list_to_binary(integer_to_list(R)), ok = lib_amqp:publish(Chan, <<"">>, Q, <<0:(8*1024)>>, Props) end || _ <- lists:seq(1,1500), R <- lists:seq(1,1000)].
Then, after that lot's gone in, in a shell do:
watch -n 2 "time ./scripts/rabbitmqctl list_queues | tail"
The times for me start off at about 2.3 seconds, then drop rapidly to 1.4 and then 0.2 seconds and stay there.
publish. This massively reduces the number of sync calls to disk_queue, potentially to one, if every message in the queue is non-persistent (or the queue is non-durable).
solve the problems. I don't quite buy this though, as all I was doing was stopping and starting the app so I don't understand why this was affecting the clustering configuration or causing issues _much_ further down the test line. But still, it seems to be repeatedly passing for me atm.
"All replicas on diskfull nodes are not active yet".
This means the conversion is much faster than it was which is a good thing, at the cost of slower initial delivery.
it to me yesterday at the Erlang Factory conference.
When you sync, you know that everything up to the current state of the file is sync'd. Given that we're always appending, we know that any message before the current length of the file is available. Thus when we're reading messages from the current write file, even if the file is dirty, we don't need to sync unless the message we're reading is beyond the length of the file at the last sync.
This can be very effective, for example, if there are a few hundred messages in the queue and then you're reading and writing to the queue at the same rate, then this will mean that rather than doing a sync for every read, we now only sync once per size of queue (altitude or ramp size). Sure enough, my publish_one_in_one_out_receive(1000) (altitude of 1000, then 5000 @ one in, one out) reduces from 6089 calls to fsync to 21, and from 15.4 seconds to 3.6.
It's also possible to apply the same optimisation in tx_commit - not only do we now return immediately if the current file is not dirty or if none of the messages in the txn are in the current file, but we can also return immediately if the current file is dirty and messages are in the current file, but they're all below the last sync file size.
Surprisingly, very little extra code was needed.
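The optimisation can be captured in a short model (hypothetical Python illustration of the idea; record offsets stand in for byte offsets in the real file):

```python
# Remember the file length at the last fsync. A read below that offset never
# needs a sync, even if the file is dirty because of later appends; only a
# read beyond it forces a sync.

class AppendOnlyFile:
    def __init__(self):
        self.records = []       # stand-in for the append-only message file
        self.synced_len = 0     # file length at the last fsync
        self.syncs = 0

    def append(self, rec):
        self.records.append(rec)

    def sync(self):
        self.syncs += 1
        self.synced_len = len(self.records)

    def read(self, offset):
        if offset >= self.synced_len:    # beyond the last sync: must fsync
            self.sync()
        return self.records[offset]

    def commit_needs_sync(self, offsets):
        # tx_commit can return immediately if every message in the txn sits
        # below the last-synced length, even when the file is dirty.
        return any(o >= self.synced_len for o in offsets)
```

In the one-in-one-out steady state described above, reads stay below synced_len for a whole queue-length of operations, so the cost drops from one fsync per read to roughly one fsync per queue length.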
Also, the sync version of publish is unnecessary as we were only ever using it in one place, where we threw away the result. Thus even when publishing a message and marking it delivered in one go (as opposed to publish_delivered, which is quite different ;) ), we can make it a cast, not a call, as we don't need the acktag.
Also, the memory accounting was wrong for requeue in mixed_queue because requeue doesn't actually change the memory sizes (memory goes up on (tx_)publish, and down on ack/tx_cancel. Requeue has no effect. Nor does deliver.).
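The accounting invariant above can be captured in a few lines (illustrative Python; the class and method names are hypothetical):

```python
# Queue memory rises on (tx_)publish, falls on ack/tx_cancel, and is
# untouched by deliver and requeue, because a delivered or requeued message
# still belongs to the queue.

class QueueMemory:
    def __init__(self):
        self.bytes = 0

    def publish(self, size):
        self.bytes += size

    def ack(self, size):
        self.bytes -= size

    def tx_cancel(self, size):
        self.bytes -= size

    def deliver(self, _size):
        pass   # no change: the message is awaiting ack

    def requeue(self, _size):
        pass   # no change: the message is back on the queue
```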