Stuff Goes Bad:Erlang In Anger

Random Drop


Randomly dropping messages is the easiest way to do such a thing, and might also be the most robust implementation, due to its simplicity.
The trick is to define some threshold value between 0.0 and 1.0 and to fetch a random number in that range:


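-module(drop).
-export([random/1]).

random(Rate) ->
    maybe_seed(),
    random:uniform() =< Rate.

%% Seed once per process, only if the seed is missing or degenerate.
maybe_seed() ->
    case get(random_seed) of
        undefined -> random:seed(erlang:now());
        {X,X,X} -> random:seed(erlang:now());
        _ -> ok
    end.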

If you aim to keep 95% of the messages you send, the authorization to send could be written as a call to case drop:random(0.95) of true -> send(); false -> drop() end, or as the shorter drop:random(0.95) andalso send() if you don’t need to do anything specific when dropping a message.

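%% Explicit handling of both outcomes:
case drop:random(0.95) of
    true -> send();
    false -> drop()
end.

%% Shorter form when a dropped message needs no handling:
drop:random(0.95) andalso send().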

The maybe_seed() function makes sure that a valid seed is present in the process dictionary rather than a crappy one. It seeds the generator only if no valid seed has been defined before, in order to avoid calling now() (a monotonic function that requires a global lock) too often.
There is one ‘gotcha’ to this method, though: the random drop must ideally be done at the producer level rather than at the queue (the receiver) level.


The best way to avoid overloading a queue is to not send data its way in the first place. Because there are no bounded mailboxes in Erlang, dropping in the receiving process only guarantees that this process will be spinning wildly, trying to get rid of messages, and fighting the schedulers to do actual work.
On the other hand, dropping at the producer level is guaranteed to distribute the work equally across all processes.
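
As a minimal sketch of what dropping at the producer could look like, the hypothetical helper below decides whether to send before the message ever reaches the receiver's mailbox. The module name producer_drop and the function maybe_send/3 are illustrative assumptions, and the sketch swaps in the newer rand module (which seeds itself on first use) for the random module used earlier.

```erlang
-module(producer_drop).
-export([maybe_send/3]).

%% Hypothetical helper: send Msg to Pid with probability KeepRate
%% (a float between 0.0 and 1.0); otherwise drop it on the producer
%% side so that it never lands in the receiver's mailbox.
%% rand:uniform/0 returns a float X with 0.0 =< X < 1.0, so a strict
%% comparison keeps everything at 1.0 and drops everything at 0.0.
maybe_send(Pid, Msg, KeepRate) ->
    case rand:uniform() < KeepRate of
        true ->
            Pid ! Msg,
            sent;
        false ->
            dropped
    end.
```

A producer would then call producer_drop:maybe_send(Worker, Msg, 0.95) instead of Worker ! Msg, and could count the dropped results if it needs metrics.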
Dropping at the producer level also opens the door to interesting optimizations where the working process or a given monitor process [15] uses values in an ETS table or application:set_env/3 to dynamically increase and decrease the threshold to be used with the random number.
This allows control over how many messages are dropped based on overload, and the configuration data can be fetched by any process rather efficiently by using application:get_env/2.
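
The dynamic threshold described above might be sketched as follows. Everything named here is an assumption made for illustration: the application name my_app, the key keep_rate, the queue limit of 1000 and the 0.05 step are all hypothetical, and rand is used in place of the random module. A monitor process would call adjust/1 periodically, while producers call should_send/0 before sending.

```erlang
-module(drop_rate).
-export([set_keep_rate/1, keep_rate/0, adjust/1, should_send/0]).

-define(APP, my_app).        % hypothetical application name
-define(KEY, keep_rate).     % hypothetical configuration key
-define(MAX_QUEUE, 1000).    % hypothetical overload limit
-define(STEP, 0.05).         % hypothetical adjustment step

%% Store the fraction of messages to keep (0.0 to 1.0).
set_keep_rate(Rate) when is_number(Rate), Rate >= 0, Rate =< 1 ->
    application:set_env(?APP, ?KEY, Rate).

%% Read the current rate; keep everything if nothing was configured.
keep_rate() ->
    case application:get_env(?APP, ?KEY) of
        {ok, Rate} -> Rate;
        undefined -> 1.0
    end.

%% Meant to be called periodically by a monitor process: lower the
%% keep rate while the worker's mailbox is too long, raise it otherwise.
adjust(WorkerPid) ->
    case process_info(WorkerPid, message_queue_len) of
        {message_queue_len, Len} when Len > ?MAX_QUEUE ->
            set_keep_rate(max(0.0, keep_rate() - ?STEP));
        {message_queue_len, _} ->
            set_keep_rate(min(1.0, keep_rate() + ?STEP));
        undefined ->
            ok   % the worker process no longer exists
    end.

%% Called by producers before sending a message.
should_send() ->
    rand:uniform() < keep_rate().
```

The linear step is only one possible heuristic; a real system may prefer to scale the rate with the observed queue length.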
Similar techniques could also be used to implement different drop ratios for different message priorities, rather than trying to sort it all out at the consumer level.
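
One way to picture per-priority drop ratios is a lookup from priority to keep rate on the producer side. The priorities (high, normal, low) and their rates below are made-up values for illustration, not figures from the text, and rand again stands in for the random module.

```erlang
-module(priority_drop).
-export([should_send/1]).

%% Hypothetical keep rates: never drop high-priority messages,
%% keep 95% of normal ones and half of the low-priority ones.
keep_rate(high)   -> 1.0;
keep_rate(normal) -> 0.95;
keep_rate(low)    -> 0.5.

%% Decide on the producer side whether a message of the given
%% priority should be sent. rand:uniform/0 is always below 1.0,
%% so high-priority messages are always kept.
should_send(Priority) ->
    rand:uniform() < keep_rate(Priority).
```

A producer could then write priority_drop:should_send(low) andalso send(), mirroring the shorter form shown earlier.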


[15] Any process tasked with checking the load of specific processes using heuristics such as process_info(Pid, message_queue_len) could be a monitor.