Stuff Goes Bad: Erlang in Anger

Dealing With Constant Overload


Being under constant overload may require a new solution. Queues and buffers are great for cases where overload happens from time to time (even for rather prolonged periods), but both work reliably only when you expect the input rate to eventually drop, letting you catch up.
Problems mostly arise when you try to send so many messages that they can't all make it to one process without overloading it.
Two approaches are generally good for this case:
 • Have many processes that act as buffers and load-balance across them (scale horizontally)
 • Use ETS tables as locks and counters (reduce the input)


ETS tables are generally able to handle a ton more requests per second than a process can, but the operations they support are a lot more basic: a single read, or atomically adding to or subtracting from a counter, is as fancy as you should expect things to get for the general case.
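The kind of atomic operation this refers to can be sketched in a few lines; the table and key names below are made up for illustration:

```erlang
%% Create a counter table and bump it atomically.
Tab = ets:new(counters, [set, public]),
true = ets:insert(Tab, {hits, 0}),
%% ets:update_counter/3 atomically adds 1 to field #2 of the
%% {hits, Count} tuple and returns the new value, with no
%% middleman process involved.
1 = ets:update_counter(Tab, hits, {2, 1}),
[{hits, 1}] = ets:lookup(Tab, hits).
```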
ETS tables will be required for both approaches. Generally speaking, the first approach could work well with the regular process registry:
you take N processes to divide up the load, give them all a known name, and pick one of them to send the message to. Given you’re pretty much going to assume you’ll be overloaded, randomly picking a process with an even distribution tends to be reliable: no state communication is required, work will be shared in a roughly equal manner, and it’s rather insensitive to failure.
In practice, though, we want to avoid atoms generated dynamically, so I tend to prefer to register workers in an ETS table with read_concurrency set to true.
It’s a bit more work, but it gives more flexibility when it comes to updating the number of workers later on.
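A minimal sketch of such a registry follows; the module and table names are made up, and in a real system the pool supervisor would own and maintain the table:

```erlang
-module(pool_demo).
-export([init/1, dispatch/1]).

%% Register N worker pids under integer keys 1..N in a
%% read-optimized ETS table.
init(Pids) ->
    ets:new(worker_registry,
            [set, public, named_table, {read_concurrency, true}]),
    ets:insert(worker_registry,
               lists:zip(lists:seq(1, length(Pids)), Pids)),
    ok.

%% Pick a worker uniformly at random; no state needs to be
%% shared between callers, and the pool size can change just
%% by rewriting the table.
dispatch(Msg) ->
    N = rand:uniform(ets:info(worker_registry, size)),
    [{N, Pid}] = ets:lookup(worker_registry, N),
    Pid ! Msg.
```

Because only the table holds the worker set, growing or shrinking the pool later is a matter of inserting or deleting rows, with no atoms created at runtime.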
An approach similar to this one is used in the lhttpc 22 library mentioned earlier, to split load balancers on a per-domain basis.


For the second approach, using counters and locks, the same basic structure remains (pick one of many options, send it a message), but before actually sending the message, you must atomically update an ETS counter 23. There is a known limit shared across all clients (through their supervisor, or any other config or ETS value), and each request made to a process needs to clear this limit first.
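The gist of that counter check can be sketched as follows. This is not dispcount's actual API, just an illustration with made-up names, assuming each worker Id has a {Id, InFlight} row in a public ETS table:

```erlang
-module(limit_demo).
-export([acquire/3, release/2]).

%% Try to take a slot on worker Id; Limit is the shared maximum.
acquire(Tab, Id, Limit) ->
    case ets:update_counter(Tab, Id, {2, 1}) of
        N when N =< Limit ->
            ok;                                    %% slot obtained
        _ ->
            ets:update_counter(Tab, Id, {2, -1}),  %% undo: over the limit
            busy
    end.

%% Give the slot back once the request is done.
release(Tab, Id) ->
    ets:update_counter(Tab, Id, {2, -1}).
```

A caller picks a worker, calls acquire/3, and on busy immediately knows the request was denied, at which point it can give up or retry with a different worker.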
This approach has been used in dispcount 24 to avoid message queues, and to guarantee low-latency responses for any message that won't be handled, so that you do not need to wait to find out your request was denied. It is then up to the user of the library whether to give up as soon as possible, or to keep retrying with different workers.


[22] The lhttpc_lb module in this library implements it.
[23] By using ets:update_counter/3.
