Stuff Goes Bad: Erlang in Anger

What Users See


The tricky part about back-pressure is reporting it. When back-pressure is done implicitly through synchronous calls, the only way to know it is at work due to overload is that the system becomes slower and less usable.
Sadly, this is also going to be a potential symptom of bad hardware, bad network, unrelated overload, and possibly a slow client.
Trying to figure out that a system is applying back-pressure by measuring its responsiveness is equivalent to trying to diagnose which illness someone has by observing that the person has a fever.
It tells you something is wrong, but not what.
Asking for permission, as a mechanism, will generally allow you to define your interface in such a way that you can explicitly report what is going on: either the system as a whole is overloaded, or the caller is hitting a limit on the rate at which it can perform an operation. Either way, the caller can adjust accordingly.
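As a rough illustration of what an explicit answer can look like, here is a minimal sketch of a permission check that tells the caller which of the two situations it is in. The module name, the function names, the return atoms, and both limits are hypothetical and chosen only for this example. A real system would get the two counters from wherever it tracks load.

```erlang
-module(permission_sketch).
-export([ask/2, describe/1]).

%% Hypothetical limits, chosen only for illustration.
-define(GLOBAL_MAX_IN_FLIGHT, 1000).
-define(ACCOUNT_MAX_PER_MINUTE, 100).

%% ask(InFlight, UsedThisMinute) -> ok | {denied, Reason}
%% InFlight is the number of requests the whole system is handling,
%% UsedThisMinute is what this caller has already done in the window.
%% The system-wide limit is checked first, then the caller's own limit.
-spec ask(non_neg_integer(), non_neg_integer()) ->
          ok | {denied, system_overloaded | rate_limited}.
ask(InFlight, _UsedThisMinute) when InFlight >= ?GLOBAL_MAX_IN_FLIGHT ->
    {denied, system_overloaded};
ask(_InFlight, UsedThisMinute) when UsedThisMinute >= ?ACCOUNT_MAX_PER_MINUTE ->
    {denied, rate_limited};
ask(_InFlight, _UsedThisMinute) ->
    ok.

%% Maps each answer to a reaction the caller might choose.
-spec describe(ok | {denied, system_overloaded | rate_limited}) -> atom().
describe(ok) -> proceed;
describe({denied, system_overloaded}) -> back_off_and_retry_later;
describe({denied, rate_limited}) -> slow_down_this_client.
```

The point of the sketch is only that the two refusals are distinct values: a caller that receives `{denied, rate_limited}` knows the problem is its own request rate, which is something a slow response alone could never tell it.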


There is a choice to be made when designing the system. Are your users going to have per-account limits, or are the limits going to be global to the system?
System-global or node-global limits are usually easy to implement, but will have the downside that they may be unfair. A user doing 90% of all your requests may end up making the platform unusable for the vast majority of the other users.
Per-account limits, on the other hand, tend to be very fair, and allow fancy schemes such as having premium users who can go above the usual limits. This is extremely nice, but has the downside that the more users use your system, the higher the effective global system limit tends to move. Starting with 100 users that can do 100 requests a minute gives you a global 10,000 requests per minute. Add 20 new users with that same rate allowed, and the ceiling rises to 12,000 requests per minute; suddenly you may crash a lot more often.
It’s important to consider the tradeoffs your business can tolerate from that point of view, because users will tend not to appreciate seeing their allowed usage go down all the time, possibly even more so than seeing the system go down entirely from time to time.
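To make the per-account tradeoff concrete, the following is a minimal sketch of a per-account limiter with a premium tier, together with a helper that adds up what the system has promised in the worst case. Everything in it is an assumption made for illustration: the module name, the `standard` and `premium` plan names, the 500 requests per minute premium allowance, and the choice to keep counts in a plain map. Resetting the counts at the end of each one-minute window is left out.

```erlang
-module(account_limits_sketch).
-export([new/0, request/3, effective_global_limit/1]).

%% Hypothetical per-minute allowances by plan.
limit(standard) -> 100;
limit(premium)  -> 500.

%% The state maps an account id to the number of requests
%% it has made in the current one-minute window.
new() -> #{}.

%% request(AccountId, Plan, Counts) ->
%%     {ok, NewCounts} | {error, rate_limited}
%% Accepts the request if the account is still under its plan's limit.
request(AccountId, Plan, Counts) ->
    Used = maps:get(AccountId, Counts, 0),
    case Used < limit(Plan) of
        true  -> {ok, Counts#{AccountId => Used + 1}};
        false -> {error, rate_limited}
    end.

%% Worst-case load the system as a whole has agreed to accept,
%% given a list of plans with one entry per account.
effective_global_limit(Plans) ->
    lists:sum([limit(Plan) || Plan <- Plans]).
```

With these assumed numbers, `effective_global_limit(lists:duplicate(100, standard))` returns 10000 and the same call with 120 accounts returns 12000. No single account's limit changed, yet the load the system may be asked to absorb grew by 20%.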
