Stuff Goes Bad: Erlang in Anger

Chapter 8 CPU and Scheduler Hogs

While memory leaks tend to absolutely kill your system, CPU exhaustion tends to act like a bottleneck and limits the maximal work you can get out of a node. Erlang developers tend to scale horizontally when they face such issues. It is often easy enough to scale out the more basic pieces of code: only centralized global state (process registries, ETS tables, and so on) usually needs to be modified. [1] Still, if you want to optimize locally before scaling out, you need to be able to find your CPU and scheduler hogs.


It is generally difficult to properly analyze the CPU usage of an Erlang node and pin problems to a specific piece of code. With everything concurrent and running in a virtual machine, there is no guarantee you will find out whether a specific process, a driver, your own Erlang code, NIFs you may have installed, or some third-party library is eating up all your processing power. The existing approaches are often limited to profiling and reduction-counting if the problem is in your code, and to monitoring the scheduler's work if it might be anywhere else (including your code).
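To make the two approaches a bit more concrete, here is a minimal sketch of each, using only built-in functions (erlang:process_info/2, erlang:statistics/1 and erlang:system_flag/2). The module name hog_sketch and the function names top_reductions/2 and scheduler_usage/1 are made up for illustration and are not part of any library. The sampling windows are arbitrary, and the code takes simple before/after snapshots rather than doing anything clever.

```erlang
-module(hog_sketch).
-export([top_reductions/2, scheduler_usage/1]).

%% Reduction counting: return the N processes that consumed the most
%% reductions during a window of WindowMs milliseconds.
top_reductions(WindowMs, N) ->
    Before = maps:from_list(sample()),
    timer:sleep(WindowMs),
    %% Processes that did not exist in the first sample count from 0.
    Deltas = [{Pid, Reds - maps:get(Pid, Before, 0)} || {Pid, Reds} <- sample()],
    lists:sublist(lists:reverse(lists:keysort(2, Deltas)), N).

%% One {Pid, Reductions} pair per live process. Processes that die
%% between processes() and process_info/2 return 'undefined' and are
%% dropped by the pattern in the generator.
sample() ->
    [{Pid, Reds} || Pid <- erlang:processes(),
                    {reductions, Reds} <- [erlang:process_info(Pid, reductions)]].

%% Scheduler monitoring: fraction of wall-clock time each scheduler
%% spent doing actual work during a window of WindowMs milliseconds.
scheduler_usage(WindowMs) ->
    %% The flag must be enabled before statistics/1 returns data.
    Old = erlang:system_flag(scheduler_wall_time, true),
    T0 = lists:sort(erlang:statistics(scheduler_wall_time)),
    timer:sleep(WindowMs),
    T1 = lists:sort(erlang:statistics(scheduler_wall_time)),
    %% Restore whatever setting was in place before the call.
    erlang:system_flag(scheduler_wall_time, Old),
    [{Id, ratio(A1 - A0, W1 - W0)}
     || {{Id, A0, W0}, {Id, A1, W1}} <- lists:zip(T0, T1)].

%% Guard against a zero-length window.
ratio(_Active, 0) -> 0.0;
ratio(Active, Wall) -> Active / Wall.
```

Calling hog_sketch:top_reductions(1000, 5) from a shell should list the five busiest processes over one second, and hog_sketch:scheduler_usage(1000) should return one {SchedulerId, Usage} pair per scheduler. A high reduction count only points at Erlang code doing work; schedulers that are busy while no process stands out suggest the time is going elsewhere (NIFs, drivers, and so on).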


[1] Usually this takes the form of sharding or finding a state-replication scheme that’s suitable, and little more. It’s still a decent piece of work, but nothing compared to finding out most of your program’s semantics aren’t applicable to distributed systems given Erlang usually forces your hand there in the first place.