Stuff Goes Bad: Erlang in Anger



There are a lot of different ways in which process memory can grow. Most interesting cases come down to a few common causes: process leaks (as in, you’re leaking processes), specific processes leaking their memory, and so on. It’s possible there’s more than one cause, so multiple metrics are worth investigating. Note that the process count itself is skipped here, since it has been covered before.


Is the global process count indicative of a leak? If so, you may need to investigate unlinked processes, or peek inside supervisors’ children lists to see what may be weird-looking.
 Finding unlinked (and unmonitored) processes is easy to do with a few basic commands:


1> [P || P <- processes(),
         [{_,Ls},{_,Ms}] <- [process_info(P, [links,monitors])],
         []==Ls, []==Ms].
This will return a list of processes with neither links nor monitors. For supervisors, just fetching supervisor:count_children(SupervisorPidOrName) and seeing what looks normal can be a good pointer.
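Once such processes are found, it usually helps to know what they are. The following is a minimal sketch that extends the comprehension above to also report a few identifying attributes for each unlinked and unmonitored process. The module name orphans and the choice of extra attributes are assumptions made for illustration, not something the text prescribes.

```erlang
-module(orphans).
-export([list/0]).

%% Return {Pid, Info} for every process that has neither links nor monitors.
%% Info holds a few attributes that help identify what the process is.
%% Processes that exit during the scan yield 'undefined' from process_info/2
%% and are silently skipped by the generator pattern.
list() ->
    Items = [links, monitors, registered_name,
             initial_call, current_function, memory],
    [{P, Info} || P <- processes(),
                  [{links, []}, {monitors, []} | Info] <-
                      [process_info(P, Items)]].
```

Calling orphans:list() from the shell returns the same set of pids as the one-liner, each paired with its registered name (an empty list when there is none), initial call, current function, and memory use.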


Memory Used

The per-process memory model is briefly described in Subsection 7.3.2, but generally speaking, you can find which individual processes use the most memory by looking at their memory attribute. You can look things up either in absolute terms or over a sliding window.
For memory leaks, unless you’re in a predictable fast increase, absolute values are usually those worth digging into first:


1> recon:proc_count(memory, 3).
Other attributes worth checking include any of the fields in Subsection 5.2.1, such as message_queue_len, but memory will usually encompass all other types.
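To make the difference between the absolute and sliding-window views concrete, here is a rough sketch that uses only process_info/2. It is an approximation for illustration, not how recon is implemented, and the module name proc_mem and its function names are hypothetical.

```erlang
-module(proc_mem).
-export([top/1, window/2]).

%% Snapshot of {Pid, Bytes} for every process still alive when inspected.
sample() ->
    [{P, Bytes} || P <- processes(),
                   {memory, Bytes} <- [process_info(P, memory)]].

%% Absolute view: the N processes currently holding the most memory.
top(N) when is_integer(N), N >= 0 ->
    largest(sample(), N).

%% Sliding window: the N processes with the largest memory change over
%% Ms milliseconds. Processes that started or exited during the window
%% are ignored.
window(N, Ms) when is_integer(N), N >= 0, is_integer(Ms), Ms >= 0 ->
    Before = maps:from_list(sample()),
    timer:sleep(Ms),
    Deltas = [{P, Now - maps:get(P, Before)}
              || {P, Now} <- sample(), maps:is_key(P, Before)],
    largest(Deltas, N).

%% Sort {Key, Value} pairs by value, largest first, and keep N of them.
largest(Pairs, N) ->
    lists:sublist(lists:reverse(lists:keysort(2, Pairs)), N).
```

Here proc_mem:top(3) plays the role of the absolute lookup, while proc_mem:window(3, 5000) reports which processes grew the most over five seconds. The absolute view tends to surface slow leaks that have had time to accumulate, and the window view is better suited to fast, predictable growth.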


Garbage Collections

It is quite possible that a process uses lots of memory, but only for short periods of time. For long-lived nodes with a large overhead for operations, this is usually not a problem, but whenever memory starts being scarce, such spiky behaviour might be something you want to get rid of.
Monitoring all garbage collections in real-time from the shell would be costly. Instead, setting up Erlang’s system monitor might be the best way to go about it. Erlang’s system monitor will allow you to track information such as long garbage collection periods and large process heaps, among other things. A monitor can temporarily be set up as follows:


1> erlang:system_monitor().
2> erlang:system_monitor(self(), [{long_gc, 500}]).
3> flush().
Shell got {monitor,<4683.31798.0>,long_gc,
5> erlang:system_monitor(undefined).
6> erlang:system_monitor().
 The first command checks that nothing (or nobody else) is using a system monitor yet — you don’t want to take this away from an existing application or coworker.
The second command asks for the shell process to be notified every time a garbage collection takes over 500 milliseconds. The resulting messages are flushed in the third command. Feel free to also check for {large_heap, NumWords} if you want to monitor such sizes. Be careful to start with large values at first if you’re unsure. You don’t want to flood your process’s mailbox with a bunch of heaps that are 1-word large or more, for example.
 Command 5 unsets the system monitor (exiting or killing the monitor process also frees it up), and command 6 validates that everything worked.
 You can then find out if such monitoring messages tend to coincide with the memory increases that seem to result in leaks or overuses, and try to catch culprits before things are too bad. Quickly reacting and digging into the process (possibly with recon:info/1) may help find out what’s wrong with the application.
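If the shell's mailbox is not a convenient place to collect these messages, the same steps can be handed to a small dedicated process. The sketch below assumes a hypothetical module named gc_watch, and the thresholds passed to it are the caller's choice. It refuses to start when a system monitor is already installed, although the check and the installation are not atomic, so this is only a courtesy and not a guarantee.

```erlang
-module(gc_watch).
-export([start/2, stop/1]).

%% Install a system monitor that reports garbage collections longer than
%% GcMs milliseconds and heaps larger than HeapWords words.
%% Returns {error, ...} if someone else already holds the system monitor.
start(GcMs, HeapWords) ->
    case erlang:system_monitor() of
        undefined ->
            Pid = spawn(fun loop/0),
            erlang:system_monitor(Pid, [{long_gc, GcMs},
                                        {large_heap, HeapWords}]),
            {ok, Pid};
        Existing ->
            {error, {already_monitored, Existing}}
    end.

%% Ask the watcher to exit. As noted above, the system monitor is freed
%% once the monitoring process is gone.
stop(Pid) ->
    Pid ! stop,
    ok.

%% Print each notification along with the pid of the offending process.
loop() ->
    receive
        {monitor, Pid, long_gc, Info} ->
            io:format("long_gc in ~p: ~p~n", [Pid, Info]),
            loop();
        {monitor, Pid, large_heap, Info} ->
            io:format("large_heap in ~p: ~p~n", [Pid, Info]),
            loop();
        stop ->
            ok
    end.
```

A call such as gc_watch:start(500, 10000000) starts with deliberately high thresholds, in line with the advice to begin with large values. The pid printed with each notification is the one to hand to recon:info/1 for a closer look.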
