Stuff Goes Bad: Erlang in Anger

Global View


For a view of the VM in the large, it’s useful to track statistics and metrics general to the VM, regardless of the code running on it. Moreover, you should aim for a solution that allows long-term views of each metric — some problems show up as a very long accumulation over weeks that couldn’t be detected over small time windows.
 Good examples of issues exposed by a long-term view include memory or process leaks, but also regular or irregular spikes in activity tied to the time of day or week, which can often require months of data to confirm.
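
To make "metrics general to the VM" concrete, here is a minimal sketch of a snapshot function. The module name vm_snapshot and the choice of metrics are assumptions made for illustration, not something the text prescribes; the calls themselves are standard BIFs, and a reasonably recent OTP release (maps, singular time units) is assumed.

```erlang
-module(vm_snapshot).
-export([snapshot/0]).

%% Hypothetical helper: collect a few VM-wide metrics that do not
%% depend on whatever application code happens to be running.
snapshot() ->
    #{timestamp     => erlang:system_time(second),          % wall-clock time of the sample
      memory_total  => erlang:memory(total),                % bytes currently allocated by the VM
      process_count => erlang:system_info(process_count),   % processes alive on this node
      run_queue     => erlang:statistics(run_queue)}.       % total length of the run queues
```

Taken once, such a snapshot says little. Recorded at a fixed interval for weeks, a steadily rising memory_total or process_count is the kind of slow leak a short time window would hide.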


 For these cases, using existing Erlang metrics applications is useful. Common options are:
 • folsom [3] to store metrics in memory within the VM, whether global or app-specific.
 • vmstats [4] and statsderl [5], sending node metrics over to graphite through statsd [6].
 • exometer [7], a fancy-pants metrics system that can integrate with folsom (among other things), and a variety of back-ends (graphite, collectd, statsd, Riak, SNMP, etc.). It’s the newest player in town.
 • ehmon [8] for output done directly to standard output, to be grabbed later through specific agents, splunk, and so on.
 • custom hand-rolled solutions, generally using ETS tables and processes periodically dumping the data [9].
 • or if you have nothing and are in trouble, a function printing stuff in a loop in a shell [10].
It is generally a good idea to explore them a bit, pick one, and get a persistence layer that will let you look through your metrics over time.
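
The hand-rolled option in the list above might look roughly like the following sketch: a process that wakes up at a fixed interval and records a row of VM metrics into an ETS table. The module name vm_sampler, the table name vm_samples, and the row layout are all assumptions made for illustration. Shipping the rows to a persistent store is deliberately left out.

```erlang
-module(vm_sampler).
-export([start/1, stop/1, samples/0]).

-define(TABLE, vm_samples).  % hypothetical table name

%% Start a sampler that records one row every IntervalMs milliseconds.
%% Returns once the ETS table exists, so samples/0 is safe to call.
start(IntervalMs) when is_integer(IntervalMs), IntervalMs > 0 ->
    Parent = self(),
    Pid = spawn(fun() -> init(Parent, IntervalMs) end),
    receive
        {Pid, ready} -> {ok, Pid}
    end.

%% Stop the sampler and wait until it is gone.
stop(Pid) ->
    Ref = erlang:monitor(process, Pid),
    Pid ! stop,
    receive
        {'DOWN', Ref, process, Pid, _Reason} -> ok
    end.

%% Return all rows recorded so far, oldest first (ordered_set on timestamp).
samples() ->
    ets:tab2list(?TABLE).

init(Parent, IntervalMs) ->
    ?TABLE = ets:new(?TABLE, [named_table, ordered_set, public]),
    Parent ! {self(), ready},
    loop(IntervalMs).

loop(IntervalMs) ->
    receive
        stop -> ok
    after IntervalMs ->
        Row = {erlang:system_time(millisecond),
               erlang:memory(total),
               erlang:system_info(process_count)},
        ets:insert(?TABLE, Row),
        loop(IntervalMs)
    end.
```

For example, `{ok, Pid} = vm_sampler:start(1000)` samples once per second, and `vm_sampler:samples()` returns the accumulated rows. The table is owned by the sampler process, so it disappears when the sampler stops or the node restarts. That is why a long-term view still needs the rows forwarded to something durable.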


[9] Common patterns may fit the ectr application, at
[10] The recon application has the function recon:node_stats_print/2 to do this if you’re in app
