Memory matters - even in Erlang (再次说明了了解内存如何工作的必要性) - erlang非业余研究

mryufeng

浏览: 982759 次
性别:
来自: 广州

最近访客更多访客>>

wszt

antsmall

funing

wjmboss

博主相关

博客

微博

相册

留言

关于我

文章分类

社区版块

存档分类

Memory matters - even in Erlang (再次说明了了解内存如何工作的必要性)

博客分类：

erlang

Erlang 工作 Hibernate Access Cache

原文地址:http://www.lshift.net/blog/2010/02/28/memory-matters-even-in-erlang

作者解决问题的思路非常敬佩! 真没想到hibernation后, 由于对象的移动, 使得内存访问的不连续, 导致内存cahche的失效, 速度可以慢这么多!

Some time ago we got an interesting bug report for RabbitMQ. Surprisingly, unlike other complex bugs, this one is easy to describe:

At some point basic.get suddenly starts being very slow - about 9 times slower!

Basic.get doesn’t do anything complex - it just pops a message from a queue. This behaviour was quite unexpected. Our initial tests confirmed that we have a problem when a queue contains thousands of elements:

queue_length: 90001 basic_get 3333 times took: 1421.250ms
queue_length: 83335 basic_get 3333 times took: 1576.664ms
queue_length: 60004 basic_get 3333 times took: 1403.086ms
queue_length: 53338 basic_get 3333 times took: 9659.434ms [ look at that! ]
queue_length: 50005 basic_get 3333 times took: 9885.598ms
queue_length: 46672 basic_get 3333 times took: 8562.136ms

Let me repeat that. Usually popping a message from a queue takes Xms. At some point, it slows down to 9*Xms.

It turned out that the problem is with the queue:len() function, which is executed during the basic.get. Actually, queue:len() calls only erlang:length() builtin. At some point it switches to the “slow” mode.

Erlang:length() is a builtin that iterates through a linked list and counts it’s length. It’s complexity is O(N), where N is the length of the list. This function is implemented in the VM so it’s expected to be very, very fast.

The problem is not with erlang:length() being slow. It’s about being unpredictably slow. Let’s take a look at Erlang interpreter source code (erl_bif_guard.c:erts_gc_length_1). Here’s the main loop for erlang:length():

i=0
while (is_list(list)) {
    i++;
    list = CDR(list_val(list));
}

It does nothing unusual - it just iterates through list elements. However, recompiling Erlang with some debugging information confirms that the problem is indeed here:

clock_gettime(CLOCK_REALTIME, &t0);
while (is_list(list)) {
    i++;
    list = CDR(list_val(list));
}
clock_gettime(CLOCK_REALTIME, &t1);
td_ms = TIMESPEC_NSEC_SUBTRACT(t1, t0) / 1000000.0;
if (i > 200000 || td_ms > 2.0) {
    fprintf(stderr, "gc_length_1(%p)=%i %.3fms\n\r", reg[live], i, td_ms);
}

gc_length_1(0x7f4dbfa7fc19)=499999 2.221ms
gc_length_1(0x7f4dbfa7fc19)=499999 2.197ms
gc_length_1(0x7f4dbfa7fc19)=499999 2.208ms
(hibernation)
gc_length_1(0x7f4db0572049)=499999 13.793ms
gc_length_1(0x7f4db0572049)=499999 12.806ms
gc_length_1(0x7f4db0572049)=499999 12.531ms

This confirms Matthias’ initial guess - the slowdown starts after Erlang process hibernation.

For those who aren’t Erlang experts: Hibernation is an operation that compacts an Erlang process. It does aggressive garbage collection and reduces the memory footprint of a process to absolute minimum.

The intended result of hibernation is recovering free memory from the process. However its side effect is a new memory layout of objects allocated on the heap.

Ah, how could I have forgotten! The memory is nowadays slow! What happens, is that before hibernation list elements are aligned differently, more dense. Whereas after hibernation they are sparse. It’s easy to test it - let’s count the average distance between pointers to list elements:

gc_length_1(0x7f5c626fbc19)=499999 2.229ms avg=16.000 dev=0.023
gc_length_1(0x7f5c626fbc19)=499999 3.349ms avg=16.000 dev=0.023
gc_length_1(0x7f5c626fbc19)=499999 3.345ms avg=16.000 dev=0.023
(hibernation)
gc_length_1(0x7f5c61f7d049)=499999 13.800ms avg=136.000 dev=0.266
gc_length_1(0x7f5c61f7d049)=499999 12.726ms avg=136.000 dev=0.266
gc_length_1(0x7f5c61f7d049)=499999 12.367ms avg=136.000 dev=0.266

Confirmed! Standard deviation is surprisingly small, so we can read the numbers as:

    * Before hibernation list elements are aligned exactly one after another, values are somewhere else.
    * After hibernation list elements are interleaved with values.

This behavior does make sense. In most cases when you traverse the list, you actually do something with the values. After hibernation, when you access list item, the value will be already loaded to the CPU cache.

Knowing the mechanism, it’s easy to write a test case that reproduces the problem.

The average distance between pointers in my case is constant - the standard deviation is negligible. This information has a practical implication - we can “predict” where the next pointer will be. Let’s use that information to “fix” the Erlang VM by prefetching memory!

while (is_list(list)) {
    i++;
    list2 = CDR(list_val(list));
    __builtin_prefetch((char*)list2 + 128*((long)list2-(long)list));
    list = list2;
}

Test script running on original Erlang VM:

length: 300001 avg:0.888792ms dev:0.061587ms
length: 300001 avg:0.881030ms dev:0.040961ms
length: 300001 avg:0.875158ms dev:0.019436ms
hibernate
length: 300001 avg:14.861762ms dev:0.150635ms
length: 300001 avg:14.833733ms dev:0.017405ms
length: 300001 avg:14.884861ms dev:0.220119ms

Patched Erlang VM:

length: 300001 avg:0.742822ms dev:0.029322ms
length: 300001 avg:0.739149ms dev:0.012897ms
length: 300001 avg:0.739465ms dev:0.014417ms
hibernate
length: 300001 avg:7.543693ms dev:0.284355ms
length: 300001 avg:7.342802ms dev:0.330158ms
length: 300001 avg:7.265960ms dev:0.053176ms

The test runs only a tiny bit faster for the “fast” case (dense conses) and twice as fast for the “slow” case (sparse conses).

Should this patch be merged into mainline Erlang? Not really. I have set the prefetch multiplier value to 128 and I don’t even know if it’s optimal. This was only an experiment. But it was fun to see how low-level system architecture can affect high-level applications.

分享到：

CloudI 0.0.9 Released, A Cloud as an Int ... | Some simple examples of using Erlang’ ...

2010-03-09 20:26
浏览 3442
评论(5)
论坛回复 / 浏览 (5 / 3532)
分类:编程语言
查看更多

5 楼 pizigou 2010-04-20

hack 需要达到这个级别太深入了。。

4 楼 litaocheng 2010-03-11

mryufeng 写道

litaocheng 写道

呵呵，不了解内部结构，遇到问题，只能大呼诡异啊。。
不过rabbitmq，为什么不用一个变量保存queue的长度啊，每次reply的时候都调用queue:len/1 确实有点不必要。O(N)的复杂度啊..

reply({ok, queue:len(BufferTail), Msg},
State#q{message_buffer = BufferTail,
next_msg_id = NextId + 1});

和这个命令的调用频度有关?

它这个basic.get命令调用应该很频繁哦。应答会返回对应的msg和剩余msg的数量。

3 楼 mryufeng 2010-03-10

litaocheng 写道

和这个命令的调用频度有关?

2 楼 litaocheng 2010-03-10

1 楼 iso1600 2010-03-10

除了erlang:length() 其他地方应该也有类似现象，patch这个地方治标不治本。
是不是hibernate本身优化解决比较好。

发表评论

您还没有登录,请您登录后再发表评论

最近访客更多访客>>

博主相关

文章分类

社区版块

存档分类

最新评论

Memory matters - even in Erlang (再次说明了了解内存如何工作的必要性)

评论

发表评论

相关推荐

最近访客 更多访客>>

博主相关

文章分类

社区版块

存档分类

最新评论

Memory matters - even in Erlang (再次说明了了解内存如何工作的必要性)

评论

发表评论

相关推荐

OTP R14A今天发布了

R14A实现了EEP31，添加了binary模块

如何查看节点的可用句柄数目和已用句柄数

获取Erlang系统信息的代码片段

iolist跟list有什么区别？

erlang:send_after和erlang:start_timer的使用解释

Latest news from the Erlang/OTP team at Ericsson 2010

对try 异常 运行的疑问，为什么出现两种结果

Erlang ERTS Async基础设施

CloudI 0.0.9 Released, A Cloud as an Interface

Some simple examples of using Erlang’s XPath implementation

lcnt 环境搭建

Erlang强大的代码重构工具 tidier

[Feb 24 2010] Erlang/OTP R13B04 has been released

R13B04 Installation

Running tests

R13B04在细化Binary heap

R13B03 binary vheap有助减少binary内存压力

erl_nif 扩展erlang的另外一种方法

escript的高级特性

最近访客更多访客>>

对try 异常运行的疑问，为什么出现两种结果