Lies, Damned Lies, and Benchmarks (R13A SMP performance test)

mryufeng

Blog category: erlang

Erlang CentOS Linux performance thread

Original article: http://www.erlangatwork.com/2009/03/lies-damned-lies-and-benchmarks.html

Erlang/OTP R13A was released today with a number of major SMP improvements. I've been playing with R13 snapshots for a while and wrote a simple HTTP server to compare the SMP performance on R12 and R13. This server uses {packet, http} to decode requests, increments a counter with a transactional mnesia:read/3 and mnesia:write/1, and responds with the counter's previous value. You'll find the source here.
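The original source link did not survive in this copy, so the following is only a minimal sketch of the kind of server described above, not the author's code. The module name counter_httpd, the counter table, and the hits key are assumptions made for illustration. It decodes requests with {packet, http}, increments a counter inside an mnesia transaction using mnesia:read/3 and mnesia:write/1, and replies with the previous value.

```erlang
-module(counter_httpd).
-export([start/1, init_db/0, bump/0]).

%% Hypothetical schema: a single row keyed by the atom 'hits'.
-record(counter, {key, value}).

%% Start mnesia and create an in-memory counter table if it is missing.
init_db() ->
    ok = mnesia:start(),
    case mnesia:create_table(counter,
                             [{ram_copies, [node()]},
                              {attributes, record_info(fields, counter)}]) of
        {atomic, ok} -> ok;
        {aborted, {already_exists, counter}} -> ok
    end,
    ok = mnesia:wait_for_tables([counter], 5000).

%% Increment the counter in a transaction and return its previous value.
bump() ->
    F = fun() ->
            Old = case mnesia:read(counter, hits, write) of
                      [#counter{value = V}] -> V;
                      [] -> 0
                  end,
            ok = mnesia:write(#counter{key = hits, value = Old + 1}),
            Old
        end,
    {atomic, Prev} = mnesia:transaction(F),
    Prev.

%% Listen on Port. The calling process owns the listen socket,
%% so it must stay alive for the server to keep accepting.
start(Port) ->
    ok = init_db(),
    {ok, L} = gen_tcp:listen(Port, [binary, {packet, http},
                                    {active, false}, {reuseaddr, true}]),
    spawn(fun() -> accept(L) end),
    {ok, L}.

%% Accept one connection, hand the listen socket to a fresh acceptor,
%% then serve the accepted connection in this process.
accept(L) ->
    {ok, S} = gen_tcp:accept(L),
    spawn(fun() -> accept(L) end),
    request(S).

%% With {packet, http} the first packet is the decoded request line.
request(S) ->
    case gen_tcp:recv(S, 0) of
        {ok, {http_request, _Method, _Uri, _Vsn}} -> headers(S);
        _ -> gen_tcp:close(S)
    end.

%% Skip headers until the end-of-headers marker, then reply.
headers(S) ->
    case gen_tcp:recv(S, 0) of
        {ok, {http_header, _, _, _, _}} -> headers(S);
        {ok, http_eoh} -> reply(S);
        _ -> gen_tcp:close(S)
    end.

reply(S) ->
    Body = integer_to_list(bump()),
    gen_tcp:send(S, ["HTTP/1.0 200 OK\r\nContent-Length: ",
                     integer_to_list(length(Body)),
                     "\r\nConnection: close\r\n\r\n", Body]),
    gen_tcp:close(S).
```

Every request goes through a write lock on the same row, so a server shaped like this stresses transaction and lock handling as much as the schedulers, which is worth keeping in mind when reading the numbers.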

I ran the HTTP server on an x86_64 CentOS 5 machine running Linux 2.6.18-53.el5. The server has two quad-core Intel Xeon E5450 CPUs and 8GB of RAM. Erlang/OTP R12B-5 and R13A were compiled from source and run as erl -pa ebin +SN -s ehttpd start, where N is the number of schedulers to run.

To get performance numbers I ran ab on another server connected via a 100 Mb/s private VLAN as ab -c N -n 100000 http://10.0.0.32:8889/ where N was the number of concurrent requests. ab was run 3 times for each value of N and the following chart shows the average requests/sec with 4 and 8 schedulers.

[img]schedulers.png [/img]

R13A's SMP improvements include multiple run queues and improved locking. It also supports binding schedulers to specific CPU cores and hardware threads. Binding isn't enabled by default, so the following chart shows the result of setting erlang:system_flag(scheduler_bind_type, thread_no_node_processor_spread) and running with 100 concurrent requests.
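As a rough illustration of the binding call mentioned above, the helper below sets the bind type and reports what the runtime says afterwards. The module name bind_check is an assumption. The call raises an error when binding is not supported or the CPU topology is unknown, so the sketch catches that rather than crashing. Later OTP releases mark this system_flag argument as deprecated in favour of the +sbt emulator flag, so treat this as a reading aid for the R13A-era setting.

```erlang
-module(bind_check).
-export([bind/1]).

%% Try to set the scheduler bind type.
%% Returns {ok, OldType, Bindings} on success, where Bindings is the tuple
%% reported by erlang:system_info(scheduler_bindings), or {error, Reason}
%% if the runtime rejects the request (for example notsup or badarg).
bind(Type) ->
    try erlang:system_flag(scheduler_bind_type, Type) of
        Old ->
            {ok, Old, erlang:system_info(scheduler_bindings)}
    catch
        error:Reason ->
            {error, Reason}
    end.
```

Calling bind_check:bind(thread_no_node_processor_spread) from the shell before starting the load should show whether the schedulers were actually bound on the machine under test.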

[img]requests_sec.png [/img]

A lot is missing from these benchmarks: I didn't test kernel polling, and I only generated load from one client machine. The drop between 500 and 1000 concurrent requests on R13A +S8 looks too steep and may be the result of using ab. That said, the SMP optimizations in R13 are looking very promising!

Here is the experiment I ran at ECUG on an 8-core CPU:
[spawn(ring, run,[["100", "10000000000"]]) || _X <- lists:seq(1,1000)].
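The ring module used in that one-liner is not included in the post. The sketch below is a hypothetical stand-in that accepts the same argument shape (a list of two strings: process count and hop count), so the workload is easier to picture. It builds a ring of N processes and passes a token around it for M hops, which exercises almost nothing but message send and receive.

```erlang
-module(ring).
-export([run/1, run/2]).

%% Entry point taking string arguments, e.g. ring:run(["100", "1000"]).
run([NStr, MStr]) ->
    run(list_to_integer(NStr), list_to_integer(MStr)).

%% Build a ring of N processes and pass a token M hops around it.
%% Returns ok once the token has made all M hops.
run(N, M) when N > 0, M >= 0 ->
    Parent = self(),
    Ref = make_ref(),
    %% The first process learns its successor only after the ring is built.
    First = spawn(fun() ->
                      receive {next, Next} -> loop(Next, Parent, Ref) end
                  end),
    %% Each new process forwards to the one created just before it.
    Second = lists:foldl(fun(_, Next) ->
                             spawn(fun() -> loop(Next, Parent, Ref) end)
                         end, First, lists:seq(2, N)),
    First ! {next, Second},   % close the ring
    First ! {token, M},       % inject the token
    receive {done, Ref} -> ok end.

%% Forward the token until its hop counter reaches zero, then report
%% completion to the parent and shut the ring down with a stop message.
loop(Next, Parent, Ref) ->
    receive
        {token, 0} ->
            Parent ! {done, Ref},
            Next ! stop;
        {token, K} ->
            Next ! {token, K - 1},
            loop(Next, Parent, Ref);
        stop ->
            Next ! stop
    end.
```

With the arguments from the post, each of the 1000 spawned callers would build its own 100-process ring, so the schedulers are kept busy with many independent rings at once.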

R12B5:

CPU  User%  Sys%  Wait%  Idle
1    21.3   62.4  0.0    16.3
2    20.9   61.7  0.0    17.4
3    19.9   63.2  0.0    16.9
4    18.9   64.2  0.0    16.9
5    19.9   62.7  0.0    17.4
6    20.9   63.2  0.0    15.9
7    19.4   62.7  0.0    17.9
8    19.4   63.7  0.0    16.9

R13A:

CPU  User%  Sys%  Wait%  Idle
1    61.2   31.8  0.0    7.0
2    64.7   29.9  0.0    5.5
3    62.7   29.9  0.0    7.5
4    61.0   32.5  0.0    6.5
5    62.5   30.5  0.0    7.0
6    64.2   29.4  0.0    6.5
7    63.7   29.9  0.0    6.5
8    65.7   27.9  0.0    6.5
Avg  63.2   30.2  0.0    6.5

The sys time is mostly futex calls, so the dependence on locks has dropped substantially!

Conclusion: speed improved by nearly 2x. The result is really good, yeah!


2009-03-19 15:12

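Comment #5, mryufeng, 2009-03-19

I ran strace and the calls were basically all futex. In other words, a large amount of lock contention has disappeared, because this test is a ring and is basically just message sending and receiving! The result is very good indeed!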

Comment #4, Arbow, 2009-03-19

Wow, the CPU time spent in the kernel dropped quite a bit.

Comment #2, mryufeng, 2009-03-19

The performance gain really is large. This release added several thousand lines of code for SMP alone!

Comment #1, dogstar, 2009-03-19

If I had known, I would have just read the last word: promising

:)
