mryufeng

Lies, Damned Lies, and Benchmarks (R13A SMP performance test)

Original article: http://www.erlangatwork.com/2009/03/lies-damned-lies-and-benchmarks.html

Erlang/OTP R13A was released today with a number of major SMP improvements. I've been playing with R13 snapshots for a while and wrote a simple HTTP server to compare the SMP performance on R12 and R13. This server uses {packet, http} to decode requests, increments a counter with a transactional mnesia:read/3 and mnesia:write/1, and responds with the counter's previous value. You'll find the source here.
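The server's source is linked rather than shown, so the following is only a minimal sketch of the counter step described above: a transactional mnesia:read/3 followed by mnesia:write/1, returning the previous value. The module name counter_sketch, the table name counter, and the key hits are assumptions made for illustration and are not taken from the original code.

```erlang
-module(counter_sketch).
-export([init/0, next/0]).

%% Hypothetical record; mnesia:write/1 uses the record name as the table name.
-record(counter, {key, value = 0}).

%% Start mnesia with a RAM-only schema and create the table on this node.
init() ->
    ok = mnesia:start(),
    case mnesia:create_table(counter,
                             [{ram_copies, [node()]},
                              {attributes, record_info(fields, counter)}]) of
        {atomic, ok} -> ok;
        {aborted, {already_exists, counter}} -> ok
    end.

%% Increment the counter inside a transaction and return its previous value.
next() ->
    F = fun() ->
                %% Take a write lock on read, since the row is updated next.
                Prev = case mnesia:read(counter, hits, write) of
                           [#counter{value = V}] -> V;
                           [] -> 0
                       end,
                ok = mnesia:write(#counter{key = hits, value = Prev + 1}),
                Prev
        end,
    {atomic, Result} = mnesia:transaction(F),
    Result.
```

Calling counter_sketch:init() once and then counter_sketch:next() per request gives each request the value the counter had before its own increment.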

I ran the HTTP server on an x86_64 CentOS 5 machine running Linux 2.6.18-53.el5. The server has two quad-core Intel Xeon E5450 CPUs and 8GB of RAM. Erlang/OTP R12B-5 and R13A were compiled from source and run as erl -pa ebin +SN -s ehttpd start where N indicated the number of schedulers to run.

To get performance numbers I ran ab on another server connected via a 100 Mb/s private VLAN as ab -c N -n 100000 http://10.0.0.32:8889/ where N was the number of concurrent requests. ab was run 3 times for each value of N and the following chart shows the average requests/sec with 4 and 8 schedulers.


[img]schedulers.png [/img]

R13A's SMP improvements include multiple run queues and improved locking. It also supports binding schedulers to specific CPU cores and hardware threads. Binding isn't enabled by default, so the following chart shows the result of setting erlang:system_flag(scheduler_bind_type, thread_no_node_processor_spread) and running with 100 concurrent requests.

[img]requests_sec.png [/img]
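As a rough sketch of how the binding call above might be wrapped, the function below applies the bind type and reports the resulting bindings. The module name bind_sketch is an assumption. The call raises an error on platforms where binding is unsupported or the CPU topology is unknown, so the sketch catches that instead of crashing. Later OTP releases document the +sbt command-line flag as the preferred way to set this.

```erlang
-module(bind_sketch).
-export([bind/0]).

%% Try to bind schedulers with the bind type used in the benchmark.
%% Returns {ok, OldBindType, Bindings} or {error, Reason}.
bind() ->
    try erlang:system_flag(scheduler_bind_type,
                           thread_no_node_processor_spread) of
        Old ->
            %% scheduler_bindings is a tuple with one entry per scheduler.
            {ok, Old, erlang:system_info(scheduler_bindings)}
    catch
        error:Reason ->
            {error, Reason}
    end.
```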

There is a lot missing from these benchmarks: I didn't test kernel polling, and I only generated load from one client machine. The drop between 500 and 1000 concurrent requests on R13A +S8 looks too steep and may be the result of using ab. That said, the SMP optimizations in R13 are looking very promising!

Based on the experiment I ran at ECUG, on an 8-core CPU:
[spawn(ring, run,[["100", "10000000000"]]) || _X <- lists:seq(1,1000)].

R12B5:
CPU  User%  Sys%  Wait%  Idle%
  1   21.3  62.4    0.0   16.3
  2   20.9  61.7    0.0   17.4
  3   19.9  63.2    0.0   16.9
  4   18.9  64.2    0.0   16.9
  5   19.9  62.7    0.0   17.4
  6   20.9  63.2    0.0   15.9
  7   19.4  62.7    0.0   17.9
  8   19.4  63.7    0.0   16.9

R13A:

CPU  User%  Sys%  Wait%  Idle%
  1   61.2  31.8    0.0    7.0
  2   64.7  29.9    0.0    5.5
  3   62.7  29.9    0.0    7.5
  4   61.0  32.5    0.0    6.5
  5   62.5  30.5    0.0    7.0
  6   64.2  29.4    0.0    6.5
  7   63.7  29.9    0.0    6.5
  8   65.7  27.9    0.0    6.5
Avg   63.2  30.2    0.0    6.5


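The sys time is mostly futex calls, so on R13A the reliance on locks drops sharply: average system time falls from about 63% to about 30%, while user time rises from about 20% to about 63%.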

Conclusion: it is nearly 2x faster. The result is really good, yeah!
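The ring module used in the experiment is not shown in the post. The following is a minimal sketch of what such a benchmark typically looks like, assuming the first argument is the number of processes in the ring and the second is the number of message hops. Both meanings, and the module name ring_sketch, are guesses made for illustration. Like the call in the experiment, run/1 takes its arguments as a list of strings.

```erlang
-module(ring_sketch).
-export([run/1, run/2]).

%% run(["N", "Hops"]) takes string arguments, as in the experiment's call.
run([NStr, HopsStr]) ->
    run(list_to_integer(NStr), list_to_integer(HopsStr)).

%% Build a ring of N processes and pass a token around it for Hops hops.
run(N, Hops) when is_integer(N), N >= 1, is_integer(Hops), Hops >= 0 ->
    Parent = self(),
    Ref = make_ref(),
    Pids = [spawn(fun() -> wait_next(Parent, Ref) end)
            || _ <- lists:seq(1, N)],
    [First | Rest] = Pids,
    %% Tell each process its successor; the last one points at the first.
    lists:foreach(fun({Pid, Next}) -> Pid ! {next, Next} end,
                  lists:zip(Pids, Rest ++ [First])),
    First ! {token, Hops},
    %% Wait until the token has used up all its hops, then stop the ring.
    receive {done, Ref} -> ok end,
    lists:foreach(fun(Pid) -> Pid ! stop end, Pids).

%% Each ring member first learns who its successor is.
wait_next(Parent, Ref) ->
    receive {next, Next} -> loop(Next, Parent, Ref) end.

%% Forward the token with one hop fewer; report when no hops remain.
loop(Next, Parent, Ref) ->
    receive
        {token, 0} ->
            Parent ! {done, Ref},
            loop(Next, Parent, Ref);
        {token, K} ->
            Next ! {token, K - 1},
            loop(Next, Parent, Ref);
        stop ->
            ok
    end.
```

The work here is almost entirely message sending and receiving, which is why a benchmark of this shape is sensitive to run-queue and lock contention in the SMP emulator.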
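Comments
#5 mryufeng 2009-03-19
I ran strace on it, and the calls are basically futex. In other words, a large amount of lock contention has disappeared. Because this test is a ring, it is basically message sending and receiving. You could say this result is very good!
#4 Arbow 2009-03-19
Wow, the kernel is using a lot less CPU.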
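#2 mryufeng 2009-03-19
The performance gain really is large. This release added several thousand lines of code for SMP alone!
#1 dogstar 2009-03-19
If I had known, I would have just read the last word: promising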

:)
