Goal
Several mechanisms exist for handling multiple sockets with a single server thread, but not all of them are available on every platform. A C++ class, Poller, abstracts these mechanisms behind a common interface and should provide both high performance and portability. This microbenchmark uses Poller to compare the performance of select(), poll(), kqueue(), and /dev/poll on Solaris 7, Linux 2.2.14, Linux 2.4.0-test10-pre4, and FreeBSD 4.x-STABLE.
Note that this is a synthetic microbenchmark, not a real world benchmark. In the real world, other effects often swamp the kinds of things measured here.
Description
Poller and Poller_bench are GPL'd software; you can download the source or view the doc online.
Poller_bench sets up an array of socketpairs, has a Poller monitor the read end of each socketpair, and measures how long it takes to execute the following snippet of code with various styles of Poller:
    for (k=0; k<num_active; k++)
        write(fdpairs[k * spacing + spacing/2][1], "a", 1);
    poller.waitAndDispatchEvents(0);
where spacing = num_pipes / num_active.
poller.waitAndDispatchEvents() calls poll() or ioctl(m_dpfd, DP_POLL, &dopoll), as appropriate, then calls an event handler for each ready socket.
The event handler for this benchmark just executes
    read(event->fd, buf, 1);
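To make the measured step concrete, here is a minimal sketch of one benchmark iteration written against plain poll() rather than the Poller class. The function name run_one_step and its error handling are illustrative assumptions, not code from Poller_bench; only the write, poll, read pattern comes from the description above.

```cpp
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

#include <cassert>
#include <vector>

// Hypothetical helper (not part of Poller or Poller_bench): runs one
// benchmark step with plain poll() and returns the number of sockets
// whose handler read a byte, or -1 on failure.
int run_one_step(int num_pipes, int num_active) {
    assert(num_active > 0 && num_pipes >= num_active);

    std::vector<int> read_ends, write_ends;
    bool ok = true;
    for (int i = 0; i < num_pipes && ok; ++i) {
        int fds[2];
        if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) != 0) {
            ok = false;
        } else {
            read_ends.push_back(fds[0]);
            write_ends.push_back(fds[1]);
        }
    }

    int dispatched = -1;
    if (ok) {
        // Monitor the read end of every socketpair.
        std::vector<pollfd> pfds(num_pipes);
        for (int i = 0; i < num_pipes; ++i) {
            pfds[i].fd = read_ends[i];
            pfds[i].events = POLLIN;
            pfds[i].revents = 0;
        }

        // Make num_active evenly spaced sockets readable.
        const int spacing = num_pipes / num_active;
        for (int k = 0; k < num_active; ++k) {
            if (write(write_ends[k * spacing + spacing / 2], "a", 1) != 1) ok = false;
        }

        // Timeout 0: report what is ready right now, without blocking.
        dispatched = 0;
        if (ok && poll(pfds.data(), pfds.size(), 0) > 0) {
            // The "event handler": read one byte from each ready socket.
            for (const pollfd& p : pfds) {
                char buf;
                if ((p.revents & POLLIN) && read(p.fd, &buf, 1) == 1) ++dispatched;
            }
        }
        if (!ok) dispatched = -1;
    }

    for (int fd : read_ends) close(fd);
    for (int fd : write_ends) close(fd);
    return dispatched;
}
```

Calling run_one_step(1000, 1) corresponds to the "1 active socket amongst 1000" case; the real benchmark times only the write and dispatch part and repeats it, which this sketch does not attempt.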
Setup - Linux
Download the /dev/poll patch. (Note: the author of the /dev/poll patch asked me to remove the link. The preferred interface for Linux is sys_epoll; use that instead of /dev/poll on Linux. This will require an interesting rewrite of user code, as sys_epoll is edge-triggered, but /dev/poll was level-triggered.)
Apply the patch, configure your kernel to enable /dev/poll support (with 'make menuconfig'), and rebuild the kernel.
Create an entry in /dev for the /dev/poll driver with
    cd /dev
    mknod poll u 15 125
    chmod 666 /dev/poll
where 15 is MISC_MAJOR and 125 is DEVPOLL_MINOR from the kernel sources. Your MISC_MAJOR may differ; be sure to check /usr/src/linux/include/linux/major.h for the definition of MISC_MAJOR on your system.
Create a symbolic link so the benchmark (which includes /usr/include/sys/devpoll.h) can be compiled:
    cd /usr/include/asm
    ln -s ../linux/devpoll.h
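The note at the top of this section contrasts edge-triggered and level-triggered notification. The difference can be seen with a small sketch using the epoll interface as it exists in current Linux kernels, where level-triggered is the default and EPOLLET requests edge-triggered behavior. The helper name count_reports is an assumption made for illustration; it is not part of Poller, and the code is Linux-only.

```cpp
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

// Hypothetical helper: writes one byte to a socketpair, leaves it unread,
// and returns how many of two consecutive epoll_wait() calls report the
// socket as readable. Returns -1 on setup failure.
int count_reports(bool edge_triggered) {
    int fds[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) != 0) return -1;

    int ep = epoll_create1(0);
    if (ep < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    epoll_event ev{};
    ev.events = EPOLLIN;
    if (edge_triggered) ev.events |= EPOLLET;
    ev.data.fd = fds[0];

    int reports = -1;
    if (epoll_ctl(ep, EPOLL_CTL_ADD, fds[0], &ev) == 0 &&
        write(fds[1], "a", 1) == 1) {
        reports = 0;
        for (int i = 0; i < 2; ++i) {
            epoll_event out{};
            // Timeout 0: do not block; the byte is never read in between.
            if (epoll_wait(ep, &out, 1, 0) == 1) ++reports;
        }
    }

    close(ep);
    close(fds[0]);
    close(fds[1]);
    return reports;
}
```

In level-triggered mode the unread byte is reported on both calls; in edge-triggered mode only the first call reports it. That is why code written for a level-triggered interface such as /dev/poll must be restructured to drain each socket fully before waiting again when it moves to edge-triggered notification.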
Setup - Solaris
On Solaris 7, you may need to install a patch to get /dev/poll (or at least to get it to work properly); it's standard in Solaris 8. See also my notes on /dev/poll.
Also, a line near the end of /usr/include/sys/poll_impl.h may need to be moved to get it to compile when included from C++ programs.
Procedure
Download the dkftpbench source tarball from http://www.kegel.com/dkftpbench/ and unpack.
On Linux, if you want kernel profile results, boot with argument 'profile=2' to enable the kernel's builtin profiler.
Run the shell script Poller_bench.sh as follows:
    su
    sh Poller_bench.sh
The script raises file descriptor limits, then runs the command
    ./Poller_bench 5 1 spd 100 1000 10000
It should be run on an idle machine, with no email client, web browser, or X server running. The Pentium III machine at my disposal was running a single sshd; the Solaris machine was running two sshds and an idle X Window server, so it wasn't quite as idle.
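The script raises the file descriptor limit from the shell. A program can do something similar for itself; the following is a minimal sketch, assuming a POSIX system, that raises the soft RLIMIT_NOFILE limit to the current hard limit. The function name raise_fd_limit is hypothetical and is not taken from Poller_bench. Raising the hard limit itself generally requires privileges, which is presumably why the script is run under su.

```cpp
#include <sys/resource.h>

// Hypothetical helper: raise the soft limit on open file descriptors to
// the hard limit for the current process. Returns true on success.
bool raise_fd_limit() {
    struct rlimit lim {};
    if (getrlimit(RLIMIT_NOFILE, &lim) != 0) return false;

    // An unprivileged process may raise its soft limit up to the hard limit.
    lim.rlim_cur = lim.rlim_max;
    return setrlimit(RLIMIT_NOFILE, &lim) == 0;
}
```

Each socketpair consumes two descriptors, so the 10000-socketpair case needs a limit above 20000. Some systems reject a soft limit equal to an unlimited hard limit; this sketch does not handle that case.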
Results
With 1 active socket amongst 100, 1000, or 10000 total sockets, waitAndDispatchEvents takes the following amount of wall-clock time, in microseconds (lower is faster):
On a 167MHz sun4u Sparc Ultra-1 running SunOS 5.7 (Solaris 7) Generic_106541-11:
pipes           100   1000  10000
select          151      -      -
poll            470    676   3742
/dev/poll        61     70     92
165133 microseconds to open each of 10000 socketpairs
29646 microseconds to close each of 10000 socketpairs
On a 4x400 MHz Enterprise 450 running Solaris 8 (results contributed by Doug Lea):
pipes           100   1000  10000
select           60      -      -
poll            273    388   1559
/dev/poll        27     28     34
116586 microseconds to open each of 10000 socketpairs
19235 microseconds to close each of 10000 socketpairs
(The machine wasn't idle, but at most one CPU was doing other stuff during test, and the test seemed to occupy only one CPU.)
On an idle 650 MHz dual Pentium III running Red Hat Linux 6.2 with kernel 2.2.14smp plus the /dev/poll patch plus Dave Miller's patch to speed up close():
pipes           100   1000  10000
select           28      -      -
poll             23    890  11333
/dev/poll        19    146   4264
(Time to open or close socketpairs was not recorded, but was under 14 microseconds.)
On the same machine as above, but with kernel 2.4.0-test10-pre4 smp:
pipes           100   1000  10000
select           52      -      -
poll             49   1184  14660
26 microseconds to open each of 10000 socketpairs
14 microseconds to close each of 10000 socketpairs
(Note: the /dev/poll patch does not apply cleanly to recent 2.4.0-test kernels, I believe, and I did not try it.)
On a single-processor 600 MHz Pentium III with 512 MB of memory, running FreeBSD 4.x-STABLE (results contributed by Jonathan Lemon):
pipes           100   1000  10000  30000
select           54      -      -      -
poll             50    552  11559  35178
kqueue            8      8      8      8
(Note: Jonathan also varied the number of active pipes, and found that kqueue's time scaled linearly with that number, whereas poll's time scaled linearly with number of total pipes.)
The test was also run with pipes instead of socketpairs (results not shown); the performance on Solaris was about the same, but the /dev/poll driver on Linux did not perform well with pipes. According to Niels Provos,
"The hinting code which causes a considerable speed up for /dev/poll only applies to network sockets. If there are any serious applications that make uses of pipes in a manner that would benefit from /dev/poll then the pipe code needs to return hints too."
Discussion
Miscellany
Running the benchmark was painfully slow on Solaris 7 because the time to create or close socketpairs was outrageous. Likewise, on unpatched 2.2.14, the time to close socketpairs was outrageous, but the recent patch from Dave Miller fixes that nicely.
2.4.0-test10-pre4 was slower than 2.2.14 in all cases tested.
I should show results for pipes as well as socketpairs.
The Linux 2.2.14 /dev/poll driver printed messages to the console when sockets were closed; this should probably be disabled for production.
kqueue()
It looks offhand like kqueue() performs best of all the tested methods. It's even faster than, and scales better than, /dev/poll, at least in this microbenchmark.
/dev/poll vs. poll
In all cases tested involving sockets, /dev/poll was appreciably faster than poll().
The 2.2.14 Linux /dev/poll driver was about six times faster than poll() for 1000 fds, but fell down to only 2.7 times faster at 10000 fds. The Solaris /dev/poll driver was about seven times faster than poll() at 100 fds, and increased to 40 times faster at 10000 fds.
Scalability of poll() and /dev/poll
Under Solaris 7, when the number of idle sockets was increased from 100 to 10000, the time to check for active sockets with poll() and /dev/poll increased by a factor of only about 8 (470 to 3742 microseconds; good) and 1.5 (61 to 92 microseconds; fantastic), respectively.
Under Linux 2.2.14, when the number of idle sockets was increased from 100 to 10000, the time to check for active sockets with poll() and /dev/poll increased by a factor of 493 and 224, respectively. This is terribly, horribly bad scaling behavior.
Under Linux 2.4.0-test10-pre4, when the number of idle sockets was increased from 100 to 10000, the time to check for active sockets with poll() increased by a factor of 300. This is terribly, horribly bad scaling behavior.
There seems to be a scalability problem in poll() under both Linux 2.2.14 and 2.4.0-test10-pre4 and in /dev/poll under Linux 2.2.14.
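The scaling factors quoted here are simply the 10000-socket time divided by the 100-socket time. As a worked check, the sketch below recomputes them from the timings in the Results tables; the struct and function names are invented for this illustration.

```cpp
#include <cstdio>

// One row of a Results table: times in microseconds for 100, 1000,
// and 10000 total sockets with a single active socket.
struct Row {
    const char* name;
    double t100, t1000, t10000;
};

// Timings copied from the Results tables in this article.
const Row kRows[] = {
    {"Solaris 7 poll", 470, 676, 3742},
    {"Solaris 7 /dev/poll", 61, 70, 92},
    {"Linux 2.2.14 poll", 23, 890, 11333},
    {"Linux 2.2.14 /dev/poll", 19, 146, 4264},
    {"Linux 2.4.0-test10-pre4 poll", 49, 1184, 14660},
};

// How much slower one dispatch gets when idle sockets grow 100-fold.
double scaling_factor(const Row& r) { return r.t10000 / r.t100; }

// Print one line per mechanism with its scaling factor.
void print_scaling() {
    for (const Row& r : kRows) {
        std::printf("%-30s %8.1f\n", r.name, scaling_factor(r));
    }
}
```

A mechanism whose cost depends only on active sockets would show a factor near 1, as kqueue does in the FreeBSD table (8 microseconds at every size); a mechanism that is linear in total sockets would show a factor near 100. The Linux figures exceed even that.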
poll() is stuck with an interface that dictates O(n) behavior on total pipes; still, Linux's implementation could be improved. The design of the current Linux /dev/poll patch is O(n) in total pipes, in spite of the fact that its interface allows it to be O(1) in total pipes and O(n) only in active pipes.
See also the recent discussions on linux-kernel.
Results - kernel profiling
To look for the scalability problem, I added support to the benchmark to trigger the Linux kernel profiler. A few results are shown below. (No smoking gun was found, but then, I wouldn't know a smoking gun if it hit me in the face. Perhaps real kernel hackers can pick up the hunt from here.)
If you run the above test on a Linux system booted with 'profile=2', Poller_bench will output one kernel profiling data file per test condition. Poller_bench.sh does a gross analysis using 'readprofile | sort -rn | head > bench%d%c.top' to find the kernel functions with the highest CPU usage, where %d is the number of socketpairs, and %c is p for poll, d for /dev/poll, etc.
'more bench10000*.top' shows the results for 10000 socketpairs. On 2.2.14, it shows:
::::::::::::::
bench10000d.dat.top
::::::::::::::
   901 total                      0.0008
   833 dp_poll                    1.4875
    27 do_bottom_half             0.1688
     7 __get_request_wait         0.0139
     4 startup_32                 0.0244
     3 unix_poll                  0.0203
::::::::::::::
bench10000p.dat.top
::::::::::::::
   584 total                      0.0005
   236 unix_poll                  1.5946
   162 sock_poll                  4.5000
   148 do_poll                    0.6727
    24 sys_poll                   0.0659
     7 __generic_copy_from_user   0.1167
This seems to indicate that /dev/poll spends nearly all of its time in dp_poll(), and poll spends a fair bit of time in three routines: unix_poll, sock_poll, and do_poll.
On 2.4.0-test10-pre4 smp, 'more bench10000*.top' shows:
::::::::::::::
2.4/bench10000p.dat.top
::::::::::::::
  1507 total                      0.0011
   748 default_idle              14.3846
   253 unix_poll                  1.9167
   209 fget                       2.4881
   195 sock_poll                  5.4167
    29 sys_poll                   0.0342
    29 fput                       0.1272
    29 do_pollfd                  0.1648
It seems curious that the idle routine should show up so much, but it's probably just the second CPU doing nothing.
Poller_bench.sh will also try to do a fine analysis of dp_poll() using the 'profile' tool (source included), which is a variant of readprofile that shows hotspots within kernel functions. Looking at its output for the run on 2.2.14, the three four-byte regions that take up the most CPU time in dp_poll() in the 10000 socketpair case are
c01d9158 39.135654% 326
c01d9174 11.404561% 95
c01d91a0 27.250900% 227
Looking at the output of 'objdump -d /usr/src/linux/vmlinux', that region corresponds to the object code:
c01d9158: c7 44 24 14 00 00 00 movl $0x0,0x14(%esp,1)
c01d915f: 00
c01d9160: 8b 74 24 24 mov 0x24(%esp,1),%esi
c01d9164: 8b 86 8c 04 00 00 mov 0x48c(%esi),%eax
c01d916a: 3b 50 04 cmp 0x4(%eax),%edx
c01d916d: 73 0a jae c01d9179
c01d916f: 8b 40 10 mov 0x10(%eax),%eax
c01d9172: 8b 14 90 mov (%eax,%edx,4),%edx
c01d9175: 89 54 24 14 mov %edx,0x14(%esp,1)
c01d9179: 83 7c 24 14 00 cmpl $0x0,0x14(%esp,1)
c01d917e: 75 12 jne c01d9192
c01d9180: 53 push %ebx
c01d9181: ff 74 24 3c pushl 0x3c(%esp,1)
c01d9185: e8 5a fc ff ff call c01d8de4
c01d918a: 83 c4 08 add $0x8,%esp
c01d918d: e9 d1 00 00 00 jmp c01d9263
c01d9192: 8b 7c 24 10 mov 0x10(%esp,1),%edi
c01d9196: 0f bf 4f 06 movswl 0x6(%edi),%ecx
c01d919a: 31 c0 xor %eax,%eax
c01d919c: f0 0f b3 43 10 lock btr %eax,0x10(%ebx)
c01d91a1: 19 c0 sbb %eax,%eax
I'm not yet familiar enough with kernel hacker tools to associate those with lines of code in /usr/src/linux/drivers/char/devpoll.c, but that 'lock btr' hotspot appears to be the call to test_and_clear_bit().
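On x86, 'lock btr' atomically tests a bit and resets it to zero, which is the operation test_and_clear_bit() provides. The following is a userspace analogue using std::atomic, offered only to show the semantics; it is not the kernel's implementation, and the function name is made up for this sketch.

```cpp
#include <atomic>

// Userspace analogue of an atomic test-and-clear (an illustration only,
// not kernel code): clears the given bit and reports whether it was set.
bool test_and_clear_bit_sketch(unsigned bit, std::atomic<unsigned long>& word) {
    const unsigned long mask = 1UL << bit;
    // fetch_and returns the value held before the AND was applied.
    return (word.fetch_and(~mask) & mask) != 0;
}
```

Because the read and the clear happen as one atomic step, two threads racing on the same bit cannot both see it as set. The price is a locked memory operation on every call, which is consistent with it showing up as a hotspot when it runs once per monitored descriptor.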
lmbench results
lmbench results are presented here to help people trying to compare the Intel and Sparc parts of the results shown above.
The source used was lmbench-2alpha10 from bitmover.com. I did not investigate why the TCP test failed on the Linux box.
L M B E N C H 1 . 9 S U M M A R Y
------------------------------------
(Alpha software, do not distribute)
Processor, Processes - times in microseconds - smaller is better
----------------------------------------------------------------
Host      OS             Mhz null null      open selct  sig  sig fork exec   sh
                             call  I/O stat clos       inst hndl proc proc proc
--------- ------------- ---- ---- ---- ---- ---- ----- ---- ---- ---- ---- ----
sparc-sun SunOS 5.7      167  2.9  12.   48   55 0.40K  6.6   81 3.8K  15K  32K
i686-linu Linux 2.2.14d  651  0.5  0.8    4    5 0.03K  1.4    2 0.3K   1K   6K
Context switching - times in microseconds - smaller is better
-------------------------------------------------------------
Host      OS            2p/0K 2p/16K 2p/64K 8p/16K 8p/64K 16p/16K 16p/64K
                        ctxsw  ctxsw  ctxsw  ctxsw  ctxsw   ctxsw   ctxsw
--------- ------------- ----- ------ ------ ------ ------ ------- -------
sparc-sun SunOS 5.7        19     69    235    114    349     116     367
i686-linu Linux 2.2.14d     1      5     17      5    129      30     129
*Local* Communication latencies in microseconds - smaller is better
-------------------------------------------------------------------
Host OS 2p/0K Pipe AF UDP RPC/ TCP RPC/ TCP
ctxsw UNIX UDP TCP conn
--------- ------------- ----- ----- ---- ----- ----- ----- ----- ----
sparc-sun SunOS 5.7 19 60 120 197 215 1148
i686-linu Linux 2.2.14d 1 7 13 31 80
File & VM system latencies in microseconds - smaller is better
--------------------------------------------------------------
Host OS 0K File 10K File Mmap Prot Page
Create Delete Create Delete Latency Fault Fault
--------- ------------- ------ ------ ------ ------ ------- ----- -----
sparc-sun SunOS 5.7 6605 15 5.2K
i686-linu Linux 2.2.14d 10 0 19 1 5968 1 0.5K
*Local* Communication bandwidths in MB/s - bigger is better
-----------------------------------------------------------
Host      OS            Pipe   AF  TCP   File   Mmap  Bcopy  Bcopy  Mem   Mem
                             UNIX      reread reread (libc) (hand) read write
--------- ------------- ---- ---- ---- ------ ------ ------ ------ ---- -----
sparc-sun SunOS 5.7       60   55   54     84    122    177     89  122   141
i686-linu Linux 2.2.14d  528  366   -1    357    451    150    138  451   171
Memory latencies in nanoseconds - smaller is better
(WARNING - may not be correct, check graphs)
---------------------------------------------------
Host      OS            Mhz L1 $ L2 $ Main mem Guesses
--------- ------------- --- ---- ---- -------- -------
sparc-sun SunOS 5.7     167   12   59      273
i686-linu Linux 2.2.14d 651    4   10      131