Locked old thread. Topic: My Java and C++ performance tests
Featured (2) :: Good (2) :: Beginner (11) :: Hidden (7)
|
|
---|---|
Author | Body |
Posted: 2011-05-25
jellyfish wrote: RednaxelaFX wrote: jellyfish wrote: Java's sin() is slow compared to the C call. Several years ago I saw an article posting comparison results, and my own runs came out almost the same: it was about 60 times slower. Someone did some digging on the C side and found that MS had optimized it using assembly.
On the other hand, after digging through the code on the Java side, folks started a discussion with Java's grandfather (you could google his name and the sin function), and the debate then became big news. Not sure about the performance of your code. Of course, there is always hardware acceleration, such as a GPU, but I am assuming we are in the context of a normal PC, since that's what I work on every day. You miles may vary.
That's "your mileage may vary". Learn, YMMV. It's bad microbenchmarks that give people the wrong impression that Java is slow. A Java program may be slower than a well-tuned C program in a real-world scenario, but that's got nothing to do with what those bad benchmarks are trying to tell you. Go ahead and disassemble your /lib64/libm.so.6 equivalent, and see what "<sin>" turns into for yourself. As for articles on Java's floating-point arithmetic, I guess you're referring to something like this: How Java's Floating-Point Hurts Everyone Everywhere. That's from more than a decade ago, and the statements in it don't hold anymore.
That reference is really a wild guess. NO, it's not. http://blogs.oracle.com/jag/entry/transcendental_meditation A simple Google search for "java sin cos slow" turns up a lot of interesting entries, especially in the game arena, so it should be classified as "well known". While I am saying Java's sin() is slower than the C version, I am not saying Java is slow in general, am I? In fact, if you can make Java's sin() nearly as fast as C's, I would take it. I have done quite a bit of coding to make special functions such as gamma and log gamma as fast as possible. It's just hard. It's so hard that sometimes people accept inaccuracy as the cost. I've done a lot of performance tuning as well, and have seen many cases of premature optimization. The most common case is that people don't understand the problem itself and still try to optimize or profile it.
The gist of James Gosling's article is this: so that arguments outside the range [-pi/4, pi/4] still produce results that meet the precision required by the JVM specification on the platform's floating-point unit, Java first handles the precision itself and only then uses the fsin instruction. So it is more accurate than cpp, but also somewhat slower. This topic has drifted from comparing cpp and Java performance to comparing the cpp and Java implementations of sin(). I believe most readers would still rather hear an assessment of cpp and Java performance that goes beyond the original poster's test program. I've dug a pit; 撒迦, please jump in. Evil grin... |
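The "handle precision first, then evaluate on a small range" idea can be sketched as follows. This is only an illustration, not the JVM's actual code: `SinReductionSketch` and `reducedSin` are hypothetical names, and the reduction uses plain double-precision pi, which is exactly the shortcut that loses accuracy as the argument grows.

```java
final class SinReductionSketch {
    // Reduce x to r in [-pi/4, pi/4] plus a quadrant index k, then evaluate
    // sin or cos on the reduced argument only.
    // Assumption: pi is taken as Math.PI (a double), so the reduction error
    // grows with |x|; a conforming implementation needs far more digits of pi.
    static double reducedSin(double x) {
        long k = Math.round(x / (Math.PI / 2));  // nearest multiple of pi/2
        double r = x - k * (Math.PI / 2);        // remainder, roughly in [-pi/4, pi/4]
        switch ((int) Math.floorMod(k, 4L)) {
            case 0:  return Math.sin(r);
            case 1:  return Math.cos(r);
            case 2:  return -Math.sin(r);
            default: return -Math.cos(r);
        }
    }

    public static void main(String[] args) {
        // Compare the naive reduction with the library result as x grows.
        for (double x : new double[] {0.5, 3.0, 100.0, 1.0e6, 1.0e15}) {
            System.out.printf("x=%g naive=%.17g Math.sin=%.17g%n",
                    x, reducedSin(x), Math.sin(x));
        }
    }
}
```

Running `main` prints both values side by side; the interesting part is how far they drift apart for the largest arguments, which is the accuracy problem the extra work is meant to avoid.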
|
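Posted: 2011-05-25
After optimizing the code, run it on Oracle's JRockit JVM.
In the end you will find that Java floating-point computation is only about 10% slower than C++. Java's strength is I/O: Java's I/O is faster than C++'s, and C++ I/O needs a great deal of optimization to catch up, which is also why Java network programming is fairly popular. Java's development productivity is also much higher. Java is a half-compiled, half-interpreted language. It is half-compiled because Java code is dynamically compiled to native machine code while it runs, which is done for the sake of cross-platform portability, so a Java program gets faster the longer it runs. Of course, many of these points come from professional benchmarking sites abroad; I only have a vague recollection of them and have not tested them myself. |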
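The claim that a Java program "gets faster the longer it runs" can be checked by timing the same workload over several consecutive rounds. The sketch below is a hedged illustration: `WarmupSketch`, `work`, and `timeRounds` are hypothetical names, the workload is arbitrary, and the timings depend entirely on the JVM and machine used.

```java
final class WarmupSketch {
    // Hypothetical floating-point workload. It returns a checksum so the
    // JIT compiler cannot discard the loop as dead code.
    static double work(int n) {
        double acc = 0.0;
        for (int i = 1; i <= n; i++) {
            acc += Math.sqrt(i) * 1.000001;
        }
        return acc;
    }

    // Times `rounds` consecutive runs of the same workload, in nanoseconds.
    static long[] timeRounds(int rounds, int n) {
        long[] nanos = new long[rounds];
        double sink = 0.0;
        for (int r = 0; r < rounds; r++) {
            long start = System.nanoTime();
            sink += work(n);
            nanos[r] = System.nanoTime() - start;
        }
        if (sink == 42.0) {
            System.out.println(sink); // keeps `sink` observably live
        }
        return nanos;
    }

    public static void main(String[] args) {
        long[] t = timeRounds(10, 2_000_000);
        for (int r = 0; r < t.length; r++) {
            System.out.printf("round %d: %.3f ms%n", r, t[r] / 1e6);
        }
    }
}
```

If the early rounds are noticeably slower than the later ones, that is the interpreter-then-JIT effect; a benchmark that times only one cold run mixes compilation time into the result.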
|
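Posted: 2011-05-25
I/O: can Java really be faster than C++ linked against the CRT library?
With large numbers of random new and delete operations, C++ can manage allocation itself through a memory pool, and the resulting performance gain is beyond anything Java can match, to say nothing of GC. The simplest way to see an overall Java versus C++ performance comparison is to compare the different language versions of a database's drivers: Oracle, dbb, mongodb. These drivers directly reflect how well each platform runs code. To compete on efficiency, Java has to avoid synchronous operations on I/O devices and the use of JNI, or else dilute the cost of frequent JNI calls with large one-shot bulk transfers. |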
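The last point, diluting a fixed per-call cost with bulk transfers, can be sketched without any real JNI. In this illustration `BatchingSketch` and `CountingSink` are hypothetical names; the sink merely counts how often the "expensive boundary" is crossed and stands in for a JNI call or a system call.

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;

final class BatchingSketch {
    // Stand-in for an expensive boundary crossing (assumption: each call
    // has a fixed overhead). It only counts invocations.
    static final class CountingSink extends OutputStream {
        long calls = 0;
        @Override public void write(int b) { calls++; }
        @Override public void write(byte[] b, int off, int len) { calls++; }
    }

    // Writes n bytes one at a time straight to the sink.
    static long unbatchedCalls(int n) {
        CountingSink sink = new CountingSink();
        for (int i = 0; i < n; i++) {
            sink.write(i);
        }
        return sink.calls;
    }

    // Writes the same n bytes through a buffer, so the sink sees bulk writes.
    static long batchedCalls(int n, int bufferSize) {
        CountingSink sink = new CountingSink();
        try (OutputStream out = new BufferedOutputStream(sink, bufferSize)) {
            for (int i = 0; i < n; i++) {
                out.write(i);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.calls;
    }

    public static void main(String[] args) {
        System.out.println("unbatched sink calls: " + unbatchedCalls(100_000));
        System.out.println("batched sink calls:   " + batchedCalls(100_000, 8192));
    }
}
```

The amount of data is identical in both cases; only the number of boundary crossings changes, which is why batching helps whenever the per-call overhead dominates.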
|
Posted: 2011-05-26
RednaxelaFX wrote: hmm, microbenchmarks...
...... Remember to run it with JDK6u25: java -server -XX:InlineSmallCode=2000 -XX:+AggressiveOpts PerformTest. Then, as the poster below says, try IBM JDK 6 if you can; repeat the run several times and you will notice something >_<
I ran it with two versions and found little difference between them.
jdk1.6.0_25/bin/./java -server -XX:InlineSmallCode=2000 -XX:+AggressiveOpts
Under jdk 1.6.0_25:
start test... Program run duration: 86146.587816 ms.
start test... Program run duration: 82345.412324 ms.
Under jdk 1.6.0_21:
start test... Program run duration: 85105.833883 ms. |
|
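Posted: 2011-05-26
ppgunjack wrote: I/O: can Java really be faster than C++ linked against the CRT library?
With large numbers of random new and delete operations, C++ can manage allocation itself through a memory pool, and the resulting performance gain is beyond anything Java can match, to say nothing of GC. The simplest way to see an overall Java versus C++ performance comparison is to compare the different language versions of a database's drivers: Oracle, dbb, mongodb. These drivers directly reflect how well each platform runs code. To compete on efficiency, Java has to avoid synchronous operations on I/O devices and the use of JNI, or else dilute the cost of frequent JNI calls with large one-shot bulk transfers.
All right. There is no need for pointless arguing. You can read the explanations given by some experts and authoritative organizations. http://blog.csdn.net/yongzhewuwei_2008/archive/2006/11/16/1387476.aspx |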
|
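Posted: 2011-05-26
Since the JDK was upgraded to 1.6, its speed has doubled yet again. Personally, I think C++ is now suited only to a few specialized fields... but it is not as fast as C either, so its niche will only keep shrinking...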
|
|
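Posted: 2011-05-26
A lot of that post reads like nonsense.
"The performance of a database written in Java is nearly 600 times that of a database written in C++!" With Oracle, C++ and C access the database at about the same speed, and both have an order-of-magnitude advantage over JDBC. The article was written in '06, and its claim that inlining works poorly is simply nonsense. I just tried it on gcc 4.5.2: with -O3, both a call to a subclass method through a parent-class pointer and a call to the parent class's own method were correctly inlined. |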
|
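Posted: 2011-05-26
ppgunjack wrote: A lot of that post reads like nonsense.
"The performance of a database written in Java is nearly 600 times that of a database written in C++!" With Oracle, C++ and C access the database at about the same speed, and both have an order-of-magnitude advantage over JDBC. The article was written in '06, and its claim that inlining works poorly is simply nonsense. I just tried it on gcc 4.5.2: with -O3, both a call to a subclass method through a parent-class pointer and a call to the parent class's own method were correctly inlined.
Mm, well said. Since you complain that it's nonsense, why not roll up your sleeves and verify and analyze it yourself? |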
|
Posted: 2011-05-26
DOCDOC wrote: ppgunjack wrote: A lot of that post reads like nonsense.
"The performance of a database written in Java is nearly 600 times that of a database written in C++!" With Oracle, C++ and C access the database at about the same speed, and both have an order-of-magnitude advantage over JDBC. The article was written in '06, and its claim that inlining works poorly is simply nonsense. I just tried it on gcc 4.5.2: with -O3, both a call to a subclass method through a parent-class pointer and a call to the parent class's own method were correctly inlined.
Mm, well said. Since you complain that it's nonsense, why not roll up your sleeves and verify and analyze it yourself?
main.o: file format pe-i386
Disassembly of section .text:
00000000 <__ZN4Test7testIntEv>:
   0: 55    push %ebp
   1: 89 e5    mov %esp,%ebp
   3: 56    push %esi
   4: 53    push %ebx
   5: bb 5a 09 00 00    mov $0x95a,%ebx
   a: b9 01 00 00 00    mov $0x1,%ecx
   f: be 56 55 55 55    mov $0x55555556,%esi
  14: 89 c8    mov %ecx,%eax
  16: f7 ee    imul %esi
  18: 89 c8    mov %ecx,%eax
  1a: c1 f8 1f    sar $0x1f,%eax
  1d: 29 c2    sub %eax,%edx
  1f: 21 da    and %ebx,%edx
  21: 41    inc %ecx
  22: 83 eb 08    sub $0x8,%ebx
  25: 81 f9 41 0d 03 00    cmp $0x30d41,%ecx
  2b: 75 e7    jne 14 <__ZN4Test7testIntEv+0x14>
  2d: 8b 45 08    mov 0x8(%ebp),%eax
  30: 89 50 04    mov %edx,0x4(%eax)
  33: 5b    pop %ebx
  34: 5e    pop %esi
  35: c9    leave
  36: c3    ret
  37: 90    nop
00000038 <__ZN4Test6doTestEv>:
  38: 55    push %ebp
  39: 89 e5    mov %esp,%ebp
  3b: 53    push %ebx
  3c: 83 ec 14    sub $0x14,%esp
  3f: 8b 5d 08    mov 0x8(%ebp),%ebx
  42: 8b 03    mov (%ebx),%eax
  44: 89 1c 24    mov %ebx,(%esp)
  47: ff 10    call *(%eax)
  49: 8b 03    mov (%ebx),%eax
  4b: 89 5d 08    mov %ebx,0x8(%ebp)
  4e: 8b 40 04    mov 0x4(%eax),%eax
  51: 83 c4 14    add $0x14,%esp
  54: 5b    pop %ebx
  55: c9    leave
  56: ff e0    jmp *%eax
00000058 <___tcf_0>:
  58: 55    push %ebp
  59: 89 e5    mov %esp,%ebp
  5b: 83 ec 18    sub $0x18,%esp
  5e: c7 04 24 00 00 00 00    movl $0x0,(%esp)
  65: e8 00 00 00 00    call 6a <___tcf_0+0x12>
  6a: c9    leave
  6b: c3    ret
0000006c <__ZN4Test10testDoubleEv>:
  6c: 55    push %ebp
  6d: 89 e5    mov %esp,%ebp
  6f: 53    push %ebx
  70: 83 ec 24    sub $0x24,%esp
  73: dd 05 18 00 00 00    fldl 0x18
  79: bb 01 00 00 00    mov $0x1,%ebx
  7e: eb 10    jmp 90 <__ZN4Test10testDoubleEv+0x24>
  80: dd d8    fstp %st(0)
  82: 89 5d f4    mov %ebx,-0xc(%ebp)
  85: db 45 f4    fildl -0xc(%ebp)
  88: dd 1c 24    fstpl (%esp)
  8b: e8 00 00 00 00    call 90 <__ZN4Test10testDoubleEv+0x24>
  90: 8d 04 9d 00 00 00 00    lea 0x0(,%ebx,4),%eax
  97: dd 05 20 00 00 00    fldl 0x20
  9d: 50    push %eax
  9e: da 04 24    fiaddl (%esp)
  a1: 83 c4 04    add $0x4,%esp
  a4: de c9    fmulp %st,%st(1)
  a6: 43    inc %ebx
  a7: 81 fb 51 c3 00 00    cmp $0xc351,%ebx
  ad: 75 d1    jne 80 <__ZN4Test10testDoubleEv+0x14>
  af: 8b 45 08    mov 0x8(%ebp),%eax
  b2: dd 58 08    fstpl 0x8(%eax)
  b5: 83 c4 24    add $0x24,%esp
  b8: 5b    pop %ebx
  b9: c9    leave
  ba: c3    ret
  bb: 90    nop
000000bc <__ZN4TestC1Ev>:
  bc: 55    push %ebp
  bd: 89 e5    mov %esp,%ebp
  bf: 8b 45 08    mov 0x8(%ebp),%eax
  c2: c7 00 08 00 00 00    movl $0x8,(%eax)
  c8: c7 40 04 00 00 00 00    movl $0x0,0x4(%eax)
  cf: d9 ee    fldz
  d1: dd 58 08    fstpl 0x8(%eax)
  d4: c9    leave
  d5: c3    ret
  d6: 66 90    xchg %ax,%ax
000000d8 <_main>:
  d8: 55    push %ebp
  d9: 89 e5    mov %esp,%ebp
  db: 83 e4 f0    and $0xfffffff0,%esp
  de: 57    push %edi
  df: 56    push %esi
  e0: 53    push %ebx
  e1: 83 ec 14    sub $0x14,%esp
  e4: e8 00 00 00 00    call e9 <_main+0x11>
  e9: c7 44 24 08 0d 00 00    movl $0xd,0x8(%esp)
  f0: 00
  f1: c7 44 24 04 00 00 00    movl $0x0,0x4(%esp)
  f8: 00
  f9: c7 04 24 00 00 00 00    movl $0x0,(%esp)
 100: e8 00 00 00 00    call 105 <_main+0x2d>
 105: a1 00 00 00 00    mov 0x0,%eax
 10a: 8b 40 f4    mov -0xc(%eax),%eax
 10d: 8b 98 7c 00 00 00    mov 0x7c(%eax),%ebx
 113: 85 db    test %ebx,%ebx
 115: 0f 84 95 01 00 00    je 2b0 <_main+0x1d8>
 11b: 80 7b 1c 00    cmpb $0x0,0x1c(%ebx)
 11f: 0f 84 37 01 00 00    je 25c <_main+0x184>
 125: 8a 43 27    mov 0x27(%ebx),%al
 128: 0f be c0    movsbl %al,%eax
 12b: 89 44 24 04    mov %eax,0x4(%esp)
 12f: c7 04 24 00 00 00 00    movl $0x0,(%esp)
 136: e8 00 00 00 00    call 13b <_main+0x63>
 13b: 89 04 24    mov %eax,(%esp)
 13e: e8 00 00 00 00    call 143 <_main+0x6b>
 143: e8 00 00 00 00    call 148 <_main+0x70>
 148: 89 c6    mov %eax,%esi
 14a: c7 04 24 10 00 00 00    movl $0x10,(%esp)
 151: e8 00 00 00 00    call 156 <_main+0x7e>
 156: 89 c7    mov %eax,%edi
 158: c7 00 08 00 00 00    movl $0x8,(%eax)
 15e: c7 40 04 00 00 00 00    movl $0x0,0x4(%eax)
 165: c7 40 08 00 00 00 00    movl $0x0,0x8(%eax)
 16c: c7 40 0c 00 00 00 00    movl $0x0,0xc(%eax)
 173: bb 10 27 00 00    mov $0x2710,%ebx
 178: 8b 07    mov (%edi),%eax
 17a: 89 3c 24    mov %edi,(%esp)
 17d: ff 50 08    call *0x8(%eax)
 180: 4b    dec %ebx
 181: 75 f5    jne 178 <_main+0xa0>
 183: 89 3c 24    mov %edi,(%esp)
 186: e8 00 00 00 00    call 18b <_main+0xb3>
 18b: c7 44 24 04 10 27 00    movl $0x2710,0x4(%esp)
 192: 00
 193: c7 04 24 00 00 00 00    movl $0x0,(%esp)
 19a: e8 00 00 00 00    call 19f <_main+0xc7>
 19f: 89 c3    mov %eax,%ebx
 1a1: 8b 00    mov (%eax),%eax
 1a3: 8b 40 f4    mov -0xc(%eax),%eax
 1a6: 8b 7c 03 7c    mov 0x7c(%ebx,%eax,1),%edi
 1aa: 85 ff    test %edi,%edi
 1ac: 0f 84 fe 00 00 00    je 2b0 <_main+0x1d8>
 1b2: 80 7f 1c 00    cmpb $0x0,0x1c(%edi)
 1b6: 0f 84 d7 00 00 00    je 293 <_main+0x1bb>
 1bc: 8a 47 27    mov 0x27(%edi),%al
 1bf: 0f be c0    movsbl %al,%eax
 1c2: 89 44 24 04    mov %eax,0x4(%esp)
 1c6: 89 1c 24    mov %ebx,(%esp)
 1c9: e8 00 00 00 00    call 1ce <_main+0xf6>
 1ce: 89 04 24    mov %eax,(%esp)
 1d1: e8 00 00 00 00    call 1d6 <_main+0xfe>
 1d6: e8 00 00 00 00    call 1db <_main+0x103>
 1db: 89 c3    mov %eax,%ebx
 1dd: c7 44 24 08 05 00 00    movl $0x5,0x8(%esp)
 1e4: 00
 1e5: c7 44 24 04 0e 00 00    movl $0xe,0x4(%esp)
 1ec: 00
 1ed: c7 04 24 00 00 00 00    movl $0x0,(%esp)
 1f4: e8 00 00 00 00    call 1f9 <_main+0x121>
 1f9: 29 f3    sub %esi,%ebx
 1fb: 89 5c 24 04    mov %ebx,0x4(%esp)
 1ff: c7 04 24 00 00 00 00    movl $0x0,(%esp)
 206: e8 00 00 00 00    call 20b <_main+0x133>
 20b: 89 c3    mov %eax,%ebx
 20d: c7 44 24 08 02 00 00    movl $0x2,0x8(%esp)
 214: 00
 215: c7 44 24 04 14 00 00    movl $0x14,0x4(%esp)
 21c: 00
 21d: 89 04 24    mov %eax,(%esp)
 220: e8 00 00 00 00    call 225 <_main+0x14d>
 225: 8b 03    mov (%ebx),%eax
 227: 8b 40 f4    mov -0xc(%eax),%eax
 22a: 8b 74 03 7c    mov 0x7c(%ebx,%eax,1),%esi
 22e: 85 f6    test %esi,%esi
 230: 74 7e    je 2b0 <_main+0x1d8>
 232: 80 7e 1c 00    cmpb $0x0,0x1c(%esi)
 236: 74 41    je 279 <_main+0x1a1>
 238: 8a 46 27    mov 0x27(%esi),%al
 23b: 0f be c0    movsbl %al,%eax
 23e: 89 44 24 04    mov %eax,0x4(%esp)
 242: 89 1c 24    mov %ebx,(%esp)
 245: e8 00 00 00 00    call 24a <_main+0x172>
 24a: 89 04 24    mov %eax,(%esp)
 24d: e8 00 00 00 00    call 252 <_main+0x17a>
 252: 31 c0    xor %eax,%eax
 254: 83 c4 14    add $0x14,%esp
 257: 5b    pop %ebx
 258: 5e    pop %esi
 259: 5f    pop %edi
 25a: c9    leave
 25b: c3    ret
 25c: 89 1c 24    mov %ebx,(%esp)
 25f: e8 00 00 00 00    call 264 <_main+0x18c>
 264: 8b 03    mov (%ebx),%eax
 266: c7 44 24 04 0a 00 00    movl $0xa,0x4(%esp)
 26d: 00
 26e: 89 1c 24    mov %ebx,(%esp)
 271: ff 50 18    call *0x18(%eax)
 274: e9 af fe ff ff    jmp 128 <_main+0x50>
 279: 89 34 24    mov %esi,(%esp)
 27c: e8 00 00 00 00    call 281 <_main+0x1a9>
 281: 8b 06    mov (%esi),%eax
 283: c7 44 24 04 0a 00 00    movl $0xa,0x4(%esp)
 28a: 00
 28b: 89 34 24    mov %esi,(%esp)
 28e: ff 50 18    call *0x18(%eax)
 291: eb a8    jmp 23b <_main+0x163>
 293: 89 3c 24    mov %edi,(%esp)
 296: e8 00 00 00 00    call 29b <_main+0x1c3>
 29b: 8b 07    mov (%edi),%eax
 29d: c7 44 24 04 0a 00 00    movl $0xa,0x4(%esp)
 2a4: 00
 2a5: 89 3c 24    mov %edi,(%esp)
 2a8: ff 50 18    call *0x18(%eax)
 2ab: e9 0f ff ff ff    jmp 1bf <_main+0xe7>
 2b0: e8 00 00 00 00    call 2b5 <_main+0x1dd>
 2b5: 8d 76 00    lea 0x0(%esi),%esi
000002b8 <__GLOBAL__I__ZN4TestC2Ev>:
 2b8: 55    push %ebp
 2b9: 89 e5    mov %esp,%ebp
 2bb: 83 ec 18    sub $0x18,%esp
 2be: c7 04 24 00 00 00 00    movl $0x0,(%esp)
 2c5: e8 00 00 00 00    call 2ca <__GLOBAL__I__ZN4TestC2Ev+0x12>
 2ca: c7 04 24 58 00 00 00    movl $0x58,(%esp)
 2d1: e8 00 00 00 00    call 2d6 <__GLOBAL__I__ZN4TestC2Ev+0x1e>
 2d6: c9    leave
 2d7: c3    ret
Disassembly of section .text$_ZN3Foo7testIntEv:
00000000 <__ZN3Foo7testIntEv>:
   0: 55    push %ebp
   1: 89 e5    mov %esp,%ebp
   3: c9    leave
   4: c3    ret
   5: 90    nop
   6: 90    nop
   7: 90    nop
Disassembly of section .text$_ZN3Foo10testDoubleEv:
00000000 <__ZN3Foo10testDoubleEv>:
   0: 55    push %ebp
   1: 89 e5    mov %esp,%ebp
   3: c9    leave
   4: c3    ret
   5: 90    nop
   6: 90    nop
   7: 90    nop
Disassembly of section .text$_ZN3Foo6doTestEv:
00000000 <__ZN3Foo6doTestEv>:
   0: 55    push %ebp
   1: 89 e5    mov %esp,%ebp
   3: c9    leave
   4: c3    ret
   5: 90    nop
   6: 90    nop
   7: 90    nop
Have a look if you're interested. I gave Test a parent class, Foo. All of the subclass's functions were pulled out. As for the difference between the Java and C editions of bdb, and the performance of oci, occi, and jdbc, verify those yourself if you're interested. |
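One detail of the testInt loop in the listing is worth spelling out: the `mov $0x55555556,%esi` / `imul %esi` / `sar $0x1f,%eax` / `sub %eax,%edx` sequence is the usual compiler idiom for a signed division by 3 without a divide instruction. The sketch below reproduces that arithmetic in Java; `Div3Sketch` and `div3` are hypothetical names, and mapping the sequence back to a `/ 3` in the original C++ source is an inference, since the source itself is not shown in the thread.

```java
final class Div3Sketch {
    // Signed division by 3 via multiplication by the magic constant
    // 0x55555556 = (2^32 + 2) / 3, mirroring the imul / sar / sub sequence.
    static int div3(int n) {
        long product = (long) n * 0x55555556L; // imul %esi: 64-bit product in edx:eax
        int high = (int) (product >> 32);      // edx holds the upper 32 bits
        return high - (n >> 31);               // sar $0x1f,%eax ; sub %eax,%edx
    }

    public static void main(String[] args) {
        for (int n : new int[] {-10, -3, -1, 0, 1, 2, 3, 10, 200000}) {
            System.out.println(n + " -> " + div3(n) + " (n / 3 = " + (n / 3) + ")");
        }
    }
}
```

The final subtraction of `n >> 31` adds 1 for negative inputs, which turns the floor of the product into the truncation toward zero that integer division requires.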
|
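Posted: 2011-05-26
I was wrong above. It does call code outside main; even at -O3 the subclass code is not inlined.
Earlier I searched directly for the doTest label, could not find it, and assumed it had been inlined. I did not notice that the loop's assembly ends very quickly and goes through an indirect jump, probably via the vtable:
17d: ff 50 08 call *0x8(%eax)
180: 4b dec %ebx
181: 75 f5 jne 178 <_main+0xa0>
When there is no inheritance, -O3 inlines testInt and testDouble into doTest, and doTest into main. Once inheritance is involved, however, it really does not inline inherited functions called through a pointer. Static compilation presumably cannot determine the address of the specific subclass's virtual function. |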
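Why the static type alone does not fix the call target can be shown with a small analog. The thread's code is C++ and is not shown, so this is a Java stand-in: the class names Foo and Test come from the symbols in the disassembly, while `DispatchSketch`, `run`, and the return values are hypothetical.

```java
final class DispatchSketch {
    static class Foo {
        int testInt() { return 0; }         // base version (empty in the listing)
        int doTest() { return testInt(); }  // virtual call through `this`
    }

    static class Test extends Foo {
        @Override int testInt() { return 42; } // hypothetical override
    }

    // The static type of `f` is Foo, but which testInt runs depends on a
    // value known only at run time, so the call site alone does not name
    // a single callee.
    static int run(boolean useSubclass) {
        Foo f = useSubclass ? new Test() : new Foo();
        return f.doTest();
    }

    public static void main(String[] args) {
        System.out.println(run(false));
        System.out.println(run(true));
    }
}
```

An ahead-of-time compiler that sees only the Foo-typed reference has to emit an indirect call unless it can prove the concrete type, whereas a JIT can observe the types that actually occur at run time and inline speculatively.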
|