starry198804265811

My Java vs. C++ Performance Test


I had nothing to do today, so on a whim I ran a performance comparison between Java and C++.

 

1 Test method

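  The method is simple: a Java program and a C++ program each perform a large number of integer and floating-point operations, and I measure the running time of the test code. The C++ code calls the Windows function QueryPerformanceCounter() to read the high-resolution counter once before and once after the test, and QueryPerformanceFrequency() to get the counter frequency; the difference between the two counter values divided by the frequency gives the running time. To keep the timing baseline identical, the Java program calls the same two Windows functions through JNI to measure its own running time.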
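
The timing arithmetic described above (counter difference divided by frequency) can be sketched as follows. This is only an illustration, not the code used in the test: the class `TickMath`, the method `elapsedMillis`, and the tick values are hypothetical. `System.nanoTime()` is included as a pure-Java timer that reports nanoseconds, so its effective frequency is 1e9 ticks per second and no JNI call is needed.

```java
// Hypothetical helper illustrating "counter difference / frequency".
final class TickMath {
    // Converts a counter difference to milliseconds, given ticks per second.
    static double elapsedMillis(long startTicks, long endTicks, long ticksPerSecond) {
        return (endTicks - startTicks) * 1000.0 / ticksPerSecond;
    }

    public static void main(String[] args) {
        // Made-up values: 2,500,000 ticks counted at 10 MHz.
        System.out.println(elapsedMillis(1000000L, 3500000L, 10000000L) + " ms");

        // The same formula applied to System.nanoTime(), which counts nanoseconds.
        long start = System.nanoTime();
        long end = System.nanoTime();
        System.out.println(elapsedMillis(start, end, 1000000000L) + " ms");
    }
}
```

With the made-up values, 2,500,000 ticks at 10,000,000 ticks per second come out as 250 ms.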

  The code for each program follows.

 

2 C++ test code

 

#include "stdafx.h"
#include "windows.h"
#include <iostream>
#include <cmath>
using namespace std;

const int INT_C = 200000;
const int DOU_C = 50000;
const int MAIN_C = 10000;

class Test {
public:
	Test();
	void testInt();
	void testDouble();
	void doTest();
private:
	int m_i;
	double m_d;
};

Test::Test()
{
	m_i = 0;
	m_d = 0.0;
}

void Test::testInt()
{
	for (int i=1;i<= INT_C;i++) {
		m_i = (~(i*7) + 0x963 - i) & (i / 3);
	}
}

void Test::testDouble()
{
	for (int i=1;i<= DOU_C;i++) {
		m_d = ((i<<2) + 0.36954) * sin((double)i);
	}
}

void Test::doTest()
{
	testInt();
	testDouble();
}

int _tmain(int argc, _TCHAR* argv[])
{
	LARGE_INTEGER freq; 
	LARGE_INTEGER start;
	LARGE_INTEGER end;
	QueryPerformanceFrequency(&freq);

	Test* test = NULL;
	int j;

	cout<<"start test..."<<endl;

	QueryPerformanceCounter(&start);
	for (j = 0;j < MAIN_C; j++) {
		test = new Test();
		test->doTest();
		delete test;
	}
	QueryPerformanceCounter(&end);

	double duration = ((double)(end.QuadPart - start.QuadPart)) / ((double)freq.QuadPart);
	cout<<"Program run duration: "<<duration*1000<<" ms."<<endl;
	
	return 0;
}

 

 3 Java test code

3.1 Test code

 

public class PerformTest {
	public static final int INT_C = 200000;
	public static final int DOU_C = 50000;
	public static final int MAIN_C = 10000;
	
	private int m_i;
	private double m_d;
	
	public void testInt() {
		for (int i=1;i<= INT_C;i++) {
			m_i = (~(i*7) + 0x963 - i) & (i / 3);
		}
	}
	
	public void testDouble() {
		for (int i=1;i<= DOU_C;i++) {
			m_d = ((i<<2) + 0.36954) * Math.sin((double)i);
		}
	}
	
	public void doTest() {
		testInt();
		testDouble();
	}
	
	public static void main(String[] args) {
		PerformanceTimer timer = new PerformanceTimer();
		PerformTest test = null;
		int j;
		System.out.println("start test...");
		
		timer.start();
		for (j = 0;j < MAIN_C; j++) {
			test = new PerformTest();
			test.doTest();
			test = null;
		}
		double duration = timer.end();
		
		System.out.println("Program run duration: " + duration + " ms.");
	}
}

 

 3.2 Timing code

 

public class PerformanceTimer {
	private double freq;
	private double startTime;
	private double endTime;
	
	public PerformanceTimer() {
		this.freq = queryPerformanceFrequency();
	}
	
	private native double queryPerformanceFrequency();
	
	private native double QueryPerformanceCounter();
	
	public void start() {
		this.startTime = QueryPerformanceCounter();
	}
	
	public double end() {
		this.endTime = QueryPerformanceCounter();
		double duration = (endTime - startTime) / freq * 1000;
		return duration;
	}
	
	static {
		try {
			System.loadLibrary("PerformanceTimer");
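		// loadLibrary reports a missing library with UnsatisfiedLinkError (an Error, not an Exception)
		} catch (UnsatisfiedLinkError e) {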
			e.printStackTrace();
		}
	}
}

 

 3.3 Native C++ code for the timing functions

    The header file generated by javah is omitted; only the implementation source file is shown.

 

#include "stdafx.h"
#include "windows.h"
#include "PerformanceTimer.h"

JNIEXPORT jdouble JNICALL Java_PerformanceTimer_queryPerformanceFrequency
  (JNIEnv *, jobject)
{
	LARGE_INTEGER freq;
	QueryPerformanceFrequency(&freq);

	return (double)(freq.QuadPart);
}

JNIEXPORT jdouble JNICALL Java_PerformanceTimer_QueryPerformanceCounter
  (JNIEnv *, jobject)
{
	LARGE_INTEGER counter;
	QueryPerformanceCounter(&counter);

	return (double)(counter.QuadPart);
}

 

 

4 Test results

  My hardware and software environment:

  Hardware: AMD Athlon II X3 435, 2.89 GHz; 2 GB DDR3 RAM; WD 500 GB hard disk;

  Software: Windows 7 Ultimate; Visual C++ 2008; Sun JDK 1.6.0_21.

 

  C++ test results

 

            Run 1     Run 2     Run 3     Average
Time (ms)   21023.6   21003.5   21014.7   21013.9

 

  Java test results

            Run 1     Run 2     Run 3     Average
Time (ms)   94369.4   94317.3   94347.2   94344.6

 

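Is the C++ program really more than four times as fast as the Java one (an average of 21013.9 ms versus 94344.6 ms)? Is that the true performance gap between the two? Is my test method sound? And where is the bottleneck that limits the performance of our Java program?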
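
As a check on the arithmetic, the averages and the Java-to-C++ ratio can be recomputed from the six run times reported above. The sketch below does only that; the class name `ResultSummary` and the method `mean` are hypothetical, and the numbers are copied from the two result tables.

```java
// Recomputes the averages and the ratio from the run times reported above.
final class ResultSummary {
    // Arithmetic mean of the given run times.
    static double mean(double... xs) {
        double sum = 0.0;
        for (double x : xs) {
            sum += x;
        }
        return sum / xs.length;
    }

    public static void main(String[] args) {
        double cpp = mean(21023.6, 21003.5, 21014.7);
        double java = mean(94369.4, 94317.3, 94347.2);
        System.out.printf("C++ avg %.1f ms, Java avg %.1f ms, ratio %.2f%n",
                cpp, java, java / cpp);
    }
}
```

The ratio comes out at roughly 4.5, and it describes this particular benchmark on this particular machine only.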

Comments
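#55 lfp001 2011-05-31
I read your code. A comparison done this way does not really prove anything.
If you optimize the Java code even a little, the speedup will be bigger than you expect. Different JVMs also differ a lot in raw computation speed.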
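
The comment does not say which optimizations it has in mind. One common adjustment for microbenchmarks on a JIT-compiled VM, offered here only as a guess at what might help, is to run the workload a few times before timing it and to use the computed values so the work has an observable effect. A minimal sketch under those assumptions; `WarmBench`, `intKernel`, `timeMillis`, and the round counts are hypothetical and not taken from the thread:

```java
final class WarmBench {
    // Written once per timing run so the summed results stay observable.
    static volatile long sink;

    // Same integer expression as testInt(), but the results are summed and
    // returned instead of being overwritten on every iteration.
    static long intKernel(int count) {
        long sum = 0;
        for (int i = 1; i <= count; i++) {
            sum += (~(i * 7) + 0x963 - i) & (i / 3);
        }
        return sum;
    }

    // Runs the warm-up rounds untimed, then times the measured rounds.
    static double timeMillis(int warmupRounds, int measuredRounds, int count) {
        long local = 0;
        for (int r = 0; r < warmupRounds; r++) {
            local += intKernel(count);
        }
        long start = System.nanoTime();
        for (int r = 0; r < measuredRounds; r++) {
            local += intKernel(count);
        }
        long end = System.nanoTime();
        sink = local;
        return (end - start) / 1e6;
    }

    public static void main(String[] args) {
        System.out.println(timeMillis(200, 1000, 200000) + " ms");
    }
}
```

Whether this changes the measured gap is not something the sketch can show; it only separates warm-up from the timed region.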
#54 xuwenhao 2011-05-29
JNI calls require copying memory, so they are much slower.
#53 IcyFenix 2011-05-28
Looks like the poster above has finished a few IV drips and respawned on the spot at full health.
#52 RednaxelaFX 2011-05-28
jellyfish wrote:
That reference is really a wild guess. NO, it's not.

I apologize for the wild guess. My bad.

jellyfish wrote:
http://blogs.oracle.com/jag/entry/transcendental_meditation

Yes, this is a nice post that describes exactly what's going on in the HotSpot VM. In fact, if you read the assembler code I posted on page 1 of this thread, that's the actual implementation used in the x86-64 version of the HotSpot VM: it does a range check, and then either uses fsin directly if the value is within the range [-pi/4, pi/4], or does argument reduction first and then uses fsin.
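
To relate this range check to the benchmark in the post: testDouble() passes the integers 1 through DOU_C to sin(), and since pi/4 is about 0.785, every one of those arguments lies outside [-pi/4, pi/4]. The sketch below simply counts this. `FastPathCount`, `inFastRange`, and `countFast` are hypothetical names, and the check mirrors only the range test described above, not HotSpot's actual code.

```java
final class FastPathCount {
    // True when |x| <= pi/4, the range described above for the direct fsin path.
    static boolean inFastRange(double x) {
        return Math.abs(x) <= Math.PI / 4.0;
    }

    // Counts how many integer arguments in [from, to] fall in the fast range.
    static int countFast(int from, int to) {
        int count = 0;
        for (int i = from; i <= to; i++) {
            if (inFastRange((double) i)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Arguments used by testDouble(): 1 .. DOU_C (50000).
        System.out.println(countFast(1, 50000));
    }
}
```

Under this assumption the benchmark never exercises the in-range case, so it measures only the out-of-range path of sin().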

jellyfish wrote:
a simple google on "java sin cos slow" can generate a lot of interesting entries, especially in the game arena, so it should be classified as "well known".

Yeah, you can search Google and get a lot of nonsense as well, that's not convincing enough. All we need is solid evidence, which is easier to get by reading source code and running "macro benchmarks" instead of searching Google.

jellyfish wrote:
While I am saying java sin() is slower than C version, I am not saying in general java is slow, did I?

No, you didn't. I missed the point, my bad.

jellyfish wrote:
In fact, if you can make java sin() as closely fast as C, I would take it. I did quite some coding on how to make those special functions as fast as possible, such as gamma and log gamma. It's just hard. It's so hard that sometimes people accept the inaccuracy as the cost.

"As fast as C" -- that's not hard, if you can accept giving up Java semantics for floating point arithmetic, and instead put the C implementation into a JVM, which is quite easy.
The post you quoted from James Gosling was telling you how sin() and cos() are implemented in HotSpot. But that's their way of doing it, for a tradeoff between performance and Java conformance.

Pick a "sin() in C" implementation that satisfies you, and replace the sin() implementation in HotSpot, and there you've got what you want.
sin() and cos() are intrinsic functions in HotSpot; calling them wouldn't incur any JNI invocation overhead -- JNI would be too slow for this kind of stuff.
If you'd like to give a shot at this, a few of the places to look for would be:
hotspot/src/share/vm/classfile/vmSymbols.hpp
  do_intrinsic(_dsin,                     java_lang_Math,         sin_name,   double_double_signature,           F_S)   \

that's where java.lang.Math.sin() gets declared as an intrinsic function.

hotspot/src/share/vm/runtime/sharedRuntimeTrig.cpp
//----------------------------------------------------------------------
//
// Routines for new sin/cos implementation
//
//----------------------------------------------------------------------

/* sin(x)
 * Return sine function of x.
 *
 * kernel function:
 *      __kernel_sin            ... sine function on [-pi/4,pi/4]
 *      __kernel_cos            ... cose function on [-pi/4,pi/4]
 *      __ieee754_rem_pio2      ... argument reduction routine
 *
 * Method.
 *      Let S,C and T denote the sin, cos and tan respectively on
 *      [-PI/4, +PI/4]. Reduce the argument x to y1+y2 = x-k*pi/2
 *      in [-pi/4 , +pi/4], and let n = k mod 4.
 *      We have
 *
 *          n        sin(x)      cos(x)        tan(x)
 *     ----------------------------------------------------------
 *          0          S           C             T
 *          1          C          -S            -1/T
 *          2         -S          -C             T
 *          3         -C           S            -1/T
 *     ----------------------------------------------------------
 *
 * Special cases:
 *      Let trig be any of sin, cos, or tan.
 *      trig(+-INF)  is NaN, with signals;
 *      trig(NaN)    is that NaN;
 *
 * Accuracy:
 *      TRIG(x) returns trig(x) nearly rounded
 */

JRT_LEAF(jdouble, SharedRuntime::dsin(jdouble x))
  double y[2],z=0.0;
  int n, ix;

  /* High word of x. */
  ix = __HI(x);

  /* |x| ~< pi/4 */
  ix &= 0x7fffffff;
  if(ix <= 0x3fe921fb) return __kernel_sin(x,z,0);

  /* sin(Inf or NaN) is NaN */
  else if (ix>=0x7ff00000) return x-x;

  /* argument reduction needed */
  else {
    n = __ieee754_rem_pio2(x,y);
    switch(n&3) {
    case 0: return  __kernel_sin(y[0],y[1],1);
    case 1: return  __kernel_cos(y[0],y[1]);
    case 2: return -__kernel_sin(y[0],y[1],1);
    default:
      return -__kernel_cos(y[0],y[1]);
    }
  }
JRT_END

the general/slow-path implementation
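
The quadrant table in the comment block above can be tried out with a small sketch. This is not the HotSpot routine: `QuadrantSin` is a hypothetical name, `Math.sin` and `Math.cos` stand in for `__kernel_sin` and `__kernel_cos`, and the argument reduction is a naive one-step subtraction in double precision, which is only reasonable for moderate |x| (the real `__ieee754_rem_pio2` is far more careful).

```java
final class QuadrantSin {
    // Writes x = k*(pi/2) + y with y roughly in [-pi/4, pi/4], then picks the
    // table row n = k mod 4:  0 -> S,  1 -> C,  2 -> -S,  3 -> -C.
    static double sin(double x) {
        double halfPi = Math.PI / 2.0;
        double k = Math.rint(x / halfPi);   // nearest multiple of pi/2
        double y = x - k * halfPi;          // naive reduction, moderate |x| only
        switch (((int) k) & 3) {            // same "n & 3" selection as dsin
            case 0:  return  Math.sin(y);
            case 1:  return  Math.cos(y);
            case 2:  return -Math.sin(y);
            default: return -Math.cos(y);
        }
    }

    public static void main(String[] args) {
        double[] samples = {0.3, 1.0, 2.0, 4.0, 5.5, -1.0, -3.0, 100.0};
        for (double x : samples) {
            System.out.println(x + ": " + sin(x) + " vs Math.sin " + Math.sin(x));
        }
    }
}
```

For these sample values the two columns should agree to many decimal places; the gap grows with |x| because the naive reduction accumulates rounding error.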

hotspot/src/cpu/x86/vm/assembler_x86.cpp
void MacroAssembler::trigfunc(char trig, int num_fpu_regs_in_use) {
  // A hand-coded argument reduction for values in fabs(pi/4, pi/2)
  // was attempted in this code; unfortunately it appears that the
  // switch to 80-bit precision and back causes this to be
  // unprofitable compared with simply performing a runtime call if
  // the argument is out of the (-pi/4, pi/4) range.

  Register tmp = noreg;
  if (!VM_Version::supports_cmov()) {
    // fcmp needs a temporary so preserve rbx,
    tmp = rbx;
    push(tmp);
  }

  Label slow_case, done;

  ExternalAddress pi4_adr = (address)&pi_4;
  if (reachable(pi4_adr)) {
    // x ?<= pi/4
    fld_d(pi4_adr);
    fld_s(1);                // Stack:  X  PI/4  X
    fabs();                  // Stack: |X| PI/4  X
    fcmp(tmp);
    jcc(Assembler::above, slow_case);

    // fastest case: -pi/4 <= x <= pi/4
    switch(trig) {
    case 's':
      fsin();
      break;
    case 'c':
      fcos();
      break;
    case 't':
      ftan();
      break;
    default:
      assert(false, "bad intrinsic");
      break;
    }
    jmp(done);
  }

  // slow case: runtime call
  bind(slow_case);
  // Preserve registers across runtime call
  pusha();
  int incoming_argument_and_return_value_offset = -1;
  if (num_fpu_regs_in_use > 1) {
    // Must preserve all other FPU regs (could alternatively convert
    // SharedRuntime::dsin and dcos into assembly routines known not to trash
    // FPU state, but can not trust C compiler)
    NEEDS_CLEANUP;
    // NOTE that in this case we also push the incoming argument to
    // the stack and restore it later; we also use this stack slot to
    // hold the return value from dsin or dcos.
    for (int i = 0; i < num_fpu_regs_in_use; i++) {
      subptr(rsp, sizeof(jdouble));
      fstp_d(Address(rsp, 0));
    }
    incoming_argument_and_return_value_offset = sizeof(jdouble)*(num_fpu_regs_in_use-1);
    fld_d(Address(rsp, incoming_argument_and_return_value_offset));
  }
  subptr(rsp, sizeof(jdouble));
  fstp_d(Address(rsp, 0));
#ifdef _LP64
  movdbl(xmm0, Address(rsp, 0));
#endif // _LP64

  // NOTE: we must not use call_VM_leaf here because that requires a
  // complete interpreter frame in debug mode -- same bug as 4387334
  // MacroAssembler::call_VM_leaf_base is perfectly safe and will
  // do proper 64bit abi

  NEEDS_CLEANUP;
  // Need to add stack banging before this runtime call if it needs to
  // be taken; however, there is no generic stack banging routine at
  // the MacroAssembler level
  switch(trig) {
  case 's':
    {
      MacroAssembler::call_VM_leaf_base(CAST_FROM_FN_PTR(address, SharedRuntime::dsin), 0);
    }
    break;
  case 'c':
    {
      MacroAssembler::call_VM_leaf_base(CAST_FROM_FN_PTR(address, SharedRuntime::dcos), 0);
    }
    break;
  case 't':
    {
      MacroAssembler::call_VM_leaf_base(CAST_FROM_FN_PTR(address, SharedRuntime::dtan), 0);
    }
    break;
  default:
    assert(false, "bad intrinsic");
    break;
  }
#ifdef _LP64
    movsd(Address(rsp, 0), xmm0);
    fld_d(Address(rsp, 0));
#endif // _LP64
  addptr(rsp, sizeof(jdouble));
  if (num_fpu_regs_in_use > 1) {
    // Must save return value to stack and then restore entire FPU stack
    fstp_d(Address(rsp, incoming_argument_and_return_value_offset));
    for (int i = 0; i < num_fpu_regs_in_use; i++) {
      fld_d(Address(rsp, 0));
      addptr(rsp, sizeof(jdouble));
    }
  }
  popa();

  // Come here with result in F-TOS
  bind(done);

  if (tmp != noreg) {
    pop(tmp);
  }
}

a specialized version on x86

hotspot/src/cpu/x86/vm/stubGenerator_x86_64.cpp
    {
      StubCodeMark mark(this, "StubRoutines", "sin");
      StubRoutines::_intrinsic_sin = (double (*)(double)) __ pc();

      __ subq(rsp, 8);
      __ movdbl(Address(rsp, 0), xmm0);
      __ fld_d(Address(rsp, 0));
      __ trigfunc('s');
      __ fstp_d(Address(rsp, 0));
      __ movdbl(xmm0, Address(rsp, 0));
      __ addq(rsp, 8);
      __ ret(0);
    }

the stub code of the intrinsic sin() on x86-64

hotspot/src/share/vm/opto/library_call.cpp
//------------------------------inline_trig----------------------------------
// Inline sin/cos/tan instructions, if possible.  If rounding is required, do
// argument reduction which will turn into a fast/slow diamond.
bool LibraryCallKit::inline_trig(vmIntrinsics::ID id) {
  _sp += arg_size();            // restore stack pointer
  Node* arg = pop_math_arg();
  Node* trig = NULL;

  switch (id) {
  case vmIntrinsics::_dsin:
    trig = _gvn.transform((Node*)new (C, 2) SinDNode(arg));
    break;
  case vmIntrinsics::_dcos:
    trig = _gvn.transform((Node*)new (C, 2) CosDNode(arg));
    break;
  case vmIntrinsics::_dtan:
    trig = _gvn.transform((Node*)new (C, 2) TanDNode(arg));
    break;
  default:
    assert(false, "bad intrinsic was passed in");
    return false;
  }

  // Rounding required?  Check for argument reduction!
  if( Matcher::strict_fp_requires_explicit_rounding ) {

    static const double     pi_4 =  0.7853981633974483;
    static const double neg_pi_4 = -0.7853981633974483;
    // pi/2 in 80-bit extended precision
    // static const unsigned char pi_2_bits_x[] = {0x35,0xc2,0x68,0x21,0xa2,0xda,0x0f,0xc9,0xff,0x3f,0x00,0x00,0x00,0x00,0x00,0x00};
    // -pi/2 in 80-bit extended precision
    // static const unsigned char neg_pi_2_bits_x[] = {0x35,0xc2,0x68,0x21,0xa2,0xda,0x0f,0xc9,0xff,0xbf,0x00,0x00,0x00,0x00,0x00,0x00};
    // Cutoff value for using this argument reduction technique
    //static const double    pi_2_minus_epsilon =  1.564660403643354;
    //static const double neg_pi_2_plus_epsilon = -1.564660403643354;

    // Pseudocode for sin:
    // if (x <= Math.PI / 4.0) {
    //   if (x >= -Math.PI / 4.0) return  fsin(x);
    //   if (x >= -Math.PI / 2.0) return -fcos(x + Math.PI / 2.0);
    // } else {
    //   if (x <=  Math.PI / 2.0) return  fcos(x - Math.PI / 2.0);
    // }
    // return StrictMath.sin(x);

    // Pseudocode for cos:
    // if (x <= Math.PI / 4.0) {
    //   if (x >= -Math.PI / 4.0) return  fcos(x);
    //   if (x >= -Math.PI / 2.0) return  fsin(x + Math.PI / 2.0);
    // } else {
    //   if (x <=  Math.PI / 2.0) return -fsin(x - Math.PI / 2.0);
    // }
    // return StrictMath.cos(x);

    // Actually, sticking in an 80-bit Intel value into C2 will be tough; it
    // requires a special machine instruction to load it.  Instead we'll try
    // the 'easy' case.  If we really need the extra range +/- PI/2 we'll
    // probably do the math inside the SIN encoding.

    // Make the merge point
    RegionNode *r = new (C, 3) RegionNode(3);
    Node *phi = new (C, 3) PhiNode(r,Type::DOUBLE);

    // Flatten arg so we need only 1 test
    Node *abs = _gvn.transform(new (C, 2) AbsDNode(arg));
    // Node for PI/4 constant
    Node *pi4 = makecon(TypeD::make(pi_4));
    // Check PI/4 : abs(arg)
    Node *cmp = _gvn.transform(new (C, 3) CmpDNode(pi4,abs));
    // Check: If PI/4 < abs(arg) then go slow
    Node *bol = _gvn.transform( new (C, 2) BoolNode( cmp, BoolTest::lt ) );
    // Branch either way
    IfNode *iff = create_and_xform_if(control(),bol, PROB_STATIC_FREQUENT, COUNT_UNKNOWN);
    set_control(opt_iff(r,iff));

    // Set fast path result
    phi->init_req(2,trig);

    // Slow path - non-blocking leaf call
    Node* call = NULL;
    switch (id) {
    case vmIntrinsics::_dsin:
      call = make_runtime_call(RC_LEAF, OptoRuntime::Math_D_D_Type(),
                               CAST_FROM_FN_PTR(address, SharedRuntime::dsin),
                               "Sin", NULL, arg, top());
      break;
    case vmIntrinsics::_dcos:
      call = make_runtime_call(RC_LEAF, OptoRuntime::Math_D_D_Type(),
                               CAST_FROM_FN_PTR(address, SharedRuntime::dcos),
                               "Cos", NULL, arg, top());
      break;
    case vmIntrinsics::_dtan:
      call = make_runtime_call(RC_LEAF, OptoRuntime::Math_D_D_Type(),
                               CAST_FROM_FN_PTR(address, SharedRuntime::dtan),
                               "Tan", NULL, arg, top());
      break;
    }
    assert(control()->in(0) == call, "");
    Node* slow_result = _gvn.transform(new (C, 1) ProjNode(call,TypeFunc::Parms));
    r->init_req(1,control());
    phi->init_req(1,slow_result);

    // Post-merge
    set_control(_gvn.transform(r));
    record_for_igvn(r);
    trig = _gvn.transform(phi);

    C->set_has_split_ifs(true); // Has chance for split-if optimization
  }
  // Push result back on JVM stack
  push_pair(trig);
  return true;
}

the inlined version in HotSpot server compiler

It's important so I'm gonna say it twice: if you're willing to make a different choice on the tradeoff between performance and Java conformance, just tweak the code above, and you'll get what you want. The performance won't be that much different from a C implementation if you choose the same tradeoffs.

jellyfish wrote:
I've done a lot of performance tuning as well, and have seen so many cases of premature optimization. The most common case is that people don't understand the problem itself and still try to optimize/profile it.

Bad microbenchmarks contribute to the "common case" you're talking about, don't you agree?
#51 zk7019311 2011-05-26
Still quite interesting. I learned something.
#50 ppgunjack 2011-05-26
main.o:     file format pe-i386


Disassembly of section .text:

00000000 <__ZN4TestC1Ev>:
   0: 55                   push   %ebp
   1: 89 e5                mov    %esp,%ebp
   3: 8b 45 08             mov    0x8(%ebp),%eax
   6: c7 00 00 00 00 00    movl   $0x0,(%eax)
   c: 8b 4d 08             mov    0x8(%ebp),%ecx
   f: b8 00 00 00 00       mov    $0x0,%eax
  14: ba 00 00 00 00       mov    $0x0,%edx
  19: 89 41 08             mov    %eax,0x8(%ecx)
  1c: 89 51 0c             mov    %edx,0xc(%ecx)
  1f: c9                   leave 
  20: c3                   ret   
  21: 90                   nop

00000022 <__ZN4Test7testIntEv>:
  22: 55                   push   %ebp
  23: 89 e5                mov    %esp,%ebp
  25: 53                   push   %ebx
  26: 83 ec 10             sub    $0x10,%esp
  29: c7 45 f8 01 00 00 00 movl   $0x1,-0x8(%ebp)
  30: eb 39                jmp    6b <__ZN4Test7testIntEv+0x49>
  32: 8b 55 f8             mov    -0x8(%ebp),%edx
  35: 89 d0                mov    %edx,%eax
  37: c1 e0 03             shl    $0x3,%eax
  3a: 29 d0                sub    %edx,%eax
  3c: f7 d0                not    %eax
  3e: 05 63 09 00 00       add    $0x963,%eax
  43: 89 c3                mov    %eax,%ebx
  45: 2b 5d f8             sub    -0x8(%ebp),%ebx
  48: 8b 4d f8             mov    -0x8(%ebp),%ecx
  4b: ba 56 55 55 55       mov    $0x55555556,%edx
  50: 89 c8                mov    %ecx,%eax
  52: f7 ea                imul   %edx
  54: 89 c8                mov    %ecx,%eax
  56: c1 f8 1f             sar    $0x1f,%eax
  59: 89 d1                mov    %edx,%ecx
  5b: 29 c1                sub    %eax,%ecx
  5d: 89 c8                mov    %ecx,%eax
  5f: 89 da                mov    %ebx,%edx
  61: 21 c2                and    %eax,%edx
  63: 8b 45 08             mov    0x8(%ebp),%eax
  66: 89 10                mov    %edx,(%eax)
  68: ff 45 f8             incl   -0x8(%ebp)
  6b: 81 7d f8 40 0d 03 00 cmpl   $0x30d40,-0x8(%ebp)
  72: 0f 9e c0             setle  %al
  75: 84 c0                test   %al,%al
  77: 75 b9                jne    32 <__ZN4Test7testIntEv+0x10>
  79: 83 c4 10             add    $0x10,%esp
  7c: 5b                   pop    %ebx
  7d: c9                   leave 
  7e: c3                   ret   
  7f: 90                   nop

00000080 <__ZN4Test10testDoubleEv>:
  80: 55                   push   %ebp
  81: 89 e5                mov    %esp,%ebp
  83: 83 ec 38             sub    $0x38,%esp
  86: c7 45 f4 01 00 00 00 movl   $0x1,-0xc(%ebp)
  8d: eb 2e                jmp    bd <__ZN4Test10testDoubleEv+0x3d>
  8f: 8b 45 f4             mov    -0xc(%ebp),%eax
  92: c1 e0 02             shl    $0x2,%eax
  95: 89 45 e4             mov    %eax,-0x1c(%ebp)
  98: db 45 e4             fildl  -0x1c(%ebp)
  9b: dd 05 28 00 00 00    fldl   0x28
  a1: de c1                faddp  %st,%st(1)
  a3: dd 5d d8             fstpl  -0x28(%ebp)
  a6: db 45 f4             fildl  -0xc(%ebp)
  a9: dd 1c 24             fstpl  (%esp)
  ac: e8 00 00 00 00       call   b1 <__ZN4Test10testDoubleEv+0x31>
  b1: dc 4d d8             fmull  -0x28(%ebp)
  b4: 8b 45 08             mov    0x8(%ebp),%eax
  b7: dd 58 08             fstpl  0x8(%eax)
  ba: ff 45 f4             incl   -0xc(%ebp)
  bd: 81 7d f4 50 c3 00 00 cmpl   $0xc350,-0xc(%ebp)
  c4: 0f 9e c0             setle  %al
  c7: 84 c0                test   %al,%al
  c9: 75 c4                jne    8f <__ZN4Test10testDoubleEv+0xf>
  cb: c9                   leave 
  cc: c3                   ret   
  cd: 90                   nop

000000ce <__ZN4Test6doTestEv>:
  ce: 55                   push   %ebp
  cf: 89 e5                mov    %esp,%ebp
  d1: 83 ec 18             sub    $0x18,%esp
  d4: 8b 45 08             mov    0x8(%ebp),%eax
  d7: 89 04 24             mov    %eax,(%esp)
  da: e8 43 ff ff ff       call   22 <__ZN4Test7testIntEv> <-------------------------------------
  df: 8b 45 08             mov    0x8(%ebp),%eax
  e2: 89 04 24             mov    %eax,(%esp)
  e5: e8 96 ff ff ff       call   80 <__ZN4Test10testDoubleEv> <-------------------------------------
  ea: c9                   leave 
  eb: c3                   ret   

000000ec <_main>:
  ec: 55                   push   %ebp
  ed: 89 e5                mov    %esp,%ebp
  ef: 83 e4 f0             and    $0xfffffff0,%esp
  f2: 53                   push   %ebx
  f3: 83 ec 2c             sub    $0x2c,%esp
  f6: e8 00 00 00 00       call   fb <_main+0xf>
  fb: c7 44 24 04 00 00 00 movl   $0x0,0x4(%esp)
 102: 00
 103: c7 04 24 00 00 00 00 movl   $0x0,(%esp)
 10a: e8 00 00 00 00       call   10f <_main+0x23>
 10f: c7 44 24 04 00 00 00 movl   $0x0,0x4(%esp)
 116: 00
 117: 89 04 24             mov    %eax,(%esp)
 11a: e8 00 00 00 00       call   11f <_main+0x33>
 11f: e8 00 00 00 00       call   124 <_main+0x38>
 124: 89 44 24 18          mov    %eax,0x18(%esp)
 128: c7 04 24 10 00 00 00 movl   $0x10,(%esp)
 12f: e8 00 00 00 00       call   134 <_main+0x48>
 134: 89 c3                mov    %eax,%ebx
 136: 89 d8                mov    %ebx,%eax
 138: 89 04 24             mov    %eax,(%esp)
 13b: e8 c0 fe ff ff       call   0 <__ZN4TestC1Ev>
 140: 89 5c 24 14          mov    %ebx,0x14(%esp)
 144: c7 44 24 1c 00 00 00 movl   $0x0,0x1c(%esp)
 14b: 00
 14c: eb 10                jmp    15e <_main+0x72>
 14e: 8b 44 24 14          mov    0x14(%esp),%eax
 152: 89 04 24             mov    %eax,(%esp)
 155: e8 74 ff ff ff       call   ce <__ZN4Test6doTestEv>     <-------------------------------------
 15a: ff 44 24 1c          incl   0x1c(%esp)
 15e: 81 7c 24 1c 0f 27 00 cmpl   $0x270f,0x1c(%esp)
 165: 00
 166: 0f 9e c0             setle  %al
 169: 84 c0                test   %al,%al
 16b: 75 e1                jne    14e <_main+0x62>


000001fd <___tcf_0>:
 1fd: 55                   push   %ebp


00000211 <__Z41__static_initialization_and_destruction_0ii>:
 211: 55                   push   %ebp


00000240 <__GLOBAL__I__ZN4TestC2Ev>:
 240: 55                   push   %ebp


This is the -O0 assembly when there is no parent class FOO. Note that the calls marked with arrows all target static, absolute addresses, so under -O3 the compiler can replace the calls to those static addresses with inlined code.

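#49 ppgunjack 2011-05-26
I was wrong above. The code really does call out of main, and even at -O3 the subclass code is not inlined.
Earlier I searched for the doTest label directly, could not find it, and assumed it had been inlined.
I had not noticed that the assembly for the loop ends very quickly and goes through an indirect call, probably via the vtable.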
 17d:   ff 50 08                call   *0x8(%eax)   
 180:   4b                      dec    %ebx   
 181:   75 f5                   jne    178 <_main+0xa0>  


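Without inheritance, -O3 inlines testInt and testDouble into doTest, and doTest into main. With inheritance, however, inherited functions called through a pointer are indeed not inlined.
A static compiler presumably cannot determine the address of the concrete subclass's virtual function.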
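#48 ppgunjack 2011-05-26
DOCDOC wrote:
ppgunjack wrote:
A lot of this thread strikes me as nonsense.
"A database written in Java performs nearly 600 times better than a database written in C++!"
With Oracle, C++ and C access the database at about the same speed, and both have an order-of-magnitude advantage over JDBC.
The claim in that 2006 article that inlining works poorly is simply wrong; I just tried it with gcc 4.5.2.
With -O3, calls through a base-class pointer to a subclass method, and calls to the base class's own methods, were both inlined correctly.

Well said. But if you think it is nonsense, why not verify and analyze it yourself?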


main.o:     file format pe-i386


Disassembly of section .text:

00000000 <__ZN4Test7testIntEv>:
   0:	55                   	push   %ebp
   1:	89 e5                	mov    %esp,%ebp
   3:	56                   	push   %esi
   4:	53                   	push   %ebx
   5:	bb 5a 09 00 00       	mov    $0x95a,%ebx
   a:	b9 01 00 00 00       	mov    $0x1,%ecx
   f:	be 56 55 55 55       	mov    $0x55555556,%esi
  14:	89 c8                	mov    %ecx,%eax
  16:	f7 ee                	imul   %esi
  18:	89 c8                	mov    %ecx,%eax
  1a:	c1 f8 1f             	sar    $0x1f,%eax
  1d:	29 c2                	sub    %eax,%edx
  1f:	21 da                	and    %ebx,%edx
  21:	41                   	inc    %ecx
  22:	83 eb 08             	sub    $0x8,%ebx
  25:	81 f9 41 0d 03 00    	cmp    $0x30d41,%ecx
  2b:	75 e7                	jne    14 <__ZN4Test7testIntEv+0x14>
  2d:	8b 45 08             	mov    0x8(%ebp),%eax
  30:	89 50 04             	mov    %edx,0x4(%eax)
  33:	5b                   	pop    %ebx
  34:	5e                   	pop    %esi
  35:	c9                   	leave  
  36:	c3                   	ret    
  37:	90                   	nop

00000038 <__ZN4Test6doTestEv>:
  38:	55                   	push   %ebp
  39:	89 e5                	mov    %esp,%ebp
  3b:	53                   	push   %ebx
  3c:	83 ec 14             	sub    $0x14,%esp
  3f:	8b 5d 08             	mov    0x8(%ebp),%ebx
  42:	8b 03                	mov    (%ebx),%eax
  44:	89 1c 24             	mov    %ebx,(%esp)
  47:	ff 10                	call   *(%eax)
  49:	8b 03                	mov    (%ebx),%eax
  4b:	89 5d 08             	mov    %ebx,0x8(%ebp)
  4e:	8b 40 04             	mov    0x4(%eax),%eax
  51:	83 c4 14             	add    $0x14,%esp
  54:	5b                   	pop    %ebx
  55:	c9                   	leave  
  56:	ff e0                	jmp    *%eax

00000058 <___tcf_0>:
  58:	55                   	push   %ebp
  59:	89 e5                	mov    %esp,%ebp
  5b:	83 ec 18             	sub    $0x18,%esp
  5e:	c7 04 24 00 00 00 00 	movl   $0x0,(%esp)
  65:	e8 00 00 00 00       	call   6a <___tcf_0+0x12>
  6a:	c9                   	leave  
  6b:	c3                   	ret    

0000006c <__ZN4Test10testDoubleEv>:
  6c:	55                   	push   %ebp
  6d:	89 e5                	mov    %esp,%ebp
  6f:	53                   	push   %ebx
  70:	83 ec 24             	sub    $0x24,%esp
  73:	dd 05 18 00 00 00    	fldl   0x18
  79:	bb 01 00 00 00       	mov    $0x1,%ebx
  7e:	eb 10                	jmp    90 <__ZN4Test10testDoubleEv+0x24>
  80:	dd d8                	fstp   %st(0)
  82:	89 5d f4             	mov    %ebx,-0xc(%ebp)
  85:	db 45 f4             	fildl  -0xc(%ebp)
  88:	dd 1c 24             	fstpl  (%esp)
  8b:	e8 00 00 00 00       	call   90 <__ZN4Test10testDoubleEv+0x24>
  90:	8d 04 9d 00 00 00 00 	lea    0x0(,%ebx,4),%eax
  97:	dd 05 20 00 00 00    	fldl   0x20
  9d:	50                   	push   %eax
  9e:	da 04 24             	fiaddl (%esp)
  a1:	83 c4 04             	add    $0x4,%esp
  a4:	de c9                	fmulp  %st,%st(1)
  a6:	43                   	inc    %ebx
  a7:	81 fb 51 c3 00 00    	cmp    $0xc351,%ebx
  ad:	75 d1                	jne    80 <__ZN4Test10testDoubleEv+0x14>
  af:	8b 45 08             	mov    0x8(%ebp),%eax
  b2:	dd 58 08             	fstpl  0x8(%eax)
  b5:	83 c4 24             	add    $0x24,%esp
  b8:	5b                   	pop    %ebx
  b9:	c9                   	leave  
  ba:	c3                   	ret    
  bb:	90                   	nop

000000bc <__ZN4TestC1Ev>:
  bc:	55                   	push   %ebp
  bd:	89 e5                	mov    %esp,%ebp
  bf:	8b 45 08             	mov    0x8(%ebp),%eax
  c2:	c7 00 08 00 00 00    	movl   $0x8,(%eax)
  c8:	c7 40 04 00 00 00 00 	movl   $0x0,0x4(%eax)
  cf:	d9 ee                	fldz   
  d1:	dd 58 08             	fstpl  0x8(%eax)
  d4:	c9                   	leave  
  d5:	c3                   	ret    
  d6:	66 90                	xchg   %ax,%ax

000000d8 <_main>:
  d8:	55                   	push   %ebp
  d9:	89 e5                	mov    %esp,%ebp
  db:	83 e4 f0             	and    $0xfffffff0,%esp
  de:	57                   	push   %edi
  df:	56                   	push   %esi
  e0:	53                   	push   %ebx
  e1:	83 ec 14             	sub    $0x14,%esp
  e4:	e8 00 00 00 00       	call   e9 <_main+0x11>
  e9:	c7 44 24 08 0d 00 00 	movl   $0xd,0x8(%esp)
  f0:	00 
  f1:	c7 44 24 04 00 00 00 	movl   $0x0,0x4(%esp)
  f8:	00 
  f9:	c7 04 24 00 00 00 00 	movl   $0x0,(%esp)
 100:	e8 00 00 00 00       	call   105 <_main+0x2d>
 105:	a1 00 00 00 00       	mov    0x0,%eax
 10a:	8b 40 f4             	mov    -0xc(%eax),%eax
 10d:	8b 98 7c 00 00 00    	mov    0x7c(%eax),%ebx
 113:	85 db                	test   %ebx,%ebx
 115:	0f 84 95 01 00 00    	je     2b0 <_main+0x1d8>
 11b:	80 7b 1c 00          	cmpb   $0x0,0x1c(%ebx)
 11f:	0f 84 37 01 00 00    	je     25c <_main+0x184>
 125:	8a 43 27             	mov    0x27(%ebx),%al
 128:	0f be c0             	movsbl %al,%eax
 12b:	89 44 24 04          	mov    %eax,0x4(%esp)
 12f:	c7 04 24 00 00 00 00 	movl   $0x0,(%esp)
 136:	e8 00 00 00 00       	call   13b <_main+0x63>
 13b:	89 04 24             	mov    %eax,(%esp)
 13e:	e8 00 00 00 00       	call   143 <_main+0x6b>
 143:	e8 00 00 00 00       	call   148 <_main+0x70>
 148:	89 c6                	mov    %eax,%esi
 14a:	c7 04 24 10 00 00 00 	movl   $0x10,(%esp)
 151:	e8 00 00 00 00       	call   156 <_main+0x7e>
 156:	89 c7                	mov    %eax,%edi
 158:	c7 00 08 00 00 00    	movl   $0x8,(%eax)
 15e:	c7 40 04 00 00 00 00 	movl   $0x0,0x4(%eax)
 165:	c7 40 08 00 00 00 00 	movl   $0x0,0x8(%eax)
 16c:	c7 40 0c 00 00 00 00 	movl   $0x0,0xc(%eax)
 173:	bb 10 27 00 00       	mov    $0x2710,%ebx
 178:	8b 07                	mov    (%edi),%eax
 17a:	89 3c 24             	mov    %edi,(%esp)
 17d:	ff 50 08             	call   *0x8(%eax)
 180:	4b                   	dec    %ebx
 181:	75 f5                	jne    178 <_main+0xa0>
 183:	89 3c 24             	mov    %edi,(%esp)
 186:	e8 00 00 00 00       	call   18b <_main+0xb3>
 18b:	c7 44 24 04 10 27 00 	movl   $0x2710,0x4(%esp)
 192:	00 
 193:	c7 04 24 00 00 00 00 	movl   $0x0,(%esp)
 19a:	e8 00 00 00 00       	call   19f <_main+0xc7>
 19f:	89 c3                	mov    %eax,%ebx
 1a1:	8b 00                	mov    (%eax),%eax
 1a3:	8b 40 f4             	mov    -0xc(%eax),%eax
 1a6:	8b 7c 03 7c          	mov    0x7c(%ebx,%eax,1),%edi
 1aa:	85 ff                	test   %edi,%edi
 1ac:	0f 84 fe 00 00 00    	je     2b0 <_main+0x1d8>
 1b2:	80 7f 1c 00          	cmpb   $0x0,0x1c(%edi)
 1b6:	0f 84 d7 00 00 00    	je     293 <_main+0x1bb>
 1bc:	8a 47 27             	mov    0x27(%edi),%al
 1bf:	0f be c0             	movsbl %al,%eax
 1c2:	89 44 24 04          	mov    %eax,0x4(%esp)
 1c6:	89 1c 24             	mov    %ebx,(%esp)
 1c9:	e8 00 00 00 00       	call   1ce <_main+0xf6>
 1ce:	89 04 24             	mov    %eax,(%esp)
 1d1:	e8 00 00 00 00       	call   1d6 <_main+0xfe>
 1d6:	e8 00 00 00 00       	call   1db <_main+0x103>
 1db:	89 c3                	mov    %eax,%ebx
 1dd:	c7 44 24 08 05 00 00 	movl   $0x5,0x8(%esp)
 1e4:	00 
 1e5:	c7 44 24 04 0e 00 00 	movl   $0xe,0x4(%esp)
 1ec:	00 
 1ed:	c7 04 24 00 00 00 00 	movl   $0x0,(%esp)
 1f4:	e8 00 00 00 00       	call   1f9 <_main+0x121>
 1f9:	29 f3                	sub    %esi,%ebx
 1fb:	89 5c 24 04          	mov    %ebx,0x4(%esp)
 1ff:	c7 04 24 00 00 00 00 	movl   $0x0,(%esp)
 206:	e8 00 00 00 00       	call   20b <_main+0x133>
 20b:	89 c3                	mov    %eax,%ebx
 20d:	c7 44 24 08 02 00 00 	movl   $0x2,0x8(%esp)
 214:	00 
 215:	c7 44 24 04 14 00 00 	movl   $0x14,0x4(%esp)
 21c:	00 
 21d:	89 04 24             	mov    %eax,(%esp)
 220:	e8 00 00 00 00       	call   225 <_main+0x14d>
 225:	8b 03                	mov    (%ebx),%eax
 227:	8b 40 f4             	mov    -0xc(%eax),%eax
 22a:	8b 74 03 7c          	mov    0x7c(%ebx,%eax,1),%esi
 22e:	85 f6                	test   %esi,%esi
 230:	74 7e                	je     2b0 <_main+0x1d8>
 232:	80 7e 1c 00          	cmpb   $0x0,0x1c(%esi)
 236:	74 41                	je     279 <_main+0x1a1>
 238:	8a 46 27             	mov    0x27(%esi),%al
 23b:	0f be c0             	movsbl %al,%eax
 23e:	89 44 24 04          	mov    %eax,0x4(%esp)
 242:	89 1c 24             	mov    %ebx,(%esp)
 245:	e8 00 00 00 00       	call   24a <_main+0x172>
 24a:	89 04 24             	mov    %eax,(%esp)
 24d:	e8 00 00 00 00       	call   252 <_main+0x17a>
 252:	31 c0                	xor    %eax,%eax
 254:	83 c4 14             	add    $0x14,%esp
 257:	5b                   	pop    %ebx
 258:	5e                   	pop    %esi
 259:	5f                   	pop    %edi
 25a:	c9                   	leave  
 25b:	c3                   	ret    
 25c:	89 1c 24             	mov    %ebx,(%esp)
 25f:	e8 00 00 00 00       	call   264 <_main+0x18c>
 264:	8b 03                	mov    (%ebx),%eax
 266:	c7 44 24 04 0a 00 00 	movl   $0xa,0x4(%esp)
 26d:	00 
 26e:	89 1c 24             	mov    %ebx,(%esp)
 271:	ff 50 18             	call   *0x18(%eax)
 274:	e9 af fe ff ff       	jmp    128 <_main+0x50>
 279:	89 34 24             	mov    %esi,(%esp)
 27c:	e8 00 00 00 00       	call   281 <_main+0x1a9>
 281:	8b 06                	mov    (%esi),%eax
 283:	c7 44 24 04 0a 00 00 	movl   $0xa,0x4(%esp)
 28a:	00 
 28b:	89 34 24             	mov    %esi,(%esp)
 28e:	ff 50 18             	call   *0x18(%eax)
 291:	eb a8                	jmp    23b <_main+0x163>
 293:	89 3c 24             	mov    %edi,(%esp)
 296:	e8 00 00 00 00       	call   29b <_main+0x1c3>
 29b:	8b 07                	mov    (%edi),%eax
 29d:	c7 44 24 04 0a 00 00 	movl   $0xa,0x4(%esp)
 2a4:	00 
 2a5:	89 3c 24             	mov    %edi,(%esp)
 2a8:	ff 50 18             	call   *0x18(%eax)
 2ab:	e9 0f ff ff ff       	jmp    1bf <_main+0xe7>
 2b0:	e8 00 00 00 00       	call   2b5 <_main+0x1dd>
 2b5:	8d 76 00             	lea    0x0(%esi),%esi

000002b8 <__GLOBAL__I__ZN4TestC2Ev>:
 2b8:	55                   	push   %ebp
 2b9:	89 e5                	mov    %esp,%ebp
 2bb:	83 ec 18             	sub    $0x18,%esp
 2be:	c7 04 24 00 00 00 00 	movl   $0x0,(%esp)
 2c5:	e8 00 00 00 00       	call   2ca <__GLOBAL__I__ZN4TestC2Ev+0x12>
 2ca:	c7 04 24 58 00 00 00 	movl   $0x58,(%esp)
 2d1:	e8 00 00 00 00       	call   2d6 <__GLOBAL__I__ZN4TestC2Ev+0x1e>
 2d6:	c9                   	leave  
 2d7:	c3                   	ret    

Disassembly of section .text$_ZN3Foo7testIntEv:

00000000 <__ZN3Foo7testIntEv>:
   0:	55                   	push   %ebp
   1:	89 e5                	mov    %esp,%ebp
   3:	c9                   	leave  
   4:	c3                   	ret    
   5:	90                   	nop
   6:	90                   	nop
   7:	90                   	nop

Disassembly of section .text$_ZN3Foo10testDoubleEv:

00000000 <__ZN3Foo10testDoubleEv>:
   0:	55                   	push   %ebp
   1:	89 e5                	mov    %esp,%ebp
   3:	c9                   	leave  
   4:	c3                   	ret    
   5:	90                   	nop
   6:	90                   	nop
   7:	90                   	nop

Disassembly of section .text$_ZN3Foo6doTestEv:

00000000 <__ZN3Foo6doTestEv>:
   0:	55                   	push   %ebp
   1:	89 e5                	mov    %esp,%ebp
   3:	c9                   	leave  
   4:	c3                   	ret    
   5:	90                   	nop
   6:	90                   	nop
   7:	90                   	nop

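Take a look if you're interested. I gave Test a parent class, Foo.
All of the subclass functions were pulled out (inlined). As for the differences between the Java and C editions of BDB, or the performance of OCI, OCCI and JDBC, verify those yourself if you're interested.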
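#47 DOCDOC 2011-05-26
ppgunjack wrote:
A lot of this thread reads like nonsense to me.
"A database written in Java is nearly 600 times faster than one written in C++!"
In Oracle there is no real difference between C++ and C when accessing the database, and both are an order of magnitude ahead of JDBC.
The 2006 article's claim that inlining works poorly is simply wrong; I just tried it on gcc 4.5.2.
With -O3, calls to subclass methods through a parent-class pointer and calls to the parent's own methods were both inlined correctly.

Hmm, well said. But if you think it's all nonsense, why not get your hands dirty and verify and analyze it yourself?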
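#46 ppgunjack 2011-05-26
A lot of this thread reads like nonsense to me.
"A database written in Java is nearly 600 times faster than one written in C++!"
In Oracle there is no real difference between C++ and C when accessing the database, and both are an order of magnitude ahead of JDBC.
The 2006 article's claim that inlining works poorly is simply wrong; I just tried it on gcc 4.5.2.
With -O3, calls to subclass methods through a parent-class pointer and calls to the parent's own methods were both inlined correctly.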
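
Editor's sketch of the kind of code the comment above describes: a call made through a parent-class pointer where the optimizer can see the dynamic type. The names `Foo`, `Test`, `step` and `run_via_base` are illustrative assumptions, not the commenter's actual source, and whether a given compiler really devirtualizes and inlines the call depends on its version and flags; inspecting the output of `objdump -d` is the way to check.

```cpp
#include <cstdint>

// Hypothetical stand-ins for the Foo/Test pair mentioned in the thread.
struct Foo {
    virtual ~Foo() = default;
    virtual std::uint32_t step(std::uint32_t i) const { return i; }
};

struct Test final : Foo {
    // Same integer expression as the original testInt(), on unsigned values
    // so that wrap-around is well defined.
    std::uint32_t step(std::uint32_t i) const override {
        return (~(i * 7u) + 0x963u - i) & (i / 3u);
    }
};

// The dynamic type of `obj` is visible in this function, so an optimizer is
// permitted to replace the virtual call with a direct, inlined one.
inline std::uint32_t run_via_base(std::uint32_t n) {
    Test obj;
    const Foo* base = &obj;        // call goes through the parent-class pointer
    std::uint32_t acc = 0;
    for (std::uint32_t i = 1; i <= n; ++i) {
        acc ^= base->step(i);      // accumulate so the loop has an observable result
    }
    return acc;
}
```

Compiling this with and without `-O3` and comparing the disassembly of `run_via_base` shows whether an indirect `call *0x8(%eax)`-style instruction remains.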
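#45 eisenwolf 2011-05-26
Since the JDK moved to 1.6, its speed has doubled yet again. Personally I think C++ now only suits a few specialized fields... but it isn't as fast as C either, so its niche will only keep shrinking...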
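#44 eisenwolf 2011-05-26
ppgunjack wrote:
I/O: can Java really be faster than C++ linked against the CRT?
With large numbers of random new/delete operations, C++ can manage allocation itself with a memory pool, and the resulting speedup is more than Java can match, to say nothing of GC.
The simplest way to compare overall Java and C++ performance is to compare the drivers for each language version of a database: Oracle, BDB, MongoDB. These drivers directly reflect how well each platform runs code.
If Java wants to compete on efficiency, it has to avoid synchronous operations on I/O devices and the use of JNI, or else amortize the cost of frequent JNI calls over large one-shot batches.

Fine. There's no need for a pointless argument.

You can read the explanations given by some experts and authoritative organizations.

http://blog.csdn.net/yongzhewuwei_2008/archive/2006/11/16/1387476.aspx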
#43 Anddy 2011-05-26
RednaxelaFX wrote:
hmm, microbenchmarks...
   ......

Remember to run it on JDK 6u25 with: java -server -XX:InlineSmallCode=2000 -XX:+AggressiveOpts PerformTest

Then, as the poster below said, try IBM JDK 6 if you can. Run it several times and you'll notice something >_<


I ran it on two versions and found little difference between them.

jdk1.6.0_25/bin/./java -server -XX:InlineSmallCode=2000 -XX:+AggressiveOpts

On JDK 1.6.0_25:
  start test...
  Program run duration: 86146.587816 ms.
  start test...
  Program run duration: 82345.412324 ms.

On JDK 1.6.0_21:
  start test...
  Program run duration: 85105.833883 ms.

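#42 ppgunjack 2011-05-25
I/O: can Java really be faster than C++ linked against the CRT?
With large numbers of random new/delete operations, C++ can manage allocation itself with a memory pool, and the resulting speedup is more than Java can match, to say nothing of GC.
The simplest way to compare overall Java and C++ performance is to compare the drivers for each language version of a database: Oracle, BDB, MongoDB. These drivers directly reflect how well each platform runs code.
If Java wants to compete on efficiency, it has to avoid synchronous operations on I/O devices and the use of JNI, or else amortize the cost of frequent JNI calls over large one-shot batches.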
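
Editor's sketch of the "memory pool" idea mentioned above, assuming the simplest possible form: a fixed number of equally sized slots threaded onto a free list. `FixedPool`, `create` and `destroy` are hypothetical names; the commenter did not post any pool code, and a production allocator would also need thread safety and growth handling.

```cpp
#include <cstddef>
#include <new>
#include <utility>
#include <vector>

// Minimal fixed-capacity object pool (illustrative only; not thread-safe).
template <typename T>
class FixedPool {
public:
    explicit FixedPool(std::size_t capacity)
        : storage_(capacity), free_head_(nullptr) {
        // Thread every slot onto the free list.
        for (std::size_t i = 0; i < capacity; ++i) {
            storage_[i].next = free_head_;
            free_head_ = &storage_[i];
        }
    }

    FixedPool(const FixedPool&) = delete;             // slots hold raw pointers
    FixedPool& operator=(const FixedPool&) = delete;

    // Construct a T in a free slot; returns nullptr when the pool is exhausted.
    template <typename... Args>
    T* create(Args&&... args) {
        if (free_head_ == nullptr) return nullptr;
        Slot* slot = free_head_;
        free_head_ = slot->next;
        return ::new (static_cast<void*>(slot)) T(std::forward<Args>(args)...);
    }

    // Destroy the object and push its slot back onto the free list.
    void destroy(T* obj) {
        if (obj == nullptr) return;
        obj->~T();
        Slot* slot = reinterpret_cast<Slot*>(obj);
        slot->next = free_head_;
        free_head_ = slot;
    }

private:
    union Slot {
        Slot* next;                                   // valid while the slot is free
        alignas(T) unsigned char bytes[sizeof(T)];    // valid while a T lives here
    };

    std::vector<Slot> storage_;   // one contiguous allocation made up front
    Slot* free_head_;
};
```

In the original benchmark's main loop, `new Test()` and `delete test` would become `pool.create()` and `pool.destroy(test)`, so the loop no longer goes through the general-purpose heap on every iteration.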
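#41 eisenwolf 2011-05-25
Once you've optimized the code, run it on Oracle's JRockit JVM.

In the end you'll find that Java floating-point computation is only about 10% slower than C++. Java's strength is I/O: Java's I/O is faster than C++'s, and C++ I/O needs a great deal of optimization to keep up, which is also why Java network programming is so popular. Java's development productivity is much higher as well.

Java is a half-compiled, half-interpreted language. It is half-compiled because Java dynamically compiles to native code while running, which is done for the sake of cross-platform portability, so Java gets faster the longer it runs.

Of course, many of these points come from professional benchmarking sites abroad. I only have a vague recollection of them and haven't tested any of this myself.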
#40 IcyFenix 2011-05-25
jellyfish wrote:
RednaxelaFX wrote:
jellyfish wrote:
java's sin() is slow comparing to c calls. Several years ago, I saw an article posting the comparing results, my runs came out almost the same results. It was about 60 times slow. Someone did some dig on the c side and found out ms did some optimization using assembly.
On the other hand, after digging the code on the java side, folks initiated a discusion with java's grandfather (you could google his name and sin function), then a debate came out as a big news.

Not sure about the performance of your code. Of course, there is always a hardware acceleration, such as GPU, but I am assuming we are in the context of the normal pc, since that's where I am working on everyday. You miles may vary.

That's "your mileage may vary". Learn, YMMV.

It's bad micro benchmarks that give people the wrong impression that Java is slow. A Java program may be slower than a well-tuned C program in a real world scenario, but that's got nothing to do with what those bad benchmarks are trying to tell you.

Go ahead and disassemble your /lib64/libm.so.6 equivalent, and see what "<sin>" turns into for yourself.

As for articles on Java's floating point arithmetic, I guess you're referring to something like this: How Java's Floating-Point Hurts Everyone Everywhere. That's from more than a decade ago, and the statements in it don't hold anymore now.

That reference is really a wild guess. NO, it's not.

http://blogs.oracle.com/jag/entry/transcendental_meditation

a simple google on "java sin cos slow" can generate a lot of interesting entries, especially in the game arena, so it should be classified as "well known".

While I am saying java sin() is slower than C version, I am not saying in general java is slow, did I?

In fact, if you can make java sin() as closely fast as C, I would take it. I did quite some coding on how to make those special functions as fast as possible, such as gamma and log gamma. It's just hard. It's so hard that sometimes people accept the inaccuracy as the cost.

I've done a lot performance tunings as well, and have seen so many cases for premature optimization. The most common case is that people don't understand the problem itself and still try to optimize/profile it.


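The gist of James Gosling's article is this: so that values outside the range [-pi/4, pi/4] still yield results from the platform's floating-point unit that meet the accuracy required by the JVM specification, Java first applies a precision-preserving step and only then uses the fsin instruction. That makes it more accurate than C++, but also somewhat slower.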
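
Editor's sketch of what "reduce first, then evaluate" means, written in C++ for consistency with the thread's other code. This is not HotSpot's implementation: the function name `sin_with_reduction` is an assumption, the reduction below uses a single double-precision constant for pi/2 (real implementations carry extra bits of pi precisely because this naive subtraction loses accuracy for large arguments), and `std::sin`/`std::cos` stand in for the hardware instruction.

```cpp
#include <cmath>

// Reduce x to r in roughly [-pi/4, pi/4] plus a quadrant index, then evaluate
// sin or cos on the small argument. Naive reduction: illustration only.
inline double sin_with_reduction(double x) {
    if (!std::isfinite(x)) return std::sin(x);      // let the library handle inf/NaN

    const double half_pi = 1.57079632679489661923;
    const double k = std::nearbyint(x / half_pi);   // nearest multiple of pi/2
    const double r = x - k * half_pi;               // small remainder

    int quadrant = static_cast<int>(std::fmod(k, 4.0));
    if (quadrant < 0) quadrant += 4;

    // x = r + k*(pi/2), so sin(x) cycles through sin, cos, -sin, -cos.
    switch (quadrant) {
        case 0:  return  std::sin(r);
        case 1:  return  std::cos(r);
        case 2:  return -std::sin(r);
        default: return -std::cos(r);
    }
}
```

The extra division, rounding and branch are the sort of per-call overhead the comment refers to; the accuracy benefit only appears when the reduction itself is done with more precision than shown here.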

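This topic has already drifted from comparing C++ and Java performance to comparing the C++ and Java implementations of sin(). I suspect most readers would rather hear an assessment of C++ and Java performance that steps outside the OP's test program. I've dug the pit; 撒迦, please jump in. *evil grin*...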
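#39 kakaluyi 2011-05-25
The test method isn't a fair comparison.
OP, why not test Java calling a local Java component over RMI against C++ calling a local Java component through a web service, and see which is faster?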
#38 jellyfish 2011-05-25
RednaxelaFX wrote:
jellyfish wrote:
java's sin() is slow comparing to c calls. Several years ago, I saw an article posting the comparing results, my runs came out almost the same results. It was about 60 times slow. Someone did some dig on the c side and found out ms did some optimization using assembly.
On the other hand, after digging the code on the java side, folks initiated a discusion with java's grandfather (you could google his name and sin function), then a debate came out as a big news.

Not sure about the performance of your code. Of course, there is always a hardware acceleration, such as GPU, but I am assuming we are in the context of the normal pc, since that's where I am working on everyday. You miles may vary.

That's "your mileage may vary". Learn, YMMV.

It's bad micro benchmarks that give people the wrong impression that Java is slow. A Java program may be slower than a well-tuned C program in a real world scenario, but that's got nothing to do with what those bad benchmarks are trying to tell you.

Go ahead and disassemble your /lib64/libm.so.6 equivalent, and see what "<sin>" turns into for yourself.

As for articles on Java's floating point arithmetic, I guess you're referring to something like this: How Java's Floating-Point Hurts Everyone Everywhere. That's from more than a decade ago, and the statements in it don't hold anymore now.

That reference is really a wild guess. NO, it's not.

http://blogs.oracle.com/jag/entry/transcendental_meditation

a simple google on "java sin cos slow" can generate a lot of interesting entries, especially in the game arena, so it should be classified as "well known".

While I am saying java sin() is slower than C version, I am not saying in general java is slow, did I?

In fact, if you can make java sin() as closely fast as C, I would take it. I did quite some coding on how to make those special functions as fast as possible, such as gamma and log gamma. It's just hard. It's so hard that sometimes people accept the inaccuracy as the cost.

I've done a lot performance tunings as well, and have seen so many cases for premature optimization. The most common case is that people don't understand the problem itself and still try to optimize/profile it.
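#37 ppgunjack 2011-05-24
I tried it at home with my JVM and gcc. Java came out slightly faster than the code built with gcc -O2 or -O3.
Java(TM) SE Runtime Environment (build 1.6.0_21-b06)
Java HotSpot(TM) 64-Bit Server VM (build 17.0-b16, mixed mode)
gcc (GCC) 4.5.2

gcc: 47641 ms (-O3, release)
java: 45515 ms

At -O3, gcc also inlines doTest, testInt and testDouble. If the test object is created on the stack, -O3 skips doTest entirely and doesn't even run the loop; apart from the system-call code, all the code that has no effect is stripped out.

For purely algorithmic code with nothing redundant in it, Java differs little from gcc at either -O1 or -O3; Java and C++ are very close in speed.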
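
Editor's sketch of one common way to stop an optimizer from discarding benchmark work, as described in the comment above: make every iteration feed a result that the caller actually uses. `kernels`, `timed_run` and `Result` are hypothetical names, and `std::chrono::steady_clock` is substituted here for the original post's `QueryPerformanceCounter` so the example is portable; neither choice comes from the thread.

```cpp
#include <chrono>
#include <cmath>
#include <cstdint>

struct Result {
    double checksum;  // consumed by the caller, so the work cannot be dropped
    double millis;    // elapsed wall-clock time
};

// Same arithmetic as the original testInt()/testDouble(), but accumulated
// instead of overwritten, so no iteration is dead.
inline double kernels(std::uint32_t int_count, int dou_count) {
    std::uint32_t acc_i = 0;
    for (std::uint32_t i = 1; i <= int_count; ++i) {
        acc_i += (~(i * 7u) + 0x963u - i) & (i / 3u);
    }
    double acc_d = 0.0;
    for (int i = 1; i <= dou_count; ++i) {
        acc_d += ((i << 2) + 0.36954) * std::sin(static_cast<double>(i));
    }
    return acc_d + static_cast<double>(acc_i);
}

// Time `rounds` repetitions and return both the time and the checksum.
inline Result timed_run(int rounds, std::uint32_t int_count, int dou_count) {
    using clock = std::chrono::steady_clock;
    const auto start = clock::now();
    double checksum = 0.0;
    for (int r = 0; r < rounds; ++r) {
        checksum += kernels(int_count, dou_count);
    }
    const auto stop = clock::now();
    const std::chrono::duration<double, std::milli> elapsed = stop - start;
    return Result{checksum, elapsed.count()};
}
```

Printing `checksum` alongside `millis` gives the compiler an observable use of the result, and also lets the C++ and Java versions be checked against each other for identical output.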

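#36 RednaxelaFX 2011-05-24
yangyi wrote:
So, given that JVMs are released per operating system and can be tuned through native calls, the choice of JVM implementation, parameters and so on, is there really still a performance loss caused by being cross-platform? The first thing that comes to mind is Swing; has that become a myth too?

Not quite. It depends on what you compare against when you talk about "performance loss", and on how you view the efficiency of the whole system across its various aspects.

First, parameter tuning can't solve every performance problem. Once you have chosen a particular JVM implementation, performance is bounded by its limits.
Second, for a Java program on the JVM to reach the same class of speed as C and similar languages, it usually has to consume more memory. This footprint problem can't be waved away in every scenario.
Finally, and I think most importantly, there is the Java programmer: because Java hides the low-level details, many Java programmers don't weigh every line of code carefully as they write it. Many people who write C, by contrast, care a great deal about every bit of performance, and things are also more controllable there. The result is that Java makes so-called business code easier to write, but performance is often lost, unnoticed, in code that wasn't thought through. That isn't necessarily a bad thing: many Java programs don't need to be all that fast anyway, so of course it's nicer to save the effort.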
