Primitive types to String conversion and String concatenation
Primitive types to String conversion
From time to time you need to build a string from several values, some of which are primitives. If the concatenation starts with two or more primitive values, you must explicitly convert the first one to a string; otherwise the leading operands are added numerically, so System.out.println( 1 + 'a' ) prints 98 rather than 1a. There is a family of String.valueOf methods for this (along with the corresponding wrapper type methods), but a shortcut that requires less typing is tempting.
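The difference between numeric addition and string concatenation can be checked with a small sketch. The class and method names below are made up for illustration and are not part of the original article.

```java
class PrimitiveConcatDemo {
    // int + char: the char is promoted to int ('a' is 97), so this is plain arithmetic
    static int arithmeticSum() {
        return 1 + 'a';
    }

    // Converting the first operand makes every following + a string concatenation
    static String viaValueOf() {
        return String.valueOf(1) + 'a';
    }

    // The wrapper type offers the same conversion
    static String viaWrapper() {
        return Integer.toString(1) + 'a';
    }

    public static void main(String[] args) {
        System.out.println(arithmeticSum()); // prints 98
        System.out.println(viaValueOf());    // prints 1a
        System.out.println(viaWrapper());    // prints 1a
    }
}
```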
The easiest idea is to concatenate an empty string literal with the first primitive value (in our example, "" + 1). The result of this expression is a String, so any further primitive values can safely be concatenated to it: the compiler takes care of all implicit conversions to String.
Unfortunately, this is about the most wasteful way to do it. To understand why, we need to review how the string concatenation operator is translated in Java. Suppose we have a String value (it doesn't matter whether it is a literal, a variable or a method call) followed by the + operator and an expression of any type:
String_exp + any_exp
The Java compiler translates it to:
new StringBuilder().append(String_exp).append(any_exp).toString();
If the expression contains more than one + operator, you end up with several StringBuilder.append calls before the final toString call.
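As a rough illustration of that translation, the two methods below should produce the same result. This is a sketch of the behaviour described for the Java 6 and 7 compilers in this article, not the exact bytecode; the class, method and parameter names are hypothetical.

```java
class ConcatDesugarDemo {
    // What the source code says
    static String withPlus(String name, int count, char unit) {
        return name + count + unit;
    }

    // Roughly what the compiler is described as generating:
    // one StringBuilder, one append per operand, one final toString
    static String withBuilder(String name, int count, char unit) {
        return new StringBuilder().append(name).append(count).append(unit).toString();
    }

    public static void main(String[] args) {
        System.out.println(withPlus("size=", 42, 'k'));    // prints size=42k
        System.out.println(withBuilder("size=", 42, 'k')); // prints size=42k
    }
}
```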
The no-argument StringBuilder() constructor used here allocates a buffer with room for 16 characters. Appending up to 16 characters to that StringBuilder therefore requires no reallocation, but appending more forces the buffer to expand. Finally, the StringBuilder.toString() call creates a new String object holding a copy of the StringBuilder buffer.
This means that in the worst case, converting a single primitive value to a String this way allocates one StringBuilder, one char[ 16 ], one String and one char[] sized to fit the input value. Using one of the String.valueOf methods at least avoids creating the StringBuilder.
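The buffer behaviour can be observed through StringBuilder.capacity(). The sketch below uses hypothetical helper names and only shows that the buffer grows once the 17th character is appended; the exact capacity after growth is an implementation detail and is deliberately not asserted.

```java
class BuilderCapacityDemo {
    // Capacity of a freshly created builder (documented as 16)
    static int defaultCapacity() {
        return new StringBuilder().capacity();
    }

    // Capacity after appending the given number of characters to a default builder
    static int capacityAfter(int chars) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < chars; i++) {
            sb.append('x');
        }
        return sb.capacity();
    }

    public static void main(String[] args) {
        System.out.println(defaultCapacity());   // prints 16
        System.out.println(capacityAfter(16));   // prints 16, no reallocation needed
        System.out.println(capacityAfter(17));   // larger than 16, the buffer was expanded
    }
}
```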
Sometimes you don't have to convert a primitive value to a String at all. Suppose you are parsing a comma-separated input string. The initial version contained a call like this:
final int nextComma = str.indexOf(",");
or even
final int nextComma = str.indexOf(',');
Later, the program requirements were extended to support any separator. A straightforward interpretation of "any" means keeping the separator in a String object and using the String.indexOf(String) method. Let's assume the preconfigured separator is stored in the m_separator field. In that case your parsing may look like this:
private static List<String> split( final String str )
{
    final List<String> res = new ArrayList<String>( 10 );
    int pos, prev = 0;
    while ( ( pos = str.indexOf( m_separator, prev ) ) != -1 )
    {
        res.add( str.substring( prev, pos ) );
        prev = pos + m_separator.length(); // start from next char after separator
    }
    res.add( str.substring( prev ) ); // add the tail after the last separator
    return res;
}
But later it turned out that the separator is never longer than a single character. In the initialization you replace String m_separator with char m_separatorChar and change its setter accordingly. You may be tempted, though, to leave the parsing method mostly untouched (why change working code?):
private static List<String> split2( final String str )
{
    final List<String> res = new ArrayList<String>( 10 );
    int pos, prev = 0;
    while ( ( pos = str.indexOf("" + m_separatorChar, prev ) ) != -1 )
    {
        res.add( str.substring( prev, pos ) );
        prev = pos + 1; // start from next char after separator
    }
    res.add( str.substring( prev ) ); // add the tail after the last separator
    return res;
}
As you can see, the indexOf call was updated, but it still creates a string on every call and searches for it. That is wasteful, because there is an overload of the same method that accepts a char instead of a String. Let's use it:
private static List<String> split3( final String str )
{
    final List<String> res = new ArrayList<String>( 10 );
    int pos, prev = 0;
    while ( ( pos = str.indexOf( m_separatorChar, prev ) ) != -1 )
    {
        res.add( str.substring( prev, pos ) );
        prev = pos + 1; // start from next char after separator
    }
    res.add( str.substring( prev ) ); // add the tail after the last separator
    return res;
}
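The snippets above depend on fields that are not shown. A self-contained sketch that passes the separator as a parameter instead (a deviation from the article's field-based code, made only so the example compiles on its own) lets you check that the String-based and char-based versions return the same pieces. The class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class SplitDemo {
    // String-based search, as in the first version of the parser
    static List<String> splitByString(final String str, final String separator) {
        final List<String> res = new ArrayList<String>(10);
        int pos, prev = 0;
        while ((pos = str.indexOf(separator, prev)) != -1) {
            res.add(str.substring(prev, pos));
            prev = pos + separator.length(); // skip the whole separator
        }
        res.add(str.substring(prev)); // tail after the last separator
        return res;
    }

    // char-based search, no temporary String is created for the separator
    static List<String> splitByChar(final String str, final char separator) {
        final List<String> res = new ArrayList<String>(10);
        int pos, prev = 0;
        while ((pos = str.indexOf(separator, prev)) != -1) {
            res.add(str.substring(prev, pos));
            prev = pos + 1; // skip the single separator character
        }
        res.add(str.substring(prev)); // tail after the last separator
        return res;
    }

    public static void main(String[] args) {
        final String input = "abc,def,ghi";
        System.out.println(splitByString(input, ",")); // prints [abc, def, ghi]
        System.out.println(splitByChar(input, ','));   // prints [abc, def, ghi]
    }
}
```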
For the test, the string "abc,def,ghi,jkl,mno,pqr,stu,vwx,yz" was parsed 10 million times with each of the 3 methods. Here are the running times on Java 6u41 and 7u15. The Java 7 times for split and split3 are higher because String.substring now copies the characters and so has linear complexity.
As you may see, this simple refactoring has considerably decreased time spent in splitting ( split/split2 -> split3 ).
          split       split2      split3
Java 6    4.65 sec    10.34 sec   3.8 sec
Java 7    6.72 sec    8.29 sec    4.37 sec
String concatenation
This article would not be complete without mentioning the 2 other ways to concatenate strings. The first one, rather rarely used, is the String.concat method. Internally it allocates a char[] whose length equals the sum of the lengths of the two strings, copies the string data into it and creates a new String with a private String constructor that doesn't copy the input char[]. As a result only two objects are created: the String and its internal char[]. Unfortunately, this method is only efficient when you need to concatenate exactly 2 strings.
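A short sketch of String.concat in use, with hypothetical class and method names. It also shows two documented details that differ from the + operator: concatenating an empty string returns the original object, and a null argument throws a NullPointerException instead of appending the text "null".

```java
class ConcatMethodDemo {
    static String join(String a, String b) {
        return a.concat(b);
    }

    // Returns true when concat("") hands back the very same String object
    static boolean emptyArgumentReturnsSameObject(String s) {
        return s.concat("") == s;
    }

    // Returns true when concat(null) fails with a NullPointerException
    static boolean nullArgumentThrows(String s) {
        try {
            s.concat(null);
            return false;
        } catch (NullPointerException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(join("abc", "def"));                    // prints abcdef
        System.out.println(emptyArgumentReturnsSameObject("abc")); // prints true
        System.out.println(nullArgumentThrows("abc"));             // prints true
    }
}
```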
The third way of concatenating strings is the StringBuilder class with its various append methods. It is definitely the fastest way when you need to concatenate many input values. StringBuilder was introduced in Java 5 as a replacement for the StringBuffer class. The main difference between them is that StringBuffer is thread-safe, while StringBuilder is not. But how often do you build a single string from several threads at once?
As a test, all numbers between 0 and 100,000 were concatenated using String.concat, + operator and StringBuilder using code like this:
String res = "";
for ( int i = 0; i < ITERS; ++i )
{
    final String s = Integer.toString( i );
    res = res.concat( s ); //second option: res += s;
}

//third option:
StringBuilder res = new StringBuilder();
for ( int i = 0; i < ITERS; ++i )
{
    final String s = Integer.toString( i );
    res.append( s );
}
String.concat    + operator    StringBuilder
10.145 sec       42.677 sec    0.012 sec
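For readers who want to repeat the comparison, here is a minimal self-contained sketch of the three loops. The class name, the method names and the reduced iteration count are assumptions made for illustration; this is a naive System.nanoTime timing without warm-up, so the absolute numbers will not match the table above.

```java
class ConcatLoopDemo {
    static String viaConcat(int iters) {
        String res = "";
        for (int i = 0; i < iters; ++i) {
            res = res.concat(Integer.toString(i)); // copies the whole string each time
        }
        return res;
    }

    static String viaPlus(int iters) {
        String res = "";
        for (int i = 0; i < iters; ++i) {
            res += Integer.toString(i); // also copies the whole string each time
        }
        return res;
    }

    static String viaBuilder(int iters) {
        StringBuilder res = new StringBuilder();
        for (int i = 0; i < iters; ++i) {
            res.append(Integer.toString(i)); // appends into a growing buffer
        }
        return res.toString();
    }

    public static void main(String[] args) {
        final int iters = 10000; // much smaller than the article's 100,000

        long start = System.nanoTime();
        viaConcat(iters);
        System.out.println("concat:        " + (System.nanoTime() - start) / 1000000 + " ms");

        start = System.nanoTime();
        viaPlus(iters);
        System.out.println("+ operator:    " + (System.nanoTime() - start) / 1000000 + " ms");

        start = System.nanoTime();
        viaBuilder(iters);
        System.out.println("StringBuilder: " + (System.nanoTime() - start) / 1000000 + " ms");
    }
}
```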
The results are obvious: an O(n) algorithm is of course much faster than O(n^2) algorithms. But in real life we have a lot of + operators in our programs, because they are more convenient. To deal with that, the -XX:+OptimizeStringConcat option was introduced in Java 6 update 20. It became enabled by default somewhere between Java 7u2 and Java 7u15 (and it is still off by default in Java 6u41), so you may have to turn it on explicitly. Like many other -XX options, it is very poorly documented:
Optimize String concatenation operations where possible. (Introduced in Java 6 Update 20)
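Since the default differs between releases, it can help to ask the running JVM what the current setting is. The sketch below assumes a HotSpot JVM on Java 7 or later and uses the HotSpotDiagnosticMXBean from com.sun.management; the class and method names are hypothetical, and on a JVM that does not expose this option the method simply reports "unavailable".

```java
import java.lang.management.ManagementFactory;
import com.sun.management.HotSpotDiagnosticMXBean;

class StringConcatFlagCheck {
    // Returns "true" or "false" as reported by the JVM, or "unavailable"
    // if the bean or the option cannot be queried on this JVM
    static String optimizeStringConcatSetting() {
        try {
            HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
            return bean.getVMOption("OptimizeStringConcat").getValue();
        } catch (RuntimeException e) {
            // e.g. IllegalArgumentException when the option does not exist
            return "unavailable";
        }
    }

    public static void main(String[] args) {
        System.out.println("OptimizeStringConcat = " + optimizeStringConcatSetting());
    }
}
```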
Let's just assume that Oracle engineers did their best with this option. Anecdotal evidence suggests that it replaces some of the generated StringBuilder logic with logic similar to the String.concat implementation: it creates a char[] of the right length for all concatenated values, copies them into that output array and then creates the result String. Nested concatenations are probably supported as well ( str1 + ( str2 + str3 ) + str4 ). Running our test with this option shows that the time for the + operator becomes very close to that of String.concat:
String.concat    + operator    StringBuilder
10.19 sec        10.722 sec    0.013 sec
Let's run one more test for this option. As noted before, the default StringBuilder constructor allocates a 16-character buffer, which is expanded when the 17th character has to be added. Let's append each number between 100 and 100,000 to the string "12345678901234". The resulting strings are 17 to 20 characters long, so the default + operator implementation will require a StringBuilder resize. For comparison, let's run another test in which we explicitly create a StringBuilder(21) to ensure that its buffer never resizes:
final String s = BASE + i;
final String s = new StringBuilder( 21 ).append( BASE ).append( i ).toString();
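A self-contained sketch of the two variants, with hypothetical class and method names; BASE is the 14-character string from the text. The capacity check illustrates why 21 is enough: the longest result has 20 characters, so a builder created with room for 21 never has to grow.

```java
class PresizedBuilderDemo {
    static final String BASE = "12345678901234"; // 14 characters

    // Relies on the compiler-generated concatenation
    static String viaPlus(int i) {
        return BASE + i;
    }

    // Explicit builder with enough room for the longest result
    static String viaPresized(int i) {
        return new StringBuilder(21).append(BASE).append(i).toString();
    }

    // Capacity of the presized builder after both appends
    static int capacityAfterAppend(int i) {
        StringBuilder sb = new StringBuilder(21);
        sb.append(BASE).append(i);
        return sb.capacity();
    }

    public static void main(String[] args) {
        System.out.println(viaPlus(100).length());        // prints 17
        System.out.println(viaPresized(100000).length()); // prints 20
        System.out.println(capacityAfterAppend(100000));  // prints 21, no resize happened
    }
}
```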
Without this option, the time for the + implementation is about 45% higher than the time for the explicit StringBuilder implementation. Turning the option on makes both results equal. More interestingly, even the explicit StringBuilder implementation gets faster with it!
+, turned off    +, turned on    new StringBuilder(21), turned off    new StringBuilder(21), turned on
0.958 sec        0.494 sec       0.663 sec                            0.494 sec
Never use concatenation with an empty string "" as a "to string conversion". Use the appropriate String.valueOf or wrapper type toString(value) method instead.
Whenever possible, use StringBuilder for string concatenation. Check old code and get rid of StringBuffer if possible.
Use the -XX:+OptimizeStringConcat option, introduced in Java 6 update 20, to improve string concatenation performance. It is turned on by default in recent Java 7 releases, but it is still turned off in Java 6u41.
