SilenceGG
Primitive types to String conversion and String concatenation

Primitive types to String conversion

 

From time to time you need to build a string from several values, some of which are of primitive types. If the concatenation starts with two or more primitive values, you must explicitly convert the first of them to a string; otherwise System.out.println( 1 + 'a' ) prints '98' rather than '1a', because the two operands are added as numbers. There is a family of String.valueOf methods for this (and the corresponding wrapper type methods), but who needs them if another way requires less typing?

 

 

 

 

Concatenating an empty string literal with the first primitive value (in our example, "" + 1) is the easiest idea. The result of this expression is a String, so any primitive values that follow can be concatenated safely: the compiler takes care of all the implicit conversions to String.
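The difference between the three spellings can be checked with a small sketch. The class and method names here are illustrative only and do not come from the original article.

```java
final class PrimitiveConcatDemo {
    // Both operands are numeric, so '+' is integer addition: 1 + 97.
    static String numericPlus() {
        return String.valueOf(1 + 'a');
    }

    // The leading "" turns every following '+' into string concatenation.
    static String emptyStringPrefix() {
        return "" + 1 + 'a';
    }

    // Explicit conversion of the first value; the char is then appended as text.
    static String explicitValueOf() {
        return String.valueOf(1) + 'a';
    }

    public static void main(String[] args) {
        System.out.println(numericPlus());       // 98
        System.out.println(emptyStringPrefix()); // 1a
        System.out.println(explicitValueOf());   // 1a
    }
}
```

The last two forms produce the same text; they differ only in what gets allocated along the way.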

 

 

Unfortunately, this is the worst way one can imagine. To understand why, we need to review how the string concatenation operator is translated in Java. Suppose we have a String value (it does not matter what kind: a literal, a variable or a method call) followed by the + operator and an expression of any type:

     String_exp +  any_exp

 

The Java compiler translates it to:

     new StringBuilder().append(String_exp).append(any_exp).toString(); 

 

If the expression contains more than one + operator, you end up with several StringBuilder.append calls before the final toString call.
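As a rough illustration of that translation for the Java 6 and 7 compilers discussed here, the two methods below should return the same text. The class, method and parameter names are made up for this sketch.

```java
final class ConcatDesugarDemo {
    // What the programmer writes.
    static String withPlus(String label, int count, char suffix) {
        return label + count + suffix;
    }

    // Hand-written equivalent of the chain described above:
    // one StringBuilder, one append per operand, one toString at the end.
    static String desugared(String label, int count, char suffix) {
        return new StringBuilder()
                .append(label)
                .append(count)
                .append(suffix)
                .toString();
    }

    public static void main(String[] args) {
        System.out.println(withPlus("items: ", 3, '!'));
        System.out.println(desugared("items: ", 3, '!'));
    }
}
```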

 

The no-argument StringBuilder() constructor allocates a buffer of 16 characters. Appending up to 16 characters to that StringBuilder therefore requires no buffer reallocation, but appending more than 16 forces the buffer to expand. Finally, the StringBuilder.toString() call creates a new String object holding a copy of the StringBuilder buffer.
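The buffer sizes can be observed through StringBuilder.capacity(). The sketch below uses illustrative names. It relies only on the documented initial capacities (16 for the no-argument constructor, 16 plus the argument length for the String constructor) and makes no claim about the exact size after growth.

```java
final class BuilderCapacityDemo {
    // Capacity of a freshly created, empty builder.
    static int defaultCapacity() {
        return new StringBuilder().capacity();
    }

    // Capacity of a builder created from an initial string.
    static int capacityFromString(String initial) {
        return new StringBuilder(initial).capacity();
    }

    // Capacity after appending the given number of characters one by one.
    static int capacityAfterAppending(int chars) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < chars; i++) {
            sb.append('x');
        }
        return sb.capacity();
    }

    public static void main(String[] args) {
        System.out.println(defaultCapacity());          // 16
        System.out.println(capacityFromString("abc"));  // 19
        System.out.println(capacityAfterAppending(16)); // 16
        System.out.println(capacityAfterAppending(17) > 16); // true
    }
}
```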

 

 

This means that, in the worst case, converting a single primitive value to a String allocates one StringBuilder, one char[ 16 ], one String and one char[] sized to fit the input value. Using one of the String.valueOf methods at least avoids creating the StringBuilder.

 

Sometimes you do not have to convert a primitive value to a String at all. Suppose you are parsing a comma-separated input string. The initial version contained a call like this:

 

    final int nextComma = str.indexOf(",");

or even

    final int nextComma = str.indexOf(',');

 

Later the requirements were extended to support any separator. A straightforward reading of "any" means keeping the separator in a String object and using the String.indexOf(String) method. Suppose the preconfigured separator is stored in the m_separator field. The parsing code may then look like this:

    private static List<String> split( final String str )
    {
        final List<String> res = new ArrayList<String>( 10 );
        int pos, prev = 0;
        while ( ( pos = str.indexOf( m_separator, prev ) ) != -1 )
        {
            res.add( str.substring( prev, pos ) );
            prev = pos + m_separator.length(); // start from next char after separator
        }
        res.add( str.substring( prev ) );
        return res;
    }

 

Later still it turned out that the separator is never longer than a single character. During initialization you replace the String m_separator with a char field (m_separatorChar in the code below) and change its setter accordingly. But you may be tempted to leave the parsing method mostly untouched (why change working code anyway?):

    private static List<String> split2( final String str )
    {
        final List<String> res = new ArrayList<String>( 10 );
        int pos, prev = 0;
        while ( ( pos = str.indexOf("" + m_separatorChar, prev ) ) != -1 )
        {
            res.add( str.substring( prev, pos ) );
            prev = pos + 1; // start from next char after separator
        }
        res.add( str.substring( prev ) );
        return res;
    }

 

As you can see, the indexOf call was updated, but it still creates a string on every call and searches for it. This is wasteful, because there is an overload of the same method that accepts a char instead of a String. Let's use it:

    private static List<String> split3( final String str )
    {
        final List<String> res = new ArrayList<String>( 10 );
        int pos, prev = 0;
        while ( ( pos = str.indexOf( m_separatorChar, prev ) ) != -1 )
        {
            res.add( str.substring( prev, pos ) );
            prev = pos + 1; // start from next char after separator
        }
        res.add( str.substring( prev ) );
        return res;
    }
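For readers who want to run the char-based variant on its own, here is a self-contained sketch. The class name and the SEPARATOR constant are assumptions made for this example: the article keeps the separator in a configurable field instead.

```java
import java.util.ArrayList;
import java.util.List;

final class CharSplitDemo {
    // Assumed fixed separator; stands in for the article's m_separatorChar field.
    private static final char SEPARATOR = ',';

    // Same loop as split3: search by char, no temporary separator string.
    static List<String> split(final String str) {
        final List<String> res = new ArrayList<String>(10);
        int pos, prev = 0;
        while ((pos = str.indexOf(SEPARATOR, prev)) != -1) {
            res.add(str.substring(prev, pos));
            prev = pos + 1; // start from next char after separator
        }
        res.add(str.substring(prev)); // tail after the last separator
        return res;
    }

    public static void main(String[] args) {
        System.out.println(split("abc,def,ghi,jkl,mno,pqr,stu,vwx,yz"));
    }
}
```

Empty fields are preserved, so "a,,b" yields three elements and an empty input yields a single empty string.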

 

For the test, the string "abc,def,ghi,jkl,mno,pqr,stu,vwx,yz" was parsed 10 million times with each of the 3 methods. The running times on Java 6_41 and 7_15 are given below. The Java 7 times are higher for split and split3 because the String.substring method now has linear complexity.

 

 

 

As you can see, this simple refactoring considerably reduces the time spent splitting (split and split2 versus split3).

              split        split2       split3
    Java 6    4.65 sec     10.34 sec    3.8 sec
    Java 7    6.72 sec     8.29 sec     4.37 sec

 

 

 

 

String concatenation 

This article would not be complete without mentioning the two other ways to concatenate strings. The first, rather rarely used, is the String.concat method. Internally it allocates a char[] whose length equals the sum of the lengths of the two strings, copies the string data into it and creates a new String through a private String constructor that does not copy the input char[]. As a result only two objects are created: the String and its internal char[]. Unfortunately, this method is efficient only when you need to concatenate exactly 2 strings.
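A minimal usage sketch of String.concat follows; the class and method names are illustrative. The identity check relies on the documented behaviour that concat returns the receiver itself when the argument is empty.

```java
final class ConcatMethodDemo {
    // Joins exactly two strings with a single call.
    static String join(String left, String right) {
        return left.concat(right);
    }

    public static void main(String[] args) {
        String base = "abc";
        System.out.println(join("ab", "cd"));      // abcd
        System.out.println(join(base, "") == base); // true: nothing is allocated
    }
}
```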

 

 

The third way to concatenate strings is the StringBuilder class with its various append methods. It is definitely the fastest way when you need to concatenate many input values. StringBuilder was introduced in Java 5 as a replacement for the StringBuffer class. The main difference is that StringBuffer is thread-safe, while StringBuilder is not. But how often do you build a string concurrently?

 

As a test, all numbers between 0 and 100,000 were concatenated using String.concat, the + operator and StringBuilder, with code like this:

 

    // first and second options
    String res = "";
    for ( int i = 0; i < ITERS; ++i )
    {
        final String s = Integer.toString( i );
        res = res.concat( s ); // second option: res += s;
    }

    // third option (a separate test, so the variable is named differently here)
    StringBuilder sb = new StringBuilder();
    for ( int i = 0; i < ITERS; ++i )
    {
        final String s = Integer.toString( i );
        sb.append( s );
    }

 

 

    String.concat    +             StringBuilder.append
    10.145 sec       42.677 sec    0.012 sec
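A simplified cost model makes the gap in these timings easier to see. The sketch below counts how many characters are copied when the whole result is rebuilt on every iteration, compared with appending into one growing buffer. It assumes every piece has the same length, which is not true of the actual test (the numbers have 1 to 5 digits), and it ignores buffer regrowth. The names are hypothetical.

```java
final class CopyCountDemo {
    // Rebuilding the result each time: after adding piece k, the new char[]
    // holds everything accumulated so far, so k * pieceLength chars are copied.
    static long copiedByRepeatedConcat(int pieces, int pieceLength) {
        long copied = 0;
        long length = 0;
        for (int i = 0; i < pieces; i++) {
            length += pieceLength;
            copied += length;
        }
        return copied; // pieceLength * pieces * (pieces + 1) / 2
    }

    // One growing buffer: each piece is copied in once.
    // Regrowth is ignored here; it adds only a constant factor.
    static long copiedByBuilder(int pieces, int pieceLength) {
        return (long) pieces * pieceLength;
    }

    public static void main(String[] args) {
        System.out.println(copiedByRepeatedConcat(100000, 5));
        System.out.println(copiedByBuilder(100000, 5));
    }
}
```

The first count grows with the square of the number of pieces, the second linearly.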

 

 

The results are obvious: an O(n) algorithm is of course much faster than the O(n^2) algorithms. But real programs contain a lot of + operators, because they are more convenient. To deal with this, the -XX:+OptimizeStringConcat option was introduced in Java 6 update 20. It was turned on by default somewhere between Java 7_02 and Java 7_15 (and it is still off by default in Java 6_41), so you may have to turn it on explicitly. Like many other -XX options, it is extremely badly documented:

    Optimize String concatenation operations where possible. (Introduced in Java 6 Update 20)

 

 

 

Let's just assume that the Oracle engineers did their best with this option. Anecdotal evidence says that it replaces some of the generated StringBuilder logic with logic similar to the String.concat implementation: it creates a char[] of the right length for all the concatenated values and copies them into that output array, then creates the result String. Nested concatenations such as str1 + ( str2 + str3 ) + str4 are probably supported as well. Running our test with this option turned on confirms that the time for the + operator becomes very close to that of String.concat:

    String.concat    +             StringBuilder.append
    10.19 sec        10.722 sec    0.013 sec

 

 

Let's run one more test for this option. As noted before, the default StringBuilder constructor allocates a 16-character buffer, which is expanded when the 17th character has to be added. Let's append each number between 100 and 100,000 to the string "12345678901234". The resulting strings are 17 to 20 characters long, so the default + operator implementation has to resize its StringBuilder. As a counterexample, a second test explicitly creates a StringBuilder(21) to ensure that its buffer is never resized:

    // first test: default + operator
    final String s = BASE + i;

    // second test: explicitly presized buffer
    final String s = new StringBuilder( 21 ).append( BASE ).append( i ).toString();
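Both variants can be put side by side in a small runnable sketch. BASE is taken from the description above; the class and method names are assumptions for this example. The capacity check shows that a builder created with room for 21 characters keeps that capacity for every input in the tested range.

```java
final class PresizedBuilderDemo {
    static final String BASE = "12345678901234"; // 14 characters, as in the test

    // First test: let the compiler generate the concatenation.
    static String withPlus(int i) {
        return BASE + i;
    }

    // Second test: explicit builder sized so that no regrowth is needed.
    static String withPresizedBuilder(int i) {
        return new StringBuilder(21).append(BASE).append(i).toString();
    }

    // Reports the builder capacity after the same two appends.
    static int capacityAfterBuild(int i) {
        StringBuilder sb = new StringBuilder(21);
        sb.append(BASE).append(i);
        return sb.capacity();
    }

    public static void main(String[] args) {
        System.out.println(withPlus(100).length());     // 17
        System.out.println(withPlus(100000).length());  // 20
        System.out.println(capacityAfterBuild(100000)); // 21
    }
}
```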

 

Without this option, the + implementation takes about 45% longer than the explicit StringBuilder implementation. Turning the option on makes both results equal. More interestingly, even the explicit StringBuilder implementation gets faster with it!

 

    +, turned off    +, turned on    new StringBuilder(21), turned off    new StringBuilder(21), turned on
    0.958 sec        0.494 sec       0.663 sec                            0.494 sec

 

    

Summary

 

Never use concatenation with an empty string "" as a "to string" conversion. Use the appropriate String.valueOf or wrapper type toString(value) methods instead.

 

Whenever possible, use StringBuilder for string concatenation. Check old code and get rid of StringBuffer where possible.

 

 

Use the -XX:+OptimizeStringConcat option, introduced in Java 6 update 20, to improve string concatenation performance. It is turned on by default in recent Java 7 releases, but it is still turned off in Java 6_41.
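For the first recommendation in this summary, the direct conversions look roughly like this. The class and method names are illustrative; the calls themselves are the standard String.valueOf overloads and wrapper toString methods.

```java
final class ToStringAlternatives {
    // String.valueOf has an overload for each primitive type.
    static String fromInt(int value) {
        return String.valueOf(value);
    }

    // The wrapper type offers an equivalent static method.
    static String fromIntViaWrapper(int value) {
        return Integer.toString(value);
    }

    static String fromLong(long value) {
        return Long.toString(value);
    }

    static String fromChar(char value) {
        return String.valueOf(value);
    }

    static String fromBoolean(boolean value) {
        return Boolean.toString(value);
    }

    public static void main(String[] args) {
        System.out.println(fromInt(42));           // 42
        System.out.println(fromIntViaWrapper(42)); // 42
        System.out.println(fromLong(7L));          // 7
        System.out.println(fromChar('a'));         // a
        System.out.println(fromBoolean(true));     // true
    }
}
```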

 

 

 

 

 

 

 

 

 

 

 

  

 

 

  
