- 浏览: 73405 次
- 性别:
- 来自: 北京
文章分类
最新评论
-
george.gu:
lqjacklee 写道怎么解决。。 First: Conf ...
Bad version number in .class file -
lqjacklee:
怎么解决。。
Bad version number in .class file -
flyqantas:
would you pleade left more mate ...
UML Extension
Charset
Charset is a named mapping between sequences of sixteen-bit Unicode code units and sequences of bytes.
The uppercase letters 'A' through 'Z' ('0x41' through '0x5a'), The lowercase letters 'a' through 'z' ('0x61' through '0x7a'), The digits '0' through '9' ('0x30' through '0x39'), The dash character '-' ('0x2d', HYPHEN-MINUS), The period character '.' ('0x2e', FULL STOP), The colon character ':' ('0x3a', COLON), and The underscore character '_' ('0x5f', LOW LINE).
Unicode 4.0
The weakness for previous mapping is some special Characters cannot be represented, like Chinese, Greek cannot be represented. "unicode 4.0 standard" define basic multiple language encoding charset mapping, it is from \u0000 to \uFFFF.
Java save characters in unicode.
/** The value is used for character storage. */ private final char value[];
1 char = 2 bytes.
We can see following information from String javadoc:
A String represents a string in the UTF-16 format in which supplementary characters are represented by surrogate pairs (see the section Unicode Character Representations in the Character class for more information). Index values refer to char code units, so a supplementary character uses two positions in a String. The String class provides methods for dealing with Unicode code points (i.e., characters), in addition to those for dealing with Unicode code units (i.e., char values).
Encoding and Decoding String
String won't keep the Charset information, because all the characters are stored in unicode. Charset is only used when decoding the specified array of bytes to String characters.
String a = "èç"; byte[] b_defaultEncoding = a.getBytes(); // 0xe8, 0xe7 byte[] b_utf8 = a.getBytes("UTF-8"); // 0xc3, 0xa8, 0xc3, 0xa7 byte[] b_ucs2 = a.getBytes("ISO-10646-UCS-2"); // 0x00, 0xe8, 0x00, 0xe7 String a_defaultEncoding = new String(b_defaultEncoding); String a_mix = new String(b_ucs2); String a_ucs2 = new String(b_ucs2, "ISO-10646-UCS-2"); String a_utf8 = new String(b_utf8, "UTF-8"); System.out.println(a_defaultEncoding);//èç System.out.println(a_ucs2);//èç System.out.println(a_utf8);//èç System.out.println(a_mix); // è ç
Default Charset in java
You can specify default Charset with system property: "file.encoding". If not specified, normally "UTF-8" will be used. For more details please refer to Charset.defaultCharset().
Chinese Characters unicode charset
Normally, Simplified Chinese Characters unicode charset from \u4e00 to \u9fa5.
发表评论
-
javax.naming.CommunicationException: remote side declared peer gone on this JVM.
2012-07-11 09:44 2376javax.naming.ServiceUnavailable ... -
Generate special format numbers
2012-04-27 00:06 934DecimalFormat df = new DecimalF ... -
java.util.ConcurrentModificationException
2012-03-20 18:10 1654Where ConcurrentModificationExc ... -
Scheduled ThreadPool Executor suppressed or stopped after error happen
2012-03-20 16:54 1042ScheduledThreadPoolExecutor ... -
Memory Leak analyze
2012-03-12 06:21 924Shallow Heap Size Shallow siz ... -
Scalable and Easy to Integrate (3): Share Data with external Server by LDAP
2012-02-28 05:25 0Share Data with external Server ... -
Scalable and Easy to Integrate (2): Linstener
2012-02-28 05:20 0descript listener pattern in pr ... -
Scalable and Easy to Integrate (1): Callback
2012-02-28 05:19 0Descript the callback method de ... -
Wheels: Common utilities reuse and design
2012-02-18 05:57 831Search-->Copy-->Small ... -
Java Exception Design
2012-02-18 05:36 843Talk something used to happen i ... -
JMX in Weblogic
2012-02-17 19:20 1650Here I would like to list some ... -
Weblogic 11g JSP cache
2012-01-31 00:41 2888Normally, jsp_servlet classes ... -
2011,这样走过!
2011-12-31 06:01 101月11日,琪琪从巴黎 ... -
Security Manager
2011-12-20 21:20 0http://journals.ecs.soton.ac ... -
Unique Identifier generation in Java Application
2011-12-23 06:17 736During application design, ther ... -
WorkManager in Weblogic and Multiple Thread design in Enterprise Application
2011-12-15 05:25 0Descrip how to use workmanager ... -
AOP(3): Spring AOP
2011-12-20 05:24 1012AOP References Please refe ... -
JMX Usage
2011-12-13 06:22 828There is long times since last ... -
Using Taglib to Implement Dynamic UI
2011-12-12 06:13 949What's Taglib? Taglib is a JSP ... -
EJB: Singleton in EJB
2011-12-06 00:58 0When I was using EJB to impleme ...
相关推荐
例如,使用`new String(byte[], encoding)`构造函数: ```java byte[] bytes = ...; // 从文件或流中获取的字节 String str = new String(bytes, "UTF-8"); // 使用UTF-8编码解析字节 ``` 2. 字符串转换为字节数组...
在Java编程语言中,`file.encoding`是一个非常重要的系统属性,它定义了默认的字符编码。这个属性在处理文件输入输出、字符串与字节数组转换时起到关键作用。本文将深入探讨`file.encoding`的设置及其在Java中的工作...
2. **`java.nio`包中的Charset类**:Java标准库提供了`java.nio.charset`包,其中的`Charset`类用于表示字符集,提供对各种字符编码的支持。例如,`StandardCharsets.UTF_8`代表UTF-8编码。 3. **BufferedReader和...
public static void convertEncoding(String sourceFilePath, String targetFilePath, String sourceEncoding, String targetEncoding) throws IOException { File sourceFile = new File(sourceFilePath); File ...
Java 6引入了`java.nio.charset.CharsetDetector`类,它可以检测输入流的字符编码。以下是一个简单的示例: ```java import java.io.FileInputStream; import java.nio.charset.Charset; import java.nio....
在Java中,`java.nio.charset`包提供了处理字符编码的类和接口,例如`Charset`,`CharsetDecoder`和`CharsetEncoder`,它们可以用来读写不同编码的字节流。 `BufferedStream01.java`这个文件名暗示了一个关于缓冲流...
public static void convertEncoding(String sourceFilePath, String targetFilePath, String sourceEncoding, String targetEncoding) throws IOException { File sourceFile = new File(sourceFilePath); File ...
在Java中,String类提供了`getBytes()`方法,用于将字符串转换为字节数组,使用默认的平台编码。若需指定编码,可使用`getBytes(String charsetName)`,如`getBytes("UTF-8")`。 三、文件编码转换 Java的`...
Java还提供了`String`类的`getBytes()`和`new String(byte[], charset)`方法来处理字符集。`getBytes()`会根据默认字符集编码字符串为字节数组,而`new String(byte[], charset)`则可以指定字符集解码字节数组: ``...
在Java中,字符编码主要通过`Charset`类来处理。Java默认使用Unicode(更具体地说是UTF-16)作为内部编码,这允许Java支持世界上几乎所有的字符。然而,在读写文件或与外部系统交互时,可能需要处理其他编码格式,如...
在Java编程中,UTF-8编码是一个非常常见且广泛使用的字符编码格式,它能支持全球大部分语言的字符表示。然而,UTF-8有一个特殊特性,那就是它可以带有Byte Order Mark(BOM),这是一个特殊的字节序列,用于标识数据...
for (String encoding : Charset.availableCharsets().keySet()) { try { System.out.println("检测编码:" + encoding); if (input.equals(input.getBytes(encoding).toString())) { System.out.println("匹配...
entext = new String(Base64.getUrlEncoder().encode(textcomment.getBytes(CHARSET)), CHARSET); } catch (UnsupportedEncodingException e) { // 不会发生 } return entext; } @CrossOrigin @...
因此,当你在Java程序中创建一个 `String` 对象时,它默认就是Unicode格式的。 然而,在实际的应用场景中,我们常常需要处理不同编码格式的数据,如GBK、GB2312等。这就需要进行编码转换。 #### 三、编码转换的...
charset=UTF-8" pageEncoding="UTF-8"%> ``` 这将指定 JSP 输出的编码方式为 UTF-8。同时,也可以使用 request.setCharacterEncoding("UTF-8") 指令来设置请求的编码方式。 3. Tomcat 5.5 中文乱码问题解决方法 ...
String encoding = selectEncoding(request); if (encoding != null) { request.setCharacterEncoding(encoding); } } chain.doFilter(request, response); } public void init(FilterConfig filterConfig)...
public String charset(String input, String sourceEncoding) { try { return new String(input.getBytes(sourceEncoding), "UTF-8"); } catch (UnsupportedEncodingException e) { // 处理编码不支持的异常 ...
描述中提到的"NoSuchMethodError setCharacterEncoding(Ljava/lang/String;)V"是一个Java运行时异常,意味着在类装载时尝试调用的方法在该类的Class文件中存在,但在链接阶段找不到。这通常发生在试图执行的方法在...
Java标准库提供了广泛的字符集支持,通过`java.nio.charset.Charset`类可以访问和操作它们。在Java中,字符串默认使用Unicode(UTF-16)编码,但与其他系统交互时,可能需要进行编码转换。 3. **输入输出流的编码...