Alrale

Character Encodings in the Java 2 Platform


Web components usually use PrintWriter to produce responses; by default, the PrintWriter encodes characters using ISO-8859-1. Servlets can also output binary data using OutputStream classes, which perform no encoding. An application whose text contains characters that ISO-8859-1 cannot represent must explicitly set a different encoding.
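
To make the limitation of the default encoding concrete, here is a minimal sketch that uses only JDK classes and no servlet container. The class name DefaultEncodingDemo and the sample string are assumptions made for illustration. It writes the same text through a PrintWriter twice, once encoding with ISO-8859-1 and once with UTF-8, and decodes the bytes again to see what survives.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; it does not use the servlet API.
class DefaultEncodingDemo {

    // Writes text through a PrintWriter that encodes with the given charset
    // and returns the bytes that would be sent to the client.
    static byte[] write(String text, Charset charset) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (PrintWriter writer = new PrintWriter(new OutputStreamWriter(out, charset))) {
            writer.print(text);
        }
        return out.toByteArray();
    }

    // Encodes and then decodes with the same charset to see what survives.
    static String roundTrip(String text, Charset charset) {
        return new String(write(text, charset), charset);
    }

    public static void main(String[] args) {
        String text = "\u65e5\u672c\u8a9e"; // the word "Japanese", written in Japanese

        // ISO-8859-1 has no code points for these characters.
        System.out.println(StandardCharsets.ISO_8859_1.newEncoder().canEncode(text));

        // Unmappable characters are replaced, so the original text is lost.
        System.out.println(roundTrip(text, StandardCharsets.ISO_8859_1));

        // UTF-8 can represent the text, so it survives unchanged.
        System.out.println(roundTrip(text, StandardCharsets.UTF_8).equals(text));
    }
}
```

The writer does not fail when it meets a character it cannot encode; it silently substitutes a replacement, which is why a wrong response encoding typically shows up as question marks in the browser rather than as an error.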

 

For web components, three encodings must be considered:

  • Request

  • Page (JSP pages)

  • Response

Request Encoding

The request encoding is the character encoding in which parameters in an incoming request are interpreted. Currently, many browsers do not send a request encoding qualifier with the Content-Type header. In such cases, a web container will use the default encoding, ISO-8859-1, to parse request data.

If the client hasn’t set a character encoding and the request data is encoded with a different encoding from the default, the data won’t be interpreted correctly. To remedy this situation, you can use the ServletRequest.setCharacterEncoding(String enc) method to override the character encoding supplied by the container.

To control the request encoding from JSP pages, you can use the JSTL fmt:requestEncoding tag.

You must call the method or tag before parsing any request parameters or reading any input from the request. Calling the method or tag once data has been read will not affect the encoding.
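
The effect of a wrong request encoding can be reproduced without a container. The sketch below is an illustration only: the class RequestEncodingDemo and its decodeParam helper are hypothetical, and java.net.URLDecoder stands in for the container's parameter parsing. It decodes the same percent-encoded form value once with the default ISO-8859-1 and once with UTF-8, the encoding the browser is assumed to have used.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Hypothetical demo class; URLDecoder stands in for the container's parser.
class RequestEncodingDemo {

    // Decodes a percent-encoded parameter value as if the request encoding
    // were the named charset.
    static String decodeParam(String raw, String encoding) {
        try {
            return URLDecoder.decode(raw, encoding);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unsupported encoding: " + encoding, e);
        }
    }

    public static void main(String[] args) {
        // A form submitted in UTF-8 sends the letter e-acute as the two
        // bytes C3 A9, so "caf\u00e9" arrives as "caf%C3%A9".
        String raw = "caf%C3%A9";

        // Parsed with the default encoding, each byte becomes its own
        // character and the value is garbled.
        System.out.println(decodeParam(raw, "ISO-8859-1"));

        // Parsed with the encoding the client actually used, the value
        // is recovered.
        System.out.println(decodeParam(raw, "UTF-8"));
    }
}
```

In a servlet, the corresponding fix would be a call such as request.setCharacterEncoding("UTF-8") placed before the first call that reads a parameter, for the reason given above.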

Page Encoding

For JSP pages, the page encoding is the character encoding in which the file is encoded.

For JSP pages in standard syntax, the page encoding is determined from the following sources:

  • The page encoding value of a JSP property group (see Setting Properties for Groups of JSP Pages) whose URL pattern matches the page.

  • The pageEncoding attribute of the page directive of the page. It is a translation-time error to name different encodings in the pageEncoding attribute of the page directive of a JSP page and in a JSP property group.

  • The CHARSET value of the contentType attribute of the page directive.

If none of these is provided, ISO-8859-1 is used as the default page encoding.

 

The pageEncoding and contentType attributes determine the page character encoding of only the file that physically contains the page directive. A web container raises a translation-time error if an unsupported page encoding is specified.
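
The lookup described above can be sketched as a small function. This is not container code: PageEncodingResolver and its method names are hypothetical, and the sketch assumes that the three sources are consulted in the order in which they are listed. A null argument means that the corresponding source is absent.

```java
import java.util.Locale;

// Hypothetical helper that mirrors the rules for standard-syntax JSP pages.
class PageEncodingResolver {

    static final String DEFAULT_ENCODING = "ISO-8859-1";

    // Extracts the charset value from a contentType such as
    // "text/html; charset=UTF-8"; returns null if none is given.
    static String charsetOf(String contentType) {
        if (contentType == null) {
            return null;
        }
        for (String part : contentType.split(";")) {
            String p = part.trim();
            if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String value = p.substring("charset=".length()).trim();
                return value.isEmpty() ? null : value;
            }
        }
        return null;
    }

    // Resolves the page encoding from the three sources, assuming they are
    // consulted in listed order.
    static String resolve(String propertyGroupEncoding,
                          String pageEncodingAttribute,
                          String contentType) {
        // Naming different encodings in the property group and in the
        // pageEncoding attribute is a translation-time error.
        if (propertyGroupEncoding != null && pageEncodingAttribute != null
                && !propertyGroupEncoding.equalsIgnoreCase(pageEncodingAttribute)) {
            throw new IllegalStateException("Conflicting page encodings: "
                    + propertyGroupEncoding + " vs " + pageEncodingAttribute);
        }
        if (propertyGroupEncoding != null) {
            return propertyGroupEncoding;
        }
        if (pageEncodingAttribute != null) {
            return pageEncodingAttribute;
        }
        String charset = charsetOf(contentType);
        return charset != null ? charset : DEFAULT_ENCODING;
    }

    public static void main(String[] args) {
        System.out.println(resolve(null, null, "text/html; charset=UTF-8"));
        System.out.println(resolve(null, "Shift_JIS", "text/html; charset=UTF-8"));
        System.out.println(resolve(null, null, "text/html"));
    }
}
```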

Response Encoding

The response encoding is the character encoding of the textual response generated by a web component. The response encoding must be set appropriately so that the characters are rendered correctly for a given locale. A web container sets an initial response encoding for a JSP page from the following sources:

  • The CHARSET value of the contentType attribute of the page directive

  • The encoding specified by the pageEncoding attribute of the page directive

  • The page encoding value of a JSP property group whose URL pattern matches the page

If none of these is provided, ISO-8859-1 is used as the default response encoding.

The setCharacterEncoding, setContentType, and setLocale methods can be called repeatedly to change the character encoding. Calls made after the servlet response’s getWriter method has been called or after the response is committed have no effect on the character encoding. Data is sent to the response stream on buffer flushes (for buffered pages) or on encountering the first content on unbuffered pages.

 

Calls to setContentType set the character encoding only if the given content type string provides a value for the charset attribute. Calls to setLocale set the character encoding only if neither setCharacterEncoding nor setContentType has set the character encoding before. To control the response encoding from JSP pages, you can use the JSTL fmt:setLocale tag.
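
The interplay of the three methods is easier to follow in a toy model. The class below is a hypothetical stand-in and not the real ServletResponse: it tracks only the character encoding, it takes the encoding for a locale as a plain string instead of looking it up, and it models only the getWriter cutoff and not a committed response.

```java
import java.util.Locale;

// Hypothetical model of the response-encoding rules; not the servlet API.
class ResponseEncodingModel {

    private String encoding = "ISO-8859-1"; // default response encoding
    private boolean explicitlySet = false;  // set by setCharacterEncoding or setContentType
    private boolean writerObtained = false;

    void setCharacterEncoding(String enc) {
        if (writerObtained) {
            return; // too late: the writer already exists
        }
        encoding = enc;
        explicitlySet = true;
    }

    void setContentType(String type) {
        if (writerObtained) {
            return;
        }
        // Only a content type that carries a charset value changes the encoding.
        int i = type.toLowerCase(Locale.ROOT).indexOf("charset=");
        if (i >= 0) {
            encoding = type.substring(i + "charset=".length()).trim();
            explicitlySet = true;
        }
    }

    // localeEncoding stands in for the result of the locale encoding mapping.
    void setLocale(String localeEncoding) {
        if (writerObtained || explicitlySet) {
            return; // an explicit encoding always takes precedence
        }
        encoding = localeEncoding;
    }

    void getWriter() {
        writerObtained = true;
    }

    String getCharacterEncoding() {
        return encoding;
    }

    public static void main(String[] args) {
        ResponseEncodingModel response = new ResponseEncodingModel();
        response.setLocale("Shift_JIS");                     // takes effect
        response.setContentType("text/html; charset=UTF-8"); // overrides the locale
        response.setLocale("Shift_JIS");                     // ignored now
        response.getWriter();
        response.setCharacterEncoding("EUC-JP");             // ignored after getWriter
        System.out.println(response.getCharacterEncoding());
    }
}
```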

To obtain the character encoding for a locale, the setLocale method checks the locale encoding mapping for the web application. For example, to map Japanese to the Japanese-specific encoding Shift_JIS, follow these steps:

  1. Select the WAR.

  2. Click the Advanced Settings button.

  3. In the Locale Character Encoding table, click the Add button.

  4. Enter ja in the Extension column.

  5. Enter Shift_JIS in the Character Encoding column.

If a mapping is not set for the web application, setLocale uses an Application Server mapping.
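
The two-level lookup can be pictured as follows. Everything in this sketch is an assumption made for illustration: the class LocaleEncodingLookup, the use of plain maps keyed by locale name, the order in which the full locale and the bare language are tried, the server-wide entries, and the final fallback to ISO-8859-1. Real containers define their own server mappings.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical lookup; the maps stand in for the configured mappings.
class LocaleEncodingLookup {

    // Tries the full locale name (for example "ja_JP") and then the
    // language alone (for example "ja"); returns null if neither is mapped.
    static String lookup(Map<String, String> mapping, Locale locale) {
        String encoding = mapping.get(locale.toString());
        if (encoding == null) {
            encoding = mapping.get(locale.getLanguage());
        }
        return encoding;
    }

    // The web application mapping is consulted first, then the server mapping.
    static String encodingFor(Locale locale,
                              Map<String, String> webAppMapping,
                              Map<String, String> serverMapping) {
        String encoding = lookup(webAppMapping, locale);
        if (encoding == null) {
            encoding = lookup(serverMapping, locale);
        }
        // Assumed last resort for this sketch only.
        return encoding != null ? encoding : "ISO-8859-1";
    }

    public static void main(String[] args) {
        Map<String, String> webApp = new HashMap<>();
        webApp.put("ja", "Shift_JIS"); // the mapping entered in the steps above

        Map<String, String> server = new HashMap<>();
        server.put("ja", "EUC-JP");    // hypothetical server-wide entry

        System.out.println(encodingFor(Locale.JAPAN, webApp, server));
    }
}
```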

The first application in Chapter 5, JavaServer Pages Technology, allows a user to choose an English string representation of a locale from all the locales available to the Java 2 platform and then outputs a date localized for that locale. To ensure that the characters in the date can be rendered correctly for a wide variety of character sets, the JSP page that generates the date sets the response encoding to UTF-8 by using the following directive:

<%@ page contentType="text/html; charset=UTF-8" %>

 
