- 浏览: 258345 次
- 性别:
- 来自: 北京
文章分类
最新评论
-
pxczy:
我的firefox下和chrome下试了都不行。。。
JS+SWF实现浏览器复制 -
mengjie133233:
貌似那个js版本的不对啊
看到PHP的一道面试题, 做了下, 不知道还有没好点方法 -
shengerweinan:
您好,我在第四步测试的时候输入ssh -T git@githu ...
GIT 入门使用(以GITHUB为服务器) -
qq550906358:
谢谢了,这个问题困扰了我半天,原来添加一个doLayout就没 ...
Ext.Panel 动态添加组件后,没有显示组件 。需要调用Ext.Panel 的doLayout()函数 -
0633bbs点com:
问题已经解决了,谢谢。
cumulus(浑天仪)使用手册
转自: http://hi.baidu.com/lssbing/blog/item/d2074a1635763910972b43d8.html
utf-8 是一种在web应用中经常使用的一种 unicode 字符的编码方式,使用 utf-8 的好处在于它是一种变长的编码方式,对于 ANSII 码编码长度为1个字节,这样的话在传输大量 ASCII 字符集的网页时,可以大量节约网络带宽。使用 utf-8 编码来编写网页的时候, 往往会因为 bom (Byte Order Mark) 的问题,导致网页中经常出现一些不明的空行或者乱码字符。 这些都是因为 utf-8 编码方式对于 bom 不是强制的。因此 utf-8 编码在保存文件的时候,会出现不同的处理方式。比如有的浏览器(FireFox)可以自动过滤掉所有 utf-8 bom , 有的 (IE) 只能过滤掉一次 bom (为什么是一次? 当你出现 Include 多次文件时就会碰上这个问题了)。
转自: http://www.w3.org/International/questions/qa-utf8-bom
When using UTF-8 encoded pages in some user agents, I get an extra line or unwanted characters at the top of my web page or included file. How do I remove them?
answer
If you are dealing with a file encoded in UTF-8, your display problems may be caused by the presence of a UTF-8 signature (BOM) that the user agent doesn't recognize. This used to be a problem for static HTML files, but is no longer in recent versions of major browsers. However, if you use PHP to generate your HTML, this was still an issue with PHP version 5.3.6.
The BOM is always at the beginning of the file, and so you would normally expect to see the display issues at the top of a page. However, you may also find blank lines appearing within the page if you include text from a separate file that begins with a UTF-8 signature.
We have a set of test pages and a summary of results for various recent browser versions that explore this behaviour.
This article will help you determine whether the UTF-8 is causing the problem. If there is no evidence of a UTF-8 signature at the beginning of the file, then you will have to look elsewhere for a solution.
What is a UTF-8 signature (BOM)?
Some applications insert a particular combination of bytes at the beginning of a file to indicate that the text contained in the file is Unicode. This combination of bytes is known as a signature or Byte Order Mark (BOM). Some applications - such as a text editor or a browser - will display the BOM as an extra line in the file, others will display unexpected characters, such as .
See the side panel for more detailed information about the BOM.
The BOM is the Unicode codepoint U+FEFF, corresponding to the Unicode character 'ZERO WIDTH NON-BREAKING SPACE' (ZWNBSP).
In UTF-16 and UTF-32 encodings, unless there is some alternative indicator, the BOM is essential to ensure correct interpretation of the file's contents. Each character in the file is represented by 2 or 4 bytes of data and the order in which these bytes are stored in the file is significant; the BOM indicates this order.
In the UTF-8 encoding, the presence of the BOM is not essential because, unlike the UTF-16 or UTF-32 encodings, there is no alternative sequence of bytes in a character. The BOM may still occur in UTF-8 encoding text, however, either as a by-product of an encoding conversion or because it was added by an editor.
Detecting the BOM
First, we need to check whether there is indeed a BOM at the beginning of the file.
You can try looking for a BOM in your content, but if your editor handles the UTF-8 signature correctly you probably won't be able to see it. An editor which does not handle the UTF-8 signature correctly displays the bytes that compose that signature according to its own character encoding setting. (With the Latin 1 (ISO 8859-1) character encoding, the signature displays as characters .) With a binary editor capable of displaying the hexadecimal byte values in the file, the UTF-8 signature displays as EF BB BF.
Alternatively, your editor may tell you in a status bar or a menu what encoding your file is in, including information about the presence or not of the UTF-8 signature.
If not, some kind of script-based test (see below) may help. Alternatively, you could try this small web-based utility. (Note, if it’s a file included by PHP or some other mechanism that you think is causing the problem, type in the URI of the included file.)
Removing the BOM
If you have an editor which shows the characters that make up the UTF-8 signature you may be able to delete them by hand. Chances are, however, that the BOM is there in the first place because you didn't see it.
Check whether your editor allows you to specify whether a UTF-8 signature is added or kept during a save. Such an editor provides a way of removing the signature by simply reading the file in then saving it out again. For example, if Dreamweaver detects a BOM the Save As dialogue box will have a check mark alongside the text "Include Unicode Signature (BOM)". Just uncheck the box and save.
One of the benefits of using a script is that you can remove the signature quickly, and from multiple files. In fact the script could be run automatically as part of your process. If you use Perl, you could use a simple script created by Martin Dürst.
Note: You should check the process impact of removing the signature. It may be that some part of your content development process relies on the use of the signature to indicate that a file is in UTF-8. Bear in mind also that pages with a high proportion of Latin characters may look correct superficially but that occasional characters outside the ASCII range (U+0000 to U+007F) may be incorrectly encoded.
by the way
You will find that some text editors such as Windows Notepad will automatically add a UTF-8 signature to any file you save as UTF-8.
A UTF-8 signature at the beginning of a CSS file can sometimes cause the initial rules in the file to fail on certain user agents.
In some browsers, the presence of a UTF-8 signature will cause the browser to interpret the text as UTF-8 regardless of any character encoding declarations to the contrary.
utf-8 是一种在web应用中经常使用的一种 unicode 字符的编码方式,使用 utf-8 的好处在于它是一种变长的编码方式,对于 ANSII 码编码长度为1个字节,这样的话在传输大量 ASCII 字符集的网页时,可以大量节约网络带宽。使用 utf-8 编码来编写网页的时候, 往往会因为 bom (Byte Order Mark) 的问题,导致网页中经常出现一些不明的空行或者乱码字符。 这些都是因为 utf-8 编码方式对于 bom 不是强制的。因此 utf-8 编码在保存文件的时候,会出现不同的处理方式。比如有的浏览器(FireFox)可以自动过滤掉所有 utf-8 bom , 有的 (IE) 只能过滤掉一次 bom (为什么是一次? 当你出现 Include 多次文件时就会碰上这个问题了)。
转自: http://www.w3.org/International/questions/qa-utf8-bom
When using UTF-8 encoded pages in some user agents, I get an extra line or unwanted characters at the top of my web page or included file. How do I remove them?
answer
If you are dealing with a file encoded in UTF-8, your display problems may be caused by the presence of a UTF-8 signature (BOM) that the user agent doesn't recognize. This used to be a problem for static HTML files, but is no longer in recent versions of major browsers. However, if you use PHP to generate your HTML, this was still an issue with PHP version 5.3.6.
The BOM is always at the beginning of the file, and so you would normally expect to see the display issues at the top of a page. However, you may also find blank lines appearing within the page if you include text from a separate file that begins with a UTF-8 signature.
We have a set of test pages and a summary of results for various recent browser versions that explore this behaviour.
This article will help you determine whether the UTF-8 is causing the problem. If there is no evidence of a UTF-8 signature at the beginning of the file, then you will have to look elsewhere for a solution.
What is a UTF-8 signature (BOM)?
Some applications insert a particular combination of bytes at the beginning of a file to indicate that the text contained in the file is Unicode. This combination of bytes is known as a signature or Byte Order Mark (BOM). Some applications - such as a text editor or a browser - will display the BOM as an extra line in the file, others will display unexpected characters, such as .
See the side panel for more detailed information about the BOM.
The BOM is the Unicode codepoint U+FEFF, corresponding to the Unicode character 'ZERO WIDTH NON-BREAKING SPACE' (ZWNBSP).
In UTF-16 and UTF-32 encodings, unless there is some alternative indicator, the BOM is essential to ensure correct interpretation of the file's contents. Each character in the file is represented by 2 or 4 bytes of data and the order in which these bytes are stored in the file is significant; the BOM indicates this order.
In the UTF-8 encoding, the presence of the BOM is not essential because, unlike the UTF-16 or UTF-32 encodings, there is no alternative sequence of bytes in a character. The BOM may still occur in UTF-8 encoding text, however, either as a by-product of an encoding conversion or because it was added by an editor.
Detecting the BOM
First, we need to check whether there is indeed a BOM at the beginning of the file.
You can try looking for a BOM in your content, but if your editor handles the UTF-8 signature correctly you probably won't be able to see it. An editor which does not handle the UTF-8 signature correctly displays the bytes that compose that signature according to its own character encoding setting. (With the Latin 1 (ISO 8859-1) character encoding, the signature displays as characters .) With a binary editor capable of displaying the hexadecimal byte values in the file, the UTF-8 signature displays as EF BB BF.
Alternatively, your editor may tell you in a status bar or a menu what encoding your file is in, including information about the presence or not of the UTF-8 signature.
If not, some kind of script-based test (see below) may help. Alternatively, you could try this small web-based utility. (Note, if it’s a file included by PHP or some other mechanism that you think is causing the problem, type in the URI of the included file.)
Removing the BOM
If you have an editor which shows the characters that make up the UTF-8 signature you may be able to delete them by hand. Chances are, however, that the BOM is there in the first place because you didn't see it.
Check whether your editor allows you to specify whether a UTF-8 signature is added or kept during a save. Such an editor provides a way of removing the signature by simply reading the file in then saving it out again. For example, if Dreamweaver detects a BOM the Save As dialogue box will have a check mark alongside the text "Include Unicode Signature (BOM)". Just uncheck the box and save.
One of the benefits of using a script is that you can remove the signature quickly, and from multiple files. In fact the script could be run automatically as part of your process. If you use Perl, you could use a simple script created by Martin Dürst.
Note: You should check the process impact of removing the signature. It may be that some part of your content development process relies on the use of the signature to indicate that a file is in UTF-8. Bear in mind also that pages with a high proportion of Latin characters may look correct superficially but that occasional characters outside the ASCII range (U+0000 to U+007F) may be incorrectly encoded.
by the way
You will find that some text editors such as Windows Notepad will automatically add a UTF-8 signature to any file you save as UTF-8.
A UTF-8 signature at the beginning of a CSS file can sometimes cause the initial rules in the file to fail on certain user agents.
In some browsers, the presence of a UTF-8 signature will cause the browser to interpret the text as UTF-8 regardless of any character encoding declarations to the contrary.
发表评论
-
PHP 排列组合
2012-06-11 00:01 1995计算数组有多少种排列组合 <?php $arr ... -
PHP 常见算法【冒泡排序, 快速排序, 插入排序, 选择排序, 二分法查找, ..】
2012-06-09 23:58 2850// 冒泡排序 function bubblesort($a ... -
牛年求牛:有一母牛,到4岁可生育,每年一头,所生均是一样的母牛,到15岁绝育,不再能生,20岁死亡,问n年后有多少头牛
2012-06-09 23:42 5486问题: 牛年求牛:有一母牛,到4岁可生育,每年一头,所生均 ... -
PHP 递归实现层级树状展现数据
2012-06-09 22:26 13015<?php $db = mysql_conn ... -
php中require和include的几点区别
2012-03-07 01:26 1151如果php.ini配置文件配置了URL fopen wra ... -
chrome插件推荐
2012-03-01 23:14 13611. Vimiumhttps://chrome.google. ... -
PHP 读写 CSV
2012-02-12 21:57 39001. 读取csv数据, 输出到sales.csv文件中 $sa ... -
php安装好以后 apache2 无法启动
2011-12-27 19:12 1116在安装完成PHP后, 重新启动apache报如下错误 原因是L ... -
Linux下 手动安装配置PHP
2011-12-27 19:12 17250. 安装php前 需先安装 几个扩展 命令为 . ... -
为php增加mbstring扩展 等一般扩展
2011-12-27 19:12 1172注: 默认生成 extension.so的目录是 /usr/l ... -
PHP的mcrypt模块安装
2011-12-27 19:12 1228首先要下载三个软件(下载地址是我提供的,里面有不同的压缩版本) ... -
Linux下安装PHP Memcache扩展.
2011-12-27 19:12 1647转自http://koda.iteye.com/blog/66 ... -
安装php_cURL扩展
2011-12-28 22:53 7416cURL官网:http://curl.haxx.s ... -
[CI]发送Email
2011-12-28 22:53 10321. 在config/目录下添加 email.php类 $co ... -
XSS
2011-12-28 22:53 1199XSS又叫CSS (Cross Site Script) ,跨 ... -
[CI]登录验证
2011-12-28 22:53 7905[list] 预先加载数据库操作类和Session类 即在a ... -
[CI]多数据库配置使用
2011-12-26 13:05 17001.在database.php配置文件中 加入 $db['[ ... -
[Yii]表单下拉选框及查询下拉选框
2011-12-26 13:02 4321form表单 Views中: <?php echo $ ... -
Yii 语言设置
2011-12-26 13:01 37861. 在main.php配置文件中加入 'language' ... -
数据库Identity+Primarykey字段强行插入的处理
2011-12-26 13:00 1101插入|数据|数据库 由于是Identity,所以在强行插入的时 ...
相关推荐
在Java编程中,UTF-8编码是一个非常常见且广泛使用的字符编码格式,它能支持全球大部分语言的字符表示。然而,UTF-8有一个特殊特性,那就是它可以带有Byte Order Mark(BOM),这是一个特殊的字节序列,用于标识数据...
本文将深入探讨几种常见的编码格式,如GB2312、UTF-8以及UTF-8-BOM,并详细讲解如何在C#中进行这些编码格式之间的转换,同时会涉及到与Stream相关的操作。 GB2312,全称为“国标汉字编码字符集”,是中国大陆广泛...
在Windows操作系统环境下,经常需要进行这样的转换,因为某些程序或系统可能更倾向于识别带有BOM的UTF-8编码,尤其是在处理源代码文件或者非英文文本时。不带BOM的UTF-8文件可能会导致乱码或者程序无法正确解析。 ...
这个场景中,我们面临的挑战是如何正确处理UTF-8带有BOM(Byte Order Mark)的文件,因为BOM可能会导致文件内容显示为问号或者其他乱码。下面将详细介绍如何解决这个问题。 首先,我们需要理解什么是UTF-8的BOM。...
1.首先介绍一下本人应用场景,qt...3.此小工具主要针对utf-8编码文件,能够批量添加删除BOM,无识别转化ASIIC功能,添加BOM时,如果文件是utf-8(BOM),则跳过,删除亦然 4.当不选中添加删除时可用于文件数量统计。
标题"批量去掉UTF-8文件中BOM标示符"指的是处理这一问题的方法,即通过特定工具或代码删除UTF-8文件开头的BOM标识。这个过程通常是为了确保文件在不同的系统和环境中能够正确无误地被读取和处理。 描述中提到的博文...
由于项目需要,需要字符串转为XML文件,直接用Fileopen进行EncodingUTF8编码后,发现文件实际为UTF-8 BOM编码 问度娘发现有相同问题,但解决方式是利用新建一个UTF-8的TXT文件后,再进行COPY加内容。感觉这样操作...
本文将深入探讨PHP中的字符编码转换,特别是针对ANSI、Unicode(包括Little Endian和Big Endian)、UTF-8以及UTF-8+BOM的转换。 首先,让我们了解这些编码格式的含义: 1. ANSI编码:通常指的是Windows系统的默认...
Forcibly saves all files in UTF-8 (No BOM) encoding. ForceUTF8 的核心功能在于其智能识别并转换字符串编码的能力。即使字符串中混杂着多种编码,\ForceUTF8\Encoding::toUTF8() 都能成功将其转换为统一的UTF-8...
当上传文件存在中文时,修改上传文件编码为utf-8-bom
在Java中,避免UTF-8的csv文件打开中文出现乱码的方法是非常重要的。csv文件是 comma separated values 的缩写,常用于数据交换和导入导出操作。然而,在Java中读取和写入csv文件时,中文字符如果不正确地处理,可能...
标题中的“PB9转换utf-8例子”指的是在PowerBuilder 9(PB9)环境下将数据从非UTF-8编码转换为UTF-8编码的一种解决方案。由于PB9本身不直接支持这种转换,开发者通常需要利用外部库或者特定的编程技巧来实现这个功能...
在大多数情况下,BOM在UTF-8编码中并不必要,因为它默认是小端序,但对于某些程序或系统,BOM可能有助于识别文件的编码方式。 在处理带BOM的UTF-8文件时,IDEA提供了很好的兼容性。通常,BOM可能会导致一些编辑器或...
UTF-8编码是一种广泛使用的字符编码方式,尤其在互联网和软件开发中占据核心地位。它能够表示Unicode字符集中的所有字符,确保了不同语言的文字能在同一文档中和谐共存。然而,UTF-8编码有一个特性,就是对于某些...
UTF-8编码是一种广泛使用的字符编码标准,尤其在文本文件和网页中常见。它能够表示世界上几乎所有的文字,包括汉字和其他非拉丁字符。UTF-8的全称是“8位Unicode转换格式”,它以1到4个字节来编码Unicode字符。 在...
在UTF-8编码中,BOM是一个由三个字节组成的序列:0xEF, 0xBB, 0xBF,它位于文件的开头,用来表明该文件采用的是UTF-8编码。在C#编程中,有时我们需要在写入UTF-8文件时添加这个BOM头,以确保其他程序或系统能正确...
BOM是UTF-8编码的一个可选特征,它在文件开头放置三个特殊的字节来标识文件的字符编码,但这可能会导致在某些编辑器或浏览器中出现不必要的字符或者乱码问题。因此,开发这个小工具是为了帮助开发者处理这个问题。 ...
UTF-8是一种广泛使用的Unicode字符编码,能够表示Unicode字符集中的所有字符。它的优点在于对ASCII字符(如英文)使用单字节,而对于其他语言的字符使用多个字节,这样既节省空间又保持了兼容性。 2. **打开UTF-8...
如果一个GBK编码的文本包含非GBK字符,使用UTF-8编码器读取会出现乱码。因此,通过转换器将GBK编码转换为UTF-8编码,可以确保文本在各种系统和语言环境中都能正确显示。 4. **编码转换工具的实现**: - 接收输入:...