Oracle character set

TeddyWang


Character Encoding Schemes

There are two general groups of encoding schemes, those based on 7-bit ASCII and those based on IBM EBCDIC. Within each group, all schemes normally use the same encoding for the 26 Latin characters (A to Z), but use different encoding for other characters used in languages other than English. ASCII and EBCDIC use different encodings, even for the Latin characters.
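As a rough illustration of the paragraph above, the following Python sketch compares the byte values that ASCII and two EBCDIC code pages assign to the same characters. The codec names `cp037` and `cp500` are Python's names for EBCDIC code pages 37 and 500; they are used here as stand-ins and are not Oracle character set names.

```python
# Compare how ASCII and two EBCDIC code pages encode the same characters.
# "cp037" and "cp500" are Python codec names for EBCDIC code pages 37 and 500.

def code_points(text, codec):
    """Return the encoded bytes of `text` as a list of two-digit hex strings."""
    return [f"{b:02X}" for b in text.encode(codec)]

letters = "AZ"
ascii_letters = code_points(letters, "ascii")
cp037_letters = code_points(letters, "cp037")
cp500_letters = code_points(letters, "cp500")

# A non-letter character that the two EBCDIC code pages place differently.
cp037_bracket = code_points("[", "cp037")
cp500_bracket = code_points("[", "cp500")

print("ASCII  A, Z:", ascii_letters)
print("CP037  A, Z:", cp037_letters)
print("CP500  A, Z:", cp500_letters)
print("CP037  [   :", cp037_bracket)
print("CP500  [   :", cp500_bracket)
```

The two EBCDIC code pages agree on the letters but not on the bracket, while ASCII differs from both even for A and Z.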

--------------------------

Specifying Language-Dependent Behavior

This section discusses the parameters that specify language-dependent operation. You can set language-dependent behavior defaults for the server, and set language-dependent behavior for the client that overrides these defaults.

Most NLS parameters can be used in three ways:

As initialization parameters to specify language-dependent behavior defaults for the server.

For example, in your INIT.ORA file, include

		NLS_TERRITORY = FRANCE

As environment variables on client machines to specify language-dependent behavior defaults for a session. These defaults override the defaults set for the server.

For example, on a UNIX system

		setenv NLS_TERRITORY FRANCE

As ALTER SESSION parameters to change the language-dependent behavior of a session. These parameters override the defaults set for the session or for the server.

For example:

		ALTER SESSION SET NLS_TERRITORY = FRANCE
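The precedence described above (session setting over client environment over server default) can be modelled with a small Python sketch. This is only a model of the lookup order, not how Oracle resolves parameters internally; the dictionaries, the `alter_session` helper, and the AMERICA and GERMANY values are hypothetical.

```python
from collections import ChainMap

# Hypothetical settings, one dictionary per level.
server_defaults = {"NLS_TERRITORY": "AMERICA", "NLS_LANGUAGE": "AMERICAN"}
client_environment = {"NLS_TERRITORY": "FRANCE"}
session_overrides = {}

# ChainMap searches the mappings left to right, so the session level wins,
# then the client environment, then the server defaults.
effective = ChainMap(session_overrides, client_environment, server_defaults)

def alter_session(name, value):
    """Hypothetical stand-in for ALTER SESSION SET name = value."""
    session_overrides[name] = value

before = effective["NLS_TERRITORY"]    # taken from the client environment
alter_session("NLS_TERRITORY", "GERMANY")
after = effective["NLS_TERRITORY"]     # taken from the session override
language = effective["NLS_LANGUAGE"]   # falls through to the server default

print(before, after, language)
```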

NLS Parameters

The NLS_LANGUAGE and NLS_TERRITORY parameters implicitly specify several aspects of language-dependent operation. Additional NLS parameters provide explicit control over these operations. The parameters listed below can be specified in the initialization file, or they can also be specified for each session with the ALTER SESSION command.

*Parameter*	*Description*
NLS_CALENDAR	Calendar system
NLS_CURRENCY	Local currency symbol
NLS_DATE_FORMAT	Default date format
NLS_DATE_LANGUAGE	Default language for dates
NLS_ISO_CURRENCY	ISO international currency symbol
NLS_LANGUAGE	Default language
NLS_NUMERIC_CHARACTERS	Decimal character and group separator
NLS_SORT	Character sort sequence
NLS_SPECIAL_CHARS
NLS_TERRITORY	Default territory

For a complete description of ALTER SESSION, see Oracle7 Server SQL Reference.

NLS_CALENDAR

Many different calendar systems are in use throughout the world. NLS_CALENDAR specifies which calendar system Oracle uses.

NLS_CALENDAR can have one of the following values:

Arabic Hijrah

Gregorian

Japanese Imperial

Persian

ROC Official

Thai Buddha

For example, if NLS_CALENDAR is set to "Japanese Imperial", the date format is "YY-MM-DD", and the date is February 17, 1995 (year 7 of the Heisei era), then the sysdate is displayed as follows:

SELECT SYSDATE FROM DUAL;

SYSDATE
--------
07-02-17

NLS_CURRENCY
This parameter specifies the character string returned by the number format mask L, the local currency symbol, overriding that defined implicitly by NLS_TERRITORY. For example, to set the local currency symbol to "Dfl" (including a space), the parameter should be set as follows:  
NLS_CURRENCY = "Dfl "
In this case, the query 
 
SELECT TO_CHAR(TOTAL, 'L099G999D99') "TOTAL"
   FROM ORDERS WHERE CUSTNO = 586
would return 
 
TOTAL
-------------
Dfl 12.673,49
You can alter the default value of NLS_CURRENCY by changing its value in the initialization file and then restarting the instance, and you can alter its value during a session using an ALTER SESSION SET NLS_CURRENCY command. 
For a complete description of ALTER SESSION, see Oracle7 Server SQL Reference. 
 
NLS_DATE_FORMAT
Defines the default date format to use with the TO_CHAR and TO_DATE functions. The default value of this parameter is determined by NLS_TERRITORY. The value of this parameter can be any valid date format mask, and the value must be surrounded by double quotes. For example:
 
NLS_DATE_FORMAT = "MM/DD/YYYY"
As another example, to set the default date format to display Roman numerals for months, you would include the following line in your initialization file: 
 
NLS_DATE_FORMAT = "DD RM YY"
With such a default date format, the following SELECT statement would return the month using Roman numerals (assuming today's date is February 13, 1991): 
 
SELECT TO_CHAR(SYSDATE) CURRDATE
   FROM DUAL;

CURRDATE
---------
13 II 91
The value of this parameter is stored in the tokenized internal date format. Each format element occupies two bytes, and each string occupies the number of bytes in the string plus a terminator byte. Also, the entire format mask has a two-byte terminator. For example, "MM/DD/YY" occupies 12 bytes internally because there are three format elements, two one-byte strings (the two slashes), and the two-byte terminator for the format mask. The tokenized format for the value of this parameter cannot exceed 24 bytes. 
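The byte arithmetic above can be checked with a small Python sketch. The `tokenized_size` function is a hypothetical helper that only applies the rules stated in the paragraph; it expects the mask to be split into format elements and literal strings by the caller and does not parse Oracle format masks itself. Literal strings are assumed to be single-byte.

```python
MAX_TOKENIZED_BYTES = 24  # limit stated in the text

def tokenized_size(elements, literals, encoding="ascii"):
    """Estimate the tokenized size of a date format mask in bytes.

    elements: format elements such as "MM" or "DD" (2 bytes each)
    literals: literal strings such as "/" (their bytes plus 1 terminator byte)
    The whole mask ends with a 2-byte terminator.
    """
    element_bytes = 2 * len(elements)
    literal_bytes = sum(len(s.encode(encoding)) + 1 for s in literals)
    mask_terminator = 2
    return element_bytes + literal_bytes + mask_terminator

# "MM/DD/YY": three elements and two one-byte literal strings.
slash_mask = tokenized_size(["MM", "DD", "YY"], ["/", "/"])

# "DD RM YY": three elements and two one-byte spaces.
roman_mask = tokenized_size(["DD", "RM", "YY"], [" ", " "])

print(slash_mask, roman_mask, slash_mask <= MAX_TOKENIZED_BYTES)
```

Both masks come to 12 bytes, half of the 24-byte limit.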
Note: The applications you design may need to allow for a variable-length default date format. Also, the parameter value must be surrounded by double quotes: single quotes are interpreted as part of the format mask. 
You can alter the default value of NLS_DATE_FORMAT by changing its value in the initialization file and then restarting the instance, and you can alter the value during a session using an ALTER SESSION SET NLS_DATE_FORMAT command. 
For a complete description of ALTER SESSION, see Oracle7 Server SQL Reference. 
 
NLS_DATE_LANGUAGE
This parameter specifies the language for the spelling of day and month names by the functions TO_CHAR and TO_DATE, overriding that specified implicitly by NLS_LANGUAGE. NLS_DATE_LANGUAGE has the same syntax as the NLS_LANGUAGE parameter, and all supported languages are valid values. For example, to specify the date language as French, the parameter should be set as follows:
 
NLS_DATE_LANGUAGE = FRENCH
In this case, the query 
 
SELECT TO_CHAR(SYSDATE, 'Day:Dd Month yyyy')
   FROM DUAL;
would return 
 
Mercredi:13 Février 1991
Month and day name abbreviations are also in the language specified, for example: 
 
Me:13 Fév 1991
The default date format also uses the language-specific month name abbreviations. For example, if the default date format is DD-MON-YYYY, the above date would be inserted using: 
 
INSERT INTO tablename VALUES ('13-Fév-1991');
The abbreviations for AM, PM, AD, and BC are also returned in the language specified by NLS_DATE_LANGUAGE. Note that numbers spelled using the TO_CHAR function always use English spellings; for example: 
 
SELECT TO_CHAR(TO_DATE('27-Fév-91'),'Day: ddspth Month')
   FROM DUAL;
would return: 
 
Mercredi: twenty-seventh Février
You can alter the default value of NLS_DATE_LANGUAGE by changing its value in the initialization file and then restarting the instance, and you can alter the value during a session using an ALTER SESSION SET NLS_DATE_LANGUAGE command. 
For a complete description of ALTER SESSION, see Oracle7 Server SQL Reference. 
 
NLS_ISO_CURRENCY
This parameter specifies the character string returned by the number format mask C, the ISO currency symbol, overriding that defined implicitly by NLS_TERRITORY.
Local currency symbols can be ambiguous; for example, a dollar sign ($) can refer to US dollars or Australian dollars. ISO Specification 4217 1987-07-15 defines unique "international" currency symbols for the currencies of specific territories (or countries). 
For example, the ISO currency symbol for the US Dollar is USD, for the Australian Dollar AUD. To specify the ISO currency symbol, the corresponding territory name is used. 
NLS_ISO_CURRENCY has the same syntax as the NLS_TERRITORY parameter, and all supported territories are valid values. For example, to specify the ISO currency symbol for France, the parameter should be set as follows: 
 
NLS_ISO_CURRENCY = FRANCE
In this case, the query 
 
SELECT TO_CHAR(TOTAL, 'C099G999D99') "TOTAL"
   FROM ORDERS WHERE CUSTNO = 586
would return 
 
TOTAL
-------------
 FRF12.673,49
You can alter the default value of NLS_ISO_CURRENCY by changing its value in the initialization file and then restarting the instance, and you can alter its value during a session using an ALTER SESSION SET NLS_ISO_CURRENCY command. 
For a complete description of ALTER SESSION, see Oracle7 Server SQL Reference. 
 
NLS_NUMERIC_CHARACTERS
This parameter specifies the decimal character and grouping separator, overriding those defined implicitly by NLS_TERRITORY. The decimal character separates the integer and decimal parts of a number. The grouping separator is the character returned by the number format mask G. For example, to set the decimal character to a comma and the grouping separator to a period, the parameter should be set as follows:
 
NLS_NUMERIC_CHARACTERS = ",."
Both characters are single byte and must be different. Either can be a space. 
Note: When the decimal character is not a period (.) or when a group separator is used, numbers appearing in SQL statements must be enclosed in quotes. For example: 
 
        INSERT INTO SIZES (ITEMID, WIDTH, QUANTITY)
          VALUES (618, '45,5', TO_NUMBER('1.234','9G999'));
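To make the effect of the two characters concrete, here is a minimal Python sketch that formats and parses numbers with a configurable decimal character and grouping separator. The functions `format_number` and `parse_number` are hypothetical helpers written for this illustration; they imitate the behaviour described above and are not Oracle's TO_CHAR or TO_NUMBER.

```python
def format_number(value, nls_numeric_characters=",.", places=2):
    """Format `value` using a two-character string: decimal character, then group separator."""
    decimal_char, group_char = nls_numeric_characters
    if decimal_char == group_char:
        raise ValueError("decimal character and group separator must differ")
    plain = f"{value:,.{places}f}"          # for example "12,673.49"
    swap = {",": group_char, ".": decimal_char}
    return "".join(swap.get(ch, ch) for ch in plain)

def parse_number(text, nls_numeric_characters=",."):
    """Parse text written with the given decimal character and group separator."""
    decimal_char, group_char = nls_numeric_characters
    return float(text.replace(group_char, "").replace(decimal_char, "."))

formatted = format_number(12673.49)   # decimal comma, period as group separator
width = parse_number("45,5")
quantity = parse_number("1.234")

print(formatted, width, quantity)
```

With the setting ",." the string '1.234' is read as one thousand two hundred thirty-four, which is why such values have to be passed as quoted strings rather than as bare numeric literals.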
You can alter the default value of NLS_NUMERIC_CHARACTERS by changing its value in the initialization file and then restarting the instance, and you can alter its value during a session using an ALTER SESSION SET NLS_NUMERIC_CHARACTERS command. 
For a complete description of ALTER SESSION, see Oracle7 Server SQL Reference. 
 
NLS_SORT
This parameter specifies the type of sort for character data, overriding that defined implicitly by NLS_LANGUAGE. 
The syntax of NLS_SORT is: 
 
NLS_SORT = { BINARY | name }
BINARY specifies a binary sort and name specifies a particular linguistic sort sequence. For example, to specify the linguistic sort sequence called German, the parameter should be set as follows: 
 
NLS_SORT = German
The name given to a linguistic sort sequence has no direct connection to language names. Usually, however, each supported language will have an appropriate linguistic sort sequence defined that uses the same name. 
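The difference between a binary sort and a linguistic sort can be sketched in Python. The binary key below compares the ISO 8859-1 byte values; the linguistic key is only a rough approximation that ignores accents and case, and it is not Oracle's German sort sequence. All function names are hypothetical.

```python
import unicodedata

def binary_key(word, encoding="iso-8859-1"):
    """Sort key for a binary sort: the raw encoded bytes."""
    return unicodedata.normalize("NFC", word).encode(encoding)

def base_letters(word):
    """Strip combining marks and fold case, so that 'Ä' compares like 'a'."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(c for c in decomposed if not unicodedata.combining(c)).casefold()

def linguistic_key(word):
    """Approximate linguistic key: base letters first, original spelling as tie-breaker."""
    return (base_letters(word), word)

words = ["Zebra", "Äpfel", "Ofen", "Apfel", "Öl"]
binary_order = sorted(words, key=binary_key)
linguistic_order = sorted(words, key=linguistic_key)

print("binary:    ", binary_order)
print("linguistic:", linguistic_order)
```

In the binary order the words starting with Ä and Ö fall after Zebra because their byte values are higher; the approximate linguistic order places them next to Apfel and Ofen.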
Note: Setting the NLS_SORT initialization parameter to anything other than BINARY causes a sort to use a full table scan, regardless of the path the optimizer chooses, because indexes are built in binary key order. 
You can alter the default value of NLS_SORT by changing its value in the initialization file and then restarting the instance, and you can alter its value during a session using an ALTER SESSION SET NLS_SORT command. 
For a complete description of ALTER SESSION, see Oracle7 Server SQL Reference. 
A complete list of linguistic definitions is provided in the "Linguistic Definitions" table.
 
_______________________________________________________________________
NLS Data
This section lists supported languages, territories, storage character sets, Arabic/Hebrew display character sets, linguistic definitions, and calendars. 
 

 
 
 
Table C-2 Oracle Character Sets for Operating System Locales

*Operating System Locale*	*Character Set*
Arabic	AR8ASMO8X
Catalan	WE8PC850
Chinese (PRC)	ZHS16GBK
Chinese (Taiwan)	ZHT16MSWIN950
Czech	EE8PC852
Danish	WE8PC850
Dutch	WE8PC850
English (United Kingdom)	WE8PC850
English (United States)	US8PC437
Finnish	WE8PC850
French	WE8PC850
German	WE8PC850
Greek	EL8PC737
Hungarian	EE8PC852
Italian	WE8PC850
Japanese	JA16SJIS
Korean	KO16MSWIN949
Norwegian	WE8PC850
Polish	EE8PC852
Portuguese	WE8PC850
Romanian	EE8PC852
Russian	RU8PC866
Slovak	EE8PC852
Slovenian	EE8PC852
Spanish	WE8PC850
Swedish	WE8PC850
Turkish	TR8PC857

Storage Character Sets
The following storage character sets are supported in Oracle Server release 7.3: 

*Name*	*Description*
US7ASCII	ASCII 7-bit American
WE8DEC	DEC 8-bit West European
WE8HP	HP LaserJet 8-bit West European
US8PC437	IBM-PC Code Page 437 8-bit American
WE8EBCDIC37	EBCDIC Code Page 37 8-bit West European
WE8EBCDIC500	EBCDIC Code Page 500 8-bit West European
WE8PC850	IBM-PC Code Page 850 8-bit West European
D7DEC	DEC VT100 7-bit German
F7DEC	DEC VT100 7-bit French
S7DEC	DEC VT100 7-bit Swedish
E7DEC	DEC VT100 7-bit Spanish
SF7ASCII	ASCII 7-bit Finnish
NDK7DEC	DEC VT100 7-bit Norwegian/Danish
I7DEC	DEC VT100 7-bit Italian
NL7DEC	DEC VT100 7-bit Dutch
CH7DEC	DEC VT100 7-bit Swiss (German/French)
YUG7ASCII	ASCII 7-bit Yugoslavian
SF7DEC	DEC VT100 7-bit Finnish
TR7DEC	DEC VT100 7-bit Turkish
WE8ISO8859P1	ISO 8859-1 West European
EE8ISO8859P2	ISO 8859-2 East European
SE8ISO8859P3	ISO 8859-3 South European
NEE8ISO8859P4	ISO 8859-4 North and North-East European
CL8ISO8859P5	ISO 8859-5 Latin/Cyrillic
AR8ISO8859P6	ISO 8859-6 Latin/Arabic
EL8ISO8859P7	ISO 8859-7 Latin/Greek
IW8ISO8859P8	ISO 8859-8 Latin/Hebrew
WE8ISO8859P9	ISO 8859-9 West European & Turkish
NE8ISO8859P10	ISO 8859-10 North European
TH8TISASCII	Thai Industrial Standard 620-2533 - ASCII 8-bit
TH8TISEBCDIC	Thai Industrial Standard 620-2533 - EBCDIC 8-bit
AR8EBCDICX	EBCDIC XBASIC 8-bit Latin/Arabic
EL8DEC	DEC 8-bit Latin/Greek
TR8DEC	DEC 8-bit Turkish
WE8EBCDIC37C	EBCDIC Code Page 37 8-bit Oracle/c
RU8PC866	IBM-PC Code Page 866 8-bit Latin/Cyrillic
WE8EBCDIC500C	EBCDIC Code Page 500 8-bit Oracle/c
EEC8EUROPA3	EEC EUROPA3 8-bit West European/Greek
EE8PC852	IBM-PC Code Page 852 8-bit East European
RU8BESTA	BESTA 8-bit Latin/Cyrillic
RU8PC855	IBM-PC Code Page 855 8-bit Latin/Cyrillic
TR8PC857	IBM-PC Code Page 857 8-bit Turkish
CL8MACCYRILLIC	Mac Client 8-bit Latin/Cyrillic
CL8MACCYRILLICS	Mac Server 8-bit Latin/Cyrillic
WE8PC860	IBM-PC Code Page 860 8-bit West European
IS8PC861	IBM-PC Code Page 861 8-bit Icelandic
EE8MACCES	Mac Server 8-bit Central European
EE8MACCROATIANS	Mac Server 8-bit Croatian
TR8MACTURKISHS	Mac Server 8-bit Turkish
IS8MACICELANDICS	Mac Server 8-bit Icelandic
EL8MACGREEKS	Mac Server 8-bit Greek
EE8MSWIN1250	MS Windows Code Page 1250 8-bit East European
CL8MSWIN1251	MS Windows Code Page 1251 8-bit Latin/Cyrillic
F8EBCDIC297	EBCDIC Code Page 297 8-bit French
BG8MSWIN	MS Windows 8-bit Bulgarian Cyrillic
EL8MSWIN1253	MS Windows Code Page 1253 8-bit Latin/Greek
D8EBCDIC273	EBCDIC Code Page 273/1 8-bit Austrian German
I8EBCDIC280	EBCDIC Code Page 280/1 8-bit Italian
DK8EBCDIC277	EBCDIC Code Page 277/1 8-bit Danish
S8EBCDIC278	EBCDIC Code Page 278/1 8-bit Swedish
EE8EBCDIC870	EBCDIC Code Page 870 8-bit East European
CL8EBCDIC1025	EBCDIC Code Page 1025 8-bit Cyrillic
N8PC865	IBM-PC Code Page 865 8-bit Norwegian
F7SIEMENS9780X	Siemens 97801/97808 7-bit French
E7SIEMENS9780X	Siemens 97801/97808 7-bit Spanish
S7SIEMENS9780X	Siemens 97801/97808 7-bit Swedish
DK7SIEMENS9780X	Siemens 97801/97808 7-bit Danish
N7SIEMENS9780X	Siemens 97801/97808 7-bit Norwegian
I7SIEMENS9780X	Siemens 97801/97808 7-bit Italian
D7SIEMENS9780X	Siemens 97801/97808 7-bit German
WE8GCOS7	Bull EBCDIC GCOS7 8-bit West European
US8BS2000	Siemens 9750-62 EBCDIC 8-bit American
D8BS2000	Siemens 9750-62 EBCDIC 8-bit German
F8BS2000	Siemens 9750-62 EBCDIC 8-bit French
E8BS2000	Siemens 9750-62 EBCDIC 8-bit Spanish
DK8BS2000	Siemens 9750-62 EBCDIC 8-bit Danish
WE8BS2000	Siemens EBCDIC.DF.04 8-bit West European
CL8BS2000	Siemens EBCDIC.EHC.LC 8-bit Cyrillic
WE8BS2000L5	Siemens EBCDIC.DF.04.L5 8-bit West European/Turkish
WE8DG	DG 8-bit West European
WE8NCR4970	NCR 4970 8-bit West European
WE8ROMAN8	HP Roman8 8-bit West European
EE8MACCE	Mac Client 8-bit Central European
EE8MACCROATIAN	Mac Client 8-bit Croatian
TR8MACTURKISH	Mac Client 8-bit Turkish
IS8MACICELANDIC	Mac Client 8-bit Icelandic
EL8MACGREEK	Mac Client 8-bit Greek
US8ICL	ICL EBCDIC 8-bit American
WE8ICL	ICL EBCDIC 8-bit West European
WE8MACROMAN8	Mac Client 8-bit Extended Roman8 West European
WE8MACROMAN8S	Mac Server 8-bit Extended Roman8 West European
TH8MACTHAI	Mac Client 8-bit Latin/Thai
TH8MACTHAIS	Mac Server 8-bit Latin/Thai
HU8CWI2	Hungarian 8-bit CWI-2
TR8ISO8859P9	Turkish version ISO 8859-9 West European & Turkish
EL8PC437S	IBM-PC Code Page 437 8-bit (Greek modification)
EL8EBCDIC875	EBCDIC Code Page 875 8-bit Greek
EL8PC737	IBM-PC Code Page 737 8-bit Greek/Latin
LT8PC772	IBM-PC Code Page 772 8-bit Lithuanian (Latin/Cyrillic)
LT8PC774	IBM-PC Code Page 774 8-bit Lithuanian (Latin)
CDN8PC863	IBM-PC Code Page 863 8-bit Canadian French
AR8ASMO8X	ASMO Extended 708 8-bit Latin/Arabic
AR8NAFITHA711	Nafitha Enhanced 711 Server 8-bit Latin/Arabic
AR8SAKHR707	SAKHR 707 Server 8-bit Latin/Arabic
AR8MUSSAD768	Mussa'd Alarabi/2 768 Server 8-bit Latin/Arabic
AR8ADOS710	Arabic MS-DOS 710 Server 8-bit Latin/Arabic
AR8ADOS720	Arabic MS-DOS 720 Server 8-bit Latin/Arabic
AR8APTEC715	APTEC 715 Server 8-bit Latin/Arabic
AR8MSWIN1256	MS Windows Code Page 1256 8-bit Latin/Arabic
AR8NAFITHA721	Nafitha International 721 Server 8-bit Latin/Arabic
AR8SAKHR706	SAKHR 706 Server 8-bit Latin/Arabic
AR8ARABICMAC	Mac Client 8-bit Latin/Arabic
AR8ARABICMACS	Mac Server 8-bit Latin/Arabic
JA16VMS	JVMS 16-bit Japanese
JA16EUC	EUC 16-bit Japanese
JA16SJIS	Shift-JIS 16-bit Japanese
JA16DBCS	IBM DBCS 16-bit Japanese
JA16HP	HP 16-bit Japanese
JA16EBCDIC930	IBM DBCS Code Page 290 16-bit Japanese
JA16TOSHIBAEUC	Toshiba EUC 16-bit Japanese
KO16KSC5601	KSC5601 16-bit Korean
KO16DBCS	IBM DBCS 16-bit Korean
ZHS16CGB231280	CGB2312-80 16-bit Simplified Chinese
ZHT32EUC	EUC 32-bit Traditional Chinese
ZHT32SOPS	SOPS 32-bit Traditional Chinese
ZHT16DBT	Taiwan Taxation 16-bit Traditional Chinese
ZHT32TRIS	TRIS 32-bit Traditional Chinese
ZHT16BIG5	BIG5 16-bit Traditional Chinese
AL24UTFFSS	Unicode UTF-FSS
JA16TSTSET2	ASCII-based 16-bit Test Character Set
JA16TSTSET	Shift-sensitive ASCII-based Test Character Set

Table 4 - 2. Storage Character Sets
Arabic/Hebrew Display Character Sets
The following Arabic/Hebrew display character sets are supported in Oracle Server release 7.3: 

*Name*	*Description*
AR8ASMO708PLUS	ASMO 708 Plus 8-bit Latin/Arabic
AR7ASMO449PLUS	ASMO 449 Plus 7-bit Latin/Arabic
AR7AMEER	Ameer 7-bit Latin/Arabic
AR8XBASIC	XBASIC Right-to-Left Arabic Character Set
AR8NAFITHA711T	Nafitha Enhanced 711 Client 8-bit Latin/Arabic
AR8SAKHR707T	SAKHR 707 Client 8-bit Latin/Arabic
AR8MUSSAD768T	Mussa'd Alarabi/2 768 Client 8-bit Latin/Arabic
AR8ADOS710T	Arabic MS-DOS 710 Client 8-bit Latin/Arabic
AR8ADOS720T	Arabic MS-DOS 720 Client 8-bit Latin/Arabic
AR8APTEC715T	APTEC 715 Client 8-bit Latin/Arabic
AR8NAFITHA721T	Nafitha International 721 Client 8-bit Latin/Arabic
AR7SEDCOT	SEDCO/ESPRIT/DATA GENERAL 7-bit Latin/Arabic
AR8HPARABIC8T	HP ARABIC8 8-bit Latin/Arabic


_____________________________________________________________________
 
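Excerpted from itpub
Both AL16UTF16 and UTF8 can be chosen as the national character set.
AL16UTF16 is a fixed-width, two-byte Unicode character set.

UTF8 is a variable-width Unicode character set that uses one to three bytes per character.
European characters are stored in one or two bytes in UTF8 but always in two bytes in AL16UTF16, so UTF8 saves space for them.
Asian characters are stored in three bytes in UTF8, so they need more space than in AL16UTF16.

Because AL16UTF16 is a fixed-width encoding, it is processed faster than the variable-width UTF8.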
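The storage sizes mentioned above can be checked with a short Python sketch. Python's `utf-8` and `utf-16-be` codecs are used here as stand-ins for Oracle's UTF8 and AL16UTF16, which is an assumption that holds for the three sample characters chosen; `storage_bytes` is a hypothetical helper.

```python
# Sample characters: a Latin letter, an accented European letter, a Chinese character.
samples = {
    "Latin letter": "A",
    "accented European letter": "\u00e9",  # é
    "Chinese character": "\u4e2d",         # 中
}

def storage_bytes(ch):
    """Bytes needed for one character in each encoding (stand-ins for the Oracle names)."""
    return {
        "UTF8": len(ch.encode("utf-8")),
        "AL16UTF16": len(ch.encode("utf-16-be")),
    }

sizes = {label: storage_bytes(ch) for label, ch in samples.items()}

for label, size in sizes.items():
    print(f"{label}: UTF8={size['UTF8']} bytes, AL16UTF16={size['AL16UTF16']} bytes")
```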
 
Character Set Types

The CREATE DATABASE statement has the CHARACTER SET clause and the additional optional clause NATIONAL CHARACTER SET to declare the character set to be used as the database character set and the national character set. Neither character set can be changed after creating the database. If no NATIONAL CHARACTER SET clause is present, the national character set defaults to the database character set.

Because the database character set is used to identify and to hold SQL and PL/SQL source code, it must have either EBCDIC or 7-bit ASCII as a subset, whichever is native to the platform. Therefore, it is not possible to use a fixed-width, multibyte character set as the database character set, only as the national character set.

The data types NCHAR, NVARCHAR2, and NCLOB are provided to declare columns as variants of the basic types CHAR, VARCHAR2, and CLOB, to note that they are stored using the national character set and not the database character set.

• To declare a fixed-length character item that uses the national character set, use the data type specification NCHAR [(size)].
• To declare a variable-length character item that uses the national character set, use the data type specification NVARCHAR2 (size).
• To declare a character large object (CLOB) item containing fixed-width, multibyte characters that uses the national character set, use the data type specification NCLOB (size).

The database character set stores variable-width characters; the national character set stores both fixed-width and variable-width multibyte characters.
 
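Efficiency
The conclusions that follow from how UTF8 encodes characters are:
1. Each English letter or digit takes 1 byte.
2. Letters of other European languages and of Slavic alphabets take 2 bytes.
3. Each Chinese character takes 3 bytes.
UTF8 is therefore a very attractive scheme for English text but less economical for Chinese: a Chinese character takes only 2 bytes in either ANSI or Unicode/UCS2, but 3 bytes in UTF8.
The following statistics show the average number of bytes per character when a file is stored in UTF8:
1. Latin-script languages average 1.1 bytes.
2. Greek, Russian, Arabic, and Hebrew average 1.7 bytes.
3. Most other scripts, such as Chinese, Japanese, Korean, and Hindi, take about 3 bytes.
4. Only very rarely used characters and symbols take 4 bytes or more.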
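A rough way to reproduce this kind of per-script average is sketched below in Python. The sample strings and the `average_bytes_per_char` helper are hypothetical; the `gbk` codec stands in for the "ANSI" encoding on a Simplified Chinese system and `utf-16-be` for UCS2, both of which are assumptions.

```python
def average_bytes_per_char(text, encoding):
    """Average number of encoded bytes per character of `text`."""
    return len(text.encode(encoding)) / len(text)

english = "character set"
french = "caf\u00e9"                          # café
russian = "\u043d\u0430\u0431\u043e\u0440"    # набор
chinese = "\u5b57\u7b26\u96c6"                # 字符集

utf8_averages = {
    "english": average_bytes_per_char(english, "utf-8"),
    "french": average_bytes_per_char(french, "utf-8"),
    "russian": average_bytes_per_char(russian, "utf-8"),
    "chinese": average_bytes_per_char(chinese, "utf-8"),
}

# The same Chinese text in a double-byte "ANSI" code page and in UCS2.
chinese_gbk = average_bytes_per_char(chinese, "gbk")
chinese_ucs2 = average_bytes_per_char(chinese, "utf-16-be")

print(utf8_averages)
print("Chinese in GBK:", chinese_gbk, "Chinese in UCS2:", chinese_ucs2)
```

Real documents mix scripts, digits, and punctuation, so measured averages for a language land between these per-character values.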


2010-09-02 17:52