R13 adds Unicode support for binaries

   Posted: 2009-02-27   Last modified: 2009-02-27
1. The bitstring syntax has changed: Unicode data types have been added.

6.16 Bit Syntax Expressions
...
The types utf8, utf16, and utf32 specify encoding/decoding of the Unicode Transformation Formats UTF-8, UTF-16, and UTF-32, respectively.

When constructing a segment of a utf type, Value must be an integer in one of the ranges 0..16#D7FF, 16#E000..16#FFFD, or 16#10000..16#10FFFF (i.e. a valid Unicode code point). Construction will fail with a badarg exception if Value is outside the allowed ranges. The size of the resulting binary segment depends on the type and/or Value. For utf8, Value will be encoded in 1 through 4 bytes. For utf16, Value will be encoded in 2 or 4 bytes. Finally, for utf32, Value will always be encoded in 4 bytes.

When constructing, a literal string may be given followed by one of the UTF types, for example: <<"abc"/utf8>> which is syntactic sugar for <<$a/utf8,$b/utf8,$c/utf8>>.
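
To try out the construction rules quoted so far, here is a minimal sketch. The module name utf_build and its function names are made up for illustration; only the /utf8, /utf16 and /utf32 segment types come from the manual text. It assumes the default big-endian byte order for utf16 and utf32.

```erlang
-module(utf_build).
-export([encode/1, sizes/1, sugar/0, try_encode/1]).

%% Encode one code point with each of the three UTF segment types.
encode(CodePoint) ->
    {<<CodePoint/utf8>>, <<CodePoint/utf16>>, <<CodePoint/utf32>>}.

%% Byte sizes of the utf8, utf16 and utf32 encodings of a code point.
sizes(CodePoint) ->
    {U8, U16, U32} = encode(CodePoint),
    {byte_size(U8), byte_size(U16), byte_size(U32)}.

%% A string literal followed by a UTF type is shorthand for one
%% segment per character.
sugar() ->
    <<"abc"/utf8>> =:= <<$a/utf8, $b/utf8, $c/utf8>>.

%% Construction raises badarg for values outside the allowed ranges,
%% for example a surrogate such as 16#D800.
try_encode(CodePoint) ->
    try <<CodePoint/utf8>> of
        Bin -> {ok, Bin}
    catch
        error:badarg -> {error, badarg}
    end.
```

With this, utf_build:sizes($a) gives {1,2,4}, utf_build:sizes(16#4E2D) gives {3,2,4}, and utf_build:try_encode(16#D800) gives {error,badarg}.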

A successful match of a segment of a utf type results in an integer in one of the ranges 0..16#D7FF, 16#E000..16#FFFD, or 16#10000..16#10FFFF (i.e. a valid Unicode code point). The match will fail if returned value would fall outside those ranges.

A segment of type utf8 will match 1 to 4 bytes in the binary, if the binary at the match position contains a valid UTF-8 sequence. (See RFC-2279 or the Unicode standard.)

A segment of type utf16 may match 2 or 4 bytes in the binary. The match will fail if the binary at the match position does not contain a legal UTF-16 encoding of a Unicode code point. (See RFC-2781 or the Unicode standard.)

A segment of type utf32 may match 4 bytes in the binary in the same way as an integer segment matching 32 bits. The match will fail if the resulting integer is outside the legal ranges mentioned above.
....
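
The matching rules quoted above can be sketched the same way. Again, the module name utf_match and its functions are hypothetical; the sketch only relies on the utf8 and utf16 segment types in patterns, and assumes big-endian UTF-16 input.

```erlang
-module(utf_match).
-export([decode_utf8/1, first_utf16/1]).

%% Decode a UTF-8 binary into a list of code points by matching one
%% utf8 segment at a time. Returns {error, Rest} when the binary at the
%% current position is not a valid UTF-8 sequence.
decode_utf8(Bin) ->
    decode_utf8(Bin, []).

decode_utf8(<<>>, Acc) ->
    {ok, lists:reverse(Acc)};
decode_utf8(<<CodePoint/utf8, Rest/binary>>, Acc) ->
    decode_utf8(Rest, [CodePoint | Acc]);
decode_utf8(Invalid, _Acc) ->
    {error, Invalid}.

%% Match the first code point of a UTF-16 binary. The segment consumes
%% 2 or 4 bytes; a lone surrogate does not match.
first_utf16(<<CodePoint/utf16, Rest/binary>>) ->
    {CodePoint, byte_size(Rest)};
first_utf16(_) ->
    nomatch.
```

For example, utf_match:decode_utf8(<<228,184,173,97>>) gives {ok,[16#4E2D,97]}, while utf_match:first_utf16(<<16#D800:16>>) gives nomatch because a lone surrogate is not a legal UTF-16 encoding.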

2. New BIFs such as binary_to_atom and atom_to_binary have been added.
3. The re module also supports Unicode matching.
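
A small sketch of items 2 and 3. The module name unicode_bifs and its function names are invented for illustration; the calls themselves are atom_to_binary/2, binary_to_atom/2 and re:run/3 with the unicode option.

```erlang
-module(unicode_bifs).
-export([roundtrip/1, latin1_vs_utf8/0, first_char/1]).

%% Convert an atom to a UTF-8 binary and back again.
roundtrip(Atom) ->
    Bin = atom_to_binary(Atom, utf8),
    {Bin, binary_to_atom(Bin, utf8)}.

%% The same atom (the single character 16#E9) encoded as Latin-1
%% takes one byte, and as UTF-8 takes two.
latin1_vs_utf8() ->
    Atom = list_to_atom([16#E9]),
    {atom_to_binary(Atom, latin1), atom_to_binary(Atom, utf8)}.

%% With the unicode option "." matches a whole UTF-8 character;
%% without it, "." matches a single byte.
first_char(Utf8Bin) ->
    Opts = [{capture, first, binary}],
    {re:run(Utf8Bin, ".", [unicode | Opts]),
     re:run(Utf8Bin, ".", Opts)}.
```

For example, unicode_bifs:latin1_vs_utf8() gives {<<233>>,<<195,169>>}, and unicode_bifs:first_char(<<228,184,173>>) gives {{match,[<<228,184,173>>]},{match,[<<228>>]}}.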


For details, see EEP 10.


   Posted: 2009-02-27
The long-rumored FFI still hasn't been added!
   Posted: 2009-02-27
Haha, that's a lot more convenient. Looking forward to it.
   Posted: 2009-02-27
The console behaves the same as ever: it still doesn't display Chinese characters.
   Posted: 2009-04-15
mryufeng wrote:
The console behaves the same as ever: it still doesn't display Chinese characters.

What a pity.