Classes and functions for handling and transcoding between various encodings.
For cases where the encoding is known at compile-time, functions are provided for arbitrary encoding and decoding of characters, arbitrary transcoding between strings of different type, as well as validation and sanitization.
Encodings currently supported are UTF-8, UTF-16, UTF-32, ASCII, ISO-8859-1 (also known as LATIN-1), and WINDOWS-1252.
* The type AsciiChar represents an ASCII character.
* The type AsciiString represents an ASCII string.
* The type Latin1Char represents an ISO-8859-1 character.
* The type Latin1String represents an ISO-8859-1 string.
* The type Windows1252Char represents a Windows-1252 character.
* The type Windows1252String represents a Windows-1252 string.
For cases where the encoding is not known at compile time but is known at run time, the module provides the abstract class EncodingScheme and its subclasses. To construct a run-time encoder/decoder, write, for example:
auto e = EncodingScheme.create("utf-8");
This library supplies EncodingScheme subclasses for ASCII, ISO-8859-1 (also known as LATIN-1), WINDOWS-1252, UTF-8, and (on little-endian architectures) UTF-16LE and UTF-32LE; or (on big-endian architectures) UTF-16BE and UTF-32BE.
This library provides a mechanism whereby other modules may add EncodingScheme subclasses for any other encoding.
Authors:
Janice Caron
Date:
2008.02.27 - 2008.05.07
License:
Public Domain
dchar INVALID_SEQUENCE;
Special value returned by safeDecode
typedef AsciiChar;
alias AsciiString;
Defines various character sets.
typedef Latin1Char;
Defines a Latin1-encoded character.
alias Latin1String;
Defines a Latin1-encoded string (as an array of invariant(Latin1Char)).
typedef Windows1252Char;
Defines a Windows1252-encoded character.
alias Windows1252String;
Defines a Windows1252-encoded string (as an array of invariant(Windows1252Char)).
bool isValidCodePoint(dchar c);
Returns true if c is a valid code point
Note that this includes the non-character code points U+FFFE and U+FFFF, since these are valid code points (even though they are not valid characters).
Supersedes:
This function supersedes std.utf.startsValidDchar().
Standards:
Unicode 5.0, ASCII, ISO-8859-1, WINDOWS-1252
Parameters:
dchar c the code point to be tested
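As a small illustration of the distinction drawn above between code points and characters, the following sketch assumes a current Phobos release; the surrogate and out-of-range cases are not stated in the text above and reflect how Phobos behaves today.

```d
import std.encoding;

void main()
{
    // An ordinary character is a valid code point.
    assert(isValidCodePoint('A'));

    // U+FFFF is a non-character, but it is still a valid code point.
    assert(isValidCodePoint(cast(dchar) 0xFFFF));

    // Assumption based on current Phobos: surrogates and values
    // beyond U+10FFFF are rejected.
    assert(!isValidCodePoint(cast(dchar) 0xD800));
    assert(!isValidCodePoint(cast(dchar) 0x110000));
}
```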
string encodingName(T)();
Returns the name of an encoding.
The type of encoding cannot be deduced. Therefore, it is necessary to explicitly specify the encoding type.
Standards:
Unicode 5.0, ASCII, ISO-8859-1, WINDOWS-1252
Examples:
assert(encodingName!(Latin1Char) == "ISO-8859-1");
bool canEncode(E)(dchar c);
Returns true if and only if it is possible to represent the specified code point in the encoding.
The type of encoding cannot be deduced. Therefore, it is necessary to explicitly specify the encoding type.
Standards:
Unicode 5.0, ASCII, ISO-8859-1, WINDOWS-1252
Examples:
assert(canEncode!(Latin1Char)('A'));
bool isValidCodeUnit(E)(E c);
Returns true if the code unit is legal. For example, the byte 0x80 would not be legal in ASCII, because ASCII code units must always be in the range 0x00 to 0x7F.
Standards:
Unicode 5.0, ASCII, ISO-8859-1, WINDOWS-1252
Parameters:
c the code unit to be tested
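The ASCII case described above can be checked directly. This is a minimal sketch; the casts are only there to select the encoding, since the encoding is deduced from the argument type.

```d
import std.encoding;

void main()
{
    // 0x7F is the highest legal ASCII code unit.
    assert(isValidCodeUnit(cast(AsciiChar) 0x7F));

    // 0x80 is outside the ASCII range, so it is rejected.
    assert(!isValidCodeUnit(cast(AsciiChar) 0x80));

    // The same numeric value is a legal ISO-8859-1 code unit.
    assert(isValidCodeUnit(cast(Latin1Char) 0x80));
}
```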
bool isValid(E)(const(E)[] s);
Returns true if the string is encoded correctly
Supersedes:
This function supersedes std.utf.validate(). Note, however, that this function returns a bool indicating whether the input was valid, whereas the older function would throw an exception.
Standards:
Unicode 5.0, ASCII, ISO-8859-1, WINDOWS-1252
Parameters:
s the string to be tested
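A short sketch of the bool-returning behaviour, assuming UTF-8 input held in a string. The byte 0xFF is used as the invalid input because it can never appear in well-formed UTF-8.

```d
import std.encoding;

void main()
{
    // A well-formed UTF-8 string (EURO SIGN followed by digits).
    assert(isValid("\u20AC100"));

    // 0xFF never occurs in well-formed UTF-8, so this returns false
    // rather than throwing.
    assert(!isValid("abc\xFF"));
}
```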
uint validLength(E)(const(E)[] s);
Returns the length of the longest possible substring, starting from the first code unit, which is validly encoded.
Standards:
Unicode 5.0, ASCII, ISO-8859-1, WINDOWS-1252
Parameters:
s the string to be tested
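One way to use the result is to slice off the valid prefix of a partly corrupted string. The sketch below assumes UTF-8 input and a current Phobos release, where the result is a size_t rather than the uint shown in the signature above.

```d
import std.encoding;

void main()
{
    // "abc" is valid; the 0xFF byte that follows is not.
    string s = "abc\xFFdef";

    auto n = validLength(s);
    assert(n == 3);

    // The prefix up to n is safe to pass to functions that
    // require validly encoded input.
    assert(s[0 .. n] == "abc");
    assert(isValid(s[0 .. n]));
}
```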
immutable(E)[] sanitize(E)(immutable(E)[] s);
Sanitizes a string by replacing malformed code unit sequences with valid code unit sequences. The result is guaranteed to be valid for this encoding.
If the input string is already valid, this function returns the original; otherwise it constructs a new string by replacing all illegal code unit sequences with the encoding's replacement character. Invalid sequences are replaced with the Unicode replacement character (U+FFFD) if the character repertoire contains it, and with '?' otherwise.
Standards:
Unicode 5.0, ASCII, ISO-8859-1, WINDOWS-1252
Parameters:
s the string to be sanitized
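Both branches described above can be seen in a few lines. This sketch assumes UTF-8, whose repertoire contains U+FFFD (encoded as the bytes EF BF BD).

```d
import std.encoding;

void main()
{
    // The malformed sequence F0 80 is replaced by U+FFFD (EF BF BD in UTF-8).
    assert(sanitize("hello \xF0\x80world") == "hello \xEF\xBF\xBDworld");

    // Already valid input is returned as-is, without copying.
    string ok = "hello world";
    assert(sanitize(ok) is ok);
}
```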
uint firstSequence(E)(const(E)[] s);
Returns the length of the first encoded sequence.
The input to this function MUST be validly encoded. This is enforced by the function's in-contract.
Standards:
Unicode 5.0, ASCII, ISO-8859-1, WINDOWS-1252
Parameters:
s the string to be sliced
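For instance, with UTF-8 input the first sequence is three code units long when the string starts with U+20AC and one code unit long when it starts with an ASCII letter. A minimal sketch:

```d
import std.encoding;

void main()
{
    // U+20AC (EURO SIGN) occupies three UTF-8 code units.
    assert(firstSequence("\u20AC1000") == 3);

    // An ASCII letter occupies one.
    assert(firstSequence("hel") == 1);
}
```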
uint lastSequence(E)(const(E)[] s);
Returns the length of the last encoded sequence.
The input to this function MUST be validly encoded. This is enforced by the function's in-contract.
Standards:
Unicode 5.0, ASCII, ISO-8859-1, WINDOWS-1252
Parameters:
s the string to be sliced
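The mirror image of the previous function, sketched here for UTF-8 input:

```d
import std.encoding;

void main()
{
    // The string ends with U+20AC, which is three UTF-8 code units long.
    assert(lastSequence("1000\u20AC") == 3);

    // The string ends with an ASCII digit, which is one code unit long.
    assert(lastSequence("\u20AC1000") == 1);
}
```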
uint count(E)(const(E)[] s);
Returns the total number of code points encoded in a string.
The input to this function MUST be validly encoded. This is enforced by the function's in-contract.
Supersedes:
This function supersedes std.utf.toUCSindex().
Standards:
Unicode 5.0, ASCII, ISO-8859-1, WINDOWS-1252
Parameters:
s the string to be counted
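The difference between code points and code units shows up as soon as the string contains a non-ASCII character. A minimal sketch, assuming UTF-8 input:

```d
import std.encoding;

void main()
{
    string s = "\u20AC100";

    // Four code points: the EURO SIGN and three digits.
    assert(count(s) == 4);

    // Six code units, because the EURO SIGN takes three bytes in UTF-8.
    assert(s.length == 6);
}
```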
int index(E)(const(E)[] s, int n);
Returns the array index at which the (n+1)th code point begins.
The input to this function MUST be validly encoded. This is enforced by the function's in-contract.
Supersedes:
This function supersedes std.utf.toUTFindex().
Standards:
Unicode 5.0, ASCII, ISO-8859-1, WINDOWS-1252
Parameters:
s the string to be indexed
n the zero-based position of the code point to locate
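A typical use is to convert a code point position into an offset suitable for slicing. The sketch below assumes UTF-8 input:

```d
import std.encoding;

void main()
{
    string s = "\u20AC100";

    // The first code point always begins at offset 0.
    assert(index(s, 0) == 0);

    // The second code point begins after the three-byte EURO SIGN.
    assert(index(s, 1) == 3);

    // The offset can be used to slice on a code point boundary.
    assert(s[index(s, 1) .. $] == "100");
}
```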
dchar decode(S)(ref S s);
Decodes a single code point.
This function removes one or more code units from the start of a string, and returns the decoded code point which those code units represent.
The input to this function MUST be validly encoded. This is enforced by the function's in-contract.
Supersedes:
This function supersedes std.utf.decode(). Note, however, that the function codePoints() supersedes it more conveniently.
Standards:
Unicode 5.0, ASCII, ISO-8859-1, WINDOWS-1252
Parameters:
s the string whose first code point is to be decoded
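Because the string is taken by reference, each call shortens it. A minimal sketch with UTF-8 input (only std.encoding is imported, so there is no clash with std.utf.decode):

```d
import std.encoding;

void main()
{
    string s = "\u20AC100";

    // The three code units of the EURO SIGN are removed from the front...
    dchar c = decode(s);
    assert(c == '\u20AC');

    // ...leaving the rest of the string behind.
    assert(s == "100");
}
```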
dchar decodeReverse(E)(ref const(E)[] s);
Decodes a single code point from the end of a string.
This function removes one or more code units from the end of a string, and returns the decoded code point which those code units represent.
The input to this function MUST be validly encoded. This is enforced by the function's in-contract.
Standards:
Unicode 5.0, ASCII, ISO-8859-1, WINDOWS-1252
Parameters:
s the string whose last code point is to be decoded
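The sketch below shows the string shrinking from the end. Note that the parameter is declared as ref const(E)[], so the variable is declared as const(char)[] here rather than string.

```d
import std.encoding;

void main()
{
    // Declared as const(char)[] to match the ref const(E)[] parameter.
    const(char)[] s = "100\u20AC";

    // The three code units of the EURO SIGN are removed from the end.
    dchar c = decodeReverse(s);
    assert(c == '\u20AC');
    assert(s == "100");
}
```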
dchar safeDecode(S)(ref S s);
Decodes a single code point. The input does not have to be valid.
This function removes one or more code units from the start of a string, and returns the decoded code point which those code units represent.
This function will accept an invalidly encoded string as input. If an invalid sequence is found at the start of the string, this function will remove it, and return the value INVALID_SEQUENCE.
Standards:
Unicode 5.0, ASCII, ISO-8859-1, WINDOWS-1252
Parameters:
s the string whose first code point is to be decoded
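A sketch of how invalid input is reported and skipped, assuming UTF-8 input in which the stray byte 0xFF stands in for corrupted data:

```d
import std.encoding;

void main()
{
    string s = "\xFFabc";

    // The invalid byte is removed and reported via INVALID_SEQUENCE.
    assert(safeDecode(s) == INVALID_SEQUENCE);
    assert(s == "abc");

    // Decoding then continues normally with the remaining input.
    assert(safeDecode(s) == 'a');
    assert(s == "bc");
}
```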
uint encodedLength(E)(dchar c);
Returns the number of code units required to encode a single code point.
The input to this function MUST be a valid code point. This is enforced by the function's in-contract.
The type of the output cannot be deduced. Therefore, it is necessary to explicitly specify the encoding as a template parameter.
Standards:
Unicode 5.0, ASCII, ISO-8859-1, WINDOWS-1252
Parameters:
c the code point to be encoded
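The encoding is chosen through the template argument, as this small sketch shows for the three UTF encodings:

```d
import std.encoding;

void main()
{
    // UTF-8: one code unit for ASCII, three for U+20AC.
    assert(encodedLength!(char)('A') == 1);
    assert(encodedLength!(char)('\u20AC') == 3);

    // UTF-16: code points above U+FFFF need a surrogate pair.
    assert(encodedLength!(wchar)('\U0010FFF0') == 2);

    // UTF-32: always a single code unit.
    assert(encodedLength!(dchar)('\U0010FFF0') == 1);
}
```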
E[] encode(E)(dchar c);
Encodes a single code point.
This function encodes a single code point into one or more code units. It returns a string containing those code units.
The input to this function MUST be a valid code point. This is enforced by the function's in-contract.
The type of the output cannot be deduced. Therefore, it is necessary to explicitly specify the encoding as a template parameter.
Supersedes:
This function supersedes std.utf.encode(). Note, however, that the function codeUnits() supersedes it more conveniently.
Standards:
Unicode 5.0, ASCII, ISO-8859-1, WINDOWS-1252
Parameters:
c the code point to be encoded
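A minimal sketch of the returned array for two target encodings. The expected UTF-8 bytes are the same E2 82 AC sequence that the codeUnits() example in this document prints.

```d
import std.encoding;

void main()
{
    // UTF-8: U+20AC becomes the three code units E2 82 AC.
    char[] utf8 = encode!(char)('\u20AC');
    assert(utf8 == "\xE2\x82\xAC");

    // UTF-16: the same code point fits in a single code unit.
    wchar[] utf16 = encode!(wchar)('\u20AC');
    assert(utf16.length == 1);
    assert(utf16[0] == 0x20AC);
}
```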
uint encode(E)(dchar c, E[] array);
Encodes a single code point into an array.
This function encodes a single code point into one or more code units. The code units are stored in a user-supplied fixed-size array, which must be passed by reference.
The input to this function MUST be a valid code point. This is enforced by the function's in-contract.
The type of the output cannot be deduced. Therefore, it is necessary to explicitly specify the encoding as a template parameter.
Supersedes:
This function supersedes std.utf.encode(). Note, however, that the function codeUnits() supersedes it more conveniently.
Standards:
Unicode 5.0, ASCII, ISO-8859-1, WINDOWS-1252
Parameters:
c the code point to be encoded
array the destination array
Returns:
the number of code units written to the array
uint encode(E, R)(dchar c, R range);
Encodes c in units of type E and writes the result to the output range R. Returns the number of Es written.
void encode(E)(dchar c, void delegate(E) dg);
Encodes a single code point to a delegate.
This function encodes a single code point into one or more code units. The code units are passed one at a time to the supplied delegate.
The input to this function MUST be a valid code point. This is enforced by the function's in-contract.
The type of the output cannot be deduced. Therefore, it is necessary to explicitly specify the encoding as a template parameter.
Supersedes:
This function supersedes std.utf.encode(). Note, however, that the function codeUnits() supersedes it more conveniently.
Standards:
Unicode 5.0, ASCII, ISO-8859-1, WINDOWS-1252
Parameters:
c the code point to be encoded
CodePoints!(E) codePoints(E)(immutable(E)[] s);
Returns a foreachable struct which can bidirectionally iterate over all code points in a string.
The input to this function MUST be validly encoded. This is enforced by the function's in-contract.
You can foreach either with or without an index. If an index is specified, it will be initialized at each iteration with the offset into the string at which the code point begins.
Supersedes:
This function supersedes std.utf.decode().
Standards:
Unicode 5.0, ASCII, ISO-8859-1, WINDOWS-1252
Parameters:
s the string to be decoded
Examples:
string s = "hello world";
foreach (c; codePoints(s))
{
    // do something with c (which will always be a dchar)
}
Note that, currently, foreach(c;codePoints(s)) is superior to foreach(c;s) in that the latter will fall over on encountering U+FFFF.
CodeUnits!(E) codeUnits(E)(dchar c);
Returns a foreachable struct which can bidirectionally iterate over all code units in a code point.
The input to this function MUST be a valid code point. This is enforced by the function's in-contract.
The type of the output cannot be deduced. Therefore, it is necessary to explicitly specify the encoding type in the template parameter.
Supersedes:
This function supersedes std.utf.encode().
Standards:
Unicode 5.0, ASCII, ISO-8859-1, WINDOWS-1252
Parameters:
c the code point to be encoded
Examples:
dchar d = '\u20AC';
foreach (c; codeUnits!(char)(d))
{
    writefln("%X", c);
}
// will print
// E2
// 82
// AC
uint encode(Tgt, Src, R)(in Src[] s, R range);
Encodes the contents of s in units of type Tgt and writes the result to the output range R. Returns the number of Tgt elements written.
void transcode(Src, Dst)(immutable(Src)[] s, out immutable(Dst)[] r);
Convert a string from one encoding to another. (See also to!() below).
The input to this function MUST be validly encoded. This is enforced by the function's in-contract.
Supersedes:
This function supersedes std.utf.toUTF8(), std.utf.toUTF16() and std.utf.toUTF32() (but note that to!() supersedes it more conveniently).
Standards:
Unicode 5.0, ASCII, ISO-8859-1, WINDOWS-1252
Parameters:
s the source string
r the destination string
Examples:
wstring ws;
transcode("hello world",ws);
// transcode from UTF-8 to UTF-16
Latin1String ls;
transcode(ws, ls);
// transcode from UTF-16 to ISO-8859-1
class EncodingException: object.Exception;
The base class for exceptions thrown by this module
abstract class EncodingScheme;
Abstract base class of all encoding schemes
static void register(string className);
Registers a subclass of EncodingScheme.
This function allows user-defined subclasses of EncodingScheme to be declared in other modules.
Examples:
class Amiga1251 : EncodingScheme
{
    static this()
    {
        EncodingScheme.register("path.to.Amiga1251");
    }
}
static EncodingScheme create(string encodingName);
Obtains a subclass of EncodingScheme which is capable of encoding and decoding the named encoding scheme.
This function is only aware of EncodingSchemes which have been registered with the register() function.
Examples:
auto scheme = EncodingScheme.create("Amiga-1251");
abstract const string toString();
Returns the standard name of the encoding scheme
abstract const immutable(char)[][] names();
Returns an array of all known names for this encoding scheme
abstract const bool canEncode(dchar c);
Returns true if the character c can be represented in this encoding scheme.
abstract const uint encodedLength(dchar c);
Returns the number of ubytes required to encode this code point.
The input to this function MUST be a valid code point.
Parameters:
dchar c the code point to be encoded
Returns:
the number of ubytes required.
abstract const uint encode(dchar c, ubyte[] buffer);
Encodes a single code point into a user-supplied, fixed-size buffer.
This function encodes a single code point into one or more ubytes. The supplied buffer must be code unit aligned. (For example, UTF-16LE or UTF-16BE must be wchar-aligned, UTF-32LE or UTF-32BE must be dchar-aligned, etc.)
The input to this function MUST be a valid code point.
Parameters:
dchar c the code point to be encoded
Returns:
the number of ubytes written.
abstract const dchar decode(ref const(ubyte)[] s);
Decodes a single code point.
This function removes one or more ubytes from the start of an array, and returns the decoded code point which those ubytes represent.
The input to this function MUST be validly encoded.
Parameters:
const(ubyte)[] s the array whose first code point is to be decoded
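The run-time interface works on ubyte arrays rather than typed strings. The following sketch round-trips one code point through the UTF-8 scheme, assuming a current Phobos release (where the lengths are size_t rather than uint); the variable names are illustrative only.

```d
import std.encoding;

void main()
{
    // Look the scheme up by name at run time.
    auto e = EncodingScheme.create("utf-8");
    assert(e.canEncode('\u20AC'));
    assert(e.encodedLength('\u20AC') == 3);

    // Encode into a caller-supplied buffer.
    ubyte[4] buffer;
    auto n = e.encode('\u20AC', buffer[]);
    assert(n == 3);

    ubyte[3] expected = [0xE2, 0x82, 0xAC];
    assert(buffer[0 .. n] == expected[]);

    // Decode again; the slice is advanced past the consumed bytes.
    const(ubyte)[] bytes = buffer[0 .. n];
    assert(e.decode(bytes) == '\u20AC');
    assert(bytes.length == 0);
}
```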
abstract const dchar safeDecode(ref const(ubyte)[] s);
Decodes a single code point. The input does not have to be valid.
This function removes one or more ubytes from the start of an array, and returns the decoded code point which those ubytes represent.
This function will accept an invalidly encoded array as input. If an invalid sequence is found at the start of the string, this function will remove it, and return the value INVALID_SEQUENCE.
Parameters:
const(ubyte)[] s the array whose first code point is to be decoded
abstract const immutable(ubyte)[] replacementSequence();
Returns the sequence of ubytes to be used to represent any character which cannot be represented in the encoding scheme.
Normally this will be a representation of some substitution character, such as U+FFFD or '?'.
bool isValid(const(ubyte)[] s);
Returns true if the array is encoded correctly
Parameters:
const(ubyte)[] s the array to be tested
uint validLength(const(ubyte)[] s);
Returns the length of the longest possible substring, starting from the first element, which is validly encoded.
Parameters:
const(ubyte)[] s the array to be tested
immutable(ubyte)[] sanitize(immutable(ubyte)[] s);
Sanitizes an array by replacing malformed ubyte sequences with valid ubyte sequences. The result is guaranteed to be valid for this encoding scheme.
If the input array is already valid, this function returns the original, otherwise it constructs a new array by replacing all illegal sequences with the encoding scheme's replacement sequence.
Parameters:
immutable(ubyte)[] s the string to be sanitized
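As a sketch of the run-time counterpart of sanitize, the example below cleans a three-byte array with the ASCII scheme. It compares against replacementSequence() rather than a hard-coded byte, so it does not depend on which substitution character the scheme uses.

```d
import std.encoding;

void main()
{
    auto e = EncodingScheme.create("ascii");

    // 'A', a byte outside the ASCII range, then 'B'.
    immutable(ubyte)[] raw = [0x41, 0x80, 0x42];
    assert(!e.isValid(raw));
    assert(e.validLength(raw) == 1);

    // The illegal byte is replaced by the scheme's replacement sequence.
    auto clean = e.sanitize(raw);
    assert(clean == raw[0 .. 1] ~ e.replacementSequence ~ raw[2 .. 3]);
    assert(e.isValid(clean));
}
```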
uint firstSequence(const(ubyte)[] s);
Returns the length of the first encoded sequence.
The input to this function MUST be validly encoded. This is enforced by the function's in-contract.
Parameters:
const(ubyte)[] s the array to be sliced
uint count(const(ubyte)[] s);
Returns the total number of code points encoded in a ubyte array.
The input to this function MUST be validly encoded. This is enforced by the function's in-contract.
Parameters:
const(ubyte)[] s the string to be counted
int index(const(ubyte)[] s, int n);
Returns the array index at which the (n+1)th code point begins.
The input to this function MUST be validly encoded. This is enforced by the function's in-contract.
Parameters:
const(ubyte)[] s the string to be counted
class EncodingSchemeASCII: std.encoding.EncodingScheme;
EncodingScheme to handle ASCII
This scheme recognises the following names: "ANSI_X3.4-1968", "ANSI_X3.4-1986", "ASCII", "IBM367", "ISO646-US", "ISO_646.irv:1991", "US-ASCII", "cp367", "csASCII", "iso-ir-6", "us"
class EncodingSchemeLatin1: std.encoding.EncodingScheme;
EncodingScheme to handle Latin-1
This scheme recognises the following names: "CP819", "IBM819", "ISO-8859-1", "ISO_8859-1", "ISO_8859-1:1987", "csISOLatin1", "iso-ir-100", "l1", "latin1"
class EncodingSchemeWindows1252: std.encoding.EncodingScheme;
EncodingScheme to handle Windows-1252
This scheme recognises the following names: "windows-1252"
class EncodingSchemeUtf8: std.encoding.EncodingScheme;
EncodingScheme to handle UTF-8
This scheme recognises the following names: "UTF-8"
class EncodingSchemeUtf16Native: std.encoding.EncodingScheme;
EncodingScheme to handle UTF-16 in native byte order
This scheme recognises the following names: "UTF-16LE" (little-endian architecture only), "UTF-16BE" (big-endian architecture only)
class EncodingSchemeUtf32Native: std.encoding.EncodingScheme;
EncodingScheme to handle UTF-32 in native byte order
This scheme recognises the following names: "UTF-32LE" (little-endian architecture only), "UTF-32BE" (big-endian architecture only)