leonzhx

Encoding

 

1.   Varints are a method of serializing integers using one or more bytes; smaller numbers take fewer bytes. Every byte in a varint except the last has its most significant bit (msb) set, which indicates that further bytes follow. The lower 7 bits of each byte store the two's complement representation of the number in groups of 7 bits, least significant group first. For example, 1 --> 00000001 and 300 --> 10101100 00000010.
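
The byte layout described above can be sketched in a few lines of C++. This is a minimal illustration, not the protobuf library's own code; the names encode_varint and decode_varint are made up for this example.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Encode an unsigned 64-bit value as a base-128 varint (hypothetical helper).
std::vector<std::uint8_t> encode_varint(std::uint64_t value) {
    std::vector<std::uint8_t> out;
    while (value >= 0x80) {
        // Emit the low 7 bits with the continuation bit (msb) set.
        out.push_back(static_cast<std::uint8_t>((value & 0x7F) | 0x80));
        value >>= 7;
    }
    // Last byte: msb clear, which marks the end of the varint.
    out.push_back(static_cast<std::uint8_t>(value));
    return out;
}

// Decode a varint starting at bytes[pos] and advance pos past it.
// Returns false if the input ends early or the varint is longer than ten bytes.
bool decode_varint(const std::vector<std::uint8_t>& bytes, std::size_t& pos,
                   std::uint64_t& value) {
    value = 0;
    for (int shift = 0; shift < 70 && pos < bytes.size(); shift += 7) {
        const std::uint8_t b = bytes[pos++];
        // Each byte contributes 7 payload bits, least significant group first.
        value |= static_cast<std::uint64_t>(b & 0x7F) << shift;
        if ((b & 0x80) == 0) return true;  // msb clear: this was the last byte
    }
    return false;
}
```

With these helpers, encode_varint(300) yields the two bytes 0xAC 0x02, which is the 10101100 00000010 pattern from the note.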

 

2.   The binary version of a message just uses the field's number as the key – the name and declared type for each field can only be determined on the decoding end by referencing the message type's definition. When a message is encoded, the keys and values are concatenated into a byte stream.

 

3.   When the message is being decoded, the parser needs to be able to skip fields that it doesn't recognize. The "key" for each pair in a wire-format message is actually two values – the field number from your .proto file, plus a wire type that provides just enough information to find the length of the following value:

| Type | Meaning | Used For |
| --- | --- | --- |
| 0 | Varint | int32, int64, uint32, uint64, sint32, sint64, bool, enum |
| 1 | 64-bit | fixed64, sfixed64, double |
| 2 | Length-delimited | string, bytes, embedded messages, packed repeated fields |
| 3 | Start group | groups (deprecated) |
| 4 | End group | groups (deprecated) |
| 5 | 32-bit | fixed32, sfixed32, float |

Each key in the streamed message is a varint with the value (field_number << 3) | wire_type – in other words, the last three bits of the number store the wire type.
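
As a rough sketch of the key layout, the following C++ packs and unpacks a key using the (field_number << 3) | wire_type formula. The enum and function names are assumptions made for this example and do not come from the protobuf library.

```cpp
#include <cstdint>

// Wire type values from the table above (enumerator names are illustrative).
enum WireType : std::uint32_t {
    kVarint = 0,
    kFixed64 = 1,
    kLengthDelimited = 2,
    kStartGroup = 3,
    kEndGroup = 4,
    kFixed32 = 5
};

// Combine a field number and a wire type into the key value.
// The result is then written to the stream as a varint.
std::uint64_t make_key(std::uint32_t field_number, WireType wire_type) {
    return (static_cast<std::uint64_t>(field_number) << 3) | wire_type;
}

// The field number is everything above the lowest three bits.
std::uint32_t key_field_number(std::uint64_t key) {
    return static_cast<std::uint32_t>(key >> 3);
}

// The wire type lives in the lowest three bits.
std::uint32_t key_wire_type(std::uint64_t key) {
    return static_cast<std::uint32_t>(key & 0x7);
}
```

For instance, field 1 with wire type 0 gives the key 0x08, and field 2 with wire type 2 gives 0x12.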

 

4.   There is an important difference between the signed int types (sint32 and sint64) and the "standard" int types (int32 and int64) when it comes to encoding negative numbers. If you use int32 or int64 as the type for a negative number, the resulting varint is always ten bytes long, because the value is sign-extended to 64 bits and is, effectively, treated like a very large unsigned integer. If you use one of the signed types, the resulting varint uses ZigZag encoding, which is much more efficient.
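
The ten-byte cost can be checked with a small size calculation. The sketch below assumes the value is sign-extended to 64 bits before being written as a varint; varint_size and int32_wire_size are hypothetical names used only here.

```cpp
#include <cstddef>
#include <cstdint>

// Number of bytes a value occupies as a base-128 varint (7 payload bits per byte).
std::size_t varint_size(std::uint64_t value) {
    std::size_t n = 1;
    while (value >= 0x80) {
        value >>= 7;
        ++n;
    }
    return n;
}

// Size of an int32 value on the wire: sign-extend to 64 bits, then
// reinterpret as unsigned, so every negative number has its top bit set.
std::size_t int32_wire_size(std::int32_t v) {
    return varint_size(static_cast<std::uint64_t>(static_cast<std::int64_t>(v)));
}
```

A negative value has all of its upper bits set after sign extension, so it needs all 64 bits, and 64 bits at 7 bits per byte round up to ten bytes.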


5.   ZigZag encoding maps signed integers to unsigned integers so that numbers with a small absolute value (for instance, -1) have a small varint encoded value too. It does this in a way that "zig-zags" back and forth through the positive and negative integers, so that -1 is encoded as 1, 1 is encoded as 2, -2 is encoded as 3, and so on.
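
A minimal sketch of the 32-bit ZigZag mapping follows. The function names are illustrative, and the bit manipulation is done on unsigned values so that the shifts are well defined in C++.

```cpp
#include <cstdint>

// Map a signed 32-bit integer to an unsigned one: 0, -1, 1, -2, 2, ...
// become 0, 1, 2, 3, 4, ...
std::uint32_t zigzag_encode32(std::int32_t n) {
    // Shift the magnitude bits left by one, then flip all bits for negatives.
    const std::uint32_t sign_mask = (n < 0) ? 0xFFFFFFFFu : 0u;
    return (static_cast<std::uint32_t>(n) << 1) ^ sign_mask;
}

// Inverse mapping: the lowest bit carries the sign.
std::int32_t zigzag_decode32(std::uint32_t z) {
    // (0u - (z & 1)) is all ones when the sign bit is set, zero otherwise.
    return static_cast<std::int32_t>((z >> 1) ^ (0u - (z & 1u)));
}
```

The encoded value is then written as an ordinary varint, so -1 costs one byte instead of ten.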

 

6.   Non-varint numeric types are stored in little-endian byte order.
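
To illustrate the byte order, here is a sketch that appends fixed-width values least significant byte first, independent of the host's endianness. The helper names are hypothetical, and the float case assumes the platform uses 32-bit IEEE 754 floats.

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

// Append a 32-bit value in little-endian order (lowest byte first).
void append_fixed32(std::vector<std::uint8_t>& out, std::uint32_t v) {
    for (int i = 0; i < 4; ++i) {
        out.push_back(static_cast<std::uint8_t>(v >> (8 * i)));
    }
}

// Append a 64-bit value in little-endian order.
void append_fixed64(std::vector<std::uint8_t>& out, std::uint64_t v) {
    for (int i = 0; i < 8; ++i) {
        out.push_back(static_cast<std::uint8_t>(v >> (8 * i)));
    }
}

static_assert(sizeof(float) == sizeof(std::uint32_t),
              "this sketch assumes a 32-bit float");

// Append a float by copying its bit pattern into an integer first.
void append_float(std::vector<std::uint8_t>& out, float f) {
    std::uint32_t bits = 0;
    std::memcpy(&bits, &f, sizeof bits);
    append_fixed32(out, bits);
}
```

Shifting the value rather than copying raw memory keeps the output identical on big-endian and little-endian machines.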

 

7.   A wire type of 2 (length-delimited) means that the value is a varint-encoded length followed by that many bytes of data. On the wire, the key (field number and wire type) comes first, then the length, then the payload bytes.
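
As an illustration, the sketch below encodes a string field as key, length, and payload. The helper names append_varint and encode_string_field are assumptions made for this example, not library functions.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Append a base-128 varint to the buffer (hypothetical helper).
void append_varint(std::vector<std::uint8_t>& out, std::uint64_t value) {
    while (value >= 0x80) {
        out.push_back(static_cast<std::uint8_t>((value & 0x7F) | 0x80));
        value >>= 7;
    }
    out.push_back(static_cast<std::uint8_t>(value));
}

// Encode one string field: key (wire type 2), byte length, then the raw bytes.
std::vector<std::uint8_t> encode_string_field(std::uint32_t field_number,
                                              const std::string& s) {
    std::vector<std::uint8_t> out;
    append_varint(out, (static_cast<std::uint64_t>(field_number) << 3) | 2);
    append_varint(out, s.size());
    out.insert(out.end(), s.begin(), s.end());
    return out;
}
```

For field number 2 holding "testing", this produces 12 07 74 65 73 74 69 6e 67 in hexadecimal: the key 0x12, the length 7, then the seven ASCII bytes.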

 

8.   If your message definition has repeated elements (without the [packed=true] option), the encoded message has zero or more key-value pairs with the same tag number. These repeated values do not have to appear consecutively; they may be interleaved with other fields. The order of the elements with respect to each other is preserved when parsing, though the ordering with respect to other fields is lost.

 

9.   Normally, an encoded message would never have more than one instance of an optional or required field. However, parsers are expected to handle the case in which it does. For numeric types and strings, if the same field appears multiple times, the parser accepts the last value it sees. For embedded message fields, the parser merges multiple instances of the same field, as if with the Message.MergeFrom method: all singular scalar fields in the latter instance replace those in the former, singular embedded messages are merged, and repeated fields are concatenated. The effect of these rules is that parsing the concatenation of two encoded messages produces exactly the same result as if you had parsed the two messages separately and merged the resulting objects:

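MyMessage message;
// Parse the concatenation of the two serialized messages in one step.
message.ParseFromString(str1 + str2);

is equivalent to this:

MyMessage message, message2;
// Parse each serialized message separately, then merge the second into the first.
message.ParseFromString(str1);
message2.ParseFromString(str2);
message.MergeFrom(message2);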

10.   A packed repeated field containing zero elements does not appear in the encoded message. Otherwise, all of the elements of the field are packed into a single key-value pair with wire type 2 (length-delimited). Each element is encoded the same way it would be normally, except without a tag preceding it. Only repeated fields of primitive numeric types (types which use the varint, 32-bit, or 64-bit wire types) can be declared "packed".
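
A packed field of varint elements can be sketched as follows: the elements are encoded back to back into a payload, which is then wrapped in a single length-delimited pair. The function names are illustrative and only the varint element case is covered.

```cpp
#include <cstdint>
#include <vector>

// Append a base-128 varint to the buffer (hypothetical helper).
void append_varint(std::vector<std::uint8_t>& out, std::uint64_t value) {
    while (value >= 0x80) {
        out.push_back(static_cast<std::uint8_t>((value & 0x7F) | 0x80));
        value >>= 7;
    }
    out.push_back(static_cast<std::uint8_t>(value));
}

// Encode a packed repeated field whose elements use the varint wire type.
std::vector<std::uint8_t> encode_packed_varints(
        std::uint32_t field_number, const std::vector<std::uint64_t>& values) {
    std::vector<std::uint8_t> out;
    if (values.empty()) return out;  // zero elements: the field is omitted entirely

    // Elements are written one after another with no key in front of each.
    std::vector<std::uint8_t> payload;
    for (std::uint64_t v : values) append_varint(payload, v);

    // One key with wire type 2, then the payload length in bytes, then the payload.
    append_varint(out, (static_cast<std::uint64_t>(field_number) << 3) | 2);
    append_varint(out, payload.size());
    out.insert(out.end(), payload.begin(), payload.end());
    return out;
}
```

Packing the values 3, 270 and 86942 into field 4 gives 22 06 03 8E 02 9E A7 05 in hexadecimal: one key, a length of 6, and the three varints.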

 

11.  While you can use field numbers in any order in a .proto, when a message is serialized its known fields should be written sequentially by field number. This allows parsing code to use optimizations that rely on field numbers being in sequence. However, protocol buffer parsers must be able to parse fields in any order, as not all messages are created by simply serializing an object; for instance, it is sometimes useful to merge two messages by simply concatenating them.
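
One way to picture an order-independent parser is a loop that reads a key, uses the wire type to find the extent of the value, and moves on without assuming any field order. The sketch below only records field numbers and skips the values; read_varint and list_field_numbers are hypothetical names, and the deprecated group wire types (3 and 4) are rejected for simplicity.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Read a varint at buf[pos] and advance pos; false on truncated or overlong input.
bool read_varint(const std::vector<std::uint8_t>& buf, std::size_t& pos,
                 std::uint64_t& value) {
    value = 0;
    for (int shift = 0; shift < 70 && pos < buf.size(); shift += 7) {
        const std::uint8_t b = buf[pos++];
        value |= static_cast<std::uint64_t>(b & 0x7F) << shift;
        if ((b & 0x80) == 0) return true;
    }
    return false;
}

// Walk a serialized message and collect field numbers in the order they appear.
// Returns false if the input is malformed or uses an unsupported wire type.
bool list_field_numbers(const std::vector<std::uint8_t>& buf,
                        std::vector<std::uint32_t>& fields) {
    std::size_t pos = 0;
    while (pos < buf.size()) {
        std::uint64_t key = 0;
        if (!read_varint(buf, pos, key)) return false;

        std::uint64_t skip = 0;  // number of value bytes still to skip
        switch (key & 0x7) {
            case 0: {  // varint: reading it also consumes it
                std::uint64_t ignored = 0;
                if (!read_varint(buf, pos, ignored)) return false;
                break;
            }
            case 1: skip = 8; break;  // 64-bit
            case 2:                   // length-delimited: length, then payload
                if (!read_varint(buf, pos, skip)) return false;
                break;
            case 5: skip = 4; break;  // 32-bit
            default: return false;    // groups and undefined wire types
        }
        if (skip > buf.size() - pos) return false;  // value runs past the end
        pos += static_cast<std::size_t>(skip);
        fields.push_back(static_cast<std::uint32_t>(key >> 3));
    }
    return true;
}
```

The same loop handles a message whose fields arrive as 2 then 1 just as well as 1 then 2, which is what makes concatenated messages parseable.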

 

12.  If a message has unknown fields, the current Java implementations write them in arbitrary order after the sequentially-ordered known fields.
