Google Protocol Buffer: a brief analysis of the "serialization: write" code path

lin_style



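This post mainly traces the code behind "serialization: write", touching briefly on a few other things. The example used is the bundled addressbook.

For now I mainly care about two things: how fields are managed and how the protocol is managed.

My approach to reading the code is as follows. Since I do not know the overall requirements behind this code base well, I first work out what each module does, then reason from its behavior, and finally string all the modules together.

Note that, as a reader, I know little about the environment, requirements, and history that produced this code. So I skip over some details and raise my own questions, but I do not compare or judge the approaches taken; the aim is to present the code as it is.

I won't draw a flowchart; I'm too lazy. This isn't anything formal anyway, just a running log.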

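1. Field management: ordinary fields

Each member variable gets the following kinds of accessors, and some of the set accessors have several overloads. Take

message Person {
  required string name = 1;
}

as an example; it gets the following accessors: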

  inline bool has_name() const;
  inline void clear_name();
  inline const ::std::string& name() const;
  inline void set_name(const ::std::string& value);
  inline void set_name(const char* value);
  inline void set_name(const char* value, size_t size);

plus three accessors for the presence flag:

inline bool Person::has_name() const {
  return (_has_bits_[0] & 0x00000001u) != 0;
}
inline void Person::set_has_name() {
  _has_bits_[0] |= 0x00000001u;
}
inline void Person::clear_has_name() {
  _has_bits_[0] &= ~0x00000001u;
}

The flag storage is declared as follows:

::google::protobuf::uint32 _has_bits_[(4 + 31) / 32];

set_name() and clear_name() each call the corresponding flag accessor.
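To make the interplay between the accessors and the flag bits concrete, here is a minimal hand-written sketch. MiniPerson is a hypothetical stand-in, not the generated class; it only assumes what the snippets above show, namely that bit 0 of _has_bits_[0] belongs to name.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical stand-in for a generated message with one field, name = 1.
class MiniPerson {
 public:
  bool has_name() const { return (has_bits_[0] & 0x00000001u) != 0; }

  const std::string& name() const { return name_; }

  void set_name(const std::string& value) {
    set_has_name();  // mark the field as present, then store the value
    name_ = value;
  }

  void clear_name() {
    name_.clear();
    clear_has_name();  // the field now reads as "not set"
  }

 private:
  void set_has_name() { has_bits_[0] |= 0x00000001u; }
  void clear_has_name() { has_bits_[0] &= ~0x00000001u; }

  std::string name_;
  // One bit per field, rounded up to whole 32-bit words.
  std::uint32_t has_bits_[(1 + 31) / 32] = {0};
};
```

The point of the separate bit is that "set to the empty string" and "never set" stay distinguishable.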

Because a value is bound to the tag in xxx = tag, a tag must never be reused if backward and forward compatibility are to hold.
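Why the tag matters so much becomes clearer once you see what goes on the wire. In the protobuf encoding each field is preceded by a key built from the field number and a 3-bit wire type; the field name never appears. The helper names below (MakeTag, FieldNumberOf, WireTypeOf) are my own for this sketch, not the library's API.

```cpp
#include <cassert>
#include <cstdint>

// Wire types as defined by the protobuf encoding.
enum WireType : std::uint32_t {
  kVarint = 0,
  kFixed64 = 1,
  kLengthDelimited = 2,
  kFixed32 = 5,
};

// Builds the key that precedes every field on the wire.
constexpr std::uint32_t MakeTag(std::uint32_t field_number, WireType type) {
  return (field_number << 3) | type;
}

// Splits a key back into its two parts.
constexpr std::uint32_t FieldNumberOf(std::uint32_t tag) { return tag >> 3; }
constexpr std::uint32_t WireTypeOf(std::uint32_t tag) { return tag & 0x7u; }
```

For name = 1 (a string, so length-delimited) the key is 0x0A. A reader that meets 0x0A knows only "field 1, length-delimited", so if number 1 were later given a different meaning, old data would be silently misread.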

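2. Field management: arrays

An array template similar to a vector; see the repeated_field file. If you are wondering where variable-length fields are handled, this is it.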
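As a rough picture of what such a vector-like template involves, here is a simplified sketch. MiniRepeatedField is hypothetical; it borrows the method names Add, Get, size, Clear, and Reserve from RepeatedField, but the growth policy and storage here are my assumptions, not the library's code.

```cpp
#include <algorithm>
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical, simplified growable array in the spirit of RepeatedField<T>.
template <typename T>
class MiniRepeatedField {
 public:
  int size() const { return size_; }
  const T& Get(int index) const { return elements_[index]; }

  void Add(const T& value) {
    if (size_ == capacity_) Reserve(size_ + 1);
    elements_[size_++] = value;
  }

  void Clear() { size_ = 0; }  // keeps the allocation for reuse

  // Grows to at least new_capacity, doubling to amortize reallocation.
  void Reserve(int new_capacity) {
    if (new_capacity <= capacity_) return;
    int grown = std::max(std::max(capacity_ * 2, new_capacity), 4);
    std::unique_ptr<T[]> bigger(new T[grown]);
    std::copy(elements_.get(), elements_.get() + size_, bigger.get());
    elements_ = std::move(bigger);
    capacity_ = grown;
  }

 private:
  std::unique_ptr<T[]> elements_;
  int size_ = 0;
  int capacity_ = 0;
};
```

Clear() only resets the size, so a message that is reused in a loop does not reallocate on every iteration.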

3. Serialization: write

This part is a bit involved. Start tracing from SerializeToOstream.

There are several similar interfaces:

  bool SerializeToFileDescriptor(int file_descriptor) const;
  bool SerializePartialToFileDescriptor(int file_descriptor) const;
  bool SerializeToOstream(ostream* output) const;
  bool SerializePartialToOstream(ostream* output) const;

Taking SerializeToOstream as the example, the calls that follow are shown below (the others work the same way):

bool Message::SerializeToOstream(ostream* output) const {
  {
    io::OstreamOutputStream zero_copy_output(output);
    if (!SerializeToZeroCopyStream(&zero_copy_output)) return false;
  }
  return output->good();
}

bool MessageLite::SerializeToZeroCopyStream(
    io::ZeroCopyOutputStream* output) const {
  io::CodedOutputStream encoder(output);
  return SerializeToCodedStream(&encoder);
}

bool MessageLite::SerializeToCodedStream(io::CodedOutputStream* output) const {
  GOOGLE_DCHECK(IsInitialized()) << InitializationErrorMessage("serialize", *this);
  return SerializePartialToCodedStream(output);
}

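Some functions come in Partial and non-Partial variants, but all of them end up in SerializePartialToCodedStream, so the overall call hierarchy is roughly:

N (the user-level stream) -> OstreamOutputStream -> CodedOutputStream (the common layer) -> SerializePartialToCodedStream(io::CodedOutputStream* output)

I like to read bottom-up, starting with what the lowest-level function needs; that makes the abstractions the upper layers require easier to grasp.

Let's see what SerializePartialToCodedStream does: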

bool MessageLite::SerializePartialToCodedStream(
    io::CodedOutputStream* output) const {
  const int size = ByteSize();  // Force size to be cached.
  uint8* buffer = output->GetDirectBufferForNBytesAndAdvance(size);
  if (buffer != NULL) {
    uint8* end = SerializeWithCachedSizesToArray(buffer);
    if (end - buffer != size) {
      ByteSizeConsistencyError(size, ByteSize(), end - buffer);
    }
    return true;
  } else {
    int original_byte_count = output->ByteCount();
    SerializeWithCachedSizes(output);
    if (output->HadError()) {
      return false;
    }
    int final_byte_count = output->ByteCount();

    if (final_byte_count - original_byte_count != size) {
      ByteSizeConsistencyError(size, ByteSize(),
                               final_byte_count - original_byte_count);
    }

    return true;
  }
}

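1). There are two ways to write: SerializeWithCachedSizesToArray and SerializeWithCachedSizes.

2). The parameter is a CodedOutputStream, which supplies the destination buffer and the byte counts.

3). SerializeWithCachedSizesToArray and SerializeWithCachedSizes are both virtual; each message class provides its own implementation.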
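The branch between the two write paths can be reproduced in miniature. In the sketch below MiniChunkedOutput and SerializePayload are hypothetical; only the name GetDirectBufferForNBytesAndAdvance and the "NULL means fall back to the stream path" rule come from the code above. It assumes a chunk size greater than zero.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical output that exposes its storage one fixed-size chunk at a time.
class MiniChunkedOutput {
 public:
  explicit MiniChunkedOutput(std::size_t chunk_size) : chunk_(chunk_size) {}

  // Fast-path hook: succeeds only if `size` bytes fit in the current chunk.
  std::uint8_t* GetDirectBufferForNBytesAndAdvance(std::size_t size) {
    if (size > chunk_.size() - used_) return nullptr;
    std::uint8_t* result = chunk_.data() + used_;
    used_ += size;
    return result;
  }

  // Slow path: copies byte by byte, flushing whenever the chunk fills up.
  void WriteRaw(const void* data, std::size_t size) {
    const std::uint8_t* bytes = static_cast<const std::uint8_t*>(data);
    for (std::size_t i = 0; i < size; ++i) {
      if (used_ == chunk_.size()) Flush();
      chunk_[used_++] = bytes[i];
    }
  }

  // Flushes what is pending and returns everything written so far.
  std::string Contents() {
    Flush();
    return flushed_;
  }

 private:
  void Flush() {
    flushed_.append(reinterpret_cast<const char*>(chunk_.data()), used_);
    used_ = 0;
  }

  std::vector<std::uint8_t> chunk_;
  std::size_t used_ = 0;
  std::string flushed_;
};

// Mirrors the branch in SerializePartialToCodedStream.
// Returns true when the direct-buffer (array) path was taken.
bool SerializePayload(const std::string& payload, MiniChunkedOutput* output) {
  std::uint8_t* buffer =
      output->GetDirectBufferForNBytesAndAdvance(payload.size());
  if (buffer != nullptr) {
    std::memcpy(buffer, payload.data(), payload.size());  // "ToArray" path
    return true;
  }
  output->WriteRaw(payload.data(), payload.size());  // stream path
  return false;
}
```

Both paths produce the same bytes; the array path just avoids the per-write bounds checks when the whole message is known to fit.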

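Next, sort out how OstreamOutputStream and CodedOutputStream relate. Look at the common class CodedOutputStream to see what form it wants the data to take at the lowest level.

The source, with some parts stripped out, looks like this: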

class LIBPROTOBUF_EXPORT CodedOutputStream {
 public:
  // Create an CodedOutputStream that writes to the given ZeroCopyOutputStream.
  explicit CodedOutputStream(ZeroCopyOutputStream* output);


  // Skips a number of bytes, leaving the bytes unmodified in the underlying
  // buffer.  Returns false if an underlying write error occurs.  This is
  // mainly useful with GetDirectBufferPointer().
  bool Skip(int count);

  // Sets *data to point directly at the unwritten part of the
  // CodedOutputStream's underlying buffer, and *size to the size of that
  // buffer, but does not advance the stream's current position.  This will
  // always either produce a non-empty buffer or return false.  If the caller
  // writes any data to this buffer, it should then call Skip() to skip over
  // the consumed bytes.  This may be useful for implementing external fast
  // serialization routines for types of data not covered by the
  // CodedOutputStream interface.
  bool GetDirectBufferPointer(void** data, int* size);

  // If there are at least "size" bytes available in the current buffer,
  // returns a pointer directly into the buffer and advances over these bytes.
  // The caller may then write directly into this buffer (e.g. using the
  // *ToArray static methods) rather than go through CodedOutputStream.  If
  // there are not enough bytes available, returns NULL.  The return pointer is
  // invalidated as soon as any other non-const method of CodedOutputStream
  // is called.
  inline uint8* GetDirectBufferForNBytesAndAdvance(int size);

  // Write raw bytes, copying them from the given buffer.
  void WriteRaw(const void* buffer, int size);
  // Like WriteRaw()  but writing directly to the target array.
  // This is _not_ inlined, as the compiler often optimizes memcpy into inline
  // copy loops. Since this gets called by every field with string or bytes
  // type, inlining may lead to a significant amount of code bloat, with only a
  // minor performance gain.
  static uint8* WriteRawToArray(const void* buffer, int size, uint8* target);

  // Equivalent to WriteRaw(str.data(), str.size()).
  void WriteString(const string& str);
  // Like WriteString()  but writing directly to the target array.
  static uint8* WriteStringToArray(const string& str, uint8* target);
  // Write a 32-bit little-endian integer.
  void WriteLittleEndian32(uint32 value);

  // Returns the total number of bytes written since this object was created.
  inline int ByteCount() const;

  // Returns true if there was an underlying I/O error since this object was
  // created.
  bool HadError() const { return had_error_; }

 private:
  GOOGLE_DISALLOW_EVIL_CONSTRUCTORS(CodedOutputStream);

  ZeroCopyOutputStream* output_;
  uint8* buffer_;
  int buffer_size_;
  int total_bytes_;  // Sum of sizes of all buffers seen so far.
  bool had_error_;   // Whether an error occurred during output.

  // Advance the buffer by a given number of bytes.
  void Advance(int amount);

  // Called when the buffer runs out to request more data.  Implies an
  // Advance(buffer_size_).
  bool Refresh();
};

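This class does the following:

1) It holds a ZeroCopyOutputStream.

2) It holds a uint8* buffer_; all the write functions are tied to it, and this is the form it wants the data in.

3) uint8* buffer_ and the ZeroCopyOutputStream are connected through Refresh().

4) Refresh() makes that connection by calling Next() on the ZeroCopyOutputStream, so Next must be a virtual function.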
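The four points above can be condensed into a small working model. Everything named Mini* or BlockOutput below is hypothetical; the member names (buffer_, buffer_size_, total_bytes_, had_error_, Advance, Refresh) follow the simplified class shown earlier, while the method bodies are my guess at a reasonable minimal behavior.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <cstring>
#include <memory>
#include <string>
#include <vector>

// Hypothetical counterpart of ZeroCopyOutputStream: the stream owns the memory.
class MiniZeroCopyOutput {
 public:
  virtual ~MiniZeroCopyOutput() {}
  virtual bool Next(void** data, int* size) = 0;  // hands out a writable block
};

// Test double that hands out small heap blocks and remembers them.
class BlockOutput : public MiniZeroCopyOutput {
 public:
  explicit BlockOutput(int block_size) : block_size_(block_size) {}
  bool Next(void** data, int* size) override {
    blocks_.emplace_back(new std::uint8_t[block_size_]());
    *data = blocks_.back().get();
    *size = block_size_;
    return true;
  }
  // Concatenates the first `count` bytes that were handed out.
  std::string Bytes(int count) const {
    std::string out;
    for (const auto& block : blocks_) {
      int take = std::min(count - static_cast<int>(out.size()), block_size_);
      if (take <= 0) break;
      out.append(reinterpret_cast<const char*>(block.get()), take);
    }
    return out;
  }

 private:
  int block_size_;
  std::vector<std::unique_ptr<std::uint8_t[]>> blocks_;
};

// Hypothetical counterpart of CodedOutputStream: all writes go through buffer_.
class MiniCodedOutput {
 public:
  explicit MiniCodedOutput(MiniZeroCopyOutput* output) : output_(output) {}

  void WriteRaw(const void* data, int size) {
    const std::uint8_t* src = static_cast<const std::uint8_t*>(data);
    while (size > 0) {
      if (buffer_size_ == 0 && !Refresh()) return;  // out of room: get a block
      int n = std::min(size, buffer_size_);
      std::memcpy(buffer_, src, n);
      Advance(n);
      src += n;
      size -= n;
    }
  }

  int ByteCount() const { return total_bytes_ - buffer_size_; }
  bool HadError() const { return had_error_; }

 private:
  void Advance(int amount) {
    buffer_ += amount;
    buffer_size_ -= amount;
  }

  // Asks the underlying stream for a fresh block through the virtual Next().
  bool Refresh() {
    void* data = nullptr;
    if (!output_->Next(&data, &buffer_size_)) {
      buffer_ = nullptr;
      buffer_size_ = 0;
      had_error_ = true;
      return false;
    }
    buffer_ = static_cast<std::uint8_t*>(data);
    total_bytes_ += buffer_size_;
    return true;
  }

  MiniZeroCopyOutput* output_;
  std::uint8_t* buffer_ = nullptr;
  int buffer_size_ = 0;
  int total_bytes_ = 0;
  bool had_error_ = false;
};
```

Note that the coded layer never allocates: it only writes into whatever block the stream behind it last handed out.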

The XXXOutputStream classes are structured as follows. Taking OstreamOutputStream as the example, the simplified source is:

class LIBPROTOBUF_EXPORT OstreamOutputStream : public ZeroCopyOutputStream {
 public:
  // Creates a stream that writes to the given C++ ostream.
  // If a block_size is given, it specifies the size of the buffers
  // that should be returned by Next().  Otherwise, a reasonable default
  // is used.
  explicit OstreamOutputStream(ostream* stream, int block_size = -1);
  ~OstreamOutputStream();

  // implements ZeroCopyOutputStream ---------------------------------
  bool Next(void** data, int* size);
  void BackUp(int count);
  int64 ByteCount() const;

 private:
  class LIBPROTOBUF_EXPORT CopyingOstreamOutputStream : public CopyingOutputStream {
   public:
    CopyingOstreamOutputStream(ostream* output);
    ~CopyingOstreamOutputStream();

    // implements CopyingOutputStream --------------------------------
    bool Write(const void* buffer, int size);

   private:
    // The stream.
    ostream* output_;

    GOOGLE_DISALLOW_EVIL_CONSTRUCTORS(CopyingOstreamOutputStream);
  };

  CopyingOstreamOutputStream copying_output_;
  CopyingOutputStreamAdaptor impl_;

  GOOGLE_DISALLOW_EVIL_CONSTRUCTORS(OstreamOutputStream);
};

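1) OstreamOutputStream itself inherits from ZeroCopyOutputStream.

2) It has a nested class, Copying..., which inherits from CopyingOutputStream.

3) It has the member variables copying_output_ and impl_.

First, let's see how OstreamOutputStream interacts with copying_output_ and impl_.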

 // implements ZeroCopyOutputStream ---------------------------------
  bool Next(void** data, int* size);
  void BackUp(int count);
  int64 ByteCount() const;
  
bool OstreamOutputStream::Next(void** data, int* size) {
  return impl_.Next(data, size);
}

void OstreamOutputStream::BackUp(int count) {
  impl_.BackUp(count);
}

int64 OstreamOutputStream::ByteCount() const {
  return impl_.ByteCount();
}

copying_output_ is only used to construct impl_:

OstreamOutputStream::OstreamOutputStream(ostream* output, int block_size)
  : copying_output_(output),
    impl_(&copying_output_, block_size) {
}

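As you can see, OstreamOutputStream and impl_ (a CopyingOutputStreamAdaptor) both inherit from ZeroCopyOutputStream, but the actual work is done in impl_, which hands its buffer to copying_output_; OstreamOutputStream only satisfies the interface and forwards each call.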

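Continue tracing into CopyingOutputStreamAdaptor.

1). It holds scoped_array<uint8> buffer_; and CopyingOutputStream* copying_stream_;

2). It does a lot of bookkeeping around buffer_: mainly its fields, the current position, writing, and so on.

3). buffer_ and copying_stream_ interact mainly through the virtual function Write, for example:

if (copying_stream_->Write(buffer_.get(), buffer_used_)) {

4). buffer_ is one contiguous region whose size is passed in from outside.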
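The adaptor idea in the list above fits in a few dozen lines. The sketch below is my own reduction: MiniCopyingOutput, StringSink, and MiniAdaptor are hypothetical names, the buffer is a plain unique_ptr rather than scoped_array, and the bookkeeping (buffer_used_, position_) is what I assume a minimal version needs.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <memory>
#include <string>

// Hypothetical counterpart of CopyingOutputStream: one method to override.
class MiniCopyingOutput {
 public:
  virtual ~MiniCopyingOutput() {}
  virtual bool Write(const void* buffer, int size) = 0;
};

// Example sink: appends everything it receives to a std::string.
class StringSink : public MiniCopyingOutput {
 public:
  bool Write(const void* buffer, int size) override {
    data.append(static_cast<const char*>(buffer), size);
    return true;
  }
  std::string data;
};

// Hypothetical counterpart of CopyingOutputStreamAdaptor.
class MiniAdaptor {
 public:
  MiniAdaptor(MiniCopyingOutput* sink, int block_size)
      : sink_(sink),
        buffer_(new std::uint8_t[block_size]),
        buffer_size_(block_size) {}

  // Hands out the free part of the buffer, flushing first if it is full.
  bool Next(void** data, int* size) {
    if (buffer_used_ == buffer_size_ && !Flush()) return false;
    *data = buffer_.get() + buffer_used_;
    *size = buffer_size_ - buffer_used_;
    position_ += *size;
    buffer_used_ = buffer_size_;  // assume the caller fills all of it
    return true;
  }

  // Gives back the unused tail of the block returned by the last Next().
  void BackUp(int count) {
    buffer_used_ -= count;
    position_ -= count;
  }

  // Pushes the used part of the buffer to the sink through Write().
  bool Flush() {
    if (buffer_used_ == 0) return true;
    bool ok = sink_->Write(buffer_.get(), buffer_used_);
    buffer_used_ = 0;
    return ok;
  }

  long long ByteCount() const { return position_; }

 private:
  MiniCopyingOutput* sink_;
  std::unique_ptr<std::uint8_t[]> buffer_;
  int buffer_size_;
  int buffer_used_ = 0;
  long long position_ = 0;
};
```

A new kind of destination then only has to supply Write(); the buffer handling is written once in the adaptor.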

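At this point we have gone over what each major module does. Now string them together.

A user-defined message inherits from google::protobuf::Message. To serialize the message to some medium, you write: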

std::fstream output(filename.c_str(), ios::out | ios::trunc | ios::binary);

addressbook.SerializeToOstream(&output);

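In SerializeToXXX, XXX can be a user-defined output type.

The I/O stream is wrapped in a class, which could be FileOutputStream or OstreamOutputStream (the latter is used here). Both inherit from an interface class called ZeroCopyOutputStream and must implement these three functions: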

bool Next(void** data, int* size);

void BackUp(int count);

int64 ByteCount() const;

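To make overriding these three functions convenient and uniform, the user is only asked to override the part that moves the data out. That is abstracted into the

CopyingOutputStream class, which has a single function, bool Write(const void* buffer, int size);. It writes the contents of the buffer out to the third-party destination.

Next, BackUp, and ByteCount can then be implemented once and reused, which is abstracted into CopyingOutputStreamAdaptor.

It inherits from ZeroCopyOutputStream, mainly to conform to the Next, BackUp, and ByteCount interface. The Next, BackUp, and ByteCount of the wrapping class OstreamOutputStream simply forward to CopyingOutputStreamAdaptor.

(At first OstreamOutputStream, CopyingOutputStream, and CopyingOutputStreamAdaptor confused me a little; once the relationships are sorted out, the layering turns out to be quite clean.)

CopyingOutputStreamAdaptor maintains scoped_array<uint8> buffer_; and calls the Write interface of CopyingOutputStream to flush its data.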

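OK, now that OstreamOutputStream can supply buffers, on to CodedOutputStream.

CodedOutputStream serves two kinds of callers. One goes through ZeroCopyOutputStream* output_;, which is the OstreamOutputStream we wrapped above. The other is the static functions, which third parties can call directly.

CodedOutputStream exposes a uint8* buffer_; pointer whose value is actually read straight from ZeroCopyOutputStream* output_, which is why that class is called ZeroCopyOutputStream.

Finally MessageLite::SerializePartialToCodedStream is called, which decides whether to call the virtual function SerializeWithCachedSizesToArray or SerializeWithCachedSizes. (The former ultimately still calls SerializeWithCachedSizes.)

In the virtual function SerializeWithCachedSizesToArray, the parameter is a uint8* buffer, and the message's values and tag numbers are written into it in order, as tag | length | value. The length part is present only for length-delimited fields such as strings.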
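To see the tag | length | value layout in actual bytes, here is a standalone sketch that writes one string field into an array. The two free functions are my own re-implementation for illustration (they are not the library's static methods); the encoding rules they follow, base-128 varints and a key of (field number << 3) | wire type, are those of the protobuf wire format.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>

// Writes `value` as a base-128 varint and returns the position after it.
std::uint8_t* WriteVarint32ToArray(std::uint32_t value, std::uint8_t* target) {
  while (value >= 0x80u) {
    // Low 7 bits of the value, with the continuation bit set.
    *target++ = static_cast<std::uint8_t>(value | 0x80u);
    value >>= 7;
  }
  *target++ = static_cast<std::uint8_t>(value);
  return target;
}

// Writes one string field as tag | length | bytes.
// The caller must make sure `target` has enough room.
std::uint8_t* WriteStringFieldToArray(std::uint32_t field_number,
                                      const std::string& value,
                                      std::uint8_t* target) {
  const std::uint32_t kLengthDelimited = 2;  // wire type for strings
  target = WriteVarint32ToArray((field_number << 3) | kLengthDelimited, target);
  target = WriteVarint32ToArray(static_cast<std::uint32_t>(value.size()), target);
  std::memcpy(target, value.data(), value.size());
  return target + value.size();
}
```

For a Person whose name is "Bob", this yields the five bytes 0A 03 42 6F 62: the key for field 1, the length 3, then the characters.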

4. Serialization: read

The code structure is the same as for writing; the function to focus on at the end is MergePartialFromCodedStream.


2011-06-09 23:34