What are Protocol Buffers
Let's start with the official definition:
protocol buffers – a language-neutral, platform-neutral, extensible way of serializing structured data for use in communications protocols, data storage, and more.
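Protocol Buffers (protobuf) is a storage format for structured data. You use it to serialize and deserialize that data, and it is well suited both to data storage and to use as an RPC data-exchange format.
Protocol Buffers play a role similar to JSON or XML.
If both sides of a transmission agree to use Protocol Buffers as the data format, the exchange is more efficient and lightweight than with JSON or XML. Messages are encoded in a compact binary form and are faster to parse.
Protocol Buffers support many programming languages, including C++, Java, Python, Go, and C#.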
Installing Protocol Buffers
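1. Download the source archive (here protobuf-all-3.6.0.tar.gz) from the official site.
2. Extract it:
```shell
tar -zxvf protobuf-all-3.6.0.tar.gz
```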
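3. The protobuf package has to be compiled from source:
```shell
cd protobuf-3.6.0
./configure
make
sudo make install   # root is usually needed to install into /usr/local
sudo ldconfig       # Linux only: refresh the shared-library cache
```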
4. The protobuf-3.6.0 directory contains a folder for each supported language. Each folder has a README.md with that language's installation steps.
For Python:
```shell
cd python   # run from inside protobuf-3.6.0
python setup.py build
python setup.py test
python setup.py install
```
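5. Both the protoc compiler (installed in step 3) and the Python runtime are now installed. To check, run protoc --version on the command line, which should print the installed version (for example libprotoc 3.6.0).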
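The same check can also be done from Python. Here is a minimal sketch, assuming Python 3.7+ (needed for the capture_output and text arguments of subprocess.run). It looks for protoc on PATH with shutil.which, runs protoc --version, and tries to import the google.protobuf runtime. The function name check_protobuf_install is hypothetical.

```python
import shutil
import subprocess


def check_protobuf_install():
    """Report protoc's location and version plus the Python runtime version.

    Each value is None when that component cannot be found.
    """
    report = {"protoc_path": None, "protoc_version": None, "python_runtime": None}

    # Look for the protoc compiler on PATH (it is installed by `make install`).
    protoc = shutil.which("protoc")
    if protoc is not None:
        report["protoc_path"] = protoc
        # `protoc --version` prints a line such as "libprotoc 3.6.0".
        result = subprocess.run([protoc, "--version"], capture_output=True, text=True)
        if result.returncode == 0:
            report["protoc_version"] = result.stdout.strip()

    # The Python runtime installed by `python setup.py install` is google.protobuf.
    try:
        import google.protobuf
        report["python_runtime"] = google.protobuf.__version__
    except ImportError:
        pass

    return report


if __name__ == "__main__":
    for key, value in check_protobuf_install().items():
        print(key + ":", value)
```

Any None in the report points to the step that did not complete.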
Demo
1. Create the file addressbook.proto. A .proto file defines the data structures to be serialized.
```proto
syntax = "proto2";

package tutorial;

message Person {
  required string name = 1;
  required int32 id = 2;
  optional string email = 3;

  enum PhoneType {
    MOBILE = 0;
    HOME = 1;
    WORK = 2;
  }

  message PhoneNumber {
    required string number = 1;
    optional PhoneType type = 2 [default = HOME];
  }

  repeated PhoneNumber phones = 4;
}

message AddressBook {
  repeated Person people = 1;
}
```
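What goes on the wire is each field's number and type, not its name. Here is a minimal, stdlib-only sketch that computes the key the protobuf binary encoding writes before each Person field. The formula is key = (field_number << 3) | wire_type. Wire type 0 is varint (int32, enum) and 2 is length-delimited (string, embedded message). The PERSON_FIELDS table is transcribed by hand from addressbook.proto and is not produced by protoc.

```python
# Wire types from the protobuf encoding.
VARINT = 0            # int32, int64, bool, enum, ...
LENGTH_DELIMITED = 2  # string, bytes, embedded messages

# Person's fields as declared in addressbook.proto: name -> (field number, wire type).
PERSON_FIELDS = {
    "name": (1, LENGTH_DELIMITED),    # required string name = 1;
    "id": (2, VARINT),                # required int32 id = 2;
    "email": (3, LENGTH_DELIMITED),   # optional string email = 3;
    "phones": (4, LENGTH_DELIMITED),  # repeated PhoneNumber phones = 4; (one key per element)
}


def field_key(field_number, wire_type):
    """Return the key written before a field (itself a varint: one byte for fields 1-15)."""
    return (field_number << 3) | wire_type


for name, (number, wire_type) in PERSON_FIELDS.items():
    print("{:<7} field {} -> key 0x{:02X}".format(name, number, field_key(number, wire_type)))
```

The keys come out as 0x0A, 0x10, 0x1A and 0x22. Field numbers 1 to 15 fit in a one-byte key, so frequently used fields are usually given small numbers. Renaming a field does not change the encoding, but renumbering one does.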
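2. Compile addressbook.proto:
```shell
protoc --python_out=/Users/ya/developer/protoc-test addressbook.proto
```
This command generates addressbook_pb2.py in the directory given to --python_out. Replace the path with your own output directory, which must already exist.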
addressbook_pb2.py provides the classes (AddressBook, Person, and so on) and methods for working with the defined data structures.
3. Write writer.py, which serializes the data and saves it to a file:
```python
from __future__ import print_function  # print() behaves the same on Python 2 and 3

import sys

import addressbook_pb2

try:
    raw_input  # Python 2
except NameError:
    raw_input = input  # Python 3


def PromptForAddress(person):
    """Fill in a Person message based on user input."""
    person.id = int(raw_input("Enter person ID number: "))
    person.name = raw_input("Enter name: ")

    email = raw_input("Enter email address (blank for none): ")
    if email != "":
        person.email = email

    while True:
        number = raw_input("Enter a phone number (or leave blank to finish): ")
        if number == "":
            break

        phone_number = person.phones.add()  # append a new PhoneNumber to the repeated field
        phone_number.number = number

        phone_type = raw_input("Is this a mobile, home, or work phone? ")
        if phone_type == "mobile":
            phone_number.type = addressbook_pb2.Person.MOBILE
        elif phone_type == "home":
            phone_number.type = addressbook_pb2.Person.HOME
        elif phone_type == "work":
            phone_number.type = addressbook_pb2.Person.WORK
        else:
            print("Unknown phone type; leaving as default value.")


# Main procedure: reads the entire address book from a file, adds one person
# based on user input, then writes it back out to the same file.
if len(sys.argv) != 2:
    print("Usage:", sys.argv[0], "ADDRESS_BOOK_FILE")
    sys.exit(-1)

address_book = addressbook_pb2.AddressBook()

# Read the existing address book, if there is one.
try:
    with open(sys.argv[1], "rb") as f:
        address_book.ParseFromString(f.read())
except IOError:
    print(sys.argv[1] + ": File not found. Creating a new file.")

# Add an address.
PromptForAddress(address_book.people.add())

# Write the new address book back to disk.
with open(sys.argv[1], "wb") as f:
    f.write(address_book.SerializeToString())
```
4. Run writer.py:
```shell
python writer.py addressbook.data
```
Once the script finishes, the data has been serialized and stored in addressbook.data.
5. Write reader.py, which reads the file and deserializes the data:
```python
from __future__ import print_function  # makes print(..., end=" ") valid on Python 2 as well

import sys

import addressbook_pb2


def ListPeople(address_book):
    """Iterate through all people in the AddressBook and print info about them."""
    for person in address_book.people:
        print("Person ID:", person.id)
        print("Name:", person.name)
        if person.HasField("email"):  # email is optional in proto2
            print("E-mail address:", person.email)

        for phone_number in person.phones:
            if phone_number.type == addressbook_pb2.Person.MOBILE:
                print("Mobile phone :", end=" ")
            elif phone_number.type == addressbook_pb2.Person.HOME:
                print("Home phone :", end=" ")
            elif phone_number.type == addressbook_pb2.Person.WORK:
                print("Work phone :", end=" ")
            print(phone_number.number)


# Main procedure: reads the entire address book from a file and prints all
# the information inside.
if len(sys.argv) != 2:
    print("Usage:", sys.argv[0], "ADDRESS_BOOK_FILE")
    sys.exit(-1)

address_book = addressbook_pb2.AddressBook()

# Read the existing address book.
with open(sys.argv[1], "rb") as f:
    address_book.ParseFromString(f.read())

ListPeople(address_book)
```
6. Run reader.py:
```shell
python reader.py addressbook.data
```
Sample output:
```text
Person ID: 1
Name: kanon
E-mail address: 540004716@qq.com
Mobile phone : 123456
```
Two important methods:
- SerializeToString(): serializes the message and returns it as a string (a bytes object in Python 3).
- ParseFromString(data): parses a message from the given string.
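These two methods can be illustrated without the library. The sketch below hand-encodes a PhoneNumber message (number = 1, type = 2) in the protobuf wire format and then parses it back. The helper names encode_phone_number and decode_phone_number are hypothetical and cover only this one message. Real code should use the generated classes' SerializeToString() and ParseFromString().

```python
HOME = 1  # PhoneType.HOME, the declared default for PhoneNumber.type


def encode_varint(value):
    """Encode a non-negative int as a base-128 varint (7 bits per byte, low bits first)."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(data, pos):
    """Decode a varint starting at data[pos]; return (value, next position)."""
    result = shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def encode_phone_number(number, phone_type=None):
    """Serialize PhoneNumber {required string number = 1; optional PhoneType type = 2}."""
    raw = number.encode("utf-8")
    out = bytes([0x0A]) + encode_varint(len(raw)) + raw  # key for field 1, length-delimited
    if phone_type is not None:                            # optional field: omitted when unset
        out += bytes([0x10]) + encode_varint(phone_type)  # key for field 2, varint
    return out


def decode_phone_number(data):
    """Parse bytes from encode_phone_number; a missing type falls back to HOME."""
    number, phone_type = None, HOME
    pos = 0
    while pos < len(data):
        key, pos = decode_varint(data, pos)
        field_number, wire_type = key >> 3, key & 0x07
        if field_number == 1 and wire_type == 2:
            length, pos = decode_varint(data, pos)
            number = data[pos:pos + length].decode("utf-8")
            pos += length
        elif field_number == 2 and wire_type == 0:
            phone_type, pos = decode_varint(data, pos)
        else:
            # A real parser would skip unknown fields instead of failing.
            raise ValueError("unexpected field {} (wire type {})".format(field_number, wire_type))
    return number, phone_type


payload = encode_phone_number("123456", phone_type=0)  # 0 = MOBILE
print(payload.hex())                 # 0a063132333435361000
print(decode_phone_number(payload))  # ('123456', 0)
```

The round trip shows the idea. Serializing turns field numbers, types and values into a flat byte string, and parsing walks those keys to rebuild the values, applying the declared default when an optional field is absent.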
According to the official documentation, compared with XML, Protocol Buffers:
- are simpler
- are 3 to 10 times smaller
- are 20 to 100 times faster
- are less ambiguous
- generate data access classes that are easier to use programmatically
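The size claim can be made concrete for a single record. The sketch below hand-encodes the Person from the sample output above following addressbook.proto, then compares its length with compact JSON and one straightforward XML rendering. Several details are assumptions: the e-mail address is a made-up placeholder, the XML tag names are arbitrary, and only non-negative integers are handled. One small record proves nothing general, and speed is not measured here.

```python
import json


def encode_varint(value):
    """Base-128 varint for a non-negative int: 7 bits per byte, low-order group first."""
    out = bytearray()
    while value > 0x7F:
        out.append((value & 0x7F) | 0x80)  # set the continuation bit
        value >>= 7
    out.append(value)
    return bytes(out)


def varint_field(field_number, value):
    """Key with wire type 0 (varint), followed by the value."""
    return encode_varint(field_number << 3) + encode_varint(value)


def bytes_field(field_number, payload):
    """Key with wire type 2 (length-delimited), then the length, then the payload."""
    return encode_varint((field_number << 3) | 2) + encode_varint(len(payload)) + payload


def encode_person(person):
    """Hand-encode a Person dict following addressbook.proto, in field-number order."""
    out = bytes_field(1, person["name"].encode("utf-8"))
    out += varint_field(2, person["id"])
    if person.get("email"):
        out += bytes_field(3, person["email"].encode("utf-8"))
    for phone in person["phones"]:
        # Each PhoneNumber is itself a message, embedded as length-delimited field 4.
        sub = bytes_field(1, phone["number"].encode("utf-8")) + varint_field(2, phone["type"])
        out += bytes_field(4, sub)
    return out


def encode_xml(person):
    """One straightforward XML rendering of the same record (tag names are arbitrary)."""
    phones = "".join(
        "<phone><number>{}</number><type>{}</type></phone>".format(p["number"], p["type"])
        for p in person["phones"]
    )
    xml = "<person><name>{}</name><id>{}</id><email>{}</email>{}</person>".format(
        person["name"], person["id"], person["email"], phones)
    return xml.encode("utf-8")


# The record from the sample output; the e-mail address is a placeholder.
SAMPLE = {
    "name": "kanon",
    "id": 1,
    "email": "kanon@example.com",
    "phones": [{"number": "123456", "type": 0}],  # 0 = MOBILE
}

pb_size = len(encode_person(SAMPLE))
json_size = len(json.dumps(SAMPLE, separators=(",", ":")).encode("utf-8"))
xml_size = len(encode_xml(SAMPLE))
print("protobuf:", pb_size, "bytes")
print("JSON:    ", json_size, "bytes")
print("XML:     ", xml_size, "bytes")
```

For this record the binary form is 40 bytes, against 91 for compact JSON and 129 for the XML. Most of the saving comes from replacing field names with one-byte keys and writing small integers as single varint bytes.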
Use in ODPS (MaxCompute)
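The ODPS Tunnel HTTP Server uses Protobuf as its serialization mechanism. When uploading data, the client first serializes the structured data into a binary stream. The Tunnel server receives the stream, deserializes it to recover the structured data, and writes that data to the ODPS table.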
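The paragraph above does not say how Tunnel lays out many rows in one upload stream, so the following is not its actual format. It is a generic, hedged illustration of "serialize into a binary stream, deserialize on the server." It uses the common length-prefix framing, where each serialized message is preceded by its size as a varint. This is the same idea as writeDelimitedTo and parseDelimitedFrom in protobuf's Java API. All function names here are hypothetical.

```python
import io


def write_varint(stream, value):
    """Write a non-negative int as a base-128 varint."""
    while value > 0x7F:
        stream.write(bytes([(value & 0x7F) | 0x80]))
        value >>= 7
    stream.write(bytes([value]))


def read_varint(stream):
    """Read one varint; return None if the stream ends cleanly before it starts."""
    result = shift = 0
    while True:
        b = stream.read(1)
        if not b:
            if shift == 0:
                return None
            raise EOFError("stream ended inside a varint")
        result |= (b[0] & 0x7F) << shift
        if not b[0] & 0x80:
            return result
        shift += 7


def write_delimited(stream, records):
    """Sender side: write each serialized record as <varint length><record bytes>."""
    for record in records:
        write_varint(stream, len(record))
        stream.write(record)


def read_delimited(stream):
    """Receiver side: yield the serialized records back out of the stream one by one."""
    while True:
        length = read_varint(stream)
        if length is None:
            return
        record = stream.read(length)
        if len(record) != length:
            raise EOFError("stream ended inside a record")
        yield record


# Hypothetical rows: hand-encoded Person-like messages (name = 1, id = 2).
# With the real library each would come from message.SerializeToString().
rows = [b"\x0a\x05kanon\x10\x01", b"\x0a\x03bob\x10\x02"]

buf = io.BytesIO()
write_delimited(buf, rows)
buf.seek(0)
print(list(read_delimited(buf)) == rows)  # True
```

The length prefix is needed because a protobuf message does not mark its own end. Without it, a receiver could not tell where one row stops and the next begins.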