Serialization source code in scrapy_redis and its applications in program design

2019-03-17



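Serialization is the process of converting an object's state into a form that can be stored or transmitted. During serialization, an object writes its current state to temporary or persistent storage. Later, the object can be recreated by reading its state back from storage, that is, by deserializing it.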

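In scrapy_redis, a Request object is first deduplicated by the DupeFilter and then handed to the scheduler, which stores it in Redis. This raises a problem: a Request is a Python object, and Redis cannot store such an object directly, so the request has to be serialized before it is stored.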
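The flow described above can be sketched without a Redis server or scrapy installed. This is a minimal illustration, not the scrapy_redis implementation: `request_as_dict` is a hypothetical stand-in for a request, and `fake_redis_list` is an ordinary Python list standing in for a Redis list.

```python
import pickle

# Hypothetical stand-in for a request; a real scrapy Request carries more fields.
request_as_dict = {
    "url": "https://example.com/page/1",
    "method": "GET",
    "meta": {"depth": 1},
}

# Stand-in for a Redis list; no Redis server is contacted in this sketch.
fake_redis_list = []

# Serialize to bytes, which is the kind of value a key-value store can hold.
payload = pickle.dumps(request_as_dict, protocol=-1)
fake_redis_list.append(payload)

# Later, a worker pops the bytes and rebuilds an equal object.
restored = pickle.loads(fake_redis_list.pop())
```

The restored value is an equal copy of the original dictionary, not the same object, which is exactly what is needed when the producer and consumer are different processes.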

The serialization module in scrapy_redis is as follows:

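# Imported with: from scrapy_redis import picklecompat
"""A pickle wrapper module with protocol=-1 by default."""

try:
    import cPickle as pickle  # PY2
except ImportError:
    import pickle


def loads(s):
    # Rebuild the object from its pickled byte string.
    return pickle.loads(s)


def dumps(obj):
    # protocol=-1 selects the highest pickle protocol available.
    return pickle.dumps(obj, protocol=-1)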
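To make the effect of `protocol=-1` concrete, the following sketch compares it with an explicit `pickle.HIGHEST_PROTOCOL` and with protocol 0. It uses only the standard `pickle` module on Python 3; the sample dictionary `obj` is made up for illustration.

```python
import pickle

# Made-up sample object for illustration.
obj = {"url": "https://example.com", "priority": 0}

# A negative protocol number selects the highest protocol available.
default_highest = pickle.dumps(obj, protocol=-1)
explicit_highest = pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL)
same_bytes = default_highest == explicit_highest

# For protocol 2 and later, the stream starts with the PROTO opcode (0x80)
# followed by one byte holding the protocol number.
proto_byte = default_highest[1]

# Protocol 0 is the oldest, ASCII-based format; it still round-trips.
oldest = pickle.dumps(obj, protocol=0)
roundtrip_ok = pickle.loads(oldest) == obj and pickle.loads(default_highest) == obj
```

Choosing the highest protocol generally gives a more compact binary encoding, at the cost that older Python versions may not be able to read the data.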

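Python 3 uses the pickle module directly, since cPickle no longer exists. The two most important functions of the module are the ones shown above: dumps for serialization and loads for deserialization. A serialized object can be stored in a database, a file, or similar, and restored quickly.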
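As a small illustration of storing a serialized object in a file and restoring it, the sketch below writes to a temporary directory. The `state` dictionary and the file name `state.pkl` are hypothetical; note that the file is opened in binary mode because pickle produces bytes.

```python
import os
import pickle
import tempfile

# Hypothetical crawler state to persist.
state = {"pending_urls": ["https://example.com/a", "https://example.com/b"], "seen": 42}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "state.pkl")

    # Pickle data is binary, so the file is opened with "wb" and "rb".
    with open(path, "wb") as fh:
        pickle.dump(state, fh, protocol=-1)

    with open(path, "rb") as fh:
        restored = pickle.load(fh)
```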

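The Memento pattern from design patterns also works very well with this approach (see "Python Design Patterns (19): The Memento Pattern"). The following objects and data types can be pickled: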

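None, True, False
Integers, long integers, floating-point numbers, complex numbers
Normal strings and Unicode strings
Tuples, lists, sets, and dictionaries containing only picklable objects
Functions defined at the top level of a module
Built-in functions defined at the top level of a module
Classes defined at the top level of a module
Instances of such classes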
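The list above can be checked with a quick round trip. This sketch assumes Python 3 and uses standard-library names as examples: `json.dumps` for a top-level function, `len` for a built-in function, `OrderedDict` for a top-level class, and `Fraction(1, 3)` for an instance of such a class.

```python
import json
import pickle
from collections import OrderedDict
from fractions import Fraction

samples = [
    None, True, False,
    7, 2 ** 100, 3.14, 1 + 2j,            # numbers
    b"byte string", "unicode text",       # strings
    (1, 2), [1, 2], {1, 2}, {"k": [1, (2, 3)]},  # containers of picklable objects
    json.dumps,                            # function defined at module top level
    len,                                   # built-in function
    OrderedDict,                           # class defined at module top level
    Fraction(1, 3),                        # instance of such a class
]

# Every sample should come back equal after dumps followed by loads.
all_roundtrip = all(pickle.loads(pickle.dumps(s, protocol=-1)) == s for s in samples)

# Functions and classes are pickled by reference (module plus name),
# so unpickling returns the very same object.
same_function = pickle.loads(pickle.dumps(json.dumps)) is json.dumps
```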

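Attempting to pickle an unpicklable object raises a PicklingError exception; when this happens, an unspecified number of bytes may already have been written to the underlying file. Trying to pickle a highly recursive data structure may exceed the maximum recursion depth, in which case a RuntimeError is raised (in Python 3.5 and later this is RecursionError, a subclass of RuntimeError).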
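The failure case can be probed with a small helper. `can_pickle` is a hypothetical name introduced here, not part of pickle or scrapy_redis. It catches `AttributeError` and `TypeError` in addition to `PicklingError`, on the assumption that the exact exception type can differ between Python versions and object kinds.

```python
import pickle


def can_pickle(obj):
    """Hypothetical helper: return True if obj can be pickled, else False."""
    try:
        pickle.dumps(obj, protocol=-1)
    except (pickle.PicklingError, AttributeError, TypeError):
        # The exact exception type varies with the object and Python version.
        return False
    return True


# A built-in function is pickled by reference to its module and name.
builtin_ok = can_pickle(len)

# A lambda has no importable name, so pickle cannot reference it.
lambda_ok = can_pickle(lambda x: x + 1)

# A container is only picklable if everything inside it is.
container_ok = can_pickle({"callback": lambda x: x})
```

A check like this is useful before pushing objects into a shared queue, because a failed dump would otherwise surface only when the scheduler tries to store the item.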

Module API (quoted from the Python 2 pickle documentation)

pickle.dump(obj, file[, protocol])

Write a pickled representation of obj to the open file object file. This is equivalent to Pickler(file, protocol).dump(obj).

If the protocol parameter is omitted, protocol 0 is used. If protocol is specified as a negative value or HIGHEST_PROTOCOL, the highest protocol version will be used.

Changed in version 2.3: Introduced the protocol parameter.

file must have a write() method that accepts a single string argument. It can thus be a file object opened for writing, a StringIO object, or any other custom object that meets this interface.

pickle.load(file)

Read a string from the open file object file and interpret it as a pickle data stream, reconstructing and returning the original object hierarchy. This is equivalent to Unpickler(file).load().

file must have two methods, a read() method that takes an integer argument, and a readline() method that requires no arguments. Both methods should return a string. Thus file can be a file object opened for reading, a StringIO object, or any other custom object that meets this interface.

This function automatically determines whether the data stream was written in binary mode or not.

pickle.dumps(obj[, protocol])

Return the pickled representation of the object as a string, instead of writing it to a file.

If the protocol parameter is omitted, protocol 0 is used. If protocol is specified as a negative value or HIGHEST_PROTOCOL, the highest protocol version will be used.

Changed in version 2.3: The protocol parameter was added.

pickle.loads(string)

Read a pickled object hierarchy from a string. Characters in the string past the pickled object's representation are ignored.
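The four functions can be exercised together in a short Python 3 sketch. Two assumptions differ from the quoted text: in Python 3 the pickled data is a bytes object rather than a string, so an in-memory `io.BytesIO` buffer is used here in place of StringIO. The sample values are made up.

```python
import io
import pickle

first = {"id": 1, "tags": ["a", "b"]}
second = ("tuple", 2)

# dump/load work on any binary file-like object; BytesIO keeps it in memory.
buffer = io.BytesIO()
pickle.dump(first, buffer, protocol=-1)
pickle.dump(second, buffer, protocol=-1)

# Rewind, then read the objects back in the order they were written.
buffer.seek(0)
loaded_first = pickle.load(buffer)
loaded_second = pickle.load(buffer)

# dumps/loads do the same without a file object.
data = pickle.dumps(first, protocol=-1)
from_bytes = pickle.loads(data)

# Bytes after the end of the pickled object are ignored by loads.
with_trailing = pickle.loads(data + b"trailing bytes")
```

Because each load call consumes exactly one pickled object, several objects can be written back to back into the same stream and read out sequentially.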

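As for application scenarios, the most common ones are:

Restoring the previous state when a program restarts, session storage, and transferring objects over a network.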
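The first scenario, restoring an earlier state, is also what the Memento pattern mentioned earlier relies on. The sketch below is one possible way to build a memento with pickle, not the implementation from the referenced article. The `Editor` class and its methods are hypothetical; only the attribute dictionary is pickled, so the memento is a plain bytes value.

```python
import pickle


class Editor:
    """Hypothetical originator whose state lives in plain attributes."""

    def __init__(self):
        self.text = ""
        self.cursor = 0

    def type(self, chars):
        self.text += chars
        self.cursor = len(self.text)

    def save(self):
        # The memento is a pickled snapshot of the attribute dictionary.
        return pickle.dumps(self.__dict__, protocol=-1)

    def restore(self, memento):
        # Replace the current state with the snapshot.
        self.__dict__.clear()
        self.__dict__.update(pickle.loads(memento))


editor = Editor()
editor.type("hello")
checkpoint = editor.save()   # take a snapshot

editor.type(" world")        # state changes after the snapshot
text_before_restore = editor.text

editor.restore(checkpoint)   # roll back to the snapshot
```

Because the snapshot is just bytes, the same memento could be written to a file or a session store and restored after the program restarts.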
