Python: handling the common encoding error "'xxx' codec can't decode byte xxx"

2023-04-20

Test environment

Python 3.3.2

Windows 7

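Problem description

Opening a file with Python's built-in open() and then reading it raises an error similar to the following:

'xxx' codec can't decode byte xxx in position xxxx

It took a fair amount of digging to work out what was going on. Before getting to the cause, here are a few experiments I ran.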

The source file looks roughly like this:

#!/usr/bin/env python
# -*- coding:utf-8 -*-

__author__ = 'shouke'

def testfn():
    str_dic_list = []
    f = open('d:\\saofu-weixin.log.2016-11-08.log', 'r')  # code line 8
    counter = 0
    is_found = 0
    for line in f:
        ...  # (some processing)

testfn()

Experiments

Experiment 1

Encoding of the file (saofu-weixin.log.2016-11-08.log, the same file throughout): ANSI

Code line 8: f = open('d:\\saofu-weixin.log.2016-11-08.log', 'r')

Result: error

UnicodeDecodeError: 'gbk' codec can't decode byte 0xac in position 4055: illegal multibyte sequence

Code line 8: f = open('d:\\saofu-weixin.log.2016-11-08.log', 'r', encoding='utf-8')

Result: error

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb4 in position 744: invalid start byte
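In Experiment 1 neither 'gbk' nor 'utf-8' decodes the file. One possible explanation (a guess; the article does not say) is that the log contains a few bytes that are invalid in its main encoding. A minimal sketch of a workaround under that assumption is to pass errors='replace' to open(), so that undecodable bytes become U+FFFD instead of raising. The temporary file and its contents below are made up for illustration and are not the article's log.

```python
import os
import tempfile

# Hypothetical sample: GBK text followed by one stray byte (0xff) that GBK cannot decode.
raw = "日志".encode("gbk") + b"\xff" + b" end"

# Write the raw bytes to a throwaway file so that open() can be demonstrated.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(raw)

try:
    # Strict decoding (the default) raises UnicodeDecodeError on the stray byte.
    try:
        with open(path, "r", encoding="gbk") as f:
            f.read()
        strict_failed = False
    except UnicodeDecodeError:
        strict_failed = True

    # errors="replace" substitutes U+FFFD for whatever cannot be decoded.
    with open(path, "r", encoding="gbk", errors="replace") as f:
        text = f.read()
finally:
    os.remove(path)

print(strict_failed)
print(repr(text))
```

Replacement discards the original bytes, so it is a way to get through a damaged file rather than a way to recover its exact contents.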

Experiment 2

File encoding: UTF-8 without BOM

Code line 8: f = open('d:\\saofu-weixin.log.2016-11-08.log', 'r')

Result: error

UnicodeDecodeError: 'gbk' codec can't decode byte 0x81 in position 756: illegal multibyte sequence

Code line 8: f = open('d:\\saofu-weixin.log.2016-11-08.log', 'r', encoding='utf-8')

Result: runs without error

Experiment 3

File encoding: UCS-2 Big Endian, UCS-2 Little Endian

Code line 8: f = open('d:\\saofu-weixin.log.2016-11-08.log', 'r', encoding='utf-8')

Result: error

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xfe in position 0: invalid start byte

Code line 8: f = open('d:\\saofu-weixin.log.2016-11-08.log', 'r')

Result: error

UnicodeDecodeError: 'gbk' codec can't decode byte 0xfe in position 0: illegal multibyte sequence
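The 0xfe at position 0 is consistent with a byte order mark: a UTF-16 big-endian file with a BOM starts with FE FF, and a little-endian one with FF FE. A sketch, assuming the file really does start with such a BOM, is to open it with encoding='utf-16', which uses the BOM to pick the byte order. The sample text, the helper write_temp and the temporary files are hypothetical.

```python
import codecs
import os
import tempfile

text = "line 1: 中文"

def write_temp(data: bytes) -> str:
    """Write bytes to a temporary file and return its path."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.write(data)
    return path

# Build the two variants by hand: BOM followed by the encoded text.
variants = {
    "big endian": codecs.BOM_UTF16_BE + text.encode("utf-16-be"),
    "little endian": codecs.BOM_UTF16_LE + text.encode("utf-16-le"),
}

decoded = {}
for name, data in variants.items():
    path = write_temp(data)
    try:
        # The 'utf-16' codec reads the BOM, selects the byte order, and drops the BOM.
        with open(path, "r", encoding="utf-16") as f:
            decoded[name] = f.read()
    finally:
        os.remove(path)

print(decoded)
```

If a file has no BOM, the byte order has to be given explicitly with 'utf-16-be' or 'utf-16-le'.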

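Experiment 4

File encoding: UTF-8 (presumably with a BOM, unlike Experiment 2; the byte 0xbf at position 2 in the error below matches the last byte of the UTF-8 BOM, EF BB BF)

Encoding declaration in the source file:

# -*- coding:gbk -*-

or

# -*- coding:gb2312 -*-

or

# -*- coding:utf-8 -*-

Code line 8: f = open('d:\\saofu-weixin.log.2016-11-08.log', 'r')

Result: error

UnicodeDecodeError: 'gbk' codec can't decode byte 0xbf in position 2: illegal multibyte sequence

Code line 8: f = open('d:\\saofu-weixin.log.2016-11-08.log', 'r', encoding='utf-8')

Result: runs without error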
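Opening with encoding='utf-8' does not fail here, but if the file begins with a UTF-8 BOM (which the 0xbf at position 2 hints at; this is an inference, not something the article states), the text read back starts with the invisible character U+FEFF. A minimal sketch of the difference between 'utf-8' and 'utf-8-sig', using a made-up temporary file:

```python
import codecs
import os
import tempfile

text = "2016-11-08 log line"
data = codecs.BOM_UTF8 + text.encode("utf-8")  # EF BB BF followed by the payload

fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(data)

try:
    with open(path, "r", encoding="utf-8") as f:
        plain = f.read()      # the BOM is decoded as the character U+FEFF
    with open(path, "r", encoding="utf-8-sig") as f:
        stripped = f.read()   # the BOM is recognised and skipped
finally:
    os.remove(path)

print(repr(plain[:1]))
print(repr(stripped[:4]))
```

'utf-8-sig' also reads files that have no BOM, so it is a reasonable choice when it is unclear whether an editor added one.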

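Cause analysis

From the error messages above we can conclude:

1. An error occurred. Its type is UnicodeDecodeError, meaning that decoding bytes into Unicode text failed.

2. The specific cause is "'xxx' codec can't decode byte xxxx in position xx": the codec for the encoding 'xxx' could not decode the byte xxxx located at position xx.

3. The message is further qualified as either "illegal multibyte sequence" or "invalid start byte".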
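The three parts of the message listed above are also available as attributes of the exception object. The following sketch uses a hypothetical helper, describe_decode_error, and made-up sample bytes to show where each part comes from.

```python
def describe_decode_error(data: bytes, encoding: str):
    """Return the parts of a UnicodeDecodeError as a dict, or None if decoding works."""
    try:
        data.decode(encoding)
    except UnicodeDecodeError as exc:
        return {
            "encoding": exc.encoding,                    # the 'xxx' codec in the message
            "bad_bytes": exc.object[exc.start:exc.end],  # the byte(s) that could not be decoded
            "position": exc.start,                       # "in position xx"
            "reason": exc.reason,                        # e.g. "invalid start byte"
        }
    return None

# 0xff can never start a UTF-8 sequence, so decoding fails at offset 3.
info = describe_decode_error(b"abc\xffdef", "utf-8")
print(info)
```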

From the experiments we can conclude:

Opening and reading the log file with the file's own encoding runs without error.
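When the file's encoding is unknown, one rough approach is to read the raw bytes and see which candidate encodings decode them without error. The helper name encodings_that_decode and the candidate list are assumptions made for this sketch. A clean decode does not prove the encoding is the right one, since unrelated encodings can accept the same bytes and produce garbage.

```python
def encodings_that_decode(data: bytes, candidates):
    """Return the candidate encodings that decode data without raising."""
    ok = []
    for enc in candidates:
        try:
            data.decode(enc)
        except UnicodeDecodeError:
            continue
        ok.append(enc)
    return ok

# Hypothetical sample: UTF-8 bytes for a short Chinese string.
sample = "编码".encode("utf-8")
result = encodings_that_decode(sample, ["ascii", "utf-8", "gbk"])
print(result)
```

Treat the result as a shortlist to check by eye, not as a detection result.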

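Conclusions

1. How Python handles conversion between encodings: to convert from one encoding (the source encoding) to another (the target encoding), Python first decodes the source bytes into Unicode (a str) using the source encoding, and then encodes that str into the target encoding.

2. When opening a file with open(), it is best to specify the encoding explicitly, and the encoding specified must match the encoding the file was actually saved in; otherwise decoding may fail. Put plainly: decode the file with whatever encoding it was written in.

3. The # -*- coding: encoding -*- comment in a Python source file has nothing to do with how other files are decoded. It applies only to the contents of the script file itself, such as Chinese string literals.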
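Points 1 and 2 can be sketched together as a small file transcoder: read with the source encoding given explicitly (bytes to str), then write with the target encoding given explicitly (str to bytes). The function name transcode, the file names and the sample text are hypothetical.

```python
import os
import shutil
import tempfile

def transcode(src_path, dst_path, src_encoding, dst_encoding):
    """Re-encode a text file: bytes -> str (decode) -> bytes (encode)."""
    # newline="" leaves the original line endings untouched on both sides.
    with open(src_path, "r", encoding=src_encoding, newline="") as src, \
         open(dst_path, "w", encoding=dst_encoding, newline="") as dst:
        for line in src:
            dst.write(line)

text = "第一行\n第二行\n"
tmp_dir = tempfile.mkdtemp()
src_path = os.path.join(tmp_dir, "in_gbk.log")
dst_path = os.path.join(tmp_dir, "out_utf8.log")

try:
    # Create a GBK-encoded input file.
    with open(src_path, "wb") as f:
        f.write(text.encode("gbk"))

    transcode(src_path, dst_path, "gbk", "utf-8")

    with open(dst_path, "rb") as f:
        result = f.read()
finally:
    shutil.rmtree(tmp_dir)

# The same conversion on in-memory bytes, without files.
in_memory = text.encode("gbk").decode("gbk").encode("utf-8")
print(result == in_memory)
```

When encoding is omitted, open() falls back to a platform-dependent default (locale.getpreferredencoding(False) in the Python version used here), which is why the experiments without an encoding argument report the 'gbk' codec on this machine.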

Appendix: source code encoding

The official documentation describes the encoding of Python source files as follows:

By default, Python source files are treated as encoded in UTF-8. In that encoding, characters of most languages in the world can be used simultaneously in string literals, identifiers and comments, although the standard library only uses ASCII characters for identifiers, a convention that any portable code should follow. To display all these characters properly, your editor must recognize that the file is UTF-8, and it must use a font that supports all the characters in the file.

It is also possible to specify a different encoding for source files. In order to do this, put one more special comment line right after the #! line to define the source file encoding:

# -*- coding: encoding -*-

With that declaration, everything in the source file will be treated as having the encoding encoding instead of UTF-8.
