This article is also posted on my blog. Feel free to check the latest revision there: The Encode and Decode in Python
Do you really understand encode and decode in Python?
In Python, encode and decode convert between strings and bytes. As we all know, a string is stored on a computer and sent over a network as a sequence of bytes, not as Unicode code points.
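To make the difference concrete, here is a minimal sketch that compares a str with its UTF-8 bytes. The variable names text, data, and round_trip are only for illustration.

```python
# A str is a sequence of Unicode code points;
# bytes is a sequence of integers in the range 0-255.
text = "你好"
data = text.encode("utf-8")

print(type(text), len(text))   # <class 'str'> 2
print(type(data), len(data))   # <class 'bytes'> 6
print(list(data))              # [228, 189, 160, 229, 165, 189]

# Encoding and then decoding with the same codec returns the original string.
round_trip = data.decode("utf-8")
print(round_trip == text)      # True
```

The two characters become six bytes because UTF-8 uses three bytes for each of them.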
Encode
encode is a string method that transforms a string into a byte sequence. When you call it, you pass the name of the encoding, such as utf-8, gbk, or gb2312 (if you omit it, Python 3 uses utf-8). Python then uses that encoding to map every character in the string to its corresponding bytes.
s = "你好,世界"
encoded_s = s.encode('utf-8')
print(encoded_s)
# b'\xe4\xbd\xa0\xe5\xa5\xbd\xef\xbc\x8c\xe4\xb8\x96\xe7\x95\x8c'
# The b prefix marks a bytes literal.
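As a small sketch of how the choice of encoding changes the result, the example below encodes the same string with utf-8 and gbk, and then shows what happens when the target encoding cannot represent the characters at all. The errors="replace" argument is a standard option of str.encode; the variable names are made up for this example.

```python
s = "你好"

utf8_bytes = s.encode("utf-8")   # 3 bytes per character here
gbk_bytes = s.encode("gbk")      # 2 bytes per character here

print(utf8_bytes)  # b'\xe4\xbd\xa0\xe5\xa5\xbd'
print(gbk_bytes)   # b'\xc4\xe3\xba\xc3'

# ASCII cannot represent these characters, so strict encoding fails.
try:
    s.encode("ascii")
except UnicodeEncodeError as exc:
    print("cannot encode:", exc.reason)

# errors="replace" substitutes "?" for each character that cannot be encoded.
ascii_bytes = s.encode("ascii", errors="replace")
print(ascii_bytes)  # b'??'
```

The same text can therefore have several different byte representations, which is why the encoding name matters.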
Decode
decode is a method of bytes objects. It transforms a byte sequence back into a string. You should always decode with the same encoding that was used to encode the string.
b = b'\xe4\xbd\xa0\xe5\xa5\xbd\xef\xbc\x8c\xe4\xb8\x96\xe7\x95\x8c'
decoded_b = b.decode('utf-8')
print(decoded_b)
# 你好,世界
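To see why the encodings have to match, here is a minimal sketch of what can go wrong when they do not. It assumes gbk-encoded and utf-8-encoded versions of the same short string; the variable names are only illustrative.

```python
gbk_bytes = "你好".encode("gbk")

# Strict decoding with the wrong codec raises an error for these bytes.
try:
    gbk_bytes.decode("utf-8")
    decode_failed = False
except UnicodeDecodeError:
    decode_failed = True
print(decode_failed)  # True

# errors="replace" inserts U+FFFD wherever the bytes are invalid.
replaced = gbk_bytes.decode("utf-8", errors="replace")
print("\ufffd" in replaced)  # True

# latin-1 accepts every byte value, so the wrong codec fails silently
# and produces garbled text instead of an exception.
utf8_bytes = "你好".encode("utf-8")
garbled = utf8_bytes.decode("latin-1")
print(len(garbled), garbled == "你好")  # 6 False
```

A mismatch does not always raise an exception. Sometimes you simply get unreadable text, which is harder to notice.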