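In the previous article, "一次算法读图超时引起的urllib3源码分析" (a urllib3 source analysis triggered by an image-read timeout in an algorithm service), we covered urllib3's basic syntax, common usage patterns, and request management model, along with parts of the source of PoolManager, HTTPConnectionPool, and HTTPConnection. For anyone learning Python, urllib3 is powerful enough to handle nearly every HTTP request scenario, but is that enough?
Let's check with a small example that sends a POST request and stores the result as JSON:
Sending a POST request with urllib3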
import json
from urllib.parse import urlencode

import urllib3

# 1. Create the connection pool manager
http = urllib3.PoolManager()
# 2. Encode the parameters
encoded_args = urlencode({'arg': 'value'})
# 3. Send the request (the parameters travel in the query string)
url = 'http://httpbin.org/post?' + encoded_args
r = http.request('POST', url)
# 4. Decode the response body
decode_data = r.data.decode('utf-8')
# 5. Parse the JSON
data = json.loads(decode_data)['args']
print(data)
# Output
# {'arg': 'value'}
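Sending a single POST request took five steps in total. Isn't that a bit too much work 😂
In the programmer's spirit of minimalism, why hand-write code when an existing wheel solves the problem? Let's see how requests handles the same task:
Sending a POST request with requests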
import requests

# 1. Send the request
r = requests.post('https://httpbin.org/post', data={'key': 'value'})
# 2. Read the result
data = r.json()
print(data['form'])
# Output
# {'key': 'value'}
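As you can see, sending a POST request with requests takes only two simple steps. The request-response pattern also matches the way we communicate in everyday language, which may well be why requests has become one of the most downloaded Python packages today: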
Requests is one of the most downloaded Python packages today, pulling in around 30M downloads / week— according to GitHub, Requests is currently depended upon by 1,000,000+ repositories. You may certainly put your trust in this code.
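The rest of this article covers the basic usage of requests, its timeout mechanism, and its request flow, with a flow chart and excerpts of the source code to aid understanding. It is fairly short, with an estimated reading time of 15 minutes. If you find it helpful, feedback is very welcome: likes, comments, and shares are all appreciated.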
Before we start, let's briefly go over the differences between urllib, urllib2, urllib3, and requests.
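urllib and urllib2 are both modules from the Python 2 standard library that handle URL request work, but they offer different features:
- urllib2 can accept a Request object to set the headers of a URL request, while urllib accepts only a URL
- urllib provides the urlencode/unquote methods for building GET query strings, while urllib2 has nothing similar, which is why urllib and urllib2 are so often used together
- urllib3 is a third-party URL library that provides many key features missing from the Python standard library: thread safety, connection pooling, SSL/TLS verification, request retries, HTTP redirects, and more
- requests wraps urllib3 to make it simpler and easier to use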
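For readers on Python 3, where urllib2 no longer exists: its functionality moved into urllib.request, and urlencode/unquote now live in urllib.parse. The sketch below uses only the standard library and sends nothing over the network; it simply shows the two roles from the list above side by side. The URL, parameter names, and header value are placeholders chosen for illustration.

```python
from urllib.parse import unquote, urlencode
from urllib.request import Request

# Role of the old urllib: build and decode query strings.
query = urlencode({"q": "python requests", "page": 2})
print(query)                          # q=python+requests&page=2
print(unquote("%E4%BD%A0%E5%A5%BD"))  # 你好

# Role of the old urllib2: a Request object that carries custom headers.
# The URL and header value are placeholders; nothing is sent here.
req = Request(
    "http://httpbin.org/get?" + query,
    headers={"User-Agent": "demo-client/0.1"},
)
print(req.get_method())               # GET (no request body was given)
print(req.full_url)                   # http://httpbin.org/get?q=python+requests&page=2
# Request stores header names via str.capitalize(), hence "User-agent".
print(req.get_header("User-agent"))   # demo-client/0.1
```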
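requests was designed around simplicity from the very beginning. In the author's own words, quoted below, it is built for human beings, which shows how much care went into it. Of course, this article was written for all of you as well, so remember to give it a like 👍!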
A simple, yet elegant, HTTP library.
Requests is an elegant and simple HTTP library for Python, built for human beings.
Two common ways to use requests
1. Direct use
>>> import requests
>>> r = requests.get('https://api.github.com/events')
>>> r.status_code
200
>>> r.headers['content-type']
'application/json; charset=utf-8'
>>> r.encoding
'utf-8'
>>> r.text
'[{"id":"20714286674","type":"PushEven... '
>>> r.json()
[{'id': '20714286674', 'type': 'PushEvent'..}]
2. Through a Session
>>> import requests
>>> s = requests.Session()
>>> r = s.get("https://api.github.com/events")
>>> r.status_code
200
>>> r.headers['content-type']
'application/json; charset=utf-8'
>>> r.encoding
'utf-8'
>>> r.text
'[{"id":"20714286674","type":"PushEven... '
>>> r.json()
[{'id': '20714286674', 'type': 'PushEvent'..}]
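The first form is the basic usage and covers most everyday request scenarios. With the second form, the requests.Session object lets you persist certain parameters and cookies across requests and takes advantage of urllib3's connection pooling. As a result, when you send multiple requests to the same host, the underlying TCP connection is reused, which can significantly improve request performance.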
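To make the "persist parameters across requests" point concrete, here is a minimal sketch that inspects what a Session would send without touching the network. It relies on Session.prepare_request, which merges session-level settings into each request. The header name, token value, and query parameter are made up for illustration.

```python
import requests

session = requests.Session()
# Session-level defaults: applied to every request made through this session.
session.headers.update({"X-Demo-Token": "abc123"})  # hypothetical header
session.params = {"per_page": "5"}                  # hypothetical query parameter

# Build two requests; prepare_request merges in the session defaults.
first = session.prepare_request(
    requests.Request("GET", "https://api.github.com/events"))
second = session.prepare_request(
    requests.Request("GET", "https://api.github.com/events",
                     headers={"Accept": "application/json"}))

print(first.url)                       # https://api.github.com/events?per_page=5
print(first.headers["X-Demo-Token"])   # abc123
print(second.headers["Accept"])        # application/json (set on this request only)
print(second.headers["X-Demo-Token"])  # abc123 (inherited from the session)
session.close()
```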
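The requests architecture is actually simple
The whole architecture has two parts: Session, which persists parameters, and HTTPAdapter, the adapter that connects requests to the transport layer. Everything else belongs to urllib3.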
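This split is visible from the public API: a Session keeps a mapping from URL prefixes to adapters, and each adapter owns the urllib3-level pool and retry settings. A minimal sketch, assuming you want a custom pool size and retry count for one host (the numbers and the host prefix are arbitrary choices, and no request is sent):

```python
import requests
from requests.adapters import HTTPAdapter

session = requests.Session()

# The adapter carries urllib3-level settings: pool sizes and retry behaviour.
adapter = HTTPAdapter(pool_connections=4, pool_maxsize=8, max_retries=3)

# Route every URL that starts with this prefix through the custom adapter.
session.mount("https://api.github.com/", adapter)

# get_adapter returns the adapter registered for the longest matching prefix.
chosen = session.get_adapter("https://api.github.com/events")
default = session.get_adapter("https://example.org/")

print(chosen is adapter)          # True
print(default is adapter)         # False: served by the built-in "https://" adapter
print(adapter.max_retries.total)  # 3
session.close()
```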
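If you have read this far, the main content I wanted to share is essentially done. The rest of the article dives into source code; if that doesn't interest you, feel free to skip to the end.
The requests source code is nothing to be afraid of
Source version: v2.26.0
Source path: https://github.com/psf/requests/tree/v2.26.0
Besides the usual GET, POST, DELETE, and PUT methods, the requests package also offers a very handy timeout parameter, which keeps a request from blocking for too long: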
>>> import requests
>>> requests.get("https://api.github.com/events", timeout=1)
<Response [200]>
>>> requests.get("https://api.github.com/events", timeout=0.00001)
Traceback (most recent call last):
File "/opt/anaconda3/envs/python37/lib/python3.7/site-packages/urllib3/connection.py", line 141, in _new_conn
(self.host, self.port), self.timeout, **extra_kw)
...
File "/opt/anaconda3/envs/python37/lib/python3.7/site-packages/urllib3/util/connection.py", line 73, in create_connection
sock.connect(sa)
socket.timeout: timed out
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/opt/anaconda3/envs/python37/lib/python3.7/site-packages/urllib3/connectionpool.py", line 601, in urlopen
chunked=chunked)
...
File "/opt/anaconda3/envs/python37/lib/python3.7/site-packages/urllib3/connection.py", line 146, in _new_conn
(self.host, self.timeout))
urllib3.exceptions.ConnectTimeoutError: (<urllib3.connection.VerifiedHTTPSConnection object at 0x7f98ba78e110>, 'Connection to api.github.com timed out. (connect timeout=1e-05)')
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/opt/anaconda3/envs/python37/lib/python3.7/site-packages/requests/adapters.py", line 440, in send
timeout=timeout
...
File "/opt/anaconda3/envs/python37/lib/python3.7/site-packages/urllib3/util/retry.py", line 388, in increment
raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='api.github.com', port=443): Max retries exceeded with url: /events (Caused by ConnectTimeoutError(<urllib3.connection.VerifiedHTTPSConnection object at 0x7f98ba78e110>, 'Connection to api.github.com timed out. (connect timeout=1e-05)'))
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/opt/anaconda3/envs/python37/lib/python3.7/site-packages/requests/api.py", line 72, in get
return request('get', url, params=params, **kwargs)
File "/opt/anaconda3/envs/python37/lib/python3.7/site-packages/requests/api.py", line 58, in request
return session.request(method=method, url=url, **kwargs)
...
File "/opt/anaconda3/envs/python37/lib/python3.7/site-packages/requests/adapters.py", line 496, in send
raise ConnectTimeout(e, request=request)
requests.exceptions.ConnectTimeout: HTTPSConnectionPool(host='api.github.com', port=443): Max retries exceeded with url: /events (Caused by ConnectTimeoutError(<urllib3.connection.VerifiedHTTPSConnection object at 0x7f98ba78e110>, 'Connection to api.github.com timed out. (connect timeout=1e-05)'))
From the traceback above, tracing the timeout error backwards (outermost exception first) gives:
-> requests/adapters.py (requests.exceptions.ConnectTimeout)
-> urllib3/util/retry.py (urllib3.exceptions.MaxRetryError)
-> urllib3/connection.py (urllib3.exceptions.ConnectTimeoutError)
-> urllib3/util/connection.py (socket.timeout: timed out)
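In application code, the outermost link of this chain is the one to catch. The sketch below is an illustration under stated assumptions rather than anything taken from the requests source: it starts a deliberately slow throwaway HTTP server on localhost (the handler SlowHandler, the helper classify, and the delay values are all invented for the demo) and shows how a (connect, read) timeout tuple surfaces as requests.exceptions.ReadTimeout. A connect timeout like the one in the traceback would land in the first except branch instead.

```python
import threading
import time
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests


class SlowHandler(BaseHTTPRequestHandler):
    """Hypothetical demo handler: waits half a second before answering."""

    def do_GET(self):
        time.sleep(0.5)
        try:
            self.send_response(200)
            self.send_header("Content-Length", "2")
            self.end_headers()
            self.wfile.write(b"ok")
        except OSError:
            pass  # the client may already have given up and closed the socket

    def log_message(self, *args):
        pass  # keep the demo output quiet


def classify(session, url, timeout):
    """Return a short label describing how the request ended."""
    try:
        session.get(url, timeout=timeout)
        return "ok"
    except requests.exceptions.ConnectTimeout:
        return "connect timeout"
    except requests.exceptions.ReadTimeout:
        return "read timeout"


server = HTTPServer(("127.0.0.1", 0), SlowHandler)  # port 0: pick any free port
threading.Thread(target=server.serve_forever, daemon=True).start()
url = "http://127.0.0.1:%d/" % server.server_port

session = requests.Session()
session.trust_env = False  # ignore proxy environment variables for this local demo

impatient = classify(session, url, timeout=(1, 0.1))  # read timeout < server delay
patient = classify(session, url, timeout=(1, 5))      # read timeout > server delay
print(impatient, "/", patient)

session.close()
server.shutdown()
server.server_close()
```

On a machine where loopback connections work normally, this should print "read timeout / ok": the first call gives up after 0.1 seconds of silence from the server, while the second waits long enough to receive the response.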
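Next, we walk through the code behind this timeout path, starting from the entry point. Because the HTTP requests sent by the requests package are built on top of urllib3, the timeout mechanism simply reuses urllib3's timeout logic, as follows: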
# Entry point
# https://github.com/psf/requests/blob/v2.26.0/requests/api.py#L16
def request(method, url, **kwargs):
    """Constructs and sends a :class:`Request <Request>`.
    :param method: method for the new :class:`Request` object: ``GET``, ``OPTIONS``, ``HEAD``, ``POST``, ``PUT``, ``PATCH``, or ``DELETE``.
    :param url: URL for the new :class:`Request` object.
    :param params: (optional) Dictionary, list of tuples or bytes to send
        in the query string for the :class:`Request`.
    ...
    """
    with sessions.Session() as session:
        return session.request(method=method, url=url, **kwargs)
The request is then passed to Session.request:
# https://github.com/psf/requests/blob/v2.26.0/requests/sessions.py#L470
def request(self, method, url,
        params=None, data=None, headers=None, cookies=None, files=None,
        auth=None, timeout=None, allow_redirects=True, proxies=None,
        hooks=None, stream=None, verify=None, cert=None, json=None):
    """Constructs a :class:`Request <Request>`, prepares it and sends it.
    Returns :class:`Response <Response>` object.
    :param method: method for the new :class:`Request` object.
    :param url: URL for the new :class:`Request` object.
    :param params: (optional) Dictionary or bytes to be sent in the query
        string for the :class:`Request`.
    ...
    :param timeout: (optional) How long to wait for the server to send
        data before giving up, as a float, or a :ref:`(connect timeout,
        read timeout) <timeouts>` tuple.
    :type timeout: float or tuple
    ...
    :rtype: requests.Response
    """
    # Create the Request.
    req = Request(
        method=method.upper(),
        url=url,
        headers=headers,
        files=files,
        data=data or {},
        json=json,
        params=params or {},
        auth=auth,
        cookies=cookies,
        hooks=hooks,
    )
    prep = self.prepare_request(req)
    proxies = proxies or {}
    settings = self.merge_environment_settings(
        prep.url, proxies, stream, verify, cert
    )
    # Send the request.
    send_kwargs = {
        'timeout': timeout,
        'allow_redirects': allow_redirects,
    }
    send_kwargs.update(settings)
    resp = self.send(prep, **send_kwargs)
    return resp
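The three stages in Session.request (build a Request, prepare it, send it) can also be driven by hand, which is a convenient way to see what prepare_request produces. A minimal sketch that stops before the send step, so no network access is needed; the URL and payload are placeholders:

```python
import json

import requests

session = requests.Session()

# Stage 1: describe the request (mirrors the Request(...) call in Session.request).
req = requests.Request(
    method="post",
    url="https://httpbin.org/post",
    params={"arg": "value"},
    json={"key": "value"},
)

# Stage 2: turn it into a PreparedRequest with the final URL, headers and body.
prep = session.prepare_request(req)

print(prep.method)                   # POST
print(prep.url)                      # https://httpbin.org/post?arg=value
print(prep.headers["Content-Type"])  # application/json
print(json.loads(prep.body))         # {'key': 'value'}

# Stage 3 would be: session.send(prep, timeout=(3.05, 10))
session.close()
```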
As the architecture diagram shows, Session.send then hands the prepared request to HTTPAdapter.send for processing:
# https://github.com/psf/requests/blob/v2.26.0/requests/adapters.py#L394
def send(self, request, stream=False, timeout=None, verify=True, cert=None, proxies=None):
    """Sends PreparedRequest object. Returns Response object.
    :param request: The :class:`PreparedRequest <PreparedRequest>` being sent.
    :param stream: (optional) Whether to stream the request content.
    :param timeout: (optional) How long to wait for the server to send
        data before giving up, as a float, or a :ref:`(connect timeout,
        read timeout) <timeouts>` tuple.
    :type timeout: float or tuple or urllib3 Timeout object
    ...
    :rtype: requests.Response
    """
    try:
        conn = self.get_connection(request.url, proxies)
    except LocationValueError as e:
        raise InvalidURL(e, request=request)
    ...
    # timeout may be a (connect, read) tuple, a urllib3 Timeout object, or a single float
    if isinstance(timeout, tuple):
        try:
            connect, read = timeout
            timeout = TimeoutSauce(connect=connect, read=read)
        except ValueError as e:
            # this may raise a string formatting error.
            err = ("Invalid timeout {}. Pass a (connect, read) "
                   "timeout tuple, or a single float to set "
                   "both timeouts to the same value".format(timeout))
            raise ValueError(err)
    elif isinstance(timeout, TimeoutSauce):
        pass
    else:
        timeout = TimeoutSauce(connect=timeout, read=timeout)
    try:
        if not chunked:
            resp = conn.urlopen(
                method=request.method,
                url=url,
                body=request.body,
                headers=request.headers,
                redirect=False,
                assert_same_host=False,
                preload_content=False,
                decode_content=False,
                retries=self.max_retries,
                timeout=timeout
            )
        # Send the request.
        else:
            ...
    except (ProtocolError, socket.error) as err:
        raise ConnectionError(err, request=request)
    except MaxRetryError as e:
        if isinstance(e.reason, ConnectTimeoutError):
            # TODO: Remove this in 3.0.0: see #2811
            if not isinstance(e.reason, NewConnectionError):
                raise ConnectTimeout(e, request=request)
        ...
        raise ConnectionError(e, request=request)
    ...
    except (_SSLError, _HTTPError) as e:
        if isinstance(e, _SSLError):
            # This branch is for urllib3 versions earlier than v1.22
            raise SSLError(e, request=request)
        elif isinstance(e, ReadTimeoutError):
            raise ReadTimeout(e, request=request)
        elif isinstance(e, _InvalidHeader):
            raise InvalidHeader(e, request=request)
        else:
            raise
    return self.build_response(request, resp)
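The timeout branch in the middle of send is easy to study in isolation. The following stand-alone sketch mirrors that tuple / object / scalar branching using only the standard library. TimeoutPair and normalize_timeout are hypothetical names invented here, standing in for urllib3's Timeout class (referred to as TimeoutSauce in adapters.py); they are not part of requests.

```python
from typing import NamedTuple, Optional, Tuple, Union


class TimeoutPair(NamedTuple):
    """Hypothetical stand-in for urllib3's Timeout (TimeoutSauce in adapters.py)."""
    connect: Optional[float]
    read: Optional[float]


TimeoutArg = Union[None, float, Tuple[Optional[float], Optional[float]], TimeoutPair]


def normalize_timeout(timeout: TimeoutArg) -> TimeoutPair:
    """Mirror the branching in HTTPAdapter.send for the timeout argument."""
    # Checked first because a NamedTuple is itself a tuple subclass.
    if isinstance(timeout, TimeoutPair):
        return timeout
    if isinstance(timeout, tuple):
        try:
            connect, read = timeout
        except ValueError:
            raise ValueError(
                "Invalid timeout {}. Pass a (connect, read) timeout tuple, "
                "or a single float to set both timeouts to the same value"
                .format(timeout))
        return TimeoutPair(connect=connect, read=read)
    # A single number (or None) applies to both phases.
    return TimeoutPair(connect=timeout, read=timeout)


print(normalize_timeout(1))           # TimeoutPair(connect=1, read=1)
print(normalize_timeout((3.05, 27)))  # TimeoutPair(connect=3.05, read=27)
print(normalize_timeout(None))        # TimeoutPair(connect=None, read=None)
```

When both values are None there is no time limit at all, which is why the requests documentation recommends passing an explicit timeout in production code.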
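Summary
Comparing the Sponsor, Watch, Fork, and Star metrics of the three open-source projects Python, urllib3, and requests, it turns out that requests has 3.7k more stars than Python itself.
A successful open-source project either has a strong technical moat or, if its design is clever enough, can overtake the competition even when the underlying technology is not that complex.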
References
- https://github.com/urllib3/urllib3
- https://urllib3.readthedocs.io/en/stable/index.html
- https://github.com/psf/requests
- https://docs.python-requests.org/en/latest/user/quickstart/
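❤️❤️❤️ Every bit of enthusiasm from readers is what keeps the author going!
I'm 三十一. Thank you all: likes, comments, and shares are appreciated. See you next time!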