Why does the open-source project requests have 3.7k more stars than Python itself?


Following up on the previous article, "An urllib3 Source-Code Analysis Triggered by an Algorithm Image-Fetch Timeout", we covered urllib3's basic syntax, common usage patterns, and request-management model, along with parts of the source code of modules such as PoolManager, HTTPConnectionPool, and HTTPConnection. For anyone learning Python, urllib3 is powerful enough to handle almost every HTTP request scenario. But is that enough?

Let's put that to the test with a small example: send a POST request and store the result as JSON:

Sending a POST request with urllib3

  import json
  from urllib.parse import urlencode

  import urllib3

  # 1 Create a pool manager
  http = urllib3.PoolManager()
  # 2 Encode the parameters
  encoded_args = urlencode({'arg': 'value'})
  # 3 Send the request
  url = 'http://httpbin.org/post?' + encoded_args
  r = http.request('POST', url)
  # 4 Decode the response body
  decode_data = r.data.decode('utf-8')
  # 5 Parse the JSON
  data = json.loads(decode_data)['args']
  print(data)

  # Output
  {'arg': 'value'}

That's five separate steps just to send a single POST request. A bit tedious, isn't it? 😂

In the spirit of programmer minimalism (why hand-roll code when a perfectly good wheel exists?), let's see how requests handles the same task:

Sending a POST request with requests

  import requests
  # 1 Send the request
  r = requests.post('https://httpbin.org/post', data={'key': 'value'})
  # 2 Get the result
  data = r.json()
  print(data['form'])

  # Output
  {'key': 'value'}

As you can see, sending a POST request with requests takes just two steps, and the request-then-receive pattern maps far more naturally onto how we communicate. Perhaps that is exactly why requests has become one of the most-downloaded Python packages today!

Requests is one of the most downloaded Python packages today, pulling in around 30M downloads / week— according to GitHub, Requests is currently depended upon by 1,000,000+ repositories. You may certainly put your trust in this code.

https://github.com/psf/requests#requests

This article covers the basics of requests, its timeout mechanism, and the request flow, with flow diagrams and selected source code to aid understanding. It is fairly short, with an estimated reading time of 15 minutes. If you find it helpful, please like, comment, and share!

Before we start, let's briefly look at the differences between urllib, urllib2, urllib3, and requests.

  • urllib and urllib2 are both Python 2 standard-library modules for URL-request work, each providing different features

    • urllib2 can accept a Request object to set request headers, while urllib only accepts a URL string
    • urllib provides urlencode/unquote for building GET query strings, which urllib2 lacks; this is why the two were so often used together
  • urllib3 is a third-party HTTP library that supplies many critical features missing from the standard library: thread safety, connection pooling, SSL/TLS verification, request retries, HTTP redirects, and more
  • requests wraps urllib3, making it simpler and easier to use
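The layering in that last bullet can be checked directly: a requests Session routes every request through a transport adapter, and the default HTTPAdapter keeps a urllib3 PoolManager internally. A minimal sketch (no network access needed):

```python
import urllib3
import requests

# Every Session dispatches requests through a transport adapter;
# the default HTTPAdapter wraps a urllib3 PoolManager internally.
session = requests.Session()
adapter = session.get_adapter('https://example.com')
print(type(adapter).__name__)                                # HTTPAdapter
print(isinstance(adapter.poolmanager, urllib3.PoolManager))  # True
```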

requests was designed around simplicity from the start. The author's own words below, "built for human beings", show how much care went into it. This article, likewise, was written for you, dear readers; remember to give it a like 👍!

A simple, yet elegant, HTTP library.
Requests is an elegant and simple HTTP library for Python, built for human beings.


Two common ways to use requests

1. Direct use

>>> import requests
>>> r = requests.get('https://api.github.com/events')
>>> r.status_code
200
>>> r.headers['content-type']
'application/json; charset=utf-8'
>>> r.encoding
'utf-8'
>>> r.text
'[{"id":"20714286674","type":"PushEven... '
>>> r.json()
[{'id': '20714286674', 'type': 'PushEvent'..}]

2. Via a Session

>>> import requests
>>> s = requests.Session()
>>> r = s.get("https://api.github.com/events")
>>> r.status_code
200
>>> r.headers['content-type']
'application/json; charset=utf-8'
>>> r.encoding
'utf-8'
>>> r.text
'[{"id":"20714286674","type":"PushEven... '
>>> r.json()
[{'id': '20714286674', 'type': 'PushEvent'..}]

The first form is the basic usage and covers most everyday request scenarios. The second, a requests.Session object, lets you persist certain parameters across requests, persist cookies, and use urllib3's connection pooling. When sending multiple requests to the same host, the underlying TCP connection is therefore reused, which can significantly improve performance.
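The cross-request persistence described above can be observed without touching the network, by preparing a request by hand and inspecting what the Session merges into it (the header and cookie values below are made up for illustration):

```python
import requests

with requests.Session() as s:
    s.headers.update({'X-Demo': 'persistent'})  # set once on the Session
    s.cookies.set('session_id', '123')          # persisted cookie jar

    # prepare_request merges Session-level state into the outgoing request
    req = requests.Request('GET', 'https://httpbin.org/get',
                           headers={'X-Once': '1'})
    prepped = s.prepare_request(req)

    print(prepped.headers['X-Demo'])   # merged from the Session
    print(prepped.headers['X-Once'])   # per-request header
    print(prepped.headers['Cookie'])   # session_id=123
```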

The requests architecture is actually quite simple

The whole architecture consists of just two parts: Session, which persists parameters, and the HTTPAdapter, which adapts connections and dispatches requests; everything else belongs to urllib3.

That wraps up the core of what I wanted to share. The remainder digs into the source code; feel free to skip straight to the end if that's not your thing.

The requests source code is nothing to fear

Source version: v2.26.0
Source path: https://github.com/psf/requests/tree/v2.26.0

Beyond the usual GET, POST, DELETE, and PUT, the requests package also offers a very handy timeout parameter that keeps a request from blocking for too long:

>>> import requests
>>> requests.get("https://api.github.com/events", timeout=1)
<Response [200]>
>>> requests.get("https://api.github.com/events", timeout=0.00001)
Traceback (most recent call last):
  File "/opt/anaconda3/envs/python37/lib/python3.7/site-packages/urllib3/connection.py", line 141, in _new_conn
    (self.host, self.port), self.timeout, **extra_kw)
  ...
  File "/opt/anaconda3/envs/python37/lib/python3.7/site-packages/urllib3/util/connection.py", line 73, in create_connection
    sock.connect(sa)
socket.timeout: timed out

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/anaconda3/envs/python37/lib/python3.7/site-packages/urllib3/connectionpool.py", line 601, in urlopen
    chunked=chunked)
  ...
  File "/opt/anaconda3/envs/python37/lib/python3.7/site-packages/urllib3/connection.py", line 146, in _new_conn
    (self.host, self.timeout))
urllib3.exceptions.ConnectTimeoutError: (<urllib3.connection.VerifiedHTTPSConnection object at 0x7f98ba78e110>, 'Connection to api.github.com timed out. (connect timeout=1e-05)')

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/anaconda3/envs/python37/lib/python3.7/site-packages/requests/adapters.py", line 440, in send
    timeout=timeout
  ...
  File "/opt/anaconda3/envs/python37/lib/python3.7/site-packages/urllib3/util/retry.py", line 388, in increment
    raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='api.github.com', port=443): Max retries exceeded with url: /events (Caused by ConnectTimeoutError(<urllib3.connection.VerifiedHTTPSConnection object at 0x7f98ba78e110>, 'Connection to api.github.com timed out. (connect timeout=1e-05)'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/anaconda3/envs/python37/lib/python3.7/site-packages/requests/api.py", line 72, in get
    return request('get', url, params=params, **kwargs)
  File "/opt/anaconda3/envs/python37/lib/python3.7/site-packages/requests/api.py", line 58, in request
    return session.request(method=method, url=url, **kwargs)
  ...
  File "/opt/anaconda3/envs/python37/lib/python3.7/site-packages/requests/adapters.py", line 496, in send
    raise ConnectTimeout(e, request=request)
requests.exceptions.ConnectTimeout: HTTPSConnectionPool(host='api.github.com', port=443): Max retries exceeded with url: /events (Caused by ConnectTimeoutError(<urllib3.connection.VerifiedHTTPSConnection object at 0x7f98ba78e110>, 'Connection to api.github.com timed out. (connect timeout=1e-05)'))

From the traceback above, tracing the timeout error in reverse order gives: requests/adapters.py (ConnectTimeout) -> urllib3/util/retry.py (MaxRetryError) -> urllib3/connection.py (ConnectTimeoutError) -> urllib3/util/connection.py (socket.timeout: timed out)
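In application code you rarely touch the urllib3 layers of that chain directly; catching the requests-level exceptions at the top is enough. A hedged sketch (the absurd 0.00001-second timeout simply forces a failure):

```python
import requests

try:
    r = requests.get('https://api.github.com/events', timeout=0.00001)
except requests.exceptions.ConnectTimeout:
    # the TCP/TLS handshake exceeded the connect timeout
    print('connect timed out')
except requests.exceptions.ReadTimeout:
    # the server accepted the connection but was too slow to respond
    print('read timed out')
except requests.exceptions.ConnectionError:
    # DNS failure, refused connection, no network, etc.
    print('connection error')
```

Note that ConnectTimeout must be caught before ConnectionError: in requests' exception hierarchy it subclasses both ConnectionError and Timeout.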

Next, let's follow the timeout-exception flow through the code in that reverse order. Since requests builds its HTTP requests on top of urllib3, the timeout mechanism simply reuses urllib3's timeout logic:

# Entry point
# https://github.com/psf/requests/blob/v2.26.0/requests/api.py#L16

def request(method, url, **kwargs):
    """Constructs and sends a :class:`Request <Request>`.

    :param method: method for the new :class:`Request` object: ``GET``, ``OPTIONS``, ``HEAD``, ``POST``, ``PUT``, ``PATCH``, or ``DELETE``.
    :param url: URL for the new :class:`Request` object.
    :param params: (optional) Dictionary, list of tuples or bytes to send
        in the query string for the :class:`Request`.
    ...
    """

    with sessions.Session() as session:
        return session.request(method=method, url=url, **kwargs)

The request is forwarded to session.request:

# https://github.com/psf/requests/blob/v2.26.0/requests/sessions.py#L470

def request(self, method, url,
        params=None, data=None, headers=None, cookies=None, files=None,
        auth=None, timeout=None, allow_redirects=True, proxies=None,
        hooks=None, stream=None, verify=None, cert=None, json=None):
    """Constructs a :class:`Request <Request>`, prepares it and sends it.
    Returns :class:`Response <Response>` object.

    :param method: method for the new :class:`Request` object.
    :param url: URL for the new :class:`Request` object.
    :param params: (optional) Dictionary or bytes to be sent in the query
        string for the :class:`Request`.
    ...
    :param timeout: (optional) How long to wait for the server to send
        data before giving up, as a float, or a :ref:`(connect timeout,
        read timeout) <timeouts>` tuple.
    :type timeout: float or tuple
    ...
    :rtype: requests.Response
    """
    # Create the Request.
    req = Request(
        method=method.upper(),
        url=url,
        headers=headers,
        files=files,
        data=data or {},
        json=json,
        params=params or {},
        auth=auth,
        cookies=cookies,
        hooks=hooks,
    )
    prep = self.prepare_request(req)

    proxies = proxies or {}

    settings = self.merge_environment_settings(
        prep.url, proxies, stream, verify, cert
    )

    # Send the request.
    send_kwargs = {
        'timeout': timeout,
        'allow_redirects': allow_redirects,
    }
    send_kwargs.update(settings)
    resp = self.send(prep, **send_kwargs)

    return resp
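The three steps inside session.request above (build a Request, prepare_request it, then self.send it) can also be performed by hand, which is handy when you want to inspect exactly what will go on the wire. A minimal sketch (the send call is commented out so nothing touches the network):

```python
import requests

s = requests.Session()

# Step 1: build the Request, exactly as session.request does
req = requests.Request('POST', 'https://httpbin.org/post',
                       data={'key': 'value'}, headers={'X-Demo': '1'})

# Step 2: the prep = self.prepare_request(req) step
prepped = s.prepare_request(req)
print(prepped.method)            # POST
print(prepped.body)              # key=value
print(prepped.headers['X-Demo'])

# Step 3: the resp = self.send(prep, **send_kwargs) step -- uncomment to send
# resp = s.send(prepped, timeout=5)
```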

As the architecture diagram shows, the request is then handed to HTTPAdapter.send for processing:

# https://github.com/psf/requests/blob/v2.26.0/requests/adapters.py#L394

def send(self, request, stream=False, timeout=None, verify=True, cert=None, proxies=None):
    """Sends PreparedRequest object. Returns Response object.

    :param request: The :class:`PreparedRequest <PreparedRequest>` being sent.
    :param stream: (optional) Whether to stream the request content.
    :param timeout: (optional) How long to wait for the server to send
        data before giving up, as a float, or a :ref:`(connect timeout,
        read timeout) <timeouts>` tuple.
    :type timeout: float or tuple or urllib3 Timeout object
    ...
    :rtype: requests.Response
    """

    try:
        conn = self.get_connection(request.url, proxies)
    except LocationValueError as e:
        raise InvalidURL(e, request=request)
    ...
    # The timeout can be configured as either a tuple or a float
    if isinstance(timeout, tuple):
        try:
            connect, read = timeout
            timeout = TimeoutSauce(connect=connect, read=read)
        except ValueError as e:
            # this may raise a string formatting error.
            err = ("Invalid timeout {}. Pass a (connect, read) "
                    "timeout tuple, or a single float to set "
                    "both timeouts to the same value".format(timeout))
            raise ValueError(err)
    elif isinstance(timeout, TimeoutSauce):
        pass
    else:
        timeout = TimeoutSauce(connect=timeout, read=timeout)

    try:
        if not chunked:
            resp = conn.urlopen(
                method=request.method,
                url=url,
                body=request.body,
                headers=request.headers,
                redirect=False,
                assert_same_host=False,
                preload_content=False,
                decode_content=False,
                retries=self.max_retries,
                timeout=timeout
            )

        # Send the request.
        else:
            ...

    except (ProtocolError, socket.error) as err:
        raise ConnectionError(err, request=request)

    except MaxRetryError as e:
        if isinstance(e.reason, ConnectTimeoutError):
            # TODO: Remove this in 3.0.0: see #2811
            if not isinstance(e.reason, NewConnectionError):
                raise ConnectTimeout(e, request=request)
        ...
        raise ConnectionError(e, request=request)
    ...
    except (_SSLError, _HTTPError) as e:
        if isinstance(e, _SSLError):
            # This branch is for urllib3 versions earlier than v1.22
            raise SSLError(e, request=request)
        elif isinstance(e, ReadTimeoutError):
            raise ReadTimeout(e, request=request)
        elif isinstance(e, _InvalidHeader):
            raise InvalidHeader(e, request=request)
        else:
            raise
    return self.build_response(request, resp)
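The branch above accepts three forms of timeout: a single float, a (connect, read) tuple, or an existing urllib3 Timeout object (TimeoutSauce is just requests' import alias for urllib3.util.Timeout). A sketch of the three equivalent spellings, with the actual requests commented out so nothing is sent:

```python
import requests
from urllib3.util import Timeout  # what requests aliases as TimeoutSauce

url = 'https://httpbin.org/get'
# 1) a single float sets both the connect and read timeouts
# r = requests.get(url, timeout=3.0)
# 2) a (connect, read) tuple sets them independently
# r = requests.get(url, timeout=(3.05, 27))
# 3) a urllib3 Timeout object takes the isinstance(timeout, TimeoutSauce) branch
t = Timeout(connect=3.05, read=27)
# r = requests.get(url, timeout=t)
print(t.connect_timeout, t.read_timeout)  # 3.05 27
```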

Wrapping up



Comparing the Sponsor, Watch, Fork, and Star metrics of the three open-source projects Python, urllib3, and requests: requests actually has 3.7k more stars than Python itself?

A successful open-source project either needs a technical moat strong enough to defend, or a design clever enough that it can overtake on the curve even without deep technical complexity.


❤️❤️❤️ Every bit of our readers' enthusiasm fuels the author onward!
I'm 三十一. Thank you, friends: please like, comment, and share. See you next time!
