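httpbin is a project by Kenneth Reitz, the author of requests. It is an HTTP demonstration site built with Flask. Studying it offers two modest takeaways:
- how to build a website with Flask
- some details of the HTTP protocol
Before we start, readers who are not familiar with Flask may want to revisit the earlier Flask source-code walkthrough.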
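Project structure of httpbin
We use httpbin v0.7.0. The project is laid out roughly as follows:
| Module | Purpose |
| --- | --- |
| templates | Template files |
| core | Core implementation |
| filters | Decorator implementations |
| helpers | Helpers |
| structures | Data structure implementations |
| utils | Utilities |
| Dockerfile | Docker image definition |
| test_httpbin.py | Unit tests |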
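Using httpbin
You can try httpbin directly at httpbin.org/. The site demonstrates HTTP usage interactively. Take a GET request as an example:
- it requests data with the HTTP GET method
- the request sets the header accept: application/json to receive JSON output
- the page shows the response status code, headers, and body
We can also observe the same exchange in a terminal with curl: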

    curl -v -X GET "https://httpbin.org/get" -H "accept: application/json"
    ...
    < HTTP/2 200
    < date: Sun, 09 Jan 2022 12:34:55 GMT
    < content-type: application/json
    < content-length: 269
    < server: gunicorn/19.9.0
    < access-control-allow-origin: *
    < access-control-allow-credentials: true
    <
    {
      "args": {},
      "headers": {
        "Accept": "application/json",
        "Host": "httpbin.org",
        "User-Agent": "curl/7.64.1",
        "X-Amzn-Trace-Id": "Root=1-61dad66f-2405a8151152a4664c258b05"
      },
      "origin": "111.201.135.46",
      "url": "https://httpbin.org/get"
    }

- The -v option traces the request and response exchange.
Comparing the two shows that this matches the data displayed on the website. The httpbin site also demonstrates many other HTTP methods, which you can try one by one.
Implementation of httpbin
Deploying httpbin
The Dockerfile describes how httpbin is deployed and run with gunicorn:

    # Python base image
    FROM python:3-alpine

    # Set environment variables
    ENV WEB_CONCURRENCY=4

    # Add the httpbin code
    ADD . /httpbin

    # Install dependencies
    RUN apk add -U ca-certificates libffi libstdc++ && \
        apk add --virtual build-deps build-base libffi-dev && \
        # Pip
        pip install --no-cache-dir gunicorn /httpbin && \
        # Cleaning up
        apk del build-deps && \
        rm -rf /var/cache/apk/*

    # Declare the port
    EXPOSE 8080

    # Start the service with gunicorn
    CMD ["gunicorn", "-b", "0.0.0.0:8080", "httpbin:app"]

gunicorn starts httpbin:app. The app lives in the httpbin package and is provided by the core module:

    ...
    # Find the correct template folder when running from a different location
    tmpl_dir = os.path.join(os.path.dirname(os.path.abspath(__file__)), 'templates')
    app = Flask(__name__, template_folder=tmpl_dir)
    ...

- When the app is created, Flask's template folder is set to the templates directory, which sits next to the core module.
Implementing the get API
The /get API returns the request's url, args, headers, and origin, serialized as JSON:

    @app.route('/get', methods=('GET',))
    def view_get():
        """Returns GET Data."""
        return jsonify(get_dict('url', 'args', 'headers', 'origin'))

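This jsonify wraps the jsonify function provided by Flask and only appends a trailing newline to the default output when one is missing:

    from flask import jsonify as flask_jsonify

    def jsonify(*args, **kwargs):
        response = flask_jsonify(*args, **kwargs)
        # make sure the body ends with a newline
        if not response.data.endswith(b'\n'):
            response.data += b'\n'
        return response
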
get_dict reads from request. Flask's request is a context-local proxy bound to the request currently being handled (per thread in a threaded server), so it does not have to be passed to get_dict as an argument:

    def get_dict(*keys, **extras):
        """Returns request dict of given keys."""

        _keys = ('url', 'args', 'form', 'data', 'origin', 'headers', 'files', 'json', 'method')

        assert all(map(_keys.__contains__, keys))
        ...
        d = dict(
            url=get_url(request),
            # read args from the request
            args=semiflatten(request.args),
            form=form,
            data=json_safe(data),
            origin=request.headers.get('X-Forwarded-For', request.remote_addr),
            headers=get_headers(),
            files=get_files(),
            json=_json,
            method=request.method,
        )

        out_d = dict()

        # copy only the requested keys
        for key in keys:
            out_d[key] = d.get(key)

        out_d.update(extras)

        return out_d

The following command demonstrates args. The query string name=game404&age=18 is converted into the args dictionary automatically:

    curl -X GET "https://httpbin.org/get?name=game404&age=18"
    {
      "args": {
        "age": "18",
        "name": "game404"
      },
      "headers": {
        "Accept": "*/*",
        "Host": "httpbin.org",
        "User-Agent": "curl/7.64.1",
        "X-Amzn-Trace-Id": "Root=1-61dadb92-7bd4d2a3130e8df54f2ebeb4"
      },
      "origin": "111.201.135.46",
      "url": "https://httpbin.org/get?name=game404&age=18"
    }

HTTP is a text protocol and a query string carries no type information, so the age parameter arrives as a string by default, not as a number.
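To make the point about types concrete, here is a minimal sketch that parses the same query string with Python's standard library. It is not httpbin's code: flatten_single_values is a hypothetical helper that only imitates the effect of collapsing one-element lists into plain values. Every value comes back as a string and has to be converted explicitly.

```python
from urllib.parse import parse_qs, urlsplit


def flatten_single_values(multi):
    """Hypothetical helper: collapse one-element lists to plain values."""
    return {key: vals[0] if len(vals) == 1 else vals for key, vals in multi.items()}


url = "https://httpbin.org/get?name=game404&age=18"

# parse_qs maps each key to a list of string values
raw = parse_qs(urlsplit(url).query)
args = flatten_single_values(raw)

print(raw)
print(args)

# the conversion to a number is the application's job
age = int(args["age"])
```

If a parameter can repeat (for example tag=a&tag=b), the flattened value stays a list, so code that consumes args has to be prepared for both shapes.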
httpbin also provides two middleware-style hooks built on Flask. One of them is after_request, which runs after a request has been handled and deals with cross-origin access by adding two CORS headers to the response:

    @app.after_request
    def set_cors_headers(response):
        # allow cross-origin access
        response.headers['Access-Control-Allow-Origin'] = request.headers.get('Origin', '*')
        response.headers['Access-Control-Allow-Credentials'] = 'true'
        ...
        return response

This can be verified in the Chrome browser console:

    var xmlHttp = new XMLHttpRequest();
    // send the request
    xmlHttp.open( "GET", "https://httpbin.org/get", false );
    xmlHttp.send( null );
    // inspect the result
    xmlHttp.status
    200
    xmlHttp.responseText;
    '{\n "args": {}, \n "headers": {\n "Accept": "*/*", \n "Accept-Encoding": "gzip, deflate, br", \n "Accept-Language": "en,zh;q=0.9,zh-TW;q=0.8,zh-CN;q=0.7", \n "Host": "httpbin.org", \n "Origin": "https://stackoverflow.com", \n "Referer": "https://stackoverflow.com/", \n "Sec-Ch-Ua": "\\" Not A;Brand\\";v=\\"99\\", \\"Chromium\\";v=\\"96\\", \\"Google Chrome\\";v=\\"96\\"", \n "Sec-Ch-Ua-Mobile": "?0", \n "Sec-Ch-Ua-Platform": "\\"macOS\\"", \n "Sec-Fetch-Dest": "empty", \n "Sec-Fetch-Mode": "cors", \n "Sec-Fetch-Site": "cross-site", \n "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.110 Safari/537.36", \n "X-Amzn-Trace-Id": "Root=1-61dadc7c-70a2cde54a07ab3a6df28d5c"\n }, \n "origin": "111.201.135.46", \n "url": "https://httpbin.org/get"\n}\n'

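The header logic in the after_request hook can be isolated as a small pure function, which makes it easy to see what a given request would get back. This is a sketch under the assumption that only the two headers shown above matter; cors_headers is a hypothetical name, and it takes a plain dict instead of Flask's request object.

```python
def cors_headers(request_headers):
    """Sketch: compute the two CORS response headers from request headers."""
    return {
        # echo the caller's Origin if present, otherwise allow any origin
        "Access-Control-Allow-Origin": request_headers.get("Origin", "*"),
        "Access-Control-Allow-Credentials": "true",
    }


# a cross-site browser request, like the console example
print(cors_headers({"Origin": "https://stackoverflow.com"}))

# a request without an Origin header, like the curl examples
print(cors_headers({}))
```

Echoing the Origin header rather than always answering with * matters for credentialed requests: browsers refuse to expose a response to a request sent with credentials when Access-Control-Allow-Origin is the wildcard.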
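SEO
In the Web 2.0 era SEO support matters, because search engines can bring in a lot of traffic for free. robots.txt is a convention between a website and search-engine crawlers, and httpbin provides a simple implementation:

    @app.route('/robots.txt')
    def view_robots_page():
        """Simple Html Page"""

        response = make_response()
        response.data = ROBOT_TXT
        response.content_type = "text/plain"
        return response

- robots.txt is served as plain text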
The content of robots.txt disallows access to the /deny path:

    ROBOT_TXT = """User-agent: *
    Disallow: /deny
    """

If a client ignores robots.txt and requests /deny anyway, httpbin pretends to be angry. You can try it yourself to see the result.
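From the crawler's side, the same rules can be checked with the standard library's robots.txt parser. The sketch below feeds it the two-line policy quoted above instead of fetching it over the network; allowed is a hypothetical convenience wrapper.

```python
from urllib.robotparser import RobotFileParser

# the policy served by httpbin's /robots.txt, as quoted above
ROBOT_TXT = """User-agent: *
Disallow: /deny
"""

parser = RobotFileParser()
parser.parse(ROBOT_TXT.splitlines())


def allowed(path, agent="*"):
    """Return True if the policy lets the given agent fetch the path."""
    return parser.can_fetch(agent, "https://httpbin.org" + path)


print("/get allowed:", allowed("/get"))
print("/deny allowed:", allowed("/deny"))
```

A well-behaved crawler would run a check like this before requesting a URL; nothing on the server side enforces it, which is why /deny can still be reached by anyone who ignores the policy.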
Kenneth Reitz's design here is rather playful. Together with the F-word in the code's 402 response, it shows the author's lively side.
Compression
httpbin demonstrates HTTP compression with three algorithms: gzip, deflate, and brotli. Here is the gzip endpoint:

    @app.route('/gzip')
    @filters.gzip
    def view_gzip_encoded_content():
        """Returns GZip-Encoded Data."""

        return jsonify(get_dict(
            'origin', 'headers', method=request.method, gzipped=True))

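gzip is implemented as a decorator:

    # imports this excerpt relies on
    from io import BytesIO

    from decorator import decorator
    from flask import Response
    import gzip as gzip2

    @decorator
    def gzip(f, *args, **kwargs):
        """GZip Flask Response Decorator."""

        data = f(*args, **kwargs)

        # a view may return a Response object or a raw body
        if isinstance(data, Response):
            content = data.data
        else:
            content = data

        gzip_buffer = BytesIO()
        gzip_file = gzip2.GzipFile(
            mode='wb',
            compresslevel=4,
            fileobj=gzip_buffer
        )
        gzip_file.write(content)
        gzip_file.close()

        gzip_data = gzip_buffer.getvalue()

        if isinstance(data, Response):
            data.data = gzip_data
            data.headers['Content-Encoding'] = 'gzip'
            data.headers['Content-Length'] = str(len(data.data))

            return data

        return gzip_data

- the data is compressed with gzip
- after compression, two response headers must be updated: Content-Encoding and Content-Length
What is unusual is that this gzip decorator is built with the decorator library. Unlike a hand-written decorator, decorator bills itself as "decorators for humans". Its key feature is that there are no nested function layers: the first parameter is the wrapped function, and args and kwargs are the arguments the original function was called with.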
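For comparison, this is roughly what the same idea looks like as a conventional nested decorator using only the standard library. It is a simplified sketch, not httpbin's code: gzip_bytes and payload are hypothetical names, and the wrapped function is assumed to return raw bytes rather than a Flask Response.

```python
import functools
import gzip
import io


def gzip_bytes(func):
    """Conventional nested decorator: gzip the bytes returned by func."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        content = func(*args, **kwargs)
        buffer = io.BytesIO()
        # same compression level as the excerpt above
        with gzip.GzipFile(mode="wb", compresslevel=4, fileobj=buffer) as gz:
            gz.write(content)
        return buffer.getvalue()

    return wrapper


@gzip_bytes
def payload():
    """Hypothetical view body: a repetitive JSON-like byte string."""
    return b'{"gzipped": true}\n' * 50


compressed = payload()

# the two headers a server would have to set after compressing the body
headers = {
    "Content-Encoding": "gzip",
    "Content-Length": str(len(compressed)),
}
print(headers)
```

The extra wrapper layer and the functools.wraps call are the boilerplate that the decorator library removes. Content-Length has to be recomputed because it must describe the compressed bytes that actually go over the wire.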
Basic Auth
httpbin also implements HTTP Basic authentication. With Basic Auth the browser shows its own username and password prompt, and access continues only after the credentials are accepted.
Here is the code:

    @app.route('/basic-auth/<user>/<passwd>')
    def basic_auth(user='user', passwd='passwd'):
        """Prompts the user for authorization using HTTP Basic Auth."""

        if not check_basic_auth(user, passwd):
            return status_code(401)

        return jsonify(authenticated=True, user=user)

    ...

    def check_basic_auth(user, passwd):
        """Checks user authentication using HTTP Basic Auth."""

        auth = request.authorization
        # plain username/password comparison
        return auth and auth.username == user and auth.password == passwd

The process is easier to trace with curl:

    curl -v -X GET "https://httpbin.org/basic-auth/game_404/123456" -H "accept: application/json"
    ...
    < HTTP/2 401
    < date: Sun, 09 Jan 2022 13:33:00 GMT
    < content-length: 0
    < server: gunicorn/19.9.0
    < www-authenticate: Basic realm="Fake Realm"
    < access-control-allow-origin: *
    < access-control-allow-credentials: true

The first request receives a 401, and the response carries a WWW-Authenticate header:

    code_map = {
        ...
        401: dict(headers={'WWW-Authenticate': 'Basic realm="Fake Realm"'}),
        ...
    }

The browser then pops up its username and password prompt automatically, and the user is authenticated after entering them. The application does not need to implement this dialog itself.
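What the browser sends after the prompt is an Authorization header containing the base64-encoded user:password pair. The following standard-library sketch shows both directions; basic_auth_header and parse_basic_auth are hypothetical helpers, not part of httpbin or Flask.

```python
import base64


def basic_auth_header(user, passwd):
    """Build the Authorization header value for HTTP Basic Auth."""
    token = base64.b64encode(f"{user}:{passwd}".encode("utf-8")).decode("ascii")
    return "Basic " + token


def parse_basic_auth(header_value):
    """Return (user, passwd), or None if the value is not valid Basic Auth."""
    scheme, _, token = header_value.partition(" ")
    if scheme.lower() != "basic" or not token:
        return None
    try:
        decoded = base64.b64decode(token).decode("utf-8")
    except ValueError:  # covers malformed base64 and invalid UTF-8
        return None
    user, sep, passwd = decoded.partition(":")
    return (user, passwd) if sep else None


header = basic_auth_header("game_404", "123456")
print(header)
print(parse_basic_auth(header))
```

With curl, the -u game_404:123456 option sends the same header on the first request, so the 401 round trip is skipped. Note that base64 is an encoding, not encryption, so Basic Auth should only be used over HTTPS.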
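Streaming
Streamed responses can be used for HTTP file downloads, for example:

    @app.route('/stream/<int:n>')
    def stream_n_messages(n):
        """Stream n JSON messages"""
        response = get_dict('url', 'args', 'headers', 'origin')
        n = min(n, 100)

        def generate_stream():
            for i in range(n):
                response['id'] = i
                # yield emits the body one piece at a time
                yield json.dumps(response) + '\n'

        return Response(generate_stream(), headers={
            "Content-Type": "application/json",
        })

In testing we can see that a single request receives its response in several pieces. For a large file this lets the server send data incrementally instead of building the whole body in memory first. Resuming an interrupted download is a separate mechanism that relies on Range requests.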

    can't parse JSON. Raw result:
    {"url": "https://httpbin.org/stream/3", "args": {}, "headers": {"Host": "httpbin.org", "X-Amzn-Trace-Id": "Root=1-61dae496-15998ef6666f82c444ca483c", "Sec-Ch-Ua": "\" Not A;Brand\";v=\"99\", \"Chromium\";v=\"96\", \"Google Chrome\";v=\"96\"", "Accept": "application/json", "Sec-Ch-Ua-Mobile": "?0", "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.110 Safari/537.36", "Sec-Ch-Ua-Platform": "\"macOS\"", "Sec-Fetch-Site": "same-origin", "Sec-Fetch-Mode": "cors", "Sec-Fetch-Dest": "empty", "Referer": "https://httpbin.org/", "Accept-Encoding": "gzip, deflate, br", "Accept-Language": "en,zh;q=0.9,zh-TW;q=0.8,zh-CN;q=0.7"}, "origin": "111.201.135.46", "id": 0}
    {"url": "https://httpbin.org/stream/3", "args": {}, "headers": {"Host": "httpbin.org", "X-Amzn-Trace-Id": "Root=1-61dae496-15998ef6666f82c444ca483c", "Sec-Ch-Ua": "\" Not A;Brand\";v=\"99\", \"Chromium\";v=\"96\", \"Google Chrome\";v=\"96\"", "Accept": "application/json", "Sec-Ch-Ua-Mobile": "?0", "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.110 Safari/537.36", "Sec-Ch-Ua-Platform": "\"macOS\"", "Sec-Fetch-Site": "same-origin", "Sec-Fetch-Mode": "cors", "Sec-Fetch-Dest": "empty", "Referer": "https://httpbin.org/", "Accept-Encoding": "gzip, deflate, br", "Accept-Language": "en,zh;q=0.9,zh-TW;q=0.8,zh-CN;q=0.7"}, "origin": "111.201.135.46", "id": 1}
    {"url": "https://httpbin.org/stream/3", "args": {}, "headers": {"Host": "httpbin.org", "X-Amzn-Trace-Id": "Root=1-61dae496-15998ef6666f82c444ca483c", "Sec-Ch-Ua": "\" Not A;Brand\";v=\"99\", \"Chromium\";v=\"96\", \"Google Chrome\";v=\"96\"", "Accept": "application/json", "Sec-Ch-Ua-Mobile": "?0", "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.110 Safari/537.36", "Sec-Ch-Ua-Platform": "\"macOS\"", "Sec-Fetch-Site": "same-origin", "Sec-Fetch-Mode": "cors", "Sec-Fetch-Dest": "empty", "Referer": "https://httpbin.org/", "Accept-Encoding": "gzip, deflate, br", "Accept-Language": "en,zh;q=0.9,zh-TW;q=0.8,zh-CN;q=0.7"}, "origin": "111.201.135.46", "id": 2}

httpbin provides a number of other HTTP examples as well. You can explore them on your own, so this article does not cover them one by one.
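One more note on /stream before moving on: each line of its response is a separate JSON document, which is probably why the web UI reports that it can't parse the body as a single JSON value. The sketch below, using only the standard library, shows one way a client might consume such newline-delimited output incrementally. Here generate_stream is a simplified stand-in for the view's generator, and parse_stream is a hypothetical helper.

```python
import json


def generate_stream(base, n):
    """Yield n newline-terminated JSON messages, one at a time."""
    for i in range(n):
        message = dict(base, id=i)
        yield json.dumps(message) + "\n"


def parse_stream(chunks):
    """Parse newline-delimited JSON from an iterable of text chunks.

    Chunk boundaries need not line up with message boundaries, so
    incomplete data is buffered until a newline arrives.
    """
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        while "\n" in buffer:
            line, buffer = buffer.split("\n", 1)
            if line:
                yield json.loads(line)


base = {"url": "https://httpbin.org/stream/3", "args": {}}
messages = list(parse_stream(generate_stream(base, 3)))
for message in messages:
    print(message["id"], message["url"])
```

Because both sides are generators, neither the producer nor the consumer ever needs to hold the complete body in memory.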
Unit tests
The unit test for the /get API shows how to test an HTTP endpoint with unittest:

    class HttpbinTestCase(unittest.TestCase):
        """Httpbin tests"""

        def setUp(self):
            httpbin.app.debug = True
            self.app = httpbin.app.test_client()

        def test_get(self):
            response = self.app.get('/get', headers={'User-Agent': 'test'})
            self.assertEqual(response.status_code, 200)
            data = json.loads(response.data.decode('utf-8'))
            self.assertEqual(data['args'], {})
            self.assertEqual(data['headers']['Host'], 'localhost')
            self.assertEqual(data['headers']['Content-Length'], '0')
            self.assertEqual(data['headers']['User-Agent'], 'test')
            # self.assertEqual(data['origin'], None)
            self.assertEqual(data['url'], 'http://localhost/get')
            self.assertTrue(response.data.endswith(b'\n'))

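- in setUp, httpbin.app.test_client() returns a test client that stands in for the running service
- self.app.get('/get', headers={'User-Agent': 'test'}) simulates an HTTP request
- the response object behaves like a real HTTP response
This style of unit testing does not need a running HTTP server, so it executes faster. The Django framework offers a similar approach.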
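The idea of testing without a running HTTP server can be illustrated with the standard library alone. The sketch below is not Flask's test client; tiny_app and call_app are hypothetical names. It builds a WSGI environ by hand and calls the application in-process, which is broadly what a test client does on your behalf.

```python
import json
from wsgiref.util import setup_testing_defaults


def tiny_app(environ, start_response):
    """Hypothetical WSGI app that echoes the request path as JSON."""
    body = (json.dumps({"path": environ["PATH_INFO"]}) + "\n").encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "application/json"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


def call_app(app, path):
    """Call a WSGI app in-process and return (status, headers, body)."""
    environ = {"PATH_INFO": path, "REQUEST_METHOD": "GET"}
    # fill in the remaining keys that the WSGI spec requires
    setup_testing_defaults(environ)

    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = dict(headers)

    body = b"".join(app(environ, start_response))
    return captured["status"], captured["headers"], body


status, headers, body = call_app(tiny_app, "/get")
print(status, headers["Content-Type"], body)
```

No socket is opened and no port is bound, which is where the speed advantage mentioned above comes from.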
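Summary
In this article we studied the source code of httpbin, a website built on the Flask framework, and looked at how some details of the HTTP protocol are implemented. Hopefully this also helps you get a better grasp of HTTP.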
Tip
utils contains a neat weighted random selection algorithm:

    def weighted_choice(choices):
        """Returns a value from choices chosen by weighted random selection

        choices should be a list of (value, weight) tuples.

        eg. weighted_choice([('val1', 5), ('val2', 0.3), ('val3', 1)])

        Weighted random selection; pass in (value, weight) tuples
        """
        values, weights = zip(*choices)
        total = 0
        # running (cumulative) sum of the weights
        cum_weights = []
        for w in weights:
            total += w
            cum_weights.append(total)
        # pick a random float between 0 and total
        x = random.uniform(0, total)
        # binary search for the bucket that contains x
        i = bisect.bisect(cum_weights, x)
        return values[i]

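To see the cumulative-weight idea in action, here is a self-contained variant with a quick frequency check. It is a sketch rather than httpbin's exact code: it uses itertools.accumulate for the running sum, takes an optional rng parameter (an assumption added here so the run is reproducible), and clamps the index as a precaution, because random.uniform may return the upper bound itself due to floating-point rounding.

```python
import bisect
import itertools
import random
from collections import Counter


def weighted_choice(choices, rng=random):
    """Pick a value from (value, weight) pairs, proportionally to weight."""
    values, weights = zip(*choices)
    # e.g. weights (5, 0.3, 1) -> cumulative sums [5, 5.3, 6.3]
    cum_weights = list(itertools.accumulate(weights))
    x = rng.uniform(0, cum_weights[-1])
    # index of the first cumulative sum strictly greater than x
    i = bisect.bisect(cum_weights, x)
    # guard against x landing exactly on the total
    return values[min(i, len(values) - 1)]


rng = random.Random(404)  # fixed seed so repeated runs give the same counts
choices = [("val1", 5), ("val2", 0.3), ("val3", 1)]
counts = Counter(weighted_choice(choices, rng) for _ in range(10_000))
print(counts.most_common())
```

Each value owns a slice of the interval from 0 to the total whose width equals its weight, so a uniformly random point lands in a slice with probability proportional to that weight. The standard library's random.choices(values, weights=weights) offers the same kind of selection out of the box.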