Installing Splash
1. Install Docker (see: installing Docker on Mac)
2. Install Splash
docker pull scrapinghub/splash  # pull the image
docker run -p 8050:8050 scrapinghub/splash  # run the container
To check that it is running, open: http://localhost:8050/
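The same check can be scripted. The sketch below only tests whether something is accepting TCP connections on the published port; it does not prove that the listener is Splash. The helper name `is_port_open` is made up for this illustration, and the host and port are assumed to match the `docker run -p 8050:8050` command above.

```python
import socket


def is_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    Hypothetical helper for illustration; it only checks that the port
    accepts connections, not that Splash itself is answering.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, unreachable host, and timeouts.
        return False
```

Calling `is_port_open("localhost", 8050)` after starting the container should return True once the port is published.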
Code example
import time

import requests
from scrapy import Selector


def timer(func):
    """Print how long the wrapped call takes, then return its result."""
    def inner(*args):
        start = time.time()
        response = func(*args)
        print("time: %s" % (time.time() - start))
        return response
    return inner


@timer
def use_request(url):
    # Plain HTTP request: returns the raw HTML, no JavaScript is executed.
    return requests.get(url)


@timer
def use_splash(url):
    # Ask Splash to load the page, run its JavaScript, and return the rendered HTML.
    splash_url = "http://localhost:8050/render.html"
    args = {
        "url": url,
        "timeout": 5,
        "images": 0,  # do not download images
    }
    return requests.get(splash_url, params=args)


if __name__ == '__main__':
    url = "http://quotes.toscrape.com/js/"

    # Without rendering: the quotes are inserted by JavaScript, so nothing matches.
    r1 = use_request(url)
    sel1 = Selector(text=r1.text)
    text = sel1.css(".quote .text::text").extract_first()
    print(text)

    # With Splash: the rendered HTML contains the quotes.
    r2 = use_splash(url)
    sel2 = Selector(text=r2.text)
    text = sel2.css(".quote .text::text").extract_first()
    print(text)

"""
time: 0.632809877396
None
time: 0.685022830963
“The world as we have created it is a process of our thinking. It cannot be changed without changing our thinking.”
"""
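The test shows that the plain request returns None because the quotes on this page are generated by JavaScript. Once Splash has rendered the page the data can be extracted, and in this run it took about as long as the plain request.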
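Description of the args parameters:
url: address of the page to render
timeout: timeout for the render, in seconds
proxy: proxy to use
wait: time to wait for rendering, in seconds
images: whether to download images; default 1 (download)
js_source: JavaScript code to execute before the page is rendered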
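The parameters above can be combined into a single render.html request. The following is a minimal sketch: the helper name `build_render_url` and its default values are assumptions made for illustration, not Splash defaults, and the endpoint is assumed to be the local container started earlier.

```python
from urllib.parse import urlencode

# Assumed endpoint: the local Splash container published on port 8050.
SPLASH_RENDER_URL = "http://localhost:8050/render.html"


def build_render_url(url, wait=0.5, timeout=10, images=0,
                     proxy=None, js_source=None):
    """Build a render.html URL from the parameters described above.

    Hypothetical helper; the defaults here are illustrative choices.
    """
    params = {
        "url": url,
        "wait": wait,
        "timeout": timeout,
        "images": images,
    }
    # Optional parameters are only sent when explicitly provided.
    if proxy is not None:
        params["proxy"] = proxy
    if js_source is not None:
        params["js_source"] = js_source
    return SPLASH_RENDER_URL + "?" + urlencode(params)
```

The resulting URL can be passed straight to `requests.get`. Alternatively, the same dictionary can be handed to `requests.get(splash_url, params=...)` as in the example above.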