一般程序写的爬虫程序都会自带请求头,不知不觉就被网站拒绝了,请求之前可以看看自己的请求头是什么,确保不被禁
如果网站太慢打不开,想在本地搭建测试环境,可以在docker环境下启动:
$ docker run -p 80:80 kennethreitz/httpbin
不过简单测试就没必要折腾了,直接访问:https://httpbin.org/get
{ args: { }, headers: { Accept: "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8", Accept-Encoding: "gzip, deflate, br", Accept-Language: "zh-CN,zh;q=0.9,en;q=0.8", Connection: "close", Cookie: "_gauges_unique_day=1; _gauges_unique_month=1; _gauges_unique_year=1; _gauges_unique=1", Host: "httpbin.org", Upgrade-Insecure-Requests: "1", User-Agent: "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3440.106 Safari/537.36" }, origin: "xxx.xxx.xx.xx", url: "https://httpbin.org/get" }
关于docker的国内镜像源修改,可以参考:
https://blog.csdn.net/baidu_19473529/article/details/78126869