middlewares.py
from w3lib.http import basic_auth_header


class CustomProxyMiddleware(object):
    def process_request(self, request, spider):
        request.meta['proxy'] = "https://<PROXY_IP_OR_URL>:<PROXY_PORT>"
        request.headers['Proxy-Authorization'] = basic_auth_header(
            '<PROXY_USERNAME>', '<PROXY_PASSWORD>')
settings.py
DOWNLOADER_MIDDLEWARES = {
    '<PROJECT_NAME>.middlewares.CustomProxyMiddleware': 350,
    'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware': 400,
}
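Issues
1. If proxy authentication is configured incorrectly, the response comes back with status code 407:
407 Proxy Authentication Required
At first I configured the proxy in the following format. Some requests went through, though only after one retry, while others failed outright: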
request.meta['proxy'] = "https://<PROXY_USERNAME>:<PROXY_PASSWORD>@<PROXY_IP_OR_URL>:<PROXY_PORT>"
The correct setup is to set Proxy-Authorization in the request headers.
References
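To illustrate the difference between the two formats, here is a minimal sketch that takes a proxy URL with embedded credentials and splits it into a credential-free proxy URL plus a Basic Proxy-Authorization header value. The helper name split_proxy_credentials and the host proxy.example.com are hypothetical, and the header is built with the standard library as an approximation of what basic_auth_header returns, not a copy of w3lib's implementation.

```python
import base64
from urllib.parse import urlsplit, unquote


def split_proxy_credentials(proxy_url):
    """Split 'scheme://user:pass@host:port' into (clean_url, header_value).

    Hypothetical helper for illustration only; not part of Scrapy or w3lib.
    """
    parts = urlsplit(proxy_url)
    # Credentials in a URL may be percent-encoded, so decode them first.
    username = unquote(parts.username or "")
    password = unquote(parts.password or "")

    # Rebuild the proxy URL without the user:pass@ portion.
    clean_url = f"{parts.scheme}://{parts.hostname}"
    if parts.port is not None:
        clean_url += f":{parts.port}"

    # HTTP Basic auth: base64 of "username:password", prefixed with "Basic ".
    token = base64.b64encode(f"{username}:{password}".encode("utf-8"))
    return clean_url, b"Basic " + token


clean_url, auth_header = split_proxy_credentials(
    "https://user:pass@proxy.example.com:8080"  # placeholder values
)
```

Inside process_request, clean_url would then go into request.meta['proxy'] and auth_header into request.headers['Proxy-Authorization'], matching the middleware shown earlier.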