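A question for the experts: I need to use HttpClient 4.x to simulate logging in to a website and then open a link on that site to scrape data. How should HttpClient's policies and settings be configured? I created one HttpClient instance and used it to send the login POST. I then used the same instance to request a protected resource, but the server responded with a "not logged in" error.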
My HttpClient is configured like this:
// Component parameters: HTTP protocol version (1.1/1.0/0.9)
HttpParams params = new BasicHttpParams();
HttpProtocolParams.setVersion(params, HttpVersion.HTTP_1_1);
HttpProtocolParams.setUserAgent(params, "HttpComponents/1.1");
HttpProtocolParams.setUseExpectContinue(params, true);
// Timeouts
int REQUEST_TIMEOUT = 10*1000; // connect timeout: 10 seconds
int SO_TIMEOUT = 10*1000; // socket (read) timeout: wait at most 10 seconds for data
//HttpConnectionParams.setConnectionTimeout(params, REQUEST_TIMEOUT);
//HttpConnectionParams.setSoTimeout(params, SO_TIMEOUT);
params.setParameter(CoreConnectionPNames.CONNECTION_TIMEOUT, REQUEST_TIMEOUT);
params.setParameter(CoreConnectionPNames.SO_TIMEOUT, SO_TIMEOUT);
// Register the protocol schemes
SchemeRegistry schreg = new SchemeRegistry();
schreg.register(new Scheme("http",80,PlainSocketFactory.getSocketFactory()));
schreg.register(new Scheme("https", 443, SSLSocketFactory.getSocketFactory()));
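// Thread-safe pooling manager for multiple connections
PoolingClientConnectionManager pccm = new PoolingClientConnectionManager(schreg);
pccm.setDefaultMaxPerRoute(20); // max parallel connections per host (route)
pccm.setMaxTotal(100); // max parallel connections for the whole client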
HttpClient httpClient = new DefaultHttpClient(pccm, params);
// I tried both of these cookie policies; neither one works.
//httpClient.getParams().setParameter(ClientPNames.COOKIE_POLICY, CookiePolicy.BROWSER_COMPATIBILITY);
httpClient.getParams().setParameter(ClientPNames.COOKIE_POLICY, CookiePolicy.BEST_MATCH);
Could someone post a demo or explain how HttpClient should be set up?
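No special configuration is needed: after a successful login, HttpClient automatically stores the session cookie and sends it back with later requests, as long as you keep using the same HttpClient instance.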
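To make "automatically" concrete, here is a small JDK-only sketch of the store-then-resend cycle. It uses `java.net.CookieManager`, not Apache HttpClient (which keeps cookies in its own cookie store), so treat it as an analogy; the `example.com`/`other.org` URLs and the `JSESSIONID` cookie are made-up examples, and `Map.of`/`List.of` need Java 9 or later.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.CookieManager;
import java.net.URI;
import java.util.List;
import java.util.Map;

// Analogy only: java.net.CookieManager, not Apache HttpClient's cookie store.
class SessionCookieDemo {
    // Store a Set-Cookie header as if it came back from a (hypothetical) login URL,
    // then return the Cookie header values the jar would attach to a request for nextUrl.
    static List<String> cookieHeaderAfterLogin(String setCookie, String nextUrl) {
        CookieManager jar = new CookieManager(); // default policy: accept cookies from the original server
        try {
            jar.put(URI.create("http://example.com/login"), Map.of("Set-Cookie", List.of(setCookie)));
            Map<String, List<String>> headers = jar.get(URI.create(nextUrl), Map.of());
            return headers.getOrDefault("Cookie", List.of());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Same host: the session cookie is sent back.
        System.out.println(cookieHeaderAfterLogin("JSESSIONID=abc123; Path=/", "http://example.com/member/data")); // [JSESSIONID=abc123]
        // Different host: nothing is sent.
        System.out.println(cookieHeaderAfterLogin("JSESSIONID=abc123; Path=/", "http://other.org/")); // []
    }
}
```

The second call hints at something worth checking: in this sketch the cookie only goes back to the host that set it, so if the login form and the protected page live on different hosts, the session cookie may not be sent.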
I have used it this way and the scraping works:
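import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.List;
import org.apache.http.HttpEntity;
import org.apache.http.HttpResponse;
import org.apache.http.NameValuePair;
import org.apache.http.client.ClientProtocolException;
import org.apache.http.client.entity.UrlEncodedFormEntity;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.client.methods.HttpPost;
import org.apache.http.impl.client.DefaultHttpClient;
import org.apache.http.params.HttpConnectionParams;
import org.apache.http.params.HttpParams;
import org.apache.http.util.EntityUtils;

public class Spider {
    // Reuse this single client for login and scraping: its cookie store keeps the session cookie
    private final DefaultHttpClient httpClient;
    private HttpResponse response;
    private HttpEntity entity;

    public Spider() {
        this.httpClient = new DefaultHttpClient();
        HttpParams params = httpClient.getParams();
        /* connection timeout */
        HttpConnectionParams.setConnectionTimeout(params, 30000);
        /* read (socket) timeout */
        HttpConnectionParams.setSoTimeout(params, 30000);
    }

    public void post(String url, List<NameValuePair> nameValuePair) throws ClientProtocolException, IOException {
        releaseEntity(); // free the connection still held by the previous response
        HttpPost httpPost = new HttpPost(url);
        if (nameValuePair != null) {
            httpPost.setEntity(new UrlEncodedFormEntity(nameValuePair, "UTF-8"));
        }
        this.response = this.httpClient.execute(httpPost);
        this.entity = response.getEntity();
    }

    public void get(String url) throws ClientProtocolException, IOException {
        releaseEntity(); // free the connection still held by the previous response
        HttpGet httpGet = new HttpGet(url);
        this.response = this.httpClient.execute(httpGet);
        this.entity = response.getEntity();
    }

    public String readResponseContent() throws IOException {
        if (this.entity == null) {
            return null;
        }
        StringBuilder content = new StringBuilder();
        BufferedReader reader = new BufferedReader(new InputStreamReader(this.entity.getContent(), "utf-8"));
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                // extract the information you need here
                content.append(line).append('\n');
            }
        } finally {
            reader.close();
            releaseEntity();
        }
        return content.toString();
    }

    private void releaseEntity() throws IOException {
        if (this.entity != null) {
            EntityUtils.consume(this.entity); // replaces the deprecated consumeContent()
            this.entity = null;
        }
    }
}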
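The body that `UrlEncodedFormEntity(nameValuePair, "UTF-8")` sends in `post()` is an `application/x-www-form-urlencoded` string. The JDK-only sketch below builds the same kind of body with `URLEncoder`, which can help when comparing your request with what a browser sends; the class name `FormBody` and the field names are hypothetical, and `URLEncoder.encode(String, Charset)` needs Java 10 or later.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// JDK-only sketch of an application/x-www-form-urlencoded login body,
// the same format UrlEncodedFormEntity produces from a List<NameValuePair>.
class FormBody {
    static String encode(Map<String, String> fields) {
        StringJoiner body = new StringJoiner("&");
        for (Map.Entry<String, String> e : fields.entrySet()) {
            // Encode name and value separately so '&' or '=' inside them cannot split a pair.
            body.add(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8) + "="
                    + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return body.toString();
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<>(); // keeps the form fields in insertion order
        fields.put("username", "zhang san"); // hypothetical field names and values
        fields.put("password", "p&ss=1");
        System.out.println(encode(fields)); // username=zhang+san&password=p%26ss%3D1
    }
}
```

The names passed to `post()` have to match the `name` attributes of the site's login form, including any hidden inputs the form carries; otherwise the server never accepts the login and later requests can still come back as "not logged in".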
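`readResponseContent()` decodes every page as UTF-8. If a site declares a different encoding in its `Content-Type` header, a safer approach is to honor that header and fall back to UTF-8 only when none is given; HttpClient's own `EntityUtils.toString(entity, "UTF-8")` treats its second argument as such a fallback. The JDK-only sketch below shows the idea; `BodyReader`, `charsetOf` and `readAll` are hypothetical names, and `readAllBytes` needs Java 9 or later.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Hypothetical helper: decode a response body with the charset declared in Content-Type,
// falling back to UTF-8 when the header has no usable charset parameter.
class BodyReader {
    static Charset charsetOf(String contentType) {
        if (contentType != null) {
            // e.g. "text/html; charset=GBK" -> ["text/html", " charset=GBK"]
            for (String part : contentType.split(";")) {
                String p = part.trim();
                if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                    String name = p.substring("charset=".length()).replace("\"", "").trim();
                    try {
                        if (Charset.isSupported(name)) {
                            return Charset.forName(name);
                        }
                    } catch (IllegalArgumentException e) {
                        // illegal charset name: ignore it and use the fallback below
                    }
                }
            }
        }
        return StandardCharsets.UTF_8;
    }

    static String readAll(InputStream in, String contentType) {
        try (InputStream body = in) { // the stream is closed even if reading fails
            return new String(body.readAllBytes(), charsetOf(contentType));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```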
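Copyright notice: This content was contributed voluntarily by real-name registered Alibaba Cloud users, and the copyright belongs to the original author. The Alibaba Cloud Developer Community does not own its copyright and assumes no corresponding legal liability. For the specific rules, see the "Alibaba Cloud Developer Community User Service Agreement" and the "Alibaba Cloud Developer Community Intellectual Property Protection Guidelines". If you find suspected plagiarism in this community, report it by filling out the infringement complaint form; once verified, the community will immediately delete the infringing content.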