Download the demo used in this article: RometePro.rar (source encoding: UTF-8)
An introduction to the two JARs
HttpClient
The main features HttpClient provides are listed below; see the HttpClient homepage for more details.
Implements all HTTP methods (GET, POST, PUT, HEAD, etc.)
Handles redirects automatically
Supports HTTPS
Supports proxy servers, and more
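jsoup (a powerful parser for web page content; it can also download pages, but HttpClient is more capable at handling the HTTP side)
jsoup is a Java HTML parser that can parse a URL or a string of HTML directly. It offers a very convenient API for extracting and manipulating data through DOM methods, CSS selectors and jQuery-like operations. (It is much better than HTMLParser.)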
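jsoup's main features are:
Parse HTML from a URL, a file, or a string
Find and extract data using DOM traversal or CSS selectors
Manipulate HTML elements, attributes and text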
Download HttpClient
Download jsoup
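First, the result
Parsed content of a Sina blog post; original address: http://blog.sina.com.cn/s/blog_89cc52f20101d1sh.html
The TextView displays the content as follows:
1. Links are turned into clickable links automatically (android:autoLink="all").
2. Link addresses can be selected the same way as in an EditText (touch and drag to select), and a long press brings up the copy dialog.
3. Tapping a link opens it in the browser.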
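Behind android:autoLink, Android's Linkify scans the text for web addresses (and, with "all", also e-mail addresses, phone numbers and map addresses) and wraps each match in a clickable span. The plain-Java sketch below only illustrates the URL-scanning half of that idea. The class LinkFinder and its regex are hypothetical and much simpler than Linkify's real patterns.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: a simplified illustration, not Android's Linkify.
class LinkFinder {
    // "http(s)://" followed by ASCII characters that are legal in a URL;
    // matching stops at whitespace and at non-ASCII text such as Chinese characters
    private static final Pattern URL =
            Pattern.compile("https?://[\\w\\-.~:/?#\\[\\]@!$&'()*+,;=%]+");

    /** Returns every http/https URL found in the text, in order of appearance. */
    static List<String> findLinks(String text) {
        List<String> links = new ArrayList<>();
        Matcher m = URL.matcher(text);
        while (m.find()) {
            String url = m.group();
            // Sentence punctuation directly after a URL is usually not part of it
            while (!url.isEmpty() && ".,;:!?".indexOf(url.charAt(url.length() - 1)) >= 0) {
                url = url.substring(0, url.length() - 1);
            }
            links.add(url);
        }
        return links;
    }
}
```

For example, findLinks("see http://blog.sina.com.cn/s/blog_89cc52f20101d1sh.html, thanks") returns a one-element list containing the Sina URL without the trailing comma. Linkify additionally turns each match into a span that opens the browser, which is what gives feature 3 above.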
Fetching and parsing a Sina blog post
Add the following permissions to AndroidManifest.xml:
<uses-permission android:name="android.permission.INTERNET"></uses-permission>
<uses-permission android:name="android.permission.ACCESS_NETWORK_STATE" />
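The layout uses a ScrollView so that long posts can be scrolled.
1) Make links in the TextView clickable with android:autoLink="all".
2) android:fadingEdge="vertical" (optional) sets the direction in which the edges fade while scrolling: none (no fading), horizontal (the horizontal edges fade), vertical (the vertical edges fade). Note that android:fadingEdge is ignored from API level 14 onward; on newer versions use android:requiresFadingEdge instead.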
<?xml version="1.0" encoding="utf-8"?>
<ScrollView xmlns:android="http://schemas.android.com/apk/res/android"
    xmlns:tools="http://schemas.android.com/tools"
    android:layout_width="match_parent"
    android:layout_height="match_parent"
    android:background="@drawable/app_choose_btn_normalbg"
    android:fadingEdge="vertical"
    android:scrollbars="vertical" >

    <!-- Inside a ScrollView, the child should use wrap_content for its height -->
    <LinearLayout
        android:layout_width="match_parent"
        android:layout_height="wrap_content"
        android:orientation="vertical" >

        <!-- Search bar: home button, address input, search button -->
        <LinearLayout
            android:layout_width="match_parent"
            android:layout_height="wrap_content"
            android:padding="5dp"
            android:orientation="horizontal"
            android:background="@drawable/grid_pictures_gdbg" >

            <ImageView
                android:id="@+id/remote_searchhome"
                android:layout_width="40dp"
                android:layout_height="40dp"
                android:src="@drawable/remote_search_home" />

            <EditText
                android:id="@+id/remote_searedit"
                android:layout_width="0dp"
                android:layout_height="40dp"
                android:layout_weight="1"
                android:singleLine="true" />

            <ImageView
                android:id="@+id/remote_searchbtn"
                android:layout_width="40dp"
                android:layout_height="40dp"
                android:src="@drawable/search_btn_icon" />
        </LinearLayout>

        <!-- Parsed post text; autoLink="all" makes URLs, e-mail addresses, etc. clickable -->
        <TextView
            android:id="@+id/remotetext"
            android:layout_width="match_parent"
            android:layout_height="wrap_content"
            android:gravity="top|left"
            android:background="@drawable/backmain_bg"
            android:textColor="@color/red"
            android:autoLink="all" />
    </LinearLayout>
</ScrollView>
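Add the HttpClient and jsoup JARs to the project's libs folder (just drag them in).
Writing the Java code
The Sina example uses the post at http://blog.sina.com.cn/s/blog_89cc52f20101d1sh.html.
The 51cto example uses the post 《两年来的IT资源汇总》 ("A Roundup of Two Years of IT Resources").
Fetching the page content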
The key to extracting the post body: jsoup has a method that selects elements by the value of their HTML class attribute:
Document myDocument = Jsoup.parse(str);
Elements links = myDocument.getElementsByClass(divclass);
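Inspecting a Sina blog page shows that the post body sits in the element whose class is articalContent (spelled exactly this way).
MySelfHttpClient.java: fetching the page and extracting the article content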
import java.io.IOException;

import org.apache.http.HttpResponse;
import org.apache.http.HttpStatus;
import org.apache.http.client.ClientProtocolException;
import org.apache.http.client.HttpClient;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.DefaultHttpClient;
import org.apache.http.util.EntityUtils;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class MySelfHttpClient {

    //String divclass = "showContent"; // 51cto post body
    String divclass = "articalContent"; // Sina post body

    public MySelfHttpClient() {
    }

    /**
     * Downloads the complete HTML of a page.
     *
     * @param link    URL of the page
     * @param charSet character encoding of the page content, e.g. "utf-8" or "gbk"
     * @return the page HTML, "Request error" if the status is not 200 OK,
     *         or an empty string if the request throws an exception
     */
    public String getStringFromLink(String link, String charSet) {
        String str = "";
        HttpGet request = new HttpGet(link);
        // Apache HttpClient 4.x API (bundled with older Android SDKs)
        HttpClient httpClient = new DefaultHttpClient();
        try {
            HttpResponse response = httpClient.execute(request);
            if (response.getStatusLine().getStatusCode() == HttpStatus.SC_OK) {
                str = EntityUtils.toString(response.getEntity(), charSet);
            } else {
                str = "Request error";
            }
        } catch (ClientProtocolException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            // Release the client's connections once the request is finished
            httpClient.getConnectionManager().shutdown();
        }
        return str;
    }

    /**
     * Extracts the text of every element whose class is divclass.
     *
     * @param str the page HTML
     * @return the extracted article text
     */
    public String getContent(String str) {
        StringBuilder content = new StringBuilder();
        Document myDocument = Jsoup.parse(str);
        Elements links = myDocument.getElementsByClass(divclass);
        //Log.d("str", links.toString());
        for (Element link : links) {
            content.append(link.text());
        }
        return content.toString();
    }
}
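getStringFromLink depends on Apache HttpClient, which older Android SDKs bundled (Android 6.0 removed it from the platform SDK). For experimenting on a desktop JVM, a rough equivalent can be written with the JDK's own java.net.http.HttpClient (Java 11+, not available on Android). This is a hypothetical sketch, not the article's code: JdkPageFetcher and localDemo are made-up names. It follows redirects, returns null instead of "Request error" on a non-200 status so callers can tell an error apart from a real page, and decodes the body with the caller's charset. localDemo serves a tiny GBK page from a throwaway local server so the code runs offline.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.Charset;

// Hypothetical JDK-only counterpart of MySelfHttpClient.getStringFromLink.
class JdkPageFetcher {
    private static final HttpClient CLIENT = HttpClient.newBuilder()
            .followRedirects(HttpClient.Redirect.NORMAL) // follow 3xx redirects automatically
            .build();

    /** Returns the page decoded with charSet, or null if the status is not 200. */
    static String getStringFromLink(String link, String charSet)
            throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(URI.create(link)).GET().build();
        // Read raw bytes so that we, not the library, decide which charset to use
        HttpResponse<byte[]> response = CLIENT.send(request, HttpResponse.BodyHandlers.ofByteArray());
        if (response.statusCode() != 200) {
            return null;
        }
        return new String(response.body(), Charset.forName(charSet));
    }

    /** Serves a GBK page locally; returns {result for /post, result for /missing}. */
    static String[] localDemo() throws Exception {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        // "\u4f60\u597d" is the Chinese greeting "ni hao", encoded here as GBK bytes
        byte[] page = "<div class=\"articalContent\">\u4f60\u597d</div>".getBytes(Charset.forName("GBK"));
        server.createContext("/post", exchange -> {
            exchange.sendResponseHeaders(200, page.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(page);
            }
        });
        server.start();
        try {
            String base = "http://127.0.0.1:" + server.getAddress().getPort();
            String ok = getStringFromLink(base + "/post", "gbk");
            String missing = getStringFromLink(base + "/missing", "gbk"); // no context: 404
            return new String[] { ok, missing };
        } finally {
            server.stop(0);
        }
    }
}
```

Decoding the same GBK bytes with "utf-8" instead would produce garbled text, which is why the charSet parameter matters.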
Checking whether the device is online
Network check: ConnectionDetector.java
import android.content.Context;
import android.net.ConnectivityManager;
import android.net.NetworkInfo;

public class ConnectionDetector {

    private Context _context;

    public ConnectionDetector(Context context) {
        this._context = context;
    }

    /**
     * Checks whether the device is connected to a network.
     *
     * @return true if any network reports the CONNECTED state, false otherwise
     *         (this does not guarantee that the Internet is actually reachable)
     */
    public boolean isConnectingToInternet() {
        ConnectivityManager connectivity =
                (ConnectivityManager) _context.getSystemService(Context.CONNECTIVITY_SERVICE);
        if (connectivity != null) {
            // Note: getAllNetworkInfo() is deprecated as of API level 23
            NetworkInfo[] info = connectivity.getAllNetworkInfo();
            if (info != null) {
                for (int i = 0; i < info.length; i++) {
                    if (info[i].getState() == NetworkInfo.State.CONNECTED) {
                        return true;
                    }
                }
            }
        }
        return false;
    }
}
The main Activity
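Key point 1: determine the encoding of the page you want to parse. I did not find an encoding declaration in the Sina or 51cto pages, but most Chinese pages are either UTF-8 or GBK.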
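Guessing the encoding can be partly automated. In Apache HttpClient 4.x, EntityUtils.toString(entity, charSet) already prefers the charset named in the response's Content-Type header and only uses charSet as a default. The hypothetical helper below (CharsetGuess is a made-up name, and the 1024-byte limit is an assumption) sketches the usual order: the Content-Type header first, then a charset declared in a <meta> tag near the top of the page, then a fallback such as UTF-8 or GBK.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: guesses a page's encoding from the header, then from a <meta> tag.
class CharsetGuess {
    // Matches charset=xxx in a Content-Type header, in <meta charset="xxx">,
    // or in <meta http-equiv="Content-Type" content="text/html; charset=xxx">
    private static final Pattern CHARSET = Pattern.compile(
            "charset\\s*=\\s*[\"']?([A-Za-z0-9_\\-]+)", Pattern.CASE_INSENSITIVE);

    static Charset detect(String contentType, byte[] body, Charset fallback) {
        // 1. The Content-Type response header, e.g. "text/html; charset=GBK"
        String name = find(contentType);
        if (name == null && body != null) {
            // 2. A <meta> declaration near the top of the page. ISO-8859-1 maps every
            //    byte to one char, so the ASCII markup is readable whatever the real encoding is.
            int n = Math.min(body.length, 1024);
            name = find(new String(body, 0, n, StandardCharsets.ISO_8859_1));
        }
        if (name == null) {
            return fallback; // 3. Nothing declared: use the caller's guess
        }
        try {
            return Charset.forName(name);
        } catch (IllegalArgumentException e) { // unknown or malformed charset name
            return fallback;
        }
    }

    private static String find(String s) {
        if (s == null) return null;
        Matcher m = CHARSET.matcher(s);
        return m.find() ? m.group(1) : null;
    }
}
```

The raw bytes can then be decoded with new String(body, CharsetGuess.detect(contentType, body, fallback)). When a page declares nothing, the fallback is still a guess, which is why the article hard-codes "utf-8" for Sina and "gbk" for 51cto.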
Key point 2: make the TextView behave like an EditText, so that links can be long-pressed and copied:
/**************************/
// Let the TextView's text (including links) be selected and copied, like an EditText
remoteText.setFocusableInTouchMode(true);
remoteText.setFocusable(true);
remoteText.setClickable(true);
remoteText.setLongClickable(true);
remoteText.setMovementMethod(ArrowKeyMovementMethod.getInstance());
/**************************/
Main implementation: RemoteText.java
package com.remote;

import com.remotepro.R;

import android.app.Activity;
import android.app.ProgressDialog;
import android.os.Bundle;
import android.os.Handler;
import android.os.Message;
import android.text.method.ArrowKeyMovementMethod;
import android.view.View;
import android.view.View.OnClickListener;
import android.view.Window;
import android.widget.EditText;
import android.widget.ImageView;
import android.widget.TextView;
import android.widget.Toast;

public class RemoteText extends Activity {

    TextView remoteText;
    EditText myEditText;
    ImageView mySearchBtn;
    ImageView myHomeBtn;
    MySelfHttpClient mySelfHttpClient;

    String link = "http://blog.sina.com.cn/s/blog_89cc52f20101d1sh.html"; // Sina blog
    String charSet = "utf-8"; // Sina blog
    //String link = "http://7071976.blog.51cto.com/7061976/1289909";
    //String charSet = "gbk";

    String myText;
    // Checks the link type; a Toast tells the user if the link is not a supported site
    String linktag = "http://blog.sina.com.cn"; // Sina as the example
    ConnectionDetector myConnectionDetector; // checks whether the device is online
    ProgressDialog myProgressDialog = null;  // loading dialog

    @Override
    protected void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        requestWindowFeature(Window.FEATURE_NO_TITLE);
        setContentView(R.layout.remotemain);
        init();
    }

    public void init() {
        remoteText = (TextView) findViewById(R.id.remotetext);
        myEditText = (EditText) findViewById(R.id.remote_searedit);
        mySearchBtn = (ImageView) findViewById(R.id.remote_searchbtn);
        myHomeBtn = (ImageView) findViewById(R.id.remote_searchhome);
        mySelfHttpClient = new MySelfHttpClient();
        myConnectionDetector = new ConnectionDetector(this);
        mySearchBtn.setOnClickListener(mySearcClick);
        myHomeBtn.setOnClickListener(myHomeClckListener);
        initText();
    }

    /***************************/
    // Loads the default post (link and charSet above) on a background thread
    public void initText() {
        if (myConnectionDetector.isConnectingToInternet()) {
            myProgressDialog = ProgressDialog.show(this,
                    getString(R.string.waiting), getString(R.string.loading));
            new InitTextThead().start();
        }
    }

    class InitTextThead extends Thread {
        @Override
        public void run() {
            // Download the page and extract the post body off the UI thread
            myText = mySelfHttpClient.getContent(mySelfHttpClient.getStringFromLink(link, charSet));
            myHandler.sendEmptyMessage(1);
        }
    }

    // Runs on the UI thread and applies the results produced by the worker threads
    Handler myHandler = new Handler() {
        @Override
        public void handleMessage(Message msg) {
            super.handleMessage(msg);
            switch (msg.what) {
            case 1: // default post loaded
            case 2: // post entered in the search box loaded
                /**************************/
                // Let the TextView's text (including links) be selected and copied, like an EditText
                remoteText.setFocusableInTouchMode(true);
                remoteText.setFocusable(true);
                remoteText.setClickable(true);
                remoteText.setLongClickable(true);
                remoteText.setMovementMethod(ArrowKeyMovementMethod.getInstance());
                /**************************/
                remoteText.setText(myText);
                myProgressDialog.dismiss();
                break;
            case 3: // the entered link does not start with linktag
                myProgressDialog.dismiss();
                Toast.makeText(RemoteText.this, R.string.errorlingaddr, Toast.LENGTH_LONG).show();
                break;
            default:
                break;
            }
        }
    };

    /********************************/
    OnClickListener mySearcClick = new OnClickListener() {
        @Override
        public void onClick(View v) {
            searchclick();
        }
    };

    public void searchclick() {
        if (myConnectionDetector.isConnectingToInternet()) {
            myProgressDialog = ProgressDialog.show(this,
                    getString(R.string.waiting), getString(R.string.loading));
            new SearchThread().start();
        }
    }

    class SearchThread extends Thread {
        @Override
        public void run() {
            super.run();
            String link = myEditText.getText().toString();
            if (link.startsWith(linktag)) {
                myText = mySelfHttpClient.getContent(mySelfHttpClient.getStringFromLink(link, charSet));
                myHandler.sendEmptyMessage(2);
            } else {
                myHandler.sendEmptyMessage(3);
            }
        }
    }

    /********************************/
    OnClickListener myHomeClckListener = new OnClickListener() {
        @Override
        public void onClick(View v) {
            initText(); // reload the default post
        }
    };
}
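RemoteText downloads in InitTextThead and SearchThread and only touches remoteText inside the Handler because, for apps targeting API level 11 or higher, network access on the main thread throws NetworkOnMainThreadException, while views may only be updated from the main thread (where this Handler runs, since it is created there). The plain-Java sketch below mimics that hand-off outside Android: a worker executor does the "download" and a single named thread stands in for the UI thread. It is an analogy only; WorkerToUiSketch, loadThenShow and "fake-ui-thread" are hypothetical names, and CompletableFuture replaces Handler messages.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Hypothetical analogy of "worker thread + Handler" using plain java.util.concurrent.
class WorkerToUiSketch {
    /**
     * Runs load on a worker thread, then applies the result to screen on a single
     * "UI" thread. Returns the name of the thread that applied the result.
     */
    static String loadThenShow(Supplier<String> load, List<String> screen) throws Exception {
        ExecutorService worker = Executors.newSingleThreadExecutor();
        ExecutorService ui = Executors.newSingleThreadExecutor(r -> new Thread(r, "fake-ui-thread"));
        try {
            return CompletableFuture
                    .supplyAsync(load, worker)        // like InitTextThead.run()
                    .thenApplyAsync(text -> {          // like handleMessage(), case 1
                        screen.add(text);              // only the "UI" thread touches the view
                        return Thread.currentThread().getName();
                    }, ui)
                    .get(5, TimeUnit.SECONDS);
        } finally {
            worker.shutdown();
            ui.shutdown();
        }
    }
}
```

Passing the result through the future (rather than a shared field such as myText) also makes it explicit which value belongs to which request.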
Parsing a 51cto post
Swap the commented-out lines in MySelfHttpClient.java and RemoteText.java.
In MySelfHttpClient.java, comment out the Sina line and uncomment the 51cto line:
//String divclass = "showContent"; // 51cto post body
String divclass = "articalContent"; // Sina post body
In RemoteText.java, do the same with the link and charSet lines; 51cto pages are encoded in GBK:
String link = "http://blog.sina.com.cn/s/blog_89cc52f20101d1sh.html"; // Sina blog
String charSet = "utf-8"; // Sina blog
//String link = "http://7071976.blog.51cto.com/7061976/1289909";
//String charSet = "gbk";
To load links typed into the app's input box, you also need to change linktag in RemoteText.java so that it matches the site you are parsing:
String linktag = "http://blog.sina.com.cn"; // checks whether the link in the EditText is valid; Sina is used as the example
SearchThread compares the link typed into the EditText against this prefix:
class SearchThread extends Thread {
    @Override
    public void run() {
        super.run();
        String link = myEditText.getText().toString();
        if (link.startsWith(linktag)) {
            myText = mySelfHttpClient.getContent(mySelfHttpClient.getStringFromLink(link, charSet));
            myHandler.sendEmptyMessage(2);
        } else {
            myHandler.sendEmptyMessage(3);
        }
    }
}
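link.startsWith(linktag) is a simple test, but it rejects https:// links and accepts look-alikes such as http://blog.sina.com.cn.evil.example/ or http://blog.sina.com.cn@evil.example/. A sketch of a stricter check, assuming it is acceptable to compare the parsed host name instead of a string prefix (LinkValidator and isAllowed are hypothetical names):

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Locale;

// Hypothetical replacement for link.startsWith(linktag).
class LinkValidator {
    /**
     * True if link is an http or https URL whose host is allowedHost
     * or one of its subdomains (e.g. 7071976.blog.51cto.com for blog.51cto.com).
     */
    static boolean isAllowed(String link, String allowedHost) {
        if (link == null || allowedHost == null) return false;
        try {
            URI uri = new URI(link.trim());
            String scheme = uri.getScheme();
            String host = uri.getHost(); // the real host, after any "user@" part
            if (scheme == null || host == null) return false;
            scheme = scheme.toLowerCase(Locale.ROOT);
            if (!scheme.equals("http") && !scheme.equals("https")) return false;
            host = host.toLowerCase(Locale.ROOT);
            String allowed = allowedHost.toLowerCase(Locale.ROOT);
            return host.equals(allowed) || host.endsWith("." + allowed);
        } catch (URISyntaxException e) {
            return false; // not a syntactically valid URI, e.g. contains spaces
        }
    }
}
```

In SearchThread, the condition would become LinkValidator.isAllowed(link, "blog.sina.com.cn") for Sina, or "blog.51cto.com" for 51cto, which also matches the per-user subdomain in the 51cto link used in this article.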
Result of parsing the post at http://7071976.blog.51cto.com/7061976/1289909:
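Summary: using blog posts as the example, this article fetches page content with HttpClient and extracts the post body with jsoup. The text shown in the TextView looks somewhat cluttered, but that can be improved.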
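Possible applications: take the library management system of Jinggangshan University as an example. The university rents this system from an outside company. The usual way to build a client for a library system would be to have the backend database expose a site for data queries, but the company does not offer that service. Instead, HttpClient can be used to work with the web pages directly to implement login, search, renewals and other functions, and an Android client can be built that way.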