Simulating a Browser with the Python mechanize Module to Run a Baidu Search
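```python
# -*- coding:utf-8 -*-
# Python 2 script (print statements, reload/setdefaultencoding)
import sys

import mechanize

# Python 2 only: make utf8 the default codec so that non-ASCII page text
# does not raise an ASCII encoding error when printed or concatenated
reload(sys)
sys.setdefaultencoding('utf8')

br = mechanize.Browser()
br.set_handle_equiv(True)      # honour <meta http-equiv> headers
br.set_handle_redirect(True)   # follow HTTP redirects
br.set_handle_referer(True)    # send a Referer header
br.set_handle_robots(False)    # ignore robots.txt
br.set_handle_gzip(False)      # do not handle gzip transfer encoding
# follow Refresh headers, but not those with a delay longer than 1 second
br.set_handle_refresh(mechanize._http.HTTPRefreshProcessor(), max_time=1)
# identify as Firefox instead of the default mechanize User-agent
br.addheaders = [('User-agent', 'Mozilla/5.0 (X11; U; Linux i686; en-US;rv:1.9.0.1) Gecko/2008071615 Fedora/3.0.1-1.fc9 Firefox/3.0.1')]

# the URL was lost from the original post; Baidu's home page is assumed here
br.open('http://www.baidu.com')

# print every form on the page to locate the search form
for form in br.forms():
    print form

# select the form named "f" and fill its keyword field "wd"
br.select_form(name='f')
br.form['wd'] = 'python'
br.submit()

# print each link on the results page as url:text
# (link.text can be None for non-<a> links such as frames)
for link in br.links():
    print link.url + ':' + (link.text or '')
```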
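Because Baidu's search form is submitted with a GET request, the select_form / form['wd'] / submit() sequence essentially encodes the wd field into the query string of the form's action URL. The Python 3 sketch below builds that URL with the standard library only. The base URL `https://www.baidu.com` and the action path `/s` are assumptions about Baidu's form, not taken from the original post, and `build_search_url` is a hypothetical helper name.

```python
from urllib.parse import urlencode, urljoin


def build_search_url(base_url, action, fields):
    """Return the URL that a GET form submission would request.

    base_url and action are assumptions about the target page
    (here Baidu's home page and its assumed '/s' form action).
    """
    # urlencode percent-encodes non-ASCII values as UTF-8
    return urljoin(base_url, action) + '?' + urlencode(fields)


url = build_search_url('https://www.baidu.com', '/s', {'wd': 'python'})
print(url)  # https://www.baidu.com/s?wd=python
```

The resulting URL could then be fetched with urllib.request.urlopen on a urllib.request.Request that carries the same User-agent header as the mechanize script; unlike mechanize, this approach does not read the form from the page, so it breaks if the form's action or field names change.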
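The sys module has to be imported at the top and the default encoding set to utf8; otherwise an ASCII encoding error is raised (reload(sys) and sys.setdefaultencoding exist only in Python 2). The script opens the page with open(), prints the forms it contains, selects the form named f, sets its keyword field wd to the search term, and submits it. br.response().read() would return the full content of the results page; this code instead filters the links out of that response and prints each one as url:text.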
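In the script, br.links() does the link filtering. To show roughly what that step involves, here is a stdlib-only Python 3 sketch that collects the href and text of every <a> tag from HTML such as br.response().read() returns. LinkCollector and extract_links are hypothetical names, and unlike mechanize this sketch ignores <area>, <frame> and <iframe> links.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect (absolute url, text) pairs for every <a href=...> tag."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []    # finished (url, text) pairs
        self._href = None  # resolved href of the currently open <a>, if any
        self._text = []    # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            href = dict(attrs).get('href')
            if href is not None:
                # resolve relative links against the page URL
                self._href = urljoin(self.base_url, href)
                self._text = []

    def handle_data(self, data):
        # only keep text that appears inside an open <a href=...>
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == 'a' and self._href is not None:
            self.links.append((self._href, ''.join(self._text).strip()))
            self._href = None


def extract_links(html, base_url):
    parser = LinkCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


page = '<p><a href="/s?wd=python">Python <b>search</b></a> <a name="top">no href</a></p>'
for url, text in extract_links(page, 'https://www.baidu.com'):
    print(url + ':' + text)  # https://www.baidu.com/s?wd=python:Python search
```

Anchors without an href (such as the name="top" one above) are skipped, and text inside nested tags like <b> is joined into the link text, which is close to how the url:text lines in the mechanize script look.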
This article is reposted from 无心低语's 51CTO blog. Original link: http://blog.51cto.com/fengzhankui/1946336. To repost it, please contact the original author.