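At last I get to practice HTML scraping with Python. I had picked up bits and pieces before, and this time I hope the course at DC学院 (DC Academy) will help me build a deeper understanding of the theory behind web scraping.
OK, here are today's data analysis study notes!
Hoping to learn something ( ̄︶ ̄)↗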
from bs4 import BeautifulSoup
html_doc = """ <html><head><title>The Dormouse's story</title></head> <body> <p class="title"><b>The Dormouse's story</b></p> <p class="story">Once upon a time there were three little sisters; and their names were <a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>, <a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and <a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>; and they lived at the bottom of a well.</p> <p class="story">...</p> """
Example: parsing the HTML document with BeautifulSoup
soup = BeautifulSoup(html_doc,'html.parser')
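Here html_doc is the variable holding the HTML string defined above, and 'html.parser' is the parser BeautifulSoup uses to read the page (Python's built-in HTML parser). So the general form for parsing an HTML document with BeautifulSoup is soup = BeautifulSoup(page_content, 'html.parser').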
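The first argument does not have to be a str. Later in these notes the page comes from urlopen(...).read(), which returns bytes, and BeautifulSoup accepts those directly. A minimal sketch with a made-up byte string (raw_bytes is a hypothetical stand-in for a downloaded page), passing the optional from_encoding argument so the decoding is stated explicitly instead of guessed:

```python
from bs4 import BeautifulSoup

# Hypothetical bytes, standing in for what urlopen(...).read() returns.
raw_bytes = '<p class="temp temp-high">High: 74 °F</p>'.encode("utf-8")

# BeautifulSoup accepts bytes as well as str; from_encoding tells it
# how to decode them rather than letting it detect the encoding.
soup = BeautifulSoup(raw_bytes, "html.parser", from_encoding="utf-8")

print(soup.p.get_text())  # High: 74 °F
```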
Print the page with soup.prettify():
print(soup.prettify())
<html>
<head>
<title>
The Dormouse's story
</title>
</head>
<body>
<p class="title">
<b>
The Dormouse's story
</b>
</p>
<p class="story">
Once upon a time there were three little sisters; and their names were
<a class="sister" href="http://example.com/elsie" id="link1">
Elsie
</a>
,
<a class="sister" href="http://example.com/lacie" id="link2">
Lacie
</a>
and
<a class="sister" href="http://example.com/tillie" id="link3">
Tillie
</a>
; and they lived at the bottom of a well.
</p>
<p class="story">
...
</p>
</body>
</html>
Some basic operations on a page parsed with BeautifulSoup
soup.title
<title>The Dormouse's story</title>
soup.title.name
'title'
soup.title.string
"The Dormouse's story"
soup.find_all("a")
[<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>,
<a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>,
<a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]
soup.find(id="link3")
<a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>
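Building on these operations, the sketch below (an extension, not part of the course material) loops over the links and reads their attributes. It re-declares a trimmed copy of html_doc so it runs on its own:

```python
from bs4 import BeautifulSoup

# Trimmed copy of the html_doc string defined at the top of these notes.
html_doc = """<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>"""

soup = BeautifulSoup(html_doc, "html.parser")

# tag["id"] raises KeyError if the attribute is missing;
# tag.get("href") returns None instead.
for link in soup.find_all("a"):
    print(link["id"], link.get_text(), link.get("href"))

# find_all can also filter on attributes, e.g. with the class_ keyword.
sisters = [a.get_text() for a in soup.find_all("a", class_="sister")]
print(sisters)  # ['Elsie', 'Lacie', 'Tillie']
```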
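Scraping weather data from "NATIONAL WEATHER"
The example page provided in the DC学院 course is the San Francisco weather forecast:
http://forecast.weather.gov/MapClick.php?lat=37.77492773500046&lon=-122.41941932299972#.WUnSFhN95E4
Tip: you can use your browser's developer tools to inspect the page's HTML, as shown in the figure:
1. Fetch the page content with urllib.request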
import urllib.request as urlrequest
weather_url='http://forecast.weather.gov/MapClick.php?lat=37.7771&lon=-122.4196'
web_page=urlrequest.urlopen(weather_url).read()
# print(web_page)  # the output is far too long, so it is omitted here
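Some servers refuse requests that carry urllib's default User-Agent header, and a request without a timeout can hang indefinitely. Whether forecast.weather.gov does either is an assumption here, not something the course states. A hedged sketch that sends an explicit header and a timeout; USER_AGENT, build_request and fetch_page are hypothetical names:

```python
import urllib.request as urlrequest

# Hypothetical User-Agent string; replace it with something that identifies you.
USER_AGENT = "my-weather-scraper/0.1 (contact: you@example.com)"


def build_request(url):
    """Build a Request object that sends an explicit User-Agent header."""
    return urlrequest.Request(url, headers={"User-Agent": USER_AGENT})


def fetch_page(url, timeout=10):
    """Download the raw bytes of a page, giving up after `timeout` seconds."""
    with urlrequest.urlopen(build_request(url), timeout=timeout) as response:
        return response.read()
```

Usage would be web_page = fetch_page(weather_url) in place of the urlopen(weather_url).read() line; the result is the same bytes object that BeautifulSoup receives below.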
2. Extract the weather information from the page with BeautifulSoup
from bs4 import BeautifulSoup
soup=BeautifulSoup(web_page,'html.parser')
print(soup.find(id='seven-day-forecast-body').get_text())
Today
SunnyHigh: 74 °F
Tonight
ClearLow: 52 °F
Thursday
SunnyHigh: 73 °F
ThursdayNight
ClearLow: 51 °F
Friday
SunnyHigh: 68 °F
FridayNight
Mostly ClearLow: 50 °F
Saturday
SunnyHigh: 64 °F
SaturdayNight
Mostly ClearLow: 50 °F
Sunday
SunnyHigh: 66 °F
// equalize forecast heights
$(function () {
var maxh = 0;
$(".forecast-tombstone .short-desc").each(function () {
var h = $(this).height();
if (h > maxh) { maxh = h; }
});
$(".forecast-tombstone .short-desc").height(maxh);
});
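The first part of the printed text looks perfect, but some JavaScript code has tagged along at the end. How do we get rid of it?
Let's print the whole div again and look at its structure: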
from bs4 import BeautifulSoup
soup=BeautifulSoup(web_page,'html.parser')
print(soup.find(id='seven-day-forecast-body'))
<div class="panel-body" id="seven-day-forecast-body">
<div id="seven-day-forecast-container"><ul class="list-unstyled" id="seven-day-forecast-list"><li class="forecast-tombstone">
<div class="tombstone-container">
<p class="period-name">Today<br/><br/></p>
<p><img alt="Today: Sunny, with a high near 74. Light and variable wind becoming north northeast 5 to 7 mph in the morning. " class="forecast-icon" src="newimages/medium/skc.png" title="Today: Sunny, with a high near 74. Light and variable wind becoming north northeast 5 to 7 mph in the morning. "/></p><p class="short-desc">Sunny</p><p class="temp temp-high">High: 74 °F</p></div></li><li class="forecast-tombstone">
<div class="tombstone-container">
<p class="period-name">Tonight<br/><br/></p>
<p><img alt="Tonight: Clear, with a low around 52. North wind around 6 mph. " class="forecast-icon" src="newimages/medium/nskc.png" title="Tonight: Clear, with a low around 52. North wind around 6 mph. "/></p><p class="short-desc">Clear</p><p class="temp temp-low">Low: 52 °F</p></div></li><li class="forecast-tombstone">
<div class="tombstone-container">
<p class="period-name">Thursday<br/><br/></p>
<p><img alt="Thursday: Sunny, with a high near 73. Light and variable wind becoming west 5 to 10 mph in the afternoon. " class="forecast-icon" src="newimages/medium/skc.png" title="Thursday: Sunny, with a high near 73. Light and variable wind becoming west 5 to 10 mph in the afternoon. "/></p><p class="short-desc">Sunny</p><p class="temp temp-high">High: 73 °F</p></div></li><li class="forecast-tombstone">
<div class="tombstone-container">
<p class="period-name">Thursday<br/>Night</p>
<p><img alt="Thursday Night: Clear, with a low around 51. West southwest wind 6 to 11 mph. " class="forecast-icon" src="newimages/medium/nskc.png" title="Thursday Night: Clear, with a low around 51. West southwest wind 6 to 11 mph. "/></p><p class="short-desc">Clear</p><p class="temp temp-low">Low: 51 °F</p></div></li><li class="forecast-tombstone">
<div class="tombstone-container">
<p class="period-name">Friday<br/><br/></p>
<p><img alt="Friday: Sunny, with a high near 68. Light west northwest wind becoming west 9 to 14 mph in the morning. Winds could gust as high as 18 mph. " class="forecast-icon" src="newimages/medium/few.png" title="Friday: Sunny, with a high near 68. Light west northwest wind becoming west 9 to 14 mph in the morning. Winds could gust as high as 18 mph. "/></p><p class="short-desc">Sunny</p><p class="temp temp-high">High: 68 °F</p></div></li><li class="forecast-tombstone">
<div class="tombstone-container">
<p class="period-name">Friday<br/>Night</p>
<p><img alt="Friday Night: Mostly clear, with a low around 50." class="forecast-icon" src="newimages/medium/nfew.png" title="Friday Night: Mostly clear, with a low around 50."/></p><p class="short-desc">Mostly Clear</p><p class="temp temp-low">Low: 50 °F</p></div></li><li class="forecast-tombstone">
<div class="tombstone-container">
<p class="period-name">Saturday<br/><br/></p>
<p><img alt="Saturday: Sunny, with a high near 64." class="forecast-icon" src="newimages/medium/few.png" title="Saturday: Sunny, with a high near 64."/></p><p class="short-desc">Sunny</p><p class="temp temp-high">High: 64 °F</p></div></li><li class="forecast-tombstone">
<div class="tombstone-container">
<p class="period-name">Saturday<br/>Night</p>
<p><img alt="Saturday Night: Mostly clear, with a low around 50." class="forecast-icon" src="newimages/medium/nfew.png" title="Saturday Night: Mostly clear, with a low around 50."/></p><p class="short-desc">Mostly Clear</p><p class="temp temp-low">Low: 50 °F</p></div></li><li class="forecast-tombstone">
<div class="tombstone-container">
<p class="period-name">Sunday<br/><br/></p>
<p><img alt="Sunday: Sunny, with a high near 66." class="forecast-icon" src="newimages/medium/few.png" title="Sunday: Sunny, with a high near 66."/></p><p class="short-desc">Sunny</p><p class="temp temp-high">High: 66 °F</p></div></li></ul></div>
<script type="text/javascript">
// equalize forecast heights
$(function () {
var maxh = 0;
$(".forecast-tombstone .short-desc").each(function () {
var h = $(this).height();
if (h > maxh) { maxh = h; }
});
$(".forecast-tombstone .short-desc").height(maxh);
});
</script> </div>
At the very end of the output above, we can see that the extra JS code sits inside the outer div (div class="panel-body" id="seven-day-forecast-body"), while the inner div id="seven-day-forecast-container" does not contain it. That makes the fix easy: search for id="seven-day-forecast-container" instead of id="seven-day-forecast-body".
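If you would rather keep working with the outer div, another option (not the approach used in the course) is to delete the script tags from the parse tree before calling get_text(). A minimal sketch on a trimmed copy of the markup printed above, using BeautifulSoup's decompose():

```python
from bs4 import BeautifulSoup

# Trimmed copy of the div printed above: one forecast item plus the <script>.
html = """
<div class="panel-body" id="seven-day-forecast-body">
<div id="seven-day-forecast-container"><ul class="list-unstyled" id="seven-day-forecast-list">
<li class="forecast-tombstone"><div class="tombstone-container">
<p class="period-name">Today<br/><br/></p>
<p class="short-desc">Sunny</p><p class="temp temp-high">High: 74 °F</p>
</div></li></ul></div>
<script type="text/javascript">
// equalize forecast heights
</script> </div>
"""

soup = BeautifulSoup(html, "html.parser")
body = soup.find(id="seven-day-forecast-body")

# decompose() removes each <script> tag, contents included, from the tree in place.
for script in body.find_all("script"):
    script.decompose()

text = body.get_text()
print(text)
```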
from bs4 import BeautifulSoup
soup=BeautifulSoup(web_page,'html.parser')
print(soup.find(id='seven-day-forecast-container'))
<div id="seven-day-forecast-container"><ul class="list-unstyled" id="seven-day-forecast-list"><li class="forecast-tombstone">
<div class="tombstone-container">
<p class="period-name">Today<br/><br/></p>
<p><img alt="Today: Sunny, with a high near 74. Light and variable wind becoming north northeast 5 to 7 mph in the morning. " class="forecast-icon" src="newimages/medium/skc.png" title="Today: Sunny, with a high near 74. Light and variable wind becoming north northeast 5 to 7 mph in the morning. "/></p><p class="short-desc">Sunny</p><p class="temp temp-high">High: 74 °F</p></div></li><li class="forecast-tombstone">
<div class="tombstone-container">
<p class="period-name">Tonight<br/><br/></p>
<p><img alt="Tonight: Clear, with a low around 52. North wind around 6 mph. " class="forecast-icon" src="newimages/medium/nskc.png" title="Tonight: Clear, with a low around 52. North wind around 6 mph. "/></p><p class="short-desc">Clear</p><p class="temp temp-low">Low: 52 °F</p></div></li><li class="forecast-tombstone">
<div class="tombstone-container">
<p class="period-name">Thursday<br/><br/></p>
<p><img alt="Thursday: Sunny, with a high near 73. Light and variable wind becoming west 5 to 10 mph in the afternoon. " class="forecast-icon" src="newimages/medium/skc.png" title="Thursday: Sunny, with a high near 73. Light and variable wind becoming west 5 to 10 mph in the afternoon. "/></p><p class="short-desc">Sunny</p><p class="temp temp-high">High: 73 °F</p></div></li><li class="forecast-tombstone">
<div class="tombstone-container">
<p class="period-name">Thursday<br/>Night</p>
<p><img alt="Thursday Night: Clear, with a low around 51. West southwest wind 6 to 11 mph. " class="forecast-icon" src="newimages/medium/nskc.png" title="Thursday Night: Clear, with a low around 51. West southwest wind 6 to 11 mph. "/></p><p class="short-desc">Clear</p><p class="temp temp-low">Low: 51 °F</p></div></li><li class="forecast-tombstone">
<div class="tombstone-container">
<p class="period-name">Friday<br/><br/></p>
<p><img alt="Friday: Sunny, with a high near 68. Light west northwest wind becoming west 9 to 14 mph in the morning. Winds could gust as high as 18 mph. " class="forecast-icon" src="newimages/medium/few.png" title="Friday: Sunny, with a high near 68. Light west northwest wind becoming west 9 to 14 mph in the morning. Winds could gust as high as 18 mph. "/></p><p class="short-desc">Sunny</p><p class="temp temp-high">High: 68 °F</p></div></li><li class="forecast-tombstone">
<div class="tombstone-container">
<p class="period-name">Friday<br/>Night</p>
<p><img alt="Friday Night: Mostly clear, with a low around 50." class="forecast-icon" src="newimages/medium/nfew.png" title="Friday Night: Mostly clear, with a low around 50."/></p><p class="short-desc">Mostly Clear</p><p class="temp temp-low">Low: 50 °F</p></div></li><li class="forecast-tombstone">
<div class="tombstone-container">
<p class="period-name">Saturday<br/><br/></p>
<p><img alt="Saturday: Sunny, with a high near 64." class="forecast-icon" src="newimages/medium/few.png" title="Saturday: Sunny, with a high near 64."/></p><p class="short-desc">Sunny</p><p class="temp temp-high">High: 64 °F</p></div></li><li class="forecast-tombstone">
<div class="tombstone-container">
<p class="period-name">Saturday<br/>Night</p>
<p><img alt="Saturday Night: Mostly clear, with a low around 50." class="forecast-icon" src="newimages/medium/nfew.png" title="Saturday Night: Mostly clear, with a low around 50."/></p><p class="short-desc">Mostly Clear</p><p class="temp temp-low">Low: 50 °F</p></div></li><li class="forecast-tombstone">
<div class="tombstone-container">
<p class="period-name">Sunday<br/><br/></p>
<p><img alt="Sunday: Sunny, with a high near 66." class="forecast-icon" src="newimages/medium/few.png" title="Sunday: Sunny, with a high near 66."/></p><p class="short-desc">Sunny</p><p class="temp temp-high">High: 66 °F</p></div></li></ul></div>
That looks much cleaner: the JS code is finally gone. Let's run get_text() again as before:
from bs4 import BeautifulSoup
soup=BeautifulSoup(web_page,'html.parser')
print(soup.find(id='seven-day-forecast-container').get_text())
Today
SunnyHigh: 74 °F
Tonight
ClearLow: 52 °F
Thursday
SunnyHigh: 73 °F
ThursdayNight
ClearLow: 51 °F
Friday
SunnyHigh: 68 °F
FridayNight
Mostly ClearLow: 50 °F
Saturday
SunnyHigh: 64 °F
SaturdayNight
Mostly ClearLow: 50 °F
Sunday
SunnyHigh: 66 °F
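This plain text is still hard to split into fields, though. Let's pretty-print the div with prettify() and work out how to extract the information we need: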
from bs4 import BeautifulSoup
soup=BeautifulSoup(web_page,'html.parser')
print(soup.find(id='seven-day-forecast-container').prettify())
<div id="seven-day-forecast-container">
<ul class="list-unstyled" id="seven-day-forecast-list">
<li class="forecast-tombstone">
<div class="tombstone-container">
<p class="period-name">
Today
<br/>
<br/>
</p>
<p>
<img alt="Today: Sunny, with a high near 74. Light and variable wind becoming north northeast 5 to 7 mph in the morning. " class="forecast-icon" src="newimages/medium/skc.png" title="Today: Sunny, with a high near 74. Light and variable wind becoming north northeast 5 to 7 mph in the morning. "/>
</p>
<p class="short-desc">
Sunny
</p>
<p class="temp temp-high">
High: 74 °F
</p>
</div>
</li>
<li class="forecast-tombstone">
<div class="tombstone-container">
<p class="period-name">
Tonight
<br/>
<br/>
</p>
<p>
<img alt="Tonight: Clear, with a low around 52. North wind around 6 mph. " class="forecast-icon" src="newimages/medium/nskc.png" title="Tonight: Clear, with a low around 52. North wind around 6 mph. "/>
</p>
<p class="short-desc">
Clear
</p>
<p class="temp temp-low">
Low: 52 °F
</p>
</div>
</li>
<li class="forecast-tombstone">
<div class="tombstone-container">
<p class="period-name">
Thursday
<br/>
<br/>
</p>
<p>
<img alt="Thursday: Sunny, with a high near 73. Light and variable wind becoming west 5 to 10 mph in the afternoon. " class="forecast-icon" src="newimages/medium/skc.png" title="Thursday: Sunny, with a high near 73. Light and variable wind becoming west 5 to 10 mph in the afternoon. "/>
</p>
<p class="short-desc">
Sunny
</p>
<p class="temp temp-high">
High: 73 °F
</p>
</div>
</li>
<li class="forecast-tombstone">
<div class="tombstone-container">
<p class="period-name">
Thursday
<br/>
Night
</p>
<p>
<img alt="Thursday Night: Clear, with a low around 51. West southwest wind 6 to 11 mph. " class="forecast-icon" src="newimages/medium/nskc.png" title="Thursday Night: Clear, with a low around 51. West southwest wind 6 to 11 mph. "/>
</p>
<p class="short-desc">
Clear
</p>
<p class="temp temp-low">
Low: 51 °F
</p>
</div>
</li>
<li class="forecast-tombstone">
<div class="tombstone-container">
<p class="period-name">
Friday
<br/>
<br/>
</p>
<p>
<img alt="Friday: Sunny, with a high near 68. Light west northwest wind becoming west 9 to 14 mph in the morning. Winds could gust as high as 18 mph. " class="forecast-icon" src="newimages/medium/few.png" title="Friday: Sunny, with a high near 68. Light west northwest wind becoming west 9 to 14 mph in the morning. Winds could gust as high as 18 mph. "/>
</p>
<p class="short-desc">
Sunny
</p>
<p class="temp temp-high">
High: 68 °F
</p>
</div>
</li>
<li class="forecast-tombstone">
<div class="tombstone-container">
<p class="period-name">
Friday
<br/>
Night
</p>
<p>
<img alt="Friday Night: Mostly clear, with a low around 50." class="forecast-icon" src="newimages/medium/nfew.png" title="Friday Night: Mostly clear, with a low around 50."/>
</p>
<p class="short-desc">
Mostly Clear
</p>
<p class="temp temp-low">
Low: 50 °F
</p>
</div>
</li>
<li class="forecast-tombstone">
<div class="tombstone-container">
<p class="period-name">
Saturday
<br/>
<br/>
</p>
<p>
<img alt="Saturday: Sunny, with a high near 64." class="forecast-icon" src="newimages/medium/few.png" title="Saturday: Sunny, with a high near 64."/>
</p>
<p class="short-desc">
Sunny
</p>
<p class="temp temp-high">
High: 64 °F
</p>
</div>
</li>
<li class="forecast-tombstone">
<div class="tombstone-container">
<p class="period-name">
Saturday
<br/>
Night
</p>
<p>
<img alt="Saturday Night: Mostly clear, with a low around 50." class="forecast-icon" src="newimages/medium/nfew.png" title="Saturday Night: Mostly clear, with a low around 50."/>
</p>
<p class="short-desc">
Mostly Clear
</p>
<p class="temp temp-low">
Low: 50 °F
</p>
</div>
</li>
<li class="forecast-tombstone">
<div class="tombstone-container">
<p class="period-name">
Sunday
<br/>
<br/>
</p>
<p>
<img alt="Sunday: Sunny, with a high near 66." class="forecast-icon" src="newimages/medium/few.png" title="Sunday: Sunny, with a high near 66."/>
</p>
<p class="short-desc">
Sunny
</p>
<p class="temp temp-high">
High: 66 °F
</p>
</div>
</li>
</ul>
</div>
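From the HTML above, the information we need sits in elements with three classes: period-name, short-desc and temp. The temperature paragraphs actually carry two classes (temp temp-high or temp temp-low), but find_all(class_='temp') still matches them, because BeautifulSoup compares the search value against each class in a multi-valued class attribute.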
soup_forecast = soup.find(id='seven-day-forecast-container')
soup_forecast.find_all(class_='period-name')
[<p class="period-name">Today<br/><br/></p>,
<p class="period-name">Tonight<br/><br/></p>,
<p class="period-name">Thursday<br/><br/></p>,
<p class="period-name">Thursday<br/>Night</p>,
<p class="period-name">Friday<br/><br/></p>,
<p class="period-name">Friday<br/>Night</p>,
<p class="period-name">Saturday<br/><br/></p>,
<p class="period-name">Saturday<br/>Night</p>,
<p class="period-name">Sunday<br/><br/></p>]
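To see that class_='temp' really picks up both temp temp-high and temp temp-low, here is a small sketch on two forecast items trimmed from the output above. It also writes the same selection as a CSS selector with select(), which, as far as I know, relies on the soupsieve package that recent beautifulsoup4 releases install as a dependency:

```python
from bs4 import BeautifulSoup

# Two forecast items trimmed from the prettified output above.
html = """
<ul id="seven-day-forecast-list">
<li class="forecast-tombstone"><p class="period-name">Today<br/><br/></p>
<p class="short-desc">Sunny</p><p class="temp temp-high">High: 74 °F</p></li>
<li class="forecast-tombstone"><p class="period-name">Tonight<br/><br/></p>
<p class="short-desc">Clear</p><p class="temp temp-low">Low: 52 °F</p></li>
</ul>
"""
soup = BeautifulSoup(html, "html.parser")

# class_="temp" matches any tag whose class list contains "temp",
# so both "temp temp-high" and "temp temp-low" are found.
by_find_all = [t.get_text() for t in soup.find_all(class_="temp")]

# The same selection written as a CSS selector.
by_select = [t.get_text() for t in soup.select("p.temp")]

print(by_find_all)  # ['High: 74 °F', 'Low: 52 °F']
print(by_select)    # ['High: 74 °F', 'Low: 52 °F']
```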
3. Finally, print all the information we need
soup_forecast=soup.find(id='seven-day-forecast-container')
date_list=soup_forecast.find_all(class_='period-name')
desc_list=soup_forecast.find_all(class_='short-desc')
temp_list=soup_forecast.find_all(class_='temp')
# zip() pairs the three lists item by item, so the loop does not depend on a hard-coded count of 9
for date_tag, desc_tag, temp_tag in zip(date_list, desc_list, temp_list):
    date=date_tag.get_text()
    desc=desc_tag.get_text()
    temp=temp_tag.get_text()
    print("{} {} {}".format(date,desc,temp))
Today Sunny High: 74 °F
Tonight Clear Low: 52 °F
Thursday Sunny High: 73 °F
ThursdayNight Clear Low: 51 °F
Friday Sunny High: 68 °F
FridayNight Mostly Clear Low: 50 °F
Saturday Sunny High: 64 °F
SaturdayNight Mostly Clear Low: 50 °F
Sunday Sunny High: 66 °F
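Two details in that output could be tidied: "ThursdayNight" loses its space because the br tag contributes no text, and the three lists are matched up only by position. A sketch of an alternative (parse_forecast is a hypothetical helper, not the course's code) that walks each forecast-tombstone item, uses get_text(' ', strip=True) to put a space where the br was, and also reads the longer forecast from the icon's alt attribute:

```python
from bs4 import BeautifulSoup

# Two forecast items trimmed from the prettified output above.
html = """
<div id="seven-day-forecast-container"><ul class="list-unstyled" id="seven-day-forecast-list">
<li class="forecast-tombstone"><div class="tombstone-container">
<p class="period-name">Thursday<br/><br/></p>
<p><img alt="Thursday: Sunny, with a high near 73." class="forecast-icon" src="newimages/medium/skc.png"/></p>
<p class="short-desc">Sunny</p><p class="temp temp-high">High: 73 °F</p></div></li>
<li class="forecast-tombstone"><div class="tombstone-container">
<p class="period-name">Thursday<br/>Night</p>
<p><img alt="Thursday Night: Clear, with a low around 51." class="forecast-icon" src="newimages/medium/nskc.png"/></p>
<p class="short-desc">Clear</p><p class="temp temp-low">Low: 51 °F</p></div></li>
</ul></div>
"""


def parse_forecast(container):
    """Return one dict per forecast item (hypothetical helper)."""
    rows = []
    for tombstone in container.find_all(class_="forecast-tombstone"):
        rows.append({
            # separator " " puts a space where <br/> split the text;
            # strip=True trims each piece and drops empty ones.
            "period": tombstone.find(class_="period-name").get_text(" ", strip=True),
            "desc": tombstone.find(class_="short-desc").get_text(strip=True),
            "temp": tombstone.find(class_="temp").get_text(strip=True),
            # The icon's alt attribute holds the longer text forecast.
            "detail": tombstone.find("img")["alt"],
        })
    return rows


soup = BeautifulSoup(html, "html.parser")
forecast = parse_forecast(soup.find(id="seven-day-forecast-container"))
for row in forecast:
    print(row["period"], "|", row["desc"], "|", row["temp"])
```

Keeping each item's fields together in one dict means a missing field in one item cannot shift the remaining items out of alignment, which the position-based zip() loop cannot guarantee.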
Full code:
# Import the modules we need: urllib.request and BeautifulSoup
import urllib.request as urlrequest
from bs4 import BeautifulSoup
# Fetch the page we want to scrape with urllib
weather_url='http://forecast.weather.gov/MapClick.php?lat=37.77492773500046&lon=-122.41941932299972'
web_page=urlrequest.urlopen(weather_url).read()
# Parse the page with BeautifulSoup and grab the block we want
soup=BeautifulSoup(web_page,'html.parser')
soup_forecast=soup.find(id='seven-day-forecast-container')
# Find the pieces of content we want
date_list=soup_forecast.find_all(class_='period-name')
desc_list=soup_forecast.find_all(class_='short-desc')
temp_list=soup_forecast.find_all(class_='temp')
# Display the results with a for loop; zip() pairs the three lists item by item
for date_tag, desc_tag, temp_tag in zip(date_list, desc_list, temp_list):
    date=date_tag.get_text()
    desc=desc_tag.get_text()
    temp=temp_tag.get_text()
    print("{} {} {}".format(date,desc,temp))
Today Sunny High: 74 °F
Tonight Clear Low: 52 °F
Thursday Sunny High: 73 °F
ThursdayNight Clear Low: 51 °F
Friday Sunny High: 68 °F
FridayNight Mostly Clear Low: 50 °F
Saturday Sunny High: 64 °F
SaturdayNight Mostly Clear Low: 50 °F
Sunday Sunny High: 66 °F