
DC Academy Data Analysis Study Notes (3): An HTML-Based Web Scraper

Summary: a simple HTML-based web scraper implemented with BeautifulSoup

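Finally I get to practice HTML scraping with Python. I had picked up bits and pieces before, and this time I hope that studying at DC Academy will gradually deepen my understanding of the theory behind web crawlers.
OK, on to today's data analysis study notes!

Hope I get something out of it ( ̄︶ ̄)↗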

from bs4 import BeautifulSoup

html_doc = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title"><b>The Dormouse's story</b></p>

<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>

<p class="story">...</p>
"""

Example: parsing an HTML document with BeautifulSoup

soup = BeautifulSoup(html_doc,'html.parser') 

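Here html_doc is the variable defined in the code above that holds the HTML text, and 'html.parser' names the parser used to parse the page (the one built into Python's standard library). The general form for parsing an HTML document with BeautifulSoup is therefore soup = BeautifulSoup(html_text, 'html.parser').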

Print the page with soup.prettify()

print(soup.prettify()) 
<html>
 <head>
  <title>
   The Dormouse's story
  </title>
 </head>
 <body>
  <p class="title">
   <b>
    The Dormouse's story
   </b>
  </p>
  <p class="story">
   Once upon a time there were three little sisters; and their names were
   <a class="sister" href="http://example.com/elsie" id="link1">
    Elsie
   </a>
   ,
   <a class="sister" href="http://example.com/lacie" id="link2">
    Lacie
   </a>
   and
   <a class="sister" href="http://example.com/tillie" id="link3">
    Tillie
   </a>
   ; and they lived at the bottom of a well.
  </p>
  <p class="story">
   ...
  </p>
 </body>
</html>

Some basic BeautifulSoup operations for navigating a parsed page

soup.title
<title>The Dormouse's story</title>
soup.title.name
'title'
soup.title.string
"The Dormouse's story"
soup.find_all("a")
[<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>,
 <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>,
 <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]
soup.find(id="link3")
<a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>
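
The operations above return whole tags. As a minimal sketch (not part of the DC Academy material), the following shows how the link text and the href attribute could be pulled out of each tag, and how a failed lookup behaves. The shortened html_doc and the id "link4" are assumptions made only for this illustration.

```python
from bs4 import BeautifulSoup

# A shortened copy of the sample document used above.
html_doc = """
<p class="story">
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>
</p>
"""

soup = BeautifulSoup(html_doc, "html.parser")

# Collect (link text, href) pairs from every <a> tag.
# Attributes of a tag are read like dictionary keys.
links = [(a.get_text(), a["href"]) for a in soup.find_all("a")]
for text, href in links:
    print(text, href)

# find() returns None when nothing matches, so check before using the result.
# "link4" is a hypothetical id that does not exist in the document.
missing = soup.find(id="link4")
print(missing is None)
```

Checking for None matters later on: if the page layout changes and an id disappears, calling get_text() on the result of find() would raise an AttributeError.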

Scraping weather data from the "NATIONAL WEATHER" site

The example provided by DC Academy is the San Francisco weather page, at:
http://forecast.weather.gov/MapClick.php?lat=37.77492773500046&lon=-122.41941932299972#.WUnSFhN95E4

Tip: you can use the developer tools in your browser to inspect the page source.

1. Fetch the page content with urllib.request

import urllib.request as urlrequest
weather_url='http://forecast.weather.gov/MapClick.php?lat=37.7771&lon=-122.4196'
web_page=urlrequest.urlopen(weather_url).read()
# print(web_page)  # far too much output to show here, so it is omitted
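
The call above assumes the request always succeeds. As a hedged sketch of a slightly more defensive variant, the helper below adds a timeout and basic error handling. The function names build_request and fetch_page, the 10-second timeout, and the User-Agent string are all assumptions made for this illustration and do not come from the course.

```python
import urllib.request as urlrequest

weather_url = "http://forecast.weather.gov/MapClick.php?lat=37.7771&lon=-122.4196"


def build_request(url):
    # The User-Agent value is a made-up placeholder for this example.
    return urlrequest.Request(url, headers={"User-Agent": "weather-notes-example/0.1"})


def fetch_page(url, timeout=10):
    """Return the raw bytes of the page, or None if the request fails."""
    try:
        with urlrequest.urlopen(build_request(url), timeout=timeout) as response:
            return response.read()
    except OSError as err:
        # urllib.error.URLError and socket timeouts are both OSError subclasses.
        print("request failed:", err)
        return None
```

With this in place, web_page = fetch_page(weather_url) could stand in for the direct urlopen call, followed by a check that web_page is not None before parsing.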

2. Extract the weather information from the page with BeautifulSoup

from bs4 import BeautifulSoup
soup=BeautifulSoup(web_page,'html.parser')
print(soup.find(id='seven-day-forecast-body').get_text())



Today
SunnyHigh: 74 °F

Tonight
ClearLow: 52 °F

Thursday
SunnyHigh: 73 °F

ThursdayNight
ClearLow: 51 °F

Friday
SunnyHigh: 68 °F

FridayNight
Mostly ClearLow: 50 °F

Saturday
SunnyHigh: 64 °F

SaturdayNight
Mostly ClearLow: 50 °F

Sunday
SunnyHigh: 66 °F

// equalize forecast heights
$(function () {
    var maxh = 0;
    $(".forecast-tombstone .short-desc").each(function () {
        var h = $(this).height();
        if (h > maxh) { maxh = h; }
    });
    $(".forecast-tombstone .short-desc").height(maxh);
});
 

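The first part of the printed output looks perfect, but some JS code has been tacked on at the end. So how do we get rid of it?

Let's print the whole div again: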

from bs4 import BeautifulSoup
soup=BeautifulSoup(web_page,'html.parser')
print(soup.find(id='seven-day-forecast-body'))
<div class="panel-body" id="seven-day-forecast-body">
<div id="seven-day-forecast-container"><ul class="list-unstyled" id="seven-day-forecast-list"><li class="forecast-tombstone">
<div class="tombstone-container">
<p class="period-name">Today<br/><br/></p>
<p><img alt="Today: Sunny, with a high near 74. Light and variable wind becoming north northeast 5 to 7 mph in the morning. " class="forecast-icon" src="newimages/medium/skc.png" title="Today: Sunny, with a high near 74. Light and variable wind becoming north northeast 5 to 7 mph in the morning. "/></p><p class="short-desc">Sunny</p><p class="temp temp-high">High: 74 °F</p></div></li><li class="forecast-tombstone">
<div class="tombstone-container">
<p class="period-name">Tonight<br/><br/></p>
<p><img alt="Tonight: Clear, with a low around 52. North wind around 6 mph. " class="forecast-icon" src="newimages/medium/nskc.png" title="Tonight: Clear, with a low around 52. North wind around 6 mph. "/></p><p class="short-desc">Clear</p><p class="temp temp-low">Low: 52 °F</p></div></li><li class="forecast-tombstone">
<div class="tombstone-container">
<p class="period-name">Thursday<br/><br/></p>
<p><img alt="Thursday: Sunny, with a high near 73. Light and variable wind becoming west 5 to 10 mph in the afternoon. " class="forecast-icon" src="newimages/medium/skc.png" title="Thursday: Sunny, with a high near 73. Light and variable wind becoming west 5 to 10 mph in the afternoon. "/></p><p class="short-desc">Sunny</p><p class="temp temp-high">High: 73 °F</p></div></li><li class="forecast-tombstone">
<div class="tombstone-container">
<p class="period-name">Thursday<br/>Night</p>
<p><img alt="Thursday Night: Clear, with a low around 51. West southwest wind 6 to 11 mph. " class="forecast-icon" src="newimages/medium/nskc.png" title="Thursday Night: Clear, with a low around 51. West southwest wind 6 to 11 mph. "/></p><p class="short-desc">Clear</p><p class="temp temp-low">Low: 51 °F</p></div></li><li class="forecast-tombstone">
<div class="tombstone-container">
<p class="period-name">Friday<br/><br/></p>
<p><img alt="Friday: Sunny, with a high near 68. Light west northwest wind becoming west 9 to 14 mph in the morning. Winds could gust as high as 18 mph. " class="forecast-icon" src="newimages/medium/few.png" title="Friday: Sunny, with a high near 68. Light west northwest wind becoming west 9 to 14 mph in the morning. Winds could gust as high as 18 mph. "/></p><p class="short-desc">Sunny</p><p class="temp temp-high">High: 68 °F</p></div></li><li class="forecast-tombstone">
<div class="tombstone-container">
<p class="period-name">Friday<br/>Night</p>
<p><img alt="Friday Night: Mostly clear, with a low around 50." class="forecast-icon" src="newimages/medium/nfew.png" title="Friday Night: Mostly clear, with a low around 50."/></p><p class="short-desc">Mostly Clear</p><p class="temp temp-low">Low: 50 °F</p></div></li><li class="forecast-tombstone">
<div class="tombstone-container">
<p class="period-name">Saturday<br/><br/></p>
<p><img alt="Saturday: Sunny, with a high near 64." class="forecast-icon" src="newimages/medium/few.png" title="Saturday: Sunny, with a high near 64."/></p><p class="short-desc">Sunny</p><p class="temp temp-high">High: 64 °F</p></div></li><li class="forecast-tombstone">
<div class="tombstone-container">
<p class="period-name">Saturday<br/>Night</p>
<p><img alt="Saturday Night: Mostly clear, with a low around 50." class="forecast-icon" src="newimages/medium/nfew.png" title="Saturday Night: Mostly clear, with a low around 50."/></p><p class="short-desc">Mostly Clear</p><p class="temp temp-low">Low: 50 °F</p></div></li><li class="forecast-tombstone">
<div class="tombstone-container">
<p class="period-name">Sunday<br/><br/></p>
<p><img alt="Sunday: Sunny, with a high near 66." class="forecast-icon" src="newimages/medium/few.png" title="Sunday: Sunny, with a high near 66."/></p><p class="short-desc">Sunny</p><p class="temp temp-high">High: 66 °F</p></div></li></ul></div>
<script type="text/javascript">
// equalize forecast heights
$(function () {
    var maxh = 0;
    $(".forecast-tombstone .short-desc").each(function () {
        var h = $(this).height();
        if (h > maxh) { maxh = h; }
    });
    $(".forecast-tombstone .short-desc").height(maxh);
});
</script> </div>

Looking at the end of the output above, the extra JS code sits inside the outermost div, that is, div class="panel-body" id="seven-day-forecast-body", whereas div id="seven-day-forecast-container" does not contain the unwanted script. So the fix is simple: change id="seven-day-forecast-body" to id="seven-day-forecast-container".

from bs4 import BeautifulSoup
soup=BeautifulSoup(web_page,'html.parser')
print(soup.find(id='seven-day-forecast-container'))
<div id="seven-day-forecast-container"><ul class="list-unstyled" id="seven-day-forecast-list"><li class="forecast-tombstone">
<div class="tombstone-container">
<p class="period-name">Today<br/><br/></p>
<p><img alt="Today: Sunny, with a high near 74. Light and variable wind becoming north northeast 5 to 7 mph in the morning. " class="forecast-icon" src="newimages/medium/skc.png" title="Today: Sunny, with a high near 74. Light and variable wind becoming north northeast 5 to 7 mph in the morning. "/></p><p class="short-desc">Sunny</p><p class="temp temp-high">High: 74 °F</p></div></li><li class="forecast-tombstone">
<div class="tombstone-container">
<p class="period-name">Tonight<br/><br/></p>
<p><img alt="Tonight: Clear, with a low around 52. North wind around 6 mph. " class="forecast-icon" src="newimages/medium/nskc.png" title="Tonight: Clear, with a low around 52. North wind around 6 mph. "/></p><p class="short-desc">Clear</p><p class="temp temp-low">Low: 52 °F</p></div></li><li class="forecast-tombstone">
<div class="tombstone-container">
<p class="period-name">Thursday<br/><br/></p>
<p><img alt="Thursday: Sunny, with a high near 73. Light and variable wind becoming west 5 to 10 mph in the afternoon. " class="forecast-icon" src="newimages/medium/skc.png" title="Thursday: Sunny, with a high near 73. Light and variable wind becoming west 5 to 10 mph in the afternoon. "/></p><p class="short-desc">Sunny</p><p class="temp temp-high">High: 73 °F</p></div></li><li class="forecast-tombstone">
<div class="tombstone-container">
<p class="period-name">Thursday<br/>Night</p>
<p><img alt="Thursday Night: Clear, with a low around 51. West southwest wind 6 to 11 mph. " class="forecast-icon" src="newimages/medium/nskc.png" title="Thursday Night: Clear, with a low around 51. West southwest wind 6 to 11 mph. "/></p><p class="short-desc">Clear</p><p class="temp temp-low">Low: 51 °F</p></div></li><li class="forecast-tombstone">
<div class="tombstone-container">
<p class="period-name">Friday<br/><br/></p>
<p><img alt="Friday: Sunny, with a high near 68. Light west northwest wind becoming west 9 to 14 mph in the morning. Winds could gust as high as 18 mph. " class="forecast-icon" src="newimages/medium/few.png" title="Friday: Sunny, with a high near 68. Light west northwest wind becoming west 9 to 14 mph in the morning. Winds could gust as high as 18 mph. "/></p><p class="short-desc">Sunny</p><p class="temp temp-high">High: 68 °F</p></div></li><li class="forecast-tombstone">
<div class="tombstone-container">
<p class="period-name">Friday<br/>Night</p>
<p><img alt="Friday Night: Mostly clear, with a low around 50." class="forecast-icon" src="newimages/medium/nfew.png" title="Friday Night: Mostly clear, with a low around 50."/></p><p class="short-desc">Mostly Clear</p><p class="temp temp-low">Low: 50 °F</p></div></li><li class="forecast-tombstone">
<div class="tombstone-container">
<p class="period-name">Saturday<br/><br/></p>
<p><img alt="Saturday: Sunny, with a high near 64." class="forecast-icon" src="newimages/medium/few.png" title="Saturday: Sunny, with a high near 64."/></p><p class="short-desc">Sunny</p><p class="temp temp-high">High: 64 °F</p></div></li><li class="forecast-tombstone">
<div class="tombstone-container">
<p class="period-name">Saturday<br/>Night</p>
<p><img alt="Saturday Night: Mostly clear, with a low around 50." class="forecast-icon" src="newimages/medium/nfew.png" title="Saturday Night: Mostly clear, with a low around 50."/></p><p class="short-desc">Mostly Clear</p><p class="temp temp-low">Low: 50 °F</p></div></li><li class="forecast-tombstone">
<div class="tombstone-container">
<p class="period-name">Sunday<br/><br/></p>
<p><img alt="Sunday: Sunny, with a high near 66." class="forecast-icon" src="newimages/medium/few.png" title="Sunday: Sunny, with a high near 66."/></p><p class="short-desc">Sunny</p><p class="temp temp-high">High: 66 °F</p></div></li></ul></div>

That looks much cleaner: the JS code is finally gone. Now let's run the earlier get_text() call again.

from bs4 import BeautifulSoup
soup=BeautifulSoup(web_page,'html.parser')
print(soup.find(id='seven-day-forecast-container').get_text())


Today
SunnyHigh: 74 °F

Tonight
ClearLow: 52 °F

Thursday
SunnyHigh: 73 °F

ThursdayNight
ClearLow: 51 °F

Friday
SunnyHigh: 68 °F

FridayNight
Mostly ClearLow: 50 °F

Saturday
SunnyHigh: 64 °F

SaturdayNight
Mostly ClearLow: 50 °F

Sunday
SunnyHigh: 66 °F
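
Narrowing the selector works here because the script happens to sit outside the container. As an alternative sketch for pages where that is not the case, the script elements can be removed from the parsed tree with decompose() before calling get_text(). The sample markup below is a trimmed-down stand-in for the real page, not the actual response. Depending on the installed BeautifulSoup version, get_text() on the outer div may already skip script contents, in which case this step is harmless.

```python
from bs4 import BeautifulSoup

# A trimmed-down stand-in for the real page, keeping the same ids and classes.
sample = """
<div class="panel-body" id="seven-day-forecast-body">
<div id="seven-day-forecast-container"><ul id="seven-day-forecast-list">
<li class="forecast-tombstone"><p class="period-name">Today<br/><br/></p>
<p class="short-desc">Sunny</p><p class="temp temp-high">High: 74 °F</p></li>
</ul></div>
<script type="text/javascript">
// equalize forecast heights
var maxh = 0;
</script> </div>
"""

soup = BeautifulSoup(sample, "html.parser")
body = soup.find(id="seven-day-forecast-body")

# Remove every <script> element inside the outer div, in place.
for script in body.find_all("script"):
    script.decompose()

text = body.get_text()
print(text)
```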

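However, the flattened text from get_text() is still awkward to pull individual values out of. Let's pretty-print the HTML with prettify() and see how to extract the information we need.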

from bs4 import BeautifulSoup
soup=BeautifulSoup(web_page,'html.parser')
print(soup.find(id='seven-day-forecast-container').prettify())
<div id="seven-day-forecast-container">
 <ul class="list-unstyled" id="seven-day-forecast-list">
  <li class="forecast-tombstone">
   <div class="tombstone-container">
    <p class="period-name">
     Today
     <br/>
     <br/>
    </p>
    <p>
     <img alt="Today: Sunny, with a high near 74. Light and variable wind becoming north northeast 5 to 7 mph in the morning. " class="forecast-icon" src="newimages/medium/skc.png" title="Today: Sunny, with a high near 74. Light and variable wind becoming north northeast 5 to 7 mph in the morning. "/>
    </p>
    <p class="short-desc">
     Sunny
    </p>
    <p class="temp temp-high">
     High: 74 °F
    </p>
   </div>
  </li>
  <li class="forecast-tombstone">
   <div class="tombstone-container">
    <p class="period-name">
     Tonight
     <br/>
     <br/>
    </p>
    <p>
     <img alt="Tonight: Clear, with a low around 52. North wind around 6 mph. " class="forecast-icon" src="newimages/medium/nskc.png" title="Tonight: Clear, with a low around 52. North wind around 6 mph. "/>
    </p>
    <p class="short-desc">
     Clear
    </p>
    <p class="temp temp-low">
     Low: 52 °F
    </p>
   </div>
  </li>
  <li class="forecast-tombstone">
   <div class="tombstone-container">
    <p class="period-name">
     Thursday
     <br/>
     <br/>
    </p>
    <p>
     <img alt="Thursday: Sunny, with a high near 73. Light and variable wind becoming west 5 to 10 mph in the afternoon. " class="forecast-icon" src="newimages/medium/skc.png" title="Thursday: Sunny, with a high near 73. Light and variable wind becoming west 5 to 10 mph in the afternoon. "/>
    </p>
    <p class="short-desc">
     Sunny
    </p>
    <p class="temp temp-high">
     High: 73 °F
    </p>
   </div>
  </li>
  <li class="forecast-tombstone">
   <div class="tombstone-container">
    <p class="period-name">
     Thursday
     <br/>
     Night
    </p>
    <p>
     <img alt="Thursday Night: Clear, with a low around 51. West southwest wind 6 to 11 mph. " class="forecast-icon" src="newimages/medium/nskc.png" title="Thursday Night: Clear, with a low around 51. West southwest wind 6 to 11 mph. "/>
    </p>
    <p class="short-desc">
     Clear
    </p>
    <p class="temp temp-low">
     Low: 51 °F
    </p>
   </div>
  </li>
  <li class="forecast-tombstone">
   <div class="tombstone-container">
    <p class="period-name">
     Friday
     <br/>
     <br/>
    </p>
    <p>
     <img alt="Friday: Sunny, with a high near 68. Light west northwest wind becoming west 9 to 14 mph in the morning. Winds could gust as high as 18 mph. " class="forecast-icon" src="newimages/medium/few.png" title="Friday: Sunny, with a high near 68. Light west northwest wind becoming west 9 to 14 mph in the morning. Winds could gust as high as 18 mph. "/>
    </p>
    <p class="short-desc">
     Sunny
    </p>
    <p class="temp temp-high">
     High: 68 °F
    </p>
   </div>
  </li>
  <li class="forecast-tombstone">
   <div class="tombstone-container">
    <p class="period-name">
     Friday
     <br/>
     Night
    </p>
    <p>
     <img alt="Friday Night: Mostly clear, with a low around 50." class="forecast-icon" src="newimages/medium/nfew.png" title="Friday Night: Mostly clear, with a low around 50."/>
    </p>
    <p class="short-desc">
     Mostly Clear
    </p>
    <p class="temp temp-low">
     Low: 50 °F
    </p>
   </div>
  </li>
  <li class="forecast-tombstone">
   <div class="tombstone-container">
    <p class="period-name">
     Saturday
     <br/>
     <br/>
    </p>
    <p>
     <img alt="Saturday: Sunny, with a high near 64." class="forecast-icon" src="newimages/medium/few.png" title="Saturday: Sunny, with a high near 64."/>
    </p>
    <p class="short-desc">
     Sunny
    </p>
    <p class="temp temp-high">
     High: 64 °F
    </p>
   </div>
  </li>
  <li class="forecast-tombstone">
   <div class="tombstone-container">
    <p class="period-name">
     Saturday
     <br/>
     Night
    </p>
    <p>
     <img alt="Saturday Night: Mostly clear, with a low around 50." class="forecast-icon" src="newimages/medium/nfew.png" title="Saturday Night: Mostly clear, with a low around 50."/>
    </p>
    <p class="short-desc">
     Mostly Clear
    </p>
    <p class="temp temp-low">
     Low: 50 °F
    </p>
   </div>
  </li>
  <li class="forecast-tombstone">
   <div class="tombstone-container">
    <p class="period-name">
     Sunday
     <br/>
     <br/>
    </p>
    <p>
     <img alt="Sunday: Sunny, with a high near 66." class="forecast-icon" src="newimages/medium/few.png" title="Sunday: Sunny, with a high near 66."/>
    </p>
    <p class="short-desc">
     Sunny
    </p>
    <p class="temp temp-high">
     High: 66 °F
    </p>
   </div>
  </li>
 </ul>
</div>


From the HTML above, we can see that the information we need corresponds to three classes: period-name, short-desc, and temp.

soup_forecast = soup.find(id='seven-day-forecast-container')
soup_forecast.find_all(class_='period-name')
[<p class="period-name">Today<br/><br/></p>,
 <p class="period-name">Tonight<br/><br/></p>,
 <p class="period-name">Thursday<br/><br/></p>,
 <p class="period-name">Thursday<br/>Night</p>,
 <p class="period-name">Friday<br/><br/></p>,
 <p class="period-name">Friday<br/>Night</p>,
 <p class="period-name">Saturday<br/><br/></p>,
 <p class="period-name">Saturday<br/>Night</p>,
 <p class="period-name">Sunday<br/><br/></p>]
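
The HTML printed earlier also carries a longer description in the title attribute of each forecast-icon image. As a small sketch of reading it (the notes themselves only use the three text classes), the sample below reuses one list item copied from that output.

```python
from bs4 import BeautifulSoup

# One forecast entry copied from the page output shown earlier.
sample = """
<div id="seven-day-forecast-container"><ul>
<li class="forecast-tombstone"><div class="tombstone-container">
<p class="period-name">Tonight<br/><br/></p>
<p><img alt="Tonight: Clear, with a low around 52. North wind around 6 mph. " class="forecast-icon" src="newimages/medium/nskc.png" title="Tonight: Clear, with a low around 52. North wind around 6 mph. "/></p>
<p class="short-desc">Clear</p><p class="temp temp-low">Low: 52 °F</p></div></li>
</ul></div>
"""

soup_forecast = BeautifulSoup(sample, "html.parser").find(id="seven-day-forecast-container")

# Tag attributes are read like dictionary keys; strip() drops the trailing space.
details = [
    img["title"].strip()
    for img in soup_forecast.find_all("img", class_="forecast-icon")
]
print(details)
```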

3. Finally, print all the information we need

soup_forecast=soup.find(id='seven-day-forecast-container')
date_list=soup_forecast.find_all(class_='period-name')
desc_list=soup_forecast.find_all(class_='short-desc')
temp_list=soup_forecast.find_all(class_='temp')
for i in range(9):
    date=date_list[i].get_text()
    desc=desc_list[i].get_text()
    temp=temp_list[i].get_text()
    print("{} {} {}".format(date,desc,temp))
Today Sunny High: 74 °F
Tonight Clear Low: 52 °F
Thursday Sunny High: 73 °F
ThursdayNight Clear Low: 51 °F
Friday Sunny High: 68 °F
FridayNight Mostly Clear Low: 50 °F
Saturday Sunny High: 64 °F
SaturdayNight Mostly Clear Low: 50 °F
Sunday Sunny High: 66 °F
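
Two details of the loop above can be tightened. The output shows "ThursdayNight" run together because the br tag between the two words contributes no whitespace, and range(9) hard-codes the number of forecast periods. The following is a sketch of one possible variant, assuming a two-entry sample in place of the live page; the helper name texts is made up for this example.

```python
from bs4 import BeautifulSoup

# Two forecast entries in the same shape as the real page.
sample = """
<div id="seven-day-forecast-container"><ul>
<li><p class="period-name">Thursday<br/><br/></p>
<p class="short-desc">Sunny</p><p class="temp temp-high">High: 73 °F</p></li>
<li><p class="period-name">Thursday<br/>Night</p>
<p class="short-desc">Clear</p><p class="temp temp-low">Low: 51 °F</p></li>
</ul></div>
"""

soup_forecast = BeautifulSoup(sample, "html.parser").find(id="seven-day-forecast-container")


def texts(class_name):
    # Join text fragments with a space so "Thursday<br/>Night" keeps its gap.
    return [
        tag.get_text(" ", strip=True)
        for tag in soup_forecast.find_all(class_=class_name)
    ]


# zip() stops at the shortest list, so no period count is hard-coded.
rows = list(zip(texts("period-name"), texts("short-desc"), texts("temp")))
for date, desc, temp in rows:
    print("{} {} {}".format(date, desc, temp))
```

Because zip() simply stops at the shortest list, a page that happens to return fewer than nine periods would not raise an IndexError.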

Full code:

# Import the modules we need: urllib.request and BeautifulSoup
import urllib.request as urlrequest
from bs4 import BeautifulSoup

# Use urllib to fetch the page we want to scrape
weather_url='http://forecast.weather.gov/MapClick.php?lat=37.77492773500046&lon=-122.41941932299972'
web_page=urlrequest.urlopen(weather_url).read()

# Use BeautifulSoup to parse the page and grab the block we are interested in
soup=BeautifulSoup(web_page,'html.parser')
soup_forecast=soup.find(id='seven-day-forecast-container')

# Find the pieces of content we want
date_list=soup_forecast.find_all(class_='period-name')
desc_list=soup_forecast.find_all(class_='short-desc')
temp_list=soup_forecast.find_all(class_='temp')

# Print the extracted content in a readable form using a for loop
for i in range(9):
    date=date_list[i].get_text()
    desc=desc_list[i].get_text()
    temp=temp_list[i].get_text()
    print("{} {} {}".format(date,desc,temp))
Today Sunny High: 74 °F
Tonight Clear Low: 52 °F
Thursday Sunny High: 73 °F
ThursdayNight Clear Low: 51 °F
Friday Sunny High: 68 °F
FridayNight Mostly Clear Low: 50 °F
Saturday Sunny High: 64 °F
SaturdayNight Mostly Clear Low: 50 °F
Sunday Sunny High: 66 °F

kissjz
Keep It Simple, Stupid. Personal blog: 白水东城 (www.baishuidongcheng.com)