DC Academy Data Analysis Study Notes (3): An HTML-Based Web Crawler

Summary: a simple web crawler for HTML pages, implemented with BeautifulSoup

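I finally get to practice HTML scraping with Python. I had picked up bits and pieces before; this time I hope the DC Academy course will gradually give me a deeper understanding of the theory behind web crawlers.
OK, on to today's data analysis study notes!

Hopefully I will learn something along the way ( ̄︶ ̄)↗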

from bs4 import BeautifulSoup

html_doc = """ <html><head><title>The Dormouse's story</title></head> <body> <p class="title"><b>The Dormouse's story</b></p> <p class="story">Once upon a time there were three little sisters; and their names were <a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>, <a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and <a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>; and they lived at the bottom of a well.</p> <p class="story">...</p> """

Example: parsing an HTML document with BeautifulSoup

soup = BeautifulSoup(html_doc,'html.parser') 

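"html_doc" is the variable that holds the HTML document string defined in the code above, and 'html.parser' is the name of the parser used to parse the page (Python's built-in HTML parser). The general form for parsing an HTML document with BeautifulSoup is therefore soup=BeautifulSoup(markup,'html.parser'), where markup is the HTML text itself.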
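As a quick check of the general form described above, here is a minimal sketch. The tiny_html string and its contents are made up for illustration, and the snippet assumes the bs4 package is installed. It shows that the first argument is the markup (either str or bytes), not a file name, and that the second argument names the parser.

```python
from bs4 import BeautifulSoup

# A made-up document, used only for this illustration.
tiny_html = "<html><head><title>Demo page</title></head><body><p>Hello</p></body></html>"

# First argument: the markup (str or bytes). Second argument: the parser name.
soup_from_str = BeautifulSoup(tiny_html, "html.parser")
soup_from_bytes = BeautifulSoup(tiny_html.encode("utf-8"), "html.parser")

print(soup_from_str.title.string)    # text inside <title>
print(soup_from_bytes.p.get_text())  # text inside the first <p>
```

Accepting bytes matters later in these notes, because urlopen(...).read() returns bytes rather than str.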

Print the page with soup.prettify()

print(soup.prettify()) 
<html>
 <head>
  <title>
   The Dormouse's story
  </title>
 </head>
 <body>
  <p class="title">
   <b>
    The Dormouse's story
   </b>
  </p>
  <p class="story">
   Once upon a time there were three little sisters; and their names were
   <a class="sister" href="http://example.com/elsie" id="link1">
    Elsie
   </a>
   ,
   <a class="sister" href="http://example.com/lacie" id="link2">
    Lacie
   </a>
   and
   <a class="sister" href="http://example.com/tillie" id="link3">
    Tillie
   </a>
   ; and they lived at the bottom of a well.
  </p>
  <p class="story">
   ...
  </p>
 </body>
</html>

Some basic BeautifulSoup operations on the parsed page

soup.title
<title>The Dormouse's story</title>
soup.title.name
'title'
soup.title.string
"The Dormouse's story"
soup.find_all("a")
[<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>,
 <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>,
 <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]
soup.find(id="link3")
<a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>
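The objects returned by find and find_all are tags, and in practice we usually want their attributes or text rather than the tag itself. The sketch below uses a shortened stand-in for html_doc (the snippet variable is introduced here only so the example runs on its own) and reads the link text, href, id and class values.

```python
from bs4 import BeautifulSoup

# A shortened stand-in for html_doc, so the snippet runs on its own.
snippet = """
<p class="story">
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a>
</p>
"""
demo_soup = BeautifulSoup(snippet, "html.parser")

# Each result of find_all is a Tag; attributes are read like dictionary keys.
links = [(a.get_text(), a["href"]) for a in demo_soup.find_all("a")]
print(links)

# .get() returns None instead of raising KeyError when an attribute is absent.
first = demo_soup.find("a")
print(first.get("id"), first.get("title"))

# class is a multi-valued attribute, so it comes back as a list.
print(first["class"])
```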

Scraping weather data from the "NATIONAL WEATHER" site (forecast.weather.gov)

The example used in the DC Academy course is the San Francisco weather page:
http://forecast.weather.gov/MapClick.php?lat=37.77492773500046&lon=-122.41941932299972#.WUnSFhN95E4

Tip: you can use the browser's developer tools to inspect the page source.


1. Fetch the page content with urllib.request

import urllib.request as urlrequest
weather_url='http://forecast.weather.gov/MapClick.php?lat=37.7771&lon=-122.4196'
web_page=urlrequest.urlopen(weather_url).read()
# print(web_page)  # the output is far too long to show here
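The call above has no timeout and no error handling, so a slow or failing server would either hang the script or end it with a traceback. The following is a hedged sketch of a slightly more defensive fetch, not the course's code: the helper names build_request and fetch_page and the USER_AGENT string are assumptions introduced for illustration, while Request, urlopen and its timeout parameter are part of the standard urllib.request module.

```python
import urllib.request

# Hypothetical client identifier; replace it with something that describes your script.
USER_AGENT = "weather-notes-example/0.1"


def build_request(url):
    """Create a Request that carries an explicit User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": USER_AGENT})


def fetch_page(url, timeout=10):
    """Return the raw bytes of the page, or None if the request fails."""
    try:
        with urllib.request.urlopen(build_request(url), timeout=timeout) as response:
            return response.read()
    except OSError as err:
        # URLError, HTTPError and socket timeouts are all subclasses of OSError.
        print("request failed:", err)
        return None


# Usage (needs network access, so it is left commented out):
# web_page = fetch_page('http://forecast.weather.gov/MapClick.php?lat=37.7771&lon=-122.4196')
```

Returning None keeps the sketch short; a larger script might prefer to let the exception propagate so the caller can decide what to do.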

2. Extract the weather information from the page with BeautifulSoup

from bs4 import BeautifulSoup
soup=BeautifulSoup(web_page,'html.parser')
print(soup.find(id='seven-day-forecast-body').get_text())



Today
SunnyHigh: 74 °F

Tonight
ClearLow: 52 °F

Thursday
SunnyHigh: 73 °F

ThursdayNight
ClearLow: 51 °F

Friday
SunnyHigh: 68 °F

FridayNight
Mostly ClearLow: 50 °F

Saturday
SunnyHigh: 64 °F

SaturdayNight
Mostly ClearLow: 50 °F

Sunday
SunnyHigh: 66 °F

// equalize forecast heights
$(function () {
    var maxh = 0;
    $(".forecast-tombstone .short-desc").each(function () {
        var h = $(this).height();
        if (h > maxh) { maxh = h; }
    });
    $(".forecast-tombstone .short-desc").height(maxh);
});
 

The first part of the printed output looks perfect, but JavaScript code is appended after it. So how do we get rid of that?

Print the whole div again, this time as HTML rather than text:

from bs4 import BeautifulSoup
soup=BeautifulSoup(web_page,'html.parser')
print(soup.find(id='seven-day-forecast-body'))
<div class="panel-body" id="seven-day-forecast-body">
<div id="seven-day-forecast-container"><ul class="list-unstyled" id="seven-day-forecast-list"><li class="forecast-tombstone">
<div class="tombstone-container">
<p class="period-name">Today<br/><br/></p>
<p><img alt="Today: Sunny, with a high near 74. Light and variable wind becoming north northeast 5 to 7 mph in the morning. " class="forecast-icon" src="newimages/medium/skc.png" title="Today: Sunny, with a high near 74. Light and variable wind becoming north northeast 5 to 7 mph in the morning. "/></p><p class="short-desc">Sunny</p><p class="temp temp-high">High: 74 °F</p></div></li><li class="forecast-tombstone">
<div class="tombstone-container">
<p class="period-name">Tonight<br/><br/></p>
<p><img alt="Tonight: Clear, with a low around 52. North wind around 6 mph. " class="forecast-icon" src="newimages/medium/nskc.png" title="Tonight: Clear, with a low around 52. North wind around 6 mph. "/></p><p class="short-desc">Clear</p><p class="temp temp-low">Low: 52 °F</p></div></li><li class="forecast-tombstone">
<div class="tombstone-container">
<p class="period-name">Thursday<br/><br/></p>
<p><img alt="Thursday: Sunny, with a high near 73. Light and variable wind becoming west 5 to 10 mph in the afternoon. " class="forecast-icon" src="newimages/medium/skc.png" title="Thursday: Sunny, with a high near 73. Light and variable wind becoming west 5 to 10 mph in the afternoon. "/></p><p class="short-desc">Sunny</p><p class="temp temp-high">High: 73 °F</p></div></li><li class="forecast-tombstone">
<div class="tombstone-container">
<p class="period-name">Thursday<br/>Night</p>
<p><img alt="Thursday Night: Clear, with a low around 51. West southwest wind 6 to 11 mph. " class="forecast-icon" src="newimages/medium/nskc.png" title="Thursday Night: Clear, with a low around 51. West southwest wind 6 to 11 mph. "/></p><p class="short-desc">Clear</p><p class="temp temp-low">Low: 51 °F</p></div></li><li class="forecast-tombstone">
<div class="tombstone-container">
<p class="period-name">Friday<br/><br/></p>
<p><img alt="Friday: Sunny, with a high near 68. Light west northwest wind becoming west 9 to 14 mph in the morning. Winds could gust as high as 18 mph. " class="forecast-icon" src="newimages/medium/few.png" title="Friday: Sunny, with a high near 68. Light west northwest wind becoming west 9 to 14 mph in the morning. Winds could gust as high as 18 mph. "/></p><p class="short-desc">Sunny</p><p class="temp temp-high">High: 68 °F</p></div></li><li class="forecast-tombstone">
<div class="tombstone-container">
<p class="period-name">Friday<br/>Night</p>
<p><img alt="Friday Night: Mostly clear, with a low around 50." class="forecast-icon" src="newimages/medium/nfew.png" title="Friday Night: Mostly clear, with a low around 50."/></p><p class="short-desc">Mostly Clear</p><p class="temp temp-low">Low: 50 °F</p></div></li><li class="forecast-tombstone">
<div class="tombstone-container">
<p class="period-name">Saturday<br/><br/></p>
<p><img alt="Saturday: Sunny, with a high near 64." class="forecast-icon" src="newimages/medium/few.png" title="Saturday: Sunny, with a high near 64."/></p><p class="short-desc">Sunny</p><p class="temp temp-high">High: 64 °F</p></div></li><li class="forecast-tombstone">
<div class="tombstone-container">
<p class="period-name">Saturday<br/>Night</p>
<p><img alt="Saturday Night: Mostly clear, with a low around 50." class="forecast-icon" src="newimages/medium/nfew.png" title="Saturday Night: Mostly clear, with a low around 50."/></p><p class="short-desc">Mostly Clear</p><p class="temp temp-low">Low: 50 °F</p></div></li><li class="forecast-tombstone">
<div class="tombstone-container">
<p class="period-name">Sunday<br/><br/></p>
<p><img alt="Sunday: Sunny, with a high near 66." class="forecast-icon" src="newimages/medium/few.png" title="Sunday: Sunny, with a high near 66."/></p><p class="short-desc">Sunny</p><p class="temp temp-high">High: 66 °F</p></div></li></ul></div>
<script type="text/javascript">
// equalize forecast heights
$(function () {
    var maxh = 0;
    $(".forecast-tombstone .short-desc").each(function () {
        var h = $(this).height();
        if (h > maxh) { maxh = h; }
    });
    $(".forecast-tombstone .short-desc").height(maxh);
});
</script> </div>

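Looking at the end of the output above, the unwanted JS code sits inside the outermost div, that is, inside div class="panel-body" id="seven-day-forecast-body", while div id="seven-day-forecast-container" does not contain the script we want to drop. That makes the fix easy: change id="seven-day-forecast-body" to id="seven-day-forecast-container".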

from bs4 import BeautifulSoup
soup=BeautifulSoup(web_page,'html.parser')
print(soup.find(id='seven-day-forecast-container'))
<div id="seven-day-forecast-container"><ul class="list-unstyled" id="seven-day-forecast-list"><li class="forecast-tombstone">
<div class="tombstone-container">
<p class="period-name">Today<br/><br/></p>
<p><img alt="Today: Sunny, with a high near 74. Light and variable wind becoming north northeast 5 to 7 mph in the morning. " class="forecast-icon" src="newimages/medium/skc.png" title="Today: Sunny, with a high near 74. Light and variable wind becoming north northeast 5 to 7 mph in the morning. "/></p><p class="short-desc">Sunny</p><p class="temp temp-high">High: 74 °F</p></div></li><li class="forecast-tombstone">
<div class="tombstone-container">
<p class="period-name">Tonight<br/><br/></p>
<p><img alt="Tonight: Clear, with a low around 52. North wind around 6 mph. " class="forecast-icon" src="newimages/medium/nskc.png" title="Tonight: Clear, with a low around 52. North wind around 6 mph. "/></p><p class="short-desc">Clear</p><p class="temp temp-low">Low: 52 °F</p></div></li><li class="forecast-tombstone">
<div class="tombstone-container">
<p class="period-name">Thursday<br/><br/></p>
<p><img alt="Thursday: Sunny, with a high near 73. Light and variable wind becoming west 5 to 10 mph in the afternoon. " class="forecast-icon" src="newimages/medium/skc.png" title="Thursday: Sunny, with a high near 73. Light and variable wind becoming west 5 to 10 mph in the afternoon. "/></p><p class="short-desc">Sunny</p><p class="temp temp-high">High: 73 °F</p></div></li><li class="forecast-tombstone">
<div class="tombstone-container">
<p class="period-name">Thursday<br/>Night</p>
<p><img alt="Thursday Night: Clear, with a low around 51. West southwest wind 6 to 11 mph. " class="forecast-icon" src="newimages/medium/nskc.png" title="Thursday Night: Clear, with a low around 51. West southwest wind 6 to 11 mph. "/></p><p class="short-desc">Clear</p><p class="temp temp-low">Low: 51 °F</p></div></li><li class="forecast-tombstone">
<div class="tombstone-container">
<p class="period-name">Friday<br/><br/></p>
<p><img alt="Friday: Sunny, with a high near 68. Light west northwest wind becoming west 9 to 14 mph in the morning. Winds could gust as high as 18 mph. " class="forecast-icon" src="newimages/medium/few.png" title="Friday: Sunny, with a high near 68. Light west northwest wind becoming west 9 to 14 mph in the morning. Winds could gust as high as 18 mph. "/></p><p class="short-desc">Sunny</p><p class="temp temp-high">High: 68 °F</p></div></li><li class="forecast-tombstone">
<div class="tombstone-container">
<p class="period-name">Friday<br/>Night</p>
<p><img alt="Friday Night: Mostly clear, with a low around 50." class="forecast-icon" src="newimages/medium/nfew.png" title="Friday Night: Mostly clear, with a low around 50."/></p><p class="short-desc">Mostly Clear</p><p class="temp temp-low">Low: 50 °F</p></div></li><li class="forecast-tombstone">
<div class="tombstone-container">
<p class="period-name">Saturday<br/><br/></p>
<p><img alt="Saturday: Sunny, with a high near 64." class="forecast-icon" src="newimages/medium/few.png" title="Saturday: Sunny, with a high near 64."/></p><p class="short-desc">Sunny</p><p class="temp temp-high">High: 64 °F</p></div></li><li class="forecast-tombstone">
<div class="tombstone-container">
<p class="period-name">Saturday<br/>Night</p>
<p><img alt="Saturday Night: Mostly clear, with a low around 50." class="forecast-icon" src="newimages/medium/nfew.png" title="Saturday Night: Mostly clear, with a low around 50."/></p><p class="short-desc">Mostly Clear</p><p class="temp temp-low">Low: 50 °F</p></div></li><li class="forecast-tombstone">
<div class="tombstone-container">
<p class="period-name">Sunday<br/><br/></p>
<p><img alt="Sunday: Sunny, with a high near 66." class="forecast-icon" src="newimages/medium/few.png" title="Sunday: Sunny, with a high near 66."/></p><p class="short-desc">Sunny</p><p class="temp temp-high">High: 66 °F</p></div></li></ul></div>

That looks much cleaner: the JS code is finally gone. Now run the earlier get_text() step again:

from bs4 import BeautifulSoup
soup=BeautifulSoup(web_page,'html.parser')
print(soup.find(id='seven-day-forecast-container').get_text())


Today
SunnyHigh: 74 °F

Tonight
ClearLow: 52 °F

Thursday
SunnyHigh: 73 °F

ThursdayNight
ClearLow: 51 °F

Friday
SunnyHigh: 68 °F

FridayNight
Mostly ClearLow: 50 °F

Saturday
SunnyHigh: 64 °F

SaturdayNight
Mostly ClearLow: 50 °F

Sunday
SunnyHigh: 66 °F
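Narrowing the selector works here because the script happens to sit outside the container div. Another option, shown as a sketch rather than as the course's method, is to delete the script elements from the parsed tree with decompose() before calling get_text(). The sample string below is a cut-down imitation of the page structure printed earlier, not the real page. As far as I can tell from the Beautiful Soup documentation, from version 4.9.0 on get_text() already skips script contents when html.parser or lxml is used, so the stray JS may not show up at all with a recent install; removing the tags explicitly works either way.

```python
from bs4 import BeautifulSoup

# A cut-down imitation of the structure printed earlier (one forecast item plus the script).
sample = """
<div class="panel-body" id="seven-day-forecast-body">
<div id="seven-day-forecast-container"><ul><li class="forecast-tombstone">
<p class="period-name">Today<br/><br/></p>
<p class="short-desc">Sunny</p><p class="temp temp-high">High: 74 °F</p></li></ul></div>
<script type="text/javascript">
$(function () { var maxh = 0; });
</script> </div>
"""
sample_soup = BeautifulSoup(sample, "html.parser")
body = sample_soup.find(id="seven-day-forecast-body")

# Remove every <script> element from the tree in place, then read the text.
for script in body.find_all("script"):
    script.decompose()

text = body.get_text()
print(text)
```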

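The plain text from get_text() is still awkward to extract individual fields from, so pretty-print the HTML with prettify() and look at how to pull out the information we need: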

from bs4 import BeautifulSoup
soup=BeautifulSoup(web_page,'html.parser')
print(soup.find(id='seven-day-forecast-container').prettify())
<div id="seven-day-forecast-container">
 <ul class="list-unstyled" id="seven-day-forecast-list">
  <li class="forecast-tombstone">
   <div class="tombstone-container">
    <p class="period-name">
     Today
     <br/>
     <br/>
    </p>
    <p>
     <img alt="Today: Sunny, with a high near 74. Light and variable wind becoming north northeast 5 to 7 mph in the morning. " class="forecast-icon" src="newimages/medium/skc.png" title="Today: Sunny, with a high near 74. Light and variable wind becoming north northeast 5 to 7 mph in the morning. "/>
    </p>
    <p class="short-desc">
     Sunny
    </p>
    <p class="temp temp-high">
     High: 74 °F
    </p>
   </div>
  </li>
  <li class="forecast-tombstone">
   <div class="tombstone-container">
    <p class="period-name">
     Tonight
     <br/>
     <br/>
    </p>
    <p>
     <img alt="Tonight: Clear, with a low around 52. North wind around 6 mph. " class="forecast-icon" src="newimages/medium/nskc.png" title="Tonight: Clear, with a low around 52. North wind around 6 mph. "/>
    </p>
    <p class="short-desc">
     Clear
    </p>
    <p class="temp temp-low">
     Low: 52 °F
    </p>
   </div>
  </li>
  <li class="forecast-tombstone">
   <div class="tombstone-container">
    <p class="period-name">
     Thursday
     <br/>
     <br/>
    </p>
    <p>
     <img alt="Thursday: Sunny, with a high near 73. Light and variable wind becoming west 5 to 10 mph in the afternoon. " class="forecast-icon" src="newimages/medium/skc.png" title="Thursday: Sunny, with a high near 73. Light and variable wind becoming west 5 to 10 mph in the afternoon. "/>
    </p>
    <p class="short-desc">
     Sunny
    </p>
    <p class="temp temp-high">
     High: 73 °F
    </p>
   </div>
  </li>
  <li class="forecast-tombstone">
   <div class="tombstone-container">
    <p class="period-name">
     Thursday
     <br/>
     Night
    </p>
    <p>
     <img alt="Thursday Night: Clear, with a low around 51. West southwest wind 6 to 11 mph. " class="forecast-icon" src="newimages/medium/nskc.png" title="Thursday Night: Clear, with a low around 51. West southwest wind 6 to 11 mph. "/>
    </p>
    <p class="short-desc">
     Clear
    </p>
    <p class="temp temp-low">
     Low: 51 °F
    </p>
   </div>
  </li>
  <li class="forecast-tombstone">
   <div class="tombstone-container">
    <p class="period-name">
     Friday
     <br/>
     <br/>
    </p>
    <p>
     <img alt="Friday: Sunny, with a high near 68. Light west northwest wind becoming west 9 to 14 mph in the morning. Winds could gust as high as 18 mph. " class="forecast-icon" src="newimages/medium/few.png" title="Friday: Sunny, with a high near 68. Light west northwest wind becoming west 9 to 14 mph in the morning. Winds could gust as high as 18 mph. "/>
    </p>
    <p class="short-desc">
     Sunny
    </p>
    <p class="temp temp-high">
     High: 68 °F
    </p>
   </div>
  </li>
  <li class="forecast-tombstone">
   <div class="tombstone-container">
    <p class="period-name">
     Friday
     <br/>
     Night
    </p>
    <p>
     <img alt="Friday Night: Mostly clear, with a low around 50." class="forecast-icon" src="newimages/medium/nfew.png" title="Friday Night: Mostly clear, with a low around 50."/>
    </p>
    <p class="short-desc">
     Mostly Clear
    </p>
    <p class="temp temp-low">
     Low: 50 °F
    </p>
   </div>
  </li>
  <li class="forecast-tombstone">
   <div class="tombstone-container">
    <p class="period-name">
     Saturday
     <br/>
     <br/>
    </p>
    <p>
     <img alt="Saturday: Sunny, with a high near 64." class="forecast-icon" src="newimages/medium/few.png" title="Saturday: Sunny, with a high near 64."/>
    </p>
    <p class="short-desc">
     Sunny
    </p>
    <p class="temp temp-high">
     High: 64 °F
    </p>
   </div>
  </li>
  <li class="forecast-tombstone">
   <div class="tombstone-container">
    <p class="period-name">
     Saturday
     <br/>
     Night
    </p>
    <p>
     <img alt="Saturday Night: Mostly clear, with a low around 50." class="forecast-icon" src="newimages/medium/nfew.png" title="Saturday Night: Mostly clear, with a low around 50."/>
    </p>
    <p class="short-desc">
     Mostly Clear
    </p>
    <p class="temp temp-low">
     Low: 50 °F
    </p>
   </div>
  </li>
  <li class="forecast-tombstone">
   <div class="tombstone-container">
    <p class="period-name">
     Sunday
     <br/>
     <br/>
    </p>
    <p>
     <img alt="Sunday: Sunny, with a high near 66." class="forecast-icon" src="newimages/medium/few.png" title="Sunday: Sunny, with a high near 66."/>
    </p>
    <p class="short-desc">
     Sunny
    </p>
    <p class="temp temp-high">
     High: 66 °F
    </p>
   </div>
  </li>
 </ul>
</div>


From the HTML above, the information we need corresponds to three classes: period-name, short-desc, and temp.

soup_forecast = soup.find(id='seven-day-forecast-container')
soup_forecast.find_all(class_='period-name')
[<p class="period-name">Today<br/><br/></p>,
 <p class="period-name">Tonight<br/><br/></p>,
 <p class="period-name">Thursday<br/><br/></p>,
 <p class="period-name">Thursday<br/>Night</p>,
 <p class="period-name">Friday<br/><br/></p>,
 <p class="period-name">Friday<br/>Night</p>,
 <p class="period-name">Saturday<br/><br/></p>,
 <p class="period-name">Saturday<br/>Night</p>,
 <p class="period-name">Sunday<br/><br/></p>]
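Some of the period names above contain a br tag between two words (for example Thursday and Night), and a plain get_text() call joins them into ThursdayNight. A minimal sketch of one way around this uses the separator and strip arguments of get_text(). The two elements in the sample string are copied from the output above.

```python
from bs4 import BeautifulSoup

# Two period-name elements copied from the output above.
sample = """
<p class="period-name">Today<br/><br/></p>
<p class="period-name">Thursday<br/>Night</p>
"""
name_soup = BeautifulSoup(sample, "html.parser")
name_tags = name_soup.find_all(class_="period-name")

# Default behaviour: text fragments are concatenated with nothing in between.
joined = name_tags[1].get_text()
print(joined)

# separator=" " puts a space between text fragments; strip=True trims each
# fragment and drops the empty ones.
names = [p.get_text(separator=" ", strip=True) for p in name_tags]
print(names)
```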

3. Finally, print all of the information we need

soup_forecast=soup.find(id='seven-day-forecast-container')
date_list=soup_forecast.find_all(class_='period-name')
desc_list=soup_forecast.find_all(class_='short-desc')
temp_list=soup_forecast.find_all(class_='temp')
# zip pairs the three lists item by item, so no hard-coded count such as range(9) is needed
for date,desc,temp in zip(date_list,desc_list,temp_list):
    print("{} {} {}".format(date.get_text(),desc.get_text(),temp.get_text()))
Today Sunny High: 74 °F
Tonight Clear Low: 52 °F
Thursday Sunny High: 73 °F
ThursdayNight Clear Low: 51 °F
Friday Sunny High: 68 °F
FridayNight Mostly Clear Low: 50 °F
Saturday Sunny High: 64 °F
SaturdayNight Mostly Clear Low: 50 °F
Sunday Sunny High: 66 °F
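The three parallel lists stay aligned only as long as every forecast item contains all three elements. A sketch of an alternative, under the assumption that each item is wrapped in a tombstone-container div as in the HTML shown earlier, is to loop over the items and look up the three fields inside each one. The text_of helper and the sample string are introduced here for illustration only.

```python
from bs4 import BeautifulSoup

# Two forecast items in the same shape as the page's HTML shown earlier.
sample = """
<div id="seven-day-forecast-container"><ul>
<li class="forecast-tombstone"><div class="tombstone-container">
<p class="period-name">Today<br/><br/></p>
<p class="short-desc">Sunny</p><p class="temp temp-high">High: 74 °F</p></div></li>
<li class="forecast-tombstone"><div class="tombstone-container">
<p class="period-name">Thursday<br/>Night</p>
<p class="short-desc">Clear</p><p class="temp temp-low">Low: 51 °F</p></div></li>
</ul></div>
"""
item_soup = BeautifulSoup(sample, "html.parser")


def text_of(parent, class_name):
    """Text of the first descendant with the given class, or '' if it is missing."""
    node = parent.find(class_=class_name)
    if node is None:
        return ""
    return node.get_text(separator=" ", strip=True)


# One dictionary per forecast item, so the three fields cannot drift out of step.
records = [
    {
        "period": text_of(item, "period-name"),
        "desc": text_of(item, "short-desc"),
        "temp": text_of(item, "temp"),
    }
    for item in item_soup.find_all(class_="tombstone-container")
]

for row in records:
    print("{period} | {desc} | {temp}".format(**row))
```

A list of dictionaries like this is also a convenient starting point if the data is later loaded into a table for analysis.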

Complete code:

# Import the required modules: urllib.request and BeautifulSoup
import urllib.request as urlrequest
from bs4 import BeautifulSoup

# Fetch the page we want to scrape with urllib
weather_url='http://forecast.weather.gov/MapClick.php?lat=37.77492773500046&lon=-122.41941932299972'
web_page=urlrequest.urlopen(weather_url).read()

# Parse the page with BeautifulSoup and select the forecast block
soup=BeautifulSoup(web_page,'html.parser')
soup_forecast=soup.find(id='seven-day-forecast-container')

# Collect the three pieces of information we want
date_list=soup_forecast.find_all(class_='period-name')
desc_list=soup_forecast.find_all(class_='short-desc')
temp_list=soup_forecast.find_all(class_='temp')

# Print the fields separated by spaces; zip walks the three lists together
for date,desc,temp in zip(date_list,desc_list,temp_list):
    print("{} {} {}".format(date.get_text(),desc.get_text(),temp.get_text()))
Today Sunny High: 74 °F
Tonight Clear Low: 52 °F
Thursday Sunny High: 73 °F
ThursdayNight Clear Low: 51 °F
Friday Sunny High: 68 °F
FridayNight Mostly Clear Low: 50 °F
Saturday Sunny High: 64 °F
SaturdayNight Mostly Clear Low: 50 °F
Sunday Sunny High: 66 °F
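For any actual data analysis the temperatures are more useful as numbers than as text. The sketch below assumes the temperature text always has the shape seen in the output above, that is, the word High or Low, a colon, and an integer. The parse_temp function is a hypothetical helper, and it uses only the standard re module.

```python
import re


def parse_temp(temp_text):
    """Split a string such as 'High: 74 °F' into ('High', 74).

    Returns None when the text does not match that shape.
    """
    match = re.search(r"(High|Low):\s*(-?\d+)", temp_text)
    if match is None:
        return None
    return match.group(1), int(match.group(2))


# Values taken from the output above.
temp_lines = ["High: 74 °F", "Low: 52 °F", "High: 73 °F"]
parsed = [parse_temp(t) for t in temp_lines]
print(parsed)

# Keep only the daytime highs and average them.
highs = [value for kind, value in parsed if kind == "High"]
print(sum(highs) / len(highs))
```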
