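In this post, let's practice writing a Python web crawler. I haven't written one in a long time, so here is a simple one to get back into it: it scrapes the ranking from an online bookstore's bestseller list and saves it as an Excel file.
This is what the site looks like:
The full code is: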
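import time

import pandas as pd
import requests
from bs4 import BeautifulSoup


def crawl(i):
    """Scrape page i of the bestseller list and return its rows as a DataFrame."""
    url = f'http://bang.dangdang.com/books/bestsellers/1-{i}'
    response = requests.get(url)
    soup = BeautifulSoup(response.text, 'html.parser')
    book_items = soup.select('.bang_list li')
    rows = []
    for item in book_items:
        book_name = item.select('.name a')[0].text.strip()
        pic = item.select('.pic > a > img')[0].get('src')
        star = item.select('.star a')[0].text.strip()
        # First .publisher_info block: keep the part before the first '/'
        author = item.select('.publisher_info')[0].text.strip().split('/')[0]
        # Second .publisher_info block: keep the last field after a non-breaking space
        press = item.select('.publisher_info')[1].text.strip().split('\xa0')[-1]
        price_r = item.select('.price_r')[0].text.strip()
        price_n = item.select('.price_n')[0].text.strip()
        rows.append({
            '书名': book_name,   # title
            '图片': pic,         # cover image URL
            '评论数': star,      # number of reviews
            '作者': author,      # author
            '出版社': press,     # press
            '原价': price_r,     # list price
            '现价': price_n,     # current price
        })
    # DataFrame.append was removed in pandas 2.0, so build the frame from a list of dicts
    return pd.DataFrame(rows)


# Crawl pages 1 to 25, pausing briefly between requests
pages = []
for i in range(1, 26):
    pages.append(crawl(i))
    time.sleep(0.5)

all_data = pd.concat(pages, ignore_index=True)
# File name: "Dangdang book bestseller ranking"; writing .xlsx needs an Excel engine such as openpyxl
all_data.to_excel('当当网图书畅销榜排名.xlsx', index=False)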
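The author and press fields in the code above come from plain string splitting on the `.publisher_info` text. Here is a minimal sketch of just that step, using only the standard library. The helper names `parse_author` and `parse_press` and both sample strings are made up for illustration; they are not copied from the real page, whose exact text layout may differ.

```python
def parse_author(info_text: str) -> str:
    """Return the part before the first '/' of an author string."""
    return info_text.strip().split('/')[0].strip()


def parse_press(info_text: str) -> str:
    """Return the last field of a string separated by non-breaking spaces."""
    return info_text.strip().split('\xa0')[-1].strip()


# Hypothetical sample strings (assumptions, not real page content)
author_text = ' Jane Doe / John Roe (translator) '
press_text = '2020-01-01\xa0Example Press'

print(parse_author(author_text))
print(parse_press(press_text))
```

The extra `.strip()` after the split removes the space left around the `/` separator, which the one-line version in the crawler keeps.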
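Run it once, and the resulting Excel file looks like this:
Crawlers are worth taking out and practicing every now and then; otherwise the skill is easy to forget.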