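This post scrapes images and names each saved file with a class label. Let's get started.
1. First, the URL: www.wmpic.me/touxiang/nvsheng
2. Download the images, parse the pages, and save the images locally. Here is the code: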
import requests
from bs4 import BeautifulSoup
import random
user_agent = [
    'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.1; WOW64; Trident/5.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; .NET4.0C; .NET4.0E)',
    'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.64 Safari/537.11',
]
Yes_or_Not = ['y','n']
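def download(url, folder, count=0):  # download one image
    headers = {'User-Agent': random.choice(user_agent)}
    content = requests.get(url, headers=headers).content
    typ = random.choice(Yes_or_Not)  # pick the label 'y' or 'n' at random
    # File name: <label>_<count>.jpg; '/' works as a separator on both Windows and POSIX
    path = folder + '/' + typ + '_' + str(count) + '.jpg'
    with open(path, 'wb') as f:  # the target folder must already exist
        f.write(content)  # write the image bytes to a local file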
base_url = 'http://www.wmpic.me/touxiang/nvsheng/page/'
count = 1
for i in range(1, 10):  # listing pages 1 through 9
    url = base_url + str(i)  # URL of the listing page
    headers = {'User-Agent': random.choice(user_agent)}  # the header name takes a hyphen, not an underscore
    html = requests.get(url, headers=headers).text
    soup = BeautifulSoup(html, 'lxml')  # parse the HTML with BeautifulSoup's lxml parser
    for item in soup.select('li img'):  # select every img inside an li
        picture_url = item['src']
        if picture_url.find('215x185') != -1:  # keep only 215x185 (width x height) thumbnails
            if count <= 300:
                download(picture_url, 'train_pictures', count)  # save to the train_pictures folder
                print(picture_url)
                count += 1
            else:
                download(picture_url, 'test_pictures', count)  # save to the test_pictures folder
                print(picture_url)
                count += 1
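The filtering, train/test split, and file naming in the script can also be pulled out into small pure functions, which makes them easy to check without touching the network. The following is only a sketch: the helper names `is_thumbnail`, `target_folder`, `build_path`, and `ensure_folders` are hypothetical and do not appear in the original script, and `os.path.join` is swapped in for the manual string concatenation.

```python
import os

TRAIN_LIMIT = 300  # the script sends images 1..300 to train_pictures


def is_thumbnail(picture_url, size='215x185'):
    """Return True if the URL contains the thumbnail size marker."""
    return size in picture_url


def target_folder(count, train_limit=TRAIN_LIMIT):
    """Choose the folder the same way the script's if/else does."""
    return 'train_pictures' if count <= train_limit else 'test_pictures'


def build_path(folder, label, count):
    """Build '<folder>/<label>_<count>.jpg' with the platform's separator."""
    return os.path.join(folder, '{}_{}.jpg'.format(label, count))


def ensure_folders():
    """Create both output folders if they are missing (hypothetical helper)."""
    for folder in ('train_pictures', 'test_pictures'):
        os.makedirs(folder, exist_ok=True)
```

Calling `ensure_folders()` once before the loop would avoid a `FileNotFoundError` when the output folders have not been created by hand.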
3. Results
For details, see part (7), "Beauty Classifier".
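Because the script stores the randomly chosen label as the filename prefix (`y_12.jpg`, `n_301.jpg`), a later step can recover it from the name alone. Here is a minimal sketch of that parsing; `label_from_filename` is a hypothetical helper, and how the classifier post actually reads the labels is not shown here, so treat this as an assumption.

```python
import os


def label_from_filename(path):
    """Split '<label>_<count>.jpg' into (label, count).

    Assumes the naming scheme used by download(): label is 'y' or 'n',
    followed by an underscore and the running image number.
    """
    stem, _ext = os.path.splitext(os.path.basename(path))
    label, _sep, index = stem.partition('_')
    if label not in ('y', 'n') or not index.isdigit():
        raise ValueError('unexpected file name: {!r}'.format(path))
    return label, int(index)
```

For example, `label_from_filename('n_301.jpg')` yields the label `'n'` and the index `301`. Keep in mind that the labels were assigned at random, so they carry no information about the image content.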