## How to create a thread in Python
- Prepare a function:

```python
def my_func(a, b):
    do_craw(a, b)
```

- Create a thread:

```python
import threading

t = threading.Thread(target=my_func, args=(100, 200))
```

- Start the thread:

```python
t.start()
```

- Wait for it to finish:

```python
t.join()
```
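Putting the four steps together gives a minimal runnable sketch. Since `do_craw` hasn't been defined yet, the worker below just simulates some work (the sleep and print are stand-ins):

```python
import threading
import time

def my_func(a, b):
    # Stand-in for do_craw(a, b): simulate a bit of work, then report.
    time.sleep(0.1)
    print("worked on", a, b)

# Create the thread, start it, then block until it finishes.
t = threading.Thread(target=my_func, args=(100, 200))
t.start()
t.join()
print("main thread done")
```

`join()` makes the main thread wait, so "main thread done" is always printed after the worker's output.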
## With multithreading, a Python crawler runs 10x faster
Below is a simple case study: crawling pages from cnblogs (博客园) with a single thread versus multiple threads, to compare the crawl speed.
```python
# cnblogs_spider.py
import requests

urls = [
    "https://www.cnblogs.com/#p{}".format(page)
    for page in range(1, 51)
]

def craw(url):
    r = requests.get(url)
    print(url, len(r.text))
```
`urls` is built with a list comprehension that generates the 50 page URLs.
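Since `urls` is just a plain list, its contents are easy to sanity-check before crawling anything:

```python
# Same comprehension as in cnblogs_spider.py.
urls = [
    "https://www.cnblogs.com/#p{}".format(page)
    for page in range(1, 51)
]

print(len(urls))   # 50 URLs in total
print(urls[0])     # https://www.cnblogs.com/#p1
print(urls[-1])    # https://www.cnblogs.com/#p50
```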
```python
import time
import threading

from loguru import logger

import cnblogs_spider


def single_thread():
    logger.info("single_thread begin")
    for url in cnblogs_spider.urls:
        cnblogs_spider.craw(url)
    logger.info("single_thread end")


def multi_thread():
    logger.info("multi_thread begin")
    threads = []
    for url in cnblogs_spider.urls:
        threads.append(
            threading.Thread(target=cnblogs_spider.craw, args=(url,))
        )
    # Start all threads, then wait for every one of them to finish.
    for task in threads:
        task.start()
    for task in threads:
        task.join()
    logger.info("multi_thread end")


if __name__ == "__main__":
    start = time.time()
    single_thread()
    end = time.time()
    logger.info("single thread cost: {}".format(end - start))

    start = time.time()
    multi_thread()
    end = time.time()
    logger.info("multi thread cost: {}".format(end - start))
```

(Note the original `multi_thread` reused the `single_thread` log messages; they are corrected here so the timings are labeled properly.)
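Spawning one thread per URL is fine for 50 pages, but for larger crawls the standard library's `concurrent.futures.ThreadPoolExecutor` caps the number of threads. This variant is not in the original article; the `craw` function below is a stand-in for `cnblogs_spider.craw` so the sketch runs without network access:

```python
from concurrent.futures import ThreadPoolExecutor

def craw(url):
    # Stand-in for cnblogs_spider.craw(url); a real version would
    # fetch the page with requests and report len(r.text).
    return len(url)

urls = [
    "https://www.cnblogs.com/#p{}".format(page)
    for page in range(1, 51)
]

# map() spreads the URLs over at most 10 worker threads and
# yields the results in input order.
with ThreadPoolExecutor(max_workers=10) as pool:
    results = list(pool.map(craw, urls))

print(len(results))  # 50
```

The pool reuses its 10 threads across all 50 URLs, so thread-creation overhead stays constant no matter how long the URL list grows.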
After running both, you can see that the multithreaded version takes noticeably less time than the single-threaded one.