Thread / Process
Process
Thread
Thread/process pool
Pooling concept
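Using threads and processes in Python
The two most common approaches
ThreadPoolExecutor / ProcessPoolExecutor: use a "pool" to run a batch of tasks of the same type; executor.map returns results in input order.
multiprocessing.Process: start a new process to run an arbitrary task.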
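As a small illustration of the ordering behaviour of executor.map, the sketch below runs tasks that finish in a different order than they were submitted. The function slow_square, the helper run_in_order, and the delays are made up for this demonstration.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_square(n):
    # Hypothetical task: larger inputs sleep less, so tasks complete
    # in roughly the reverse of their submission order.
    time.sleep((5 - n) * 0.02)
    return n * n


def run_in_order(numbers):
    with ThreadPoolExecutor(max_workers=4) as executor:
        # map() yields results in the order of the inputs,
        # not in the order the tasks complete.
        return list(executor.map(slow_square, numbers))


if __name__ == "__main__":
    print(run_in_order([1, 2, 3, 4]))  # [1, 4, 9, 16]
```

If completion order matters more than input order, executor.submit combined with concurrent.futures.as_completed is the usual alternative.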
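Examples:
ThreadPoolExecutor
from concurrent.futures import ThreadPoolExecutor

def main():
    # download and links are assumed to be defined elsewhere
    with ThreadPoolExecutor() as executor:
        # timeout limits how long iterating over the results may wait
        executor.map(download, links, timeout=30)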
ProcessPoolExecutor
from concurrent.futures import ProcessPoolExecutor

def main():
    # is_prime must be a top-level function so worker processes can import it
    with ProcessPoolExecutor() as executor:
        executor.map(is_prime, PRIMES)
For API details, see the Python 3.9.4 documentation for concurrent.futures (Launching parallel tasks).
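The ProcessPoolExecutor snippet above leaves is_prime and its inputs undefined. A self-contained version might look like the following sketch; the trial-division is_prime and the NUMBERS list are assumptions chosen for illustration, not taken from these notes.

```python
import math
from concurrent.futures import ProcessPoolExecutor

# Sample inputs chosen for illustration only.
NUMBERS = [2, 15, 17, 21, 97, 7919]


def is_prime(n):
    # Simple trial division. Defined at module top level so that
    # worker processes can import it.
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    for divisor in range(3, math.isqrt(n) + 1, 2):
        if n % divisor == 0:
            return False
    return True


def main():
    with ProcessPoolExecutor() as executor:
        # zip pairs each input with its result; map() keeps input order.
        for number, prime in zip(NUMBERS, executor.map(is_prime, NUMBERS)):
            print(f"{number} is prime: {prime}")


if __name__ == "__main__":
    main()
```

The `if __name__ == "__main__":` guard keeps worker processes from re-running main() when they import the module.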
Process
from multiprocessing import Process

def f(name):
    print('hello', name)

if __name__ == '__main__':
    # Run f('bob') in a child process and wait for it to finish
    p = Process(target=f, args=('bob',))
    p.start()
    p.join()
For API details, see the Python 3.9.4 documentation for multiprocessing (Process-based parallelism).
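A Process target has no return value that the parent can read directly, so results are usually passed back through a channel such as multiprocessing.Queue. The sketch below shows one way to do that; square_into and the inputs are hypothetical names used only for this example.

```python
from multiprocessing import Process, Queue


def square_into(queue, n):
    # Runs in the child process; the result travels back through the queue.
    queue.put((n, n * n))


def main():
    queue = Queue()
    workers = [Process(target=square_into, args=(queue, n)) for n in (2, 3, 4)]
    for worker in workers:
        worker.start()
    # Read one result per worker before join(), so that no child
    # is left blocked while flushing data to the queue.
    results = dict(queue.get() for _ in workers)
    for worker in workers:
        worker.join()
    print(sorted(results.items()))


if __name__ == "__main__":
    main()
```

Results arrive in completion order, which is why the example tags each result with its input and sorts before printing.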
Multithreading vs Multiprocessing
IO bound: multithreading (less overhead), multiprocessing
CPU bound: multiprocessing, multiple machines
For IO-bound tasks, either threads or processes work; threads consume fewer resources.
For CPU-bound tasks, use processes.
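The rule of thumb above can be sketched as a small helper that picks the executor class by workload type. The names fake_download, sum_of_squares, executor_for, and run are hypothetical; time.sleep stands in for real IO.

```python
import time
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def fake_download(delay):
    # Stand-in for an IO-bound call: the thread simply waits.
    time.sleep(delay)
    return delay


def sum_of_squares(n):
    # Stand-in for a CPU-bound call: pure Python arithmetic.
    return sum(i * i for i in range(n))


def executor_for(workload):
    # "io" -> threads (cheaper to start and to feed);
    # anything else -> processes, which run on separate interpreters.
    return ThreadPoolExecutor if workload == "io" else ProcessPoolExecutor


def run(workload, func, items):
    with executor_for(workload)() as executor:
        return list(executor.map(func, items))


if __name__ == "__main__":
    print(run("io", fake_download, [0.05, 0.05, 0.05]))
    print(run("cpu", sum_of_squares, [10, 100, 1000]))
```

Because both executors share the same interface, switching between them is a one-line change, which makes it easy to measure which one suits a given task.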
Optimization
Use global variables or multiprocessing.Queue: don't pickle the input. This saves a lot of communication overhead, especially when the output is small compared to the input.
Use chunks: ProcessPoolExecutor.map(chunksize=x). For a large number of small tasks, chunking makes fuller use of each process and reduces overhead.
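Both tips can be combined in one sketch. Here a pool initializer fills a module-level global once per worker, which is one possible way to apply the global-variable tip, and map is called with chunksize. The names _LOOKUP, init_worker, and lookup_square, and the sizes used, are assumptions for illustration.

```python
from concurrent.futures import ProcessPoolExecutor

# Module-level global filled once per worker by the initializer, so the
# large input is not pickled again for every task.
_LOOKUP = None


def init_worker(size):
    global _LOOKUP
    # Hypothetical "large" input: built inside each worker instead of
    # being sent along with each task.
    _LOOKUP = [i * i for i in range(size)]


def lookup_square(index):
    # Only a small integer goes in and a small integer comes out.
    return _LOOKUP[index]


def main():
    indices = range(10_000)
    with ProcessPoolExecutor(initializer=init_worker, initargs=(10_000,)) as executor:
        # chunksize batches many tiny tasks into fewer inter-process messages.
        results = list(executor.map(lookup_square, indices, chunksize=500))
    print(results[:5], results[-1])


if __name__ == "__main__":
    main()
```

A suitable chunksize depends on the task; values that are too large can leave some workers idle near the end of the run.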
Common problems
Deadlock: the function must be defined at the top level of a module. Nested functions cannot be imported by the child process, and attempting to pickle them raises an exception.
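The pickling constraint can be checked without starting any process. The sketch below compares a top-level function with a nested one; square, make_nested, and can_pickle are hypothetical names for this example.

```python
import pickle


def square(n):
    # Top-level function: pickled by reference to its module-level name.
    return n * n


def make_nested():
    def nested_square(n):
        # Defined inside another function: it has no importable name.
        return n * n
    return nested_square


def can_pickle(obj):
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError):
        # Both are caught because the exception type reported for
        # local objects may differ between Python versions.
        return False


if __name__ == "__main__":
    print("top-level:", can_pickle(square))
    print("nested:", can_pickle(make_nested()))
```

Lambdas are subject to the same restriction, so moving the function to module level is the usual fix.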
Reference
Thread concepts, Java ThreadPoolExecutor
https://docs.python.org/3/library/multiprocessing.html#module-multiprocessing (official documentation)