Python multiprocessing WHY and HOW

简介:

I am working on a Python script which will migrate data from one database to another. In a simple way, I need selectfrom a database and then insert into another.
In the first version, I designed to use multithreading, just because I am more familiar with it than multiprocessing. But after fewer month, I found several problems in my workaround.

  • can only use one of 24 cpus in case of GIL
  • can not handle singal for each thread. I want to use a simple timeout decarator, to set a signal.SIGALRM for a specified function. But for multithreading, the signal will get caught by a random thread.

So I start to refactor to multiprocessing.

multiprocessing

multiprocessing is a package that supports spawning processes using an API similar to the threading module.

But it's not so elegant and sweet as it described.

multiprocessing.pool

from multiprocessing import Pool

def f(x):
    return x*x

if __name__ == '__main__':
    p = Pool(5)
    print(p.map(func=f, [1, 2, 3]))

It looks like a good solution, but we can not set a bound function as a target func. Because bound function can not be serialized in pickle. And multithreading.pool use pickle to serialize object and send to new processes.pathos.multiprocessing is a good instead. It uses dill as an instead of pickle to give better serialization.

share memory

Memory in multithreading is shared naturally. In a multiprocessing environment, there are some wrappers to wrap a sharing object.

  • multiprocessing.Value and multiprocessing.Array is the most simple way to share Objects between two processes. But it can only contain ctype Objects.
  • multiprocessing.Queue is very useful and use an API similar to Queue.Queue

Python and GCC version

I didn't know that even the GCC version will affect behavior of my code. On my centos5 os, same Python version with different GCC version will have different behaviors.

Python 2.7.2 (default, Jan 10 2012, 11:17:45)
[GCC 3.4.6 20060404 (Red Hat 3.4.6-11)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from multiprocessing.queues import JoinableQueue
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/oracle/dbapython/lib/python2.7/multiprocessing/queues.py", line 48, in <module>
    from multiprocessing.synchronize import Lock, BoundedSemaphore, Semaphore, Condition
  File "/home/oracle/dbapython/lib/python2.7/multiprocessing/synchronize.py", line 59, in <module>
    " function, see issue 3770.")
ImportError: This platform lacks a functioning sem_open implementation, therefore, the required synchronization primitives needed will not function, see issue 3770.


Python 2.7.2 (default, Oct 15 2013, 13:15:26)
[GCC 4.1.2 20080704 (Red Hat 4.1.2-46)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from multiprocessing.queues import JoinableQueue
>>>

目录
相关文章
|
Linux Python
How to Install Python on Linux
Summary Hostmonster uses the preinstalled version of Python that ships with CentOS. Because of this it is often not the latest release. This article will explain how to install an updated v
1708 0
|
6月前
|
Unix Linux API
Python multiprocessing模块
Python multiprocessing模块
|
6月前
|
数据采集 并行计算 安全
Python并发编程:多进程(multiprocessing模块)
在处理CPU密集型任务时,Python的全局解释器锁(GIL)可能会成为瓶颈。为了充分利用多核CPU的性能,可以使用Python的multiprocessing模块来实现多进程编程。与多线程不同,多进程可以绕过GIL,使得每个进程在自己的独立内存空间中运行,从而实现真正的并行计算。
|
7月前
|
Python
在Python中,`multiprocessing`模块提供了一种在多个进程之间共享数据和同步的机制。
在Python中,`multiprocessing`模块提供了一种在多个进程之间共享数据和同步的机制。
|
9月前
|
数据采集 Java Python
python并发编程:使用多进程multiprocessing模块加速程序的运行
python并发编程:使用多进程multiprocessing模块加速程序的运行
178 1
|
9月前
|
并行计算 程序员 API
Python多进程编程:利用multiprocessing模块实现并行计算
Python多进程编程:利用multiprocessing模块实现并行计算
916 0
|
9月前
|
安全 Python Windows
Python 的并发编程:在 Python 中如何使用 `threading` 和 `multiprocessing` 模块?
Python 的并发编程:在 Python 中如何使用 `threading` 和 `multiprocessing` 模块?
89 1
|
安全 前端开发 API
第52天:Python multiprocessing 模块
第52天:Python multiprocessing 模块
162 0

热门文章

最新文章

推荐镜像

更多