第十五章 Python多进程与多线程-阿里云开发者社区

15.1 multiprocessing

multiprocessing是多进程模块，多进程提供了任务并发性，能充分利用多核处理器。避免了GIL（全局解释锁）对资源的影响。

有以下常用类：

类	描述
Process(group=None, target=None, name=None, args=(), kwargs={})	派生一个进程对象，然后调用start()方法启动
Pool(processes=None, initializer=None, initargs=())	返回一个进程池对象，processes进程池进程数量
Pipe(duplex=True)	返回两个连接对象由管道连接
Queue(maxsize=0)	返回队列对象，操作方法跟Queue.Queue一样
multiprocessing.dummy	这个库是用于实现多线程

Process()类有以下些方法：

run()
start()	启动进程对象
join([timeout])	等待子进程终止，才返回结果。可选超时。
name	进程名字
is_alive()	返回进程是否存活
daemon	进程的守护标记，一个布尔值
pid	返回进程ID
exitcode	子进程退出状态码
terminate()	终止进程。在unix上使用SIGTERM信号，在windows上使用TerminateProcess()。

Pool()类有以下些方法：

apply(func, args=(), kwds={})	等效内建函数apply()
apply_async(func, args=(), kwds={}, callback=None)	异步，等效内建函数apply()
map(func, iterable, chunksize=None)	等效内建函数map()
map_async(func, iterable, chunksize=None, callback=None)	异步，等效内建函数map()
imap(func, iterable, chunksize=1)	等效内建函数itertools.imap()
imap_unordered(func, iterable, chunksize=1)	像imap()方法，但结果顺序是任意的
close()	关闭进程池
terminate()	终止工作进程，垃圾收集连接池对象
join()	等待工作进程退出。必须先调用close()或terminate()

Pool.apply_async()和Pool.map_aysnc()又提供了以下几个方法：

get([timeout])	获取结果对象里的结果。如果超时没有，则抛出TimeoutError异常
wait([timeout])	等待可用的结果或超时
ready()	返回调用是否已经完成
successful()

举例：

1）简单的例子，用子进程处理函数

 
        from 
        multiprocessing 
        import 
        Process 
       
        import 
        os 
       
        def 
        worker(name): 
       
        print 
        name 
       
        print 
        'parent process id:'
        , os.getppid() 
       
        print 
        'process id:'
        , os.getpid() 
       
        if 
        __name__ 
        =
        = 
        '__main__'
        : 
       
        p 
        = 
        Process(target
        =
        worker, args
        =
        (
        'function worker.'
        ,)) 
       
        p.start() 
       
        p.join() 
       
        print 
        p.name 
       
        # python test.py
       
        function worker.
       
        parent process 
        id
        : 
        9079 
       
        process 
        id
        : 
        9080 
       
        Process
        -
        1

Process实例传入worker函数作为派生进程执行的任务，用start()方法启动这个实例。

2）加以说明join()方法

 
        from 
        multiprocessing 
        import 
        Process 
       
        import 
        os 
       
        def 
        worker(n): 
       
        print 
        'hello world'
        , n 
       
        if 
        __name__ 
        =
        = 
        '__main__'
        : 
       
        print 
        'parent process id:'
        , os.getppid() 
       
        for 
        n 
        in 
        range
        (
        5
        ): 
       
        p 
        = 
        Process(target
        =
        worker, args
        =
        (n,)) 
       
        p.start() 
       
        p.join() 
       
        print 
        'child process id:'
        , p.pid 
       
        print 
        'child process name:'
        , p.name 
       
        # python test.py
       
        parent process 
        id
        : 
        9041 
       
        hello world 
        0 
       
        child process 
        id
        : 
        9132 
       
        child process name: Process
        -
        1 
       
        hello world 
        1 
       
        child process 
        id
        : 
        9133 
       
        child process name: Process
        -
        2 
       
        hello world 
        2 
       
        child process 
        id
        : 
        9134 
       
        child process name: Process
        -
        3 
       
        hello world 
        3 
       
        child process 
        id
        : 
        9135 
       
        child process name: Process
        -
        4 
       
        hello world 
        4 
       
        child process 
        id
        : 
        9136 
       
        child process name: Process
        -
        5 
       
        # 把p.join()注释掉再执行
       
        # python test.py
       
        parent process 
        id
        : 
        9041 
       
        child process 
        id
        : 
        9125 
       
        child process name: Process
        -
        1 
       
        child process 
        id
        : 
        9126 
       
        child process name: Process
        -
        2 
       
        child process 
        id
        : 
        9127 
       
        child process name: Process
        -
        3 
       
        child process 
        id
        : 
        9128 
       
        child process name: Process
        -
        4 
       
        hello world 
        0 
       
        hello world 
        1 
       
        hello world 
        3 
       
        hello world 
        2 
       
        child process 
        id
        : 
        9129 
       
        child process name: Process
        -
        5 
       
        hello world 
        4

可以看出，在使用join()方法时，输出的结果都是顺序排列的。相反是乱序的。因此join()方法是堵塞父进程，要等待当前子进程执行完后才会继续执行下一个子进程。否则会一直生成子进程去执行任务。

在要求输出的情况下使用join()可保证每个结果是完整的。

3）给子进程命名，方便管理

 
        from 
        multiprocessing 
        import 
        Process 
       
        import 
        os, time 
       
        def 
        worker1(n): 
       
        print 
        'hello world'
        , n 
       
        def 
        worker2(): 
       
        print 
        'worker2...' 
       
        if 
        __name__ 
        =
        = 
        '__main__'
        : 
       
        print 
        'parent process id:'
        , os.getppid() 
       
        for 
        n 
        in 
        range
        (
        3
        ): 
       
        p1 
        = 
        Process(name
        =
        'worker1'
        , target
        =
        worker1, args
        =
        (n,)) 
       
        p1.start() 
       
        p1.join() 
       
        print 
        'child process id:'
        , p1.pid 
       
        print 
        'child process name:'
        , p1.name 
       
        p2 
        = 
        Process(name
        =
        'worker2'
        , target
        =
        worker2) 
       
        p2.start() 
       
        p2.join() 
       
        print 
        'child process id:'
        , p2.pid 
       
        print 
        'child process name:'
        , p2.name 
       
        # python test.py
       
        parent process 
        id
        : 
        9041 
       
        hello world 
        0 
       
        child process 
        id
        : 
        9248 
       
        child process name: worker1
       
        hello world 
        1 
       
        child process 
        id
        : 
        9249 
       
        child process name: worker1
       
        hello world 
        2 
       
        child process 
        id
        : 
        9250 
       
        child process name: worker1
       
        worker2...
       
        child process 
        id
        : 
        9251 
       
        child process name: worker2

4）设置守护进程，父进程退出也不影响子进程运行

 
        from 
        multiprocessing 
        import 
        Process 
       
        def 
        worker1(n): 
       
        print 
        'hello world'
        , n 
       
        def 
        worker2(): 
       
        print 
        'worker2...' 
       
        if 
        __name__ 
        =
        = 
        '__main__'
        : 
       
        for 
        n 
        in 
        range
        (
        3
        ): 
       
        p1 
        = 
        Process(name
        =
        'worker1'
        , target
        =
        worker1, args
        =
        (n,)) 
       
        p1.daemon 
        = 
        True 
       
        p1.start() 
       
        p1.join() 
       
        p2 
        = 
        Process(target
        =
        worker2) 
       
        p2.daemon 
        = 
        False 
       
        p2.start() 
       
        p2.join()

5）使用进程池

 
        #!/usr/bin/python
       
        # -*- coding: utf-8 -*-
       
        from 
        multiprocessing 
        import 
        Pool, current_process 
       
        import 
        os, time, sys 
       
        def 
        worker(n): 
       
        print 
        'hello world'
        , n 
       
        print 
        'process name:'
        , current_process().name  
        # 获取当前进程名字 
       
        time.sleep(
        1
        )    
        # 休眠用于执行时有时间查看当前执行的进程 
       
        if 
        __name__ 
        =
        = 
        '__main__'
        : 
       
        p 
        = 
        Pool(processes
        =
        3
        ) 
       
        for 
        i 
        in 
        range
        (
        8
        ): 
       
        r 
        = 
        p.apply_async(worker, args
        =
        (i,)) 
       
        r.get(timeout
        =
        5
        )  
        # 获取结果中的数据 
       
        p.close() 
       
        # python test.py
       
        hello world 
        0 
       
        process name: PoolWorker
        -
        1 
       
        hello world 
        1 
       
        process name: PoolWorker
        -
        2 
       
        hello world 
        2 
       
        process name: PoolWorker
        -
        3 
       
        hello world 
        3 
       
        process name: PoolWorker
        -
        1 
       
        hello world 
        4 
       
        process name: PoolWorker
        -
        2 
       
        hello world 
        5 
       
        process name: PoolWorker
        -
        3 
       
        hello world 
        6 
       
        process name: PoolWorker
        -
        1 
       
        hello world 
        7 
       
        process name: PoolWorker
        -
        2

进程池生成了3个子进程，通过循环执行8次worker函数，进程池会从子进程1开始去处理任务，当到达最大进程时，会继续从子进程1开始。

在运行此程序同时，再打开一个终端窗口会看到生成的子进程：

 
  
    
      
     
        # ps -ef |grep python
       
 
        root      
        40244   
        9041  
        4 
        16
        :
        43 
        pts
        /
        3   
        00
        :
        00
        :
        00 
        python test.py 
       
 
        root      
        40245  
        40244  
        0 
        16
        :
        43 
        pts
        /
        3    
        00
        :
        00
        :
        00 
        python test.py 
       
 
        root      
        40246  
        40244  
        0 
        16
        :
        43 
        pts
        /
        3    
        00
        :
        00
        :
        00 
        python test.py 
       
 
        root      
        40247  
        40244  
        0 
        16
        :
        43 
        pts
        /
        3    
        00
        :
        00
        :
        00 
        python test.py 
       
 
    

   
 

6）进程池map()方法

map()方法是将序列中的元素通过函数处理返回新列表。

 
        from 
        multiprocessing 
        import 
        Pool 
       
        def 
        worker(url): 
       
        return 
        'http://%s' 
        % 
        url 
       
        urls 
        = 
        [
        'www.baidu.com'
        , 
        'www.jd.com'
        ] 
       
        p 
        = 
        Pool(processes
        =
        2
        ) 
       
        r 
        = 
        p.
        map
        (worker, urls) 
       
        p.close()
       
        print 
        r 
       
        # python test.py
       
        [
        'http://www.baidu.com'
        , 
        'http://www.jd.com'
        ]

7）Queue进程间通信

multiprocessing支持两种类型进程间通信：Queue和Pipe。

Queue库已经封装到multiprocessing库中，在第十章 Python常用标准库已经讲解到Queue库使用，有需要请查看以前博文。

例如：一个子进程向队列写数据，一个子进程读取队列数据

 
        #!/usr/bin/python
       
        # -*- coding: utf-8 -*-
       
        from 
        multiprocessing 
        import 
        Process, Queue 
       
        # 写数据到队列
       
        def 
        write(q): 
       
        for 
        n 
        in 
        range
        (
        5
        ): 
       
        q.put(n) 
       
        print 
        'Put %s to queue.' 
        % 
        n 
       
        # 从队列读数据
       
        def 
        read(q): 
       
        while 
        True
        : 
       
        if 
        not 
        q.empty(): 
       
        value 
        = 
        q.get() 
       
        print 
        'Get %s from queue.' 
        % 
        value 
       
        else
        : 
       
        break 
       
        if 
        __name__ 
        =
        = 
        '__main__'
        : 
       
        q 
        = 
        Queue() 
       
        pw 
        = 
        Process(target
        =
        write, args
        =
        (q,)) 
       
        pr 
        = 
        Process(target
        =
        read, args
        =
        (q,))    
       
        pw.start() 
       
        pw.join() 
       
        pr.start() 
       
        pr.join() 
       
        # python test.py
       
        Put 
        0 
        to queue. 
       
        Put 
        1 
        to queue. 
       
        Put 
        2 
        to queue. 
       
        Put 
        3 
        to queue. 
       
        Put 
        4 
        to queue. 
       
        Get 
        0 
        from 
        queue. 
       
        Get 
        1 
        from 
        queue. 
       
        Get 
        2 
        from 
        queue. 
       
        Get 
        3 
        from 
        queue. 
       
        Get 
        4 
        from 
        queue.

8）Pipe进程间通信

 
        from 
        multiprocessing 
        import 
        Process, Pipe 
       
        def 
        f(conn): 
       
        conn.send([
        42
        , 
        None
        , 
        'hello'
        ]) 
       
        conn.close() 
       
        if 
        __name__ 
        =
        = 
        '__main__'
        : 
       
        parent_conn, child_conn 
        = 
        Pipe() 
       
        p 
        = 
        Process(target
        =
        f, args
        =
        (child_conn,)) 
       
        p.start() 
       
        print 
        parent_conn.recv()  
       
        p.join() 
       
        # python test.py
       
        [
        42
        , 
        None
        , 
        'hello'
        ]

Pipe()创建两个连接对象，每个链接对象都有send()和recv()方法，

9）进程间对象共享

Manager类返回一个管理对象，它控制服务端进程。提供一些共享方式：Value()、Array()、list()、dict()、Event()等

创建Manger对象存放资源，其他进程通过访问Manager获取。

 
  
    
      
      
        from 
        multiprocessing 
        import 
        Process, Manager 
       
 
        def 
        f(v, a, l, d): 
       
 
            
        v.value 
        = 
        100 
       
 
            
        a[
        0
        ] 
        = 
        123 
       
 
            
        l.append(
        'Hello'
        ) 
       
 
            
        d[
        'a'
        ] 
        = 
        1 
       
 
        mgr 
        = 
        Manager() 
       
 
        v 
        = 
        mgr.Value(
        'v'
        , 
        0
        ) 
       
 
        a 
        = 
        mgr.Array(
        'd'
        , 
        range
        (
        5
        )) 
       
 
        l 
        = 
        mgr.
        list
        () 
       
 
        d 
        = 
        mgr.
        dict
        () 
       
 
        p 
        = 
        Process(target
        =
        f, args
        =
        (v, a, l, d)) 
       

        p.start()
       

        p.join()
       
 
        print
        (v) 
       
 
        print
        (a) 
       
 
        print
        (l) 
       
 
        print
        (d) 
       

         
       

        # python test.py
       
 
        Value(
        'v'
        , 
        100
        ) 
       
 
        array(
        'd'
        , [
        123.0
        , 
        1.0
        , 
        2.0
        , 
        3.0
        , 
        4.0
        ]) 
       
 
        [
        'Hello'
        ] 
       
 
        {
        'a'
        : 
        1
        } 
       
 
    

   
 

10）写一个多进程的例子

比如：多进程监控URL是否正常

 
        from 
        multiprocessing 
        import 
        Pool, current_process 
       
        import 
        urllib2 
       
        urls 
        = 
        [ 
       
        'http://www.baidu.com'
        , 
       
        'http://www.jd.com'
        , 
       
        'http://www.sina.com'
        , 
       
        'http://www.163.com'
        , 
       
        ]
       
        def 
        status_code(url): 
       
        print 
        'process name:'
        , current_process().name 
       
        try
        : 
       
        req 
        = 
        urllib2.urlopen(url, timeout
        =
        5
        ) 
       
        return 
        req.getcode() 
       
        except 
        urllib2.URLError: 
       
        return 
       
        p 
        = 
        Pool(processes
        =
        4
        ) 
       
        for 
        url 
        in 
        urls: 
       
        r 
        = 
        p.apply_async(status_code, args
        =
        (url,)) 
       
        if 
        r.get(timeout
        =
        5
        ) 
        =
        = 
        200
        : 
       
        print 
        "%s OK" 
        %
        url 
       
        else
        : 
       
        print 
        "%s NO" 
        %
        url 
       
        # python test.py
       
        process name: PoolWorker
        -
        1 
       
        http:
        /
        /
        www.baidu.com OK 
       
        process name: PoolWorker
        -
        2 
       
        http:
        /
        /
        www.jd.com OK 
       
        process name: PoolWorker
        -
        3 
       
        http:
        /
        /
        www.sina.com OK 
       
        process name: PoolWorker
        -
        4 
       
        http:
        /
        /
        www.
        163.com 
        OK

博客地址：http://lizhenliang.blog.51cto.com

QQ群：323779636（Shell/Python运维开发群）

15.2 threading

threading模块类似于multiprocessing多进程模块，使用方法也基本一样。threading库是对thread库进行二次封装，我们主要用到Thread类，用Thread类派生线程对象。

1）使用Thread类实现多线程

 
        from 
        threading 
        import 
        Thread, current_thread 
       
        def 
        worker(n): 
       
        print 
        'thread name:'
        , current_thread().name 
       
        print 
        'hello world'
        , n 
       
        for 
        n 
        in 
        range
        (
        5
        ): 
       
        t 
        = 
        Thread(target
        =
        worker, args
        =
        (n, )) 
       
        t.start() 
       
        t.join()  
        # 等待主进程结束 
       
        # python test.py
       
        thread name: Thread
        -
        1 
       
        hello world 
        0 
       
        thread name: Thread
        -
        2 
       
        hello world 
        1 
       
        thread name: Thread
        -
        3 
       
        hello world 
        2 
       
        thread name: Thread
        -
        4 
       
        hello world 
        3 
       
        thread name: Thread
        -
        5 
       
        hello world 
        4

2）还有一种方式继承Thread类实现多线程，子类可以重写__init__和run()方法实现功能逻辑。

 
        #!/usr/bin/python
       
        # -*- coding: utf-8 -*-
       
        from 
        threading 
        import 
        Thread, current_thread 
       
        class 
        Test(Thread): 
       
        # 重写父类构造函数，那么父类构造函数将不会执行 
       
        def 
        __init__(
        self
        , n): 
       
        Thread.__init__(
        self
        ) 
       
        self
        .n 
        = 
        n 
       
        def 
        run(
        self
        ): 
       
        print 
        'thread name:'
        , current_thread().name 
       
        print 
        'hello world'
        , 
        self
        .n 
       
        if 
        __name__ 
        =
        = 
        '__main__'
        : 
       
        for 
        n 
        in 
        range
        (
        5
        ): 
       
        t 
        = 
        Test(n) 
       
        t.start() 
       
        t.join() 
       
        # python test.py
       
        thread name: Thread
        -
        1 
       
        hello world 
        0 
       
        thread name: Thread
        -
        2 
       
        hello world 
        1 
       
        thread name: Thread
        -
        3 
       
        hello world 
        2 
       
        thread name: Thread
        -
        4 
       
        hello world 
        3 
       
        thread name: Thread
        -
        5 
       
        hello world 
        4

3）Lock

 
        from 
        threading 
        import 
        Thread, Lock, current_thread 
       
        lock 
        = 
        Lock() 
       
        class 
        Test(Thread): 
       
        # 重写父类构造函数，那么父类构造函数将不会执行 
       
        def 
        __init__(
        self
        , n): 
       
        Thread.__init__(
        self
        ) 
       
        self
        .n 
        = 
        n 
       
        def 
        run(
        self
        ): 
       
        lock.acquire()  
        # 获取锁 
       
        print 
        'thread name:'
        , current_thread().name 
       
        print 
        'hello world'
        , 
        self
        .n 
       
        lock.release()  
        # 释放锁 
       
        if 
        __name__ 
        =
        = 
        '__main__'
        : 
       
        for 
        n 
        in 
        range
        (
        5
        ): 
       
        t 
        = 
        Test(n) 
       
        t.start() 
       
        t.join()

众所周知，Python多线程有GIL全局锁，意思是把每个线程执行代码时都上了锁，执行完成后会自动释放GIL锁，意味着同一时间只有一个线程在运行代码。由于所有线程共享父进程内存、变量、资源，很容易多个线程对其操作，导致内容混乱。

当你在写多线程程序的时候如果输出结果是混乱的，这时你应该考虑到在不使用锁的情况下，多个线程运行时可能会修改原有的变量，导致输出不一样。

由此看来Python多线程是不能利用多核CPU提高处理性能，但在IO密集情况下，还是能提高一定的并发性能。也不必担心，多核CPU情况可以使用多进程实现多核任务。Python多进程是复制父进程资源，互不影响，有各自独立的GIL锁，保证数据不会混乱。能用多进程就用吧！

本文转自李振良OK 51CTO博客，原文链接：http://blog.51cto.com/lizhenliang/1875753，如需转载请自行联系原作者

第十五章 Python多进程与多线程

热门文章

最新文章

相关课程

相关电子书

相关实验场景

推荐镜像

热门

活动广场

任务中心

开发者评测

高校计划

乘风者计划

训练营

阿里云MVP

话题

直播

下载

镜像站

技术资料

插件