掌握并行处理：理解并构建自己的线程池-阿里云开发者社区

掌握并行处理：理解并构建自己的线程池

2023-09-23 140

版权

本文内容由阿里云实名注册用户自发贡献，版权归原作者所有，阿里云开发者社区不拥有其著作权，亦不承担相应法律责任。具体规则请查看《阿里云开发者社区用户服务协议》和《阿里云开发者社区知识产权保护指引》。如果您发现本社区中有涉嫌抄袭的内容，填写侵权投诉表单进行举报，一经查实，本社区将立刻删除涉嫌侵权内容。

本文涉及的产品

云数据库 Tair（兼容Redis），内存型 2GB

Redis 开源版，标准版 2GB

简介： 本文深入探讨了线程池的原理和实现，并提供了一个详细的、分步指南，帮助读者掌握并行处理的核心概念。文章开始介绍了并行处理的重要性，以及为什么线程池是一种有效管理并发任务的技术。接着，文章详细解释了线程池的基本原理，包括线程池的组成结构、线程的生命周期和任务队列的管理方法。随后，文章展示了如何使用C++编程语言实现一个简单的线程池，并介绍了线程安全性和任务调度的关键考虑因素。

一、线程池存在原因

（1）线程使用场景：某类任务特别耗时，会严重影响该线程处理其他任务，因此需要在其他线程异步执行该任务。

（2）线程开销：随着这类任务越来越多，需要异步执行任务而开启的线程也越来越多，但是每个CPU的核心数和线程数是固定，过多的线程并不能提高效率。因此，线程资源的开销与CPU核心之间要平衡选择。

（3）原因：

线程资源的开销与CPU核心之间要平衡选择，自然就产生线程池，即需要固定线程的数量。
从前面的执行流程可以看出，每次耗时任务到来都需要不断的创建线程，执行完成后销毁线程。不断的创建和销毁线程会浪费系统资源。
因此需要有线程池，提前分配固定线程数量，不会将它们关闭，当任务到来时复用这些线程。

（3）线程池的作用：

复用线程资源。
充分利用系统资源。
异步执行耗时任务。
减少了多个任务（不是一个任务）的执行时间。

二、线程池原理

线程池是一个生产–消费模型。发布任务的线程是生产者，其他线程是消费者。

线程池的构成：

（1）生产者线程：发布任务。

（2）队列：亦称任务队列，存放具体的任务。因为任务是异步执行的，任务的内容就包括了任务的上下文以及任务的执行函数。

（3）线程池：即消费者，是固定数量的线程集合；主要完成取出任务、执行任务、任务调度。

2.1、线程调度

由于任务的密疏程度是未知的，即任务是间歇性的，有时候任务很多，有时候任务很少。当任务很少时，需要将不执行任务的线程休眠，不能让其浪费系统资源。这就需要线程调度。

线程的调度主要通过mutex和condition实现。 即互斥锁和条件变量。

线程有两种状态：从无任务到有任务（从无到有）以及从有任务到无任务（从有到无）。

利用条件变量，从无到有时，唤醒线程；从有到无时，休眠线程。那么如何确定条件呢？就是依据任务队列的状态，如果任务队列中有任务，将线程唤醒；如果任务队列为空，将线程休眠。

仔细点说，就是：

当生产者线程发布任务时，任务队列就有任务，进入从无到有状态；通知某一个线程唤醒，取出任务、执行任务。
线程判断任务队列是否有任务，如果任务队列为空，则进入从有到无状态，condition通知线程休眠。

2.2、平衡选择

线程资源的开销与CPU核心之间做平衡选择；平衡选择依据耗时任务而定。耗时任务分为IO密集型和CPU密集型。

IO密集型：IO的操作是同步的，系统调用会阻塞的将内核资源拷贝到用户态或者用户态资源阻塞的将资源拷贝到内核中；线程会阻塞等待系统调用完成。
CPU密集型：长时间占用CPU，使线程无法处理其他任务。

根据这两个类型，可以确定线程池的线程数量。一般，CPU密集型的线程池数量等于CPU核心数；IO密集型的线程池线程数量等于2倍核心数+2。

有这样一个公式：（IO等待时间+CPU运算时间）核心数/cpu运算时间。根据公式对线程池数量做优化调整，使其符合特定业务逻辑。

三、实现一个线程池

3.1、接口设计

（1）创建线程池的接口。确定线程池的线程数量以及任务队列的长度。

（2）销毁线程池的接口。线程判断线程池销毁标志，如果标记了线程池销毁，线程退出；并且通知所有线程。

（3）生产者线程抛出任务的接口。目的是构造一个任务，并把任务放到任务队列中，通知线程唤醒。

3.2、代码示例

thread_pool.h

#ifndef _THREAD_POOL_H#define _THREAD_POOL_Htypedefstructthread_pool_tthread_pool_t;
typedefvoid(*handler_pt)(void*);
enumTHREAD_POOL_ERROR_CODE{
THREAD_POOL_NULL=-1,
THREAD_MUTEX_FAIL=-2,
THREAD_POOL_CLOSED=-3,
THREAD_COND_FAIL=-4,
THREAD_POOL_TASK_QUE_FULL=-5,
THREAD_COND_SIGNAL_FAIL=-6,
THREAD_POOL_SUCCESS=0};
thread_pool_t*thread_pool_create(intthread_count,intqueue_size);
intthread_pool_destory(thread_pool_t*pool);
intthread_pool_post(thread_pool_t*pool,handler_ptfunc,void*arg);
intwait_pool_done(thread_pool_t*pool);
#endif

thread_pool.c

#include <pthread.h>#include <stdint.h>#include <stddef.h>#include <stdlib.h>#include "thread_pool.h"typedefstructtask_t{
handler_ptfunc;
void*arg;
} task_t;
typedefstructtask_queue_t{
uint32_thead;// 队列头索引uint32_ttail;// 队列尾索引uint32_tcount;//任务数量task_t*queue;//队列数组} task_queue_t;
structthread_pool_t{
pthread_mutex_tmutex;
pthread_cond_tcondition;
pthread_t*threads;
task_queue_ttask_queue;
intclosed;//销毁线程池标记intstarted;//当前运行的线程数intthread_count;
intqueue_size;
};
staticvoidthread_pool_free(thread_pool_t*pool)
{
if (pool==NULL||pool->started>0)
return;
if (pool->threads)
    {
free(pool->threads);
pool->threads=NULL;
pthread_mutex_lock(&(pool->mutex));
pthread_mutex_destroy(&(pool->mutex));
pthread_cond_destroy(&(pool->condition));
    }
if (pool->task_queue.queue)
    {
free(pool->task_queue.queue);
pool->task_queue.queue=NULL;
    }
free(pool);
}
staticvoid*thread_worker(void*thread_pool)
{
thread_pool_t*pool=thread_pool;
task_queue_t*que;
task_ttask;
for (;;)
    {
pthread_mutex_lock(&(pool->mutex));//加锁que=&(pool->task_queue);
// 判断虚假唤醒while (que->count==0&&pool->closed==0)
        {
// pthread_mutex_unlock(&(pool->mutex))// 阻塞在 condition// 唤醒信号===================================// 解除阻塞// pthread_mutex_lock(&(pool->mutex));pthread_cond_wait(&(pool->condition),&(pool->mutex));
        }
if (pool->closed==1)
break;
task=que->queue[que->head];//取出任务que->head= (que->head+1) %pool->queue_size;
que->count--;
pthread_mutex_unlock(&(pool->mutex));
        (*(task.func))(task.arg);
    }
pool->started--;
pthread_mutex_unlock(&(pool->mutex));
pthread_exit(NULL);
returnNULL;
}
thread_pool_t*thread_pool_create(intthread_count,intqueue_size)
{
if(thread_count<=0||queue_size<=0)
returnNULL;
thread_pool_t*pool;
pool=(thread_pool_t*)malloc(sizeof(*pool));
if(pool==NULL)
returnNULL;
pool->thread_count=0;//从0开始计数pool->queue_size=queue_size;
pool->task_queue.head=0;
pool->task_queue.tail=0;
pool->task_queue.count=0;
pool->started=pool->closed=0;
pool->threads=NULL;
pool->task_queue.queue=NULL;
pool->task_queue.queue= (task_t*)malloc(sizeof(task_t)*queue_size);
if (pool->task_queue.queue==NULL)
    {
//free poolthread_pool_free(pool);
returnNULL;
    }
pool->threads= (pthread_t*)malloc(sizeof(pthread_t)*thread_count);
if (pool->threads==NULL)
    {
//free poolthread_pool_free(pool);
returnNULL;
    }
inti=0;
for (i=0; i<thread_count; i++)
    {
if (pthread_create(&(pool->threads[i]), NULL,thread_worker, (void*)pool) !=0)
        {
//free poolthread_pool_free(pool);
returnNULL;
        }
pool->thread_count++;
pool->started++;
    }
returnpool;
}
intthread_pool_post(thread_pool_t*pool, handler_ptfunc, void*arg)
{
if (pool==NULL||func==NULL)
returnTHREAD_POOL_NULL;
task_queue_t*task_queue=&(pool->task_queue);//取出队列if (pthread_mutex_lock(&(pool->mutex)) !=0)
returnTHREAD_MUTEX_FAIL;
if (pool->closed)
    {
pthread_mutex_unlock(&(pool->mutex));
returnTHREAD_POOL_CLOSED;
    }
if (task_queue->count==pool->queue_size)
    {
pthread_mutex_unlock(&(pool->mutex));
returnTHREAD_POOL_TASK_QUE_FULL;
    }
task_queue->queue[task_queue->tail].func=func;
task_queue->queue[task_queue->tail].arg=arg;
task_queue->tail= (task_queue->tail+1) %pool->queue_size;
task_queue->count++;
if (pthread_cond_signal(&(pool->condition)) !=0)
    {
pthread_mutex_unlock(&(pool->mutex));
returnTHREAD_COND_SIGNAL_FAIL;
    }
pthread_mutex_unlock(&(pool->mutex));
return0;
}
intwait_pool_done(thread_pool_t*pool)
{
inti, ret=0;
for (i=0; i<pool->thread_count; i++)
    {
if (pthread_join(pool->threads[i], NULL) !=0)
ret=1;
    }
returnret;
}
intthread_pool_destory(thread_pool_t*pool)
{
if (pool==NULL)
returnTHREAD_POOL_NULL;
if (pthread_mutex_unlock(&(pool->mutex)) !=0)
returnTHREAD_MUTEX_FAIL;
if (pool->closed)
    {
thread_pool_free(pool);
returnTHREAD_POOL_CLOSED;
    }
pool->closed=1;
if(pthread_cond_broadcast(&(pool->condition))!=0||pthread_mutex_unlock(&(pool->mutex)) !=0)
    {
thread_pool_free(pool);
returnTHREAD_COND_FAIL;
    }
wait_pool_done(pool);
thread_pool_free(pool);
return0;
}

main.c，使用线程池：

#include <stdio.h>#include <stdlib.h>#include <pthread.h>#include <unistd.h>#include "thread_pool.h"intnum=0;
intdone=0;
pthread_mutex_tlock;
voiddo_task(void*arg)
{
usleep(3000);
pthread_mutex_lock(&lock);
done++;
printf("doing %d task\n", done);
pthread_mutex_unlock(&lock);
}
intmain(intargc,char**argv)
{
intthreads=4;
intqueue_size=256;
if (argc==2)
    {
queue_size=threads=atoi(argv[1]);
if (threads<=0||queue_size<=0)
        {
printf("threads number or queue size error: %d,%d\n", threads, queue_size);
return1;
        }
    }
thread_pool_t*pool=thread_pool_create(threads, queue_size);
if (pool==NULL)
returnTHREAD_POOL_NULL;
while (thread_pool_post(pool, &do_task, NULL) ==0)
    {
pthread_mutex_lock(&lock);
num++;
pthread_mutex_unlock(&lock);
    }
printf("add %d tasks\n", num);
wait_pool_done(pool);
printf("did %d tasks\n", done);
thread_pool_destory(pool);
return0;
}

四、哪些开源项目使用了线程池

4.1、nginx中的线程池

niginx github开源地址：https://github.com/nginx/nginx

nginx中线程池的作用是处理文件缓冲。 nginx线程池默认关闭，configure 时，需要 --with-threads

来指定。

线程池作用阶段：

（1）使用线程池的情况：nginx可以应用于静态web服务器，主要是处理文件缓冲，文件读写比较耗时。nginx推荐使用sendfile、directio、aio来处理耗时的任务，线程池不是重点推荐。nginx的耗时主要集中在文件操作，即compute阶段，这时可以使用线程池解决。

（2）使用线程池原因：磁盘IO读写比较耗时。nginx推荐使用sendfile、directio、aio来处理耗时的任务，线程池不是重点推荐。

（3）使用线程池：nginx线程池会有两个队列，任务消息队列和完成消息队列；任务消息队列存放发布的任务，将任务pull到线程池；线程池处理完会将结果push到完成消息队列，通知主线程获取结果。

static ngx_int_t
ngx_http_cache_thread_handler(ngx_thread_task_t *task, ngx_file_t *file)
{
  // 其他代码...
  task->event.data = r;
    task->event.handler = ngx_http_cache_thread_event_handler;
  if (ngx_thread_task_post(tp, task) != NGX_OK) {
        return NGX_ERROR;
    }
    // 其他代码...
}

调用逻辑：

4.2、redis中的线程池

redis是作为一个数据库，需要读写大量的数据、解析协议，这样一来读IO和写IO压力都非常大。因此，redis中线程池的作用是读写IO处理和数据包解析、压缩。

线程池作用阶段（recv、decode、encode、send）：

redis使用线程池的条件：有多个客户端并发请求，并且有读写IO的问题（写日志业务、读大量数据等）。

redis线程池运行原理：

主线程收集所有的读事件，并放到一个队列中；线程池为每个线程都准备一个自己线程的队列；然后主线程将收集的事件分发到线程池IO线程的队列中，线程池的线程从自己的队列中取出任务、执行任务；主线程既是生产者也是消费者，主线程处理Compute阶段的业务逻辑。

每个线程都有自己的队列原因：避免加锁。

总结

（1）线程池，就是固定线程数量，复用线程不销毁。

（2）线程池是一种生产者–消费者模型，某类任务特别耗时，会严重影响该线程处理其他任务，因此需要线程池。

（3）线程池是面向生产者的，生产者使用线程。线程池最好至少设计两个队列，任务队列和完成队列。

（4）通常，线程池的线程调度使用互斥锁（mutex）和条件变量(condition)。但也不一定非得使用condition（条件变量），可以只有互斥锁，比如redis，主线程获取mutex，其他线程会一直阻塞着，巧妙的利用互斥锁使其他线程休眠。

（5）线程池的线程数量选择，依据业务是IO密集型还是CPU密集型；假设CPU核心数为N，IO密集型一般设置为2*N+2，CPU密集型设为N。可以参考这样一个公式：（IO处理时间+CPU运行时间）*核心数/CPU运行时间。

（6）线程池的作用：复用线程资源，充分利用系统资源，异步执行耗时任务。

关注公众号《Lion 莱恩呀》随时随地学习技术。

掌握并行处理：理解并构建自己的线程池

一、线程池存在原因

二、线程池原理

2.1、线程调度

2.2、平衡选择

三、实现一个线程池

3.1、接口设计

3.2、代码示例

四、哪些开源项目使用了线程池

4.1、nginx中的线程池

4.2、redis中的线程池

总结

热门文章

最新文章

相关电子书

热门

活动广场

任务中心

开发者评测

高校计划

乘风者计划

训练营

阿里云MVP

话题

直播

下载

镜像站

技术资料

插件

掌握并行处理：理解并构建自己的线程池

一、线程池存在原因

二、线程池原理

2.1、线程调度

2.2、平衡选择

三、实现一个线程池

3.1、接口设计

3.2、代码示例

四、哪些开源项目使用了线程池

4.1、nginx中的线程池

4.2、redis中的线程池

总结

热门文章

最新文章

相关电子书