C++11的thread代码分析

简介:

本文分析的是llvm libc++的实现:http://libcxx.llvm.org/

class thread

thread类直接包装了一个pthread_t,在linux下实际是unsigned long int。

class  thread
{
    pthread_t __t_;
    id get_id() const _NOEXCEPT {return __t_;}
}

用了一个std::unique_ptr来包装用户定义的线程函数:

创建线程用的是

template <class _Fp>
void*
__thread_proxy(void* __vp)
{
    __thread_local_data().reset(new __thread_struct);
    std::unique_ptr<_Fp> __p(static_cast<_Fp*>(__vp));
    (*__p)();
    return nullptr;
}

template <class _Fp>
thread::thread(_Fp __f)
{
    std::unique_ptr<_Fp> __p(new _Fp(__f));
    int __ec = pthread_create(&__t_, 0, &__thread_proxy<_Fp>, __p.get());
    if (__ec == 0)
        __p.release();
    else
        __throw_system_error(__ec, "thread constructor failed");
}

thread::joinable() , thread::join(), thread::detach() 

再来看下thread::joinable() , thread::join(), thread::detach() 函数。

也是相应调用了posix的函数。在调用join()之后,会把_t设置为0,这样再调用joinable()时就会返回false。对于_t变量没有memory barrier同步,感觉可能会有问题。

bool joinable() const {return __t_ != 0;}
void
thread::join()
{
    int ec = pthread_join(__t_, 0);
    __t_ = 0;
}

void
thread::detach()
{
    int ec = EINVAL;
    if (__t_ != 0)
    {
        ec = pthread_detach(__t_);
        if (ec == 0)
            __t_ = 0;
    }
    if (ec)
        throw system_error(error_code(ec, system_category()), "thread::detach failed");
}

thread::hardware_concurrency()

thread::hardware_concurrency()函数,获取的是当前可用的processor的数量。

调用的是sysconf(_SC_NPROCESSORS_ONLN)函数,据man手册:

        - _SC_NPROCESSORS_ONLN
              The number of processors currently online (available).

unsigned
thread::hardware_concurrency() _NOEXCEPT
{
    long result = sysconf(_SC_NPROCESSORS_ONLN);
    // sysconf returns -1 if the name is invalid, the option does not exist or
    // does not have a definite limit.
    // if sysconf returns some other negative number, we have no idea
    // what is going on. Default to something safe.
    if (result < 0)
        return 0;
    return static_cast<unsigned>(result);
}

thread::sleep_for和thread::sleep_until

sleep_for函数实际调用的是nanosleep函数:

void
sleep_for(const chrono::nanoseconds& ns)
{
    using namespace chrono;
    if (ns > nanoseconds::zero())
    {
        seconds s = duration_cast<seconds>(ns);
        timespec ts;
        typedef decltype(ts.tv_sec) ts_sec;
        _LIBCPP_CONSTEXPR ts_sec ts_sec_max = numeric_limits<ts_sec>::max();
        if (s.count() < ts_sec_max)
        {
            ts.tv_sec = static_cast<ts_sec>(s.count());
            ts.tv_nsec = static_cast<decltype(ts.tv_nsec)>((ns-s).count());
        }
        else
        {
            ts.tv_sec = ts_sec_max;
            ts.tv_nsec = giga::num - 1;
        }

        while (nanosleep(&ts, &ts) == -1 && errno == EINTR)
            ;
    }
}
sleep_until函数用到了mutex, condition_variable, unique_lock,实际上调用的还是pthread_cond_timedwait函数:

template <class _Clock, class _Duration>
void
sleep_until(const chrono::time_point<_Clock, _Duration>& __t)
{
    using namespace chrono;
    mutex __mut;
    condition_variable __cv;
    unique_lock<mutex> __lk(__mut);
    while (_Clock::now() < __t)
        __cv.wait_until(__lk, __t);
}

void
condition_variable::__do_timed_wait(unique_lock<mutex>& lk,
     chrono::time_point<chrono::system_clock, chrono::nanoseconds> tp) _NOEXCEPT
{
    using namespace chrono;
    if (!lk.owns_lock())
        __throw_system_error(EPERM,
                            "condition_variable::timed wait: mutex not locked");
    nanoseconds d = tp.time_since_epoch();
    if (d > nanoseconds(0x59682F000000E941))
        d = nanoseconds(0x59682F000000E941);
    timespec ts;
    seconds s = duration_cast<seconds>(d);
    typedef decltype(ts.tv_sec) ts_sec;
    _LIBCPP_CONSTEXPR ts_sec ts_sec_max = numeric_limits<ts_sec>::max();
    if (s.count() < ts_sec_max)
    {
        ts.tv_sec = static_cast<ts_sec>(s.count());
        ts.tv_nsec = static_cast<decltype(ts.tv_nsec)>((d - s).count());
    }
    else
    {
        ts.tv_sec = ts_sec_max;
        ts.tv_nsec = giga::num - 1;
    }
    int ec = pthread_cond_timedwait(&__cv_, lk.mutex()->native_handle(), &ts);
    if (ec != 0 && ec != ETIMEDOUT)
        __throw_system_error(ec, "condition_variable timed_wait failed");
}

std::notify_all_at_thread_exit 的实现

先来看个例子,这个notify_all_at_thread_exit函数到底有什么用:

#include <mutex>
#include <thread>
#include <condtion_variable>
 
std::mutex m;
std::condition_variable cv;
 
bool ready = false;
ComplexType result;  // some arbitrary type
 
void thread_func()
{
    std::unique_lock<std::mutex> lk(m);
    // assign a value to result using thread_local data
    result = function_that_uses_thread_locals();
    ready = true;
    std::notify_all_at_thread_exit(cv, std::move(lk));
} // 1. destroy thread_locals, 2. unlock mutex, 3. notify cv
 
int main()
{
    std::thread t(thread_func);
    t.detach();
 
    // do other work
    // ...
 
    // wait for the detached thread
    std::unique_lock<std::mutex> lk(m);
    while(!ready) {
        cv.wait(lk);
    }
    process(result); // result is ready and thread_local destructors have finished
}

可以看到std::notify_all_at_thread_exit 函数,实际上是注册了一对condition_variable,mutex,当线程退出时,notify_all。

下面来看下具体的实现:

这个是通过Thread-specific Data来实现的,具体可以参考:http://www.ibm.com/developerworks/cn/linux/thread/posix_threadapi/part2/

但我个人觉得这个应该叫线程特定数据比较好,因为它是可以被别的线程访问的,而不是某个线程”专有“的。

简而言之,std::thread在构造的时候,创建了一个__thread_struct_imp对象。

__thread_struct_imp对象里,用一个vector来保存了pair<condition_variable*, mutex*>

class  __thread_struct_imp
{
    typedef vector<__assoc_sub_state*,
                          __hidden_allocator<__assoc_sub_state*> > _AsyncStates;
<strong>    typedef vector<pair<condition_variable*, mutex*>,
               __hidden_allocator<pair<condition_variable*, mutex*> > > _Notify;</strong>

    _AsyncStates async_states_;
    _Notify notify_;

当调用notify_all_at_thread_exit函数时,把condition_variable和mutex,push到vector里

void
__thread_struct_imp::notify_all_at_thread_exit(condition_variable* cv, mutex* m)
{
    notify_.push_back(pair<condition_variable*, mutex*>(cv, m));
}

当线程退出时,会delete掉__thread_struct_imp,也就是会调用__thread_struct_imp的析构函数。

在析构函数里,会调用历遍vector,unlock每个mutex,和调用condition_variable.notify_all()函数

__thread_struct_imp::~__thread_struct_imp()
{
    for (_Notify::iterator i = notify_.begin(), e = notify_.end();
            i != e; ++i)
    {
        i->second->unlock();
        i->first->notify_all();
    }
    for (_AsyncStates::iterator i = async_states_.begin(), e = async_states_.end();
            i != e; ++i)
    {
        (*i)->__make_ready();
        (*i)->__release_shared();
    }
}

更详细的一些封闭代码,我提取出来放到了gist上:https://gist.github.com/hengyunabc/d48fbebdb9bddcdf05e9


其它的一些东东:

关于线程的yield, detch, join,可以直接参考man文档:

pthread_yield:

       pthread_yield() causes the calling thread to relinquish the CPU.  The
       thread is placed at the end of the run queue for its static priority
       and another thread is scheduled to run.  For further details, see
       sched_yield(2)
pthread_detach:

       The pthread_detach() function marks the thread identified by thread
       as detached.  When a detached thread terminates, its resources are
       automatically released back to the system without the need for
       another thread to join with the terminated thread.

       Attempting to detach an already detached thread results in
       unspecified behavior.
pthread_join:

       The pthread_join() function waits for the thread specified by thread
       to terminate.  If that thread has already terminated, then
       pthread_join() returns immediately.  The thread specified by thread
       must be joinable.

总结:

个人感觉像 join, detach这两个函数实际没多大用处。绝大部分情况下,线程创建之后,都应该detach掉。

像join这种同步机制不如换mutex等更好。

参考:

http://en.cppreference.com/w/cpp/thread/notify_all_at_thread_exit

http://man7.org/linux/man-pages/man3/pthread_detach.3.html

http://man7.org/linux/man-pages/man3/pthread_join.3.html

http://stackoverflow.com/questions/19744250/c11-what-happens-to-a-detached-thread-when-main-exits

http://man7.org/linux/man-pages/man3/pthread_yield.3.html

http://man7.org/linux/man-pages/man2/sched_yield.2.html

http://www.ibm.com/developerworks/cn/linux/thread/posix_threadapi/part2/

man pthread_key_create


目录
相关文章
|
Linux C语言
Linux入门教程:centos升级glibc至2.18,
官方的glibc源只更新到2.12版,很多业务需要升级到更高级版,这里介绍编译glibc升级的方式。
4438 0
|
4月前
|
存储 分布式计算 数据建模
淘宝闪购基于阿里云 EMR Serverless Spark&Paimon的湖仓实践:超大规模下的特征生产&多维分析双提效
本文介绍阿里云 Serverless Spark + Paimon 在淘宝闪购大数据湖仓场景的应用。
|
8月前
|
人工智能 运维 监控
Flink 智能调优:从人工运维到自动化的实践之路
本文由阿里云Flink产品专家黄睿撰写,基于平台实践经验,深入解析流计算作业资源调优难题。针对人工调优效率低、业务波动影响大等挑战,介绍Flink自动调优架构设计,涵盖监控、定时、智能三种模式,并融合混合计费实现成本优化。展望未来AI化方向,推动运维智能化升级。
951 8
Flink 智能调优:从人工运维到自动化的实践之路
|
11月前
|
存储 SQL Cloud Native
热烈祝贺 Flink 2.0 存算分离入选 VLDB 2025
Apache Flink 2.0架构实现重大突破,论文《Disaggregated State Management in Apache Flink® 2.0》被VLDB 2025收录。该研究提出解耦式状态管理架构,通过异步执行框架与全新存储引擎ForSt,实现状态与计算分离,显著提升扩展性、容错能力与资源效率,推动Flink向云原生演进,开启流计算新时代。
1438 1
热烈祝贺 Flink 2.0 存算分离入选 VLDB 2025
|
10月前
|
存储 分布式计算 数据处理
「48小时极速反馈」阿里云实时计算Flink广招天下英雄
阿里云实时计算Flink团队,全球领先的流计算引擎缔造者,支撑双11万亿级数据处理,推动Apache Flink技术发展。现招募Flink执行引擎、存储引擎、数据通道、平台管控及产品经理人才,地点覆盖北京、杭州、上海。技术深度参与开源核心,打造企业级实时计算解决方案,助力全球企业实现毫秒洞察。
881 0
「48小时极速反馈」阿里云实时计算Flink广招天下英雄
|
11月前
|
分布式计算 Java 流计算
Fluss on 鲲鹏 openEuler 大数据实战
本文介绍了基于华为鲲鹏ARM架构服务器与openEuler操作系统,构建包含HDFS、ZooKeeper、Flink、Fluss及Paimon的实时大数据环境的完整实战过程。涵盖了软硬件配置、组件部署、集群规划、环境变量设置、安全认证及启停脚本编写等内容,适用于企业级实时数据平台搭建与运维场景。
1537 0
Fluss on 鲲鹏 openEuler 大数据实战
|
存储 关系型数据库 MySQL
PostgreSQL与MySQL优劣势比较浅谈
PostgreSQL与MySQL优劣势比较浅谈
3235 0

热门文章

最新文章