Comparison of Performance of Different poll implementations



epoll Scalability Web Page

Introduction

Davide Libenzi wrote an event poll implementation and described it on the /dev/epoll home page here. His performance testing led to the conclusion that epoll scaled linearly regardless of the load, where load is defined by the number of dead connections. However, the main obstacle to Linus Torvalds accepting epoll into the mainline Linux kernel was that the epoll interface lived in /dev. A new interface to epoll was therefore added via three new system calls. That interface is referred to hereafter as sys_epoll. Download sys_epoll here.


sys_epoll Interface


  • int epoll_create(int maxfds);

    The system call epoll_create() creates a sys_epoll "object" by allocating space for "maxfds" descriptors to be polled. The sys_epoll "object" is referenced by a file descriptor, and this enables the new interface to:

    • Maintain compatibility with the existing interface
    • Avoid the creation of an epoll_close() syscall
    • Reuse 95% of the existing code
    • Inherit the file* automatic clean up code

  • int epoll_ctl(int epfd, int op, int fd, unsigned int events);

    The system call epoll_ctl() is the controller interface. The "op" parameter is either EP_CTL_ADD, EP_CTL_DEL or EP_CTL_MOD. The parameter "fd" is the target of the operation. The last parameter, "events", is used in both EP_CTL_ADD and EP_CTL_MOD and represents the event interest mask.

  • int epoll_wait(int epfd, struct pollfd * events, int maxevents, int timeout);

    The system call epoll_wait() waits for events by allowing a maximum timeout, "timeout", in milliseconds and returns the number of events ( struct pollfd ) that the caller will find available at "*events".
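
    The three calls above are the prototype interface that was tested here. The epoll interface found in current Linux keeps the same create/control/wait split but differs in detail: the operations are named EPOLL_CTL_ADD, EPOLL_CTL_MOD and EPOLL_CTL_DEL, the interest mask is passed inside a struct epoll_event, and epoll_wait() fills an array of struct epoll_event rather than struct pollfd. The sketch below uses that current API, because it is the one that compiles on a present-day system. It is not the prototype described above, and the helper names watch_fd, wait_one and demo_pipe_event are hypothetical.

```c
#include <assert.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Register fd for readability on epfd; plays the role of EP_CTL_ADD above.
 * Returns 0 on success, -1 on error. */
int watch_fd(int epfd, int fd)
{
    struct epoll_event ev = {0};
    ev.events = EPOLLIN;  /* event interest mask */
    ev.data.fd = fd;      /* handed back to us by epoll_wait() */
    return epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev);
}

/* Wait up to timeout_ms for one event.
 * Returns the ready descriptor, or -1 on timeout or error. */
int wait_one(int epfd, int timeout_ms)
{
    struct epoll_event ev;
    int n = epoll_wait(epfd, &ev, 1, timeout_ms);
    return n == 1 ? ev.data.fd : -1;
}

/* Create, control, wait: returns 1 if the pipe's read end is reported
 * only after a byte has been written to it, 0 otherwise. */
int demo_pipe_event(void)
{
    int p[2];
    if (pipe(p) != 0)
        return 0;
    int epfd = epoll_create1(0);  /* current form of epoll_create() */
    assert(epfd >= 0);

    int ok = watch_fd(epfd, p[0]) == 0
          && wait_one(epfd, 0) == -1        /* nothing written yet */
          && write(p[1], "x", 1) == 1
          && wait_one(epfd, 100) == p[0];   /* now readable */

    close(p[0]);
    close(p[1]);
    close(epfd);
    return ok;
}
```

    Because the epoll object is itself a file descriptor, the sketch releases it with an ordinary close(), which is the property the epoll_create() description above relies on to avoid a dedicated epoll_close() call.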

sys_epoll Man Pages

Testing

We tested using two applications:

  • dphttpd
  • pipetest

dphttpd

Software

The http client is httperf from David Mosberger. Download httperf here. The http server is dphttpd from Davide Libenzi. Download dphttpd here. The deadconn client is also provided by Davide Libenzi. Download deadconn here.

Two client programs (deadconn_last and httperf) run on the client machine and establish connections to the HTTP server (dphttpd) running on the server machine. Connections established by deadconn_last are "dead". These send a single HTTP get request at connection setup and remain idle for the remainder of the test. Connections established by httperf are "active". These continuously send HTTP requests to the server at a fixed rate. httperf reports the rate at which the HTTP server replies to its requests. This reply rate is the metric reported on the Y-axis of the graphs below.

For the tests, the number of active connections is kept constant and the number of dead connections increased from 0 to 60000 (X-axis of graphs below). Consequently, dphttpd spends a fixed amount of time responding to requests and a variable amount of time looking for requests to service. The mechanism used to look for active connections amongst all open connections is one of standard poll(), /dev/epoll or sys_epoll. As the number of dead connections is increased, the scalability of these mechanisms starts impacting dphttpd's reply rate, measured by httperf.
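
Why dead connections cost anything at all can be seen from how poll() is used: the caller hands the kernel the full descriptor array on every call and then walks the full array again to find the few entries that are ready. The following is a minimal sketch of that pattern, using pipes as stand-ins for idle and active connections. It is not code from dphttpd, and the names count_ready and demo_dead_connections are hypothetical.

```c
#include <assert.h>
#include <poll.h>
#include <unistd.h>

/* Poll nfds descriptors once and count the readable ones.
 * Both the kernel's scan and the loop below visit every entry, so the
 * cost follows the number of open descriptors, not the number of active ones.
 * Returns the count, 0 on timeout, or -1 on error. */
int count_ready(struct pollfd *fds, nfds_t nfds, int timeout_ms)
{
    int n = poll(fds, nfds, timeout_ms);
    if (n <= 0)
        return n;

    int ready = 0;
    for (nfds_t i = 0; i < nfds; i++)
        if (fds[i].revents & POLLIN)
            ready++;
    return ready;
}

/* Open `dead` idle pipes plus one pipe with a byte pending, then count
 * what poll() reports. Returns that count, or -1 on error. */
int demo_dead_connections(int dead)
{
    enum { MAX = 256 };
    assert(dead >= 0 && dead < MAX);

    struct pollfd fds[MAX];
    int wr[MAX];
    int total = dead + 1, opened = 0, result = -1;

    for (; opened < total; opened++) {
        int p[2];
        if (pipe(p) != 0)
            break;
        fds[opened].fd = p[0];
        fds[opened].events = POLLIN;
        fds[opened].revents = 0;
        wr[opened] = p[1];
    }

    /* Only the last pipe is "active": it has one byte waiting. */
    if (opened == total && write(wr[dead], "x", 1) == 1)
        result = count_ready(fds, (nfds_t)total, 100);

    for (int i = 0; i < opened; i++) {
        close(fds[i].fd);
        close(wr[i]);
    }
    return result;
}
```

Whatever value is passed for dead, only one descriptor is reported ready, yet the array that is scanned grows with it. That is the variable part of the work that the dead-connection count on the X-axis is meant to expose.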

dphttpd SMP

Server

  • Hardware: 8-way PIII Xeon 700MHz, 2.5 GB RAM, 2048 KB L2 cache
  • OS : RedHat 7.3 with 2.5.44 kernel, patched with ONE of:
  • /proc/sys/fs/file-max = 131072
  • /proc/sys/net/ipv4/tcp_fin_timeout = 15
  • /proc/sys/net/ipv4/tcp_max_syn_backlog = 16384
  • /proc/sys/net/ipv4/tcp_tw_reuse = 1
  • /proc/sys/net/ipv4/tcp_tw_recycle = 1
  • /proc/sys/net/ipv4/ip_local_port_range = 1024 65535
  • # ulimit -n 131072
  • # dphttpd --maxfds 20000 --stksize 4096
  • default size of reply = 128 bytes
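
The "ulimit -n" entry above raises the shell's per-process limit on open descriptors, while /proc/sys/fs/file-max is the system-wide ceiling. A program can adjust its own per-process limit with setrlimit(). The sketch below is one way to do that, assuming the process is unprivileged and can therefore only lift its soft limit as far as the existing hard limit. The helper name raise_nofile_limit is hypothetical and is not part of dphttpd.

```c
#include <assert.h>
#include <sys/resource.h>

/* Raise the soft RLIMIT_NOFILE toward `wanted`, capped at the hard limit.
 * Returns the resulting soft limit, or 0 on error. */
rlim_t raise_nofile_limit(rlim_t wanted)
{
    assert(wanted > 0);

    struct rlimit rl;
    if (getrlimit(RLIMIT_NOFILE, &rl) != 0)
        return 0;

    if (rl.rlim_cur >= wanted)
        return rl.rlim_cur;  /* already high enough, leave it alone */

    /* An unprivileged process may not exceed its hard limit. */
    if (rl.rlim_max == RLIM_INFINITY || wanted < rl.rlim_max)
        rl.rlim_cur = wanted;
    else
        rl.rlim_cur = rl.rlim_max;

    if (setrlimit(RLIMIT_NOFILE, &rl) != 0)
        return 0;
    return rl.rlim_cur;
}
```

A server would call something like raise_nofile_limit(131072) at startup and compare the return value with the number of connections it intends to hold. If the result is too small, the hard limit has to be raised by an administrator first.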

Client

  • Hardware: 4-way PIII Xeon 500MHz, 3 GB RAM, 512 KB L2 cache
  • OS : RedHat 7.3 with 2.4.18-3smp kernel
  • /proc/sys/fs/file-max = 131072
  • /proc/sys/net/ipv4/tcp_fin_timeout = 15
  • /proc/sys/net/ipv4/tcp_tw_recycle = 1
  • /proc/sys/net/ipv4/tcp_max_syn_backlog = 16384
  • /proc/sys/net/ipv4/ip_local_port_range = 1024 65535
  • # ulimit -n 131072
  • # deadconn_last $SERVER $SVRPORT num_connections
    where num_connections is one of 0, 50, 100, 200, 400, 800, 1000, 5000, 10000, 15000, 20000, 25000, 30000, 35000, 40000, 45000, 50000, 55000, 60000.
  • After deadconn_last reports num_connections established
    # httperf --server=$SERVER --port=$SVRPORT --think-timeout 5 --timeout 5 --num-calls 20000 --num-conns 100 --hog --rate 100

Results for dphttpd SMP

dphttpd UP

Server

  • 1-way PIII, 866MHz, 256 MB RAM
  • OS: 2.5.44 gcc 2.96 (RedHat 7.3)
  • /proc/sys/fs/file-max = 65536
  • /proc/sys/net/ipv4/tcp_fin_timeout = 15
  • /proc/sys/net/ipv4/tcp_tw_recycle = 1
  • # ulimit -n 65536
  • # dphttpd --maxfds 20000 --stksize 4096
  • default size of reply = 128 bytes

Client

  • 1-way PIII, 866MHz, 256 MB RAM
  • OS: 2.4.18, gcc 2.96 (RedHat 7.3)
  • /proc/sys/fs/file-max = 65536
  • /proc/sys/net/ipv4/tcp_fin_timeout = 15
  • /proc/sys/net/ipv4/tcp_tw_recycle = 1
  • # ulimit -n 65536
  • # deadconn_last $SERVER $SVRPORT num_connections
    where num_connections is one of 0, 50, 100, 200, 400, 800, 1000, 2000, 4000, 6000, 8000, 10000, 12000, 14000, 16000.
  • After deadconn_last reports num_connections established
    # httperf --server=$SERVER --port=$SVRPORT --think-timeout 5 --timeout 5 --num-calls 20000 --num-conns 100 --hog --rate 100

Results for dphttpd UP



Pipetest

David Stevens added support for sys_epoll to Ben LaHaise's original pipetest.c application. Download Ben LaHaise's Ottawa Linux Symposium 2002 paper including pipetest.c here. Download David Stevens's patch to add sys_epoll to pipetest.c here.


Pipetest SMP

System and Configuration

  • 8-way PIII Xeon 900MHz, 4 GB RAM, 2048 KB L2 cache
  • OS: RedHat 7.2 with 2.5.44 kernel, patched with one of:
  • /proc/sys/fs/file-max = 65536
  • # ulimit -n 65536
  • # pipetest [poll|aio-poll|sys-epoll] num_pipes message_threads max_generation
    • where num_pipes is one of 10, 100, 500, or 1000-16000 in increments of 1000
    • message_threads is 1
    • max_generation is 300

Results for pipetest on an SMP

Results for Pipetest on a UP

Same hardware and configuration as the SMP pipetest above, with CONFIG_SMP = n being the only change.

sys_epoll stability comparisons Oct 30, 2002


Following are performance results comparing version 0.14 of sys_epoll (download v0.14 here) with v0.9, the version originally used for the performance testing outlined above (download v0.9 here). Testing was done using two measures: pipetest (details here) and dphttpd (details here).

Analysis and Conclusion

The system call interface to epoll performs as well as the /dev interface to epoll. In addition, sys_epoll scales better than poll and AIO poll for large numbers of connections. Other points in favour of sys_epoll are:

  • sys_epoll is compatible with synchronous read() and write(), which makes it usable with existing libraries that have not migrated to AIO.
  • Applications using poll() can be easily migrated to sys_epoll while continuing to use the existing I/O infrastructure.
  • sys_epoll will be invisible to people who don't want to use it.
  • sys_epoll has a low impact on the existing source code.

    arch/i386/kernel/entry.S  |    4
    fs/file_table.c           |    4
    fs/pipe.c                 |   36 +
    include/asm-i386/poll.h   |    1
    include/asm-i386/unistd.h |    3
    include/linux/fs.h        |    4
    include/linux/list.h      |    5
    include/linux/pipe_fs_i.h |    4
    include/linux/sys.h       |    2
    include/net/sock.h        |   12
    net/ipv4/tcp.c            |    4

Due to these factors, sys_epoll should be seriously considered for inclusion in the mainline Linux kernel.
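
The first two points in the list above, compatibility with synchronous I/O and easy migration from poll(), can be illustrated with a small sketch in which readiness comes from epoll while the data is still moved with an ordinary read(). It assumes the epoll API as it exists in current Linux rather than the prototype calls tested on this page, and the names add_reader, read_when_ready and demo_sync_read are hypothetical.

```c
#include <assert.h>
#include <sys/epoll.h>
#include <sys/types.h>
#include <unistd.h>

/* Register fd for readability, storing the fd itself as the event payload.
 * Returns 0 on success, -1 on error. */
int add_reader(int epfd, int fd)
{
    struct epoll_event ev = {0};
    ev.events = EPOLLIN;
    ev.data.fd = fd;
    return epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev);
}

/* Wait until a registered descriptor is readable, then fetch the data
 * with plain synchronous read(). Returns bytes read, 0 on timeout or
 * end of file, or -1 on error. */
ssize_t read_when_ready(int epfd, char *buf, size_t len, int timeout_ms)
{
    struct epoll_event ev;
    int n = epoll_wait(epfd, &ev, 1, timeout_ms);
    if (n <= 0)
        return n;
    return read(ev.data.fd, buf, len);  /* unchanged I/O path */
}

/* Write "ping" into a pipe and read it back through the two helpers.
 * Returns the number of bytes read, or -1 on error. */
int demo_sync_read(void)
{
    int p[2];
    char buf[8] = {0};
    if (pipe(p) != 0)
        return -1;
    int epfd = epoll_create1(0);
    assert(epfd >= 0);

    ssize_t got = -1;
    if (add_reader(epfd, p[0]) == 0 && write(p[1], "ping", 4) == 4)
        got = read_when_ready(epfd, buf, sizeof buf, 100);

    close(p[0]);
    close(p[1]);
    close(epfd);
    return (int)got;
}
```

Only the wait step differs from a poll()-based loop. The read() call and the buffer handling are what an existing application would already have, which is the sense in which the existing I/O infrastructure is kept.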

Acknowledgments

Thank you to Davide Libenzi, who wrote /dev/epoll, sys_epoll and dphttpd.
He was an all-around great guy to work with.
Thanks also to the following people who helped with testing and this web site:
Shailabh Nagar, Paul Larson, Hanna Linder, and David Stevens.



