
MOM series - all about zero copy (Part 1)


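    I recently prepared two articles introducing zero copy, a key technique in MOM (message-oriented middleware), at both the physical and the logical level.

    In the journal designs of file-based MOMs such as Kafka and ActiveMQ, and of others such as HornetQ and Kestrel, zero copy shows its power everywhere. So I have prepared a series of articles that I hope will lift the veil on zero copy, and I hope you enjoy them.

    This article focuses on the basics of zero copy. First, a guided reading of some English material explains the underlying principle: why zero copy can improve the performance of some IO-intensive applications, and why it cuts context switches from 4 to 2 and data copies from 4 to 3 (note: only one of which consumes CPU cycles). Next comes a brief introduction to zero-copy design in the Java world, Netty in particular. Finally, a few pieces of extended reading broaden the view with some of the technical research and results on zero copy from engineers abroad. OK, let's begin~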

Zero copy View


    Many Web applications serve a significant amount of static content, which amounts to reading data off of a disk and writing the exact same data back to the response socket. This activity might appear to require relatively little CPU activity, but it's somewhat inefficient: the kernel reads the data off of disk and pushes it across the kernel-user boundary to the application, and then the application pushes it back across the kernel-user boundary to be written out to the socket. In effect, the application serves as an inefficient intermediary that gets the data from the disk file to the socket.
    Each time data traverses the user-kernel boundary, it must be copied, which consumes CPU cycles and memory bandwidth. Fortunately, you can eliminate these copies through a technique called — appropriately enough — zero copy. Applications that use zero copy request that the kernel copy the data directly from the disk file to the socket, without going through the application. Zero copy greatly improves application performance and reduces the number of context switches between kernel and user mode.
    
    Below, taking data transfer as the example, we compare the traditional and the zero-copy transfer paths:

    traditional approach
                                   

    Figure 1: Traditional data copying approach

    Figure 2: Traditional context switching


    The steps involved are:
    The read() call causes a context switch (see Figure 2) from user mode to kernel mode. Internally a sys_read() (or equivalent) is issued to read the data from the file. The first copy (see Figure 1) is performed by the direct memory access (DMA) engine, which reads file contents from the disk and stores them into a kernel address space buffer.
    The requested amount of data is copied from the read buffer into the user buffer, and the read() call returns. The return from the call causes another context switch from kernel back to user mode. Now the data is stored in the user address space buffer.
    The send() socket call causes a context switch from user mode to kernel mode. A third copy is performed to put the data into a kernel address space buffer again. This time, though, the data is put into a different buffer, one that is associated with the destination socket.
    The send() system call returns, creating the fourth context switch. Independently and asynchronously, a fourth copy happens as the DMA engine passes the data from the kernel buffer to the protocol engine.
Use of the intermediate kernel buffer (rather than a direct transfer of the data into the user buffer) might seem inefficient. But intermediate kernel buffers were introduced into the process to improve performance. Using the intermediate buffer on the read side allows the kernel buffer to act as a "readahead cache" when the application hasn't asked for as much data as the kernel buffer holds. This significantly improves performance when the requested data amount is less than the kernel buffer size. The intermediate buffer on the write side allows the write to complete asynchronously.
    Unfortunately, this approach itself can become a performance bottleneck if the size of the data requested is considerably larger than the kernel buffer size. The data gets copied multiple times among the disk, kernel buffer, and user buffer before it is finally delivered to the application.
Zero copy improves performance by eliminating these redundant data copies.
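
    The traditional path described above corresponds to the familiar read-then-write loop. The following is a minimal Java sketch of that loop, not code from the original article: the class name TraditionalCopy and the method copy are hypothetical, and a ByteArrayOutputStream stands in for the destination socket so the example runs without a network. Every pass through the loop moves the data into a user-space buffer and back out again, which is the pair of CPU copies the text calls redundant.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.FileChannel;
import java.nio.channels.WritableByteChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class TraditionalCopy {

    // Copies a file to `out` through a buffer owned by the application.
    // Returns the number of bytes written.
    public static long copy(Path file, WritableByteChannel out) throws IOException {
        long total = 0;
        ByteBuffer userBuffer = ByteBuffer.allocate(8192); // buffer in user space
        try (FileChannel in = FileChannel.open(file, StandardOpenOption.READ)) {
            // read(): data is copied from the kernel read buffer into userBuffer
            while (in.read(userBuffer) != -1) {
                userBuffer.flip();
                while (userBuffer.hasRemaining()) {
                    // write(): data is copied from userBuffer back into a kernel buffer
                    total += out.write(userBuffer);
                }
                userBuffer.clear();
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("traditional", ".bin");
        Files.write(tmp, new byte[20_000]);
        // Assumption: an in-memory sink replaces the socket for this demo.
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        long n = copy(tmp, Channels.newChannel(sink));
        System.out.println("copied " + n + " bytes");
        Files.delete(tmp);
    }
}
```

    In a real server the target would typically be a SocketChannel; the loop itself would look the same.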

    zero copy approach


                                    


    Figure 3: Zero copy data copying approach

    Figure 4: Zero copy context switching

   The steps taken when you use transferTo() are:
   The transferTo() method causes the file contents to be copied into a read buffer by the DMA engine. Then the data is copied by the kernel into the kernel buffer associated with the output socket.
   The third copy happens as the DMA engine passes the data from the kernel socket buffers to the protocol engine.
This is an improvement: we've reduced the number of context switches from four to two and reduced the number of data copies from four to three (only one of which involves the CPU). But this does not yet get us to our goal of zero copy. We can further reduce the data duplication done by the kernel if the underlying network interface card supports gather operations. In Linux kernels 2.4 and later, the socket buffer descriptor was modified to accommodate this requirement. This approach not only reduces multiple context switches but also eliminates the duplicated data copies that require CPU involvement. The user-side usage still remains the same, but the intrinsics have changed:
   The transferTo() method causes the file contents to be copied into a kernel buffer by the DMA engine.
No data is copied into the socket buffer. Instead, only descriptors with information about the location and length of the data are appended to the socket buffer. The DMA engine passes data directly from the kernel buffer to the protocol engine, thus eliminating the remaining final CPU copy.
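
   From the application's side, the transferTo() path described above might look like the following Java sketch. It is an illustration under stated assumptions, not the listing from the quoted text: the class name ZeroCopyTransfer and the method send are hypothetical, and the demo transfers into a temporary file rather than a socket so that it runs without a network. Whether the JVM actually avoids the CPU copies depends on the operating system and on the type of the target channel.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.WritableByteChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class ZeroCopyTransfer {

    // Sends the whole file to `target` without reading it into a user-space buffer.
    // Returns the number of bytes transferred.
    public static long send(Path file, WritableByteChannel target) throws IOException {
        try (FileChannel in = FileChannel.open(file, StandardOpenOption.READ)) {
            long position = 0;
            long size = in.size();
            // transferTo() may move fewer bytes than requested, so loop until done.
            // (With a non-blocking target this loop would need back-off handling.)
            while (position < size) {
                position += in.transferTo(position, size - position, target);
            }
            return position;
        }
    }

    public static void main(String[] args) throws IOException {
        Path src = Files.createTempFile("zerocopy-src", ".bin");
        Path dst = Files.createTempFile("zerocopy-dst", ".bin");
        Files.write(src, new byte[20_000]);
        // Assumption: a file channel replaces the socket channel for this demo.
        try (FileChannel out = FileChannel.open(dst, StandardOpenOption.WRITE)) {
            long n = send(src, out);
            System.out.println("transferred " + n + " bytes");
        }
        Files.delete(src);
        Files.delete(dst);
    }
}
```

   Note that the application never sees the data: it only names the source, the range, and the target, which is why the user-side usage stays the same whether or not the NIC supports gather operations.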

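   On zero-copy performance:
        Michael Santy (http://zeromq.org/results:copying) ran some experiments: for 256 MB of data, a single data copy added a latency of 0.1 second, which shows how much room for improvement there is when transferring large amounts of data.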

   Zero copy in Java


      In Java, zero-copy support is concentrated in FileChannel and MappedByteBuffer. Correspondingly, in the familiar networking framework Netty 4, it is concentrated in FileRegion and CompositeByteBuf.
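
      As a rough illustration of the MappedByteBuffer side, the sketch below maps a file into memory with FileChannel.map() and reads it in place, so the application works on the mapped pages instead of copying them into its own buffer with read(). The class name MappedRead and the method sum are hypothetical names chosen for this example. Netty's FileRegion and CompositeByteBuf are not shown, since they need the Netty library on the classpath.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class MappedRead {

    // Sums the unsigned byte values of a file by reading it through a memory mapping.
    public static long sum(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            // Map the whole file read-only; no explicit read() into a user buffer.
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            long sum = 0;
            while (buf.hasRemaining()) {
                sum += buf.get() & 0xFF; // treat each byte as unsigned
            }
            return sum;
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("mapped", ".bin");
        tmp.toFile().deleteOnExit(); // a mapped file may not be deletable right away on some platforms
        Files.write(tmp, new byte[] {1, 2, 3, (byte) 250});
        System.out.println("sum = " + sum(tmp)); // prints "sum = 256"
    }
}
```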

     Zero copy readings


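      Reference 2 covers the Sockets Direct Protocol (SDP). The authors improved the original open-source SDP implementation by adding zero-copy support for synchronous operations to its TCP SOCK_STREAM semantics. They claim that with 8 simultaneous connections on the same host, CPU utilization dropped by a factor of 8, at the sole cost of bandwidth load growing from 500 MB/s to 800 MB/s. SDP is widely used on InfiniBand, a low-latency, high-bandwidth data center interconnect architecture that uses RDMA for high-performance IPC and serves a broad range of communication-critical computing environments, including HPC systems, large data centers, and embedded applications.
      Reference 3 describes how zero-copy buffers are allocated in the MoD scenario (static or dynamic allocation, before the transmission starts). Static allocation avoids per-operation allocation of memory and so lowers the per-packet cost (such as CPU cycles).
      Reference 4 presents the Advanced Data Transfer Service (ADTS), a zero-copy-based design for efficient FTP across wide-area networks. According to their test data, this strategy gives a speedup of nearly 80% when transferring large data.
      Reference 5 splits zero copy into two forms: passive zero-copy is a good fit for applications with deterministic communication timing and sizes, while active zero copy is the opposite (it suits the non-deterministic ones...).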
     
      OK, that is about it for this article. In the next one I will dig into zero-copy internals through code, monitoring, and other means~

References:
