Network data flow: first, let's look at where a user-space protocol stack sits.
Here DPDK is used as the example. (The goal is to obtain raw network data; besides DPDK, raw sockets and netmap can also capture Ethernet frames.)
1 Default data flow
By default, network data passes from the physical NIC through the kernel protocol stack and the VFS before finally reaching the application (APP).
2 DPDK
DPDK takes over the NIC: it can deliver packets to a user-space protocol stack, or hand them back to the kernel's sk_buff path.
Because of this, a user-space protocol stack built on DPDK can read and write application memory directly, control the network data flow more flexibly, and implement more custom functionality; it also avoids the overhead of system calls and kernel/user context switches, which lowers packet latency and CPU usage and raises throughput.
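A quick way to see which of the two paths a given NIC is currently on is the devbind tool shipped with DPDK; the sketch below assumes you are inside the dpdk-stable-19.08.2 source tree used in the rest of this article:
```bash
# List every NIC and the driver it is currently bound to.
# Ports under "Network devices using kernel driver" still go through the
# kernel protocol stack; ports under "Network devices using DPDK-compatible
# driver" have been taken over by DPDK.
./usertools/dpdk-devbind.py --status
```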
DPDK build and configuration
Set the environment variables:
sudo su
cd into dpdk-stable-19.08.2
export RTE_SDK=/path/to/dpdk-stable-19.08.2/
export RTE_TARGET=x86_64-native-linux-gcc
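If you would rather script the build than drive it through the interactive menu, the legacy make system in DPDK 19.08 can build the same target directly; a minimal sketch, assuming the variables above are set:
```bash
# Build and install the target in-tree; this produces the same
# $RTE_SDK/$RTE_TARGET directory that option 39 of dpdk-setup.sh builds.
cd "$RTE_SDK"
make install T="$RTE_TARGET" -j"$(nproc)"
```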
Run ./usertools/dpdk-setup.sh. Some of the options are explained below:
Step 1: Select the DPDK environment to build // pick a build target (toolchain/architecture combination)
[1] arm64-armada-linuxapp-gcc
[2] arm64-armada-linux-gcc
[3] arm64-armv8a-linuxapp-clang
[4] arm64-armv8a-linuxapp-gcc
[5] arm64-armv8a-linux-clang
[6] arm64-armv8a-linux-gcc
[7] arm64-bluefield-linuxapp-gcc
[8] arm64-bluefield-linux-gcc
[9] arm64-dpaa2-linuxapp-gcc
[10] arm64-dpaa2-linux-gcc
[11] arm64-dpaa-linuxapp-gcc
[12] arm64-dpaa-linux-gcc
[13] arm64-octeontx2-linuxapp-gcc
[14] arm64-octeontx2-linux-gcc
[15] arm64-stingray-linuxapp-gcc
[16] arm64-stingray-linux-gcc
[17] arm64-thunderx2-linuxapp-gcc
[18] arm64-thunderx2-linux-gcc
[19] arm64-thunderx-linuxapp-gcc
[20] arm64-thunderx-linux-gcc
[21] arm64-xgene1-linuxapp-gcc
[22] arm64-xgene1-linux-gcc
[23] arm-armv7a-linuxapp-gcc
[24] arm-armv7a-linux-gcc
[25] i686-native-linuxapp-gcc
[26] i686-native-linuxapp-icc
[27] i686-native-linux-gcc
[28] i686-native-linux-icc
[29] ppc_64-power8-linuxapp-gcc
[30] ppc_64-power8-linux-gcc
[31] x86_64-native-bsdapp-clang
[32] x86_64-native-bsdapp-gcc
[33] x86_64-native-freebsd-clang
[34] x86_64-native-freebsd-gcc
[35] x86_64-native-linuxapp-clang
[36] x86_64-native-linuxapp-gcc
[37] x86_64-native-linuxapp-icc
[38] x86_64-native-linux-clang
[39] x86_64-native-linux-gcc ==I pick x86_64-native-linux-gcc here, since my system is Ubuntu Server x64==
[40] x86_64-native-linux-icc
[41] x86_x32-native-linuxapp-gcc
[42] x86_x32-native-linux-gcc
Step 2: Setup linux environment
[43] Insert IGB UIO module // load the igb_uio user-space I/O (UIO) module; once a NIC is bound to it, DPDK can drive the (typically Intel) NIC directly from user space.
[44] Insert VFIO module // load the VFIO module, which allows a physical device to be assigned for direct access (e.g. passed through to user space or a VM) with good I/O performance and stronger isolation and security.
[45] Insert KNI module // load the Kernel NIC Interface (KNI) module, which supports passing packets between user space and kernel space.
[46] Setup hugepage mappings for non-NUMA systems // reserve hugepages
[47] Setup hugepage mappings for NUMA systems
// NUMA systems are a multi-processor architecture in which multiple cores and memory banks are accessed through one unified address space, with non-uniform access latency.
// If you receive 10G traffic with only 4 KB pages, the page tables are consulted constantly and pages are swapped in and out, which is inefficient; reserving hugepages sized for the actual workload is well worth it here.
[48] Display current Ethernet/Baseband/Crypto device settings
// Display the current Ethernet/baseband/crypto device settings, i.e. hardware settings such as a NIC's speed, duplex mode, MAC address, and encryption protocols.
[49] Bind Ethernet/Baseband/Crypto device to IGB UIO module // bind an Ethernet/baseband/crypto device to the igb_uio module
[50] Bind Ethernet/Baseband/Crypto device to VFIO module // bind an Ethernet/baseband/crypto device to the VFIO module
[51] Setup VFIO permissions // set permissions on the VFIO device nodes so the device can be used (e.g. from inside a VM)
Step 3: Run test application for linux environment
[52] Run test application ($RTE_TARGET/app/test)
[53] Run testpmd application in interactive mode ($RTE_TARGET/app/testpmd)
Step 4: Other tools
[54] List hugepage info from /proc/meminfo
Step 5: Uninstall and system cleanup
[55] Unbind devices from IGB UIO or VFIO driver
[56] Remove IGB UIO module
[57] Remove VFIO module
[58] Remove KNI module
[59] Remove hugepage mappings
[60] Exit Script
Option: // enter 39 and press Enter here to build for the environment chosen in Step 1 ([39] x86_64-native-linux-gcc)
You only need to build once. After the build finishes, run the Step 2 items as needed; once they are configured you can run the test applications from Step 3. Steps 4 and 5 can be used as the situation requires.
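For reference, the Step 2 items can also be performed without the interactive menu. The commands below are a rough, hand-written equivalent (a sketch, not output from the script), assuming the build above, a 2 MB hugepage system, and the NIC at PCI address 0000:0b:00.0 that appears later in this article; adjust the address and interface name to your machine:
```bash
# [43] load UIO support plus the igb_uio module built for $RTE_TARGET
modprobe uio
insmod "$RTE_SDK/$RTE_TARGET/kmod/igb_uio.ko"

# [45] load the KNI module (only if you need kernel <-> user-space packet exchange)
insmod "$RTE_SDK/$RTE_TARGET/kmod/rte_kni.ko"

# [46] reserve 512 x 2 MB hugepages and mount hugetlbfs (non-NUMA case)
echo 512 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
mkdir -p /mnt/huge
mount -t hugetlbfs nodev /mnt/huge

# [49] bring the matching interface down, then bind the NIC to igb_uio
# so that DPDK takes it over (eth1 / 0000:0b:00.0 in the session below)
ifconfig eth1 down
./usertools/dpdk-devbind.py --bind=igb_uio 0000:0b:00.0
```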
A log of the ./usertools/dpdk-setup.sh shell steps
- Enter 43 to load the UIO module:
```bash
Option: 43
Unloading any existing DPDK UIO module
Loading uio module
Loading DPDK UIO module
```
- Enter 44 to load the VFIO module:
```bash
Option: 44
Unloading any existing VFIO module
Loading VFIO module
chmod /dev/vfio
OK
```
- Enter 45 to load the KNI module:
```bash
Option: 45
Unloading any existing DPDK KNI module
Loading DPDK KNI module
```
- Enter 46 to set up hugepages:
```bash
Option: 46
Removing currently reserved hugepages
Unmounting /mnt/huge and removing directory
Input the number of 1048576kB hugepages
Example: to have 128MB of hugepages available in a 2MB huge page system,
enter '64' to reserve 64 * 2MB pages
Number of pages: 512
Reserving hugepages
Creating /mnt/huge and mounting as hugetlbfs
```
- Enter 47 to set up hugepages for each NUMA node:
```bash
Option: 47
Removing currently reserved hugepages
Unmounting /mnt/huge and removing directory
Input the number of 1048576kB hugepages for each node
Example: to have 128MB of hugepages available per node in a 2MB huge page system,
enter '64' to reserve 64 * 2MB pages on each node
Number of pages for node0: 512
Reserving hugepages
Creating /mnt/huge and mounting as hugetlbfs
```
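Either way, the reservation can be verified outside the script (this is essentially what option 54 prints); a quick check:
```bash
# Total/free hugepage counters and the per-node reservation
# (adjust hugepages-2048kB to the page size actually in use,
#  e.g. hugepages-1048576kB for 1 GB pages).
grep Huge /proc/meminfo
cat /sys/devices/system/node/node0/hugepages/hugepages-2048kB/nr_hugepages

# Confirm the hugetlbfs mount that DPDK will allocate from.
mount | grep huge
```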
- Enter 48 to display the devices:
```bash
Network devices using kernel driver
===================================
0000:02:01.0 '82545EM Gigabit Ethernet Controller (Copper) 100f' if=eth2 drv=e1000 unused=igb_uio,vfio-pci
0000:02:06.0 '82545EM Gigabit Ethernet Controller (Copper) 100f' if=eth3 drv=e1000 unused=igb_uio,vfio-pci
0000:03:00.0 'VMXNET3 Ethernet Controller 07b0' if=eth0 drv=vmxnet3 unused=igb_uio,vfio-pci
0000:0b:00.0 'VMXNET3 Ethernet Controller 07b0' if=eth1 drv=vmxnet3 unused=igb_uio,vfio-pci *Active*
No 'Baseband' devices detected
==============================
No 'Crypto' devices detected
============================
No 'Eventdev' devices detected
==============================
No 'Mempool' devices detected
=============================
No 'Compress' devices detected
==============================
No 'Misc (rawdev)' devices detected
===================================
```
- Enter 49 to change the device binding; here: bind to IGB UIO driver: eth0 (entering the PCI address 0000:03:00.0 works just as well)
==This binding is what lets DPDK take over the NIC==
```bash
Option: 49
Network devices using kernel driver
0000:02:01.0 '82545EM Gigabit Ethernet Controller (Copper) 100f' if=eth2 drv=e1000 unused=igb_uio,vfio-pci
0000:02:06.0 '82545EM Gigabit Ethernet Controller (Copper) 100f' if=eth3 drv=e1000 unused=igb_uio,vfio-pci
0000:03:00.0 'VMXNET3 Ethernet Controller 07b0' if=eth0 drv=vmxnet3 unused=igb_uio,vfio-pci
0000:0b:00.0 'VMXNET3 Ethernet Controller 07b0' if=eth1 drv=vmxnet3 unused=igb_uio,vfio-pci Active
No 'Baseband' devices detected
No 'Crypto' devices detected
No 'Eventdev' devices detected
No 'Mempool' devices detected
No 'Compress' devices detected
No 'Misc (rawdev)' devices detected
Enter PCI address of device to bind to IGB UIO driver: 0000:0b:00.0   # enter the PCI address here
Warning: routing table indicates that interface 0000:0b:00.0 is active. Not modifying
```
Note the warning above: it appears because the device is still in use. Open another terminal and run `sudo ifconfig eth0 down` to shut the interface down (check the NIC information with `lspci -k | grep -A 2 -i "Ethernet"`), then bind again; the script then prints `OK`.
A device that has been bound can be unbound again with option 55, as shown below; vmxnet3 is exactly the driver configured in the .vmx file:
```bash
Enter PCI address of device to unbind: 0000:03:00.0
Enter name of kernel driver to bind the device to: vmxnet3
```
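The same bind/unbind operations can also be done directly with the devbind tool that the menu drives; a small sketch, assuming the PCI addresses shown above:
```bash
# Show which driver every NIC is bound to.
./usertools/dpdk-devbind.py --status

# Bind a NIC to igb_uio so DPDK can take it over
# (bring the interface down first, e.g. `ifconfig eth1 down`).
./usertools/dpdk-devbind.py --bind=igb_uio 0000:0b:00.0

# Give a NIC back to the kernel driver configured in the .vmx file.
./usertools/dpdk-devbind.py --bind=vmxnet3 0000:03:00.0
```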
### Notes
When multiple NICs are present, the eth0/eth1 shown by ifconfig may not map one-to-one to ethernet0/ethernet1 in the .vmx file.
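One way to work out the real mapping is to compare MAC addresses: if the VM uses auto-generated addresses, each ethernetN entry in the .vmx file carries a generatedAddress line that can be matched against what the guest reports. A small check (the .vmx path below is only a placeholder):
```bash
# MAC address of each interface as seen inside the guest.
for dev in /sys/class/net/eth*; do
    echo "$(basename "$dev"): $(cat "$dev/address")"
done

# MAC addresses recorded in the VM configuration (run on the host;
# the .vmx path here is only an example).
grep -i 'generatedAddress' /path/to/your-vm.vmx
```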
### Test
```bash
Option: 53
Enter hex bitmask of cores to execute testpmd app on
Example: to execute app on cores 0 to 7, enter 0xff
bitmask: 7
Launching app
EAL: Detected 4 lcore(s)
EAL: Detected 1 NUMA nodes
EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
EAL: Selected IOVA mode 'PA'
EAL: Probing VFIO support...
EAL: VFIO support initialized
EAL: PCI device 0000:02:01.0 on NUMA socket -1
EAL: Invalid NUMA socket, default to 0
EAL: probe driver: 8086:100f net_e1000_em
EAL: PCI device 0000:02:06.0 on NUMA socket -1
EAL: Invalid NUMA socket, default to 0
EAL: probe driver: 8086:100f net_e1000_em
EAL: PCI device 0000:03:00.0 on NUMA socket -1
EAL: Invalid NUMA socket, default to 0
EAL: probe driver: 15ad:7b0 net_vmxnet3
EAL: PCI device 0000:0b:00.0 on NUMA socket -1
EAL: Invalid NUMA socket, default to 0
EAL: probe driver: 15ad:7b0 net_vmxnet3
Interactive-mode selected
testpmd: create a new mbuf pool <mbuf_pool_socket_0>: n=163456, size=2176, socket=0
testpmd: preferred mempool ops selected: ring_mp_mc
Warning! port-topology=paired and odd forward ports number, the last port will pair with itself.
Configuring Port 0 (socket 0)
Port 0: 00:0C:29:A3:11:BF
Checking link statuses...
Done
testpmd> help
Help is available for the following sections:
help control : Start and stop forwarding.
help display : Displaying port, stats and config information.
help config : Configuration information.
help ports : Configuring ports.
help registers : Reading and setting port registers.
help filters : Filters configuration help.
help traffic_management : Traffic Management commmands.
help devices : Device related cmds.
help all : All of the above sections.
testpmd> help control
Control forwarding:
-------------------
start
Start packet forwarding with current configuration.
start tx_first
Start packet forwarding with current config after sending one burst of packets.
stop
Stop packet forwarding, and display accumulated statistics.
quit
Quit to prompt.
testpmd> start
io packet forwarding - ports=1 - cores=1 - streams=1 - NUMA support enabled, MP allocation mode: native
Logical Core 1 (socket 0) forwards packets on 1 streams:
RX P=0/Q=0 (socket 0) -> TX P=0/Q=0 (socket 0) peer=02:00:00:00:00:00
io packet forwarding packets/burst=32
nb forwarding cores=1 - nb forwarding ports=1
port 0: RX queue number: 1 Tx queue number: 1
Rx offloads=0x0 Tx offloads=0x0
RX queue: 0
RX desc=0 - RX free threshold=0
RX threshold registers: pthresh=0 hthresh=0 wthresh=0
RX Offloads=0x0
TX queue: 0
TX desc=0 - TX free threshold=0
TX threshold registers: pthresh=0 hthresh=0 wthresh=0
TX offloads=0x0 - TX RS bit threshold=0
testpmd> show port info 0
********************* Infos for port 0 *********************
MAC address: 00:0C:29:A3:11:BF
Device name: 0000:03:00.0
Driver name: net_vmxnet3
Connect to socket: 0
memory allocation on the socket: 0
Link status: up
Link speed: 10000 Mbps
Link duplex: full-duplex
MTU: 1500
Promiscuous mode: enabled
Allmulticast mode: disabled
Maximum number of MAC addresses: 1
Maximum number of MAC addresses of hash filtering: 0
VLAN offload:
strip off
filter off
qinq(extend) off
Supported RSS offload flow types:
ipv4
ipv4-tcp
ipv6
ipv6-tcp
Minimum size of RX buffer: 1646
Maximum configurable length of RX packet: 16384
Current number of RX queues: 1
Max possible RX queues: 16
Max possible number of RXDs per queue: 4096
Min possible number of RXDs per queue: 128
RXDs number alignment: 1
Current number of TX queues: 1
Max possible TX queues: 8
Max possible number of TXDs per queue: 4096
Min possible number of TXDs per queue: 512
TXDs number alignment: 1
Max segment number per packet: 255
Max segment number per MTU/TSO: 16
```
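Beyond `start` and `show port info`, a few more testpmd commands are handy while testing; these are standard testpmd commands rather than output captured from the session above:
```bash
testpmd> show port stats 0     # RX/TX packet and byte counters for port 0
testpmd> clear port stats all  # reset the counters
testpmd> stop                  # stop forwarding and print the accumulated statistics
testpmd> quit                  # leave testpmd and release the port
```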
This article draws on the C/C++ Linux server senior architecture course from 零声教育: link