5.2. 端到端性能
5.2.1. Taurus 测试环境
为了评估 Taurus 的端到端性能,我们使用工业标准的 SDN 工具、Tofino 交换机以及实现了 MapReduce 模块的 FPGA 建立了一个测试平台,如图 12 所示。基于该测试平台,可以看到 Taurus 以每包为基础进行决策,比传统控制面机器学习解决方案更快,并且具有更大的准确性(第 5.2.2 节)。
图 12. 用于推理的 Taurus 测试平台。服务器运行领先的开源软件,即 ONOS[46]、XDP[129]、InfluxDB[33],以及使用 TensorFlow[1]的控制面推理。Tofino 交换机和 FPGA 分别实现了 Taurus 的 MAT 流水线和 MapReduce 模块。
控制面机器学习: 基线(baseline)。 为了确定比较基线[6],控制面服务器通过 10Gbps 链接,利用支持 XDP 的英特尔 X710 网卡[79] (运行一个 84 行的自定义 XDP/eBPF 程序)对遥测数据包进行采样,并将其存储在 InfluxDB 流数据库中[33]。向量机器学习模型(Keras[64]编写,272 行)分批对数据包进行推理,开放网络操作系统(ONOS)[46]将模型结果作为流规则配置在交换机上。
[6] XDP 2.6.32 | Tensorflow/Keras 2.4.0 | InfluxDB 1.7.4 | ONOS 2.2.2 | Stratum/Barefoot SDE 9.2.0 | Xilinx Vivado 2020.2 | Spatial 40d182 | MoonGen 525d991
用户面机器学习: Taurus。 在 Taurus 中[6],Stratum OS[48]运行在可编程 Barefoot Wedge 100BF-32X[116]交换机上,实现 Taurus 的 PISA 组件。Tofino 交换机的匹配动作流水线实现了 Taurus 的解析和预处理 MAT(172 行)。100Gbps 以太网链接将交换机连接到 Xilinx Alveo U250 FPGA[164],模拟 MapReduce 硬件。Spatial[88] (105 行)和 Xilinx OpenNIC Shell[166]将机器学习模型编译到 FPGA。为了传输数据包,我们将 Xilinx 100G CMAC[167]与 AXI 流接口[165] (298 行)集成到 FPGA 的 MapReduce 模块。一旦被 FPGA 处理,数据包在被转发到网络之前要经过交换机的后处理 MAT。最后,用另外两个运行 MoonGen[39]的 80 核英特尔至强服务器来产生和接收流量。
5.2.2. 异常检测案例研究
我们在控制面和数据面实现了机器学习异常检测模型(第 3 节)。我们从 NSL-KDD[36]数据集生成标记的数据包级 trace(labeled packet-level trace),方法是将连接级记录扩展为分档的数据包 trace(即每个 trace 元素代表一组数据包),并对其状态进行标记(异常或正常)。从原始 trace 中取样获取流量大小的分布、混合和数据包头域的变化率,以创造现实的工作负载。我们用这些数据来训练 DNN,离线 F1 得分(典型机器学习指标[157])为 71.1。
为了测试数据面机器学习,我们将所有流量发送到交换机和 FPGA。交换机使用 MAT 对特征进行预处理。首先用数据包五元组来索引一组有状态寄存器,这些寄存器积累了整个数据包的特征(例如紧急标志的数量),然后将这些特征格式化为定点数,并将数据包转发到 FPGA,由机器学习模型将其标记为异常或正常。然后,数据包返回交换机,后处理 MAT 基于机器学习决策对包进行处理。
在基线系统中,交换机积累特征并将其作为遥测数据包发送到控制面进行推理。如果该模型确定某个数据包不正常,就会提取相应的 IP,并配置一个交换机规则,将其数据包标记为异常。任何在规则安装前通过的数据包都会被(错误的)原样转发。流量以固定的 5Gbps 发送,而控制面采样从 100 kbps (10−5)到 100 Mbps (10−2)不等。
Taurus 对每个包作出响应。 表 8 显示控制面的毫秒级延迟。即使在低采样率下,基线也有 32 毫秒的延迟。随着负载增长,批处理的规模也在增长,因为批处理的第一个元素必须等待整个批处理的完成,这又进一步增加了延迟。此外,当实现逐包处理的系统时,机器学习不是瓶颈。相反,表 8 显示,规则配置和数据包收集使系统不堪重负[7]。
表 8. 基线机器学习系统的批大小、延迟和准确性。Taurus 发现重要事件的能力相对基线系统提升了两个数量级。
[7] 我们还探索了在交换机 CPU 而不是在服务器上配置规则,以消除控制器和交换机操作系统之间的 RTT。然而,由于交换机 CPU 性能较差(8 个 1.6GHz CPU,而服务器有 80 个 2.5GHz CPU),减少的 RTT 时间被额外的流量规则计算时间所抵消,平均延迟高出 112%。
Taurus 维持了全部模型的准确性。 大多数异常数据包都没有被基线系统检测到,而 Taurus 捕获了一半以上(表 8)。考虑到被识别的异常情况、遗漏的异常情况和被错误标记为异常的正常数据包的数量[8][153],我们使用 F1 分数来评估准确性。Taurus 实现了与孤立模型相同的 F1 得分。然而,由于规则配置和其他处理延迟,基线系统错过了很多数据包,因此有效 F1 得分较低。
5.2.3. 在线训练
Taurus 的机器学习模型也可以被更新以优化全局指标,这对于单个交换机无法观察到的行为(如下游拥堵)很有帮助。我们将遥测数据包送入控制面的训练应用程序,并使用流量规则配置时间作为估计,评估更新数据面模型权重所需的时间。
图 13 显示,更高的采样率(对应更多的批处理数据)收敛速度更快(几十到几百毫秒),表明在线训练是可能的,并切控制面采样数据频率越高越好。此外,图 14 显示,具有最小批量(64)和最多周期(10)的配置会获得最高的 F1 分数,并且增加的训练时间被更快的收敛所抵消。因此,较小批量和较大周期会产生较少的但更有意义的更新,从而避免频繁但没有意义的更新。
6. 当前的限制以及未来的工作
虽然 Taurus 可以比现有平台更有效的支持逐包 ML 算法,但模型大小仍然受到可用硬件资源的限制。为了适应更大的模型,必须研究模型压缩技术。此外,需要更多研究来提供正确性可证明的模型验证和更快的训练时间。
模型压缩。 Taurus 的主要应用是网络控制和协调。神经网络可以解决各种控制问题[10][149][156],而且越来越小。例如,用于非线性控制的结构化控制网[149]的性能几乎与 512 个神经元的 DNN 一样好,每层只需要 4 个神经元。有了这样的小网络,Taurus 可以同时运行多个模型(例如,一个模型用于入侵检测,另一个用于流量优化)。此外,像量化(quantization)、修剪(pruning)和蒸馏(distillation)等技术可以进一步减少模型大小[7][69][81][158]。
正确性。 某些应用(如路由)有明确的正确和错误答案,如果没有安全措施,将会很难保证。相反,机器学习最适合于本质上是启发式的应用,如拥塞控制[162][169]、负载均衡[4][85]和异常检测[106][153],其决定只影响网络性能和安全性,而不是其核心的包转发行为。当机器学习决策可能影响网络正确性时,后处理规则可以确保最终决策是有边界的。
训练速度。 最后,对于较大的数据中心网络,训练反应时间可能会更高。这意味着瞬时网络问题(如特定链路上的负载)不能通过训练来处理,因为在训练完成之前,事件可能就已经结束了。相反,训练会逐渐学习是什么导致这些问题以及如何避免这些问题。为了检测和应对具体事件,较低延迟的技术,如 INT,反而可以将网络状态作为特征提供给模型。
7. 相关工作
面向机器学习的架构。 现场可编程门阵列(FPGA, Field programmable gate array)是最为广泛使用的可重配架构,既可用作定制加速器[140],也可用作原型(例如 NetFPGA[104][113])。然而,FPGA 的片上网络消耗了高达 70%的芯片总功率[23],而且其可变、缓慢的时钟频率使网络和交换速度(每秒 Tb 级)操作变得复杂。CGRA 为机器学习运算进行了优化,通常有快速、固定的时钟频率,可以和 Taurus 中的 MapReduce 模块和 MAT 进行无缝集成[32][50][58][105][107][143]。其他架构,如 Eyeriss[26]、Brainwave[49]和 EIE[68],专注于特定算法实现,从而提高效率。这些实现都可用于交换机内机器学习,但过于死板,如果基于特定加速器实现标准化,就有可能由于缺乏灵活的抽象(如 MapReduce),使得网络无法从未来的机器学习研究中获益。
面向机器学习的数据面。 在现代数据中心内部,基于 MAT 的可编程数据面交换机(如 Barefoot Tofino[114][115]),允许网络轻松支持、执行不同任务(如重击者检测[148]、负载均衡[85]、安全[93, 175]、快速重路由[74]和调度[147]),而这些任务之前需要通过终端主机服务器、中间件或固定功能交换机实现。最近业界正在努力(N2Net[144]和 IIsy[168])研究在交换机上运行更复杂的机器学习算法。然而,仅靠 MAT 是不适合支持机器学习算法的,为了让数据面机器学习变得无处不在,需要更有效的硬件来减少不必要的资源使用(即面积和功率)。我们相信 Taurus 是朝着这个方向迈出的第一步。
在数据中心终端主机上,现代智能网卡(和网络加速器)为服务器提供了更高的计算能力,不仅可以卸载数据包处理逻辑,还可以处理特定应用逻辑,以实现更低的延迟和更高的吞吐量。像英特尔的基础设施单元(IPU, Infrastructure Unit)[80]、英伟达的数据处理单元(DPU, Data Processing Unit)[120]和基于 FPGA 的智能网卡[44, 164]等产品都旨在加速各种终端主机工作负载(例如,管理程序、虚拟交换、基于批处理的机器学习训练和推理、存储)。然而,这些智能网卡并不适合需要在数据中心全网范围内进行的逐包操作。数据包仍然需要从交换机访问服务器(和网卡),从而导致从收到数据包到交换机上的应用做出相应决策之间有一个 RTT 的延迟。中间件在其固定功能硬件实现上也有类似的问题,从而限制了灵活性并促使其频繁升级[141]。
面向网络的机器学习。 许多网络应用可以从机器学习中受益。例如,拥塞控制学习算法[162][169]已被证明优于人类设计的同类算法[37][66][171]。此外,Boutaba 等人[17]确定了网络任务的机器学习用例,如流量分类[40][41]、流量预测[24][27]、主动队列管理[151][179]和安全[124]。所有这些应用都可以立即基于 Taurus 进行部署。
面向机器学习的网络。 特定的网络也可以加速机器学习算法本身。通过对现代数据面硬件的细微改进,交换机可以在网络中聚合梯度,将训练速度提高 300%[51][92][137]。Gaia(一个用于分布式机器学习的系统[76])也考虑了广域网络带宽并在训练过程中调节梯度的移动。虽然 Taurus 没有明确为分布式训练加速而设计,但 MapReduce 可以比 MAT 更有效的聚合包含在数据包中的数字化权重。
8. 结论
目前,数据中心网络被划分为线速网络、逐包数据面和较慢的控制面,其中控制面运行复杂的、数据驱动的管理策略来配置数据面。这种方法太慢了,控制面的传输增加了不可避免的延迟,因此控制面无法对网络事件做出足够快的反应。Taurus 把复杂决策放到数据面,降低反应时间,并提高了管理和控制策略的特殊性。我们证明,Taurus 可以线速运行,并在可编程交换机流水线(RMT)上增加了很小的开销(3.8%的面积和 122 ns 的平均延迟),同时可以加速最近提出的几个机器学习网络基准测试。
展望未来,为了实现自动驾驶网络(self-driving network, 又称自智网络),必须在大规模训练开始之前部署硬件。我们相信,Taurus 为网内机器学习提供了一个立足点,其硬件可以安装在下一代数据平面上,以提高性能和安全性。
A. 附录
A.1. 概要
该工件包含文中提出的 Taurus 的 MapReduce 模块(第 3.3 节)和端到端性能评估中使用的异常检测(AD)应用程序(第 5.2 节)的源代码。MapReduce 模块与 Xilinx OpenNIC Shell[166]集成,并通过称为 Spatial[88]的高级 DSL 进行编程。AD 应用程序在端到端测试平台上运行,由各种组件组成: P4 程序、Python 脚本、ONOS 应用和 Spatial 代码(第 5.2.2 节)。
A.2. 范围
该工件包含两个新组件: MapReduce 模块和异常检测代码,目标是提供实现完整测试平台(第 5.2 节)所需的两个缺失组件,以运行端到端性能评估。其余组件是现有的专用硬件(Barefoot Tofino Switch [114], Xilinx Alveo Board [164])和软件代码库(P4 [14], ONOS [46], Spatial [88], 等等)的集合,可以随时购买和下载(请见下文的依赖关系)[8]。
[8] 我们希望用户购买所需硬件,并获得必要的工具和许可证的个人访问权,因此在工件中不包含这些组件的设置。
A.3. 内容
- FPGA 中的 MapReduce 模块。 该资源库包含了构建基于 FPGA 的 MapReduce 模块实现的源代码和说明。在 FPGA 上运行 MapReduce 模块的详细文档在这里提供: https://gitlab.com/dataplane-ai/taurus/mapreduce。
- 异常检测应用。 该资源库含有异常检测应用(AD)的源代码(P4、Python、ONOS、Spatial)。此外还提供了复制用于评估 AD 应用的端到端测试平台所需的细节。对 AD 应用的介绍在这里: https://gitlab.com/dataplane-ai/taurus/applications/anomalydetection-asplos22。
A.4. 托管
源代码托管在 GitLab[9]和 figshare[10]上。
A.5. 依赖
该工件依赖于以下第三方硬件和软件工具:
- EdgeCore Wedge 100BF-32X Switch
- Xilinx Alveo U250 FPGA board
- Intel P4 Studio
- Xilinx Vivado Design Tool
- Xilinx OpenNIC Shell
- Spatial DSL
- ONF Stratum OS
- ONF Open Network Operating System (ONOS)
- MoonGen Traffic Generator
参考文献
[1] Martín Abadi, Paul Barham, Jianmin Chen, Zhifeng Chen, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Geoffrey Irving, Michael Isard, et al. 2016. Tensorflow: A System For Large-Scale Machine Learning. In USENIX OSDI ’16.
[2] Preeti Aggarwal and Sudhir Kumar Sharma. 2015. Analysis of KDD Dataset Attributes-Class Wise For Intrusion Detection. Computer Science 57 (2015), 842–851. https://doi.org/10.1016/j.procs.2015.07.490
[3] Anurag Agrawal and Changhoon Kim. 2020. Intel Tofino2 – A 12.9Tbps P4-Programmable Ethernet Switch. In Hot Chips ’20.
[4] Mohammad Alizadeh, Tom Edsall, Sarang Dharmapurikar, Ramanan Vaidyanathan, Kevin Chu, Andy Fingerhut, Vinh The Lam, Francis Matus, Rong Pan, Navindra Yadav, and George Varghese. 2014. CONGA: Distributed Congestion-aware Load Balancing for Datacenters. In ACM SIGCOMM ’14. https://doi.org/10.1145/2619239.2626316
[5] Mohammad Alizadeh, Albert Greenberg, David A Maltz, Jitendra Padhye, Parveen Patel, Balaji Prabhakar, Sudipta Sengupta, and Murari Sridharan. 2010. Data Center TCP (DCTCP). In ACM SIGCOMM ’10. https://doi.org/10.1145/1851182.1851192
[6] Tom Auld, Andrew W Moore, and Stephen F Gull. 2007. Bayesian Neural Networks For Internet Traffic Classification. IEEE Transactions on Neural Networks ’07 18, 1 (2007), 223–239. https://doi.org/10.1109/TNN.2006.883010
[7] Jimmy Ba and Rich Caruana. 2014. Do Deep Nets Really Need to be Deep?. In NeurIPS ’14.
[8] Jarrod Bakker, Bryan Ng, Winston KG Seah, and Adrian Pekar. 2019. Traffic Classification with Machine Learning in a Live Network. In 2019 IFIP/IEEE Symposium on Integrated Network and Service Management (IM) ’19.
[9] Rajeev Balasubramonian, Andrew B Kahng, Naveen Muralimanohar, Ali Shafiee, and Vaishnav Srinivas. 2017. CACTI 7: New tools for interconnect exploration in innovative off-chip memories. ACM Transactions on Architecture and Code Optimization (TACO ’17) 14, 2 (2017), 1–25. https://doi.org/10.1145/3085572
[10] Marc G Bellemare, Yavar Naddaf, Joel Veness, and Michael Bowling. 2013. The Arcade Learning Environment: An Evaluation Platform for General Agents. Journal of Artificial Intelligence Research (JAIR) 47 (2013), 253–279.
[11] Laurent Bernaille, Renata Teixeira, Ismael Akodkenou, Augustin Soule, and Kave Salamatian. 2006. Traffic Classification on the Fly. ACM SIGCOMM Computer Communication Review (CCR) 36, 2 (2006), 23–26. https://doi.org/10.1145/1129582.1129589
[12] Kirti Bhanushali and W Rhett Davis. 2015. FreePDK15: An open-source predictive process design kit for 15nm FinFET technology. In Proceedings of the 2015 Symposium on International Symposium on Physical Design. 165–170. https://doi.org/10.1145/2717764.2717782
[13] Pat Bosshart, Dan Daly, Glen Gibb, Martin Izzard, Nick McKeown, Jennifer Rexford, Cole Schlesinger, Dan Talayco, Amin Vahdat, George Varghese, et al. 2014. P4: Programming protocol-independent packet processors. ACM SIGCOMM Computer Communication Review 44, 3 (2014), 87–95. https://doi.org/10.1145/2656877.2656890
[14] Pat Bosshart, Dan Daly, Glen Gibb, Martin Izzard, Nick McKeown, Jennifer Rexford, Cole Schlesinger, Dan Talayco, Amin Vahdat, George Varghese, et al. 2014. P4: Programming Protocol-Independent Packet Processors. ACM SIGCOMM Computer Communication Review (CCR) 44, 3 (2014), 87–95. https://doi.org/10.1145/2656877.2656890
[15] Pat Bosshart, Glen Gibb, Hun-Seok Kim, George Varghese, Nick McKeown, Martin Izzard, Fernando Mujica, and Mark Horowitz. 2013. Forwarding Metamorphosis: Fast Programmable Match-Action Processing in Hardware for SDN. In ACM SIGCOMM ’13. https://doi.org/10.1145/2534169.2486011
[16] Leon Bottou. 2010. Feature Engineering. https://www.cs.princeton.edu/ courses/archive/spring10/cos424/slides/18-feat.pdf. Accessed on 08/12/2021.
[17] Raouf Boutaba, Mohammad A Salahuddin, Noura Limam, Sara Ayoubi, Nashid Shahriar, Felipe Estrada-Solano, and Oscar M Caicedo. 2018. A Comprehensive Survey on Machine Learning for Networking: Evolution, Applications and Research Opportunities. Journal of Internet Services and Applications (JISA) 9, 1(2018), 16. https://doi.org/10.1186/s13174-018-0087-2
[18] Andrey Brito, Andre Martin, Thomas Knauth, Stephan Creutz, Diogo Becker, Stefan Weigert, and Christof Fetzer. 2011. Scalable and Low-Latency Data Processing with Stream MapReduce. In IEEE CLOUDCOM ’11. https://doi.org/10.1109/CloudCom.2011.17
[19] Broadcom. [n.d.]. Tomahawk/BCM56960 Series. https://www.broadcom.com/products/ethernet-connectivity/switching/strataxgs/bcm56960-series. Accessed on 08/12/2021.
[20] Kevin J Brown, HyoukJoong Lee, Tiark Romp, Arvind K Sujeeth, Christopher De Sa, Christopher Aberger, and Kunle Olukotun. 2016. Have Abstraction and Eat Performance, too: Optimized Heterogeneous Computing with Parallel Patterns. In IEEE/ACM CGO ’16. https://doi.org/10.1145/2854038.2854042
[21] Kevin J Brown, Arvind K Sujeeth, Hyouk Joong Lee, Tiark Rompf, Hassan Chafi, Martin Odersky, and Kunle Olukotun. 2011. A Heterogeneous Parallel Framework for Domain-Specific Languages. In IEEE PACT ’11. https://doi.org/10.1109/PACT.2011.15
[22] Douglas Ronald Burdick, Amol Ghoting, Rajasekar Krishnamurthy, Edwin Peter Dawson Pednault, Berthold Reinwald, Vikas Sindhwani, Shirish Tatikonda, Yuanyuan Tian, and Shivakumar Vaithyanathan. 2013. Systems and methods for processing machine learning algorithms in a MapReduce environment. US Patent 8,612,368.
[23] Benton Highsmith Calhoun, Joseph F Ryan, Sudhanshu Khanna, Mateja Putic, and John Lach. 2010. Flexible Circuits and Architectures for Ultralow Power. Proc. IEEE 98, 2 (2010), 267–282.
[24] Samira Chabaa, Abdelouhab Zeroual, and Jilali Antari. 2010. Identification and Prediction of Internet Traffic Using Artificial Neural Networks. Journal of Intelligent Learning Systems and Applications (JILSA) 2, 03 (2010), 147. https://doi.org/10.4236/jilsa.2010.23018
[25] Huan Chen and Theophilus Benson. 2017. The Case for Making Tight Control Plane Latency Guarantees in SDN Switches. In ACM SOSR ’17. https://doi.org/10.1145/3050220.3050237
[26] Yu-Hsin Chen, Tushar Krishna, Joel S Emer, and Vivienne Sze. 2016. Eyeriss: An Energy-Efficient Reconfigurable Accelerator for Deep Convolutional Neural Networks. IEEE Journal of Solid-State Circuits 52, 1 (2016), 127–138. https://doi.org/10.1109/JSSC.2016.2616357
[27] Zhitang Chen, Jiayao Wen, and Yanhui Geng. 2016. Predicting Future Traffic Using Hidden Markov Models. In IEEE ICNP ’16.
[28] Cheng-Tao Chu, Sang K Kim, Yi-An Lin, YuanYuan Yu, Gary Bradski, Kunle Olukotun, and Andrew Y Ng. 2007. Map-Reduce for Machine Learning on Multicore. In NeurIPS ’07. 281–288.
[29] Cisco Systems, Inc. [n.d.]. Cisco Meraki (MX450): Powerful Security and SD-WAN for the Branch & Campus. https://meraki.cisco.com/products/appliances/mx450. Accessed on 08/12/2021.
[30] Graham Cormode and Shan Muthukrishnan. 2005. An Improved Data Stream Summary: The Count-Min Sketch and its Applications. Journal of Algorithms 55, 1 (2005), 58–75. https://doi.org/10.1016/j.jalgor.2003.12.001
[31] Paul Covington, Jay Adams, and Emre Sargin. 2016. Deep Neural Networks for Youtube Recommendations. In ACM RecSys ’16. https://doi.org/10.1145/2959100.2959190
[32] Darren C Cronquist, Chris Fisher, Miguel Figueroa, Paul Franklin, and Carl Ebeling. 1999. Architecture Design of Reconfigurable Pipelined Datapaths. In IEEE ARVLSI ’99.
[33] Influx Data. [n.d.]. InfluxDB. https://www.influxdata.com/products/influxdboverview/. Accessed on 08/12/2021.
[34] Christopher De Sa, Megan Leszczynski, Jian Zhang, Alana Marzoev, Christopher R Aberger, Kunle Olukotun, and Christopher Ré. 2018. High-Accuracy Low-Precision Training. arXiv preprint arXiv:1803.03383 (2018).
[35] Dell EMC. [n.d.]. Data Center Switching Quick ReferenceGuide. https://i.dell.com/sites/doccontent/shared-content/datasheets/en/Documents/Dell-Networking-Data-Center-Quick-ReferenceGuide.pdf. Accessed on 08/12/2021.
[36] L Dhanabal and SP Shantharajah. 2015. A Study on NSL-KDD Dataset for Intrusion Detection System Based on Classification Algorithms. International Journal of Advanced Research in Computer and Communication Engineering (IJARCCE) 4, 6 (2015), 446–452. https://doi.org/10.17148/IJARCCE.2015.4696
[37] Mo Dong, Qingxi Li, Doron Zarchy, P Brighten Godfrey, and Michael Schapira. 2015. PCC: Re-architecting Congestion Control for Consistent High Performance. In USENIX NSDI ’15.
[38] DPDK. [n.d.]. DPDK. https://www.dpdk.org/. Accessed on 08/12/2021.
[39] Paul Emmerich, Sebastian Gallenmüller, Daniel Raumer, Florian Wohlfart, and Georg Carle. 2015. Moongen: A Scriptable High-Speed Packet Generator. In ACM IMC ’15.
[40] Jeffrey Erman, Martin Arlitt, and Anirban Mahanti. 2006. Traffic Classification Using Clustering Algorithms. In ACM MineNet ’06. https://doi.org/10.1145/1162678.1162679
[41] Jeffrey Erman, Anirban Mahanti, Martin Arlitt, and Carey Williamson. 2007. Identifying and Discriminating Between Web and Peer-to-Peer Traffic in the Network Core. In WWW ’07. https://doi.org/10.1145/1242572.1242692
[42] Alice Este, Francesco Gringoli, and Luca Salgarelli. 2009. Support Vector Machines For TCP Traffic Classification. Computer Networks 53, 14 (2009), 2476–2490. https://doi.org/10.1016/j.comnet.2009.05.003
[43] Nick Feamster and Jennifer Rexford. 2018. Why (and How) Networks Should Run Themselves. In ACM ANRW ’18.
[44] Daniel Firestone, Andrew Putnam, Sambhrama Mundkur, Derek Chiou, Alireza Dabagh, Mike Andrewartha, Hari Angepat, Vivek Bhanu, Adrian Caulfield, Eric Chung, et al. 2018. Azure Accelerated Networking: SmartNICs in the Public Cloud. In USENIX NSDI ’18.
[45] Sally Floyd and Van Jacobson. 1993. Random early detection gateways for congestion avoidance. IEEE/ACM Transactions on networking 1, 4 (1993), 397–413.
[46] Open Networking Foundation. [n.d.]. ONOS: Open Network Operating System. https://www.opennetworking.org/onos/. Accessed on 08/12/2021.
[47] Open Networking Foundation. [n.d.]. ONOS: Single Bench Flow Latency Test. https://wiki.onosproject.org/display/ONOS/2.2%3A+Experiment+I+-+Single+Bench+Flow+Latency+Test. Accessed on 08/12/2021.
[48] Open Networking Foundation. [n.d.]. Stratum OS. https://www.opennetworking.org/stratum/. Accessed on 08/12/2021.
[49] Jeremy Fowers, Kalin Ovtcharov, Michael Papamichael, Todd Massengill, Ming Liu, Daniel Lo, Shlomi Alkalay, Michael Haselman, Logan Adams, Mahdi Ghandi, et al. 2018. A Configurable Cloud-Scale DNN Processor for Real-Time AI. In IEEE ISCA ’18. https://doi.org/10.1109/ISCA.2018.00012
[50] Mingyu Gao and Christos Kozyrakis. 2016. HRL: Efficient and Flexible Reconfigurable Logic for Near-Data Processing. In IEEE HPCA ’16.
[51] Nadeen Gebara, Manya Ghobadi, and Paolo Costa. 2021. In-network Aggregation for Shared Machine Learning Clusters. In MLSys ’21.
[52] Yilong Geng, Shiyu Liu, Feiran Wang, Zi Yin, Balaji Prabhakar, and Mendel Rosenblum. 2017. Self-Programming Networks: Architecture and Algorithms. In IEEE Allerton ’17.
[53] Yilong Geng, Shiyu Liu, Zi Yin, Ashish Naik, Balaji Prabhakar, Mendel Rosenblum, and Amin Vahdat. 2019. SIMON: A Simple and Scalable Method for Sensing, Inference and Measurement in Data Center Networks. In USENIX NSDI ’19.
[54] Amol Ghoting, Prabhanjan Kambadur, Edwin Pednault, and Ramakrishnan Kannan. 2011. NIMBLE: A Toolkit for the Implementation of Parallel Data Mining and Machine Learning Algorithms on MapReduce. In ACM SIGKDD KDD ’11. https://doi.org/10.1145/2020408.2020464
[55] Amol Ghoting, Rajasekar Krishnamurthy, Edwin Pednault, Berthold Reinwald, Vikas Sindhwani, Shirish Tatikonda, Yuanyuan Tian, and Shivakumar Vaithyanathan. 2011. SystemML: Declarative Machine Learning on MapReduce. In IEEE ICDE ’11.
[56] Glen Gibb, George Varghese, Mark Horowitz, and Nick McKeown. 2013. Design principles for packet parsers. In ACM/IEE ANCS ’13. https://doi.org/10.1109/ANCS.2013.6665172
[57] Dan Gillick, Arlo Faria, and John DeNero. 2006. MapReduce: Distributed Computing for Machine Learning. Berkley, Dec 18 (2006).
[58] Seth Copen Goldstein, Herman Schmit, Mihai Budiu, Srihari Cadambi, Matthew Moe, and R Reed Taylor. 2000. PipeRench: A Reconfigurable Architecture and Compiler. Computer 33, 4 (2000), 70–77.
[59] Ian Goodfellow, Yoshua Bengio, Aaron Courville, and Yoshua Bengio. 2016. Deep Learning. Vol. 1. MIT Press, Cambridge.
[60] Google. [n.d.]. A look inside Google’s Data Center Networks. https://-cloudplatform.googleblog.com/2015/06/A-Look-Inside-Googles-Data-CenterNetworks.html. Accessed on 08/12/2021.
[61] Google. 2021. Tensorflow Lite. https://www.tensorflow.org/tflite.
[62] Albert Greenberg, James R. Hamilton, Navendu Jain, Srikanth Kandula, Changhoon Kim, Parantap Lahiri, David A. Maltz, Parveen Patel, and Sudipta Sengupta. 2009. VL2: A Scalable and Flexible Data Center Network. In ACM SIGCOMM ’09. https://doi.org/10.1145/1592568.1592576
[63] P4.org Archictecture Working Group. 2018. P4-16 Portable Switch Architecture. https://p4.org/p4-spec/docs/PSA-v1.1.0.pdf.
[64] Antonio Gulli and Sujit Pal. 2017. Deep learning with Keras. Packt Publishing Ltd.
[65] Vladimir Gurevich. 2018. Programmable Data Plane at Terabit Speeds. https://p4.org/assets/p4_d2_2017_programmable_data_plane_at_terabit_-speeds.pdf. Accessed on 08/12/2021.
[66] Sangtae Ha, Injong Rhee, and Lisong Xu. 2008. CUBIC: A New TCP-Friendly High-Speed TCP Variant. ACM SIGOPS Operating Systems Review 42, 5 (2008), 64–74. https://doi.org/10.1145/1400097.1400105
[67] Song Han, Junlong Kang, Huizi Mao, Yiming Hu, Xin Li, Yubin Li, Dongliang Xie, Hong Luo, Song Yao, Yu Wang, et al. 2017. ESE: Efficient Speech Recognition Engine with Sparse LSTM on FPGA. In ACM/SIGDA FPGA ’17. https://doi.org/10.1145/3020078.3021745
[68] Song Han, Xingyu Liu, Huizi Mao, Jing Pu, Ardavan Pedram, Mark A Horowitz, and William J Dally. 2016. EIE: Efficient Inference Engine on Compressed Deep Neural Network. In ACM/IEEE ISCA ’16. https://doi.org/10.1145/3007787.3001163
[69] Song Han, Jeff Pool, John Tran, and William Dally. 2015. Learning Both Weights and Connections for Efficient Neural Network. In NeurIPS ’15.
[70] B Hariri and N Sadati. 2007. NN-RED: an AQM mechanism based on neural networks. Electronics Letters 43, 19 (2007), 1053–1055. https://doi.org/10.1049/el:20071791
[71] Robert Harper, David MacQueen, and Robin Milner. 1986. Standard ML. Department of Computer Science, University of Edinburgh.
[72] Marti A. Hearst, Susan T Dumais, Edgar Osuna, John Platt, and Bernhard Scholkopf. 1998. Support Vector Machines (SVMs). IEEE Intelligent Systems and their Applications 13, 4 (1998), 18–28.
[73] Sepp Hochreiter and Jürgen Schmidhuber. 1997. Long Short-Term Memory (LSTM). Neural Computation 9, 8 (1997), 1735–1780.
[74] Thomas Holterbach, Edgar Costa Molero, Maria Apostolaki, Alberto Dainotti, Stefano Vissicchio, and Laurent Vanbever. 2019. Blink: Fast Connectivity Recovery Entirely in the Data Plane. In USENIX NSDI ’19.
[75] Kurt Hornik. 1991. Approximation capabilities of multilayer feedforward networks. Neural networks 4, 2 (1991), 251–257. https://doi.org/10.1016/0893-6080(91)90009-T
[76] Kevin Hsieh, Aaron Harlap, Nandita Vijaykumar, Dimitris Konomis, Gregory R Ganger, Phillip B Gibbons, and Onur Mutlu. 2017. Gaia: Geo-Distributed Machine Learning Approaching LAN Speeds. In USENIX NSDI ’17.
[77] Peng Huang, Chuanxiong Guo, Lidong Zhou, Jacob R Lorch, Yingnong Dang, Murali Chintalapati, and Randolph Yao. 2017. Gray Failure: The Achilles’ Heel of Cloud-Scale Systems. In ACM HotOS ’17. https://doi.org/10.1145/3102980.3103005
[78] Intel. [n.d.]. Intel Deep Insight Network Analytics Software. https://www.intel.com/content/www/us/en/products/network-io/programmable-ethernetswitch/network-analytics/deep-insight.html. Accessed on 08/12/2021.
[79] Intel. [n.d.]. Intel® Ethernet Network Adapter X710-DA2 for OCP 3.0. https://ark.intel.com/content/www/us/en/ark/products/184822/intelethernet-network-adapter-x710-da4-for-ocp-3-0.html. Accessed on 08/12/2021.
[80] Intel. 2021. Intel Infrastructure Processing Unit (Intel IPU) and SmartNICs. https://www.intel.com/content/www/us/en/products/network-io/smartnic.html.
[81] Benoit Jacob, Skirmantas Kligys, Bo Chen, Menglong Zhu, Matthew Tang, Andrew Howard, Hartwig Adam, and Dmitry Kalenichenko. 2018. Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference. In IEEE CVPR ’18.
[82] Nathan Jay, Noga Rotman, Brighten Godfrey, Michael Schapira, and Aviv Tamar. 2019. A deep reinforcement learning perspective on internet congestion control. In ICML ’19.
[83] Eun Young Jeong, Shinae Woo, Muhammad Jamshed, Haewon Jeong, Sunghwan Ihm, Dongsu Han, and KyoungSoo Park. 2014. MTCP: A Highly Scalable UserLevel TCP Stack for Multicore Systems. In USENIX NSDI ’14.
[84] Norman P Jouppi, Cliff Young, Nishant Patil, David Patterson, Gaurav Agrawal, Raminder Bajwa, Sarah Bates, Suresh Bhatia, Nan Boden, Al Borchers, et al. 2017. In-Datacenter Performance Analysis of a Tensor Processing Unit. In IEEE ISCA ’17. https://doi.org/10.1145/3079856.3080246
[85] Naga Katta, Mukesh Hira, Changhoon Kim, Anirudh Sivaraman, and Jennifer Rexford. 2016. HULA: Scalable Load Balancing Using Programmable Data Planes. In ACM SOSR ’16. https://doi.org/10.1145/2890955.2890968
[86] Changhoon Kim. [n.d.]. Programming The Network Data Plane: What, How, and Why? https://conferences.sigcomm.org/ events/apnet2017/slides/chang.pdf. Accessed on 08/12/2021.
[87] Changhoon Kim, Anirudh Sivaraman, Naga Katta, Antonin Bas, Advait Dixit, and Lawrence J Wobker. 2015. In-Band Network Telemetry via Programmable Dataplanes. In ACM SIGCOMM ’15 (Demo).
[88] David Koeplinger, Matthew Feldman, Raghu Prabhakar, Yaqi Zhang, Stefan Hadjis, Ruben Fiszel, Tian Zhao, Luigi Nardi, Ardavan Pedram, Christos Kozyrakis, and Kunle Olukotun. 2018. Spatial: A Language and Compiler for Application Accelerators. In ACM/SIGPLAN PLDI ’18. https://doi.org/10.1145/3192366.3192379
[89] Vibhore Kumar, Henrique Andrade, Buğra Gedik, and Kun-Lung Wu. 2010. DEDUCE: At the Intersection of MapReduce and Stream Processing. In ACM EDBT ’10. https://doi.org/10.1145/1739041.1739120
[90] Maciej Kuźniar, Peter Perešíni, and Dejan Kostić. 2015. What You Need to Know About SDN Flow Tables. In PAM ’15. Springer.
[91] Adam Langley, Alistair Riddoch, Alyssa Wilk, Antonio Vicente, Charles Krasic, Dan Zhang, Fan Yang, Fedor Kouranov, Ian Swett, Janardhan Iyengar, et al. 2017.
The QUIC Transport Protocol: Design and Internet-Scale Deployment. In ACM SIGCOMM ’17. https://doi.org/10.1145/3098822.3098842
[92] ChonLam Lao, Yanfang Le, Kshiteej Mahajan, Yixi Chen, Wenfei Wu, Aditya Akella, and Michael M Swift. 2021. ATP: In-network Aggregation for Multitenant Learning. In NSDI ’21.
[93] Ângelo Cardoso Lapolli, Jonatas Adilson Marques, and Luciano Paschoal Gaspary. 2019. Offloading real-time ddos attack detection to programmable data planes. In 2019 IFIP/IEEE Symposium on Integrated Network and Service Management (IM) ’19. IEEE, 19–27.
[94] Yann LeCun, LD Jackel, Leon Bottou, A Brunot, Corinna Cortes, JS Denker, Harris Drucker, I Guyon, UA Muller, Eduard Sackinger, P Simard, and V Vapnik. 1995. Comparison of Learning Algorithms for Handwritten Digit Recognition. In ICANN ’95.
[95] Guanyu Li, Menghao Zhang, Shicheng Wang, Chang Liu, Mingwei Xu, Ang Chen, Hongxin Hu, Guofei Gu, Qi Li, and Jianping Wu. 2021. Enabling Performant, Flexible and Cost-Efficient DDoS Defense With Programmable Switches. IEEE/ACM Transactions on Networking ’21 (2021). https://doi.org/10.1109/TNET.2021.3062621
[96] Ruey-Hsia Li and Geneva G. Belford. 2002. Instability of Decision Tree Classification Algorithms. In ACM SIGKDD ’02. https://doi.org/10.1145/775047.775131
[97] Youjie Li, Iou-Jen Liu, Yifan Yuan, Deming Chen, Alexander Schwing, and Jian Huang. 2019. Accelerating Distributed Reinforcement Learning with In-Switch Computing. In ACM/IEEE ISCA ’19. https://doi.org/10.1145/3307650.3322259
[98] Yuliang Li, Rui Miao, Hongqiang Harry Liu, Yan Zhuang, Fei Feng, Lingbo Tang, Zheng Cao, Ming Zhang, Frank Kelly, Mohammad Alizadeh, and Minlan Yu. 2019. HPCC: High Precision Congestion Control. In ACM SIGCOMM ’19. https://doi.org/10.1145/3341302.3342085
[99] Eric Liang, Hang Zhu, Xin Jin, and Ion Stoica. 2019. Neural Packet Classification. In ACM SIGCOMM ’19. https://doi.org/10.1145/3341302.3342221
[100] Darryl Lin, Sachin Talathi, and Sreekanth Annapureddy. 2016. Fixed Point Quantization of Deep Convolutional Networks. In ICML ’16.
[101] Yingqiu Liu, Wei Li, and Yun-Chun Li. 2007. Network Traffic Classification Using K-Means Clustering. In IEEE IMSCCS ’07. https://doi.org/10.1109/IMSCCS.2007.52
[102] Zaoxing Liu, Hun Namkung, Georgios Nikolaidis, Jeongkeun Lee, Changhoon Kim, Xin Jin, Vladimir Braverman, Minlan Yu, and Vyas Sekar. 2021. Jaqen: A High-Performance Switch-Native Approach for Detecting and Mitigating Volumetric DDoS Attacks with Programmable Switches. In USENIX Security ’21.
[103] Loadbalancer.org. [n.d.]. Hardware ADC. https://www.loadbalancer.org/products/hardware/. Accessed on 08/12/2021.
[104] John W Lockwood, Nick McKeown, Greg Watson, Glen Gibb, Paul Hartke, Jad Naous, Ramanan Raghuraman, and Jianying Luo. 2007. NetFPGA: An Open Platform for Gigabit-Rate Network Switching and Routing. In IEEE MSE ’07.
[105] Alan Marshall, Tony Stansfield, Igor Kostarnov, Jean Vuillemin, and Brad Hutchings. 1999. A Reconfigurable Arithmetic Array for Multimedia Applications. In IEEE FPGA ’99. https://doi.org/10.1145/296399.296444
[106] Tahir Mehmood and Helmi B Md Rais. 2015. SVM for Network Anomaly Detection using ACO Feature Subset. In IEEE iSMSC ’15.
[107] Bingfeng Mei, Serge Vernalde, Diederik Verkest, Hugo De Man, and Rudy Lauwereins. 2002. DRESC: A Retargetable Compiler for Coarse-Grained Reconfigurable Architectures. In IEEE FPT ’02.
[108] Albert Mestres, Alberto Rodriguez-Natal, Josep Carner, Pere Barlet-Ros, Eduard Alarcón, Marc Solé, Victor Muntés-Mulero, David Meyer, Sharon Barkai, Mike J Hibbett, et al. 2017. Knowledge-Defined Networking. ACM SIGCOMM Computer Communication Review (CCR) 47, 3 (2017), 2–10. https://doi.org/10.1145/3138808.3138810
[109] Yaron Minsky, Anil Madhavapeddy, and Jason Hickey. 2013. Real World OCaml: Functional Programming for the Masses. O’Reilly Media, Inc.
[110] Andrew W Moore and Denis Zuev. 2005. Internet Traffic Classification using Bayesian Analysis Techniques. In ACM SIGMETRICS ’05. https://doi.org/10.1145/1064212.1064220
[111] Manya Ghobadi Nadeen Gebara, Paolo Costa. 2021. In-Network Aggregation for Shared Machine Learning Clusters. In MlSys.
[112] Vinod Nair and Geoffrey E Hinton. 2010. Rectified Linear Units Improve Restricted Boltzmann Machines. In ICML ’10.
[113] NetFPGA. [n.d.]. NetFPGA: A Line-rate, Flexible, and Open Platform for Research and Classroom Experimentation. https://netfpga.org/. Accessed on 08/12/2021.
[114] Barefoot Networks. [n.d.]. Barefoot Tofino. https://www.intel.com/content/www/us/en/products/network-io/programmable-ethernet-switch/tofinoseries.html. Accessed on 08/12/2021.
[115] Barefoot Networks. [n.d.]. Barefoot Tofino 2. https://www.intel.com/content/www/us/en/products/network-io/programmable-ethernet-switch/tofino2-series.html. Accessed on 08/12/2021.
[116] Edgecore Networks. [n.d.]. WEDGE 100BF-32X: 100GBE Data Center Switch. https://www.edge-core.com/productsInfo.php?cls=1&cls2=5& cls3=181&id=335. Accessed on 08/12/2021.
[117] Rolf Neugebauer, Gianni Antichi, José Fernando Zazo, Yury Audzevich, Sergio López-Buedo, and Andrew W. Moore. 2018. Understanding PCIe Performance for End Host Networking. In ACM SIGCOMM ’18. https://doi.org/10.1145/3230543.3230560
[118] Radhika Niranjan Mysore, Andreas Pamboris, Nathan Farrington, Nelson Huang, Pardis Miri, Sivasankar Radhakrishnan, Vikram Subramanya, and Amin Vahdat. 2009. PortLand: A Scalable Fault-tolerant Layer 2 Data Center Network Fabric. In ACM SIGCOMM ’09. https://doi.org/10.1145/1592568.1592575
[119] NVIDIA. [n.d.]. Tesla T4. https://www.nvidia.com/en-us/data-center/tesla-t4/. Accessed on 08/12/2021.
[120] NVidia. 2021. NVIDIA BLUEFIELD DATA PROCESSING UNITS. https://www.nvidia.com/en-us/networking/products/data-processing-unit/.
[121] Martin Odersky, Lex Spoon, and Bill Venners. 2008. Programming in Scala. Artima Inc.
[122] P4.org. 2020. P4-16 Language Specification. https://p4.org/p4-spec/docs/P4-16-v1.2.1.pdf.
[123] Junghun Park, Hsiao-Rong Tyan, and C-C Jay Kuo. 2006. Internet Traffic Classification for Scalable QoS Provision. In IEEE ICME ’06.
[124] Roberto Perdisci, Davide Ariu, Prahlad Fogla, Giorgio Giacinto, and Wenke Lee. 2009. McPAD: A Multiple Classifier System for Accurate Payload-Based Anomaly Detection. Computer Networks 53, 6 (2009), 864–881. https://doi.org/10.1016/j.comnet.2008.11.011
[125] Dan RK Ports and Jacob Nelson. 2019. When Should The Network Be The Computer?. In ACM HotOS ’19. https://doi.org/10.1145/3317550.3321439
[126] Pascal Poupart, Zhitang Chen, Priyank Jaini, Fred Fung, Hengky Susanto, Yanhui Geng, Li Chen, Kai Chen, and Hao Jin. 2016. Online Flow Size Prediction for Improved Network Routing. In IEEE ICNP ’16.
[127] Raghu Prabhakar, Yaqi Zhang, David Koeplinger, Matt Feldman, Tian Zhao, Stefan Hadjis, Ardavan Pedram, Christos Kozyrakis, and Kunle Olukotun. 2017. Plasticine: A Reconfigurable Architecture for Parallel Patterns. In ACM/IEEE ISCA ’17. https://doi.org/10.1145/3079856.3080256
[128] Raghu Prabhakar, Yaqi Zhang, and Kunle Olukotun. 2020. Coarse-Grained Reconfigurable Architectures. In NANO-CHIPS 2030. Springer, 227–246. https://doi.org/10.1007/978-3-030-18338-7_14
[129] IO Visor Project. [n.d.]. XDP: eXpress Data Path. https://www.iovisor.org/technology/xdp. Accessed on 08/12/2021.
[130] Jack W Rae, Sergey Bartunov, and Timothy P Lillicrap. 2019. Meta-Learning Neural Bloom Filters. arXiv:1906.04304 (2019).
[131] Alon Rashelbach, Ori Rottenstreich, and Mark Silberstein. 2020. A Computational Approach to Packet Classification. In ACM SIGCOMM ’20. https://doi.org/10.1145/3387514.3405886
[132] Luigi Rizzo. 2012. Netmap: A Novel Framework for Fast Packet I/O. In USENIX ATC ’12.
[133] Joshua Rosen, Neoklis Polyzotis, Vinayak Borkar, Yingyi Bu, Michael J Carey, Markus Weimer, Tyson Condie, and Raghu Ramakrishnan. 2013. Iterative mapreduce for large scale machine learning. arXiv preprint arXiv:1303.3517 (2013).
[134] Alexander Rucker, Tushar Swamy, Muhammad Shahbaz, and Kunle Olukotun. 2019. Elastic RSS: Co-Scheduling Packets and Cores Using Programmable NICs. In ACM APNet ’19. https://doi.org/10.1145/3343180.3343184
[135] Alexander Rucker, Matthew Vilim, Tian Zhao, Yaqi Zhang, Raghu Prabhakar, and Kunle Olukotun. 2021. Capstan: A Vector RDA for Sparsity. Association for Computing Machinery, New York, NY, USA, 1022–1035.
[136] Davide Sanvito, Giuseppe Siracusano, and Roberto Bifulco. 2018. Can the Network Be the AI Accelerator?. In NetCompute ’18. https://doi.org/10.1145/3229591.3229594
[137] Amedeo Sapio, Marco Canini, Chen-Yu Ho, Jacob Nelson, Panos Kalnis, Changhoon Kim, Arvind Krishnamurthy, Masoud Moshref, Dan RK Ports, and Peter Richtárik. 2019. Scaling Distributed Machine Learning with In-Network Aggregation. arXiv:1903.06701 (2019).
[138] Dipanjan Sarkar. 2018. Continuous Numeric Data – Strategies for Working with Continuous, Numerical Data. https://towardsdatascience.com/understandingfeature-engineering-part-1-continuous-numeric-data-da4e47099a7b. Accessed on 08/12/2021.
[139] Danfeng Shan, Fengyuan Ren, Peng Cheng, Ran Shu, and Chuanxiong Guo. 2018. Micro-burst in Data Centers: Observations, Analysis, and Mitigations. In IEEE ICNP ’18. https://doi.org/10.1109/ICNP.2018.00019
[140] Ahmad Shawahna, Sadiq M Sait, and Aiman El-Maleh. 2018. Fpga-Based Accelerators of Deep Learning Networks for Learning and Classification: A Review. IEEE Access 7 (2018), 7823–7859.
[141] Justine Sherry, Sylvia Ratnasamy, and Justine Sherry At. 2012. A survey of enterprise middlebox deployments. (2012).
[142] Anton Shilov. [n.d.]. TSMC & Broadcom Develop 1700-mm2 CoWoS Interposer: 2x Larger Than Reticles. https://www.anandtech.com/show/15582/tsmcbroadcom-develop-1700-mm2-cowos-interposer-2x-larger-than-reticles. Accessed on 08/12/2021.[143] Hartej Singh, Ming-Hau Lee, Guangming Lu, Fadi J Kurdahi, Nader Bagherzadeh, and Eliseu M Chaves Filho. 2000. MorphoSys: An Integrated Reconfigurable System for Data-Parallel and Computation-Intensive Applications. IEEE Transactions on Computers ’00 49, 5 (2000), 465–481.
[144] Giuseppe Siracusano and Roberto Bifulco. 2018. In-Network Neural Networks. arXiv:1801.05731 (2018).
[145] Arunan Sivanathan, Hassan Habibi Gharakheili, Franco Loi, Adam Radford, Chamith Wijenayake, Arun Vishwanath, and Vijay Sivaraman. 2018. Classifying IoT devices in Smart Environments Using Network Traffic characteristics. IEEE Transactions on Mobile Computing ’18 18, 8 (2018), 1745–1759. https://doi.org/10.1109/TMC.2018.2866249
[146] Anirudh Sivaraman, Alvin Cheung, Mihai Budiu, Changhoon Kim, Mohammad Alizadeh, Hari Balakrishnan, George Varghese, Nick McKeown, and Steve Licking. 2016. Packet Transactions: High-Level Programming for Line-Rate Switches. In ACM SIGCOMM ’16. https://doi.org/10.1145/2934872.2934900
[147] Anirudh Sivaraman, Suvinay Subramanian, Mohammad Alizadeh, Sharad Chole, Shang-Tse Chuang, Anurag Agrawal, Hari Balakrishnan, Tom Edsall, Sachin Katti, and Nick McKeown. 2016. Programmable Packet Scheduling at Line Rate. In ACM SIGCOMM ’16. https://doi.org/10.1145/2934872.2934899
[148] Vibhaalakshmi Sivaraman, Srinivas Narayana, Ori Rottenstreich, Shan Muthukrishnan, and Jennifer Rexford. 2017. Heavy-Hitter Detection Entirely in the Data Plane. In ACM SOSR ’17. https://doi.org/10.1145/3050220.3063772
[149] Mario Srouji, Jian Zhang, and Ruslan Salakhutdinov. 2018. Structured control nets for deep reinforcement learning. In ICML ’18. PMLR, 4742–4751.
[150] Arvind K Sujeeth, HyoukJoong Lee, Kevin J Brown, Tiark Rompf, Hassan Chafi, Michael Wu, Anand R Atreya, Martin Odersky, and Kunle Olukotun. 2011. OptiML: an implicitly parallel domain-specific language for machine learning. In ICML ’11.
[151] Jinsheng Sun and Moshe Zukerman. 2007. An Adaptive Neuron AQM for a Stable Internet. In International Conference on Research in Networking ’07. Springer.
[152] Runyuan Sun, Bo Yang, Lizhi Peng, Zhenxiang Chen, Lei Zhang, and Shan Jing. 2010. Traffic Classification Using Probabilistic Neural Networks. In IEEE ICNC ’10.
[153] Tuan A Tang, Lotfi Mhamdi, Des McLernon, Syed Ali Raza Zaidi, and Mounir Ghogho. 2016. Deep Learning Approach for Network Intrusion Detection in Software Defined Networking. In IEEE WINCOM ’16.
[154] Mahbod Tavallaee, Ebrahim Bagheri, Wei Lu, and Ali A Ghorbani. 2009. A Detailed Analysis of the KDD CUP 99 Data Set. In IEEE CISDA ’09.
[155] Simon Thompson. 2011. Haskell: The Craft of Functional Programming. Vol. 2. Addison-Wesley.
[156] Emanuel Todorov, Tom Erez, and Yuval Tassa. 2012. Mujoco: A Physics Engine for Model-Based Control. In IEEE IROS ’12.
[157] C Van Rijsbergen. 1979. Information Retrieval: Theory and Practice. In Proceedings of the Joint IBM/University of Newcastle upon Tyne Seminar on Data Base Systems ’79.
[158] Erwei Wang, James J Davis, Ruizhe Zhao, Ho-Cheung Ng, Xinyu Niu, Wayne Luk, Peter YK Cheung, and George A Constantinides. 2019. Deep Neural Network Approximation for Custom Hardware: Where We’ve Been, Where We’re Going. ACM Computing Surveys (CSUR) ’19 52, 2 (2019), 1–39. https://doi.org/10.1145/3309551
[159] Haining Wang, Danlu Zhang, and Kang G Shin. 2002. Detecting SYN Flooding Attacks. In Twenty-First Annual Joint Conference of the IEEE Computer and Communications Societies ’02, Vol. 3. 1530–1539.
[160] Naigang Wang, Jungwook Choi, Daniel Brand, Chia-Yu Chen, and Kailash Gopalakrishnan. 2018. Training Deep Neural Networks with 8-bit Floating Point Numbers. In NeurIPS ’18.
[161] Shuo Wang, Zhe Li, Caiwen Ding, Bo Yuan, Qinru Qiu, Yanzhi Wang, and Yun Liang. 2018. C-LSTM: Enabling Efficient LSTM using Structured Compression Techniques on FPGAs. In ACM/SIGDA FPGA ’18. https://doi.org/10.1145/3174243.3174253
[162] Keith Winstein and Hari Balakrishnan. 2013. TCP Ex Machina: ComputerGenerated Congestion Control. In ACM SIGCOMM ’13. https://doi.org/10.1145/2534169.2486020
[163] Shihan Xiao, Haiyan Mao, Bo Wu, Wenjie Liu, and Fenglin Li. 2020. Neural Packet Routing. In ACM NetAI ’20. https://doi.org/10.1145/3405671.3405813
[164] Xilinx. [n.d.]. Alveo U250 Data Center Accelerator Card. https://www.xilinx.com/products/boards-and-kits/alveo/u250.html. Accessed on 08/12/2021.
[165] Xilinx. [n.d.]. Xilinx: AXI Reference Guide. https://www.xilinx.com/support/-documentation/ip_documentation/ug761_axi_reference_guide.pdf. Accessed on 08/12/2021.
[166] Xilinx. [n.d.]. Xilinx OpenNIC Shell. https://github.com/Xilinx/open-nic-shell. Accessed on 09/30/2021.
[167] Xilinx. [n.d.]. Xilinx: UltraScale+ Integrated 100G Ethernet Subsystem. https://www.xilinx.com/products/intellectual-property/cmac_usplus.html. Accessed on 08/12/2021.
[168] Zhaoqi Xiong and Noa Zilberman. 2019. Do Switches Dream of Machine Learning? Toward In-Network Classification. In ACM HotNets ’19. https://doi.org/10.1145/3365609.3365864
[169] Francis Y Yan, Jestin Ma, Greg D Hill, Deepti Raghavan, Riad S Wahby, Philip Levis, and Keith Winstein. 2018. Pantheon: The Training Ground for Internet Congestion-Control Research. In USENIX ATC ’18.
[170] Liangcheng Yu, John Sonchack, and Vincent Liu. 2020. Mantis: Reactive Programmable Switches. In ACM SIGCOMM ’20. https://doi.org/10.1145/3387514.3405870
[171] Yasir Zaki, Thomas Pötsch, Jay Chen, Lakshminarayanan Subramanian, and Carmelita Görg. 2015. Adaptive Congestion Control for Unpredictable Cellular Networks. In ACM SIGCOMM ’15. https://doi.org/10.1145/2785956.2787498
[172] Sebastian Zander, Thuy Nguyen, and Grenville Armitage. 2005. Automated Traffic Classification and Application Identification Using Machine Learning. In IEEE LCN ’05.
[173] Jun Zhang, Chao Chen, Yang Xiang, Wanlei Zhou, and Yong Xiang. 2012. Internet Traffic Classification by Aggregating Correlated Naïve Bayes Predictions. IEEE Transactions on Information Forensics and Security ’12 8, 1 (2012), 5–15. https://doi.org/10.1109/TIFS.2012.2223675
[174] Jun Zhang, Xiao Chen, Yang Xiang, Wanlei Zhou, and Jie Wu. 2014. Robust Network Traffic Classification. IEEE/ACM Transactions on Networking ’14 23, 4 (2014), 1257–1270. https://doi.org/10.1109/TNET.2014.2320577
[175] Menghao Zhang, Guanyu Li, Shicheng Wang, Chang Liu, Ang Chen, Hongxin Hu, Guofei Gu, Qianqian Li, Mingwei Xu, and Jianping Wu. 2020. Poseidon: Mitigating Volumetric DDoS Attacks with Programmable Switches. In NDSS ’20. https://doi.org/10.14722/ndss.2020.24007
[176] Yaqi Zhang, Alexander Rucker, Matthew Vilim, Raghu Prabhakar, William Hwang, and Kunle Olukotun. 2019. Scalable Interconnects for Reconfigurable Spatial Architectures. In ACM/IEEE ISCA ’19. https://doi.org/10.1145/3307650.3322249
[177] Yaqi Zhang, Nathan Zhang, Tian Zhao, Matt Vilim, Muhammad Shahbaz, and Kunle Olukotun. 2021. SARA: Scaling a Reconfigurable Dataflow Accelerator. In ACM/IEEE ISCA ’21. https://doi.org/10.1109/ISCA52012.2021.00085
[178] Hongtao Zhong, Kevin Fan, Scott Mahlke, and Michael Schlansker. 2005. A Distributed Control Path Architecture for VLIW Processors. In IEEE PACT ’05.
[179] Chuan Zhou, Dongjie Di, Qingwei Chen, and Jian Guo. 2009. An Adaptive AQM Algorithm Based on Neuron Reinforcement Learning. In IEEE ICCA ’09.\