F5 CMP architecture

Introduction:

Most manufacturers would simply attempt to use SMP to distribute the TMOS process across multiple processors, all sharing the same memory, network card, and special-purpose processors. Others might attempt to run multiple instances of the TMM on different processors, still with the requisite shared memory, network card, and special-purpose processors. Instead, CMP (clustered multiprocessing) load-balances traffic across multiple processing cores, each with its own dedicated memory, network interface, and special-purpose processors. Each core runs its own, completely independent TMM process. By removing the dependencies between the instances, CMP allows virtually the entire traffic management process to be parallelized. This provides a substantial benefit to the overall performance of the system.

The hardware that enables CMP comprises two important, proprietary F5 technologies: the Disaggregator and the High Speed Bridge (HSB).

The Disaggregator acts as a hardware-based load balancer, distributing traffic flows between the independent TMM instances and managing flow affinity when necessary. Not only does this facilitate near 1:1 linear performance growth (doubling the number of processing cores nearly doubles the computing power, with no diminishing returns), but it also completely virtualizes each processing core from the system and from the other cores. This provides high availability and reliability in the event that any core becomes non-functional.
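The flow-distribution idea can be illustrated with a small software sketch. This is only an analogy: the real Disaggregator is proprietary hardware, and the function name `pick_tmm`, the CRC32 hash, the 5-tuple layout, and the sample addresses below are all assumptions made for illustration, not F5's actual algorithm.

```python
import zlib
from collections import Counter


def pick_tmm(flow, healthy_tmms):
    """Map a flow 5-tuple to one of the healthy TMM instance IDs.

    Hypothetical sketch: hashing the whole 5-tuple means every packet of
    the same flow lands on the same instance (flow affinity).
    """
    key = "|".join(str(field) for field in flow).encode()
    return healthy_tmms[zlib.crc32(key) % len(healthy_tmms)]


# Synthetic flows: (src_ip, src_port, dst_ip, dst_port, protocol).
flows = [
    (f"10.0.{i // 250}.{i % 250 + 1}", 1024 + i, "192.0.2.10", 443, "tcp")
    for i in range(10_000)
]

# All four instances are available.
all_up = [0, 1, 2, 3]
per_tmm = Counter(pick_tmm(f, all_up) for f in flows)

# Instance 2 has failed: flows are spread over the remaining three.
one_down = [0, 1, 3]
per_tmm_degraded = Counter(pick_tmm(f, one_down) for f in flows)

print("all up:  ", sorted(per_tmm.items()))
print("one down:", sorted(per_tmm_degraded.items()))
```

Because the instances share nothing, taking one out of the candidate list is enough to keep traffic flowing. Note that this simple modulo scheme remaps many existing flows when the list changes, so a production design would need something more careful to preserve affinity across failures.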

The HSB delivers direct, non-blocking communication between the TMM instances and the outside world without the loss normally associated with Ethernet interconnects. It also provides the streamlined message-passing interface that enables TMM instances to share information. This gives each processor's dedicated network interface very high throughput and interconnectivity, and it mitigates the performance impact of inter-process communication in the few remaining cases where it takes place.

CMP has changed the rules

The performance increase that can be expected from parallelizing a process depends on how much of the process can truly be parallelized. If a process requiring 10 units of time can only be 50 percent parallelized, the process will never run in less than five units, even if the parallelized portion is processed instantly. As a result, the entire process can never be more than twice as fast.
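This limit is commonly known as Amdahl's law (the name does not appear in the original text). As a short worked version of the example, with the symbols p (parallelizable fraction), n (number of processors), T (run time), and S (speedup) introduced here for illustration:

```latex
% p = parallelizable fraction of the work, n = number of processors
S(n) = \frac{1}{(1 - p) + \frac{p}{n}}, \qquad
\lim_{n \to \infty} S(n) = \frac{1}{1 - p}

% Worked example from the text: 10 units of time, p = 0.5
T(n) = 10\left(0.5 + \frac{0.5}{n}\right) > 5, \qquad
S(n) = \frac{10}{T(n)} < 2
```

The serial half of the work costs five units no matter how many processors are added, which is where the ceiling of 2x comes from.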

Up until now, the game has been pretty simple and widely understood. First, it was to optimize your code to run on a single processor as best you can and ride the "Intel power curve." Then, it was to optimize your code for SMP or AMP and build your platforms with as many processing cores as possible. All the while, performance improvements have slowly dwindled to minuscule amounts.

CMP changes the rules of the game. Instead of working to continually improve the performance of a never-changing proportion of parallelized processes, CMP's most basic tenet is to change that proportion. Continuing improvements in performance can only be realized by increasing the amount of the application delivery process that can be parallelized. Only parallelizing nearly all of that process can enable near 1:1 linear scaling, fully utilizing all the processing cores.
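To see why the parallel fraction matters more than the core count, here is a minimal sketch that applies the standard speedup bound 1 / ((1 - p) + p / n) for a parallel fraction p on n cores. The fractions and core counts are illustrative assumptions, not measurements of any F5 platform.

```python
def speedup(parallel_fraction, cores):
    """Upper bound on speedup when only part of the work runs in parallel."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


# Illustrative fractions only: half, most, and nearly all of the work.
for p in (0.50, 0.90, 0.99):
    row = ", ".join(
        f"{n} cores: {speedup(p, n):.2f}x" for n in (2, 4, 8, 16)
    )
    print(f"parallel fraction {p:.0%}: {row}")
```

At a 50 percent parallel fraction the speedup stays below 2x however many cores are added, while at 99 percent it tracks the core count much more closely, which is the proportion CMP sets out to change.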

This article is reposted from chris_lee's 51CTO blog. Original link: http://blog.51cto.com/ipneter/370040 (to repost, please contact the original author).
