Computer Architecture Glossary

Introduction: a glossary of computer architecture terms.

1BP: 1-bit branch predictor
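
A minimal sketch of the idea in C (table size and names are ours, not from the source): the predictor keeps one bit per entry, indexed by low-order PC bits, and predicts that a branch will do whatever it did last time.

    #include <stdbool.h>
    #include <stdint.h>

    #define BP_ENTRIES 1024  /* power of two, so the index can be masked */

    static bool taken_last[BP_ENTRIES];  /* one history bit per entry */

    /* Predict: repeat the branch's last outcome at this PC. */
    bool bp_predict(uint32_t pc) {
        return taken_last[(pc >> 2) & (BP_ENTRIES - 1)];
    }

    /* Update the entry once the branch actually resolves. */
    void bp_update(uint32_t pc, bool taken) {
        taken_last[(pc >> 2) & (BP_ENTRIES - 1)] = taken;
    }

A 1BP mispredicts twice per loop execution (once on exit, once on re-entry), which is what motivates 2-bit saturating counters.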


4 C's - compulsory misses: occur the first time a block is accessed by the cache.

4 C's - capacity misses: blocks must be evicted because the cache is not large enough to hold them all.


4 C's - coherence misses: occur when processors share the same block. Processor A writes to the block; even though Processor B has the block in its cache, B's next access is a miss, because its copy is no longer up to date.


4 C's - conflict misses: occur in set-associative and direct-mapped caches when another address maps to the same cache block and must replace the data currently there.


ALAT: advanced load address table - tracks speculative (advanced) loads so that later stores to the same address can be detected (used in the Itanium architecture)


aliasing: in the BTB, when two addresses overlap with the same BTB entry, this is called aliasing. Aliasing should be kept to <1%.


ALU: arithmetic logic unit


AMAT: average memory access time; AMAT = hit time + miss rate * miss penalty
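
For example (numbers ours, for illustration): with a 1-cycle hit time, a 5% miss rate, and a 100-cycle miss penalty, AMAT = 1 + 0.05 * 100 = 6 cycles.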


Amdahl's Law: an equation to determine the improvement of a system when only a portion of the system is improved.
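
In formula form (the standard statement, added here for reference): if a fraction f of execution time is affected by an enhancement that speeds that portion up by a factor s, then

    Speedup = 1 / ((1 - f) + f / s)

For example, speeding up half the program (f = 0.5) by a factor of 2 gives 1 / (0.5 + 0.25) = 1.33, far less than 2.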


architectural registers: registers (Floating point and General Purpose) that are visible to the programmer.


ARF: architectural register file or retirement register file


Asynchronous Message Passing: a processor requests data, then continues processing instructions while the message is retrieved.


BHT: branch history table - records whether each branch was taken or not taken.


blocking cache: the cache services only one request at a time; on a miss, all other requests are blocked until the miss is serviced.

BTB: branch target buffer - keeps track of which target address was taken the last time the processor encountered this instruction.


cache coherence definition #1: a read R from address X on processor P1 returns the value written by the most recent write W to X on P1, provided no other processor has written to X between W and R.


cache coherence definition #2: if P1 writes to X and P2 reads X after a sufficient time, and there are no other writes to X in between, then P2's read returns the value written by P1's write.


cache coherence definition #3: writes to the same location are serialized: two writes to location X are seen in the same order by all processors.


cache hit: the desired data is in the cache and is up to date

cache miss: the desired data is not in the cache, or the cached copy is stale


cache thrashing: two or more addresses compete for the same cache block; the processor keeps requesting both, so each access evicts the previous one.

CDB: common data bus


checkpointing: store the state of the CPU before a branch is taken; if the branch turns out to be mispredicted, restore the CPU to the saved state. Nothing is stored to memory until the branch is known to be correct.


CISC Processor: complex instruction set computer

CMP: chip multiprocessor


coarse multi-threading: the thread being processed changes every few clock cycles

consistency: the order of accesses to different addresses


control hazard: branches and jumps cannot be executed until the destination address is known

CPI: cycles per instruction


CPU: central processing unit


Dark Silicon: the gap between how many transistors are on a chip and how many can be used simultaneously; simultaneous usage is limited by the power consumption of the chip.

data hazard: reordering or overlapping instructions changes the order of data accesses; if the instructions are dependent, there is a data hazard.


DDR SDRAM: double data rate synchronous dynamic RAM

dependency chain: a long series of dependent instructions in code


directory protocols: information about each block state in the caches is stored in a common directory.


DRAM: dynamic random access memory


DSM: distributed shared memory - all processors can access all memory locations

Enterprise class: used for large-scale systems that service enterprises


error: the part of the system state that deviates from the specification; caused by a fault, and may lead to a failure


error forecasting: estimating the presence, creation, and consequences of errors

error removal: removing latent errors by verification


exclusion property: a block held in one cache level is not held in any other level

explicit ILP: the compiler decides which instructions to execute in parallel


failure: the actual behavior deviates from the specified behavior; the consequence of an error


fault avoidance: prevent the occurrence of faults by construction


fault tolerance: prevent faults from becoming failures through redundancy

fault: the underlying cause of an error, such as a defective component or a programming mistake


FIFO: first in first out


fine multi-threading: the thread being processed changes every cycle

FLOPS: floating point operations per second


Flynn's Taxonomy: classifications of parallel computer architecture, SISD, SIMD, MISD, MIMD


FPR: floating point register

FSB: front side bus


Geometric Mean: the nth root of the product of n numbers (example below)

global miss rate: (# of L2 misses) / (# of all memory accesses from the processor)

GPR: general purpose register
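
Geometric mean example (numbers ours): the geometric mean of 2 and 8 is the square root of 2 * 8 = 4; it is the conventional way to average normalized benchmark ratios, since the result does not depend on which machine is chosen as the baseline.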


hit latency: the time it takes to get data from the cache; includes the time to find the address in the cache and place the data on the data lines


ILP: instruction level parallelism


inclusion property: each level of cache includes all data held by the lower-level caches

IPC: instructions per cycle


Iron Law: execution time = (# of instructions executed) x CPI x (clock cycle time). For example, a single-cycle machine running N instructions at CPI = 1 with a 2 ns cycle time takes N x 1 x 2 ns = 2N ns. Instructions per program depend on the source code, compiler technology, and ISA; CPI depends on the ISA and the microarchitecture; time per cycle depends on the microarchitecture and the underlying process technology.


ISA: instruction set architecture


Itanium architecture: an explicit ILP architecture; up to six instructions can be executed per clock cycle


Itanium Processor: Intel family of 64-bit processors that uses the Itanium architecture

LFU: least frequently used


ll and sc: load-linked and store-conditional - a pair of instructions used together to ensure synchronization.
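
C has no portable ll/sc primitive, but the same retry pattern can be sketched with GCC's __atomic compare-and-swap builtin, which compilers lower to an ll/sc loop on ISAs such as ARM, MIPS, and RISC-V (function name is ours):

    #include <stdint.h>

    /* Atomically add 'delta' to *addr using an ll/sc-style retry loop. */
    int32_t atomic_add(int32_t *addr, int32_t delta) {
        int32_t old = __atomic_load_n(addr, __ATOMIC_RELAXED);
        /* The CAS plays the role of sc: it succeeds only if no other
         * processor wrote *addr since we read it; on failure, 'old' is
         * refreshed with the current value and we retry. */
        while (!__atomic_compare_exchange_n(addr, &old, old + delta,
                                            1 /* weak */,
                                            __ATOMIC_ACQ_REL, __ATOMIC_RELAXED))
            ;
        return old + delta;
    }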


local miss rate: (# of L2 misses) / (# of L1 misses)
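
Note the relationship (a standard identity, not stated in the source): global L2 miss rate = L1 miss rate * local L2 miss rate, since only accesses that miss in L1 ever reach the L2.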


locality principle: things that will happen soon are likely to be similar to things that just happened.


loop interchange: used for nested loops - interchange the order of the loops so that array accesses follow the actual layout in memory, improving spatial locality (see the sketch below)
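
A minimal sketch in C (array and sizes ours): C stores 2-D arrays row-major, so putting the column index in the inner loop walks memory sequentially.

    #define N 1024
    static double a[N][N];

    /* Before interchange: consecutive accesses are N*8 bytes apart,
     * so almost every access touches a new cache block. */
    double sum_cols(void) {
        double s = 0.0;
        for (int j = 0; j < N; j++)
            for (int i = 0; i < N; i++)
                s += a[i][j];
        return s;
    }

    /* After interchange: accesses follow the row-major layout, so each
     * cache block is fully consumed before it is evicted. */
    double sum_rows(void) {
        double s = 0.0;
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++)
                s += a[i][j];
        return s;
    }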


LRU: least recently used

LSQ: load store queue


MCB: memory conflict buffer - "Dynamic Memory Disambiguation Using the Memory Conflict Buffer", see also "Memory Disambiguation"


MEOSI Protocol: modified-exclusive-owned-shared-invalid protocol (more commonly written MOESI), the states of any cached block.


MESI Protocol: modified-exclusive-shared-invalid protocol, the states of any cached block.

Message Passing: a processor can only access its local memory; to access other memory locations it must send and receive messages for data held at those locations.

meta-predictor: a predictor that chooses the best branch predictor for each branch.

MIMD: multiple instruction streams, multiple data streams


MISD: multiple instruction streams, single data stream


miss latency: the time it takes to get data from main memory; includes the time to determine the data is not in the cache, determine who owns the data, and send it to the CPU.


mobo: motherboard


Moore's Law: Gordon E. Moore observed that the number of transistors on an integrated circuit doubles roughly every two years.


MP: multiprocessing


MPKI: Misses per Kilo Instruction


MSI Protocol: modified-shared-invalid protocol, the states of any cached block.

MTPI: message transfer part interface


MTTF: mean time to failure

MTTR: mean time to repair


multi-level caches: caches with two or more levels, each level larger and slower than the previous level


mutex variable: mutually exclusive (mutex) - a low-level synchronization mechanism. A thread acquires the mutex, then releases it upon completing its task; during this period no other thread can acquire the mutex (see the sketch below).
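
A minimal usage sketch with POSIX threads (names ours):

    #include <pthread.h>

    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
    static long counter = 0;  /* shared data guarded by 'lock' */

    void *worker(void *arg) {
        (void)arg;
        pthread_mutex_lock(&lock);    /* acquire: other threads now block here */
        counter++;                    /* critical section */
        pthread_mutex_unlock(&lock);  /* release: one waiter may proceed */
        return NULL;
    }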


NMRU: not most recently used


non-blocking caches: if there is a miss, the cache services the next request while waiting for memory


NUMA: non-uniform memory access, also called distributed shared memory

OOO: out of order


OS: operating system


PAPT: physically addressed, physically tagged cache - the cache stores the data based on its physical address


PC: program counter


PCI: peripheral component interconnect


Pentium Processor: x86 superscalar processor from Intel


physical registers: registers (FP and GP) that are not visible to the programmer

pipeline burst cache: see pipelined cache


pipelined cache: a pipelined burst cache uses 3 clock cycles to transfer the first data item from a cache block, then 1 clock cycle for each remaining item - the 'pipeline' and the 'burst' (3-1-1-1).

PIPT: physically indexed, physically tagged cache


Power: dynamic power = (1/2) * alpha * C * V^2 * f, where alpha is the activity factor, C the switched capacitance, V the supply voltage, and f the clock frequency
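
The quadratic dependence on V is why voltage scaling is the most effective power lever: for example (numbers ours), reducing the supply voltage by 20% at fixed alpha, C, and f cuts dynamic power to 0.8^2 = 64% of its original value, and since f can usually be lowered along with V, the combined saving is roughly cubic.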


Power Architecture: Performance Optimization With Enhanced RISC


Power vs Performance Equation:


pre-fetch buffer: when reading data from memory, the entire row is fetched and stored in a buffer.


pre-fetching cache: instructions are fetched from memory before they are needed by the CPU

Prescott Processor: based on the NetBurst architecture, with a 31-stage pipeline in the core. The high penalty paid for mispredictions is supposedly offset by a Rapid Execution Engine. It also has a trace execution cache, which stores decoded instructions and reuses them instead of fetching and decoding again.


PRF: physical register file


pseudo associative cache: an address is first searched in one half of the cache; if it is not there, the other half of the cache is searched.


RAID: redundant array of independent disks


RAID 0: strips of data are stored on disks - alternating between disks. Each disk supplies a portion of the data, which usually improves performance.


RAID 1: the data is replicated on another disk, so each disk contains a full copy of the data. Whichever disk is free responds to a read request; a write goes to one disk and is then mirrored to the other disk(s).


RAID 2 and RAID 3: the data is striped on disks and Hamming codes or parity bits are used for error detection. RAID 2 and RAID 3 are not used in any current application


RAID 4: Data is striped in large blocks onto disks with a dedicated parity disk. It is used by the NetApp company.


RAID 5: data is striped in large blocks onto disks, but there is no dedicated parity disk; the parity block for each stripe is stored on one of the disks, rotating among them.
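
The parity used in RAID 4/5 is a bytewise XOR across the data strips, which is what lets any single lost strip be rebuilt; a minimal sketch in C (names and layout ours):

    #include <stddef.h>
    #include <stdint.h>

    /* Rebuild one missing strip as the XOR of the parity strip and all
     * surviving data strips: works because p = d0 ^ d1 ^ ... ^ d(n-1). */
    void raid_reconstruct(uint8_t *out, const uint8_t *strips[],
                          size_t nstrips, size_t striplen) {
        for (size_t i = 0; i < striplen; i++) {
            uint8_t x = 0;
            for (size_t s = 0; s < nstrips; s++)
                x ^= strips[s][i];
            out[i] = x;
        }
    }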


RAR: read after read

RAS: return address stack

RAT: register alias table


RAT: (another RAT, in multiprocessing) register allocation table

RAW: read after write


RDRAM: Rambus direct random access memory


relaxed consistency: some instructions can be performed out of order while still maintaining consistency

reliability: a measure of continuous service accomplishment


reservation stations: buffers in front of the functional units that hold instructions waiting for their operands


RETI: return from interrupt


RF: register file


RISC Processor: reduced instruction set computer - simple instructions of the same size, typically executed in one clock cycle


ROB: re-order buffer

RS: reservation station


RWX: read-write-execute permissions on files


SHARC processor: floating point processors designed for DSP applications

SIMD: single instruction stream, multiple data streams


simultaneous multi-threading: instructions from different threads are processed, even in the same cycle


SISD: single instruction stream, single data stream

SMP: symmetric multiprocessing


SMT: simultaneous multi-threading


snooping protocols: on a broadcast network (bus), each processor's cache watches the bus for addresses it holds.


SPARC processor: Scalable Processor Architecture - a RISC instruction set processor


spatial locality: if we access a memory location, nearby memory locations have a tendency to be accessed soon.


Speedup: how much faster a modified system is compared to the unmodified system.

SPR: special purpose registers - such as the program counter or status register


SRAM: static random access memory


structural hazard: two instructions in the pipeline attempt to use the same hardware resource in the same cycle.


superscalar architecture: the processor manages instruction dependencies at run time and executes more than one instruction per clock cycle using multiple pipelines.


synchronization (sequential consistency): "a system is sequentially consistent if the result of any execution is the same as if the operations of all the processors were executed in some sequential order, and the operations for each individual processor appear in the order specified by the program." - Leslie Lamport


Synchronous Message Passing: a processor requests data then waits until the data is received before continuing.


tag: the part of the data address used to find the data in the cache; the tag distinguishes a block from the other blocks that map to the same cache line.


temporal locality: if a program accesses a memory location, it tends to access the same location again very soon.


TLB: translation lookaside buffer - a cache of virtual-to-physical address translations. TLB misses are very time consuming


Tomasulo's Algorithm: achieves high performance without special compilers by using dynamic scheduling


tournament predictor: a meta-predictor


trace caches: sets of instructions are stored in a separate cache. These are instructions that have been decoded and executed. If there is a branch in the set, only the taken branch instructions are kept. If there is a misprediction the trace stops.


trace scheduling: rearranging instructions for faster execution; the common-case path is scheduled first

tree, tournament, dissemination barriers: types of structures for implementing barriers


UMA: uniform memory access - all memory locations have similar latencies.

