High-performance RISC-V Processor XuanTie C908

XuanTie C908 is the latest RISC-V processor in the XuanTie series from T-Head Semiconductor. It implements the RV64GCB[V] instruction set and is compatible with the RVA22 profile. XuanTie C908 uses a high-efficiency, dual-issue, 9-stage in-order pipeline and is equipped with an AI acceleration engine. It is designed mainly for applications such as intelligent interaction and AR/VR.

Specifications and features

In 2019, T-Head Semiconductor released XuanTie C910, a high-performance multi-issue out-of-order processor. XuanTie C906, a low-cost single-issue in-order processor, followed. The newest XuanTie C908 is a high-efficiency processor that targets the mid-range segment and the growing market for image and video processing applications. Its performance and cost sit between those of C910 and C906, filling the gap in the XuanTie product line.

XuanTie C908 supports three privilege modes: Machine, Supervisor, and User. User mode supports both the RV64GCB[V] and RV32GCB[V] instruction sets, and software can switch between the two at runtime through the UXL field. XuanTie C908 is the first processor in the industry to support the RV32 COMPAT mode, which meets the requirements of applications such as IP cameras. Support for this mode was merged into the Linux mainline in version 5.19[1]. The RV32 COMPAT mode not only provides higher code density but also lets users port 32-bit applications to XuanTie C908 more quickly.
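To make the UXL mechanism more concrete, the sketch below decodes and rewrites the UXL field of an `mstatus` value. The bit positions and encodings follow the RISC-V privileged specification for RV64; the helper names `user_xlen` and `with_user_xlen` are hypothetical, and the article does not describe how C908 firmware or the Linux kernel actually performs the switch.

```python
# Field position in the RV64 mstatus register, per the RISC-V privileged spec.
UXL_SHIFT = 32   # UXL occupies bits 33:32
UXL_MASK = 0b11

# XLEN encodings shared by the MXL/SXL/UXL fields (0 is reserved).
XLEN_BY_ENCODING = {1: 32, 2: 64, 3: 128}
ENCODING_BY_XLEN = {xlen: enc for enc, xlen in XLEN_BY_ENCODING.items()}


def user_xlen(mstatus: int) -> int:
    """Return the U-mode XLEN selected by the UXL field of an mstatus value."""
    encoding = (mstatus >> UXL_SHIFT) & UXL_MASK
    if encoding not in XLEN_BY_ENCODING:
        raise ValueError("reserved UXL encoding 0")
    return XLEN_BY_ENCODING[encoding]


def with_user_xlen(mstatus: int, xlen: int) -> int:
    """Return mstatus with UXL rewritten to select the requested U-mode XLEN."""
    if xlen not in ENCODING_BY_XLEN:
        raise ValueError(f"unsupported XLEN: {xlen}")
    cleared = mstatus & ~(UXL_MASK << UXL_SHIFT)
    return cleared | (ENCODING_BY_XLEN[xlen] << UXL_SHIFT)
```

In this model, running a 32-bit application corresponds to setting UXL to the 32-bit encoding before entering User mode, while all other `mstatus` bits are left unchanged.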

XuanTie C908 supports the RISC-V Bitmanip 1.0 instruction extension, including carry-less multiplication (Zbc), and optionally supports the RISC-V Vector 1.0 instruction set extension. It also supports BF16 operations, IEEE 754-compatible half-precision, and other floating-point operations. In addition, XuanTie C908 supports the RISC-V CMO Base extension and the Svinval extension. It adopts the Sv39/Sv48 virtual address system and supports Svnapot and Svpbmt. Together, these features position XuanTie C908 as one of the first RISC-V processors for the upcoming RVA22 profile. XuanTie C908 also inherits the XuanTie extensions, including the instruction extensions and the Memory Attributes Extension (XMAE).
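As an illustration of what carry-less multiplication computes, here is a minimal software reference model of the Zbc `clmul` and `clmulh` operations. It follows the semantics in the RISC-V Bitmanip specification (partial products are combined with XOR instead of addition); it is only a behavioral sketch and says nothing about how C908 implements the instructions in hardware.

```python
def _clmul_full(a: int, b: int, xlen: int) -> int:
    """Full 2*XLEN-bit carry-less product of two XLEN-bit operands."""
    mask = (1 << xlen) - 1
    a &= mask
    b &= mask
    product = 0
    for i in range(xlen):
        if (b >> i) & 1:
            product ^= a << i  # XOR replaces addition, so no carries propagate
    return product


def clmul(a: int, b: int, xlen: int = 64) -> int:
    """Lower XLEN bits of the carry-less product (behavior of Zbc `clmul`)."""
    return _clmul_full(a, b, xlen) & ((1 << xlen) - 1)


def clmulh(a: int, b: int, xlen: int = 64) -> int:
    """Upper XLEN bits of the carry-less product (behavior of Zbc `clmulh`)."""
    return _clmul_full(a, b, xlen) >> xlen
```

For example, `clmul(0b101, 0b11)` yields `0b1111`, because `0b101 ^ 0b1010` has no carries. Operations of this kind are commonly used in CRC and Galois-field computations.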

image.png

As illustrated in the figure above, XuanTie C908 uses a two-level cache system with hardware cache coherency and optional ECC. In this multi-cluster architecture, each cluster can contain 1 to 4 cores. The bus interface supports the AXI4/ACE protocol with two optional interfaces: a Device Coherence Port (DCP) and a Low Latency Port (LLP). DCP maintains data coherency with external I/O masters, while LLP accesses peripherals. XuanTie C908 also provides an enhanced physical memory protection (ePMP) unit with up to 64 regions. In addition, it supports RISC-V Debug and a Platform-Level Interrupt Controller (PLIC), which can be configured with up to 1023 interrupt sources.
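The limit of 1023 interrupt sources matches the standard RISC-V PLIC specification, where source ID 0 is reserved to mean "no interrupt". The sketch below computes register offsets under that standard memory map. Whether the C908 PLIC uses exactly these offsets is an assumption not confirmed by the article, and the function names are hypothetical.

```python
# Offsets from the PLIC base address in the standard RISC-V PLIC memory map.
# Assumption: the C908 PLIC follows this layout; the article does not say so.
PLIC_MAX_SOURCE = 1023      # IDs 1..1023; ID 0 means "no interrupt"
PRIORITY_BASE = 0x000000    # one 32-bit priority register per source
ENABLE_BASE = 0x002000      # one enable bit per source, per context
ENABLE_STRIDE = 0x80        # 1024 bits = 128 bytes of enables per context
CONTEXT_BASE = 0x200000     # threshold and claim/complete registers
CONTEXT_STRIDE = 0x1000


def _check_source(source: int) -> None:
    if not 1 <= source <= PLIC_MAX_SOURCE:
        raise ValueError(f"source ID must be in 1..{PLIC_MAX_SOURCE}")


def priority_offset(source: int) -> int:
    """Offset of the priority register for an interrupt source."""
    _check_source(source)
    return PRIORITY_BASE + 4 * source


def enable_bit(context: int, source: int) -> tuple[int, int]:
    """Return (word offset, bit index) of a source's enable bit in a context."""
    _check_source(source)
    word_offset = ENABLE_BASE + context * ENABLE_STRIDE + 4 * (source // 32)
    return word_offset, source % 32


def claim_complete_offset(context: int) -> int:
    """Offset of the claim/complete register for a context."""
    return CONTEXT_BASE + context * CONTEXT_STRIDE + 4
```

A driver would add these offsets to the platform's PLIC base address, which depends on the SoC integration and is not given here.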

Microarchitecture and metrics

XuanTie C908 contains a 9-stage dual-issue in-order pipeline. It delivers industry-leading performance in control flow, computing, and frequency through architecture and microarchitecture innovations.

XuanTie C908 employs advanced branch prediction techniques, including a state-of-the-art Branch History Table, Branch Target Buffer, and Return Address Stack. It uses instruction fusion, which fuses various types of instructions into a single instruction for execution. In addition, XuanTie C908 introduces a brand-new data prefetching algorithm that further improves memory access performance in complex application scenarios.
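For readers unfamiliar with branch history tables, the following is a textbook sketch of one built from 2-bit saturating counters. It is a generic illustration only: the article does not disclose the C908 predictor design, and the table size, indexing scheme, and class name here are all assumptions.

```python
class BranchHistoryTable:
    """Textbook branch history table of 2-bit saturating counters.

    Counter values 0-1 predict not-taken and 2-3 predict taken.
    This is a generic model, not the C908 predictor.
    """

    def __init__(self, entries: int = 1024) -> None:
        if entries <= 0 or entries & (entries - 1):
            raise ValueError("entries must be a positive power of two")
        self._mask = entries - 1
        self._counters = [1] * entries  # start at weakly not-taken

    def _index(self, pc: int) -> int:
        # Drop bit 0: compressed RISC-V instructions are 2-byte aligned.
        return (pc >> 1) & self._mask

    def predict(self, pc: int) -> bool:
        """Return True if the branch at pc is predicted taken."""
        return self._counters[self._index(pc)] >= 2

    def update(self, pc: int, taken: bool) -> None:
        """Train the counter for pc with the resolved branch outcome."""
        i = self._index(pc)
        if taken:
            self._counters[i] = min(self._counters[i] + 1, 3)
        else:
            self._counters[i] = max(self._counters[i] - 1, 0)
```

The two-bit hysteresis means a single mispredicted iteration, such as a loop exit, does not immediately flip the prediction for a branch that is usually taken.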

Benefiting from the efficient pipeline design, XuanTie C908 can run at up to 2 GHz, with dynamic power consumption of 52.8 mW/GHz per core on TSMC's 12 nm process. Under the same frequency and process constraints, the energy efficiency of XuanTie C908 in typical scenarios is more than 20% better than that of XuanTie C906.
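As a rough back-of-the-envelope reading of these figures, the sketch below scales the quoted 52.8 mW/GHz linearly with frequency and core count. Linear scaling is an assumption: real designs usually raise voltage at higher frequencies, and the article does not state the operating point at which the figure was measured.

```python
DYNAMIC_POWER_MW_PER_GHZ = 52.8  # per core, TSMC 12 nm, as quoted in the article


def dynamic_power_mw(frequency_ghz: float, cores: int = 1) -> float:
    """Estimate dynamic power in mW, assuming linear scaling with frequency."""
    if frequency_ghz <= 0 or cores <= 0:
        raise ValueError("frequency and core count must be positive")
    return DYNAMIC_POWER_MW_PER_GHZ * frequency_ghz * cores
```

Under that assumption, one core at 2 GHz draws about 105.6 mW of dynamic power, and a fully populated 4-core cluster about 422.4 mW, excluding caches, interconnect, and static power.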

AI-oriented software and hardware acceleration technology

XuanTie C908 includes an optional Vector Processing Unit (VPU) that is compatible with the RISC-V Vector Extension 1.0 specification. The VPU supports various vector floating-point and integer data formats, and the computing power of key operations, such as multiply-accumulate, is enhanced for different application scenarios. For typical AI application scenarios, XuanTie C908 adds a vector dot product instruction extension and introduces the INT4 data type, which improves peak computing power while reducing memory requirements. In the MLPerf Tiny v0.7 inference benchmark, XuanTie C908 outperforms C906, achieving more than 3.5 times the performance of C906 in the best case.
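To show why INT4 reduces memory requirements and what a dot product over INT4 data computes, here is a small scalar reference sketch. The packing order (low nibble first) and the function names are assumptions for illustration; the article does not describe the encoding or semantics of the actual C908 vector instructions.

```python
def pack_int4(values: list[int]) -> bytes:
    """Pack signed 4-bit integers (-8..7) two per byte, low nibble first."""
    if len(values) % 2:
        raise ValueError("need an even number of values")
    packed = bytearray()
    for low, high in zip(values[0::2], values[1::2]):
        for v in (low, high):
            if not -8 <= v <= 7:
                raise ValueError(f"{v} does not fit in a signed 4-bit integer")
        packed.append((low & 0xF) | ((high & 0xF) << 4))
    return bytes(packed)


def unpack_int4(packed: bytes) -> list[int]:
    """Inverse of pack_int4: sign-extend each nibble back to a Python int."""
    values = []
    for byte in packed:
        for nibble in (byte & 0xF, byte >> 4):
            values.append(nibble - 16 if nibble >= 8 else nibble)
    return values


def dot_int4(a: bytes, b: bytes) -> int:
    """Dot product of two packed INT4 vectors with a wide accumulator."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x * y for x, y in zip(unpack_int4(a), unpack_int4(b)))
```

Four INT4 weights occupy two bytes here, half the storage of INT8, which is the memory saving the paragraph above refers to. A hardware dot product instruction would perform the multiply-accumulate over many such elements at once.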

XuanTie C908 adopts a hardware/software co-design methodology to accelerate deep learning inference. With the neural network inference deployment tool HHB and the high-performance heterogeneous computing library SHL, XuanTie C908 is supported by optimized reference implementations at both the compilation and assembly levels.

Conclusion

XuanTie C908 has achieved technological breakthroughs for higher performance in RISC-V. XuanTie C908 supports a multi-core and multi-cluster architecture, adopts a high-efficiency 9-stage dual-issue in-order pipeline, and utilizes innovative instruction fusion technology to further improve efficiency. Its energy efficiency ratio has reached the industry's advanced level. Compatible with the latest RISC-V Vector 1.0 specification, XuanTie C908 introduces the INT4 data type and vector dot product instruction extension and provides a comprehensively optimized algorithm library, which helps drastically improve AI computing performance.


[1]: https://www.phoronix.com/news/Linux-5.19-RISC-V


