High-performance RISC-V Processor Xuantie C908

简介: High-performance RISC-V Processor Xuantie C908

XuanTie C908 is the latest RISC-V processor of the XuanTie series launched by T-Head Semiconductor. It has adopted the RV64GCB[V] instruction and is compatible with RVA22 profile. XuanTie C908 utilizes a high-efficiency,dual-issued, and 9-stage in-order pipeline. It is equipped with an AI acceleration engine. It is designed to mainly suit for applications such as Intelligent Interaction, AR/VR.

Specifications and features

In 2019, T-Head Semiconductor released XuanTie C910, a high-performance multi-issue out-of-order processor.  Later XuanTie C906, a low-cost single-issue in-order processor, had followed for launch. The newest XuanTie C908 is a high-efficiency processor targeted at the mid-end market segments for the growing market of image and video processing applications. Its performance and cost are between those of C910 and C906, filling the gap in the product line of XuanTie series processors

image.png

XuanTie C908 supports three privileged modes: Machine, Supervisor, and User. Among them, the User mode supports both RV64GCB[V] and RV32GCB[V] instruction sets. Softwares can  switch among the modes during runtime through UXL. XuanTie C908 supports the RV32 COMPAT mode for the first time in the industry to meet the requirements in applications, e.g. IP Camera.  Furthermore, it has been merged into the Linux mainline in version 5.19[1]. The RV32 COMPAT mode not only provides higher code density but also allows users to port 32-bit applications to XuanTie C908 in a faster manner.

XuanTie C908 supports the following features: RISC-V Bitmanip 1.0 instruction extension including the carry-less multiplication (zbc), optional supports RISC-V Vector 1.0 instruction set extension, BF16 operations, IEEE-754 compatible half-precision, and other floating-point operations. In addition, XuanTie C908 supports the RISC-V CMO Base extension and Svinval extension. It adopts the Sv39/Sv48 virtual address system and holds up Svnapot and Svpbmt. All these features make it possible for XuanTie C908 to be one of the first RISC-V processors for the  upcoming RVA22 profile. XuanTie C908 also inherits XuanTie extensions, including Instruction, Memory Attributes Extension (XMAE).

image.png

As illustrated in the above graph, XuanTie C908 uses a two-level cache system to support hardware cache coherency and optional ECC. In this multi-cluster architecture, each cluster can contain 1 to 4 cores.The bus interface supports AXI4/ACE protocol with two optional interfaces: a Device Coherence Port (DCP) and a Low Latency Port (LLP). DCP maintains data coherency with external I/O masters, while LLP accesse peripherals.  In terms of peripherals, XuanTie C908 provides the enhanced physical memory protection (ePMP) unit that allows a maximum of 64 regions. C908 also backs up for RISC-V Debug and Platform-Level Interrupt Controller (PLIC), with which can be configured up to 1023 interrupt sources.

Microarchitecture and metrics

XuanTie C908 contains a  9-stage dual-issue in-order pipeline. It delivers industry-leading performance in control flow, computing, and frequency through architecture and micro-architecture innovations.

image.pngXuanTie C908 is the pillar for branch prediction technologies, including state-of-the-art Branch History Table, Branch Target Buffer, and Return Address Stack. It utilizes Instruction Fusion technology, which can fuse various types of instructions into a single instruction for execution. In addition, XuanTie C908 provides  a brand-new data prefetching algorithm, further improving the memory access performance in complex application scenarios.

image.pngTo further benefit from the efficient pipeline design, XuanTie C908 can run at a frequency of up to 2 GHz, and the dynamic power consumption can be 52.8 mW/GHz per core under TSMC's 12nm process. Under the same frequency and process constraints, the energy efficiency ratio of XuanTie C908 in typical scenarios can be improved by more than 20% compared with that of XuanTie C906.

AI-oriented software and hardware acceleration technology

XuanTie C908 includes an optional Vector Processing Unit (VPU), which is compatible with the RISC-V Vector Extension 1.0 specification. This feature supports various vector floating-point and integer data formats. The computing power of key operations, such as multiply-accumulate, are enhanced in different application scenarios. For typical AI application scenarios, XuanTie C908 supplies the vector dot product instruction extension and intruduces the INT4 data type. This helps to improve the peak computing power, while reducing the memory requirement. XuanTie C908 has outperformed C906 in the MLPerf tiny V0.7 inference performance test. The performance of C908 is up to more than 3.5 times that of C906.

image.png

XuanTie C908 adopts co-design methodology to accelerate deep learning inference applications for both hardware and software. With the neural network inference deployment tool, i.e.HHB, and a high-performance heterogeneous computing library,i.e. SHL, XuanTie C908 is empowered and optimized with reference implementations of compilation and assembly.

Conclusion

XuanTie C908 has achieved technological breakthroughs for higher performance in RISC-V. XuanTie C908 supports a multi-core and multi-cluster architecture, adopts a high-efficiency 9-stage dual-issue in-order pipeline, and utilizes innovative instruction fusion technology to further improve efficiency. Its energy efficiency ratio has reached the industry's advanced level. Compatible with the latest RISC-V Vector 1.0 specification, XuanTie C908 introduces the INT4 data type and vector dot product instruction extension and provides a comprehensively optimized algorithm library, which helps drastically improve AI computing performance.


[1]: https://www.phoronix.com/news/Linux-5.19-RISC-V



相关文章
|
并行计算 安全 开发者
RISC-V生态全景解析(五):Vector向量计算技术与SIMD技术的对比
芯片开放社区(OCC)面向开发者推出RISC-V系列内容,通过多角度、全方位解读RISC-V,系统性梳理总结相关理论知识,构建RISC-V知识图谱,促进开发者对RISC-V生态全貌的了解。
3891 0
RISC-V生态全景解析(五):Vector向量计算技术与SIMD技术的对比
|
11月前
|
安全 区块链 数据安全/隐私保护
区块链技术在跨境支付中的应用:打破传统,畅行全球支付新时代
区块链技术在跨境支付中的应用:打破传统,畅行全球支付新时代
1519 12
区块链技术在跨境支付中的应用:打破传统,畅行全球支付新时代
|
缓存 算法 大数据
倚天710规模化应用 - 性能优化 - 软件预取分析与优化实践
软件预取技术是编程者结合数据结构和算法知识,将访问内存的指令提前插入到程序,以此获得内存访取的最佳性能。然而,为了获取性能收益,预取数据与load加载数据,比依据指令时延调用减小cachemiss的收益更大。
|
存储 监控 固态存储
磁盘碎片整理
磁盘碎片整理
492 3
|
Android开发 Python
uiautomator2:python控制手机的神器
uiautomator2:python控制手机的神器
501 0
|
自然语言处理 JavaScript 程序员
UTF-8 GBK UTF8 GB2312 之间的区别和关系
【8月更文挑战第24天】UTF-8(Unicode Transformation Format-8bit)是一种多字节编码方案,用于解决国际化字符编码问题,英文使用一个字节编码,中文使用三个字节。它涵盖了全球所有国家的字符,具备良好的通用性,可在支持UTF-8的浏览器上显示。尽管可包含字节顺序标记(BOM),但通常不使用。GBK是在GB2312基础上扩展的标准,使用双字节编码,包括所有中文字符,但通用性较弱。UTF-8和GBK之间需通过Unicode转换。对于含有大量英文字符的网站或论坛,使用UTF-8编码可节省存储空间。
677 5
|
Java 关系型数据库 MySQL
基于SpringBoot+Vue中小企业人事管理系统代码(源码+部署说明+演示视频+源码介绍)(1)
基于SpringBoot+Vue中小企业人事管理系统代码(源码+部署说明+演示视频+源码介绍)
344 0
|
前端开发 开发者 容器
【专栏:CSS进阶篇】CSS Flexbox布局:实现灵活的响应式设计
【4月更文挑战第30天】CSS Flexbox是现代网页设计中创建响应式布局的关键工具,它提供了一种一维布局模型,使元素能灵活适应各种屏幕尺寸。通过设置容器的`display`属性为`flex`,开发者可以利用主轴和交叉轴调整元素排列和对齐方式。核心概念包括弹性项、伸缩性、空间分配和对齐。通过实例,如导航栏、卡片布局、图片画廊和响应式表单,展示了Flexbox在实现响应式设计中的应用。尽管需要注意浏览器兼容性,但掌握Flexbox能帮助开发者构建出功能强大且适应性强的界面。
347 0
|
缓存 监控 Linux
RISC-V SiFive U64内核——L2 Prefetcher预取器
RISC-V SiFive U64内核——L2 Prefetcher预取器

热门文章

最新文章