High-performance RISC-V Processor XuanTie C908


XuanTie C908 is the latest RISC-V processor in the XuanTie series from T-Head Semiconductor. It implements the RV64GCB[V] instruction set and is compatible with the RVA22 profile. XuanTie C908 uses a high-efficiency, dual-issue, 9-stage in-order pipeline and is equipped with an AI acceleration engine. It is aimed mainly at applications such as intelligent interaction and AR/VR.

Specifications and features

In 2019, T-Head Semiconductor released XuanTie C910, a high-performance multi-issue out-of-order processor. XuanTie C906, a low-cost single-issue in-order processor, followed. The new XuanTie C908 is a high-efficiency processor targeted at the mid-range segment, serving the growing market for image and video processing applications. Its performance and cost fall between those of C910 and C906, filling the gap in the XuanTie product line.


XuanTie C908 supports three privilege modes: Machine, Supervisor, and User. User mode supports both the RV64GCB[V] and RV32GCB[V] instruction sets, and software can switch between them at runtime through the UXL field. XuanTie C908 is the first in the industry to support the RV32 COMPAT mode, which meets the requirements of applications such as IP cameras. Support for this mode was merged into the Linux mainline in version 5.19[1]. The RV32 COMPAT mode not only provides higher code density but also lets users port 32-bit applications to XuanTie C908 faster.
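
The UXL switch described above can be illustrated with a small decoder. In the RISC-V privileged specification, on an RV64 hart the UXL field occupies bits 33:32 of the sstatus register, where the encoding 1 means XLEN=32 and 2 means XLEN=64. The following is a minimal sketch of that field only; the function names and sample register values are hypothetical, and it does not model how C908 or the Linux kernel actually performs the switch.

```python
# Minimal sketch: decode and update the UXL field of an RV64 sstatus value.
# Field position and encodings follow the RISC-V privileged specification;
# the helper names are illustrative, not part of any real API.
UXL_SHIFT = 32
UXL_MASK = 0b11 << UXL_SHIFT
XLEN_BY_ENCODING = {1: 32, 2: 64, 3: 128}
ENCODING_BY_XLEN = {xlen: enc for enc, xlen in XLEN_BY_ENCODING.items()}


def user_xlen(sstatus: int) -> int:
    """Return the user-mode XLEN selected by sstatus.UXL."""
    encoding = (sstatus & UXL_MASK) >> UXL_SHIFT
    if encoding not in XLEN_BY_ENCODING:
        raise ValueError("reserved UXL encoding 0")
    return XLEN_BY_ENCODING[encoding]


def with_user_xlen(sstatus: int, xlen: int) -> int:
    """Return a copy of sstatus with UXL set for the requested user XLEN."""
    if xlen not in ENCODING_BY_XLEN:
        raise ValueError(f"unsupported XLEN: {xlen}")
    return (sstatus & ~UXL_MASK) | (ENCODING_BY_XLEN[xlen] << UXL_SHIFT)


if __name__ == "__main__":
    sstatus_64 = (2 << UXL_SHIFT) | 0x22        # hypothetical value, UXL = 64-bit
    sstatus_32 = with_user_xlen(sstatus_64, 32)  # switch user mode to RV32
    print(user_xlen(sstatus_64), user_xlen(sstatus_32))
```

In this sketch only the two UXL bits change; the other status bits are left untouched, which mirrors the idea that a supervisor selects the user-mode width per process.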

XuanTie C908 supports the RISC-V Bitmanip 1.0 extension, including carry-less multiplication (Zbc), and optionally supports the RISC-V Vector 1.0 extension. It also supports BF16 operations, IEEE 754-compatible half-precision, and other floating-point operations. In addition, XuanTie C908 supports the RISC-V CMO base extension and the Svinval extension. It adopts the Sv39/Sv48 virtual address systems and supports Svnapot and Svpbmt. Together, these features make XuanTie C908 one of the first RISC-V processors for the upcoming RVA22 profile. XuanTie C908 also inherits the XuanTie extensions, including the instruction extension and the Memory Attributes Extension (XMAE).
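
To make the carry-less multiplication in Zbc concrete, here is a minimal software reference model of the three Zbc instructions (clmul, clmulh, clmulr) for XLEN=64. Carry-less multiplication is ordinary shift-and-add multiplication with XOR in place of addition. The helper name clmul_full is hypothetical; the sketch describes the architectural results only and says nothing about how C908 implements them in hardware.

```python
# Minimal sketch: reference model of the Zbc carry-less multiply instructions.
XLEN = 64
MASK = (1 << XLEN) - 1


def clmul_full(a: int, b: int) -> int:
    """Full (up to 2*XLEN-1 bit) carry-less product of two XLEN-bit values."""
    a &= MASK
    b &= MASK
    product = 0
    shift = 0
    while b:
        if b & 1:
            product ^= a << shift  # XOR instead of add: no carries propagate
        b >>= 1
        shift += 1
    return product


def clmul(a: int, b: int) -> int:
    """Low XLEN bits of the carry-less product."""
    return clmul_full(a, b) & MASK


def clmulh(a: int, b: int) -> int:
    """High XLEN bits of the carry-less product."""
    return (clmul_full(a, b) >> XLEN) & MASK


def clmulr(a: int, b: int) -> int:
    """Bits 2*XLEN-2 down to XLEN-1 of the carry-less product."""
    return (clmul_full(a, b) >> (XLEN - 1)) & MASK


if __name__ == "__main__":
    print(bin(clmul(0b11, 0b11)))  # (x+1)*(x+1) over GF(2) is x^2+1
```

Carry-less products are the building block for CRC and GHASH-style computations, which is why the extension is useful in networking and cryptography code.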

image.png

As illustrated in the figure above, XuanTie C908 uses a two-level cache system with hardware cache coherency and optional ECC. In this multi-cluster architecture, each cluster contains 1 to 4 cores. The bus interface supports the AXI4/ACE protocol and offers two optional ports: a Device Coherence Port (DCP) and a Low Latency Port (LLP). The DCP maintains data coherency with external I/O masters, while the LLP accesses peripherals. XuanTie C908 also provides an enhanced physical memory protection (ePMP) unit with up to 64 regions. In addition, C908 supports RISC-V Debug and a Platform-Level Interrupt Controller (PLIC) that can be configured with up to 1023 interrupt sources.
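
The limit of 1023 interrupt sources matches the standard RISC-V PLIC specification, where source ID 0 is reserved and IDs 1 to 1023 are usable. The sketch below computes register offsets under the assumption that the memory map defined in that specification applies; the actual C908 layout should be checked against its user manual, and all function names here are hypothetical.

```python
# Minimal sketch: register offsets in the standard RISC-V PLIC memory map.
# ASSUMPTION: the generic PLIC layout applies; not verified for C908.
PRIORITY_BASE = 0x000000   # one 32-bit priority register per source
PENDING_BASE = 0x001000    # one pending bit per source
ENABLE_BASE = 0x002000     # per-context enable bitmaps
ENABLE_STRIDE = 0x80       # 1024 bits = 128 bytes per context
CONTEXT_BASE = 0x200000    # per-context threshold and claim/complete
CONTEXT_STRIDE = 0x1000
MAX_SOURCE = 1023          # source 0 is reserved ("no interrupt")


def _check_source(source_id: int) -> None:
    if not 1 <= source_id <= MAX_SOURCE:
        raise ValueError(f"source ID must be in 1..{MAX_SOURCE}")


def priority_offset(source_id: int) -> int:
    _check_source(source_id)
    return PRIORITY_BASE + 4 * source_id


def pending_bit(source_id: int) -> tuple[int, int]:
    """Return (word offset, bit index) of the pending bit for a source."""
    _check_source(source_id)
    return PENDING_BASE + 4 * (source_id // 32), source_id % 32


def enable_bit(context: int, source_id: int) -> tuple[int, int]:
    """Return (word offset, bit index) of the enable bit for a context."""
    _check_source(source_id)
    word = ENABLE_BASE + ENABLE_STRIDE * context + 4 * (source_id // 32)
    return word, source_id % 32


def threshold_offset(context: int) -> int:
    return CONTEXT_BASE + CONTEXT_STRIDE * context


def claim_complete_offset(context: int) -> int:
    return threshold_offset(context) + 4


if __name__ == "__main__":
    print(hex(priority_offset(MAX_SOURCE)), enable_bit(1, 33))
```

A context here is a (hart, privilege mode) pair that can receive interrupts; how contexts are numbered is platform-specific and is not covered by this sketch.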

Microarchitecture and metrics

XuanTie C908 has a 9-stage dual-issue in-order pipeline. It delivers industry-leading performance in control flow, computing, and frequency through architectural and microarchitectural innovations.

XuanTie C908 employs a set of branch prediction technologies, including a state-of-the-art Branch History Table, Branch Target Buffer, and Return Address Stack. It also uses instruction fusion, which fuses various types of instructions into a single instruction for execution. In addition, XuanTie C908 provides a brand-new data prefetching algorithm that further improves memory access performance in complex application scenarios.
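
For readers unfamiliar with these structures, the following minimal sketch shows the textbook form of two of them: a branch history table built from 2-bit saturating counters, and a return address stack. It is a generic illustration only. The table size, stack depth, indexing scheme, and class names are assumptions, and the sketch makes no claim about how C908's predictors are actually organized.

```python
# Minimal sketch: textbook branch history table and return address stack.
# All sizes and names are illustrative assumptions, not C908 parameters.
class BranchHistoryTable:
    """Bimodal predictor: one 2-bit saturating counter per table entry."""

    def __init__(self, entries: int = 1024):
        self.entries = entries
        self.counters = [1] * entries  # 0-1 predict not taken, 2-3 taken

    def _index(self, pc: int) -> int:
        # Drop bit 0: compressed RISC-V instructions are 2-byte aligned.
        return (pc >> 1) % self.entries

    def predict(self, pc: int) -> bool:
        return self.counters[self._index(pc)] >= 2

    def update(self, pc: int, taken: bool) -> None:
        i = self._index(pc)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)


class ReturnAddressStack:
    """Fixed-depth stack that predicts the target of function returns."""

    def __init__(self, depth: int = 8):
        self.depth = depth
        self.stack: list[int] = []

    def push(self, return_pc: int) -> None:
        if len(self.stack) == self.depth:
            self.stack.pop(0)  # overflow: discard the oldest entry
        self.stack.append(return_pc)

    def pop(self):
        """Return the predicted return address, or None if empty."""
        return self.stack.pop() if self.stack else None


if __name__ == "__main__":
    bht = BranchHistoryTable()
    for _ in range(3):            # a loop branch taken three times
        bht.update(0x1000, True)
    print(bht.predict(0x1000))
```

The 2-bit counter tolerates a single misprediction (for example, a loop exit) without flipping its prediction, which is why it is the usual starting point for explaining branch history tables.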

Building on this efficient pipeline design, XuanTie C908 can run at up to 2 GHz, with a dynamic power consumption of 52.8 mW/GHz per core on TSMC's 12 nm process. At the same frequency and under the same process constraints, the energy efficiency of XuanTie C908 in typical scenarios is more than 20% better than that of XuanTie C906.
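
As a rough back-of-the-envelope reading of these figures, the per-GHz number can be scaled to an estimated dynamic power. The sketch below assumes that dynamic power scales linearly with frequency and with core count, which is a simplification: real power also depends on voltage, workload, and static leakage, none of which are given here. The function name is hypothetical.

```python
# Minimal sketch: estimate dynamic power from a mW/GHz figure.
# ASSUMPTION: linear scaling with frequency and core count.
def dynamic_power_mw(mw_per_ghz: float, freq_ghz: float, cores: int = 1) -> float:
    """Estimated dynamic power in milliwatts."""
    if freq_ghz < 0 or cores < 1:
        raise ValueError("frequency must be non-negative and cores >= 1")
    return mw_per_ghz * freq_ghz * cores


if __name__ == "__main__":
    # Figures quoted in the text: 52.8 mW/GHz per core, up to 2 GHz.
    single = dynamic_power_mw(52.8, 2.0)
    cluster = dynamic_power_mw(52.8, 2.0, cores=4)
    print(f"one core: {single:.1f} mW, four-core cluster: {cluster:.1f} mW")
```

Under these assumptions, one core at 2 GHz comes to about 105.6 mW of dynamic power, and a full four-core cluster to about 422.4 mW, excluding caches and other shared logic.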

AI-oriented software and hardware acceleration technology

XuanTie C908 includes an optional Vector Processing Unit (VPU) that is compatible with the RISC-V Vector Extension 1.0 specification. The VPU supports various vector floating-point and integer data formats, and the computing power of key operations, such as multiply-accumulate, is enhanced for different application scenarios. For typical AI application scenarios, XuanTie C908 adds a vector dot product instruction extension and introduces the INT4 data type. This improves peak computing power while reducing memory requirements. XuanTie C908 outperformed C906 in the MLPerf Tiny v0.7 inference benchmark, delivering more than 3.5 times the performance of C906 in the best case.
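
To show why INT4 reduces memory requirements, the following minimal sketch packs signed 4-bit values two per byte and computes a dot product over the packed data, which is the operation a dot product instruction accelerates. The nibble ordering (low nibble first) and all function names are assumptions chosen for illustration; the sketch does not describe the encoding or semantics of C908's actual instructions.

```python
# Minimal sketch: INT4 packing and a scalar reference dot product.
# ASSUMPTION: two's-complement nibbles, low nibble first.
def pack_int4(values: list[int]) -> bytes:
    """Pack signed 4-bit integers (-8..7) two per byte."""
    padded = list(values) + [0] * (len(values) % 2)
    packed = bytearray()
    for low, high in zip(padded[0::2], padded[1::2]):
        for v in (low, high):
            if not -8 <= v <= 7:
                raise ValueError(f"{v} does not fit in INT4")
        packed.append((low & 0xF) | ((high & 0xF) << 4))
    return bytes(packed)


def unpack_int4(data: bytes, count: int) -> list[int]:
    """Unpack `count` signed 4-bit integers from packed bytes."""
    values = []
    for byte in data:
        for nibble in (byte & 0xF, byte >> 4):
            values.append(nibble - 16 if nibble >= 8 else nibble)
    return values[:count]


def dot_int4(a_packed: bytes, b_packed: bytes, count: int) -> int:
    """Dot product of two packed INT4 vectors, accumulated at full width."""
    a = unpack_int4(a_packed, count)
    b = unpack_int4(b_packed, count)
    return sum(x * y for x, y in zip(a, b))


if __name__ == "__main__":
    weights = [-8, 7, 3, -1]
    activations = [1, 2, -3, 4]
    w, x = pack_int4(weights), pack_int4(activations)
    print(len(w), dot_int4(w, x, len(weights)))
```

Four INT4 values occupy two bytes here, half the storage of INT8, while the accumulator stays wide enough to avoid overflow across many products.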


XuanTie C908 adopts a hardware-software co-design methodology to accelerate deep learning inference applications. With the neural network inference deployment tool HHB and the high-performance heterogeneous computing library SHL, XuanTie C908 is supported by optimized reference implementations at both the compilation and assembly levels.

Conclusion

XuanTie C908 achieves technological breakthroughs for higher performance in RISC-V. It supports a multi-core, multi-cluster architecture, adopts a high-efficiency 9-stage dual-issue in-order pipeline, and uses innovative instruction fusion technology to further improve efficiency. Its energy efficiency is at an advanced level for the industry. Compatible with the latest RISC-V Vector 1.0 specification, XuanTie C908 introduces the INT4 data type and a vector dot product instruction extension, and provides a comprehensively optimized algorithm library, all of which help drastically improve AI computing performance.


[1]: https://www.phoronix.com/news/Linux-5.19-RISC-V


