Qwen3-LiveTranslate-Flash: An Omni-Modal Simultaneous Interpretation Model That Sees, Hears, and Speaks

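Summary: Tongyi Qianwen's Qwen3-LiveTranslate-Flash introduces real-time multimodal simultaneous interpretation. It supports 18 languages and multiple dialects, uses visual information to improve comprehension, and delivers high-accuracy speech translation at an ultra-low latency of 3 seconds, making it suitable for cross-language communication in complex environments.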



Qwen3-LiveTranslate-Flash: Real‑Time Multimodal Interpretation — See It, Hear It, Speak It

Qwen3-LiveTranslate-Flash delivers accurate, fast, and reliable real‑time interpretation of multilingual audio and video. Drawing on the capabilities of Qwen3‑Omni and trained on millions of hours of multimodal data, it supports both offline and live translation in 18 languages, making cross‑language communication seamless.

Key Features:

  • Multilingual and Dialect Coverage: Supports major official languages, including Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean, Indonesian, Thai, Vietnamese, Arabic, Hindi, Greek, and Turkish, as well as dialect and accent translation covering Mandarin, Cantonese, Beijing, Wu, Sichuan, and Tianjin varieties.
  • Vision‑Enhanced Comprehension: For the first time, Qwen3‑LiveTranslate‑Flash incorporates visual context augmentation, so it understands what it sees as well as what it hears. By detecting and interpreting lip movements, gestures, on‑screen text, and real‑world entities, the system handles noisy audio environments robustly and resolves ambiguity in terms that have multiple meanings.
  • 3s Latency: A lightweight mixture‑of‑experts architecture, coupled with dynamic sampling, enables simultaneous interpretation with latency as low as three seconds.
  • Lossless Interpretation: Utilizes semantic unit prediction to mitigate cross‑lingual reordering challenges in translation, achieving real‑time translation quality that is close to offline translation.
  • Natural Voice Quality: With training on massive speech datasets, the model delivers lifelike voices whose tone and expressiveness naturally follow the meaning of the source speech.
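The article does not describe how semantic units are predicted. As a rough illustration of the general idea only, the sketch below buffers an incoming token stream and releases a chunk for translation whenever a boundary is reached, so each chunk can be translated without waiting for the full sentence. The function name `segment_semantic_units`, the punctuation-based boundary rule, and the `max_len` cap are all assumptions made for this example; the actual model presumably uses a learned predictor.

```python
from typing import Iterable, Iterator, List

# Assumption: punctuation stands in for a learned semantic-unit boundary predictor.
BOUNDARY_TOKENS = {",", ".", "?", "!", ";"}


def segment_semantic_units(
    tokens: Iterable[str], max_len: int = 8
) -> Iterator[List[str]]:
    """Yield chunks of tokens that end at a boundary or reach max_len.

    max_len is a hypothetical cap that bounds how long the interpreter
    waits before emitting something, which is what limits latency.
    """
    buffer: List[str] = []
    for token in tokens:
        buffer.append(token)
        if token in BOUNDARY_TOKENS or len(buffer) >= max_len:
            yield buffer
            buffer = []
    if buffer:
        # Flush whatever remains when the stream ends.
        yield buffer


# Toy input stream; a real system would receive recognized speech incrementally.
stream = "the meeting , which starts at nine , will cover the budget .".split()
units = list(segment_semantic_units(stream))
for unit in units:
    print(" ".join(unit))
```

Each emitted chunk would be handed to the translator as soon as it is complete. Smaller chunks lower latency but give the translator less context for reordering, which is the trade-off the feature list refers to.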

Performance

Qwen3‑LiveTranslate‑Flash achieves significantly higher accuracy than strong large-scale models, including Gemini‑2.5‑Flash, GPT‑4o‑Audio‑Preview, and Voxtral Small‑24B, on public benchmarks for Chinese, English and multilingual speech translation.

Qwen3‑LiveTranslate‑Flash consistently delivers leading translation performance across different domains and under challenging acoustic conditions.

Semantic unit prediction technology alleviates cross-lingual reordering issues, enabling real-time simultaneous interpretation to significantly reduce latency while maintaining over 94% of the accuracy achieved by non-real-time translation.
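To make the "over 94%" figure concrete, the sketch below computes accuracy retention as the ratio of a streaming score to an offline score. The scores used here are hypothetical placeholders, not results from the article, and the article does not say which metric the ratio is based on.

```python
def accuracy_retention(streaming_score: float, offline_score: float) -> float:
    """Return the fraction of offline translation quality kept in streaming mode."""
    if offline_score <= 0:
        raise ValueError("offline_score must be positive")
    return streaming_score / offline_score


# Hypothetical scores on an unspecified quality metric (illustration only).
offline = 40.0
streaming = 38.0

ratio = accuracy_retention(streaming, offline)
print(f"retention: {ratio:.1%}")
print("meets the 94% threshold:", ratio > 0.94)
```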

Visual enhancement technology further improves Qwen3-LiveTranslate-Flash’s translation precision in challenging scenarios such as noisy audio, ambiguous word meanings, and proper noun translation. In real-time settings, visual information compensates for missing speech context, making its advantages even more pronounced.

Examples

1 Speech‑to‑Speech Simultaneous Translation

Local API Test: real‑time interpretation | English → Chinese

2 Vision‑Enhanced Speech Translation

Homophones / Ambiguous Terms | English → Chinese

What's Next

The Qwen team will keep improving the accuracy, naturalness, and emotional fidelity of its speech translation, extend coverage to more languages, and strengthen robustness across varied and challenging acoustic environments. The goal is to bridge linguistic divides so that conversations flow as smoothly and naturally as they do face to face.


Source | Alibaba Cloud International WeChat official account

