Qwen3-LiveTranslate-Flash:视、听、说全模态同传大模型

简介: 通义千问Qwen3-LiveTranslate-Flash推出实时多模态同声传译,支持18种语言及多种方言,融合视觉信息增强理解,实现3秒超低延迟、高精度语音翻译,适用于复杂环境下的跨语言交流。

Swipe for Chinese >>>

News Today

Qwen3-LiveTranslate-Flash: Real‑Time Multimodal Interpretation — See It, Hear It, Speak It

Qwen3-LiveTranslate-Flash delivers high‑precision, lightning‑fast and ultra‑reliable real‑time multilingual audio and video interpretation. With the extensive capabilities of Qwen3‑Omni and training on millions of hours of multimodal data, it enables both offline and live translation in 18 languages, making cross‑language communication seamless.

Key Features:

  • Multilingual and Dialect Coverage: Supports major official languages including Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean, Indonesian, Thai, Vietnamese, Arabic, Hindi, Greek, Turkish; as well as dialect and accent translation for Mandarin, Cantonese, Beijing, Wu, Sichuan, and Tianjin dialects.
  • Vision‑Enhanced Comprehension: For the first time, Qwen3‑LiveTranslate‑Flash incorporates visual context augmentation, enabling it to not only understand what it hears but also understand what it sees. By detecting and interpreting lip movements, gestures, on‑screen text, and real‑world entities, the system robustly handles noisy audio environments and resolves ambiguities in terms with multiple meanings.
  • 3s Latency: A lightweight mixture‑of‑experts architecture, coupled with dynamic sampling, enables simultaneous interpretation with latency as low as three seconds.
  • Lossless Interpretation: Utilizes semantic unit prediction to mitigate cross‑lingual reordering challenges in translation, achieving real‑time translation quality that is close to offline translation.
  • Natural Voice Quality: With training on massive speech datasets, the model delivers lifelike voices whose tone and expressiveness naturally follow the meaning of the source speech.

Performance

Qwen3‑LiveTranslate‑Flash achieves significantly higher accuracy than strong large-scale models, including Gemini‑2.5‑Flash, GPT‑4o‑Audio‑Preview, and Voxtral Small‑24B, on public benchmarks for Chinese, English and multilingual speech translation.

Qwen3‑LiveTranslate‑Flash consistently delivers leading translation performance across different domains and under challenging acoustic conditions.

Semantic unit prediction technology alleviates cross-lingual reordering issues, enabling real-time simultaneous interpretation to significantly reduce latency while maintaining over 94% of the accuracy achieved by non-real-time translation.

Visual enhancement technology further improves Qwen3-LiveTranslate-Flash’s translation precision in challenging scenarios such as noisy audio, ambiguous word meanings, and proper noun translation. In real-time settings, visual information compensates for missing speech context, making its advantages even more pronounced.

Examples

1 Speech‑to‑Speech Simultaneous Translation

Local API Test: real‑time interpretation | English → Chinese

2 Vision‑Enhanced Speech Translation

Homophones / Ambiguous Terms | English → Chinese

What's Next

Qwen will keep advancing the accuracy, naturalness, and emotional fidelity of our speech translation; extend coverage to more languages; and reinforce its robustness across varied and challenging acoustic environments. The goal is to bridge linguistic divides, enabling conversations to flow as smoothly and naturally as if speaking face to face.

/ END /

来源  | Alibaba Cloud Internationa公众号


相关文章
|
人工智能 移动开发 自然语言处理
阿里云百炼产品月刊【2025年9月】
本月通义千问模型大升级,新增多模态、语音、视频生成等高性能模型,支持图文理解、端到端视频生成。官网改版上线全新体验中心,推出高代码应用与智能体多模态知识融合,RAG能力增强,助力企业高效部署AI应用。
1766 0
|
7月前
|
机器学习/深度学习 人工智能 自然语言处理
AI Compass前沿速览:Qwen3-Max、Mixboard、Qwen3-VL、Audio2Face、Vidu Q2 AI视频生成模型、Qwen3-LiveTranslate-全模态同传大模型
AI Compass前沿速览:Qwen3-Max、Mixboard、Qwen3-VL、Audio2Face、Vidu Q2 AI视频生成模型、Qwen3-LiveTranslate-全模态同传大模型
1053 13
AI Compass前沿速览:Qwen3-Max、Mixboard、Qwen3-VL、Audio2Face、Vidu Q2 AI视频生成模型、Qwen3-LiveTranslate-全模态同传大模型
|
5月前
|
人工智能 语音技术 流计算
一图掌握通义千问:模型生态与应用场景全览
通义千问(Qwen)系列提供全栈开源AI能力,涵盖语言、视觉、语音等多模态应用。旗舰模型Qwen3-Max性能领先,支持92种语言翻译与高精度语音识别,具备强大代码生成与图像处理能力,助力开发者与企业高效构建智能应用。
907 2
一图掌握通义千问:模型生态与应用场景全览
|
2月前
|
JavaScript 安全 Linux
喂饭级教程:阿里云+本地安装OpenClaw(Clawdbot)集成必备Skills配置、避坑指南与实战测试
很多用户安装完OpenClaw后,常会陷入“工具已就绪,功能用不了”的困境——明明部署流程显示成功,却无法执行联网查询、自动扩展功能等核心操作。问题的关键在于:OpenClaw的基础安装仅提供核心框架,如同买了一辆高性能汽车却未加油充电,必须通过安装专用工具与必备技能(Skills),才能激活其全量能力。本文将详解**2026年阿里云OpenClaw超简单部署流程**与**本地私有化部署方案**,重点拆解技能管理工具clawhub及三大必备Skills的安装配置,附带完整代码命令、避坑指南与实战测试,让你从“部署完成”一步到位实现“功能全开”。
5458 13
|
4月前
|
编解码 人工智能 语音技术
📢 我们发布了新一代端到端语音交互模型 Fun-Audio-Chat!
通义百聆开源Fun-Audio-Chat(8B),支持端到端语音交互,具备情感感知与任务执行能力。在多榜单同尺寸模型中排名第一,支持高精度语音理解、情感识别与Function Call,高效低延迟,已全面开放代码与权重,欢迎体验!
2061 10
|
8月前
|
机器学习/深度学习 自然语言处理 API
Qwen-MT:翻得快,译得巧
今天,机器翻译模型Qwen-MT正式上线,支持92种语言互译,具备高度可控性与低延迟、低成本特点,适用于多种场景。开发者可通过Qwen API体验其强大翻译能力。
1528 15

热门文章

最新文章

下一篇
开通oss服务