Qwen3-LiveTranslate-Flash: An Omni-Modal Simultaneous Interpretation Model That Sees, Hears, and Speaks

Summary: Qwen3-LiveTranslate-Flash delivers real-time multimodal simultaneous interpretation across 18 languages and several dialects. It fuses visual information to enhance comprehension and achieves high-accuracy speech translation with latency as low as 3 seconds, supporting cross-language communication in complex environments.


News Today

Qwen3-LiveTranslate-Flash: Real‑Time Multimodal Interpretation — See It, Hear It, Speak It

Qwen3-LiveTranslate-Flash delivers high‑precision, low‑latency, and highly reliable real‑time multilingual audio and video interpretation. Built on the extensive capabilities of Qwen3‑Omni and trained on millions of hours of multimodal data, it supports both offline and live translation across 18 languages, making cross‑language communication seamless.

Key Features:

  • Multilingual and Dialect Coverage: Supports major official languages including Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean, Indonesian, Thai, Vietnamese, Arabic, Hindi, Greek, and Turkish, as well as dialect and accent translation for Mandarin, Cantonese, and the Beijing, Wu, Sichuan, and Tianjin dialects.
  • Vision‑Enhanced Comprehension: For the first time, Qwen3‑LiveTranslate‑Flash incorporates visual context augmentation, enabling it to not only understand what it hears but also understand what it sees. By detecting and interpreting lip movements, gestures, on‑screen text, and real‑world entities, the system robustly handles noisy audio environments and resolves ambiguities in terms with multiple meanings.
  • 3s Latency: A lightweight mixture‑of‑experts architecture, coupled with dynamic sampling, enables simultaneous interpretation with latency as low as three seconds.
  • Lossless Interpretation: Utilizes semantic unit prediction to mitigate cross‑lingual reordering challenges in translation, achieving real‑time translation quality that is close to offline translation.
  • Natural Voice Quality: With training on massive speech datasets, the model delivers lifelike voices whose tone and expressiveness naturally follow the meaning of the source speech.
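The semantic‑unit idea behind the "Lossless Interpretation" feature can be illustrated with a toy sketch: instead of committing translation output on fixed-size windows, the interpreter commits at semantic‑unit boundaries, so each unit can be reordered internally before it is spoken. The boundary heuristic below (punctuation plus a few conjunctions) is purely illustrative; the actual model learns these boundaries from data.

```python
# Toy sketch: commit translation at semantic-unit boundaries instead of
# fixed-size windows. The boundary heuristic here is naive and purely
# illustrative; the real model predicts unit boundaries from context.
BOUNDARY_TOKENS = {",", ".", "?", "!", "and", "but", "because"}

def stream_commit(tokens):
    """Yield chunks of source tokens at detected semantic-unit boundaries."""
    buffer = []
    for tok in tokens:
        buffer.append(tok)
        if tok in BOUNDARY_TOKENS:
            yield buffer            # commit a complete unit for translation
            buffer = []
    if buffer:                      # flush any trailing partial unit
        yield buffer

units = list(stream_commit("I went to the bank , to deposit money .".split()))
# Each committed unit can be translated (and reordered internally) without
# waiting for the full sentence, which is what keeps latency low.
```

Because reordering only ever happens inside a committed unit, latency stays bounded by the unit length rather than the sentence length.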

Performance

Qwen3‑LiveTranslate‑Flash achieves significantly higher accuracy than strong large-scale models, including Gemini‑2.5‑Flash, GPT‑4o‑Audio‑Preview, and Voxtral Small‑24B, on public benchmarks for Chinese, English and multilingual speech translation.

Qwen3‑LiveTranslate‑Flash consistently delivers leading translation performance across different domains and under challenging acoustic conditions.

Semantic unit prediction alleviates cross-lingual reordering issues, allowing real-time simultaneous interpretation to cut latency significantly while retaining over 94% of the accuracy achieved by non-real-time (offline) translation.

Visual enhancement technology further improves Qwen3-LiveTranslate-Flash’s translation precision in challenging scenarios such as noisy audio, ambiguous word meanings, and proper noun translation. In real-time settings, visual information compensates for missing speech context, making its advantages even more pronounced.
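How visual context can resolve a polysemous word is easy to see with a toy lookup: a label detected in the video frame selects among candidate senses. The sense inventory and visual labels below are made up for illustration and are not part of the model.

```python
# Toy illustration: a visual label disambiguates a polysemous English word
# before translation into Chinese. Senses and labels are invented examples.
SENSES = {
    "bank":  {"building": "银行", "river": "河岸"},
    "mouse": {"animal": "老鼠", "computer": "鼠标"},
}

def translate_with_vision(word: str, visual_label: str) -> str:
    """Pick a translation sense using a label detected in the video frame."""
    senses = SENSES[word]
    # Fall back to the first listed sense when the visual label is unhelpful.
    return senses.get(visual_label, next(iter(senses.values())))

on_screen = translate_with_vision("mouse", "computer")  # → "鼠标"
by_water = translate_with_vision("bank", "river")       # → "河岸"
```

In the real system the "label" is not a string but learned visual features (lip movements, gestures, on‑screen text, detected objects), yet the effect is the same: the image narrows the space of plausible translations.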

Examples

1 Speech‑to‑Speech Simultaneous Translation

Local API Test: real‑time interpretation | English → Chinese
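A realtime client would typically stream audio chunks to the service as JSON frames. The sketch below only assembles such a frame; the model id, field names, and framing are assumptions for illustration, so consult the official Alibaba Cloud Model Studio (DashScope) documentation for the actual realtime API contract.

```python
import json

# Hypothetical sketch of one streaming frame for a realtime speech-translation
# endpoint. Model id, field names, and framing are assumed for illustration
# only and do not reflect the documented API.
def build_frame(audio_chunk_b64: str, src: str = "en", tgt: str = "zh") -> str:
    """Assemble a JSON frame carrying one base64-encoded audio chunk."""
    return json.dumps({
        "model": "qwen3-livetranslate-flash",  # assumed model identifier
        "source_language": src,
        "target_language": tgt,
        "audio": audio_chunk_b64,              # base64-encoded PCM slice
        "stream": True,                        # request incremental results
    })

frame = build_frame("UklGRg==")  # placeholder chunk, not real audio
```

In a real client, frames like this would be sent over a WebSocket while partial translations stream back between chunks.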

2 Vision‑Enhanced Speech Translation

Homophones / Ambiguous Terms | English → Chinese

What's Next

Qwen will keep advancing the accuracy, naturalness, and emotional fidelity of its speech translation; extend coverage to more languages; and strengthen its robustness across varied and challenging acoustic environments. The goal is to bridge linguistic divides, enabling conversations to flow as smoothly and naturally as speaking face to face.

/ END /

Source | Alibaba Cloud International WeChat official account

