Qwen3-LiveTranslate-Flash:视、听、说全模态同传大模型

简介: 通义千问Qwen3-LiveTranslate-Flash推出实时多模态同声传译,支持18种语言及多种方言,融合视觉信息增强理解,实现3秒超低延迟、高精度语音翻译,适用于复杂环境下的跨语言交流。

Swipe for Chinese >>>

News Today

Qwen3-LiveTranslate-Flash: Real‑Time Multimodal Interpretation — See It, Hear It, Speak It

Qwen3-LiveTranslate-Flash delivers high‑precision, lightning‑fast and ultra‑reliable real‑time multilingual audio and video interpretation. With the extensive capabilities of Qwen3‑Omni and training on millions of hours of multimodal data, it enables both offline and live translation in 18 languages, making cross‑language communication seamless.

Key Features:

  • Multilingual and Dialect Coverage: Supports major official languages including Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean, Indonesian, Thai, Vietnamese, Arabic, Hindi, Greek, Turkish; as well as dialect and accent translation for Mandarin, Cantonese, Beijing, Wu, Sichuan, and Tianjin dialects.
  • Vision‑Enhanced Comprehension: For the first time, Qwen3‑LiveTranslate‑Flash incorporates visual context augmentation, enabling it to not only understand what it hears but also understand what it sees. By detecting and interpreting lip movements, gestures, on‑screen text, and real‑world entities, the system robustly handles noisy audio environments and resolves ambiguities in terms with multiple meanings.
  • 3s Latency: A lightweight mixture‑of‑experts architecture, coupled with dynamic sampling, enables simultaneous interpretation with latency as low as three seconds.
  • Lossless Interpretation: Utilizes semantic unit prediction to mitigate cross‑lingual reordering challenges in translation, achieving real‑time translation quality that is close to offline translation.
  • Natural Voice Quality: With training on massive speech datasets, the model delivers lifelike voices whose tone and expressiveness naturally follow the meaning of the source speech.

Performance

Qwen3‑LiveTranslate‑Flash achieves significantly higher accuracy than strong large-scale models, including Gemini‑2.5‑Flash, GPT‑4o‑Audio‑Preview, and Voxtral Small‑24B, on public benchmarks for Chinese, English and multilingual speech translation.

Qwen3‑LiveTranslate‑Flash consistently delivers leading translation performance across different domains and under challenging acoustic conditions.

Semantic unit prediction technology alleviates cross-lingual reordering issues, enabling real-time simultaneous interpretation to significantly reduce latency while maintaining over 94% of the accuracy achieved by non-real-time translation.

Visual enhancement technology further improves Qwen3-LiveTranslate-Flash’s translation precision in challenging scenarios such as noisy audio, ambiguous word meanings, and proper noun translation. In real-time settings, visual information compensates for missing speech context, making its advantages even more pronounced.

Examples

1 Speech‑to‑Speech Simultaneous Translation

Local API Test: real‑time interpretation | English → Chinese

2 Vision‑Enhanced Speech Translation

Homophones / Ambiguous Terms | English → Chinese

What's Next

Qwen will keep advancing the accuracy, naturalness, and emotional fidelity of our speech translation; extend coverage to more languages; and reinforce its robustness across varied and challenging acoustic environments. The goal is to bridge linguistic divides, enabling conversations to flow as smoothly and naturally as if speaking face to face.

/ END /

来源  | Alibaba Cloud Internationa公众号


相关文章
|
3月前
|
机器学习/深度学习 人工智能 自然语言处理
AI Compass前沿速览:Qwen3-Max、Mixboard、Qwen3-VL、Audio2Face、Vidu Q2 AI视频生成模型、Qwen3-LiveTranslate-全模态同传大模型
AI Compass前沿速览:Qwen3-Max、Mixboard、Qwen3-VL、Audio2Face、Vidu Q2 AI视频生成模型、Qwen3-LiveTranslate-全模态同传大模型
642 13
AI Compass前沿速览:Qwen3-Max、Mixboard、Qwen3-VL、Audio2Face、Vidu Q2 AI视频生成模型、Qwen3-LiveTranslate-全模态同传大模型
|
1月前
|
人工智能 语音技术 流计算
一图掌握通义千问:模型生态与应用场景全览
通义千问(Qwen)系列提供全栈开源AI能力,涵盖语言、视觉、语音等多模态应用。旗舰模型Qwen3-Max性能领先,支持92种语言翻译与高精度语音识别,具备强大代码生成与图像处理能力,助力开发者与企业高效构建智能应用。
352 2
一图掌握通义千问:模型生态与应用场景全览
|
人工智能 移动开发 自然语言处理
阿里云百炼产品月刊【2025年9月】
本月通义千问模型大升级,新增多模态、语音、视频生成等高性能模型,支持图文理解、端到端视频生成。官网改版上线全新体验中心,推出高代码应用与智能体多模态知识融合,RAG能力增强,助力企业高效部署AI应用。
915 0
|
24天前
|
存储 消息中间件 关系型数据库
Apache Doris 数据导入原理与性能优化 | Deep Dive
Apache Doris 数据导入机制基于分布式架构,通过 FE 与 BE 协同实现高效、可靠的数据写入。本文深入解析其核心流程、事务管理与性能瓶颈,涵盖 Stream Load、Broker Load 等多种导入方式,重点剖析 MemTable 前移、存算分离优化等关键技术,并提供表结构设计、攒批策略、分桶配置等实战优化方案,帮助用户在延迟与吞吐间取得平衡,显著提升数据导入效率。
285 4
Apache Doris 数据导入原理与性能优化 | Deep Dive
|
7月前
|
机器学习/深度学习 PyTorch 数据处理
PyTorchVideo实战:从零开始构建高效视频分类模型
本文详细介绍了基于PyTorchVideo和PyTorch Lightning构建视频分类模型的全流程。通过Kinetics数据集,利用3D ResNet-50实现高效动作识别。教程涵盖数据加载与增强、模型构建及训练流程,结合两大框架优势,简化开发复杂度并提升性能,为视频理解任务提供完整解决方案。
351 3
PyTorchVideo实战:从零开始构建高效视频分类模型
|
3月前
|
存储 机器学习/深度学习 人工智能
云栖 2025|阿里云 Qwen3 系列领衔:AI 模型全栈突破与开发者落地指南
阿里云发布Qwen3全栈AI体系,七大模型升级、性能全球领先,开源生态稳居第一。从底层基建到开发工具链全面优化,助力企业高效落地AI应用,共建超级AI云生态。
1530 11
|
8月前
|
人工智能 自然语言处理 搜索推荐
AI 搜索开放平台重磅发布:Qwen3 模型上线啦
阿里云AI搜索开放平台重磅发布最新Qwen3模型,为企业和开发者提供全栈智能搜索解决方案。Qwen3作为最新一代大模型,在推理、多语言支持和Agent能力上表现卓越。用户可通过三步快速体验Qwen3服务,助力业务在AI时代抢占先机。
989 13
|
3月前
|
人工智能 自然语言处理 语音技术
阿里云百炼官网首页登录入口:开通百炼,每个大模型免费100万Tokens
阿里云百炼平台现开放免费领Token福利,开通即享超5000万额度。提供大模型推理、部署及训练服务,涵盖通义千问、万相等多个系列模型。前台介绍平台详情与价格,后台支持API-Key申请及管理操作。
924 8
|
7月前
|
人工智能 安全 Android开发
手机也能跑通义Qwen3大模型,手把手教你部署!
全球开源模型冠军Qwen3与端到端全模态模型Qwen2.5-Omni现已成功在手机上跑通!借助MNN支持,适配Android、iOS及桌面端,实现低延迟、本地化、高安全的AI体验。用户可通过自定义Sampler设置、System Prompt和Max New Tokens调节模型输出风格与长度。
3642 11