🙋 ModelScope community highlights this issue:
📟 1,910 models: MiniCPM 4.0 series, Qwen3-Embedding, Qwen3-Reranker, BAGEL-7B-MoT, and more;
📁 183 datasets: VideoMathQA, AceReason-Math, svla_so101_pickplace, and more;
🎨 47 innovative apps: MagicColor, Browser-Use Annotator, RapidOCR v3.0.0, and more;
📄 5 articles:
- ModelBest's MiniCPM 4.0 goes open source: a regular 5× speedup for on-device inference!
- Qwen3-based Embedding and Reranker model series, now open source!
- Richer video-creation capabilities in the updated ModelScope AIGC zone!
- "一丹一世界" third prize | experience sharing from "南柯一梦"
- ByteDance Seed open-sources BAGEL, a unified multimodal understanding and generation model!
01. Featured Models
MiniCPM 4.0 Series
ModelBest has released MiniCPM 4.0, an extremely efficient on-device large language model. With its in-house CPM.cu inference framework it reaches up to a 220× speedup in extreme scenarios and a 5× speedup under regular workloads. This release centers on two open-source checkpoints, at 8B and 0.5B parameters, each achieving the best performance among models of its size.
The MiniCPM4 series reaches this on-device inference efficiency through systematic innovation across the stack. The trainable sparse-attention architecture InfLLM v2 cuts token-association computation to under 5% when processing 128K-long contexts. BitCPM ternary quantization compresses model bit-width by 90%, while FP8 low-precision compute and a multi-token-prediction strategy significantly reduce training cost. High-quality, multi-dimensional training sets are built with the UltraClean data-cleaning and UltraChat v2 synthesis pipelines. On the inference side, the efficient CPM.cu CUDA framework fuses sparse attention, model quantization, and speculative sampling, and the ArkInfer cross-platform system enables flexible deployment.
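To make the sparse-attention idea concrete: for each query token, attention is restricted to the few key/value blocks judged most relevant, so compute grows with the number of selected blocks rather than the full context length. Below is a minimal, illustrative PyTorch sketch of top-k block selection; it is not the actual CPM.cu kernel, and the block size, top-k value, and mean-pooled block summaries are simplifying assumptions.

import torch

def block_sparse_attention(q, k, v, block_size=64, top_k=8):
    # Illustrative single-head, single-query sparse attention.
    # q: (1, d) current query; k, v: (T, d) cached keys/values.
    T, d = k.shape
    n_blocks = T // block_size
    # Summarize each KV block by its mean key (a stand-in for the learned
    # block representations used by InfLLM v2).
    block_keys = k[:n_blocks * block_size].view(n_blocks, block_size, d).mean(dim=1)
    # Score the blocks against the query and keep only the top-k.
    block_scores = (q @ block_keys.T).squeeze(0)
    keep = block_scores.topk(min(top_k, n_blocks)).indices
    # Run dense attention over the selected blocks only: per-query compute
    # drops from O(T) to O(top_k * block_size).
    idx = (keep[:, None] * block_size + torch.arange(block_size)).flatten()
    attn = torch.softmax(q @ k[idx].T / d ** 0.5, dim=-1)
    return attn @ v[idx]

out = block_sparse_attention(torch.randn(1, 128), torch.randn(4096, 128), torch.randn(4096, 128))
print(out.shape)  # torch.Size([1, 128])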
Model collection:
https://www.modelscope.cn/collections/MiniCPM-4-ec015560e8c84d
Sample code:
CPM.cu is recommended for MiniCPM4 inference. CPM.cu is a CUDA inference framework developed by OpenBMB that integrates efficient sparse attention, speculative sampling, and quantization, making full use of MiniCPM4's efficiency advantages.
Install CPM.cu by running:
git clone https://github.com/OpenBMB/cpm.cu.git --recursive
cd cpm.cu
python3 setup.py install
MiniCPM4 natively supports context lengths of up to 32,768 tokens. To reproduce the long-text speedups reported in the paper, we recommend using the validated LongRoPE factors. Enable LongRoPE by modifying the rope_scaling field in the config.json file:
{ ..., "rope_scaling": { "rope_type": "longrope", "long_factor": [0.9977997200264581, 1.014658295992452, 1.0349680404997148, 1.059429246056193, 1.0888815016813513, 1.1243301355211495, 1.166977103606075, 1.2182568066927284, 1.2798772354275727, 1.3538666751582975, 1.4426259039919596, 1.5489853358570191, 1.6762658237220625, 1.8283407612492941, 2.0096956085876183, 2.225478927469756, 2.481536379650452, 2.784415934557119, 3.1413289096347365, 3.560047844772632, 4.048719380066383, 4.752651957515948, 5.590913044973868, 6.584005926629993, 7.7532214876576155, 9.119754865903639, 10.704443927019176, 12.524994176518703, 14.59739595363613, 16.93214476166354, 19.53823297353041, 22.417131025031697, 25.568260840911098, 28.991144156566317, 32.68408069090375, 36.65174474170465, 40.90396065611201, 45.4664008671033, 50.37147343433591, 55.6804490772103, 61.470816952306556, 67.8622707390618, 75.00516023410414, 83.11898235973767, 92.50044360202462, 103.57086856690864, 116.9492274587385, 118.16074567836519, 119.18497548708795, 120.04810876261652, 120.77352815196981, 121.38182790207875, 121.89094985353891, 122.31638758099915, 122.6714244963338, 122.9673822552567, 123.21386397019609, 123.41898278254268, 123.58957065488238, 123.73136519024158, 123.84917421274221, 123.94701903496814, 124.02825801299717, 124.09569231686116], "short_factor": [0.9977997200264581, 1.014658295992452, 1.0349680404997148, 1.059429246056193, 1.0888815016813513, 1.1243301355211495, 1.166977103606075, 1.2182568066927284, 1.2798772354275727, 1.3538666751582975, 1.4426259039919596, 1.5489853358570191, 1.6762658237220625, 1.8283407612492941, 2.0096956085876183, 2.225478927469756, 2.481536379650452, 2.784415934557119, 3.1413289096347365, 3.560047844772632, 4.048719380066383, 4.752651957515948, 5.590913044973868, 6.584005926629993, 7.7532214876576155, 9.119754865903639, 10.704443927019176, 12.524994176518703, 14.59739595363613, 16.93214476166354, 19.53823297353041, 22.417131025031697, 25.568260840911098, 28.991144156566317, 32.68408069090375, 36.65174474170465, 40.90396065611201, 45.4664008671033, 50.37147343433591, 55.6804490772103, 61.470816952306556, 67.8622707390618, 75.00516023410414, 83.11898235973767, 92.50044360202462, 103.57086856690864, 116.9492274587385, 118.16074567836519, 119.18497548708795, 120.04810876261652, 120.77352815196981, 121.38182790207875, 121.89094985353891, 122.31638758099915, 122.6714244963338, 122.9673822552567, 123.21386397019609, 123.41898278254268, 123.58957065488238, 123.73136519024158, 123.84917421274221, 123.94701903496814, 124.02825801299717, 124.09569231686116], "original_max_position_embeddings": 32768 } }
After the modification, reproduce the long-context speedup by running:
python3 tests/test_generate.py
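If you would rather script the edit than modify config.json by hand, here is a small sketch; the checkpoint path and the rope_scaling.json helper file (holding exactly the block shown above) are hypothetical names used for illustration.

import json

config_path = "./MiniCPM4-8B/config.json"  # placeholder path to your downloaded checkpoint
patch_path = "./rope_scaling.json"         # file containing the rope_scaling block above

with open(config_path) as f:
    config = json.load(f)

# Copy the validated LongRoPE factors into the model config.
with open(patch_path) as f:
    config["rope_scaling"] = json.load(f)["rope_scaling"]

with open(config_path, "w") as f:
    json.dump(config, f, indent=2)

print(config["rope_scaling"]["rope_type"])  # longrope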
For more details on CPM.cu, see the CPM.cu repository (https://github.com/OpenBMB/cpm.cu).
For more hands-on inference and fine-tuning tutorials, see:
ModelBest's MiniCPM 4.0 goes open source: a regular 5× speedup for on-device inference!
Qwen3-Embedding and Qwen3-Reranker Series
Alibaba's Tongyi Lab has officially released the Qwen3-Embedding series, the newest members of the Qwen model family. Designed specifically for text representation, retrieval, and ranking tasks, the series is trained on top of the Qwen3 base models and fully inherits Qwen3's strength in multilingual text understanding.
Building on the Qwen3 base models, the embedding models adopt a dual-tower architecture and the reranker models a single-tower one. LoRA fine-tuning preserves and carries over the base models' text-understanding ability to the greatest extent possible.
Model links:
Qwen3-Embedding
https://modelscope.cn/collections/Qwen3-Embedding-3edc3762d50f48
Qwen3-Reranker
https://modelscope.cn/collections/Qwen3-Reranker-6316e71b146c4f
Sample code:
Inference with ModelScope:
import torch
import torch.nn.functional as F
from torch import Tensor
from modelscope import AutoTokenizer, AutoModel

def last_token_pool(last_hidden_states: Tensor, attention_mask: Tensor) -> Tensor:
    left_padding = (attention_mask[:, -1].sum() == attention_mask.shape[0])
    if left_padding:
        return last_hidden_states[:, -1]
    else:
        sequence_lengths = attention_mask.sum(dim=1) - 1
        batch_size = last_hidden_states.shape[0]
        return last_hidden_states[torch.arange(batch_size, device=last_hidden_states.device), sequence_lengths]

def get_detailed_instruct(task_description: str, query: str) -> str:
    return f'Instruct: {task_description}\nQuery:{query}'

def tokenize(tokenizer, input_texts, eod_id, max_length):
    batch_dict = tokenizer(input_texts, padding=False, truncation=True, max_length=max_length-2)
    for seq, att in zip(batch_dict["input_ids"], batch_dict["attention_mask"]):
        seq.append(eod_id)
        att.append(1)
    batch_dict = tokenizer.pad(batch_dict, padding=True, return_tensors="pt")
    return batch_dict

# Each query must come with a one-sentence instruction that describes the task
task = 'Given a web search query, retrieve relevant passages that answer the query'
queries = [
    get_detailed_instruct(task, 'What is the capital of China?'),
    get_detailed_instruct(task, 'Explain gravity')
]
# No need to add instruction for retrieval documents
documents = [
    "The capital of China is Beijing.",
    "Gravity is a force that attracts two bodies towards each other. It gives weight to physical objects and is responsible for the movement of planets around the sun."
]
input_texts = queries + documents

tokenizer = AutoTokenizer.from_pretrained('Qwen/Qwen3-Embedding-8B', padding_side='left')
model = AutoModel.from_pretrained('Qwen/Qwen3-Embedding-8B')
# We recommend enabling flash_attention_2 for better acceleration and memory saving.
# model = AutoModel.from_pretrained('Qwen/Qwen3-Embedding-8B', attn_implementation="flash_attention_2", torch_dtype=torch.float16).cuda()

eod_id = tokenizer.convert_tokens_to_ids("<|endoftext|>")
max_length = 8192

# Tokenize the input texts
batch_dict = tokenize(tokenizer, input_texts, eod_id, max_length)
batch_dict.to(model.device)
outputs = model(**batch_dict)
embeddings = last_token_pool(outputs.last_hidden_state, batch_dict['attention_mask'])

# normalize embeddings
embeddings = F.normalize(embeddings, p=2, dim=1)
scores = (embeddings[:2] @ embeddings[2:].T)
print(scores.tolist())
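The rerankers in the collection follow the single-tower design described above: the query and document are packed into one prompt, and relevance is read off the model's logits for "yes" versus "no" at the final position. Below is a condensed sketch of that scoring pattern; the exact chat-template wrapping differs, so consult the Qwen3-Reranker model card before production use.

import torch
from modelscope import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained('Qwen/Qwen3-Reranker-0.6B', padding_side='left')
model = AutoModelForCausalLM.from_pretrained('Qwen/Qwen3-Reranker-0.6B').eval()

# Relevance is derived from the "yes"/"no" token logits at the last position.
token_true = tokenizer.convert_tokens_to_ids('yes')
token_false = tokenizer.convert_tokens_to_ids('no')

def rerank_score(query: str, document: str) -> float:
    # Simplified prompt; the model card wraps this in a system/user template
    # that instructs the model to answer only "yes" or "no".
    pair = f'<Query>: {query}\n<Document>: {document}'
    inputs = tokenizer(pair, return_tensors='pt')
    with torch.no_grad():
        logits = model(**inputs).logits[0, -1]
    yes_no = torch.stack([logits[token_false], logits[token_true]])
    return torch.softmax(yes_no, dim=0)[1].item()  # P("yes") as relevance

print(rerank_score('What is the capital of China?', 'The capital of China is Beijing.'))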
Inference with Ollama:
ollama pull modelscope.cn/Qwen/Qwen3-Embedding-0.6B-GGUF
Check the result:
curl http://localhost:11434/api/embed -d '{
  "model": "modelscope.cn/Qwen/Qwen3-Embedding-0.6B-GGUF:latest",
  "input": "Hello, World!"
}'
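The same endpoint can be called from Python. Here is a small sketch that embeds two strings and compares them with cosine similarity, assuming the server is running locally and the response carries an embeddings field as documented in the Ollama API:

import json
from urllib.request import Request, urlopen

def ollama_embed(texts):
    # POST to the local Ollama embedding endpoint used by the curl call above.
    payload = json.dumps({
        "model": "modelscope.cn/Qwen/Qwen3-Embedding-0.6B-GGUF:latest",
        "input": texts,
    }).encode()
    req = Request("http://localhost:11434/api/embed", data=payload,
                  headers={"Content-Type": "application/json"})
    return json.load(urlopen(req))["embeddings"]

a, b = ollama_embed(["Hello, World!", "Greetings, planet!"])
dot = sum(x * y for x, y in zip(a, b))
norm = lambda v: sum(x * x for x in v) ** 0.5
print("cosine similarity:", dot / (norm(a) * norm(b)))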
For more on inference and fine-tuning, see the tutorial:
Qwen3-based Embedding and Reranker model series, now open source!
BAGEL-7B-MoT
ByteDance Seed has released BAGEL, an open-source unified multimodal understanding and generation foundation model with 7B active parameters (14B in total), trained on large-scale interleaved multimodal data. BAGEL surpasses leading open-source VLMs such as Qwen2.5-VL and InternVL-2.5 on standard multimodal-understanding leaderboards, and offers text-to-image quality competitive with strong dedicated generators such as SD3.
Model link:
https://modelscope.cn/models/ByteDance-Seed/BAGEL-7B-MoT
Sample code:
1. Clone the repository and install dependencies
git clone https://github.com/bytedance-seed/BAGEL.git
cd BAGEL
pip install -r requirements.txt
2. Download the model
modelscope download ByteDance-Seed/BAGEL-7B-MoT --local_dir ./models/BAGEL-7B-MoT/
3. Launch the WebUI
pip install gradio
python app.py
For a deeper technical walkthrough, see: ByteDance Seed open-sources BAGEL, a unified multimodal understanding and generation model!
02. Featured Datasets
VideoMathQA
VideoMathQA is a benchmark for evaluating mathematical reasoning over real-world educational videos. It requires models to interpret and integrate information from three modalities (visual, audio, and text) as it evolves over time. The benchmark targets the needle-in-a-multimodal-haystack problem, where the key information is sparse and scattered across modalities and across different points in a video.
Dataset link:
https://modelscope.cn/datasets/MBZUAI/VideoMathQA
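To inspect the benchmark locally, it can be pulled through the ModelScope SDK; the split name below is an assumption, so check the dataset card for the actual configuration.

from modelscope.msdatasets import MsDataset

# Load VideoMathQA from the ModelScope hub (split name is illustrative).
ds = MsDataset.load('MBZUAI/VideoMathQA', split='test')
print(next(iter(ds)))  # inspect one question record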
AceReason-Math
This dataset accompanies AceReason-Nemotron-14B, a math and code reasoning model trained via reinforcement learning on top of DeepSeek-R1-Distilled-Qwen-14B. The model performs strongly on benchmarks such as AIME 2024/2025 and LiveCodeBench v5/v6, with significant gains in math and code reasoning.
Dataset link:
https://modelscope.cn/datasets/nv-community/AceReason-Math
svla_so101_pickplace
A robot-manipulation dataset created with LeRobot: 50 episodes, 11,939 frames, 1 task, and 100 videos. The dataset structure records actions, states, camera images, and related metadata in detail, and it is released under the Apache-2.0 license.
Dataset link:
https://modelscope.cn/datasets/lerobot/svla_so101_pickplace
03. Featured Studios
MagicColor
MagicColor is a diffusion-based multi-instance sketch colorization framework that achieves accurate, natural color filling in a one-click workflow, suited to scenarios such as anime sketch coloring.
Try it out:
https://modelscope.cn/studios/zhdddd/MagicColor
Browser-Use Annotator
A web-based general-purpose annotation tool that supports annotation tasks across many languages and at multiple linguistic levels, including morphology and syntax.
Try it out:
https://modelscope.cn/studios/kongquyu/browser-use-annotator
RapidOCR v3.0.0
RapidOCR v3.0.0 is a high-performance, cross-platform OCR tool built on the PP-OCRv5 models. It supports multilingual recognition and offline deployment, and suits all kinds of image text-extraction scenarios.
Try it out:
https://modelscope.cn/studios/RapidAI/RapidOCRv3.0.0
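For local use outside the studio, the underlying library can be driven in a few lines. A sketch assuming the v3 package name and call signature from the RapidOCR project README; verify against the repository before relying on it.

# pip install rapidocr  (assumed v3 package name; see the RapidAI repository)
from rapidocr import RapidOCR

engine = RapidOCR()            # loads the PP-OCRv5-based detection/recognition models
result = engine("receipt.png") # path to any local image (placeholder file name)
print(result)                  # recognized text with boxes and confidence scores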
04. Featured Community Articles
- ModelBest's MiniCPM 4.0 goes open source: a regular 5× speedup for on-device inference!
- Qwen3-based Embedding and Reranker model series, now open source!
- Richer video-creation capabilities in the updated ModelScope AIGC zone!
- "一丹一世界" third prize | experience sharing from "南柯一梦"
- ByteDance Seed open-sources BAGEL, a unified multimodal understanding and generation model!