🙋 ModelScope community highlights this issue:
📟 315 models: Qwen2-Audio, the Qwen2-Math series, the MiniCPM-V-2_6 series, the InternLM2.5 series, CogVideoX-2b, and more;
📁 36 datasets: MedTrinity-25M, Recap-DataComp-1B, WikiRAG-TR, and more;
🎨 62 innovative apps: MindSearch (思·索), Heaven's Lost Property Collection (Bert-VITS2-2.3), the FLUX text-to-image/image-to-image demo space (Gradio edition), and more;
📄 5 articles:
- Qwen2-Math is open-sourced! A first exploration of synthetic math data generation!
- Qwen2-Audio is open-sourced, making VoiceChat smoother!
- InternLM2.5 open-sources ultra-lightweight, high-performance variants in multiple parameter sizes for diverse application needs
- Multi-image and video understanding come to on-device models for the first time! ModelBest's "little cannon" MiniCPM-V 2.6 gets a major release, with hands-on ModelScope tutorials for inference, fine-tuning, and deployment!
- A technical deep dive into MindSearch: build a local AI search-and-reasoning app that rivals Perplexity!
Featured Models
Qwen2-Audio
Qwen2-Audio is the next generation of Qwen-Audio. It accepts audio and text inputs and generates text outputs, with the following capabilities:
- Voice chat: users can issue instructions to the audio-language model by voice, without going through an automatic speech recognition (ASR) module.
- Audio analysis: following text instructions, the model can analyze audio content, including speech, environmental sounds, and music.
- Multilingual support: the model supports more than eight languages and dialects, such as Chinese, English, Cantonese, French, Italian, Spanish, German, and Japanese.
Model link:
Qwen2-Audio-7B-Instruct
https://www.modelscope.cn/models/qwen/Qwen2-Audio-7B-Instruct
Code example:
Voice chat inference
from io import BytesIO
from urllib.request import urlopen

import librosa
import torch
from transformers import Qwen2AudioForConditionalGeneration, AutoProcessor
from modelscope import snapshot_download

# Download the model from ModelScope and load the processor and model
model_dir = snapshot_download("Qwen/Qwen2-Audio-7B-Instruct")
processor = AutoProcessor.from_pretrained(model_dir)
model = Qwen2AudioForConditionalGeneration.from_pretrained(
    model_dir, device_map="auto", torch_dtype=torch.bfloat16
)

# Multi-turn voice chat: each user turn carries an audio clip
conversation = [
    {"role": "user", "content": [
        {"type": "audio", "audio_url": "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen2-Audio/audio/guess_age_gender.wav"},
    ]},
    {"role": "assistant", "content": "Yes, the speaker is female and in her twenties."},
    {"role": "user", "content": [
        {"type": "audio", "audio_url": "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen2-Audio/audio/translate_to_chinese.wav"},
    ]},
]
text = processor.apply_chat_template(conversation, add_generation_prompt=True, tokenize=False)

# Fetch and resample every audio clip referenced in the conversation
audios = []
for message in conversation:
    if isinstance(message["content"], list):
        for ele in message["content"]:
            if ele["type"] == "audio":
                audios.append(librosa.load(
                    BytesIO(urlopen(ele["audio_url"]).read()),
                    sr=processor.feature_extractor.sampling_rate)[0]
                )

inputs = processor(text=text, audios=audios, return_tensors="pt", padding=True)
inputs = inputs.to("cuda")

generate_ids = model.generate(**inputs, max_length=256)
# Strip the prompt tokens so only the newly generated reply is decoded
generate_ids = generate_ids[:, inputs.input_ids.size(1):]
response = processor.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
print(response)
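The same checkpoint also runs in audio-analysis mode, where a text instruction accompanies the clip. Below is a minimal sketch reusing the processor and model loaded above; the audio URL is a placeholder to replace with your own clip, and the text element follows the same chat-template convention as the voice-chat example.

# Audio analysis: pair a text instruction with an audio clip
# (reuses `processor`, `model`, `librosa`, `BytesIO`, `urlopen` from above).
# The audio URL is a placeholder; point it at your own clip.
conversation = [
    {"role": "user", "content": [
        {"type": "audio", "audio_url": "https://example.com/sample_sound.wav"},
        {"type": "text", "text": "What's that sound?"},
    ]},
]
text = processor.apply_chat_template(conversation, add_generation_prompt=True, tokenize=False)
audio = librosa.load(
    BytesIO(urlopen(conversation[0]["content"][0]["audio_url"]).read()),
    sr=processor.feature_extractor.sampling_rate)[0]
inputs = processor(text=text, audios=[audio], return_tensors="pt", padding=True)
inputs = inputs.to("cuda")
generate_ids = model.generate(**inputs, max_length=256)
generate_ids = generate_ids[:, inputs.input_ids.size(1):]
response = processor.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
print(response)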
For more hands-on tutorials, see:
Qwen2-Math series
Qwen2-Math was built on the open-source Qwen2 models. Qwen2-Math-72B-Instruct outperforms today's leading closed- and open-source models on the authoritative MATH benchmark, handling algebra, geometry, counting and probability, number theory, and other kinds of math problems with 84% accuracy.
The Qwen2-Math series currently supports English only; the Qwen team will soon release a Chinese-English bilingual version, with multilingual versions also in development.
Model links:
Qwen2-Math-1.5B
https://www.modelscope.cn/models/qwen/Qwen2-Math-1.5B
Qwen2-Math-72B
https://www.modelscope.cn/models/qwen/Qwen2-Math-72B
Qwen2-Math-7B
https://www.modelscope.cn/models/qwen/Qwen2-Math-7B
Qwen2-Math-72B-Instruct
https://www.modelscope.cn/models/qwen/Qwen2-Math-72B-Instruct
Qwen2-Math-7B-Instruct
https://www.modelscope.cn/models/qwen/Qwen2-Math-7B-Instruct
Qwen2-Math-1.5B-Instruct
https://www.modelscope.cn/models/qwen/Qwen2-Math-1.5B-Instruct
Code example:
Taking Qwen2-Math-72B-Instruct as an example:
from modelscope import AutoModelForCausalLM, AutoTokenizer

model_name = "qwen/Qwen2-Math-72B-Instruct"
device = "cuda"  # the device to load the model onto

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

prompt = "Find the value of $x$ that satisfies the equation $4x+5 = 6x+7$."
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512
)
# Strip the prompt tokens so only the newly generated answer is decoded
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
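A quick sanity check on the sample prompt: 4x + 5 = 6x + 7 simplifies to 2x = -2, so a correct response should arrive at x = -1.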
MiniCPM-V 2.6 series
MiniCPM-V 2.6 is the latest and most capable model in the MiniCPM-V series. Built on SigLip-400M and Qwen2-7B with 8B parameters in total, it delivers a significant performance improvement over MiniCPM-Llama3-V 2.5 and introduces new multi-image and video understanding capabilities.
Model links:
MiniCPM-V-2_6
https://www.modelscope.cn/models/OpenBMB/MiniCPM-V-2_6
MiniCPM-V-2_6-gguf
https://www.modelscope.cn/models/OpenBMB/MiniCPM-V-2_6-gguf
MiniCPM-V-2_6-int4
https://www.modelscope.cn/models/openbmb/minicpm-v-2_6-int4
Code example:
Taking MiniCPM-V-2_6 as an example:
# test.py
import torch
from PIL import Image
from modelscope import AutoModel, AutoTokenizer

model = AutoModel.from_pretrained('OpenBMB/MiniCPM-V-2_6', trust_remote_code=True,
                                  attn_implementation='sdpa', torch_dtype=torch.bfloat16)  # sdpa or flash_attention_2, no eager
model = model.eval().cuda()
tokenizer = AutoTokenizer.from_pretrained('OpenBMB/MiniCPM-V-2_6', trust_remote_code=True)

image = Image.open('image.png').convert('RGB')
question = 'What is in the image?'
msgs = [{'role': 'user', 'content': [image, question]}]

res = model.chat(
    image=None,
    msgs=msgs,
    tokenizer=tokenizer
)
print(res)

## if you want to use streaming, please make sure sampling=True and stream=True
## the model.chat will return a generator
res = model.chat(
    image=None,
    msgs=msgs,
    tokenizer=tokenizer,
    sampling=True,
    stream=True
)

generated_text = ""
for new_text in res:
    generated_text += new_text
    print(new_text, flush=True, end='')
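MiniCPM-V 2.6 also accepts several images in a single turn. A minimal sketch of multi-image chat, reusing the model and tokenizer loaded above and assuming the flat image list follows the same content convention as the single-image call; the file names are placeholders.

# Multi-image understanding: put several PIL images before the question
# in one user turn (file names below are placeholders).
image1 = Image.open('image1.png').convert('RGB')
image2 = Image.open('image2.png').convert('RGB')
question = 'Compare the two images and describe their differences.'
msgs = [{'role': 'user', 'content': [image1, image2, question]}]
res = model.chat(
    image=None,
    msgs=msgs,
    tokenizer=tokenizer
)
print(res)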
CogVideoX-2b
CogVideoX-2b is Zhipu AI's open-source video generation model, sharing the same lineage as its Qingying (清影) product. It accepts prompts of up to 226 tokens and generates 6-second videos at 8 frames per second with a resolution of 720×480. Inference at FP16 precision requires only 18 GB of VRAM, and fine-tuning requires only 40 GB.
Model link:
https://www.modelscope.cn/models/ZhipuAI/CogVideoX-2b
Runnable example:
Install dependencies:
pip install --upgrade opencv-python transformers diffusers  # requires diffusers>=0.30.0
Run the code:
import torch
from diffusers import CogVideoXPipeline
from diffusers.utils import export_to_video

prompt = "A panda, dressed in a small, red jacket and a tiny hat, sits on a wooden stool in a serene bamboo forest. The panda's fluffy paws strum a miniature acoustic guitar, producing soft, melodic tunes. Nearby, a few other pandas gather, watching curiously and some clapping in rhythm. Sunlight filters through the tall bamboo, casting a gentle glow on the scene. The panda's face is expressive, showing concentration and joy as it plays. The background includes a small, flowing stream and vibrant green foliage, enhancing the peaceful and magical atmosphere of this unique musical performance."

pipe = CogVideoXPipeline.from_pretrained(
    "THUDM/CogVideoX-2b",
    torch_dtype=torch.float16
)
pipe.enable_model_cpu_offload()

prompt_embeds, _ = pipe.encode_prompt(
    prompt=prompt,
    do_classifier_free_guidance=True,
    num_videos_per_prompt=1,
    max_sequence_length=226,
    device="cuda",
    dtype=torch.float16,
)

video = pipe(
    num_inference_steps=50,
    guidance_scale=6,
    prompt_embeds=prompt_embeds,
).frames[0]

export_to_video(video, "output.mp4", fps=8)
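If you don't need explicit control over prompt encoding, the diffusers pipeline can also take the raw prompt and encode it internally. A minimal sketch reusing the `pipe` and `prompt` objects from above, with the same generation settings:

# Simpler invocation: pass the raw prompt and let the pipeline
# handle text encoding internally (reuses `pipe` and `prompt` from above).
video = pipe(
    prompt=prompt,
    num_inference_steps=50,
    guidance_scale=6,
).frames[0]
export_to_video(video, "output_simple.mp4", fps=8)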
Recommended Datasets
MedTrinity-25M
MedTrinity-25M is a comprehensive, large-scale multimodal medical dataset covering more than 25 million images across 10 modalities, with multi-granularity annotations for over 65 diseases. The annotations include both global textual information, such as disease/lesion type, modality, region-specific descriptions, and inter-region relationships, and detailed local annotations for regions of interest (ROIs), including bounding boxes and segmentation masks. Compared with existing datasets, MedTrinity-25M offers the richest annotations, supporting comprehensive multimodal tasks such as captioning and report generation as well as vision-centric tasks such as classification and segmentation. The dataset can support large-scale pretraining of multimodal medical AI models, contributing to the development of future foundation models in the medical domain.
Dataset link:
https://www.modelscope.cn/datasets/AI-ModelScope/MedTrinity-25M
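To experiment with the data, the dataset can be pulled through the ModelScope SDK. A minimal sketch, assuming a 'default' subset and a 'train' split; the actual configuration names on the dataset card may differ.

# Load MedTrinity-25M via the ModelScope SDK and peek at one record.
# subset_name and split are assumptions; check the dataset card for the real names.
from modelscope.msdatasets import MsDataset

ds = MsDataset.load('AI-ModelScope/MedTrinity-25M', subset_name='default', split='train')
print(next(iter(ds)))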
WikiRAG-TR
WikiRAG-TR is a dataset of 6K (5,999) question-answer pairs synthetically created from the introduction sections of Turkish Wikipedia articles, intended for Turkish retrieval-augmented generation (RAG) tasks.
Dataset link:
https://www.modelscope.cn/datasets/AI-ModelScope/WikiRAG-TR
Recap-DataComp-1B
Recap-DataComp-1B is a large-scale image-text dataset built by recaptioning the roughly 1.3 billion web images of DataComp-1B with a LLaMA-3-powered LLaVA model, yielding substantially more descriptive captions for vision-language pretraining.
Dataset link:
https://www.modelscope.cn/datasets/AI-ModelScope/Recap-DataComp-1B
Featured Apps
Qwen2-Audio Model Chat
Unlike traditional speech models that only handle human voice signals, the audio-understanding model Qwen2-Audio-Instruct can perceive and understand all kinds of audio signals: human voices, natural sounds, animal sounds, music, and more. Feed it a clip of audio and you can ask it for its understanding of the recording, or even have it do literary writing, logical reasoning, or story continuation based on the audio. This gives the large model hearing ability approaching that of humans.
Try it out:
https://modelscope.cn/studios/qwen/Qwen2-Audio-Instruct-Demo
MindSearch (思·索)
The InternLM (书生·浦语) team built the MindSearch framework, which can actively gather and organize useful information from 300+ web pages within 3 minutes and summarize it, completing tasks that would take a human 3 hours.
Try it out:
https://www.modelscope.cn/studios/Shanghai_AI_Laboratory/MindSearch
Heaven's Lost Property Collection (Bert-VITS2-2.3)
Generates speech in the voices of anime characters.
Try it out:
https://www.modelscope.cn/studios/Ikaros/Ikaros-Bert-VITS2-2.3