ModelScope Community Weekly Digest (8.25-8.31)

2024-09-02

Summary: 326 models, 82 datasets, 71 innovative applications, and 5 articles

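🙋 ModelScope community updates for this issue:

📟 326 models: Qwen2-VL, CogVideoX-5B, Hyper-SD, and more;

📁 82 datasets: Alpaca-CoT, OpenVid-1M, MINT-1T-HTML, and more;

🎨 71 innovative applications: Qwen2-7B-VL-demo, CogVideoX-5B-demo, Hyper-FLUX-8Steps-LoRA, and more;

📄 5 articles:

Qwen2-VL: a hands-on, end-to-end guide to trying, downloading, running inference with, and fine-tuning the model!
Paper Reading | NeuFlow v2, an efficient optical flow estimation method
Flux, part 4: image generation in seconds. ByteDance's open-source Hyper-SD now supports Flux and stacking multiple LoRAs!
China's open-source answer to Sora: the CogVideoX video generation model is open-sourced again, larger and higher quality!
Fine-tuning Qwen2 with Unsloth on 10 GB of GPU memory and running inference with Ollama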

Featured Models

Qwen2-VL

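Qwen2-VL has been open-sourced in two sizes, Qwen2-VL-2B-Instruct and Qwen2-VL-7B-Instruct, along with their GPTQ and AWQ quantized versions. Highlights:

Enhanced image understanding: Qwen2-VL significantly improves the model's ability to understand and interpret visual information, setting new benchmarks on key performance metrics;
Advanced video understanding: Qwen2-VL offers strong online streaming capabilities and can analyze dynamic video content in real time with high accuracy;
Integrated visual agent functionality: Qwen2-VL now integrates with complex systems, which turns it into a capable visual agent for complex reasoning and decision-making;
Expanded multilingual support: Qwen2-VL extends its language coverage to better serve a diverse global user base, making it more accessible and effective across language environments.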

Model links:

Qwen2-VL-2B-Instruct: https://www.modelscope.cn/models/qwen/Qwen2-VL-2B-Instruct

Qwen2-VL-7B-Instruct: https://www.modelscope.cn/models/qwen/Qwen2-VL-7B-Instruct

Code example:

Inference with transformers, using Qwen2-VL-2B-Instruct as the example

Install dependencies:

pip install git+https://github.com/huggingface/transformers
pip install qwen-vl-utils

Inference code: single image

import torch
from transformers import Qwen2VLForConditionalGeneration, AutoProcessor
from qwen_vl_utils import process_vision_info
# Local directory that holds the downloaded Qwen2-VL-2B-Instruct weights
model_dir = "/mnt/workspace/Qwen2-VL-2B-Instruct"
# Load the model in half-precision on the available device(s)
model = Qwen2VLForConditionalGeneration.from_pretrained(model_dir, device_map="auto", torch_dtype=torch.float16)
# Lower and upper pixel budget per image, passed to the processor
min_pixels = 256*28*28
max_pixels = 1280*28*28
processor = AutoProcessor.from_pretrained(model_dir, min_pixels=min_pixels, max_pixels=max_pixels)
# One user turn containing an image and a text query
messages = [{"role": "user", "content": [
    {"type": "image", "image": "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-VL/assets/demo.jpeg"},
    {"type": "text", "text": "Describe this image."},
]}]
# Preparation for inference
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(text=[text], images=image_inputs, videos=video_inputs, padding=True, return_tensors="pt")
# Move the inputs to the device the model was loaded on
inputs = inputs.to(model.device)
# Inference: Generation of the output
generated_ids = model.generate(**inputs, max_new_tokens=128)
# Drop the prompt tokens so that only the newly generated tokens are decoded
generated_ids_trimmed = [out_ids[len(in_ids):] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)]
output_text = processor.batch_decode(generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False)
print(output_text)
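
The `min_pixels` and `max_pixels` values above are written as multiples of 28*28. The following is a minimal sketch of that arithmetic, assuming (as the Qwen2-VL model card describes) that one visual token corresponds to roughly a 28×28-pixel area of the resized image. The helper name `pixels_to_visual_tokens` is hypothetical and not part of transformers or qwen-vl-utils.

```python
# Assumption: one visual token covers a 28x28-pixel area of the resized image.
PIXELS_PER_VISUAL_TOKEN = 28 * 28


def pixels_to_visual_tokens(pixel_budget: int) -> int:
    """Approximate number of visual tokens that fit in a pixel budget."""
    return pixel_budget // PIXELS_PER_VISUAL_TOKEN


min_pixels = 256 * 28 * 28
max_pixels = 1280 * 28 * 28

# Print each pixel budget next to the approximate token count it allows
print(min_pixels, pixels_to_visual_tokens(min_pixels))
print(max_pixels, pixels_to_visual_tokens(max_pixels))
```

Under this assumption, the settings in the example keep each image between about 256 and 1280 visual tokens; a smaller `max_pixels` trades detail for lower memory use.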

Inference code: video understanding

# Reuses `model`, `processor`, and `process_vision_info` from the single-image example
# Messages containing a video and a text query
messages = [{"role": "user", "content": [{"type": "video", "video": "file:///path/to/video1.mp4", 'max_pixels': 360*420, 'fps': 1.0}, {"type": "text", "text": "Describe this video."}]}]
# Preparation for inference
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(text=[text], images=image_inputs, videos=video_inputs, padding=True, return_tensors="pt")
inputs = inputs.to('cuda')
# Inference
generated_ids = model.generate(**inputs, max_new_tokens=128)
# Drop the prompt tokens so that only the newly generated tokens are decoded
generated_ids_trimmed = [out_ids[len(in_ids):] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)]
output_text = processor.batch_decode(generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False)
print(output_text)
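
Both inference snippets slice the prompt off each generated sequence before decoding, because `generate` returns the prompt tokens followed by the new tokens. The sketch below isolates that step on plain Python lists; the token ids are made up for illustration and the helper name `trim_prompt_tokens` is hypothetical.

```python
def trim_prompt_tokens(input_ids, generated_ids):
    """Remove each prompt prefix from its corresponding generated sequence."""
    return [out_ids[len(in_ids):] for in_ids, out_ids in zip(input_ids, generated_ids)]


# Toy batch of two prompts (made-up ids) and their prompt + continuation outputs
input_ids = [[101, 7, 8], [101, 9, 10]]
generated_ids = [[101, 7, 8, 42, 43], [101, 9, 10, 44]]

print(trim_prompt_tokens(input_ids, generated_ids))  # [[42, 43], [44]]
```

Without this step, the decoded text would repeat the chat template and the user query ahead of the model's answer.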

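For more hands-on inference and fine-tuning tutorials, see

Qwen2-VL: a hands-on, end-to-end guide to trying, downloading, running inference with, and fine-tuning the model!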

CogVideoX-5B

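Zhipu AI has open-sourced CogVideoX-5B, a new model in the CogVideoX series. Inference performance is substantially improved and the hardware barrier is lower: the new model can run on desktop GPUs such as the RTX 3060. The release also improves video generation quality, with higher-resolution and higher-quality video rendering.

[Table: parameter comparison of the CogVideoX series models]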

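Model link:

https://www.modelscope.cn/models/AI-ModelScope/CogVideoX-5b

Code example:

ModelScope's free compute is sufficient to run CogVideoX-5B inference at bf16 precision.

CogVideoX-5B already supports inference with diffusers; follow the steps below.

Environment dependencies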

# diffusers>=0.30.1
# transformers>=4.44.0
# accelerate>=0.33.0 (suggest install from source)
# imageio-ffmpeg>=0.5.1
pip install --upgrade transformers accelerate diffusers imageio-ffmpeg
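
Before running the pipeline, it can help to confirm that the installed packages meet the listed minimums. The sketch below checks a few of them with the standard library's `importlib.metadata`; the helper names and the `MINIMUMS` table are hypothetical, and the version comparison is deliberately simplified (it only looks at leading numeric components, so `packaging.version` would be a more robust choice where available).

```python
from importlib import metadata


def parse_version(text):
    """Return the leading numeric components of a version string as a tuple."""
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break  # stop at suffixes such as "dev0" or "1rc1"
        parts.append(int(piece))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """True if the installed version is at least the minimum (simplified check)."""
    return parse_version(installed) >= parse_version(minimum)


# A few of the minimum versions listed in the comments above
MINIMUMS = {"diffusers": "0.30.1", "accelerate": "0.33.0", "imageio-ffmpeg": "0.5.1"}


def check_environment(minimums=MINIMUMS):
    """Map each package to True/False, or None when it is not installed."""
    report = {}
    for name, minimum in minimums.items():
        try:
            report[name] = meets_minimum(metadata.version(name), minimum)
        except metadata.PackageNotFoundError:
            report[name] = None
    return report


print(check_environment())
```

Any `False` or `None` entry in the printed report suggests rerunning the `pip install --upgrade` command.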

Run the code

import torch
from diffusers import CogVideoXPipeline
from diffusers.utils import export_to_video
from modelscope import snapshot_download
# Download the model weights from ModelScope and get the local directory
model_dir = snapshot_download("ZhipuAI/CogVideoX-5b")
prompt = "A panda, dressed in a small, red jacket and a tiny hat, sits on a wooden stool in a serene bamboo forest. The panda's fluffy paws strum a miniature acoustic guitar, producing soft, melodic tunes. Nearby, a few other pandas gather, watching curiously and some clapping in rhythm. Sunlight filters through the tall bamboo, casting a gentle glow on the scene. The panda's face is expressive, showing concentration and joy as it plays. The background includes a small, flowing stream and vibrant green foliage, enhancing the peaceful and magical atmosphere of this unique musical performance."
# Load the pipeline in bf16 precision
pipe = CogVideoXPipeline.from_pretrained(
    model_dir,
    torch_dtype=torch.bfloat16
)
# Reduce peak GPU memory: offload idle sub-models to CPU and decode the VAE in tiles
pipe.enable_model_cpu_offload()
pipe.vae.enable_tiling()
# Generate one video; the fixed seed makes the run reproducible
video = pipe(
    prompt=prompt,
    num_videos_per_prompt=1,
    num_inference_steps=50,
    num_frames=49,
    guidance_scale=6,
    generator=torch.Generator(device="cuda").manual_seed(42),
).frames[0]
# Write the frames to an MP4 file at 8 frames per second
export_to_video(video, "output.mp4", fps=8)
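
The script above requests `num_frames=49` and exports at `fps=8`, which together determine how long the resulting clip plays. A minimal sketch of that arithmetic follows; the helper name `clip_duration_seconds` is hypothetical.

```python
def clip_duration_seconds(num_frames: int, fps: int) -> float:
    """Playback length of a clip exported with the given frame count and rate."""
    return num_frames / fps


# Values taken from the CogVideoX-5B example above
print(clip_duration_seconds(49, 8))  # 6.125
```

Changing only the `fps` argument of `export_to_video` alters playback speed, not the number of generated frames.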

Hyper-SD

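Hyper-SD, ByteDance's efficient image synthesis framework, now officially supports FLUX.1-dev. 8-step and 16-step LoRAs are currently available; compared with FLUX.1-dev's default 30 steps, generation is nearly 4x faster!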
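
The "nearly 4x" figure lines up with the reduction in sampling steps for the 8-step LoRA. The sketch below works through that ratio under the simplifying assumption that every sampling step costs the same and that fixed overhead (text encoding, VAE decoding) is ignored; the helper name `step_speedup` is hypothetical.

```python
def step_speedup(baseline_steps: int, reduced_steps: int) -> float:
    """Ratio of sampling steps, used as a rough proxy for speedup."""
    return baseline_steps / reduced_steps


# 30 is the default FLUX.1-dev step count quoted above
for steps in (8, 16):
    print(f"{steps}-step LoRA: {step_speedup(30, steps)}x fewer steps")
```

Measured wall-clock gains may be somewhat smaller than these ratios, since the fixed overhead does not shrink with the step count.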

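Model link:

https://modelscope.cn/models/bytedance/hyper-sd

Example code:

A hands-on tutorial that uses a ComfyUI workflow with the Flux fp8 model and Hyper-SD for image generation in seconds and multi-LoRA fusion:

Flux, part 4: image generation in seconds. ByteDance's open-source Hyper-SD now supports Flux and stacking multiple LoRAs!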

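Recommended Datasets

Alpaca-CoT

This dataset was released by Instruction-Tuning-with-GPT-4. It contains 52K English instruction-following samples generated by GPT-4 from Alpaca prompts, intended for fine-tuning LLMs.

Dataset link: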

https://www.modelscope.cn/datasets/swift/Alpaca-CoT

OpenVid-1M

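OpenVid-1M is a high-quality text-to-video dataset designed for research institutions to improve video quality, featuring high aesthetics, clarity, and resolution. It can be used for direct training or as a quality-tuning complement to other video datasets. All videos in OpenVid-1M have a resolution of at least 512×512. In addition, the dataset authors curated 433K 1080p videos from OpenVid-1M to create OpenVidHD, advancing high-definition video generation.

Dataset link: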

https://www.modelscope.cn/datasets/AI-ModelScope/OpenVid-1M

MINT-1T-HTML

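MINT-1T is an open-source Multimodal INTerleaved dataset with one trillion text tokens and 3.4 billion images, about 10x the scale of existing open-source datasets. It also includes previously untapped sources such as PDFs and ArXiv papers. 🍃 MINT-1T is designed to facilitate research in multimodal pretraining. 🍃 MINT-1T was created by a team from the University of Washington in collaboration with Salesforce Research and other academic institutions, including Stanford University, the University of Texas at Austin, and the University of California, Berkeley.

Dataset link: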

https://www.modelscope.cn/datasets/AI-ModelScope/MINT-1T-HTML

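Featured Applications

Qwen2-7B-VL-demo

Qwen2-7B-VL-demo showcases the 7-billion-parameter vision-language model of the Qwen2-VL series, demonstrating advanced image and video understanding, multilingual recognition, device agent operation, and an innovative model architecture.

Try it: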

https://www.modelscope.cn/studios/qwen/Qwen2-7B-VL-demo

CogVideoX-5B-demo

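Zhipu AI and Tsinghua University have jointly open-sourced the video generation model CogVideoX-5B, with more parameters and higher quality.

Try it: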

https://www.modelscope.cn/studios/ZhipuAI/CogVideoX-5b-demo

Hyper-FLUX-8Steps-LoRA

Hyper-SD supports image generation in seconds, greatly improving the efficiency of visual content creation.

Try it:

https://www.modelscope.cn/studios/ByteDance/Hyper-FLUX-8Steps-LoRA
