ModelScope × Ascend 910: Quick Start

Summary: a simple guide to getting ModelScope running on domestic Ascend hardware.

Background

ModelScope, a HuggingFace-style community tailored to users in China, plays the role of a mirror platform very well when it comes to downloading and obtaining models. Beyond downloads, ModelScope also provides a set of interfaces similar in syntax and style to the Transformers library. As of this writing (January 2024), the ModelScope community does not yet officially support Ascend hardware. After studying the ModelScope code, however, I found that a small number of basic changes to the source (made easy by its good readability and loose coupling) are enough for stock ModelScope to run on Ascend hardware; several of the official ModelScope examples run end to end.

The specific changes are described below.

Test Environment

pytorch == 2.1.0
modelscope == 1.9.4
hardware == 910B1
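
Before modifying any ModelScope code, it is worth confirming that the Ascend stack itself is wired up. A minimal sanity check, assuming torch_npu (the Ascend PyTorch adapter) has been installed per the CANN documentation:

import torch
import torch_npu  # Ascend adapter; importing it registers the 'npu' device type
import modelscope

print(torch.__version__)             # expect 2.1.0
print(modelscope.__version__)        # expect 1.9.4
print(torch_npu.npu.is_available())  # True if the 910B1 is visible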

Official Example

ModelScope's official word-segmentation example is:

from modelscope.pipelines import pipeline
word_segmentation = pipeline('word-segmentation', model='damo/nlp_structbert_word-segmentation_chinese-base')

input_str = '今天天气不错,适合出去游玩'
print(word_segmentation(input_str))

To make it easier to see which device the model actually runs on, we tweak the print statement slightly:

from modelscope.pipelines import pipeline
word_segmentation = pipeline('word-segmentation', model='damo/nlp_structbert_word-segmentation_chinese-base')

input_str = '今天天气不错,适合出去游玩'
print("word segment result is {} on device {}".format(word_segmentation(input_str), next(word_segmentation.model.parameters()).device))

The output is:

word segment result is {'output': ['今天', '天气', '不错', ',', '适合', '出去', '游玩']} on device cpu

Following the usual way of specifying a device, we set the device string to the NPU:

from modelscope.pipelines import pipeline
import torch_npu

device = "npu:0"
word_segmentation_npu = pipeline('word-segmentation', model='damo/nlp_structbert_word-segmentation_chinese-base', device=device)

input_str = '今天天气不错,适合出去游玩'
print("word segment result is {} on device {}".format(word_segmentation(input_str), next(word_segmentation.model.parameters()).device))

Running this code raises an error:

(PyTorch-2.1.0) [root@4bfd19a25abf playground]# python npu_orig.py 
2024-01-16 09:05:49,901 - modelscope - INFO - PyTorch version 2.1.0 Found.
2024-01-16 09:05:49,902 - modelscope - INFO - Loading ast index from /root/.cache/modelscope/ast_indexer
2024-01-16 09:05:50,107 - modelscope - INFO - Loading done! Current index file version is 1.9.4, with md5 6354b5190fb2274895e8f10bfc329a7d and a total number of 945 components indexed
Warning : ASCEND_HOME_PATH environment variable is not set.
2024-01-16 09:05:53,885 - modelscope - WARNING - Model revision not specified, use revision: v1.0.3
Traceback (most recent call last):
  File "/home/ma-user/anaconda3/envs/PyTorch-2.1.0/lib/python3.9/site-packages/modelscope/utils/registry.py", line 212, in build_from_cfg
    return obj_cls(**args)
  File "/home/ma-user/anaconda3/envs/PyTorch-2.1.0/lib/python3.9/site-packages/modelscope/pipelines/nlp/token_classification_pipeline.py", line 50, in __init__
    super().__init__(
  File "/home/ma-user/anaconda3/envs/PyTorch-2.1.0/lib/python3.9/site-packages/modelscope/pipelines/base.py", line 95, in __init__
    verify_device(device)
  File "/home/ma-user/anaconda3/envs/PyTorch-2.1.0/lib/python3.9/site-packages/modelscope/utils/device.py", line 27, in verify_device
    assert eles[0] in ['cpu', 'cuda', 'gpu'], err_msg
AssertionError: device should be either cpu, cuda, gpu, gpu:X or cuda:X where X is the ordinal for gpu device.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/aicc/playground/npu_orig.py", line 5, in <module>
    word_segmentation_npu = pipeline('word-segmentation',model='damo/nlp_structbert_word-segmentation_chinese-base', device = device)
  File "/home/ma-user/anaconda3/envs/PyTorch-2.1.0/lib/python3.9/site-packages/modelscope/pipelines/builder.py", line 164, in pipeline
    return build_pipeline(cfg, task_name=task)
  File "/home/ma-user/anaconda3/envs/PyTorch-2.1.0/lib/python3.9/site-packages/modelscope/pipelines/builder.py", line 67, in build_pipeline
    return build_from_cfg(
  File "/home/ma-user/anaconda3/envs/PyTorch-2.1.0/lib/python3.9/site-packages/modelscope/utils/registry.py", line 215, in build_from_cfg
    raise type(e)(f'{obj_cls.__name__}: {e}')
AssertionError: WordSegmentationPipeline: device should be either cpu, cuda, gpu, gpu:X or cuda:X where X is the ordinal for gpu device.
(PyTorch-2.1.0) [root@4bfd19a25abf playground]#

The devices currently registered with ModelScope do not include npu, so the next step is to make small changes to modelscope/utils/device.py. Three functions in that file need modification: verify_device, device_placement, and create_device.

The modified device.py is reproduced here in full for reference:

# Copyright (c) Alibaba, Inc. and its affiliates.
import os
from contextlib import contextmanager

from modelscope.utils.constant import Devices, Frameworks
from modelscope.utils.logger import get_logger

logger = get_logger()


def verify_device(device_name):
    """ Verify device is valid, device should be either cpu, cuda, gpu, cuda:X or gpu:X.

    Args:
        device (str):  device str, should be either cpu, cuda, gpu, gpu:X or cuda:X
            where X is the ordinal for gpu device.

    Return:
        device info (tuple):  device_type and device_id, if device_id is not set, will use 0 as default.
    """
    err_msg = 'device should be either cpu, cuda, gpu, gpu:X or cuda:X where X is the ordinal for gpu device.'
    assert device_name is not None and device_name != '', err_msg
    device_name = device_name.lower()
    eles = device_name.split(':')
    assert len(eles) <= 2, err_msg
    assert device_name is not None
    assert eles[0] in ['cpu', 'cuda', 'gpu', 'npu'], err_msg
    device_type = eles[0]
    device_id = None
    if len(eles) > 1:
        device_id = int(eles[1])
    if device_type == 'cuda':
        device_type = Devices.gpu
    if device_type == Devices.gpu and device_id is None:
        device_id = 0
    return device_type, device_id


@contextmanager
def device_placement(framework, device_name='gpu:0'):
    """ Device placement function, allow user to specify which device to place model or tensor
    Args:
        framework (str):  tensorflow or pytorch.
        device_name (str):  gpu, cpu or npu to use, if you want to specify a certain device,
            use gpu:$gpu_id, cuda:$gpu_id or npu:$npu_id.

    Returns:
        Context manager

    Examples:

        >>> # Requests for using model on cuda:0 for gpu
        >>> with device_placement('pytorch', device='gpu:0'):
        >>>     model = Model.from_pretrained(...)
    """
    device_type, device_id = verify_device(device_name)

    if framework == Frameworks.tf:
        import tensorflow as tf
        if device_type == Devices.gpu and not tf.test.is_gpu_available():
            logger.debug(
                'tensorflow: cuda is not available, using cpu instead.')
            device_type = Devices.cpu
        if device_type == Devices.cpu:
            with tf.device('/CPU:0'):
                yield
        else:
            if device_type == Devices.gpu:
                with tf.device(f'/device:gpu:{device_id}'):
                    yield

    elif framework == Frameworks.torch:
        import torch
        import torch_npu  # importing torch_npu registers the 'npu' device type with torch
        if device_type == Devices.gpu:
            if torch.cuda.is_available():
                torch.cuda.set_device(f'cuda:{device_id}')
            else:
                logger.debug(
                    'pytorch: cuda is not available, using cpu instead.')
        elif device_type == 'npu':
            # Ascend branch: bind the requested NPU before yielding
            torch_npu.npu.set_device(f'npu:{device_id}')
        yield
    else:
        yield


def create_device(device_name):
    """ create torch device

    Args:
        device_name (str):  cpu, gpu, gpu:0, cuda:0, npu:0 etc.
    """
    import torch
    import torch_npu
    device_type, device_id = verify_device(device_name)
    use_cuda = False
    if device_type == Devices.gpu:
        use_cuda = True
        if not torch.cuda.is_available():
            logger.info('cuda is not available, using cpu instead.')
            use_cuda = False
    if device_type == "npu":
        torch_npu.npu.set_device(f"npu:{device_id}")
        device = torch.device(f"npu:{device_id}")
    elif use_cuda:
        device = torch.device(f'cuda:{device_id}')
    else:
        device = torch.device('cpu')
    return device


def get_device():
    import torch
    from torch import distributed as dist
    if torch.cuda.is_available():
        if dist.is_available() and dist.is_initialized(
        ) and 'LOCAL_RANK' in os.environ:
            device_id = f"cuda:{os.environ['LOCAL_RANK']}"
        else:
            device_id = 'cuda:0'
    else:
        device_id = 'cpu'
    return torch.device(device_id)
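
With this file in place of the installed modelscope/utils/device.py, a quick sanity check of the patched helpers (a hypothetical snippet, not part of the original walkthrough):

from modelscope.utils.device import verify_device, create_device

print(verify_device('npu:0'))   # expected: ('npu', 0)
print(create_device('npu:0'))   # expected: device(type='npu', index=0)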

Comparing Results

We add simple timing around the inference call and compare the two runs; in this test the NPU delivers roughly five times the CPU's inference throughput.

The instrumented code:

from modelscope.pipelines import pipeline
import torch_npu
import time

# CPU baseline: no device argument, so the pipeline stays on the CPU
word_segmentation = pipeline('word-segmentation', model='damo/nlp_structbert_word-segmentation_chinese-base')

input_str = '今天天气不错,适合出去游玩'
tik = time.time()
result = word_segmentation(input_str)
tok = time.time()
print("word segment result is {} on device {} with perf {} tokens/s".format(
    result, next(word_segmentation.model.parameters()).device, len(result) / (tok - tik)))

# NPU run: same model, placed on npu:0 via the patched device.py
device = "npu:0"
word_segmentation_npu = pipeline('word-segmentation', model='damo/nlp_structbert_word-segmentation_chinese-base', device=device)

input_str = '今天天气不错,适合出去游玩'
tik = time.time()
result = word_segmentation_npu(input_str)
tok = time.time()
print("word segment result is {} on device {} with perf {} tokens/s".format(
    result, next(word_segmentation_npu.model.parameters()).device, len(result) / (tok - tik)))

The output (with some redundant log lines removed):

word segment result is {'output': ['今天', '天气', '不错', ',', '适合', '出去', '游玩']} on device cpu with perf 0.34250816868505096 tokens/s
word segment result is {'output': ['今天', '天气', '不错', ',', '适合', '出去', '游玩']} on device npu:0 with perf 1.7934692348675776 tokens/s
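
Two caveats on these numbers: len(result) counts the keys of the returned dict (a single 'output' key), so the figures are really pipeline calls per second rather than tokens per second, and a single timed call includes one-time costs such as first-call operator compilation on the NPU. A sketch that warms up and then averages over several iterations, reusing the two pipeline objects from the script above:

import time

def bench(pipe, text, warmup=3, iters=10):
    # Discard the first few calls so one-time initialization does not
    # dominate, then average the steady-state rate.
    for _ in range(warmup):
        pipe(text)
    tik = time.time()
    for _ in range(iters):
        pipe(text)
    return iters / (time.time() - tik)  # pipeline calls per second

print('cpu: {:.2f} calls/s'.format(bench(word_segmentation, input_str)))
print('npu: {:.2f} calls/s'.format(bench(word_segmentation_npu, input_str)))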