ModelScope × Ascend 910: Quick Start

Summary: a simple guide to getting ModelScope running on domestic Ascend hardware.

Background

ModelScope, a HuggingFace-style community tailored to users in China, plays the role of a mirror platform very well when it comes to downloading and obtaining models. Beyond downloads, ModelScope also provides a set of interfaces similar in syntax and style to the Transformers library. As of this writing (January 2024), the ModelScope community does not yet officially support Ascend hardware. After studying the ModelScope code, however, I found that a small number of basic changes to the source (made easy by its good readability and loose coupling) are enough for stock ModelScope to run on Ascend hardware; several of the official ModelScope examples run end to end.

The specific changes are described below.

Test Environment

pytorch == 2.1.0
modelscope == 1.9.4
hardware == 910B1
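
Before modifying any ModelScope code, it is worth confirming that the Ascend stack itself is wired up. A minimal sanity check, assuming torch_npu (the Ascend PyTorch adapter) has been installed per the CANN documentation:

import torch
import torch_npu  # Ascend adapter; importing it registers the 'npu' device type
import modelscope

print(torch.__version__)             # expect 2.1.0
print(modelscope.__version__)        # expect 1.9.4
print(torch_npu.npu.is_available())  # True if the 910B1 is visible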

Official Example

ModelScope's official word-segmentation example is:

from modelscope.pipelines import pipeline
word_segmentation = pipeline('word-segmentation', model='damo/nlp_structbert_word-segmentation_chinese-base')

input_str = '今天天气不错,适合出去游玩'
print(word_segmentation(input_str))

To make it easier to see which device the model actually runs on, we tweak the print statement slightly:

from modelscope.pipelines import pipeline
word_segmentation = pipeline('word-segmentation', model='damo/nlp_structbert_word-segmentation_chinese-base')

input_str = '今天天气不错,适合出去游玩'
print("word segment result is {} on device {}".format(word_segmentation(input_str), next(word_segmentation.model.parameters()).device))

The output is:

word segment result is {'output': ['今天', '天气', '不错', ',', '适合', '出去', '游玩']} on device cpu

Following the usual way of specifying a device, we set the device string to the NPU:

from modelscope.pipelines import pipeline
import torch_npu

device = "npu:0"
word_segmentation_npu = pipeline('word-segmentation', model='damo/nlp_structbert_word-segmentation_chinese-base', device=device)

input_str = '今天天气不错,适合出去游玩'
print("word segment result is {} on device {}".format(word_segmentation(input_str), next(word_segmentation.model.parameters()).device))

Running this code raises an error:

(PyTorch-2.1.0) [root@4bfd19a25abf playground]# python npu_orig.py 
2024-01-16 09:05:49,901 - modelscope - INFO - PyTorch version 2.1.0 Found.
2024-01-16 09:05:49,902 - modelscope - INFO - Loading ast index from /root/.cache/modelscope/ast_indexer
2024-01-16 09:05:50,107 - modelscope - INFO - Loading done! Current index file version is 1.9.4, with md5 6354b5190fb2274895e8f10bfc329a7d and a total number of 945 components indexed
Warning : ASCEND_HOME_PATH environment variable is not set.
2024-01-16 09:05:53,885 - modelscope - WARNING - Model revision not specified, use revision: v1.0.3
Traceback (most recent call last):
  File "/home/ma-user/anaconda3/envs/PyTorch-2.1.0/lib/python3.9/site-packages/modelscope/utils/registry.py", line 212, in build_from_cfg
    return obj_cls(**args)
  File "/home/ma-user/anaconda3/envs/PyTorch-2.1.0/lib/python3.9/site-packages/modelscope/pipelines/nlp/token_classification_pipeline.py", line 50, in __init__
    super().__init__(
  File "/home/ma-user/anaconda3/envs/PyTorch-2.1.0/lib/python3.9/site-packages/modelscope/pipelines/base.py", line 95, in __init__
    verify_device(device)
  File "/home/ma-user/anaconda3/envs/PyTorch-2.1.0/lib/python3.9/site-packages/modelscope/utils/device.py", line 27, in verify_device
    assert eles[0] in ['cpu', 'cuda', 'gpu'], err_msg
AssertionError: device should be either cpu, cuda, gpu, gpu:X or cuda:X where X is the ordinal for gpu device.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/aicc/playground/npu_orig.py", line 5, in <module>
    word_segmentation_npu = pipeline('word-segmentation',model='damo/nlp_structbert_word-segmentation_chinese-base', device = device)
  File "/home/ma-user/anaconda3/envs/PyTorch-2.1.0/lib/python3.9/site-packages/modelscope/pipelines/builder.py", line 164, in pipeline
    return build_pipeline(cfg, task_name=task)
  File "/home/ma-user/anaconda3/envs/PyTorch-2.1.0/lib/python3.9/site-packages/modelscope/pipelines/builder.py", line 67, in build_pipeline
    return build_from_cfg(
  File "/home/ma-user/anaconda3/envs/PyTorch-2.1.0/lib/python3.9/site-packages/modelscope/utils/registry.py", line 215, in build_from_cfg
    raise type(e)(f'{obj_cls.__name__}: {e}')
AssertionError: WordSegmentationPipeline: device should be either cpu, cuda, gpu, gpu:X or cuda:X where X is the ordinal for gpu device.
(PyTorch-2.1.0) [root@4bfd19a25abf playground]#

The devices currently registered with ModelScope do not include npu, so the next step is to make small changes to modelscope/utils/device.py. Three functions in that file need modification: verify_device, device_placement, and create_device.

The modified device.py is reproduced here in full for reference:

# Copyright (c) Alibaba, Inc. and its affiliates.
import os
from contextlib import contextmanager

from modelscope.utils.constant import Devices, Frameworks
from modelscope.utils.logger import get_logger

logger = get_logger()


def verify_device(device_name):
    """ Verify device is valid, device should be either cpu, cuda, gpu, cuda:X or gpu:X.

    Args:
        device (str):  device str, should be either cpu, cuda, gpu, gpu:X or cuda:X
            where X is the ordinal for gpu device.

    Return:
        device info (tuple):  device_type and device_id, if device_id is not set, will use 0 as default.
    """
    err_msg = 'device should be either cpu, cuda, gpu, gpu:X or cuda:X where X is the ordinal for gpu device.'
    assert device_name is not None and device_name != '', err_msg
    device_name = device_name.lower()
    eles = device_name.split(':')
    assert len(eles) <= 2, err_msg
    assert device_name is not None
    assert eles[0] in ['cpu', 'cuda', 'gpu', 'npu'], err_msg
    device_type = eles[0]
    device_id = None
    if len(eles) > 1:
        device_id = int(eles[1])
    if device_type == 'cuda':
        device_type = Devices.gpu
    if device_type == Devices.gpu and device_id is None:
        device_id = 0
    return device_type, device_id


@contextmanager
def device_placement(framework, device_name='gpu:0'):
    """ Device placement function, allow user to specify which device to place model or tensor
    Args:
        framework (str):  tensorflow or pytorch.
        device_name (str):  gpu, cpu or npu to use, if you want to specify a certain device,
            use gpu:$gpu_id, cuda:$gpu_id or npu:$npu_id.

    Returns:
        Context manager

    Examples:

        >>> # Requests for using model on cuda:0 for gpu
        >>> with device_placement('pytorch', device='gpu:0'):
        >>>     model = Model.from_pretrained(...)
    """
    device_type, device_id = verify_device(device_name)

    if framework == Frameworks.tf:
        import tensorflow as tf
        if device_type == Devices.gpu and not tf.test.is_gpu_available():
            logger.debug(
                'tensorflow: cuda is not available, using cpu instead.')
            device_type = Devices.cpu
        if device_type == Devices.cpu:
            with tf.device('/CPU:0'):
                yield
        else:
            if device_type == Devices.gpu:
                with tf.device(f'/device:gpu:{device_id}'):
                    yield

    elif framework == Frameworks.torch:
        import torch
        import torch_npu  # importing torch_npu registers the 'npu' device type with torch
        if device_type == Devices.gpu:
            if torch.cuda.is_available():
                torch.cuda.set_device(f'cuda:{device_id}')
            else:
                logger.debug(
                    'pytorch: cuda is not available, using cpu instead.')
        elif device_type == 'npu':
            # Ascend branch: bind the requested NPU before yielding
            torch_npu.npu.set_device(f'npu:{device_id}')
        yield
    else:
        yield


def create_device(device_name):
    """ create torch device

    Args:
        device_name (str):  cpu, gpu, gpu:0, cuda:0, npu:0 etc.
    """
    import torch
    import torch_npu
    device_type, device_id = verify_device(device_name)
    use_cuda = False
    if device_type == Devices.gpu:
        use_cuda = True
        if not torch.cuda.is_available():
            logger.info('cuda is not available, using cpu instead.')
            use_cuda = False
    if device_type == "npu":
        torch_npu.npu.set_device(f"npu:{device_id}")
        device = torch.device(f"npu:{device_id}")
    elif use_cuda:
        device = torch.device(f'cuda:{device_id}')
    else:
        device = torch.device('cpu')
    return device


def get_device():
    import torch
    from torch import distributed as dist
    if torch.cuda.is_available():
        if dist.is_available() and dist.is_initialized(
        ) and 'LOCAL_RANK' in os.environ:
            device_id = f"cuda:{os.environ['LOCAL_RANK']}"
        else:
            device_id = 'cuda:0'
    else:
        device_id = 'cpu'
    return torch.device(device_id)
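
With this file in place of the installed modelscope/utils/device.py, a quick sanity check of the patched helpers (a hypothetical snippet, not part of the original walkthrough):

from modelscope.utils.device import verify_device, create_device

print(verify_device('npu:0'))   # expected: ('npu', 0)
print(create_device('npu:0'))   # expected: device(type='npu', index=0)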

Comparing Results

We add simple timing around the inference call and compare the two runs; in this test the NPU delivers roughly five times the CPU's inference throughput.

The instrumented code:

from modelscope.pipelines import pipeline
import torch_npu
import time

# CPU baseline: no device argument, so the pipeline stays on the CPU
word_segmentation = pipeline('word-segmentation', model='damo/nlp_structbert_word-segmentation_chinese-base')

input_str = '今天天气不错,适合出去游玩'
tik = time.time()
result = word_segmentation(input_str)
tok = time.time()
print("word segment result is {} on device {} with perf {} tokens/s".format(
    result, next(word_segmentation.model.parameters()).device, len(result) / (tok - tik)))

# NPU run: same model, placed on npu:0 via the patched device.py
device = "npu:0"
word_segmentation_npu = pipeline('word-segmentation', model='damo/nlp_structbert_word-segmentation_chinese-base', device=device)

input_str = '今天天气不错,适合出去游玩'
tik = time.time()
result = word_segmentation_npu(input_str)
tok = time.time()
print("word segment result is {} on device {} with perf {} tokens/s".format(
    result, next(word_segmentation_npu.model.parameters()).device, len(result) / (tok - tik)))

The output (with some redundant log lines removed):

word segment result is {'output': ['今天', '天气', '不错', ',', '适合', '出去', '游玩']} on device cpu with perf 0.34250816868505096 tokens/s
word segment result is {'output': ['今天', '天气', '不错', ',', '适合', '出去', '游玩']} on device npu:0 with perf 1.7934692348675776 tokens/s
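
Two caveats on these numbers: len(result) counts the keys of the returned dict (a single 'output' key), so the figures are really pipeline calls per second rather than tokens per second, and a single timed call includes one-time costs such as first-call operator compilation on the NPU. A sketch that warms up and then averages over several iterations, reusing the two pipeline objects from the script above:

import time

def bench(pipe, text, warmup=3, iters=10):
    # Discard the first few calls so one-time initialization does not
    # dominate, then average the steady-state rate.
    for _ in range(warmup):
        pipe(text)
    tik = time.time()
    for _ in range(iters):
        pipe(text)
    return iters / (time.time() - tik)  # pipeline calls per second

print('cpu: {:.2f} calls/s'.format(bench(word_segmentation, input_str)))
print('npu: {:.2f} calls/s'.format(bench(word_segmentation_npu, input_str)))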