[DSW Gallery] Fine-tuning the Chinese GPT-3 Model (1.3B) with ModelScope

Summary: Based on ModelScope and using GPT-3 (1.3B) as an example, this article shows how to train ModelScope-GPT3 for both text continuation and input-output generation. The training mode does not need to be specified explicitly: when the training dataset contains only src_txt, continuation training is performed; when it contains both src_txt and tgt_txt, input-output training is performed.

Direct use

Open the notebook "Fine-tuning the Chinese GPT-3 Model (1.3B) with ModelScope" and click "Open in DSW" in the upper-right corner.


Fine-tuning the Chinese GPT-3 Model (1.3B) with ModelScope

GPT-3 is a general-purpose pretrained generative model with a Transformer decoder-only architecture. It can be applied to a wide range of downstream generation tasks and is especially notable for its zero-shot generation ability. The model is pretrained on large amounts of unsupervised data with an autoregressive objective, and can be used for text-generation tasks such as text summarization, question generation, and data-to-text. This article is based on the Chinese GPT-3 text generation model with 1.3B parameters from the ModelScope community.

Model description

GPT-3 uses the Transformer decoder architecture with some modifications. The original Transformer decoder block contains two multi-head attention sub-layers; GPT-3 keeps only the masked multi-head attention and is pretrained left-to-right with the standard autoregressive language-modeling objective. This model was obtained by pretraining the GPT-3 code on large amounts of Chinese unsupervised data together with downstream task data, and ModelScope has trained versions at several parameter scales. For an introduction to GPT-3, see: Language Models are Few-Shot Learners.
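To make the decoder-only structure concrete, below is a minimal PyTorch sketch of masked (causal) self-attention, showing how the causal mask keeps every position from attending to future tokens. It is an illustrative toy only, not the ModelScope GPT-3 implementation (which relies on fused CUDA kernels, as the training logs later in this article show).

import torch
import torch.nn.functional as F

def causal_self_attention(x, w_qkv, w_out):
    # x: (batch, seq_len, hidden); w_qkv: (hidden, 3*hidden); w_out: (hidden, hidden)
    b, t, h = x.shape
    q, k, v = (x @ w_qkv).chunk(3, dim=-1)            # project to queries / keys / values
    scores = (q @ k.transpose(-2, -1)) / h ** 0.5     # (batch, t, t) attention logits
    mask = torch.triu(torch.ones(t, t, dtype=torch.bool), diagonal=1)
    scores = scores.masked_fill(mask, float('-inf'))  # block attention to future positions
    return (F.softmax(scores, dim=-1) @ v) @ w_out

# Toy usage with random weights
h = 8
x = torch.randn(1, 5, h)
print(causal_self_attention(x, torch.randn(h, 3 * h), torch.randn(h, h)).shape)  # torch.Size([1, 5, 8])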

How to use the model and its scope of application

ModelScope-GPT3 can be used directly for text generation, and can also be fine-tuned for a variety of text understanding tasks. You can try it on your own input documents; see the code examples below for the specific calling conventions.
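For example, zero-shot text generation can be tried with the ModelScope pipeline API before any fine-tuning (a minimal sketch; the prompt is an arbitrary placeholder and the exact generation parameters depend on your ModelScope version):

from modelscope.pipelines import pipeline
from modelscope.utils.constant import Tasks

# Build a text-generation pipeline from the pretrained 1.3B Chinese GPT-3 model
text_generation = pipeline(Tasks.text_generation,
                           model='damo/nlp_gpt3_text-generation_1.3B')
print(text_generation('好雨知时节,'))  # continue the given prompt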

Runtime environment requirements

Use the image URL: registry.cn-hangzhou.aliyuncs.com/modelscope-repo/modelscope:ubuntu20.04-cuda11.3.0-py37-torch1.11.0-tf1.15.5-1.3.0

The minimum supported instance type is a V100 GPU with at least 32 GB of memory.
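Before training, you can quickly check that the GPU in your DSW instance meets this requirement (a small sanity check, assuming PyTorch is installed, which is true for the image above):

import torch

# Verify that a CUDA device is visible and report its name and memory size
assert torch.cuda.is_available(), 'No CUDA device found'
props = torch.cuda.get_device_properties(0)
print(props.name, f'{props.total_memory / 1024**3:.0f} GB')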

Model training process

Pretraining of this model is done in two stages. The first stage strictly follows the hyperparameter settings of the original GPT-3: the model is trained with an autoregressive objective on roughly 300B characters of unsupervised data such as Chinese Wikipedia and Common Crawl. In the second stage, ModelScope continues training with a variety of supervised data, which gives the model zero-shot capability across many tasks.

ModelScope-GPT3 supports both continuation training and input-output training, and the training mode does not need to be specified explicitly: if the training dataset contains only src_txt, continuation training is performed; if it contains both src_txt and tgt_txt, input-output training is performed. Example code for both training modes is given below.

Continuation training

The following example fine-tunes the Chinese ModelScope-GPT3 1.3B model on a poetry generation dataset. First, download the training and test sets used in this example:

! wget https://atp-modelzoo-sh.oss-cn-shanghai.aliyuncs.com/release/tutorials/GPT-3/train_poetry.csv
! wget https://atp-modelzoo-sh.oss-cn-shanghai.aliyuncs.com/release/tutorials/GPT-3/test_poetry.csv

After the data has been downloaded, you can preview the first three lines of each file with the following code. Each row is one sample.

You can also prepare your own data in the same format: a CSV file whose first row is the header src_txt, and where every subsequent row is a piece of text you want the model to continue.
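As a hedged illustration of this format, the following snippet writes a toy CSV in the continuation-training layout (the file name and example lines are placeholders; substitute your own text):

import csv

# A single src_txt column; each subsequent row is one piece of text to continue
rows = ['千山鸟飞绝,万径人踪灭。', '床前明月光,疑是地上霜。']  # placeholder examples
with open('my_train.csv', 'w', newline='', encoding='utf-8') as f:
    writer = csv.writer(f)
    writer.writerow(['src_txt'])
    for text in rows:
        writer.writerow([text])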

print('Training data sample:')
! head -n 3 train_poetry.csv
print('Test set data sample:')
! head -n 3 test_poetry.csv
Training data sample:
src_txt
秋入黎山风雨多,长江不定驻惊波。翠榕叶底闻鹦鹉,自此无如客思何!
南州未识异州苹,初向沙头问水神。料得行藏无用卜,乘桴人是北来人。
Test set data sample:
src_txt
危楼向暮倚层空,故岁今年不得同。记取合沙元夕节,满街箫鼓雨兼风。
双门此夜启崇台,次第东山好月来。自古金吾元不禁,少留大将与衔杯。
from torch.utils.tensorboard import SummaryWriter
from modelscope.msdatasets import MsDataset
from modelscope.trainers import build_trainer
from modelscope.metainfo import Trainers
from datasets import load_dataset
data_files = {"train": "train_poetry.csv", "test": "test_poetry.csv"}
# Load the train/test CSV files with the Hugging Face datasets library
dataset = load_dataset("csv", data_files=data_files)
dataset = MsDataset(dataset)
train_dataset = dataset['train']
eval_dataset = dataset['test']
max_epochs = 1
tmp_dir = './gpt3_poetry'     # work_dir for checkpoints and logs
num_warmup_steps = 100        # warmup steps for the Noam learning-rate schedule
# Noam schedule: linear warmup for num_warmup_steps, then inverse-square-root decay
def noam_lambda(current_step: int):
    current_step += 1
    return min(current_step**(-0.5),
               current_step * num_warmup_steps**(-1.5))
# Override parts of the default training configuration
def cfg_modify_fn(cfg):
    cfg.train.lr_scheduler = {
        'type': 'LambdaLR',
        'lr_lambda': noam_lambda,
        'options': {
            'by_epoch': False
        }
    }
    cfg.train.optimizer = {'type': 'AdamW', 'lr': 3e-4}
    cfg.train.dataloader = {
        'batch_size_per_gpu': 16,
        'workers_per_gpu': 1
    }
    cfg.train.hooks.append({
        'type': 'EvaluationHook',
        'by_epoch': True,
        'interval': 1
    })
    cfg.evaluation.dataloader = {
        'batch_size_per_gpu': 8,
        'workers_per_gpu': 1
    }
    cfg.evaluation.metrics = 'ppl'
    return cfg
kwargs = dict(
    model='damo/nlp_gpt3_text-generation_1.3B',
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    max_epochs=max_epochs,
    work_dir=tmp_dir,
    cfg_modify_fn=cfg_modify_fn)
# Construct trainer and train
trainer = build_trainer(
    name=Trainers.gpt3_trainer, default_args=kwargs)
trainer.train()
2023-02-21 17:26:26,985 - modelscope - INFO - PyTorch version 1.11.0+cu113 Found.
2023-02-21 17:26:26,988 - modelscope - INFO - Loading ast index from /mnt/workspace/.cache/modelscope/ast_indexer
2023-02-21 17:26:27,089 - modelscope - INFO - Loading done! Current index file version is 1.3.0, with md5 669801b5418712a4860fdb4442e1ae9e and a total number of 746 components indexed
Using custom data configuration default-40dd9dfcac29625e
Found cached dataset csv (/root/.cache/huggingface/datasets/csv/default-40dd9dfcac29625e/0.0.0/6b34fb8fcf56f7c8ba51dc895bfa2bfbe43546f190a60fcf74bb5e8afdcc2317)
2023-02-21 17:26:33,185 - modelscope - INFO - Model revision not specified, use the latest revision: v1.2.0
2023-02-21 17:26:33,380 - modelscope - INFO - File config.json already in cache, skip downloading!
2023-02-21 17:26:33,381 - modelscope - INFO - File configuration.json already in cache, skip downloading!
2023-02-21 17:26:33,382 - modelscope - INFO - File gpt.png already in cache, skip downloading!
2023-02-21 17:26:33,382 - modelscope - INFO - File mp_rank_00_model_states.pt already in cache, skip downloading!
2023-02-21 17:26:33,383 - modelscope - INFO - File README.md already in cache, skip downloading!
2023-02-21 17:26:33,383 - modelscope - INFO - File tokenizer.json already in cache, skip downloading!
2023-02-21 17:26:33,388 - modelscope - INFO - ==========================Training Config Start==========================
2023-02-21 17:26:33,389 - modelscope - INFO - {
    "framework": "pytorch",
    "task": "text-generation",
    "preprocessor": {
        "type": "text-gen-jieba-tokenizer"
    },
    "model": {
        "type": "gpt3",
        "world_size": 1,
        "model_parallel_size": 1,
        "rank": 0
    },
    "pipeline": {
        "type": "gpt3-generation"
    },
    "train": {
        "work_dir": "/tmp",
        "max_epochs": 3,
        "dataloader": {
            "batch_size_per_gpu": 16,
            "workers_per_gpu": 1
        },
        "optimizer": {
            "type": "AdamW",
            "lr": 0.0003
        },
        "lr_scheduler": {
            "type": "LambdaLR",
            "lr_lambda": null,
            "options": {
                "by_epoch": false
            }
        },
        "hooks": [
            {
                "type": "CheckpointHook",
                "interval": 1
            },
            {
                "type": "TextLoggerHook",
                "interval": 1
            },
            {
                "type": "IterTimerHook"
            },
            {
                "type": "EvaluationHook",
                "by_epoch": true,
                "interval": 1
            }
        ]
    },
    "evaluation": {
        "dataloader": {
            "batch_size_per_gpu": 8,
            "workers_per_gpu": 1
        },
        "metrics": "ppl"
    },
    "megatron": {
        "tensor_model_parallel_size": 8
    }
}
2023-02-21 17:26:33,389 - modelscope - INFO - ===========================Training Config End===========================
2023-02-21 17:26:33,390 - modelscope - INFO - initialize model from /mnt/workspace/.cache/modelscope/damo/nlp_gpt3_text-generation_1.3B
using world size: 1, data-parallel-size: 1, tensor-model-parallel size: 1, pipeline-model-parallel size: 1 
using torch.float32 for parameters ...
> initializing torch distributed ...
> initializing tensor model parallel with size 1
> initializing pipeline model parallel with size 1
> setting random seeds to 42 ...
> initializing model parallel cuda seeds on global rank 0, model parallel rank 0, and data parallel rank 0 with model parallel seed: 2760 and data parallel seed: 42
> compiling and loading fused kernels ...
Detected CUDA files, patching ldflags
Emitting ninja build file /opt/conda/lib/python3.7/site-packages/megatron_util/fused_kernels/build/build.ninja...
Building extension module scaled_upper_triang_masked_softmax_cuda...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module scaled_upper_triang_masked_softmax_cuda...
Detected CUDA files, patching ldflags
Emitting ninja build file /opt/conda/lib/python3.7/site-packages/megatron_util/fused_kernels/build/build.ninja...
Building extension module scaled_masked_softmax_cuda...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module scaled_masked_softmax_cuda...
Detected CUDA files, patching ldflags
Emitting ninja build file /opt/conda/lib/python3.7/site-packages/megatron_util/fused_kernels/build/build.ninja...
Building extension module scaled_softmax_cuda...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module scaled_softmax_cuda...
Detected CUDA files, patching ldflags
Emitting ninja build file /opt/conda/lib/python3.7/site-packages/megatron_util/fused_kernels/build/build.ninja...
Building extension module fused_mix_prec_layer_norm_cuda...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module fused_mix_prec_layer_norm_cuda...
>>> done with compiling and loading fused kernels. Compilation time: 4.448 seconds
2023-02-21 17:26:41,026 - modelscope - WARNING - ('TASK_DATASETS', 'text-generation', 'gpt3') not found in ast index file
2023-02-21 17:26:41,027 - modelscope - WARNING - ('TASK_DATASETS', 'text-generation', 'gpt3') not found in ast index file
2023-02-21 17:26:41,032 - modelscope - WARNING - ('OPTIMIZER', 'default', 'AdamW') not found in ast index file
2023-02-21 17:26:41,034 - modelscope - WARNING - ('LR_SCHEDULER', 'default', 'LambdaLR') not found in ast index file
2023-02-21 17:26:41,039 - modelscope - INFO - Checkpoints will be saved to ./gpt3_poetry
2023-02-21 17:26:41,040 - modelscope - INFO - Text logs will be saved to ./gpt3_poetry
2023-02-21 17:26:43,552 - modelscope - INFO - epoch [1][1/1]  lr: 3.000e-07, eta: 0:00:00, iter_time: 2.466, data_load_time: 0.788, memory: 21173, loss: 9.4350
Total test samples: 100%|██████████| 10/10 [00:01<00:00,  5.23it/s]
2023-02-21 17:26:45,509 - modelscope - INFO - Saving checkpoint at 1 epoch
2023-02-21 17:28:41,272 - modelscope - INFO - epoch(eval) [1][2]  memory: 24425, evaluation/ppl: 979

Note: If the code above fails with an "Address already in use" error, run the following commands to find and kill the process occupying the port, or restart the kernel and rerun the continuation-training code.

apt-get update

apt install net-tools

netstat -tunlp | grep 29500

kill -9 PID  (replace PID with the process ID shown in the output of the previous command)
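If lsof is available in the image (an assumption; it may need to be installed first), the lookup and kill can also be combined into a single command. Port 29500 is the default master port used by torch distributed training:

kill -9 $(lsof -t -i:29500)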

The above is a single-GPU training script. You can also launch training from the command line with torchrun:

! wget https://atp-modelzoo-sh.oss-cn-shanghai.aliyuncs.com/release/tutorials/GPT-3/finetune_poetry.py 
--2023-02-21 17:43:35--  https://atp-modelzoo-sh.oss-cn-shanghai.aliyuncs.com/release/tutorials/GPT-3/finetune_poetry.py
Resolving atp-modelzoo-sh.oss-cn-shanghai.aliyuncs.com (atp-modelzoo-sh.oss-cn-shanghai.aliyuncs.com)... 47.101.88.27
Connecting to atp-modelzoo-sh.oss-cn-shanghai.aliyuncs.com (atp-modelzoo-sh.oss-cn-shanghai.aliyuncs.com)|47.101.88.27|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1616 (1.6K) [application/octet-stream]
Saving to: 'finetune_poetry.py'
finetune_poetry.py  100%[===================>]   1.58K  --.-KB/s    in 0s      
2023-02-21 17:43:35 (43.9 MB/s) - 'finetune_poetry.py' saved [1616/1616]
! torchrun finetune_poetry.py
2023-02-21 17:43:48,727 - modelscope - INFO - PyTorch version 1.11.0+cu113 Found.
2023-02-21 17:43:48,729 - modelscope - INFO - Loading ast index from /mnt/workspace/.cache/modelscope/ast_indexer
2023-02-21 17:43:48,762 - modelscope - INFO - Loading done! Current index file version is 1.3.0, with md5 669801b5418712a4860fdb4442e1ae9e and a total number of 746 components indexed
Using custom data configuration default-40dd9dfcac29625e
Found cached dataset csv (/root/.cache/huggingface/datasets/csv/default-40dd9dfcac29625e/0.0.0/6b34fb8fcf56f7c8ba51dc895bfa2bfbe43546f190a60fcf74bb5e8afdcc2317)
100%|████████████████████████████████████████████| 2/2 [00:00<00:00, 759.49it/s]
2023-02-21 17:43:54,335 - modelscope - INFO - Model revision not specified, use the latest revision: v1.2.0
2023-02-21 17:43:54,518 - modelscope - INFO - File config.json already in cache, skip downloading!
2023-02-21 17:43:54,518 - modelscope - INFO - File configuration.json already in cache, skip downloading!
2023-02-21 17:43:54,518 - modelscope - INFO - File gpt.png already in cache, skip downloading!
2023-02-21 17:43:54,518 - modelscope - INFO - File mp_rank_00_model_states.pt already in cache, skip downloading!
2023-02-21 17:43:54,518 - modelscope - INFO - File README.md already in cache, skip downloading!
2023-02-21 17:43:54,518 - modelscope - INFO - File tokenizer.json already in cache, skip downloading!
2023-02-21 17:43:54,523 - modelscope - INFO - ==========================Training Config Start==========================
2023-02-21 17:43:54,523 - modelscope - INFO - {
    "framework": "pytorch",
    "task": "text-generation",
    "preprocessor": {
        "type": "text-gen-jieba-tokenizer"
    },
    "model": {
        "type": "gpt3",
        "world_size": 1,
        "model_parallel_size": 1,
        "rank": 0
    },
    "pipeline": {
        "type": "gpt3-generation"
    },
    "train": {
        "work_dir": "/tmp",
        "max_epochs": 3,
        "dataloader": {
            "batch_size_per_gpu": 16,
            "workers_per_gpu": 1
        },
        "optimizer": {
            "type": "AdamW",
            "lr": 0.0003
        },
        "lr_scheduler": {
            "type": "LambdaLR",
            "lr_lambda": null,
            "options": {
                "by_epoch": false
            }
        },
        "hooks": [
            {
                "type": "CheckpointHook",
                "interval": 1
            },
            {
                "type": "TextLoggerHook",
                "interval": 1
            },
            {
                "type": "IterTimerHook"
            },
            {
                "type": "EvaluationHook",
                "by_epoch": true,
                "interval": 1
            }
        ]
    },
    "evaluation": {
        "dataloader": {
            "batch_size_per_gpu": 8,
            "workers_per_gpu": 1
        },
        "metrics": "ppl"
    },
    "megatron": {
        "tensor_model_parallel_size": 8
    }
}
2023-02-21 17:43:54,523 - modelscope - INFO - ===========================Training Config End===========================
2023-02-21 17:43:54,523 - modelscope - INFO - initialize model from /mnt/workspace/.cache/modelscope/damo/nlp_gpt3_text-generation_1.3B
using world size: 1, data-parallel-size: 1, tensor-model-parallel size: 1, pipeline-model-parallel size: 1 
using torch.float32 for parameters ...
> initializing torch distributed ...
> initializing tensor model parallel with size 1
> initializing pipeline model parallel with size 1
> setting random seeds to 42 ...
> initializing model parallel cuda seeds on global rank 0, model parallel rank 0, and data parallel rank 0 with model parallel seed: 2760 and data parallel seed: 42
> compiling and loading fused kernels ...
Detected CUDA files, patching ldflags
Emitting ninja build file /opt/conda/lib/python3.7/site-packages/megatron_util/fused_kernels/build/build.ninja...
Building extension module scaled_upper_triang_masked_softmax_cuda...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module scaled_upper_triang_masked_softmax_cuda...
Detected CUDA files, patching ldflags
Emitting ninja build file /opt/conda/lib/python3.7/site-packages/megatron_util/fused_kernels/build/build.ninja...
Building extension module scaled_masked_softmax_cuda...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module scaled_masked_softmax_cuda...
Detected CUDA files, patching ldflags
Emitting ninja build file /opt/conda/lib/python3.7/site-packages/megatron_util/fused_kernels/build/build.ninja...
Building extension module scaled_softmax_cuda...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module scaled_softmax_cuda...
Detected CUDA files, patching ldflags
Emitting ninja build file /opt/conda/lib/python3.7/site-packages/megatron_util/fused_kernels/build/build.ninja...
Building extension module fused_mix_prec_layer_norm_cuda...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module fused_mix_prec_layer_norm_cuda...
>>> done with compiling and loading fused kernels. Compilation time: 4.431 seconds
2023-02-21 17:44:02,106 - modelscope - WARNING - ('TASK_DATASETS', 'text-generation', 'gpt3') not found in ast index file
2023-02-21 17:44:02,107 - modelscope - WARNING - ('TASK_DATASETS', 'text-generation', 'gpt3') not found in ast index file
2023-02-21 17:44:02,110 - modelscope - WARNING - ('OPTIMIZER', 'default', 'AdamW') not found in ast index file
2023-02-21 17:44:02,112 - modelscope - WARNING - ('LR_SCHEDULER', 'default', 'LambdaLR') not found in ast index file
2023-02-21 17:44:02,124 - modelscope - INFO - Checkpoints will be saved to ./gpt3_poetry
2023-02-21 17:44:02,124 - modelscope - INFO - Text logs will be saved to ./gpt3_poetry
2023-02-21 17:44:04,649 - modelscope - INFO - epoch [1][1/1]  lr: 3.000e-07, eta: 0:00:00, iter_time: 2.478, data_load_time: 0.795, memory: 21173, loss: 9.4350
Total test samples: 100%|███████████████████████| 10/10 [00:02<00:00,  4.98it/s]
2023-02-21 17:44:06,696 - modelscope - INFO - Saving checkpoint at 1 epoch
2023-02-21 17:45:58,245 - modelscope - INFO - epoch(eval) [1][2]  memory: 24425, evaluation/ppl: 9799.8618

Input-output training

The following example fine-tunes the Chinese ModelScope-GPT3 1.3B model on the DuReader question generation dataset. First, download the training and test sets used in this example:

! wget https://atp-modelzoo-sh.oss-cn-shanghai.aliyuncs.com/release/tutorials/GPT-3/train_dureader.csv
! wget https://atp-modelzoo-sh.oss-cn-shanghai.aliyuncs.com/release/tutorials/GPT-3/test_dureader.csv
--2023-02-22 10:07:44--  https://atp-modelzoo-sh.oss-cn-shanghai.aliyuncs.com/release/tutorials/GPT-3/train_dureader.csv
Resolving atp-modelzoo-sh.oss-cn-shanghai.aliyuncs.com (atp-modelzoo-sh.oss-cn-shanghai.aliyuncs.com)... 47.101.88.27
Connecting to atp-modelzoo-sh.oss-cn-shanghai.aliyuncs.com (atp-modelzoo-sh.oss-cn-shanghai.aliyuncs.com)|47.101.88.27|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 4317 (4.2K) [text/csv]
Saving to: 'train_dureader.csv.1'
train_dureader.csv. 100%[===================>]   4.22K  --.-KB/s    in 0s      
2023-02-22 10:07:44 (9.69 MB/s) - 'train_dureader.csv.1' saved [4317/4317]
--2023-02-22 10:07:45--  https://atp-modelzoo-sh.oss-cn-shanghai.aliyuncs.com/release/tutorials/GPT-3/test_dureader.csv
Resolving atp-modelzoo-sh.oss-cn-shanghai.aliyuncs.com (atp-modelzoo-sh.oss-cn-shanghai.aliyuncs.com)... 47.101.88.27
Connecting to atp-modelzoo-sh.oss-cn-shanghai.aliyuncs.com (atp-modelzoo-sh.oss-cn-shanghai.aliyuncs.com)|47.101.88.27|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 2109 (2.1K) [text/csv]
Saving to: 'test_dureader.csv'
test_dureader.csv   100%[===================>]   2.06K  --.-KB/s    in 0s      
2023-02-22 10:07:45 (54.7 MB/s) - 'test_dureader.csv' saved [2109/2109]

After the data has been downloaded, you can preview the first three lines of each file with the following code. Each row is one sample.

You can also prepare your own data in the same format: a CSV file whose header row contains the columns src_txt and tgt_txt; in every subsequent row, the src_txt column holds the input text and the tgt_txt column holds the text you expect the model to output.
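As a hedged illustration of this two-column format, the following snippet writes a toy CSV in the input-output layout (the file name and contents are placeholders):

import csv

# Two columns: src_txt is the model input, tgt_txt is the expected model output
pairs = [
    ('段落内容...<sep>更多上下文', '根据段落生成的问题'),  # placeholder pair
    ('另一段输入文本', '另一个期望输出'),
]
with open('my_train.csv', 'w', newline='', encoding='utf-8') as f:
    writer = csv.writer(f)
    writer.writerow(['src_txt', 'tgt_txt'])
    writer.writerows(pairs)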

print('Training data sample:')
! head -n 3 train_dureader.csv
print('Test set data sample:')
! head -n 3 test_dureader.csv
Training data sample:
src_txt,tgt_txt
第35集<sep>第35集雪见缓缓张开眼睛,景天又惊又喜之际,长卿和紫萱的仙船驶至,见众人无恙,也十分高兴。众人登船,用尽合力把自身的真气和水分输给她。雪见终于醒过来了,但却一脸木然,全无反应。众人向常胤求助,却发现人世界竟没有雪见的身世纪录。长卿询问清微的身世,清微语带双关说一切上了天界便有答案。长卿驾驶仙船,众人决定立马动身,往天界而去。众人来到一荒山,长卿指出,魔界和天界相连。由魔界进入通过神魔之井,便可登天。众人至魔界入口,仿若一黑色的蝙蝠洞,但始终无法进入。后来花楹发现只要有翅膀便能飞入。于是景天等人打下许多乌鸦,模仿重楼的翅膀,制作数对翅膀状巨物。刚佩戴在身,便被吸入洞口。众人摔落在地,抬头发现魔界守卫。景天和众魔套交情,自称和魔尊重楼相熟,众魔不理,打了起来。\n,仙剑奇侠传3第几集上天界
方太<sep>选择燃气热水器时,一定要关注这几个问题:1、出水稳定性要好,不能出现忽热忽冷的现象2、快速到达设定的需求水温3、操作要智能、方便4、安全性要好,要装有安全报警装置 市场上燃气热水器品牌众多,购买时还需多加对比和仔细鉴别。方太今年主打的磁化恒温热水器在使用体验方面做了全面升级:9秒速热,可快速进入洗浴模式;水温持久稳定,不会出现忽热忽冷的现象,并通过水量伺服技术将出水温度精确控制在±0.5℃,可满足家里宝贝敏感肌肤洗护需求;配备CO和CH4双气体报警装置更安全(市场上一般多为CO单气体报警)。另外,这款热水器还有智能WIFI互联功能,只需下载个手机APP即可用手机远程操作热水器,实现精准调节水温,满足家人多样化的洗浴需求。当然方太的磁化恒温系列主要的是增加磁化功能,可以有效吸附水中的铁锈、铁屑等微小杂质,防止细菌滋生,使沐浴水质更洁净,长期使用磁化水沐浴更利于身体健康。\n,燃气热水器哪个牌子好
Test set data sample:
src_txt,tgt_txt
15个<sep>迈克尔.乔丹在NBA打了15个赛季。他在84年进入nba,期间在1993年10月6日第一次退役改打棒球,95年3月18日重新回归,在99年1月13日第二次退役,后于2001年10月31日复出,在03年最终退役。迈克尔·乔丹(Michael Jordan),1963年2月17日生于纽约布鲁克林,美国著名篮球运动员,司职得分后卫,历史上最伟大的篮球运动员。1984年的NBA选秀大会,乔丹在首轮第3顺位被芝加哥公牛队选中。 1986-87赛季,乔丹场均得到37.1分,首次获得分王称号。1990-91赛季,乔丹连夺常规赛MVP和总决赛MVP称号,率领芝加哥公牛首次夺得NBA总冠军。 1997-98赛季,乔丹获得个人职业生涯第10个得分王,并率领公牛队第六次夺得总冠军。2009年9月11日,乔丹正式入选NBA名人堂。\n,乔丹打了多少个赛季
曙光男科医院<sep>杜达昌重庆曙光男科医院临床专家,曙光名医堂中心专家,从事泌尿(男科)工作30多年,擅长泌尿生殖肿瘤、前列腺增生、尿路结石男性生殖整形等疑难疾病诊治,独立开展前列腺汽化电切手术千余例,技术精湛,经验丰富。...[详情]\n,重庆割包皮哪好
# finetune_dureader.py
from torch.utils.tensorboard import SummaryWriter
from modelscope.msdatasets import MsDataset
from modelscope.trainers import build_trainer
from modelscope.metainfo import Trainers
from datasets import load_dataset
data_files = {"train": "train_dureader.csv", "test": "test_dureader.csv"}
dataset = load_dataset("csv", data_files=data_files, delimiter=",")
dataset = MsDataset(dataset)
train_dataset = dataset['train']
eval_dataset = dataset['test']
max_epochs = 1
tmp_dir = './gpt3_dureader'   # work_dir for checkpoints and logs
num_warmup_steps = 200        # warmup steps for the Noam learning-rate schedule
# Noam schedule, same as in the continuation-training example
def noam_lambda(current_step: int):
    current_step += 1
    return min(current_step**(-0.5),
               current_step * num_warmup_steps**(-1.5))
# Override parts of the default training configuration
def cfg_modify_fn(cfg):
    cfg.train.lr_scheduler = {
        'type': 'LambdaLR',
        'lr_lambda': noam_lambda,
        'options': {
            'by_epoch': False
        }
    }
    cfg.train.optimizer = {'type': 'AdamW', 'lr': 1e-4}
    cfg.train.dataloader = {
        'batch_size_per_gpu': 4,
        'workers_per_gpu': 1
    }
    cfg.train.hooks.append({
        'type': 'EvaluationHook',
        'by_epoch': True,
        'interval': 1
    })
    cfg.preprocessor.sequence_length = 512  # maximum sequence length used by the preprocessor
    cfg.model.checkpoint_model_parallel_size = 1
    return cfg
kwargs = dict(
    model='damo/nlp_gpt3_text-generation_1.3B',
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    max_epochs=max_epochs,
    work_dir=tmp_dir,
    cfg_modify_fn=cfg_modify_fn)
trainer = build_trainer(
    name=Trainers.gpt3_trainer, default_args=kwargs)
trainer.train()
2023-02-22 10:01:11,421 - modelscope - INFO - PyTorch version 1.11.0+cu113 Found.
2023-02-22 10:01:11,426 - modelscope - INFO - Loading ast index from /mnt/workspace/.cache/modelscope/ast_indexer
2023-02-22 10:01:11,531 - modelscope - INFO - Loading done! Current index file version is 1.3.0, with md5 669801b5418712a4860fdb4442e1ae9e and a total number of 746 components indexed
Using custom data configuration default-bf273fb84d6c068b
Found cached dataset csv (/root/.cache/huggingface/datasets/csv/default-bf273fb84d6c068b/0.0.0/6b34fb8fcf56f7c8ba51dc895bfa2bfbe43546f190a60fcf74bb5e8afdcc2317)
2023-02-22 10:01:17,951 - modelscope - INFO - Model revision not specified, use the latest revision: v1.2.0
2023-02-22 10:01:18,153 - modelscope - INFO - File config.json already in cache, skip downloading!
2023-02-22 10:01:18,154 - modelscope - INFO - File configuration.json already in cache, skip downloading!
2023-02-22 10:01:18,155 - modelscope - INFO - File gpt.png already in cache, skip downloading!
2023-02-22 10:01:18,155 - modelscope - INFO - File mp_rank_00_model_states.pt already in cache, skip downloading!
2023-02-22 10:01:18,156 - modelscope - INFO - File README.md already in cache, skip downloading!
2023-02-22 10:01:18,156 - modelscope - INFO - File tokenizer.json already in cache, skip downloading!
2023-02-22 10:01:18,162 - modelscope - INFO - ==========================Training Config Start==========================
2023-02-22 10:01:18,162 - modelscope - INFO - {
    "framework": "pytorch",
    "task": "text-generation",
    "preprocessor": {
        "type": "text-gen-jieba-tokenizer",
        "sequence_length": 512
    },
    "model": {
        "type": "gpt3",
        "world_size": 1,
        "model_parallel_size": 1,
        "checkpoint_model_parallel_size": 1,
        "rank": 0
    },
    "pipeline": {
        "type": "gpt3-generation"
    },
    "train": {
        "work_dir": "/tmp",
        "max_epochs": 3,
        "dataloader": {
            "batch_size_per_gpu": 4,
            "workers_per_gpu": 1
        },
        "optimizer": {
            "type": "AdamW",
            "lr": 0.0001
        },
        "lr_scheduler": {
            "type": "LambdaLR",
            "lr_lambda": null,
            "options": {
                "by_epoch": false
            }
        },
        "hooks": [
            {
                "type": "CheckpointHook",
                "interval": 1
            },
            {
                "type": "TextLoggerHook",
                "interval": 1
            },
            {
                "type": "IterTimerHook"
            },
            {
                "type": "EvaluationHook",
                "by_epoch": true,
                "interval": 1
            }
        ]
    },
    "evaluation": {
        "dataloader": {
            "batch_size_per_gpu": 1,
            "workers_per_gpu": 1,
            "shuffle": false
        }
    },
    "megatron": {
        "tensor_model_parallel_size": 8
    }
}
2023-02-22 10:01:18,163 - modelscope - INFO - ===========================Training Config End===========================
2023-02-22 10:01:18,163 - modelscope - INFO - initialize model from /mnt/workspace/.cache/modelscope/damo/nlp_gpt3_text-generation_1.3B
using world size: 1, data-parallel-size: 1, tensor-model-parallel size: 1, pipeline-model-parallel size: 1 
using torch.float32 for parameters ...
> initializing torch distributed ...
> initializing tensor model parallel with size 1
> initializing pipeline model parallel with size 1
> setting random seeds to 42 ...
> initializing model parallel cuda seeds on global rank 0, model parallel rank 0, and data parallel rank 0 with model parallel seed: 2760 and data parallel seed: 42
> compiling and loading fused kernels ...
Detected CUDA files, patching ldflags
Emitting ninja build file /opt/conda/lib/python3.7/site-packages/megatron_util/fused_kernels/build/build.ninja...
Building extension module scaled_upper_triang_masked_softmax_cuda...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module scaled_upper_triang_masked_softmax_cuda...
Detected CUDA files, patching ldflags
Emitting ninja build file /opt/conda/lib/python3.7/site-packages/megatron_util/fused_kernels/build/build.ninja...
Building extension module scaled_masked_softmax_cuda...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module scaled_masked_softmax_cuda...
Detected CUDA files, patching ldflags
Emitting ninja build file /opt/conda/lib/python3.7/site-packages/megatron_util/fused_kernels/build/build.ninja...
Building extension module scaled_softmax_cuda...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module scaled_softmax_cuda...
Detected CUDA files, patching ldflags
Emitting ninja build file /opt/conda/lib/python3.7/site-packages/megatron_util/fused_kernels/build/build.ninja...
Building extension module fused_mix_prec_layer_norm_cuda...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module fused_mix_prec_layer_norm_cuda...
>>> done with compiling and loading fused kernels. Compilation time: 4.480 seconds
2023-02-22 10:01:25,755 - modelscope - WARNING - ('TASK_DATASETS', 'text-generation', 'gpt3') not found in ast index file
2023-02-22 10:01:25,757 - modelscope - WARNING - ('TASK_DATASETS', 'text-generation', 'gpt3') not found in ast index file
2023-02-22 10:01:25,762 - modelscope - WARNING - ('OPTIMIZER', 'default', 'AdamW') not found in ast index file
2023-02-22 10:01:25,764 - modelscope - WARNING - ('LR_SCHEDULER', 'default', 'LambdaLR') not found in ast index file
2023-02-22 10:01:25,769 - modelscope - INFO - Checkpoints will be saved to ./gpt3_dureader
2023-02-22 10:01:25,770 - modelscope - INFO - Text logs will be saved to ./gpt3_dureader
2023-02-22 10:01:28,858 - modelscope - INFO - epoch [1][1/2]  lr: 3.536e-08, eta: 0:00:03, iter_time: 3.041, data_load_time: 0.818, memory: 21435, loss: 9.9545
2023-02-22 10:01:30,046 - modelscope - INFO - epoch [1][2/2]  lr: 7.071e-08, eta: 0:00:00, iter_time: 1.188, data_load_time: 0.057, memory: 24157, loss: 9.8727
Total test samples: 100%|██████████| 3/3 [00:25<00:00,  8.57s/it]
2023-02-22 10:01:56,144 - modelscope - INFO - Saving checkpoint at 1 epoch
2023-02-22 10:03:54,174 - modelscope - INFO - epoch(eval) [1][3]  memory: 24157, evaluation/rouge-1: 0.2745, evaluation/rouge-l: 0.2745, evaluation/bleu-1: 0.0172, evaluation/bleu-4: 0.0089

Note: If the code above fails with an "Address already in use" error, run the following commands to find and kill the process occupying the port, or restart the kernel and rerun the input-output training code.

apt-get update

apt install net-tools

netstat -tunlp | grep 29500

kill -9 PID  (replace PID with the process ID shown in the output of the previous command)

You can also launch training from the command line with torchrun:

! wget https://atp-modelzoo-sh.oss-cn-shanghai.aliyuncs.com/release/tutorials/GPT-3/finetune_dureader.py
--2023-02-22 10:10:07--  https://atp-modelzoo-sh.oss-cn-shanghai.aliyuncs.com/release/tutorials/GPT-3/finetune_dureader.py
Resolving atp-modelzoo-sh.oss-cn-shanghai.aliyuncs.com (atp-modelzoo-sh.oss-cn-shanghai.aliyuncs.com)... 47.101.88.27
Connecting to atp-modelzoo-sh.oss-cn-shanghai.aliyuncs.com (atp-modelzoo-sh.oss-cn-shanghai.aliyuncs.com)|47.101.88.27|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1562 (1.5K) [application/octet-stream]
Saving to: 'finetune_dureader.py'
finetune_dureader.p 100%[===================>]   1.53K  --.-KB/s    in 0.001s  
2023-02-22 10:10:07 (2.41 MB/s) - 'finetune_dureader.py' saved [1562/1562]
# N is the degree of model parallelism
! torchrun --nproc_per_node $N finetune_dureader.py
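For example, on a single-GPU instance the script can be launched with a model parallelism of 1, which is equivalent to the single-process trainer call shown earlier:

! torchrun --nproc_per_node 1 finetune_dureader.py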