Direct Use
Open the tutorial "Fine-tuning the Chinese GPT-3 Model (1.3B) Based on ModelScope" and click "Open in DSW" in the upper-right corner.
Fine-tuning the Chinese GPT-3 Model (1.3B) Based on ModelScope
GPT-3 is a general-purpose pre-trained generative model built on the decoder-only Transformer architecture. It can be used to solve many kinds of downstream generation tasks and is particularly notable for its zero-shot generation ability. The model is pre-trained on large amounts of unsupervised data with an autoregressive objective, and can be applied to text-generation tasks such as text summarization, question generation, and data-to-text. This tutorial uses the Chinese GPT-3 text-generation model with 1.3B parameters from the ModelScope community.
Model Description
GPT-3 uses the Transformer decoder architecture with some modifications: the original decoder block contains two multi-head attention sublayers, whereas GPT-3 keeps only the masked multi-head attention and is pre-trained autoregressively, from left to right, with the standard language-modeling objective. This model was obtained by pre-training the GPT-3 code base on large amounts of Chinese unsupervised data together with downstream task data; ModelScope has trained models at several different parameter scales. For an introduction to GPT-3, see: Language Models are Few-Shot Learners.
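The "masked" attention is what makes the model autoregressive: each position may attend only to itself and earlier positions. The snippet below is a toy illustration of that causal mask, not ModelScope's actual implementation:

import torch

# Toy causal mask: position i may attend only to positions j <= i.
seq_len = 5
mask = torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))
scores = torch.randn(seq_len, seq_len)              # toy attention scores
scores = scores.masked_fill(~mask, float('-inf'))   # block attention to future tokens
probs = torch.softmax(scores, dim=-1)               # each row sums to 1 over visible positions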
How to Use the Model and Applicable Scope
ModelScope-GPT3 can be used directly for text generation, and it can also be fine-tuned for a variety of text-understanding tasks. You are free to experiment with your own input documents. For the exact invocation, see the code examples, starting with the sketch below.
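A minimal sketch of direct inference through the ModelScope pipeline API (the prompt string here is an arbitrary example):

from modelscope.pipelines import pipeline
from modelscope.utils.constant import Tasks

# Build a text-generation pipeline from the pre-trained 1.3B checkpoint.
text_generation = pipeline(Tasks.text_generation,
                           model='damo/nlp_gpt3_text-generation_1.3B')
print(text_generation('今天天气不错,'))  # arbitrary prompt; the model continues it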
Runtime Environment Requirements
The minimum required instance type is a V100, with at least 32 GB of memory. You can run the quick check below to inspect the GPU in your environment.
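An optional sanity check of the environment (this only inspects the visible GPU, using standard PyTorch APIs):

import torch

# Confirm a CUDA device is visible and report its name and memory.
assert torch.cuda.is_available(), 'No CUDA device found'
props = torch.cuda.get_device_properties(0)
print(f'{props.name}: {props.total_memory / 1024**3:.1f} GiB GPU memory')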
Model Training Process
This model was pre-trained in two stages. The first stage strictly follows the original GPT-3 hyperparameter settings: the model was trained with an autoregressive objective on roughly 300B characters of unsupervised data such as Chinese Wikipedia and Common Crawl. In the second stage, ModelScope continued training with a variety of supervised data, giving the model zero-shot capability on many tasks.
ModelScope-GPT3 supports both continuation training and input-output training, and the training mode does not need to be specified explicitly: if the training dataset contains only src_txt, continuation training is performed; if it contains both src_txt and tgt_txt, input-output training is performed. Example code for both training modes is given below; the two CSV layouts are sketched first.
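To make the distinction concrete, here is a sketch of the two CSV layouts with toy contents (the real datasets are downloaded later in this tutorial; the file names and row contents here are purely illustrative):

# Toy CSV layouts for the two training modes (illustrative contents only).
continuation_rows = "src_txt\n白日依山尽,黄河入海流。\n"          # src_txt only -> continuation training
seq2seq_rows = "src_txt,tgt_txt\n输入文档示例,期望输出示例\n"     # src_txt + tgt_txt -> input-output training

with open("toy_continuation.csv", "w") as f:
    f.write(continuation_rows)
with open("toy_seq2seq.csv", "w") as f:
    f.write(seq2seq_rows)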
Continuation Training
The following fine-tunes the ModelScope-GPT3 Chinese 1.3B model on a poetry-generation dataset. First, download the training and test sets used in this example:
! wget https://atp-modelzoo-sh.oss-cn-shanghai.aliyuncs.com/release/tutorials/GPT-3/train_poetry.csv
! wget https://atp-modelzoo-sh.oss-cn-shanghai.aliyuncs.com/release/tutorials/GPT-3/test_poetry.csv
After the download completes, you can view the first three records with the following code. Each line of the file is one record.
You can also prepare your own data in the same format: a CSV file whose first line is src_txt, and whose every subsequent line is a piece of text you want the model to continue.
print('Training data sample:')
! head -n 3 train_poetry.csv
print('Test set data sample:')
! head -n 3 test_poetry.csv
Training data sample:
src_txt
秋入黎山风雨多,长江不定驻惊波。翠榕叶底闻鹦鹉,自此无如客思何!
南州未识异州苹,初向沙头问水神。料得行藏无用卜,乘桴人是北来人。
Test set data sample:
src_txt
危楼向暮倚层空,故岁今年不得同。记取合沙元夕节,满街箫鼓雨兼风。
双门此夜启崇台,次第东山好月来。自古金吾元不禁,少留大将与衔杯。
from torch.utils.tensorboard import SummaryWriter
from modelscope.msdatasets import MsDataset
from modelscope.trainers import build_trainer
from modelscope.metainfo import Trainers
from datasets import load_dataset

# Load the poetry CSVs and wrap them as a ModelScope dataset.
data_files = {"train": "train_poetry.csv", "test": "test_poetry.csv"}
dataset = load_dataset("csv", data_files=data_files)
dataset = MsDataset(dataset)
train_dataset = dataset['train']
eval_dataset = dataset['test']

max_epochs = 1
tmp_dir = './gpt3_poetry'
num_warmup_steps = 100

def noam_lambda(current_step: int):
    # Noam (inverse-square-root) schedule with linear warmup.
    current_step += 1
    return min(current_step**(-0.5),
               current_step * num_warmup_steps**(-1.5))

def cfg_modify_fn(cfg):
    cfg.train.lr_scheduler = {
        'type': 'LambdaLR',
        'lr_lambda': noam_lambda,
        'options': {
            'by_epoch': False
        }
    }
    cfg.train.optimizer = {'type': 'AdamW', 'lr': 3e-4}
    cfg.train.dataloader = {
        'batch_size_per_gpu': 16,
        'workers_per_gpu': 1
    }
    # Evaluate once per epoch.
    cfg.train.hooks.append({
        'type': 'EvaluationHook',
        'by_epoch': True,
        'interval': 1
    })
    cfg.evaluation.dataloader = {
        'batch_size_per_gpu': 8,
        'workers_per_gpu': 1
    }
    cfg.evaluation.metrics = 'ppl'
    return cfg

kwargs = dict(
    model='damo/nlp_gpt3_text-generation_1.3B',
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    max_epochs=max_epochs,
    work_dir=tmp_dir,
    cfg_modify_fn=cfg_modify_fn)

# Construct trainer and train
trainer = build_trainer(
    name=Trainers.gpt3_trainer, default_args=kwargs)
trainer.train()
2023-02-21 17:26:26,985 - modelscope - INFO - PyTorch version 1.11.0+cu113 Found.
2023-02-21 17:26:26,988 - modelscope - INFO - Loading ast index from /mnt/workspace/.cache/modelscope/ast_indexer
2023-02-21 17:26:27,089 - modelscope - INFO - Loading done! Current index file version is 1.3.0, with md5 669801b5418712a4860fdb4442e1ae9e and a total number of 746 components indexed
Using custom data configuration default-40dd9dfcac29625e
Found cached dataset csv (/root/.cache/huggingface/datasets/csv/default-40dd9dfcac29625e/0.0.0/6b34fb8fcf56f7c8ba51dc895bfa2bfbe43546f190a60fcf74bb5e8afdcc2317)
2023-02-21 17:26:33,185 - modelscope - INFO - Model revision not specified, use the latest revision: v1.2.0
2023-02-21 17:26:33,380 - modelscope - INFO - File config.json already in cache, skip downloading!
2023-02-21 17:26:33,381 - modelscope - INFO - File configuration.json already in cache, skip downloading!
2023-02-21 17:26:33,382 - modelscope - INFO - File gpt.png already in cache, skip downloading!
2023-02-21 17:26:33,382 - modelscope - INFO - File mp_rank_00_model_states.pt already in cache, skip downloading!
2023-02-21 17:26:33,383 - modelscope - INFO - File README.md already in cache, skip downloading!
2023-02-21 17:26:33,383 - modelscope - INFO - File tokenizer.json already in cache, skip downloading!
2023-02-21 17:26:33,388 - modelscope - INFO - ==========================Training Config Start==========================
2023-02-21 17:26:33,389 - modelscope - INFO - {
  "framework": "pytorch",
  "task": "text-generation",
  "preprocessor": {
    "type": "text-gen-jieba-tokenizer"
  },
  "model": {
    "type": "gpt3",
    "world_size": 1,
    "model_parallel_size": 1,
    "rank": 0
  },
  "pipeline": {
    "type": "gpt3-generation"
  },
  "train": {
    "work_dir": "/tmp",
    "max_epochs": 3,
    "dataloader": {
      "batch_size_per_gpu": 16,
      "workers_per_gpu": 1
    },
    "optimizer": {
      "type": "AdamW",
      "lr": 0.0003
    },
    "lr_scheduler": {
      "type": "LambdaLR",
      "lr_lambda": null,
      "options": {
        "by_epoch": false
      }
    },
    "hooks": [
      {
        "type": "CheckpointHook",
        "interval": 1
      },
      {
        "type": "TextLoggerHook",
        "interval": 1
      },
      {
        "type": "IterTimerHook"
      },
      {
        "type": "EvaluationHook",
        "by_epoch": true,
        "interval": 1
      }
    ]
  },
  "evaluation": {
    "dataloader": {
      "batch_size_per_gpu": 8,
      "workers_per_gpu": 1
    },
    "metrics": "ppl"
  },
  "megatron": {
    "tensor_model_parallel_size": 8
  }
}
2023-02-21 17:26:33,389 - modelscope - INFO - ===========================Training Config End===========================
2023-02-21 17:26:33,390 - modelscope - INFO - initialize model from /mnt/workspace/.cache/modelscope/damo/nlp_gpt3_text-generation_1.3B
using world size: 1, data-parallel-size: 1, tensor-model-parallel size: 1, pipeline-model-parallel size: 1
using torch.float32 for parameters ...
> initializing torch distributed ...
> initializing tensor model parallel with size 1
> initializing pipeline model parallel with size 1
> setting random seeds to 42 ...
> initializing model parallel cuda seeds on global rank 0, model parallel rank 0, and data parallel rank 0 with model parallel seed: 2760 and data parallel seed: 42
> compiling and loading fused kernels ...
Detected CUDA files, patching ldflags
Emitting ninja build file /opt/conda/lib/python3.7/site-packages/megatron_util/fused_kernels/build/build.ninja...
Building extension module scaled_upper_triang_masked_softmax_cuda...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module scaled_upper_triang_masked_softmax_cuda...
Detected CUDA files, patching ldflags
Emitting ninja build file /opt/conda/lib/python3.7/site-packages/megatron_util/fused_kernels/build/build.ninja...
Building extension module scaled_masked_softmax_cuda...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module scaled_masked_softmax_cuda...
Detected CUDA files, patching ldflags
Emitting ninja build file /opt/conda/lib/python3.7/site-packages/megatron_util/fused_kernels/build/build.ninja...
Building extension module scaled_softmax_cuda...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module scaled_softmax_cuda...
Detected CUDA files, patching ldflags
Emitting ninja build file /opt/conda/lib/python3.7/site-packages/megatron_util/fused_kernels/build/build.ninja...
Building extension module fused_mix_prec_layer_norm_cuda...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module fused_mix_prec_layer_norm_cuda...
>>> done with compiling and loading fused kernels. Compilation time: 4.448 seconds
2023-02-21 17:26:41,026 - modelscope - WARNING - ('TASK_DATASETS', 'text-generation', 'gpt3') not found in ast index file
2023-02-21 17:26:41,027 - modelscope - WARNING - ('TASK_DATASETS', 'text-generation', 'gpt3') not found in ast index file
2023-02-21 17:26:41,032 - modelscope - WARNING - ('OPTIMIZER', 'default', 'AdamW') not found in ast index file
2023-02-21 17:26:41,034 - modelscope - WARNING - ('LR_SCHEDULER', 'default', 'LambdaLR') not found in ast index file
2023-02-21 17:26:41,039 - modelscope - INFO - Checkpoints will be saved to ./gpt3_poetry
2023-02-21 17:26:41,040 - modelscope - INFO - Text logs will be saved to ./gpt3_poetry
2023-02-21 17:26:43,552 - modelscope - INFO - epoch [1][1/1] lr: 3.000e-07, eta: 0:00:00, iter_time: 2.466, data_load_time: 0.788, memory: 21173, loss: 9.4350
Total test samples: 100%|██████████| 10/10 [00:01<00:00, 5.23it/s]
2023-02-21 17:26:45,509 - modelscope - INFO - Saving checkpoint at 1 epoch
2023-02-21 17:28:41,272 - modelscope - INFO - epoch(eval) [1][2] memory: 24425, evaluation/ppl: 979
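As a sanity check (not part of the training code), the logged learning rate lr: 3.000e-07 is exactly what the Noam schedule above predicts for the first step:

# Reproducing the first logged learning rate from the Noam schedule above.
num_warmup_steps = 100
base_lr = 3e-4
step = 1  # noam_lambda shifts current_step from 0 to 1
mult = min(step ** -0.5, step * num_warmup_steps ** -1.5)  # = 0.001
print(base_lr * mult)  # 3e-07, matching "lr: 3.000e-07" in the log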
Note: if the code above raises an "Address already in use" error, run the following commands to kill the process occupying the port, or restart the kernel before rerunning the continuation-training code:
apt-get update
apt install net-tools
netstat -tunlp|grep 29500
kill -9 PID  # replace PID with the process ID shown in the output of the previous command
The above is a single-GPU training script. You can also launch the training from the command line with torchrun:
! wget https://atp-modelzoo-sh.oss-cn-shanghai.aliyuncs.com/release/tutorials/GPT-3/finetune_poetry.py
--2023-02-21 17:43:35--  https://atp-modelzoo-sh.oss-cn-shanghai.aliyuncs.com/release/tutorials/GPT-3/finetune_poetry.py
Resolving atp-modelzoo-sh.oss-cn-shanghai.aliyuncs.com (atp-modelzoo-sh.oss-cn-shanghai.aliyuncs.com)... 47.101.88.27
Connecting to atp-modelzoo-sh.oss-cn-shanghai.aliyuncs.com (atp-modelzoo-sh.oss-cn-shanghai.aliyuncs.com)|47.101.88.27|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1616 (1.6K) [application/octet-stream]
Saving to: 'finetune_poetry.py'

finetune_poetry.py  100%[===================>]   1.58K  --.-KB/s    in 0s

2023-02-21 17:43:35 (43.9 MB/s) - 'finetune_poetry.py' saved [1616/1616]
! torchrun finetune_poetry.py
2023-02-21 17:43:48,727 - modelscope - INFO - PyTorch version 1.11.0+cu113 Found.
2023-02-21 17:43:48,729 - modelscope - INFO - Loading ast index from /mnt/workspace/.cache/modelscope/ast_indexer
2023-02-21 17:43:48,762 - modelscope - INFO - Loading done! Current index file version is 1.3.0, with md5 669801b5418712a4860fdb4442e1ae9e and a total number of 746 components indexed
Using custom data configuration default-40dd9dfcac29625e
Found cached dataset csv (/root/.cache/huggingface/datasets/csv/default-40dd9dfcac29625e/0.0.0/6b34fb8fcf56f7c8ba51dc895bfa2bfbe43546f190a60fcf74bb5e8afdcc2317)
100%|████████████████████████████████████████████| 2/2 [00:00<00:00, 759.49it/s]
2023-02-21 17:43:54,335 - modelscope - INFO - Model revision not specified, use the latest revision: v1.2.0
2023-02-21 17:43:54,518 - modelscope - INFO - File config.json already in cache, skip downloading!
2023-02-21 17:43:54,518 - modelscope - INFO - File configuration.json already in cache, skip downloading!
2023-02-21 17:43:54,518 - modelscope - INFO - File gpt.png already in cache, skip downloading!
2023-02-21 17:43:54,518 - modelscope - INFO - File mp_rank_00_model_states.pt already in cache, skip downloading!
2023-02-21 17:43:54,518 - modelscope - INFO - File README.md already in cache, skip downloading!
2023-02-21 17:43:54,518 - modelscope - INFO - File tokenizer.json already in cache, skip downloading!
2023-02-21 17:43:54,523 - modelscope - INFO - ==========================Training Config Start==========================
2023-02-21 17:43:54,523 - modelscope - INFO - {
  "framework": "pytorch",
  "task": "text-generation",
  "preprocessor": {
    "type": "text-gen-jieba-tokenizer"
  },
  "model": {
    "type": "gpt3",
    "world_size": 1,
    "model_parallel_size": 1,
    "rank": 0
  },
  "pipeline": {
    "type": "gpt3-generation"
  },
  "train": {
    "work_dir": "/tmp",
    "max_epochs": 3,
    "dataloader": {
      "batch_size_per_gpu": 16,
      "workers_per_gpu": 1
    },
    "optimizer": {
      "type": "AdamW",
      "lr": 0.0003
    },
    "lr_scheduler": {
      "type": "LambdaLR",
      "lr_lambda": null,
      "options": {
        "by_epoch": false
      }
    },
    "hooks": [
      {
        "type": "CheckpointHook",
        "interval": 1
      },
      {
        "type": "TextLoggerHook",
        "interval": 1
      },
      {
        "type": "IterTimerHook"
      },
      {
        "type": "EvaluationHook",
        "by_epoch": true,
        "interval": 1
      }
    ]
  },
  "evaluation": {
    "dataloader": {
      "batch_size_per_gpu": 8,
      "workers_per_gpu": 1
    },
    "metrics": "ppl"
  },
  "megatron": {
    "tensor_model_parallel_size": 8
  }
}
2023-02-21 17:43:54,523 - modelscope - INFO - ===========================Training Config End===========================
2023-02-21 17:43:54,523 - modelscope - INFO - initialize model from /mnt/workspace/.cache/modelscope/damo/nlp_gpt3_text-generation_1.3B
using world size: 1, data-parallel-size: 1, tensor-model-parallel size: 1, pipeline-model-parallel size: 1
using torch.float32 for parameters ...
> initializing torch distributed ...
> initializing tensor model parallel with size 1
> initializing pipeline model parallel with size 1
> setting random seeds to 42 ...
> initializing model parallel cuda seeds on global rank 0, model parallel rank 0, and data parallel rank 0 with model parallel seed: 2760 and data parallel seed: 42
> compiling and loading fused kernels ...
Detected CUDA files, patching ldflags
Emitting ninja build file /opt/conda/lib/python3.7/site-packages/megatron_util/fused_kernels/build/build.ninja...
Building extension module scaled_upper_triang_masked_softmax_cuda...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module scaled_upper_triang_masked_softmax_cuda...
Detected CUDA files, patching ldflags
Emitting ninja build file /opt/conda/lib/python3.7/site-packages/megatron_util/fused_kernels/build/build.ninja...
Building extension module scaled_masked_softmax_cuda...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module scaled_masked_softmax_cuda...
Detected CUDA files, patching ldflags
Emitting ninja build file /opt/conda/lib/python3.7/site-packages/megatron_util/fused_kernels/build/build.ninja...
Building extension module scaled_softmax_cuda...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module scaled_softmax_cuda...
Detected CUDA files, patching ldflags
Emitting ninja build file /opt/conda/lib/python3.7/site-packages/megatron_util/fused_kernels/build/build.ninja...
Building extension module fused_mix_prec_layer_norm_cuda...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module fused_mix_prec_layer_norm_cuda...
>>> done with compiling and loading fused kernels. Compilation time: 4.431 seconds
2023-02-21 17:44:02,106 - modelscope - WARNING - ('TASK_DATASETS', 'text-generation', 'gpt3') not found in ast index file
2023-02-21 17:44:02,107 - modelscope - WARNING - ('TASK_DATASETS', 'text-generation', 'gpt3') not found in ast index file
2023-02-21 17:44:02,110 - modelscope - WARNING - ('OPTIMIZER', 'default', 'AdamW') not found in ast index file
2023-02-21 17:44:02,112 - modelscope - WARNING - ('LR_SCHEDULER', 'default', 'LambdaLR') not found in ast index file
2023-02-21 17:44:02,124 - modelscope - INFO - Checkpoints will be saved to ./gpt3_poetry
2023-02-21 17:44:02,124 - modelscope - INFO - Text logs will be saved to ./gpt3_poetry
2023-02-21 17:44:04,649 - modelscope - INFO - epoch [1][1/1] lr: 3.000e-07, eta: 0:00:00, iter_time: 2.478, data_load_time: 0.795, memory: 21173, loss: 9.4350
Total test samples: 100%|███████████████████████| 10/10 [00:02<00:00, 4.98it/s]
2023-02-21 17:44:06,696 - modelscope - INFO - Saving checkpoint at 1 epoch
2023-02-21 17:45:58,245 - modelscope - INFO - epoch(eval) [1][3] memory: 24425, evaluation/ppl: 9799.8618
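After training, you may want to try the fine-tuned model. The sketch below assumes the trainer exports the fine-tuned model files to ./gpt3_poetry/output, a common ModelScope work_dir layout; check your work_dir for the actual path:

from modelscope.pipelines import pipeline
from modelscope.utils.constant import Tasks

# Assumption: the fine-tuned model files were exported to <work_dir>/output.
poetry_generation = pipeline(Tasks.text_generation, model='./gpt3_poetry/output')
print(poetry_generation('秋入黎山风雨多,'))  # prompt taken from the training sample above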
Input-Output Training
The following fine-tunes the ModelScope-GPT3 Chinese 1.3B model on the DuReader question-generation dataset. First, download the training and test sets used in this example:
! wget https://atp-modelzoo-sh.oss-cn-shanghai.aliyuncs.com/release/tutorials/GPT-3/train_dureader.csv
! wget https://atp-modelzoo-sh.oss-cn-shanghai.aliyuncs.com/release/tutorials/GPT-3/test_dureader.csv
--2023-02-22 10:07:44--  https://atp-modelzoo-sh.oss-cn-shanghai.aliyuncs.com/release/tutorials/GPT-3/train_dureader.csv
Resolving atp-modelzoo-sh.oss-cn-shanghai.aliyuncs.com (atp-modelzoo-sh.oss-cn-shanghai.aliyuncs.com)... 47.101.88.27
Connecting to atp-modelzoo-sh.oss-cn-shanghai.aliyuncs.com (atp-modelzoo-sh.oss-cn-shanghai.aliyuncs.com)|47.101.88.27|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 4317 (4.2K) [text/csv]
Saving to: 'train_dureader.csv.1'

train_dureader.csv. 100%[===================>]   4.22K  --.-KB/s    in 0s

2023-02-22 10:07:44 (9.69 MB/s) - 'train_dureader.csv.1' saved [4317/4317]

--2023-02-22 10:07:45--  https://atp-modelzoo-sh.oss-cn-shanghai.aliyuncs.com/release/tutorials/GPT-3/test_dureader.csv
Resolving atp-modelzoo-sh.oss-cn-shanghai.aliyuncs.com (atp-modelzoo-sh.oss-cn-shanghai.aliyuncs.com)... 47.101.88.27
Connecting to atp-modelzoo-sh.oss-cn-shanghai.aliyuncs.com (atp-modelzoo-sh.oss-cn-shanghai.aliyuncs.com)|47.101.88.27|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 2109 (2.1K) [text/csv]
Saving to: 'test_dureader.csv'

test_dureader.csv   100%[===================>]   2.06K  --.-KB/s    in 0s

2023-02-22 10:07:45 (54.7 MB/s) - 'test_dureader.csv' saved [2109/2109]
After the download completes, you can view the first three records with the following code. Each line of the file is one record.
You can also prepare your own data in the same format: a CSV file with two columns. The first column, headed src_txt, holds the input text for each record; the second column, headed tgt_txt, holds the text you expect the model to output.
print('Training data sample:')
! head -n 3 train_dureader.csv
print('Test set data sample:')
! head -n 3 test_dureader.csv
Training data sample:
src_txt,tgt_txt
第35集<sep>第35集雪见缓缓张开眼睛,景天又惊又喜之际,长卿和紫萱的仙船驶至,见众人无恙,也十分高兴。众人登船,用尽合力把自身的真气和水分输给她。雪见终于醒过来了,但却一脸木然,全无反应。众人向常胤求助,却发现人世界竟没有雪见的身世纪录。长卿询问清微的身世,清微语带双关说一切上了天界便有答案。长卿驾驶仙船,众人决定立马动身,往天界而去。众人来到一荒山,长卿指出,魔界和天界相连。由魔界进入通过神魔之井,便可登天。众人至魔界入口,仿若一黑色的蝙蝠洞,但始终无法进入。后来花楹发现只要有翅膀便能飞入。于是景天等人打下许多乌鸦,模仿重楼的翅膀,制作数对翅膀状巨物。刚佩戴在身,便被吸入洞口。众人摔落在地,抬头发现魔界守卫。景天和众魔套交情,自称和魔尊重楼相熟,众魔不理,打了起来。\n,仙剑奇侠传3第几集上天界
方太<sep>选择燃气热水器时,一定要关注这几个问题:1、出水稳定性要好,不能出现忽热忽冷的现象2、快速到达设定的需求水温3、操作要智能、方便4、安全性要好,要装有安全报警装置 市场上燃气热水器品牌众多,购买时还需多加对比和仔细鉴别。方太今年主打的磁化恒温热水器在使用体验方面做了全面升级:9秒速热,可快速进入洗浴模式;水温持久稳定,不会出现忽热忽冷的现象,并通过水量伺服技术将出水温度精确控制在±0.5℃,可满足家里宝贝敏感肌肤洗护需求;配备CO和CH4双气体报警装置更安全(市场上一般多为CO单气体报警)。另外,这款热水器还有智能WIFI互联功能,只需下载个手机APP即可用手机远程操作热水器,实现精准调节水温,满足家人多样化的洗浴需求。当然方太的磁化恒温系列主要的是增加磁化功能,可以有效吸附水中的铁锈、铁屑等微小杂质,防止细菌滋生,使沐浴水质更洁净,长期使用磁化水沐浴更利于身体健康。\n,燃气热水器哪个牌子好
Test set data sample:
src_txt,tgt_txt
15个<sep>迈克尔.乔丹在NBA打了15个赛季。他在84年进入nba,期间在1993年10月6日第一次退役改打棒球,95年3月18日重新回归,在99年1月13日第二次退役,后于2001年10月31日复出,在03年最终退役。迈克尔·乔丹(Michael Jordan),1963年2月17日生于纽约布鲁克林,美国著名篮球运动员,司职得分后卫,历史上最伟大的篮球运动员。1984年的NBA选秀大会,乔丹在首轮第3顺位被芝加哥公牛队选中。 1986-87赛季,乔丹场均得到37.1分,首次获得分王称号。1990-91赛季,乔丹连夺常规赛MVP和总决赛MVP称号,率领芝加哥公牛首次夺得NBA总冠军。 1997-98赛季,乔丹获得个人职业生涯第10个得分王,并率领公牛队第六次夺得总冠军。2009年9月11日,乔丹正式入选NBA名人堂。\n,乔丹打了多少个赛季
曙光男科医院<sep>杜达昌重庆曙光男科医院临床专家,曙光名医堂中心专家,从事泌尿(男科)工作30多年,擅长泌尿生殖肿瘤、前列腺增生、尿路结石男性生殖整形等疑难疾病诊治,独立开展前列腺汽化电切手术千余例,技术精湛,经验丰富。...[详情]\n,重庆割包皮哪好
# finetune_dureader.py
from torch.utils.tensorboard import SummaryWriter
from modelscope.msdatasets import MsDataset
from modelscope.trainers import build_trainer
from modelscope.metainfo import Trainers
from datasets import load_dataset

# Load the DuReader CSVs and wrap them as a ModelScope dataset.
data_files = {"train": "train_dureader.csv", "test": "test_dureader.csv"}
dataset = load_dataset("csv", data_files=data_files, delimiter=",")
dataset = MsDataset(dataset)
train_dataset = dataset['train']
eval_dataset = dataset['test']

max_epochs = 1
tmp_dir = './gpt3_dureader'
num_warmup_steps = 200

def noam_lambda(current_step: int):
    # Noam (inverse-square-root) schedule with linear warmup.
    current_step += 1
    return min(current_step**(-0.5),
               current_step * num_warmup_steps**(-1.5))

def cfg_modify_fn(cfg):
    cfg.train.lr_scheduler = {
        'type': 'LambdaLR',
        'lr_lambda': noam_lambda,
        'options': {
            'by_epoch': False
        }
    }
    cfg.train.optimizer = {'type': 'AdamW', 'lr': 1e-4}
    cfg.train.dataloader = {
        'batch_size_per_gpu': 4,
        'workers_per_gpu': 1
    }
    # Evaluate once per epoch.
    cfg.train.hooks.append({
        'type': 'EvaluationHook',
        'by_epoch': True,
        'interval': 1
    })
    # DuReader records are much longer than the poetry data.
    cfg.preprocessor.sequence_length = 512
    cfg.model.checkpoint_model_parallel_size = 1
    return cfg

kwargs = dict(
    model='damo/nlp_gpt3_text-generation_1.3B',
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    max_epochs=max_epochs,
    work_dir=tmp_dir,
    cfg_modify_fn=cfg_modify_fn)

# Construct trainer and train
trainer = build_trainer(
    name=Trainers.gpt3_trainer, default_args=kwargs)
trainer.train()
2023-02-22 10:01:11,421 - modelscope - INFO - PyTorch version 1.11.0+cu113 Found.
2023-02-22 10:01:11,426 - modelscope - INFO - Loading ast index from /mnt/workspace/.cache/modelscope/ast_indexer
2023-02-22 10:01:11,531 - modelscope - INFO - Loading done! Current index file version is 1.3.0, with md5 669801b5418712a4860fdb4442e1ae9e and a total number of 746 components indexed
Using custom data configuration default-bf273fb84d6c068b
Found cached dataset csv (/root/.cache/huggingface/datasets/csv/default-bf273fb84d6c068b/0.0.0/6b34fb8fcf56f7c8ba51dc895bfa2bfbe43546f190a60fcf74bb5e8afdcc2317)
2023-02-22 10:01:17,951 - modelscope - INFO - Model revision not specified, use the latest revision: v1.2.0
2023-02-22 10:01:18,153 - modelscope - INFO - File config.json already in cache, skip downloading!
2023-02-22 10:01:18,154 - modelscope - INFO - File configuration.json already in cache, skip downloading!
2023-02-22 10:01:18,155 - modelscope - INFO - File gpt.png already in cache, skip downloading!
2023-02-22 10:01:18,155 - modelscope - INFO - File mp_rank_00_model_states.pt already in cache, skip downloading!
2023-02-22 10:01:18,156 - modelscope - INFO - File README.md already in cache, skip downloading!
2023-02-22 10:01:18,156 - modelscope - INFO - File tokenizer.json already in cache, skip downloading!
2023-02-22 10:01:18,162 - modelscope - INFO - ==========================Training Config Start==========================
2023-02-22 10:01:18,162 - modelscope - INFO - {
  "framework": "pytorch",
  "task": "text-generation",
  "preprocessor": {
    "type": "text-gen-jieba-tokenizer",
    "sequence_length": 512
  },
  "model": {
    "type": "gpt3",
    "world_size": 1,
    "model_parallel_size": 1,
    "checkpoint_model_parallel_size": 1,
    "rank": 0
  },
  "pipeline": {
    "type": "gpt3-generation"
  },
  "train": {
    "work_dir": "/tmp",
    "max_epochs": 3,
    "dataloader": {
      "batch_size_per_gpu": 4,
      "workers_per_gpu": 1
    },
    "optimizer": {
      "type": "AdamW",
      "lr": 0.0001
    },
    "lr_scheduler": {
      "type": "LambdaLR",
      "lr_lambda": null,
      "options": {
        "by_epoch": false
      }
    },
    "hooks": [
      {
        "type": "CheckpointHook",
        "interval": 1
      },
      {
        "type": "TextLoggerHook",
        "interval": 1
      },
      {
        "type": "IterTimerHook"
      },
      {
        "type": "EvaluationHook",
        "by_epoch": true,
        "interval": 1
      }
    ]
  },
  "evaluation": {
    "dataloader": {
      "batch_size_per_gpu": 1,
      "workers_per_gpu": 1,
      "shuffle": false
    }
  },
  "megatron": {
    "tensor_model_parallel_size": 8
  }
}
2023-02-22 10:01:18,163 - modelscope - INFO - ===========================Training Config End===========================
2023-02-22 10:01:18,163 - modelscope - INFO - initialize model from /mnt/workspace/.cache/modelscope/damo/nlp_gpt3_text-generation_1.3B
using world size: 1, data-parallel-size: 1, tensor-model-parallel size: 1, pipeline-model-parallel size: 1
using torch.float32 for parameters ...
> initializing torch distributed ...
> initializing tensor model parallel with size 1
> initializing pipeline model parallel with size 1
> setting random seeds to 42 ...
> initializing model parallel cuda seeds on global rank 0, model parallel rank 0, and data parallel rank 0 with model parallel seed: 2760 and data parallel seed: 42
> compiling and loading fused kernels ...
Detected CUDA files, patching ldflags
Emitting ninja build file /opt/conda/lib/python3.7/site-packages/megatron_util/fused_kernels/build/build.ninja...
Building extension module scaled_upper_triang_masked_softmax_cuda...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module scaled_upper_triang_masked_softmax_cuda...
Detected CUDA files, patching ldflags
Emitting ninja build file /opt/conda/lib/python3.7/site-packages/megatron_util/fused_kernels/build/build.ninja...
Building extension module scaled_masked_softmax_cuda...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module scaled_masked_softmax_cuda...
Detected CUDA files, patching ldflags
Emitting ninja build file /opt/conda/lib/python3.7/site-packages/megatron_util/fused_kernels/build/build.ninja...
Building extension module scaled_softmax_cuda...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module scaled_softmax_cuda...
Detected CUDA files, patching ldflags
Emitting ninja build file /opt/conda/lib/python3.7/site-packages/megatron_util/fused_kernels/build/build.ninja...
Building extension module fused_mix_prec_layer_norm_cuda...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module fused_mix_prec_layer_norm_cuda...
>>> done with compiling and loading fused kernels. Compilation time: 4.480 seconds
2023-02-22 10:01:25,755 - modelscope - WARNING - ('TASK_DATASETS', 'text-generation', 'gpt3') not found in ast index file
2023-02-22 10:01:25,757 - modelscope - WARNING - ('TASK_DATASETS', 'text-generation', 'gpt3') not found in ast index file
2023-02-22 10:01:25,762 - modelscope - WARNING - ('OPTIMIZER', 'default', 'AdamW') not found in ast index file
2023-02-22 10:01:25,764 - modelscope - WARNING - ('LR_SCHEDULER', 'default', 'LambdaLR') not found in ast index file
2023-02-22 10:01:25,769 - modelscope - INFO - Checkpoints will be saved to ./gpt3_dureader
2023-02-22 10:01:25,770 - modelscope - INFO - Text logs will be saved to ./gpt3_dureader
2023-02-22 10:01:28,858 - modelscope - INFO - epoch [1][1/2] lr: 3.536e-08, eta: 0:00:03, iter_time: 3.041, data_load_time: 0.818, memory: 21435, loss: 9.9545
2023-02-22 10:01:30,046 - modelscope - INFO - epoch [1][2/2] lr: 7.071e-08, eta: 0:00:00, iter_time: 1.188, data_load_time: 0.057, memory: 24157, loss: 9.8727
Total test samples: 100%|██████████| 3/3 [00:25<00:00, 8.57s/it]
2023-02-22 10:01:56,144 - modelscope - INFO - Saving checkpoint at 1 epoch
2023-02-22 10:03:54,174 - modelscope - INFO - epoch(eval) [1][3] memory: 24157, evaluation/rouge-1: 0.2745, evaluation/rouge-l: 0.2745, evaluation/bleu-1: 0.0172, evaluation/bleu-4: 0.0089
Note: if the code above raises an "Address already in use" error, run the following commands to kill the process occupying the port, or restart the kernel before rerunning the input-output training code:
apt-get update
apt install net-tools
netstat -tunlp|grep 29500
kill -9 PID  # replace PID with the process ID shown in the output of the previous command
You can also launch the training from the command line with torchrun:
! wget https://atp-modelzoo-sh.oss-cn-shanghai.aliyuncs.com/release/tutorials/GPT-3/finetune_dureader.py
--2023-02-22 10:10:07--  https://atp-modelzoo-sh.oss-cn-shanghai.aliyuncs.com/release/tutorials/GPT-3/finetune_dureader.py
Resolving atp-modelzoo-sh.oss-cn-shanghai.aliyuncs.com (atp-modelzoo-sh.oss-cn-shanghai.aliyuncs.com)... 47.101.88.27
Connecting to atp-modelzoo-sh.oss-cn-shanghai.aliyuncs.com (atp-modelzoo-sh.oss-cn-shanghai.aliyuncs.com)|47.101.88.27|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1562 (1.5K) [application/octet-stream]
Saving to: 'finetune_dureader.py'

finetune_dureader.p 100%[===================>]   1.53K  --.-KB/s    in 0.001s

2023-02-22 10:10:07 (2.41 MB/s) - 'finetune_dureader.py' saved [1562/1562]
# N is the degree of model parallelism
! torchrun --nproc_per_node $N finetune_dureader.py
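Finally, to try the fine-tuned model, inputs should follow the same src_txt format used during training ("answer<sep>document"), and the model generates the question. A minimal sketch, again assuming the trainer exports the model to ./gpt3_dureader/output (verify the actual path in your work_dir):

from modelscope.pipelines import pipeline
from modelscope.utils.constant import Tasks

# Assumption: fine-tuned model exported to <work_dir>/output (common ModelScope layout).
question_generation = pipeline(Tasks.text_generation, model='./gpt3_dureader/output')
# Training inputs were "answer<sep>document"; the model should generate a question.
src = '15个<sep>迈克尔.乔丹在NBA打了15个赛季。'
print(question_generation(src))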