[SPACE-T Table Question Answering Pretrained Model - Chinese - General Domain - base] Error when training on GPU

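I set device to "cuda" in the model's configuration.json and ran the training script provided on the official site. It fails with an error saying the tensors are not on the same device.

Versions: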

python                        3.8.16
modelscope                    1.3.0
torch                         1.10.0
torchaudio                    0.10.0
torchvision                   0.11.0

configuration.json:

"device": "cuda", # this is the only item I changed

log:

2023-03-17 10:33:42,503 - modelscope - INFO - PyTorch version 1.10.0 Found.
2023-03-17 10:33:42,504 - modelscope - INFO - Loading ast index from /home/xuc/.cache/modelscope/ast_indexer
2023-03-17 10:33:42,529 - modelscope - INFO - Loading done! Current index file version is 1.3.0, with md5 6087da66a93f94dc2d05987df0e603c5 and a total number of 746 components indexed
2023-03-17 10:33:44,606 - modelscope - INFO - No subset_name specified, defaulting to the default
Using custom data configuration modelscope-6e91f528cf9cd8e0
Downloading and preparing dataset ChineseText2SQL/modelscope to /home/xuc/.cache/modelscope/hub/datasets/modelscope/ChineseText2SQL/master/meta/modelscope___dataset_builder/modelscope-6e91f528cf9cd8e0/master/train_test...
Downloading data: 100%|██████████████████████████████████████████████████████████████████████| 6.79k/6.79k [00:00<00:00, 10.7MB/s]
Downloading data: 100%|██████████████████████████████████████████████████████████████████████| 1.19k/1.19k [00:00<00:00, 3.01MB/s]
Downloading data files: 100%|███████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00,  2.97it/s]
Extracting data files: 100%|███████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 782.30it/s]
Downloading data files: 100%|███████████████████████████████████████████████████████████████████████| 2/2 [00:01<00:00,  1.79it/s]
Extracting data files: 100%|████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 17.19it/s]
Dataset chinese_text2_sql downloaded and prepared to /home/xuc/.cache/modelscope/hub/datasets/modelscope/ChineseText2SQL/master/meta/modelscope___dataset_builder/modelscope-6e91f528cf9cd8e0/master/train_test. Subsequent calls will reuse this data.
100%|██████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 772.57it/s]
size of training set 500
size of evaluation set 100
2023-03-17 10:33:47,756 - modelscope - INFO - Model revision not specified, use the latest revision: v1.0.3
2023-03-17 10:33:47,939 - modelscope - INFO - File configuration.json already in cache, skip downloading!
2023-03-17 10:33:47,939 - modelscope - INFO - File pytorch_model.bin already in cache, skip downloading!
2023-03-17 10:33:47,939 - modelscope - INFO - File README.md already in cache, skip downloading!
2023-03-17 10:33:47,939 - modelscope - INFO - File star.jpg already in cache, skip downloading!
2023-03-17 10:33:47,939 - modelscope - INFO - File star.png already in cache, skip downloading!
2023-03-17 10:33:47,940 - modelscope - INFO - File synonym.txt already in cache, skip downloading!
2023-03-17 10:33:47,940 - modelscope - INFO - File table.json already in cache, skip downloading!
2023-03-17 10:33:47,940 - modelscope - INFO - File table1.json already in cache, skip downloading!
2023-03-17 10:33:47,940 - modelscope - INFO - File table2.json already in cache, skip downloading!
2023-03-17 10:33:47,940 - modelscope - INFO - File table3.json already in cache, skip downloading!
2023-03-17 10:33:47,940 - modelscope - INFO - File table4.json already in cache, skip downloading!
2023-03-17 10:33:47,940 - modelscope - INFO - File table5.json already in cache, skip downloading!
2023-03-17 10:33:47,940 - modelscope - INFO - File vocab.txt already in cache, skip downloading!
2023-03-17 10:33:47,940 - modelscope - INFO - initialize model from /home/xuc/.cache/modelscope/hub/damo/nlp_convai_text2sql_pretrain_cn
Traceback (most recent call last):
  File "train.py", line 39, in <module>
    trainer.train(
  File "/home/xuc/.conda/envs/ms_env/lib/python3.8/site-packages/modelscope/trainers/nlp/table_question_answering_trainer.py", line 501, in train
    self.model.get_bert_output(
  File "/home/xuc/.conda/envs/ms_env/lib/python3.8/site-packages/modelscope/models/nlp/space_T_cn/table_question_answering.py", line 613, in get_bert_output
    all_encoder_layer, pooled_output = model_bert(
  File "/home/xuc/.conda/envs/ms_env/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/xuc/.conda/envs/ms_env/lib/python3.8/site-packages/modelscope/models/nlp/space_T_cn/backbone.py", line 842, in forward
    embedding_output = self.embeddings(
  File "/home/xuc/.conda/envs/ms_env/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/xuc/.conda/envs/ms_env/lib/python3.8/site-packages/modelscope/models/nlp/space_T_cn/backbone.py", line 115, in forward
    words_embeddings = self.word_embeddings(input_ids)
  File "/home/xuc/.conda/envs/ms_env/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/xuc/.conda/envs/ms_env/lib/python3.8/site-packages/torch/nn/modules/sparse.py", line 158, in forward
    return F.embedding(
  File "/home/xuc/.conda/envs/ms_env/lib/python3.8/site-packages/torch/nn/functional.py", line 2044, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0! (when checking argument for argument index in method wrapper__index_select)
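
The last frames show `torch.embedding` receiving the embedding weight and `input_ids` on different devices (`cpu` and `cuda:0`). A common workaround for this class of error is to move every tensor in the batch to the model's device before the forward pass. The snippet below is only a minimal sketch of that idea, not ModelScope's code: `move_to_device` is a hypothetical helper name, and it assumes nothing beyond the `.to(device)` method that `torch.Tensor` provides.

```python
from collections.abc import Mapping


def move_to_device(obj, device):
    """Recursively move every tensor-like object in a nested batch to `device`.

    Anything exposing a callable `.to(device)` (e.g. torch.Tensor) is moved;
    dicts, lists and tuples are traversed; everything else is returned as-is.
    """
    # Tensor-like objects: delegate to their own .to() method.
    if callable(getattr(obj, "to", None)):
        return obj.to(device)
    # Mappings such as {"input_ids": tensor, ...}: rebuild as a plain dict.
    if isinstance(obj, Mapping):
        return {key: move_to_device(value, device) for key, value in obj.items()}
    # Lists and tuples: keep the container type, move each element.
    if isinstance(obj, (list, tuple)):
        return type(obj)([move_to_device(value, device) for value in obj])
    # Strings, ints, None, etc. carry no device and are left untouched.
    return obj
```

With PyTorch this could be called as `batch = move_to_device(batch, next(model.parameters()).device)` right before the forward pass. Whether the trainer offers a place to hook this in is not something the log above shows, so patching the input tensors inside `get_bert_output` may be needed instead.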
