"Checkpoints will be saved to E:\python_workspaces\text_classification\model
2024-03-11 15:14:38,290 - modelscope - INFO - Text logs will be saved to E:\python_workspaces\text_classification\model
Traceback (most recent call last):
File """", line 1, in
File ""D:\Program Files\Python311\Lib\multiprocessing\spawn.py"", line 120, in spawn_main
exitcode = _main(fd, parent_sentinel)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File ""D:\Program Files\Python311\Lib\multiprocessing\spawn.py"", line 129, in _main
prepare(preparation_data)
File ""D:\Program Files\Python311\Lib\multiprocessing\spawn.py"", line 240, in prepare
_fixup_main_from_path(data['init_main_from_path'])
File ""D:\Program Files\Python311\Lib\multiprocessing\spawn.py"", line 291, in _fixup_main_from_path
main_content = runpy.run_path(main_path,
^^^^^^^^^^^^^^^^^^^^^^^^^
File """", line 291, in run_path
File """", line 98, in _run_module_code
File """", line 88, in _run_code
File ""e:\python_workspaces\text_classification\src\model\train.py"", line 67, in
train_test(
File ""e:\python_workspaces\text_classification\src\model\train.py"", line 63, in train_test
trainer.train()
File ""D:\Program Files\Python311\Lib\site-packages\modelscope\trainers\trainer.py"", line 711, in train
self.train_loop(self.train_dataloader)
File ""D:\Program Files\Python311\Lib\site-packages\modelscope\trainers\trainer.py"", line 1225, in train_loop
for i, data_batch in enumerate(data_loader):
^^^^^^^^^^^^^^^^^^^^^^
File ""C:\Users\Administrator\AppData\Roaming\Python\Python311\site-packages\torch\utils\data\dataloader.py"", line 439, in iter
return self._get_iterator()
^^^^^^^^^^^^^^^^^^^^
File ""C:\Users\Administrator\AppData\Roaming\Python\Python311\site-packages\torch\utils\data\dataloader.py"", line 387, in _get_iterator
return _MultiProcessingDataLoaderIter(self)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File ""C:\Users\Administrator\AppData\Roaming\Python\Python311\site-packages\torch\utils\data\dataloader.py"", line 1040, in init
w.start()
File ""D:\Program Files\Python311\Lib\multiprocessing\process.py"", line 121, in start
self._popen = self._Popen(self)
^^^^^^^^^^^^^^^^^
File ""D:\Program Files\Python311\Lib\multiprocessing\context.py"", line 224, in _Popen
return _default_context.get_context().Process._Popen(process_obj)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File ""D:\Program Files\Python311\Lib\multiprocessing\context.py"", line 336, in _Popen
return Popen(process_obj)
^^^^^^^^^^^^^^^^^^
File ""D:\Program Files\Python311\Lib\multiprocessing\popen_spawn_win32.py"", line 45, in init
prep_data = spawn.get_preparation_data(process_obj._name)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File ""D:\Program Files\Python311\Lib\multiprocessing\spawn.py"", line 158, in get_preparation_data
_check_not_importing_main()
File ""D:\Program Files\Python311\Lib\multiprocessing\spawn.py"", line 138, in _check_not_importing_main
raise RuntimeError('''
RuntimeError:
An attempt has been made to start a new process before the
current process has finished its bootstrapping phase.
This probably means that you are not using fork to start your
child processes and you have forgotten to use the proper idiom
in the main module:
if __name__ == '__main__':
freeze_support()
...
The ""freeze_support()"" line can be omitted if the program
is not going to be frozen to produce an executable.
Could someone help me figure out what this problem is? It comes up when training with ModelScope. How do I fix it?
Text classification model: iic/nlp_mt5_zero-shot-augment_chinese-base
Based on the error message you provided, the core issue is RuntimeError: An attempt has been made to start a new process before the current process has finished its bootstrapping phase. This typically happens when using the multiprocessing module, especially when running Python scripts on Windows.
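For intuition, here is a minimal sketch (my own illustration, not taken from your script) that triggers the same RuntimeError on Windows:

# repro.py - on Windows, the spawn start method re-imports this file in the
# child process; the re-import reaches p.start() again during import and
# raises the bootstrapping RuntimeError shown in your traceback.
import multiprocessing

def worker():
    print('hello from the child process')

p = multiprocessing.Process(target=worker)
p.start()  # unguarded module-level start: re-executed when the child imports repro.py
p.join()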
Here is a detailed analysis of the problem and how to solve it:
Difference in process start methods: on Windows, multiprocessing uses the spawn start method rather than the fork method used on Linux. spawn launches a child process by re-importing the main module. The error appears when the main-module entry point is not protected by the standard guard:

if __name__ == '__main__':
    # main program logic
The ModelScope trainer can involve multiprocessing: trainer.train() may internally use multiprocessing to speed up data loading or training. If the main module is not guarded, the error above is triggered. Make sure your main module (the script that contains the train_test() or trainer.train() call) follows this structure:
import multiprocessing

if __name__ == '__main__':
    # Initialize multiprocessing support (only needed for frozen executables)
    multiprocessing.freeze_support()
    # Run the training entry point
    train_test()
The if __name__ == '__main__': guard is the key: it ensures the training code runs only in the main process, so child processes do not re-execute it when they re-import the main module. multiprocessing.freeze_support() is optional, but in some cases (for example, when the script is packaged into an executable) it avoids further problems.

According to the stack trace, the failure happens in the _MultiProcessingDataLoaderIter of torch.utils.data.dataloader. This indicates that the data loader has multi-process loading enabled (num_workers > 0). You can try the following adjustments:
Lower the num_workers parameter: when creating the DataLoader, set num_workers to 0 to disable multi-process data loading:
from torch.utils.data import DataLoader

# num_workers=0 loads batches in the main process, so no workers are spawned
data_loader = DataLoader(dataset, batch_size=32, num_workers=0)
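With the ModelScope trainer you usually do not construct the DataLoader yourself; the worker count comes from the model's configuration. As a sketch, assuming your model's configuration.json follows the common train.dataloader.workers_per_gpu layout (check your own config), you can zero it out through a cfg_modify_fn:

def cfg_modify_fn(cfg):
    # Equivalent to num_workers=0: batches are loaded in the main process
    cfg.train.dataloader.workers_per_gpu = 0
    return cfg

# Pass it alongside your existing trainer arguments, for example:
# trainer = build_trainer(default_args=dict(model=..., train_dataset=...,
#                                           cfg_modify_fn=cfg_modify_fn))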
Verify the dataset path and format.

Check dependency versions: make sure the libraries you use are compatible with each other, especially torch and modelscope. You can update them with:

pip install --upgrade torch modelscope
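If you first want to see which versions are currently installed (a quick check, assuming both packages import cleanly and expose __version__):

python -c "import torch, modelscope; print(torch.__version__, modelscope.__version__)"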
If the problem persists, add more logging to narrow it down. For example, print key variables and state inside the train_test() function:
def train_test():
    print("Starting training...")
    trainer.train()
    print("Training completed.")
With the steps above, you should be able to resolve this error caused by the process start method. The key points:
1. Guard the main-module entry point so the training code runs only in the main process.
2. Adjust the data loader configuration to avoid multiprocessing conflicts.
3. Update dependencies to ensure version compatibility.
If the problem is still not resolved, please provide more context (for example, the full implementation of your train_test() function) for further analysis. You can also share a screenshot of the page with more details, and I can help dig into the cause.