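In notebook GPU mode, the first run of the example code ("介绍清华大学", i.e. "introduce Tsinghua University") returned successfully. I then wrote a Python script to generate articles in batch, and it failed right away with an error saying the GPU was out of memory: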
```
Traceback (most recent call last):
  File "glm.py", line 63, in <module>
    main()
  File "glm.py", line 60, in main
    generate_and_save_articles(model, input_file, output_dir)
  File "glm.py", line 23, in generate_and_save_articles
    article = generate_article(model, keyword)
  File "glm.py", line 9, in generate_article
    result = pipe(inputs)
  File "/opt/conda/lib/python3.8/site-packages/modelscope/pipelines/base.py", line 219, in __call__
    output = self._process_single(input, *args, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/modelscope/pipelines/base.py", line 254, in _process_single
    out = self.forward(out, **forward_params)
  File "/opt/conda/lib/python3.8/site-packages/modelscope/pipelines/nlp/text_generation_pipeline.py", line 274, in forward
    return self.model.chat(inputs, self.tokenizer)
  File "/opt/conda/lib/python3.8/site-packages/modelscope/models/nlp/chatglm2/text_generation.py", line 1432, in chat
    response, history = self._chat(
  File "/opt/conda/lib/python3.8/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/modelscope/models/nlp/chatglm2/text_generation.py", line 1204, in _chat
    outputs = self.generate(**inputs, **gen_kwargs)
  File "/opt/conda/lib/python3.8/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/transformers/generation/utils.py", line 1572, in generate
    return self.sample(
  File "/opt/conda/lib/python3.8/site-packages/transformers/generation/utils.py", line 2619, in sample
    outputs = self(
  File "/opt/conda/lib/python3.8/site-packages/modelscope/models/base/base_torch_model.py", line 36, in __call__
    return self.postprocess(self.forward(*args, **kwargs))
  File "/opt/conda/lib/python3.8/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/modelscope/models/nlp/chatglm2/text_generation.py", line 1094, in forward
    lm_logits = self.transformer.output_layer(hidden_states)
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/accelerate/hooks.py", line 160, in new_forward
    args, kwargs = module._hf_hook.pre_forward(module, *args, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/accelerate/hooks.py", line 286, in pre_forward
    set_module_tensor_to_device(
  File "/opt/conda/lib/python3.8/site-packages/accelerate/utils/modeling.py", line 298, in set_module_tensor_to_device
    new_value = value.to(device)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 508.00 MiB (GPU 0; 15.90 GiB total capacity; 2.04 GiB already allocated; 494.81 MiB free; 2.05 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
```
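The numbers in that last line are worth a closer look. Of the 15.90 GiB on the card, PyTorch in this process has reserved only 2.05 GiB and 494.81 MiB is free, so roughly 13.4 GiB is in use outside this process's PyTorch allocator (that figure also includes the CUDA context itself, usually a few hundred MiB). One plausible explanation, not confirmed by the log, is that the notebook kernel from the first run still holds a copy of the model. The `accelerate` frames (`pre_forward` calling `set_module_tensor_to_device`) also indicate that weights are being moved onto the GPU during the forward pass, which is what happens when part of the model, here `output_layer`, was offloaded off the GPU at load time. The sketch below just does that arithmetic on the message; `parse_oom` and `held_outside_pytorch` are hypothetical helper names.

```python
import re

# Message copied from the traceback above (the trailing advice text is omitted).
OOM_MESSAGE = (
    "CUDA out of memory. Tried to allocate 508.00 MiB (GPU 0; 15.90 GiB total capacity; "
    "2.04 GiB already allocated; 494.81 MiB free; 2.05 GiB reserved in total by PyTorch)"
)


def parse_oom(message: str) -> dict:
    """Extract the memory figures from a PyTorch OOM message, converted to GiB."""

    def grab(regex: str) -> float:
        match = re.search(regex, message)
        if match is None:
            raise ValueError(f"pattern not found: {regex!r}")
        value, unit = float(match.group(1)), match.group(2)
        return value / 1024 if unit == "MiB" else value

    return {
        "requested": grab(r"Tried to allocate ([\d.]+) (MiB|GiB)"),
        "total": grab(r"([\d.]+) (MiB|GiB) total capacity"),
        "allocated": grab(r"([\d.]+) (MiB|GiB) already allocated"),
        "free": grab(r"([\d.]+) (MiB|GiB) free"),
        "reserved": grab(r"([\d.]+) (MiB|GiB) reserved in total by PyTorch"),
    }


def held_outside_pytorch(info: dict) -> float:
    """GiB in use that this process's PyTorch allocator does not account for."""
    return info["total"] - info["reserved"] - info["free"]


info = parse_oom(OOM_MESSAGE)
print(f"requested {info['requested']:.2f} GiB, free {info['free']:.2f} GiB")
print(f"held outside this process's PyTorch allocator: {held_outside_pytorch(info):.2f} GiB")
```

Running `nvidia-smi` in a terminal lists the processes that actually hold GPU memory, which is a quick way to check whether an idle notebook kernel is one of them.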
chatglm2-6b is a fairly large model and needs a lot of GPU memory. In notebook GPU mode you may not be able to run chatglm2-6b.
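A back-of-the-envelope estimate makes "fairly large" concrete. The sketch assumes roughly 6.2 billion parameters for chatglm2-6b (an approximate figure, not taken from this thread) and counts only the weights, ignoring activations and the KV cache. In fp16 the weights alone come to about 11.5 GiB, so the 15.90 GiB card in the traceback has room for one copy of the model but not two.

```python
# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1, "int4": 0.5}

# Assumption: approximate parameter count of chatglm2-6b.
APPROX_PARAMS = 6.2e9


def weight_memory_gib(num_params: float, dtype: str) -> float:
    """GiB needed just to hold the weights (no activations, no KV cache)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1024**3


for dtype in ("fp32", "fp16", "int8", "int4"):
    print(f"{dtype}: {weight_memory_gib(APPROX_PARAMS, dtype):.1f} GiB")
```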
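You can try the following:
- Use a GPU with more memory.
- Run chatglm2-6b in a local environment.
- Use fewer epochs (this applies only to training or fine-tuning, not to the inference run in the script above).
- Use a smaller batch size.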
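For a chat pipeline like this one, the smallest possible batch is one prompt per call. Since `glm.py` itself is not shown, the sketch below is not a reconstruction of it; it only reuses the function names visible in the traceback to show one way to structure the loop. It assumes the pipeline is created once outside the loop and accepts `{"text": ..., "history": [...]}` and returns a dict with a `"response"` key (an assumption based on the usual ModelScope chat example); the prompt text, output file names, and the `after_each` hook are hypothetical.

```python
from pathlib import Path
from typing import Callable, List, Optional

# Assumption: `pipe` is a chat pipeline built ONCE, outside this code, that takes
# {"text": str, "history": list} and returns a dict with a "response" key.
ChatPipe = Callable[[dict], dict]


def generate_article(pipe: ChatPipe, keyword: str) -> str:
    # Hypothetical prompt. Every keyword starts from an empty history so the
    # context does not keep growing from one article to the next.
    result = pipe({"text": f"Write an article about {keyword}", "history": []})
    return result["response"]


def generate_and_save_articles(
    pipe: ChatPipe,
    input_file: str,
    output_dir: str,
    after_each: Optional[Callable[[], None]] = None,
) -> List[Path]:
    """Generate one article per non-empty line of input_file, one prompt per call."""
    lines = Path(input_file).read_text(encoding="utf-8").splitlines()
    keywords = [line.strip() for line in lines if line.strip()]
    out_dir = Path(output_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    written = []
    for index, keyword in enumerate(keywords):
        article = generate_article(pipe, keyword)
        path = out_dir / f"{index:04d}.txt"  # hypothetical naming scheme
        path.write_text(article, encoding="utf-8")
        written.append(path)
        if after_each is not None:
            after_each()  # optional cleanup hook called after every article
    return written
```

On a GPU machine you could pass `after_each=torch.cuda.empty_cache`, which returns cached but unused blocks to the driver. It cannot release memory held by a different process, such as a notebook kernel that still has the model loaded.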
If you still cannot run chatglm2-6b, you can try asking the ModelScope community for help.
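ModelScope aims to build the next-generation open-source model-as-a-service sharing platform, offering AI developers a flexible, easy-to-use, low-cost, one-stop model service that makes applying models simpler! You are welcome to join the technical discussion group: WeChat official account: 魔搭ModelScope社区; DingTalk group number: 44837352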