"ModelScope中,有10张v100的卡,其中前两张被占用,我只能用后面8张卡。我写了代码# 设置 CUDA_VISIBLE_DEVICES 环境变量
os.environ['CUDA_VISIBLE_DEVICES'] = '2,3,4,5,6,7,8,9'
server_process = subprocess.Popen([
'python', '-m', 'vllm.entrypoints.openai.api_server',
'--model', './qwen/Qwen2-72B-Instruct',
'--dtype=half',
'--tensor-parallel-size=8'
]) 但是报错,请问咋解决? torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 232.00 MiB. GPU 0 has a total capacty of 31.74 GiB of which 215.12 MiB is free. Including non-PyTorch memory, this process has 30.24 GiB memory in use. Process 269595 has 436.00 MiB memory in use. Process 269513 has 436.00 MiB memory in use. Process 269211 has 436.00 MiB memory in use. Of the allocated memory 29.78 GiB is allocated by PyTorch, and 13.43 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF "
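As a rough sanity check (a back-of-envelope estimate, not part of the original answer; the parameter count is an assumption rather than a figure from the question), Qwen2-72B in half precision is tight but should fit across 8 x 32 GiB V100s:

# Estimate per-GPU weight memory for Qwen2-72B under --dtype=half
params = 72e9                     # assumed: ~72B parameters
bytes_per_param = 2               # fp16: 2 bytes per weight
weights_gib = params * bytes_per_param / 1024**3   # ~134 GiB of weights in total
per_gpu_gib = weights_gib / 8     # ~16.8 GiB per card with --tensor-parallel-size=8
print(round(per_gpu_gib, 1))      # leaves ~15 GiB per 32 GiB V100 for KV cache and activations

vLLM also preallocates KV-cache memory up to its gpu-memory-utilization fraction, so memory already held by other processes on a card the server can see (as the "215.12 MiB is free" line in the error suggests) can push a GPU over the limit.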
Refer to the following link:
https://github.com/modelscope/swift/blob/main/docs/source/LLM/VLLM%E6%8E%A8%E7%90%86%E5%8A%A0%E9%80%9F%E4%B8%8E%E9%83%A8%E7%BD%B2.md
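In addition, a minimal launch sketch (an illustration on top of the linked doc, not quoted from it): make the GPU restriction explicit by passing env= to subprocess.Popen, and use vLLM's standard --gpu-memory-utilization and --max-model-len flags to shrink the preallocated KV cache if the server still runs out of memory. The specific values here are assumptions:

import os
import subprocess

# Copy the parent environment and pin the visible GPUs explicitly;
# Popen inherits os.environ by default, but passing env= removes any
# doubt about what the vLLM server process will see.
env = os.environ.copy()
env['CUDA_VISIBLE_DEVICES'] = '2,3,4,5,6,7,8,9'

server_process = subprocess.Popen(
    [
        'python', '-m', 'vllm.entrypoints.openai.api_server',
        '--model', './qwen/Qwen2-72B-Instruct',
        '--dtype=half',
        '--tensor-parallel-size=8',
        '--gpu-memory-utilization=0.9',  # illustrative: cap vLLM's share of each GPU
        '--max-model-len=4096',          # illustrative: shorter context -> smaller KV cache
    ],
    env=env,
)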
This answer was compiled from the DingTalk group "ModelScope Developer Alliance Group ①".