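Could someone help me with the configuration? Starting the 72B demo fails with an out-of-memory error. My setup is two A5000 cards with 48 GB of GPU memory in total, and the model I downloaded is the int4 version.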
Error message: torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 2.32 GiB. GPU 1 has a total capacty of 23.68 GiB of which 2.25 GiB is free. Including non-PyTorch memory, this process has 21.42 GiB memory in use. Of the allocated memory 21.13 GiB is allocated by PyTorch, and 98.39 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
From your description, the 72B demo fails to start because there is not enough GPU memory. You can try the following:
Reduce the batch size through the batch_size parameter. For example:
train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=32, shuffle=True)
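Use gradient accumulation, so that a smaller per-step batch still gives the same effective batch size:
accumulation_steps = 4
optimizer.zero_grad()
for i, (inputs, labels) in enumerate(train_loader):
    outputs = model(inputs)
    # Scale the loss so the accumulated gradient matches one larger batch
    loss = criterion(outputs, labels) / accumulation_steps
    loss.backward()
    # Update the weights only once every accumulation_steps mini-batches
    if (i + 1) % accumulation_steps == 0:
        optimizer.step()
        optimizer.zero_grad()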
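To show why the accumulation loop above can stand in for one larger batch, here is a minimal pure-Python sketch. It assumes a toy one-weight model with a mean squared error loss. The data values and the helper names mse_grad and accumulated_grad are made up for illustration and do not come from the demo. It checks that summing micro-batch gradients, each divided by the number of accumulation steps, reproduces the full-batch gradient.

```python
# Sketch only: toy model y_hat = w * x with mean squared error.
# All data and function names are illustrative assumptions.

def mse_grad(w, xs, ys):
    """Gradient of mean((w * x - y) ** 2) with respect to w."""
    n = len(xs)
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / n

def accumulated_grad(w, xs, ys, accumulation_steps):
    """Split the batch into equal micro-batches and sum their scaled gradients."""
    if len(xs) % accumulation_steps != 0:
        raise ValueError("batch must split evenly into micro-batches")
    micro = len(xs) // accumulation_steps
    total = 0.0
    for step in range(accumulation_steps):
        lo, hi = step * micro, (step + 1) * micro
        # Dividing by accumulation_steps plays the role of scaling the loss
        total += mse_grad(w, xs[lo:hi], ys[lo:hi]) / accumulation_steps
    return total

xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [2.0, 4.1, 5.9, 8.2, 9.8, 12.1, 14.0, 15.9]
w = 0.5

full = mse_grad(w, xs, ys)
accum = accumulated_grad(w, xs, ys, accumulation_steps=4)
print("gradients match:", abs(full - accum) < 1e-9)
```

Only one micro-batch has to be held in memory at a time, which is where the saving comes from. Note that this helps during training. It does not shrink the memory taken by the model weights themselves.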
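Check whether other processes are using a large amount of GPU memory. You can run the nvidia-smi command to see memory usage on each GPU. If other processes are holding a lot of memory, consider stopping them to free it up for this job.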
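If you want to do the nvidia-smi check from a script, a sketch along these lines may help. It relies on the nvidia-smi options --query-gpu and --format=csv,noheader,nounits. The helper names parse_gpu_memory and query_gpu_memory are hypothetical, and the sample text is made-up illustrative data, not output from the machine in the question.

```python
import subprocess

# nvidia-smi query that prints one "index, used, total" line per GPU (MiB)
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,memory.used,memory.total",
    "--format=csv,noheader,nounits",
]

def parse_gpu_memory(csv_text):
    """Parse 'index, used, total' lines (values in MiB) into a list of dicts."""
    gpus = []
    for line in csv_text.strip().splitlines():
        index, used, total = (int(field.strip()) for field in line.split(","))
        gpus.append({
            "index": index,
            "used_mib": used,
            "total_mib": total,
            "free_mib": total - used,
        })
    return gpus

def query_gpu_memory():
    """Run nvidia-smi and parse its output (needs an NVIDIA driver installed)."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_gpu_memory(result.stdout)

# Made-up sample so the parser can be tried without a GPU
sample = "0, 1024, 24564\n1, 21934, 24564\n"
for gpu in parse_gpu_memory(sample):
    print(f"GPU {gpu['index']}: {gpu['free_mib']} MiB free of {gpu['total_mib']} MiB")
```

On a machine with the driver installed, call query_gpu_memory() before launching the demo. A GPU that is already mostly full before the model loads points to another process holding the memory.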
If none of the above solves the problem, consider using GPUs with more memory or upgrading your hardware.
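Before deciding on new hardware, a rough back-of-the-envelope estimate can help. The sketch below assumes "72B" means about 72 billion parameters and counts the weights only. It ignores the KV cache, activations, quantization metadata and framework overhead, so the real requirement will be higher. The per-GPU capacity of 23.68 GiB is taken from the error message. The function name weight_memory_gib is made up for this example.

```python
GIB = 2 ** 30  # bytes per GiB

def weight_memory_gib(num_params, bits_per_param):
    """Approximate memory for the weights alone, in GiB."""
    return num_params * bits_per_param / 8 / GIB

num_params = 72e9                # assumption: "72B" is about 72 billion parameters
total_capacity_gib = 2 * 23.68   # two GPUs, capacity as reported in the error

int4_gib = weight_memory_gib(num_params, 4)
fp16_gib = weight_memory_gib(num_params, 16)

print(f"int4 weights: about {int4_gib:.1f} GiB")
print(f"fp16 weights: about {fp16_gib:.1f} GiB")
print(f"total capacity: {total_capacity_gib:.2f} GiB")
print(f"headroom after int4 weights: about {total_capacity_gib - int4_gib:.1f} GiB")
```

Under these assumptions the int4 weights fit in the combined memory of the two cards, but the remaining headroom has to cover everything else. The weights also have to be spread across both GPUs, because neither card can hold them alone.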