RuntimeError: Address already in use

Summary: RuntimeError: Address already in use

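Problem: When training on multiple GPUs with PyTorch, the run fails with "Address already in use". The real cause is a port conflict: the port requested for the distributed process group is still held by another process.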




So there are two fixes: either kill the process that is still holding the port, or change the port number.
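Before choosing between the two fixes, it can help to confirm that the port really is taken. The following is a minimal sketch using only Python's standard `socket` module; the helper name `port_in_use` is hypothetical and not part of PyTorch, and the check assumes the master address is the local machine.

```python
import socket


def port_in_use(port, host="127.0.0.1"):
    """Return True if some process is already listening on host:port.

    Hypothetical helper for illustration; not part of PyTorch.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(1.0)
        # connect_ex returns 0 when the connection succeeds,
        # which means a listener already owns the port.
        return s.connect_ex((host, port)) == 0


if __name__ == "__main__":
    # 10000 is the master_port used in this article's example.
    print("port 10000 in use:", port_in_use(10000))
```

If the port is reported as in use, the owner is often a leftover training process from an earlier run, and killing it frees the port.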


Reconfigure the port in code


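torch.distributed.init_process_group()
    import torch

    # Change master_port to an unused port whenever the old one is still occupied.
    dist_init_method = 'tcp://{master_ip}:{master_port}'.format(master_ip='127.0.0.1', master_port='10000')
    dist_world_size = opt.world_size    # total number of distributed processes
    # rank must be a single integer unique to this process (0 .. world_size - 1), not a list.
    # opt.rank is an assumed attribute here; supply it however your launcher provides the rank.
    torch.distributed.init_process_group(backend="nccl", init_method=dist_init_method, world_size=dist_world_size, rank=opt.rank)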


Whenever the error appears again, just change master_port to a different value.
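Instead of editing master_port by hand each time, one option is to ask the operating system for a free port. This is only a sketch under stated assumptions: `find_free_port` and `make_init_method` are hypothetical helper names, and the port must be chosen once (for example in the launcher) and passed to every process, since all ranks have to use the same address. There is also a small window in which another program could grab the port between the check and the actual use.

```python
import socket


def find_free_port(host="127.0.0.1"):
    """Ask the OS for a currently unused TCP port (hypothetical helper)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        # Binding to port 0 lets the OS pick any free port.
        s.bind((host, 0))
        return s.getsockname()[1]


def make_init_method(master_ip="127.0.0.1", master_port=None):
    """Build the tcp:// init_method string used in the article."""
    if master_port is None:
        master_port = find_free_port(master_ip)
    return "tcp://{}:{}".format(master_ip, master_port)


if __name__ == "__main__":
    # Run this once, then hand the same string to all worker processes.
    print(make_init_method())
```

The resulting string can be passed as `init_method` to `torch.distributed.init_process_group` in place of the hard-coded port.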
