RuntimeError: Address already in use


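Problem: When training with multiple GPUs in PyTorch, the error "Address already in use" is raised. The real cause is a port conflict: the port the distributed processes use to rendezvous is already bound by another process, often an earlier training run that has not exited.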




So there are two fixes: either kill the process that is holding the port, or switch to a different port.
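Before choosing between the two options, it can help to confirm that the port really is taken. The following is a minimal sketch using only the Python standard library; the helper name `is_port_in_use` is hypothetical (it is not part of PyTorch), and port 10000 is used only because it matches the example in this post.

```python
import socket


def is_port_in_use(port, host="127.0.0.1"):
    """Return True if binding to (host, port) fails, i.e. the port is taken."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            # Binding failed, most likely with "Address already in use".
            return True
        return False


if __name__ == "__main__":
    # Port 10000 is an assumption taken from the example in this post.
    print(is_port_in_use(10000))
```

If this reports that the port is taken, either stop the process that owns it or pick another port.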


Reconfiguring the port in code


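torch.distributed.init_process_group()
    import torch.distributed

    # Change master_port here whenever the current port is already in use.
    dist_init_method = 'tcp://{master_ip}:{master_port}'.format(master_ip='127.0.0.1', master_port='10000')
    dist_world_size = opt.world_size    # total number of distributed processes
    # rank must be a single integer for this process (0 .. world_size - 1), not a list such as [0, 1]; opt.rank is assumed to hold it.
    torch.distributed.init_process_group(backend="nccl", init_method=dist_init_method, world_size=dist_world_size, rank=opt.rank)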


Whenever the error comes back, just change master_port to a port that is free.
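Rather than editing the port by hand each time, one option is to let the operating system pick a free one. The sketch below assumes the port is chosen once, in the launching process, and then passed to every worker, since all ranks must use the same master_port. `find_free_port` and `build_init_method` are hypothetical helper names, not PyTorch functions.

```python
import socket


def find_free_port(host="127.0.0.1"):
    """Ask the OS for a currently unused TCP port by binding to port 0."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind((host, 0))
        return sock.getsockname()[1]


def build_init_method(master_ip, master_port):
    """Build the tcp:// rendezvous URL in the same format used in this post."""
    return 'tcp://{master_ip}:{master_port}'.format(
        master_ip=master_ip, master_port=master_port)


if __name__ == "__main__":
    # Pick the port once in the launcher, then hand it to every worker.
    port = find_free_port()
    print(build_init_method('127.0.0.1', port))
```

The port is released before training starts, so another program could in principle grab it in between. In practice this window is short, but it is worth knowing about if the error ever reappears.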
