RuntimeError: Address already in use

Summary: RuntimeError: Address already in use

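Problem: when training with multiple GPUs in PyTorch, the run fails with an "Address already in use" error. The actual cause is a port conflict: the port chosen for distributed initialization is already held by another process.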




So there are two fixes: either kill the process that is still holding the port, or change the port number.
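Before choosing between the two fixes, it can help to confirm that the port really is occupied. The following is a minimal sketch using only the Python standard library; the helper name `port_in_use` is an assumption made for illustration, not part of PyTorch.

```python
import socket


def port_in_use(port, host="127.0.0.1"):
    """Return True if some process is already listening on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        # connect_ex returns 0 when the connection succeeds,
        # which means something is listening on that port.
        return s.connect_ex((host, port)) == 0


if __name__ == "__main__":
    # 10000 is the master_port used in this article.
    print(port_in_use(10000))
```

If this reports that the port is occupied, either stop the stale training process or move to another port.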


Reconfigure the port in code


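torch.distributed.init_process_group()
    import torch

    # Change master_port here whenever the previous port is still occupied.
    dist_init_method = 'tcp://{master_ip}:{master_port}'.format(master_ip='127.0.0.1', master_port='10000')
    dist_world_size = opt.world_size    # total number of distributed processes
    # rank must be a single integer unique to each process (0 .. world_size - 1), not a list.
    # opt.rank is assumed here to hold this process's rank.
    torch.distributed.init_process_group(backend="nccl", init_method=dist_init_method, world_size=dist_world_size, rank=opt.rank)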


Whenever the error appears again, just change master_port to a port that is not in use.
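Instead of editing master_port by hand each time, the port can be picked automatically. This is only a sketch under stated assumptions: `find_free_port` and `make_init_method` are hypothetical helper names, and the code uses only the standard library. Binding to port 0 asks the operating system for a currently unused port.

```python
import socket


def find_free_port(host="127.0.0.1"):
    """Ask the OS for an unused TCP port by binding to port 0."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind((host, 0))
        return s.getsockname()[1]


def make_init_method(master_ip="127.0.0.1", master_port=None):
    """Build the tcp:// init_method string, picking a free port if none is given."""
    if master_port is None:
        master_port = find_free_port(master_ip)
    return "tcp://{}:{}".format(master_ip, master_port)


if __name__ == "__main__":
    print(make_init_method())
```

Every process in the job has to use the same port, so the string should be built once in the launching process and passed to each worker rather than computed separately per worker. There is also a small window between finding the port and using it in which another program could take it.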
