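Preface
Without reading up on mpi4py I will not be able to follow where the sgd and m/s model code was modified, so the first step is to understand those modifications.
Using mpi4py is not easy: it is hard to get hold of several Ubuntu environments, so I have to learn the docker commands.
docker volume create: use docker volume ls to list volumes, and docker volume inspect portainer_data to see the details.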
$ docker volume create portainer_data
$ docker run --name portainer -d -p 8000:8000 -p 9000:9000 -v /var/run/docker.sock:/var/run/docker.sock -v portainer_data:/data portainer/portainer
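# Learning portainer, a docker management tool. I am just that lazy (2333); when a good knife is available, use it. xjtuos12
An Ubuntu container freshly started by docker really is bare; this is going to be a deep rabbit hole.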
apt update; apt install net-tools; ifconfig
apt install python-mpi4py; mkdir mpi4py
cd mpi4py; apt install wget; wget https://bitbucket.org/mpi4py/mpi4py/downloads/mpi4py-1.3.tar.gz; tar -xf *
cd mpi4py-1.3/demo; apt install vim; vim machinefile
172.17.0.2
172.17.0.3
172.17.0.4
172.17.0.5
apt install ssh; /etc/init.d/ssh restart
mpirun.openmpi --allow-run-as-root -np 4 -machinefile ./machinefile python helloworld.py
The IP addresses I ended up with: 172.17.0.2, 172.17.0.3, 172.17.0.4, 172.17.0.5
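A minimal sketch of producing the machinefile lines from the addresses above instead of typing them in vim. The function names are hypothetical, and the assumption that the containers received consecutive addresses starting at 172.17.0.2 only holds for this particular run; check the real addresses with ifconfig.

```python
import ipaddress


def consecutive_hosts(first_ip, count):
    """Return `count` consecutive IPv4 addresses starting at `first_ip`."""
    start = ipaddress.IPv4Address(first_ip)
    return [str(start + offset) for offset in range(count)]


def machinefile_text(hosts):
    """Render one host per line, the layout used by the machinefile above."""
    return "\n".join(hosts) + "\n"


if __name__ == "__main__":
    # Assumption: four containers on the default bridge, numbered from .2
    print(machinefile_text(consecutive_hosts("172.17.0.2", 4)), end="")
```

Redirecting the output into ./machinefile gives the same four-line file used by the mpirun.openmpi command.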
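Ran into a new problem getting docker and mpi to work together: the orwel84 approach to docker plus mpi has failed. The new approach is as follows:
Learning docker-compose: link connects containers, scale sets the number of instances.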
docker container run -it -v ~/usr/mpi4py:/volume --name="worker0" orwel84/ubuntu-16-mpi:latest
172.17.0.10
172.17.0.9
172.17.0.8
172.17.0.7
adduser aibot
su aibot
mpirun.openmpi -np 4 -machinefile ../../machinefile python helloworld.py
The container exposes its SSH server to the host system, so you can log into it to start your MPI applications.
docker exec -it mpi4py_mpi_head_1 /bin/bash
cat /etc/hosts | grep mpi_node | awk '{print $1}'| sort -u > machines && cat ./machines
mpiexec -hostfile machines -n 16 python helloworld.py
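For reference, a rough Python equivalent of the grep / awk / sort -u pipeline above, which may be easier to adapt than the one-liner. The function name and the sample addresses in the usage part are made up for illustration; only the mpi_node marker comes from the command above.

```python
def mpi_node_addresses(hosts_text, marker="mpi_node"):
    """Collect the first field of every line containing `marker`,
    de-duplicated and sorted (mirrors grep | awk '{print $1}' | sort -u)."""
    addresses = set()
    for line in hosts_text.splitlines():
        if marker not in line:
            continue
        fields = line.split()
        if fields:
            addresses.add(fields[0])
    return sorted(addresses)


if __name__ == "__main__":
    # Hypothetical /etc/hosts content; real entries depend on docker-compose.
    sample = (
        "127.0.0.1 localhost\n"
        "172.18.0.3 mpi_node_1\n"
        "172.18.0.2 mpi_node_2\n"
        "172.18.0.3 mpi_node_1.mpi_default\n"
    )
    print("\n".join(mpi_node_addresses(sample)))
```

Inside the head container, the input would be the text of /etc/hosts and the output would be written to the machines file.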
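Network problems keep coming. The current one is: Get https://registry-1.docker.io/v2/library/ubuntu/manifests/18.04: net/http: TLS handshake timeout
Can Windows get through? No. Bringing Windows into this was a mistake. The domestic mirror approach failed; I have little hope for the proxy approach and will try it later.
The pytorch-mpi image still cannot be pulled.
As a double safeguard, download the image on Windows and then push it over to Ubuntu.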
docker container run -it -v ~/usr/mpi4py:/volume --name="worker0" a60d1971920a
mpirun.openmpi --allow-run-as-root -np 4 -machinefile ../../machinefile python helloworld.py
apt update; apt install ssh
cd /etc/apt/; mv sources.list sources.list.backup; cp /volume/sources.list ./
conda install mpi4py
New problems keep coming.
The value of the MCA parameter "plm_rsh_agent" was set to a path
that could not be found:
plm_rsh_agent: ssh : rsh
mpi is stuck at this point and I cannot make any progress.
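As far as I can tell, the message above means Open MPI looked for the launch agents ssh and rsh and found neither on PATH inside the container. A small sketch to check this before calling mpirun; the function name is hypothetical, and the remedy (installing an ssh client, e.g. apt install ssh) is my reading of the error, not something the message itself states.

```python
import shutil


def find_launch_agent(candidates=("ssh", "rsh")):
    """Return the path of the first candidate found on PATH, else None."""
    for name in candidates:
        path = shutil.which(name)
        if path:
            return path
    return None


if __name__ == "__main__":
    agent = find_launch_agent()
    if agent is None:
        print("neither ssh nor rsh is on PATH; install an ssh client first")
    else:
        print("launch agent available:", agent)
```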
docker run -v ~/usr/mpi4py:/volume --name="master" -it dispel4py/docker.openmpi:latest /bin/bash
openmpi does not work either.
docker pull wendo/openmpi: there is nothing in it; neither apt nor python is present.
docker pull wendo/openmpi
docker network create --subnet=192.168.10.0/16 network_my
docker run -it --name node1 -h node1 --net network_my --ip 192.168.10.30 --add-host node2:192.168.10.31 [image id] /bin/bash
docker run -it --name node2 -h node2 --net network_my --ip 192.168.10.31 --add-host node1:192.168.10.30 [image id] /bin/bash
--add-host gives hostname resolution: it adds a hostname-to-IP entry to the container's /etc/hosts.
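With more than two nodes, every container needs an --add-host entry for each of its peers, which gets tedious by hand. A sketch that builds the docker run command lines from a name-to-IP table; the function name is hypothetical, and [image id] is kept as the same placeholder used above.

```python
def docker_run_commands(nodes, network, image):
    """Build one `docker run` command per node, adding an --add-host
    entry for every other node so that peers resolve by name."""
    commands = []
    for name, ip in nodes.items():
        parts = ["docker", "run", "-it", "--name", name, "-h", name,
                 "--net", network, "--ip", ip]
        for peer, peer_ip in nodes.items():
            if peer != name:
                parts += ["--add-host", f"{peer}:{peer_ip}"]
        parts += [image, "/bin/bash"]
        commands.append(" ".join(parts))
    return commands


if __name__ == "__main__":
    nodes = {"node1": "192.168.10.30", "node2": "192.168.10.31"}
    for command in docker_run_commands(nodes, "network_my", "[image id]"):
        print(command)
```

For the two nodes above this prints the same two commands; adding a third entry to the table extends every command with the extra peer.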
apt update
apt install openssh-server
service ssh restart
/etc/init.d/ssh start
nano /etc/ssh/sshd_config
Change PermitRootLogin prohibit-password to PermitRootLogin yes
/etc/init.d/ssh restart
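The sshd_config edit above has to be repeated in every container, so it may be worth scripting. A sketch that rewrites the configuration text; the function name is hypothetical, and it assumes the file holds a single (possibly commented-out) PermitRootLogin line, as in the stock Ubuntu file.

```python
import re


def enable_root_login(config_text):
    """Return the config text with PermitRootLogin set to yes.

    The first PermitRootLogin line (commented out or not) is replaced;
    if there is none, the setting is appended at the end.
    """
    pattern = re.compile(r"^[ \t]*#?[ \t]*PermitRootLogin\b.*$", re.MULTILINE)
    new_line = "PermitRootLogin yes"
    if pattern.search(config_text):
        return pattern.sub(new_line, config_text, count=1)
    return config_text.rstrip("\n") + "\n" + new_line + "\n"


if __name__ == "__main__":
    sample = "Port 22\n#PermitRootLogin prohibit-password\nX11Forwarding yes\n"
    print(enable_root_login(sample), end="")
```

The result would be written back to /etc/ssh/sshd_config, followed by the ssh restart shown above.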
ssh-keygen -t rsa
Keep pressing Enter; do not set a passphrase.
node1->node2: copy the public key file
scp ~/.ssh/id_rsa.pub root@node2:~/.ssh/1.pub
Write the keys into authorized_keys
cp ~/.ssh/id_rsa.pub ~/.ssh/authorized_keys
cat ~/.ssh/1.pub >> ~/.ssh/authorized_keys
node2->node1 authorized_keys
scp ~/.ssh/authorized_keys root@node1:~/.ssh/
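The cp / cat steps above amount to concatenating the public keys of both nodes into one authorized_keys file. A sketch of the same merge that also skips duplicates and blank lines, which helps when the steps are re-run; the function name and the shortened key strings are placeholders, not real keys.

```python
def merge_authorized_keys(*key_texts):
    """Concatenate public-key files, dropping blank lines, comments
    and exact duplicates while keeping the original order."""
    seen = set()
    merged = []
    for text in key_texts:
        for line in text.splitlines():
            key = line.strip()
            if not key or key.startswith("#") or key in seen:
                continue
            seen.add(key)
            merged.append(key)
    return "\n".join(merged) + ("\n" if merged else "")


if __name__ == "__main__":
    # Placeholder key material for illustration only.
    node2_key = "ssh-rsa AAAA2 root@node2\n"
    node1_key = "ssh-rsa AAAA1 root@node1\n"
    print(merge_authorized_keys(node2_key, node1_key, node1_key), end="")
```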
docker commit -m "description" -a "author information" [container id] name:tag
node1:2
node2:2
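Assuming the two lines above are the contents of the host file passed to mpiexec -f (the host:count layout, where count is the number of processes to place on that host), a small sketch that parses such a file and totals the available slots. The function name is hypothetical.

```python
def parse_hostfile(text):
    """Parse lines of the form `host:count` into {host: count}.

    A line without `:count` is treated as one slot; text after `#`
    is ignored. Assumption: this matches the host file layout above.
    """
    slots = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        host, _, count = line.partition(":")
        slots[host.strip()] = int(count) if count.strip() else 1
    return slots


if __name__ == "__main__":
    slots = parse_hostfile("node1:2\nnode2:2\n")
    print(slots, "total:", sum(slots.values()))
```

The total is a useful sanity check against the value given to -np.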
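Running mpi between docker containers is real torture. It feels like I went around in a big circle, when the root cause was simply that the permit_xx option in the ssh configuration (PermitRootLogin, see above) had not been enabled. On top of that, a newly created user cannot access all of the installed software, so problems piled up. Still, these are all software problems, which are comparatively easy to solve hands-on; apart from hair and brain cells I have lost nothing.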
no send receive error
HYDU_sock_write (utils/sock/sock.c:256): write error (Bad file descriptor)
docker network create --subnet=192.168.10.0/16 network_my
docker run -v ~/usr/mpi4py:/workspace -it --name node3 -h node3 --net network_my --ip 192.168.10.10 --add-host node4:192.168.10.11 a60d1971920a /bin/bash
mv /etc/apt/sources.list /etc/apt/sources.list.back; cp sources.list /etc/apt/; ls /etc/apt/
apt update
apt install openssh-server vim ; vim /etc/ssh/sshd_config
# Change PermitRootLogin prohibit-password to PermitRootLogin yes
/etc/init.d/ssh start
docker run -v ~/usr/mpi4py:/workspace -it --name node4 -h node4 --net network_my --ip 192.168.10.11 --add-host node3:192.168.10.10 a60d1971920a /bin/bash
mv /etc/apt/sources.list /etc/apt/sources.list.back; cp sources.list /etc/apt/; ls /etc/apt/
apt update
apt install openssh-server vim ; vim /etc/ssh/sshd_config
# Change PermitRootLogin prohibit-password to PermitRootLogin yes
/etc/init.d/ssh start
ssh-keygen -t rsa
vim ~/.ssh/authorized_keys
cat ~/.ssh/*.pub
conda install mpi4py
cd mpi4py-1.3/demo/
mpiexec -f ../../testhost -np 2 python hello.py
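The contents of hello.py are not shown in these notes; the following is only a sketch of what a minimal mpi4py hello-world of this kind typically looks like, useful for checking that each rank reports the expected host. The greeting function and the single-process fallback when mpi4py is missing are my own additions.

```python
import socket


def greeting(rank, size, host):
    """Format the line each MPI process prints."""
    return f"Hello, World! I am process {rank} of {size} on {host}."


def main():
    try:
        from mpi4py import MPI
    except ImportError:
        # Fallback so the script still runs where mpi4py is not installed.
        print(greeting(0, 1, socket.gethostname()))
        return
    comm = MPI.COMM_WORLD
    print(greeting(comm.Get_rank(), comm.Get_size(), MPI.Get_processor_name()))


if __name__ == "__main__":
    main()
```

Launched through the mpiexec line above with -np 2, each of the two processes prints its own rank; if both lines show the same host name, the host file is probably not being picked up.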