关于vagrant up的一个终结之谜(Markdown版本)-阿里云开发者社区

背景

使用vagrant管理的机器，在使用vagrant命令的过程中总有一些问题. vagrant up 有时候会拉不起机器。虽然公司的很多闲置机器没有充分利用，但是鉴于对目前的这个问题应该还是要很谨慎的上线核心项目。对虚拟化的技术还需要加强。

此外，对于开源的使用有必要时常关心release log, 看看你的问题是否被修复，可以选择升级来解决。

综述，开源需谨慎，PLAN B 一定要。辨证的来看，流行的开源软件一般都有人维护，大家齐心协力修bug，一般不会有太fatal的bug, 但是不排除所以核心业务一定要有背书，否则你力荐推广，背锅的一定是你. 这时又得辩证的看待这个问题，成功与失败就在一瞬间，如果你力荐推广一个开源技术，然后投入大量经历去研究，当有一定深度(看懂源代码、可以修改源代码、可以开发新feature …) 就可以cover掉开源的劣势。那你就成为大牛啦.

vagrant up 无法连接机器

根据提示，vagrant 好像无法管理这个机器了，无法ssh也ping不通。但是statu 却显示running，感觉好像是vagrant的bug. 这里有个同样的问题

下面是详细的log

/gpmaster/gpseg-1_down/k8s/k8s-node2]#vagrant reload
==> node: Attempting graceful shutdown of VM...
    node: Guest communication could not be established! This is usually because
    node: SSH is not running, the authentication information was changed,
    node: or some other networking issue. Vagrant will force halt, if
    node: capable.
==> node: Forcing shutdown of VM...
==> node: Clearing any previously set forwarded ports...
==> node: Fixed port collision for 22 => 2222. Now on port 2209.
==> node: Clearing any previously set network interfaces...
==> node: Preparing network interfaces based on configuration...
    node: Adapter 1: nat
==> node: Forwarding ports...
    node: 22 (guest) => 2209 (host) (adapter 1)
==> node: Running 'pre-boot' VM customizations...
==> node: Booting VM...
==> node: Waiting for machine to boot. This may take a few minutes...
    node: SSH address: 127.0.0.1:2209
    node: SSH username: vagrant
    node: SSH auth method: private key
Timed out while waiting for the machine to boot. This means that
Vagrant was unable to communicate with the guest machine within
the configured ("config.vm.boot_timeout" value) time period.
If you look above, you should be able to see the error(s) that
Vagrant had when attempting to connect to the machine. These errors
are usually good hints as to what may be wrong.
If you're using a custom box, make sure that networking is properly
working and you're able to connect to the machine. It is a common
problem that networking isn't setup properly in these boxes.
Verify that authentication configurations are also setup properly,
as well.
If the box appears to be booting properly, you may want to increase
the timeout ("config.vm.boot_timeout") value.

vagrant status查看状态

 /gpmaster/gpseg-1_down/k8s/k8s-node2]#vagrant status
Current machine states:
node                      running (virtualbox)
The VM is running. To stop this VM, you can run `vagrant halt` to
shut it down forcefully, or you can run `vagrant suspend` to simply
suspend the virtual machine. In either case, to restart it again,
simply run `vagrant up`.
[root@P1WQMSMDW02 /gpmaster/gpseg-1_down/k8s/k8s-node2]#vagrant halt
==> node: Attempting graceful shutdown of VM...
    node: Guest communication could not be established! This is usually because
    node: SSH is not running, the authentication information was changed,
    node: or some other networking issue. Vagrant will force halt, if
    node: capable.
==> node: Forcing shutdown of VM...

再次重现vagrant up无法拉起机器

2022年11月20日22:04:16

这次的情况是这样的，升级了OS 的内核，然后直接在虚拟机里面执行reboot

执行的操作如下：

#默认启动的顺序是从0开始，新内核是从头插入（目前位置在0，而4.4.4的是在1），所以需要选择0。
grub2-set-default 0  
#重启并检查
reboot

然后就起不来了…

尝试用virtualbox的命令行工具启停虚拟机

0、ping主机

无法ping通.

1、VBoxManage list runningvms l 列出正在运行的vm

[root@p1edaspk03 /nfs/k8s-node]#ps -ef | grep -i virtu
root      50698 132091  1 21:51 ?        00:00:14 /usr/lib/virtualbox/VBoxHeadless --comment master1-186 --startvm e480abec-465f-4826-8cc6-a2270a4f3878 --vrde config
root      66072  44455  0 22:09 pts/2    00:00:00 grep --color=auto -i virtu
root     132084      1  0 Nov10 ?        00:00:07 /usr/lib/virtualbox/VBoxXPCOMIPCD
root     132091      1  0 Nov10 ?        00:01:44 /usr/lib/virtualbox/VBoxSVC --auto-shutdown
You have mail in /var/spool/mail/root
[root@p1edaspk03 /nfs/k8s-node]#
You have mail in /var/spool/mail/root
[root@p1edaspk03 /nfs/k8s-node]#
[root@p1edaspk03 /nfs/k8s-node]#VBoxManage list runningvms
"master1-186" {e480abec-465f-4826-8cc6-a2270a4f3878}
[root@p1edaspk03 /nfs/k8s-node]#

2、VBoxManage controlvm 【第一步获取的uuid】 poweroff

[root@p1edaspk03 /nfs/k8s-node]#VBoxManage controlvm e480abec-465f-4826-8cc6-a2270a4f3878 poweroff
0%...10%...20%...30%...40%...50%...60%...70%...80%...90%...100%
[root@p1edaspk03 /nfs/k8s-node]#
[root@p1edaspk03 /nfs/k8s-node]#VBoxManage list runningvms^C
[root@p1edaspk03 /nfs/k8s-node]#

3、查看vm是否存在

[root@p1edaspk03 /nfs/k8s-node]#ps -ef | grep -i virtu
root      67239  44455  0 22:11 pts/2    00:00:00 grep --color=auto -i virtu
[root@p1edaspk03 /nfs/k8s-node]#
[root@p1edaspk03 /nfs/k8s-node]#vagrant global-status
id       name      provider   state   directory
--------------------------------------------------------------------------
8162a08  node1-186 virtualbox running /nfs/k8s-node

4、vagrant查看vm状态

出现了一个矛盾的情况, 两个命令结果不一样，不过人家也解释了可能存在cache.

使用如下命令清除cache

vagrant global-status --prune

详细的日志.

[root@p1edaspk03 /nfs/k8s-node]#vagrant global-status
id       name      provider   state   directory
--------------------------------------------------------------------------
8162a08  node1-186 virtualbox running /nfs/k8s-node
The above shows information about all known Vagrant environments
on this machine. This data is cached and may not be completely
up-to-date (use "vagrant global-status --prune" to prune invalid
entries). To interact with any of the machines, you can go to that
directory and run Vagrant, or you can use the ID directly with
Vagrant commands from any directory. For example:
"vagrant destroy 1a2b3c4d"
[root@p1edaspk03 /nfs/k8s-node]#
[root@p1edaspk03 /nfs/k8s-node]#
[root@p1edaspk03 /nfs/k8s-node]#
[root@p1edaspk03 /nfs/k8s-node]#
[root@p1edaspk03 /nfs/k8s-node]#
[root@p1edaspk03 /nfs/k8s-node]#
[root@p1edaspk03 /nfs/k8s-node]#vagrant status
Current machine states:
node1-186                 poweroff (virtualbox)

vagrant halt 停vm

]#vagrant halt
==> node1-186: Attempting graceful shutdown of VM...
    node1-186: Guest communication could not be established! This is usually because
    node1-186: SSH is not running, the authentication information was changed,
    node1-186: or some other networking issue. Vagrant will force halt, if
    node1-186: capable.
==> node1-186: Forcing shutdown of VM...
You have mail in /var/spool/mail/root

vagrant up 启动vm

e]#vagrant ssh
ssh_exchange_identification: read: Connection reset by peer

目前的解决方案

删掉重新拉起一个，所以重要的vm要做好备份,

vagrant destroy -f

如何做快照？

正常的vagrant up日志

]#time bash startk8s.sh
==> node1-186: Attempting graceful shutdown of VM...
==> node1-186: Clearing any previously set forwarded ports...
==> node1-186: Clearing any previously set network interfaces...
==> node1-186: Preparing network interfaces based on configuration...
    node1-186: Adapter 1: nat
    node1-186: Adapter 2: bridged
==> node1-186: Forwarding ports...
    node1-186: 22 (guest) => 2222 (host) (adapter 1)
==> node1-186: Running 'pre-boot' VM customizations...
==> node1-186: Booting VM...
==> node1-186: Waiting for machine to boot. This may take a few minutes...
    node1-186: SSH address: 127.0.0.1:2222
    node1-186: SSH username: vagrant
    node1-186: SSH auth method: private key
==> node1-186: Machine booted and ready!
==> node1-186: Checking for guest additions in VM...
    node1-186: No guest additions were detected on the base box for this VM! Guest
    node1-186: additions are required for forwarded ports, shared folders, host only
    node1-186: networking, and more. If SSH fails on this machine, please install
    node1-186: the guest additions and repackage the box to continue.
    node1-186:
    node1-186: This is not an error message; everything may continue to work properly,
    node1-186: in which case you may ignore this message.
==> node1-186: Setting hostname...
==> node1-186: Configuring and enabling network interfaces...
==> node1-186: Rsyncing folder: /nfs/k8s-node/k8s-master/ => /vagrant
==> node1-186: Machine already provisioned. Run `vagrant provision` or use the `--provision`
==> node1-186: flag to force provisioning. Provisioners marked to run always will still run.

Machine booted and ready 代表机器起来了.

重现

在vagrant设置磁盘大小中竟然重写了此问题。

我的操作:

step 1 fdisk 分区

step 2 挂载分区

step 3 设置开启自动挂载

重启失败。关键的问题在于重新做一遍就可以启动，这才是最S的操作.

继续重现

–update 2022年12月26日10:14:44

这次又重现了这个问题。这一次想趁机搞懂这个问题到底什么导致的

开debug模式

vagrant up --debug &> timeout.debug

网上对这个问题的讨论也有，一般认为是virtualbox的bug，建议升级virtualbox。我也照做。升级到6.1 目前最高版本。并没有解决

查看virtualbox的日志

查看官方forum

这篇forum说是宿主机没开VT-x

lscpu查看是开启的状态.

什么是VT-x？

VT-x（虚拟化扩展）是一项硬件功能，可以在计算机的中央处理器（CPU）上提供虚拟化支持。它允许在一台计算机上运行多个独立的操作系统，每个操作系统在虚拟机中运行，而不是在物理机上运行。这使得可以在同一台计算机上并行运行多个操作系统，而无需使用多台计算机。

VT-x 最初由 Intel 开发，现在已被广泛采用。它是虚拟化技术中常用的一种，其他虚拟化技术包括 AMD 的 AMD-V 和 ARM 的 Virtualization Extension（VE）。

使用 VT-x 需要满足以下条件：

计算机的 CPU 支持 VT-x。
计算机的 BIOS 或 UEFI 已启用 VT-x。
计算机的操作系统支持 VT-x。

如果上述条件都满足，就可以使用 VT-x 来安装和运行虚拟机。

这篇forum说是kvm可能会影响

事实上，我确实这么做了，在宿主机上安装了kvm。

问题应该就是出在这里。因为之前使用virtualbox启动虚拟机都是没有问题的。

解决

1、卸载kvm相关组件

yum remove qemu-kvm qemu-img  virt-manager libvirt libvirt-python virt-manager  libvirt-client virt-install virt-viewer -y

2、重新拉起virtualbox

vagrant up启动成功

参考

【1】vagrant up问题

【2】 vagrant up的issue

官方的解释说这是第三方插件的bug.他不管这个。

关于vagrant up的一个终结之谜(Markdown版本)

背景

vagrant up 无法连接机器

vagrant status查看状态

再次重现vagrant up无法拉起机器

尝试用virtualbox的命令行工具启停虚拟机

0、ping主机

1、VBoxManage list runningvms l 列出正在运行的vm

2、VBoxManage controlvm 【第一步获取的uuid】 poweroff

3、查看vm是否存在

4、vagrant查看vm状态

vagrant halt 停vm

vagrant up 启动vm

目前的解决方案

正常的vagrant up日志

重现

继续重现

查看virtualbox的日志

查看官方forum

什么是VT-x？

解决

参考

热门文章

最新文章

相关课程

相关电子书

热门

活动广场

任务中心

开发者评测

高校计划

乘风者计划

训练营

阿里云MVP

话题

直播

下载

镜像站

技术资料

插件

关于vagrant up的一个终结之谜(Markdown版本)

背景

vagrant up 无法连接机器

vagrant status查看状态

再次重现vagrant up无法拉起机器

尝试用virtualbox的命令行工具启停虚拟机

0、ping主机

1、VBoxManage list runningvms l 列出正在运行的vm

2、VBoxManage controlvm 【第一步获取的uuid】 poweroff

3、查看vm是否存在

4、vagrant查看vm状态

vagrant halt 停vm

vagrant up 启动vm

目前的解决方案

正常的vagrant up日志

重现

继续重现

查看virtualbox的日志

查看官方forum

什么是VT-x？

解决

参考

热门文章

最新文章

相关课程

相关电子书