OpenStack instance shuts down unexpectedly: journal: End of file while reading data: Input/output error

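Introduction:

Symptoms:

    In one cluster, one machine shut down every day, at random. The problem was tracked down and resolved; the findings are recorded here for anyone who runs into the same issue.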

    

Environment:

        Cluster: converged OpenStack + Ceph cluster, versions Mitaka + Jewel

        Network: 10G NICs + bond0 (active-backup mode)

        OS: CentOS 7.3


Error log from messages:

Aug 30 16:45:14 lxx-4-5 journal: internal error: End of file from monitor
Aug 30 16:45:14 lxx-4-5 avahi-daemon[2412]: Withdrawing address record for fe80::fc16:3eff:fef3:5076 on vnet5.
Aug 30 16:45:14 lxx-4-5 kernel: vlan206: port 3(vnet5) entered disabled state
Aug 30 16:45:14 lxx-4-5 kvm: 10 guests now active
Aug 30 16:45:14 lxx-4-5 avahi-daemon[2412]: Withdrawing workstation service for vnet5.
Aug 30 16:45:14 lxx-4-5 kernel: device vnet5 left promiscuous mode
Aug 30 16:45:14 lxx-4-5 kernel: vlan206: port 3(vnet5) entered disabled state
Aug 30 16:45:14 lxx-4-5 systemd: autolog.service holdoff time over, scheduling restart.
Aug 30 16:45:14 lxx-4-5 systemd: Started Autolog.
Aug 30 16:45:14 lxx-4-5 systemd: Starting Autolog...
Aug 30 16:45:14 lxx-4-5 systemd-machined: Machine qemu-22-instance-000002c9 terminated.
Aug 30 16:45:14 lxx-4-5 autolog: Don't have master process.
Aug 30 16:45:15 l22-4-5 journal: End of file while reading data: Input/output error
Aug 30 16:45:15 lxx-4-5 systemd: autolog.service holdoff time over, scheduling restart.
Aug 30 16:45:15 lxx-4-5 systemd: Started Autolog.
Aug 30 16:45:15 lxx-4-5 systemd: Starting Autolog...
Aug 30 16:45:15 lxx-4-5 autolog: Don't have master process.
Aug 30 16:45:15 lxx-4-5 systemd: autolog.service holdoff time over, scheduling restart.
Aug 30 16:45:15 lxx-4-5 systemd: Started Autolog.
Aug 30 16:45:15 lxx-4-5 systemd: Starting Autolog...
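
In the excerpt above, the systemd-machined line is the most direct hint about which guest went away, because it names the libvirt domain (qemu-22-instance-000002c9). The following is only a minimal Python sketch for pulling such lines out of a larger log. It assumes the syslog layout shown in this post, and the function name terminated_machines is made up for illustration.

```python
import re

# Matches lines such as:
#   Aug 30 16:45:14 lxx-4-5 systemd-machined: Machine qemu-22-instance-000002c9 terminated.
MACHINE_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+(?P<host>\S+)\s+"
    r"systemd-machined(?:\[\d+\])?:\s+Machine\s+(?P<machine>\S+)\s+terminated\."
)


def terminated_machines(lines):
    """Return (timestamp, host, machine) for every 'Machine ... terminated.' line."""
    found = []
    for line in lines:
        match = MACHINE_RE.match(line.strip())
        if match:
            found.append(
                (match.group("ts"), match.group("host"), match.group("machine"))
            )
    return found


# Sample lines copied from the log excerpt in this post.
sample = [
    "Aug 30 16:45:14 lxx-4-5 journal: internal error: End of file from monitor",
    "Aug 30 16:45:14 lxx-4-5 systemd-machined: Machine qemu-22-instance-000002c9 terminated.",
    "Aug 30 16:45:14 lxx-4-5 autolog: Don't have master process.",
]

for ts, host, machine in terminated_machines(sample):
    print(ts, host, machine)
```

The trailing instance-000002c9 part follows Nova's default instance name template (instance-%08x), so it can be matched against the instance list on the OpenStack side.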


Key openstack-compute log entries:

2017-08-30 16:45:20.952 110867 DEBUG nova.compute.manager [req-0602316d-944c-42b4-9d3c-7d1b0e513765 - - - - -] [instance: 26f48b2e-f648-42e2-8133-7ebc060fd7ae] Updated the network info_cache for instance _heal_instance_info_cache /usr/lib/python2.7/site-packages/nova/compute/manager.py:5803
2017-08-30 16:45:30.033 110867 DEBUG nova.virt.driver [-] Emitting event <LifecycleEvent: 1504082715.03, 08330b10-f106-4737-b9db-0e45c84abb2e => Stopped> emit_event /usr/lib/python2.7/site-packages/nova/virt/driver.py:1443
2017-08-30 16:45:30.034 110867 INFO nova.compute.manager [-] [instance: 08330b10-f106-4737-b9db-0e45c84abb2e] VM Stopped (Lifecycle Event)
2017-08-30 16:45:30.076 110867 DEBUG nova.compute.manager [req-5998b542-495c-41f2-8010-7f1c426f0127 - - - - -] [instance: 08330b10-f106-4737-b9db-0e45c84abb2e] Checking state _get_power_state /usr/lib/python2.7/site-packages/nova/compute/manager.py:1347
2017-08-30 16:45:30.079 110867 DEBUG nova.compute.manager [req-5998b542-495c-41f2-8010-7f1c426f0127 - - - - -] [instance: 08330b10-f106-4737-b9db-0e45c84abb2e] Synchronizing instance power state after lifecycle event "Stopped"; current vm_state: active, current task_state: None, current DB power_state: 1, VM power_state: 4 handle_lifecycle_event /usr/lib/python2.7/site-packages/nova/compute/manager.py:1276
2017-08-30 16:45:30.119 110867 INFO nova.compute.manager [req-5998b542-495c-41f2-8010-7f1c426f0127 - - - - -] [instance: 08330b10-f106-4737-b9db-0e45c84abb2e] During _sync_instance_power_state the DB power_state (1) does not match the vm_power_state from the hypervisor (4). Updating power_state in the DB to match the hypervisor.
2017-08-30 16:45:30.177 110867 WARNING nova.compute.manager [req-5998b542-495c-41f2-8010-7f1c426f0127 - - - - -] [instance: 08330b10-f106-4737-b9db-0e45c84abb2e] Instance shutdown by itself. Calling the stop API. Current vm_state: active, current task_state: None, original DB power_state: 1, current VM power_state: 4
2017-08-30 16:45:30.178 110867 DEBUG nova.compute.api [req-5998b542-495c-41f2-8010-7f1c426f0127 - - - - -] [instance: 08330b10-f106-4737-b9db-0e45c84abb2e] Going to try to stop instance force_stop /usr/lib/python2.7/site-packages/nova/compute/api.py:1954
2017-08-30 16:45:30.267 110867 DEBUG oslo_concurrency.lockutils [req-5998b542-495c-41f2-8010-7f1c426f0127 - - - - -] Lock "08330b10-f106-4737-b9db-0e45c84abb2e" acquired by "nova.compute.manager.do_stop_instance" :: waited 0.000s inner /usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py:270
2017-08-30 16:45:30.268 110867 DEBUG nova.compute.manager [req-5998b542-495c-41f2-8010-7f1c426f0127 - - - - -] [instance: 08330b10-f106-4737-b9db-0e45c84abb2e] Checking state _get_power_state /usr/lib/python2.7/site-packages/nova/compute/manager.py:1347
2017-08-30 16:45:30.270 110867 DEBUG nova.compute.manager [req-5998b542-495c-41f2-8010-7f1c426f0127 - - - - -] [instance: 08330b10-f106-4737-b9db-0e45c84abb2e] Stopping instance; current vm_state: active, current task_state: powering-off, current DB power_state: 4, current VM power_state: 4 do_stop_instance /usr/lib/python2.7/site-packages/nova/compute/manager.py:2545
2017-08-30 16:45:30.270 110867 INFO nova.compute.manager [req-5998b542-495c-41f2-8010-7f1c426f0127 - - - - -] [instance: 08330b10-f106-4737-b9db-0e45c84abb2e] Instance is already powered off in the hypervisor when stop is called.
2017-08-30 16:45:30.271 110867 DEBUG nova.objects.instance [req-5998b542-495c-41f2-8010-7f1c426f0127 - - - - -] Lazy-loading 'metadata' on Instance uuid 08330b10-f106-4737-b9db-0e45c84abb2e obj_load_attr /usr/lib/python2.7/site-packages/nova/objects/instance.py:895
2017-08-30 16:45:30.314 110867 INFO nova.virt.libvirt.driver [req-5998b542-495c-41f2-8010-7f1c426f0127 - - - - -] [instance: 08330b10-f106-4737-b9db-0e45c84abb2e] Instance already shutdown.
2017-08-30 16:45:30.318 110867 INFO nova.virt.libvirt.driver [-] [instance: 08330b10-f106-4737-b9db-0e45c84abb2e] Instance destroyed successfully.
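
The numeric power_state values in these entries are easier to read once decoded: 1 is RUNNING and 4 is SHUTDOWN, so Nova still had the instance recorded as running while the hypervisor already reported it as shut down. The sketch below decodes the pair from a log line. The code table mirrors the constants in nova.compute.power_state; the regular expression and the function name decode_power_states are illustrative assumptions, not part of Nova.

```python
import re

# Power-state codes as defined in nova.compute.power_state.
POWER_STATES = {
    0: "NOSTATE",
    1: "RUNNING",
    3: "PAUSED",
    4: "SHUTDOWN",
    6: "CRASHED",
    7: "SUSPENDED",
}

# Covers both wordings seen in the log:
#   "current DB power_state: 1, VM power_state: 4"
#   "original DB power_state: 1, current VM power_state: 4"
STATE_RE = re.compile(
    r"DB power_state:\s*(\d+)\s*,\s*(?:current )?VM power_state:\s*(\d+)"
)


def decode_power_states(line):
    """Return (db_state, vm_state) names found in a nova-compute line, or None."""
    match = STATE_RE.search(line)
    if match is None:
        return None
    db_code, vm_code = int(match.group(1)), int(match.group(2))
    return (
        POWER_STATES.get(db_code, "UNKNOWN"),
        POWER_STATES.get(vm_code, "UNKNOWN"),
    )


# Message text copied from the WARNING entry above.
warning = (
    "Instance shutdown by itself. Calling the stop API. Current vm_state: active, "
    "current task_state: None, original DB power_state: 1, current VM power_state: 4"
)
print(decode_power_states(warning))
```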


Key log lines:

messages: Aug 30 16:45:15 l22-4-5 journal: End of file while reading data: Input/output error

openstack-compute: 2017-08-30 16:45:30.034 110867 INFO nova.compute.manager [-] [instance: 08330b10-f106-4737-b9db-0e45c84abb2e] VM Stopped (Lifecycle Event)
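
Since the shutdowns happen at random, it can help to pair these two signatures automatically across both logs. The sketch below reports every "VM Stopped" entry that follows an "End of file while reading data" error within a time window. The 60-second window, the function names, and the need to pass the year explicitly (syslog timestamps do not carry one) are assumptions for illustration. Month parsing with %b also assumes an English or C locale.

```python
from datetime import datetime, timedelta


def parse_syslog_time(line, year):
    """Parse 'Aug 30 16:45:15 host ...'; the year is supplied by the caller."""
    stamp = " ".join(line.split()[:3])
    return datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")


def parse_nova_time(line):
    """Parse '2017-08-30 16:45:30.034 110867 INFO ...'."""
    date, clock = line.split()[:2]
    return datetime.strptime(f"{date} {clock}", "%Y-%m-%d %H:%M:%S.%f")


def correlate(syslog_lines, nova_lines, year, window=timedelta(seconds=60)):
    """Pair each libvirt EOF error with 'VM Stopped' events that follow within window."""
    errors = [
        parse_syslog_time(line, year)
        for line in syslog_lines
        if "End of file while reading data" in line
    ]
    pairs = []
    for line in nova_lines:
        if "VM Stopped (Lifecycle Event)" not in line:
            continue
        stopped = parse_nova_time(line)
        for error_time in errors:
            if timedelta(0) <= stopped - error_time <= window:
                pairs.append((error_time, stopped))
    return pairs


# The two key lines from this post.
syslog_sample = [
    "Aug 30 16:45:15 l22-4-5 journal: End of file while reading data: Input/output error",
]
nova_sample = [
    "2017-08-30 16:45:30.034 110867 INFO nova.compute.manager [-] "
    "[instance: 08330b10-f106-4737-b9db-0e45c84abb2e] VM Stopped (Lifecycle Event)",
]

pairs = correlate(syslog_sample, nova_sample, year=2017)
for error_time, stopped in pairs:
    print(error_time, "->", stopped, "delay:", stopped - error_time)
```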


Solution:

Upgrade libvirt:
libvirt-daemon-driver-secret-2.0.0-10.el7_3.9.x86_64
libvirt-daemon-lxc-2.0.0-10.el7_3.9.x86_64
libvirt-daemon-driver-lxc-2.0.0-10.el7_3.9.x86_64
libvirt-python-2.0.0-2.el7.x86_64
libvirt-daemon-2.0.0-10.el7_3.9.x86_64
libvirt-lock-sanlock-2.0.0-10.el7_3.9.x86_64
libvirt-daemon-driver-storage-2.0.0-10.el7_3.9.x86_64
libvirt-gobject-0.2.3-1.el7.x86_64
libvirt-nss-2.0.0-10.el7_3.9.x86_64
libvirt-daemon-driver-nwfilter-2.0.0-10.el7_3.9.x86_64
libvirt-gconfig-0.2.3-1.el7.x86_64
libvirt-snmp-0.0.3-5.el7.x86_64
libvirt-daemon-driver-nodedev-2.0.0-10.el7_3.9.x86_64
libvirt-glib-devel-0.2.3-1.el7.x86_64
libvirt-gobject-devel-0.2.3-1.el7.x86_64
libvirt-java-javadoc-0.4.9-4.el7.noarch
libvirt-daemon-driver-qemu-2.0.0-10.el7_3.9.x86_64
libvirt-daemon-kvm-2.0.0-10.el7_3.9.x86_64
libvirt-gconfig-devel-0.2.3-1.el7.x86_64
libvirt-login-shell-2.0.0-10.el7_3.9.x86_64
libvirt-client-2.0.0-10.el7_3.9.x86_64
libvirt-daemon-driver-interface-2.0.0-10.el7_3.9.x86_64
libvirt-devel-2.0.0-10.el7_3.9.x86_64
libvirt-cim-0.6.3-19.el7.x86_64
libvirt-glib-0.2.3-1.el7.x86_64
libvirt-java-devel-0.4.9-4.el7.noarch
libvirt-daemon-driver-network-2.0.0-10.el7_3.9.x86_64
libvirt-docs-2.0.0-10.el7_3.9.x86_64
libvirt-daemon-config-nwfilter-2.0.0-10.el7_3.9.x86_64
libvirt-2.0.0-10.el7_3.9.x86_64
libvirt-daemon-config-network-2.0.0-10.el7_3.9.x86_64
libvirt-java-0.4.9-4.el7.noarch
Upgrade QEMU:
qemu-system-lm32-2.0.0-1.el7.6.x86_64
ipxe-roms-qemu-20160127-5.git6366fa7a.el7.noarch
qemu-system-cris-2.0.0-1.el7.6.x86_64
qemu-system-x86-2.0.0-1.el7.6.x86_64
qemu-kvm-tools-1.5.3-126.el7_3.10.x86_64
qemu-system-xtensa-2.0.0-1.el7.6.x86_64
qemu-system-arm-2.0.0-1.el7.6.x86_64
qemu-system-s390x-2.0.0-1.el7.6.x86_64
qemu-system-sh4-2.0.0-1.el7.6.x86_64
qemu-kvm-common-1.5.3-126.el7_3.10.x86_64
qemu-user-2.0.0-1.el7.6.x86_64
qemu-system-unicore32-2.0.0-1.el7.6.x86_64
libvirt-daemon-driver-qemu-2.0.0-10.el7_3.9.x86_64
qemu-guest-agent-2.5.0-3.el7.x86_64
qemu-common-2.0.0-1.el7.6.x86_64
qemu-system-or32-2.0.0-1.el7.6.x86_64
qemu-kvm-1.5.3-126.el7_3.10.x86_64
qemu-system-moxie-2.0.0-1.el7.6.x86_64
qemu-img-1.5.3-126.el7_3.10.x86_64
qemu-system-m68k-2.0.0-1.el7.6.x86_64
qemu-system-alpha-2.0.0-1.el7.6.x86_64
qemu-system-microblaze-2.0.0-1.el7.6.x86_64
qemu-system-mips-2.0.0-1.el7.6.x86_64
qemu-2.0.0-1.el7.6.x86_64
Upgrade the kernel:
[root@~]# rpm -qa | grep kernel
kernel-3.10.0-514.26.2.el7.x86_64
kernel-tools-libs-3.10.0-514.26.2.el7.x86_64
kernel-devel-3.10.0-327.36.3.el7.x86_64
kernel-tools-3.10.0-514.26.2.el7.x86_64
kernel-devel-3.10.0-123.el7.x86_64
kernel-3.10.0-327.36.3.el7.x86_64
abrt-addon-kerneloops-2.1.11-45.el7.centos.x86_64
kernel-3.10.0-514.2.2.el7.x86_64
kernel-3.10.0-123.el7.x86_64
kernel-3.10.0-327.22.2.el7.x86_64
kernel-devel-3.10.0-327.22.2.el7.x86_64
kernel-devel-3.10.0-514.26.2.el7.x86_64
kernel-devel-3.10.0-514.2.2.el7.x86_64
kernel-headers-3.10.0-514.26.2.el7.x86_64
[root@~]# uname -r
3.10.0-514.26.2.el7.x86_64


Note: after upgrading, you must reboot the host for the fix to take effect. Restarting the services alone does not work!
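
One way to see whether a host has actually been rebooted into the upgraded kernel is to compare the output of uname -r with the newest installed kernel package. The sketch below does that comparison on plain strings. It is a simplification that only compares the leading numeric fields, not a full RPM version comparison, and the helper names version_key and needs_reboot are hypothetical. It covers the kernel only; it says nothing about whether running guests already use the upgraded QEMU and libvirt.

```python
import re


def version_key(release):
    """Turn '3.10.0-514.26.2.el7.x86_64' into (3, 10, 0, 514, 26, 2)."""
    numeric = re.match(r"[\d.\-]+", release).group(0)
    return tuple(int(part) for part in re.split(r"[.\-]", numeric) if part)


def needs_reboot(running, installed):
    """Return (True/False, newest): True if a newer kernel than 'running' is installed."""
    newest = max(installed, key=version_key)
    return version_key(newest) > version_key(running), newest


# Kernel releases taken from the rpm -qa output in this post ("kernel-" prefix removed).
installed_kernels = [
    "3.10.0-514.26.2.el7.x86_64",
    "3.10.0-327.36.3.el7.x86_64",
    "3.10.0-514.2.2.el7.x86_64",
    "3.10.0-123.el7.x86_64",
    "3.10.0-327.22.2.el7.x86_64",
]

# Value printed by uname -r in this post: the host is already on the newest kernel.
print(needs_reboot("3.10.0-514.26.2.el7.x86_64", installed_kernels))

# A host still running an older kernel would be flagged.
print(needs_reboot("3.10.0-327.36.3.el7.x86_64", installed_kernels))
```

On a live host, platform.release() from the Python standard library returns the same string as uname -r and could be passed as the running value.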


Reposted from swq499809608's 51CTO blog. Original link: http://blog.51cto.com/swq499809608/1962081

