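Symptom:
In one cluster, a virtual machine (instance) shut down on its own every day, at random. We tracked down and fixed the cause, and we share the notes here in case they help others.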
Environment:
Cluster: converged OpenStack + Ceph cluster, versions Mitaka + Jewel
Network: 10G NICs, bonded as bond0 (active-backup mode)
OS: CentOS 7.3
Errors in /var/log/messages:
```
Aug 30 16:45:14 lxx-4-5 journal: internal error: End of file from monitor
Aug 30 16:45:14 lxx-4-5 avahi-daemon[2412]: Withdrawing address record for fe80::fc16:3eff:fef3:5076 on vnet5.
Aug 30 16:45:14 lxx-4-5 kernel: vlan206: port 3(vnet5) entered disabled state
Aug 30 16:45:14 lxx-4-5 kvm: 10 guests now active
Aug 30 16:45:14 lxx-4-5 avahi-daemon[2412]: Withdrawing workstation service for vnet5.
Aug 30 16:45:14 lxx-4-5 kernel: device vnet5 left promiscuous mode
Aug 30 16:45:14 lxx-4-5 kernel: vlan206: port 3(vnet5) entered disabled state
Aug 30 16:45:14 lxx-4-5 systemd: autolog.service holdoff time over, scheduling restart.
Aug 30 16:45:14 lxx-4-5 systemd: Started Autolog.
Aug 30 16:45:14 lxx-4-5 systemd: Starting Autolog...
Aug 30 16:45:14 lxx-4-5 systemd-machined: Machine qemu-22-instance-000002c9 terminated.
Aug 30 16:45:14 lxx-4-5 autolog: Don't have master process.
Aug 30 16:45:15 l22-4-5 journal: End of file while reading data: Input/output error
Aug 30 16:45:15 lxx-4-5 systemd: autolog.service holdoff time over, scheduling restart.
Aug 30 16:45:15 lxx-4-5 systemd: Started Autolog.
Aug 30 16:45:15 lxx-4-5 systemd: Starting Autolog...
Aug 30 16:45:15 lxx-4-5 autolog: Don't have master process.
Aug 30 16:45:15 lxx-4-5 systemd: autolog.service holdoff time over, scheduling restart.
Aug 30 16:45:15 lxx-4-5 systemd: Started Autolog.
Aug 30 16:45:15 lxx-4-5 systemd: Starting Autolog...
```
Key entries from the openstack-compute log:
```
2017-08-30 16:45:20.952 110867 DEBUG nova.compute.manager [req-0602316d-944c-42b4-9d3c-7d1b0e513765 - - - - -] [instance: 26f48b2e-f648-42e2-8133-7ebc060fd7ae] Updated the network info_cache for instance _heal_instance_info_cache /usr/lib/python2.7/site-packages/nova/compute/manager.py:5803
2017-08-30 16:45:30.033 110867 DEBUG nova.virt.driver [-] Emitting event <LifecycleEvent: 1504082715.03, 08330b10-f106-4737-b9db-0e45c84abb2e => Stopped> emit_event /usr/lib/python2.7/site-packages/nova/virt/driver.py:1443
2017-08-30 16:45:30.034 110867 INFO nova.compute.manager [-] [instance: 08330b10-f106-4737-b9db-0e45c84abb2e] VM Stopped (Lifecycle Event)
2017-08-30 16:45:30.076 110867 DEBUG nova.compute.manager [req-5998b542-495c-41f2-8010-7f1c426f0127 - - - - -] [instance: 08330b10-f106-4737-b9db-0e45c84abb2e] Checking state _get_power_state /usr/lib/python2.7/site-packages/nova/compute/manager.py:1347
2017-08-30 16:45:30.079 110867 DEBUG nova.compute.manager [req-5998b542-495c-41f2-8010-7f1c426f0127 - - - - -] [instance: 08330b10-f106-4737-b9db-0e45c84abb2e] Synchronizing instance power state after lifecycle event "Stopped"; current vm_state: active, current task_state: None, current DB power_state: 1, VM power_state: 4 handle_lifecycle_event /usr/lib/python2.7/site-packages/nova/compute/manager.py:1276
2017-08-30 16:45:30.119 110867 INFO nova.compute.manager [req-5998b542-495c-41f2-8010-7f1c426f0127 - - - - -] [instance: 08330b10-f106-4737-b9db-0e45c84abb2e] During _sync_instance_power_state the DB power_state (1) does not match the vm_power_state from the hypervisor (4). Updating power_state in the DB to match the hypervisor.
2017-08-30 16:45:30.177 110867 WARNING nova.compute.manager [req-5998b542-495c-41f2-8010-7f1c426f0127 - - - - -] [instance: 08330b10-f106-4737-b9db-0e45c84abb2e] Instance shutdown by itself. Calling the stop API. Current vm_state: active, current task_state: None, original DB power_state: 1, current VM power_state: 4
2017-08-30 16:45:30.178 110867 DEBUG nova.compute.api [req-5998b542-495c-41f2-8010-7f1c426f0127 - - - - -] [instance: 08330b10-f106-4737-b9db-0e45c84abb2e] Going to try to stop instance force_stop /usr/lib/python2.7/site-packages/nova/compute/api.py:1954
2017-08-30 16:45:30.267 110867 DEBUG oslo_concurrency.lockutils [req-5998b542-495c-41f2-8010-7f1c426f0127 - - - - -] Lock "08330b10-f106-4737-b9db-0e45c84abb2e" acquired by "nova.compute.manager.do_stop_instance" :: waited 0.000s inner /usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py:270
2017-08-30 16:45:30.268 110867 DEBUG nova.compute.manager [req-5998b542-495c-41f2-8010-7f1c426f0127 - - - - -] [instance: 08330b10-f106-4737-b9db-0e45c84abb2e] Checking state _get_power_state /usr/lib/python2.7/site-packages/nova/compute/manager.py:1347
2017-08-30 16:45:30.270 110867 DEBUG nova.compute.manager [req-5998b542-495c-41f2-8010-7f1c426f0127 - - - - -] [instance: 08330b10-f106-4737-b9db-0e45c84abb2e] Stopping instance; current vm_state: active, current task_state: powering-off, current DB power_state: 4, current VM power_state: 4 do_stop_instance /usr/lib/python2.7/site-packages/nova/compute/manager.py:2545
2017-08-30 16:45:30.270 110867 INFO nova.compute.manager [req-5998b542-495c-41f2-8010-7f1c426f0127 - - - - -] [instance: 08330b10-f106-4737-b9db-0e45c84abb2e] Instance is already powered off in the hypervisor when stop is called.
2017-08-30 16:45:30.271 110867 DEBUG nova.objects.instance [req-5998b542-495c-41f2-8010-7f1c426f0127 - - - - -] Lazy-loading 'metadata' on Instance uuid 08330b10-f106-4737-b9db-0e45c84abb2e obj_load_attr /usr/lib/python2.7/site-packages/nova/objects/instance.py:895
2017-08-30 16:45:30.314 110867 INFO nova.virt.libvirt.driver [req-5998b542-495c-41f2-8010-7f1c426f0127 - - - - -] [instance: 08330b10-f106-4737-b9db-0e45c84abb2e] Instance already shutdown.
2017-08-30 16:45:30.318 110867 INFO nova.virt.libvirt.driver [-] [instance: 08330b10-f106-4737-b9db-0e45c84abb2e] Instance destroyed successfully.
```
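In these entries nova reports power states as integers. The small helper below decodes them. It is a sketch that assumes the constants defined in Mitaka's nova/compute/power_state.py (0 NOSTATE, 1 RUNNING, 3 PAUSED, 4 SHUTDOWN, 6 CRASHED, 7 SUSPENDED); the function name `power_state_name` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map nova's integer power_state to its name.
# Assumption: values follow nova/compute/power_state.py (Mitaka).
power_state_name() {
    case "$1" in
        0) echo "NOSTATE" ;;
        1) echo "RUNNING" ;;
        3) echo "PAUSED" ;;
        4) echo "SHUTDOWN" ;;
        6) echo "CRASHED" ;;
        7) echo "SUSPENDED" ;;
        *) echo "UNKNOWN($1)"; return 1 ;;
    esac
}

# Decode the two values from the "Synchronizing instance power state" line.
echo "DB: $(power_state_name 1), hypervisor: $(power_state_name 4)"
```

Decoded this way, `DB power_state: 1, VM power_state: 4` means the database still recorded the instance as RUNNING while libvirt already reported it as SHUTDOWN. Because the vm_state was still active, nova treated this as the guest shutting down on its own and called the stop API, as the WARNING line shows.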
Key lines:
```
message: Aug 30 16:45:15 l22-4-5 journal: End of file while reading data: Input/output error
Openstack-compute:
2017-08-30 16:45:30.034 110867 INFO nova.compute.manager [-] [instance: 08330b10-f106-4737-b9db-0e45c84abb2e] VM Stopped (Lifecycle Event)
```
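To pull these two signatures out of each host's logs and line them up by timestamp, a small grep wrapper is enough. This is a minimal sketch: the function name `find_shutdown_signatures` is hypothetical, and the nova-compute log path in the usage comment (/var/log/nova/nova-compute.log) is an assumption about a typical CentOS 7 install.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every line matching either key signature,
# i.e. the libvirt/QEMU monitor I/O error or nova's "VM Stopped" lifecycle event.
# With several files, grep prefixes each match with its file name.
find_shutdown_signatures() {
    grep -E 'End of file while reading data: Input/output error|VM Stopped \(Lifecycle Event\)' "$@"
}

# Usage (log paths are assumptions; adjust to your deployment):
#   find_shutdown_signatures /var/log/messages /var/log/nova/nova-compute.log
```

If the I/O error on the host and the "VM Stopped" event for an instance appear within a few seconds of each other, as at 16:45:15 and 16:45:30 here, that instance is a candidate for this problem.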
Solution:
Upgrade libvirt to the following versions:
```
libvirt-daemon-driver-secret-2.0.0-10.el7_3.9.x86_64
libvirt-daemon-lxc-2.0.0-10.el7_3.9.x86_64
libvirt-daemon-driver-lxc-2.0.0-10.el7_3.9.x86_64
libvirt-python-2.0.0-2.el7.x86_64
libvirt-daemon-2.0.0-10.el7_3.9.x86_64
libvirt-lock-sanlock-2.0.0-10.el7_3.9.x86_64
libvirt-daemon-driver-storage-2.0.0-10.el7_3.9.x86_64
libvirt-gobject-0.2.3-1.el7.x86_64
libvirt-nss-2.0.0-10.el7_3.9.x86_64
libvirt-daemon-driver-nwfilter-2.0.0-10.el7_3.9.x86_64
libvirt-gconfig-0.2.3-1.el7.x86_64
libvirt-snmp-0.0.3-5.el7.x86_64
libvirt-daemon-driver-nodedev-2.0.0-10.el7_3.9.x86_64
libvirt-glib-devel-0.2.3-1.el7.x86_64
libvirt-gobject-devel-0.2.3-1.el7.x86_64
libvirt-java-javadoc-0.4.9-4.el7.noarch
libvirt-daemon-driver-qemu-2.0.0-10.el7_3.9.x86_64
libvirt-daemon-kvm-2.0.0-10.el7_3.9.x86_64
libvirt-gconfig-devel-0.2.3-1.el7.x86_64
libvirt-login-shell-2.0.0-10.el7_3.9.x86_64
libvirt-client-2.0.0-10.el7_3.9.x86_64
libvirt-daemon-driver-interface-2.0.0-10.el7_3.9.x86_64
libvirt-devel-2.0.0-10.el7_3.9.x86_64
libvirt-cim-0.6.3-19.el7.x86_64
libvirt-glib-0.2.3-1.el7.x86_64
libvirt-java-devel-0.4.9-4.el7.noarch
libvirt-daemon-driver-network-2.0.0-10.el7_3.9.x86_64
libvirt-docs-2.0.0-10.el7_3.9.x86_64
libvirt-daemon-config-nwfilter-2.0.0-10.el7_3.9.x86_64
libvirt-2.0.0-10.el7_3.9.x86_64
libvirt-daemon-config-network-2.0.0-10.el7_3.9.x86_64
libvirt-java-0.4.9-4.el7.noarch
```
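All libvirt-daemon* sub-packages in the libvirt list are at 2.0.0-10.el7_3.9, while libvirt-python, libvirt-glib, libvirt-java and similar packages are built from separate sources and carry their own versions. A quick way to confirm that a node did not end up with a partially upgraded daemon is to check that every libvirt-daemon* package reports the same version-release. This is a minimal sketch; the helper names `vr_of` and `uniform_vr` are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the version-release of an rpm NVRA string,
# e.g. libvirt-daemon-kvm-2.0.0-10.el7_3.9.x86_64 -> 2.0.0-10.el7_3.9
vr_of() {
    # 1st substitution drops the ".arch" suffix; 2nd keeps the last two
    # dash-separated fields (version and release).
    printf '%s\n' "$1" | sed -E 's/\.[^.]+$//; s/^.*-([^-]+-[^-]+)$/\1/'
}

# Hypothetical helper: read NVRA strings on stdin and succeed only if they
# all share exactly one version-release.
uniform_vr() {
    local n
    n=$(while read -r pkg; do
            [ -n "$pkg" ] && vr_of "$pkg"
        done | sort -u | wc -l)
    [ "$n" -eq 1 ]
}

# Usage on a node (rpm -qa accepts a name glob):
#   rpm -qa 'libvirt-daemon*' | uniform_vr && echo "libvirt-daemon packages are consistent"
```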
Upgrade qemu to the following versions:
```
qemu-system-lm32-2.0.0-1.el7.6.x86_64
ipxe-roms-qemu-20160127-5.git6366fa7a.el7.noarch
qemu-system-cris-2.0.0-1.el7.6.x86_64
qemu-system-x86-2.0.0-1.el7.6.x86_64
qemu-kvm-tools-1.5.3-126.el7_3.10.x86_64
qemu-system-xtensa-2.0.0-1.el7.6.x86_64
qemu-system-arm-2.0.0-1.el7.6.x86_64
qemu-system-s390x-2.0.0-1.el7.6.x86_64
qemu-system-sh4-2.0.0-1.el7.6.x86_64
qemu-kvm-common-1.5.3-126.el7_3.10.x86_64
qemu-user-2.0.0-1.el7.6.x86_64
qemu-system-unicore32-2.0.0-1.el7.6.x86_64
libvirt-daemon-driver-qemu-2.0.0-10.el7_3.9.x86_64
qemu-guest-agent-2.5.0-3.el7.x86_64
qemu-common-2.0.0-1.el7.6.x86_64
qemu-system-or32-2.0.0-1.el7.6.x86_64
qemu-kvm-1.5.3-126.el7_3.10.x86_64
qemu-system-moxie-2.0.0-1.el7.6.x86_64
qemu-img-1.5.3-126.el7_3.10.x86_64
qemu-system-m68k-2.0.0-1.el7.6.x86_64
qemu-system-alpha-2.0.0-1.el7.6.x86_64
qemu-system-microblaze-2.0.0-1.el7.6.x86_64
qemu-system-mips-2.0.0-1.el7.6.x86_64
qemu-2.0.0-1.el7.6.x86_64
```
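To check that a node carries at least the versions listed (for example qemu-kvm 1.5.3-126.el7_3.10), a version comparison helps. The sketch below is a shortcut under stated assumptions: it relies on GNU `sort -V`, which approximates but does not exactly reproduce rpm's own version comparison, and the helper name `version_ge` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if version-release $1 >= $2.
# GNU "sort -V -C" exits 0 when its input is already in version order,
# so printing "$2" before "$1" tests whether $2 <= $1.
# Approximation only: sort -V is not rpm's rpmvercmp.
version_ge() {
    printf '%s\n' "$2" "$1" | sort -V -C
}

# Usage on a node:
#   have=$(rpm -q --qf '%{VERSION}-%{RELEASE}\n' qemu-kvm)
#   version_ge "$have" 1.5.3-126.el7_3.10 && echo "qemu-kvm is new enough"
```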
Upgrade the kernel:
```
[root@~]# rpm -qa|grep kernel
kernel-3.10.0-514.26.2.el7.x86_64
kernel-tools-libs-3.10.0-514.26.2.el7.x86_64
kernel-devel-3.10.0-327.36.3.el7.x86_64
kernel-tools-3.10.0-514.26.2.el7.x86_64
kernel-devel-3.10.0-123.el7.x86_64
kernel-3.10.0-327.36.3.el7.x86_64
abrt-addon-kerneloops-2.1.11-45.el7.centos.x86_64
kernel-3.10.0-514.2.2.el7.x86_64
kernel-3.10.0-123.el7.x86_64
kernel-3.10.0-327.22.2.el7.x86_64
kernel-devel-3.10.0-327.22.2.el7.x86_64
kernel-devel-3.10.0-514.26.2.el7.x86_64
kernel-devel-3.10.0-514.2.2.el7.x86_64
kernel-headers-3.10.0-514.26.2.el7.x86_64
[root@~]# uname -r
3.10.0-514.26.2.el7.x86_64
```
Note: after upgrading, you must reboot the host for the upgrade to take effect. Restarting the services is not enough!
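One way to confirm that a node has really been rebooted into the upgraded kernel is to compare `uname -r` with the newest kernel package installed. This is a minimal sketch, assuming GNU coreutils `sort -V` for ordering; `newest_kernel` and `reboot_pending` are hypothetical helper names. It only covers the kernel and does not detect guests still running on old qemu or libvirt binaries.

```shell
#!/usr/bin/env bash
# Hypothetical helper: from `rpm -qa`-style input (one NVRA per line), print the
# newest version of the base "kernel" package. kernel-devel, kernel-headers,
# kernel-tools etc. are excluded because they don't start with "kernel-<digit>".
newest_kernel() {
    grep -E '^kernel-[0-9]' | sed 's/^kernel-//' | sort -V | tail -n 1
}

# Hypothetical helper: succeed if the running kernel ($1) differs from the
# newest installed one ($2), i.e. the upgraded kernel has not been booted yet.
reboot_pending() {
    local running="$1" newest="$2"
    [ "$running" != "$newest" ]
}

# Usage on a node:
#   newest=$(rpm -qa kernel | newest_kernel)
#   reboot_pending "$(uname -r)" "$newest" && echo "reboot required: running $(uname -r), newest $newest"
```

On the host in the kernel listing, `uname -r` already reports 3.10.0-514.26.2.el7.x86_64, the newest kernel package installed, so `reboot_pending` would report that no reboot is needed.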
Reposted from swq499809608's 51CTO blog. Original link: http://blog.51cto.com/swq499809608/1962081