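System environment:
Operating system: AIX 5300-08
A couple of days ago, while doing AIX operations work, a customer ran into the following case:
Symptoms:
1. Listing rootvg shows one PV as missing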
[root@aix199 /]#lsvg -p rootvg
rootvg:
PV_NAME PV STATE TOTAL PPs FREE PPs FREE DISTRIBUTION
hdisk0 active 524 0 00..00..00..00..00
0516-304 : Unable to find device id 000681aa29d5ceff in the Device
Configuration Database.
000681aa29d5ceff missing 480 184 70..00..00..18..96
[root@aix199 /]#
2. Creating a directory under /home fails with an I/O error (creating a regular file works)
[root@aix199 /]#mkdir /home/aaa
mkdir: cannot create /home/aaa.
/home/aaa: I/O error
[root@aix199 /]#df -m
Filesystem    MB blocks      Free %Used    Iused %Iused Mounted on
/dev/hd4       10880.00  10012.31    8%     5890     1% /
/dev/hd2       10560.00   8354.95   21%    45691     2% /usr
/dev/hd9var     5120.00   4769.67    7%     1145     1% /var
/dev/hd3        9856.00   9404.13    5%      440     1% /tmp
/dev/hd10opt    5120.00   4693.67    9%     4800     1% /opt
/dev/lv00       5120.00   4959.23    4%       18     1% /var/adm/csd
/dev/lv_soft    9600.00   8029.12   17%     2522     1% /soft
/dev/hd1        5120.00   4877.34    5%      213     1% /home
-- So the cause is not a lack of space.
Initial analysis: rootvg contained two PVs, and one of them was removed incorrectly, which led to the errors above.
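The check for a missing PV can also be scripted. The following is a minimal sketch, assuming the `lsvg -p` output has been captured as text in the layout shown above; the function name `list_missing_pvs` and the inlined sample are illustrative, not part of the original procedure.

```shell
#!/bin/sh
# Print the name (or PVID) of every PV whose state column reads "missing".
# Assumption: data rows have exactly five whitespace-separated fields:
#   NAME STATE TOTAL_PPS FREE_PPS FREE_DISTRIBUTION
list_missing_pvs() {
    awk 'NF == 5 && $2 == "missing" { print $1 }'
}

# Sample text copied from the lsvg -p output in this article.
sample='rootvg:
PV_NAME PV STATE TOTAL PPs FREE PPs FREE DISTRIBUTION
hdisk0 active 524 0 00..00..00..00..00
0516-304 : Unable to find device id 000681aa29d5ceff in the Device
Configuration Database.
000681aa29d5ceff missing 480 184 70..00..00..18..96'

printf '%s\n' "$sample" | list_missing_pvs
```

On a live system the same filter could be fed from `lsvg -p rootvg` directly; the header and the 0516-304 message lines are skipped because they do not have the five-field shape.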
Solution:
1. Try to remove the missing PV from rootvg the normal way
[root@aix199 /]#reducevg rootvg 000681aa29d5ceff
0516-016 ldeletepv: Cannot delete physical volume with allocated
        partitions. Use either migratepv to move the partitions or
        reducevg with the -d option to delete the partitions.
0516-884 reducevg: Unable to remove physical volume 000681aa29d5ceff.
[root@aix199 /]#reducevg -d rootvg 000681aa29d5ceff
0516-914 rmlv: Warning, all data belonging to logical volume
        lv00 on physical volume 000681aa29d5ceff will be destroyed.
rmlv: Do you wish to continue? y(es) n(o)? y
0516-1008 rmlv: Logical volume lv00 must be closed. If the logical volume
        contains a filesystem, the umount command will close the LV device.
0516-1008 rmlv: Logical volume hd9var must be closed. If the logical volume
        contains a filesystem, the umount command will close the LV device.
0516-1008 rmlv: Logical volume hd10opt must be closed. If the logical volume
        contains a filesystem, the umount command will close the LV device.
0516-884 reducevg: Unable to remove physical volume 000681aa29d5ceff.
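Some LVs in rootvg, including hd9var and hd10opt, have PPs allocated on the missing PV. Until the allocation information for those PPs is cleared, the missing PV cannot be removed.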
[root@aix199 /]#lsvg -l rootvg
rootvg:
LV NAME             TYPE       LPs   PPs   PVs  LV STATE      MOUNT POINT
hd5                 boot       1     1     1    closed/syncd  N/A
hd6                 paging     8     8     1    open/syncd    N/A
hd8                 jfslog     1     1     1    open/syncd    N/A
hd4                 jfs        170   170   1    open/syncd    /
hd2                 jfs        165   165   1    open/syncd    /usr
0516-1147 : Warning - logical volume hd9var may be partially mirrored.
hd9var              jfs        80    81    3    open/stale    /var
hd3                 jfs        154   154   1    open/syncd    /tmp
hd1                 jfs        80    80    2    open/syncd    /home
hd10opt             jfs        80    80    2    open/syncd    /opt
lv00                jfs        80    80    2    open/syncd    /var/adm/csd
[root@aix199 /]#
2. Check how the PPs of hd9var are allocated
[root@aix199 /]#lslv -m hd9var
hd9var:/var
LP    PP1  PV1               PP2  PV2               PP3  PV3
0001  0215 hdisk0
0002  0504 hdisk0
0003  0509 hdisk0
0004  0510 hdisk0
0005  0513 hdisk0
0006  0514 hdisk0
0007  0521 hdisk0
0008  0522 hdisk0
0516-304 : Unable to find device id 000681aa29d5ceff in the Device
        Configuration Database.
0009  0217 000681aa29d5ceff
0516-304 : Unable to find device id 000681aa29d5ceff in the Device
        Configuration Database.
0010  0218 000681aa29d5ceff
......
As the output shows, hd9var has only 8 PPs on hdisk0; all the rest are allocated on the missing PV.
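To make the same observation without counting rows by eye, the `lslv -m` listing can be tallied per PV. This is only a sketch under assumptions: it reads saved `lslv -m` text, looks only at the first copy (the PP1/PV1 columns), and the helper name `count_pps_per_pv` is made up for illustration. The sample is a shortened excerpt, so its counts are not the real totals.

```shell
#!/bin/sh
# Tally how many first-copy PPs each PV holds in `lslv -m` output.
# Data rows are recognised by a four-digit LP number in column 1;
# column 3 is PV1. Mirror copies (PV2, PV3) are ignored in this sketch.
count_pps_per_pv() {
    awk '$1 ~ /^[0-9][0-9][0-9][0-9]$/ && NF >= 3 { n[$3]++ }
         END { for (pv in n) print pv, n[pv] }' | sort
}

# Shortened excerpt of the hd9var listing from this article.
sample='hd9var:/var
LP    PP1  PV1               PP2  PV2               PP3  PV3
0001  0215 hdisk0
0002  0504 hdisk0
0516-304 : Unable to find device id 000681aa29d5ceff in the Device
        Configuration Database.
0009  0217 000681aa29d5ceff
0010  0218 000681aa29d5ceff'

printf '%s\n' "$sample" | count_pps_per_pv
```

The 0516-304 message lines are skipped because their first field (`0516-304`) is not a bare four-digit number.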
3. Write the PP allocation map to a temporary file
[root@aix199 /]#lquerylv -L `getlvodm -l hd9var` -r >/tmp/mapfile
Note: the command uses backquotes (command substitution).
View the PP allocation map, then edit it
[root@aix199 /]#cat /tmp/mapfile
0009affa94970f34 215 1
0009affa94970f34 504 2
0009affa94970f34 509 3
0009affa94970f34 510 4
......
000681aa29d5ceff 286 78
000681aa29d5ceff 287 79
000681aa29d5ceff 288 80
[root@aix199 /]#
Note: 80 PPs in total
Keep in the file only the PPs to be removed (the first 8 PPs are on hdisk0 and must be deleted from the file):
[root@aix199 /]#cat /tmp/mapfile
000681aa29d5ceff 217 9
000681aa29d5ceff 218 10
......
000681aa29d5ceff 287 79
000681aa29d5ceff 288 80
[root@aix199 /]#
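The manual edit above can also be done with a filter, which avoids accidentally leaving an hdisk0 entry in the file. A minimal sketch, assuming the map is whitespace-separated with the PVID in the first column as in the listing above (check the actual `lquerylv` output on your system first); `filter_missing` and `MISSING_PVID` are names chosen here for illustration.

```shell
#!/bin/sh
# Keep only the map entries that sit on the missing PV.
MISSING_PVID=000681aa29d5ceff   # PVID reported as "missing" by lsvg -p

filter_missing() {
    # $1 = PVID to keep; the map is read from stdin.
    # Appending "" forces a string comparison, so an all-digit PVID
    # is never compared as a number.
    awk -v pvid="$1" '($1 "") == (pvid "")'
}

# Sample entries copied from the map shown in this article.
full_map='0009affa94970f34 215 1
0009affa94970f34 504 2
000681aa29d5ceff 217 9
000681aa29d5ceff 218 10'

printf '%s\n' "$full_map" | filter_missing "$MISSING_PVID"
```

In practice the filtered output would be written to a new file and inspected before it is handed to the next step.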
4. Remove the PPs allocated on the missing PV
[root@aix199 /]#wc -l /tmp/mapfile
73 /tmp/mapfile
[root@aix199 /]#lreducelv -l `getlvodm -l hd9var` -s 73 /tmp/mapfile
[root@aix199 /]#
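The value passed to `-s` has to match the number of entries in the map file, which is why `wc -l` is run first. The sketch below only prints the command it would run (a dry run), so it can be reviewed before anything is changed. `build_lreducelv_cmd` is a hypothetical helper, and the sample file path is made up.

```shell
#!/bin/sh
# Dry run: print the lreducelv command for a map file without executing it.
build_lreducelv_cmd() {
    # $1 = LV identifier (what `getlvodm -l <lvname>` prints)
    # $2 = path to the edited map file
    count=$(wc -l < "$2" | tr -d ' ')
    printf 'lreducelv -l %s -s %s %s\n' "$1" "$count" "$2"
}

# Hypothetical sample map holding three entries taken from this article.
map="${TMPDIR:-/tmp}/mapfile.demo.$$"
printf '%s\n' \
    '000681aa29d5ceff 217 9' \
    '000681aa29d5ceff 218 10' \
    '000681aa29d5ceff 288 80' > "$map"

cmd=$(build_lreducelv_cmd 0008570c00004c0000000144684ecb4c.6 "$map")
rm -f "$map"
echo "$cmd"
```

A stray blank line in the map file would inflate the count, so the file is worth checking before the real command is run.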
5. Check the state of hd9var
[root@aix199 /]#lslv hd9var
LOGICAL VOLUME:     hd9var                 VOLUME GROUP:   rootvg
LV IDENTIFIER:      0008570c00004c0000000144684ecb4c.6 PERMISSION:     read/write
VG STATE:           active/complete        LV STATE:       opened/syncd
TYPE:               jfs                    WRITE VERIFY:   off
MAX LPs:            512                    PP SIZE:        64 megabyte(s)
COPIES:             1                      SCHED POLICY:   parallel
LPs:                8                      PPs:            8
STALE PPs:          0                      BB POLICY:      relocatable
INTER-POLICY:       minimum                RELOCATABLE:    yes
INTRA-POLICY:       center                 UPPER BOUND:    32
MOUNT POINT:        /var                   LABEL:          /var
MIRROR WRITE CONSISTENCY: on/ACTIVE
EACH LP COPY ON A SEPARATE PV ?: yes
Serialize IO ?:     NO
[root@aix199 /]#getlvcb -AT hd9var
         AIX LVCB
         intrapolicy = c
         copies = 1
         interpolicy = m
         lvid = 0008570c00004c0000000144684ecb4c.6
         lvname = hd9var
         label = /var
         machine id = 8570C4C00
         number lps = 8
         relocatable = y
         strict = y
         stripe width = 0
         stripe size in exponent = 0
         type = jfs
         upperbound = 32
         fs =
         time created  = Tue Feb 25 09:10:14 2014
         time modified = Thu Mar  6 12:21:22 2014
Save the configuration:
[root@aix199 /]#savebase
[root@aix199 /]#lsvg -l rootvg
rootvg:
LV NAME             TYPE       LPs   PPs   PVs  LV STATE      MOUNT POINT
hd5                 boot       1     1     1    closed/syncd  N/A
hd6                 paging     8     8     1    open/syncd    N/A
hd8                 jfslog     1     1     1    open/syncd    N/A
hd4                 jfs        170   170   1    open/syncd    /
hd2                 jfs        165   165   1    open/syncd    /usr
hd9var              jfs        8     8     1    open/syncd    /var
hd3                 jfs        154   154   1    open/syncd    /tmp
hd1                 jfs        80    80    2    closed/syncd  /home
hd10opt             jfs        80    80    2    open/syncd    /opt
lv00                jfs        3     3     1    closed/syncd  /var/adm/csd
Handle hd10opt the same way:
[root@aix199 /]#lslv -m hd10opt
hd10opt:/opt
LP    PP1  PV1               PP2  PV2               PP3  PV3
0001  0218 hdisk0
0002  0229 hdisk0
0003  0505 hdisk0
0004  0506 hdisk0
0005  0507 hdisk0
0006  0511 hdisk0
0007  0512 hdisk0
0008  0517 hdisk0
0009  0518 hdisk0
0010  0523 hdisk0
0011  0524 hdisk0
0516-304 : Unable to find device id 000681aa29d5ceff in the Device
        Configuration Database.
0012  0193 000681aa29d5ceff
0516-304 : Unable to find device id 000681aa29d5ceff in the Device
        Configuration Database.
0013  0194 000681aa29d5ceff
0516-304 : Unable to find device id 000681aa29d5ceff in the Device
        Configuration Database.
0014  0195 000681aa29d5ceff
......
[root@aix199 /]#cat /tmp/mapfile
0009affa94970f34 218 1
0009affa94970f34 229 2
0009affa94970f34 505 3
......
000681aa29d5ceff 190 78
000681aa29d5ceff 191 79
000681aa29d5ceff 192 80
[root@aix199 /]#
Keep only the PPs to be removed in the temporary file:
[root@aix199 /]#cat /tmp/mapfile
000681aa29d5ceff 193 12
000681aa29d5ceff 194 13
000681aa29d5ceff 195 14
......
000681aa29d5ceff 190 78
000681aa29d5ceff 191 79
000681aa29d5ceff 192 80
[root@aix199 /]#wc -l /tmp/mapfile
69 /tmp/mapfile
[root@aix199 /]#
[root@aix199 /]#lreducelv -l `getlvodm -l hd10opt` -s 69 /tmp/mapfile
[root@aix199 /]#lslv -m hd10opt
hd10opt:/opt
LP    PP1  PV1               PP2  PV2               PP3  PV3
0001  0218 hdisk0
0002  0229 hdisk0
0003  0505 hdisk0
0004  0506 hdisk0
0005  0507 hdisk0
0006  0511 hdisk0
0007  0512 hdisk0
0008  0517 hdisk0
0009  0518 hdisk0
0010  0523 hdisk0
0011  0524 hdisk0
6. Remove the missing PV from rootvg again
[root@aix199 /]#lsvg -p rootvg
rootvg:
PV_NAME           PV STATE          TOTAL PPs   FREE PPs    FREE DISTRIBUTION
hdisk0            active            524         0           00..00..00..00..00
0516-304 : Unable to find device id 000681aa29d5ceff in the Device
        Configuration Database.
000681aa29d5ceff  missing           480         480         96..96..96..96..96
[root@aix199 /]#reducevg -d rootvg 000681aa29d5ceff
0516-304 putlvodm: Unable to find device id 000681aa29d5ceff0000000000000000 in the Device
        Configuration Database.
0516-896 reducevg: Warning, cannot remove physical volume 000681aa29d5ceff from
        Device Configuration Database.
[root@aix199 /]#lsvg -p rootvg
rootvg:
PV_NAME PV STATE TOTAL PPs FREE PPs FREE DISTRIBUTION
hdisk0 active 524 0 00..00..00..00..00
Note: the missing PV has now been removed
7. Fix the I/O error
[root@aix199 /]#mkdir /home/aaa
mkdir: cannot create /home/aaa.
/home/aaa: I/O error
Note: creating a directory under /home still fails
[root@aix199 /]#lslv -m hd1
hd1:/home
LP    PP1  PV1               PP2  PV2               PP3  PV3
0001  0217 hdisk0
0002  0515 hdisk0
0003  0516 hdisk0
[root@aix199 /]#lsvg rootvg
VOLUME GROUP:       rootvg                   VG IDENTIFIER:  0008570c00004c0000000144684ecb4c
VG STATE:           active                   PP SIZE:        64 megabyte(s)
VG PERMISSION:      read/write               TOTAL PPs:      524 (33536 megabytes)
MAX LVs:            256                      FREE PPs:       0 (0 megabytes)
LVs:                10                       USED PPs:       524 (33536 megabytes)
OPEN LVs:           8                        QUORUM:         2 (Enabled)
TOTAL PVs:          1                        VG DESCRIPTORS: 2
STALE PVs:          0                        STALE PPs:      0
ACTIVE PVs:         1                        AUTO ON:        yes
MAX PPs per VG:     32512
MAX PPs per PV:     1016                     MAX PVs:        32
LTG size (Dynamic): 2048 kilobyte(s)         AUTO SYNC:      no
HOT SPARE:          no                       BB POLICY:      relocatable
[root@aix199 /]#lsvg -l rootvg
rootvg:
LV NAME             TYPE       LPs   PPs   PVs  LV STATE      MOUNT POINT
hd5                 boot       1     1     1    closed/syncd  N/A
hd6                 paging     8     8     1    open/syncd    N/A
hd8                 jfslog     1     1     1    open/syncd    N/A
hd4                 jfs        170   170   1    open/syncd    /
hd2                 jfs        165   165   1    open/syncd    /usr
hd9var              jfs        8     8     1    open/syncd    /var
hd3                 jfs        154   154   1    open/syncd    /tmp
hd1                 jfs        3     3     1    open/syncd    /home
hd10opt             jfs        11    11    1    open/syncd    /opt
lv00                jfs        3     3     1    closed/syncd  /var/adm/csd
[root@aix199 /]#df -m
Filesystem    MB blocks      Free %Used    Iused %Iused Mounted on
/dev/hd4       10880.00  10012.24    8%     5890     1% /
/dev/hd2       10560.00   8354.95   21%    45691     2% /usr
/dev/hd9var     5120.00   4769.67    7%     1145     1% /var
/dev/hd3        9856.00   9404.13    5%      441     1% /tmp
/dev/hd10opt    5120.00   4693.67    9%     4800     1% /opt
/dev/lv_soft    9600.00   8029.12   17%     2522     1% /soft
/dev/hd1        5120.00   4877.34    5%      213     1% /home
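As shown above, the LV behind /home has only 3 PPs allocated (3 x 64 MB = 192 MB), yet the filesystem still reports 5120 MB. The /home filesystem must therefore still be referring to space on the missing PV.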
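The mismatch can be confirmed with simple arithmetic: the LV can hold at most PPs x PP size, and a filesystem that reports more than that is pointing at space the LV no longer has. A small sketch using the figures from the output above (3 PPs, 64 MB PP size, 5120 MB reported by `df -m`); the function and variable names are illustrative.

```shell
#!/bin/sh
# Compare the size a filesystem reports with what its LV can actually hold.
lv_capacity_mb() {
    # $1 = number of PPs in the LV, $2 = PP size in MB
    echo $(( $1 * $2 ))
}

pps=3            # PPs of hd1 according to lsvg -l rootvg
pp_size_mb=64    # PP SIZE according to lsvg rootvg
fs_size_mb=5120  # "MB blocks" of /dev/hd1 according to df -m

cap=$(lv_capacity_mb "$pps" "$pp_size_mb")
if [ "$fs_size_mb" -gt "$cap" ]; then
    echo "mismatch: filesystem reports ${fs_size_mb} MB, LV holds ${cap} MB"
fi
```

Any write that lands beyond the first 192 MB has nowhere to go, which would be consistent with the I/O error seen when creating a directory.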
8. Back up /home, then remove the /home filesystem and recreate it
[root@aix199 /]#umount /home
[root@aix199 /]#smit rmfs
Once the /home filesystem is removed, the hd1 logical volume is removed along with it
[root@aix199 /]#lsvg -l rootvg
rootvg:
LV NAME             TYPE       LPs   PPs   PVs  LV STATE      MOUNT POINT
hd5                 boot       1     1     1    closed/syncd  N/A
hd6                 paging     8     8     1    open/syncd    N/A
hd8                 jfslog     1     1     1    open/syncd    N/A
hd4                 jfs        170   170   1    open/syncd    /
hd2                 jfs        165   165   1    open/syncd    /usr
hd9var              jfs        8     8     1    open/syncd    /var
hd3                 jfs        154   154   1    open/syncd    /tmp
hd10opt             jfs        11    11    1    open/syncd    /opt
lv00                jfs        3     3     1    open/syncd    /var/adm/csd
9. Recreate the hd1 logical volume and mount it on /home
[root@aix199 /]#lsvg rootvg
VOLUME GROUP:       rootvg                   VG IDENTIFIER:  0008570c00004c0000000144684ecb4c
VG STATE:           active                   PP SIZE:        64 megabyte(s)
VG PERMISSION:      read/write               TOTAL PPs:      524 (33536 megabytes)
MAX LVs:            256                      FREE PPs:       3 (192 megabytes)
LVs:                9                        USED PPs:       521 (33344 megabytes)
OPEN LVs:           8                        QUORUM:         2 (Enabled)
TOTAL PVs:          1                        VG DESCRIPTORS: 2
STALE PVs:          0                        STALE PPs:      0
ACTIVE PVs:         1                        AUTO ON:        yes
MAX PPs per VG:     32512
MAX PPs per PV:     1016                     MAX PVs:        32
LTG size (Dynamic): 2048 kilobyte(s)         AUTO SYNC:      no
HOT SPARE:          no                       BB POLICY:      relocatable
[root@aix199 /]#lsvg -l rootvg
rootvg:
LV NAME             TYPE       LPs   PPs   PVs  LV STATE      MOUNT POINT
hd5                 boot       1     1     1    closed/syncd  N/A
hd6                 paging     8     8     1    open/syncd    N/A
hd8                 jfslog     1     1     1    open/syncd    N/A
hd4                 jfs        170   170   1    open/syncd    /
hd2                 jfs        165   165   1    open/syncd    /usr
hd9var              jfs        8     8     1    open/syncd    /var
hd3                 jfs        154   154   1    open/syncd    /tmp
hd1                 jfs        2     2     1    closed/syncd  N/A
hd10opt             jfs        11    11    1    open/syncd    /opt
lv00                jfs        3     3     1    open/syncd    /var/adm/csd
[root@aix199 /]#mount /home
Verification:
[root@aix199 /]#ls /home
lost+found
[root@aix199 /]#mkdir /home/aa
[root@aix199 /]#ls -l /home
total 16
drwxr-sr-x    2 root     sys             512 May  4 17:54 aa
drwxrwx---    2 root     system          512 May  4 17:53 lost+found
[root@aix199 /]#df -m
Filesystem    MB blocks      Free %Used    Iused %Iused Mounted on
/dev/hd4       10880.00  10012.22    8%     5893     1% /
/dev/hd2       10560.00   8354.95   21%    45691     2% /usr
/dev/hd9var     5120.00   4769.67    7%     1145     1% /var
/dev/hd3        9856.00   9404.12    5%      441     1% /tmp
/dev/hd10opt    5120.00   4693.67    9%     4800     1% /opt
/dev/lv00       5120.00   4959.23    4%       18     1% /var/adm/csd
/dev/lv_soft    9600.00   8029.12   17%     2522     1% /soft
/dev/hd1         128.00    123.94    4%       18     1% /home
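With that, the problem is solved. When adding a PV to rootvg, always use a local disk rather than a disk on a storage array, and when removing a PV, follow the correct procedure!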