Analyzing and Pinpointing a Production FS (FreeSWITCH) Crash: an ldns Library Issue
– by yine 2018-04-10 15:33:05

I. Time of the Failure
2018-04-10 09:54:07

II. Backtrace
warning: .dynamic section for "/usr/lib/x86_64-linux-gnu/librtmp.so.1" is not at the expected address (wrong library or version mismatch?)
warning: .dynamic section for "/usr/lib/libldns.so.1" is not at the expected address (wrong library or version mismatch?)
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `/usr/bin/freeswitch -nc -nonat -nosql -u popo -g netease'.
Program terminated with signal SIGABRT, Aborted.
#0  0x00007f2388bc9067 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
56      ../nptl/sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt full
#0  0x00007f2388bc9067 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
        resultvar = 0
        pid = 28593
        selftid = 20809
#1  0x00007f2388bca448 in __GI_abort () at abort.c:89
        save_stage = 2
        act = {__sigaction_handler = {sa_handler = 0x3030303030207078, sa_sigaction = 0x3030303030207078}, sa_mask = {__val = {3475143045726351408, 2314885530819502128, 2314885530818453536, 8319937555149627424, 746872325959545721, 3775530756625032759, 3631650816742404144, 3472329422401517619, 3467895374536122416, 2319406791620833328, 3761104034442405222, 2314885530819704883, 2314885530818453536, 2314885530818453536, 4069054363051241248, 139789281265312}}, sa_flags = 65, sa_restorer = 0x7f233a740700}
        sigs = {__val = {32, 0 <repeats 15 times>}}
#2  0x00007f2388c071b4 in __libc_message (do_abort=do_abort@entry=2, fmt=fmt@entry=0x7f2388cf9cb3 "*** %s ***: %s terminated\n") at ../sysdeps/posix/libc_fatal.c:175
        ap = {{gp_offset = 32, fp_offset = 32547, overflow_arg_area = 0x7f233a740710, reg_save_area = 0x7f233a7406a0}}
        fd = 2
        on_2 =
        list =
        nlist =
        cp =
        written =
#3  0x00007f2388c8caa7 in __GI___fortify_fail (msg=msg@entry=0x7f2388cf9c4a "buffer overflow detected") at fortify_fail.c:31
No locals.
#4  0x00007f2388c8acc0 in __GI___chk_fail () at chk_fail.c:28
No locals.
#5  0x00007f2388c8ca17 in __fdelt_chk (d=) at fdelt_chk.c:25
No locals.
#6  0x00007f23822184c5 in ?? () from /usr/lib/libldns.so.1
No symbol table info available.
#7  0x0000000000000000 in ?? ()
No symbol table info available.
(gdb)
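
Frames #2 to #5 are glibc's _FORTIFY_SOURCE machinery: in a fortified build, FD_SET, FD_CLR and FD_ISSET pass a non-constant descriptor through __fdelt_chk, which calls __chk_fail (aborting with "buffer overflow detected") when the descriptor is negative or not below FD_SETSIZE. The C sketch below is a simplified model of that range check, not glibc's source; fdelt_model and fd_set_words are hypothetical names, and it assumes the glibc layout in which fd_set is an array of long words.

```c
#include <assert.h>
#include <sys/select.h>  /* fd_set, FD_SETSIZE */

/* glibc stores fd_set as an array of long int words (__fd_mask). */
#define BITS_PER_WORD (8 * (long)sizeof(long))

/*
 * Simplified model of the range check in glibc's __fdelt_chk (not glibc's code).
 * Returns the fd_set word index for fd, or -1 where the fortified
 * FD_SET/FD_CLR/FD_ISSET would call __chk_fail() and abort with
 * "buffer overflow detected".
 */
long fdelt_model(long fd)
{
    if (fd < 0 || fd >= FD_SETSIZE)
        return -1;
    return fd / BITS_PER_WORD;
}

/* Number of long words actually reserved in a default fd_set. */
long fd_set_words(void)
{
    assert(sizeof(fd_set) % sizeof(long) == 0);
    return (long)(sizeof(fd_set) / sizeof(long));
}
```

On x86_64 a default fd_set holds 1024 bits in 16 words, so descriptor 1023 maps to word 15 while descriptor 1024 has no word at all; that is the situation libldns ran into once FreeSWITCH had more than 1024 descriptors open.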

III. FreeSWITCH Log
popo@hzadg-ysf-01:~/DATA/logs/freeswitch$ grep "8e660ca2-d28a-4f09-a6f7-260bd25b75f4" freeswitch.log
8e660ca2-d28a-4f09-a6f7-260bd25b75f4 2018-04-10 09:54:10.237472 [NOTICE] switch_channel.c:1104 New Channel sofia/internal/test@59.111.165.135:53 [8e660ca2-d28a-4f09-a6f7-260bd25b75f4]
8e660ca2-d28a-4f09-a6f7-260bd25b75f4 2018-04-10 09:54:10.357473 [INFO] mod_dialplan_xml.c:637 Processing test ->test in context default
8e660ca2-d28a-4f09-a6f7-260bd25b75f4 2018-04-10 09:54:10.377454 [NOTICE] switch_ivr.c:2172 Transfer sofia/internal/test@59.111.165.135:53 to enum[test@default]
popo@hzadg-ysf-01:~/DATA/logs/freeswitch$
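
IV. Locating the Problem
The backtrace shows that the abort was raised from inside libldns, through glibc's fortify check __fdelt_chk ("buffer overflow detected"), which guards the fd_set macros used with select(). The FS log shows that the last step before the crash was the transfer to enum[test@default], i.e. the enum application of the mod_enum module. After a lot of searching, I finally found some clues: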
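
First, I found that someone had already filed this JIRA ticket against FS: freeswitch.org/jira/browse…

The FS developers then reported the problem to the ldns author: www.nlnetlabs.nl/bugs-script…

The ldns author made this patch: www.nlnetlabs.nl/bugs-script…

V. Solution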
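As the ldns author suggested, one option is to define the FD_SETSIZE macro as the larger limit you need. Be aware, though, that with glibc the fd_set type is sized by the internal __FD_SETSIZE (1024) and the fortify check __fdelt_chk compares against that same limit, so redefining FD_SETSIZE in application code generally does not lift the limit on Linux. The ldns author's explanation: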
From your back trace I see that the crash happens in ldns_sock_wait which uses select to wait for a socket to become readable or writable. The maximum number of sockets fed to select is FD_SETSIZE which is 1024 by default. In the issue report I read that this crash only occurs when the number of file descriptors in use is more than 1024.
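
To make the limit concrete, here is a minimal select()-based wait in C, written in the spirit of the ldns_sock_wait function the ldns author mentions. It is not the ldns source: select_wait and its parameters are hypothetical. The explicit range check marks where a select()-based wait has to give up; without it, FD_SET on descriptor 1024 or above is exactly what aborts a fortified build.

```c
#include <assert.h>      /* used by the usage example */
#include <errno.h>
#include <sys/select.h>  /* select(), fd_set, FD_SETSIZE */
#include <sys/time.h>    /* struct timeval */
#include <unistd.h>      /* pipe(), write(), close() in the usage example */

/*
 * Hypothetical select()-based wait in the spirit of ldns_sock_wait
 * (NOT the ldns source). Waits until fd is readable, or writable when
 * want_write is non-zero. timeout_ms must be >= 0.
 * Returns 1 when ready, 0 on timeout, -1 on error (errno is set).
 * EINTR is not retried, to keep the sketch short.
 */
int select_wait(int fd, int want_write, int timeout_ms)
{
    fd_set set;
    struct timeval tv;
    int rc;

    /* An fd_set can only describe descriptors 0..FD_SETSIZE-1 (1024 by
       default). FD_SET on a larger descriptor is what aborted FreeSWITCH. */
    if (fd < 0 || fd >= FD_SETSIZE) {
        errno = EINVAL;
        return -1;
    }

    FD_ZERO(&set);
    FD_SET(fd, &set);
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    if (want_write)
        rc = select(fd + 1, NULL, &set, NULL, &tv);
    else
        rc = select(fd + 1, &set, NULL, NULL, &tv);

    if (rc < 0)
        return -1;
    return rc > 0 ? 1 : 0;
}
```

For example, after pipe(p) and write(p[1], "x", 1), select_wait(p[0], 0, 10) returns 1, while select_wait(FD_SETSIZE, 0, 10) returns -1 with errno set to EINVAL instead of aborting the whole process.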
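
In the end, the problem was solved by upgrading the ldns library on Debian 8 directly to version 1.7.0.
The entry "bugfix #678: Use poll i.s.o. select to support > 1024 fds" (i.s.o. = instead of) in git.nlnetlabs.nl/ldns/tree/C… is the fix for this bug.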
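
The fix replaces select() with poll(). A minimal C sketch of the same kind of wait built on poll(), under the same caveats: poll_wait is a hypothetical helper illustrating the idea behind bugfix #678, not the ldns 1.7.0 code. Because poll() takes an array of struct pollfd instead of a fixed-size bitmap, the descriptor number itself has no FD_SETSIZE cap.

```c
#include <assert.h>  /* used by the usage example */
#include <errno.h>
#include <poll.h>
#include <unistd.h>  /* pipe(), write(), dup2(), close() in the usage example */

/*
 * Hypothetical poll()-based wait (NOT the ldns 1.7.0 source).
 * Waits until fd is readable, or writable when want_write is non-zero.
 * Returns 1 when ready (including error/hang-up conditions the caller will
 * see on the next read/write), 0 on timeout, -1 on error (errno is set).
 */
int poll_wait(int fd, int want_write, int timeout_ms)
{
    struct pollfd pfd;
    int rc;

    /* poll() silently ignores negative descriptors, so reject them here. */
    if (fd < 0) {
        errno = EBADF;
        return -1;
    }

    pfd.fd = fd;                      /* any valid descriptor, even >= 1024 */
    pfd.events = want_write ? POLLOUT : POLLIN;
    pfd.revents = 0;

    rc = poll(&pfd, 1, timeout_ms);
    if (rc < 0)
        return -1;
    if (rc == 0)
        return 0;                     /* timed out */

    /* POLLNVAL is reported even if not requested: fd was not open. */
    if (pfd.revents & POLLNVAL) {
        errno = EBADF;
        return -1;
    }
    return 1;
}
```

With a pipe, poll_wait(p[0], 0, 10) returns 0 while the pipe is empty and 1 after a byte has been written. A descriptor moved above 1024 with dup2() (which needs a high enough RLIMIT_NOFILE) is handled the same way, which is precisely the case the select() version cannot cover.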

However, ldns 1.7.0 is not available in the Debian 8 release, whose newest packaged version is 1.6.18, so the dependency had to be built by hand.

First, go to /usr/lib/freeswitch/mod and check how mod_enum.so depends on ldns:
/usr/lib/freeswitch/mod# ldd mod_enum.so
linux-vdso.so.1 (0x00007ffde5fc5000)
libldns.so.1 => /usr/lib/libldns.so.1 (0x00007f1c4e9f6000)
The second entry is mod_enum.so's dependency on libldns.so.1.

VI. Replacing the Library Without Polluting the System
How to replace an underlying dependency of the call center media service

Download the ldns and OpenSSL source code
- Reference: www.linuxfromscratch.org/blfs/view/s…
- cd /home/popo/freeswitch/src
- wget www.nlnetlabs.nl/downloads/l…
- wget www.openssl.org/source/open…

Install OpenSSL
- cd /home/popo/freeswitch/bin && mkdir openssl-1.1.0c && mkdir ldns-1.7.0
- Build the newer OpenSSL, running the following from inside the unpacked OpenSSL source directory:
./config --prefix=/home/popo/freeswitch/bin/openssl-1.1.0c/openssl --openssldir=/home/popo/freeswitch/bin/openssl-1.1.0c/ssl && make && make install

Install the newer ldns library
- tar zxvf ldns-1.7.0.tar.gz && cd ldns-1.7.0
- ./configure --prefix=/home/popo/freeswitch/bin/ldns-1.7.0 --with-ssl=/home/popo/freeswitch/bin/openssl-1.1.0c/openssl && make && make install
- cd /home/popo/freeswitch/bin/ldns-1.7.0/lib
- ln -s libldns.so.2.0.0 libldns.so.1

Configure the user's environment variables
- cd ~
- vim .profile
- Open the file and add the following:
PATH=/home/popo/freeswitch/bin/openssl-1.1.0c/openssl/bin:$PATH

LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/home/popo/freeswitch/bin/openssl-1.1.0c/openssl/lib
LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/home/popo/freeswitch/bin/ldns-1.7.0/lib

export PATH LD_LIBRARY_PATH

- Run . .profile to apply the changes
- Check that the new OpenSSL is picked up: openssl version
- Check that the environment variables are set: env
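
One detail in the .profile lines above: they append to $LD_LIBRARY_PATH, so if the variable was empty beforehand the result starts with a colon (":/home/popo/..."). glibc's dynamic loader treats such an empty entry as the current working directory, so shared libraries would also be searched for in whatever directory the process runs in. A small C sketch for checking a value such as the one env prints; count_path_entries is a hypothetical helper, not part of any library.

```c
#include <assert.h>  /* used by the usage example */
#include <string.h>  /* strchr(), strlen(), size_t */

/*
 * Count the components of a colon-separated search path such as
 * LD_LIBRARY_PATH, and how many of them are empty (leading, trailing or
 * doubled colons). Only ':' is handled here; glibc also accepts ';'.
 */
void count_path_entries(const char *path, int *total, int *empty)
{
    const char *start = path;

    *total = 0;
    *empty = 0;
    if (path == NULL || *path == '\0')
        return;  /* unset or empty variable: no entries at all */

    for (;;) {
        const char *colon = strchr(start, ':');
        size_t len = colon ? (size_t)(colon - start) : strlen(start);

        (*total)++;
        if (len == 0)
            (*empty)++;  /* empty entry: treated as the current directory */
        if (colon == NULL)
            break;
        start = colon + 1;
    }
}
```

For ":/home/popo/freeswitch/bin/ldns-1.7.0/lib" the helper reports 2 entries, 1 of them empty. Writing the assignment as LD_LIBRARY_PATH=${LD_LIBRARY_PATH:+$LD_LIBRARY_PATH:}/home/popo/... avoids the leading colon, because the :+ expansion only adds the old value and its separator when the variable is non-empty.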

Restart FS so that mod_enum picks up the new ldns library
- sudo /etc/freeswitch restart
- ldd /usr/lib/freeswitch/mod/mod_enum.so
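
A caveat on this ldd check: ldd run in the popo shell resolves libraries with that shell's LD_LIBRARY_PATH, which is not necessarily the environment the restarted FreeSWITCH process received (sudo, for instance, resets the environment by default). A more direct check is to look at what the running process actually mapped, through the Linux-specific /proc/<pid>/maps file. A minimal C sketch, with find_mapped_library as a hypothetical helper; reading another process's maps file may require sudo.

```c
#include <assert.h>     /* used by the usage example */
#include <stdio.h>
#include <string.h>
#include <sys/types.h>  /* pid_t */
#include <unistd.h>     /* getpid() in the usage example */

/*
 * Scan /proc/<pid>/maps for a mapped file whose path contains needle
 * (e.g. "libldns") and copy the first match into out.
 * Returns 1 if found, 0 if not found, -1 if the maps file cannot be
 * opened (no such process, or insufficient permission).
 */
int find_mapped_library(pid_t pid, const char *needle, char *out, size_t outlen)
{
    char maps_path[64];
    char line[4096];
    FILE *fp;
    int found = 0;

    snprintf(maps_path, sizeof maps_path, "/proc/%ld/maps", (long)pid);
    fp = fopen(maps_path, "r");
    if (fp == NULL)
        return -1;

    while (fgets(line, sizeof line, fp) != NULL) {
        /* Fields: address perms offset dev inode [pathname]. Only the
           pathname contains '/', and pseudo entries like [heap] have none. */
        char *path = strchr(line, '/');
        if (path == NULL || strstr(path, needle) == NULL)
            continue;
        path[strcspn(path, "\n")] = '\0';
        if (outlen > 0)
            snprintf(out, outlen, "%s", path);
        found = 1;
        break;
    }
    fclose(fp);
    return found;
}
```

For example, find_mapped_library(fs_pid, "libldns", buf, sizeof buf) returning 1 with a path under /home/popo/freeswitch/bin/ldns-1.7.0/lib would confirm that FreeSWITCH really loaded the new library (the maps file shows the resolved file, libldns.so.2.0.0, rather than the libldns.so.1 symlink); fs_pid here stands for whatever pidof freeswitch reports.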

Rollback
- Remove the lines added to .profile
- Restart FS to restore the original library dependency: sudo /etc/freeswitch restart

Done!