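A few days ago, while writing the article below, I found that sysbench does not support PostgreSQL libpq bind variables well.
"Making sysbench support PostgreSQL server-side bind variables"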
https://yq.aliyun.com/articles/34870
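So how do we track down the failing code?
Tracing with gdb is one option, but sysbench exits immediately when it tests PostgreSQL libpq bind variables, so attaching to a running process by pid is not practical. Instead, start the program under gdb with the run command. (I had not studied gdb closely before; fortunately a young colleague on the RDS PG kernel team showed me this method. With a dependable team, you can quickly find the right person when a problem comes up.)
For example, to debug the date program: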
gdb date
(gdb) run
Starting program: /bin/date
[Thread debugging using libthread_db enabled]
Thu Apr 28 22:32:24 CST 2016
Program exited normally.
Arguments given after run are passed to the program, exactly as if they had been given to the date command itself:
gdb date
(gdb) run +%F%t
Starting program: /bin/date +%F%t
[Thread debugging using libthread_db enabled]
2016-04-28
Program exited normally.
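sysbench_pg exits as soon as the error occurs, so set a breakpoint first and then run. For example, having worked out roughly which function sysbench_pg is certain to call, set a breakpoint on that function and then single-step. The breakpoint syntax is: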
(gdb) break [<file-name>:]<func-name>
(gdb) break [<file-name>:]<line-num>
Example:
gdb ./sysbench_pg
(gdb) b sb_lua_db_execute
or
(gdb) b script_lua.c:sb_lua_db_execute
Breakpoint 1 at 0x40f130: file script_lua.c, line 851.
(gdb) run --test=lua/oltp_pg1.lua --db-driver=pgsql --pgsql-host=$PGDATA --pgsql-port=1921 --pgsql-user=postgres --pgsql-password=postgres --pgsql-db=postgres --oltp-tables-count=1 --oltp-table-size=1000000 --num-threads=1 --max-time=120 --max-requests=0 --report-interval=1 run
[Thread debugging using libthread_db enabled]
sysbench 0.5: multi-threaded system evaluation benchmark
Running the test with following options:
Number of threads: 1
Report intermediate results every 1 second(s)
Random number generator seed is 0 and will be ignored
[New Thread 0x7ffff7e6c700 (LWP 10898)]
[New Thread 0x7ffff7e5b700 (LWP 10899)]
Threads started!
[Switching to Thread 0x7ffff7e5b700 (LWP 10899)]
Breakpoint 1, sb_lua_db_execute (L=0x8ab080) at script_lua.c:851
851 script_lua.c: No such file or directory.
in script_lua.c
Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.80.2.alios6.x86_64 libaio-0.3.107-10.1.alios6.x86_64
(gdb) n
863 in script_lua.c
(gdb) s
sb_lua_get_context (L=0x8ab080) at script_lua.c:1109
1109 in script_lua.c
Look at the corresponding code:
$vi sysbench/scripting/script_lua.c
:set nu
:1109
1108 sb_lua_ctxt_t *sb_lua_get_context(lua_State *L)
1109 {
Print the values of variables in the current context:
(gdb) p L
$1 = (lua_State *) 0x8ab080
(gdb) p *L
$2 = {next = 0x7ffff00097b0, tt = 8 '\b', marked = 97 'a', status = 0 '\000', top = 0x8ab3a0, base = 0x8ab390, l_G = 0x8ab138, ci = 0x8a20a0, savedpc = 0x8b6d78, stack_last = 0x8ab560, stack = 0x8ab2f0, end_ci = 0x8a2168,
base_ci = 0x8a2050, stacksize = 45, size_ci = 8, nCcalls = 1, hookmask = 0 '\000', allowhook = 1 '\001', basehookcount = 0, hookcount = 0, hook = 0, l_gt = {value = {gc = 0x8aa560, p = 0x8aa560, n = 9086304, b = 9086304}, tt = 5},
env = {value = {gc = 0x8af150, p = 0x8af150, n = 9105744, b = 9105744}, tt = 5}, openupval = 0x0, gclist = 0x0, errorJmp = 0x7ffff7e5ac20, errfunc = 0}
(gdb) p *L->savedpc
$3 = 147525
Keep pressing Enter (an empty line repeats the previous gdb command) until the error is raised at this point:
sb_lua_db_execute (L=0x8ab080) at script_lua.c:943
943 script_lua.c: No such file or directory.
in script_lua.c
(gdb)
942 in script_lua.c
(gdb)
943 in script_lua.c
(gdb)
946 in script_lua.c
(gdb)
945 in script_lua.c
(gdb)
946 in script_lua.c
(gdb)
948 in script_lua.c
(gdb)
lua_error (L=0x8ab080) at lapi.c:957
957 lapi.c: No such file or directory.
in lapi.c
(gdb)
960 in lapi.c
(gdb)
luaG_errormsg (L=0x8ab080) at ldebug.c:600
600 ldebug.c: No such file or directory.
in ldebug.c
(gdb)
601 in ldebug.c
(gdb)
610 in ldebug.c
(gdb)
609 in ldebug.c
(gdb)
610 in ldebug.c
(gdb)
609 in ldebug.c
(gdb)
luaD_throw (L=0x8ab080, errcode=2) at ldo.c:94
94 ldo.c: No such file or directory.
in ldo.c
(gdb)
95 in ldo.c
(gdb)
94 in ldo.c
(gdb)
95 in ldo.c
(gdb)
96 in ldo.c
(gdb)
97 in ldo.c
(gdb)
FATAL: failed to execute function `event': (null)
[Thread 0x7ffff7e5b700 (LWP 11124) exited]
[Thread 0x7ffff7e6c700 (LWP 11123) exited]
Start over, this time setting the breakpoint directly on a line number:
gdb ./sysbench_pg
GNU gdb (GDB) Red Hat Enterprise Linux (7.2-50.1.alios6)
Copyright (C) 2010 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /home/dege.zzz/sysbench/sysbench_pg...done.
(gdb) b script_lua.c:943
Breakpoint 1 at 0x40f2cd: file script_lua.c, line 943.
(gdb) run --test=lua/oltp_pg1.lua --db-driver=pgsql --pgsql-host=$PGDATA --pgsql-port=1921 --pgsql-user=postgres --pgsql-password=postgres --pgsql-db=postgres --oltp-tables-count=1 --oltp-table-size=1000000 --num-threads=1 --max-time=120 --max-requests=0 --report-interval=1 run
Starting program: /home/dege.zzz/sysbench/sysbench_pg --test=lua/oltp_pg1.lua --db-driver=pgsql --pgsql-host=$PGDATA --pgsql-port=1921 --pgsql-user=postgres --pgsql-password=postgres --pgsql-db=postgres --oltp-tables-count=1 --oltp-table-size=1000000 --num-threads=1 --max-time=120 --max-requests=0 --report-interval=1 run
[Thread debugging using libthread_db enabled]
sysbench 0.5: multi-threaded system evaluation benchmark
Running the test with following options:
Number of threads: 1
Report intermediate results every 1 second(s)
Random number generator seed is 0 and will be ignored
[New Thread 0x7ffff7e6c700 (LWP 11347)]
[New Thread 0x7ffff7e5b700 (LWP 11348)]
Threads started!
FATAL: query execution failed: -268398832
[Switching to Thread 0x7ffff7e5b700 (LWP 11348)]
Breakpoint 1, sb_lua_db_execute (L=0x8ab080) at script_lua.c:943
943 script_lua.c: No such file or directory.
in script_lua.c
Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.80.2.alios6.x86_64 libaio-0.3.107-10.1.alios6.x86_64
(gdb) n
942 in script_lua.c
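The corresponding code follows. The FATAL message is printed before the breakpoint at line 943 is hit, which suggests the failure happens inside the db_execute() call on line 942.
The problem is most likely in the type handling.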
908   /* Rebind if needed */
909   if (needs_rebind)
910   {
911     binds = (db_bind_t *)calloc(stmt->nparams, sizeof(db_bind_t));
912     if (binds == NULL)
913       luaL_error(L, "Memory allocation failure");
914
915     for (i = 0; i < stmt->nparams; i++)
916     {
917       param = stmt->params + i;
918       binds[i].type = param->type;
919       binds[i].is_null = &param->is_null;
920       if (*binds[i].is_null != 0)
921         continue;
922       switch (param->type)
923       {
924         case DB_TYPE_INT:
925           binds[i].buffer = param->buf;
926           break;
927         case DB_TYPE_CHAR:
928           binds[i].buffer = param->buf;
929           binds[i].data_len = &stmt->params[i].buflen;
930           binds[i].is_null = 0;
931           break;
932         default:
933           luaL_error(L, "Unsupported variable type");
934       }
935     }
937     if (db_bind_param(stmt->ptr, binds, stmt->nparams))
938       luaL_error(L, "db_bind_param() failed");
939     free(binds);
940   }
941
942   ptr = db_execute(stmt->ptr);
943   if (ptr == NULL)
944   {
945     stmt->rs = NULL;
946     if (ctxt->con->db_errno == SB_DB_ERROR_DEADLOCK)
947       lua_pushnumber(L, SB_DB_RESTART_TRANSACTION);
For detailed gdb usage, refer to the gdb manual.