在debug hive的问题时,我们经常需要在源码里面增加必要的日志信息,来跟踪变量的变化或者是方法执行的情况,性能等。。hive里面有下面几个日志的实现方式。
1种是常用的通过
1
|
org.apache.commons.logging.Log
|
和
1
|
org.apache.commons.logging.LogFactory
|
类实现的。
比如要打印类的info信息时,只需要下面这样就可以。
1
2
|
static
final
private
Log LOG = LogFactory.getLog(CLASS_NAME)
LOG.info(
"xxxxxx"
);
|
如果要跟踪一个方法的执行时间,只需要在在方法调用前后各加一条日志信息记录当前时间戳,然后取差值即可。
另一种是LogHelper
1
|
org.apache.hadoop.hive.ql.session.SessionState.LogHelper
|
LogHelper类是SessionState的内部类
输出日志最终调用的是LOG.xxx和对应日志输出的println方法(比如System.out.println/log4j等)
但是日志只有两个级别,info和error。
以printInfo方法为例,具体实现:
1
2
3
4
5
6
|
public
void
printInfo(String info, String detail) {
if
(!getIsSilent()) {
getInfoStream().println(info);
}
LOG.info(info + StringUtils.defaultString(detail));
//调用LOG.info(即第一种方法)
}
|
使用的sample:
1
2
|
static
final
private
LogHelper console =
new
LogHelper (LOG);
console.printError(
"Authorization failed:"
+ authExp.getMessage() +
". Use SHOW GRANT to get more details."
);
|
还有一种重要的类是PerfLogger类,可以实现记录方法执行时间的信息,在一些情况下可以取代第一种记录性能的方法。
PerfLogger类主要是用在Driver类的compile方法中,用来记录各个step使用的时间。
看下具体实现:
首先声明存储starttime和endtime的map:
1
2
|
protected
final
Map<String, Long> startTimes =
new
HashMap<String, Long>();
protected
final
Map<String, Long> endTimes =
new
HashMap<String, Long>();
|
主要方法:
1).构造方法可以传入一个boolean参数,控制是否reset(默认是false)
1
2
3
4
5
6
7
8
9
10
11
|
protected
static
final
ThreadLocal<PerfLogger> perfLogger =
new
ThreadLocal<PerfLogger>();
public
static
PerfLogger getPerfLogger(
boolean
resetPerfLogger) {
if
(SessionState.get() ==
null
) {
if
(perfLogger.get() ==
null
|| resetPerfLogger) {
perfLogger.set(
new
PerfLogger());
}
return
perfLogger.get();
}
else
{
return
SessionState.get().getPerfLogger(resetPerfLogger);
}
}
|
2).PerfLogBegin 用来记录开始时间,主要是使用了LOG.info(第一种记录日志的方法)
1
2
3
4
5
|
public
void
PerfLogBegin(String callerName, String method) {
long
startTime = System.currentTimeMillis();
LOG.info(
"<PERFLOG method="
+ method +
" from="
+ callerName +
">"
);
startTimes.put(method,
new
Long(startTime));
//将方法和对应的startTime放在startTimes的map中
}
|
3).相应的PerfLogEnd用来记录结束时间
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
|
public
long
PerfLogEnd(String callerName, String method) {
Long startTime = startTimes.get(method);
long
endTime = System. currentTimeMillis();
long
duration = -
1
;
endTimes.put(method,
new
Long(endTime));
StringBuilder sb =
new
StringBuilder(
"</PERFLOG method="
).append(method);
if
(startTime !=
null
) {
sb.append(
" start="
).append(startTime);
}
sb.append(
" end="
).append(endTime);
if
(startTime !=
null
) {
duration = endTime - startTime.longValue();
sb.append(
" duration="
).append(duration);
}
sb.append(
" from="
).append(callerName).append(
">"
);
LOG.info(sb);
return
duration;
}
|
getStartTime和getEndTime分别通过对应的map来获取starttime和endtime
4.是不是所有的method都可以用这种方法获取执行的性能?答案是no,在PerfLogger类中定义了部分属性,对应于method
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
|
public
static
final
String ACQUIRE_READ_WRITE_LOCKS =
"acquireReadWriteLocks"
;
public
static
final
String COMPILE =
"compile"
;
public
static
final
String PARSE =
"parse"
;
public
static
final
String ANALYZE =
"semanticAnalyze"
;
public
static
final
String DO_AUTHORIZATION =
"doAuthorization"
;
public
static
final
String DRIVER_EXECUTE =
"Driver.execute"
;
public
static
final
String INPUT_SUMMARY =
"getInputSummary"
;
public
static
final
String GET_SPLITS =
"getSplits"
;
public
static
final
String RUN_TASKS =
"runTasks"
;
public
static
final
String SERIALIZE_PLAN =
"serializePlan"
;
public
static
final
String DESERIALIZE_PLAN =
"deserializePlan"
;
public
static
final
String CLONE_PLAN =
"clonePlan"
;
public
static
final
String TASK =
"task."
;
public
static
final
String RELEASE_LOCKS =
"releaseLocks"
;
public
static
final
String PRUNE_LISTING =
"prune-listing"
;
public
static
final
String PARTITION_RETRIEVING =
"partition-retrieving"
;
public
static
final
String PRE_HOOK =
"PreHook."
;
public
static
final
String POST_HOOK =
"PostHook."
;
public
static
final
String FAILURE_HOOK =
"FailureHook."
;
public
static
final
String DRIVER_RUN =
"Driver.run"
;
public
static
final
String TIME_TO_SUBMIT =
"TimeToSubmit"
;
public
static
final
String TEZ_SUBMIT_TO_RUNNING =
"TezSubmitToRunningDag"
;
public
static
final
String TEZ_BUILD_DAG =
"TezBuildDag"
;
public
static
final
String TEZ_SUBMIT_DAG =
"TezSubmitDag"
;
public
static
final
String TEZ_RUN_DAG =
"TezRunDag"
;
public
static
final
String TEZ_CREATE_VERTEX =
"TezCreateVertex."
;
public
static
final
String TEZ_RUN_VERTEX =
"TezRunVertex."
;
public
static
final
String TEZ_INITIALIZE_PROCESSOR =
"TezInitializeProcessor"
;
public
static
final
String TEZ_RUN_PROCESSOR =
"TezRunProcessor"
;
public
static
final
String TEZ_INIT_OPERATORS =
"TezInitializeOperators"
;
public
static
final
String LOAD_HASHTABLE =
"LoadHashtable"
;
public
static
final
String ORC_GET_SPLITS =
"OrcGetSplits"
;
|
再来看一下其调用方式:
1
2
3
4
5
6
7
8
9
10
|
比如org.apache.hadoop.hive.ql.Driver类中的compile方法:
static
final
private
String CLASS_NAME = Driver.
class
.getName();
public
int
compile(String command,
boolean
resetTaskIds) {
PerfLogger perfLogger = PerfLogger.getPerfLogger();
//实例化一个PerfLogger的对象
.......
perfLogger.PerfLogBegin( CLASS_NAME, PerfLogger.PARSE);
//记录开始时间,传入本类的类名记录的方法(方法对应了PerfLogger类中对应的属性)
ParseDriver pd =
new
ParseDriver();
ASTNode tree = pd.parse(command, ctx);
tree = ParseUtils.findRootNonNullToken(tree);
perfLogger.PerfLogEnd( CLASS_NAME, PerfLogger.PARSE);
|
记录parse消耗时间的对应日志:
1
2
3
|
14
/
07
/
31
14
:
33
:
21
INFO log.PerfLogger: <PERFLOG method=parse from=org.apache.hadoop.hive.ql.Driver>
.....
14
/
07
/
31
14
:
33
:
21
INFO log.PerfLogger: </PERFLOG method=parse start=
1406788401914
end=
1406788401916
duration=
2
from=org.apache.hadoop.hive.ql.Driver>
|
3种方法可以根据需要选择使用。
本文转自菜菜光 51CTO博客,原文链接:http://blog.51cto.com/caiguangguang/1541771,如需转载请自行联系原作者