The new Impala release supports UDFs, so I deployed a 1.2.3 cluster in the test environment.
Running a test UDF produced the following error:
java.lang.IllegalArgumentException (thrown to indicate that a method has been passed an illegal or inappropriate argument)
This turned out to be a known bug:
https://issues.cloudera.org/browse/IMPALA-791
As the issue states: "Impala 1.2.3 currently doesn't support String as the input and return types. You'll instead have to use Text or BytesWritable."
In other words, in Impala 1.2.3 a UDF's parameters and return value cannot be String; use org.apache.hadoop.io.Text instead (e.g. public Text evaluate(Text input) rather than public String evaluate(String input)).
The Text API documentation:
http://hadoop.apache.org/docs/current/api/org/apache/hadoop/io/Text.html
The important parts:
Constructor:
- Text(String string): construct from a String.
Methods:
- String toString(): convert the Text back to a String.
- void set(String string): set to contain the contents of a String.
- void set(Text other): copy another Text.
- void clear(): clear the string to empty.
Testing the Text class in Eclipse:
```java
package com.hive.myudf;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

import org.apache.hadoop.io.Text;

public class TextTest {

    private static Text schemal = new Text("http://");
    private static Text t = new Text(
            "GET /vips-mobile/router.do?api_key=04e0dd9c76902b1bfc5c7b3bb4b1db92&app_version=1.8.7 HTTP/1.0");
    private static Pattern p = null;

    public static void main(String[] args) {
        p = Pattern.compile("(.+?) +(.+?) (.+)");
        Matcher m = p.matcher(t.toString());
        if (m.matches()) {
            // Text contributes its toString() value to string concatenation.
            String tt = schemal.toString() + "test.test.com" + m.group(2);
            System.out.println(tt);
        } else {
            System.out.println("not match");
        }
        schemal.clear();
        t.clear();
    }
}
```
Testing the UDF:
```java
package com.hive.myudf;

import java.net.URL;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;
import org.apache.log4j.Logger;

public class UDFNginxParseUrl extends UDF {

    private static final Logger LOG = Logger.getLogger(UDFNginxParseUrl.class);

    private Text schemal = new Text("http://");
    private Pattern p1 = null;
    private URL url = null;
    private String rt;

    public UDFNginxParseUrl() {
    }

    public Text evaluate(Text host1, Text urlStr, Text partToExtract) {
        LOG.debug("3args|args1:" + host1 + ",args2:" + urlStr + ",args3:" + partToExtract);
        if (host1 == null || urlStr == null || partToExtract == null) {
            return new Text("a");   // debug marker; return null in production
        }
        // The request line looks like "GET /path HTTP/1.0"; group(2) is the path.
        p1 = Pattern.compile("(.+?) +(.+?) (.+)");
        Matcher m1 = p1.matcher(urlStr.toString());
        if (m1.matches()) {
            String realUrl = schemal.toString() + host1.toString() + m1.group(2);
            LOG.debug("realurl:" + realUrl);
            try {
                url = new URL(realUrl);
            } catch (Exception e) {
                return new Text("b");   // debug marker; return null in production
            }
        }
        // Compare on the String value: Text.equals(String) is always false.
        if (url != null && partToExtract.toString().equals("HOST")) {
            rt = url.getHost();
            LOG.debug("get host" + rt);
        }
        return rt == null ? null : new Text(rt);
    }
}
```
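The core of evaluate() — splitting the request line and pulling the host back out — can be exercised with plain JDK classes, independent of the Hadoop Text wrapper. This is a stand-alone sketch; the RequestLineHost class and hostOf helper are my own names, not part of the original UDF:

```java
import java.net.URL;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RequestLineHost {

    // Mirrors the UDF's logic: take an nginx request line such as
    // "GET /path HTTP/1.0", pick out the path (group 2), prepend scheme and
    // host, and let java.net.URL parse the host back out.
    // Returns null when the line does not match the expected shape.
    static String hostOf(String host, String requestLine) throws Exception {
        Matcher m = Pattern.compile("(.+?) +(.+?) (.+)").matcher(requestLine);
        if (!m.matches()) {
            return null;
        }
        URL url = new URL("http://" + host + m.group(2));
        return url.getHost();
    }

    public static void main(String[] args) throws Exception {
        // prints "test.test.com"
        System.out.println(hostOf("test.test.com",
                "GET /vips-mobile/router.do?api_key=04e0dd9c76902b1bfc5c7b3bb4b1db92 HTTP/1.0"));
    }
}
```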
A few things to note:
1. Functions are associated with a database.
2. The UDF jar is stored in HDFS.
3. Function metadata is cached by the catalog service.
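Putting those three notes together, registering the UDF in impala-shell looks roughly like this. The database name, HDFS path, and function name below are placeholders, not from the original post; the DDL shape is Impala's CREATE FUNCTION for Java UDFs:

```sql
-- Placeholder names: adjust the database, jar path, and function name.
use udf_test;

-- The jar must already sit in HDFS; the function is created in the current database.
create function nginx_parse_url(string, string, string) returns string
  location 'hdfs:///user/impala/udf/myudf.jar'
  symbol='com.hive.myudf.UDFNginxParseUrl';
```

Because the catalog caches function metadata, swapping the jar in place without dropping and re-creating the function may leave a stale definition in use.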
This article was originally published by caiguangguang on the 51CTO blog: http://blog.51cto.com/caiguangguang/1359312. Please contact the original author for reprint permission.