Using Third-Party Dependency Jar Files in a Hadoop Job

Introduction:

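After we implement a Hadoop MapReduce job, it may depend on many external jar files, and when it runs on a Hadoop cluster it sometimes fails with an exception saying that a particular class cannot be found. This problem almost always means that, while the job was executing, the corresponding jar file could not be found in the execution context. (More precisely, what is missing is the unpacked jar directory that contains the class files.) The natural remedy is to configure the classpath correctly so that the MapReduce job can find these classes at run time.
There are two good ways to do this. The first is to set HADOOP_CLASSPATH: the jar files the job depends on are added to HADOOP_CLASSPATH on the job's launch command, so the setting applies only to that job and does not persist after the command finishes. The second is to package the dependency jars together with the job code into a single jar at build time. This can make the final jar fairly large, but combined with a build tool such as Maven it lets each job keep its own dependency configuration, which is easy to manage. Both approaches are described below.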
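To make the failure mode concrete, here is a minimal, stdlib-only Java sketch (Java 9+) that asks whether a class can be resolved from an explicit list of jars. It is an illustration, not Hadoop code: the helper name ClassVisibilityCheck is hypothetical, and org.apache.zookeeper.ZooKeeper is used only as an example of a dependency class. A class loader that is not given the jar behaves like a task JVM whose classpath lacks the dependency.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Path;
import java.util.List;

/** Hypothetical helper: can a class be loaded from these jars (plus the JDK)? */
final class ClassVisibilityCheck {

    static boolean isLoadable(String className, List<Path> jars) {
        URL[] urls = jars.stream().map(ClassVisibilityCheck::toUrl).toArray(URL[]::new);
        // The parent is the platform class loader, so the application classpath is NOT
        // consulted: only JDK classes and the listed jars are visible.
        try (URLClassLoader loader = new URLClassLoader(urls, ClassLoader.getPlatformClassLoader())) {
            Class.forName(className, false, loader); // load without running static initializers
            return true;
        } catch (ClassNotFoundException e) {
            return false; // the situation a job hits when a dependency jar is missing
        } catch (IOException e) {
            throw new UncheckedIOException(e); // thrown only if closing the loader fails
        }
    }

    private static URL toUrl(Path jar) {
        try {
            return jar.toUri().toURL();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException(jar.toString(), e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isLoadable("org.apache.zookeeper.ZooKeeper", List.of())); // false
        System.out.println(isLoadable("java.lang.String", List.of()));               // true
    }
}
```

Passing the path of a real zookeeper jar in the list should make the first call return true; the empty list reproduces the "class not found" situation described above.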

Setting HADOOP_CLASSPATH

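Take, for example, an application that operates on tables in an HBase database. It necessarily needs ZooKeeper as well, so the locations of all the corresponding jar files must be set correctly so that the job can find and load them at run time.
Hadoop provides a helper class, org.apache.hadoop.util.GenericOptionsParser, that handles generic command-line options such as -libjars and makes it easy to add the specified files to the job's classpath.
Here is an example we implemented. The code of the program's entry class is as follows: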
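The idea behind this helper can be illustrated with a deliberately simplified, hypothetical sketch. LibJarsArgsSketch is not GenericOptionsParser: it understands only -libjars and ignores every other generic option. It merely shows how a leading generic option with a comma-separated value is split off from the application's own arguments, which is why generic options are placed before arguments such as <tableName> <input>.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical, heavily simplified illustration of splitting leading generic options
 * from application arguments. NOT Hadoop's GenericOptionsParser.
 */
final class LibJarsArgsSketch {

    final List<String> libJars = new ArrayList<>();
    final List<String> remaining = new ArrayList<>();

    LibJarsArgsSketch(String[] args) {
        int i = 0;
        // Only options at the front of the argument list are treated as generic options.
        while (i + 1 < args.length && args[i].equals("-libjars")) {
            // The option value is a comma-separated list of jar paths.
            libJars.addAll(Arrays.asList(args[i + 1].split(",")));
            i += 2;
        }
        // Everything after the leading options belongs to the application itself.
        remaining.addAll(Arrays.asList(args).subList(i, args.length));
    }
}
```

With the real parser, the application arguments are obtained through getRemainingArgs(), as the DataImporter class does.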

package org.shirdrn.kodz.inaction.hbase.job.importing;

import java.io.IOException;
import java.net.URISyntaxException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableOutputFormat;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

/**
 * Table DDL: create 't_sub_domains', 'cf_basic', 'cf_status'
 * <pre>
 * cf_basic:domain cf_basic:len
 * cf_status:status cf_status:live
 * </pre>
 *
 * @author shirdrn
 */
public class DataImporter {

    public static void main(String[] args)
            throws IOException, InterruptedException, ClassNotFoundException, URISyntaxException {

        // Hadoop configuration plus the HBase settings (hbase-site.xml etc.)
        Configuration conf = HBaseConfiguration.create();
        // GenericOptionsParser consumes generic options such as -libjars and returns the rest
        String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();

        if (otherArgs.length < 2) {
            System.err.println("Usage: \n" +
                    "  DataImporter -libjars <jar1>[,<jar2>...[,<jarN>]] <tableName> <input>");
            System.exit(1);
        }
        String tableName = otherArgs[0].trim();
        String input = otherArgs[1].trim();

        // Column family and qualifiers, passed to the mapper through the configuration
        conf.set("table.cf.family", "cf_basic");
        conf.set("table.cf.qualifier.fqdn", "domain");
        conf.set("table.cf.qualifier.timestamp", "create_at");

        Job job = new Job(conf, "Import into HBase table");
        job.setJarByClass(DataImporter.class);
        // ImportFileLinesMapper (same package, not shown) emits Put records
        job.setMapperClass(ImportFileLinesMapper.class);
        job.setOutputFormatClass(TableOutputFormat.class);

        // Write the map output directly into the target HBase table
        job.getConfiguration().set(TableOutputFormat.OUTPUT_TABLE, tableName);
        job.setOutputKeyClass(ImmutableBytesWritable.class);
        job.setOutputValueClass(Put.class);

        // Map-only job: no reduce phase
        job.setNumReduceTasks(0);

        FileInputFormat.addInputPath(job, new Path(input));

        int exitCode = job.waitForCompletion(true) ? 0 : 1;
        System.exit(exitCode);
    }

}

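Because the arguments are passed through GenericOptionsParser, the -libjars option can be used to specify the third-party jar files the job depends on. The usage is described below: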

  • Step 1: Set the environment variables

Edit the .bashrc file and add the following settings:

export HADOOP_HOME=/opt/stone/cloud/hadoop-1.0.3
export PATH=$PATH:$HADOOP_HOME/bin
export HBASE_HOME=/opt/stone/cloud/hbase-0.94.1
export PATH=$PATH:$HBASE_HOME/bin
export ZK_HOME=/opt/stone/cloud/zookeeper-3.4.3

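Don't forget to make the new settings take effect in the current shell. Either of the following equivalent commands works:

. ~/.bashrc

source ~/.bashrc

After that, the external jar files can be referenced conveniently through these variables.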

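  • Step 2: Determine the list of jar files the job depends on

As mentioned above, because we use HBase, the job needs the HBase and ZooKeeper jar files. They are specified as follows:

HADOOP_CLASSPATH=$HBASE_HOME/hbase-0.94.1.jar:$ZK_HOME/zookeeper-3.4.3.jar ./bin/hadoop jar import-into-hbase.jar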

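Setting HADOOP_CLASSPATH as a prefix of the launch command applies it only to that command, and therefore only to this job, so there is no need to configure it in .bashrc.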
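Before launching, it can help to verify that the listed jars actually exist, because a typo in a path only surfaces later as a missing class. The following is a sketch of how such a check might look; the class DependencyJars is hypothetical, and the jar names are simply the versions used in this article's commands.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

/** Hypothetical pre-launch check for the dependency jars used in this article. */
final class DependencyJars {

    /** Builds the jar list from an environment map, e.g. System.getenv(). */
    static List<Path> resolve(Map<String, String> env) {
        String hbaseHome = require(env, "HBASE_HOME");
        String zkHome = require(env, "ZK_HOME");
        return List.of(
                Path.of(hbaseHome, "hbase-0.94.1.jar"),
                Path.of(hbaseHome, "lib", "protobuf-java-2.4.0a.jar"),
                Path.of(zkHome, "zookeeper-3.4.3.jar"));
    }

    /** Returns the jars that do not exist as regular files. */
    static List<Path> missing(List<Path> jars) {
        return jars.stream().filter(p -> !Files.isRegularFile(p)).collect(Collectors.toList());
    }

    private static String require(Map<String, String> env, String name) {
        String value = env.get(name);
        if (value == null || value.isEmpty()) {
            throw new IllegalStateException(name + " is not set; was ~/.bashrc sourced?");
        }
        return value;
    }
}
```

A launcher could call DependencyJars.missing(DependencyJars.resolve(System.getenv())) and stop if the result is not empty. Reading the variables through System.getenv() only works if .bashrc was sourced in the shell that starts the JVM.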

  • Step 3: Run the job

To run the job, set HADOOP_CLASSPATH on the command line and use the -libjars option to list the third-party jar files the job depends on. The launch command is:

xiaoxiang@ubuntu3:~/hadoop$ HADOOP_CLASSPATH=$HBASE_HOME/hbase-0.94.1.jar:$ZK_HOME/zookeeper-3.4.3.jar ./bin/hadoop jar import-into-hbase.jar org.shirdrn.kodz.inaction.hbase.job.importing.DataImporter -libjars $HBASE_HOME/hbase-0.94.1.jar,$HBASE_HOME/lib/protobuf-java-2.4.0a.jar,$ZK_HOME/zookeeper-3.4.3.jar t_sub_domains /user/xiaoxiang/datasets/domains/

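Note that the entries in the environment variable are separated by colons, whereas the entries of the -libjars option are separated by commas. The two settings also serve different purposes: HADOOP_CLASSPATH adds the jars to the classpath of the local client JVM that runs main() (needed here because main() itself calls HBase classes such as HBaseConfiguration), while -libjars tells Hadoop to ship the listed jars with the job so that the map tasks on the cluster can load them.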
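The two separators, and the fact that HADOOP_CLASSPATH is scoped to a single command, can be shown with a small Java sketch that builds the Step 3 command with ProcessBuilder without running it. JobLauncher and its parameters are hypothetical; only java.lang.ProcessBuilder is a real API here. ProcessBuilder.environment() is a copy used only for the child process, so setting HADOOP_CLASSPATH there does not affect the calling process, much like prefixing the variable on the shell command line.

```java
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical builder for the "hadoop jar ... -libjars ..." launch command. */
final class JobLauncher {

    /** HADOOP_CLASSPATH entries are separated by ':' (the hadoop script runs on Linux). */
    static String hadoopClasspath(List<Path> jars) {
        return jars.stream().map(Path::toString).collect(Collectors.joining(":"));
    }

    /** The -libjars value is a comma-separated list. */
    static String libJars(List<Path> jars) {
        return jars.stream().map(Path::toString).collect(Collectors.joining(","));
    }

    static ProcessBuilder build(String hadoopBin, String jobJar, String mainClass,
                                List<Path> clientJars, List<Path> taskJars, List<String> appArgs) {
        List<String> cmd = new ArrayList<>(List.of(hadoopBin, "jar", jobJar, mainClass));
        // Generic options go before the application's own arguments.
        cmd.add("-libjars");
        cmd.add(libJars(taskJars));
        cmd.addAll(appArgs);
        ProcessBuilder pb = new ProcessBuilder(cmd);
        // environment() is a per-child copy; the parent's environment stays unchanged.
        pb.environment().put("HADOOP_CLASSPATH", hadoopClasspath(clientJars));
        return pb;
    }
}
```

To actually launch the job, one could call inheritIO().start() on the returned builder and then waitFor() on the resulting Process.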

With this, the job runs correctly.
Here is the output of the run:

13/04/10 22:03:32 INFO zookeeper.ZooKeeper: Client environment:zookeeper.version=3.4.3-1240972, built on 02/06/2012 10:48 GMT
13/04/10 22:03:32 INFO zookeeper.ZooKeeper: Client environment:host.name=ubuntu3
13/04/10 22:03:32 INFO zookeeper.ZooKeeper: Client environment:java.version=1.6.0_30
13/04/10 22:03:32 INFO zookeeper.ZooKeeper: Client environment:java.vendor=Sun Microsystems Inc.
13/04/10 22:03:32 INFO zookeeper.ZooKeeper: Client environment:java.home=/usr/java/jdk1.6.0_30/jre
13/04/10 22:03:32 INFO zookeeper.ZooKeeper: Client environment:java.class.path=/opt/stone/cloud/hadoop-1.0.3/libexec/../conf:/usr/java/jdk1.6.0_30/lib/tools.jar:/opt/stone/cloud/hadoop-1.0.3/libexec/..:/opt/stone/cloud/hadoop-1.0.3/libexec/../hadoop-core-1.0.3.jar:/opt/stone/cloud/hadoop-1.0.3/libexec/../lib/asm-3.2.jar:/opt/stone/cloud/hadoop-1.0.3/libexec/../lib/aspectjrt-1.6.5.jar:/opt/stone/cloud/hadoop-1.0.3/libexec/../lib/aspectjtools-1.6.5.jar:/opt/stone/cloud/hadoop-1.0.3/libexec/../lib/commons-beanutils-1.7.0.jar:/opt/stone/cloud/hadoop-1.0.3/libexec/../lib/commons-beanutils-core-1.8.0.jar:/opt/stone/cloud/hadoop-1.0.3/libexec/../lib/commons-cli-1.2.jar:/opt/stone/cloud/hadoop-1.0.3/libexec/../lib/commons-codec-1.4.jar:/opt/stone/cloud/hadoop-1.0.3/libexec/../lib/commons-collections-3.2.1.jar:/opt/stone/cloud/hadoop-1.0.3/libexec/../lib/commons-configuration-1.6.jar:/opt/stone/cloud/hadoop-1.0.3/libexec/../lib/commons-daemon-1.0.1.jar:/opt/stone/cloud/hadoop-1.0.3/libexec/../lib/commons-digester-1.8.jar:/opt/stone/cloud/hadoop-1.0.3/libexec/../lib/commons-el-1.0.jar:/opt/stone/cloud/hadoop-1.0.3/libexec/../lib/commons-httpclient-3.0.1.jar:/opt/stone/cloud/hadoop-1.0.3/libexec/../lib/commons-io-2.1.jar:/opt/stone/cloud/hadoop-1.0.3/libexec/../lib/commons-lang-2.4.jar:/opt/stone/cloud/hadoop-1.0.3/libexec/../lib/commons-logging-1.1.1.jar:/opt/stone/cloud/hadoop-1.0.3/libexec/../lib/commons-logging-api-1.0.4.jar:/opt/stone/cloud/hadoop-1.0.3/libexec/../lib/commons-math-2.1.jar:/opt/stone/cloud/hadoop-1.0.3/libexec/../lib/commons-net-1.4.1.jar:/opt/stone/cloud/hadoop-1.0.3/libexec/../lib/core-3.1.1.jar:/opt/stone/cloud/hadoop-1.0.3/libexec/../lib/hadoop-capacity-scheduler-1.0.3.jar:/opt/stone/cloud/hadoop-1.0.3/libexec/../lib/hadoop-datajoin-1.0.3.jar:/opt/stone/cloud/hadoop-1.0.3/libexec/../lib/hadoop-fairscheduler-1.0.3.jar:/opt/stone/cloud/hadoop-1.0.3/libexec/../lib/hadoop-thriftfs-1.0.3.jar:/opt/stone/cloud/hadoop-1.0.3/libexec/../lib/hsqldb-1.8.0.10.jar:/opt/stone/cloud/hadoop-1.0.3/libexec/../lib/jackson-core-asl-1.8.8.jar:/opt/stone/cloud/hadoop-1.0.3/libexec/../lib/jackson-mapper-asl-1.8.8.jar:/opt/stone/cloud/hadoop-1.0.3/libexec/../lib/jasper-compiler-5.5.12.jar:/opt/stone/cloud/hadoop-1.0.3/libexec/../lib/jasper-runtime-5.5.12.jar:/opt/stone/cloud/hadoop-1.0.3/libexec/../lib/jdeb-0.8.jar:/opt/stone/cloud/hadoop-1.0.3/libexec/../lib/jersey-core-1.8.jar:/opt/stone/cloud/hadoop-1.0.3/libexec/../lib/jersey-json-1.8.jar:/opt/stone/cloud/hadoop-1.0.3/libexec/../lib/jersey-server-1.8.jar:/opt/stone/cloud/hadoop-1.0.3/libexec/../lib/jets3t-0.6.1.jar:/opt/stone/cloud/hadoop-1.0.3/libexec/../lib/jetty-6.1.26.jar:/opt/stone/cloud/hadoop-1.0.3/libexec/../lib/jetty-util-6.1.26.jar:/opt/stone/cloud/hadoop-1.0.3/libexec/../lib/jsch-0.1.42.jar:/opt/stone/cloud/hadoop-1.0.3/libexec/../lib/junit-4.5.jar:/opt/stone/cloud/hadoop-1.0.3/libexec/../lib/kfs-0.2.2.jar:/opt/stone/cloud/hadoop-1.0.3/libexec/../lib/log4j-1.2.15.jar:/opt/stone/cloud/hadoop-1.0.3/libexec/../lib/mockito-all-1.8.5.jar:/opt/stone/cloud/hadoop-1.0.3/libexec/../lib/oro-2.0.8.jar:/opt/stone/cloud/hadoop-1.0.3/libexec/../lib/protobuf-java-2.4.0a.jar:/opt/stone/cloud/hadoop-1.0.3/libexec/../lib/servlet-api-2.5-20081211.jar:/opt/stone/cloud/hadoop-1.0.3/libexec/../lib/slf4j-api-1.4.3.jar:/opt/stone/cloud/hadoop-1.0.3/libexec/../lib/slf4j-log4j12-1.4.3.jar:/opt/stone/cloud/hadoop-1.0.3/libexec/../lib/xmlenc-0.52.jar:/opt/stone/cloud/hadoop-1.0.3/libexec/../lib/jsp-2.1/jsp-2.1.jar:/opt/stone/cloud/hadoop-1.0.3/libexec/../lib/jsp-2.1/jsp-api-2.1.jar:/opt/stone/cloud/hbase-0.94.1/hbase-0.94.1.jar:/opt/stone/cloud/zookeeper-3.4.3/zookeeper-3.4.3.jar
13/04/10 22:03:32 INFO zookeeper.ZooKeeper: Client environment:java.library.path=/opt/stone/cloud/hadoop-1.0.3/libexec/../lib/native/Linux-amd64-64
13/04/10 22:03:32 INFO zookeeper.ZooKeeper: Client environment:java.io.tmpdir=/tmp
13/04/10 22:03:32 INFO zookeeper.ZooKeeper: Client environment:java.compiler=<NA>
13/04/10 22:03:32 INFO zookeeper.ZooKeeper: Client environment:os.name=Linux
13/04/10 22:03:32 INFO zookeeper.ZooKeeper: Client environment:os.arch=amd64
13/04/10 22:03:32 INFO zookeeper.ZooKeeper: Client environment:os.version=3.0.0-12-server
13/04/10 22:03:32 INFO zookeeper.ZooKeeper: Client environment:user.name=xiaoxiang
13/04/10 22:03:32 INFO zookeeper.ZooKeeper: Client environment:user.home=/home/xiaoxiang
13/04/10 22:03:32 INFO zookeeper.ZooKeeper: Client environment:user.dir=/opt/stone/cloud/hadoop-1.0.3
13/04/10 22:03:32 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=ubuntu3:2222 sessionTimeout=180000 watcher=hconnection
13/04/10 22:03:32 INFO zookeeper.ClientCnxn: Opening socket connection to server /172.0.8.252:2222
13/04/10 22:03:32 INFO zookeeper.RecoverableZooKeeper: The identifier of this process is 17561@ubuntu3
13/04/10 22:03:32 WARN client.ZooKeeperSaslClient: SecurityException: java.lang.SecurityException: Unable to locate a login configuration occurred when trying to find JAAS configuration.
13/04/10 22:03:32 INFO client.ZooKeeperSaslClient: Client will not SASL-authenticate because the default JAAS configuration section 'Client' could not be found. If you are not using SASL, you may ignore this. On the other hand, if you expected SASL to work, please fix your JAAS configuration.
13/04/10 22:03:32 INFO zookeeper.ClientCnxn: Socket connection established to ubuntu3/172.0.8.252:2222, initiating session
13/04/10 22:03:32 INFO zookeeper.ClientCnxn: Session establishment complete on server ubuntu3/172.0.8.252:2222, sessionid = 0x13decd0f3960042, negotiated timeout = 180000
13/04/10 22:03:32 INFO mapreduce.TableOutputFormat: Created table instance for t_sub_domains
13/04/10 22:03:32 INFO input.FileInputFormat: Total input paths to process : 1
13/04/10 22:03:32 INFO util.NativeCodeLoader: Loaded the native-hadoop library
13/04/10 22:03:32 WARN snappy.LoadSnappy: Snappy native library not loaded
13/04/10 22:03:32 INFO mapred.JobClient: Running job: job_201303302227_0034
13/04/10 22:03:33 INFO mapred.JobClient: map 0% reduce 0%
13/04/10 22:03:50 INFO mapred.JobClient: map 2% reduce 0%
13/04/10 22:03:53 INFO mapred.JobClient: map 3% reduce 0%
13/04/10 22:03:56 INFO mapred.JobClient: map 4% reduce 0%
13/04/10 22:03:59 INFO mapred.JobClient: map 6% reduce 0%
13/04/10 22:04:03 INFO mapred.JobClient: map 7% reduce 0%
13/04/10 22:04:06 INFO mapred.JobClient: map 8% reduce 0%
13/04/10 22:04:09 INFO mapred.JobClient: map 10% reduce 0%
13/04/10 22:04:15 INFO mapred.JobClient: map 12% reduce 0%
13/04/10 22:04:18 INFO mapred.JobClient: map 13% reduce 0%
13/04/10 22:04:21 INFO mapred.JobClient: map 14% reduce 0%
13/04/10 22:04:24 INFO mapred.JobClient: map 15% reduce 0%
13/04/10 22:04:27 INFO mapred.JobClient: map 17% reduce 0%
13/04/10 22:04:33 INFO mapred.JobClient: map 18% reduce 0%
13/04/10 22:04:36 INFO mapred.JobClient: map 19% reduce 0%
13/04/10 22:04:39 INFO mapred.JobClient: map 20% reduce 0%
13/04/10 22:04:42 INFO mapred.JobClient: map 21% reduce 0%
13/04/10 22:04:45 INFO mapred.JobClient: map 23% reduce 0%
13/04/10 22:04:48 INFO mapred.JobClient: map 24% reduce 0%
13/04/10 22:04:51 INFO mapred.JobClient: map 25% reduce 0%
13/04/10 22:04:54 INFO mapred.JobClient: map 27% reduce 0%
13/04/10 22:04:57 INFO mapred.JobClient: map 28% reduce 0%
13/04/10 22:05:00 INFO mapred.JobClient: map 29% reduce 0%
13/04/10 22:05:03 INFO mapred.JobClient: map 31% reduce 0%
13/04/10 22:05:06 INFO mapred.JobClient: map 32% reduce 0%
13/04/10 22:05:09 INFO mapred.JobClient: map 33% reduce 0%
13/04/10 22:05:12 INFO mapred.JobClient: map 34% reduce 0%
13/04/10 22:05:15 INFO mapred.JobClient: map 35% reduce 0%
13/04/10 22:05:18 INFO mapred.JobClient: map 37% reduce 0%
13/04/10 22:05:21 INFO mapred.JobClient: map 38% reduce 0%
13/04/10 22:05:24 INFO mapred.JobClient: map 39% reduce 0%
13/04/10 22:05:27 INFO mapred.JobClient: map 41% reduce 0%
13/04/10 22:05:30 INFO mapred.JobClient: map 42% reduce 0%
13/04/10 22:05:33 INFO mapred.JobClient: map 43% reduce 0%
13/04/10 22:05:36 INFO mapred.JobClient: map 44% reduce 0%
13/04/10 22:05:39 INFO mapred.JobClient: map 46% reduce 0%
13/04/10 22:05:42 INFO mapred.JobClient: map 47% reduce 0%
13/04/10 22:05:45 INFO mapred.JobClient: map 48% reduce 0%
13/04/10 22:05:48 INFO mapred.JobClient: map 50% reduce 0%
13/04/10 22:05:54 INFO mapred.JobClient: map 52% reduce 0%
13/04/10 22:05:57 INFO mapred.JobClient: map 53% reduce 0%
13/04/10 22:06:00 INFO mapred.JobClient: map 54% reduce 0%
13/04/10 22:06:03 INFO mapred.JobClient: map 55% reduce 0%
13/04/10 22:06:06 INFO mapred.JobClient: map 57% reduce 0%
13/04/10 22:06:12 INFO mapred.JobClient: map 59% reduce 0%
13/04/10 22:06:15 INFO mapred.JobClient: map 60% reduce 0%
13/04/10 22:06:18 INFO mapred.JobClient: map 61% reduce 0%
13/04/10 22:06:21 INFO mapred.JobClient: map 62% reduce 0%
13/04/10 22:06:24 INFO mapred.JobClient: map 63% reduce 0%
13/04/10 22:06:27 INFO mapred.JobClient: map 64% reduce 0%
13/04/10 22:06:30 INFO mapred.JobClient: map 66% reduce 0%
13/04/10 22:06:33 INFO mapred.JobClient: map 67% reduce 0%
13/04/10 22:06:36 INFO mapred.JobClient: map 68% reduce 0%
13/04/10 22:06:42 INFO mapred.JobClient: map 69% reduce 0%
13/04/10 22:06:45 INFO mapred.JobClient: map 70% reduce 0%
13/04/10 22:06:48 INFO mapred.JobClient: map 71% reduce 0%
13/04/10 22:06:51 INFO mapred.JobClient: map 73% reduce 0%
13/04/10 22:06:54 INFO mapred.JobClient: map 74% reduce 0%
13/04/10 22:06:57 INFO mapred.JobClient: map 75% reduce 0%
13/04/10 22:07:00 INFO mapred.JobClient: map 77% reduce 0%
13/04/10 22:07:03 INFO mapred.JobClient: map 78% reduce 0%
13/04/10 22:07:12 INFO mapred.JobClient: map 79% reduce 0%
13/04/10 22:07:18 INFO mapred.JobClient: map 80% reduce 0%
13/04/10 22:07:24 INFO mapred.JobClient: map 81% reduce 0%
13/04/10 22:07:30 INFO mapred.JobClient: map 82% reduce 0%
13/04/10 22:07:36 INFO mapred.JobClient: map 83% reduce 0%
13/04/10 22:07:48 INFO mapred.JobClient: map 84% reduce 0%
13/04/10 22:07:51 INFO mapred.JobClient: map 85% reduce 0%
13/04/10 22:07:59 INFO mapred.JobClient: map 86% reduce 0%
13/04/10 22:08:05 INFO mapred.JobClient: map 87% reduce 0%
13/04/10 22:08:11 INFO mapred.JobClient: map 88% reduce 0%
13/04/10 22:08:17 INFO mapred.JobClient: map 89% reduce 0%
13/04/10 22:08:23 INFO mapred.JobClient: map 90% reduce 0%
13/04/10 22:08:29 INFO mapred.JobClient: map 91% reduce 0%
13/04/10 22:08:35 INFO mapred.JobClient: map 92% reduce 0%
13/04/10 22:08:41 INFO mapred.JobClient: map 93% reduce 0%
13/04/10 22:08:47 INFO mapred.JobClient: map 94% reduce 0%
13/04/10 22:08:53 INFO mapred.JobClient: map 95% reduce 0%
13/04/10 22:08:59 INFO mapred.JobClient: map 96% reduce 0%
13/04/10 22:09:05 INFO mapred.JobClient: map 97% reduce 0%
13/04/10 22:09:11 INFO mapred.JobClient: map 98% reduce 0%
13/04/10 22:09:17 INFO mapred.JobClient: map 99% reduce 0%
13/04/10 22:09:23 INFO mapred.JobClient: map 100% reduce 0%
13/04/10 22:09:31 INFO mapred.JobClient: Job complete: job_201303302227_0034
13/04/10 22:09:31 INFO mapred.JobClient: Counters: 18
13/04/10 22:09:31 INFO mapred.JobClient:   Job Counters
13/04/10 22:09:31 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=550605
13/04/10 22:09:31 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
13/04/10 22:09:31 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
13/04/10 22:09:31 INFO mapred.JobClient:     Launched map tasks=2
13/04/10 22:09:31 INFO mapred.JobClient:     Data-local map tasks=2
13/04/10 22:09:31 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=0
13/04/10 22:09:31 INFO mapred.JobClient:   File Output Format Counters
13/04/10 22:09:31 INFO mapred.JobClient:     Bytes Written=0
13/04/10 22:09:31 INFO mapred.JobClient:   FileSystemCounters
13/04/10 22:09:31 INFO mapred.JobClient:     HDFS_BYTES_READ=104394990
13/04/10 22:09:31 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=64078
13/04/10 22:09:31 INFO mapred.JobClient:   File Input Format Counters
13/04/10 22:09:31 INFO mapred.JobClient:     Bytes Read=104394710
13/04/10 22:09:31 INFO mapred.JobClient:   Map-Reduce Framework
13/04/10 22:09:31 INFO mapred.JobClient:     Map input records=4995670
13/04/10 22:09:31 INFO mapred.JobClient:     Physical memory (bytes) snapshot=279134208
13/04/10 22:09:31 INFO mapred.JobClient:     Spilled Records=0
13/04/10 22:09:31 INFO mapred.JobClient:     CPU time spent (ms)=129130
13/04/10 22:09:31 INFO mapred.JobClient:     Total committed heap usage (bytes)=202833920
13/04/10 22:09:31 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=1170251776
13/04/10 22:09:31 INFO mapred.JobClient:     Map output records=4995670
13/04/10 22:09:31 INFO mapred.JobClient:     SPLIT_RAW_BYTES=280

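As the log shows, in addition to the jars under the HADOOP_HOME path and its lib directory, the client JVM's java.class.path also contains the HBase and ZooKeeper jars that we added through HADOOP_CLASSPATH (the last two entries). The jars listed in the -libjars option are shipped with the job so that they are available to the tasks at run time.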
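To check which jars a client JVM actually sees, as the java.class.path line in the log does, the classpath string can be split into entries. This is a minimal stdlib sketch; ClasspathEntries is a hypothetical helper, and it only inspects the local JVM, not the task JVMs on the cluster.

```java
import java.io.File;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical helper for inspecting a classpath string such as java.class.path. */
final class ClasspathEntries {

    /** Splits on the platform path separator (':' on Linux) and drops empty entries. */
    static List<String> split(String classpath) {
        return Arrays.stream(classpath.split(File.pathSeparator))
                .filter(s -> !s.isEmpty())
                .collect(Collectors.toList());
    }

    /** Returns entries whose file name starts with the prefix, e.g. "hbase-" or "zookeeper-". */
    static List<String> matching(String classpath, String fileNamePrefix) {
        return split(classpath).stream()
                .filter(e -> new File(e).getName().startsWith(fileNamePrefix))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Prints the ZooKeeper jars visible to the current JVM, if any.
        matching(System.getProperty("java.class.path", ""), "zookeeper-")
                .forEach(System.out::println);
    }
}
```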

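Packaging the Job Code Together with Its Dependency Jars

I prefer this approach. First, it takes advantage of Maven's strengths, such as dependency management and automated builds. Second, other developers or operators who want to use the job do not need to worry about extra configuration: they simply build it according to the Maven build rules to produce the final deployable file, which also avoids many common problems when running the job (such as an incorrect CLASSPATH).
With the following Maven build plugin configuration, running mvn package takes care of all of this: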

<build>
  <plugins>
    <plugin>
      <artifactId>maven-assembly-plugin</artifactId>
      <configuration>
        <archive>
          <manifest>
            <!-- Written to the jar manifest as Main-Class; use your job's driver class -->
            <mainClass>org.shirdrn.solr.cloud.index.hadoop.SolrCloudIndexer</mainClass>
          </manifest>
        </archive>
        <descriptorRefs>
          <!-- Predefined descriptor that bundles the runtime dependencies into one jar -->
          <descriptorRef>jar-with-dependencies</descriptorRef>
        </descriptorRefs>
      </configuration>
      <executions>
        <execution>
          <id>make-assembly</id>
          <!-- Bind the single goal to the package phase so that mvn package builds the assembly -->
          <phase>package</phase>
          <goals>
            <goal>single</goal>
          </goals>
        </execution>
      </executions>
    </plugin>
  </plugins>
</build>

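The resulting jar file is placed in the target directory, with a name such as solr-platform-2.0-jar-with-dependencies.jar. You can copy this file to the desired directory and submit it to the Hadoop cluster to run.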
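Before copying the assembled jar to the cluster, a quick sanity check can confirm that the manifest carries the expected Main-Class and that a dependency class was actually bundled. This sketch uses only java.util.jar; FatJarCheck is a hypothetical name, and which classes to probe (for example org.apache.zookeeper.ZooKeeper) depends on the job's own dependencies.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.util.jar.Attributes;
import java.util.jar.JarFile;
import java.util.jar.Manifest;

/** Hypothetical sanity checks for a jar-with-dependencies artifact. */
final class FatJarCheck {

    /** Returns the Main-Class manifest attribute, or null if there is none. */
    static String mainClass(Path jar) throws IOException {
        try (JarFile jf = new JarFile(jar.toFile())) {
            Manifest mf = jf.getManifest();
            return mf == null ? null : mf.getMainAttributes().getValue(Attributes.Name.MAIN_CLASS);
        }
    }

    /** Returns true if the jar contains the .class entry for a fully qualified class name. */
    static boolean containsClass(Path jar, String className) throws IOException {
        try (JarFile jf = new JarFile(jar.toFile())) {
            return jf.getEntry(className.replace('.', '/') + ".class") != null;
        }
    }
}
```

For the configuration above, mainClass(...) would be expected to return org.shirdrn.solr.cloud.index.hadoop.SolrCloudIndexer.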
