Using Third-Party Dependency JAR Files in a Hadoop Job

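After we implement a Hadoop MapReduce Job that depends on many external jar files, running it on a Hadoop cluster sometimes fails with an exception saying that a particular class cannot be found. This almost always means that, while the Job was executing, the required jar file could not be found in the execution context (more precisely, the unjar directory that holds the extracted class files). The natural fix is to configure the classpath correctly so that the MapReduce Job can find these classes at runtime.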
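When chasing this kind of error, it can help to check which jar (if any) a class is actually loaded from on the client side. The following is a minimal sketch that uses only the JDK. ClassLocator is a hypothetical helper name, not part of Hadoop, and it only inspects the classpath of the JVM it runs in, not that of the remote tasks.

```java
import java.security.CodeSource;

public class ClassLocator {

    /** Returns the location a class is loaded from, or a NOT FOUND message. */
    public static String locate(String className) {
        try {
            // initialize=false: look the class up without running its static initializers
            Class<?> cls = Class.forName(className, false, ClassLocator.class.getClassLoader());
            CodeSource source = cls.getProtectionDomain().getCodeSource();
            // Classes from the bootstrap loader (e.g. java.lang.String) have no code source
            return source == null ? "bootstrap classpath" : source.getLocation().toString();
        } catch (ClassNotFoundException | LinkageError e) {
            return "NOT FOUND (" + e + ")";
        }
    }

    public static void main(String[] args) {
        // Each argument is a fully qualified class name to look up
        for (String name : args) {
            System.out.println(name + " -> " + locate(name));
        }
    }
}
```

Run it with the same classpath as the Job client and pass class names such as org.apache.hadoop.hbase.client.Put as arguments. A NOT FOUND result suggests the jar is missing from that classpath.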
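There are two good ways to get the classpath right. One is to set HADOOP_CLASSPATH so that it includes the jar files the Job depends on. When the variable is set as a prefix on the command line that launches the Job, it applies only to that Job and is gone once the Job finishes. The other is to package the dependency jars together with the Job code into a single jar file at build time. This can make the final jar fairly large, but with a build tool such as Maven each Job keeps its own dependency configuration, which is easy to manage. Both approaches are described below.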

Setting HADOOP_CLASSPATH

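Suppose we have an application that uses HBase and operates on tables in an HBase database. It will certainly need ZooKeeper as well, so the locations of the corresponding jar files must be set correctly for the Job to find and load them at runtime.
Hadoop provides a helper class, org.apache.hadoop.util.GenericOptionsParser, that loads the specified files onto the classpath for us, which makes this easier.
Below is an example we implemented. The entry-point class of the program is as follows: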

    package org.shirdrn.kodz.inaction.hbase.job.importing;

    import java.io.IOException;
    import java.net.URISyntaxException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
    import org.apache.hadoop.hbase.mapreduce.TableOutputFormat;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.util.GenericOptionsParser;

    /**
     * Table DDL: create 't_sub_domains', 'cf_basic', 'cf_status'
     * <pre>
     * cf_basic:domain cf_basic:len
     * cf_status:status cf_status:live
     * </pre>
     *
     * @author shirdrn
     */
    public class DataImporter {

        public static void main(String[] args)
                throws IOException, InterruptedException, ClassNotFoundException, URISyntaxException {

            Configuration conf = HBaseConfiguration.create();
            // Consumes generic options such as -libjars and returns the remaining arguments
            String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();

            // Print usage and exit if the table name or input path is missing
            if (otherArgs.length < 2) {
                System.err.println("Usage: \n" +
                        " DataImporter -libjars <jar1>[,<jar2>...[,<jarN>]] <tableName> <input>");
                System.exit(1);
            }
            String tableName = otherArgs[0].trim();
            String input = otherArgs[1].trim();

            // set table columns
            conf.set("table.cf.family", "cf_basic");
            conf.set("table.cf.qualifier.fqdn", "domain");
            conf.set("table.cf.qualifier.timestamp", "create_at");

            Job job = new Job(conf, "Import into HBase table");
            job.setJarByClass(DataImporter.class);
            job.setMapperClass(ImportFileLinesMapper.class);
            job.setOutputFormatClass(TableOutputFormat.class);

            // The mapper writes Put objects directly into the target HBase table
            job.getConfiguration().set(TableOutputFormat.OUTPUT_TABLE, tableName);
            job.setOutputKeyClass(ImmutableBytesWritable.class);
            job.setOutputValueClass(Put.class);

            // Map-only job: no reduce phase
            job.setNumReduceTasks(0);

            FileInputFormat.addInputPath(job, new Path(input));

            int exitCode = job.waitForCompletion(true) ? 0 : 1;
            System.exit(exitCode);
        }

    }

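As you can see, the -libjars option specifies the third-party jar files that the Job depends on at runtime. It is used as follows:

  • Step 1: Set the environment variables

Edit the .bashrc file and add the following: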

    export HADOOP_HOME=/opt/stone/cloud/hadoop-1.0.3
    export PATH=$PATH:$HADOOP_HOME/bin
    export HBASE_HOME=/opt/stone/cloud/hbase-0.94.1
    export PATH=$PATH:$HBASE_HOME/bin
    export ZK_HOME=/opt/stone/cloud/zookeeper-3.4.3

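Don't forget to make the configuration take effect in the current shell, using either of the following equivalent commands:

    . .bashrc

    source .bashrc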

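This makes it easy to reference the external jar files.

  • Step 2: Determine the list of jar files the Job depends on

As mentioned above, using HBase requires the HBase and ZooKeeper jar files. The ones we use are shown below: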

    HADOOP_CLASSPATH=$HBASE_HOME/hbase-0.94.1.jar:$ZK_HOME/zookeeper-3.4.3.jar ./bin/hadoop jar import-into-hbase.jar

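This sets the HADOOP_CLASSPATH variable for the current Job only, so there is no need to configure it in .bashrc.

  • Step 3: Run the Job

To run the Job, set the HADOOP_CLASSPATH variable on the command line and use the -libjars option to specify the third-party jar files the Job depends on. The launch command is as follows: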

    xiaoxiang@ubuntu3:~/hadoop$ HADOOP_CLASSPATH=$HBASE_HOME/hbase-0.94.1.jar:$ZK_HOME/zookeeper-3.4.3.jar \
        ./bin/hadoop jar import-into-hbase.jar org.shirdrn.kodz.inaction.hbase.job.importing.DataImporter \
        -libjars $HBASE_HOME/hbase-0.94.1.jar,$HBASE_HOME/lib/protobuf-java-2.4.0a.jar,$ZK_HOME/zookeeper-3.4.3.jar \
        t_sub_domains /user/xiaoxiang/datasets/domains/

Note that entries in the environment variable are separated by colons, whereas entries in the -libjars option are separated by commas.
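Because the two separators are easy to mix up, one option is to keep a single list of jar paths and derive both strings from it. The sketch below only illustrates the separator rule. LibJarsArgs and its methods are hypothetical names, and the jar paths in main are placeholders written with the directories from the environment variables above.

```java
import java.io.File;
import java.util.Arrays;
import java.util.List;

public class LibJarsArgs {

    /** Joins jar paths with the platform path separator (':' on Linux) for HADOOP_CLASSPATH. */
    public static String toClasspath(List<String> jars) {
        return String.join(File.pathSeparator, jars);
    }

    /** Joins jar paths with commas, the format expected by the -libjars option. */
    public static String toLibJars(List<String> jars) {
        return String.join(",", jars);
    }

    public static void main(String[] args) {
        // Placeholder paths; adjust them to your own installation
        List<String> jars = Arrays.asList(
                "/opt/stone/cloud/hbase-0.94.1/hbase-0.94.1.jar",
                "/opt/stone/cloud/zookeeper-3.4.3/zookeeper-3.4.3.jar");
        System.out.println("HADOOP_CLASSPATH=" + toClasspath(jars));
        System.out.println("-libjars " + toLibJars(jars));
    }
}
```

The two lists do not have to be identical. In the command above, -libjars also carries the protobuf jar, so in practice you may want one list per purpose.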

With that, the Job runs correctly.
Here is the output of the Job run:

    13/04/10 22:03:32 INFO zookeeper.ZooKeeper: Client environment:zookeeper.version=3.4.3-1240972, built on 02/06/2012 10:48 GMT
    13/04/10 22:03:32 INFO zookeeper.ZooKeeper: Client environment:host.name=ubuntu3
    13/04/10 22:03:32 INFO zookeeper.ZooKeeper: Client environment:java.version=1.6.0_30
    13/04/10 22:03:32 INFO zookeeper.ZooKeeper: Client environment:java.vendor=Sun Microsystems Inc.
    13/04/10 22:03:32 INFO zookeeper.ZooKeeper: Client environment:java.home=/usr/java/jdk1.6.0_30/jre
    13/04/10 22:03:32 INFO zookeeper.ZooKeeper: Client environment:java.class.path=/opt/stone/cloud/hadoop-1.0.3/libexec/../conf:/usr/java/jdk1.6.0_30/lib/tools.jar:/opt/stone/cloud/hadoop-1.0.3/libexec/..:/opt/stone/cloud/hadoop-1.0.3/libexec/../hadoop-core-1.0.3.jar:/opt/stone/cloud/hadoop-1.0.3/libexec/../lib/asm-3.2.jar:/opt/stone/cloud/hadoop-1.0.3/libexec/../lib/aspectjrt-1.6.5.jar:/opt/stone/cloud/hadoop-1.0.3/libexec/../lib/aspectjtools-1.6.5.jar:/opt/stone/cloud/hadoop-1.0.3/libexec/../lib/commons-beanutils-1.7.0.jar:/opt/stone/cloud/hadoop-1.0.3/libexec/../lib/commons-beanutils-core-1.8.0.jar:/opt/stone/cloud/hadoop-1.0.3/libexec/../lib/commons-cli-1.2.jar:/opt/stone/cloud/hadoop-1.0.3/libexec/../lib/commons-codec-1.4.jar:/opt/stone/cloud/hadoop-1.0.3/libexec/../lib/commons-collections-3.2.1.jar:/opt/stone/cloud/hadoop-1.0.3/libexec/../lib/commons-configuration-1.6.jar:/opt/stone/cloud/hadoop-1.0.3/libexec/../lib/commons-daemon-1.0.1.jar:/opt/stone/cloud/hadoop-1.0.3/libexec/../lib/commons-digester-1.8.jar:/opt/stone/cloud/hadoop-1.0.3/libexec/../lib/commons-el-1.0.jar:/opt/stone/cloud/hadoop-1.0.3/libexec/../lib/commons-httpclient-3.0.1.jar:/opt/stone/cloud/hadoop-1.0.3/libexec/../lib/commons-io-2.1.jar:/opt/stone/cloud/hadoop-1.0.3/libexec/../lib/commons-lang-2.4.jar:/opt/stone/cloud/hadoop-1.0.3/libexec/../lib/commons-logging-1.1.1.jar:/opt/stone/cloud/hadoop-1.0.3/libexec/../lib/commons-logging-api-1.0.4.jar:/opt/stone/cloud/hadoop-1.0.3/libexec/../lib/commons-math-2.1.jar:/opt/stone/cloud/hadoop-1.0.3/libexec/../lib/commons-net-1.4.1.jar:/opt/stone/cloud/hadoop-1.0.3/libexec/../lib/core-3.1.1.jar:/opt/stone/cloud/hadoop-1.0.3/libexec/../lib/hadoop-capacity-scheduler-1.0.3.jar:/opt/stone/cloud/hadoop-1.0.3/libexec/../lib/hadoop-datajoin-1.0.3.jar:/opt/stone/cloud/hadoop-1.0.3/libexec/../lib/hadoop-fairscheduler-1.0.3.jar:/opt/stone/cloud/hadoop-1.0.3/libexec/../lib/hadoop-thriftfs-1.0.3.jar:/opt/stone/cloud/hadoop-1.0.3/libexec/../lib/hsqldb-1.8.0.10.jar:/opt/stone/cloud/hadoop-1.0.3/libexec/../lib/jackson-core-asl-1.8.8.jar:/opt/stone/cloud/hadoop-1.0.3/libexec/../lib/jackson-mapper-asl-1.8.8.jar:/opt/stone/cloud/hadoop-1.0.3/libexec/../lib/jasper-compiler-5.5.12.jar:/opt/stone/cloud/hadoop-1.0.3/libexec/../lib/jasper-runtime-5.5.12.jar:/opt/stone/cloud/hadoop-1.0.3/libexec/../lib/jdeb-0.8.jar:/opt/stone/cloud/hadoop-1.0.3/libexec/../lib/jersey-core-1.8.jar:/opt/stone/cloud/hadoop-1.0.3/libexec/../lib/jersey-json-1.8.jar:/opt/stone/cloud/hadoop-1.0.3/libexec/../lib/jersey-server-1.8.jar:/opt/stone/cloud/hadoop-1.0.3/libexec/../lib/jets3t-0.6.1.jar:/opt/stone/cloud/hadoop-1.0.3/libexec/../lib/jetty-6.1.26.jar:/opt/stone/cloud/hadoop-1.0.3/libexec/../lib/jetty-util-6.1.26.jar:/opt/stone/cloud/hadoop-1.0.3/libexec/../lib/jsch-0.1.42.jar:/opt/stone/cloud/hadoop-1.0.3/libexec/../lib/junit-4.5.jar:/opt/stone/cloud/hadoop-1.0.3/libexec/../lib/kfs-0.2.2.jar:/opt/stone/cloud/hadoop-1.0.3/libexec/../lib/log4j-1.2.15.jar:/opt/stone/cloud/hadoop-1.0.3/libexec/../lib/mockito-all-1.8.5.jar:/opt/stone/cloud/hadoop-1.0.3/libexec/../lib/oro-2.0.8.jar:/opt/stone/cloud/hadoop-1.0.3/libexec/../lib/protobuf-java-2.4.0a.jar:/opt/stone/cloud/hadoop-1.0.3/libexec/../lib/servlet-api-2.5-20081211.jar:/opt/stone/cloud/hadoop-1.0.3/libexec/../lib/slf4j-api-1.4.3.jar:/opt/stone/cloud/hadoop-1.0.3/libexec/../lib/slf4j-log4j12-1.4.3.jar:/opt/stone/cloud/hadoop-1.0.3/libexec/../lib/xmlenc-0.52.jar:/opt/stone/cloud/hadoop-1.0.3/libexec/../lib/jsp-2.1/jsp-2.1.jar:/opt/stone/cloud/hadoop-1.0.3/libexec/../lib/jsp-2.1/jsp-api-2.1.jar:/opt/stone/cloud/hbase-0.94.1/hbase-0.94.1.jar:/opt/stone/cloud/zookeeper-3.4.3/zookeeper-3.4.3.jar
    13/04/10 22:03:32 INFO zookeeper.ZooKeeper: Client environment:java.library.path=/opt/stone/cloud/hadoop-1.0.3/libexec/../lib/native/Linux-amd64-64
    13/04/10 22:03:32 INFO zookeeper.ZooKeeper: Client environment:java.io.tmpdir=/tmp
    13/04/10 22:03:32 INFO zookeeper.ZooKeeper: Client environment:java.compiler=<NA>
    13/04/10 22:03:32 INFO zookeeper.ZooKeeper: Client environment:os.name=Linux
    13/04/10 22:03:32 INFO zookeeper.ZooKeeper: Client environment:os.arch=amd64
    13/04/10 22:03:32 INFO zookeeper.ZooKeeper: Client environment:os.version=3.0.0-12-server
    13/04/10 22:03:32 INFO zookeeper.ZooKeeper: Client environment:user.name=xiaoxiang
    13/04/10 22:03:32 INFO zookeeper.ZooKeeper: Client environment:user.home=/home/xiaoxiang
    13/04/10 22:03:32 INFO zookeeper.ZooKeeper: Client environment:user.dir=/opt/stone/cloud/hadoop-1.0.3
    13/04/10 22:03:32 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=ubuntu3:2222 sessionTimeout=180000 watcher=hconnection
    13/04/10 22:03:32 INFO zookeeper.ClientCnxn: Opening socket connection to server /172.0.8.252:2222
    13/04/10 22:03:32 INFO zookeeper.RecoverableZooKeeper: The identifier of this process is 17561@ubuntu3
    13/04/10 22:03:32 WARN client.ZooKeeperSaslClient: SecurityException: java.lang.SecurityException: Unable to locate a login configuration occurred when trying to find JAAS configuration.
    13/04/10 22:03:32 INFO client.ZooKeeperSaslClient: Client will not SASL-authenticate because the default JAAS configuration section 'Client' could not be found. If you are not using SASL, you may ignore this. On the other hand, if you expected SASL to work, please fix your JAAS configuration.
    13/04/10 22:03:32 INFO zookeeper.ClientCnxn: Socket connection established to ubuntu3/172.0.8.252:2222, initiating session
    13/04/10 22:03:32 INFO zookeeper.ClientCnxn: Session establishment complete on server ubuntu3/172.0.8.252:2222, sessionid = 0x13decd0f3960042, negotiated timeout = 180000
    13/04/10 22:03:32 INFO mapreduce.TableOutputFormat: Created table instance for t_sub_domains
    13/04/10 22:03:32 INFO input.FileInputFormat: Total input paths to process : 1
    13/04/10 22:03:32 INFO util.NativeCodeLoader: Loaded the native-hadoop library
    13/04/10 22:03:32 WARN snappy.LoadSnappy: Snappy native library not loaded
    13/04/10 22:03:32 INFO mapred.JobClient: Running job: job_201303302227_0034
    13/04/10 22:03:33 INFO mapred.JobClient: map 0% reduce 0%
    13/04/10 22:03:50 INFO mapred.JobClient: map 2% reduce 0%
    13/04/10 22:03:53 INFO mapred.JobClient: map 3% reduce 0%
    13/04/10 22:03:56 INFO mapred.JobClient: map 4% reduce 0%
    13/04/10 22:03:59 INFO mapred.JobClient: map 6% reduce 0%
    13/04/10 22:04:03 INFO mapred.JobClient: map 7% reduce 0%
    13/04/10 22:04:06 INFO mapred.JobClient: map 8% reduce 0%
    13/04/10 22:04:09 INFO mapred.JobClient: map 10% reduce 0%
    13/04/10 22:04:15 INFO mapred.JobClient: map 12% reduce 0%
    13/04/10 22:04:18 INFO mapred.JobClient: map 13% reduce 0%
    13/04/10 22:04:21 INFO mapred.JobClient: map 14% reduce 0%
    13/04/10 22:04:24 INFO mapred.JobClient: map 15% reduce 0%
    13/04/10 22:04:27 INFO mapred.JobClient: map 17% reduce 0%
    13/04/10 22:04:33 INFO mapred.JobClient: map 18% reduce 0%
    13/04/10 22:04:36 INFO mapred.JobClient: map 19% reduce 0%
    13/04/10 22:04:39 INFO mapred.JobClient: map 20% reduce 0%
    13/04/10 22:04:42 INFO mapred.JobClient: map 21% reduce 0%
    13/04/10 22:04:45 INFO mapred.JobClient: map 23% reduce 0%
    13/04/10 22:04:48 INFO mapred.JobClient: map 24% reduce 0%
    13/04/10 22:04:51 INFO mapred.JobClient: map 25% reduce 0%
    13/04/10 22:04:54 INFO mapred.JobClient: map 27% reduce 0%
    13/04/10 22:04:57 INFO mapred.JobClient: map 28% reduce 0%
    13/04/10 22:05:00 INFO mapred.JobClient: map 29% reduce 0%
    13/04/10 22:05:03 INFO mapred.JobClient: map 31% reduce 0%
    13/04/10 22:05:06 INFO mapred.JobClient: map 32% reduce 0%
    13/04/10 22:05:09 INFO mapred.JobClient: map 33% reduce 0%
    13/04/10 22:05:12 INFO mapred.JobClient: map 34% reduce 0%
    13/04/10 22:05:15 INFO mapred.JobClient: map 35% reduce 0%
    13/04/10 22:05:18 INFO mapred.JobClient: map 37% reduce 0%
    13/04/10 22:05:21 INFO mapred.JobClient: map 38% reduce 0%
    13/04/10 22:05:24 INFO mapred.JobClient: map 39% reduce 0%
    13/04/10 22:05:27 INFO mapred.JobClient: map 41% reduce 0%
    13/04/10 22:05:30 INFO mapred.JobClient: map 42% reduce 0%
    13/04/10 22:05:33 INFO mapred.JobClient: map 43% reduce 0%
    13/04/10 22:05:36 INFO mapred.JobClient: map 44% reduce 0%
    13/04/10 22:05:39 INFO mapred.JobClient: map 46% reduce 0%
    13/04/10 22:05:42 INFO mapred.JobClient: map 47% reduce 0%
    13/04/10 22:05:45 INFO mapred.JobClient: map 48% reduce 0%
    13/04/10 22:05:48 INFO mapred.JobClient: map 50% reduce 0%
    13/04/10 22:05:54 INFO mapred.JobClient: map 52% reduce 0%
    13/04/10 22:05:57 INFO mapred.JobClient: map 53% reduce 0%
    13/04/10 22:06:00 INFO mapred.JobClient: map 54% reduce 0%
    13/04/10 22:06:03 INFO mapred.JobClient: map 55% reduce 0%
    13/04/10 22:06:06 INFO mapred.JobClient: map 57% reduce 0%
    13/04/10 22:06:12 INFO mapred.JobClient: map 59% reduce 0%
    13/04/10 22:06:15 INFO mapred.JobClient: map 60% reduce 0%
    13/04/10 22:06:18 INFO mapred.JobClient: map 61% reduce 0%
    13/04/10 22:06:21 INFO mapred.JobClient: map 62% reduce 0%
    13/04/10 22:06:24 INFO mapred.JobClient: map 63% reduce 0%
    13/04/10 22:06:27 INFO mapred.JobClient: map 64% reduce 0%
    13/04/10 22:06:30 INFO mapred.JobClient: map 66% reduce 0%
    13/04/10 22:06:33 INFO mapred.JobClient: map 67% reduce 0%
    13/04/10 22:06:36 INFO mapred.JobClient: map 68% reduce 0%
    13/04/10 22:06:42 INFO mapred.JobClient: map 69% reduce 0%
    13/04/10 22:06:45 INFO mapred.JobClient: map 70% reduce 0%
    13/04/10 22:06:48 INFO mapred.JobClient: map 71% reduce 0%
    13/04/10 22:06:51 INFO mapred.JobClient: map 73% reduce 0%
    13/04/10 22:06:54 INFO mapred.JobClient: map 74% reduce 0%
    13/04/10 22:06:57 INFO mapred.JobClient: map 75% reduce 0%
    13/04/10 22:07:00 INFO mapred.JobClient: map 77% reduce 0%
    13/04/10 22:07:03 INFO mapred.JobClient: map 78% reduce 0%
    13/04/10 22:07:12 INFO mapred.JobClient: map 79% reduce 0%
    13/04/10 22:07:18 INFO mapred.JobClient: map 80% reduce 0%
    13/04/10 22:07:24 INFO mapred.JobClient: map 81% reduce 0%
    13/04/10 22:07:30 INFO mapred.JobClient: map 82% reduce 0%
    13/04/10 22:07:36 INFO mapred.JobClient: map 83% reduce 0%
    13/04/10 22:07:48 INFO mapred.JobClient: map 84% reduce 0%
    13/04/10 22:07:51 INFO mapred.JobClient: map 85% reduce 0%
    13/04/10 22:07:59 INFO mapred.JobClient: map 86% reduce 0%
    13/04/10 22:08:05 INFO mapred.JobClient: map 87% reduce 0%
    13/04/10 22:08:11 INFO mapred.JobClient: map 88% reduce 0%
    13/04/10 22:08:17 INFO mapred.JobClient: map 89% reduce 0%
    13/04/10 22:08:23 INFO mapred.JobClient: map 90% reduce 0%
    13/04/10 22:08:29 INFO mapred.JobClient: map 91% reduce 0%
    13/04/10 22:08:35 INFO mapred.JobClient: map 92% reduce 0%
    13/04/10 22:08:41 INFO mapred.JobClient: map 93% reduce 0%
    13/04/10 22:08:47 INFO mapred.JobClient: map 94% reduce 0%
    13/04/10 22:08:53 INFO mapred.JobClient: map 95% reduce 0%
    13/04/10 22:08:59 INFO mapred.JobClient: map 96% reduce 0%
    13/04/10 22:09:05 INFO mapred.JobClient: map 97% reduce 0%
    13/04/10 22:09:11 INFO mapred.JobClient: map 98% reduce 0%
    13/04/10 22:09:17 INFO mapred.JobClient: map 99% reduce 0%
    13/04/10 22:09:23 INFO mapred.JobClient: map 100% reduce 0%
    13/04/10 22:09:31 INFO mapred.JobClient: Job complete: job_201303302227_0034
    13/04/10 22:09:31 INFO mapred.JobClient: Counters: 18
    13/04/10 22:09:31 INFO mapred.JobClient:   Job Counters
    13/04/10 22:09:31 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=550605
    13/04/10 22:09:31 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
    13/04/10 22:09:31 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
    13/04/10 22:09:31 INFO mapred.JobClient:     Launched map tasks=2
    13/04/10 22:09:31 INFO mapred.JobClient:     Data-local map tasks=2
    13/04/10 22:09:31 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=0
    13/04/10 22:09:31 INFO mapred.JobClient:   File Output Format Counters
    13/04/10 22:09:31 INFO mapred.JobClient:     Bytes Written=0
    13/04/10 22:09:31 INFO mapred.JobClient:   FileSystemCounters
    13/04/10 22:09:31 INFO mapred.JobClient:     HDFS_BYTES_READ=104394990
    13/04/10 22:09:31 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=64078
    13/04/10 22:09:31 INFO mapred.JobClient:   File Input Format Counters
    13/04/10 22:09:31 INFO mapred.JobClient:     Bytes Read=104394710
    13/04/10 22:09:31 INFO mapred.JobClient:   Map-Reduce Framework
    13/04/10 22:09:31 INFO mapred.JobClient:     Map input records=4995670
    13/04/10 22:09:31 INFO mapred.JobClient:     Physical memory (bytes) snapshot=279134208
    13/04/10 22:09:31 INFO mapred.JobClient:     Spilled Records=0
    13/04/10 22:09:31 INFO mapred.JobClient:     CPU time spent (ms)=129130
    13/04/10 22:09:31 INFO mapred.JobClient:     Total committed heap usage (bytes)=202833920
    13/04/10 22:09:31 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=1170251776
    13/04/10 22:09:31 INFO mapred.JobClient:     Map output records=4995670
    13/04/10 22:09:31 INFO mapred.JobClient:     SPLIT_RAW_BYTES=280

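As you can see from the java.class.path entry in the log, in addition to the jar files under the path given by HADOOP_HOME and its lib* directories, the client classpath also contains the HBase and ZooKeeper jars that we specified in HADOOP_CLASSPATH. The third-party jar files listed in the -libjars option are shipped with the Job so that its tasks can use them at runtime.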

Packaging the Job Code Together with Its Dependency JARs

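I prefer this approach. First, it takes advantage of many of Maven's strengths, such as dependency management and automated builds. Second, other developers or operators who want to use the Job do not need to care about extra configuration: they just build it following Maven's conventions to produce the final deployable file. This reduces the common problems that crop up when running a Job, such as an incorrect CLASSPATH setting.
With the following Maven build plugin configuration, running the mvn package command does all of this: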

    <build>
        <plugins>
            <plugin>
                <artifactId>maven-assembly-plugin</artifactId>
                <configuration>
                    <archive>
                        <manifest>
                            <mainClass>org.shirdrn.solr.cloud.index.hadoop.SolrCloudIndexer</mainClass>
                        </manifest>
                    </archive>
                    <descriptorRefs>
                        <descriptorRef>jar-with-dependencies</descriptorRef>
                    </descriptorRefs>
                </configuration>
                <executions>
                    <execution>
                        <id>make-assembly</id>
                        <phase>package</phase>
                        <goals>
                            <goal>single</goal>
                        </goals>
                    </execution>
                </executions>
            </plugin>
        </plugins>
    </build>

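The resulting jar file is generated in the target directory, with a name like solr-platform-2.0-jar-with-dependencies.jar. You can copy this file directly to the desired directory and submit it to the Hadoop cluster to run.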
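Before submitting, it can be worth confirming that a dependency class really ended up inside the assembled jar. The sketch below is only an illustration. JarContents and its methods are hypothetical names, and it uses java.util.jar.JarFile from the JDK rather than anything from Hadoop or Maven. It relies on the jar-with-dependencies descriptor unpacking the dependencies, so their classes appear as ordinary entries in the final jar.

```java
import java.io.IOException;
import java.util.jar.JarFile;

public class JarContents {

    /** Maps a fully qualified class name to its entry path inside a jar. */
    public static String entryName(String className) {
        return className.replace('.', '/') + ".class";
    }

    /** Returns true if the jar at jarPath contains the class file for className. */
    public static boolean containsClass(String jarPath, String className) throws IOException {
        try (JarFile jar = new JarFile(jarPath)) {
            return jar.getEntry(entryName(className)) != null;
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.err.println("Usage: JarContents <jar> <className>...");
            System.exit(1);
        }
        // args[0] is the jar; every further argument is a class name to check
        for (int i = 1; i < args.length; i++) {
            System.out.println(args[i] + ": "
                    + (containsClass(args[0], args[i]) ? "present" : "MISSING"));
        }
    }
}
```

For example, passing the jar under target and a class name such as org.apache.hadoop.hbase.client.Put reports whether that class was packaged.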
