[Big Data Development & Operations Solutions] Sqoop: automated incremental job scripts without password prompts

Summary: The previous article covered Sqoop incremental synchronization of data into Hive, and also linked to my pseudo-distributed installation guide for hadoop + hive + hbase + sqoop + kylin, its usage, and the incremental-sync walkthrough. This article shows how to combine that incremental approach with Sqoop's built-in job mechanism, a shell script, and crontab to automate incremental synchronization.


I. Background Knowledge

sqoop job --help
usage: sqoop job [GENERIC-ARGS] [JOB-ARGS] [-- [<tool-name>] [TOOL-ARGS]]
Job management arguments:
   --create <job-id> Create a new saved job
   --delete <job-id> Delete a saved job
   --exec <job-id> Run a saved job
   --help Print usage instructions
   --list List saved jobs
   --meta-connect <jdbc-uri> Specify JDBC connect string for the
                                metastore
   --show <job-id> Show the parameters for a saved job
   --verbose Print more information while working

Generic Hadoop command-line arguments:
(must preceed any tool-specific arguments)
Generic options supported are
-conf <configuration file> specify an application configuration file
-D <property=value> use value for given property
-fs <local|namenode:port> specify a namenode
-jt <local|resourcemanager:port> specify a ResourceManager
-files <comma separated list of files> specify comma separated files to be copied to the map reduce cluster
-libjars <comma separated list of jars> specify comma separated jar files to include in the classpath.
-archives <comma separated list of archives> specify comma separated archives to be unarchived on the compute machines.

II. Detailed Experiment

Let's first walk through creating and running a timestamp-based incremental append job, and then look at the merge-key approach.
1. Create an incremental append job:

[root@hadoop bin]# sqoop job --create inc_job -- import --connect jdbc:oracle:thin:@192.168.1.6:1521:orcl --username scott --password tiger --table INR_LAS --fields-terminated-by '\t' --lines-terminated-by '\n' --hive-import --hive-database oracle --hive-table INR_LAS --incremental append --check-column ETLTIME --last-value '2019-03-20 14:49:19' -m 1 --null-string '\\N' --null-non-string '\\N'
19/03/13 18:12:37 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
19/03/13 18:12:38 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
Exception in thread "main" java.lang.NoClassDefFoundError: org/json/JSONObject
    at org.apache.sqoop.util.SqoopJsonUtil.getJsonStringforMap(SqoopJsonUtil.java:43)
    at org.apache.sqoop.SqoopOptions.writeProperties(SqoopOptions.java:785)
    at org.apache.sqoop.metastore.hsqldb.HsqldbJobStorage.createInternal(HsqldbJobStorage.java:399)
    at org.apache.sqoop.metastore.hsqldb.HsqldbJobStorage.create(HsqldbJobStorage.java:379)
    at org.apache.sqoop.tool.JobTool.createJob(JobTool.java:181)
    at org.apache.sqoop.tool.JobTool.run(JobTool.java:294)
    at org.apache.sqoop.Sqoop.run(Sqoop.java:147)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:183)
    at org.apache.sqoop.Sqoop.runTool(Sqoop.java:234)
    at org.apache.sqoop.Sqoop.runTool(Sqoop.java:243)
    at org.apache.sqoop.Sqoop.main(Sqoop.java:252)
Caused by: java.lang.ClassNotFoundException: org.json.JSONObject
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:335)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)

It failed: Sqoop is missing the java-json.jar package. Download the missing jar and put it under $SQOOP_HOME/lib. Link:

Download the jar here.
Put the downloaded jar into $SQOOP_HOME/lib, then recreate the job.
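A minimal sketch of placing the jar, assuming it was downloaded to the current working directory and that $SQOOP_HOME points at the Sqoop installation (the exact jar filename may differ):

# copy the missing JSON classes onto Sqoop's classpath
cp java-json.jar $SQOOP_HOME/lib/
# confirm the jar is in place
ls $SQOOP_HOME/lib/ | grep -i json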
First, delete the job whose creation failed earlier:

[root@hadoop ~]# sqoop job --delete inc_job
Warning: /hadoop/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
19/03/13 18:40:18 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/hadoop/hbase/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/hadoop/hive/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]

Create the job:

[root@hadoop ~]# sqoop job --create inc_job -- import --connect jdbc:oracle:thin:@192.168.1.6:1521:orcl --username "scott" --password "tiger" --table INR_LAS --fields-terminated-by '\t' --lines-terminated-by '\n' --hive-import --hive-database oracle --hive-table INR_LAS --incremental append --check-column ETLTIME --last-value '2019-03-20 14:49:19' -m 1 --null-string '\\N' --null-non-string '\\N'
Warning: /hadoop/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
19/03/13 18:40:26 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/hadoop/hbase/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/hadoop/hive/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
19/03/13 18:40:26 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.

List the job that was just created:

[root@hadoop ~]# sqoop job --list 
Warning: /hadoop/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
19/03/13 18:41:20 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/hadoop/hbase/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/hadoop/hive/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
Available jobs:
  inc_job

Check the last_value saved by the newly created job:

[root@hadoop ~]# sqoop job --show inc_job
Warning: /hadoop/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
19/03/13 18:45:00 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/hadoop/hbase/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/hadoop/hive/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
Enter password: ------- Here you enter the password of the OS user that runs this job; when the job is actually executed, the prompt asks for the password of the database connection user instead.
Job: inc_job
Tool: import
Options:
----------------------------
verbose = false
hcatalog.drop.and.create.table = false
incremental.last.value = 2019-03-20 14:49:19
db.connect.string = jdbc:oracle:thin:@192.168.1.6:1521:orcl
codegen.output.delimiters.escape = 0
codegen.output.delimiters.enclose.required = false
codegen.input.delimiters.field = 0
mainframe.input.dataset.type = p
hbase.create.table = false
split.limit = null
null.string = \\N
db.require.password = true
skip.dist.cache = false
hdfs.append.dir = true
db.table = INR_LAS
codegen.input.delimiters.escape = 0
accumulo.create.table = false
import.fetch.size = null
codegen.input.delimiters.enclose.required = false
db.username = scott
reset.onemapper = false
codegen.output.delimiters.record = 10
import.max.inline.lob.size = 16777216
sqoop.throwOnError = false
hbase.bulk.load.enabled = false
hcatalog.create.table = false
db.clear.staging.table = false
incremental.col = ETLTIME
codegen.input.delimiters.record = 0
enable.compression = false
hive.overwrite.table = false
hive.import = true
codegen.input.delimiters.enclose = 0
hive.table.name = INR_LAS
accumulo.batch.size = 10240000
hive.database.name = oracle
hive.drop.delims = false
customtool.options.jsonmap = {}
null.non-string = \\N
codegen.output.delimiters.enclose = 0
hdfs.delete-target.dir = false
codegen.output.dir = .
codegen.auto.compile.dir = true
relaxed.isolation = false
mapreduce.num.mappers = 1
accumulo.max.latency = 5000
import.direct.split.size = 0
sqlconnection.metadata.transaction.isolation.level = 2
codegen.output.delimiters.field = 9
export.new.update = UpdateOnly
incremental.mode = AppendRows
hdfs.file.format = TextFile
sqoop.oracle.escaping.disabled = true
codegen.compile.dir = /tmp/sqoop-root/compile/1173d716481c4bd8f6cb589b87a382ea
direct.import = false
temporary.dirRoot = _sqoop
hive.fail.table.exists = false
db.batch = false

Next, execute the job manually:

[root@hadoop ~]# sqoop job --exec inc_job
Warning: /hadoop/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
19/03/13 18:47:46 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/hadoop/hbase/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/hadoop/hive/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
Enter password: --------- Here you enter the password of the database connection user. How do we avoid typing it every time? Keep reading.
19/03/13 18:47:50 INFO oracle.OraOopManagerFactory: Data Connector for Oracle and Hadoop is disabled.
19/03/13 18:47:50 INFO manager.SqlManager: Using default fetchSize of 1000
19/03/13 18:47:50 INFO tool.CodeGenTool: Beginning code generation
19/03/13 18:47:51 INFO manager.OracleManager: Time zone has been set to GMT
19/03/13 18:47:51 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM INR_LAS t WHERE 1=0
19/03/13 18:47:51 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /hadoop
Note: /tmp/sqoop-root/compile/f383a7cc7d1bc4f9665748405ec5dec2/INR_LAS.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
19/03/13 18:47:55 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-root/compile/f383a7cc7d1bc4f9665748405ec5dec2/INR_LAS.jar
19/03/13 18:47:55 INFO manager.OracleManager: Time zone has been set to GMT
19/03/13 18:47:55 INFO tool.ImportTool: Maximal id query for free form incremental import: SELECT MAX(ETLTIME) FROM INR_LAS
19/03/13 18:47:55 INFO tool.ImportTool: Incremental import based on column ETLTIME
19/03/13 18:47:55 INFO tool.ImportTool: Lower bound value: TO_TIMESTAMP('2019-03-20 14:49:19', 'YYYY-MM-DD HH24:MI:SS.FF')
19/03/13 18:47:55 INFO tool.ImportTool: Upper bound value: TO_TIMESTAMP('2019-03-20 15:36:07.0', 'YYYY-MM-DD HH24:MI:SS.FF')
19/03/13 18:47:55 INFO manager.OracleManager: Time zone has been set to GMT
19/03/13 18:47:55 INFO mapreduce.ImportJobBase: Beginning import of INR_LAS
19/03/13 18:47:55 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
19/03/13 18:47:55 INFO manager.OracleManager: Time zone has been set to GMT
19/03/13 18:47:56 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
19/03/13 18:47:57 INFO client.RMProxy: Connecting to ResourceManager at /192.168.1.66:8032
19/03/13 18:48:00 INFO db.DBInputFormat: Using read commited transaction isolation
19/03/13 18:48:00 INFO mapreduce.JobSubmitter: number of splits:1
19/03/13 18:48:01 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1552469242276_0016
19/03/13 18:48:01 INFO impl.YarnClientImpl: Submitted application application_1552469242276_0016
19/03/13 18:48:01 INFO mapreduce.Job: The url to track the job: http://hadoop:8088/proxy/application_1552469242276_0016/
19/03/13 18:48:01 INFO mapreduce.Job: Running job: job_1552469242276_0016
19/03/13 18:48:11 INFO mapreduce.Job: Job job_1552469242276_0016 running in uber mode : false
19/03/13 18:48:11 INFO mapreduce.Job:  map 0% reduce 0%
19/03/13 18:48:18 INFO mapreduce.Job:  map 100% reduce 0%
19/03/13 18:48:18 INFO mapreduce.Job: Job job_1552469242276_0016 completed successfully
19/03/13 18:48:19 INFO mapreduce.Job: Counters: 30
    File System Counters
        FILE: Number of bytes read=0
        FILE: Number of bytes written=144628
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=87
        HDFS: Number of bytes written=39
        HDFS: Number of read operations=4
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=2
    Job Counters 
        Launched map tasks=1
        Other local map tasks=1
        Total time spent by all maps in occupied slots (ms)=4454
        Total time spent by all reduces in occupied slots (ms)=0
        Total time spent by all map tasks (ms)=4454
        Total vcore-milliseconds taken by all map tasks=4454
        Total megabyte-milliseconds taken by all map tasks=4560896
    Map-Reduce Framework
        Map input records=1
        Map output records=1
        Input split bytes=87
        Spilled Records=0
        Failed Shuffles=0
        Merged Map outputs=0
        GC time elapsed (ms)=229
        CPU time spent (ms)=2430
        Physical memory (bytes) snapshot=191975424
        Virtual memory (bytes) snapshot=2143756288
        Total committed heap usage (bytes)=116916224
    File Input Format Counters 
        Bytes Read=0
    File Output Format Counters 
        Bytes Written=39
19/03/13 18:48:19 INFO mapreduce.ImportJobBase: Transferred 39 bytes in 22.3135 seconds (1.7478 bytes/sec)
19/03/13 18:48:19 INFO mapreduce.ImportJobBase: Retrieved 1 records.
19/03/13 18:48:19 INFO mapreduce.ImportJobBase: Publishing Hive/Hcat import job data to Listeners for table INR_LAS
19/03/13 18:48:19 INFO util.AppendUtils: Creating missing output directory - INR_LAS
19/03/13 18:48:19 INFO manager.OracleManager: Time zone has been set to GMT
19/03/13 18:48:19 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM INR_LAS t WHERE 1=0
19/03/13 18:48:19 WARN hive.TableDefWriter: Column EMPNO had to be cast to a less precise type in Hive
19/03/13 18:48:19 WARN hive.TableDefWriter: Column SAL had to be cast to a less precise type in Hive
19/03/13 18:48:19 WARN hive.TableDefWriter: Column ETLTIME had to be cast to a less precise type in Hive
19/03/13 18:48:19 INFO hive.HiveImport: Loading uploaded data into Hive
19/03/13 18:48:19 INFO conf.HiveConf: Found configuration file file:/hadoop/hive/conf/hive-site.xml

Logging initialized using configuration in jar:file:/hadoop/hive/lib/hive-common-2.3.2.jar!/hive-log4j2.properties Async: true
19/03/13 18:48:22 INFO SessionState: 
Logging initialized using configuration in jar:file:/hadoop/hive/lib/hive-common-2.3.2.jar!/hive-log4j2.properties Async: true
19/03/13 18:48:22 INFO session.SessionState: Created HDFS directory: /tmp/hive/root/e09a2f96-2edd-4747-a65f-4899c2863aa0
19/03/13 18:48:22 INFO session.SessionState: Created local directory: /hadoop/hive/tmp/root/e09a2f96-2edd-4747-a65f-4899c2863aa0
19/03/13 18:48:22 INFO session.SessionState: Created HDFS directory: /tmp/hive/root/e09a2f96-2edd-4747-a65f-4899c2863aa0/_tmp_space.db
19/03/13 18:48:22 INFO conf.HiveConf: Using the default value passed in for log id: e09a2f96-2edd-4747-a65f-4899c2863aa0
19/03/13 18:48:22 INFO session.SessionState: Updating thread name to e09a2f96-2edd-4747-a65f-4899c2863aa0 main
19/03/13 18:48:22 INFO conf.HiveConf: Using the default value passed in for log id: e09a2f96-2edd-4747-a65f-4899c2863aa0
19/03/13 18:48:22 INFO ql.Driver: Compiling command(queryId=root_20190313104822_91cdb575-b0c9-4533-916c-247304d39b46): CREATE TABLE IF NOT EXISTS `oracle`.`INR_LAS` ( `EMPNO` DOUBLE, `ENAME` STRING, `JOB` STRING, `SAL` DOUBLE, `ETLTIME` STRING) COMMENT 'Imported by sqoop on 2019/03/13 10:48:19' ROW FORMAT DELIMITED FIELDS TERMINATED BY '\011' LINES TERMINATED BY '\012' STORED AS TEXTFILE
19/03/13 18:48:25 INFO hive.metastore: Trying to connect to metastore with URI thrift://192.168.1.66:9083
19/03/13 18:48:25 INFO hive.metastore: Opened a connection to metastore, current connections: 1
19/03/13 18:48:25 INFO hive.metastore: Connected to metastore.
19/03/13 18:48:25 INFO parse.CalcitePlanner: Starting Semantic Analysis
19/03/13 18:48:25 INFO parse.CalcitePlanner: Creating table oracle.INR_LAS position=27
19/03/13 18:48:25 INFO ql.Driver: Semantic Analysis Completed
19/03/13 18:48:25 INFO ql.Driver: Returning Hive schema: Schema(fieldSchemas:null, properties:null)
19/03/13 18:48:25 INFO ql.Driver: Completed compiling command(queryId=root_20190313104822_91cdb575-b0c9-4533-916c-247304d39b46); Time taken: 3.251 seconds
19/03/13 18:48:25 INFO ql.Driver: Concurrency mode is disabled, not creating a lock manager
19/03/13 18:48:25 INFO ql.Driver: Executing command(queryId=root_20190313104822_91cdb575-b0c9-4533-916c-247304d39b46): CREATE TABLE IF NOT EXISTS `oracle`.`INR_LAS` ( `EMPNO` DOUBLE, `ENAME` STRING, `JOB` STRING, `SAL` DOUBLE, `ETLTIME` STRING) COMMENT 'Imported by sqoop on 2019/03/13 10:48:19' ROW FORMAT DELIMITED FIELDS TERMINATED BY '\011' LINES TERMINATED BY '\012' STORED AS TEXTFILE
19/03/13 18:48:25 INFO sqlstd.SQLStdHiveAccessController: Created SQLStdHiveAccessController for session context : HiveAuthzSessionContext [sessionString=e09a2f96-2edd-4747-a65f-4899c2863aa0, clientType=HIVECLI]
19/03/13 18:48:25 WARN session.SessionState: METASTORE_FILTER_HOOK will be ignored, since hive.security.authorization.manager is set to instance of HiveAuthorizerFactory.
19/03/13 18:48:25 INFO hive.metastore: Mestastore configuration hive.metastore.filter.hook changed from org.apache.hadoop.hive.metastore.DefaultMetaStoreFilterHookImpl to org.apache.hadoop.hive.ql.security.authorization.plugin.AuthorizationMetaStoreFilterHook
19/03/13 18:48:26 INFO hive.metastore: Closed a connection to metastore, current connections: 0
19/03/13 18:48:26 INFO hive.metastore: Trying to connect to metastore with URI thrift://192.168.1.66:9083
19/03/13 18:48:26 INFO hive.metastore: Opened a connection to metastore, current connections: 1
19/03/13 18:48:26 INFO hive.metastore: Connected to metastore.
19/03/13 18:48:26 INFO ql.Driver: Completed executing command(queryId=root_20190313104822_91cdb575-b0c9-4533-916c-247304d39b46); Time taken: 0.113 seconds
OK
19/03/13 18:48:26 INFO ql.Driver: OK
Time taken: 3.379 seconds
19/03/13 18:48:26 INFO CliDriver: Time taken: 3.379 seconds
19/03/13 18:48:26 INFO conf.HiveConf: Using the default value passed in for log id: e09a2f96-2edd-4747-a65f-4899c2863aa0
19/03/13 18:48:26 INFO session.SessionState: Resetting thread name to  main
19/03/13 18:48:26 INFO conf.HiveConf: Using the default value passed in for log id: e09a2f96-2edd-4747-a65f-4899c2863aa0
19/03/13 18:48:26 INFO session.SessionState: Updating thread name to e09a2f96-2edd-4747-a65f-4899c2863aa0 main
19/03/13 18:48:26 INFO ql.Driver: Compiling command(queryId=root_20190313104826_5da0b171-d4e8-41c5-83ef-bdcffec0fea2): 
LOAD DATA INPATH 'hdfs://192.168.1.66:9000/user/root/INR_LAS' INTO TABLE `oracle`.`INR_LAS`
19/03/13 18:48:26 INFO ql.Driver: Semantic Analysis Completed
19/03/13 18:48:26 INFO ql.Driver: Returning Hive schema: Schema(fieldSchemas:null, properties:null)
19/03/13 18:48:26 INFO ql.Driver: Completed compiling command(queryId=root_20190313104826_5da0b171-d4e8-41c5-83ef-bdcffec0fea2); Time taken: 0.426 seconds
19/03/13 18:48:26 INFO ql.Driver: Concurrency mode is disabled, not creating a lock manager
19/03/13 18:48:26 INFO ql.Driver: Executing command(queryId=root_20190313104826_5da0b171-d4e8-41c5-83ef-bdcffec0fea2): 
LOAD DATA INPATH 'hdfs://192.168.1.66:9000/user/root/INR_LAS' INTO TABLE `oracle`.`INR_LAS`
19/03/13 18:48:26 INFO ql.Driver: Starting task [Stage-0:MOVE] in serial mode
19/03/13 18:48:26 INFO hive.metastore: Closed a connection to metastore, current connections: 0
Loading data to table oracle.inr_las
19/03/13 18:48:26 INFO exec.Task: Loading data to table oracle.inr_las from hdfs://192.168.1.66:9000/user/root/INR_LAS
19/03/13 18:48:26 INFO hive.metastore: Trying to connect to metastore with URI thrift://192.168.1.66:9083
19/03/13 18:48:26 INFO hive.metastore: Opened a connection to metastore, current connections: 1
19/03/13 18:48:26 INFO hive.metastore: Connected to metastore.
19/03/13 18:48:26 ERROR hdfs.KeyProviderCache: Could not find uri with key [dfs.encryption.key.provider.uri] to create a keyProvider !!
19/03/13 18:48:27 INFO ql.Driver: Starting task [Stage-1:STATS] in serial mode
19/03/13 18:48:27 INFO exec.StatsTask: Executing stats task
19/03/13 18:48:27 INFO hive.metastore: Closed a connection to metastore, current connections: 0
19/03/13 18:48:27 INFO hive.metastore: Trying to connect to metastore with URI thrift://192.168.1.66:9083
19/03/13 18:48:27 INFO hive.metastore: Opened a connection to metastore, current connections: 1
19/03/13 18:48:27 INFO hive.metastore: Connected to metastore.
19/03/13 18:48:27 INFO hive.metastore: Closed a connection to metastore, current connections: 0
19/03/13 18:48:27 INFO hive.metastore: Trying to connect to metastore with URI thrift://192.168.1.66:9083
19/03/13 18:48:27 INFO hive.metastore: Opened a connection to metastore, current connections: 1
19/03/13 18:48:27 INFO hive.metastore: Connected to metastore.
19/03/13 18:48:27 INFO exec.StatsTask: Table oracle.inr_las stats: [numFiles=6, numRows=0, totalSize=518, rawDataSize=0]
19/03/13 18:48:27 INFO ql.Driver: Completed executing command(queryId=root_20190313104826_5da0b171-d4e8-41c5-83ef-bdcffec0fea2); Time taken: 1.225 seconds
OK
19/03/13 18:48:27 INFO ql.Driver: OK
Time taken: 1.653 seconds
19/03/13 18:48:27 INFO CliDriver: Time taken: 1.653 seconds
19/03/13 18:48:27 INFO conf.HiveConf: Using the default value passed in for log id: e09a2f96-2edd-4747-a65f-4899c2863aa0
19/03/13 18:48:27 INFO session.SessionState: Resetting thread name to  main
19/03/13 18:48:27 INFO conf.HiveConf: Using the default value passed in for log id: e09a2f96-2edd-4747-a65f-4899c2863aa0
19/03/13 18:48:27 INFO session.SessionState: Deleted directory: /tmp/hive/root/e09a2f96-2edd-4747-a65f-4899c2863aa0 on fs with scheme hdfs
19/03/13 18:48:27 INFO session.SessionState: Deleted directory: /hadoop/hive/tmp/root/e09a2f96-2edd-4747-a65f-4899c2863aa0 on fs with scheme file
19/03/13 18:48:27 INFO hive.metastore: Closed a connection to metastore, current connections: 0
19/03/13 18:48:27 INFO hive.HiveImport: Hive import complete.
19/03/13 18:48:27 INFO hive.HiveImport: Export directory is empty, removing it.
19/03/13 18:48:27 INFO tool.ImportTool: Saving incremental import state to the metastore
19/03/13 18:48:27 INFO tool.ImportTool: Updated data for job: inc_job

The experiment above shows that every execution of the job asks for the database password. To make it password-free, use the --password-file parameter instead of --password when creating the job: with --password Sqoop prints a warning and still requires the password to be typed when the job runs, while with --password-file the job executes without asking for the database password. So let's rebuild the job created above.
First drop the original job:

[root@hadoop conf]# sqoop job --delete inc_job

Create the password file.
Note: Sqoop expects the password file to be placed on HDFS, and its permission must be 400.

[root@hadoop sqoop]# mkdir pwd
[root@hadoop sqoop]# cd pwd
[root@hadoop pwd]# pwd
/hadoop/sqoop/pwd
[root@hadoop pwd]# echo -n "tiger" > scott.pwd
[root@hadoop pwd]# hdfs dfs -put scott.pwd /user/hive/warehouse
[root@hadoop pwd]# hdfs dfs -chmod 400 /user/hive/warehouse/scott.pwd
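A quick sanity check of the file on HDFS (a small sketch using the same path as above; the listing should show permission -r--------, i.e. 400):

hdfs dfs -ls /user/hive/warehouse/scott.pwd
# the content must not contain a trailing newline, which is why echo -n was used above
hdfs dfs -cat /user/hive/warehouse/scott.pwd | od -c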

Recreate the job, this time specifying --password-file instead of --password:

[root@hadoop conf]# sqoop job --create inc_job -- import --connect jdbc:oracle:thin:@192.168.1.6:1521:orcl --username "scott" --password-file /user/hive/warehouse/scott.pwd --table INR_LAS --fields-terminated-by '\t' --lines-terminated-by '\n' --hive-import --hive-database oracle --hive-table INR_LAS --incremental append --check-column ETLTIME --last-value '2019-03-20 14:49:19' -m 1 --null-string '\\N' --null-non-string '\\N'

To verify, first look at the current data in the Oracle table:

select * from inr_las;
EMPNO    ENAME     JOB         SAL        ETLTIME
1        er        CLERK       800.00     2019/3/20 10:42:27
2        ALLEN     SALESMAN    1600.00    2019/3/20 10:42:27
3        WARD      SALESMAN    1250.00    2019/3/20 10:42:27
4        JONES     MANAGER     2975.00    2019/3/20 10:42:27
5        MARTIN    SALESMAN    1250.00    2019/3/20 10:42:27
6        zhao      DBA         1000.00    2019/3/20 10:52:34
7        yan       BI          100.00     2019/3/20 10:42:27
8        dong      JAVA        5232.00    2019/3/20 15:36:07

Now look at the current data in the Hive table:

hive> select * from inr_las;
OK
1    er    CLERK    800.0    2019-03-20 10:42:27.0
2    ALLEN    SALESMAN    1600.0    2019-03-20 10:42:27.0
3    WARD    SALESMAN    1250.0    2019-03-20 10:42:27.0
4    JONES    MANAGER    2975.0    2019-03-20 10:42:27.0
5    MARTIN    SALESMAN    1250.0    2019-03-20 10:42:27.0
6    zhao    DBA    1000.0    2019-03-20 10:52:34.0
7    yan    BI    100.0    2019-03-20 10:42:27.0
8    dong    JAVA    332.0    2019-03-20 14:49:19.0
8    dong    JAVA    3232.0    2019-03-20 15:13:35.0
8    dong    JAVA    4232.0    2019-03-20 15:29:03.0
8    dong    JAVA    5232.0    2019-03-20 15:36:07.0
8    dong    JAVA    5232.0    2019-03-20 15:36:07.0
8    dong    JAVA    3232.0    2019-03-20 15:13:35.0
Time taken: 0.176 seconds, Fetched: 13 row(s)

The job's --last-value was set to '2019-03-20 14:49:19', and the source has one row (empno=8) that meets the incremental condition. Now execute the newly created job again:

[root@hadoop pwd]# sqoop job --exec inc_job
Warning: /hadoop/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
19/03/13 19:14:30 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/hadoop/hbase/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/hadoop/hive/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
19/03/13 19:14:32 INFO oracle.OraOopManagerFactory: Data Connector for Oracle and Hadoop is disabled.
19/03/13 19:14:32 INFO manager.SqlManager: Using default fetchSize of 1000
19/03/13 19:14:32 INFO tool.CodeGenTool: Beginning code generation
19/03/13 19:14:33 INFO manager.OracleManager: Time zone has been set to GMT
19/03/13 19:14:33 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM INR_LAS t WHERE 1=0
19/03/13 19:14:33 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /hadoop
Note: /tmp/sqoop-root/compile/8df9a3027ead0f69733bef4c331c8f15/INR_LAS.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
19/03/13 19:14:38 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-root/compile/8df9a3027ead0f69733bef4c331c8f15/INR_LAS.jar
19/03/13 19:14:38 INFO manager.OracleManager: Time zone has been set to GMT
19/03/13 19:14:38 INFO tool.ImportTool: Maximal id query for free form incremental import: SELECT MAX(ETLTIME) FROM INR_LAS
19/03/13 19:14:38 INFO tool.ImportTool: Incremental import based on column ETLTIME
19/03/13 19:14:38 INFO tool.ImportTool: Lower bound value: TO_TIMESTAMP('2019-03-20 14:49:19', 'YYYY-MM-DD HH24:MI:SS.FF')
19/03/13 19:14:38 INFO tool.ImportTool: Upper bound value: TO_TIMESTAMP('2019-03-20 15:36:07.0', 'YYYY-MM-DD HH24:MI:SS.FF')
19/03/13 19:14:38 INFO manager.OracleManager: Time zone has been set to GMT
19/03/13 19:14:38 INFO mapreduce.ImportJobBase: Beginning import of INR_LAS
19/03/13 19:14:38 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
19/03/13 19:14:38 INFO manager.OracleManager: Time zone has been set to GMT
19/03/13 19:14:38 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
19/03/13 19:14:38 INFO client.RMProxy: Connecting to ResourceManager at /192.168.1.66:8032
19/03/13 19:14:42 INFO db.DBInputFormat: Using read commited transaction isolation
19/03/13 19:14:42 INFO mapreduce.JobSubmitter: number of splits:1
19/03/13 19:14:42 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1552469242276_0017
19/03/13 19:14:42 INFO impl.YarnClientImpl: Submitted application application_1552469242276_0017
19/03/13 19:14:43 INFO mapreduce.Job: The url to track the job: http://hadoop:8088/proxy/application_1552469242276_0017/
19/03/13 19:14:43 INFO mapreduce.Job: Running job: job_1552469242276_0017
19/03/13 19:14:53 INFO mapreduce.Job: Job job_1552469242276_0017 running in uber mode : false
19/03/13 19:14:53 INFO mapreduce.Job:  map 0% reduce 0%
19/03/13 19:15:00 INFO mapreduce.Job:  map 100% reduce 0%
19/03/13 19:15:00 INFO mapreduce.Job: Job job_1552469242276_0017 completed successfully
19/03/13 19:15:00 INFO mapreduce.Job: Counters: 30
    File System Counters
        FILE: Number of bytes read=0
        FILE: Number of bytes written=144775
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=87
        HDFS: Number of bytes written=39
        HDFS: Number of read operations=4
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=2
    Job Counters 
        Launched map tasks=1
        Other local map tasks=1
        Total time spent by all maps in occupied slots (ms)=5332
        Total time spent by all reduces in occupied slots (ms)=0
        Total time spent by all map tasks (ms)=5332
        Total vcore-milliseconds taken by all map tasks=5332
        Total megabyte-milliseconds taken by all map tasks=5459968
    Map-Reduce Framework
        Map input records=1
        Map output records=1
        Input split bytes=87
        Spilled Records=0
        Failed Shuffles=0
        Merged Map outputs=0
        GC time elapsed (ms)=651
        CPU time spent (ms)=2670
        Physical memory (bytes) snapshot=188571648
        Virtual memory (bytes) snapshot=2148745216
        Total committed heap usage (bytes)=119537664
    File Input Format Counters 
        Bytes Read=0
    File Output Format Counters 
        Bytes Written=39
19/03/13 19:15:00 INFO mapreduce.ImportJobBase: Transferred 39 bytes in 22.3081 seconds (1.7482 bytes/sec)
19/03/13 19:15:00 INFO mapreduce.ImportJobBase: Retrieved 1 records.
19/03/13 19:15:00 INFO mapreduce.ImportJobBase: Publishing Hive/Hcat import job data to Listeners for table INR_LAS
19/03/13 19:15:00 INFO util.AppendUtils: Creating missing output directory - INR_LAS
19/03/13 19:15:01 INFO manager.OracleManager: Time zone has been set to GMT
19/03/13 19:15:01 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM INR_LAS t WHERE 1=0
19/03/13 19:15:01 WARN hive.TableDefWriter: Column EMPNO had to be cast to a less precise type in Hive
19/03/13 19:15:01 WARN hive.TableDefWriter: Column SAL had to be cast to a less precise type in Hive
19/03/13 19:15:01 WARN hive.TableDefWriter: Column ETLTIME had to be cast to a less precise type in Hive
19/03/13 19:15:01 INFO hive.HiveImport: Loading uploaded data into Hive
19/03/13 19:15:01 INFO conf.HiveConf: Found configuration file file:/hadoop/hive/conf/hive-site.xml

Logging initialized using configuration in jar:file:/hadoop/hive/lib/hive-common-2.3.2.jar!/hive-log4j2.properties Async: true
19/03/13 19:15:04 INFO SessionState: 
Logging initialized using configuration in jar:file:/hadoop/hive/lib/hive-common-2.3.2.jar!/hive-log4j2.properties Async: true
19/03/13 19:15:04 INFO session.SessionState: Created HDFS directory: /tmp/hive/root/7feac288-289d-4d74-8641-553c5ab65618
19/03/13 19:15:04 INFO session.SessionState: Created local directory: /hadoop/hive/tmp/root/7feac288-289d-4d74-8641-553c5ab65618
19/03/13 19:15:04 INFO session.SessionState: Created HDFS directory: /tmp/hive/root/7feac288-289d-4d74-8641-553c5ab65618/_tmp_space.db
19/03/13 19:15:04 INFO conf.HiveConf: Using the default value passed in for log id: 7feac288-289d-4d74-8641-553c5ab65618
19/03/13 19:15:04 INFO session.SessionState: Updating thread name to 7feac288-289d-4d74-8641-553c5ab65618 main
19/03/13 19:15:04 INFO conf.HiveConf: Using the default value passed in for log id: 7feac288-289d-4d74-8641-553c5ab65618
19/03/13 19:15:04 INFO ql.Driver: Compiling command(queryId=root_20190313111504_d1db4a38-1b86-4c89-84c3-3d3be9404b0f): CREATE TABLE IF NOT EXISTS `oracle`.`INR_LAS` ( `EMPNO` DOUBLE, `ENAME` STRING, `JOB` STRING, `SAL` DOUBLE, `ETLTIME` STRING) COMMENT 'Imported by sqoop on 2019/03/13 11:15:01' ROW FORMAT DELIMITED FIELDS TERMINATED BY '\011' LINES TERMINATED BY '\012' STORED AS TEXTFILE
19/03/13 19:15:09 INFO hive.metastore: Trying to connect to metastore with URI thrift://192.168.1.66:9083
19/03/13 19:15:09 INFO hive.metastore: Opened a connection to metastore, current connections: 1
19/03/13 19:15:09 INFO hive.metastore: Connected to metastore.
19/03/13 19:15:09 INFO parse.CalcitePlanner: Starting Semantic Analysis
19/03/13 19:15:09 INFO parse.CalcitePlanner: Creating table oracle.INR_LAS position=27
19/03/13 19:15:09 INFO ql.Driver: Semantic Analysis Completed
19/03/13 19:15:09 INFO ql.Driver: Returning Hive schema: Schema(fieldSchemas:null, properties:null)
19/03/13 19:15:09 INFO ql.Driver: Completed compiling command(queryId=root_20190313111504_d1db4a38-1b86-4c89-84c3-3d3be9404b0f); Time taken: 5.309 seconds
19/03/13 19:15:09 INFO ql.Driver: Concurrency mode is disabled, not creating a lock manager
19/03/13 19:15:09 INFO ql.Driver: Executing command(queryId=root_20190313111504_d1db4a38-1b86-4c89-84c3-3d3be9404b0f): CREATE TABLE IF NOT EXISTS `oracle`.`INR_LAS` ( `EMPNO` DOUBLE, `ENAME` STRING, `JOB` STRING, `SAL` DOUBLE, `ETLTIME` STRING) COMMENT 'Imported by sqoop on 2019/03/13 11:15:01' ROW FORMAT DELIMITED FIELDS TERMINATED BY '\011' LINES TERMINATED BY '\012' STORED AS TEXTFILE
19/03/13 19:15:09 INFO sqlstd.SQLStdHiveAccessController: Created SQLStdHiveAccessController for session context : HiveAuthzSessionContext [sessionString=7feac288-289d-4d74-8641-553c5ab65618, clientType=HIVECLI]
19/03/13 19:15:09 WARN session.SessionState: METASTORE_FILTER_HOOK will be ignored, since hive.security.authorization.manager is set to instance of HiveAuthorizerFactory.
19/03/13 19:15:09 INFO hive.metastore: Mestastore configuration hive.metastore.filter.hook changed from org.apache.hadoop.hive.metastore.DefaultMetaStoreFilterHookImpl to org.apache.hadoop.hive.ql.security.authorization.plugin.AuthorizationMetaStoreFilterHook
19/03/13 19:15:10 INFO hive.metastore: Closed a connection to metastore, current connections: 0
19/03/13 19:15:10 INFO hive.metastore: Trying to connect to metastore with URI thrift://192.168.1.66:9083
19/03/13 19:15:10 INFO hive.metastore: Opened a connection to metastore, current connections: 1
19/03/13 19:15:10 INFO hive.metastore: Connected to metastore.
19/03/13 19:15:10 INFO ql.Driver: Completed executing command(queryId=root_20190313111504_d1db4a38-1b86-4c89-84c3-3d3be9404b0f); Time taken: 0.106 seconds
OK
19/03/13 19:15:10 INFO ql.Driver: OK
Time taken: 5.429 seconds
19/03/13 19:15:10 INFO CliDriver: Time taken: 5.429 seconds
19/03/13 19:15:10 INFO conf.HiveConf: Using the default value passed in for log id: 7feac288-289d-4d74-8641-553c5ab65618
19/03/13 19:15:10 INFO session.SessionState: Resetting thread name to  main
19/03/13 19:15:10 INFO conf.HiveConf: Using the default value passed in for log id: 7feac288-289d-4d74-8641-553c5ab65618
19/03/13 19:15:10 INFO session.SessionState: Updating thread name to 7feac288-289d-4d74-8641-553c5ab65618 main
19/03/13 19:15:10 INFO ql.Driver: Compiling command(queryId=root_20190313111510_cd9c21cf-b479-475c-a959-be8ff0ac5f01): 
LOAD DATA INPATH 'hdfs://192.168.1.66:9000/user/root/INR_LAS' INTO TABLE `oracle`.`INR_LAS`
19/03/13 19:15:10 INFO ql.Driver: Semantic Analysis Completed
19/03/13 19:15:10 INFO ql.Driver: Returning Hive schema: Schema(fieldSchemas:null, properties:null)
19/03/13 19:15:10 INFO ql.Driver: Completed compiling command(queryId=root_20190313111510_cd9c21cf-b479-475c-a959-be8ff0ac5f01); Time taken: 0.415 seconds
19/03/13 19:15:10 INFO ql.Driver: Concurrency mode is disabled, not creating a lock manager
19/03/13 19:15:10 INFO ql.Driver: Executing command(queryId=root_20190313111510_cd9c21cf-b479-475c-a959-be8ff0ac5f01): 
LOAD DATA INPATH 'hdfs://192.168.1.66:9000/user/root/INR_LAS' INTO TABLE `oracle`.`INR_LAS`
19/03/13 19:15:10 INFO ql.Driver: Starting task [Stage-0:MOVE] in serial mode
19/03/13 19:15:10 INFO hive.metastore: Closed a connection to metastore, current connections: 0
Loading data to table oracle.inr_las
19/03/13 19:15:10 INFO exec.Task: Loading data to table oracle.inr_las from hdfs://192.168.1.66:9000/user/root/INR_LAS
19/03/13 19:15:10 INFO hive.metastore: Trying to connect to metastore with URI thrift://192.168.1.66:9083
19/03/13 19:15:10 INFO hive.metastore: Opened a connection to metastore, current connections: 1
19/03/13 19:15:10 INFO hive.metastore: Connected to metastore.
19/03/13 19:15:10 ERROR hdfs.KeyProviderCache: Could not find uri with key [dfs.encryption.key.provider.uri] to create a keyProvider !!
19/03/13 19:15:11 INFO ql.Driver: Starting task [Stage-1:STATS] in serial mode
19/03/13 19:15:11 INFO exec.StatsTask: Executing stats task
19/03/13 19:15:11 INFO hive.metastore: Closed a connection to metastore, current connections: 0
19/03/13 19:15:11 INFO hive.metastore: Trying to connect to metastore with URI thrift://192.168.1.66:9083
19/03/13 19:15:11 INFO hive.metastore: Opened a connection to metastore, current connections: 1
19/03/13 19:15:11 INFO hive.metastore: Connected to metastore.
19/03/13 19:15:11 INFO hive.metastore: Closed a connection to metastore, current connections: 0
19/03/13 19:15:11 INFO hive.metastore: Trying to connect to metastore with URI thrift://192.168.1.66:9083
19/03/13 19:15:11 INFO hive.metastore: Opened a connection to metastore, current connections: 1
19/03/13 19:15:11 INFO hive.metastore: Connected to metastore.
19/03/13 19:15:11 INFO exec.StatsTask: Table oracle.inr_las stats: [numFiles=7, numRows=0, totalSize=557, rawDataSize=0]
19/03/13 19:15:11 INFO ql.Driver: Completed executing command(queryId=root_20190313111510_cd9c21cf-b479-475c-a959-be8ff0ac5f01); Time taken: 1.296 seconds
OK
19/03/13 19:15:11 INFO ql.Driver: OK
Time taken: 1.713 seconds
19/03/13 19:15:11 INFO CliDriver: Time taken: 1.713 seconds
19/03/13 19:15:11 INFO conf.HiveConf: Using the default value passed in for log id: 7feac288-289d-4d74-8641-553c5ab65618
19/03/13 19:15:11 INFO session.SessionState: Resetting thread name to  main
19/03/13 19:15:11 INFO conf.HiveConf: Using the default value passed in for log id: 7feac288-289d-4d74-8641-553c5ab65618
19/03/13 19:15:11 INFO session.SessionState: Deleted directory: /tmp/hive/root/7feac288-289d-4d74-8641-553c5ab65618 on fs with scheme hdfs
19/03/13 19:15:11 INFO session.SessionState: Deleted directory: /hadoop/hive/tmp/root/7feac288-289d-4d74-8641-553c5ab65618 on fs with scheme file
19/03/13 19:15:11 INFO hive.metastore: Closed a connection to metastore, current connections: 0
19/03/13 19:15:11 INFO hive.HiveImport: Hive import complete.
19/03/13 19:15:11 INFO hive.HiveImport: Export directory is empty, removing it.
19/03/13 19:15:11 INFO tool.ImportTool: Saving incremental import state to the metastore
19/03/13 19:15:11 INFO tool.ImportTool: Updated data for job: inc_job

No password prompt appears any more. Check the Hive table data again:

hive> select * from inr_las;
OK
1    er    CLERK    800.0    2019-03-20 10:42:27.0
2    ALLEN    SALESMAN    1600.0    2019-03-20 10:42:27.0
3    WARD    SALESMAN    1250.0    2019-03-20 10:42:27.0
4    JONES    MANAGER    2975.0    2019-03-20 10:42:27.0
5    MARTIN    SALESMAN    1250.0    2019-03-20 10:42:27.0
6    zhao    DBA    1000.0    2019-03-20 10:52:34.0
7    yan    BI    100.0    2019-03-20 10:42:27.0
8    dong    JAVA    332.0    2019-03-20 14:49:19.0
8    dong    JAVA    3232.0    2019-03-20 15:13:35.0
8    dong    JAVA    4232.0    2019-03-20 15:29:03.0
8    dong    JAVA    5232.0    2019-03-20 15:36:07.0
8    dong    JAVA    5232.0    2019-03-20 15:36:07.0
8    dong    JAVA    5232.0    2019-03-20 15:36:07.0
8    dong    JAVA    3232.0    2019-03-20 15:13:35.0
Time taken: 0.161 seconds, Fetched: 14 row(s)

There are now 14 rows, one more row for empno=8, so the append increment succeeded.
My real requirement, however, is that when the source Oracle table is updated the timestamp column changes as well, so the changed rows have to be identified by timestamp and merged incrementally into Hive. That is what the merge-key mode is for: incremental lastmodified based on the timestamp combined with a merge on the primary key. Creating such a job is very similar to the example above.
In this example we wrap the job in a shell script and let crontab execute the increment on a schedule.
Note: because I am testing incremental imports to build an incremental Kylin cube and the test data volume is tiny, the Hive table is created only as an external table, without partitioning.
The full walkthrough below shows, step by step, how to import a table into Hive incrementally and automatically with a Sqoop job combined with crontab and a shell script.
Step 1: create the table to be synchronized on the Oracle side. Here a new table is initialized from the inr_las table used above:

create table inr_job as
select a.empno,
       a.ename,
       a.job,
       a.sal,
       sysdate etltime
  from inr_las a;

Step 2: create the target table in Hive:

hive> use oracle;
OK
Time taken: 1.425 seconds
create table INR_JOB
(
  empno   int,
  ename   string,
  job     string,
  sal     float,
  etltime string
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
location '/user/hive/warehouse/exter_inr_job'; 
Time taken: 2.836 seconds

Step 3: load the data into Hive in full.
First delete the directory specified as the external table's location above. Creating the table creates that directory automatically, and the full import below also creates it, so leaving it in place causes a "directory already exists" error:

[root@hadoop hadoop]# hadoop fs -rmr /user/hive/warehouse/exter_inr_job
rmr: DEPRECATED: Please use 'rm -r' instead.
19/03/14 06:01:23 INFO fs.TrashPolicyDefault: Namenode trash configuration: Deletion interval = 0 minutes, Emptier interval = 0 minutes.
Deleted /user/hive/warehouse/exter_inr_job

Then run the full import:

sqoop import --connect jdbc:oracle:thin:@192.168.1.6:1521:orcl --username scott --password tiger --table INR_JOB -m 1 --target-dir /user/hive/warehouse/exter_inr_job --fields-terminated-by '\t'

When the import finishes, query the Hive table:

hive> select * from inr_job;
OK
1    er    CLERK    800.0    2019-03-22 17:24:42.0
2    ALLEN    SALESMAN    1600.0    2019-03-22 17:24:42.0
3    WARD    SALESMAN    1250.0    2019-03-22 17:24:42.0
4    JONES    MANAGER    2975.0    2019-03-22 17:24:42.0
5    MARTIN    SALESMAN    1250.0    2019-03-22 17:24:42.0
6    zhao    DBA    1000.0    2019-03-22 17:24:42.0
7    yan    BI    100.0    2019-03-22 17:24:42.0
8    dong    JAVA    400.0    2019-03-22 17:24:42.0
Time taken: 3.153 seconds, Fetched: 8 row(s)

Step 4: create the incremental Sqoop job.
The --password-file /user/hive/warehouse/scott.pwd below is the password file created earlier; see the password-file steps above for how it was built.

 sqoop job --create auto_job -- import --connect jdbc:oracle:thin:@192.168.1.6:1521:orcl --username scott  --password-file /user/hive/warehouse/scott.pwd  --table INR_JOB --fields-terminated-by '\t' --lines-terminated-by '\n'  --target-dir /user/hive/warehouse/exter_inr_job -m 1 --check-column ETLTIME --incremental lastmodified --merge-key EMPNO --last-value "2019-03-22 17:24:42"

Check the stored job definition:

[root@hadoop hadoop]# sqoop  job --show auto_job
Warning: /hadoop/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
19/03/14 06:10:57 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/hadoop/hbase/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/hadoop/hive/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
Job: auto_job
Tool: import
Options:
----------------------------
verbose = false
hcatalog.drop.and.create.table = false
incremental.last.value = 2019-03-22 17:24:42
db.connect.string = jdbc:oracle:thin:@192.168.1.6:1521:orcl
codegen.output.delimiters.escape = 0
codegen.output.delimiters.enclose.required = false
codegen.input.delimiters.field = 0
mainframe.input.dataset.type = p
split.limit = null
hbase.create.table = false
skip.dist.cache = false
hdfs.append.dir = false
db.table = INR_JOB
codegen.input.delimiters.escape = 0
accumulo.create.table = false
import.fetch.size = null
codegen.input.delimiters.enclose.required = false
db.username = scott
reset.onemapper = false
codegen.output.delimiters.record = 10
import.max.inline.lob.size = 16777216
sqoop.throwOnError = false
hbase.bulk.load.enabled = false
hcatalog.create.table = false
db.clear.staging.table = false
incremental.col = ETLTIME
codegen.input.delimiters.record = 0
db.password.file = /user/hive/warehouse/scott.pwd
enable.compression = false
hive.overwrite.table = false
hive.import = false
codegen.input.delimiters.enclose = 0
accumulo.batch.size = 10240000
hive.drop.delims = false
customtool.options.jsonmap = {}
codegen.output.delimiters.enclose = 0
hdfs.delete-target.dir = false
codegen.output.dir = .
codegen.auto.compile.dir = true
relaxed.isolation = false
mapreduce.num.mappers = 1
accumulo.max.latency = 5000
import.direct.split.size = 0
sqlconnection.metadata.transaction.isolation.level = 2
codegen.output.delimiters.field = 9
export.new.update = UpdateOnly
incremental.mode = DateLastModified
hdfs.file.format = TextFile
sqoop.oracle.escaping.disabled = true
codegen.compile.dir = /tmp/sqoop-root/compile/be3b358816e17c786d114afb7a4f2a6d
direct.import = false
temporary.dirRoot = _sqoop
hdfs.target.dir = /user/hive/warehouse/exter_inr_job
hive.fail.table.exists = false
merge.key.col = EMPNO
db.batch = false

Step 5: wrap it in a shell script and schedule it.

[root@hadoop ~]# cd /hadoop/
[root@hadoop hadoop]# vim auto_inr.sh

Add the following content:

#!/bin/bash
log="/hadoop/auto_job_log.log"
echo "======================`date "+%Y-%m-%d %H:%M:%S"`增量======================" >> $log
nohup sqoop job --exec auto_job >> $log 2>&1 &

Save and exit, then make the script executable:

[root@hadoop hadoop]# chmod +x auto_inr.sh

Run it manually first, but before that check the job's current last_value:

[root@hadoop hadoop]# sqoop job --show auto_job
Warning: /hadoop/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
19/03/25 17:50:59 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/hadoop/hbase/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/hadoop/hive/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
Job: auto_job
Tool: import
Options:
----------------------------
verbose = false
hcatalog.drop.and.create.table = false
incremental.last.value = 2019-03-22 17:24:42
db.connect.string = jdbc:oracle:thin:@192.168.1.6:1521:orcl

It shows 2019-03-22 17:24:42. Now check the current time:

[root@hadoop hadoop]# date
Mon Mar 25 17:54:54 CST 2019

Now run the script manually:

[root@hadoop hadoop]# ./auto_inr.sh 

Then check the redirected log:

[root@hadoop hadoop]# cat auto_job_log.log 
======================2019-03-25 17:55:46增量======================
Warning: /hadoop/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
19/03/25 17:55:48 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/hadoop/hbase/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/hadoop/hive/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
19/03/25 17:55:50 INFO oracle.OraOopManagerFactory: Data Connector for Oracle and Hadoop is disabled.
19/03/25 17:55:50 INFO manager.SqlManager: Using default fetchSize of 1000
19/03/25 17:55:50 INFO tool.CodeGenTool: Beginning code generation
19/03/25 17:55:51 INFO manager.OracleManager: Time zone has been set to GMT
19/03/25 17:55:51 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM INR_JOB t WHERE 1=0
19/03/25 17:55:51 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /hadoop
Note: /tmp/sqoop-root/compile/6f5f7577c1f664b94d5c83b578fd3dac/INR_JOB.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
19/03/25 17:55:54 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-root/compile/6f5f7577c1f664b94d5c83b578fd3dac/INR_JOB.jar
19/03/25 17:55:54 INFO manager.OracleManager: Time zone has been set to GMT
19/03/25 17:55:54 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM INR_JOB t WHERE 1=0
19/03/25 17:55:54 INFO tool.ImportTool: Incremental import based on column ETLTIME
19/03/25 17:55:54 INFO tool.ImportTool: Lower bound value: TO_TIMESTAMP('2019-03-22 17:24:42', 'YYYY-MM-DD HH24:MI:SS.FF')
19/03/25 17:55:54 INFO tool.ImportTool: Upper bound value: TO_TIMESTAMP('2019-03-25 17:55:54.0', 'YYYY-MM-DD HH24:MI:SS.FF')
19/03/25 17:55:54 INFO manager.OracleManager: Time zone has been set to GMT
19/03/25 17:55:54 INFO mapreduce.ImportJobBase: Beginning import of INR_JOB
19/03/25 17:55:54 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
19/03/25 17:55:54 INFO manager.OracleManager: Time zone has been set to GMT
19/03/25 17:55:54 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
19/03/25 17:55:54 INFO client.RMProxy: Connecting to ResourceManager at /192.168.1.66:8032
19/03/25 17:55:57 INFO db.DBInputFormat: Using read commited transaction isolation
19/03/25 17:55:57 INFO mapreduce.JobSubmitter: number of splits:1
19/03/25 17:55:58 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1553503985304_0009
19/03/25 17:55:58 INFO impl.YarnClientImpl: Submitted application application_1553503985304_0009
19/03/25 17:55:58 INFO mapreduce.Job: The url to track the job: http://hadoop:8088/proxy/application_1553503985304_0009/
19/03/25 17:55:58 INFO mapreduce.Job: Running job: job_1553503985304_0009
19/03/25 17:56:07 INFO mapreduce.Job: Job job_1553503985304_0009 running in uber mode : false
19/03/25 17:56:07 INFO mapreduce.Job:  map 0% reduce 0%
19/03/25 17:56:15 INFO mapreduce.Job:  map 100% reduce 0%
19/03/25 17:56:15 INFO mapreduce.Job: Job job_1553503985304_0009 completed successfully
19/03/25 17:56:15 INFO mapreduce.Job: Counters: 30
    File System Counters
        FILE: Number of bytes read=0
        FILE: Number of bytes written=144775
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=87
        HDFS: Number of bytes written=323
        HDFS: Number of read operations=4
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=2
    Job Counters 
        Launched map tasks=1
        Other local map tasks=1
        Total time spent by all maps in occupied slots (ms)=5270
        Total time spent by all reduces in occupied slots (ms)=0
        Total time spent by all map tasks (ms)=5270
        Total vcore-milliseconds taken by all map tasks=5270
        Total megabyte-milliseconds taken by all map tasks=5396480
    Map-Reduce Framework
        Map input records=8
        Map output records=8
        Input split bytes=87
        Spilled Records=0
        Failed Shuffles=0
        Merged Map outputs=0
        GC time elapsed (ms)=73
        CPU time spent (ms)=3000
        Physical memory (bytes) snapshot=205058048
        Virtual memory (bytes) snapshot=2135244800
        Total committed heap usage (bytes)=109576192
    File Input Format Counters 
        Bytes Read=0
    File Output Format Counters 
        Bytes Written=323
19/03/25 17:56:15 INFO mapreduce.ImportJobBase: Transferred 323 bytes in 20.9155 seconds (15.4431 bytes/sec)
19/03/25 17:56:15 INFO mapreduce.ImportJobBase: Retrieved 8 records.
19/03/25 17:56:15 INFO tool.ImportTool: Final destination exists, will run merge job.
19/03/25 17:56:15 INFO Configuration.deprecation: mapred.output.key.class is deprecated. Instead, use mapreduce.job.output.key.class
19/03/25 17:56:15 INFO client.RMProxy: Connecting to ResourceManager at /192.168.1.66:8032
19/03/25 17:56:18 INFO input.FileInputFormat: Total input paths to process : 2
19/03/25 17:56:18 INFO mapreduce.JobSubmitter: number of splits:2
19/03/25 17:56:19 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1553503985304_0010
19/03/25 17:56:19 INFO impl.YarnClientImpl: Submitted application application_1553503985304_0010
19/03/25 17:56:19 INFO mapreduce.Job: The url to track the job: http://hadoop:8088/proxy/application_1553503985304_0010/
19/03/25 17:56:19 INFO mapreduce.Job: Running job: job_1553503985304_0010
19/03/25 17:56:29 INFO mapreduce.Job: Job job_1553503985304_0010 running in uber mode : false
19/03/25 17:56:29 INFO mapreduce.Job:  map 0% reduce 0%
19/03/25 17:56:39 INFO mapreduce.Job:  map 100% reduce 0%
19/03/25 17:56:50 INFO mapreduce.Job:  map 100% reduce 100%
19/03/25 17:56:50 INFO mapreduce.Job: Job job_1553503985304_0010 completed successfully
19/03/25 17:56:50 INFO mapreduce.Job: Counters: 49
    File System Counters
        FILE: Number of bytes read=1090
        FILE: Number of bytes written=436771
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=942
        HDFS: Number of bytes written=323
        HDFS: Number of read operations=9
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=2
    Job Counters 
        Launched map tasks=2
        Launched reduce tasks=1
        Data-local map tasks=2
        Total time spent by all maps in occupied slots (ms)=14667
        Total time spent by all reduces in occupied slots (ms)=7258
        Total time spent by all map tasks (ms)=14667
        Total time spent by all reduce tasks (ms)=7258
        Total vcore-milliseconds taken by all map tasks=14667
        Total vcore-milliseconds taken by all reduce tasks=7258
        Total megabyte-milliseconds taken by all map tasks=15019008
        Total megabyte-milliseconds taken by all reduce tasks=7432192
    Map-Reduce Framework
        Map input records=16
        Map output records=16
        Map output bytes=1052
        Map output materialized bytes=1096
        Input split bytes=296
        Combine input records=0
        Combine output records=0
        Reduce input groups=8
        Reduce shuffle bytes=1096
        Reduce input records=16
        Reduce output records=8
        Spilled Records=32
        Shuffled Maps =2
        Failed Shuffles=0
        Merged Map outputs=2
        GC time elapsed (ms)=230
        CPU time spent (ms)=5420
        Physical memory (bytes) snapshot=684474368
        Virtual memory (bytes) snapshot=6394597376
        Total committed heap usage (bytes)=511705088
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Input Format Counters 
        Bytes Read=646
    File Output Format Counters 
        Bytes Written=323
19/03/25 17:56:50 INFO tool.ImportTool: Saving incremental import state to the metastore
19/03/25 17:56:51 INFO tool.ImportTool: Updated data for job: auto_job

Note this line in the log:

19/03/25 17:55:54 INFO tool.ImportTool: Upper bound value: TO_TIMESTAMP('2019-03-25 17:55:54.0', 'YYYY-MM-DD HH24:MI:SS.FF')

It shows that the upper bound is the current time. Now look at the job's last_value again:

hcatalog.drop.and.create.table = false
incremental.last.value = 2019-03-25 17:55:54.0
db.connect.string = jdbc:oracle:thin:@192.168.1.6:1521:orcl

It matches the timestamp from the log above. If the job definition ever needs to change, delete the job and recreate the corrected one, manually setting --last-value to the upper bound from the log (or record the last_value before deleting), so that the rebuilt job resumes the increments from the right point in time, as sketched below.
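A minimal sketch of such a rebuild, assuming the job body stays identical to the one created in step 4 and only the boundary timestamp changes; the grep is simply a convenient way to read the stored value before dropping the job:

# read the stored boundary before deleting the job
sqoop job --show auto_job | grep incremental.last.value
# drop the job and recreate it, passing the boundary from the log (or the value printed above) as --last-value
sqoop job --delete auto_job
sqoop job --create auto_job -- import --connect jdbc:oracle:thin:@192.168.1.6:1521:orcl --username scott --password-file /user/hive/warehouse/scott.pwd --table INR_JOB --fields-terminated-by '\t' --lines-terminated-by '\n' --target-dir /user/hive/warehouse/exter_inr_job -m 1 --check-column ETLTIME --incremental lastmodified --merge-key EMPNO --last-value "2019-03-25 17:55:54.0"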
Manual invocation works, so the only step left is the crontab schedule. Run crontab -e and add the following line (an incremental run every two minutes):

*/2 * * * *  /hadoop/auto_inr.sh
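As a quick optional check, you can confirm that cron registered the entry and watch the script's log as the scheduled runs append to it:

# list the current crontab
crontab -l
# follow the incremental runs in the log written by auto_inr.sh
tail -f /hadoop/auto_job_log.log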

One remaining question: if a table is very large and the first initialization loads only the most recent part of the data into the Hive table, will the merge-key incremental mode fail when historical rows that were never loaded are updated later? See the follow-up test article; the link will be added here once it is written:
https://blog.csdn.net/qq_28356739/article/details/88803284
