[Big Data Development & Operations Solutions] Sqoop: password-free automated incremental job scripts

Summary: The previous article covered incrementally synchronizing data into Hive with Sqoop, and it also linked to my pseudo-distributed installation and usage guide for hadoop+hive+hbase+sqoop+kylin as well as to the incremental-sync walkthrough. This article shows how to combine that incremental approach with Sqoop's built-in job mechanism, a shell script, and crontab to automate incremental synchronization.

1. Background knowledge

sqoop job --help
usage: sqoop job [GENERIC-ARGS] [JOB-ARGS] [-- [<tool-name>] [TOOL-ARGS]]
Job management arguments:
   --create <job-id> Create a new saved job
   --delete <job-id> Delete a saved job
   --exec <job-id> Run a saved job
   --help Print usage instructions
   --list List saved jobs
   --meta-connect <jdbc-uri> Specify JDBC connect string for the
                                metastore
   --show <job-id> Show the parameters for a saved job
   --verbose Print more information while working

Generic Hadoop command-line arguments:
(must preceed any tool-specific arguments)
Generic options supported are
-conf <configuration file> specify an application configuration file
-D <property=value> use value for given property
-fs <local|namenode:port> specify a namenode
-jt <local|resourcemanager:port> specify a ResourceManager
-files <comma separated list of files> specify comma separated files to be copied to the map reduce cluster
-libjars <comma separated list of jars> specify comma separated jar files to include in the classpath.
-archives <comma separated list of archives> specify comma separated archives to be unarchived on the compute machines.
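Putting these options together, the typical lifecycle of a saved job looks like the sketch below (inc_job is the job name used in the first example of this article; the import arguments are elided here):

sqoop job --create inc_job -- import ...   # save an import definition under the name inc_job
sqoop job --list                           # list all saved jobs
sqoop job --show inc_job                   # show the saved parameters, including incremental.last.value
sqoop job --exec inc_job                   # run it; a successful run updates the stored last-value
sqoop job --delete inc_job                 # remove the saved job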

2. Detailed experiment

Let's first walk through creating and running a timestamp-based incremental append job, and then look at the merge-key approach.
1. First, create an incremental append job:

[root@hadoop bin]# sqoop job --create inc_job -- import --connect jdbc:oracle:thin:@192.168.1.6:1521:orcl --username scott --password tiger --table INR_LAS --fields-terminated-by '\t' --lines-terminated-by '\n' --hive-import --hive-database oracle --hive-table INR_LAS --incremental append --check-column ETLTIME --last-value '2019-03-20 14:49:19' -m 1 --null-string '\\N' --null-non-string '\\N'
19/03/13 18:12:37 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
19/03/13 18:12:38 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
Exception in thread "main" java.lang.NoClassDefFoundError: org/json/JSONObject
    at org.apache.sqoop.util.SqoopJsonUtil.getJsonStringforMap(SqoopJsonUtil.java:43)
    at org.apache.sqoop.SqoopOptions.writeProperties(SqoopOptions.java:785)
    at org.apache.sqoop.metastore.hsqldb.HsqldbJobStorage.createInternal(HsqldbJobStorage.java:399)
    at org.apache.sqoop.metastore.hsqldb.HsqldbJobStorage.create(HsqldbJobStorage.java:379)
    at org.apache.sqoop.tool.JobTool.createJob(JobTool.java:181)
    at org.apache.sqoop.tool.JobTool.run(JobTool.java:294)
    at org.apache.sqoop.Sqoop.run(Sqoop.java:147)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:183)
    at org.apache.sqoop.Sqoop.runTool(Sqoop.java:234)
    at org.apache.sqoop.Sqoop.runTool(Sqoop.java:243)
    at org.apache.sqoop.Sqoop.main(Sqoop.java:252)
Caused by: java.lang.ClassNotFoundException: org.json.JSONObject
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:335)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)

The command failed: Sqoop is missing the java-json.jar package. Fine, download the missing jar and upload it to $SQOOP_HOME/lib. Link:

Download the jar package here
Put the downloaded jar under $SQOOP_HOME/lib and then recreate the job.
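A minimal sketch of putting the jar in place (the local filename and path are assumptions; adjust them to wherever the jar was downloaded):

cp /root/java-json.jar $SQOOP_HOME/lib/
ls $SQOOP_HOME/lib/ | grep java-json    # confirm the jar is now on Sqoop's classpath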
Before recreating, first delete the job that failed to create:

[root@hadoop ~]# sqoop job --delete inc_job
Warning: /hadoop/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
19/03/13 18:40:18 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/hadoop/hbase/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/hadoop/hive/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]

Now create the job:

[root@hadoop ~]# sqoop job --create inc_job -- import --connect jdbc:oracle:thin:@192.168.1.6:1521:orcl --username "scott" --password "tiger" --table INR_LAS --fields-terminated-by '\t' --lines-terminated-by '\n' --hive-import --hive-database oracle --hive-table INR_LAS --incremental append --check-column ETLTIME --last-value '2019-03-20 14:49:19' -m 1 --null-string '\\N' --null-non-string '\\N'
Warning: /hadoop/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
19/03/13 18:40:26 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/hadoop/hbase/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/hadoop/hive/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
19/03/13 18:40:26 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.

List the jobs to confirm the one we just created:

[root@hadoop ~]# sqoop job --list 
Warning: /hadoop/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
19/03/13 18:41:20 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/hadoop/hbase/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/hadoop/hive/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
Available jobs:
  inc_job

Show the last_value saved by the newly created job:

[root@hadoop ~]# sqoop job --show inc_job
Warning: /hadoop/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
19/03/13 18:45:00 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/hadoop/hbase/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/hadoop/hive/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
Enter password: ------- the password entered here is the password of the OS user running the job; when the job is executed, however, the prompt asks for the password of the database user used for the connection
Job: inc_job
Tool: import
Options:
----------------------------
verbose = false
hcatalog.drop.and.create.table = false
incremental.last.value = 2019-03-20 14:49:19
db.connect.string = jdbc:oracle:thin:@192.168.1.6:1521:orcl
codegen.output.delimiters.escape = 0
codegen.output.delimiters.enclose.required = false
codegen.input.delimiters.field = 0
mainframe.input.dataset.type = p
hbase.create.table = false
split.limit = null
null.string = \\N
db.require.password = true
skip.dist.cache = false
hdfs.append.dir = true
db.table = INR_LAS
codegen.input.delimiters.escape = 0
accumulo.create.table = false
import.fetch.size = null
codegen.input.delimiters.enclose.required = false
db.username = scott
reset.onemapper = false
codegen.output.delimiters.record = 10
import.max.inline.lob.size = 16777216
sqoop.throwOnError = false
hbase.bulk.load.enabled = false
hcatalog.create.table = false
db.clear.staging.table = false
incremental.col = ETLTIME
codegen.input.delimiters.record = 0
enable.compression = false
hive.overwrite.table = false
hive.import = true
codegen.input.delimiters.enclose = 0
hive.table.name = INR_LAS
accumulo.batch.size = 10240000
hive.database.name = oracle
hive.drop.delims = false
customtool.options.jsonmap = {}
null.non-string = \\N
codegen.output.delimiters.enclose = 0
hdfs.delete-target.dir = false
codegen.output.dir = .
codegen.auto.compile.dir = true
relaxed.isolation = false
mapreduce.num.mappers = 1
accumulo.max.latency = 5000
import.direct.split.size = 0
sqlconnection.metadata.transaction.isolation.level = 2
codegen.output.delimiters.field = 9
export.new.update = UpdateOnly
incremental.mode = AppendRows
hdfs.file.format = TextFile
sqoop.oracle.escaping.disabled = true
codegen.compile.dir = /tmp/sqoop-root/compile/1173d716481c4bd8f6cb589b87a382ea
direct.import = false
temporary.dirRoot = _sqoop
hive.fail.table.exists = false
db.batch = false

Next, run the job manually:

[root@hadoop ~]# sqoop job --exec inc_job
Warning: /hadoop/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
19/03/13 18:47:46 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/hadoop/hbase/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/hadoop/hive/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
Enter password: --------- here you enter the password of the database user for the connection. How can we avoid entering it every time? Keep reading.
19/03/13 18:47:50 INFO oracle.OraOopManagerFactory: Data Connector for Oracle and Hadoop is disabled.
19/03/13 18:47:50 INFO manager.SqlManager: Using default fetchSize of 1000
19/03/13 18:47:50 INFO tool.CodeGenTool: Beginning code generation
19/03/13 18:47:51 INFO manager.OracleManager: Time zone has been set to GMT
19/03/13 18:47:51 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM INR_LAS t WHERE 1=0
19/03/13 18:47:51 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /hadoop
Note: /tmp/sqoop-root/compile/f383a7cc7d1bc4f9665748405ec5dec2/INR_LAS.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
19/03/13 18:47:55 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-root/compile/f383a7cc7d1bc4f9665748405ec5dec2/INR_LAS.jar
19/03/13 18:47:55 INFO manager.OracleManager: Time zone has been set to GMT
19/03/13 18:47:55 INFO tool.ImportTool: Maximal id query for free form incremental import: SELECT MAX(ETLTIME) FROM INR_LAS
19/03/13 18:47:55 INFO tool.ImportTool: Incremental import based on column ETLTIME
19/03/13 18:47:55 INFO tool.ImportTool: Lower bound value: TO_TIMESTAMP('2019-03-20 14:49:19', 'YYYY-MM-DD HH24:MI:SS.FF')
19/03/13 18:47:55 INFO tool.ImportTool: Upper bound value: TO_TIMESTAMP('2019-03-20 15:36:07.0', 'YYYY-MM-DD HH24:MI:SS.FF')
19/03/13 18:47:55 INFO manager.OracleManager: Time zone has been set to GMT
19/03/13 18:47:55 INFO mapreduce.ImportJobBase: Beginning import of INR_LAS
19/03/13 18:47:55 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
19/03/13 18:47:55 INFO manager.OracleManager: Time zone has been set to GMT
19/03/13 18:47:56 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
19/03/13 18:47:57 INFO client.RMProxy: Connecting to ResourceManager at /192.168.1.66:8032
19/03/13 18:48:00 INFO db.DBInputFormat: Using read commited transaction isolation
19/03/13 18:48:00 INFO mapreduce.JobSubmitter: number of splits:1
19/03/13 18:48:01 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1552469242276_0016
19/03/13 18:48:01 INFO impl.YarnClientImpl: Submitted application application_1552469242276_0016
19/03/13 18:48:01 INFO mapreduce.Job: The url to track the job: http://hadoop:8088/proxy/application_1552469242276_0016/
19/03/13 18:48:01 INFO mapreduce.Job: Running job: job_1552469242276_0016
19/03/13 18:48:11 INFO mapreduce.Job: Job job_1552469242276_0016 running in uber mode : false
19/03/13 18:48:11 INFO mapreduce.Job:  map 0% reduce 0%
19/03/13 18:48:18 INFO mapreduce.Job:  map 100% reduce 0%
19/03/13 18:48:18 INFO mapreduce.Job: Job job_1552469242276_0016 completed successfully
19/03/13 18:48:19 INFO mapreduce.Job: Counters: 30
    File System Counters
        FILE: Number of bytes read=0
        FILE: Number of bytes written=144628
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=87
        HDFS: Number of bytes written=39
        HDFS: Number of read operations=4
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=2
    Job Counters 
        Launched map tasks=1
        Other local map tasks=1
        Total time spent by all maps in occupied slots (ms)=4454
        Total time spent by all reduces in occupied slots (ms)=0
        Total time spent by all map tasks (ms)=4454
        Total vcore-milliseconds taken by all map tasks=4454
        Total megabyte-milliseconds taken by all map tasks=4560896
    Map-Reduce Framework
        Map input records=1
        Map output records=1
        Input split bytes=87
        Spilled Records=0
        Failed Shuffles=0
        Merged Map outputs=0
        GC time elapsed (ms)=229
        CPU time spent (ms)=2430
        Physical memory (bytes) snapshot=191975424
        Virtual memory (bytes) snapshot=2143756288
        Total committed heap usage (bytes)=116916224
    File Input Format Counters 
        Bytes Read=0
    File Output Format Counters 
        Bytes Written=39
19/03/13 18:48:19 INFO mapreduce.ImportJobBase: Transferred 39 bytes in 22.3135 seconds (1.7478 bytes/sec)
19/03/13 18:48:19 INFO mapreduce.ImportJobBase: Retrieved 1 records.
19/03/13 18:48:19 INFO mapreduce.ImportJobBase: Publishing Hive/Hcat import job data to Listeners for table INR_LAS
19/03/13 18:48:19 INFO util.AppendUtils: Creating missing output directory - INR_LAS
19/03/13 18:48:19 INFO manager.OracleManager: Time zone has been set to GMT
19/03/13 18:48:19 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM INR_LAS t WHERE 1=0
19/03/13 18:48:19 WARN hive.TableDefWriter: Column EMPNO had to be cast to a less precise type in Hive
19/03/13 18:48:19 WARN hive.TableDefWriter: Column SAL had to be cast to a less precise type in Hive
19/03/13 18:48:19 WARN hive.TableDefWriter: Column ETLTIME had to be cast to a less precise type in Hive
19/03/13 18:48:19 INFO hive.HiveImport: Loading uploaded data into Hive
19/03/13 18:48:19 INFO conf.HiveConf: Found configuration file file:/hadoop/hive/conf/hive-site.xml

Logging initialized using configuration in jar:file:/hadoop/hive/lib/hive-common-2.3.2.jar!/hive-log4j2.properties Async: true
19/03/13 18:48:22 INFO SessionState: 
Logging initialized using configuration in jar:file:/hadoop/hive/lib/hive-common-2.3.2.jar!/hive-log4j2.properties Async: true
19/03/13 18:48:22 INFO session.SessionState: Created HDFS directory: /tmp/hive/root/e09a2f96-2edd-4747-a65f-4899c2863aa0
19/03/13 18:48:22 INFO session.SessionState: Created local directory: /hadoop/hive/tmp/root/e09a2f96-2edd-4747-a65f-4899c2863aa0
19/03/13 18:48:22 INFO session.SessionState: Created HDFS directory: /tmp/hive/root/e09a2f96-2edd-4747-a65f-4899c2863aa0/_tmp_space.db
19/03/13 18:48:22 INFO conf.HiveConf: Using the default value passed in for log id: e09a2f96-2edd-4747-a65f-4899c2863aa0
19/03/13 18:48:22 INFO session.SessionState: Updating thread name to e09a2f96-2edd-4747-a65f-4899c2863aa0 main
19/03/13 18:48:22 INFO conf.HiveConf: Using the default value passed in for log id: e09a2f96-2edd-4747-a65f-4899c2863aa0
19/03/13 18:48:22 INFO ql.Driver: Compiling command(queryId=root_20190313104822_91cdb575-b0c9-4533-916c-247304d39b46): CREATE TABLE IF NOT EXISTS `oracle`.`INR_LAS` ( `EMPNO` DOUBLE, `ENAME` STRING, `JOB` STRING, `SAL` DOUBLE, `ETLTIME` STRING) COMMENT 'Imported by sqoop on 2019/03/13 10:48:19' ROW FORMAT DELIMITED FIELDS TERMINATED BY '\011' LINES TERMINATED BY '\012' STORED AS TEXTFILE
19/03/13 18:48:25 INFO hive.metastore: Trying to connect to metastore with URI thrift://192.168.1.66:9083
19/03/13 18:48:25 INFO hive.metastore: Opened a connection to metastore, current connections: 1
19/03/13 18:48:25 INFO hive.metastore: Connected to metastore.
19/03/13 18:48:25 INFO parse.CalcitePlanner: Starting Semantic Analysis
19/03/13 18:48:25 INFO parse.CalcitePlanner: Creating table oracle.INR_LAS position=27
19/03/13 18:48:25 INFO ql.Driver: Semantic Analysis Completed
19/03/13 18:48:25 INFO ql.Driver: Returning Hive schema: Schema(fieldSchemas:null, properties:null)
19/03/13 18:48:25 INFO ql.Driver: Completed compiling command(queryId=root_20190313104822_91cdb575-b0c9-4533-916c-247304d39b46); Time taken: 3.251 seconds
19/03/13 18:48:25 INFO ql.Driver: Concurrency mode is disabled, not creating a lock manager
19/03/13 18:48:25 INFO ql.Driver: Executing command(queryId=root_20190313104822_91cdb575-b0c9-4533-916c-247304d39b46): CREATE TABLE IF NOT EXISTS `oracle`.`INR_LAS` ( `EMPNO` DOUBLE, `ENAME` STRING, `JOB` STRING, `SAL` DOUBLE, `ETLTIME` STRING) COMMENT 'Imported by sqoop on 2019/03/13 10:48:19' ROW FORMAT DELIMITED FIELDS TERMINATED BY '\011' LINES TERMINATED BY '\012' STORED AS TEXTFILE
19/03/13 18:48:25 INFO sqlstd.SQLStdHiveAccessController: Created SQLStdHiveAccessController for session context : HiveAuthzSessionContext [sessionString=e09a2f96-2edd-4747-a65f-4899c2863aa0, clientType=HIVECLI]
19/03/13 18:48:25 WARN session.SessionState: METASTORE_FILTER_HOOK will be ignored, since hive.security.authorization.manager is set to instance of HiveAuthorizerFactory.
19/03/13 18:48:25 INFO hive.metastore: Mestastore configuration hive.metastore.filter.hook changed from org.apache.hadoop.hive.metastore.DefaultMetaStoreFilterHookImpl to org.apache.hadoop.hive.ql.security.authorization.plugin.AuthorizationMetaStoreFilterHook
19/03/13 18:48:26 INFO hive.metastore: Closed a connection to metastore, current connections: 0
19/03/13 18:48:26 INFO hive.metastore: Trying to connect to metastore with URI thrift://192.168.1.66:9083
19/03/13 18:48:26 INFO hive.metastore: Opened a connection to metastore, current connections: 1
19/03/13 18:48:26 INFO hive.metastore: Connected to metastore.
19/03/13 18:48:26 INFO ql.Driver: Completed executing command(queryId=root_20190313104822_91cdb575-b0c9-4533-916c-247304d39b46); Time taken: 0.113 seconds
OK
19/03/13 18:48:26 INFO ql.Driver: OK
Time taken: 3.379 seconds
19/03/13 18:48:26 INFO CliDriver: Time taken: 3.379 seconds
19/03/13 18:48:26 INFO conf.HiveConf: Using the default value passed in for log id: e09a2f96-2edd-4747-a65f-4899c2863aa0
19/03/13 18:48:26 INFO session.SessionState: Resetting thread name to  main
19/03/13 18:48:26 INFO conf.HiveConf: Using the default value passed in for log id: e09a2f96-2edd-4747-a65f-4899c2863aa0
19/03/13 18:48:26 INFO session.SessionState: Updating thread name to e09a2f96-2edd-4747-a65f-4899c2863aa0 main
19/03/13 18:48:26 INFO ql.Driver: Compiling command(queryId=root_20190313104826_5da0b171-d4e8-41c5-83ef-bdcffec0fea2): 
LOAD DATA INPATH 'hdfs://192.168.1.66:9000/user/root/INR_LAS' INTO TABLE `oracle`.`INR_LAS`
19/03/13 18:48:26 INFO ql.Driver: Semantic Analysis Completed
19/03/13 18:48:26 INFO ql.Driver: Returning Hive schema: Schema(fieldSchemas:null, properties:null)
19/03/13 18:48:26 INFO ql.Driver: Completed compiling command(queryId=root_20190313104826_5da0b171-d4e8-41c5-83ef-bdcffec0fea2); Time taken: 0.426 seconds
19/03/13 18:48:26 INFO ql.Driver: Concurrency mode is disabled, not creating a lock manager
19/03/13 18:48:26 INFO ql.Driver: Executing command(queryId=root_20190313104826_5da0b171-d4e8-41c5-83ef-bdcffec0fea2): 
LOAD DATA INPATH 'hdfs://192.168.1.66:9000/user/root/INR_LAS' INTO TABLE `oracle`.`INR_LAS`
19/03/13 18:48:26 INFO ql.Driver: Starting task [Stage-0:MOVE] in serial mode
19/03/13 18:48:26 INFO hive.metastore: Closed a connection to metastore, current connections: 0
Loading data to table oracle.inr_las
19/03/13 18:48:26 INFO exec.Task: Loading data to table oracle.inr_las from hdfs://192.168.1.66:9000/user/root/INR_LAS
19/03/13 18:48:26 INFO hive.metastore: Trying to connect to metastore with URI thrift://192.168.1.66:9083
19/03/13 18:48:26 INFO hive.metastore: Opened a connection to metastore, current connections: 1
19/03/13 18:48:26 INFO hive.metastore: Connected to metastore.
19/03/13 18:48:26 ERROR hdfs.KeyProviderCache: Could not find uri with key [dfs.encryption.key.provider.uri] to create a keyProvider !!
19/03/13 18:48:27 INFO ql.Driver: Starting task [Stage-1:STATS] in serial mode
19/03/13 18:48:27 INFO exec.StatsTask: Executing stats task
19/03/13 18:48:27 INFO hive.metastore: Closed a connection to metastore, current connections: 0
19/03/13 18:48:27 INFO hive.metastore: Trying to connect to metastore with URI thrift://192.168.1.66:9083
19/03/13 18:48:27 INFO hive.metastore: Opened a connection to metastore, current connections: 1
19/03/13 18:48:27 INFO hive.metastore: Connected to metastore.
19/03/13 18:48:27 INFO hive.metastore: Closed a connection to metastore, current connections: 0
19/03/13 18:48:27 INFO hive.metastore: Trying to connect to metastore with URI thrift://192.168.1.66:9083
19/03/13 18:48:27 INFO hive.metastore: Opened a connection to metastore, current connections: 1
19/03/13 18:48:27 INFO hive.metastore: Connected to metastore.
19/03/13 18:48:27 INFO exec.StatsTask: Table oracle.inr_las stats: [numFiles=6, numRows=0, totalSize=518, rawDataSize=0]
19/03/13 18:48:27 INFO ql.Driver: Completed executing command(queryId=root_20190313104826_5da0b171-d4e8-41c5-83ef-bdcffec0fea2); Time taken: 1.225 seconds
OK
19/03/13 18:48:27 INFO ql.Driver: OK
Time taken: 1.653 seconds
19/03/13 18:48:27 INFO CliDriver: Time taken: 1.653 seconds
19/03/13 18:48:27 INFO conf.HiveConf: Using the default value passed in for log id: e09a2f96-2edd-4747-a65f-4899c2863aa0
19/03/13 18:48:27 INFO session.SessionState: Resetting thread name to  main
19/03/13 18:48:27 INFO conf.HiveConf: Using the default value passed in for log id: e09a2f96-2edd-4747-a65f-4899c2863aa0
19/03/13 18:48:27 INFO session.SessionState: Deleted directory: /tmp/hive/root/e09a2f96-2edd-4747-a65f-4899c2863aa0 on fs with scheme hdfs
19/03/13 18:48:27 INFO session.SessionState: Deleted directory: /hadoop/hive/tmp/root/e09a2f96-2edd-4747-a65f-4899c2863aa0 on fs with scheme file
19/03/13 18:48:27 INFO hive.metastore: Closed a connection to metastore, current connections: 0
19/03/13 18:48:27 INFO hive.HiveImport: Hive import complete.
19/03/13 18:48:27 INFO hive.HiveImport: Export directory is empty, removing it.
19/03/13 18:48:27 INFO tool.ImportTool: Saving incremental import state to the metastore
19/03/13 18:48:27 INFO tool.ImportTool: Updated data for job: inc_job

The experiment above shows that every time the job runs we have to type the database user's password. To make execution password-free, use the following approach:
When creating the job, use the --password-file option rather than --password. The main reason is that with --password, executing the job prints a warning and still requires a password to be entered, whereas with --password-file the job runs without prompting for the database password. So let's modify the job creation statement above.
First drop the original job:

[root@hadoop conf]# sqoop job --delete inc_job

Create the password file.
Note: Sqoop requires the password file to be stored on HDFS, and its permissions must be 400.

[root@hadoop sqoop]# mkdir pwd
[root@hadoop sqoop]# cd pwd
[root@hadoop pwd]# pwd
/hadoop/sqoop/pwd
[root@hadoop pwd]# echo -n "tiger" > scott.pwd
[root@hadoop pwd]# hdfs dfs -put scott.pwd /user/hive/warehouse
[root@hadoop pwd]# hdfs dfs -chmod 400 /user/hive/warehouse/scott.pwd
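As a quick sanity check (a sketch; the path matches the file just created), you can confirm the file landed on HDFS with the expected content and mode 400:

[root@hadoop pwd]# hdfs dfs -ls /user/hive/warehouse/scott.pwd
[root@hadoop pwd]# hdfs dfs -cat /user/hive/warehouse/scott.pwd    # should print "tiger" with no trailing newline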

Recreate the job, this time specifying --password-file instead of --password:

[root@hadoop conf]# sqoop job --create inc_job -- import --connect jdbc:oracle:thin:@192.168.1.6:1521:orcl --username "scott" --password-file /user/hive/warehouse/scott.pwd --table INR_LAS --fields-terminated-by '\t' --lines-terminated-by '\n' --hive-import --hive-database oracle --hive-table INR_LAS --incremental append --check-column ETLTIME --last-value '2019-03-20 14:49:19' -m 1 --null-string '\\N' --null-non-string '\\N'

To verify, first look at the current data in the Oracle table:

select * from inr_las;
EMPNO        ENAME        JOB        SAL            ETLTIME
1            er        CLERK        800.00    2019/3/20 10:42:27
2            ALLEN    SALESMAN    1600.00    2019/3/20 10:42:27
3            WARD    SALESMAN    1250.00    2019/3/20 10:42:27
4            JONES    MANAGER        2975.00    2019/3/20 10:42:27
5            MARTIN    SALESMAN    1250.00    2019/3/20 10:42:27
6            zhao    DBA            1000.00    2019/3/20 10:52:34
7            yan        BI            100.00    2019/3/20 10:42:27
8            dong    JAVA        5232.00    2019/3/20 15:36:07

Now look at the current data in the Hive table:

hive> select * from inr_las;
OK
1    er    CLERK    800.0    2019-03-20 10:42:27.0
2    ALLEN    SALESMAN    1600.0    2019-03-20 10:42:27.0
3    WARD    SALESMAN    1250.0    2019-03-20 10:42:27.0
4    JONES    MANAGER    2975.0    2019-03-20 10:42:27.0
5    MARTIN    SALESMAN    1250.0    2019-03-20 10:42:27.0
6    zhao    DBA    1000.0    2019-03-20 10:52:34.0
7    yan    BI    100.0    2019-03-20 10:42:27.0
8    dong    JAVA    332.0    2019-03-20 14:49:19.0
8    dong    JAVA    3232.0    2019-03-20 15:13:35.0
8    dong    JAVA    4232.0    2019-03-20 15:29:03.0
8    dong    JAVA    5232.0    2019-03-20 15:36:07.0
8    dong    JAVA    5232.0    2019-03-20 15:36:07.0
8    dong    JAVA    3232.0    2019-03-20 15:13:35.0
Time taken: 0.176 seconds, Fetched: 13 row(s)

Our job's --last-value was set to '2019-03-20 14:49:19', and on the source side one row (empno=8) satisfies the incremental condition. Now execute the newly created job again:

[root@hadoop pwd]# sqoop job --exec inc_job
Warning: /hadoop/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
19/03/13 19:14:30 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/hadoop/hbase/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/hadoop/hive/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
19/03/13 19:14:32 INFO oracle.OraOopManagerFactory: Data Connector for Oracle and Hadoop is disabled.
19/03/13 19:14:32 INFO manager.SqlManager: Using default fetchSize of 1000
19/03/13 19:14:32 INFO tool.CodeGenTool: Beginning code generation
19/03/13 19:14:33 INFO manager.OracleManager: Time zone has been set to GMT
19/03/13 19:14:33 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM INR_LAS t WHERE 1=0
19/03/13 19:14:33 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /hadoop
Note: /tmp/sqoop-root/compile/8df9a3027ead0f69733bef4c331c8f15/INR_LAS.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
19/03/13 19:14:38 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-root/compile/8df9a3027ead0f69733bef4c331c8f15/INR_LAS.jar
19/03/13 19:14:38 INFO manager.OracleManager: Time zone has been set to GMT
19/03/13 19:14:38 INFO tool.ImportTool: Maximal id query for free form incremental import: SELECT MAX(ETLTIME) FROM INR_LAS
19/03/13 19:14:38 INFO tool.ImportTool: Incremental import based on column ETLTIME
19/03/13 19:14:38 INFO tool.ImportTool: Lower bound value: TO_TIMESTAMP('2019-03-20 14:49:19', 'YYYY-MM-DD HH24:MI:SS.FF')
19/03/13 19:14:38 INFO tool.ImportTool: Upper bound value: TO_TIMESTAMP('2019-03-20 15:36:07.0', 'YYYY-MM-DD HH24:MI:SS.FF')
19/03/13 19:14:38 INFO manager.OracleManager: Time zone has been set to GMT
19/03/13 19:14:38 INFO mapreduce.ImportJobBase: Beginning import of INR_LAS
19/03/13 19:14:38 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
19/03/13 19:14:38 INFO manager.OracleManager: Time zone has been set to GMT
19/03/13 19:14:38 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
19/03/13 19:14:38 INFO client.RMProxy: Connecting to ResourceManager at /192.168.1.66:8032
19/03/13 19:14:42 INFO db.DBInputFormat: Using read commited transaction isolation
19/03/13 19:14:42 INFO mapreduce.JobSubmitter: number of splits:1
19/03/13 19:14:42 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1552469242276_0017
19/03/13 19:14:42 INFO impl.YarnClientImpl: Submitted application application_1552469242276_0017
19/03/13 19:14:43 INFO mapreduce.Job: The url to track the job: http://hadoop:8088/proxy/application_1552469242276_0017/
19/03/13 19:14:43 INFO mapreduce.Job: Running job: job_1552469242276_0017
19/03/13 19:14:53 INFO mapreduce.Job: Job job_1552469242276_0017 running in uber mode : false
19/03/13 19:14:53 INFO mapreduce.Job:  map 0% reduce 0%
19/03/13 19:15:00 INFO mapreduce.Job:  map 100% reduce 0%
19/03/13 19:15:00 INFO mapreduce.Job: Job job_1552469242276_0017 completed successfully
19/03/13 19:15:00 INFO mapreduce.Job: Counters: 30
    File System Counters
        FILE: Number of bytes read=0
        FILE: Number of bytes written=144775
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=87
        HDFS: Number of bytes written=39
        HDFS: Number of read operations=4
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=2
    Job Counters 
        Launched map tasks=1
        Other local map tasks=1
        Total time spent by all maps in occupied slots (ms)=5332
        Total time spent by all reduces in occupied slots (ms)=0
        Total time spent by all map tasks (ms)=5332
        Total vcore-milliseconds taken by all map tasks=5332
        Total megabyte-milliseconds taken by all map tasks=5459968
    Map-Reduce Framework
        Map input records=1
        Map output records=1
        Input split bytes=87
        Spilled Records=0
        Failed Shuffles=0
        Merged Map outputs=0
        GC time elapsed (ms)=651
        CPU time spent (ms)=2670
        Physical memory (bytes) snapshot=188571648
        Virtual memory (bytes) snapshot=2148745216
        Total committed heap usage (bytes)=119537664
    File Input Format Counters 
        Bytes Read=0
    File Output Format Counters 
        Bytes Written=39
19/03/13 19:15:00 INFO mapreduce.ImportJobBase: Transferred 39 bytes in 22.3081 seconds (1.7482 bytes/sec)
19/03/13 19:15:00 INFO mapreduce.ImportJobBase: Retrieved 1 records.
19/03/13 19:15:00 INFO mapreduce.ImportJobBase: Publishing Hive/Hcat import job data to Listeners for table INR_LAS
19/03/13 19:15:00 INFO util.AppendUtils: Creating missing output directory - INR_LAS
19/03/13 19:15:01 INFO manager.OracleManager: Time zone has been set to GMT
19/03/13 19:15:01 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM INR_LAS t WHERE 1=0
19/03/13 19:15:01 WARN hive.TableDefWriter: Column EMPNO had to be cast to a less precise type in Hive
19/03/13 19:15:01 WARN hive.TableDefWriter: Column SAL had to be cast to a less precise type in Hive
19/03/13 19:15:01 WARN hive.TableDefWriter: Column ETLTIME had to be cast to a less precise type in Hive
19/03/13 19:15:01 INFO hive.HiveImport: Loading uploaded data into Hive
19/03/13 19:15:01 INFO conf.HiveConf: Found configuration file file:/hadoop/hive/conf/hive-site.xml

Logging initialized using configuration in jar:file:/hadoop/hive/lib/hive-common-2.3.2.jar!/hive-log4j2.properties Async: true
19/03/13 19:15:04 INFO SessionState: 
Logging initialized using configuration in jar:file:/hadoop/hive/lib/hive-common-2.3.2.jar!/hive-log4j2.properties Async: true
19/03/13 19:15:04 INFO session.SessionState: Created HDFS directory: /tmp/hive/root/7feac288-289d-4d74-8641-553c5ab65618
19/03/13 19:15:04 INFO session.SessionState: Created local directory: /hadoop/hive/tmp/root/7feac288-289d-4d74-8641-553c5ab65618
19/03/13 19:15:04 INFO session.SessionState: Created HDFS directory: /tmp/hive/root/7feac288-289d-4d74-8641-553c5ab65618/_tmp_space.db
19/03/13 19:15:04 INFO conf.HiveConf: Using the default value passed in for log id: 7feac288-289d-4d74-8641-553c5ab65618
19/03/13 19:15:04 INFO session.SessionState: Updating thread name to 7feac288-289d-4d74-8641-553c5ab65618 main
19/03/13 19:15:04 INFO conf.HiveConf: Using the default value passed in for log id: 7feac288-289d-4d74-8641-553c5ab65618
19/03/13 19:15:04 INFO ql.Driver: Compiling command(queryId=root_20190313111504_d1db4a38-1b86-4c89-84c3-3d3be9404b0f): CREATE TABLE IF NOT EXISTS `oracle`.`INR_LAS` ( `EMPNO` DOUBLE, `ENAME` STRING, `JOB` STRING, `SAL` DOUBLE, `ETLTIME` STRING) COMMENT 'Imported by sqoop on 2019/03/13 11:15:01' ROW FORMAT DELIMITED FIELDS TERMINATED BY '\011' LINES TERMINATED BY '\012' STORED AS TEXTFILE
19/03/13 19:15:09 INFO hive.metastore: Trying to connect to metastore with URI thrift://192.168.1.66:9083
19/03/13 19:15:09 INFO hive.metastore: Opened a connection to metastore, current connections: 1
19/03/13 19:15:09 INFO hive.metastore: Connected to metastore.
19/03/13 19:15:09 INFO parse.CalcitePlanner: Starting Semantic Analysis
19/03/13 19:15:09 INFO parse.CalcitePlanner: Creating table oracle.INR_LAS position=27
19/03/13 19:15:09 INFO ql.Driver: Semantic Analysis Completed
19/03/13 19:15:09 INFO ql.Driver: Returning Hive schema: Schema(fieldSchemas:null, properties:null)
19/03/13 19:15:09 INFO ql.Driver: Completed compiling command(queryId=root_20190313111504_d1db4a38-1b86-4c89-84c3-3d3be9404b0f); Time taken: 5.309 seconds
19/03/13 19:15:09 INFO ql.Driver: Concurrency mode is disabled, not creating a lock manager
19/03/13 19:15:09 INFO ql.Driver: Executing command(queryId=root_20190313111504_d1db4a38-1b86-4c89-84c3-3d3be9404b0f): CREATE TABLE IF NOT EXISTS `oracle`.`INR_LAS` ( `EMPNO` DOUBLE, `ENAME` STRING, `JOB` STRING, `SAL` DOUBLE, `ETLTIME` STRING) COMMENT 'Imported by sqoop on 2019/03/13 11:15:01' ROW FORMAT DELIMITED FIELDS TERMINATED BY '\011' LINES TERMINATED BY '\012' STORED AS TEXTFILE
19/03/13 19:15:09 INFO sqlstd.SQLStdHiveAccessController: Created SQLStdHiveAccessController for session context : HiveAuthzSessionContext [sessionString=7feac288-289d-4d74-8641-553c5ab65618, clientType=HIVECLI]
19/03/13 19:15:09 WARN session.SessionState: METASTORE_FILTER_HOOK will be ignored, since hive.security.authorization.manager is set to instance of HiveAuthorizerFactory.
19/03/13 19:15:09 INFO hive.metastore: Mestastore configuration hive.metastore.filter.hook changed from org.apache.hadoop.hive.metastore.DefaultMetaStoreFilterHookImpl to org.apache.hadoop.hive.ql.security.authorization.plugin.AuthorizationMetaStoreFilterHook
19/03/13 19:15:10 INFO hive.metastore: Closed a connection to metastore, current connections: 0
19/03/13 19:15:10 INFO hive.metastore: Trying to connect to metastore with URI thrift://192.168.1.66:9083
19/03/13 19:15:10 INFO hive.metastore: Opened a connection to metastore, current connections: 1
19/03/13 19:15:10 INFO hive.metastore: Connected to metastore.
19/03/13 19:15:10 INFO ql.Driver: Completed executing command(queryId=root_20190313111504_d1db4a38-1b86-4c89-84c3-3d3be9404b0f); Time taken: 0.106 seconds
OK
19/03/13 19:15:10 INFO ql.Driver: OK
Time taken: 5.429 seconds
19/03/13 19:15:10 INFO CliDriver: Time taken: 5.429 seconds
19/03/13 19:15:10 INFO conf.HiveConf: Using the default value passed in for log id: 7feac288-289d-4d74-8641-553c5ab65618
19/03/13 19:15:10 INFO session.SessionState: Resetting thread name to  main
19/03/13 19:15:10 INFO conf.HiveConf: Using the default value passed in for log id: 7feac288-289d-4d74-8641-553c5ab65618
19/03/13 19:15:10 INFO session.SessionState: Updating thread name to 7feac288-289d-4d74-8641-553c5ab65618 main
19/03/13 19:15:10 INFO ql.Driver: Compiling command(queryId=root_20190313111510_cd9c21cf-b479-475c-a959-be8ff0ac5f01): 
LOAD DATA INPATH 'hdfs://192.168.1.66:9000/user/root/INR_LAS' INTO TABLE `oracle`.`INR_LAS`
19/03/13 19:15:10 INFO ql.Driver: Semantic Analysis Completed
19/03/13 19:15:10 INFO ql.Driver: Returning Hive schema: Schema(fieldSchemas:null, properties:null)
19/03/13 19:15:10 INFO ql.Driver: Completed compiling command(queryId=root_20190313111510_cd9c21cf-b479-475c-a959-be8ff0ac5f01); Time taken: 0.415 seconds
19/03/13 19:15:10 INFO ql.Driver: Concurrency mode is disabled, not creating a lock manager
19/03/13 19:15:10 INFO ql.Driver: Executing command(queryId=root_20190313111510_cd9c21cf-b479-475c-a959-be8ff0ac5f01): 
LOAD DATA INPATH 'hdfs://192.168.1.66:9000/user/root/INR_LAS' INTO TABLE `oracle`.`INR_LAS`
19/03/13 19:15:10 INFO ql.Driver: Starting task [Stage-0:MOVE] in serial mode
19/03/13 19:15:10 INFO hive.metastore: Closed a connection to metastore, current connections: 0
Loading data to table oracle.inr_las
19/03/13 19:15:10 INFO exec.Task: Loading data to table oracle.inr_las from hdfs://192.168.1.66:9000/user/root/INR_LAS
19/03/13 19:15:10 INFO hive.metastore: Trying to connect to metastore with URI thrift://192.168.1.66:9083
19/03/13 19:15:10 INFO hive.metastore: Opened a connection to metastore, current connections: 1
19/03/13 19:15:10 INFO hive.metastore: Connected to metastore.
19/03/13 19:15:10 ERROR hdfs.KeyProviderCache: Could not find uri with key [dfs.encryption.key.provider.uri] to create a keyProvider !!
19/03/13 19:15:11 INFO ql.Driver: Starting task [Stage-1:STATS] in serial mode
19/03/13 19:15:11 INFO exec.StatsTask: Executing stats task
19/03/13 19:15:11 INFO hive.metastore: Closed a connection to metastore, current connections: 0
19/03/13 19:15:11 INFO hive.metastore: Trying to connect to metastore with URI thrift://192.168.1.66:9083
19/03/13 19:15:11 INFO hive.metastore: Opened a connection to metastore, current connections: 1
19/03/13 19:15:11 INFO hive.metastore: Connected to metastore.
19/03/13 19:15:11 INFO hive.metastore: Closed a connection to metastore, current connections: 0
19/03/13 19:15:11 INFO hive.metastore: Trying to connect to metastore with URI thrift://192.168.1.66:9083
19/03/13 19:15:11 INFO hive.metastore: Opened a connection to metastore, current connections: 1
19/03/13 19:15:11 INFO hive.metastore: Connected to metastore.
19/03/13 19:15:11 INFO exec.StatsTask: Table oracle.inr_las stats: [numFiles=7, numRows=0, totalSize=557, rawDataSize=0]
19/03/13 19:15:11 INFO ql.Driver: Completed executing command(queryId=root_20190313111510_cd9c21cf-b479-475c-a959-be8ff0ac5f01); Time taken: 1.296 seconds
OK
19/03/13 19:15:11 INFO ql.Driver: OK
Time taken: 1.713 seconds
19/03/13 19:15:11 INFO CliDriver: Time taken: 1.713 seconds
19/03/13 19:15:11 INFO conf.HiveConf: Using the default value passed in for log id: 7feac288-289d-4d74-8641-553c5ab65618
19/03/13 19:15:11 INFO session.SessionState: Resetting thread name to  main
19/03/13 19:15:11 INFO conf.HiveConf: Using the default value passed in for log id: 7feac288-289d-4d74-8641-553c5ab65618
19/03/13 19:15:11 INFO session.SessionState: Deleted directory: /tmp/hive/root/7feac288-289d-4d74-8641-553c5ab65618 on fs with scheme hdfs
19/03/13 19:15:11 INFO session.SessionState: Deleted directory: /hadoop/hive/tmp/root/7feac288-289d-4d74-8641-553c5ab65618 on fs with scheme file
19/03/13 19:15:11 INFO hive.metastore: Closed a connection to metastore, current connections: 0
19/03/13 19:15:11 INFO hive.HiveImport: Hive import complete.
19/03/13 19:15:11 INFO hive.HiveImport: Export directory is empty, removing it.
19/03/13 19:15:11 INFO tool.ImportTool: Saving incremental import state to the metastore
19/03/13 19:15:11 INFO tool.ImportTool: Updated data for job: inc_job

No password prompt appears anymore. Look at the Hive table data again:

hive> select * from inr_las;
OK
1    er    CLERK    800.0    2019-03-20 10:42:27.0
2    ALLEN    SALESMAN    1600.0    2019-03-20 10:42:27.0
3    WARD    SALESMAN    1250.0    2019-03-20 10:42:27.0
4    JONES    MANAGER    2975.0    2019-03-20 10:42:27.0
5    MARTIN    SALESMAN    1250.0    2019-03-20 10:42:27.0
6    zhao    DBA    1000.0    2019-03-20 10:52:34.0
7    yan    BI    100.0    2019-03-20 10:42:27.0
8    dong    JAVA    332.0    2019-03-20 14:49:19.0
8    dong    JAVA    3232.0    2019-03-20 15:13:35.0
8    dong    JAVA    4232.0    2019-03-20 15:29:03.0
8    dong    JAVA    5232.0    2019-03-20 15:36:07.0
8    dong    JAVA    5232.0    2019-03-20 15:36:07.0
8    dong    JAVA    5232.0    2019-03-20 15:36:07.0
8    dong    JAVA    3232.0    2019-03-20 15:13:35.0
Time taken: 0.161 seconds, Fetched: 14 row(s)

There are now 14 rows, one more empno=8 row than before, so the incremental append worked.
However, my real requirement is different: when rows in the source Oracle table are updated, their timestamp column changes too, so I need to find the changed rows by timestamp and apply them to Hive as incremental updates. That calls for the lastmodified mode with --merge-key, which merges increments by timestamp plus primary key; creating such a job is similar to the example above.
In the example below, we wrap the job in a shell script and let crontab run the incremental import on a schedule.
Note: because I'm only testing incremental imports to feed an incremental Kylin cube and the test data volume is tiny, the Hive table is created as an external table and is not partitioned.
The following demonstrates, step by step, how to automatically and incrementally import a table into Hive using a sqoop job combined with crontab and a shell script:
Step 1: create a table to synchronize on the Oracle side; here we initialize a new table from the inr_las table used above:

create table inr_job as
  select a.empno, a.ename, a.job, a.sal, sysdate etltime
    from inr_las a;

Step 2: create the target table in Hive:

hive> use oracle;
OK
Time taken: 1.425 seconds
create table INR_JOB
(
  empno   int,
  ename   string,
  job     string,
  sal     float,
  etltime string
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
location '/user/hive/warehouse/exter_inr_job'; 
Time taken: 2.836 seconds

Step 3: do a full import of the data into Hive.
First delete the directory specified when creating the external table above. Creating the external table creates that directory automatically, and the full import below also creates it, so leaving it in place would fail with a 'directory already exists' error:

[root@hadoop hadoop]# hadoop fs -rmr /user/hive/warehouse/exter_inr_job
rmr: DEPRECATED: Please use 'rm -r' instead.
19/03/14 06:01:23 INFO fs.TrashPolicyDefault: Namenode trash configuration: Deletion interval = 0 minutes, Emptier interval = 0 minutes.
Deleted /user/hive/warehouse/exter_inr_job

Next, run the full import:

sqoop import --connect jdbc:oracle:thin:@192.168.1.6:1521:orcl --username scott --password tiger --table INR_JOB -m 1 --target-dir /user/hive/warehouse/exter_inr_job --fields-terminated-by '\t'

When the import finishes, query the data in Hive:

hive> select * from inr_job;
OK
1    er    CLERK    800.0    2019-03-22 17:24:42.0
2    ALLEN    SALESMAN    1600.0    2019-03-22 17:24:42.0
3    WARD    SALESMAN    1250.0    2019-03-22 17:24:42.0
4    JONES    MANAGER    2975.0    2019-03-22 17:24:42.0
5    MARTIN    SALESMAN    1250.0    2019-03-22 17:24:42.0
6    zhao    DBA    1000.0    2019-03-22 17:24:42.0
7    yan    BI    100.0    2019-03-22 17:24:42.0
8    dong    JAVA    400.0    2019-03-22 17:24:42.0
Time taken: 3.153 seconds, Fetched: 8 row(s)

Step 4: create the incremental sqoop job.
The --password-file /user/hive/warehouse/scott.pwd below is the password file created earlier; see the steps above for how it was created.

 sqoop job --create auto_job -- import --connect jdbc:oracle:thin:@192.168.1.6:1521:orcl --username scott  --password-file /user/hive/warehouse/scott.pwd  --table INR_JOB --fields-terminated-by '\t' --lines-terminated-by '\n'  --target-dir /user/hive/warehouse/exter_inr_job -m 1 --check-column ETLTIME --incremental lastmodified --merge-key EMPNO --last-value "2019-03-22 17:24:42"

Look at the information for the job we created:

[root@hadoop hadoop]# sqoop  job --show auto_job
Warning: /hadoop/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
19/03/14 06:10:57 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/hadoop/hbase/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/hadoop/hive/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
Job: auto_job
Tool: import
Options:
----------------------------
verbose = false
hcatalog.drop.and.create.table = false
incremental.last.value = 2019-03-22 17:24:42
db.connect.string = jdbc:oracle:thin:@192.168.1.6:1521:orcl
codegen.output.delimiters.escape = 0
codegen.output.delimiters.enclose.required = false
codegen.input.delimiters.field = 0
mainframe.input.dataset.type = p
split.limit = null
hbase.create.table = false
skip.dist.cache = false
hdfs.append.dir = false
db.table = INR_JOB
codegen.input.delimiters.escape = 0
accumulo.create.table = false
import.fetch.size = null
codegen.input.delimiters.enclose.required = false
db.username = scott
reset.onemapper = false
codegen.output.delimiters.record = 10
import.max.inline.lob.size = 16777216
sqoop.throwOnError = false
hbase.bulk.load.enabled = false
hcatalog.create.table = false
db.clear.staging.table = false
incremental.col = ETLTIME
codegen.input.delimiters.record = 0
db.password.file = /user/hive/warehouse/scott.pwd
enable.compression = false
hive.overwrite.table = false
hive.import = false
codegen.input.delimiters.enclose = 0
accumulo.batch.size = 10240000
hive.drop.delims = false
customtool.options.jsonmap = {}
codegen.output.delimiters.enclose = 0
hdfs.delete-target.dir = false
codegen.output.dir = .
codegen.auto.compile.dir = true
relaxed.isolation = false
mapreduce.num.mappers = 1
accumulo.max.latency = 5000
import.direct.split.size = 0
sqlconnection.metadata.transaction.isolation.level = 2
codegen.output.delimiters.field = 9
export.new.update = UpdateOnly
incremental.mode = DateLastModified
hdfs.file.format = TextFile
sqoop.oracle.escaping.disabled = true
codegen.compile.dir = /tmp/sqoop-root/compile/be3b358816e17c786d114afb7a4f2a6d
direct.import = false
temporary.dirRoot = _sqoop
hdfs.target.dir = /user/hive/warehouse/exter_inr_job
hive.fail.table.exists = false
merge.key.col = EMPNO
db.batch = false

Step 5: wrap it in a shell script and add a scheduled task

[root@hadoop ~]# cd /hadoop/
[root@hadoop hadoop]# vim auto_inr.sh
Add the following content:
#!/bin/bash
log="/hadoop/auto_job_log.log"
echo "======================`date "+%Y-%m-%d %H:%M:%S"`增量======================" >> $log
nohup sqoop job --exec auto_job >> $log 2>&1 &
Save and exit, then make the script executable:
[root@hadoop hadoop]# chmod +x auto_inr.sh 
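Because cron will fire this script every couple of minutes while a single Sqoop run can take a minute or more, a slightly more defensive variant (a sketch using the same paths as above; the lock-file path is an assumption) runs the job in the foreground under flock so two runs can never overlap:

#!/bin/bash
# auto_inr.sh (sketch): run the saved Sqoop job, but never let two runs overlap
log="/hadoop/auto_job_log.log"
lock="/tmp/auto_inr.lock"        # hypothetical lock file
{
  # Give up immediately if a previous run still holds the lock
  flock -n 200 || { echo "`date '+%Y-%m-%d %H:%M:%S'` previous run still in progress, skipping" >> $log; exit 0; }
  echo "======================`date '+%Y-%m-%d %H:%M:%S'`增量======================" >> $log
  # Foreground (no nohup/&) so the lock is held until the job finishes
  sqoop job --exec auto_job >> $log 2>&1
} 200>"$lock"

Running sqoop in the foreground instead of with nohup ... & is what makes the lock meaningful; cron itself already detaches the script from any terminal.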

Let's run it by hand first; before doing so, check the job's last_value time again:

[root@hadoop hadoop]# sqoop job --show auto_job
Warning: /hadoop/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
19/03/25 17:50:59 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/hadoop/hbase/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/hadoop/hive/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
Job: auto_job
Tool: import
Options:
----------------------------
verbose = false
hcatalog.drop.and.create.table = false
incremental.last.value = 2019-03-22 17:24:42
db.connect.string = jdbc:oracle:thin:@192.168.1.6:1521:orcl

It shows 2019-03-22 17:24:42. Now check the current time:

[root@hadoop hadoop]# date
Mon Mar 25 17:54:54 CST 2019

Next, run the script manually:

[root@hadoop hadoop]# ./auto_inr.sh 

Then look at the redirected log:

[root@hadoop hadoop]# cat auto_job_log.log 
======================2019-03-25 17:55:46增量======================
Warning: /hadoop/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
19/03/25 17:55:48 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/hadoop/hbase/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/hadoop/hive/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
19/03/25 17:55:50 INFO oracle.OraOopManagerFactory: Data Connector for Oracle and Hadoop is disabled.
19/03/25 17:55:50 INFO manager.SqlManager: Using default fetchSize of 1000
19/03/25 17:55:50 INFO tool.CodeGenTool: Beginning code generation
19/03/25 17:55:51 INFO manager.OracleManager: Time zone has been set to GMT
19/03/25 17:55:51 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM INR_JOB t WHERE 1=0
19/03/25 17:55:51 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /hadoop
Note: /tmp/sqoop-root/compile/6f5f7577c1f664b94d5c83b578fd3dac/INR_JOB.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
19/03/25 17:55:54 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-root/compile/6f5f7577c1f664b94d5c83b578fd3dac/INR_JOB.jar
19/03/25 17:55:54 INFO manager.OracleManager: Time zone has been set to GMT
19/03/25 17:55:54 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM INR_JOB t WHERE 1=0
19/03/25 17:55:54 INFO tool.ImportTool: Incremental import based on column ETLTIME
19/03/25 17:55:54 INFO tool.ImportTool: Lower bound value: TO_TIMESTAMP('2019-03-22 17:24:42', 'YYYY-MM-DD HH24:MI:SS.FF')
19/03/25 17:55:54 INFO tool.ImportTool: Upper bound value: TO_TIMESTAMP('2019-03-25 17:55:54.0', 'YYYY-MM-DD HH24:MI:SS.FF')
19/03/25 17:55:54 INFO manager.OracleManager: Time zone has been set to GMT
19/03/25 17:55:54 INFO mapreduce.ImportJobBase: Beginning import of INR_JOB
19/03/25 17:55:54 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
19/03/25 17:55:54 INFO manager.OracleManager: Time zone has been set to GMT
19/03/25 17:55:54 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
19/03/25 17:55:54 INFO client.RMProxy: Connecting to ResourceManager at /192.168.1.66:8032
19/03/25 17:55:57 INFO db.DBInputFormat: Using read commited transaction isolation
19/03/25 17:55:57 INFO mapreduce.JobSubmitter: number of splits:1
19/03/25 17:55:58 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1553503985304_0009
19/03/25 17:55:58 INFO impl.YarnClientImpl: Submitted application application_1553503985304_0009
19/03/25 17:55:58 INFO mapreduce.Job: The url to track the job: http://hadoop:8088/proxy/application_1553503985304_0009/
19/03/25 17:55:58 INFO mapreduce.Job: Running job: job_1553503985304_0009
19/03/25 17:56:07 INFO mapreduce.Job: Job job_1553503985304_0009 running in uber mode : false
19/03/25 17:56:07 INFO mapreduce.Job:  map 0% reduce 0%
19/03/25 17:56:15 INFO mapreduce.Job:  map 100% reduce 0%
19/03/25 17:56:15 INFO mapreduce.Job: Job job_1553503985304_0009 completed successfully
19/03/25 17:56:15 INFO mapreduce.Job: Counters: 30
    File System Counters
        FILE: Number of bytes read=0
        FILE: Number of bytes written=144775
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=87
        HDFS: Number of bytes written=323
        HDFS: Number of read operations=4
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=2
    Job Counters 
        Launched map tasks=1
        Other local map tasks=1
        Total time spent by all maps in occupied slots (ms)=5270
        Total time spent by all reduces in occupied slots (ms)=0
        Total time spent by all map tasks (ms)=5270
        Total vcore-milliseconds taken by all map tasks=5270
        Total megabyte-milliseconds taken by all map tasks=5396480
    Map-Reduce Framework
        Map input records=8
        Map output records=8
        Input split bytes=87
        Spilled Records=0
        Failed Shuffles=0
        Merged Map outputs=0
        GC time elapsed (ms)=73
        CPU time spent (ms)=3000
        Physical memory (bytes) snapshot=205058048
        Virtual memory (bytes) snapshot=2135244800
        Total committed heap usage (bytes)=109576192
    File Input Format Counters 
        Bytes Read=0
    File Output Format Counters 
        Bytes Written=323
19/03/25 17:56:15 INFO mapreduce.ImportJobBase: Transferred 323 bytes in 20.9155 seconds (15.4431 bytes/sec)
19/03/25 17:56:15 INFO mapreduce.ImportJobBase: Retrieved 8 records.
19/03/25 17:56:15 INFO tool.ImportTool: Final destination exists, will run merge job.
19/03/25 17:56:15 INFO Configuration.deprecation: mapred.output.key.class is deprecated. Instead, use mapreduce.job.output.key.class
19/03/25 17:56:15 INFO client.RMProxy: Connecting to ResourceManager at /192.168.1.66:8032
19/03/25 17:56:18 INFO input.FileInputFormat: Total input paths to process : 2
19/03/25 17:56:18 INFO mapreduce.JobSubmitter: number of splits:2
19/03/25 17:56:19 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1553503985304_0010
19/03/25 17:56:19 INFO impl.YarnClientImpl: Submitted application application_1553503985304_0010
19/03/25 17:56:19 INFO mapreduce.Job: The url to track the job: http://hadoop:8088/proxy/application_1553503985304_0010/
19/03/25 17:56:19 INFO mapreduce.Job: Running job: job_1553503985304_0010
19/03/25 17:56:29 INFO mapreduce.Job: Job job_1553503985304_0010 running in uber mode : false
19/03/25 17:56:29 INFO mapreduce.Job:  map 0% reduce 0%
19/03/25 17:56:39 INFO mapreduce.Job:  map 100% reduce 0%
19/03/25 17:56:50 INFO mapreduce.Job:  map 100% reduce 100%
19/03/25 17:56:50 INFO mapreduce.Job: Job job_1553503985304_0010 completed successfully
19/03/25 17:56:50 INFO mapreduce.Job: Counters: 49
    File System Counters
        FILE: Number of bytes read=1090
        FILE: Number of bytes written=436771
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=942
        HDFS: Number of bytes written=323
        HDFS: Number of read operations=9
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=2
    Job Counters 
        Launched map tasks=2
        Launched reduce tasks=1
        Data-local map tasks=2
        Total time spent by all maps in occupied slots (ms)=14667
        Total time spent by all reduces in occupied slots (ms)=7258
        Total time spent by all map tasks (ms)=14667
        Total time spent by all reduce tasks (ms)=7258
        Total vcore-milliseconds taken by all map tasks=14667
        Total vcore-milliseconds taken by all reduce tasks=7258
        Total megabyte-milliseconds taken by all map tasks=15019008
        Total megabyte-milliseconds taken by all reduce tasks=7432192
    Map-Reduce Framework
        Map input records=16
        Map output records=16
        Map output bytes=1052
        Map output materialized bytes=1096
        Input split bytes=296
        Combine input records=0
        Combine output records=0
        Reduce input groups=8
        Reduce shuffle bytes=1096
        Reduce input records=16
        Reduce output records=8
        Spilled Records=32
        Shuffled Maps =2
        Failed Shuffles=0
        Merged Map outputs=2
        GC time elapsed (ms)=230
        CPU time spent (ms)=5420
        Physical memory (bytes) snapshot=684474368
        Virtual memory (bytes) snapshot=6394597376
        Total committed heap usage (bytes)=511705088
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Input Format Counters 
        Bytes Read=646
    File Output Format Counters 
        Bytes Written=323
19/03/25 17:56:50 INFO tool.ImportTool: Saving incremental import state to the metastore
19/03/25 17:56:51 INFO tool.ImportTool: Updated data for job: auto_job

Note this line in the log:

19/03/25 17:55:54 INFO tool.ImportTool: Upper bound value: TO_TIMESTAMP('2019-03-25 17:55:54.0', 'YYYY-MM-DD HH24:MI:SS.FF')

It shows that the upper bound is the current time. Now check the job's current last_value:

hcatalog.drop.and.create.table = false
incremental.last.value = 2019-03-25 17:55:54.0
db.connect.string = jdbc:oracle:thin:@192.168.1.6:1521:orcl

It matches the time in the log above. If the job definition ever needs to change, delete the job and recreate it with the updated options, manually setting --last-value to the Upper bound from the log, or record the last_value shown above before deleting, so the rebuilt job picks up the increments from the right point in time.
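A small sketch of capturing that checkpoint before dropping a job (the grep/awk pattern matches the sqoop job --show output shown above; this assumes --show runs without an interactive prompt, as it does in the listing above):

sqoop job --show auto_job 2>/dev/null | grep 'incremental.last.value' | awk -F' = ' '{print $2}' > /hadoop/auto_job.last_value
cat /hadoop/auto_job.last_value    # e.g. 2019-03-25 17:55:54.0 -- feed this to --last-value when recreating the job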
The manual run works fine. All that is left is the crontab schedule: run crontab -e and add the line below (one incremental run every two minutes):

*/2 * * * *  /hadoop/auto_inr.sh
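To confirm the schedule took effect and watch the runs come in (a sketch; the paths match the script above):

crontab -l                          # should list: */2 * * * *  /hadoop/auto_inr.sh
tail -f /hadoop/auto_job_log.log    # each run appends its banner line followed by the Sqoop output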

What if a table is very large and the first load only initializes the Hive table with a portion of the most recent data? If historical rows that were never loaded are changed today, will the merge-key incremental approach break? See the next test article; once it is written the link will be posted here:
https://blog.csdn.net/qq_28356739/article/details/88803284
