Business Data Warehouse
Previously we built the user-behavior data warehouse; next we will build the business data warehouse.
Assume we now have the following eight business tables:
MariaDB [gmall]> show tables;
+-----------------+
| Tables_in_gmall |
+-----------------+
| base_category1  |
| base_category2  |
| base_category3  |
| order_detail    |
| order_info      |
| payment_info    |
| sku_info        |
| user_info       |
+-----------------+
8 rows in set (0.00 sec)
Scheduled Sqoop Import Script
Create the script sqoop_import.sh:
#!/bin/bash

export HADOOP_USER_NAME=hive

db_date=$2
echo $db_date
db_name=gmall

import_data() {
sqoop import \
--connect jdbc:mysql://192.168.0.208:3306/$db_name \
--username root \
--password password \
--target-dir /origin_data/$db_name/db/$1/$db_date \
--delete-target-dir \
--num-mappers 1 \
--fields-terminated-by "\t" \
--query "$2"' and $CONDITIONS;'
}

import_sku_info(){
  import_data "sku_info" "select
    id, spu_id, price, sku_name, sku_desc, weight, tm_id, category3_id, create_time
  from sku_info where 1=1"
}

import_user_info(){
  import_data "user_info" "select
    id, name, birthday, gender, email, user_level, create_time
  from user_info where 1=1"
}

import_base_category1(){
  import_data "base_category1" "select
    id, name
  from base_category1 where 1=1"
}

import_base_category2(){
  import_data "base_category2" "select
    id, name, category1_id
  from base_category2 where 1=1"
}

import_base_category3(){
  import_data "base_category3" "select
    id, name, category2_id
  from base_category3 where 1=1"
}

import_order_detail(){
  import_data "order_detail" "select
    od.id, order_id, user_id, sku_id, sku_name, order_price, sku_num, o.create_time
  from order_info o, order_detail od
  where o.id=od.order_id
    and DATE_FORMAT(create_time,'%Y-%m-%d')='$db_date'"
}

import_payment_info(){
  import_data "payment_info" "select
    id, out_trade_no, order_id, user_id, alipay_trade_no, total_amount, subject, payment_type, payment_time
  from payment_info
  where DATE_FORMAT(payment_time,'%Y-%m-%d')='$db_date'"
}

import_order_info(){
  import_data "order_info" "select
    id, total_amount, order_status, user_id, payment_way, out_trade_no, create_time, operate_time
  from order_info
  where (DATE_FORMAT(create_time,'%Y-%m-%d')='$db_date'
     or DATE_FORMAT(operate_time,'%Y-%m-%d')='$db_date')"
}

case $1 in
  "base_category1")
     import_base_category1
;;
  "base_category2")
     import_base_category2
;;
  "base_category3")
     import_base_category3
;;
  "order_info")
     import_order_info
;;
  "order_detail")
     import_order_detail
;;
  "sku_info")
     import_sku_info
;;
  "user_info")
     import_user_info
;;
  "payment_info")
     import_payment_info
;;
  "all")
     import_base_category1
     import_base_category2
     import_base_category3
     import_order_info
     import_order_detail
     import_sku_info
     import_user_info
     import_payment_info
;;
esac
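The heading calls this a scheduled import, but the schedule itself is not shown. Below is a minimal sketch of one way to drive it daily, assuming plain cron on the node where Sqoop runs; the script path and log path are assumptions, not from the original.

# Hypothetical crontab entry (edit with crontab -e): run the full import at 00:30
# every day for the previous day's data. Paths are assumptions; note that '%'
# must be escaped as '\%' inside crontab.
30 0 * * * /root/bin/sqoop_import.sh all $(date -d '-1 day' +\%F) >> /var/log/sqoop_import.log 2>&1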
Run the script to import the data:
sqoop_import.sh all 2020-11-24
2020-11-24
Warning: /opt/cloudera/parcels/CDH-7.1.3-1.cdh7.1.3.p0.4992530/bin/../lib/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-7.1.3-1.cdh7.1.3.p0.4992530/jars/slf4j-log4j12-1.7.30.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-7.1.3-1.cdh7.1.3.p0.4992530/jars/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
20/11/25 11:06:38 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7.7.1.3.0-100
20/11/25 11:06:38 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
20/11/25 11:06:39 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
20/11/25 11:06:39 INFO tool.CodeGenTool: Beginning code generation
20/11/25 11:06:40 INFO manager.SqlManager: Executing SQL statement: select id, name from base_category1 where 1=1 and (1 = 0) ;
20/11/25 11:06:40 INFO manager.SqlManager: Executing SQL statement: select id, name from base_category1 where 1=1 and (1 = 0) ;
20/11/25 11:06:40 INFO manager.SqlManager: Executing SQL statement: select id, name from base_category1 where 1=1 and (1 = 0) ;
20/11/25 11:06:40 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce
warning: No SupportedSourceVersion annotation found on org.apache.hadoop.hdds.conf.ConfigFileGenerator, returning RELEASE_6.
warning: Supported source version 'RELEASE_6' from annotation processor 'org.apache.hadoop.hdds.conf.ConfigFileGenerator' less than -source '1.8'
2 warnings
20/11/25 11:06:43 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-root/compile/1c9ca0f79341623102daea7280554592/QueryResult.jar
20/11/25 11:06:44 INFO tool.ImportTool: Destination directory /origin_data/gmall/db/base_category1/2020-11-24 is not present, hence not deleting.
20/11/25 11:06:44 INFO mapreduce.ImportJobBase: Beginning query import.
20/11/25 11:06:44 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
20/11/25 11:06:45 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
20/11/25 11:06:45 INFO client.RMProxy: Connecting to ResourceManager at cdh2.macro.com/192.168.0.207:8032
20/11/25 11:06:46 INFO mapreduce.JobResourceUploader: Disabling Erasure Coding for path: /user/hive/.staging/job_1606125802436_0008
20/11/25 11:06:53 INFO db.DBInputFormat: Using read commited transaction isolation
20/11/25 11:06:53 INFO mapreduce.JobSubmitter: number of splits:1
20/11/25 11:06:54 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1606125802436_0008
20/11/25 11:06:54 INFO mapreduce.JobSubmitter: Executing with tokens: []
20/11/25 11:06:55 INFO conf.Configuration: resource-types.xml not found
20/11/25 11:06:55 INFO resource.ResourceUtils: Unable to find 'resource-types.xml'.
20/11/25 11:06:55 INFO impl.YarnClientImpl: Submitted application application_1606125802436_0008
20/11/25 11:06:55 INFO mapreduce.Job: The url to track the job: http://cdh2.macro.com:8088/proxy/application_1606125802436_0008/
20/11/25 11:06:55 INFO mapreduce.Job: Running job: job_1606125802436_0008
20/11/25 11:07:06 INFO mapreduce.Job: Job job_1606125802436_0008 running in uber mode : false
20/11/25 11:07:06 INFO mapreduce.Job: map 0% reduce 0%
20/11/25 11:07:12 INFO mapreduce.Job: map 100% reduce 0%
20/11/25 11:07:12 INFO mapreduce.Job: Job job_1606125802436_0008 completed successfully
20/11/25 11:07:12 INFO mapreduce.Job: Counters: 33
    File System Counters
        FILE: Number of bytes read=0
        FILE: Number of bytes written=253013
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=85
        HDFS: Number of bytes written=270
        HDFS: Number of read operations=6
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=2
        HDFS: Number of bytes read erasure-coded=0
    Job Counters
        Launched map tasks=1
        Other local map tasks=1
        Total time spent by all maps in occupied slots (ms)=4014
        Total time spent by all reduces in occupied slots (ms)=0
        Total time spent by all map tasks (ms)=4014
        Total vcore-milliseconds taken by all map tasks=4014
        Total megabyte-milliseconds taken by all map tasks=8220672
    Map-Reduce Framework
        Map input records=18
        Map output records=18
        Input split bytes=85
        Spilled Records=0
        Failed Shuffles=0
        Merged Map outputs=0
        GC time elapsed (ms)=76
        CPU time spent (ms)=1540
        Physical memory (bytes) snapshot=245944320
        Virtual memory (bytes) snapshot=2841169920
        Total committed heap usage (bytes)=249036800
        Peak Map Physical memory (bytes)=245944320
        Peak Map Virtual memory (bytes)=2841169920
    File Input Format Counters
        Bytes Read=0
    File Output Format Counters
        Bytes Written=270
20/11/25 11:07:12 INFO mapreduce.ImportJobBase: Transferred 270 bytes in 27.907 seconds (9.675 bytes/sec)
20/11/25 11:07:12 INFO mapreduce.ImportJobBase: Retrieved 18 records.
Warning: /opt/cloudera/parcels/CDH-7.1.3-1.cdh7.1.3.p0.4992530/bin/../lib/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-7.1.3-1.cdh7.1.3.p0.4992530/jars/slf4j-log4j12-1.7.30.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-7.1.3-1.cdh7.1.3.p0.4992530/jars/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
20/11/25 11:07:18 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7.7.1.3.0-100
20/11/25 11:07:18 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
20/11/25 11:07:18 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
20/11/25 11:07:18 INFO tool.CodeGenTool: Beginning code generation
20/11/25 11:07:18 INFO manager.SqlManager: Executing SQL statement: select id, name, category1_id from base_category2 where 1=1 and (1 = 0) ;
20/11/25 11:07:18 INFO manager.SqlManager: Executing SQL statement: select id, name, category1_id from base_category2 where 1=1 and (1 = 0) ;
20/11/25 11:07:18 INFO manager.SqlManager: Executing SQL statement: select id, name, category1_id from base_category2 where 1=1 and (1 = 0) ;
20/11/25 11:07:18 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce
warning: No SupportedSourceVersion annotation found on org.apache.hadoop.hdds.conf.ConfigFileGenerator, returning RELEASE_6.
warning: Supported source version 'RELEASE_6' from annotation processor 'org.apache.hadoop.hdds.conf.ConfigFileGenerator' less than -source '1.8'
2 warnings
20/11/25 11:07:21 WARN orm.CompilationManager: Could not rename /tmp/sqoop-root/compile/bcba56c3140f5b064e3dbba26e43279f/QueryResult.java to /data0/25k/./QueryResult.java. Error: Destination '/data0/25k/./QueryResult.java' already exists
20/11/25 11:07:21 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-root/compile/bcba56c3140f5b064e3dbba26e43279f/QueryResult.jar
20/11/25 11:07:22 INFO tool.ImportTool: Destination directory /origin_data/gmall/db/base_category2/2020-11-24 is not present, hence not deleting.
20/11/25 11:07:22 INFO mapreduce.ImportJobBase: Beginning query import.
20/11/25 11:07:22 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
20/11/25 11:07:22 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
20/11/25 11:07:22 INFO client.RMProxy: Connecting to ResourceManager at cdh2.macro.com/192.168.0.207:8032
20/11/25 11:07:23 INFO mapreduce.JobResourceUploader: Disabling Erasure Coding for path: /user/hive/.staging/job_1606125802436_0009
20/11/25 11:07:26 INFO db.DBInputFormat: Using read commited transaction isolation
20/11/25 11:07:26 INFO mapreduce.JobSubmitter: number of splits:1
20/11/25 11:07:27 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1606125802436_0009
20/11/25 11:07:27 INFO mapreduce.JobSubmitter: Executing with tokens: []
20/11/25 11:07:27 INFO conf.Configuration: resource-types.xml not found
20/11/25 11:07:27 INFO resource.ResourceUtils: Unable to find 'resource-types.xml'.
20/11/25 11:07:27 INFO impl.YarnClientImpl: Submitted application application_1606125802436_0009
20/11/25 11:07:27 INFO mapreduce.Job: The url to track the job: http://cdh2.macro.com:8088/proxy/application_1606125802436_0009/
20/11/25 11:07:27 INFO mapreduce.Job: Running job: job_1606125802436_0009
20/11/25 11:07:44 INFO mapreduce.Job: Job job_1606125802436_0009 running in uber mode : false
20/11/25 11:07:44 INFO mapreduce.Job: map 0% reduce 0%
20/11/25 11:07:50 INFO mapreduce.Job: map 100% reduce 0%
20/11/25 11:07:51 INFO mapreduce.Job: Job job_1606125802436_0009 completed successfully
20/11/25 11:07:51 INFO mapreduce.Job: Counters: 33
    File System Counters
        FILE: Number of bytes read=0
        FILE: Number of bytes written=253031
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=85
        HDFS: Number of bytes written=2181
        HDFS: Number of read operations=6
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=2
        HDFS: Number of bytes read erasure-coded=0
    Job Counters
        Launched map tasks=1
        Other local map tasks=1
        Total time spent by all maps in occupied slots (ms)=4244
        Total time spent by all reduces in occupied slots (ms)=0
        Total time spent by all map tasks (ms)=4244
        Total vcore-milliseconds taken by all map tasks=4244
        Total megabyte-milliseconds taken by all map tasks=8691712
    Map-Reduce Framework
        Map input records=125
        Map output records=125
        Input split bytes=85
        Spilled Records=0
        Failed Shuffles=0
        Merged Map outputs=0
        GC time elapsed (ms)=71
        CPU time spent (ms)=1700
        Physical memory (bytes) snapshot=259682304
        Virtual memory (bytes) snapshot=2850103296
        Total committed heap usage (bytes)=235929600
        Peak Map Physical memory (bytes)=259682304
        Peak Map Virtual memory (bytes)=2850103296
    File Input Format Counters
        Bytes Read=0
    File Output Format Counters
        Bytes Written=2181
20/11/25 11:07:51 INFO mapreduce.ImportJobBase: Transferred 2.1299 KB in 29.0742 seconds (75.0149 bytes/sec)
20/11/25 11:07:51 INFO mapreduce.ImportJobBase: Retrieved 125 records.
Warning: /opt/cloudera/parcels/CDH-7.1.3-1.cdh7.1.3.p0.4992530/bin/../lib/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-7.1.3-1.cdh7.1.3.p0.4992530/jars/slf4j-log4j12-1.7.30.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-7.1.3-1.cdh7.1.3.p0.4992530/jars/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
20/11/25 11:07:56 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7.7.1.3.0-100
20/11/25 11:07:56 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
20/11/25 11:07:56 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
20/11/25 11:07:56 INFO tool.CodeGenTool: Beginning code generation
20/11/25 11:07:57 INFO manager.SqlManager: Executing SQL statement: select id, name, category2_id from base_category3 where 1=1 and (1 = 0) ;
20/11/25 11:07:57 INFO manager.SqlManager: Executing SQL statement: select id, name, category2_id from base_category3 where 1=1 and (1 = 0) ;
20/11/25 11:07:57 INFO manager.SqlManager: Executing SQL statement: select id, name, category2_id from base_category3 where 1=1 and (1 = 0) ;
20/11/25 11:07:57 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce
warning: No SupportedSourceVersion annotation found on org.apache.hadoop.hdds.conf.ConfigFileGenerator, returning RELEASE_6.
warning: Supported source version 'RELEASE_6' from annotation processor 'org.apache.hadoop.hdds.conf.ConfigFileGenerator' less than -source '1.8'
2 warnings
20/11/25 11:07:59 WARN orm.CompilationManager: Could not rename /tmp/sqoop-root/compile/7b64a4d6eff156e0c71805f454daf5af/QueryResult.java to /data0/25k/./QueryResult.java. Error: Destination '/data0/25k/./QueryResult.java' already exists
20/11/25 11:07:59 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-root/compile/7b64a4d6eff156e0c71805f454daf5af/QueryResult.jar
20/11/25 11:08:00 INFO tool.ImportTool: Destination directory /origin_data/gmall/db/base_category3/2020-11-24 is not present, hence not deleting.
20/11/25 11:08:00 INFO mapreduce.ImportJobBase: Beginning query import.
20/11/25 11:08:00 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
20/11/25 11:08:00 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
20/11/25 11:08:01 INFO client.RMProxy: Connecting to ResourceManager at cdh2.macro.com/192.168.0.207:8032
20/11/25 11:08:02 INFO mapreduce.JobResourceUploader: Disabling Erasure Coding for path: /user/hive/.staging/job_1606125802436_0010
20/11/25 11:08:05 INFO db.DBInputFormat: Using read commited transaction isolation
20/11/25 11:08:05 INFO mapreduce.JobSubmitter: number of splits:1
20/11/25 11:08:05 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1606125802436_0010
20/11/25 11:08:05 INFO mapreduce.JobSubmitter: Executing with tokens: []
20/11/25 11:08:06 INFO conf.Configuration: resource-types.xml not found
20/11/25 11:08:06 INFO resource.ResourceUtils: Unable to find 'resource-types.xml'.
20/11/25 11:08:06 INFO impl.YarnClientImpl: Submitted application application_1606125802436_0010
20/11/25 11:08:06 INFO mapreduce.Job: The url to track the job: http://cdh2.macro.com:8088/proxy/application_1606125802436_0010/
20/11/25 11:08:06 INFO mapreduce.Job: Running job: job_1606125802436_0010
20/11/25 11:08:15 INFO mapreduce.Job: Job job_1606125802436_0010 running in uber mode : false
20/11/25 11:08:15 INFO mapreduce.Job: map 0% reduce 0%
20/11/25 11:08:25 INFO mapreduce.Job: map 100% reduce 0%
20/11/25 11:08:25 INFO mapreduce.Job: Job job_1606125802436_0010 completed successfully
ODS Layer
Create the ODS-layer tables as exact replicas of the tables in the business database, keeping the columns identical.
The DDL statements are as follows:
drop table if exists ods_order_info;
create external table ods_order_info (
    `id` string COMMENT 'order id',
    `total_amount` decimal(10,2) COMMENT 'order amount',
    `order_status` string COMMENT 'order status',
    `user_id` string COMMENT 'user id',
    `payment_way` string COMMENT 'payment method',
    `out_trade_no` string COMMENT 'payment transaction number',
    `create_time` string COMMENT 'creation time',
    `operate_time` string COMMENT 'operation time'
) COMMENT 'order table'
PARTITIONED BY (`dt` string)
row format delimited fields terminated by '\t';
drop table if exists ods_order_detail;
create external table ods_order_detail (
    `id` string COMMENT 'order detail id',
    `order_id` string COMMENT 'order id',
    `user_id` string COMMENT 'user id',
    `sku_id` string COMMENT 'sku id',
    `sku_name` string COMMENT 'sku name',
    `order_price` string COMMENT 'sku price',
    `sku_num` string COMMENT 'sku quantity',
    `create_time` string COMMENT 'creation time'
) COMMENT 'order detail table'
PARTITIONED BY (`dt` string)
row format delimited fields terminated by '\t';
drop table if exists ods_sku_info;
create external table ods_sku_info (
    `id` string COMMENT 'sku id',
    `spu_id` string COMMENT 'spu id',
    `price` decimal(10,2) COMMENT 'price',
    `sku_name` string COMMENT 'sku name',
    `sku_desc` string COMMENT 'sku description',
    `weight` string COMMENT 'weight',
    `tm_id` string COMMENT 'brand id',
    `category3_id` string COMMENT 'category id',
    `create_time` string COMMENT 'creation time'
) COMMENT 'sku table'
PARTITIONED BY (`dt` string)
row format delimited fields terminated by '\t';
drop table if exists ods_user_info;
create external table ods_user_info (
    `id` string COMMENT 'user id',
    `name` string COMMENT 'name',
    `birthday` string COMMENT 'birthday',
    `gender` string COMMENT 'gender',
    `email` string COMMENT 'email',
    `user_level` string COMMENT 'user level',
    `create_time` string COMMENT 'creation time'
) COMMENT 'user info table'
PARTITIONED BY (`dt` string)
row format delimited fields terminated by '\t';
drop table if exists ods_base_category1;
create external table ods_base_category1 (
    `id` string COMMENT 'id',
    `name` string COMMENT 'name'
) COMMENT 'level-1 category table'
PARTITIONED BY (`dt` string)
row format delimited fields terminated by '\t';
drop table if exists ods_base_category2;
create external table ods_base_category2 (
    `id` string COMMENT 'id',
    `name` string COMMENT 'name',
    `category1_id` string COMMENT 'level-1 category id'
) COMMENT 'level-2 category table'
PARTITIONED BY (`dt` string)
row format delimited fields terminated by '\t';
drop table if exists ods_base_category3;
create external table ods_base_category3 (
    `id` string COMMENT 'id',
    `name` string COMMENT 'name',
    `category2_id` string COMMENT 'level-2 category id'
) COMMENT 'level-3 category table'
PARTITIONED BY (`dt` string)
row format delimited fields terminated by '\t';
drop table if exists ods_payment_info;
create external table ods_payment_info (
    `id` bigint COMMENT 'id',
    `out_trade_no` string COMMENT 'external trade number',
    `order_id` string COMMENT 'order id',
    `user_id` string COMMENT 'user id',
    `alipay_trade_no` string COMMENT 'Alipay trade number',
    `total_amount` decimal(16,2) COMMENT 'payment amount',
    `subject` string COMMENT 'transaction subject',
    `payment_type` string COMMENT 'payment type',
    `payment_time` string COMMENT 'payment time'
) COMMENT 'payment info table'
PARTITIONED BY (`dt` string)
row format delimited fields terminated by '\t';
ODS-Layer Data Load Script
Create the script ods_db.sh:
#!/bin/bash

APP=gmall

# If a date is supplied as the first argument, use it;
# otherwise default to the day before the current date.
if [ -n "$1" ] ;then
    do_date=$1
else
    do_date=`date -d "-1 day" +%F`
fi

sql="
load data inpath '/origin_data/$APP/db/order_info/$do_date' OVERWRITE into table "$APP".ods_order_info partition(dt='$do_date');
load data inpath '/origin_data/$APP/db/order_detail/$do_date' OVERWRITE into table "$APP".ods_order_detail partition(dt='$do_date');
load data inpath '/origin_data/$APP/db/sku_info/$do_date' OVERWRITE into table "$APP".ods_sku_info partition(dt='$do_date');
load data inpath '/origin_data/$APP/db/user_info/$do_date' OVERWRITE into table "$APP".ods_user_info partition(dt='$do_date');
load data inpath '/origin_data/$APP/db/payment_info/$do_date' OVERWRITE into table "$APP".ods_payment_info partition(dt='$do_date');
load data inpath '/origin_data/$APP/db/base_category1/$do_date' OVERWRITE into table "$APP".ods_base_category1 partition(dt='$do_date');
load data inpath '/origin_data/$APP/db/base_category2/$do_date' OVERWRITE into table "$APP".ods_base_category2 partition(dt='$do_date');
load data inpath '/origin_data/$APP/db/base_category3/$do_date' OVERWRITE into table "$APP".ods_base_category3 partition(dt='$do_date');
"
hive -e "$sql"
Run the script to load the data:
ods_db.sh 2020-11-24
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-7.1.3-1.cdh7.1.3.p0.4992530/jars/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-7.1.3-1.cdh7.1.3.p0.4992530/jars/slf4j-log4j12-1.7.30.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
ERROR StatusLogger No log4j2 configuration file found. Using default configuration: logging only errors to the console. Set system property 'log4j2.debug' to show Log4j2 internal initialization logging.
WARNING: Use "yarn jar" to launch YARN applications.
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-7.1.3-1.cdh7.1.3.p0.4992530/jars/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-7.1.3-1.cdh7.1.3.p0.4992530/jars/slf4j-log4j12-1.7.30.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Connecting to jdbc:hive2://cdh1.macro.com:2181,cdh2.macro.com:2181,cdh3.macro.com:2181/default;password=root;serviceDiscoveryMode=zooKeeper;user=root;zooKeeperNamespace=hiveserver2
20/11/25 12:04:07 [main-EventThread]: ERROR imps.EnsembleTracker: Invalid config event received: {server.1=cdh2.macro.com:3181:4181:participant, version=0, server.3=cdh1.macro.com:3181:4181:participant, server.2=cdh3.macro.com:3181:4181:participant}
20/11/25 12:04:07 [main-EventThread]: ERROR imps.EnsembleTracker: Invalid config event received: {server.1=cdh2.macro.com:3181:4181:participant, version=0, server.3=cdh1.macro.com:3181:4181:participant, server.2=cdh3.macro.com:3181:4181:participant}
20/11/25 12:04:07 [main]: INFO jdbc.HiveConnection: Connected to cdh2.macro.com:10000
Connected to: Apache Hive (version 3.1.3000.7.1.3.0-100)
Driver: Hive JDBC (version 3.1.3000.7.1.3.0-100)
Transaction isolation: TRANSACTION_REPEATABLE_READ
INFO : Compiling command(queryId=hive_20201125120407_cb083501-a516-40dc-8f5f-1b7e091f98d2): load data inpath '/origin_data/gmall/db/order_info/2020-11-24' OVERWRITE into table gmall.ods_order_info partition(dt='2020-11-24')
INFO : Semantic Analysis Completed (retrial = false)
INFO : Created Hive schema: Schema(fieldSchemas:null, properties:null)
INFO : Completed compiling command(queryId=hive_20201125120407_cb083501-a516-40dc-8f5f-1b7e091f98d2); Time taken: 0.175 seconds
INFO : Executing command(queryId=hive_20201125120407_cb083501-a516-40dc-8f5f-1b7e091f98d2): load data inpath '/origin_data/gmall/db/order_info/2020-11-24' OVERWRITE into table gmall.ods_order_info partition(dt='2020-11-24')
INFO : Starting task [Stage-0:MOVE] in serial mode
INFO : Loading data to table gmall.ods_order_info partition (dt=2020-11-24) from hdfs://cdh1.macro.com:8020/origin_data/gmall/db/order_info/2020-11-24
INFO : Starting task [Stage-1:STATS] in serial mode
INFO : Completed executing command(queryId=hive_20201125120407_cb083501-a516-40dc-8f5f-1b7e091f98d2); Time taken: 0.462 seconds
INFO : OK
No rows affected (0.716 seconds)
INFO : Compiling command(queryId=hive_20201125120408_2d33ea0d-8a3c-4247-b315-c6ec2c033256): load data inpath '/origin_data/gmall/db/order_detail/2020-11-24' OVERWRITE into table gmall.ods_order_detail partition(dt='2020-11-24')
INFO : Semantic Analysis Completed (retrial = false)
INFO : Created Hive schema: Schema(fieldSchemas:null, properties:null)
INFO : Completed compiling command(queryId=hive_20201125120408_2d33ea0d-8a3c-4247-b315-c6ec2c033256); Time taken: 0.16 seconds
INFO : Executing command(queryId=hive_20201125120408_2d33ea0d-8a3c-4247-b315-c6ec2c033256): load data inpath '/origin_data/gmall/db/order_detail/2020-11-24' OVERWRITE into table gmall.ods_order_detail partition(dt='2020-11-24')
INFO : Starting task [Stage-0:MOVE] in serial mode
INFO : Loading data to table gmall.ods_order_detail partition (dt=2020-11-24) from hdfs://cdh1.macro.com:8020/origin_data/gmall/db/order_detail/2020-11-24
INFO : Starting task [Stage-1:STATS] in serial mode
INFO : Completed executing command(queryId=hive_20201125120408_2d33ea0d-8a3c-4247-b315-c6ec2c033256); Time taken: 0.473 seconds
INFO : OK
No rows affected (0.655 seconds)
INFO : Compiling command(queryId=hive_20201125120409_c01eb4e3-beaa-4235-9ac1-4d2a70614b2b): load data inpath '/origin_data/gmall/db/sku_info/2020-11-24' OVERWRITE into table gmall.ods_sku_info partition(dt='2020-11-24')
INFO : Semantic Analysis Completed (retrial = false)
INFO : Created Hive schema: Schema(fieldSchemas:null, properties:null)
INFO : Completed compiling command(queryId=hive_20201125120409_c01eb4e3-beaa-4235-9ac1-4d2a70614b2b); Time taken: 0.154 seconds
INFO : Executing command(queryId=hive_20201125120409_c01eb4e3-beaa-4235-9ac1-4d2a70614b2b): load data inpath '/origin_data/gmall/db/sku_info/2020-11-24' OVERWRITE into table gmall.ods_sku_info partition(dt='2020-11-24')
INFO : Starting task [Stage-0:MOVE] in serial mode
INFO : Loading data to table gmall.ods_sku_info partition (dt=2020-11-24) from hdfs://cdh1.macro.com:8020/origin_data/gmall/db/sku_info/2020-11-24
INFO : Starting task [Stage-1:STATS] in serial mode
INFO : Completed executing command(queryId=hive_20201125120409_c01eb4e3-beaa-4235-9ac1-4d2a70614b2b); Time taken: 0.418 seconds
INFO : OK
No rows affected (0.605 seconds)
INFO : Compiling command(queryId=hive_20201125120409_808c17aa-5c8f-48b0-a27e-76f914c95512): load data inpath '/origin_data/gmall/db/user_info/2020-11-24' OVERWRITE into table gmall.ods_user_info partition(dt='2020-11-24')
INFO : Semantic Analysis Completed (retrial = false)
INFO : Created Hive schema: Schema(fieldSchemas:null, properties:null)
INFO : Completed compiling command(queryId=hive_20201125120409_808c17aa-5c8f-48b0-a27e-76f914c95512); Time taken: 0.154 seconds
INFO : Executing command(queryId=hive_20201125120409_808c17aa-5c8f-48b0-a27e-76f914c95512): load data inpath '/origin_data/gmall/db/user_info/2020-11-24' OVERWRITE into table gmall.ods_user_info partition(dt='2020-11-24')
INFO : Starting task [Stage-0:MOVE] in serial mode
INFO : Loading data to table gmall.ods_user_info partition (dt=2020-11-24) from hdfs://cdh1.macro.com:8020/origin_data/gmall/db/user_info/2020-11-24
INFO : Starting task [Stage-1:STATS] in serial mode
INFO : Completed executing command(queryId=hive_20201125120409_808c17aa-5c8f-48b0-a27e-76f914c95512); Time taken: 0.419 seconds
INFO : OK
No rows affected (0.597 seconds)
INFO : Compiling command(queryId=hive_20201125120410_676c568e-b172-473a-bbc6-f9f9b15e78df): load data inpath '/origin_data/gmall/db/payment_info/2020-11-24' OVERWRITE into table gmall.ods_payment_info partition(dt='2020-11-24')
INFO : Semantic Analysis Completed (retrial = false)
INFO : Created Hive schema: Schema(fieldSchemas:null, properties:null)
INFO : Completed compiling command(queryId=hive_20201125120410_676c568e-b172-473a-bbc6-f9f9b15e78df); Time taken: 0.154 seconds
INFO : Executing command(queryId=hive_20201125120410_676c568e-b172-473a-bbc6-f9f9b15e78df): load data inpath '/origin_data/gmall/db/payment_info/2020-11-24' OVERWRITE into table gmall.ods_payment_info partition(dt='2020-11-24')
INFO : Starting task [Stage-0:MOVE] in serial mode
INFO : Loading data to table gmall.ods_payment_info partition (dt=2020-11-24) from hdfs://cdh1.macro.com:8020/origin_data/gmall/db/payment_info/2020-11-24
INFO : Starting task [Stage-1:STATS] in serial mode
INFO : Completed executing command(queryId=hive_20201125120410_676c568e-b172-473a-bbc6-f9f9b15e78df); Time taken: 0.441 seconds
INFO : OK
No rows affected (0.616 seconds)
INFO : Compiling command(queryId=hive_20201125120411_d082247f-1fb6-45c4-b6a6-622f60a312bb): load data inpath '/origin_data/gmall/db/base_category1/2020-11-24' OVERWRITE into table gmall.ods_base_category1 partition(dt='2020-11-24')
INFO : Semantic Analysis Completed (retrial = false)
INFO : Created Hive schema: Schema(fieldSchemas:null, properties:null)
INFO : Completed compiling command(queryId=hive_20201125120411_d082247f-1fb6-45c4-b6a6-622f60a312bb); Time taken: 0.149 seconds
INFO : Executing command(queryId=hive_20201125120411_d082247f-1fb6-45c4-b6a6-622f60a312bb): load data inpath '/origin_data/gmall/db/base_category1/2020-11-24' OVERWRITE into table gmall.ods_base_category1 partition(dt='2020-11-24')
INFO : Starting task [Stage-0:MOVE] in serial mode
INFO : Loading data to table gmall.ods_base_category1 partition (dt=2020-11-24) from hdfs://cdh1.macro.com:8020/origin_data/gmall/db/base_category1/2020-11-24
INFO : Starting task [Stage-1:STATS] in serial mode
INFO : Completed executing command(queryId=hive_20201125120411_d082247f-1fb6-45c4-b6a6-622f60a312bb); Time taken: 0.413 seconds
INFO : OK
No rows affected (0.583 seconds)
INFO : Compiling command(queryId=hive_20201125120411_2d79b915-9fb8-4e71-9a26-97d4e369ed91): load data inpath '/origin_data/gmall/db/base_category2/2020-11-24' OVERWRITE into table gmall.ods_base_category2 partition(dt='2020-11-24')
INFO : Semantic Analysis Completed (retrial = false)
INFO : Created Hive schema: Schema(fieldSchemas:null, properties:null)
INFO : Completed compiling command(queryId=hive_20201125120411_2d79b915-9fb8-4e71-9a26-97d4e369ed91); Time taken: 0.153 seconds
INFO : Executing command(queryId=hive_20201125120411_2d79b915-9fb8-4e71-9a26-97d4e369ed91): load data inpath '/origin_data/gmall/db/base_category2/2020-11-24' OVERWRITE into table gmall.ods_base_category2 partition(dt='2020-11-24')
INFO : Starting task [Stage-0:MOVE] in serial mode
INFO : Loading data to table gmall.ods_base_category2 partition (dt=2020-11-24) from hdfs://cdh1.macro.com:8020/origin_data/gmall/db/base_category2/2020-11-24
INFO : Starting task [Stage-1:STATS] in serial mode
INFO : Completed executing command(queryId=hive_20201125120411_2d79b915-9fb8-4e71-9a26-97d4e369ed91); Time taken: 0.411 seconds
INFO : OK
No rows affected (0.585 seconds)
INFO : Compiling command(queryId=hive_20201125120412_02cc534c-8a19-4f4c-bbb9-d422efda325d): load data inpath '/origin_data/gmall/db/base_category3/2020-11-24' OVERWRITE into table gmall.ods_base_category3 partition(dt='2020-11-24')
INFO : Semantic Analysis Completed (retrial = false)
INFO : Created Hive schema: Schema(fieldSchemas:null, properties:null)
INFO : Completed compiling command(queryId=hive_20201125120412_02cc534c-8a19-4f4c-bbb9-d422efda325d); Time taken: 0.154 seconds
INFO : Executing command(queryId=hive_20201125120412_02cc534c-8a19-4f4c-bbb9-d422efda325d): load data inpath '/origin_data/gmall/db/base_category3/2020-11-24' OVERWRITE into table gmall.ods_base_category3 partition(dt='2020-11-24')
INFO : Starting task [Stage-0:MOVE] in serial mode
INFO : Loading data to table gmall.ods_base_category3 partition (dt=2020-11-24) from hdfs://cdh1.macro.com:8020/origin_data/gmall/db/base_category3/2020-11-24
INFO : Starting task [Stage-1:STATS] in serial mode
INFO : Completed executing command(queryId=hive_20201125120412_02cc534c-8a19-4f4c-bbb9-d422efda325d); Time taken: 0.425 seconds
INFO : OK
No rows affected (0.601 seconds)
Beeline version 3.1.3000.7.1.3.0-100 by Apache Hive
Closing: 0: jdbc:hive2://cdh1.macro.com:2181,cdh2.macro.com:2181,cdh3.macro.com:2181/default;password=root;serviceDiscoveryMode=zooKeeper;user=root;zooKeeperNamespace=hiveserver2
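After the load finishes, a quick spot check is worthwhile. The following is not part of the original run, only a hedged sketch assuming the same Hive client the script uses:

# Hypothetical verification: confirm the new partition exists and holds rows.
hive -e "
use gmall;
show partitions ods_order_info;
select count(*) from ods_order_info where dt='2020-11-24';
"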
DWD Layer
In the DWD layer we filter null records out of the ODS data and apply dimension degeneration (denormalization) to the product category tables, collapsing the three category levels into the sku dimension. A sketch of the corresponding load script follows the table definitions below.
Create the order table
drop table if exists dwd_order_info;
create external table dwd_order_info (
    `id` string COMMENT 'order id',
    `total_amount` decimal(10,2) COMMENT 'order amount',
    `order_status` string COMMENT 'order status: 1 2 3 4 5',
    `user_id` string COMMENT 'user id',
    `payment_way` string COMMENT 'payment method',
    `out_trade_no` string COMMENT 'payment transaction number',
    `create_time` string COMMENT 'creation time',
    `operate_time` string COMMENT 'operation time'
) PARTITIONED BY (`dt` string)
stored as parquet;
Order detail table
drop table if exists dwd_order_detail;
create external table dwd_order_detail (
    `id` string COMMENT 'order detail id',
    `order_id` string COMMENT 'order id',
    `user_id` string COMMENT 'user id',
    `sku_id` string COMMENT 'sku id',
    `sku_name` string COMMENT 'sku name',
    `order_price` string COMMENT 'sku price',
    `sku_num` string COMMENT 'sku quantity',
    `create_time` string COMMENT 'creation time'
) PARTITIONED BY (`dt` string)
stored as parquet;
User table
drop table if exists dwd_user_info;
create external table dwd_user_info (
    `id` string COMMENT 'user id',
    `name` string COMMENT 'name',
    `birthday` string COMMENT 'birthday',
    `gender` string COMMENT 'gender',
    `email` string COMMENT 'email',
    `user_level` string COMMENT 'user level',
    `create_time` string COMMENT 'creation time'
) PARTITIONED BY (`dt` string)
stored as parquet;
Payment info table
drop table if exists dwd_payment_info;
create external table dwd_payment_info (
    `id` bigint COMMENT 'id',
    `out_trade_no` string COMMENT 'external trade number',
    `order_id` string COMMENT 'order id',
    `user_id` string COMMENT 'user id',
    `alipay_trade_no` string COMMENT 'Alipay trade number',
    `total_amount` decimal(16,2) COMMENT 'payment amount',
    `subject` string COMMENT 'transaction subject',
    `payment_type` string COMMENT 'payment type',
    `payment_time` string COMMENT 'payment time'
) PARTITIONED BY (`dt` string)
stored as parquet;
Create the sku table (with category columns added)
drop table if exists dwd_sku_info;
create external table dwd_sku_info (
    `id` string COMMENT 'sku id',
    `spu_id` string COMMENT 'spu id',
    `price` decimal(10,2) COMMENT 'price',
    `sku_name` string COMMENT 'sku name',
    `sku_desc` string COMMENT 'sku description',
    `weight` string COMMENT 'weight',
    `tm_id` string COMMENT 'brand id',
    `category3_id` string COMMENT 'level-3 category id',
    `category2_id` string COMMENT 'level-2 category id',
    `category1_id` string COMMENT 'level-1 category id',
    `category3_name` string COMMENT 'level-3 category name',
    `category2_name` string COMMENT 'level-2 category name',
    `category1_name` string COMMENT 'level-1 category name',
    `create_time` string COMMENT 'creation time'
) PARTITIONED BY (`dt` string)
stored as parquet;
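The DDL above only creates the DWD tables. Below is a hedged sketch of a dwd_db.sh load script in the same style as ods_db.sh, illustrating the null filtering and the dimension degeneration of the three category tables into dwd_sku_info. The script name, the use of select * (which relies on the DWD column order matching ODS), and the dynamic-partition setting are assumptions, not taken from the original.

#!/bin/bash

APP=gmall

# If a date is supplied, use it; otherwise default to yesterday.
if [ -n "$1" ] ;then
    do_date=$1
else
    do_date=`date -d "-1 day" +%F`
fi

sql="
set hive.exec.dynamic.partition.mode=nonstrict;

-- Null-filter on the primary key; select * assumes DWD columns mirror ODS.
insert overwrite table "$APP".dwd_order_info partition(dt)
select * from "$APP".ods_order_info
where dt='$do_date' and id is not null;

insert overwrite table "$APP".dwd_order_detail partition(dt)
select * from "$APP".ods_order_detail
where dt='$do_date' and id is not null;

insert overwrite table "$APP".dwd_user_info partition(dt)
select * from "$APP".ods_user_info
where dt='$do_date' and id is not null;

insert overwrite table "$APP".dwd_payment_info partition(dt)
select * from "$APP".ods_payment_info
where dt='$do_date' and id is not null;

-- Dimension degeneration: collapse the three category tables into the sku dimension.
insert overwrite table "$APP".dwd_sku_info partition(dt)
select
    sku.id,
    sku.spu_id,
    sku.price,
    sku.sku_name,
    sku.sku_desc,
    sku.weight,
    sku.tm_id,
    sku.category3_id,
    c2.id category2_id,
    c1.id category1_id,
    c3.name category3_name,
    c2.name category2_name,
    c1.name category1_name,
    sku.create_time,
    sku.dt
from "$APP".ods_sku_info sku
join "$APP".ods_base_category3 c3 on sku.category3_id = c3.id
join "$APP".ods_base_category2 c2 on c3.category2_id = c2.id
join "$APP".ods_base_category1 c1 on c2.category1_id = c1.id
where sku.dt='$do_date' and c1.dt='$do_date' and c2.dt='$do_date' and c3.dt='$do_date'
  and sku.id is not null;
"
hive -e "$sql"

It would be run the same way as ods_db.sh, for example dwd_db.sh 2020-11-24.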