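Study notes for the Developer Academy course [New E-commerce Big Data Platform, 2020 Edition: E-commerce Project, DataX Case Study and Parameter Walkthrough]. The notes follow the course closely so that readers can pick up the material quickly.
Course URL: https://developer.aliyun.com/learning/course/640/detail/10571
E-commerce Project: DataX Case Study and Parameter Walkthrough
Contents:
I. DataX hands-on case
II. DataX installation steps
I. DataX hands-on case
Step 1: Create the corresponding table in Hive first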
create table if not exists ods_nshop.ods_02_customer_datax (
  customer_id string COMMENT 'user ID',
  customer_login string COMMENT 'user login name',
  customer_nickname string COMMENT 'user name (nickname)',
  customer_name string COMMENT 'user real name',
  customer_pass string COMMENT 'user password',
  customer_mobile string COMMENT 'user mobile number',
  customer_idcard string COMMENT 'ID card number',
  customer_gender TINYINT COMMENT 'gender code (1 = male)',
  customer_birthday string COMMENT 'year and month of birth',
  customer_email string COMMENT 'user email',
  customer_natives string COMMENT 'region',
  customer_ctime bigint COMMENT 'creation time',
  customer_utime bigint COMMENT 'modification time'
)
row format delimited
fields terminated by ','
location '/data/nshop/ods/ods_02_customer_datax/';
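As a quick sanity check on this table layout, the following minimal Python sketch splits one comma-delimited text line into the thirteen columns declared above. The helper name `parse_customer_line` and the sample row are invented for illustration and do not come from the course.

```python
# Column order follows the Hive table definition above.
CUSTOMER_COLUMNS = [
    "customer_id", "customer_login", "customer_nickname", "customer_name",
    "customer_pass", "customer_mobile", "customer_idcard", "customer_gender",
    "customer_birthday", "customer_email", "customer_natives",
    "customer_ctime", "customer_utime",
]


def parse_customer_line(line, delimiter=","):
    """Split one delimited text line into a dict keyed by column name.

    Hypothetical helper: it mirrors "fields terminated by ','" and nothing more.
    """
    fields = line.rstrip("\n").split(delimiter)
    if len(fields) != len(CUSTOMER_COLUMNS):
        raise ValueError(
            f"expected {len(CUSTOMER_COLUMNS)} fields, got {len(fields)}"
        )
    return dict(zip(CUSTOMER_COLUMNS, fields))


# Invented sample row, for illustration only.
sample = (
    "c001,alice01,Ali,Alice,secret,13800000000,id001,1,1990-01,"
    "alice@example.com,Beijing,1577836800000,1577836800000"
)
row = parse_customer_line(sample)
print(row["customer_login"], row["customer_gender"])
```

A value that itself contains a comma would shift every later field in this sketch, which is the usual caveat with a plain single-character delimiter.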
Step 2: Write the job script
{
  "job": {
    "setting": {
      // job-level settings
      "speed": {
        "channel": 3
      },
      "errorLimit": {
        "record": 0,
        "percentage": 0.02
      }
    },
    "content": [
      // array form
      {
        "reader": {
          "name": "mysqlreader",
          "parameter": {
            "writeMode": "insert",
            "username": "root",
            "password": "12345678",
            "column": [
              // columns
              "customer_id",
              "customer_login",
              "customer_nickname",
              "customer_name",
              "customer_pass",
              "customer_mobile",
              "customer_idcard",
              "customer_gender",
              "customer_birthday",
              "customer_email",
              "customer_natives",
              "customer_ctime",
              "customer_utime"
            ],
            "connection": [
              {
                "table": [
                  "orders"
                ],
                "jdbcUrl": [
                  "jdbc:mysql://10.0.88.242:3306/nshop"
                ]
              }
            ]
          }
        },
        "writer": {
          "name": "hdfswriter",
          "parameter": {
            "defaultFS": "hdfs://hdfsCluster",
            "hadoopConfig": {
              "dfs.nameservices": "hdfsCluster",
              "dfs.ha.namenodes.hdfsCluster": "nn1,nn2",
              "dfs.namenode.rpc-address.hdfsCluster.nn1": "node242:8020",
              "dfs.namenode.rpc-address.hdfsCluster.nn2": "node244:8020",
              "dfs.client.failover.proxy.provider.hdfsCluster": "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider"
            }, // write to HDFS
            "fileType": "text", // file type
            "path": "/data/nshop/ods/ods_02_customer_datax", // path
            "fileName": "ods_02_customer_datax", // file name
            "column": [
              { "name": "customer_id", "type": "string" },
              { "name": "customer_nickname", "type": "string" },
              { "name": "customer_login", "type": "string" },
              { "name": "customer_name", "type": "string" },
              { "name": "customer_pass", "type": "string" },
              { "name": "customer_mobile", "type": "string" },
              .....
            ],
            "writeMode": "append", // write mode
            "fieldDelimiter": ","
          }
        }
      }
    ]
  }
}
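Hand-typed job files are prone to quoting and bracket mistakes, so one option is to generate the JSON from code. The sketch below assembles a mysqlreader to hdfswriter job as a Python dict and serializes it with the standard `json` module. The helper `build_job`, its parameters, and the column types for gender and the two time fields (taken from the Hive table rather than from the course script) are assumptions for illustration; the HA `hadoopConfig` block is passed in by the caller rather than spelled out.

```python
import json

# (name, hdfswriter type) pairs; types follow the Hive table, which is an assumption.
COLUMNS = [
    ("customer_id", "string"), ("customer_login", "string"),
    ("customer_nickname", "string"), ("customer_name", "string"),
    ("customer_pass", "string"), ("customer_mobile", "string"),
    ("customer_idcard", "string"), ("customer_gender", "tinyint"),
    ("customer_birthday", "string"), ("customer_email", "string"),
    ("customer_natives", "string"), ("customer_ctime", "bigint"),
    ("customer_utime", "bigint"),
]


def build_job(jdbc_url, table, username, password,
              default_fs, hdfs_path, file_name,
              hadoop_config=None, channels=3):
    """Assemble a mysqlreader -> hdfswriter job as a plain dict (hypothetical helper)."""
    reader = {
        "name": "mysqlreader",
        "parameter": {
            "username": username,
            "password": password,
            "column": [name for name, _ in COLUMNS],
            "connection": [{"table": [table], "jdbcUrl": [jdbc_url]}],
        },
    }
    writer_params = {
        "defaultFS": default_fs,
        "fileType": "text",
        "path": hdfs_path,
        "fileName": file_name,
        # Same order as the reader columns, so fields line up by position.
        "column": [{"name": n, "type": t} for n, t in COLUMNS],
        "writeMode": "append",
        "fieldDelimiter": ",",
    }
    if hadoop_config:
        writer_params["hadoopConfig"] = dict(hadoop_config)
    writer = {"name": "hdfswriter", "parameter": writer_params}
    setting = {
        "speed": {"channel": channels},
        "errorLimit": {"record": 0, "percentage": 0.02},
    }
    return {"job": {"setting": setting,
                    "content": [{"reader": reader, "writer": writer}]}}


job = build_job(
    jdbc_url="jdbc:mysql://10.0.88.242:3306/nshop",
    table="orders",
    username="root",
    password="12345678",
    default_fs="hdfs://hdfsCluster",
    hdfs_path="/data/nshop/ods/ods_02_customer_datax",
    file_name="ods_02_customer_datax",
)
job_text = json.dumps(job, indent=2)
print(job_text[:60])
```

The output of `json.dumps` is strict JSON without comments, so any annotations would live in the Python source instead of the job file.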
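II. DataX installation steps
1. Method 1: Download the DataX toolkit directly (DataX download link).
After downloading, extract it to a local directory and enter the bin directory; sync jobs can then be run from there:
$ cd {YOUR_DATAX_HOME}/bin
$ python datax.py {YOUR_JOB.json}
Self-check script: python {YOUR_DATAX_HOME}/bin/datax.py {YOUR_DATAX_HOME}/job/job.json
2. Method 2: Download the DataX source code and build it yourself (DataX source).
(1) Download the DataX source:
$ git clone git@github.com:alibaba/DataX.git
(2) Package it with Maven:
$ cd {DataX_source_code_home}
$ mvn -U clean package assembly:assembly -Dmaven.test.skip=true
When packaging succeeds, the log looks like this: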
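When jobs are launched from a scheduler or another script, the same command line can be assembled in code. The following is a minimal sketch, assuming the launcher sits at `bin/datax.py` under the DataX home as in the commands above; `datax_command` and `run_job` are hypothetical helper names, and the paths in the example are placeholders. The interpreter is left as a parameter because which Python version the launcher expects can differ between DataX releases.

```python
import os
import subprocess


def datax_command(datax_home, job_file, python_exe="python"):
    """Return the argv list equivalent to: python {DATAX_HOME}/bin/datax.py {JOB.json}."""
    launcher = os.path.join(datax_home, "bin", "datax.py")
    return [python_exe, launcher, job_file]


def run_job(datax_home, job_file, python_exe="python"):
    """Run one job and raise CalledProcessError if the launcher exits non-zero."""
    subprocess.run(datax_command(datax_home, job_file, python_exe), check=True)


# Placeholder paths; only the command is built here, nothing is executed.
cmd = datax_command("/opt/datax", "/opt/datax/job/job.json")
print(" ".join(cmd))
```

Passing the arguments as a list rather than a single shell string avoids quoting problems when a path contains spaces.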
[INFO] BUILD SUCCESS
[INFO]-----------------------------------
[INFO] Total time: 08:12 min
[INFO] Finished at: 2015-12-13T16:26:48+08:00
[INFO] Final Memory: 133M/966M
[INFO] ------------------------------------
After a successful build, the DataX package is located at
{DataX_source_code_home}/target/datax/datax/.
Its structure is as follows:
$ cd {DataX_source_code_home}
$ ls ./target/datax/datax/
bin conf job lib log log_perf plugin script tmp
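To confirm that a build or an extracted package has the layout listed above, a small check like the one below can be used. It is a sketch only: the helper `missing_dirs` is hypothetical, the expected names are copied from the listing, and the demonstration runs against a temporary directory instead of a real DataX installation.

```python
import os
import tempfile

# Directory names copied from the listing above.
EXPECTED_DIRS = ["bin", "conf", "job", "lib", "log",
                 "log_perf", "plugin", "script", "tmp"]


def missing_dirs(datax_dir, expected=EXPECTED_DIRS):
    """Return the expected subdirectories that are absent under datax_dir."""
    return [name for name in expected
            if not os.path.isdir(os.path.join(datax_dir, name))]


# Demonstration on a throwaway directory that contains only bin and conf.
with tempfile.TemporaryDirectory() as fake_home:
    os.mkdir(os.path.join(fake_home, "bin"))
    os.mkdir(os.path.join(fake_home, "conf"))
    absent = missing_dirs(fake_home)

print(absent)
```

An empty result means every listed directory is present; it says nothing about whether the contents are complete.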