DataX Installation and Basic Usage


1. DataX Overview

1.1 Overview

DataX is an offline data synchronization tool open-sourced by Alibaba that supports efficient data transfer between a wide range of heterogeneous data sources, including relational databases, NoSQL stores, and HDFS.

(figure: DataX overview)

1.2 DataX Plugin Ecosystem

DataX adopts a Framework + plugin architecture: each data source is read through a Reader plugin and written through a Writer plugin, so supporting a new source only requires adding a plugin.

(figure: DataX plugin ecosystem)

1.3 DataX Core Architecture

A DataX job is split into tasks, which are grouped into task groups and executed concurrently over channels.

(figure: DataX core architecture)

2. Installation

2.1 Download and Extract

Source code: https://github.com/alibaba/DataX
Here I downloaded the latest version, DataX 3.0, from:
http://datax-opensource.oss-cn-hangzhou.aliyuncs.com/datax.tar.gz

# After downloading, extract the archive
[xiaokang@hadoop ~]$ tar -zxvf datax.tar.gz -C /opt/software/

2.2 Run the Self-Check Script

[xiaokang@hadoop ~]$ cd /opt/software/datax/
[xiaokang@hadoop datax]$ bin/datax.py job/job.json

If output like the following appears, DataX was installed successfully:

(figure: summary output of the self-check job)

3. Basic Usage

3.1 Read Data from a Stream and Print It to the Console

1. View the official JSON configuration template

[xiaokang@hadoop ~]$ python /opt/software/datax/bin/datax.py -r streamreader -w streamwriter

DataX (DATAX-OPENSOURCE-3.0), From Alibaba !
Copyright (C) 2010-2017, Alibaba Group. All Rights Reserved.

Please refer to the streamreader document:
     https://github.com/alibaba/DataX/blob/master/streamreader/doc/streamreader.md 

Please refer to the streamwriter document:
     https://github.com/alibaba/DataX/blob/master/streamwriter/doc/streamwriter.md 

Please save the following configuration as a json file and  use
     python {DATAX_HOME}/bin/datax.py {JSON_FILE_NAME}.json 
to run the job.

{
    "job": {
        "content": [
            {
                "reader": {
                    "name": "streamreader", 
                    "parameter": {
                        "column": [], 
                        "sliceRecordCount": ""
                    }
                }, 
                "writer": {
                    "name": "streamwriter", 
                    "parameter": {
                        "encoding": "", 
                        "print": true
                    }
                }
            }
        ], 
        "setting": {
            "speed": {
                "channel": ""
            }
        }
    }
}

2. Write the JSON file (stream2stream.json) based on the template

{
    "job": {
        "content": [
            {
                "reader": {
                    "name": "streamreader", 
                    "parameter": {
                        "column": [
                            {
                                "type":"string",
                                "value":"xiaokang-微信公众号:小康新鲜事儿"
                            },
                            {
                                "type":"string",
                                "value":"你好,世界-DataX"
                            }
                        ], 
                        "sliceRecordCount": "10"
                    }
                }, 
                "writer": {
                    "name": "streamwriter", 
                    "parameter": {
                        "encoding": "utf-8", 
                        "print": true
                    }
                }
            }
        ], 
        "setting": {
            "speed": {
                "channel": "2"
            }
        }
    }
}

3. Run the job. With channel set to 2, each channel prints the 10 configured records, so 20 rows should appear in total.

[xiaokang@hadoop json]$ /opt/software/datax/bin/datax.py ./stream2stream.json

(figure: console output of the stream2stream job)

3.2 Import Data from MySQL into HDFS

Example: export the help_keyword table from MySQL's built-in mysql database to the /datax directory on HDFS (this directory must be created in advance, as shown below).
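
The target directory can be created with:

[xiaokang@hadoop ~]$ hdfs dfs -mkdir -p /datax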

1. View the official JSON configuration template

[xiaokang@hadoop json]$ python /opt/software/datax/bin/datax.py -r mysqlreader -w hdfswriter

DataX (DATAX-OPENSOURCE-3.0), From Alibaba !
Copyright (C) 2010-2017, Alibaba Group. All Rights Reserved.

Please refer to the mysqlreader document:
     https://github.com/alibaba/DataX/blob/master/mysqlreader/doc/mysqlreader.md 

Please refer to the hdfswriter document:
     https://github.com/alibaba/DataX/blob/master/hdfswriter/doc/hdfswriter.md 

Please save the following configuration as a json file and  use
     python {DATAX_HOME}/bin/datax.py {JSON_FILE_NAME}.json 
to run the job.

{
    "job": {
        "content": [
            {
                "reader": {
                    "name": "mysqlreader", 
                    "parameter": {
                        "column": [], 
                        "connection": [
                            {
                                "jdbcUrl": [], 
                                "table": []
                            }
                        ], 
                        "password": "", 
                        "username": "", 
                        "where": ""
                    }
                }, 
                "writer": {
                    "name": "hdfswriter", 
                    "parameter": {
                        "column": [], 
                        "compress": "", 
                        "defaultFS": "", 
                        "fieldDelimiter": "", 
                        "fileName": "", 
                        "fileType": "", 
                        "path": "", 
                        "writeMode": ""
                    }
                }
            }
        ], 
        "setting": {
            "speed": {
                "channel": ""
            }
        }
    }
}

2. Write the JSON file (mysql2hdfs.json) based on the template

{
    "job": {
        "content": [
            {
                "reader": {
                    "name": "mysqlreader", 
                    "parameter": {
                        "column": [
                            "help_keyword_id",
                            "name"
                        ], 
                        "connection": [
                            {
                                "jdbcUrl": [
                                    "jdbc:mysql://192.168.1.106:3306/mysql"
                                ], 
                                "table": [
                                    "help_keyword"
                                ]
                            }
                        ], 
                        "password": "xiaokang", 
                        "username": "root"
                    }
                }, 
                "writer": {
                    "name": "hdfswriter", 
                    "parameter": {
                        "column": [
                            {
                                "name":"help_keyword_id",
                                "type":"int"
                            },
                            {
                                "name":"name",
                                "type":"string"
                            }
                        ], 
                        "defaultFS": "hdfs://hadoop:9000", 
                        "fieldDelimiter": "|", 
                        "fileName": "keyword.txt", 
                        "fileType": "text", 
                        "path": "/datax", 
                        "writeMode": "append"
                    }
                }
            }
        ], 
        "setting": {
            "speed": {
                "channel": "3"
            }
        }
    }
}

3. Run the job

[xiaokang@hadoop json]$ /opt/software/datax/bin/datax.py ./mysql2hdfs.json

3.3 Export Data from HDFS to MySQL

1. Rename the file imported in 3.2 and create the target table in the database
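
hdfswriter appends a random suffix to the configured fileName, so list the target directory first to find the actual file name, then rename it:

[xiaokang@hadoop ~]$ hdfs dfs -ls /datax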

[xiaokang@hadoop ~]$ hdfs dfs -mv /datax/keyword.txt__4c0e0d04_e503_437a_a1e3_49db49cbaaed /datax/keyword.txt

The table must be created in advance; the DDL is as follows:

CREATE TABLE help_keyword_from_hdfs_datax LIKE help_keyword;

2. View the official JSON configuration template

[xiaokang@hadoop json]$ python /opt/software/datax/bin/datax.py -r hdfsreader -w mysqlwriter

DataX (DATAX-OPENSOURCE-3.0), From Alibaba !
Copyright (C) 2010-2017, Alibaba Group. All Rights Reserved.

Please refer to the hdfsreader document:
     https://github.com/alibaba/DataX/blob/master/hdfsreader/doc/hdfsreader.md 

Please refer to the mysqlwriter document:
     https://github.com/alibaba/DataX/blob/master/mysqlwriter/doc/mysqlwriter.md 

Please save the following configuration as a json file and  use
     python {DATAX_HOME}/bin/datax.py {JSON_FILE_NAME}.json 
to run the job.

{
    "job": {
        "content": [
            {
                "reader": {
                    "name": "hdfsreader", 
                    "parameter": {
                        "column": [], 
                        "defaultFS": "", 
                        "encoding": "UTF-8", 
                        "fieldDelimiter": ",", 
                        "fileType": "orc", 
                        "path": ""
                    }
                }, 
                "writer": {
                    "name": "mysqlwriter", 
                    "parameter": {
                        "column": [], 
                        "connection": [
                            {
                                "jdbcUrl": "", 
                                "table": []
                            }
                        ], 
                        "password": "", 
                        "preSql": [], 
                        "session": [], 
                        "username": "", 
                        "writeMode": ""
                    }
                }
            }
        ], 
        "setting": {
            "speed": {
                "channel": ""
            }
        }
    }
}

3. Write the JSON file (hdfs2mysql.json) based on the template

{
    "job": {
        "content": [
            {
                "reader": {
                    "name": "hdfsreader", 
                    "parameter": {
                        "column": [
                            "*"
                        ], 
                        "defaultFS": "hdfs://hadoop:9000", 
                        "encoding": "UTF-8", 
                        "fieldDelimiter": "|", 
                        "fileType": "text", 
                        "path": "/datax/keyword.txt"
                    }
                }, 
                "writer": {
                    "name": "mysqlwriter", 
                    "parameter": {
                        "column": [
                            "help_keyword_id",
                            "name"
                        ], 
                        "connection": [
                            {
                                "jdbcUrl": "jdbc:mysql://192.168.1.106:3306/mysql", 
                                "table": ["help_keyword_from_hdfs_datax"]
                            }
                        ], 
                        "password": "xiaokang",  
                        "username": "root", 
                        "writeMode": "insert"
                    }
                }
            }
        ], 
        "setting": {
            "speed": {
                "channel": "3"
            }
        }
    }
}

4. Run the job

[xiaokang@hadoop json]$ /opt/software/datax/bin/datax.py ./hdfs2mysql.json
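
To verify the export, compare the row counts of the source and target tables (a minimal check, run in the mysql database):

SELECT COUNT(*) FROM help_keyword;
SELECT COUNT(*) FROM help_keyword_from_hdfs_datax;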

3.4 Sync MySQL to MySQL

The following job reads id and name from test_test via querySql (which overrides the reader's table/column settings) and inserts them into test_test_1:

{
    "job": {
        "content": [{
            "reader": {
                "name": "mysqlreader",
                "parameter": {
                    "password": "gee123456",
                    "username": "geespace",
                    "connection": [{
                        "jdbcUrl": ["jdbc:mysql://192.168.20.75:9950/geespace_bd_platform_dev"],
                        "querySql": ["SELECT id, name FROM test_test"]
                    }]
                }
            },
            "writer": {
                "name": "mysqlwriter",
                "parameter": {
                    "column": ["id", "name"],
                    "password": "gee123456",
                    "username": "geespace",
                    "writeMode": "insert",
                    "connection": [{
                        "table": ["test_test_1"],
                        "jdbcUrl": "jdbc:mysql://192.168.20.75:9950/geespace_bd_platform_dev"
                    }]
                }
            }
        }],
        "setting": {
            "speed": {
                "channel": 1
            },
            "errorLimit": {
                "record": 0,
                "percentage": 0.02
            }
        }
    }
}
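
mysqlwriter does not create the target table, so test_test_1 must exist before the job runs. Assuming its schema mirrors the source table, it can be cloned the same way as in 3.3:

CREATE TABLE test_test_1 LIKE test_test;

Save the configuration as a JSON file (the name mysql2mysql.json below is an assumption) and run it as before:

python {DATAX_HOME}/bin/datax.py mysql2mysql.json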

3.5 Sync MySQL to HBase

The following job reads the same MySQL query and writes each row into column family f of HBase table test_test_1, concatenating the columns listed in rowkeyColumn to form the rowkey:

{
    "job": {
        "content": [{
            "reader": {
                "name": "mysqlreader",
                "parameter": {
                    "password": "gee123456",
                    "username": "geespace",
                    "connection": [{
                        "jdbcUrl": ["jdbc:mysql://192.168.20.75:9950/geespace_bd_platform_dev"],
                        "querySql": ["SELECT id, name FROM test_test"]
                    }]
                }
            },
            "writer": {
                "name": "hbase11xwriter",
                "parameter": {
                    "mode": "normal",
                    "table": "test_test_1",
                    "column": [{
                        "name": "f:id",
                        "type": "string",
                        "index": 0
                    }, {
                        "name": "f:name",
                        "type": "string",
                        "index": 1
                    }],
                    "encoding": "utf-8",
                    "hbaseConfig": {
                        "hbase.zookeeper.quorum": "192.168.20.91:2181",
                        "zookeeper.znode.parent": "/hbase"
                    },
                    "rowkeyColumn": [{
                        "name": "f:id",
                        "type": "string",
                        "index": 0
                    }, {
                        "name": "f:name",
                        "type": "string",
                        "index": 1
                    }]
                }
            }
        }],
        "setting": {
            "speed": {
                "channel": 1
            },
            "errorLimit": {
                "record": 0,
                "percentage": 0.02
            }
        }
    }
}
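
hbase11xwriter does not create the target table either; the table and its column family f must exist in advance. A minimal sketch in the HBase shell:

# In the HBase shell, create table test_test_1 with column family f
create 'test_test_1', 'f'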

3.6 Sync HBase to HBase

The following job copies columns f:id and f:name from HBase table test_test to test_test_1:

{
    "job": {
        "content": [{
            "reader": {
                "name": "hbase11xreader",
                "parameter": {
                    "mode": "normal",
                    "table": "test_test",
                    "column": [{
                        "name": "f:id",
                        "type": "string"
                    }, {
                        "name": "f:name",
                        "type": "string"
                    }],
                    "encoding": "utf-8",
                    "hbaseConfig": {
                        "hbase.zookeeper.quorum": "192.168.20.91:2181",
                        "zookeeper.znode.parent": "/hbase"
                    }
                }
            },
            "writer": {
                "name": "hbase11xwriter",
                "parameter": {
                    "mode": "normal",
                    "table": "test_test_1",
                    "column": [{
                        "name": "f:id",
                        "type": "string",
                        "index": 0
                    }, {
                        "name": "f:name",
                        "type": "string",
                        "index": 1
                    }],
                    "encoding": "utf-8",
                    "hbaseConfig": {
                        "hbase.zookeeper.quorum": "192.168.20.91:2181",
                        "zookeeper.znode.parent": "/hbase"
                    },
                    "rowkeyColumn": [{
                        "name": "f:id",
                        "type": "string",
                        "index": 0
                    }, {
                        "name": "f:name",
                        "type": "string",
                        "index": 1
                    }]
                }
            }
        }],
        "setting": {
            "speed": {
                "channel": 1
            },
            "errorLimit": {
                "record": 0,
                "percentage": 0.02
            }
        }
    }
}
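
Both the source table test_test and the target table test_test_1 must already exist with column family f. Assuming the configuration is saved as hbase2hbase.json:

python {DATAX_HOME}/bin/datax.py hbase2hbase.json

The copy can then be spot-checked in the HBase shell:

scan 'test_test_1', {LIMIT => 5}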

3.7 Sync HBase to MySQL

The following job reads f:id and f:name from HBase table test_test_1 and inserts them into MySQL table test_test:

{
    "job": {
        "content": [{
            "reader": {
                "name": "hbase11xreader",
                "parameter": {
                    "mode": "normal",
                    "table": "test_test_1",
                    "column": [{
                        "name": "f:id",
                        "type": "string"
                    }, {
                        "name": "f:name",
                        "type": "string"
                    }],
                    "encoding": "utf-8",
                    "hbaseConfig": {
                        "hbase.zookeeper.quorum": "192.168.20.91:2181",
                        "zookeeper.znode.parent": "/hbase"
                    }
                }
            },
            "writer": {
                "name": "mysqlwriter",
                "parameter": {
                    "column": ["id", "name"],
                    "password": "gee123456",
                    "username": "geespace",
                    "writeMode": "insert",
                    "connection": [{
                        "table": ["test_test"],
                        "jdbcUrl": "jdbc:mysql://192.168.20.75:9950/geespace_bd_platform_dev"
                    }]
                }
            }
        }],
        "setting": {
            "speed": {
                "channel": 1
            },
            "errorLimit": {
                "record": 0,
                "percentage": 0.02
            }
        }
    }
}
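
Assuming the configuration is saved as hbase2mysql.json:

python {DATAX_HOME}/bin/datax.py hbase2mysql.json

The result can be spot-checked on the MySQL side:

SELECT * FROM test_test LIMIT 10;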

4. Further Reading

DataX introduction and pros and cons analysis:
https://blog.csdn.net/qq_29359303/article/details/100656445

Detailed introduction to DataX and its usage:
https://blog.csdn.net/qq_39188747/article/details/102577017