Sqoop 1.99.6 Installation and Import from MySQL to HDFS


Environment: Hadoop 2.7.2 (NameNode HA, ResourceManager HA)

1. Download
[root@sht-sgmhadoopnn-01 hadoop]#  wget http://apache.mirrors.tds.net/sqoop/1.99.6/sqoop-1.99.6-bin-hadoop200.tar.gz
[root@sht-sgmhadoopnn-01 hadoop]#  tar -zxvf sqoop-1.99.6-bin-hadoop200.tar.gz
[root@sht-sgmhadoopnn-01 hadoop]# mv sqoop-1.99.6-bin-hadoop200 sqoop2


2. Configure environment variables
[root@sht-sgmhadoopnn-01 ~]# vim .bash_profile
Add the following lines:
export SQOOP_HOME=/hadoop/sqoop2
export CATALINA_HOME=$SQOOP_HOME/server
export LOGDIR=$SQOOP_HOME/logs
export PATH=$SQOOP_HOME/bin:$PATH
Save and exit; the variables take effect after sourcing the file:
[root@sht-sgmhadoopnn-01 ~]# source .bash_profile
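A quick check that the variables are in effect:
[root@sht-sgmhadoopnn-01 ~]# echo $SQOOP_HOME
/hadoop/sqoop2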

 


3. Configure the server
[root@sht-sgmhadoopnn-01 hadoop]# cd sqoop2/server/conf
[root@sht-sgmhadoopnn-01 conf]# pwd
/hadoop/sqoop2/server/conf
[root@sht-sgmhadoopnn-01 conf]#

[root@sht-sgmhadoopnn-01 conf]$ vi catalina.properties
common.loader=${catalina.base}/lib,${catalina.base}/lib/*.jar,${catalina.home}/lib,${catalina.home}/lib/*.jar,${catalina.home}/../lib/*.jar,/hadoop/hadoop/share/hadoop/common/*.jar,/hadoop/hadoop/share/hadoop/common/lib/*.jar,/hadoop/hadoop/share/hadoop/hdfs/*.jar,/hadoop/hadoop/share/hadoop/hdfs/lib/*.jar,/hadoop/hadoop/share/hadoop/mapreduce/*.jar,/hadoop/hadoop/share/hadoop/mapreduce/lib/*.jar,/hadoop/hadoop/share/hadoop/tools/lib/*.jar,/hadoop/hadoop/share/hadoop/yarn/*.jar,/hadoop/hadoop/share/hadoop/yarn/lib/*.jar,/hadoop/hadoop/share/hadoop/httpfs/tomcat/lib/*.jar
### Append this node's Hadoop jar directories so the Sqoop server can load the cluster classes

[root@sht-sgmhadoopnn-01 conf]$ vi sqoop.properties
org.apache.sqoop.submission.engine.mapreduce.configuration.directory=/hadoop/hadoop/etc/hadoop
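
Before moving on, it is worth sanity-checking both edits. The grep simply confirms the property landed; Sqoop 1.99.6 also ships a verify tool that validates the server configuration before the first start (treat the tool invocation below as a suggested check):
[root@sht-sgmhadoopnn-01 conf]# grep 'configuration.directory' sqoop.properties
org.apache.sqoop.submission.engine.mapreduce.configuration.directory=/hadoop/hadoop/etc/hadoop
[root@sht-sgmhadoopnn-01 conf]# cd /hadoop/sqoop2
[root@sht-sgmhadoopnn-01 sqoop2]# bin/sqoop2-tool verify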

 

4. Add the MySQL JDBC driver
[root@sht-sgmhadoopnn-01 conf]$ cd /hadoop/sqoop2/server/lib
[root@sht-sgmhadoopnn-01 lib]# rz
rz waiting to receive.
Starting zmodem transfer.  Press Ctrl+C to cancel.
Transferring  mysql-connector-java-5.1.36-bin.jar...
  100%     952 KB     952 KB/sec    00:00:01       0 Errors 
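If rz (lrzsz) is not available, the equivalent driver jar can be fetched from Maven Central instead (the artifact there lacks the -bin suffix but is the same driver; the URL is a suggestion, verify the download yourself):
[root@sht-sgmhadoopnn-01 lib]# wget https://repo1.maven.org/maven2/mysql/mysql-connector-java/5.1.36/mysql-connector-java-5.1.36.jar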

[root@sht-sgmhadoopnn-01 lib]# ll
total 10224
-rw-r--r-- 1 root root   15241 Apr 29  2013 annotations-api.jar
-rw-r--r-- 1 root root   54567 Apr 29  2013 catalina-ant.jar
-rw-r--r-- 1 root root  132134 Apr 29  2013 catalina-ha.jar
-rw-r--r-- 1 root root 1244212 Apr 29  2013 catalina.jar
-rw-r--r-- 1 root root  237615 Apr 29  2013 catalina-tribes.jar
-rw-r--r-- 1 root root 1801636 Apr 29  2013 ecj-4.2.2.jar
-rw-r--r-- 1 root root   33315 Apr 29  2013 el-api.jar
-rw-r--r-- 1 root root  112556 Apr 29  2013 jasper-el.jar
-rw-r--r-- 1 root root  527739 Apr 29  2013 jasper.jar
-rw-r--r-- 1 root root   76691 Apr 29  2013 jsp-api.jar
-rw-r--r-- 1 root root  989497 May  4 19:11  mysql-connector-java-5.1.36-bin.jar
-rw-r--r-- 1 root root   88575 Apr 29  2013 servlet-api.jar
-rw-r--r-- 1 root root    8821 Apr 29  2015 sqoop-tomcat-1.99.6.jar
-rw-r--r-- 1 root root  776946 Apr 29  2013 tomcat-coyote.jar
-rw-r--r-- 1 root root  253635 Apr 29  2013 tomcat-dbcp.jar
-rw-r--r-- 1 root root   70014 Apr 29  2013 tomcat-i18n-es.jar
-rw-r--r-- 1 root root   51903 Apr 29  2013 tomcat-i18n-fr.jar
-rw-r--r-- 1 root root   54511 Apr 29  2013 tomcat-i18n-ja.jar


5. Start the server
[root@sht-sgmhadoopnn-01 bin]# ./sqoop2-server start
Sqoop home directory: /hadoop/sqoop2
Setting SQOOP_HTTP_PORT:     12000
Setting SQOOP_ADMIN_PORT:     12001
Using   CATALINA_OPTS:      
Adding to CATALINA_OPTS:    -Dsqoop.http.port=12000 -Dsqoop.admin.port=12001
Using CATALINA_BASE:   /hadoop/sqoop2/server
Using CATALINA_HOME:   /hadoop/sqoop2/server
Using CATALINA_TMPDIR: /hadoop/sqoop2/server/temp
Using JRE_HOME:        /usr/java/jdk1.7.0_67-cloudera
Using CLASSPATH:       /hadoop/sqoop2/server/bin/bootstrap.jar
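
To confirm the server really came up, check for the embedded Tomcat process and probe the REST endpoint the client talks to (the /version path follows the Sqoop2 REST API; the exact response format is version-dependent, so treat this as a suggested smoke test):
[root@sht-sgmhadoopnn-01 bin]# jps | grep Bootstrap
[root@sht-sgmhadoopnn-01 bin]# curl http://sht-sgmhadoopnn-01:12000/sqoop/version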


6. Start the client
[root@sht-sgmhadoopnn-01 bin]# sqoop2-shell
Sqoop home directory: /hadoop/sqoop2
Sqoop Shell: Type 'help' or '\h' for help.

# Connect the shell to the server
sqoop:000> set server --host sht-sgmhadoopnn-01 --port 12000 --webapp sqoop  
Server is set successfully
sqoop:000>
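
The shell can also execute commands from a script file in batch mode, which is handy once this setup is automated (the script path is illustrative):
[root@sht-sgmhadoopnn-01 bin]# cat /tmp/sqoop2-init.script
set server --host sht-sgmhadoopnn-01 --port 12000 --webapp sqoop
show version --all
[root@sht-sgmhadoopnn-01 bin]# sqoop2-shell /tmp/sqoop2-init.script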

# Verify that the client is connected
sqoop:000> show version --all 
client version:
  Sqoop 1.99.6 source revision 07244c3915975f26f03d9e1edf09ab7d06619bb8
  Compiled by root on Wed Apr 29 10:40:43 CST 2015
server version:
  Sqoop 1.99.6 source revision 07244c3915975f26f03d9e1edf09ab7d06619bb8
  Compiled by root on Wed Apr 29 10:40:43 CST 2015
API versions:
  [v1]
sqoop:000>


# List the connectors registered with the Sqoop server
sqoop:000> show connector --all
+----+------------------------+---------+------------------------------------------------------+----------------------+
| Id |          Name          | Version |                        Class                         | Supported Directions |
+----+------------------------+---------+------------------------------------------------------+----------------------+
| 1  | kite-connector         | 1.99.6  | org.apache.sqoop.connector.kite.KiteConnector        | FROM/TO              |
| 2  | kafka-connector        | 1.99.6  | org.apache.sqoop.connector.kafka.KafkaConnector      | TO                   |
| 3  | hdfs-connector         | 1.99.6  | org.apache.sqoop.connector.hdfs.HdfsConnector        | FROM/TO              |
| 4  | generic-jdbc-connector | 1.99.6  | org.apache.sqoop.connector.jdbc.GenericJdbcConnector | FROM/TO              |
+----+------------------------+---------+------------------------------------------------------+----------------------+


### Create a link for the MySQL connector
sqoop:000> create link -c 4  # note: 4 is the connector id, i.e. this creates a generic-jdbc-connector link
Creating link for connector with id 4
Please fill following values to create new link object
Name: mysql-link    # note: the name must be unique

Link configuration

JDBC Driver Class: com.mysql.jdbc.Driver
JDBC Connection String: jdbc:mysql://sht-sgmhadoopnn-01:3306/test1   # note: jdbc:mysql://host(or IP):port/database
Username: root
Password: **********
JDBC Connection Properties:
There are currently 0 values in the map:
entry#   # press Enter to skip

There were warnings while create or update, but saved successfully.
Warning message: Can't connect to the database with given credentials: No suitable driver found for jdbc:jdbc://sht-sgmhadoopnn-01:/test1
New link was successfully created with validation status WARNING and persistent id 1
sqoop:000>
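
The job created later reads test1.person on this MySQL server. The article never shows its DDL; a minimal table consistent with the schema Sqoop reports at job start (fixed-point ID, text NAME and TITLE; the padded NAME in the final output suggests a CHAR column) might look like this:
mysql> CREATE TABLE test1.person (
    ->   ID    INT,
    ->   NAME  CHAR(10),
    ->   TITLE VARCHAR(20)
    -> );
mysql> INSERT INTO test1.person VALUES (1, 'rc', 'dba');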


### Create a link for the HDFS connector
sqoop:000> create link -c 3 
Creating link for connector with id 3
Please fill following values to create new link object
Name: hdfs-link

Link configuration

HDFS URI: hdfs://mycluster/    # note: mycluster is the NameNode HA nameservice (you could also target the active NameNode directly, e.g. hdfs://sht-sgmhadoopnn-01:8020/)
Hadoop conf directory: /hadoop/hadoop/etc/hadoop
New link was successfully created with validation status OK and persistent id 2
sqoop:000>


### Show the created links
sqoop:000> show link   
+----+------------+--------------+------------------------+---------+
| Id |    Name    | Connector Id |     Connector Name     | Enabled |
+----+------------+--------------+------------------------+---------+
| 1  | mysql-link | 4            | generic-jdbc-connector | true    |
| 2  | hdfs-link  | 3            | hdfs-connector         | true    |
+----+------------+--------------+------------------------+---------+
sqoop:000>
sqoop:000> show link -all
2 link(s) to show:
link with id 1 and name mysql-link (Enabled: true, Created by root at 7/29/16 10:56 PM, Updated by root at 7/29/16 10:56 PM)
Using Connector generic-jdbc-connector with id 4
  Link configuration
    JDBC Driver Class: com.mysql.jdbc.Driver
    JDBC Connection String: jdbc:mysql://sht-sgmhadoopnn-01:3306/test1
    Username: root
    Password:
    JDBC Connection Properties:
link with id 2 and name hdfs-link (Enabled: true, Created by root at 7/29/16 11:00 PM, Updated by root at 7/29/16 11:00 PM)
Using Connector hdfs-connector with id 3
  Link configuration
    HDFS URI: hdfs://mycluster/
    Hadoop conf directory: /hadoop/hadoop/etc/hadoop
sqoop:000>

 


##### Create the job object. Be careful here: use the LINK ids, not the connector ids.
##### The two link ids bind the From and To halves of the job; plainly put, where the data is read from (From) and where it is loaded to (To).
sqoop:000> create job -f 1 -t 2  
Creating job for links with from id 1 and to id 2
Please fill following values to create new job object
Name: testsqoop  # the name must be unique

From database configuration

Schema name: test1   ## database name in MySQL
Table name: person   ## table in that database
Table SQL statement:  # optional
Table column names:   # optional
Partition column name:  # optional
Null value allowed for the partition column:  # optional
Boundary query:       # optional
 
Incremental read

Check column:     # optional
Last value:       # optional

To HDFS configuration

Override null value:    # optional
Null value:             # optional
Output format:
  0 : TEXT_FILE
  1 : SEQUENCE_FILE
Choose:            # required; text format, i.e. option 0
Compression format:
  0 : NONE
  1 : DEFAULT
  2 : DEFLATE
  3 : GZIP
  4 : BZIP2
  5 : LZO
  6 : LZ4
  7 : SNAPPY
  8 : CUSTOM
Choose:         # required; no compression, i.e. option 0
Custom compression format:
Output directory: /sqoop/test  ### required; output path in HDFS
Append mode:

Throttling resources

Extractors:  # optional; corresponds to the number of map tasks in the MapReduce job
Loaders: 1      # optional; corresponds to the number of reduce tasks in the MapReduce job
New job was successfully created with validation status OK  and persistent id 1
sqoop:000>

### Show the job
sqoop:000> show job
+----+-----------+----------------+--------------+---------+
| Id |   Name    | From Connector | To Connector | Enabled |
+----+-----------+----------------+--------------+---------+
| 1  | testsqoop | 4              | 3            | true    |
+----+-----------+----------------+--------------+---------+
sqoop:000>


### Enable verbose error output
sqoop:000> set option --name verbose --value true
Verbose option was changed to true
sqoop:000>

### Start job 1 (-s runs it synchronously)
sqoop:000> start job -j 1 -s
Submission details
Job ID: 1
Server URL: http://sht-sgmhadoopnn-01:12000/sqoop/
Created by: root
Creation date: 2016-07-29 23:40:16 CST
Lastly updated by: root
External ID: job_1469795998430_0001
        http://sht-sgmhadoopnn-01:8088/proxy/application_1469795998430_0001/
Source Connector schema: Schema{name=test1.person,columns=[
        FixedPoint{name=ID,nullable=true,type=FIXED_POINT,byteSize=4,signed=true},
        Text{name=NAME,nullable=true,type=TEXT,charSize=null},
        Text{name=TITLE,nullable=true,type=TEXT,charSize=null}]}
2016-07-29 23:40:16 CST: BOOTING  - Progress is not available
2016-07-29 23:40:39 CST: RUNNING  - 0.00 %
2016-07-29 23:40:50 CST: RUNNING  - 0.00 %
2016-07-29 23:41:00 CST: RUNNING  - 0.00 %
2016-07-29 23:41:10 CST: RUNNING  - 0.00 %
2016-07-29 23:41:20 CST: RUNNING  - 100.00 %
2016-07-29 23:41:37 CST: FAILED
Exception: Job Failed with status:3
Stack trace:
sqoop:000>


##### Failure analysis: open http://sht-sgmhadoopnn-01:8088/proxy/application_1469795998430_0001/ to see why the job failed
Log Length: 3070

2016-07-29 23:40:51,075 [main] INFO  org.apache.sqoop.job.mr.SqoopMapper  - Starting progress service
2016-07-29 23:40:51,077 [main] INFO  org.apache.sqoop.job.mr.SqoopMapper  - Running extractor class org.apache.sqoop.connector.jdbc.GenericJdbcExtractor
2016-07-29 23:40:51,613 [main] ERROR org.apache.sqoop.connector.jdbc.GenericJdbcExecutor  - Caught SQLException:
java.sql.SQLException: Access denied for user 'root'@'sht-sgmhadoopdn-03.telenav.cn' (using password: YES)

### Fix: the host the mysql-link connects from is not allowed in; change one of root's rows in mysql.user to host '%'
sht-sgmhadoopnn-01:mysqladmin:/usr/local/mysql:>mysql $c
mysql> select host,user,password from mysql.user;
+-------------------------------+-----------+-------------------------------------------+
| host                          | user      | password                                  |
+-------------------------------+-----------+-------------------------------------------+
| localhost                     | root      | *6340BE3C15D246B0D74BAF3F135915ED19E0063F |
| sht-sgmhadoopnn-01.telenav.cn | root      | *6340BE3C15D246B0D74BAF3F135915ED19E0063F |
| 127.0.0.1                     | root      | *6340BE3C15D246B0D74BAF3F135915ED19E0063F |
| ::1                           | root      | *6340BE3C15D246B0D74BAF3F135915ED19E0063F |
| %                             | repl_user | *A8E5E2B374324130F4B4CC73C63F5032D1E4195E |
| %                             | repl      | *A424E797037BF97C19A2E88CF7891C5C2038C039 |
| %                             | hive      | *4DF1D66463C18D44E3B001A8FB1BBFBEA13E27FC |
+-------------------------------+-----------+-------------------------------------------+
7 rows in set (0.67 sec)

mysql> update mysql.user set host='%' where  host='127.0.0.1';
Query OK, 1 row affected (0.61 sec)
Rows matched: 1  Changed: 1  Warnings: 0

mysql> flush privileges;
Query OK, 0 rows affected (1.70 sec)

mysql>
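An alternative to editing mysql.user directly is a GRANT statement, which creates the '%' row and sets the privileges in one step (a sketch; replace <password> with the real root password, and note that opening root to '%' is only acceptable in a lab environment):
mysql> GRANT ALL PRIVILEGES ON test1.* TO 'root'@'%' IDENTIFIED BY '<password>';
mysql> FLUSH PRIVILEGES;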

### Run job 1 again
sqoop:000> start job -j 1  
Submission details
Job ID: 1
Server URL: http://sht-sgmhadoopnn-01:12000/sqoop/
Created by: root
Creation date: 2016-07-30 00:00:41 CST
Lastly updated by: root
External ID: job_1469795998430_0002
        http://sht-sgmhadoopnn-01:8088/proxy/application_1469795998430_0002/
Source Connector schema: Schema{name=test1.person,columns=[
        FixedPoint{name=ID,nullable=true,type=FIXED_POINT,byteSize=4,signed=true},
        Text{name=NAME,nullable=true,type=TEXT,charSize=null},
        Text{name=TITLE,nullable=true,type=TEXT,charSize=null}]}
2016-07-30 00:00:41 CST: BOOTING  - Progress is not available

 


### Check the result in HDFS
[root@sht-sgmhadoopnn-01 lib]# hadoop fs -text /sqoop/test/1b89afc4-c882-4fd2-abcb-9758a5d709ee.txt
1,'rc        ','dba'
[root@sht-sgmhadoopnn-01 lib]#
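
The output file name is generated per job run, so list the directory rather than guessing it:
[root@sht-sgmhadoopnn-01 lib]# hadoop fs -ls /sqoop/test
[root@sht-sgmhadoopnn-01 lib]# hadoop fs -cat /sqoop/test/*.txt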


Example:
start job --jid 1
start job --jid 1 -s

Argument         Description
-j, --jid <x>     Start job with id <x>
-s, --synchronous Synchronous job execution
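
Two companion commands from the same shell are useful while a job is running (see the command-line reference below; -j works as the jid shorthand just as in start job):
sqoop:000> status job -j 1    # poll the progress of job 1
sqoop:000> stop job -j 1      # abort job 1 if needed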

 

Sqoop2 command-line client reference:
http://sqoop.apache.org/docs/1.99.6/CommandLineClient.html
