Sqoop is a plugin for the Hadoop project that can import the contents of the distributed file system HDFS into a specified MySQL table, and can also load MySQL data into HDFS for further processing.
Test environment:
hadoop version: hadoop-0.20.2
sqoop version: sqoop-1.2.0-CDH3B4
java version: jdk1.7.0_67
mysql version: 5.1.65
Special note:
Because I installed hadoop-0.20.2 and sqoop does not support this version, you can either use the CDH3 build of Hadoop or copy the relevant files over to achieve the same result.
Download links:
http://archive.cloudera.com/cdh/3/hadoop-0.20.2-CDH3B4.tar.gz
http://archive.cloudera.com/cdh/3/sqoop-1.2.0-CDH3B4.tar.gz
sqoop-1.2.0-CDH3B4 depends on hadoop-core-0.20.2-CDH3B4.jar, so you need to download hadoop-0.20.2-CDH3B4.tar.gz, extract it, and copy hadoop-0.20.2-CDH3B4/hadoop-core-0.20.2-CDH3B4.jar into sqoop-1.2.0-CDH3B4/lib.
In addition, sqoop depends on mysql-connector-java-*.jar when importing MySQL data, so you also need to download mysql-connector-java-*.jar and copy it into sqoop-1.2.0-CDH3B4/lib. Both copy steps are sketched below.
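A minimal sketch of these two copy steps (assuming the archives were unpacked under /usr/local, matching the profile in the next section):

cd /usr/local
tar -zxvf hadoop-0.20.2-CDH3B4.tar.gz
tar -zxvf sqoop-1.2.0-CDH3B4.tar.gz
cp hadoop-0.20.2-CDH3B4/hadoop-core-0.20.2-CDH3B4.jar sqoop-1.2.0-CDH3B4/lib/
# copy whichever mysql-connector-java jar you downloaded
cp mysql-connector-java-*.jar sqoop-1.2.0-CDH3B4/lib/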
I. Basic sqoop configuration:
1) profile environment variables:
export JAVA_HOME=/usr/local/jdk1.7.0_67/
export JRE_HOME=/usr/local/jdk1.7.0_67/jre
export CLASSPATH=.:$JAVA_HOME/lib:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar:$CLASSPATH
export PIG_HOME=/usr/local/pig-0.9.2
# tell the pig client where the local hadoop configuration files are
export PIG_CLASSPATH=/usr/local/hadoop-0.20.2/conf
export HBASE_HOME=/usr/local/hbase-0.90.5
export HADOOP_HOME=/usr/local/hadoop-0.20.2
export SQOOP_HOME=/usr/local/sqoop-1.2.0-CDH3B4
export HIVE_HOME=/usr/local/hive-0.8.1
export PATH=$JAVA_HOME/bin:$PATH:/usr/local/bin:$HADOOP_HOME/bin:$HBASE_HOME/bin:$PIG_HOME/bin:$PIG_CLASSPATH:$HIVE_HOME/bin:$SQOOP_HOME/bin:$PATH
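After saving, reload the file so the new variables take effect in the current shell (assuming, as is typical, that they were added to /etc/profile):

source /etc/profile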
2) Since this test case does not need sqoop to check for the HBase and ZooKeeper environment variables, those checks have to be commented out:
Edit the file /usr/local/sqoop-1.2.0-CDH3B4/bin/configure-sqoop and comment out the following lines:
#if [ ! -d "${HBASE_HOME}" ]; then
#  echo "Error: $HBASE_HOME does not exist!"
#  echo 'Please set $HBASE_HOME to the root of your HBase installation.'
#  exit 1
#fi
#if [ ! -d "${ZOOKEEPER_HOME}" ]; then
#  echo "Error: $ZOOKEEPER_HOME does not exist!"
#  echo 'Please set $ZOOKEEPER_HOME to the root of your ZooKeeper installation.'
#  exit 1
#fi
3) Create the sqoop database and grant privileges:
mysql> GRANT ALL PRIVILEGES ON sqoop.* TO 'sqoop'@'172.16.41.%' identified by 'routon' with grant option;
mysql> flush privileges;
4) Create the students table and insert data:
mysql> create table students (id int not null primary key, name varchar(20), age int);
Insert the test data:
insert into students values('001','abc',29);
insert into students values('002','def',28);
insert into students values('003','aaaa',26);
insert into students values('004','efsaz',60);
insert into students values('005','kiass',63);
The table now contains:
mysql> select * from students;
+----+-------+------+
| id | name | age |
+----+-------+------+
| 2 | def | 28 |
| 3 | aaaa | 26 |
| 4 | efsaz | 60 |
| 5 | kiass | 63 |
+----+-------+------+
4 rows in set (0.00 sec)
5) On the master node node1, test whether sqoop can connect to MySQL on the remote host node29. If the tables created in the MySQL database are listed, the connection succeeded.
As the original screenshot showed, sqoop can list the students table in the remote test MySQL database; a sketch of the command follows.
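A minimal sketch of this connectivity check (the port 3306, the database name sqoop, and the sqoop/routon credentials are assumptions taken from the GRANT statement in step 3):

sqoop list-tables --connect jdbc:mysql://node29:3306/sqoop --username sqoop --password routon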
6) Import the MySQL data into HDFS:
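The command itself appeared only in the original screenshot; a sketch, reusing the assumed connection details from step 5, would be:

# import the students table into HDFS with a single map task
sqoop import --connect jdbc:mysql://node29:3306/sqoop --username sqoop --password routon --table students -m 1

# inspect the result (sqoop writes to /user/<current user>/students by default)
hadoop fs -cat students/part-m-00000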
This shows that the contents of MySQL have been imported into the distributed file system!
7) Import the HDFS file back into MySQL:
Note: before importing, empty the students table with the command: delete from students;
Run the sqoop command on the master server to import the HDFS file contents into the students table in MySQL, along the lines of the sketch below.
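The exact command is in the missing screenshot; a sketch using sqoop's export mode (the HDFS directory /user/hadoop/students is a hypothetical path for the data produced in step 6, and the connection details are the same assumptions as above):

sqoop export --connect jdbc:mysql://node29:3306/sqoop --username sqoop --password routon --table students --export-dir /user/hadoop/students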
Check the MySQL table:
mysql> select * from students;
+----+-------+------+
| id | name | age |
+----+-------+------+
| 2 | def | 28 |
| 3 | aaaa | 26 |
| 4 | efsaz | 60 |
| 5 | kiass | 63 |
+----+-------+------+
4 rows in set (0.00 sec)
Test complete!
This article is reposted from shine_forever's 51CTO blog. Original link: http://blog.51cto.com/shineforever/1566788