Installing hadoop-2.7.3 + hive-2.1.1 + sqoop-1.99.7 on macOS

Summary: installing and using hadoop, hive, and sqoop

Installing hadoop

Install the JDK

vim ~/.bash_profile

export JAVA_HOME="YOUR_JAVA_HOME"
export PATH=$PATH:$JAVA_HOME/bin
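
If you are unsure what to put in place of YOUR_JAVA_HOME, macOS ships a helper, /usr/libexec/java_home, that prints the home directory of an installed JDK. The sketch below wraps it in a function; the name detect_java_home is hypothetical, and the version 1.8 is only assumed because that is the JDK shown in this article.

```shell
# detect_java_home (hypothetical helper name) prints the JDK 1.8 home directory
# reported by the macOS java_home helper. It returns non-zero when the helper
# is missing (for example on Linux) or when no matching JDK is installed.
detect_java_home() {
    helper=/usr/libexec/java_home
    [ -x "$helper" ] || return 1
    "$helper" -v 1.8
}
```

In ~/.bash_profile this could be used as export JAVA_HOME="$(detect_java_home)" instead of a hard-coded path.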

Once that is configured, run

java -version
--------------
java version "1.8.0_121"
Java(TM) SE Runtime Environment (build 1.8.0_121-b13)
Java HotSpot(TM) 64-Bit Server VM (build 25.121-b13, mixed mode)

Passwordless ssh login

ssh-keygen -t rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

ssh localhost # verify that no password is requested
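
Running the cat >> line above more than once appends duplicate keys, and sshd may ignore authorized_keys when its permissions are too open. A minimal sketch of an idempotent variant follows; authorize_key is a hypothetical helper name, not part of ssh. On macOS, ssh localhost also requires Remote Login to be enabled under Sharing in System Preferences.

```shell
# authorize_key PUBKEY_FILE SSH_DIR   (hypothetical helper)
# Appends the public key to SSH_DIR/authorized_keys only if that exact line is
# not already present, then restricts permissions on the directory and file.
authorize_key() {
    pubkey_file=$1
    ssh_dir=$2
    auth_file="$ssh_dir/authorized_keys"

    mkdir -p "$ssh_dir"
    touch "$auth_file"
    # -F: fixed string, -x: match the whole line, -q: no output
    if ! grep -qxF -- "$(cat "$pubkey_file")" "$auth_file"; then
        cat "$pubkey_file" >> "$auth_file"
    fi
    chmod 700 "$ssh_dir"
    chmod 600 "$auth_file"
}
```

It would be called as authorize_key ~/.ssh/id_rsa.pub ~/.ssh after ssh-keygen has created the key pair.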

Configure hadoop

Download hadoop and extract it to a directory of your choice (/opt here).
Set the environment variables

vim ~/.bash_profile

export HADOOP_HOME=/opt/hadoop-2.7.3
export HADOOP_PREFIX=$HADOOP_HOME
export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH

Edit $HADOOP_HOME/etc/hadoop/hadoop-env.sh

export JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.8.0_121.jdk/Contents/Home
export HADOOP_CONF_DIR=/opt/hadoop-2.7.3/etc/hadoop
export HADOOP_OPTS="$HADOOP_OPTS -Djava.security.krb5.realm=OX.AC.UK -Djava.security.krb5.kdc=kdc0.ox.ac.uk:kdc1.ox.ac.uk"

Edit $HADOOP_HOME/etc/hadoop/core-site.xml

  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
    <description>The name of the default file system.</description>
  </property>

  <property>
    <name>hadoop.tmp.dir</name>
    <value>/Users/micmiu/tmp/hadoop</value>
    <description>A base for other temporary directories.</description>
  </property>

  <property>
    <name>io.native.lib.available</name>
    <value>false</value>
    <description>default value is true:Should native hadoop libraries, if present, be used.</description>
  </property>

Edit hdfs-site.xml

<property>
        <name>dfs.replication</name>
        <value>1</value>
        <!-- Use 1 for a single node; for a cluster, set it according to the actual number of nodes -->
</property>
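
To confirm that Hadoop actually picks up the values edited above, the hdfs getconf -confKey command prints the effective value of a key. The sketch below loops over a few keys; print_conf is a hypothetical helper name, and it assumes $HADOOP_HOME/bin is already on PATH.

```shell
# print_conf KEY...   (hypothetical helper)
# Prints "key = value" for each key, using the value Hadoop resolves from
# core-site.xml and hdfs-site.xml.
print_conf() {
    for key in "$@"; do
        printf '%s = %s\n' "$key" "$(hdfs getconf -confKey "$key")"
    done
}
```

For example, print_conf fs.defaultFS hadoop.tmp.dir dfs.replication should echo back the three values set in this section.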

Edit yarn-site.xml

<property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
</property>

<property>
    <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>

Edit mapred-site.xml

cp mapred-site.xml.template mapred-site.xml

<property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
        <final>true</final>
</property>

Format the namenode

hdfs namenode -format

Start hdfs and yarn

start-dfs.sh
start-yarn.sh

Check that the daemons are running

jps

6917 DataNode
6838 NameNode
2810 Launcher
7130 ResourceManager
7019 SecondaryNameNode
7772 Jps
7215 NodeManager
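
Rather than reading the jps listing by eye, the check can be scripted. The sketch below reads jps output on stdin and reports any of the five daemons from the listing above that is missing; check_daemons is a hypothetical helper name.

```shell
# check_daemons   (hypothetical helper)
# Reads `jps` output on stdin, prints "missing: NAME" for every expected
# Hadoop daemon that is absent, and returns 0 only when all are present.
check_daemons() {
    jps_output=$(cat)
    missing=0
    for daemon in NameNode DataNode SecondaryNameNode ResourceManager NodeManager; do
        # Compare against the whole second field so that "NameNode" does not
        # accidentally match "SecondaryNameNode".
        if ! printf '%s\n' "$jps_output" |
            awk -v d="$daemon" '$2 == d { found = 1 } END { exit !found }'; then
            echo "missing: $daemon"
            missing=1
        fi
    done
    return "$missing"
}
```

Typical use is jps | check_daemons right after start-dfs.sh and start-yarn.sh.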

wordcount example

hdfs dfs -mkdir -p /user/jjzhu/wordcount/in
hdfs dfs -put xxxxx.txt /user/jjzhu/wordcount/in
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar wordcount /user/jjzhu/wordcount/in /user/jjzhu/wordcount/out

Job output

17/04/07 13:04:10 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
17/04/07 13:04:10 INFO input.FileInputFormat: Total input paths to process : 1
17/04/07 13:04:10 INFO mapreduce.JobSubmitter: number of splits:1
17/04/07 13:04:11 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1491532908338_0004
17/04/07 13:04:11 INFO impl.YarnClientImpl: Submitted application application_1491532908338_0004
17/04/07 13:04:11 INFO mapreduce.Job: The url to track the job: http://jjzhu:8088/proxy/application_1491532908338_0004/
17/04/07 13:04:11 INFO mapreduce.Job: Running job: job_1491532908338_0004
17/04/07 13:04:18 INFO mapreduce.Job: Job job_1491532908338_0004 running in uber mode : false
17/04/07 13:04:18 INFO mapreduce.Job:  map 0% reduce 0%
17/04/07 13:04:23 INFO mapreduce.Job:  map 100% reduce 0%
17/04/07 13:04:29 INFO mapreduce.Job:  map 100% reduce 100%
17/04/07 13:04:29 INFO mapreduce.Job: Job job_1491532908338_0004 completed successfully
17/04/07 13:04:29 INFO mapreduce.Job: Counters: 49
    File System Counters
        FILE: Number of bytes read=1141
        FILE: Number of bytes written=239913
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=869
        HDFS: Number of bytes written=779
        HDFS: Number of read operations=6
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=2
    Job Counters 
        Launched map tasks=1
        Launched reduce tasks=1
        Data-local map tasks=1
        Total time spent by all maps in occupied slots (ms)=2859
        Total time spent by all reduces in occupied slots (ms)=2527
        Total time spent by all map tasks (ms)=2859
        Total time spent by all reduce tasks (ms)=2527
        Total vcore-milliseconds taken by all map tasks=2859
        Total vcore-milliseconds taken by all reduce tasks=2527
        Total megabyte-milliseconds taken by all map tasks=2927616
        Total megabyte-milliseconds taken by all reduce tasks=2587648
    Map-Reduce Framework
        Map input records=1
        Map output records=118
        Map output bytes=1219
        Map output materialized bytes=1141
        Input split bytes=122
        Combine input records=118
        Combine output records=89
        Reduce input groups=89
        Reduce shuffle bytes=1141
        Reduce input records=89
        Reduce output records=89
        Spilled Records=178
        Shuffled Maps =1
        Failed Shuffles=0
        Merged Map outputs=1
        GC time elapsed (ms)=103
        CPU time spent (ms)=0
        Physical memory (bytes) snapshot=0
        Virtual memory (bytes) snapshot=0
        Total committed heap usage (bytes)=329252864
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Input Format Counters 
        Bytes Read=747
    File Output Format Counters 
        Bytes Written=779

View the results

hdfs dfs -ls /user/jjzhu/wordcount/out

-rw-r--r--   1 didi supergroup          0 2017-04-07 13:04 /user/jjzhu/wordcount/out/_SUCCESS
-rw-r--r--   1 didi supergroup        779 2017-04-07 13:04 /user/jjzhu/wordcount/out/part-r-00000

hdfs dfs -cat /user/jjzhu/wordcount/out/part-r-00000

A    1
Other    1
Others    1
Some    2
There    1
a    1
access    2
access);    1
according    1
adding    1
allowing    1
......
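
The wordcount output is one word per line followed by a tab and its count, sorted by word. To see the most frequent words instead, the output can be piped through sort. A small sketch, where top_words is a hypothetical helper name:

```shell
# top_words [N]   (hypothetical helper)
# Reads tab-separated "word<TAB>count" lines on stdin and prints the N lines
# with the highest counts (default 10), highest first.
top_words() {
    sort -t "$(printf '\t')" -k2,2nr | head -n "${1:-10}"
}
```

For example: hdfs dfs -cat /user/jjzhu/wordcount/out/part-r-00000 | top_words 5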

Stop hadoop

stop-dfs.sh
stop-yarn.sh

Installing hive

Download and extract hive, then set the environment variables
export HIVE_HOME=/opt/hive-2.1.1
export PATH=$HIVE_HOME/bin:$PATH

Configure hive

cd /opt/hive-2.1.1/conf
cp hive-env.sh.template hive-env.sh
cp hive-default.xml.template hive-site.xml

vim hive-env.sh
HADOOP_HOME=/opt/hadoop-2.7.3
export HIVE_CONF_DIR=/opt/hive-2.1.1/conf
export HIVE_AUX_JARS_PATH=/opt/hive-2.1.1/lib

Download mysql-connector-xx.xx.xx.jar into the lib directory

vim hive-site.xml
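Replace every occurrence of ${system:java.io.tmpdir} and ${system:user.name} with concrete values (a local temporary directory and your user name),
then configure the MySQL connection details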

<property>
 <name>javax.jdo.option.ConnectionURL</name>
 <value>jdbc:mysql://localhost:3306/hive?createDatabaseIfNotExist=true&amp;characterEncoding=UTF-8&amp;useSSL=false</value>
</property>
<property>
 <name>javax.jdo.option.ConnectionDriverName</name>
 <value>com.mysql.jdbc.Driver</value>
</property>
<property>
 <name>javax.jdo.option.ConnectionUserName</name>
 <value>root</value>
</property>
<property>
 <name>javax.jdo.option.ConnectionPassword</name>
 <value>123456</value>
</property>
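
The placeholder replacement in hive-site.xml described above touches many lines, so it is convenient to script. The sketch below is one way to do it with sed; replace_system_vars is a hypothetical helper name, and the directory and user name passed to it are whatever you choose. It writes to a temporary file and renames it, because sed -i takes different arguments on macOS and on GNU systems.

```shell
# replace_system_vars FILE TMPDIR USERNAME   (hypothetical helper)
# Substitutes ${system:java.io.tmpdir} and ${system:user.name} in FILE.
# '|' is used as the sed delimiter so slashes in TMPDIR need no escaping;
# TMPDIR and USERNAME must not themselves contain '|' or '&'.
replace_system_vars() {
    file=$1
    tmpdir=$2
    username=$3
    sed -e 's|\${system:java\.io\.tmpdir}|'"$tmpdir"'|g' \
        -e 's|\${system:user\.name}|'"$username"'|g' \
        "$file" > "$file.new" && mv "$file.new" "$file"
}
```

For example, replace_system_vars hive-site.xml /tmp/hive didi, where /tmp/hive and didi are placeholder values.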

Create the HDFS directories for hive

hdfs dfs -mkdir -p  /usr/hive/warehouse

hdfs dfs -mkdir -p /usr/hive/tmp

hdfs dfs -mkdir -p /usr/hive/log

hdfs dfs -chmod -R  777 /usr/hive

Initialize the metastore database

./bin/schematool -initSchema -dbType mysql

mysql> show databases;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| hive               |
| mysql              |
| performance_schema |
| sys                |
+--------------------+
mysql> use hive;
Database changed
mysql> show tables;
+---------------------------+
| Tables_in_hive            |
+---------------------------+
| AUX_TABLE                 |
| BUCKETING_COLS            |
| SORT_COLS                 |
| TABLE_PARAMS              |
| TAB_COL_STATS             |
| TBLS                      |
| TBL_COL_PRIVS             |
| TBL_PRIVS                 |
| TXNS                      |
| TXN_COMPONENTS            |
| TYPES                     |
| TYPE_FIELDS               |
| VERSION                   |
| WRITE_SET                 |
+---------------------------+

Start hive

jjzhu:opt didi$ hive
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/hive-2.1.1/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/hadoop-2.7.3/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]

Logging initialized using configuration in jar:file:/opt/hive-2.1.1/lib/hive-common-2.1.1.jar!/hive-log4j2.properties Async: true
Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
hive> 

Installing sqoop

Download and extract sqoop, then set the environment variables

export SQOOP_HOME=/opt/sqoop-1.99.7
export SQOOP_SERVER_EXTRA_LIB=$SQOOP_HOME/extra
export PATH=$SQOOP_HOME/bin:$PATH

Edit the sqoop configuration

The conf directory contains two main configuration files, sqoop.properties and sqoop_bootstrap.properties.
Most of the changes go in sqoop.properties

org.apache.sqoop.submission.engine.mapreduce.configuration.directory=/opt/hadoop-2.7.3/etc/hadoop

org.apache.sqoop.security.authentication.type=SIMPLE
org.apache.sqoop.security.authentication.handler=org.apache.sqoop.security.authentication.SimpleAuthenticationHandler
org.apache.sqoop.security.authentication.anonymous=true

Verify the configuration

jjzhu:bin didi$ ./sqoop2-tool verify
Setting conf dir: /opt/sqoop-1.99.7/bin/../conf
Sqoop home directory: /opt/sqoop-1.99.7
Sqoop tool executor:
    Version: 1.99.7
    Revision: 435d5e61b922a32d7bce567fe5fb1a9c0d9b1bbb
    Compiled on Tue Jul 19 16:08:27 PDT 2016 by abefine
Running tool: class org.apache.sqoop.tools.tool.VerifyTool
0    [main] INFO  org.apache.sqoop.core.SqoopServer  - Initializing Sqoop server.
12   [main] INFO  org.apache.sqoop.core.PropertiesConfigurationProvider  - Starting config file poller thread
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/hadoop-2.7.3/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/hive-2.1.1/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
Verification was successful.
Tool class org.apache.sqoop.tools.tool.VerifyTool has finished correctly.
jjzhu:bin didi$ 

Start the server

./bin/sqoop2-server start

jps

9505 SqoopJettyServer
....