Installing Hadoop & Spark (Part 1) - Alibaba Cloud Developer Community

Hardware environment:

hddcluster1 10.0.0.197 redhat7

hddcluster2 10.0.0.228 centos7 (this machine serves as the master)

hddcluster3 10.0.0.202 redhat7

hddcluster4 10.0.0.181 centos7

Software environment:

Firewall disabled on every node

openssh-clients

openssh-server

java-1.8.0-openjdk

java-1.8.0-openjdk-devel

hadoop-2.7.3.tar.gz

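Procedure:

Choose one machine as the Master
On the Master node, create the hadoop user, install the SSH server, and install the Java environment
On the Master node, install Hadoop and complete its configuration
On the other Slave nodes, create the hadoop user, install the SSH server, and install the Java environment
Copy the /usr/local/hadoop directory from the Master node to the other Slave nodes
Start Hadoop on the Master node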

 
    # Mapping between node names and IP addresses
    [hadoop@hddcluster2 ~]$ cat /etc/hosts
    127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
    ::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
    10.0.0.228      hddcluster2
    10.0.0.197      hddcluster1
    10.0.0.202      hddcluster3
    10.0.0.181      hddcluster4
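Every node needs the same name-to-IP mapping, so it can help to check the file on each machine. The following is a minimal sketch, not part of the original setup: `lookup_ip` is a hypothetical helper name, and it assumes the usual hosts format (an IP address followed by one or more names).

```shell
# Hypothetical helper: print the IP mapped to a host name in a
# hosts-format file. $1 = host name, $2 = file (default /etc/hosts).
lookup_ip() {
    awk -v host="$1" '
        /^[[:space:]]*#/ { next }    # ignore comment lines
        {
            # Field 1 is the IP; every later field is a name or alias.
            for (i = 2; i <= NF; i++)
                if ($i == host) { print $1; exit }
        }
    ' "${2:-/etc/hosts}"
}
```

For example, `for n in hddcluster1 hddcluster2 hddcluster3 hddcluster4; do echo "$n -> $(lookup_ip "$n")"; done` prints an empty value for any node that is missing from the file.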

 
Create the hadoop user

    su                                # log in as root, as mentioned above
    useradd -m hadoop -s /bin/bash    # create the new user hadoop
    passwd hadoop                     # set the password for hadoop
    visudo                            # below the line "root ALL=(ALL) ALL", add "hadoop ALL=(ALL) ALL"

 
  
    
      
     
    # Log in as the hadoop user, install SSH, and configure passwordless SSH login
    [hadoop@hddcluster2 ~]$ rpm -qa | grep ssh
    [hadoop@hddcluster2 ~]$ sudo yum install openssh-clients
    [hadoop@hddcluster2 ~]$ sudo yum install openssh-server
    [hadoop@hddcluster2 ~]$ cd ~/.ssh/                                  # if this directory does not exist, run ssh localhost once first
    [hadoop@hddcluster2 ~]$ ssh-keygen -t rsa                           # press Enter at every prompt
    [hadoop@hddcluster2 ~]$ ssh-copy-id -i ~/.ssh/id_rsa.pub localhost  # authorize the key
    [hadoop@hddcluster2 ~]$ chmod 600 ./authorized_keys                 # fix the file permissions
    [hadoop@hddcluster2 ~]$ ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@hddcluster1
    [hadoop@hddcluster2 ~]$ ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@hddcluster3
    [hadoop@hddcluster2 ~]$ ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@hddcluster4
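The three ssh-copy-id calls for the slave nodes differ only in the host name, so they can be written as a loop. This is a sketch under stated assumptions: the `DRY_RUN` switch and the `run` and `copy_keys` helpers are hypothetical names introduced here, and by default the commands are only printed rather than executed.

```shell
# Slave hosts taken from the cluster list in this article.
SLAVES="hddcluster1 hddcluster3 hddcluster4"

# Hypothetical helper: print the command unless DRY_RUN is set to 0.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Push the master's public key to the hadoop account on every slave.
copy_keys() {
    for host in $SLAVES; do
        run ssh-copy-id -i "$HOME/.ssh/id_rsa.pub" "hadoop@$host"
    done
}

copy_keys
```

Running the script with `DRY_RUN=0` in the environment executes the commands for real. Afterwards, `ssh -o BatchMode=yes hadoop@hddcluster1 true` should succeed without asking for a password.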
       
 
    

   
 

 
    # Extract the Hadoop archive to /usr/local/hadoop
    [hadoop@hddcluster2 ~]$ sudo tar -zxf hadoop-2.7.3.tar.gz -C /usr/local/
    [hadoop@hddcluster2 ~]$ sudo mv /usr/local/hadoop-2.7.3 /usr/local/hadoop
    [hadoop@hddcluster2 ~]$ sudo chown -R hadoop:hadoop /usr/local/hadoop
    cd /usr/local/hadoop
    ./bin/hadoop version
       
    # Install the Java environment
    [hadoop@hddcluster2 ~]$ sudo yum install java-1.8.0-openjdk java-1.8.0-openjdk-devel
    [hadoop@hddcluster2 ~]$ rpm -ql java-1.8.0-openjdk-devel | grep '/bin/javac'
    /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.111-2.b15.el7_3.x86_64/bin/javac
    [hadoop@hddcluster2 ~]$ vim ~/.bashrc
    # add the following lines to ~/.bashrc
    export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.111-2.b15.el7_3.x86_64
    export HADOOP_HOME=/usr/local/hadoop
    export HADOOP_INSTALL=$HADOOP_HOME
    export HADOOP_MAPRED_HOME=$HADOOP_HOME
    export HADOOP_COMMON_HOME=$HADOOP_HOME
    export HADOOP_HDFS_HOME=$HADOOP_HOME
    export YARN_HOME=$HADOOP_HOME
    export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
    export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
    export HADOOP_PREFIX=$HADOOP_HOME
    export HADOOP_OPTS="-Djava.library.path=$HADOOP_PREFIX/lib:$HADOOP_PREFIX/lib/native"

    # Test the Java environment
    source ~/.bashrc
    java -version
    $JAVA_HOME/bin/java -version    # same as running java -version directly
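The JAVA_HOME value above is the javac path reported by rpm with the trailing /bin/javac removed. Because the exact directory name changes with the installed OpenJDK build, it may be safer to derive it than to type it. A minimal sketch, with `java_home_from_javac` as a hypothetical helper name:

```shell
# Hypothetical helper: strip the trailing /bin/javac from a javac path.
java_home_from_javac() {
    dirname "$(dirname "$1")"
}

# Example with the path shown in this article.
java_home_from_javac /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.111-2.b15.el7_3.x86_64/bin/javac
# prints /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.111-2.b15.el7_3.x86_64
```

In ~/.bashrc this could be combined with the rpm query used above, for example `export JAVA_HOME="$(java_home_from_javac "$(rpm -ql java-1.8.0-openjdk-devel | grep '/bin/javac')")"`, assuming the query returns exactly one line as it does here.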

 
    # Edit the Hadoop configuration files
    [hadoop@hddcluster2 hadoop]$ pwd
    /usr/local/hadoop/etc/hadoop
    [hadoop@hddcluster2 hadoop]$ cat core-site.xml
    <?xml version="1.0" encoding="UTF-8"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
    <!--
      Licensed under the Apache License, Version 2.0 (the "License");
      you may not use this file except in compliance with the License.
      You may obtain a copy of the License at

        http://www.apache.org/licenses/LICENSE-2.0

      Unless required by applicable law or agreed to in writing, software
      distributed under the License is distributed on an "AS IS" BASIS,
      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
      See the License for the specific language governing permissions and
      limitations under the License. See accompanying LICENSE file.
    -->
    <!-- Put site-specific property overrides in this file. -->
    <configuration>
        <property>
            <name>fs.defaultFS</name>
            <value>hdfs://hddcluster2:9000</value>
        </property>
        <property>
            <name>hadoop.tmp.dir</name>
            <value>file:/usr/local/hadoop/tmp</value>
            <description>A base for other temporary directories.</description>
        </property>
    </configuration>
       
    [hadoop@hddcluster2 hadoop]$ cat hdfs-site.xml
    <?xml version="1.0" encoding="UTF-8"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
    <!--
      Licensed under the Apache License, Version 2.0 (the "License");
      you may not use this file except in compliance with the License.
      You may obtain a copy of the License at

        http://www.apache.org/licenses/LICENSE-2.0

      Unless required by applicable law or agreed to in writing, software
      distributed under the License is distributed on an "AS IS" BASIS,
      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
      See the License for the specific language governing permissions and
      limitations under the License. See accompanying LICENSE file.
    -->
    <!-- Put site-specific property overrides in this file. -->
    <configuration>
        <property>
            <name>dfs.namenode.secondary.http-address</name>
            <value>hddcluster2:50090</value>
        </property>
        <property>
            <name>dfs.replication</name>
            <value>3</value>
        </property>
        <property>
            <name>dfs.namenode.name.dir</name>
            <value>file:/usr/local/hadoop/tmp/dfs/name</value>
        </property>
        <property>
            <name>dfs.datanode.data.dir</name>
            <value>file:/usr/local/hadoop/tmp/dfs/data</value>
        </property>
    </configuration>
       
    [hadoop@hddcluster2 hadoop]$ cat mapred-site.xml
    <?xml version="1.0"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
    <!--
      Licensed under the Apache License, Version 2.0 (the "License");
      you may not use this file except in compliance with the License.
      You may obtain a copy of the License at

        http://www.apache.org/licenses/LICENSE-2.0

      Unless required by applicable law or agreed to in writing, software
      distributed under the License is distributed on an "AS IS" BASIS,
      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
      See the License for the specific language governing permissions and
      limitations under the License. See accompanying LICENSE file.
    -->
    <!-- Put site-specific property overrides in this file. -->
    <configuration>
        <property>
            <name>mapreduce.framework.name</name>
            <value>yarn</value>
        </property>
        <property>
            <name>mapreduce.jobhistory.address</name>
            <value>hddcluster2:10020</value>
        </property>
        <property>
            <name>mapreduce.jobhistory.webapp.address</name>
            <value>hddcluster2:19888</value>
        </property>
    </configuration>
       
    [hadoop@hddcluster2 hadoop]$ cat yarn-site.xml
    <?xml version="1.0"?>
    <!--
      Licensed under the Apache License, Version 2.0 (the "License");
      you may not use this file except in compliance with the License.
      You may obtain a copy of the License at

        http://www.apache.org/licenses/LICENSE-2.0

      Unless required by applicable law or agreed to in writing, software
      distributed under the License is distributed on an "AS IS" BASIS,
      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
      See the License for the specific language governing permissions and
      limitations under the License. See accompanying LICENSE file.
    -->
    <configuration>
        <!-- Site specific YARN configuration properties -->
        <property>
            <name>yarn.resourcemanager.hostname</name>
            <value>hddcluster2</value>
        </property>
        <property>
            <name>yarn.nodemanager.aux-services</name>
            <value>mapreduce_shuffle</value>
        </property>
    </configuration>
       
    [hadoop@hddcluster2 hadoop]$ cat slaves
    hddcluster1
    hddcluster2
    hddcluster3
    hddcluster4
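After editing the four site files it is easy to leave a typo in a host name or port. The sketch below reads one property back out of a site file so it can be compared with what was intended. `get_property` is a hypothetical helper, and it assumes each `<name>` and `<value>` element sits on a single line, as in the files above; it is not a general XML parser.

```shell
# Hypothetical helper: print the value of one property from a *-site.xml file.
# $1 = path to the file, $2 = property name.
get_property() {
    awk -v want="$2" '
        /<name>/ {
            # Remember the most recent property name.
            name = $0
            sub(/.*<name>/, "", name)
            sub(/<\/name>.*/, "", name)
        }
        /<value>/ && name == want {
            # Print the value that belongs to the requested name.
            value = $0
            sub(/.*<value>/, "", value)
            sub(/<\/value>.*/, "", value)
            print value
            exit
        }
    ' "$1"
}
```

For example, `get_property /usr/local/hadoop/etc/hadoop/core-site.xml fs.defaultFS` should print the hdfs:// address configured above. On a node where Hadoop is already installed, `hdfs getconf -confKey fs.defaultFS` reports the value Hadoop itself resolves.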

 
  
    
      
      
    $ cd /usr/local
    $ sudo rm -r ./hadoop/tmp                       # delete the Hadoop temporary files
    $ sudo rm -r ./hadoop/logs/*                    # delete the log files
    $ tar -zcf ~/hadoop.master.tar.gz ./hadoop      # compress first, then copy
    $ cd ~
    $ scp ./hadoop.master.tar.gz hddcluster1:/home/hadoop
    $ scp ./hadoop.master.tar.gz hddcluster3:/home/hadoop
    $ scp ./hadoop.master.tar.gz hddcluster4:/home/hadoop
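The scp commands can also be generated from the slaves file, which keeps the copy step in sync with the cluster list. This is a sketch only: `scp_commands` is a hypothetical helper that prints the commands rather than running them, and it skips the master because the archive already lives there.

```shell
# Hypothetical helper: read host names (one per line) from stdin and print
# one scp command per host, skipping the master.
# $1 = master host name, $2 = path of the archive to copy.
scp_commands() {
    master="$1"
    archive="$2"
    # The extra test keeps a last line that has no trailing newline.
    while read -r host || [ -n "$host" ]; do
        [ -z "$host" ] && continue
        [ "$host" = "$master" ] && continue
        echo "scp $archive $host:/home/hadoop"
    done
}
```

Typical use would be `scp_commands hddcluster2 ./hadoop.master.tar.gz < /usr/local/hadoop/etc/hadoop/slaves`, reviewing the printed commands before running them.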
       
 
    

   
 

 
On each slave node, install the software environment and configure .bashrc, then run:

    sudo tar -zxf ~/hadoop.master.tar.gz -C /usr/local
    sudo chown -R hadoop /usr/local/hadoop

 
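    [hadoop@hddcluster2 ~]$ hdfs namenode -format    # initialization is required on the first run only

Hadoop can now be started. Run the start commands on the Master node:

    $ start-dfs.sh
    $ start-yarn.sh
    $ mr-jobhistory-daemon.sh start historyserver

The jps command lists the processes started on each node. If everything is correct, the Master node shows the NameNode, ResourceManager, SecondaryNameNode, and JobHistoryServer processes (plus DataNode and NodeManager in this setup, because hddcluster2 is also listed in the slaves file).

In addition, run hdfs dfsadmin -report on the Master node to check whether the DataNodes started correctly. If Live datanodes is not 0, the cluster has started successfully.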
       
    [hadoop@hddcluster2 ~]$ hdfs dfsadmin -report
    Configured Capacity: 2125104381952 (1.93 TB)
    Present Capacity: 1975826509824 (1.80 TB)
    DFS Remaining: 1975824982016 (1.80 TB)
    DFS Used: 1527808 (1.46 MB)
    DFS Used%: 0.00%
    Under replicated blocks: 0
    Blocks with corrupt replicas: 0
    Missing blocks: 0
    Missing blocks (with replication factor 1): 0
    -------------------------------------------------
    Live datanodes (4):
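The "Live datanodes" line is the one that matters in this report, and its count can be pulled out for use in a script. A minimal sketch, assuming the line has exactly the form shown above; `live_datanodes` is a hypothetical helper name.

```shell
# Hypothetical helper: read an "hdfs dfsadmin -report" listing on stdin
# and print only the number of live DataNodes.
live_datanodes() {
    sed -n 's/^Live datanodes (\([0-9][0-9]*\)):.*$/\1/p'
}

# Example with the line from the report above.
echo 'Live datanodes (4):' | live_datanodes
# prints 4
```

On the master this would be used as `hdfs dfsadmin -report | live_datanodes`, and the result compared with the number of hosts in the slaves file.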
       
You can also check the status of the DataNodes and the NameNode on the web page http://hddcluster2:50070/. If startup fails, check the startup logs to find the cause.

 
On the Slave nodes, jps shows the DataNode and NodeManager processes.
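The jps checks for the Master and Slave nodes can be scripted by comparing the jps listing with the daemons expected on that node. This is a sketch, assuming jps prints one "PID Name" pair per line; `missing_daemons` is a hypothetical helper name.

```shell
# Hypothetical helper: print each expected daemon that is absent from a
# jps-style listing. $1 = listing text, remaining arguments = expected names.
missing_daemons() {
    jps_out="$1"
    shift
    for name in "$@"; do
        # Compare against the whole second field, so NameNode does not
        # accidentally match SecondaryNameNode.
        printf '%s\n' "$jps_out" |
            awk -v n="$name" '$2 == n { found = 1 } END { exit !found }' ||
            echo "$name"
    done
}
```

On the master, `missing_daemons "$(jps)" NameNode ResourceManager SecondaryNameNode JobHistoryServer` prints nothing when all four are running. On a slave the expected names would be DataNode and NodeManager.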

 
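Test a distributed Hadoop example

First create the user directory on HDFS:

    hdfs dfs -mkdir -p /user/hadoop

Copy the configuration files in /usr/local/hadoop/etc/hadoop into the distributed file system as input files:

    hdfs dfs -mkdir input
    hdfs dfs -put /usr/local/hadoop/etc/hadoop/*.xml input

Checking the DataNode status (the used space has changed) confirms that the input files were indeed copied to the DataNodes.

The MapReduce job can now be run:

    hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar grep input output 'dfs[a-z.]+'

Wait for the job to finish; the results are written to the output directory on HDFS.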
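The grep example job extracts every string matching the regular expression 'dfs[a-z.]+' from the input files and counts how often each one occurs. To get a feel for what the cluster should report, roughly the same computation can be done locally with standard tools. This is only an approximation of the job, not its implementation, and `count_matches` is a hypothetical helper name.

```shell
# Hypothetical helper: count occurrences of each string matching an extended
# regular expression, most frequent first. $1 = pattern, remaining arguments
# = files (stdin is read when no file is given).
count_matches() {
    pattern="$1"
    shift
    grep -ohE "$pattern" "$@" |
        sort | uniq -c | sort -rn |
        awk '{ print $1 "\t" $2 }'    # normalise to "count<TAB>match"
}
```

For example, `count_matches 'dfs[a-z.]+' /usr/local/hadoop/etc/hadoop/*.xml` gives a local list that can be compared with the contents of the job's output directory.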

 
Hadoop start commands:

    start-dfs.sh
    start-yarn.sh
    mr-jobhistory-daemon.sh start historyserver

Hadoop stop commands:

    stop-dfs.sh
    stop-yarn.sh
    mr-jobhistory-daemon.sh stop historyserver
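The start and stop sequences differ only in the action word, so both can come from one small function. This is a sketch: `hadoop_ctl_commands` is a hypothetical name, and it only prints the three commands in the order listed above instead of running them.

```shell
# Hypothetical helper: print the start or stop sequence for the cluster.
# $1 = "start" or "stop".
hadoop_ctl_commands() {
    case "$1" in
        start|stop) ;;
        *)
            echo "usage: hadoop_ctl_commands start|stop" >&2
            return 1
            ;;
    esac
    echo "$1-dfs.sh"
    echo "$1-yarn.sh"
    echo "mr-jobhistory-daemon.sh $1 historyserver"
}
```

On the master, where the .bashrc shown earlier puts $HADOOP_HOME/sbin on the PATH, the printed lines could be executed with `hadoop_ctl_commands start | sh`.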

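PS: If one or two machines in the cluster fail to start, first try deleting the Hadoop temporary files (this also discards the HDFS data stored under tmp):

cd /usr/local

sudo rm -r ./hadoop/tmp

sudo rm -r ./hadoop/logs/*

Then run

hdfs namenode -format

and start the cluster again.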

This article follows the site below, and the steps were tested successfully:

http://www.powerxing.com/install-hadoop-cluster/

Reposted from yanconggod's 51CTO blog. Original link:
http://blog.51cto.com/yanconggod/1884998
