D006 Copy-and-Paste Big Data: Installing an HBase Cluster with a Dockerfile

Overview: writing the Dockerfile; preparation before verifying the HBase cluster; verifying that HBase installed successfully

0x01 Writing the Dockerfile

1. Write the Dockerfile

For convenience, I copied the ZooKeeper cluster files and named the copy hbase_sny_all.

a. HBase cluster installation steps

Reference: D005 Copy-and-Paste Big Data: Installing and Configuring an HBase Cluster


  • The installation content is the same as in that article; here I have simply reorganized it according to the steps I wrote up earlier.

2. Key points when writing the Dockerfile

Compared with the "0x01 3. a. Dockerfile reference file" section of D004 Copy-and-Paste Big Data: Installing a ZooKeeper Cluster with a Dockerfile, the differences are:

Specific steps:

a. Add the installation package and extract it (the ADD instruction extracts archives automatically)

# Add HBase
ADD ./hbase-1.2.6-bin.tar.gz /usr/local/
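
ADD's auto-extraction can be illustrated locally with plain tar. This is a rough sketch of what Docker does with a local .tar.gz source; the paths here are throwaway temp directories, not the real image paths:

```shell
# Sketch of what ADD does with a local .tar.gz: extract into the destination,
# rather than copying the archive file itself. Temp dirs stand in for /usr/local.
src=$(mktemp -d)
mkdir -p "$src/hbase-1.2.6/bin"
tar -czf "$src/hbase.tar.gz" -C "$src" hbase-1.2.6
dest=$(mktemp -d)                        # stand-in for /usr/local
tar -xzf "$src/hbase.tar.gz" -C "$dest"  # the step ADD performs implicitly
ls "$dest"                               # hbase-1.2.6
```

COPY, by contrast, would leave hbase-1.2.6-bin.tar.gz in the image unextracted, which is why ADD is used for all the tarballs here.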

b. Add the environment variables (HBASE_HOME, PATH)

# HBase environment variable
ENV HBASE_HOME /usr/local/hbase-1.2.6
# Prepended to the existing PATH value
$HBASE_HOME/bin:
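
The effect of prepending `$HBASE_HOME/bin:` to PATH can be checked with a stub binary; the temp directory and the stub `hbase` script below are made up purely for the demo:

```shell
# Demo of PATH prepending with a stub hbase binary in a temp dir.
HBASE_HOME=$(mktemp -d)                  # stand-in for /usr/local/hbase-1.2.6
mkdir -p "$HBASE_HOME/bin"
printf '#!/bin/sh\necho stub-hbase\n' > "$HBASE_HOME/bin/hbase"
chmod +x "$HBASE_HOME/bin/hbase"
PATH="$HBASE_HOME/bin:$PATH"             # same shape as the ENV PATH line
command -v hbase                         # now resolves inside $HBASE_HOME/bin
hbase                                    # prints: stub-hbase
```

Prepending (rather than appending) means the cluster's own binaries win over anything with the same name already on the PATH.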

c. Add the configuration files (remember to append "&& \" to the preceding statement, which marks the command as continuing)

&& \
mv /tmp/init_zk.sh ~/init_zk.sh && \
mv /tmp/hbase-env.sh $HBASE_HOME/conf/hbase-env.sh && \
mv /tmp/hbase-site.xml $HBASE_HOME/conf/hbase-site.xml  && \
mv /tmp/regionservers $HBASE_HOME/conf/regionservers
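
The "&& \" chaining matters because && stops the RUN layer at the first failed command instead of silently continuing. A minimal sketch:

```shell
# && runs the next command only if the previous one succeeded,
# so a failed mv aborts the whole RUN chain rather than being skipped over.
ok=$(true && echo ran)              # previous command succeeded -> runs
skipped=$(false && echo ran) || true # previous command failed   -> skipped
echo "ok=$ok skipped=$skipped"       # prints: ok=ran skipped=
```

If a plain newline were used instead of "&& \", Docker would treat each line as a separate instruction and the mv commands after the first would fail or run out of order.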

d. Add the permission-change statement

# Change init_zk.sh permissions to 700
RUN chmod 700 init_zk.sh
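
Mode 700 gives the owner read, write, and execute and removes all access for group and others, which is enough for a script run as root. A quick check (stat -c is the GNU coreutils form):

```shell
# Show the octal mode after chmod 700 (GNU stat syntax).
f=$(mktemp)
chmod 700 "$f"
stat -c '%a' "$f"    # prints: 700
```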

3. Complete Dockerfile for reference

a. Install hadoop, spark, zookeeper, and hbase

FROM ubuntu
MAINTAINER shaonaiyi shaonaiyi@163.com
ENV BUILD_ON 2019-02-13
RUN apt-get update -qqy
RUN apt-get -qqy install vim wget net-tools iputils-ping openssh-server
# Add the JDK
ADD ./jdk-8u161-linux-x64.tar.gz /usr/local/
# Add hadoop
ADD ./hadoop-2.7.5.tar.gz /usr/local/
# Add scala
ADD ./scala-2.11.8.tgz /usr/local/
# Add spark
ADD ./spark-2.2.0-bin-hadoop2.7.tgz /usr/local/
# Add zookeeper
ADD ./zookeeper-3.4.10.tar.gz /usr/local/
# Add HBase
ADD ./hbase-1.2.6-bin.tar.gz /usr/local/
ENV CHECKPOINT 2019-02-13
# JAVA_HOME environment variable
ENV JAVA_HOME /usr/local/jdk1.8.0_161
# hadoop environment variable
ENV HADOOP_HOME /usr/local/hadoop-2.7.5
# scala environment variable
ENV SCALA_HOME /usr/local/scala-2.11.8
# spark environment variable
ENV SPARK_HOME /usr/local/spark-2.2.0-bin-hadoop2.7
# zk environment variable
ENV ZK_HOME /usr/local/zookeeper-3.4.10
# HBase environment variable
ENV HBASE_HOME /usr/local/hbase-1.2.6
# Add everything to the system PATH
ENV PATH $HBASE_HOME/bin:$ZK_HOME/bin:$SCALA_HOME/bin:$SPARK_HOME/bin:$HADOOP_HOME/bin:$JAVA_HOME/bin:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar:$PATH
RUN ssh-keygen -t rsa -f ~/.ssh/id_rsa -P '' && \
    cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys && \
    chmod 600 ~/.ssh/authorized_keys
# Copy the configuration files into /tmp
COPY config /tmp
# Move each configuration file to its proper location
RUN mv /tmp/ssh_config ~/.ssh/config && \
    mv /tmp/profile /etc/profile && \
    mv /tmp/masters $SPARK_HOME/conf/masters && \
    cp /tmp/slaves $SPARK_HOME/conf/ && \
    mv /tmp/spark-defaults.conf $SPARK_HOME/conf/spark-defaults.conf && \
    mv /tmp/spark-env.sh $SPARK_HOME/conf/spark-env.sh && \
    mv /tmp/hadoop-env.sh $HADOOP_HOME/etc/hadoop/hadoop-env.sh && \
    mv /tmp/hdfs-site.xml $HADOOP_HOME/etc/hadoop/hdfs-site.xml && \
    mv /tmp/core-site.xml $HADOOP_HOME/etc/hadoop/core-site.xml && \
    mv /tmp/yarn-site.xml $HADOOP_HOME/etc/hadoop/yarn-site.xml && \
    mv /tmp/mapred-site.xml $HADOOP_HOME/etc/hadoop/mapred-site.xml && \
    mv /tmp/master $HADOOP_HOME/etc/hadoop/master && \
    mv /tmp/slaves $HADOOP_HOME/etc/hadoop/slaves && \
    mv /tmp/start-hadoop.sh ~/start-hadoop.sh && \
    mv /tmp/init_zk.sh ~/init_zk.sh && \
    mkdir -p /usr/local/hadoop2.7/dfs/data && \
    mkdir -p /usr/local/hadoop2.7/dfs/name && \
    mkdir -p /usr/local/zookeeper-3.4.10/datadir && \
    mkdir -p /usr/local/zookeeper-3.4.10/log && \
    mv /tmp/zoo.cfg $ZK_HOME/conf/zoo.cfg && \
    mv /tmp/hbase-env.sh $HBASE_HOME/conf/hbase-env.sh && \
    mv /tmp/hbase-site.xml $HBASE_HOME/conf/hbase-site.xml && \
    mv /tmp/regionservers $HBASE_HOME/conf/regionservers
RUN echo $JAVA_HOME
# Set the working directory
WORKDIR /root
# Start the sshd service
RUN /etc/init.d/ssh start
# Change start-hadoop.sh permissions to 700
RUN chmod 700 start-hadoop.sh
# Change init_zk.sh permissions to 700
RUN chmod 700 init_zk.sh
# Set the root password
RUN echo "root:shaonaiyi" | chpasswd
CMD ["/bin/bash"]

0x02 Preparation Before Verifying the HBase Cluster

1. Prepare the environment and resources

a. Install Docker

See the "0x01 Installing Docker" section of D001.5 Getting Started with Docker (Super-Detailed Basics)

b. Prepare the resources

The files from the ZooKeeper cluster installation: D004 Copy-and-Paste Big Data: Installing a ZooKeeper Cluster with a Dockerfile

c. Prepare the HBase package (hbase-1.2.6-bin.tar.gz), placed alongside the other packages

d. Prepare HBase's three configuration files (place them in the config directory)

cd /home/shaonaiyi/docker_bigdata/hbase_sny_all/config

Configuration file 1: vi hbase-env.sh

#
#/**
# * Licensed to the Apache Software Foundation (ASF) under one
# * or more contributor license agreements.  See the NOTICE file
# * distributed with this work for additional information
# * regarding copyright ownership.  The ASF licenses this file
# * to you under the Apache License, Version 2.0 (the
# * "License"); you may not use this file except in compliance
# * with the License.  You may obtain a copy of the License at
# *
# *     http://www.apache.org/licenses/LICENSE-2.0
# *
# * Unless required by applicable law or agreed to in writing, software
# * distributed under the License is distributed on an "AS IS" BASIS,
# * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# * See the License for the specific language governing permissions and
# * limitations under the License.
# */
# Set environment variables here.
# This script sets variables multiple times over the course of starting an hbase process,
# so try to keep things idempotent unless you want to take an even deeper look
# into the startup scripts (bin/hbase, etc.)
# The java implementation to use.  Java 1.7+ required.
# export JAVA_HOME=/usr/java/jdk1.6.0/
export JAVA_HOME=/usr/local/jdk1.8.0_161/
export HBASE_CLASSPATH=/usr/local/hadoop-2.7.5/etc/hadoop
export HBASE_MANAGES_ZK=false
# Extra Java CLASSPATH elements.  Optional.
# export HBASE_CLASSPATH=
# The maximum amount of heap to use. Default is left to JVM default.
# export HBASE_HEAPSIZE=1G
# Uncomment below if you intend to use off heap cache. For example, to allocate 8G of 
# offheap, set the value to "8G".
# export HBASE_OFFHEAPSIZE=1G
# Extra Java runtime options.
# Below are what we set by default.  May only work with SUN JVM.
# For more on why as well as other possible settings,
# see http://wiki.apache.org/hadoop/PerformanceTuning
export HBASE_OPTS="-XX:+UseConcMarkSweepGC"
# Configure PermSize. Only needed in JDK7. You can safely remove it for JDK8+
#export HBASE_MASTER_OPTS="$HBASE_MASTER_OPTS -XX:PermSize=128m -XX:MaxPermSize=128m"
#export HBASE_REGIONSERVER_OPTS="$HBASE_REGIONSERVER_OPTS -XX:PermSize=128m -XX:MaxPermSize=128m"
# Uncomment one of the below three options to enable java garbage collection logging for the server-side processes.
# This enables basic gc logging to the .out file.
# export SERVER_GC_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps"
# This enables basic gc logging to its own file.
# If FILE-PATH is not replaced, the log file(.gc) would still be generated in the HBASE_LOG_DIR .
# export SERVER_GC_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:<FILE-PATH>"
# This enables basic GC logging to its own file with automatic log rolling. Only applies to jdk 1.6.0_34+ and 1.7.0_2+.
# If FILE-PATH is not replaced, the log file(.gc) would still be generated in the HBASE_LOG_DIR .
# export SERVER_GC_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:<FILE-PATH> -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=1 -XX:GCLogFileSize=512M"
# Uncomment one of the below three options to enable java garbage collection logging for the client processes.
# This enables basic gc logging to the .out file.
# export CLIENT_GC_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps"
# This enables basic gc logging to its own file.
# If FILE-PATH is not replaced, the log file(.gc) would still be generated in the HBASE_LOG_DIR .
# export CLIENT_GC_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:<FILE-PATH>"
# This enables basic GC logging to its own file with automatic log rolling. Only applies to jdk 1.6.0_34+ and 1.7.0_2+.
# If FILE-PATH is not replaced, the log file(.gc) would still be generated in the HBASE_LOG_DIR .
# export CLIENT_GC_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:<FILE-PATH> -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=1 -XX:GCLogFileSize=512M"
# See the package documentation for org.apache.hadoop.hbase.io.hfile for other configurations
# needed setting up off-heap block caching. 
# Uncomment and adjust to enable JMX exporting
# See jmxremote.password and jmxremote.access in $JRE_HOME/lib/management to configure remote password access.
# More details at: http://java.sun.com/javase/6/docs/technotes/guides/management/agent.html
# NOTE: HBase provides an alternative JMX implementation to fix the random ports issue, please see JMX
# section in HBase Reference Guide for instructions.
# export HBASE_JMX_BASE="-Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false"
# export HBASE_MASTER_OPTS="$HBASE_MASTER_OPTS $HBASE_JMX_BASE -Dcom.sun.management.jmxremote.port=10101"
# export HBASE_REGIONSERVER_OPTS="$HBASE_REGIONSERVER_OPTS $HBASE_JMX_BASE -Dcom.sun.management.jmxremote.port=10102"
# export HBASE_THRIFT_OPTS="$HBASE_THRIFT_OPTS $HBASE_JMX_BASE -Dcom.sun.management.jmxremote.port=10103"
# export HBASE_ZOOKEEPER_OPTS="$HBASE_ZOOKEEPER_OPTS $HBASE_JMX_BASE -Dcom.sun.management.jmxremote.port=10104"
# export HBASE_REST_OPTS="$HBASE_REST_OPTS $HBASE_JMX_BASE -Dcom.sun.management.jmxremote.port=10105"
# File naming hosts on which HRegionServers will run.  $HBASE_HOME/conf/regionservers by default.
# export HBASE_REGIONSERVERS=${HBASE_HOME}/conf/regionservers
# Uncomment and adjust to keep all the Region Server pages mapped to be memory resident
#HBASE_REGIONSERVER_MLOCK=true
#HBASE_REGIONSERVER_UID="hbase"
# File naming hosts on which backup HMaster will run.  $HBASE_HOME/conf/backup-masters by default.
# export HBASE_BACKUP_MASTERS=${HBASE_HOME}/conf/backup-masters
# Extra ssh options.  Empty by default.
# export HBASE_SSH_OPTS="-o ConnectTimeout=1 -o SendEnv=HBASE_CONF_DIR"
# Where log files are stored.  $HBASE_HOME/logs by default.
# export HBASE_LOG_DIR=${HBASE_HOME}/logs
# Enable remote JDWP debugging of major HBase processes. Meant for Core Developers 
# export HBASE_MASTER_OPTS="$HBASE_MASTER_OPTS -Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=8070"
# export HBASE_REGIONSERVER_OPTS="$HBASE_REGIONSERVER_OPTS -Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=8071"
# export HBASE_THRIFT_OPTS="$HBASE_THRIFT_OPTS -Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=8072"
# export HBASE_ZOOKEEPER_OPTS="$HBASE_ZOOKEEPER_OPTS -Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=8073"
# A string representing this instance of hbase. $USER by default.
# export HBASE_IDENT_STRING=$USER
# The scheduling priority for daemon processes.  See 'man nice'.
# export HBASE_NICENESS=10
# The directory where pid files are stored. /tmp by default.
# export HBASE_PID_DIR=/var/hadoop/pids
# Seconds to sleep between slave commands.  Unset by default.  This
# can be useful in large clusters, where, e.g., slave rsyncs can
# otherwise arrive faster than the master can service them.
# export HBASE_SLAVE_SLEEP=0.1
# Tell HBase whether it should manage it's own instance of Zookeeper or not.
# export HBASE_MANAGES_ZK=true
# The default log rolling policy is RFA, where the log file is rolled as per the size defined for the 
# RFA appender. Please refer to the log4j.properties file to see more details on this appender.
# In case one needs to do log rolling on a date change, one should set the environment property
# HBASE_ROOT_LOGGER to "<DESIRED_LOG LEVEL>,DRFA".
# For example:
# HBASE_ROOT_LOGGER=INFO,DRFA
# The reason for changing default to RFA is to avoid the boundary case of filling out disk space as 
# DRFA doesn't put any cap on the log size. Please refer to HBase-5655 for more context.

Configuration file 2: vi hbase-site.xml

<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://hadoop-master:9000/hbase</value>
  </property>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>hadoop-master,hadoop-slave1,hadoop-slave2</value>
  </property>
</configuration>

Configuration file 3: vi regionservers

hadoop-slave1
hadoop-slave2

PS: Add the following two lines to configure the environment variables:

vi profile

export HBASE_HOME=/usr/local/hbase-1.2.6
export PATH=$PATH:$HBASE_HOME/bin

Script that initializes ZooKeeper (the three start commands at the end were moved here from the earlier start-hadoop.sh):

vi init_zk.sh

#!/bin/bash
ssh root@hadoop-master "echo '0' >> $ZK_HOME/datadir/myid"
ssh root@hadoop-slave1 "echo '1' >> $ZK_HOME/datadir/myid"
ssh root@hadoop-slave2 "echo '2' >> $ZK_HOME/datadir/myid"
# Start the ZooKeeper service on each node
ssh root@hadoop-master "source /etc/profile;/usr/local/zookeeper-3.4.10/bin/zkServer.sh start"
ssh root@hadoop-slave1 "source /etc/profile;/usr/local/zookeeper-3.4.10/bin/zkServer.sh start"
ssh root@hadoop-slave2 "source /etc/profile;/usr/local/zookeeper-3.4.10/bin/zkServer.sh start"
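
One thing worth noting about the myid step: the script uses `>>`, which appends another line to myid every time it is rerun. A local, ssh-free sketch of the pattern, using `>` so reruns stay safe (the temp directory is a stand-in for the real ZK_HOME):

```shell
# Local sketch of the myid step without ssh. '>' overwrites and is safe to
# repeat, whereas the '>>' in init_zk.sh appends a second id if rerun.
ZK_HOME=$(mktemp -d)                # stand-in for /usr/local/zookeeper-3.4.10
mkdir -p "$ZK_HOME/datadir"
echo '0' > "$ZK_HOME/datadir/myid"  # node id; 1 and 2 on the other nodes
echo '0' > "$ZK_HOME/datadir/myid"  # rerunning leaves a single line
cat "$ZK_HOME/datadir/myid"         # prints: 0
```

Each ZooKeeper node reads its unique id from this file, so each node's myid must contain exactly one number matching the server list in zoo.cfg.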

0x03 Verify That HBase Installed Successfully

1. Modify the container-creation script

a. Modify start_containers.sh (change the image name to shaonaiyi/hbase and update the IPs)

I changed the three occurrences of shaonaiyi/zk in it to shaonaiyi/hbase and added 1 to the last octet of each IP, e.g.:

172.21.0.12 became 172.21.0.22, and so on.

To expose HBase's port 16010, add:

-p 17010:16010

PS: You could of course create a new network with new IPs; I took the lazy route here and reused the old network, only changing the IPs.

2. Build the image

a. Delete the earlier Spark cluster containers (to save resources); skip this step if they are already gone

cd /home/shaonaiyi/docker_bigdata/zk_sny_all/config/

chmod 700 stop_containers.sh

./stop_containers.sh

b. Build the image with hadoop, spark, zookeeper, and hbase installed (if the earlier shaonaiyi/spark image was not deleted, this build will be much faster)

cd /home/shaonaiyi/docker_bigdata/hbase_sny_all

docker build -t shaonaiyi/hbase .


3. Create the containers

a. Create the containers (grant execute permission to start_containers.sh first if it lacks it):

config/start_containers.sh

b. Enter the master container

sh ~/master.sh

4. Start the cluster and check the processes

a. Start the cluster and initialize the ZooKeeper configuration:


./start-hadoop.sh

./init_zk.sh

A problem came up earlier (now fixed):

A script written on Windows and then executed on Linux failed with an error.


The fix:

vi init_zk.sh

Run the vim command :set ff and you will see fileformat=dos.

Change it with :set ff=unix, then save and quit with :wq!.
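
The same conversion can be scripted without opening vim, by stripping the trailing carriage return from each line with sed (the demo file below is made up for illustration):

```shell
# Strip Windows CRLF line endings without vim; equivalent to :set ff=unix.
demo=$(mktemp)
printf 'line1\r\nline2\r\n' > "$demo"   # simulate a file saved on Windows
sed -i 's/\r$//' "$demo"                # remove the trailing CR on each line
head -n 1 "$demo"                       # prints: line1 (no \r left)
```

This is handy when many scripts were edited on Windows at once and need fixing in a loop.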

Run it again:

./init_zk.sh

b. Start HBase

start-hbase.sh

c. Check the processes

./jps_all.sh

See the "0x03 1. jps_all.sh script" section of D002 Copy-and-Paste Big Data: Convenient Configuration



d. The Web UI also opens normally (port 17010 on the host maps to HBase's 16010).




0xFF Summary

  1. With more and more components in play, this setup is a bit more complex than the previous article's, so I iterated on it again. Much of it is actually not that troublesome — we are not far from one-click deployment of a big data cluster.
  2. For common Dockerfile instructions, see: D004.1 Dockerfile Examples and Common Instructions Explained
