Deep Dive into HBase (Part 2)


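Part 2: Installing and Configuring HBase

Reference article: https://hbase.org.cn/docs/3.html

HBase can be installed in the following modes:

① Local mode: the simplest way to install HBase, suited to development and testing on a single machine. In local mode, HBase runs inside a single Java process and stores its data on the local file system. The steps below install HBase with docker-compose: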

1: The docker-compose file is as follows:

version: "3.5"
services:
  hbase:
    image: 'harisekhon/hbase'
    container_name: 'hbase'
    networks:
      - base-env-network
    ports:
      - '16010:16010'
      - '16020:16020'
      - '16000:16000'
      - '16301:16301'
      - '42182:2181'
    volumes:
      - ./hbase-data:/hbase-data
    environment:
      - CLUSTER_DNS=hbase
      - TZ=Asia/Shanghai
# docker network create base-env-network
networks:
  base-env-network:
    external:
      name: "base-env-network"

2: Start HBase

[Screenshot: starting the HBase container]

3: Enter the HBase container:

[Screenshot: shell session inside the HBase container]
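Steps 2 and 3 come down to a handful of Docker commands. The sketch below is one possible way to run them, assuming the compose file from step 1 is saved as docker-compose.yml in the current directory. The `run` helper and the `DRY_RUN` variable are illustrative additions that are not part of the original setup (with `DRY_RUN=1` the commands are only printed), and using `sh` as the shell inside the container is an assumption about the image.

```shell
#!/bin/sh
# run: print the command when DRY_RUN=1, otherwise execute it (hypothetical helper).
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Step 2: create the external network the compose file expects, then start HBase.
start_hbase() {
  # Reports "already exists" if the network is present; that error is safe to ignore.
  run docker network create base-env-network
  run docker-compose up -d
  run docker-compose ps
}

# Step 3: open a shell in the container named 'hbase' (container_name in the compose file).
enter_hbase() {
  run docker exec -it hbase sh
}
```

Call `start_hbase` and then `enter_hbase`. Inside the container, `hbase shell` should open the HBase shell, assuming the `hbase` command is on the image's PATH.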

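② Fully distributed mode: HBase is deployed in a real distributed environment. In this mode the HBase components are spread across multiple machines to provide high availability, fault tolerance, and scalable performance. The steps are as follows:

1: Install Docker

# Install the yum-config-manager tool
yum -y install yum-utils

# The Alibaba Cloud yum mirror is recommended
# yum-config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo
yum-config-manager --add-repo http://mirrors.aliyun.com/docker-ce/linux/centos/docker-ce.repo

# Install docker-ce
yum install -y docker-ce

# Start Docker now and enable it at boot
systemctl enable --now docker
docker --version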

2: Install docker-compose

curl -SL https://github.com/docker/compose/releases/download/v2.16.0/docker-compose-linux-x86_64 -o /usr/local/bin/docker-compose
chmod +x /usr/local/bin/docker-compose
docker-compose --version

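3: Create the network

# Create the network. Note: do not name it hadoop_network, otherwise the HiveServer2 (hs2) service will run into problems at startup!
docker network create hadoop-network

# List the networks
docker network ls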

4: Deploy ZooKeeper. Create the directories and files:

[root@cdh1 zookeeper]# tree
.
├── docker-compose.yml
├── zk1
├── zk2
└── zk3

3 directories, 1 file

The docker-compose.yml file is as follows:

version: '3.7'

# Attach the ZooKeeper ensemble to the external network named hadoop-network
networks:
  hadoop-network:
    external: true

# Each entry under services corresponds to one ZooKeeper node running in its own Docker container
services:
  zk1:
    # Docker image used by the container
    image: zookeeper
    hostname: zk1
    container_name: zk1
    restart: always
    # Port mappings between the container and the host
    ports:
      - 2181:2181
      - 28081:8080
    # Environment variables of the container
    environment:
      # ID of this ZooKeeper instance
      ZOO_MY_ID: 1
      # Host and port list of the whole ZooKeeper ensemble
      ZOO_SERVERS: server.1=0.0.0.0:2888:3888;2181 server.2=zk2:2888:3888;2181 server.3=zk3:2888:3888;2181
    # Mount container paths on the host so that data is shared between host and container
    volumes:
      - ./zk1/data:/data
      - ./zk1/datalog:/datalog
    # Join the hadoop-network network
    networks:
      - hadoop-network
  zk2:
    image: zookeeper
    hostname: zk2
    container_name: zk2
    restart: always
    ports:
      - 2182:2181
      - 28082:8080
    environment:
      ZOO_MY_ID: 2
      ZOO_SERVERS: server.1=zk1:2888:3888;2181 server.2=0.0.0.0:2888:3888;2181 server.3=zk3:2888:3888;2181
    volumes:
      - ./zk2/data:/data
      - ./zk2/datalog:/datalog
    networks:
      - hadoop-network
  zk3:
    image: zookeeper
    hostname: zk3
    container_name: zk3
    restart: always
    ports:
      - 2183:2181
      - 28083:8080
    environment:
      ZOO_MY_ID: 3
      ZOO_SERVERS: server.1=zk1:2888:3888;2181 server.2=zk2:2888:3888;2181 server.3=0.0.0.0:2888:3888;2181
    volumes:
      - ./zk3/data:/data
      - ./zk3/datalog:/datalog
    networks:
      - hadoop-network
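The three ZOO_SERVERS values differ in one respect only: each node lists itself as 0.0.0.0 and the other nodes by hostname. Every entry has the form server.<id>=<host>:<peer port>:<election port>;<client port>. As a minimal sketch of that pattern, the hypothetical helper `zoo_servers` below (not part of the original setup) prints the value for a given node, assuming the hosts are named zk1, zk2, and so on as in the compose file.

```shell
#!/bin/sh
# zoo_servers SELF_ID COUNT: print the ZOO_SERVERS value for node SELF_ID
# in an ensemble of COUNT nodes named zk1..zkCOUNT.
zoo_servers() {
  self_id=$1
  count=$2
  result=""
  i=1
  while [ "$i" -le "$count" ]; do
    if [ "$i" -eq "$self_id" ]; then
      host="0.0.0.0"   # a node refers to itself by the wildcard address
    else
      host="zk${i}"
    fi
    # Append the entry, separated from the previous one by a single space.
    result="${result}${result:+ }server.${i}=${host}:2888:3888;2181"
    i=$((i + 1))
  done
  printf '%s\n' "$result"
}

# Value for zk2 in the three-node ensemble.
zoo_servers 2 3
```

The printed line can be compared with the ZOO_SERVERS entry of the corresponding service to catch copy-and-paste mistakes when nodes are added.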


wget https://mirrors.tuna.tsinghua.edu.cn/apache/zookeeper/zookeeper-3.8.2/apache-zookeeper-3.8.2-bin.tar.gz --no-check-certificate

Start the ensemble:

[root@cdh1 zookeeper]# docker-compose up -d
Creating zk3 ... done
Creating zk2 ... done
Creating zk1 ... done
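Once the containers are up, it is worth confirming that the ensemble has elected a leader. The sketch below is one way to do that, assuming `zkServer.sh` is on the PATH of the official zookeeper image and that its status output contains a line starting with "Mode:". The function names `zk_mode` and `check_ensemble` are illustrative.

```shell
#!/bin/sh
# zk_mode: read `zkServer.sh status` output on stdin and print the node's role.
zk_mode() {
  sed -n 's/^Mode: //p'
}

# check_ensemble: print the role of each container; the containers must be running.
check_ensemble() {
  for node in zk1 zk2 zk3; do
    mode=$(docker exec "$node" zkServer.sh status 2>/dev/null | zk_mode)
    echo "$node: ${mode:-unreachable}"
  done
}
```

Running `check_ensemble` on a working three-node ensemble should report one leader and two followers.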

5: Download the Hadoop deployment package


git clone https://gitee.com/hadoop-bigdata/docker-compose-hadoop.git

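6: Install and deploy MySQL 5.7. MySQL is used here mainly to store the Hive metadata.

cd docker-compose-hadoop/mysql
docker-compose -f mysql-compose.yaml up -d
docker-compose -f mysql-compose.yaml ps
# root password: 123456. The login command follows. Note: in a company environment, do not type the password in plain text on the command line, because security auditing is likely to flag it.
docker exec -it mysql mysql -uroot -p123456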

7: Install Hadoop and Hive

cd docker-compose-hadoop/hadoop_hive
docker-compose -f docker-compose.yaml up -d
# Check the status
docker-compose -f docker-compose.yaml ps
# hive
docker exec -it hive-hiveserver2 hive -e "show databases;"
# hiveserver2
docker exec -it hive-hiveserver2 beeline -u jdbc:hive2://hive-hiveserver2:10000 -n hadoop -e "show databases;"

If the Hadoop historyserver container does not become healthy after startup, run the following commands:

docker exec -it hadoop-hdfs-nn hdfs dfs -chmod 777 /tmp
docker restart hadoop-mr-historyserver

To refresh the HDFS node information, run the following commands (-refreshNodes makes the NameNode re-read its include and exclude host files; it does not format the file system):

[root@cdh1 ~]# docker exec -it hadoop-hdfs-nn hdfs dfsadmin -refreshNodes
Refresh nodes successful
[root@cdh1 ~]# docker exec -it hadoop-hdfs-dn-0 hdfs dfsadmin -fs hdfs://hadoop-hdfs-nn:9000 -refreshNodes
Refresh nodes successful
[root@cdh1 ~]# docker exec -it hadoop-hdfs-dn-1 hdfs dfsadmin -fs hdfs://hadoop-hdfs-nn:9000 -refreshNodes
Refresh nodes successful
[root@cdh1 ~]# docker exec -it hadoop-hdfs-dn-2 hdfs dfsadmin -fs hdfs://hadoop-hdfs-nn:9000 -refreshNodes
Refresh nodes successful

The HDFS distribution can be viewed at cdh1:30070:

[Screenshot: HDFS web UI]

The YARN resources can be viewed at http://cdh1:30888/cluster:

[Screenshot: YARN cluster web UI]

8. Configure the HBase parameters


mkdir conf


conf/hbase-env.sh


export JAVA_HOME=/opt/apache/jdk
export HBASE_CLASSPATH=/opt/apache/hbase/conf
export HBASE_MANAGES_ZK=false



conf/hbase-site.xml


<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://hadoop-hdfs-nn:9000/hbase</value>
    <!-- In hdfs://ns1/hbase, ns1 would correspond to the dfs.nameservices value in hdfs-site.xml -->
  </property>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>zk1,zk2,zk3</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.clientPort</name>
    <value>2181</value>
  </property>
  <property>
    <name>hbase.master</name>
    <value>60000</value>
    <description>A standalone setup needs hostname/IP and port; an HA setup only needs the port</description>
  </property>
  <property>
    <name>hbase.master.info.bindAddress</name>
    <value>0.0.0.0</value>
  </property>
  <property>
    <name>hbase.master.port</name>
    <value>16000</value>
  </property>
  <property>
    <name>hbase.master.info.port</name>
    <value>16010</value>
  </property>
  <property>
    <name>hbase.regionserver.port</name>
    <value>16020</value>
  </property>
  <property>
    <name>hbase.regionserver.info.port</name>
    <value>16030</value>
  </property>
  <property>
    <name>hbase.wal.provider</name>
    <value>filesystem</value>
    <!-- multiwal can be used as well -->
  </property>
</configuration>
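Before the file is mounted into the containers, individual properties can be read back to confirm that they hold the intended values. The sketch below is a lightweight check rather than a real XML parser: the hypothetical helper `get_prop` assumes that each <name> and <value> tag sits on its own line in the file on disk.

```shell
#!/bin/sh
# get_prop FILE NAME: print the <value> that follows <name>NAME</name> in FILE.
get_prop() {
  awk -v name="$2" '
    # Remember that the wanted property was seen, then move to the next line.
    index($0, "<name>" name "</name>") { found = 1; next }
    # The first <value> after it belongs to that property.
    found && match($0, /<value>[^<]*<\/value>/) {
      # Skip the 7 characters of "<value>" and drop the 8 of "</value>".
      print substr($0, RSTART + 7, RLENGTH - 15)
      exit
    }
  ' "$1"
}
```

For example, `get_prop conf/hbase-site.xml hbase.zookeeper.quorum` should print the quorum hosts, which must match the hostnames of the ZooKeeper containers.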


conf/backup-masters


hbase-master-2


conf/regionservers


hbase-regionserver-1
hbase-regionserver-2
hbase-regionserver-3


conf/hadoop/core-site.xml


<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <!-- NameNode address -->
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://hadoop-hdfs-nn:9000</value>
  </property>
  <!-- File buffer size (128 KB); the default is 4 KB -->
  <property>
    <name>io.file.buffer.size</name>
    <value>131072</value>
  </property>
  <!-- Retention time of the file system trash, in minutes -->
  <property>
    <name>fs.trash.interval</name>
    <value>1440</value>
  </property>
  <!-- Hadoop temporary directory, used to store metadata. Make sure /opt/apache/hadoop/data/hdfs/ has been created manually; the tmp directory is created automatically -->
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/opt/apache/hadoop/data/hdfs/tmp</value>
  </property>
  <!-- Use root as the static user for the HDFS web UI -->
  <property>
    <name>hadoop.http.staticuser.user</name>
    <value>root</value>
  </property>
  <!-- Hosts from which root (the superuser) may act as a proxy -->
  <property>
    <name>hadoop.proxyuser.root.hosts</name>
    <value>*</value>
  </property>
  <!-- Groups whose members root (the superuser) may impersonate -->
  <property>
    <name>hadoop.proxyuser.root.groups</name>
    <value>*</value>
  </property>
  <!-- Users that root (the superuser) may impersonate -->
  <property>
    <name>hadoop.proxyuser.root.user</name>
    <value>*</value>
  </property>
  <!-- Hosts from which the hive user may act as a proxy -->
  <property>
    <name>hadoop.proxyuser.hive.hosts</name>
    <value>*</value>
  </property>
  <!-- Groups whose members the hive user may impersonate -->
  <property>
    <name>hadoop.proxyuser.hive.groups</name>
    <value>*</value>
  </property>
  <!-- Hosts from which the hadoop user may act as a proxy -->
  <property>
    <name>hadoop.proxyuser.hadoop.hosts</name>
    <value>*</value>
  </property>
  <!-- Groups whose members the hadoop user may impersonate -->
  <property>
    <name>hadoop.proxyuser.hadoop.groups</name>
    <value>*</value>
  </property>
</configuration>


conf/hadoop/hdfs-site.xml


<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <!-- NameNode web UI address -->
  <property>
    <name>dfs.namenode.http-address</name>
    <value>0.0.0.0:9870</value>
  </property>
  <!-- dfs.webhdfs.enabled must be true; otherwise WebHDFS commands that list file or directory status, such as LISTSTATUS and LISTFILESTATUS, cannot be used, because that information is kept by the NameNode. -->
  <property>
    <name>dfs.webhdfs.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/opt/apache/hadoop/data/hdfs/namenode</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/opt/apache/hadoop/data/hdfs/datanode/data1,/opt/apache/hadoop/data/hdfs/datanode/data2,/opt/apache/hadoop/data/hdfs/datanode/data3</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
  <!-- Where the SecondaryNameNode (SNN) process runs -->
  <property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>hadoop-hdfs-nn2:9868</value>
  </property>
  <property>
    <name>dfs.namenode.datanode.registration.ip-hostname-check</name>
    <value>false</value>
  </property>
  <!-- Allow list -->
  <property>
    <name>dfs.hosts</name>
    <value>/opt/apache/hadoop/etc/hadoop/dfs.hosts</value>
  </property>
  <!-- Exclude list -->
  <property>
    <name>dfs.hosts.exclude</name>
    <value>/opt/apache/hadoop/etc/hadoop/dfs.hosts.exclude</value>
  </property>
</configuration>

After the conf directory is complete, set read and write permissions on it:


chmod -R 777 conf/

8. Write the .env environment file

HBASE_MASTER_PORT=16000
HBASE_MASTER_INFO_PORT=16010
HBASE_HOME=/opt/apache/hbase
HBASE_REGIONSERVER_PORT=16020
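docker-compose reads the .env file in the project directory to fill in the ${...} placeholders of the compose file, so a missing variable ends up as an empty path or port. As a minimal sketch, the hypothetical helper `check_env` below confirms that the four variables used in this setup are defined; the list of names is taken from the .env file above.

```shell
#!/bin/sh
# check_env FILE: confirm that every variable the compose file references is defined in FILE.
check_env() {
  env_file=$1
  missing=""
  for key in HBASE_HOME HBASE_MASTER_PORT HBASE_MASTER_INFO_PORT HBASE_REGIONSERVER_PORT; do
    # Anchor on "KEY=" so HBASE_MASTER_PORT does not match HBASE_MASTER_INFO_PORT.
    grep -q "^${key}=" "$env_file" || missing="$missing $key"
  done
  if [ -n "$missing" ]; then
    echo "missing:$missing"
    return 1
  fi
  echo "ok"
}
```

Running `check_env .env` prints "ok" when nothing is missing. `docker-compose config` can then be used to print the compose file with all placeholders resolved.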

9. Write docker-compose.yaml

version: '3'
services:
  hbase-master-1:
    image: registry.cn-hangzhou.aliyuncs.com/bigdata_cloudnative/hbase:2.5.4
    user: "hadoop:hadoop"
    container_name: hbase-master-1
    hostname: hbase-master-1
    restart: always
    privileged: true
    env_file:
      - .env
    volumes:
      - ./conf/hbase-env.sh:${HBASE_HOME}/conf/hbase-env.sh
      - ./conf/hbase-site.xml:${HBASE_HOME}/conf/hbase-site.xml
      - ./conf/backup-masters:${HBASE_HOME}/conf/backup-masters
      - ./conf/regionservers:${HBASE_HOME}/conf/regionservers
      - ./conf/hadoop/core-site.xml:${HBASE_HOME}/conf/core-site.xml
      - ./conf/hadoop/hdfs-site.xml:${HBASE_HOME}/conf/hdfs-site.xml
    ports:
      - "36010:${HBASE_MASTER_PORT}"
      - "36020:${HBASE_MASTER_INFO_PORT}"
    command: ["sh","-c","/opt/apache/bootstrap.sh hbase-master"]
    networks:
      - hadoop-network
    healthcheck:
      test: ["CMD-SHELL", "netstat -tnlp|grep :${HBASE_MASTER_PORT} || exit 1"]
      interval: 10s
      timeout: 20s
      retries: 3
  hbase-master-2:
    image: registry.cn-hangzhou.aliyuncs.com/bigdata_cloudnative/hbase:2.5.4
    user: "hadoop:hadoop"
    container_name: hbase-master-2
    hostname: hbase-master-2
    restart: always
    privileged: true
    env_file:
      - .env
    volumes:
      - ./conf/hbase-env.sh:${HBASE_HOME}/conf/hbase-env.sh
      - ./conf/hbase-site.xml:${HBASE_HOME}/conf/hbase-site.xml
      - ./conf/backup-masters:${HBASE_HOME}/conf/backup-masters
      - ./conf/regionservers:${HBASE_HOME}/conf/regionservers
      - ./conf/hadoop/core-site.xml:${HBASE_HOME}/conf/core-site.xml
      - ./conf/hadoop/hdfs-site.xml:${HBASE_HOME}/conf/hdfs-site.xml
    ports:
      - "36011:${HBASE_MASTER_PORT}"
      - "36021:${HBASE_MASTER_INFO_PORT}"
    command: ["sh","-c","/opt/apache/bootstrap.sh hbase-master hbase-master-1 ${HBASE_MASTER_PORT}"]
    networks:
      - hadoop-network
    healthcheck:
      test: ["CMD-SHELL", "netstat -tnlp|grep :${HBASE_MASTER_PORT} || exit 1"]
      interval: 10s
      timeout: 20s
      retries: 3
  hbase-regionserver-1:
    image: registry.cn-hangzhou.aliyuncs.com/bigdata_cloudnative/hbase:2.5.4
    user: "hadoop:hadoop"
    container_name: hbase-regionserver-1
    hostname: hbase-regionserver-1
    restart: always
    privileged: true
    env_file:
      - .env
    volumes:
      - ./conf/hbase-env.sh:${HBASE_HOME}/conf/hbase-env.sh
      - ./conf/hbase-site.xml:${HBASE_HOME}/conf/hbase-site.xml
      - ./conf/backup-masters:${HBASE_HOME}/conf/backup-masters
      - ./conf/regionservers:${HBASE_HOME}/conf/regionservers
      - ./conf/hadoop/core-site.xml:${HBASE_HOME}/conf/core-site.xml
      - ./conf/hadoop/hdfs-site.xml:${HBASE_HOME}/conf/hdfs-site.xml
    ports:
      - "36030:${HBASE_REGIONSERVER_PORT}"
    command: ["sh","-c","/opt/apache/bootstrap.sh hbase-regionserver hbase-master-1 ${HBASE_MASTER_PORT}"]
    networks:
      - hadoop-network
    healthcheck:
      test: ["CMD-SHELL", "netstat -tnlp|grep :${HBASE_REGIONSERVER_PORT} || exit 1"]
      interval: 10s
      timeout: 10s
      retries: 3
  hbase-regionserver-2:
    image: registry.cn-hangzhou.aliyuncs.com/bigdata_cloudnative/hbase:2.5.4
    user: "hadoop:hadoop"
    container_name: hbase-regionserver-2
    hostname: hbase-regionserver-2
    restart: always
    privileged: true
    env_file:
      - .env
    volumes:
      - ./conf/hbase-env.sh:${HBASE_HOME}/conf/hbase-env.sh
      - ./conf/hbase-site.xml:${HBASE_HOME}/conf/hbase-site.xml
      - ./conf/backup-masters:${HBASE_HOME}/conf/backup-masters
      - ./conf/regionservers:${HBASE_HOME}/conf/regionservers
      - ./conf/hadoop/core-site.xml:${HBASE_HOME}/conf/core-site.xml
      - ./conf/hadoop/hdfs-site.xml:${HBASE_HOME}/conf/hdfs-site.xml
    ports:
      - "36031:${HBASE_REGIONSERVER_PORT}"
    command: ["sh","-c","/opt/apache/bootstrap.sh hbase-regionserver hbase-master-1 ${HBASE_MASTER_PORT}"]
    networks:
      - hadoop-network
    healthcheck:
      test: ["CMD-SHELL", "netstat -tnlp|grep :${HBASE_REGIONSERVER_PORT} || exit 1"]
      interval: 10s
      timeout: 10s
      retries: 3
  hbase-regionserver-3:
    image: registry.cn-hangzhou.aliyuncs.com/bigdata_cloudnative/hbase:2.5.4
    user: "hadoop:hadoop"
    container_name: hbase-regionserver-3
    hostname: hbase-regionserver-3
    restart: always
    privileged: true
    env_file:
      - .env
    volumes:
      - ./conf/hbase-env.sh:${HBASE_HOME}/conf/hbase-env.sh
      - ./conf/hbase-site.xml:${HBASE_HOME}/conf/hbase-site.xml
      - ./conf/backup-masters:${HBASE_HOME}/conf/backup-masters
      - ./conf/regionservers:${HBASE_HOME}/conf/regionservers
      - ./conf/hadoop/core-site.xml:${HBASE_HOME}/conf/core-site.xml
      - ./conf/hadoop/hdfs-site.xml:${HBASE_HOME}/conf/hdfs-site.xml
    ports:
      - "36032:${HBASE_REGIONSERVER_PORT}"
    command: ["sh","-c","/opt/apache/bootstrap.sh hbase-regionserver hbase-master-1 ${HBASE_MASTER_PORT}"]
    networks:
      - hadoop-network
    healthcheck:
      test: ["CMD-SHELL", "netstat -tnlp|grep :${HBASE_REGIONSERVER_PORT} || exit 1"]
      interval: 10s
      timeout: 10s
      retries: 3

# Connect to the external network
networks:
  hadoop-network:
    external: true

10. Start the deployment. The current directory structure is as follows:

[root@cdh1 hbase]# tree
.
├── .env
├── conf
│   ├── backup-masters
│   ├── hadoop
│   │   ├── core-site.xml
│   │   └── hdfs-site.xml
│   ├── hbase-env.sh
│   ├── hbase-site.xml
│   └── regionservers
└── docker-compose.yaml
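Every file in this tree is bind-mounted into the containers, and Docker creates a missing bind-mount source as an empty directory, which tends to surface later as a confusing startup error. The sketch below is a small preflight check under that assumption; `check_layout` is a hypothetical helper and the file list simply mirrors the tree above.

```shell
#!/bin/sh
# check_layout DIR: report any file from the expected layout that is missing under DIR.
check_layout() {
  base=$1
  status=0
  for f in .env docker-compose.yaml \
           conf/hbase-env.sh conf/hbase-site.xml \
           conf/backup-masters conf/regionservers \
           conf/hadoop/core-site.xml conf/hadoop/hdfs-site.xml; do
    if [ ! -f "$base/$f" ]; then
      echo "missing: $f"
      status=1
    fi
  done
  return $status
}
```

`check_layout .` prints nothing and returns 0 when the layout is complete, so it can be chained in front of the start command with `&&`.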

Start:

docker-compose -f docker-compose.yaml up -d
# Check the status
docker-compose -f docker-compose.yaml ps

[root@cdh1 hbase]# docker-compose ps
Name                   Command                          State          Ports
--------------------------------------------------------------------------------
hbase-master-1         sh -c /opt/apache/bootstra ...   Up (healthy)   0.0.0.0:36010->16000/tcp,:::36010->16000/tcp, 0.0.0.0:36020->16010/tcp,:::36020->16010/tcp
hbase-master-2         sh -c /opt/apache/bootstra ...   Up (healthy)   0.0.0.0:36011->16000/tcp,:::36011->16000/tcp, 0.0.0.0:36021->16010/tcp,:::36021->16010/tcp
hbase-regionserver-1   sh -c /opt/apache/bootstra ...   Up (healthy)   0.0.0.0:36030->16020/tcp,:::36030->16020/tcp
hbase-regionserver-2   sh -c /opt/apache/bootstra ...   Up (healthy)   0.0.0.0:36031->16020/tcp,:::36031->16020/tcp
hbase-regionserver-3   sh -c /opt/apache/bootstra ...   Up (healthy)   0.0.0.0:36032->16020/tcp,:::36032->16020/tcp
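The "(healthy)" state comes from the healthcheck entries in the compose file, which pass once the expected port is listening. Instead of re-running ps by hand, the status can be polled. The sketch below is one way to do that with docker inspect; the function name `wait_healthy`, the default of 30 tries, and the 2-second pause are illustrative choices.

```shell
#!/bin/sh
# wait_healthy CONTAINER [TRIES]: poll the Docker health status until it reports "healthy".
wait_healthy() {
  container=$1
  tries=${2:-30}
  while [ "$tries" -gt 0 ]; do
    # Health status is one of: starting, healthy, unhealthy.
    status=$(docker inspect --format '{{.State.Health.Status}}' "$container" 2>/dev/null)
    if [ "$status" = "healthy" ]; then
      echo "$container is healthy"
      return 0
    fi
    tries=$((tries - 1))
    sleep 2
  done
  echo "$container did not become healthy" >&2
  return 1
}
```

For example, `wait_healthy hbase-master-1 && wait_healthy hbase-regionserver-1` blocks until both containers pass their checks or the tries run out.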

The cluster information can be viewed on the Master web page (Master: hbase-master-1):

[Screenshot: HBase Master web UI for hbase-master-1]
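According to the port mappings in the compose file, the web UI of hbase-master-1 (container port 16010) is published on host port 36020. The sketch below probes that page from the command line. It assumes the host is reachable as cdh1, as in the earlier URLs, and that the UI serves its status page at /master-status. The helper names are illustrative.

```shell
#!/bin/sh
# master_ui_url HOST: build the URL of the status page of hbase-master-1.
master_ui_url() {
  echo "http://$1:36020/master-status"
}

# ui_status HOST: print the HTTP status code returned by the Master web UI.
ui_status() {
  curl -s -o /dev/null -w '%{http_code}' "$(master_ui_url "$1")"
}
```

`ui_status cdh1` should print 200 once the master is serving its UI; the backup master hbase-master-2 is published on port 36021 in the same way.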
