Deploying a Big Data Environment on an ECS Cluster



This reference walkthrough deploys on a 4-node cluster; all operations are performed as the root user.

· Deploy the Hadoop environment


I. Machine Initialization

1. Bind the machine's NIC and disks (this step is only needed on eRDMA machines whose NIC and disks are not yet bound; on most machines they are already bound when the instance is created)

After the ECS instances are provisioned, each machine's NIC and disks need to be bound. The steps are as follows (the binding is successful when the check output matches the original screenshot, which is omitted here).

Bind/unbind script for the RDMA NIC:

#!/bin/bash
# Release the device (identified by its PCI BDF) from the iohub_sriov passthrough driver and bind it to virtio-pci.
bind() {
    echo "$1" > /sys/bus/pci/drivers/iohub_sriov/unbind
    echo "$1" > /sys/bus/pci/drivers/virtio-pci/bind
}
# Reverse operation: release the device from virtio-pci and return it to iohub_sriov.
ubind() {
    echo "$1" > /sys/bus/pci/drivers/virtio-pci/unbind
    echo "$1" > /sys/bus/pci/drivers/iohub_sriov/bind
}
main() {
    local op=$1
    local bdf=$2
    if [ "$bdf" == "all" ]; then
        # xdragon-bdf lists the PCI BDF addresses of all network devices
        for i in $(xdragon-bdf net all); do
            "$op" "$i"
        done
    else
        "$op" "$bdf"
    fi
}
main "$@"

Bind/unbind script for the disks:

#!/bin/bash
# Release the device (identified by its PCI BDF) from the iohub_sriov passthrough driver and bind it to virtio-pci.
bind() {
    echo "$1" > /sys/bus/pci/drivers/iohub_sriov/unbind
    echo "$1" > /sys/bus/pci/drivers/virtio-pci/bind
}
# Reverse operation: release the device from virtio-pci and return it to iohub_sriov.
ubind() {
    echo "$1" > /sys/bus/pci/drivers/virtio-pci/unbind
    echo "$1" > /sys/bus/pci/drivers/iohub_sriov/bind
}
main() {
    local op=$1
    local bdf=$2
    if [ "$bdf" == "all" ]; then
        # xdragon-bdf lists the PCI BDF addresses of all block devices
        for i in $(xdragon-bdf blk all); do
            "$op" "$i"
        done
    else
        "$op" "$bdf"
    fi
}
main "$@"

Log in to each machine, save the scripts, and run  bash ./<name> bind all  to bind the NIC and the disks.

After the disks are bound, format and mount them:

# Format the data disks as ext4

mkfs.ext4 /dev/vdb
mkfs.ext4 /dev/vdc
mkfs.ext4 /dev/vdd
mkfs.ext4 /dev/vde
mkfs.ext4 /dev/vdf
mkfs.ext4 /dev/vdg
mkfs.ext4 /dev/vdh
mkfs.ext4 /dev/vdi
mkfs.ext4 /dev/vdj
mkfs.ext4 /dev/vdk


# Create the mount points and mount the disks

mkdir -p /mnt/disk{1..10}

mount /dev/vdb /mnt/disk1
mount /dev/vdc /mnt/disk2
mount /dev/vdd /mnt/disk3
mount /dev/vde /mnt/disk4
mount /dev/vdf /mnt/disk5
mount /dev/vdg /mnt/disk6
mount /dev/vdh /mnt/disk7
mount /dev/vdi /mnt/disk8
mount /dev/vdj /mnt/disk9
mount /dev/vdk /mnt/disk10
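These mounts do not survive a reboot. If persistence is wanted, a minimal sketch is to append matching lines to /etc/fstab, assuming the same device names and ext4 filesystems as above (on cloud disks, UUID-based entries obtained from blkid are safer, since device letters can change):

# Optional: persist the mounts across reboots
cat >> /etc/fstab <<'EOF'
/dev/vdb /mnt/disk1  ext4 defaults,noatime 0 0
/dev/vdc /mnt/disk2  ext4 defaults,noatime 0 0
/dev/vdd /mnt/disk3  ext4 defaults,noatime 0 0
/dev/vde /mnt/disk4  ext4 defaults,noatime 0 0
/dev/vdf /mnt/disk5  ext4 defaults,noatime 0 0
/dev/vdg /mnt/disk6  ext4 defaults,noatime 0 0
/dev/vdh /mnt/disk7  ext4 defaults,noatime 0 0
/dev/vdi /mnt/disk8  ext4 defaults,noatime 0 0
/dev/vdj /mnt/disk9  ext4 defaults,noatime 0 0
/dev/vdk /mnt/disk10 ext4 defaults,noatime 0 0
EOF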





2. Configure passwordless SSH trust between the nodes

① Configure the /etc/hosts file on every node

(Screenshot omitted: /etc/hosts on every node maps each node's private IP to its hostname.)

② On each node, generate an RSA key pair with ssh-keygen:

ssh-keygen -q -t rsa  -N "" -f  ~/.ssh/id_rsa

③ Collect the public keys of all nodes into a single authorized_keys file (run from one node):

ssh 192.168.70.210 cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

ssh 192.168.70.213 cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

ssh 192.168.70.202 cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

ssh 192.168.70.201 cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

④ Distribute the aggregated authorized_keys file to the other machines:

scp ~/.ssh/authorized_keys 192.168.70.213:~/.ssh/

scp ~/.ssh/authorized_keys 192.168.70.202:~/.ssh/

scp ~/.ssh/authorized_keys 192.168.70.201:~/.ssh/
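A quick way to confirm the passwordless trust works (a sketch using the node IPs above): each command should print a hostname without asking for a password.

for ip in 192.168.70.210 192.168.70.213 192.168.70.202 192.168.70.201; do
    ssh -o BatchMode=yes "$ip" hostname
done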

3. Download the required packages

Download the Hadoop, JDK, Spark, and Hive packages into /opt. This walkthrough uses hadoop-2.7.7, spark-2.4.4, apache-hive-2.3.7, and jdk1.8.0_291 as examples.


Download Hadoop:   wget https://archive.apache.org/dist/hadoop/common/hadoop-2.7.7/hadoop-2.7.7.tar.gz

Download Spark:    wget https://archive.apache.org/dist/spark/spark-2.4.4/spark-2.4.4-bin-hadoop2.7.tgz

Download JDK:      wget https://download.oracle.com/otn/java/jdk/8u301-b09/d3c52aa6bfa54d3ca74e617f18309292/jdk-8u301-linux-x64.tar.gz

Download Hive:     wget https://archive.apache.org/dist/hive/hive-2.3.7/apache-hive-2.3.7-bin.tar.gz

(There are many JDK builds; the JDK link above (8u301) is only a reference, and you can download whichever Java 8 build you need from the official site. The rest of this document assumes jdk1.8.0_291, so adjust the JDK paths below to the version you actually install.)

Distribute the three archives (Hadoop, JDK, Spark) to the other three nodes and extract them there:

scp -r hadoop-2.7.7.tar.gz  jdk-8u291-linux-x64.tar.gz spark-2.4.4-bin-hadoop2.7  iZ8vbi4yik4sloxwhzcmxzZ:/opt

scp -r hadoop-2.7.7.tar.gz  jdk-8u291-linux-x64.tar.gz spark-2.4.4-bin-hadoop2.7  iZ8vb7zxw3jzrodh0htmgiZ:/opt

scp -r hadoop-2.7.7.tar.gz  jdk-8u291-linux-x64.tar.gz spark-2.4.4-bin-hadoop2.7  iZ8vb7zxw3jzrodh0htmgjZ:/opt

II. Configure the JDK

1. Extract the JDK archive

Extract the JDK package under /opt.

2. Configure environment variables

Add the following to /etc/profile:

 export JAVA_HOME=/opt/jdk1.8.0_291/

 export JRE_HOME=/opt/jdk1.8.0_291/jre

 export CLASSPATH=.:$JAVA_HOME/lib:$JRE_HOME/lib:$CLASSPATH

 export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$SPARK_HOME/bin:$HIVE_HOME/bin

Run  source /etc/profile  to apply the environment variables. (The PATH line also references HADOOP_HOME, SPARK_HOME and HIVE_HOME, which are set in later sections.)
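A quick check that the shell now picks up the JDK from /opt (the version string assumes the jdk1.8.0_291 build used in this walkthrough):

source /etc/profile
java -version          # should report java version "1.8.0_291"
which java             # should resolve to /opt/jdk1.8.0_291/bin/java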

III. Install Hadoop

1. Configure core-site.xml

📎core-site.xml

· The local path referenced in core-site.xml (typically hadoop.tmp.dir) must be created on every node; a minimal sketch of the file is shown below.
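The attached core-site.xml is not reproduced here. As a minimal sketch of what such a file usually contains for this layout: the hdfs://master1:9000 URI follows the spark.eventLog.dir setting used later in this document (master1 is presumably an /etc/hosts alias for the master node), and the hadoop.tmp.dir path is an assumption.

<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://master1:9000</value>
    </property>
    <property>
        <!-- local scratch directory; create it on every node -->
        <name>hadoop.tmp.dir</name>
        <value>/mnt/disk1/hadoop/tmp</value>
    </property>
</configuration>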

2. Configure hadoop-env.sh

📎hadoop-env.sh

Set the JAVA_HOME environment variable in it, e.g. export JAVA_HOME=/opt/jdk1.8.0_291/ (matching the JDK configured above).

3. Configure yarn-env.sh

📎yarn-env.sh

Change JAVA_HOME in it to this machine's JAVA_HOME path.

4. Configure hdfs-site.xml

📎hdfs-site.xml
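The attached hdfs-site.xml is likewise not reproduced. A hedged sketch of the settings that matter for this 4-node layout: the replication factor and directory names are assumptions, while the data directories follow the /mnt/disk1 … /mnt/disk10 mounts created earlier.

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>3</value>
    </property>
    <property>
        <!-- spread DataNode storage across the ten data disks mounted earlier -->
        <name>dfs.datanode.data.dir</name>
        <value>/mnt/disk1/hdfs/data,/mnt/disk2/hdfs/data,/mnt/disk3/hdfs/data,/mnt/disk4/hdfs/data,/mnt/disk5/hdfs/data,/mnt/disk6/hdfs/data,/mnt/disk7/hdfs/data,/mnt/disk8/hdfs/data,/mnt/disk9/hdfs/data,/mnt/disk10/hdfs/data</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>/mnt/disk1/hdfs/name</value>
    </property>
</configuration>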

5. Configure mapred-site.xml

📎mapred-site.xml

Set MapReduce to run on YARN (see the sketch below).
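A minimal sketch of what that amounts to in the attached mapred-site.xml:

<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>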

6. Configure yarn-site.xml

📎yarn-site.xml
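A hedged sketch of the attached yarn-site.xml; the ResourceManager hostname is assumed to be the master node used elsewhere in this walkthrough.

<configuration>
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>iZ8vbi4yik4sloxwhzcmxuZ</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
</configuration>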

7. Configure the slaves file

Add the worker hostnames to the slaves file (see the example below).

In Hadoop 3.0 and later the slaves file is renamed to workers.
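For reference, with the hostnames used in this walkthrough the file would look roughly as follows (whether the master also runs a DataNode/NodeManager is a deployment choice):

iZ8vbi4yik4sloxwhzcmxuZ
iZ8vbi4yik4sloxwhzcmxzZ
iZ8vb7zxw3jzrodh0htmgiZ
iZ8vb7zxw3jzrodh0htmgjZ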

8. Configure the Hadoop environment variables

Add to /etc/profile:

export HADOOP_HOME=/opt/hadoop-2.7.7/

export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop

9. Distribute to all nodes

Distribute /opt/hadoop-2.7.7 to the other nodes (e.g. with scp -r).

10. Format HDFS

Run on the master node:

hadoop namenode -format

11. Start Hadoop

On the master node, run the following under /opt/hadoop-2.7.7/:

sbin/start-dfs.sh

sbin/start-yarn.sh

After they finish, run jps; if the master shows the expected daemons (NameNode, SecondaryNameNode and ResourceManager, as in the original screenshot), the configuration succeeded.

IV. Install Spark

1. Configure environment variables

Add the following to /etc/profile (vi /etc/profile):

export SPARK_HOME=/opt/spark-2.4.4-bin-hadoop2.7

Distribute /etc/profile to the other nodes.

2. Configure the slaves file

Add the worker hostnames to slaves:

vi /opt/spark-2.4.4-bin-hadoop2.7/conf/slaves

3. Configure spark-env.sh

Under /opt/spark-2.4.4-bin-hadoop2.7/conf, run:

cp spark-env.sh.template spark-env.sh

Add the following to spark-env.sh (SPARK_LOCAL_DIRS must exist on every node; see the sketch after the exports):

export SPARK_MASTER_HOST=iZ8vbi4yik4sloxwhzcmxuZ

export SPARK_LOCAL_IP=`/sbin/ip addr show eth0 | grep "inet\b" | awk '{print $2}' | cut -d/ -f1`

export SPARK_LOCAL_DIRS=/mnt/data/spark_tmp

export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop/

export SPARK_DAEMON_MEMORY=10g
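A small sketch to create the SPARK_LOCAL_DIRS scratch directory on every node, using the hostnames that appear elsewhere in this document:

for h in iZ8vbi4yik4sloxwhzcmxuZ iZ8vbi4yik4sloxwhzcmxzZ iZ8vb7zxw3jzrodh0htmgiZ iZ8vb7zxw3jzrodh0htmgjZ; do
    ssh "$h" mkdir -p /mnt/data/spark_tmp
done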


4. Configure spark-defaults.conf (create it from conf/spark-defaults.conf.template if it does not exist)

Note: several of the AQE/CBO settings below only exist in Spark 3.x or in vendor Spark builds; Spark 2.4.4 simply ignores keys it does not recognize.

#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements.  See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License.  You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
# Default system properties included when running spark-submit.
# This is useful for setting default environmental settings.
# Example:
# spark.master                     spark://master1:7077
# spark.eventLog.enabled           true
# spark.eventLog.dir               hdfs://namenode:8021/directory
# spark.serializer                 org.apache.spark.serializer.KryoSerializer
# spark.driver.memory              5g
# spark.executor.extraJavaOptions  -XX:+PrintGCDetails -Dkey=value -Dnumbers="one two three"
spark.master yarn
spark.submit.deployMode client
#driver
spark.driver.cores 4
spark.driver.memory 10g
spark.driver.maxResultSize 10g
##executor
spark.executor.instances 45
spark.executor.memory 13g
spark.executor.cores 4
#shuffle
spark.task.maxFailures 4
spark.default.parallelism  180
spark.sql.shuffle.partitions 180
spark.shuffle.compress true
spark.shuffle.spill.compress true
#other
spark.kryoserializer.buffer 640k
spark.memory.storageFraction 0.5
spark.shuffle.file.buffer 32k
spark.kryoserializer.buffer.max 2000m
spark.serializer org.apache.spark.serializer.KryoSerializer
spark.memory.fraction 0.6
spark.network.timeout 3600
spark.sql.broadcastTimeout 3600
spark.locality.wait 0s
#speculation
#spark.speculation=true
#spark.speculation.interval=300s
#spark.speculation.quantile=0.9
#spark.speculation.multiplier=1.5
#aqe
spark.sql.adaptive.enabled true
spark.sql.autoBroadcastJoinThreshold 128m
spark.sql.adaptive.advisoryPartitionSizeInBytes 128MB
spark.sql.adaptive.coalescePartitions.minPartitionNum 1
spark.sql.adaptive.coalescePartitions.initialPartitionNum 180
spark.sql.adaptive.forceApply true
spark.sql.adaptive.coalescePartitions.enabled   true
spark.sql.adaptive.localShuffleReader.enabled   true
spark.sql.adaptive.skewJoin.enabled     true
spark.sql.adaptive.skewJoin.skewedPartitionFactor 5
spark.sql.adaptive.skewJoin.skewedPartitionThresholdInBytes 256m
#DF
#spark.sql.optimizer.dynamicPartitionPruning.enabled  false
#spark.sql.optimizer.dynamicPartitionPruning.reuseBroadcastOnly true
spark.sql.optimizer.dynamicDataPruning.pruningSideThreshold 10GB
#spark.sql.optimizer.dynamicDataPruning.enabled false
#cbo
#spark.sql.statistics.histogram.enabled true
#spark.sql.statistics.histogram.numBins 32
spark.sql.cbo.enabled true
spark.sql.cbo.joinReorder.enabled true
spark.sql.cbo.planStats.enabled  true
spark.sql.cbo.starSchemaDetection true
spark.sql.cbo.joinReorder.card.weight  0.6
spark.sql.cbo.joinReorder.ga.enabled true
spark.sql.autoBroadcastJoinRowThreshold 500000
#log
spark.eventLog.enabled true
spark.eventLog.dir hdfs://master1:9000/sparklogs
spark.eventLog.compress true
# log4j verbosity is configured in conf/log4j.properties rather than here; the equivalent entry there would be:
# log4j.logger.org.apache.spark.storage.ShuffleBlockFetcherIterator=TRACE



Create the /sparklogs directory in HDFS (it backs spark.eventLog.dir):

hadoop fs -mkdir /sparklogs

Distribute /opt/spark-2.4.4-bin-hadoop2.7 to the other nodes.

5. Start Spark

Under /opt/spark-2.4.4-bin-hadoop2.7, run:

sbin/start-all.sh

If jps shows the expected Master and Worker processes (as in the original screenshot), Spark started successfully.
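As an optional smoke test (a sketch that assumes the stock examples jar shipped with the Spark 2.4.4 binary distribution), submit the bundled SparkPi job to YARN:

cd /opt/spark-2.4.4-bin-hadoop2.7
bin/spark-submit --class org.apache.spark.examples.SparkPi \
    --master yarn --deploy-mode client \
    examples/jars/spark-examples_2.11-2.4.4.jar 100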


V. Install MySQL

1. Download the package (version 8.0.21 as an example)

# Download

wget https://dev.mysql.com/get/Downloads/MySQL-8.0/mysql-8.0.21-linux-glibc2.12-x86_64.tar.xz

# Extract

tar -xf mysql-8.0.21-linux-glibc2.12-x86_64.tar.xz

2. Set up the MySQL directory

# Move the extracted directory to /usr/local and rename it to mysql

 mv mysql-8.0.21-linux-glibc2.12-x86_64 /usr/local/mysql

3. Create the data directory and set permissions

cd /usr/local/mysql/

# Create the directory

mkdir data

# Set ownership and permissions

chown -R root:root /usr/local/mysql

chmod -R 755  /usr/local/mysql

4. Initialize the database (the generated temporary root password is written to the error log configured in my.cnf below)

/usr/local/mysql/bin/mysqld --initialize --user=root --basedir=/usr/local/mysql --datadir=/usr/local/mysql/data
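The temporary root password can then be read from the error log (the path assumes the log-error setting used in the my.cnf below):

grep 'temporary password' /usr/local/mysql/data/error.log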

5. Configure my.cnf

cd /usr/local/mysql/support-files/
touch my-default.cnf
chmod 777 my-default.cnf
cp /usr/local/mysql/support-files/my-default.cnf /etc/my.cnf
vi /etc/my.cnf
# Add the following to my.cnf:
# Remove leading # and set to the amount of RAM for the most important data
# cache in MySQL. Start at 70% of total RAM for dedicated server, else 10%.
# innodb_buffer_pool_size = 128M
# Remove leading # to turn on a very important data integrity option: logging
# changes to the binary log between backups.
# log_bin
# These are commonly set, remove the # and set as required.
[mysqld]
basedir = /usr/local/mysql
datadir = /usr/local/mysql/data
socket = /tmp/mysql.sock
#socket =/var/lib/mysql/mysql.socket
log-error = /usr/local/mysql/data/error.log
pid-file = /usr/local/mysql/data/mysql.pid
user = root
tmpdir = /tmp
port = 3306
#skip-grant-tables
#lower_case_table_names = 1
# server_id = .....
# socket = .....
#lower_case_table_names = 1
max_allowed_packet=32M
default-authentication-plugin = mysql_native_password
#lower_case_file_system = on
#lower_case_table_names = 1
log_bin_trust_function_creators = ON
# Remove leading # to set options mainly useful for reporting servers.
# The server defaults are faster for transactions and fast SELECTs.
# Adjust sizes as needed, experiment to find the optimal values.
# join_buffer_size = 128M
# sort_buffer_size = 2M
# read_rnd_buffer_size = 2M
sql_mode=NO_ENGINE_SUBSTITUTION,STRICT_TRANS_TABLES


6. Enable start on boot

cd /usr/local/mysql/support-files

cp mysql.server /etc/init.d/mysql

chmod +x /etc/init.d/mysql

7. Register the service and verify

# Register

chkconfig --add mysql

# Verify

chkconfig --list mysql


### On Ubuntu there is no chkconfig; sysv-rc-conf can be used instead:
echo 'deb http://archive.ubuntu.com/ubuntu/ trusty main universe restricted multiverse' >>/etc/apt/sources.list
sudo apt-get update
sudo apt-get install sysv-rc-conf
sysv-rc-conf --list




8. Configure /etc/ld.so.conf

vim /etc/ld.so.conf

# Add the following line:

/usr/local/mysql/lib

# Then refresh the dynamic linker cache:
ldconfig

9. Configure environment variables

vim /etc/profile

# Add the following:

# MYSQL ENVIRONMENT

export PATH=$PATH:/usr/local/mysql/bin:/usr/local/mysql/lib


source /etc/profile


10. Start MySQL

service mysql start

11. Log in to MySQL

mysql -uroot -p

Log in with the temporary password, change the root password, and enable remote login for root.

# Possible error: Your password does not satisfy the current policy requirements

# If so, the password policy can be relaxed (MySQL 8.0):

set global validate_password.policy=0;

set global validate_password.length=1;

Then change the password:

mysql> alter user user() identified by "123456";
Query OK, 0 rows affected (0.01 sec)
mysql> ALTER user 'root'@'localhost' IDENTIFIED BY '123456';
Query OK, 0 rows affected (0.00 sec)
mysql> use mysql;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A
Database changed
mysql> update user set host = '%' where user = 'root';
Query OK, 1 row affected (0.00 sec)
Rows matched: 1  Changed: 1  Warnings: 0
mysql> FLUSH PRIVILEGES;
Query OK, 0 rows affected (0.00 sec)
Grant privileges to the hive user (in MySQL 8.0 the user must already exist; it is created in the Hive section below):
grant all privileges on  *.* to 'hive'@'%';





VI. Install Hive

This section uses hive-2.3.7 as an example.

1. Extract the archive under /opt

2. Configure environment variables

vim /etc/profile

#hive

export HIVE_HOME=/opt/apache-hive-2.3.7-bin

3. Configure hive-site.xml

vim /opt/apache-hive-2.3.7-bin/conf/hive-site.xml

📎hive-site.xml

4. Copy the MySQL JDBC driver

# Download mysql-connector-java-8.0.21.zip:

https://dev.mysql.com/downloads/file/?id=496589

Extract the downloaded mysql-connector-java-8.0.21.zip and copy the connector jar into /opt/apache-hive-2.3.7-bin/lib.

5. Initialize the Hive metastore schema

Run:

schematool -dbType mysql -initSchema

[Error]

[Fix]

1. java.lang.NoSuchMethodError

Cause: either the required jar is missing from the classpath, or

multiple versions of the same jar are present and the JVM cannot decide which one to use.

2. com.google.common.base.Preconditions.checkArgument

This class comes from guava.jar.

3. Check the guava version shipped with Hadoop and with Hive.

In hadoop-3.2.1 (path: /opt/hadoop-3.2.1/share/hadoop/common/lib) the jar is guava-27.0-jre.jar.

In hive-2.3.6 (path: hive/lib) the jar is guava-14.0.1.jar.

4. Solution

Delete the older guava-14.0.1.jar from Hive's lib directory and copy guava-27.0-jre.jar from Hadoop's lib directory into it. (The versions above were observed on a hadoop-3.2.1 / hive-2.3.6 setup; adjust to the versions you actually installed.)
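A sketch of that fix using the paths mentioned in this note (the Hive path below follows the apache-hive-2.3.7-bin install used in this document; adjust both paths and jar versions to your actual Hadoop and Hive installations):

rm /opt/apache-hive-2.3.7-bin/lib/guava-14.0.1.jar
cp /opt/hadoop-3.2.1/share/hadoop/common/lib/guava-27.0-jre.jar /opt/apache-hive-2.3.7-bin/lib/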

[Error]

[Fix]

Modify the hive-site.xml configuration (make sure ConnectionUserName/ConnectionPassword below match the MySQL account you actually created, e.g. root or the hive user from step 6):

<configuration>
    <property>
        <name>javax.jdo.option.ConnectionUserName</name>
        <value>root</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionPassword</name>
        <value>root</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionURL</name>
        <value>jdbc:mysql://localhost:3306/hive?createDatabaseIfNotExist=true</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionDriverName</name>
        <value>com.mysql.cj.jdbc.Driver</value>
    </property>
    <property>
        <name>datanucleus.schema.autoCreateAll</name>
        <value>true</value>
    </property>
    <property>
        <name>hive.metastore.schema.verification</name>
        <value>false</value>
    </property>
</configuration>



6. Create a hive user in MySQL

mysql> create user 'hive'@'localhost' identified by '123456';
Query OK, 0 rows affected (0.00 sec)
mysql> grant all privileges on *.* to 'hive'@'localhost';
Query OK, 0 rows affected (0.00 sec)
mysql> create user 'hive'@'%' identified by '123456';
Query OK, 0 rows affected (0.00 sec)
mysql> grant all privileges on *.* to 'hive'@'%';
Query OK, 0 rows affected (0.00 sec)
mysql> flush privileges;
Query OK, 0 rows affected (0.00 sec)

7. Start Hive

Run:

hive
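As a quick sanity check (a sketch), run a trivial statement from the Hive CLI; it only succeeds if the MySQL metastore and HDFS are reachable:

hive -e "show databases;"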



