This reference deployment uses a 4-node machine cluster; all operations are performed as the root user.
· Deploying the Hadoop environment
I. Machine Initialization
1. Bind the machine's NICs and disks (this step is only needed on eRDMA machines whose NICs and disks are not yet bound; on most machines they are already bound when provisioned)
If the device listing looks as in the figure, the binding has already succeeded.
After the ECS machines are allocated, bind their NICs and disks as follows.
Script to bind/unbind the RDMA NICs:
#!/bin/bash
# Usage: bash <script> bind|ubind <BDF|all>
# Move a device from the iohub_sriov driver to virtio-pci
bind() {
    echo $1 > /sys/bus/pci/drivers/iohub_sriov/unbind
    echo $1 > /sys/bus/pci/drivers/virtio-pci/bind
}
# Reverse: move the device back from virtio-pci to iohub_sriov
ubind() {
    echo $1 > /sys/bus/pci/drivers/virtio-pci/unbind
    echo $1 > /sys/bus/pci/drivers/iohub_sriov/bind
}
main() {
    local op=$1
    local bdf=$2
    if [ "$bdf" == "all" ]; then
        # Apply the operation to every network device BDF reported by xdragon-bdf
        for i in `xdragon-bdf net all`; do
            eval $op $i
        done
    else
        eval $op $bdf
    fi
}
main $@
Script to bind/unbind the disks:
#!/bin/bash
# Usage: bash <script> bind|ubind <BDF|all>
# Move a device from the iohub_sriov driver to virtio-pci
bind() {
    echo $1 > /sys/bus/pci/drivers/iohub_sriov/unbind
    echo $1 > /sys/bus/pci/drivers/virtio-pci/bind
}
# Reverse: move the device back from virtio-pci to iohub_sriov
ubind() {
    echo $1 > /sys/bus/pci/drivers/virtio-pci/unbind
    echo $1 > /sys/bus/pci/drivers/iohub_sriov/bind
}
main() {
    local op=$1
    local bdf=$2
    if [ "$bdf" == "all" ]; then
        # Apply the operation to every block device BDF reported by xdragon-bdf
        for i in `xdragon-bdf blk all`; do
            eval $op $i
        done
    else
        eval $op $bdf
    fi
}
main $@
Log in to each machine, save the scripts, and run bash ./<name> bind all to bind the NICs and disks.
After the disks are bound, format and mount them:
# Format
mkfs.ext4 /dev/vdb
mkfs.ext4 /dev/vdc
mkfs.ext4 /dev/vdd
mkfs.ext4 /dev/vde
mkfs.ext4 /dev/vdf
mkfs.ext4 /dev/vdg
mkfs.ext4 /dev/vdh
mkfs.ext4 /dev/vdi
mkfs.ext4 /dev/vdj
mkfs.ext4 /dev/vdk
# Mount (create the mount points first if they do not exist, e.g. mkdir -p /mnt/disk{1..10})
mount /dev/vdb /mnt/disk1
mount /dev/vdc /mnt/disk2
mount /dev/vdd /mnt/disk3
mount /dev/vde /mnt/disk4
mount /dev/vdf /mnt/disk5
mount /dev/vdg /mnt/disk6
mount /dev/vdh /mnt/disk7
mount /dev/vdi /mnt/disk8
mount /dev/vdj /mnt/disk9
mount /dev/vdk /mnt/disk10
2. Configure passwordless SSH trust between the cluster nodes
① Configure the /etc/hosts file on every node, as sketched below.
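A sketch of the /etc/hosts entries, assuming the four IPs used in the SSH steps below and the instance hostnames used elsewhere in this document; the IP-to-hostname mapping and the master1 alias (referenced later by spark-defaults.conf) are assumptions that must match your actual cluster:
192.168.70.210 iZ8vbi4yik4sloxwhzcmxuZ master1
192.168.70.213 iZ8vbi4yik4sloxwhzcmxzZ
192.168.70.202 iZ8vb7zxw3jzrodh0htmgiZ
192.168.70.201 iZ8vb7zxw3jzrodh0htmgjZ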
② On each node, generate an RSA key pair with ssh-keygen:
ssh-keygen -q -t rsa -N "" -f ~/.ssh/id_rsa
③ Collect every node's public key into a single authorized_keys file (run on one node):
ssh 192.168.70.210 cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
ssh 192.168.70.213 cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
ssh 192.168.70.202 cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
ssh 192.168.70.201 cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
④ Distribute the combined authorized_keys file to the other machines:
scp ~/.ssh/authorized_keys 192.168.70.213:~/.ssh/
scp ~/.ssh/authorized_keys 192.168.70.202:~/.ssh/
scp ~/.ssh/authorized_keys 192.168.70.201:~/.ssh/
3. Download the required packages
Download the Hadoop, JDK, Spark, and Hive packages to /opt. This document uses hadoop-2.7.7, spark-2.4.4, jdk1.8.0_291, and hive-2.3.7 as examples.
Download Hadoop: wget https://archive.apache.org/dist/hadoop/common/hadoop-2.7.7/hadoop-2.7.7.tar.gz
Download Spark: wget https://archive.apache.org/dist/spark/spark-2.4.4/spark-2.4.4-bin-hadoop2.7.tgz
Download JDK: wget https://download.oracle.com/otn/java/jdk/8u301-b09/d3c52aa6bfa54d3ca74e617f18309292/jdk-8u301-linux-x64.tar.gz
Download Hive: wget https://archive.apache.org/dist/hive/hive-2.3.7/apache-hive-2.3.7-bin.tar.gz
(JDK builds are numerous, so this download link is only a reference; download the JDK version you need from the official site and adjust the JDK configuration below to the version actually installed.)
Distribute the three archives (Hadoop, JDK, Spark) to the other three nodes and extract them:
scp -r hadoop-2.7.7.tar.gz jdk-8u291-linux-x64.tar.gz spark-2.4.4-bin-hadoop2.7 iZ8vbi4yik4sloxwhzcmxzZ:/opt
scp -r hadoop-2.7.7.tar.gz jdk-8u291-linux-x64.tar.gz spark-2.4.4-bin-hadoop2.7 iZ8vb7zxw3jzrodh0htmgiZ:/opt
scp -r hadoop-2.7.7.tar.gz jdk-8u291-linux-x64.tar.gz spark-2.4.4-bin-hadoop2.7 iZ8vb7zxw3jzrodh0htmgjZ:/opt
II. Configure the JDK
1. Extract the JDK archive
Extract the JDK package under /opt.
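For example (use the archive name of the JDK version actually downloaded):
cd /opt
tar -zxvf jdk-8u291-linux-x64.tar.gz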
2. Configure environment variables
Add the following to /etc/profile:
export JAVA_HOME=/opt/jdk1.8.0_291/
export JRE_HOME=/opt/jdk1.8.0_291/jre
export CLASSPATH=.:$JAVA_HOME/lib:$JRE_HOME/lib:$CLASSPATH
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$SPARK_HOME/bin:$HIVE_HOME/bin
Run source /etc/profile to make the environment variables take effect.
III. Install Hadoop
1. Configure core-site.xml
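A minimal core-site.xml sketch, assuming the NameNode runs on the master node (aliased master1, matching the hdfs://master1:9000 URI used in spark-defaults.conf below) and that hadoop.tmp.dir sits on one of the mounted data disks; adjust both values to your cluster:
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://master1:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/mnt/disk1/hadoop/tmp</value>
  </property>
</configuration>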
· The path configured in this file must be created on every node.
2. Configure hadoop-env.sh
Export the JAVA_HOME environment variable (see the example after step 3).
3. Configure yarn-env.sh
Set JAVA_HOME to the JDK path on this machine.
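In both hadoop-env.sh and yarn-env.sh, set JAVA_HOME explicitly to the JDK path configured earlier, for example:
export JAVA_HOME=/opt/jdk1.8.0_291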
4. Configure hdfs-site.xml
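A minimal hdfs-site.xml sketch; the replication factor and the NameNode/DataNode directories on the mounted disks are assumptions to adjust to your cluster:
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/mnt/disk1/hadoop/dfs/name</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/mnt/disk1/hadoop/dfs/data,/mnt/disk2/hadoop/dfs/data</value>
  </property>
</configuration>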
5. Configure mapred-site.xml
Configure MapReduce to run under YARN management.
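A minimal sketch of the YARN setting described above (in Hadoop 2.x, copy mapred-site.xml.template to mapred-site.xml first):
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>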
6. Configure yarn-site.xml
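A minimal yarn-site.xml sketch, assuming the ResourceManager runs on the master node (the master1 alias is an assumption); memory and vcore limits should be tuned to the actual machines:
<configuration>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>master1</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>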
7. Configure the slaves file
Add the worker hostnames to the slaves file (see the sketch below).
Note: in Hadoop 3.0 and later the slaves file is renamed to workers.
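A sketch of the slaves file, assuming the three non-master instances (the scp targets above) serve as the DataNode/NodeManager workers:
iZ8vbi4yik4sloxwhzcmxzZ
iZ8vb7zxw3jzrodh0htmgiZ
iZ8vb7zxw3jzrodh0htmgjZ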
8. Configure the Hadoop environment variables
Add the following to /etc/profile:
export HADOOP_HOME=/opt/hadoop-2.7.7/
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
9. Distribute to all nodes
Distribute /opt/hadoop-2.7.7 to the remaining nodes.
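For example, using the worker hostnames from the earlier scp commands (adjust to your own hostnames):
scp -r /opt/hadoop-2.7.7 iZ8vbi4yik4sloxwhzcmxzZ:/opt
scp -r /opt/hadoop-2.7.7 iZ8vb7zxw3jzrodh0htmgiZ:/opt
scp -r /opt/hadoop-2.7.7 iZ8vb7zxw3jzrodh0htmgjZ:/opt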
10. Format HDFS
Run on the master node:
hadoop namenode -format
11. Start Hadoop
On the master node, run the following under /opt/hadoop-2.7.7/:
sbin/start-dfs.sh
sbin/start-yarn.sh
When both scripts have finished, run jps; if the master node shows the processes in the figure below, the configuration succeeded.
IV. Install Spark
1. Configure environment variables
Edit /etc/profile (vi /etc/profile) and add:
export SPARK_HOME=/opt/spark-2.4.4-bin-hadoop2.7
Distribute /etc/profile to the remaining nodes.
2. Configure the slaves file
Add the worker hostnames to slaves:
vi /opt/spark-2.4.4-bin-hadoop2.7/conf/slaves
3. Configure spark-env.sh
Under /opt/spark-2.4.4-bin-hadoop2.7/conf, run:
cp spark-env.sh.template spark-env.sh
Add the following to spark-env.sh:
export SPARK_MASTER_HOST=iZ8vbi4yik4sloxwhzcmxuZ
export SPARK_LOCAL_IP=`/sbin/ip addr show eth0 | grep "inet\b" | awk '{print $2}' | cut -d/ -f1`
export SPARK_LOCAL_DIRS=/mnt/data/spark_tmp
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop/
export SPARK_DAEMON_MEMORY=10g
4. Configure spark-defaults.conf
spark.master yarn
spark.submit.deployMode client
#driver
spark.driver.cores 4
spark.driver.memory 10g
spark.driver.maxResultSize 10g
##executor
spark.executor.instances 45
spark.executor.memory 13g
spark.executor.cores 4
#shuffle
spark.task.maxFailures 4
spark.default.parallelism 180
spark.sql.shuffle.partitions 180
spark.shuffle.compress true
spark.shuffle.spill.compress true
#other
spark.kryoserializer.buffer 640k
spark.memory.storageFraction 0.5
spark.shuffle.file.buffer 32k
spark.kryoserializer.buffer.max 2000m
spark.serializer org.apache.spark.serializer.KryoSerializer
spark.memory.fraction 0.6
spark.network.timeout 3600
spark.sql.broadcastTimeout 3600
spark.locality.wait 0s
#speculation
#spark.speculation=true
#spark.speculation.interval=300s
#spark.speculation.quantile=0.9
#spark.speculation.multiplier=1.5
#aqe
spark.sql.adaptive.enabled true
spark.sql.autoBroadcastJoinThreshold 128m
spark.sql.adaptive.advisoryPartitionSizeInBytes 128MB
spark.sql.adaptive.coalescePartitions.minPartitionNum 1
spark.sql.adaptive.coalescePartitions.initialPartitionNum 180
spark.sql.adaptive.forceApply true
spark.sql.adaptive.coalescePartitions.enabled true
spark.sql.adaptive.localShuffleReader.enabled true
spark.sql.adaptive.skewJoin.enabled true
spark.sql.adaptive.skewJoin.skewedPartitionFactor 5
spark.sql.adaptive.skewJoin.skewedPartitionThresholdInBytes 256m
#DF
#spark.sql.optimizer.dynamicPartitionPruning.enabled false
#spark.sql.optimizer.dynamicPartitionPruning.reuseBroadcastOnly true
spark.sql.optimizer.dynamicDataPruning.pruningSideThreshold 10GB
#spark.sql.optimizer.dynamicDataPruning.enabled false
#cbo
#spark.sql.statistics.histogram.enabled true
#spark.sql.statistics.histogram.numBins 32
spark.sql.cbo.enabled true
spark.sql.cbo.joinReorder.enabled true
spark.sql.cbo.planStats.enabled true
spark.sql.cbo.starSchemaDetection true
spark.sql.cbo.joinReorder.card.weight 0.6
spark.sql.cbo.joinReorder.ga.enabled true
spark.sql.autoBroadcastJoinRowThreshold 500000
#log
spark.eventLog.enabled true
spark.eventLog.dir hdfs://master1:9000/sparklogs
spark.eventLog.compress true
log4j.logger.org.apache.storage.ShuffleBlockFetcherIterator TRACE
Create the /sparklogs directory in HDFS:
hadoop fs -mkdir /sparklogs
Distribute /opt/spark-2.4.4-bin-hadoop2.7 to the remaining nodes.
5. Start Spark
Under /opt/spark-2.4.4-bin-hadoop2.7, run:
sbin/start-all.sh
If the jps output matches the figure below, Spark started successfully.
V. Install MySQL
1. Download the package (version 8.0.21 as an example)
# Download
wget https://dev.mysql.com/get/Downloads/MySQL-8.0/mysql-8.0.21-linux-glibc2.12-x86_64.tar.xz
# Extract
tar -xf mysql-8.0.21-linux-glibc2.12-x86_64.tar.xz
2. Set up the MySQL directory
# Move the extracted directory to /usr/local and rename it to mysql
mv mysql-8.0.21-linux-glibc2.12-x86_64 /usr/local/mysql
3. Create the data directory and set permissions
cd /usr/local/mysql/
# Create the directory
mkdir data
# Set ownership and permissions on the directory
chown -R root:root /usr/local/mysql
chmod -R 755 /usr/local/mysql
4. Initialize the database (the temporary root password is printed at the end of the initialization output)
/usr/local/mysql/bin/mysqld --initialize --user=root --basedir=/usr/local/mysql --datadir=/usr/local/mysql/data
5. Configure my.cnf
cd /usr/local/mysql/support-files/
touch my-default.cnf
chmod 777 my-default.cnf
cp /usr/local/mysql/support-files/my-default.cnf /etc/my.cnf
vi /etc/my.cnf
# Add the following to my.cnf:
# Remove leading # and set to the amount of RAM for the most important data
# cache in MySQL. Start at 70% of total RAM for dedicated server, else 10%.
# innodb_buffer_pool_size = 128M
# Remove leading # to turn on a very important data integrity option: logging
# changes to the binary log between backups.
# log_bin
# These are commonly set, remove the # and set as required.
[mysqld]
basedir = /usr/local/mysql
datadir = /usr/local/mysql/data
socket = /tmp/mysql.sock
#socket =/var/lib/mysql/mysql.socket
log-error = /usr/local/mysql/data/error.log
pid-file = /usr/local/mysql/data/mysql.pid
user = root
tmpdir = /tmp
port = 3306
#skip-grant-tables
#lower_case_table_names = 1
# server_id = .....
# socket = .....
max_allowed_packet=32M
default-authentication-plugin = mysql_native_password
#lower_case_file_system = on
log_bin_trust_function_creators = ON
# Remove leading # to set options mainly useful for reporting servers.
# The server defaults are faster for transactions and fast SELECTs.
# Adjust sizes as needed, experiment to find the optimal values.
# join_buffer_size = 128M
# sort_buffer_size = 2M
# read_rnd_buffer_size = 2M
sql_mode=NO_ENGINE_SUBSTITUTION,STRICT_TRANS_TABLES
6. Enable start on boot
cd /usr/local/mysql/support-files
cp mysql.server /etc/init.d/mysql
chmod +x /etc/init.d/mysql
7. Register the service and verify
# Register the service
chkconfig --add mysql
# Check the service
chkconfig --list mysql
### On Ubuntu:
echo 'deb http://archive.ubuntu.com/ubuntu/ trusty main universe restricted multiverse' >>/etc/apt/sources.list
sudo apt-get update
sudo apt-get install sysv-rc-conf
sysv-rc-conf --list
8. Configure /etc/ld.so.conf
vim /etc/ld.so.conf
# Add the following line:
/usr/local/mysql/lib
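After saving the file, refresh the shared-library cache so the new path takes effect:
ldconfig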
9. Configure environment variables
vim /etc/profile
# Add the following:
# MYSQL ENVIRONMENT
export PATH=$PATH:/usr/local/mysql/bin:/usr/local/mysql/lib
source /etc/profile
10. Start MySQL
service mysql start
11. Log in to MySQL
mysql -uroot -p
Log in with the temporary password (printed during initialization), change the root password, and enable remote login for root.
# Error: Your password does not satisfy the current policy requirements
# If needed, relax the password policy (MySQL 8.0) to allow a weak password:
set global validate_password.policy=0;
set global validate_password.length=1;
Then change the password:
mysql> alter user user() identified by "123456";
Query OK, 0 rows affected (0.01 sec)
mysql> ALTER user 'root'@'localhost' IDENTIFIED BY '123456';
Query OK, 0 rows affected (0.00 sec)
mysql> use mysql;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A
Database changed
mysql> update user set host = '%' where user = 'root';
Query OK, 1 row affected (0.00 sec)
Rows matched: 1  Changed: 1  Warnings: 0
mysql> FLUSH PRIVILEGES;
Query OK, 0 rows affected (0.00 sec)
Grant privileges to the hive user (in MySQL 8.0 the user must already exist; it is created in section VI below):
grant all privileges on *.* to 'hive'@'%';
VI. Install Hive
Using hive-2.3.7 as an example.
1. Extract the archive under /opt
2. Configure environment variables
vim /etc/profile
#hive
export HIVE_HOME=/opt/apache-hive-2.3.7-bin
3. Configure hive-site.xml (a complete example configuration is shown in the troubleshooting notes below)
vim /opt/apache-hive-2.3.7-bin/conf/hive-site.xml
4. Copy the MySQL JDBC driver
# Download mysql-connector-java-8.0.21.zip
https://dev.mysql.com/downloads/file/?id=496589
Extract the downloaded mysql-connector-java-8.0.21.zip and copy the connector JAR into /opt/apache-hive-2.3.7-bin/lib.
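A sketch of the copy, assuming the archive unpacks to the usual mysql-connector-java-8.0.21/ directory containing the JAR:
unzip mysql-connector-java-8.0.21.zip
cp mysql-connector-java-8.0.21/mysql-connector-java-8.0.21.jar /opt/apache-hive-2.3.7-bin/lib/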
5. Initialize Hive
Run:
schematool -dbType mysql -initSchema
[Error]
[Fix]
1) java.lang.NoSuchMethodError
Cause: either the required jar cannot be found, or multiple versions of the same jar are on the classpath and the JVM cannot decide which one to use.
2) com.google.common.base.Preconditions.checkArgument
This class is provided by guava.jar.
3) Check the guava version shipped with Hadoop and with Hive
In hadoop-3.2.1 (path: /opt/hadoop-3.2.1/share/hadoop/common/lib) the jar is guava-27.0-jre.jar.
In hive-2.3.6 (path: hive/lib) the jar is guava-14.0.1.jar.
4) Solution
Delete the lower-version guava-14.0.1.jar from Hive's lib directory and copy guava-27.0-jre.jar from Hadoop's lib directory into it.
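For example, with the paths mentioned above (adjust them to the Hadoop and Hive versions actually installed, e.g. /opt/apache-hive-2.3.7-bin/lib for the Hive install used in this document):
rm /opt/apache-hive-2.3.7-bin/lib/guava-14.0.1.jar
cp /opt/hadoop-3.2.1/share/hadoop/common/lib/guava-27.0-jre.jar /opt/apache-hive-2.3.7-bin/lib/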
[Error]
[Fix]
Modify the hive-site.xml configuration:
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>root</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>root</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://localhost:3306/hive?createDatabaseIfNotExist=true</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.cj.jdbc.Driver</value>
  </property>
  <property>
    <name>datanucleus.schema.autoCreateAll</name>
    <value>true</value>
  </property>
  <property>
    <name>hive.metastore.schema.verification</name>
    <value>false</value>
  </property>
</configuration>
6. Create the hive user in MySQL
mysql> create user 'hive'@'localhost' identified by '123456';
Query OK, 0 rows affected (0.00 sec)
mysql> grant all privileges on *.* to 'hive'@'localhost';
Query OK, 0 rows affected (0.00 sec)
mysql> create user 'hive'@'%' identified by '123456';
Query OK, 0 rows affected (0.00 sec)
mysql> grant all privileges on *.* to 'hive'@'%';
Query OK, 0 rows affected (0.00 sec)
mysql> flush privileges;
Query OK, 0 rows affected (0.00 sec)
7. Start Hive
Run:
hive