Two ways to set up a big data environment: Ambari, and manual installation from the Apache tarballs


1. Ambari installation



Ambari & HDP(Hortonworks Data Platform)
*****************************************************************************************************
Base:
0. Match the OS release to the HDP version: rhel6 or rhel7.
1. Do a full OS install (Desktop), with all packages installed.
2. Disable the firewall, IPv6, and related services (Haitao's Python script): SELinux -->> IPv6 -->> iptables
_____________________________________________________________


SELINUX:


vim /etc/selinux/config
SELINUX=disabled
或者:
sed -i 's/SELINUX=enforcing/SELINUX=disabled/g' /etc/selinux/config;
_____________________________________________________________
IPV6:


chkconfig ip6tables off


cat>>/etc/modprobe.d/ECS.conf<<EOF
alias net-pf-10 off
alias ipv6 off
EOF


cat>>/etc/sysconfig/network<<EOF
NETWORKING_IPV6=off 
EOF


cat>>/etc/modprobe.d/disable-ipv6.conf<<EOF
install ipv6 /bin/true
EOF


cat>>/etc/modprobe.d/dist.conf<<EOF
alias net-pf-10 off
alias ipv6 off
EOF


cat>>/etc/sysctl.conf<<EOF
net.ipv6.conf.all.disable_ipv6 = 1
EOF
_____________________________________________________________


iptables:


chkconfig iptables off
_____________________________________________________________
ONBOOT:
sed -i 's/ONBOOT=no/ONBOOT=yes/g' /etc/sysconfig/network-scripts/ifcfg-eth0
sed -i 's/ONBOOT=no/ONBOOT=yes/g' /etc/sysconfig/network-scripts/ifcfg-eth1
sed -i 's/ONBOOT=no/ONBOOT=yes/g' /etc/sysconfig/network-scripts/ifcfg-eth2
sed -i 's/ONBOOT=no/ONBOOT=yes/g' /etc/sysconfig/network-scripts/ifcfg-eth3
sed -i 's/ONBOOT=no/ONBOOT=yes/g' /etc/sysconfig/network-scripts/ifcfg-eth4
_____________________________________________________________


Swap (minimize swapping):


cat >> /etc/sysctl.conf << EOF
vm.swappiness=0
EOF
_____________________________________________________________
Time Zone:


cp  /usr/share/zoneinfo/Asia/Shanghai  /etc/localtime
_____________________________________________________________


*****************************************************************************************************
Hostname:
Set the hostname of each node in /etc/sysconfig/network.
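
A minimal sketch for one node (assuming RHEL/CentOS 6 and the hostname data1; repeat on each node with its own name):

# persist the hostname across reboots (add the HOSTNAME= line if it is missing)
sed -i 's/^HOSTNAME=.*/HOSTNAME=data1/' /etc/sysconfig/network
# apply it to the running system
hostname data1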
*****************************************************************************************************


/etc/hosts:


127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost
172.31.200.7 data1
172.31.200.8 data2
172.31.200.9 data3


Why not the variant below? With IPv6 already disabled above, the extra localhost6 aliases on ::1 add nothing and can push name resolution onto IPv6 (see "Other notes 2" at the end for a related pitfall).


127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
172.31.200.7 data1
172.31.200.8 data2
172.31.200.9 data3
*****************************************************************************************************


PackageKit


pkill -9 packagekitd
vim /etc/yum/pluginconf.d/refresh-packagekit.conf
enabled=0


*****************************************************************************************************


THP(Transparent Huge Pages):


echo never > /sys/kernel/mm/redhat_transparent_hugepage/enabled
echo never > /sys/kernel/mm/redhat_transparent_hugepage/defrag


*****************************************************************************************************


ulimit & nproc


[root@data2 yum.repos.d]# vim /etc/security/limits.conf


* soft nproc 16384
* hard nproc 16384
* soft nofile 65536
* hard nofile 65536
*****************************************************************************************************


REBOOT all the machines


*****************************************************************************************************
REPO for rhel:


first:


[root@server2 opt]# cd /etc/yum.repos.d/
[root@server2 yum.repos.d]# ls -al


drwxr-xr-x.   2 root root  4096 3月  22 04:02 .
drwxr-xr-x. 182 root root 16384 4月  14 22:27 ..
-rw-r--r--.   1 root root  1991 10月 23 2014 CentOS-Base.repo
-rw-r--r--.   1 root root   647 10月 23 2014 CentOS-Debuginfo.repo
-rw-r--r--.   1 root root   289 10月 23 2014 CentOS-fasttrack.repo
-rw-r--r--.   1 root root   630 10月 23 2014 CentOS-Media.repo
-rw-r--r--.   1 root root  5394 10月 23 2014 CentOS-Vault.repo
-rw-r--r--.   1 root root   270 12月 15 14:36 cloudera.repo
-rw-r--r--.   1 root root   134 12月  8 08:31 rhel65.repo




rm -rf *.repo
---->>>>>> these hosts have no Internet connection, so remove the default repos and use a local repository instead.




second:


[root@data2 yum.repos.d]# cat centos6.6.repo 
[centos6]
name=cloudera
baseurl=http://172.31.200.216/centos6
enabled=1
gpgcheck=0
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-redhat-release


scp /etc/yum.repos.d/centos6.6.repo root@Hostname:/etc/yum.repos.d/


yum clean all
yum search lib*


*****************************************************************************************************


SSH:
yum install openssl
yum upgrade openssl


rm -rf ~/.ssh/*
ssh-keygen  -t rsa -f ~/.ssh/id_rsa  -N ''
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys


scp -r ~/.ssh root@172.31.200.8:~/
chmod 700 ~/.ssh; chmod 600 ~/.ssh/authorized_keys
Note: why doesn't chmod 777 work? sshd (with the default StrictModes) refuses to use keys when ~/.ssh or authorized_keys is group- or world-writable.
*****************************************************************************************************


jdk:


rpm -ivh jdk-7XXX-linux-XXXX.rpm
echo "JAVA_HOME=/usr/java/latest/">> /etc/environment
java -version
*****************************************************************************************************
NTP:


ntp-master node
 
[root@data1 yum.repos.d]# vim /etc/ntp.conf


server data1 prefer
server 127.127.1.0
fudge 127.127.1.0 stratum 10


service ntpd restart
[root@data1 yum.repos.d]# chkconfig --list ntpd






ntp-client (slave) nodes


cat >> /var/spool/cron/root << EOF
*/10 * * * * /usr/sbin/ntpdate NameNode && /sbin/clock -w
EOF


service ntpd restart


ntpdate -u NameNode


*****************************************************************************************************


/var/www/html:


which httpd


or 


yum install httpd


tar -xzf HDP-UTILS-1.1.0.20-centos6.tar.gz
tar -xzf AMBARI-2.1.2-377-centos6.tar.gz
tar -xzf HDP-2.3.0.0-centos6-rpm.tar.gz


check whether the listening port of http service is blocked.
---->>>>netstat -nltp | grep 80
---->>>>vim /etc/httpd/conf/httpd.conf
change value of the default port
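
For example, moving httpd off port 80 (a sketch; only needed when 80 is already taken, and the repo baseurls below must then include the new port, e.g. http://data1:8000/...):

# assumption: the stock "Listen 80" directive is present
sed -i 's/^Listen 80$/Listen 8000/' /etc/httpd/conf/httpd.conf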


service httpd start


*****************************************************************************************************
Repo for HDP & Ambari


[root@data2 yum.repos.d]# cat ambari.repo 
[Updates-ambari-2.1.2]
name=ambari-2.1.2-Updates
baseurl=http://data1/AMBARI-2.1.2/centos6
gpgcheck=0
enabled=1


[HDP-2.3.0.0]
name=HDP Version-HDP-2.3.0.0
baseurl=http://data1/HDP/centos6/2.x/updates/2.3.0.0
gpgcheck=0
enabled=1




[HDP-UTILS-1.1.0.20]
name=HDP Utils Version - HDP-UTILS-1.1.0.20
baseurl=http://data1/HDP-UTILS-1.1.0.20/repos/centos6
gpgcheck=0
enabled=1


scp /etc/yum.repos.d/ambari.repo root@Hostname:/etc/yum.repos.d/
yum clean all
yum search ambari-agent
yum search Oozie
yum search ganglia


*****************************************************************************************************


Repo base URLs (to enter in Ambari when selecting the stack):


http://172.31.200.7/HDP/centos6/2.x/updates/2.3.0.0
http://172.31.200.7/HDP-UTILS-1.1.0.20/repos/centos6


*****************************************************************************************************
yum clean all
yum search ambari-server
yum search ambari-agent
yum search oozie
yum remove *****






Master:
yum install ambari-server
yum install ambari-agent
ambari-agent start
conf of ambari server:
/etc/ambari-server/conf/ambari.properties

Slave:
yum install ambari-agent
ambari-agent start


 
ambari-server start 


ambari-server setup -j /usr/java/jdk1.7.0_71/   
--->>>>Run the setup command to configure your Ambari Server, database, JDK, LDAP, and other options:
--->>>>Enter the number of your choice at each prompt (the value shown in parentheses is the default)
ambari-server start
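
The setup can also be run non-interactively (a sketch; -s accepts all defaults, -j reuses the JDK path from above):

ambari-server setup -s -j /usr/java/jdk1.7.0_71
ambari-server start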




http://MasterHostName:8080
Account:admin  Password:admin


*****************************************************************************************************


Logs to check:


cat /var/log/ambari-agent/ambari-agent.log


cat /var/log/ambari-server/ambari-server.log


*****************************************************************************************************


To Do:


HDFS:
[root@data1 yum.repos.d]# su hdfs -c "hadoop fs -ls /"
[root@data1 yum.repos.d]# su hdfs -c "hadoop fs -mkdir /lgd"


MR:




Spark:




HBase:




Hive:




ES:
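
Quick smoke tests for the items above (a sketch; the example-jar path is an assumption for an HDP 2.3 install -- locate the jar first if it differs):

# MR: run the bundled pi example as the hdfs user
su hdfs -c "hadoop jar /usr/hdp/current/hadoop-mapreduce-client/hadoop-mapreduce-examples.jar pi 4 100"
# HBase: print cluster status from the shell
echo "status" | hbase shell
# Hive: list databases from the CLI
hive -e "show databases;"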


*******************************************************************************************************


FAQ


1. The machine hostname should be a fully qualified domain name (hostname.domain), e.g. data.hdp.worker1.

2. If a machine's hostname has been changed, update the server hostname the agent points to in /etc/ambari-agent/conf/ambari-agent.ini. After a failed install, or before reinstalling, run ambari-server reset first and then ambari-server setup.
3. The final install step may fail, most often because of package download errors; retry the install until it succeeds (it took several attempts here -- the network causes all kinds of interference).
4. If you hit errors accessing https://xxx:8440/ca, upgrading openssl fixes it.
5. For "Heartbeat lost for the host" errors, check whether ambari-agent on the failing node has stopped; the agent is a Python process, and an uncaught exception can crash or stop it.
6. If the App Timeline Server install fails, a retry resolves it.
7. If you see garbled characters, fix the locale: echo 'LANG="en_US.UTF-8"' > /etc/sysconfig/i18n.
8. If base packages were not selected when installing Linux, mount the install ISO as a cdrom repo and install the missing packages from it.
9. Leaving SELinux enabled causes 403 errors when accessing the local yum repository.
10. An openssl version bug on CentOS 6.5 can make agent installation fail; fix it with yum upgrade openssl.


*******************************************************************************************************


Summary:


1. Check the logs to trace problems.
2. For everything to install smoothly, install the Linux base package groups along with the operating system. If you did not, the remedy is:
yum groupinstall "Compatibility libraries" "Base" "Development tools"
yum groupinstall "Debugging Tools" "Dial-up Networking Support"




*******************************************************************************************************
Notes: + Paths used by the Ambari installation:


Install directories on each machine:


/usr/lib/hadoop
/usr/lib/hbase
/usr/lib/zookeeper
/usr/lib/hcatalog
/usr/lib/hive 


+ Log paths; error details for troubleshooting can be found in the logs under these directories


/var/log/hadoop
/var/log/hbase


+ Configuration file paths


/etc/hadoop
/etc/hbase
/etc/hive


+ HDFS storage path


/hadoop/hdfs


*******************************************************************************************************






Other notes 1:


Packages installed along the way for the desktop environment, Firefox, etc.:
yum install firefox
yum groupinstall -y "Desktop" "Desktop Platform" "Desktop Platform Development" "Fonts" \
  "General Purpose Desktop" "Graphical Administration Tools" "Graphics Creation Tools" \
  "Input Methods" "X Window System" "Chinese Support [zh]" "Internet Browser"
Use the install ISO as a yum source to install some base packages:
sudo mount -o loop /home/whoami/rhel-server-6.7-x86_64-dvd.iso /mnt/cdimg/
$ cat rhel-source.repo
[rhel-Server]
name=Red Hat Server
baseurl=file:///mnt/cdimg
enabled=1
gpgcheck=0




*******************************************************************************************************


Other notes 2:


During the Confirm Hosts step of the Ambari setup, a strange error kept coming up:
Ambari agent machine hostname (localhost.localdomain) does not match expected ambari server hostname (xxx).
It was resolved by changing /etc/hosts.
Before:
127.0.0.1   xxx localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         xxx localhost localhost.localdomain localhost6 localhost6.localdomain6
After:
127.0.0.1   xxx localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         xxx
It seems the lookup was going over IPv6; strange, but it worked after the change.







2. Manual Hadoop installation (from the Apache tarballs)

First, configure the hosts file to map hostnames to IP addresses:
host1=
host2=
host3=
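
For example (a sketch; the IP addresses are placeholders, the hostnames match the ones used in the rest of this section):

cat >> /etc/hosts << EOF
192.168.1.10 masternode
192.168.1.11 bigdata1
192.168.1.12 bigdata2
192.168.1.13 bigdata3
EOF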


=== Secure shell (passwordless SSH)
rm -rf ~/.ssh/*
ssh-keygen  -t rsa -f ~/.ssh/id_rsa  -N ''
ssh-copy-id -o StrictHostKeyChecking=no $remotehostname
ssh $remotehostname hostname




######################## Hadoop cluster deploy
1. tar -xzf hadoop-2.7.1.tar.gz
2. add profile
Shell> cat << EOF >/etc/profile.d/hadoop.sh
#!/bin/sh
export JAVA_HOME=/root/BIGDATA/jdk1.8.0_65
export HADOOP_PREFIX=/root/BIGDATA/hadoop-2.7.1


export HADOOP_HOME=\$HADOOP_PREFIX
export HADOOP_CONF_DIR=\$HADOOP_HOME/etc/hadoop
export YARN_CONF_DIR=\$HADOOP_HOME/etc/hadoop


export JAVA_LIBRARY_PATH=\$HADOOP_HOME/lib/native:\$JAVA_LIBRARY_PATH
export LD_LIBRARY_PATH=\$HADOOP_HOME/lib/native:\$LD_LIBRARY_PATH
export CLASSPATH=.:\$JAVA_HOME/lib/dt.jar:\$JAVA_HOME/lib/tools.jar:
export PATH=\$JAVA_HOME/bin:\$HADOOP_HOME/bin:\$HADOOP_HOME/sbin:\${PATH}
EOF


Shell> source /etc/profile
 
3. create hdfs dirs on all hosts


HADOOP_LOCAL_BASE_DIR=/opt/local/hdfs

mkdir -p ${HADOOP_LOCAL_BASE_DIR}
mkdir -p ${HADOOP_LOCAL_BASE_DIR}/dfs/data
mkdir -p ${HADOOP_LOCAL_BASE_DIR}/dfs/name
mkdir -p ${HADOOP_LOCAL_BASE_DIR}/dfs/snn
mkdir -p ${HADOOP_LOCAL_BASE_DIR}/tmp
mkdir -p ${HADOOP_LOCAL_BASE_DIR}/yarn/logs


4. config etc/hadoop/
1. add all slaves to the slaves file
bigdata1
bigdata3
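
For example, writing the file directly (a sketch):

cat << EOF > ${HADOOP_CONF_DIR}/slaves
bigdata1
bigdata3
EOF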


2.
HADOOP_DFS_MASTER=masternode
HADOOP_DFS_SECONDARY_NAMENODE=masternode
YARN_RESOURCE_MANAGER=masternode
JOBHISTORY_SERVER=masternode
JOBTRACKRT_HOST=masternode
HADOOP_TOOL_INSTALL_DIR=/root/BIGDATA/DOCS/hadoop_doc/hadoop_demo
#core-site.xml
conf_file=core-site.xml
cp -raf ${HADOOP_TOOL_INSTALL_DIR}/${conf_file}  ${HADOOP_PREFIX}/etc/hadoop/
sed -i "s^\${HADOOP_LOCAL_BASE_DIR}^${HADOOP_LOCAL_BASE_DIR}^g" "${HADOOP_PREFIX}/etc/hadoop/${conf_file}"
sed -i "s^\${HADOOP_DFS_MASTER}^${HADOOP_DFS_MASTER}^g" "${HADOOP_PREFIX}/etc/hadoop/${conf_file}"
#hdfs-site.xml
conf_file=hdfs-site.xml
cp -raf ${HADOOP_TOOL_INSTALL_DIR}/${conf_file}  ${HADOOP_PREFIX}/etc/hadoop/
sed -i "s^\${HADOOP_LOCAL_BASE_DIR}^${HADOOP_LOCAL_BASE_DIR}^g" "${HADOOP_PREFIX}/etc/hadoop/${conf_file}"
sed -i "s^\${HADOOP_DFS_SECONDARY_NAMENODE}^${HADOOP_DFS_SECONDARY_NAMENODE}^g" "${HADOOP_PREFIX}/etc/hadoop/${conf_file}"
sed -i "s^\${HADOOP_DFS_MASTER}^${HADOOP_DFS_MASTER}^g" "${HADOOP_PREFIX}/etc/hadoop/${conf_file}"
#mapred-site.xml
conf_file=mapred-site.xml
cp -raf ${HADOOP_TOOL_INSTALL_DIR}/${conf_file}  ${HADOOP_PREFIX}/etc/hadoop/
sed -i "s^\${JOBTRACKRT_HOST}^${JOBTRACKRT_HOST}^g" "${HADOOP_PREFIX}/etc/hadoop/${conf_file}"
sed -i "s^\${JOBHISTORY_SERVER}^${JOBHISTORY_SERVER}^g" "${HADOOP_PREFIX}/etc/hadoop/${conf_file}"
#yarn-site.xml
conf_file=yarn-site.xml
cp -raf ${HADOOP_TOOL_INSTALL_DIR}/${conf_file}  ${HADOOP_PREFIX}/etc/hadoop/
sed -i "s^\${YARN_RESOURCE_MANAGER}^${YARN_RESOURCE_MANAGER}^g" "${HADOOP_PREFIX}/etc/hadoop/${conf_file}"
      sed -i "s^\${HADOOP_LOCAL_BASE_DIR}^${HADOOP_LOCAL_BASE_DIR}^g" "${HADOOP_PREFIX}/etc/hadoop/${conf_file}"






5. init namenode
Shell>hdfs namenode -format cluster1
6. start all
Shell>$HADOOP_HOME/sbin/start-all.sh
Shell> $HADOOP_HOME/sbin/mr-jobhistory-daemon.sh  start historyserver



===Hadoop check
1. After deploying Hadoop:
   Shell>hadoop checknative -a 
   Shell>hadoop jar ${HADOOP_HOME}/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.1.jar pi 4 100
   
   Shell> cat <<EOF >/tmp/file1
Hello World Bye World
EOF
   Shell> cat <<EOF >/tmp/file2
Hello Hadoop Goodbye Hadoop
EOF

   Shell> hadoop fs -mkdir /tmp 
   Shell> hadoop fs -copyFromLocal -f /tmp/file1  /tmp/file2  /tmp
   Shell> hadoop jar ${HADOOP_HOME}/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.1.jar wordcount  /tmp/file1  /tmp/file2  /tmp/wordcount
   Shell> hadoop fs -cat /tmp/wordcount/part-r-00000




===hadoop Daemon Web Interface
NameNode http://nn_host:port/ Default HTTP port is 50070.
ResourceManager http://rm_host:port/ Default HTTP port is 8088.
#MapReduce JobHistory Server http://jhs_host:port/ Default HTTP port is 19888.
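
A quick way to confirm the daemons are up (a sketch; hostname and ports assume the defaults listed above):

# the master should show NameNode, SecondaryNameNode, ResourceManager and JobHistoryServer
jps
# the web UIs should answer with a 2xx/3xx status
curl -s -o /dev/null -w "%{http_code}\n" http://masternode:50070/
curl -s -o /dev/null -w "%{http_code}\n" http://masternode:8088/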




######################## Spark cluster deploy
1. tar -xzf spark-1.6.1-bin-hadoop2.6.tgz
2. add profile
cat << EOF >>/etc/profile.d/hadoop.sh
export SPARK_HOME=/root/BIGDATA/spark-1.6.1-bin-hadoop2.6
export PATH=\${SPARK_HOME}/sbin:\${PATH}:\${SPARK_HOME}/bin:


EOF


Shell>source /etc/profile

3. create local dir
SPARK_LOCAL_BASE_DIR=/opt/local/spark

Shell>mkdir -p ${SPARK_LOCAL_BASE_DIR}/tmp

Shell>hadoop fs -mkdir /sparkHistoryLogs /sparkEventLogs
4. config
1. add all slaves to ${SPARK_HOME}/conf/slaves
       Shell>mv ${SPARK_HOME}/conf/slaves.template ${SPARK_HOME}/conf/slaves
bigdata1
bigdata3

2.
SPARK_MASTER=masternode
HADOOP_DFS_MASTER=masternode

Shell> cat << EOF > ${SPARK_HOME}/conf/spark-defaults.conf
spark.master   spark://${SPARK_MASTER}:7077
spark.local.dir   ${SPARK_LOCAL_BASE_DIR}/tmp
spark.master.rest.port   7177
#Spark UI
spark.eventLog.enabled   true
spark.eventLog.dir   hdfs://${HADOOP_DFS_MASTER}:9000/sparkEventLogs
spark.ui.killEnabled   true
spark.ui.port   4040
spark.history.ui.port   18080
spark.history.fs.logDirectory   hdfs://${HADOOP_DFS_MASTER}:9000/sparkHistoryLogs


#
spark.shuffle.service.enabled   false


#
spark.yarn.am.extraJavaOptions   -Xmx3g
spark.executor.extrajavaoptions   -Xmx3g


#Amount of memory to use for the YARN Application Master in client mode
spark.yarn.am.memory   2048m
#The amount of off-heap memory (in megabytes) to be allocated per driver in cluster mode
spark.yarn.driver.memoryOverhead   512
#The amount of off-heap memory (in megabytes) to be allocated per executor
spark.yarn.executor.memoryOverhead   512
#Same as spark.yarn.driver.memoryOverhead, but for the YARN Application Master in client mode, fix yarn-client OOM, "ERROR yarn.ApplicationMaster: RECEIVED SIGNAL 15: SIGTERM"
spark.yarn.am.memoryOverhead   1024  
  
EOF


Shell> cat << EOF > ${SPARK_HOME}/conf/spark-env.sh
SPARK_WORKER_WEBUI_PORT=8081
SPARK_WORKER_DIR=\${SPARK_HOME}/work
#SPARK_LOCAL_DIRS=\${SPARK_WORKER_DIR}
EOF


5. start all
Shell> ${SPARK_HOME}/sbin/start-all.sh
check cluster status
http://${SPARK_MASTER}:8080


===Spark Daemon Web Interface
spark.history.ui.port 18080
spark master 8080


http://${SPARK_MASTER}:port/




===Spark check


1. Spark Standalone (client, cluster(spark.master.rest.port))
  # Run on the Spark standalone cluster in client deploy mode
  Shell>  ${SPARK_HOME}/bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master spark://masternode:7077 \
  --deploy-mode  client \
   ${SPARK_HOME}/lib/spark-examples*.jar \
  10


  # Run on a Spark standalone cluster
  Shell>  ${SPARK_HOME}/bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master spark://$SPARK_MASTER:7177 \
  --deploy-mode  cluster \
  --executor-memory 1G \
  --total-executor-cores 1 \
   ${SPARK_HOME}/lib/spark-examples*.jar \
  10
  
   #spark shell
   Shell> ${SPARK_HOME}/bin/spark-shell --master spark://$SPARK_MASTER:7077
   


2. Spark on YARN (no need to start the Spark standalone cluster; only Hadoop/YARN must be running)
#run yarn-client
Shell> ${SPARK_HOME}/bin/spark-submit --class org.apache.spark.examples.SparkPi \
--master yarn-client \
    --driver-java-options '-Xmx3096m'  \
    --conf spark.executor.extrajavaoptions=-Xmx3096m  \
    --executor-memory 3096m  \
    --num-executors  1  \
    --conf spark.yarn.am.memoryOverhead=1024  \
    ${SPARK_HOME}/lib/spark-examples*.jar \
    10
    

#run yarn-cluster
Shell> ${SPARK_HOME}/bin/spark-submit --class org.apache.spark.examples.SparkPi \
--master yarn \
--deploy-mode  cluster \
    --driver-memory 2g \
    --executor-memory 2g \
    ${SPARK_HOME}/lib/spark-examples*.jar \
    10




######################## Hbase cluster deploy
1. Shell> tar -xzf hbase-1.1.4-bin.tar.gz
2. add profile
cat << EOF >>/etc/profile.d/hadoop.sh
export HBASE_HOME=/root/BIGDATA/hbase-1.1.4
export PATH=\${PATH}:\${HBASE_HOME}/bin:


EOF


Shell>source /etc/profile

3. create local dir
HBASE_ROOTDIR=/hbase
HBASE_TMP_DIR=/opt/local/hbase

Shell> hadoop fs -mkdir ${HBASE_ROOTDIR}
Shell> mkdir -p ${HBASE_TMP_DIR}
4. config
1. add all hosts to regionservers
bigdata1
bigdata2
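
For example, writing the file directly (a sketch):

cat << EOF > ${HBASE_HOME}/conf/regionservers
bigdata1
bigdata2
EOF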

2. modify hbase-site.xml
cat <<EOF >${HBASE_HOME}/conf/hbase-site.xml
<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://masternode:9000/hbase</value>
    <description>The directory shared by RegionServers.
    Default: \${hbase.tmp.dir}/hbase
    </description>
  </property>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>masternode,slavesnode</value>
    <description>The directory shared by RegionServers.
    </description>
  </property>
  <property>
    <name>hbase.tmp.dir</name>
    <value>/opt/local/hbase</value>
    <description>Temporary directory on the local filesystem
    Default: \${java.io.tmpdir}/hbase-\${user.name}.
    </description>
  </property>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
    <description>The mode the cluster will be in. Possible values are
      false: standalone and pseudo-distributed setups with managed Zookeeper
      true: fully-distributed with unmanaged Zookeeper Quorum (see hbase-env.sh)
    </description>
  </property>
  <!--
  <property>
    <name>hbase.fs.tmp.dir</name>
    <value></value>
    <description>A staging directory in default file system (HDFS) for keeping temporary data
    Default: /user/\${user.name}/hbase-staging
    </description>
  </property>
  <property>
    <name>hbase.local.dir</name>
    <value></value>
    <description>Directory on the local filesystem to be used as a local storage.
    Default: \${hbase.tmp.dir}/local/
    </description>
  </property>
  <property>
    <name>hbase.master.port</name>
    <value>16000</value>
    <description>The port the HBase Master should bind to.
    Default: 16000
    </description>
  </property>
  <property>
    <name>hbase.master.info.port</name>
    <value>16010</value>
    <description>The port for the HBase Master web UI. Set to -1 if you do not want a UI instance run.
    Default: 16010
    </description>
  </property>
  <property>
    <name>hbase.regionserver.port</name>
    <value>16020</value>
    <description>The port the HBase RegionServer binds to.
    Default: 16020
    </description>
  </property>
  <property>
    <name>hbase.regionserver.info.port</name>
    <value>16030</value>
    <description>The port for the HBase RegionServer web UI Set to -1 if you do not want the RegionServer UI to run.
    Default: 16030
    </description>
  </property>
  <property>
    <name>hbase.zookeeper.peerport</name>
    <value>2888</value>
    <description>Port used by ZooKeeper peers to talk to each other
    Default: 2888
    </description>
  </property>
  <property>
    <name>hbase.zookeeper.leaderport</name>
    <value>3888</value>
    <description>Port used by ZooKeeper for leader election
    Default: 3888
    </description>
  </property>
  <property>
    <name>hbase.zookeeper.property.dataDir</name>
    <value></value>
    <description>Property from ZooKeeper’s config zoo.cfg. The directory where the snapshot is stored.
    Default: \${hbase.tmp.dir}/zookeeper
    </description>
  </property>
  <property>
    <name>hbase.zookeeper.property.clientPort</name>
    <value>2181</value>
    <description>Property from ZooKeeper’s config zoo.cfg. The port at which the clients will connect.
    Default: 2181
    </description>
  </property>
  -->
  
</configuration>
EOF
3. ln -s $HADOOP_HOME/etc/hadoop/hdfs-site.xml  ${HBASE_HOME}/conf/hdfs-site.xml 
4. ulimit & nproc
cat <<EOF >> /etc/security/limits.conf
root -  nofile  32768
root -  nproc   32000
EOF

5. start all
Shell> ${HBASE_HOME}/bin/start-hbase.sh


===Hbase Daemon Web Interface
hbase.master.info.port  16010
hbase.regionserver.info.port  16030


http://${HBASE_MASTER}:port/


===Hbase check


1. run hbase shell
Shell> ${HBASE_HOME}/bin/hbase shell


hbase(main):003:0> create 'test', 'cf'
0 row(s) in 1.2200 seconds
hbase(main):003:0> list 'test'
test
1 row(s) in 0.0550 seconds
hbase(main):004:0> put 'test', 'row1', 'cf:a', 'value1'
0 row(s) in 0.0560 seconds
hbase(main):005:0> put 'test', 'row2', 'cf:b', 'value2'
0 row(s) in 0.0370 seconds
hbase(main):006:0> put 'test', 'row3', 'cf:c', 'value3'
0 row(s) in 0.0450 seconds


hbase(main):007:0> scan 'test'
ROW        COLUMN+CELL
row1       column=cf:a, timestamp=1288380727188, value=value1
row2       column=cf:b, timestamp=1288380738440, value=value2
row3       column=cf:c, timestamp=1288380747365, value=value3
3 row(s) in 0.0590 seconds


hbase(main):008:0> get 'test', 'row1'
COLUMN      CELL
cf:a        timestamp=1288380727188, value=value1
1 row(s) in 0.0400 seconds


hbase(main):012:0> disable 'test'
0 row(s) in 1.0930 seconds
hbase(main):013:0> drop 'test'
0 row(s) in 0.0770 seconds 


hbase(main):014:0> exit




######################## Hive cluster deploy
1. tar -xzf apache-hive-2.0.0-bin.tar.gz
2. add profile
cat << EOF >>/etc/profile.d/hadoop.sh
export HIVE_HOME=/root/BIGDATA/apache-hive-2.0.0-bin
export HIVE_CONF_DIR=\${HIVE_HOME}/conf
export PATH=\${HIVE_HOME}/bin:\${PATH}


EOF


Shell>source /etc/profile

3. create local dir
   $HADOOP_HOME/bin/hadoop fs -mkdir /tmp
   $HADOOP_HOME/bin/hadoop fs -mkdir -p /user/hive/warehouse
   $HADOOP_HOME/bin/hadoop fs -chmod g+w  /tmp
   $HADOOP_HOME/bin/hadoop fs -chmod g+w  /user/hive/warehouse

   Shell> mkdir -p  ${HBASE_TMP_DIR}

4. config 
=M1. [ Local Embedded Derby ]
HIVE_LOCAL_WAREHOUSE=/opt/hive/warehouse
Shell> mkdir -p  ${HIVE_LOCAL_WAREHOUSE}

Shell>cat <<EOF > ${HIVE_CONF_DIR}/hive-site.xml
<configuration>
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:derby:;databaseName=metastore_db;create=true</value>
  <description>JDBC connect string for a JDBC metastore</description>
</property>


<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>org.apache.derby.jdbc.EmbeddedDriver</value>
  <description>Driver class name for a JDBC metastore</description>
</property>


<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>APP</value>
  <description>username to use against metastore database</description>
</property>


<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>mine</value>
  <description>password to use against metastore database</description>
</property>


<property>
  <name>hive.metastore.warehouse.dir</name>
  <value>${HIVE_LOCAL_WAREHOUSE}</value>
  <description>unit test data goes in here on your local filesystem</description>
</property>


</configuration>
EOF


Shell> $HIVE_HOME/bin/schematool -initSchema -dbType derby
Shell> $HIVE_HOME/bin/schematool -dbType derby -info
Shell> $HIVE_HOME/bin/hive





=M2. [Remote Metastore Server Derby]
Shell> tar -xzf db-derby-10.12.1.1-bin.tar.gz
Shell> cd db-derby-10.12.1.1-bin
Shell> mkdir data
Shell> cd data
Shell> ../bin/startNetworkServer  -h 172.31.200.110 -p 1527  &
Shell> cp -raf  ../lib/derbyclient.jar   ../lib/derbytools.jar  $HIVE_HOME/lib/

DERBY_SERVER_HOST=masternode


Shell>cat <<EOF > ${HIVE_CONF_DIR}/hive-site.xml
<configuration>
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:derby://${DERBY_SERVER_HOST}:1527/hive_meta;create=true</value>
  <description>JDBC connect string for a JDBC metastore</description>
</property>


<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>org.apache.derby.jdbc.ClientDriver</value>
  <description>Driver class name for a JDBC metastore</description>
</property>


<property>
    <name>datanucleus.schema.autoCreateAll</name>
    <value>true</value>
    <description>creates necessary schema on a startup if one doesn't exist. set this to false, after creating it once</description>
</property>


<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>app</value>
  <description>username to use against metastore database</description>
</property>


<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>app</value>
  <description>password to use against metastore database</description>
</property>


<property>
  <name>hive.metastore.warehouse.dir</name>
  <!-- base hdfs path -->
  <value>/user/hive/warehouse</value>
  <description>base hdfs path :location of default database for the warehouse</description>
</property>
  
<!-- hive client -->
 <!-- thrift://<host_name>:<port> -->
 <property>
      <name>hive.metastore.uris</name>
      <value>thrift://masternode:9083</value>
 </property>
</configuration>
EOF


#start metastore service
$HIVE_HOME/bin/hive --service metastore &


#start hiveserver2 service
$HIVE_HOME/bin/hiveserver2 &


5. start
$HIVE_HOME/bin/hive
hive> CREATE TABLE pokes (foo INT, bar STRING);
hive> CREATE TABLE invites (foo INT, bar STRING) PARTITIONED BY (ds STRING);
hive> SHOW TABLES;
hive> SHOW TABLES '.*s';
hive> DESCRIBE invites;

hive> LOAD DATA LOCAL INPATH '/root/BIGDATA/apache-hive-2.0.0-bin/examples/files/kv1.txt' OVERWRITE INTO TABLE pokes;

======#
#Remote Metastore Server
   $HIVE_HOME/bin/hive --service metastore -p 9083
#Running HiveServer2 and Beeline
   $HIVE_HOME/bin/hiveserver2
   $HIVE_HOME/bin/beeline -u jdbc:hive2://localhost:10000
#Running HCatalog
   $HIVE_HOME/hcatalog/sbin/hcat_server.sh
   $HIVE_HOME/hcatalog/bin/hcat
#Running WebHCat
   $HIVE_HOME/hcatalog/sbin/webhcat_server.sh






####### Pig deploy
1. tar -xzf pig-0.15.0.tar.gz
2. add profile
cat << EOF >>/etc/profile.d/hadoop.sh
export PIG_HOME=/BIGDATA/pig-0.15.0
export PATH=\${PIG_HOME}/bin:\${PATH}


EOF


Shell>source /etc/profile
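
A quick check that Pig is usable (a sketch; -x local runs without the cluster, -x mapreduce goes through YARN):

pig -version
# opens the grunt shell locally; try: fs -ls /;  then quit;
pig -x local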
