HBase Practice
(Prerequisites: a working Hadoop environment, any Hadoop 2 release; start ZooKeeper before HBase)
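1. Installation (hbase-0.98.6-hadoop2)
(1) Install ZooKeeper (zookeeper-3.4.5)
First, rename zoo_sample.cfg to zoo.cfg and list the ensemble members in it (the file must be identical on every machine):
server.0=master:8880:7770
server.1=slave1:8881:7771
server.2=slave2:8882:7772
Then create a myid file in the ZooKeeper root directory. It holds that machine's own id, i.e. the N of its server.N line, so each machine gets a different value and no id may repeat. (ZooKeeper looks for myid in the dataDir set in zoo.cfg, so dataDir must point to that directory.)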
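The myid step is easy to get wrong when done by hand on several machines. A minimal sketch, assuming the server.N=host:port:port layout shown above; the function name write_myid and its arguments are hypothetical helpers, not part of ZooKeeper:

```shell
#!/usr/bin/env bash
# Hypothetical helper: find the id N of the "server.N=<host>:..." line
# for the given host in zoo.cfg and write it to <data_dir>/myid.
write_myid() {
  local cfg="$1" host="$2" data_dir="$3"
  local id
  # Extract N from the first matching server.N=<host>:... line.
  id=$(sed -n "s/^server\.\([0-9][0-9]*\)=${host}:.*/\1/p" "$cfg" | head -n 1)
  if [ -z "$id" ]; then
    echo "no server entry for ${host} in ${cfg}" >&2
    return 1
  fi
  printf '%s\n' "$id" > "${data_dir}/myid"
}
```

Run from the ZooKeeper root on slave1, for example, write_myid conf/zoo.cfg slave1 "$PWD" would write 1 into ./myid, given the server list above.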
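Start (on every node): ]# ./bin/zkServer.sh start
Check: ]# ./bin/zkServer.sh status
(once the ensemble is up, status reports Mode: leader on one node and Mode: follower on the others)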
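(2) Install HBase
First, set the environment variables in hbase-env.sh:
export JAVA_HOME=/usr/local/src/jdk1.8.0_172
export HBASE_MANAGES_ZK=false   # use the external ZooKeeper installed above instead of an HBase-managed one
Next, edit hbase-site.xml and add these properties inside the <configuration> element:
<property>
  <name>hbase.rootdir</name>
  <value>hdfs://master:9000/hbase</value>
</property>
<property>
  <name>hbase.cluster.distributed</name>
  <value>true</value>
</property>
<property>
  <name>hbase.zookeeper.quorum</name>
  <value>master,slave1,slave2</value>
</property>
<property>
  <name>hbase.master.maxclockskew</name>
  <value>150000</value>
</property>
(hbase.master.maxclockskew is in milliseconds: the largest clock difference the master tolerates between itself and a RegionServer)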
Next, create the regionservers file, which lists the hostnames of the RegionServer nodes, one per line:
slave1
slave2
Copy these configuration files to the other nodes (scp).
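The distribution step can be sketched as a loop over the regionservers file. This is a dry run that only prints the scp commands so they can be reviewed first; the function name print_scp_commands is a hypothetical helper, and it assumes the HBase conf directory sits at the same path on every node:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print (do not run) one scp command per host listed
# in a regionservers file, copying the three files edited above.
print_scp_commands() {
  local regionservers="$1" conf_dir="$2"
  local host
  while IFS= read -r host || [ -n "$host" ]; do
    # Skip blank lines and comment lines.
    case "$host" in ''|\#*) continue ;; esac
    echo "scp ${conf_dir}/hbase-env.sh ${conf_dir}/hbase-site.xml ${conf_dir}/regionservers ${host}:${conf_dir}/"
  done < "$regionservers"
}
```

Once the printed commands look right, piping the output to sh (print_scp_commands conf/regionservers "$PWD/conf" | sh) would execute them.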
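Start: ]# ./bin/start-hbase.sh
Verify:
(1) The master node shows an HMaster process, and each slave node shows an HRegionServer process.
(2) ]# hbase shell, then run status at the prompt to check the cluster state.
(3) Web UI: 192.168.179.10:60010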
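The process check in (1) can be scripted. A minimal sketch, assuming the JDK's jps tool is on the PATH (the notes do not name a tool) and that it prints one "pid ClassName" pair per line; has_java_process is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: read jps-style output ("<pid> <ClassName>" per line)
# on stdin and succeed only if a process with the given class name is listed.
has_java_process() {
  awk -v name="$1" '$2 == name { found = 1 } END { exit found ? 0 : 1 }'
}
```

On the master this would be used as jps | has_java_process HMaster && echo "HMaster is up", and on a slave with HRegionServer in place of HMaster.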
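2. Basic hbase shell operations
List the existing tables:
> list
Drop a table (it must be disabled first):
> disable "m_table"
> drop "m_table"
Create a table with two column families:
> create 'm_table', 'meta_data', 'action'
View the data with a full table scan (not recommended on its own for large tables):
> scan "m_table"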
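Add a column family:
> alter "m_table", {NAME=>'cf_new'}
Delete a column family:
> alter "m_table", {NAME=>'cf_new', METHOD=>'delete'}
Count the rows in a table:
> count "m_table"
Delete one cell, i.e. a single column of a row (use deleteall to remove the whole row):
> delete "m_table", "user|4001", "meta_data:name"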
Write data:
> put "m_table", '1002', 'meta_data:name', 'li4'
> put "m_table", '1001', 'meta_data:age', '18'
> put "m_table", '1002', 'meta_data:gender', 'man'
Read data
Single row: > get "m_table", '1002'
Single column of a row: > get "m_table", '1002', 'meta_data:name'
Batch read: > scan "m_table"
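Typing put statements by hand does not scale past a few rows. A minimal sketch that generates them, assuming a tab-separated input of rowkey, family:qualifier, value per line (a hypothetical layout, not from these notes) and values without single quotes; tsv_to_puts is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: convert TSV lines (rowkey<TAB>family:qualifier<TAB>value)
# read from stdin into hbase shell put statements for the given table.
# Assumes no field contains a single quote.
tsv_to_puts() {
  local table="$1" row col val
  while IFS=$'\t' read -r row col val || [ -n "$row" ]; do
    # Skip empty lines.
    [ -n "$row" ] || continue
    printf "put '%s', '%s', '%s', '%s'\n" "$table" "$row" "$col" "$val"
  done
}
```

The generated statements can be saved to a file that ends with an exit line and passed to the shell as a script, for example hbase shell puts.txt (puts.txt being a hypothetical file name).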
Filters
(1) Find the value zhang3
-- look up cells by an exact value
> scan "m_table", FILTER=>"ValueFilter(=, 'binary:zhang3')"
> scan "m_table", FILTER=>"ValueFilter(=, 'binary:wang5')"
(2) Find values containing 'a'
-- look up cells by a substring (fuzzy) match on the value
> scan "m_table", FILTER=>"ValueFilter(=, 'substring:a')"
(3) Column name match
Two conditions combined with AND: the column qualifier must start with the prefix and the value must contain the substring
> scan "m_table", FILTER=>"ColumnPrefixFilter('na') AND ValueFilter(=, 'substring:zhang3')"
Column prefix alone:
> scan "m_table", FILTER=>"ColumnPrefixFilter('na')"
Add another row to test against:
> put "m_table", '3001', 'meta_data:name', '777'
(4) Row key match, querying by row key prefix:
Row keys starting with 10:
> scan "m_table", FILTER=>"PrefixFilter('10')"
Row key range, returning the rows from the given start key onward:
> scan "m_table", {STARTROW=>'1002'}
Both combined:
> scan "m_table", {STARTROW=>'1002', FILTER=>"PrefixFilter('10')"}
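STARTROW works on the stored order of row keys, which HBase keeps sorted byte by byte rather than numerically, so '9' sorts after '3001'. A minimal sketch that imitates this ordering outside HBase with plain text tools; rows_from_startrow is a hypothetical helper and only models the ordering, it does not talk to HBase:

```shell
#!/usr/bin/env bash
# Hypothetical helper: read row keys on stdin, sort them bytewise the way
# HBase orders row keys, and print those at or after the start key
# (STARTROW is inclusive).
rows_from_startrow() {
  local start="$1"
  # Appending "" forces awk to compare as strings, not as numbers.
  LC_ALL=C sort | LC_ALL=C awk -v s="$start" '($0 "") >= (s "")'
}
```

Feeding it the keys 9, 1001, user|4001, 1002 and 3001 with a start key of 1002 drops only 1001, because every other key compares greater than or equal to 1002 as a byte string.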
Change the number of versions kept for a column family:
> alter "m_table", {NAME=>'meta_data', VERSIONS => 3}
> put "m_table", '1001', 'meta_data:name', 'wang5'
> put "m_table", '1001', 'meta_data:name', 'zhao6'
> put "m_table", '1001', 'meta_data:name', 'heng7'
> get "m_table", '1001'
(a plain get returns only the newest version of each cell)
Read a given number of versions, or the version at a specific timestamp:
> get "m_table", '1001', {COLUMN=>"meta_data:name", VERSIONS => 1}
> get "m_table", '1001', {COLUMN=>"meta_data:name", VERSIONS => 2}
> get "m_table", '1001', {COLUMN=>"meta_data:name", VERSIONS => 3}
> get "m_table", '1001', {COLUMN=>"meta_data:name", TIMESTAMP=>1573349851782}
> get "m_table", '1001', {COLUMN=>"meta_data:name", TIMESTAMP=>1573349547463}
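The TIMESTAMP values above are the cell timestamps that get prints, which by default are milliseconds since the Unix epoch at write time. A minimal sketch for turning one into a readable UTC time, assuming GNU date (the -d "@seconds" form is not available in BSD/macOS date); hbase_ts_to_utc is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: convert an HBase cell timestamp (milliseconds since
# the Unix epoch) into a UTC date string. Requires GNU date.
hbase_ts_to_utc() {
  local ms="$1"
  # Integer division drops the millisecond part.
  date -u -d "@$((ms / 1000))" '+%Y-%m-%d %H:%M:%S'
}
```

Calling hbase_ts_to_utc 1573349851782 helps confirm which of the three puts a given version came from.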
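Regex filters
1. Row key regex
> import org.apache.hadoop.hbase.filter.RegexStringComparator
> import org.apache.hadoop.hbase.filter.CompareFilter
> import org.apache.hadoop.hbase.filter.SubstringComparator
> import org.apache.hadoop.hbase.filter.RowFilter
Row keys starting with 10:
> scan 'm_table', {FILTER => RowFilter.new(CompareFilter::CompareOp.valueOf('EQUAL'), RegexStringComparator.new('^10'))}
Row keys of the form user|<digits>:
> scan "m_table", {FILTER => RowFilter.new(CompareFilter::CompareOp.valueOf('EQUAL'), RegexStringComparator.new('^user\|\d+$'))}
2. Value match on a single column
> import org.apache.hadoop.hbase.filter.CompareFilter
> import org.apache.hadoop.hbase.filter.SingleColumnValueFilter
> import org.apache.hadoop.hbase.filter.RegexStringComparator
> import org.apache.hadoop.hbase.util.Bytes
Exact value:
> scan 'm_table', {COLUMNS => 'meta_data:name', FILTER => SingleColumnValueFilter.new(Bytes.toBytes('meta_data'), Bytes.toBytes('name'), CompareFilter::CompareOp.valueOf('EQUAL'), Bytes.toBytes('zhang3'))}
Regex on the value (passes a comparator instead of raw bytes; the pattern '^zhang' is only an illustration):
> scan 'm_table', {COLUMNS => 'meta_data:name', FILTER => SingleColumnValueFilter.new(Bytes.toBytes('meta_data'), Bytes.toBytes('name'), CompareFilter::CompareOp.valueOf('EQUAL'), RegexStringComparator.new('^zhang'))}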
Empty the table (removes all rows but keeps the table definition):
> truncate "m_table"
Check the row count:
> count 'm_table'