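HBase is a distributed, column-oriented open-source database. It started as a subproject of Apache Hadoop (it is now a top-level Apache project) and is suited to storing unstructured data. Several other projects in the Hadoop family support HBase:
Hadoop HDFS provides HBase with highly reliable underlying storage;
Hadoop MapReduce provides HBase with high-performance computation;
Zookeeper provides HBase with stable coordination services and a failover mechanism;
Pig and Hive provide HBase with high-level language support, which makes statistical processing of data in HBase very simple;
Sqoop provides HBase with convenient RDBMS data import, which makes migrating data from traditional databases into HBase very easy.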
1 Installation
1.1 Download and extract
Download the latest stable release from http://hbase.apache.org/. This article uses hbase-0.98.6.1-hadoop2-bin.tar.gz.
Extract the archive, then change into the extracted directory:
    $ tar xzvf hbase-0.98.6.1-hadoop2-bin.tar.gz
    $ cd hbase-0.98.6.1-hadoop2/
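1.2 Basic configuration
This step is optional.
The setting to configure here is hbase.rootdir in $HBASE_HOME/conf/hbase-site.xml, the directory where HBase stores its data. If it is not set, hbase.rootdir defaults to /tmp/hbase-${user.name}. Because /tmp is typically cleaned when the system restarts, the data would be lost after a reboot. In a distributed deployment, the value must be a directory location on HDFS.
    <?xml version="1.0"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
    <configuration>
      <property>
        <name>hbase.rootdir</name>
        <value>file:/home/lxh/hadoop/hbase</value>
      </property>
    </configuration>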
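If the same file has to be produced for several machines or data directories, it can be generated rather than edited by hand. The following is a minimal sketch, not part of the HBase distribution: write_hbase_site is a hypothetical helper name, and it overwrites any existing hbase-site.xml in the target directory.

```shell
#!/bin/sh
# write_hbase_site ROOTDIR_URI CONF_DIR
# Hypothetical helper: writes a minimal hbase-site.xml that sets only
# hbase.rootdir. Any existing hbase-site.xml in CONF_DIR is overwritten.
write_hbase_site() {
    whs_rootdir=$1
    whs_conf_dir=$2
    mkdir -p "$whs_conf_dir" || return 1
    # Unquoted heredoc delimiter so that $whs_rootdir is expanded.
    cat > "$whs_conf_dir/hbase-site.xml" <<EOF
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>$whs_rootdir</value>
  </property>
</configuration>
EOF
}
```

For the standalone setup in this article the call would be write_hbase_site "file:/home/lxh/hadoop/hbase" "$HBASE_HOME/conf". For a distributed deployment the first argument would be an HDFS URI such as hdfs://namenode.example.org:8020/hbase, where the host name and port are placeholders for your own NameNode.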
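2 Starting HBase
Start HBase directly with the start-hbase.sh script:
    $ ./bin/start-hbase.sh
On a successful start, the log file $HBASE_HOME/logs/hbase-lxh-master-ubuntu.log (the file name contains the user name and host name, here lxh and ubuntu) contains a line like the following:
    2014-10-14 09:47:07,189 INFO [M:0;ubuntu:40435] master.HMaster: Master has completed initialization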
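Checking the log by eye can be replaced with a small polling loop. This is only a sketch: master_initialized and wait_for_master are hypothetical helper names, the 60-second default timeout is an arbitrary choice, and the search string is the "Master has completed initialization" message written by HMaster. Note that a log left over from an earlier run may already contain the message, in which case the check succeeds immediately.

```shell
#!/bin/sh
# master_initialized LOGFILE
# Succeeds if the master log contains the initialization-complete message.
master_initialized() {
    grep -q 'Master has completed initialization' "$1"
}

# wait_for_master LOGFILE [TIMEOUT_SECONDS]
# Polls the log once per second until the message appears or the timeout
# (default 60 seconds, an arbitrary choice) runs out.
wait_for_master() {
    wfm_log=$1
    wfm_left=${2:-60}
    while [ "$wfm_left" -gt 0 ]; do
        if [ -f "$wfm_log" ] && master_initialized "$wfm_log"; then
            return 0
        fi
        sleep 1
        wfm_left=$((wfm_left - 1))
    done
    return 1
}
```

With the log file from this article, the call would be wait_for_master "$HBASE_HOME/logs/hbase-lxh-master-ubuntu.log" 30, run right after start-hbase.sh.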
Listing the Java processes with jps shows a new HMaster process:
    2694 HMaster
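The same check can be scripted by filtering the jps output, which prints one "PID NAME" pair per line as in the example above. A minimal sketch follows; has_hmaster is a hypothetical helper name, and it reads the jps output from standard input so that it can also be tried on saved output.

```shell
#!/bin/sh
# has_hmaster
# Reads jps-style output ("PID NAME" per line) on stdin and succeeds
# if any line names a process called exactly HMaster.
has_hmaster() {
    awk '$2 == "HMaster" { found = 1 } END { exit (found ? 0 : 1) }'
}
```

A typical use would be: jps | has_hmaster && echo "HMaster is running".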
3 A first look at HBase
3.1 Starting the shell
Enter the shell that HBase provides to try things out.
    $ ./bin/hbase shell
    2014-10-14 10:14:55,859 INFO [main] Configuration.deprecation: hadoop.native.lib is deprecated. Instead, use io.native.lib.available
    HBase Shell; enter 'help<RETURN>' for list of supported commands.
    Type "exit<RETURN>" to leave the HBase Shell
    Version 0.98.6.1-hadoop2, r96a1af660b33879f19a47e9113bf802ad59c7146, Sun Sep 14 21:27:25 PDT 2014
    hbase(main):001:0>
3.2 Viewing help
Type the help command to see the commands available in the HBase shell.
    hbase(main):001:0> help
    HBase Shell, version 0.98.6.1-hadoop2, r96a1af660b33879f19a47e9113bf802ad59c7146, Sun Sep 14 21:27:25 PDT 2014
    Type 'help "COMMAND"', (e.g. 'help "get"' -- the quotes are necessary) for help on a specific command.
    Commands are grouped. Type 'help "COMMAND_GROUP"', (e.g. 'help "general"') for help on a command group.
    COMMAND GROUPS:
      Group name: general
      Commands: status, table_help, version, whoami
      Group name: ddl
      Commands: alter, alter_async, alter_status, create, describe, disable, disable_all, drop, drop_all, enable, enable_all, exists, get_table, is_disabled, is_enabled, list, show_filters
      Group name: namespace
      Commands: alter_namespace, create_namespace, describe_namespace, drop_namespace, list_namespace, list_namespace_tables
      Group name: dml
      Commands: append, count, delete, deleteall, get, get_counter, incr, put, scan, truncate, truncate_preserve
      Group name: tools
      Commands: assign, balance_switch, balancer, catalogjanitor_enabled, catalogjanitor_run, catalogjanitor_switch, close_region, compact, flush, hlog_roll, major_compact, merge_region, move, split, trace, unassign, zk_dump
      Group name: replication
      Commands: add_peer, disable_peer, enable_peer, list_peers, list_replicated_tables, remove_peer, set_peer_tableCFs, show_peer_tableCFs
      Group name: snapshots
      Commands: clone_snapshot, delete_snapshot, list_snapshots, rename_snapshot, restore_snapshot, snapshot
      Group name: security
      Commands: grant, revoke, user_permission
      Group name: visibility labels
      Commands: add_labels, clear_auths, get_auths, set_auths, set_visibility
    SHELL USAGE:
    Quote all names in HBase Shell such as table and column names. Commas delimit
    command parameters. Type <RETURN> after entering a command to run it.
    Dictionaries of configuration used in the creation and alteration of tables are
    Ruby Hashes. They look like this:
      {'key1' => 'value1', 'key2' => 'value2', ...}
    and are opened and closed with curley-braces. Key/values are delimited by the
    '=>' character combination. Usually keys are predefined constants such as
    NAME, VERSIONS, COMPRESSION, etc. Constants do not need to be quoted. Type
    'Object.constants' to see a (messy) list of all constants in the environment.
    If you are using binary keys or values and need to enter them in the shell, use
    double-quote'd hexadecimal representation. For example:
      hbase> get 't1', "key\x03\x3f\xcd"
      hbase> get 't1', "key\003\023\011"
      hbase> put 't1', "test\xef\xff", 'f1:', "\x01\x33\x40"
    The HBase shell is the (J)Ruby IRB with the above HBase-specific commands added.
    For more on the HBase Shell, see http://hbase.apache.org/docs/current/book.html
3.3 create: creating a table
First create a table named test with a single column family, cf. Use the list command to list all tables and confirm that it was created.
    hbase(main):002:0> create 'test', 'cf'
    0 row(s) in 0.4330 seconds
    => Hbase::Table - test
    hbase(main):003:0> list
    TABLE
    test
    1 row(s) in 0.0590 seconds
    => ["test"]
3.4 put: inserting data
Now that the test table exists, insert data with put 'table', 'row', 'col-pre:col-name', 'value'. Here table is the table name, row is the row key, col-pre is the column family prefix, col-name is the column name (the prefix and the name are separated by a colon), and value is the value to store. The fourth put below gives only the family, cf, so its value is stored under an empty column name (cf:).
    hbase(main):005:0> put 'test', 'row1', 'cf:a', 'value1'
    0 row(s) in 0.1380 seconds
    hbase(main):006:0> put 'test', 'row2', 'cf:b', 'value2-b'
    0 row(s) in 0.0130 seconds
    hbase(main):007:0> put 'test', 'row2', 'cf:c', 'value2-c'
    0 row(s) in 0.0100 seconds
    hbase(main):008:0> put 'test', 'row3', 'cf', 'value3'
    0 row(s) in 0.0110 seconds
    hbase(main):011:0> put 'test', 'row3', 'cf:e', 'value3-e'
    0 row(s) in 0.0060 seconds
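To make the column naming concrete, the sketch below splits a column argument into its family and column name in plain shell. It only illustrates the convention and is not HBase's own parsing code. column_family and column_qualifier are hypothetical helper names, and the split at the first colon is an assumption. An argument without a colon is treated as a family with an empty column name, which matches the cf: entry that scan reports for row3 in section 3.5.

```shell
#!/bin/sh
# column_family COLUMN
# Prints the part before the first colon (the whole argument if there is none).
column_family() {
    printf '%s\n' "${1%%:*}"
}

# column_qualifier COLUMN
# Prints the part after the first colon, or an empty line when the
# argument names only a family.
column_qualifier() {
    case $1 in
        *:*) printf '%s\n' "${1#*:}" ;;
        *)   printf '\n' ;;
    esac
}
```

For example, column_family 'cf:a' prints cf and column_qualifier 'cf:a' prints a, while column_qualifier 'cf' prints an empty line.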
3.5 scan: scanning the whole table
Query the data in table test with the scan 'table' command:
    hbase(main):012:0> scan 'test'
    ROW     COLUMN+CELL
     row1   column=cf:a, timestamp=1413253976039, value=value1
     row2   column=cf:b, timestamp=1413253980776, value=value2-b
     row2   column=cf:c, timestamp=1413253985691, value=value2-c
     row3   column=cf:, timestamp=1413253990953, value=value3
     row3   column=cf:e, timestamp=1413254206302, value=value3-e
    3 row(s) in 0.0430 seconds
3.6 get: querying a single row
Query a single row with the get 'table', 'row' command:
    hbase(main):013:0> get 'test', 'row1'
    COLUMN   CELL
     cf:a    timestamp=1413253976039, value=value1
    1 row(s) in 0.0150 seconds
    hbase(main):014:0> get 'test', 'row2'
    COLUMN   CELL
     cf:b    timestamp=1413253980776, value=value2-b
     cf:c    timestamp=1413253985691, value=value2-c
    2 row(s) in 0.0120 seconds
    hbase(main):015:0> get 'test', 'row3'
    COLUMN   CELL
     cf:     timestamp=1413253990953, value=value3
     cf:e    timestamp=1413254206302, value=value3-e
    2 row(s) in 0.0050 seconds
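3.7 disable: disabling a table
The disable 'table' command disables a table. The table is not deleted, but it cannot be queried or otherwise operated on.
    hbase(main):017:0> disable 'test'
    0 row(s) in 1.4850 seconds
Querying it with get 'table', 'row' at this point produces an error:
    hbase(main):018:0> get 'test', 'row3'
    COLUMN   CELL
    ERROR: test is disabled.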
3.8 enable: enabling a table
A disabled table can be re-enabled with the enable 'table' command, after which normal operations on the table work again:
    hbase(main):020:0> enable 'test'
    0 row(s) in 0.5540 seconds
    hbase(main):021:0> get 'test', 'row3'
    COLUMN   CELL
     cf:     timestamp=1413253990953, value=value3
     cf:e    timestamp=1413254206302, value=value3-e
    2 row(s) in 0.0160 seconds
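3.9 drop: deleting a table
The drop 'table' command deletes a table. The table must be disabled first, that is, disable 'table' must already have been run on it. The transcript below therefore assumes that test was disabled again after section 3.8.
    hbase(main):030:0> drop 'test'
    0 row(s) in 0.2300 seconds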
3.10 Leaving the shell
As in other shells, the command to leave the shell is exit:
    hbase(main):031:0> exit
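The interactive session in this section can also be replayed from a command file. The sketch below only writes such a file and does not run HBase itself. write_demo_script is a hypothetical helper name, the file holds a shortened version of the commands used above, and it relies on the hbase shell accepting a command file as its argument.

```shell
#!/bin/sh
# write_demo_script FILE
# Hypothetical helper: writes a shortened version of the commands from
# section 3 into FILE, one HBase shell command per line.
write_demo_script() {
    # Quoted heredoc delimiter: the text is written literally.
    cat > "$1" <<'EOF'
create 'test', 'cf'
put 'test', 'row1', 'cf:a', 'value1'
put 'test', 'row2', 'cf:b', 'value2-b'
scan 'test'
get 'test', 'row1'
disable 'test'
drop 'test'
exit
EOF
}
```

A possible use, from the HBase directory: write_demo_script /tmp/hbase-demo.txt && ./bin/hbase shell /tmp/hbase-demo.txt. The disable line comes before drop because a table has to be disabled before it can be dropped.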
4 Stopping HBase
Stop HBase directly with the stop-hbase.sh script:
    $ ./bin/stop-hbase.sh
    stopping hbase....................