HBase Optimization in Practice

2017-05-09 1219



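HBase optimization

1: Garbage collection tuning

When region servers handle very heavy load, memory allocation cannot safely rely only on the assumptions the JRE makes about program behavior. The garbage collection strategy has to be tuned with the options the JRE provides.

Data written by clients is not flushed to disk in contiguous order, so the buffers released afterwards leave holes in the JVM heap (fragmentation).

Young generation: between 128 MB and 512 MB. Old generation: several GB.

Add the options to the configuration file hbase-env.sh, using either HBASE_OPTS or HBASE_REGIONSERVER_OPTS (recommended). Recommended settings: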

export HBASE_REGIONSERVER_OPTS="-Xmx8g \
  -Xms8g \
  -Xmn128m \
  -XX:+UseParNewGC \
  -XX:+UseConcMarkSweepGC \
  -XX:CMSInitiatingOccupancyFraction=70 \
  -verbose:gc \
  -XX:+PrintGCDetails \
  -XX:+PrintGCTimeStamps \
  -Xloggc:$HBASE_HOME/logs/gc-${HOSTNAME}-hbase.log"
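One way to confirm that such options took effect is to ask a running JVM for its heap limit and active collectors. The following is a minimal sketch using only the standard java.lang.management API; the class name GcSettingsCheck is hypothetical and not part of HBase, and it reports on whichever JVM it is started in.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of HBase: reports the heap limit and collectors of this JVM.
class GcSettingsCheck {
    // Maximum heap the JVM will attempt to use, in MB (roughly reflects -Xmx).
    static long maxHeapMb() {
        return Runtime.getRuntime().maxMemory() / (1024 * 1024);
    }

    // Names of the garbage collectors this JVM is running with.
    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println("max heap (MB): " + maxHeapMb());
        System.out.println("collectors: " + collectorNames());
    }
}
```

Running it with the same flags as the region server gives a quick sanity check before digging into the GC log.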

References:

http://blog.csdn.net/kthq/article/details/8618052

http://swcdxd.iteye.com/blog/1859858

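2: HBase compression

Available codecs: GZIP, LZO, Snappy.

Snappy performs slightly better and is the usual choice.

To have HBase check the codecs at startup, list them in hbase.regionserver.codecs (hbase-site.xml). A region server then refuses to start if any listed codec is unavailable:

<property>
  <name>hbase.regionserver.codecs</name>
  <value>snappy,lzo</value>
</property>

Enabling compression: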

hbase> create 'test2', { NAME => 'cf2', COMPRESSION => 'SNAPPY' }

To verify a table's settings, use describe (shown here on an existing table 'test' that uses GZ):

hbase> describe 'test'
DESCRIPTION                                                         ENABLED
 'test', {NAME => 'cf', DATA_BLOCK_ENCODING => 'NONE',              false
 BLOOMFILTER => 'ROW', REPLICATION_SCOPE => '0', VERSIONS => '1',
 COMPRESSION => 'GZ', MIN_VERSIONS => '0', TTL => 'FOREVER',
 KEEP_DELETED_CELLS => 'false', BLOCKSIZE => '65536',
 IN_MEMORY => 'false', BLOCKCACHE => 'true'}
1 row(s) in 0.1070 seconds

Or, to change the compression of an existing table:

hbase> disable 'test'

hbase> alter 'test', {NAME => 'cf', COMPRESSION => 'GZ'}

hbase> enable 'test'

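3: Optimizing splits and compactions

3.1 Managing splits

HBase can run into "split/compaction storms", where many regions split and compact at about the same time.

To avoid this, turn off automatic splitting and split manually.

To disable automatic splitting, set hbase.hregion.max.filesize to a very large value, such as 100 GB. It is not recommended to set it to its absolute maximum value of Long.MAX_VALUE.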

3.2 region热点问题

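import java.security.MessageDigest;
import org.apache.hadoop.hbase.util.Bytes;

/* Rowkey design 1: salted prefix */
long timestamp = System.currentTimeMillis();
// 8 is the number of buckets; floorMod keeps the prefix in 0..7 even when the hash is negative
byte prefix = (byte) Math.floorMod(Long.hashCode(timestamp), 8);
byte[] rowkey1 = Bytes.add(new byte[] { prefix }, Bytes.toBytes(timestamp));

/* Rowkey design 2: field swap, promoting another field ahead of the timestamp */
// value stands for some other field of the record (placeholder name)
String rowkey2 = value + System.currentTimeMillis();

/* Rowkey design 3: randomization */
MessageDigest md = MessageDigest.getInstance("MD5");
byte[] rowkey3 = md.digest(Bytes.toBytes(System.currentTimeMillis()));

/* Rowkey design 4: reverse timestamp, so the newest rows sort first */
long rowkey4 = Long.MAX_VALUE - System.currentTimeMillis();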
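The salted-prefix design can be tried out without an HBase dependency. The sketch below is an illustration under assumptions: the class SaltedRowKey and its method names are hypothetical, and java.nio.ByteBuffer stands in for the HBase Bytes utility (both write a long as 8 big-endian bytes).

```java
import java.nio.ByteBuffer;

// Hypothetical helper illustrating a salted rowkey: 1 prefix byte + 8 timestamp bytes.
class SaltedRowKey {
    // Builds the key; the prefix is the timestamp hash reduced to 0..buckets-1.
    static byte[] saltedKey(long timestamp, int buckets) {
        byte prefix = (byte) Math.floorMod(Long.hashCode(timestamp), buckets);
        return ByteBuffer.allocate(1 + Long.BYTES).put(prefix).putLong(timestamp).array();
    }

    // Recovers the timestamp by skipping the prefix byte.
    static long timestampOf(byte[] rowkey) {
        return ByteBuffer.wrap(rowkey, 1, Long.BYTES).getLong();
    }

    public static void main(String[] args) {
        byte[] key = saltedKey(System.currentTimeMillis(), 8);
        System.out.println("prefix=" + key[0] + " timestamp=" + timestampOf(key));
    }
}
```

The trade-off of salting is on the read side: a scan over a time range has to be issued once per prefix and the results merged, because consecutive timestamps land in different buckets.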

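You can also call move() in the admin API to move a hot region to another region server, or unassign() to take a region of the affected table offline so that it gets reassigned.

3.3 Pre-splitting regions

Specify the split points, or the number of regions you need, when creating the table.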

hbase> create 't1', 'f', SPLITS => ['10', '20', '30']
hbase> create 't14', 'f', SPLITS_FILE => 'splits.txt'
# create table with four regions based on random bytes keys
hbase> create 't2', 'f1', { NUMREGIONS => 4, SPLITALGO => 'UniformSplit' }
# create table with five regions based on hex keys
hbase> create 't3', 'f1', { NUMREGIONS => 5, SPLITALGO => 'HexStringSplit' }
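To get a feel for what evenly spaced hex split points look like, here is a simplified sketch. It assumes 8-character hex rowkeys spanning 00000000 to ffffffff; the class HexSplitSketch is hypothetical, and this is not the HBase HexStringSplit implementation, whose exact boundaries may differ.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified illustration of evenly spaced hex split points; not HBase's own algorithm.
class HexSplitSketch {
    // Returns numRegions - 1 split keys dividing the 32-bit hex key space evenly.
    static List<String> splitPoints(int numRegions) {
        if (numRegions < 1) {
            throw new IllegalArgumentException("numRegions must be >= 1");
        }
        long keySpace = 1L << 32; // number of distinct 8-character hex keys
        List<String> splits = new ArrayList<>();
        for (int i = 1; i < numRegions; i++) {
            splits.add(String.format("%08x", keySpace * i / numRegions));
        }
        return splits;
    }

    public static void main(String[] args) {
        System.out.println(splitPoints(4)); // prints [40000000, 80000000, c0000000]
    }
}
```

Split points like these only balance the load if the rowkeys themselves are spread evenly over the hex range, for example MD5-based keys.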

Reference: http://hbase.apache.org/book.html#compression

4: Load balancing

Use the shell to disable the balancer:

hbase(main):001:0> balance_switch false

true

0 row(s) in 0.3590 seconds

This turns the balancer OFF; the value printed is the balancer's previous state. To re-enable it, do:

hbase(main):001:0> balance_switch true

false

0 row(s) in 0.3590 seconds

5: Merging regions

In some special cases, for example after deleting a large amount of data, you may need to merge regions.

$ bin/hbase org.apache.hadoop.hbase.util.Merge

(If you feel you have too many regions and want to consolidate them, Merge is the utility you need. Merge must be run when the cluster is down.)

6: Client API optimization

6.1 Disable auto-flush

This matters when there are many write operations.

When performing a lot of Puts, make sure that setAutoFlush is set to false on your Table instance.

Otherwise, the Puts will be sent one at a time to the RegionServer.

Puts added via table.put(Put) and table.put(List<Put>) wind up in the same write buffer.

If autoFlush = false, these messages are not sent until the write-buffer is filled.

To explicitly flush the messages, call flushCommits.

Calling close on the Table instance will invoke flushCommits.
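The effect of the write buffer can be shown with a toy model. The sketch below is not the HBase client API: WriteBufferSketch and its fields are hypothetical, the method names only mirror the ones mentioned above, and "sending" a batch is reduced to incrementing a counter.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of a client-side write buffer; hypothetical, not the HBase client.
class WriteBufferSketch {
    private final long bufferLimitBytes;
    private final List<byte[]> pending = new ArrayList<>();
    private long pendingBytes = 0;
    private int rpcCount = 0;

    WriteBufferSketch(long bufferLimitBytes) {
        this.bufferLimitBytes = bufferLimitBytes;
    }

    // Buffers the row and flushes only once the buffer is full.
    void put(byte[] row) {
        pending.add(row);
        pendingBytes += row.length;
        if (pendingBytes >= bufferLimitBytes) {
            flushCommits();
        }
    }

    // Sends everything buffered so far as one batch (modeled as one RPC).
    void flushCommits() {
        if (pending.isEmpty()) {
            return;
        }
        rpcCount++;
        pending.clear();
        pendingBytes = 0;
    }

    // Closing flushes whatever is still buffered.
    void close() {
        flushCommits();
    }

    int rpcCount() {
        return rpcCount;
    }
}
```

With a 100-byte limit and 25 rows of 10 bytes each, this model sends 3 batches (two full buffers plus the remainder on close) instead of 25 single-row calls.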

6.2 Use scanner caching

A typical case is HBase serving as the input source of a MapReduce job: set setCaching to a value much larger than the default.

If HBase is used as an input source for a MapReduce job, for example, make sure that the input Scan instance to the MapReduce job has setCaching set to something greater than the default (which is 1).

Using the default value means that the map-task will call back to the region-server for every record processed.

Setting this value to 500, for example, will transfer 500 rows at a time to the client to be processed.
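The saving is easy to quantify. As a rough sketch (the class ScanRoundTrips is hypothetical, and the model ignores any extra call a real scanner may make to detect the end of the scan), the number of round trips is the row count divided by the caching value, rounded up:

```java
// Back-of-the-envelope model of scanner round trips; hypothetical helper, not an HBase API.
class ScanRoundTrips {
    // Number of fetches needed to read `rows` rows when each fetch returns up to `caching` rows.
    static long roundTrips(long rows, int caching) {
        if (caching < 1) {
            throw new IllegalArgumentException("caching must be >= 1");
        }
        return (rows + caching - 1) / caching; // ceiling division
    }

    public static void main(String[] args) {
        System.out.println(roundTrips(1_000_000, 1));   // prints 1000000
        System.out.println(roundTrips(1_000_000, 500)); // prints 2000
    }
}
```

Larger values cut round trips further but make each response bigger, so client memory and the time spent processing one batch set the practical upper bound.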

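6.3 Limit the scan scope

6.4 Close ResultScanner

7: Configuration tuning

7.1 Reduce the ZooKeeper timeout

zookeeper.session.timeout

Default: three minutes

7.2 Increase the region server handler threads

hbase.regionserver.handler.count

Default: 10

7.3 Increase the region size

Managing fewer regions helps the cluster run more smoothly.

Default: 256 MB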
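A quick calculation shows how the region size drives the region count. This is a rough estimate only: the class RegionCountEstimate is hypothetical, the data volume is an assumed example figure, and real regions are rarely filled exactly to the limit.

```java
// Rough estimate of how many regions a data set needs; hypothetical helper, not an HBase API.
class RegionCountEstimate {
    // Minimum number of regions if every region were filled to maxRegionBytes.
    static long regions(long dataBytes, long maxRegionBytes) {
        if (maxRegionBytes <= 0) {
            throw new IllegalArgumentException("maxRegionBytes must be positive");
        }
        return (dataBytes + maxRegionBytes - 1) / maxRegionBytes; // ceiling division
    }

    public static void main(String[] args) {
        long oneTb = 1L << 40; // assumed example data volume
        System.out.println(regions(oneTb, 256L << 20)); // 256 MB regions: prints 4096
        System.out.println(regions(oneTb, 1L << 30));   // 1 GB regions: prints 1024
    }
}
```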

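7.4 Reduce the maximum number of log files

For write-heavy applications, lowering this value forces the server to write data to disk more often, and the log entries for data already flushed to disk can then be discarded.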
