hbase compaction
http://hbase.apache.org/book/regions.arch.html#compaction
http://hbase.apache.org/book.html
Excerpt:
9.7.5.5.1. Compaction File Selection
To understand the core algorithm for StoreFile selection, there is some ASCII-art in the Store source code that will serve as useful reference. It has been copied below:
/* normal skew:
 *
 *         older ----> newer
 *     _
 *    | |   _
 *    | |  | |   _
 *  --|-|- |-|- |-|---_-------_-------  minCompactSize
 *    | |  | |  | |  | |  _  | |
 *    | |  | |  | |  | | | | | |
 *    | |  | |  | |  | | | | | |
 */
Important knobs:
- hbase.store.compaction.ratio: Ratio used in the compaction file selection algorithm (default 1.2f).
- hbase.hstore.compaction.min (hbase.hstore.compactionThreshold in 0.90) (files): Minimum number of StoreFiles per Store to be selected for a compaction to occur (default 2).
- hbase.hstore.compaction.max (files): Maximum number of StoreFiles to compact per minor compaction (default 10).
- hbase.hstore.compaction.min.size (bytes): Any StoreFile smaller than this setting will automatically be a candidate for compaction. Defaults to hbase.hregion.memstore.flush.size (128 mb).
- hbase.hstore.compaction.max.size (0.92) (bytes): Any StoreFile larger than this setting will automatically be excluded from compaction (default Long.MAX_VALUE).
The minor compaction StoreFile selection logic is size based, and selects a file for compaction when the file <= sum(smaller_files) * hbase.hstore.compaction.ratio.
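The rule above, combined with the knobs listed earlier, can be sketched in a few lines of Python. This is a simplified illustration and not the HBase source: the function name and parameter names are made up for this example, the sum is taken over all newer files exactly as the text states it, and the handling of max_size (skipping oversized files only at the old end) is an assumption.

```python
from typing import List


def select_for_minor_compaction(
    sizes: List[int],
    ratio: float = 1.2,
    min_files: int = 2,
    max_files: int = 10,
    min_size: int = 128 * 1024 * 1024,
    max_size: float = float("inf"),
) -> List[int]:
    """Return the sizes of the StoreFiles picked for a minor compaction.

    `sizes` is ordered oldest to newest (hypothetical helper, for illustration).
    """
    # Assumption: files above max_size are skipped only at the old end,
    # so the remaining candidates stay contiguous.
    start = 0
    while start < len(sizes) and sizes[start] > max_size:
        start += 1

    # Walk from the oldest file and stop at the first one that is either
    # small enough (<= min_size) or within ratio * sum(newer files).
    while len(sizes) - start >= min_files:
        newer_sum = sum(sizes[start + 1:])
        if sizes[start] <= max(min_size, ratio * newer_sum):
            break
        start += 1

    # Take that file plus the newer ones, capped at max_files.
    selected = sizes[start:start + max_files]

    # Too few files: skip the compaction for now.
    return selected if len(selected) >= min_files else []
```

For instance, with ratio=1.0, min_files=3, max_files=5, min_size=10 and max_size=1000, calling it on [100, 50, 23, 12, 12] returns [23, 12, 12].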
This example mirrors an example from the unit test TestCompactSelection.
- hbase.store.compaction.ratio = 1.0f
- hbase.hstore.compaction.min = 3 (files)
- hbase.hstore.compaction.max = 5 (files)
- hbase.hstore.compaction.min.size = 10 (bytes)
- hbase.hstore.compaction.max.size = 1000 (bytes)
The following StoreFiles exist: 100, 50, 23, 12, and 12 bytes apiece (oldest to newest). With the above parameters, the files that would be selected for minor compaction are 23, 12, and 12.
Why?
- 100 --> No, because 100 > sum(50, 23, 12, 12) * 1.0 = 97.
- 50 --> No, because 50 > sum(23, 12, 12) * 1.0 = 47.
- 23 --> Yes, because 23 <= sum(12, 12) * 1.0 = 24.
- 12 --> Yes, because the previous file has been included, and because this does not exceed the max-file limit of 5.
- 12 --> Yes, because the previous file has been included, and because this does not exceed the max-file limit of 5.
This example mirrors an example from the unit test TestCompactSelection.
- hbase.store.compaction.ratio = 1.0f
- hbase.hstore.compaction.min = 3 (files)
- hbase.hstore.compaction.max = 5 (files)
- hbase.hstore.compaction.min.size = 10 (bytes)
- hbase.hstore.compaction.max.size = 1000 (bytes)
The following StoreFiles exist: 100, 25, 12, and 12 bytes apiece (oldest to newest). With the above parameters, no files would be selected for minor compaction.
Why?
- 100 --> No, because 100 > sum(25, 12, 12) * 1.0 = 49.
- 25 --> No, because 25 > sum(12, 12) * 1.0 = 24.
- 12 --> No. It is a candidate because 12 <= sum(12) * 1.0 = 12, but there are only 2 files to compact, which is less than the threshold of 3.
- 12 --> No. It is a candidate because the previous StoreFile was, but there are not enough files to compact.
This example mirrors an example from the unit test TestCompactSelection.
- hbase.store.compaction.ratio = 1.0f
- hbase.hstore.compaction.min = 3 (files)
- hbase.hstore.compaction.max = 5 (files)
- hbase.hstore.compaction.min.size = 10 (bytes)
- hbase.hstore.compaction.max.size = 1000 (bytes)
The following StoreFiles exist: 7, 6, 5, 4, 3, 2, and 1 bytes apiece (oldest to newest). With the above parameters, the files that would be selected for minor compaction are 7, 6, 5, 4, 3.
Why?
- 7 --> Yes, because 7 <= sum(6, 5, 4, 3, 2, 1) * 1.0 = 21. Also, 7 is less than the min-size.
- 6 --> Yes, because 6 <= sum(5, 4, 3, 2, 1) * 1.0 = 15. Also, 6 is less than the min-size.
- 5 --> Yes, because 5 <= sum(4, 3, 2, 1) * 1.0 = 10. Also, 5 is less than the min-size.
- 4 --> Yes, because 4 <= sum(3, 2, 1) * 1.0 = 6. Also, 4 is less than the min-size.
- 3 --> Yes, because 3 <= sum(2, 1) * 1.0 = 3. Also, 3 is less than the min-size.
- 2 --> No. It is a candidate because the previous file was selected and 2 is less than the min-size, but the max-number of files to compact has been reached.
- 1 --> No. It is a candidate because the previous file was selected and 1 is less than the min-size, but the max-number of files to compact has been reached.
hbase.store.compaction.ratio. A large ratio (e.g., 10) will produce a single giant file. Conversely, a value of .25 will produce behavior similar to the BigTable compaction algorithm - resulting in 4 StoreFiles.
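To get a feel for how the ratio moves the selection boundary, one can compute, for each file, the smallest ratio at which the size rule (file <= ratio * sum of newer files) would hold. The sketch below is only an illustration: the helper name is hypothetical, and it deliberately ignores the min-size and file-count knobs.

```python
from typing import List, Optional


def break_even_ratios(sizes: List[int]) -> List[Optional[float]]:
    """For each file (oldest to newest), return the smallest ratio at which
    size <= ratio * sum(newer files) holds; None when there are no newer files."""
    result: List[Optional[float]] = []
    for i, size in enumerate(sizes):
        newer_sum = sum(sizes[i + 1:])
        result.append(size / newer_sum if newer_sum else None)
    return result


sizes = [100, 50, 23, 12, 12]
ratios = break_even_ratios(sizes)
for size, r in zip(sizes, ratios):
    print(size, "never by ratio alone" if r is None else f"ratio >= {r:.2f}")
```

For these sizes every break-even value lies close to 1.0. A ratio of 10 is above all of them, so selection would start at the oldest file; a ratio of .25 is below all of them, so only the min-size rule could still make a file a candidate.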
hbase.hstore.compaction.min.size. This limit is the "automatic include" threshold: every StoreFile smaller than this value is a compaction candidate. It may therefore need to be adjusted downwards in write-heavy environments where many 1 or 2 mb StoreFiles are being flushed, because every file will be targeted for compaction, and the resulting files may still be under the min-size and require further compaction, etc.
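The cascading effect can be estimated with simple arithmetic. The following sketch assumes a deliberately crude model that is not taken from HBase: every compaction merges a fixed number of equally sized files, and a file stays an automatic candidate while its size is at or below the min-size. The function name and parameters are hypothetical.

```python
MB = 1024 * 1024


def rounds_until_above_min_size(flush_size: int,
                                files_per_compaction: int,
                                min_size: int) -> int:
    """Count the compaction rounds a flushed file goes through before the
    merged result is larger than min_size (simplified model, see lead-in)."""
    if flush_size <= 0 or files_per_compaction < 2:
        raise ValueError("need a positive flush size and at least 2 files per round")
    rounds = 0
    size = flush_size
    while size <= min_size:
        # Each round merges files_per_compaction files of the current size.
        size *= files_per_compaction
        rounds += 1
    return rounds


# 2 mb flushes, 10 files per compaction, 128 mb min-size: 2 -> 20 -> 200 mb.
print(rounds_until_above_min_size(2 * MB, 10, 128 * MB))
# The same workload with the min-size lowered to 16 mb: 2 -> 20 mb.
print(rounds_until_above_min_size(2 * MB, 10, 16 * MB))
```

Under this model, lowering the min-size means the same data is rewritten in fewer automatic rounds, which is the motivation for adjusting the value downwards.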