Analysis of the scan operation during minor compaction

hongs_yang


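The scan performed during a minor compaction merges several of the StoreFiles under a Store into one; it normally does not drop deleted data.

A compaction is triggered through the following call chain:

CompactSplitThread.requestCompactionInternal
--> CompactSplitThread.CompactionRunner.run
--> region.compact
--> HStore.compact
--> DefaultStoreEngine.DefaultCompactionContext.compact
--> DefaultCompactor.compact

Creating the StoreScanner for a compaction

1. A StoreFileScanner is created for each StoreFile selected for compaction. The method call hierarchy for creating these instances is shown below.

This part of DefaultCompactor.compact obtains a StoreFileScanner instance for every StoreFile: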

List<StoreFileScanner> scanners = createFileScanners(request.getFiles());

protected List<StoreFileScanner> createFileScanners(
    final Collection<StoreFile> filesToCompact) throws IOException {
  return StoreFileScanner.getScannersForStoreFiles(filesToCompact, false, false, true);
}

public static List<StoreFileScanner> getScannersForStoreFiles(
    Collection<StoreFile> files, boolean cacheBlocks, boolean usePread,
    boolean isCompaction) throws IOException {
  return getScannersForStoreFiles(files, cacheBlocks, usePread, isCompaction,
      null);
}

When the overload below is called from here, the ScanQueryMatcher argument is null.

public static List<StoreFileScanner> getScannersForStoreFiles(
    Collection<StoreFile> files, boolean cacheBlocks, boolean usePread,
    boolean isCompaction, ScanQueryMatcher matcher) throws IOException {
  List<StoreFileScanner> scanners = new ArrayList<StoreFileScanner>(
      files.size());
  for (StoreFile file : files) {
    // For each StoreFile, create its reader and then build a StoreFileScanner from the reader.
    // Reader creation: HFile.createReader --> HFileReaderV2 --> StoreFile.Reader
    StoreFile.Reader r = file.createReader();
    // Every StoreFileScanner wraps one HFileScanner, created by HFileReaderV2.getScanner:
    //   - It checks whether this column family of the table sets DATA_BLOCK_ENCODING,
    //     i.e. whether an encoding is specified. For the possible values see
    //     DataBlockEncoding (prefix tree and so on).
    //   - If the encoding is not NONE, the HFileScanner is an HFileReaderV2.EncodedScannerV2;
    //     otherwise it is an HFileReaderV2.ScannerV2.
    // The StoreFileScanner then holds references to the StoreFile.Reader and the HFileScanner.
    // isCompaction is true in the call below.
    StoreFileScanner scanner = r.getStoreFileScanner(cacheBlocks, usePread,
        isCompaction);
    // The matcher is null at this point.
    scanner.setScanQueryMatcher(matcher);
    scanners.add(scanner);
  }
  return scanners;
}

This part of DefaultCompactor.compact creates the StoreScanner instance.

For a minor compaction the ScanType is the one that keeps delete markers: scanType = COMPACT_RETAIN_DELETES.

ScanType scanType =
    request.isMajor() ? ScanType.COMPACT_DROP_DELETES
        : ScanType.COMPACT_RETAIN_DELETES;
scanner = preCreateCoprocScanner(request, scanType, fd.earliestPutTs, scanners);
if (scanner == null) {
  // Create a Scan that reads all versions (maxVersions is the maximum configured
  // on the column family), then create the StoreScanner instance.
  scanner = createScanner(store, scanners, scanType, smallestReadPoint, fd.earliestPutTs);
}
scanner = postCreateCoprocScanner(request, scanType, scanner);
if (scanner == null) {
  // NULL scanner returned from coprocessor hooks means skip normal processing.
  return newFiles;
}

What the StoreScanner constructor does; the call hierarchy is as follows:

protected InternalScanner createScanner(Store store, List<StoreFileScanner> scanners,
    ScanType scanType, long smallestReadPoint, long earliestPutTs) throws IOException {
  Scan scan = new Scan();
  scan.setMaxVersions(store.getFamily().getMaxVersions());
  return new StoreScanner(store, store.getScanInfo(), scan, scanners,
      scanType, smallestReadPoint, earliestPutTs);
}

public StoreScanner(Store store, ScanInfo scanInfo, Scan scan,
    List<? extends KeyValueScanner> scanners, ScanType scanType,
    long smallestReadPoint, long earliestPutTs) throws IOException {
  this(store, scanInfo, scan, scanners, scanType, smallestReadPoint, earliestPutTs, null, null);
}

private StoreScanner(Store store, ScanInfo scanInfo, Scan scan,
    List<? extends KeyValueScanner> scanners, ScanType scanType, long smallestReadPoint,
    long earliestPutTs, byte[] dropDeletesFromRow, byte[] dropDeletesToRow)
    throws IOException {
  // Call the shared constructor, which computes the TTL expiry time, min versions, etc.
  // It also checks whether hbase.storescanner.parallel.seek.enable is true (parallel seek);
  // if so, it obtains the executor thread pool from the region server.
  this(store, false, scan, null, scanInfo.getTtl(),
      scanInfo.getMinVersions());
  if (dropDeletesFromRow == null) {
    // For a minor compaction the ScanQueryMatcher is created here.
    matcher = new ScanQueryMatcher(scan, scanInfo, null, scanType,
        smallestReadPoint, earliestPutTs, oldestUnexpiredTS);
  } else {
    matcher = new ScanQueryMatcher(scan, scanInfo, null, smallestReadPoint,
        earliestPutTs, oldestUnexpiredTS, dropDeletesFromRow, dropDeletesToRow);
  }

  // Drop the StoreFileScanners ruled out by the Bloom filter, those outside the time
  // range, and those whose data has exceeded the TTL. If even the newest timestamp in a
  // StoreFile is older than the TTL allows, the file is useless and takes no part in the scan.
  // Filter the list of scanners using Bloom filters, time range, TTL, etc.
  scanners = selectScannersFrom(scanners);

  // If parallel seek is not configured, seek each scanner to the start key in turn.
  // A compaction Scan specifies no start row, so this effectively positions each
  // scanner at the beginning of its file.
  // Seek all scanners to the initial key
  if (!isParallelSeekEnabled) {
    for (KeyValueScanner scanner : scanners) {
      scanner.seek(matcher.getStartKey());
    }
  } else {
    // Create ParallelSeekHandler instances on the thread pool and seek to the
    // start position in parallel.
    parallelSeek(scanners, matcher.getStartKey());
  }

  // Build the scanner that does the actual scanning and add every StoreFileScanner to it.
  // Each next() has to find the smallest KV across the different scanners.
  // KeyValueHeap maintains a PriorityQueue; by default the order of the
  // StoreFileScanners in the queue is decided as follows:
  //   1. Compare the first KV of the two StoreFileScanners:
  //      a. if the row keys differ, order by row key;
  //      b. if the row keys are equal, compare cf/column/type;
  //      c. see KeyValue.KVComparator.compare.
  //   2. If the smallest KVs of the two StoreFileScanners are equal, the scanner whose
  //      StoreFile has the larger sequence id comes first.
  //   3. The StoreFileScanner holding the smallest KV overall becomes KeyValueHeap.current.
  // Combine all seeked scanners with a heap
  heap = new KeyValueHeap(scanners, store.getComparator());
}
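The TTL part of the scanner selection above can be pictured with a minimal sketch. It is not the HBase implementation: FileMeta and TtlFileSelector are hypothetical names introduced here, and the only rule modelled is that a file is skipped when its newest timestamp is older than the oldest unexpired timestamp (now minus TTL). Bloom filter and time range checks are left out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a store file's metadata; not an HBase class.
final class FileMeta {
    final String name;
    final long maxTimestamp; // newest cell timestamp recorded in the file

    FileMeta(String name, long maxTimestamp) {
        this.name = name;
        this.maxTimestamp = maxTimestamp;
    }
}

final class TtlFileSelector {
    // Keep only the files that may still contain unexpired cells.
    static List<FileMeta> select(List<FileMeta> files, long nowMs, long ttlMs) {
        long oldestUnexpiredTs = nowMs - ttlMs;
        List<FileMeta> kept = new ArrayList<>();
        for (FileMeta f : files) {
            // A file whose newest cell is already expired cannot contribute anything.
            if (f.maxTimestamp >= oldestUnexpiredTs) {
                kept.add(f);
            }
        }
        return kept;
    }
}
```

With now = 1000 and ttl = 500, a file whose newest timestamp is 100 is dropped and one whose newest timestamp is 900 is kept.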

The KeyValueScanner.seek flow:

The KeyValueScanner instance is a StoreFileScanner, so StoreFileScanner.seek is called. The call hierarchy:

public boolean seek(KeyValue key) throws IOException {
  if (seekCount != null) seekCount.incrementAndGet();
  try {
    try {
      if(!seekAtOrAfter(hfs, key)) {
        close();
        return false;
      }
      cur = hfs.getKeyValue();
      return !hasMVCCInfo ? true : skipKVsNewerThanReadpoint();
    } finally {
      realSeekDone = true;
    }
  } catch (IOException ioe) {
    throw new IOException("Could not seek " + this + " to key " + key, ioe);
  }
}

seekAtOrAfter calls seekTo on the HFileScanner implementation, either HFileReaderV2.EncodedScannerV2 or HFileReaderV2.ScannerV2.

public static boolean seekAtOrAfter(HFileScanner s, KeyValue k)
    throws IOException {
  // Calls HFileReaderV2.AbstractScannerV2.seekTo, described below.
  // A return value of 0 means an exact match: return true without calling next()
  // (the current KV is the right one).
  int result = s.seekTo(k.getBuffer(), k.getKeyOffset(), k.getKeyLength());
  if(result < 0) {
    // HBASE-7845, contributed by Xiaomi, optimizes the keys stored in the block index:
    // the stored index key is a value that sorts after the last key of the previous block
    // and before the first key of the current block. When such a key is hit, seekTo
    // returns -2. Unlike the other negative results, the pointer does not need to be
    // advanced, because the scanner is already on the first KV of the block.
    if (result == HConstants.INDEX_KEY_MAGIC) {
      // using faked key
      return true;
    }
    // Move to the start of the first block of the file. This branch is rarely executed.
    // Passed KV is smaller than first KV in file, work from start of file
    return s.seekTo();
  } else if(result > 0) {
    // The scan's start key is larger than the current key in the block,
    // so move to the next KV.
    // Passed KV is larger than current KV in file, if there is a next
    // it is the "after", if not then this scanner is done.
    return s.next();
  }
  // Seeked to the exact key
  return true;
}
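The three-way result handling in seekAtOrAfter can be tried out on a plain sorted array. The sketch below is only an illustration under simplifying assumptions: SortedKeyScanner is a made-up class, keys are Strings instead of serialized KeyValues, and the INDEX_KEY_MAGIC case is omitted. seekTo returns 0 on an exact match, 1 when it stops on the largest key smaller than the target, and -1 when the target sorts before the first key.

```java
final class SortedKeyScanner {
    private final String[] keys; // sorted ascending
    private int pos = -1;

    SortedKeyScanner(String[] sortedKeys) {
        this.keys = sortedKeys;
    }

    // 0: exact match; 1: positioned on the largest key smaller than target;
    // -1: target sorts before the first key (position left unchanged).
    int seekTo(String target) {
        int lo = 0, hi = keys.length - 1, floor = -1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            int c = keys[mid].compareTo(target);
            if (c == 0) {
                pos = mid;
                return 0;
            }
            if (c < 0) {
                floor = mid;
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        if (floor < 0) {
            return -1;
        }
        pos = floor;
        return 1;
    }

    boolean seekToFirst() {
        pos = 0;
        return keys.length > 0;
    }

    boolean next() {
        pos++;
        return pos < keys.length;
    }

    String current() {
        return keys[pos];
    }

    // Position the scanner on the first key >= target; false if there is none.
    static boolean seekAtOrAfter(SortedKeyScanner s, String target) {
        int result = s.seekTo(target);
        if (result < 0) {
            return s.seekToFirst(); // target is before the file: start from the top
        } else if (result > 0) {
            return s.next();        // sitting just before the target: step forward
        }
        return true;                // exact hit
    }
}
```

For keys b, d, f: seeking to c lands on d, seeking to a lands on b, and seeking to g reports that the scanner is exhausted.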

The HFileReaderV2.AbstractScannerV2.seekTo method:

public int seekTo(byte[] key, int offset, int length) throws IOException {
  // Always rewind to the first key of the block, because the given key
  // might be before or after the current key.
  return seekTo(key, offset, length, true);
}

The nested seekTo call:

protected int seekTo(byte[] key, int offset, int length, boolean rewind)
    throws IOException {
  // Get the block index reader held by HFileReaderV2, an HFileBlockIndex.BlockIndexReader.
  HFileBlockIndex.BlockIndexReader indexReader =
      reader.getDataBlockIndexReader();
  // Look up the HFileBlock for the key through the block index reader.
  // The first key of every block is kept in the index (the reader's blockKeys).
  // HBASE-7845 (from Xiaomi) optimized the index key: the stored key is a value that sorts
  // after the last key of the previous block and before the first key of the current block.
  // The index also stores each block's offset (the reader's blockOffsets) and
  // on-disk size (the reader's blockDataSizes).
  //   1. Binary-search the index keys to find the index of the block for the scan's start key.
  //   2. Use the index to get the block's start offset.
  //   3. Use the index to get the block's size.
  //   4. Load the block and return it wrapped in a BlockWithScanInfo.
  BlockWithScanInfo blockWithScanInfo =
      indexReader.loadDataBlockWithScanInfo(key, offset, length, block,
          cacheBlocks, pread, isCompaction);
  if (blockWithScanInfo == null || blockWithScanInfo.getHFileBlock() == null) {
    // This happens if the key e.g. falls before the beginning of the file.
    return -1;
  }
  // Call loadBlockAndSeekToKey of HFileReaderV2.EncodedScannerV2 or HFileReaderV2.ScannerV2:
  //   1. make the block found by the seek the current block;
  //   2. move the pointer to the position of the requested key.
  return loadBlockAndSeekToKey(blockWithScanInfo.getHFileBlock(),
      blockWithScanInfo.getNextIndexedKey(), rewind, key, offset, length, false);
}
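The index lookup described in steps 1 to 3 boils down to a binary search for the last block whose first key is not greater than the search key. Here is a minimal single-level sketch; BlockIndexSketch and its fields are assumptions made for illustration, keys are Strings, and the real BlockIndexReader additionally handles multi-level indexes and block loading.

```java
final class BlockIndexSketch {
    private final String[] firstKeys; // first key of each block, ascending
    private final long[] offsets;     // file offset of each block
    private final int[] sizes;        // on-disk size of each block

    BlockIndexSketch(String[] firstKeys, long[] offsets, int[] sizes) {
        this.firstKeys = firstKeys;
        this.offsets = offsets;
        this.sizes = sizes;
    }

    // Index of the block that may contain key, or -1 if key sorts before the first block.
    int findBlock(String key) {
        int lo = 0, hi = firstKeys.length - 1, found = -1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            if (firstKeys[mid].compareTo(key) <= 0) {
                found = mid;  // candidate: this block starts at or before key
                lo = mid + 1; // a later block may also start at or before key
            } else {
                hi = mid - 1;
            }
        }
        return found;
    }

    long offsetOf(int blockIndex) {
        return offsets[blockIndex];
    }

    int sizeOf(int blockIndex) {
        return sizes[blockIndex];
    }
}
```

With first keys b, h, p, a lookup for k selects block 1, while a lookup for a returns -1, which corresponds to the "key falls before the beginning of the file" case.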

Processing in StoreScanner.next

Back in DefaultCompactor.compact: once the scanner has been obtained, the data is written to the new StoreFile.

writer = store.createWriterInTmp(fd.maxKeyCount, this.compactionCompression, true,
    fd.maxMVCCReadpoint >= smallestReadPoint);
boolean finished = performCompaction(scanner, writer, smallestReadPoint);

performCompaction reads KVs through StoreScanner.next(kvList, limit). The limit is configured with hbase.hstore.compaction.kv.max and defaults to 10; setting it too high may cause an OOM.

The KVs are added to the new StoreFile through HFileWriterV2.append.

hbase.hstore.close.check.interval configures how much data is written between checks that the store is still writable; the default is 10*1000*1000 bytes (about 10 MB).

StoreScanner.next(kvList, limit):

public boolean next(List<Cell> outResult, int limit) throws IOException {
  lock.lock();
  try {
    if (checkReseek()) {
      return true;
    }
    // if the heap was left null, then the scanners had previously run out anyways, close and
    // return.
    if (this.heap == null) {
      close();
      return false;
    }
    // KeyValueHeap.peek --> StoreFileScanner.peek returns the KeyValue the seek stopped on.
    // If it is null there is nothing left to read, so the scan ends.
    KeyValue peeked = this.heap.peek();
    if (peeked == null) {
      close();
      return false;
    }
    // only call setRow if the row changes; avoids confusing the query matcher
    // if scanning intra-row
    byte[] row = peeked.getBuffer();
    int offset = peeked.getRowOffset();
    short length = peeked.getRowLength();
    // This check usually succeeds on the first call, or when the scan has moved on to
    // a different row; matcher.row is then set to the row key of the current row.
    if (limit < 0 || matcher.row == null || !Bytes.equals(row, offset, length, matcher.row,
        matcher.rowOffset, matcher.rowLength)) {
      this.countPerRow = 0;
      matcher.setRow(row, offset, length);
    }
    KeyValue kv;
    KeyValue prevKV = null;
    // Only do a sanity-check if store and comparator are available.
    KeyValue.KVComparator comparator =
        store != null ? store.getComparator() : null;
    int count = 0;
    LOOP: while((kv = this.heap.peek()) != null) {
      ++kvsScanned;
      // Check that the heap gives us KVs in an increasing order.
      assert prevKV == null || comparator == null || comparator.compare(prevKV, kv) <= 0 :
        "Key " + prevKV + " followed by a " + "smaller key " + kv + " in cf " + store;
      prevKV = kv;

      // How matcher.match checks the KV:
      //   1. If filter.filterAllRemaining() is true the query is over: return DONE_SCAN.
      //   2. Compare the KV with the matcher's row key (the row field; all KVs currently
      //      being matched belong to that row).
      //      If matcher.row is smaller than the peeked KV, the current row is finished
      //      (the KV already belongs to the next row): return DONE.
      //      If matcher.row is larger than the peeked KV, the KV sorts before matcher.row
      //      and a seek to the next row is needed: return SEEK_NEXT_ROW.
      //   3. Check whether the TTL has expired; if so return SEEK_NEXT_COL.
      //   4. If the KV is a delete marker and this is a minor compaction scan
      //      (scan type COMPACT_RETAIN_DELETES), return INCLUDE.
      //   5. If the KV is not a delete and the delete tracker (ScanDeleteTracker) covers it:
      //      for FAMILY_DELETED/COLUMN_DELETED return SEEK_NEXT_COL;
      //      for VERSION_DELETED/FAMILY_VERSION_DELETED return SKIP.
      //   6. Check whether the timestamp lies within the TimeRange. If it is above the
      //      maximum return SKIP, otherwise (below the minimum) return SEEK_NEXT_COL.
      //   7. Run filter.filterKeyValue():
      //      SKIP --> SKIP
      //      NEXT_COL --> SEEK_NEXT_COL
      //      NEXT_ROW --> SEEK_NEXT_ROW
      //      SEEK_NEXT_USING_HINT --> SEEK_NEXT_USING_HINT
      //      otherwise the filter returned INCLUDE or INCLUDE_AND_NEXT_COL; continue below.
      //   8. For a non-delete KV, check whether maxVersions has been exceeded; if so, or if
      //      the data has exceeded its TTL, return SEEK_NEXT_ROW.
      //      If the data has not expired, maxVersions has not been exceeded and the filter
      //      returned INCLUDE_AND_NEXT_COL, return INCLUDE_AND_SEEK_NEXT_COL.
      //      Otherwise return INCLUDE.
      ScanQueryMatcher.MatchCode qcode = matcher.match(kv);
      switch(qcode) {
        case INCLUDE:
        case INCLUDE_AND_SEEK_NEXT_ROW:
        case INCLUDE_AND_SEEK_NEXT_COL:
          // Apply the filter's transformCell. This is a chance to make the KV value as
          // small as possible and reduce the size of what is returned.
          Filter f = matcher.getFilter();
          if (f != null) {
            // TODO convert Scan Query Matcher to be Cell instead of KV based ?
            kv = KeyValueUtil.ensureKeyValue(f.transformCell(kv));
          }
          this.countPerRow++;
          // For a compaction scan storeLimit is -1 and storeOffset is 0,
          // so this branch is never taken.
          if (storeLimit > -1 &&
              this.countPerRow > (storeLimit + storeOffset)) {
            // do what SEEK_NEXT_ROW does.
            if (!matcher.moreRowsMayExistAfter(kv)) {
              return false;
            }
            reseek(matcher.getKeyForNextRow(kv));
            break LOOP;
          }
          // Add the KV to the result list. storeLimit and storeOffset can be used to page
          // through the results of each store, provided there is only one cf and one KV.
          // add to results only if we have skipped #storeOffset kvs
          // also update metric accordingly
          if (this.countPerRow > storeOffset) {
            outResult.add(kv);
            count++;
          }
          if (qcode == ScanQueryMatcher.MatchCode.INCLUDE_AND_SEEK_NEXT_ROW) {
            // Check whether another row may follow, i.e. whether the current KV
            // has reached the stop row.
            if (!matcher.moreRowsMayExistAfter(kv)) {
              return false;
            }
            // Seek past the current row. The seek key is built from the KV's row key with
            // timestamp Long.MIN_VALUE and cf/column set to null, which makes it the
            // largest key of the row. For the ordering see KeyValue.KVComparator.
            reseek(matcher.getKeyForNextRow(kv));
          } else if (qcode == ScanQueryMatcher.MatchCode.INCLUDE_AND_SEEK_NEXT_COL) {
            // A compaction scan tracks no explicit columns, so there is no column hint and
            // the seek key is the last possible key of the current column. With a hint,
            // the seek goes to the first key of the next wanted column.
            // See ScanQueryMatcher.getKeyForNextColumn.
            reseek(matcher.getKeyForNextColumn(kv));
          } else {
            // Plain INCLUDE: just advance to the next KV.
            this.heap.next();
          }
          if (limit > 0 && (count == limit)) {
            // The limit has been reached: leave the while loop.
            break LOOP;
          }
          continue;
        case DONE:
          // The current row is finished.
          return true;
        case DONE_SCAN:
          // End this scan.
          close();
          return false;
        case SEEK_NEXT_ROW:
          // Compute the position just after the current row (larger than every KV of this
          // row, smaller than any KV of the next row) and move to the first KV beyond it via
          // reseek --> StoreFileScanner.reseek --> HFile.seekTo.
          // This is just a relatively simple end of scan fix, to short-cut end
          // us if there is an endKey in the scan.
          if (!matcher.moreRowsMayExistAfter(kv)) {
            return false;
          }
          reseek(matcher.getKeyForNextRow(kv));
          break;
        case SEEK_NEXT_COL:
          // Compute the key of the next column after the current KV and move to it.
          reseek(matcher.getKeyForNextColumn(kv));
          break;
        case SKIP:
          // Calls KeyValueHeap.next on the StoreScanner's heap.
          this.heap.next();
          break;
        case SEEK_NEXT_USING_HINT:
          // If the filter supplies a next-key hint, reseek to it;
          // otherwise call KeyValueHeap.next.
          // TODO convert resee to Cell?
          KeyValue nextKV = KeyValueUtil.ensureKeyValue(matcher.getNextKeyHint(kv));
          if (nextKV != null) {
            reseek(nextKV);
          } else {
            heap.next();
          }
          break;
        default:
          throw new RuntimeException("UNEXPECTED");
      }
    }
    if (count > 0) {
      return true;
    }
    // No more keys
    close();
    return false;
  } finally {
    lock.unlock();
  }
}
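To make the effect of the match codes more concrete, here is a heavily reduced sketch of two of the rules: delete markers are always kept (as with COMPACT_RETAIN_DELETES), and at most maxVersions put cells are kept per column. All names (VersionLimitSketch, Cell, Code) are hypothetical, and the sketch collapses the real SEEK_NEXT_COL and INCLUDE_AND_SEEK_NEXT_COL codes into a simple SKIP, so it shows the outcome rather than the seek optimisation.

```java
import java.util.ArrayList;
import java.util.List;

final class VersionLimitSketch {
    enum Code { INCLUDE, SKIP }

    // Hypothetical cell type for illustration; not HBase's KeyValue.
    static final class Cell {
        final String row;
        final String column;
        final long ts;
        final boolean deleteMarker;

        Cell(String row, String column, long ts, boolean deleteMarker) {
            this.row = row;
            this.column = column;
            this.ts = ts;
            this.deleteMarker = deleteMarker;
        }
    }

    // Decide what to do with one cell, given how many versions of its column were kept so far.
    static Code match(Cell c, int versionsKept, int maxVersions) {
        if (c.deleteMarker) {
            return Code.INCLUDE; // retain-deletes mode: markers are copied through
        }
        return versionsKept < maxVersions ? Code.INCLUDE : Code.SKIP;
    }

    // Input must be sorted by row, then column, then timestamp descending.
    static List<Cell> scan(List<Cell> sorted, int maxVersions) {
        List<Cell> out = new ArrayList<>();
        String curRow = null;
        String curCol = null;
        int versionsKept = 0;
        for (Cell c : sorted) {
            if (!c.row.equals(curRow) || !c.column.equals(curCol)) {
                // New column (or new row): restart the version count.
                curRow = c.row;
                curCol = c.column;
                versionsKept = 0;
            }
            if (match(c, versionsKept, maxVersions) == Code.INCLUDE) {
                out.add(c);
                if (!c.deleteMarker) {
                    versionsKept++;
                }
            }
        }
        return out;
    }
}
```

With maxVersions = 2, three versions of r1/c1 are reduced to the two newest, while a delete marker on r1/c2 and a single cell on r2/c1 pass through unchanged.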

The KeyValueHeap.next flow:

public KeyValue next() throws IOException {
  if(this.current == null) {
    return null;
  }
  // Take the current KV of the top StoreFileScanner and advance that scanner's
  // pointer to its next KV.
  KeyValue kvReturn = this.current.next();
  // Peek at the top scanner's new current KV (the KV that follows kvReturn).
  KeyValue kvNext = this.current.peek();
  // If the next KV is null the top scanner has reached the end of its file:
  // close it and recompute the top of the queue.
  if (kvNext == null) {
    this.current.close();
    this.current = pollRealKV();
  } else {
    // Recompute which scanner is the current top.
    KeyValueScanner topScanner = this.heap.peek();
    if (topScanner == null ||
        this.comparator.compare(kvNext, topScanner.peek()) >= 0) {
      this.heap.add(this.current);
      this.current = pollRealKV();
    }
  }
  return kvReturn;
}
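The merge that KeyValueHeap performs can be sketched with java.util.PriorityQueue. This is an illustration under simplifying assumptions, not the HBase code: MergeHeapSketch and FileScanner are hypothetical names, keys are Strings, and every scanner goes back into the queue after each step, whereas KeyValueHeap keeps the top scanner outside the queue as `current` to avoid needless re-insertions. The tie-break mirrors the rule described earlier: for equal keys, the scanner of the file with the larger sequence id comes first.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

final class MergeHeapSketch {
    // Hypothetical per-file scanner over already sorted keys.
    static final class FileScanner {
        final long seqId;
        private final String[] keys;
        private int pos = 0;

        FileScanner(long seqId, String... sortedKeys) {
            this.seqId = seqId;
            this.keys = sortedKeys;
        }

        String peek() {
            return pos < keys.length ? keys[pos] : null;
        }

        String next() {
            return pos < keys.length ? keys[pos++] : null;
        }
    }

    // Merge all scanners into one sorted stream; each entry is tagged with its source seqId.
    static List<String> mergeAll(List<FileScanner> scanners) {
        Comparator<FileScanner> byHeadKey = (a, b) -> {
            int c = a.peek().compareTo(b.peek());
            if (c != 0) {
                return c;
            }
            return Long.compare(b.seqId, a.seqId); // equal keys: newer file first
        };
        PriorityQueue<FileScanner> heap = new PriorityQueue<>(byHeadKey);
        for (FileScanner s : scanners) {
            if (s.peek() != null) {
                heap.add(s); // exhausted scanners never enter the queue
            }
        }
        List<String> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            FileScanner top = heap.poll();
            out.add(top.next() + "@" + top.seqId);
            if (top.peek() != null) {
                heap.add(top); // re-insert with its new head key
            }
        }
        return out;
    }
}
```

Merging a file with sequence id 1 holding keys a, c and a file with sequence id 2 holding keys a, b yields a@2, a@1, b@2, c@1.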

Writing the merged StoreFile during compaction

Back in DefaultCompactor.compact, the next step is performCompaction (defined in Compactor, the parent class of DefaultCompactor). As noted above, it reads KVs with StoreScanner.next(kvList, limit), where the limit comes from hbase.hstore.compaction.kv.max, appends them with HFileWriterV2.append, and checks at the interval set by hbase.hstore.close.check.interval that the store is still writable.

A row may contain several columns, so a single call to next can return several KVs. They are written to the new StoreFile by the following code:

do {
  // Fetch the next batch of KVs (at most compactionKVMax, all from one row).
  hasMore = scanner.next(kvs, compactionKVMax);
  // output to writer:
  for (Cell c : kvs) {
    KeyValue kv = KeyValueUtil.ensureKeyValue(c);
    if (kv.getMvccVersion() <= smallestReadPoint) {
      kv.setMvccVersion(0);
    }
    // Perform the write.
    writer.append(kv);
    ++progress.currentCompactedKVs;
    // ................................. (some code omitted here)
  }
  kvs.clear();
} while (hasMore);
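The shape of this loop, pulling bounded batches until the source reports no more data, can be reproduced in isolation. The sketch below uses made-up names (BatchedCopySketch, BatchSource, ListSource) and Strings in place of cells; the batch limit plays the role of hbase.hstore.compaction.kv.max, which bounds how many cells are held in memory at once.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

final class BatchedCopySketch {
    // Hypothetical stand-in for the scanner: fills out with up to limit items
    // and reports whether more items remain.
    interface BatchSource {
        boolean next(List<String> out, int limit);
    }

    static final class ListSource implements BatchSource {
        private final Iterator<String> it;

        ListSource(List<String> data) {
            this.it = data.iterator();
        }

        @Override
        public boolean next(List<String> out, int limit) {
            int n = 0;
            while (n < limit && it.hasNext()) {
                out.add(it.next());
                n++;
            }
            return it.hasNext();
        }
    }

    // Copy everything from source to sink in batches; returns the number of batches used.
    static int copy(BatchSource source, List<String> sink, int batchLimit) {
        List<String> batch = new ArrayList<>();
        boolean hasMore;
        int batches = 0;
        do {
            hasMore = source.next(batch, batchLimit);
            sink.addAll(batch); // stands in for writer.append on each cell
            batch.clear();      // reuse the buffer so memory stays bounded
            batches++;
        } while (hasMore);
        return batches;
    }
}
```

Copying 25 items with a batch limit of 10 takes three batches (10, 10 and 5 items).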

The KVs are appended to the new StoreFile through the writer instance, which is created by the following code in DefaultCompactor.compact:

writer = store.createWriterInTmp(fd.maxKeyCount, this.compactionCompression, true,
    fd.maxMVCCReadpoint >= smallestReadPoint);

HStore.createWriterInTmp --> StoreFile.WriterBuilder.build creates the StoreFile.Writer instance. The concrete writer it wraps is an HFileWriterV2.

The writer/reader version is set with hfile.format.version; in this version of HBase it can only be set to 2.

The StoreFile.Writer.append(kv) flow:

public void append(final KeyValue kv) throws IOException {
  // Write to the Bloom filter. If the KV has the same row (or row+column) as the previously
  // written KV nothing is added, so every Bloom filter entry is a distinct row or row+column.
  // The Bloom block size is set with io.storefile.bloom.block.size, default 128*1024.
  appendGeneralBloomfilter(kv);
  // If the KV is a delete, write its row to the delete-family Bloom filter block.
  // Several KVs of the same row are added only once. To be added, the KV's delete type
  // must satisfy kv.isDeleteFamily() or kv.isDeleteFamilyVersion().
  appendDeleteFamilyBloomFilter(kv);
  // Write the data to the HFileWriterV2 output and track the largest MVCC (memstoreTS)
  // value among all appended KVs.
  // HFile v2 record layout: klen(int) vlen(int) key value
  // Key layout: rowlen(short) row cflen(byte) cf column timestamp(long) type(byte)
  // Each append checks whether the block has reached its flush size. When the BLOCKSIZE
  // configured on the cf is reached (default 65536), finishBlock writes the block out
  // together with its Bloom filter data, and a new block is started.
  writer.append(kv);
  // Update the timestamp range covered by this StoreFile, i.e. its min/max timestamps.
  trackTimestamps(kv);
}
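The record layout mentioned in the comments, klen(int) vlen(int) key value with the key laid out as rowlen(short) row cflen(byte) cf column timestamp(long) type(byte), can be written out with a ByteBuffer. This is a minimal sketch of that byte layout only; KeyValueLayoutSketch is a hypothetical helper, strings are encoded as UTF-8 for convenience, and it ignores the MVCC value and any block encoding the real writer may apply.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

final class KeyValueLayoutSketch {
    // Serialize one cell as: klen(int) vlen(int) key value, where
    // key = rowlen(short) row cflen(byte) cf column timestamp(long) type(byte).
    static byte[] encode(String row, String family, String column,
                         long timestamp, byte type, byte[] value) {
        byte[] r = row.getBytes(StandardCharsets.UTF_8);
        byte[] f = family.getBytes(StandardCharsets.UTF_8);
        byte[] q = column.getBytes(StandardCharsets.UTF_8);

        int keyLen = 2 + r.length + 1 + f.length + q.length + 8 + 1;
        ByteBuffer buf = ByteBuffer.allocate(4 + 4 + keyLen + value.length);

        buf.putInt(keyLen);           // klen
        buf.putInt(value.length);     // vlen
        buf.putShort((short) r.length);
        buf.put(r);
        buf.put((byte) f.length);
        buf.put(f);
        buf.put(q);                   // column length is implied by klen
        buf.putLong(timestamp);
        buf.put(type);
        buf.put(value);
        return buf.array();
    }
}
```

For row "r1", family "cf", column "q" and a one-byte value, the key is 17 bytes long and the whole record is 26 bytes.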

Once all data has been read and written, control returns to DefaultCompactor.compact, which closes the writer:

if (writer != null) {
  writer.appendMetadata(fd.maxSeqId, request.isMajor());
  writer.close();
  newFiles.add(writer.getPath());
}

appendMetadata, a method of StoreFile.Writer, adds the StoreFile's maximum sequence id to the file info:

public void appendMetadata(final long maxSequenceId, final boolean majorCompaction)
    throws IOException {
  writer.appendFileInfo(MAX_SEQ_ID_KEY, Bytes.toBytes(maxSequenceId));
  // Record whether this file was produced by a major compaction.
  writer.appendFileInfo(MAJOR_COMPACTION_KEY,
      Bytes.toBytes(majorCompaction));
  appendTrackedTimestampsToMetadata();
}

public void appendTrackedTimestampsToMetadata() throws IOException {
  appendFileInfo(TIMERANGE_KEY, WritableUtils.toByteArray(timeRangeTracker));
  appendFileInfo(EARLIEST_PUT_TS, Bytes.toBytes(earliestPutTs));
}

public void close() throws IOException {
  // The next two lines add the Bloom filter information to the file info;
  // see the two methods below, which are not discussed further.
  boolean hasGeneralBloom = this.closeGeneralBloomFilter();
  boolean hasDeleteFamilyBloom = this.closeDeleteFamilyBloomFilter();
  writer.close();
  // Log final Bloom filter statistics. This needs to be done after close()
  // because compound Bloom filters might be finalized as part of closing.
  if (StoreFile.LOG.isTraceEnabled()) {
    StoreFile.LOG.trace((hasGeneralBloom ? "" : "NO ") + "General Bloom and " +
        (hasDeleteFamilyBloom ? "" : "NO ") + "DeleteFamily" + " was added to HFile " +
        getPath());
  }
}

private boolean closeGeneralBloomFilter() throws IOException {
  boolean hasGeneralBloom = closeBloomFilter(generalBloomFilterWriter);
  // add the general Bloom filter writer and append file info
  if (hasGeneralBloom) {
    writer.addGeneralBloomFilter(generalBloomFilterWriter);
    writer.appendFileInfo(BLOOM_FILTER_TYPE_KEY,
        Bytes.toBytes(bloomType.toString()));
    if (lastBloomKey != null) {
      writer.appendFileInfo(LAST_BLOOM_KEY, Arrays.copyOfRange(
          lastBloomKey, lastBloomKeyOffset, lastBloomKeyOffset
              + lastBloomKeyLen));
    }
  }
  return hasGeneralBloom;
}

private boolean closeDeleteFamilyBloomFilter() throws IOException {
  boolean hasDeleteFamilyBloom = closeBloomFilter(deleteFamilyBloomFilterWriter);
  // add the delete family Bloom filter writer
  if (hasDeleteFamilyBloom) {
    writer.addDeleteFamilyBloomFilter(deleteFamilyBloomFilterWriter);
  }
  // append file info about the number of delete family kvs
  // even if there is no delete family Bloom.
  writer.appendFileInfo(DELETE_FAMILY_COUNT,
      Bytes.toBytes(this.deleteFamilyCnt));
  return hasDeleteFamilyBloom;
}

The HFileWriterV2.close() flow:

It writes the user data, the Bloom filter data and the data block index, updates and writes the file info, and finally writes the FixedFileTrailer at the end of the file.


2014-04-23 14:00