HBase Compaction Strategy

Posted by jopen 11 years ago | 28K views | HBase, NoSQL databases

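A RegionServer is an LSM-like storage engine, so it must compact continuously to reduce the number of data files on disk and to delete obsolete data, which keeps read performance stable.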

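A RegionServer runs a group of background threads (MemStoreFlusher) that flush regions. Each thread takes one flush-region request at a time from flushQueue and checks whether any store in that region holds more storefiles than hbase.hstore.blockingStoreFiles (default 7). If so, the storefile count has reached the point where it hurts read performance, and the flusher then checks whether this flush request has already been waiting for blockingWaitTime (hbase.hstore.blockingWaitTime, default 90s). If it has, the region is flushed immediately to prevent an OOM. If it has not, the flusher first checks whether the region needs to split; if not, it asks the background CompactSplitThread for a compaction (which shows that split has higher priority than compaction). It then puts the flush request back into flushQueue so that the flush happens later.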
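The decision described above can be sketched in a few lines of plain Java. This is only an illustration of the control flow, not HBase source: the class, enum, and method names are hypothetical, and the exact comparison operators (strictly greater than the file limit, waited at least the blocking time) are assumptions based on the wording of the paragraph.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only, not HBase source. All names here are hypothetical.
final class FlushDecisionSketch {
    enum Action { FLUSH_NOW, REQUEST_SPLIT_THEN_REQUEUE, REQUEST_COMPACTION_THEN_REQUEUE }

    static final int BLOCKING_STORE_FILES = 7;     // hbase.hstore.blockingStoreFiles (default quoted in the text)
    static final long BLOCKING_WAIT_MS = 90_000L;  // hbase.hstore.blockingWaitTime (90s)

    // storeFileCounts holds the number of storefiles in each store of the region.
    static Action decide(List<Integer> storeFileCounts, long waitedMs, boolean needsSplit) {
        boolean tooManyFiles = false;
        for (int count : storeFileCounts) {
            if (count > BLOCKING_STORE_FILES) {  // assumption: "exceeds" means strictly greater
                tooManyFiles = true;
                break;
            }
        }
        // No store is over the limit: nothing blocks the flush.
        if (!tooManyFiles) return Action.FLUSH_NOW;
        // The request has waited long enough: flush anyway to avoid running out of memory.
        if (waitedMs >= BLOCKING_WAIT_MS) return Action.FLUSH_NOW;
        // Otherwise split takes priority over compaction, and the flush is re-queued.
        return needsSplit ? Action.REQUEST_SPLIT_THEN_REQUEUE : Action.REQUEST_COMPACTION_THEN_REQUEUE;
    }

    public static void main(String[] args) {
        System.out.println(decide(Arrays.asList(3, 8), 1_000L, false));   // REQUEST_COMPACTION_THEN_REQUEUE
        System.out.println(decide(Arrays.asList(3, 8), 95_000L, false));  // FLUSH_NOW
    }
}
```

The point of the sketch is the ordering of the checks: the wait-time check comes before the split and compaction requests, so a flush is never postponed indefinitely.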

The focus here is compaction.

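Compaction works per store: CompactSplitThread creates one CompactionRequest for each store of the region.
Depending on the total size of the storefiles it covers, a compaction is either a large compaction or a small compaction, and the two kinds are handled by two separate thread pools.
The system assumes that most compactions are small, so the system compaction described above, which is triggered automatically when there are too many storefiles, goes into the small compaction pool by default.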

// For an automatically triggered system compaction, selectNow is false; the actual selection
// of the files to compact is deferred to CompactionRunner.
if (selectNow) {
    // For a major compaction triggered from the hbase shell, selectNow is true; the files to compact are selected here.
    compaction = selectCompaction(r, s, priority, request);
    if (compaction == null) return null; // message logged inside
}
// We assume that most compactions are small. So, put system compactions
// into small pool; we will do selection there, and move to large pool if necessary.

long size = selectNow ? compaction.getRequest().getSize() : 0;

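// This shows that a user-triggered compaction goes into the small compaction pool by default, and
// a system compaction is also queued in the small compaction pool. When the system compaction
// actually runs, the total size of the selected storefiles decides whether it ends up in the large or the small pool.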
ThreadPoolExecutor pool = (!selectNow && s.throttleCompaction(size))? largeCompactions : smallCompactions;
pool.execute(new CompactionRunner(s, r, compaction, pool));

Now look at the CompactionRunner task, which carries out the compaction.

 // Common case - system compaction without a file selection. Select now.
 // A system compaction has not selected its files yet, so compaction is null.
 if (this.compaction == null) {
   int oldPriority = this.queuedPriority;
   this.queuedPriority = this.store.getCompactPriority();
   if (this.queuedPriority > oldPriority) {
     // Store priority decreased while we were in queue (due to some other compaction?),
     // requeue with new priority to avoid blocking potential higher priorities.
     this.parent.execute(this);
     return;
   }
   try {
     // Select the storefiles.
     this.compaction = selectCompaction(this.region, this.store, queuedPriority, null);
   } catch (IOException ex) {
     LOG.error("Compaction selection failed " + this, ex);
     server.checkFileSystem();
     return;
   }
   if (this.compaction == null) return; // nothing to do
   // Now see if we are in correct pool for the size; if not, go to the correct one.
   // We might end up waiting for a while, so cancel the selection.
   assert this.compaction.hasSelection();
   // Decide whether this compaction runs in the small or the large pool.
   ThreadPoolExecutor pool = store.throttleCompaction(
       compaction.getRequest().getSize()) ? largeCompactions : smallCompactions;
   // This runner sits in the wrong pool: the system compaction belongs in the large pool.
   if (this.parent != pool) {
     this.store.cancelRequestedCompaction(this.compaction);
     this.compaction = null;
     this.parent = pool;
     // Re-queue the runner so that it executes in the large pool.
     this.parent.execute(this);
     return;
   }
 }

The boundary between large and small compactions is set by the
hbase.regionserver.thread.compaction.throttle parameter. If it is not set,
it defaults to 2 * hbase.hstore.compaction.max * hbase.hregion.memstore.flush.size,
which with all defaults is 2 * 10 * 128MB = 2560MB = 2.5GB.
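The arithmetic behind that default can be checked with a short sketch. The class and method names below are hypothetical, and treating a compaction as large only when its size is strictly greater than the threshold is an assumption about how the comparison is made.

```java
// Illustrative arithmetic only; the names are hypothetical, not HBase API.
final class ThrottleThresholdSketch {
    static final long MB = 1024L * 1024L;

    // Default threshold: 2 * hbase.hstore.compaction.max * hbase.hregion.memstore.flush.size
    static long defaultThrottleBytes(int maxFilesToCompact, long memstoreFlushSizeBytes) {
        return 2L * maxFilesToCompact * memstoreFlushSizeBytes;
    }

    // Assumption: a compaction counts as large when its total size exceeds the threshold.
    static boolean isLarge(long compactionSizeBytes, long throttleBytes) {
        return compactionSizeBytes > throttleBytes;
    }

    public static void main(String[] args) {
        long throttle = defaultThrottleBytes(10, 128 * MB);
        System.out.println(throttle / MB + " MB");           // 2560 MB, i.e. 2.5 GB
        System.out.println(isLarge(3000 * MB, throttle));    // true
        System.out.println(isLarge(500 * MB, throttle));     // false
    }
}
```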

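As shown above, a system compaction goes into the small pool by default; once the storefile list
has been selected, its size determines whether the compaction finally runs in the small or the large pool.
A user-triggered compaction runs in the small pool.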
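The two routing decisions can be modelled as follows. This is a sketch with hypothetical names: initialPool mirrors the pool expression quoted from the request path, where throttleCompaction is approximated as a strict size comparison, and poolAfterSelection mirrors the re-check that CompactionRunner performs for a system compaction.

```java
// Illustrative model of the pool routing; names are hypothetical, not HBase API.
final class PoolRoutingSketch {
    enum Pool { SMALL, LARGE }

    // Mirrors the quoted expression: (!selectNow && throttle(size)) ? large : small,
    // where size is 0 whenever selectNow is false.
    static Pool initialPool(boolean selectNow, long selectedSize, long throttle) {
        long size = selectNow ? selectedSize : 0L;
        return (!selectNow && size > throttle) ? Pool.LARGE : Pool.SMALL;
    }

    // Mirrors the check in the runner after a system compaction has selected its files.
    static Pool poolAfterSelection(long selectedSize, long throttle) {
        return selectedSize > throttle ? Pool.LARGE : Pool.SMALL;
    }

    public static void main(String[] args) {
        long throttle = 2560L;  // threshold in MB, for readability
        System.out.println(initialPool(true, 5000L, throttle));    // SMALL: user compaction
        System.out.println(initialPool(false, 5000L, throttle));   // SMALL: size is not known yet
        System.out.println(poolAfterSelection(5000L, throttle));   // LARGE: re-queued after selection
    }
}
```

With the expression as quoted, initialPool returns SMALL for every input, which matches the observation in the text that both kinds of compaction start in the small pool.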

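With the pool chosen, here are the concrete steps of a compaction for each store.

There are two steps:

- Build the set of storefiles to compact according to some policy
- Run the compaction

Both steps are completed inside the CompactionRunner runnable task.
This article covers mainly the first step; its entry point is HStore::requestCompaction.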

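First, the store creates the CompactionContext that belongs to its storeEngine. The context holds all compaction-related information;
the most important piece is the CompactionRequest, which is the input to the second step above. HBase 0.98 has two main store engines, DefaultStoreEngine and StripeStoreEngine. A store engine manages the storefiles on disk and flushes the in-memory snapshot memstore to disk. StripeStoreEngine is special
in that flushing one snapshot memstore may produce more than one storefile; it is not discussed here. Most people use the default store engine.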

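Next, once the context exists, the compactionPolicy's selectCompaction() is called with all storefiles of the store as candidates. The compaction policy can be set through the
configuration key hbase.hstore.defaultengine.compactionpolicy.class; the default is
ExploringCompactionPolicy.class.
selectCompaction() goes through the following main steps: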

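- From the store's storefiles, filter out every storefile that is older than the newest storefile in the list currently being compacted. The input storefiles are sorted by the following rule:

      public static final Comparator<StoreFile> SEQ_ID =
        Ordering.compound(ImmutableList.of(
          Ordering.natural().onResultOf(new GetSeqId()),
          Ordering.natural().onResultOf(new GetFileSize()).reverse(),
          Ordering.natural().onResultOf(new GetBulkTime()),
          Ordering.natural().onResultOf(new GetPathName())
        ));

  The seq id is read from sequenceId, the region-wide monotonically increasing counter, when the snapshot memstore behind the storefile is flushed, so a storefile with a larger seq id is newer. The storefile produced by compacting several files gets the largest seq id among its inputs.
- If this is not a major compaction, check whether deletion of TTL-expired storefiles is enabled and the TTL is a finite value. If so, and some storefiles have in fact expired, the selection consists of exactly those expired storefiles, selection ends, and the CompactionRequest is created. Otherwise, filter out storefiles larger than hbase.hstore.compaction.max.size (default Long.MAX_VALUE). (In fact, this part of the implementation has a problem.)
- Decide, based on several rules and parameters, whether to upgrade to a major compaction. The rules are tedious, so here is the code: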
```
 // Force a major compaction if this is a user-requested major compaction,
 // or if we do not have too many files to compact and this was requested
 // as a major compaction.
 // Or, if there are any references among the candidates.
 boolean majorCompaction = (
   (forceMajor && isUserCompaction)
   || ((forceMajor || isMajorCompaction(candidateSelection))
       && (candidateSelection.size() < comConf.getMaxFilesToCompact()))
   || StoreUtils.hasReferences(candidateSelection)
   );
```
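- If not, this is a minor compaction, which does the following:
  - Filter out bulk-loaded storefiles
  - Apply the applyCompactionPolicy method overridden by ExploringCompactionPolicy. The idea is to enumerate every run of n contiguous files, where n lies in [hbase.hstore.compaction.min (default 3), hbase.hstore.compaction.max (default 10)]; this bounds the number of files in a minor compaction. The total size of the n contiguous files must not exceed hbase.hstore.compaction.max.size (default MAX_VALUE), and the "variance" among the sizes of the n files must not be too large. The run finally chosen is the one that removes the most files while having a small
    total size (which costs less disk IO).
- Check whether more than hbase.hstore.compaction.max storefiles were selected. If so, and
  this is only a minor compaction, drop the excess storefiles from the tail of the set. If so but this is a user-initiated major compaction, drop nothing. At this point the set of storefiles to compact is complete.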
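The window enumeration in the minor-compaction step can be sketched as below. This is a simplified illustration, not the ExploringCompactionPolicy source: file sizes stand in for storefiles, all names are hypothetical, and the "variance" condition is assumed to be a ratio check in which no file may be larger than ratio times the combined size of the other files in the window (1.2 is used in the example). Other details of the real policy, such as exemptions for very small files, are left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Simplified sketch of the exploring selection; names are hypothetical, not HBase API.
final class ExploringSelectionSketch {
    // Assumed form of the "variance" check: no file dominates the rest of the window.
    static boolean filesInRatio(List<Long> files, double ratio) {
        if (files.size() < 2) return true;
        long total = 0;
        for (long size : files) total += size;
        for (long size : files) {
            if (size > (total - size) * ratio) return false;
        }
        return true;
    }

    // sizes is ordered oldest to newest; returns the chosen contiguous run (empty if none qualifies).
    static List<Long> select(List<Long> sizes, int minFiles, int maxFiles,
                             long maxTotalSize, double ratio) {
        List<Long> best = new ArrayList<>();
        long bestSize = 0;
        for (int start = 0; start < sizes.size(); start++) {
            for (int end = start + minFiles;
                 end <= sizes.size() && end - start <= maxFiles; end++) {
                List<Long> window = sizes.subList(start, end);
                long total = 0;
                for (long size : window) total += size;
                if (total > maxTotalSize) continue;
                if (!filesInRatio(window, ratio)) continue;
                // Prefer more files; break ties by the smaller total size.
                boolean better = window.size() > best.size()
                        || (window.size() == best.size() && total < bestSize);
                if (better) {
                    best = new ArrayList<>(window);
                    bestSize = total;
                }
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<Long> sizes = Arrays.asList(1000L, 100L, 12L, 11L, 10L);
        System.out.println(select(sizes, 3, 10, Long.MAX_VALUE, 1.2)); // [12, 11, 10]
    }
}
```

In the example, every window that contains the 1000 or the 100 file fails the ratio check, so only the three small files at the tail are picked. That is the intended effect: large old files are left alone while many small files are merged cheaply.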

This ends the first step: the target storefiles for the compaction have been selected.
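For readers without Guava at hand, the SEQ_ID ordering quoted in the first selection step can be approximated with java.util.Comparator. The FileInfo class below is a hypothetical stand-in for StoreFile that carries only the four fields the comparator reads; compactedSeqId illustrates the rule that a compacted file takes the largest seq id of its inputs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Illustrative stand-in for the SEQ_ID ordering; FileInfo is hypothetical, not an HBase class.
final class SeqIdOrderSketch {
    static final class FileInfo {
        final long seqId;
        final long size;
        final long bulkTime;
        final String path;

        FileInfo(long seqId, long size, long bulkTime, String path) {
            this.seqId = seqId;
            this.size = size;
            this.bulkTime = bulkTime;
            this.path = path;
        }
    }

    // Order: seq id ascending, then size descending, then bulk load time, then path name.
    static final Comparator<FileInfo> SEQ_ID = Comparator
            .comparingLong((FileInfo f) -> f.seqId)
            .thenComparing(Comparator.comparingLong((FileInfo f) -> f.size).reversed())
            .thenComparingLong((FileInfo f) -> f.bulkTime)
            .thenComparing((FileInfo f) -> f.path);

    // The output of a compaction takes the largest seq id among its input files.
    static long compactedSeqId(List<FileInfo> inputs) {
        long max = Long.MIN_VALUE;
        for (FileInfo f : inputs) max = Math.max(max, f.seqId);
        return max;
    }

    public static void main(String[] args) {
        List<FileInfo> files = new ArrayList<>(Arrays.asList(
                new FileInfo(5, 10, 0, "a"),
                new FileInfo(3, 20, 0, "b"),
                new FileInfo(5, 30, 0, "c")));
        files.sort(SEQ_ID);
        for (FileInfo f : files) System.out.print(f.path + " ");  // b c a
        System.out.println();
        System.out.println(compactedSeqId(files));                // 5
    }
}
```

Sorting oldest first is what lets the selection step drop everything up to the newest file that is already being compacted, and it keeps the candidate list contiguous in age.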

References

https://github.com/apache/hbase/tree/0.98
Source: http://www.cnblogs.com/foxmailed/p/3981940.html
