Cubert: A Fast and Efficient Batch Computation Engine

Posted by jopen 10 years ago | 14K views | Cubert

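Cubert is a fast and efficient batch computation engine for complex analysis and reporting on large-scale datasets in Hadoop.
Cubert is a good fit for the following application areas: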

  1. Statistical Calculations, Joins and Aggregations

    Cubert introduces a new model of computation that allows users to organize data in a format that is ideally suited for scalable execution of subsequent query processing operators, and a set of algorithmically-efficient operators (MeshJoin and CUBE) that exploit the organization to provide significantly improved CPU and resource utilization compared to existing solutions.

  2. Cubes and Grouping Set Aggregations

    The workhorse is the new CUBE operator, which computes additive statistics, non-additive statistics (e.g. Count Distinct) and exact percentile ranks (e.g. Median) efficiently in both CPU and memory. It can also roll up inner dimensions on the fly and compute multiple measures within a single job.

  3. Time Range Calculations and Incremental Computations

    Cubert primitives are especially suited to reporting workflows whose computation pattern is both regular and repetitive, which allows efficiency gains from partial result caching and incremental processing.

  4. Graph Computations

    Cubert provides a novel sparse matrix multiplication algorithm that is best suited for analytics with large-scale graphs.

  5. When Performance or Resources Are a Matter of Concern

    Cubert Script is a developer-friendly language that takes the hints, guesswork and surprises out of running a script. It gives developers complete control over the execution plan (without resorting to low-level programming!), and it is highly extensible: new functions, aggregators and even operators can be added.
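
The remark in item 2 about non-additive measures can be illustrated outside Cubert. What follows is a minimal Python sketch, not Cubert Script and not Cubert's CUBE algorithm: it simply computes a Count Distinct for every grouping set of two dimensions. The function name `cube_count_distinct`, the sample rows and the column names are all assumptions made up for illustration.

```python
from collections import defaultdict
from itertools import combinations


def cube_count_distinct(rows, dims, measure):
    """Count distinct `measure` values for every grouping set (subset of dims).

    Hypothetical helper for illustration only; it keeps one set of seen
    values per group, which is the naive in-memory approach.
    """
    seen = defaultdict(set)
    for row in rows:
        # Enumerate every subset of the dimensions, from () up to all dims.
        for size in range(len(dims) + 1):
            for group in combinations(dims, size):
                key = (group, tuple(row[d] for d in group))
                seen[key].add(row[measure])
    return {key: len(values) for key, values in seen.items()}


# Made-up sample data: one row per (country, device, user) event.
rows = [
    {"country": "US", "device": "mobile", "user": "a"},
    {"country": "US", "device": "desktop", "user": "a"},
    {"country": "US", "device": "mobile", "user": "b"},
    {"country": "CA", "device": "mobile", "user": "c"},
]

result = cube_count_distinct(rows, ("country", "device"), "user")

us_total = result[(("country",), ("US",))]
us_mobile = result[(("country", "device"), ("US", "mobile"))]
us_desktop = result[(("country", "device"), ("US", "desktop"))]
```

In this sample, user `a` appears on both devices, so the US total (2 distinct users) is not the sum of the US mobile and US desktop counts (2 + 1). That is why Count Distinct is called non-additive: a rolled-up value cannot be derived by adding the finer-grained results, and an operator has to handle it differently from a plain sum or count.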

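The idea behind item 3, partial result caching for a regular and repetitive report, can be sketched in the same spirit. The Python below is an assumption-laden illustration and does not reflect how Cubert implements incremental processing: the class `IncrementalWindow` and its methods are hypothetical names. Each day's events are aggregated once and cached, and a rolling-window report only merges the cached partials.

```python
class IncrementalWindow:
    """Rolling-window event counts built from cached per-day partials.

    Hypothetical class for illustration; days are plain integers.
    """

    def __init__(self, window_days):
        self.window_days = window_days
        self.daily_partials = {}  # day -> {key: count}

    def add_day(self, day, events):
        # Aggregate only the new day's raw events and cache the result.
        partial = {}
        for key in events:
            partial[key] = partial.get(key, 0) + 1
        self.daily_partials[day] = partial

    def report(self, end_day):
        # Merge cached partials for the window; raw events are not rescanned.
        totals = {}
        first_day = end_day - self.window_days + 1
        for day in range(first_day, end_day + 1):
            for key, count in self.daily_partials.get(day, {}).items():
                totals[key] = totals.get(key, 0) + count
        return totals


window = IncrementalWindow(window_days=2)
window.add_day(1, ["x", "y", "x"])
window.add_day(2, ["x"])
report_day2 = window.report(2)  # covers days 1 and 2

window.add_day(3, ["y", "y"])
report_day3 = window.report(3)  # covers days 2 and 3
```

Producing the day-3 report only required processing day 3's events; day 2's contribution came from its cached partial. This works directly for additive measures such as counts and sums. Non-additive measures would need a richer partial than a single number.
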
Project homepage: http://www.baiduhome.net/lib/view/home/1416195502539
