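A Fast, Efficient Batch Computation Engine: Cubert
Cubert is a fast, efficient batch computation engine for complex analysis and reporting on massive datasets in Hadoop.
Cubert is ideally suited to the following application domains: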
- Statistical Calculations, Joins and Aggregations
Cubert introduces a new model of computation that lets users organize data in a format ideally suited to scalable execution of subsequent query-processing operators, together with a set of algorithmically efficient operators (MeshJoin and CUBE) that exploit this organization to deliver significantly better CPU and resource utilization than existing solutions.
- Cubes and Grouping Set Aggregations
The workhorse is the new CUBE operator, which can efficiently (in both CPU and memory) compute additive, non-additive (e.g., Count Distinct), and exact percentile-rank (e.g., Median) statistics; it can also roll up inner dimensions on the fly and compute multiple measures within a single job.
- Time-Range and Incremental Computations
Cubert primitives are especially well suited to reporting workflows whose computation pattern is both regular and repetitive, which allows efficiency gains from partial-result caching and incremental processing.
- Graph Computations
Cubert provides a novel sparse matrix multiplication algorithm that is well suited to analytics on large-scale graphs.
- When Performance or Resources Are a Concern
Cubert Script is a developer-friendly language that takes the hints, guesswork, and surprises out of running scripts. It gives developers complete control over the execution plan (without resorting to low-level programming!) and is highly extensible: new functions, aggregators, and even operators can be added.
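The first item says Cubert's operators exploit a data organization chosen ahead of time. The sketch below is not MeshJoin, whose internals are not described here; it is a plain merge join, assuming both inputs are already sorted by the join key, to illustrate why such an organization pays off: each input is streamed once, with no hash table and no re-sort. The record shape `(join_key, payload)` and the function name `merge_join` are hypothetical.

```python
from typing import Iterator, List, Tuple

Row = Tuple[int, str]  # hypothetical record shape: (join_key, payload)


def merge_join(left: List[Row], right: List[Row]) -> Iterator[Tuple[int, str, str]]:
    """Inner-join two lists that are both sorted by key, in one forward pass each."""
    i, j = 0, 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][0], right[j][0]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of equal keys on each side, then emit their cross product.
            i_end = i
            while i_end < len(left) and left[i_end][0] == lk:
                i_end += 1
            j_end = j
            while j_end < len(right) and right[j_end][0] == rk:
                j_end += 1
            for _, lv in left[i:i_end]:
                for _, rv in right[j:j_end]:
                    yield (lk, lv, rv)
            i, j = i_end, j_end


if __name__ == "__main__":
    members = [(1, "alice"), (2, "bob"), (4, "dana")]
    clicks = [(1, "home"), (1, "jobs"), (3, "feed"), (4, "inbox")]
    for row in merge_join(members, clicks):
        print(row)
```

Because both inputs arrive sorted, the join needs only constant extra memory per key run; the cost of sorting is paid once when the data is organized, not on every query.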
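The time-range item relies on partial-result caching. A minimal sketch of the idea, assuming an additive daily measure and a hypothetical per-day job `compute_day`, is a cache of per-day partials: a sliding report window then only computes the days it has not seen before. The class name `IncrementalRangeSum` is illustrative, not part of Cubert.

```python
from datetime import date, timedelta
from typing import Callable, Dict


class IncrementalRangeSum:
    """Sum an additive daily measure over a date range, caching per-day partials."""

    def __init__(self, compute_day: Callable[[date], int]):
        self._compute_day = compute_day  # hypothetical expensive per-day job
        self._cache: Dict[date, int] = {}
        self.days_computed = 0  # counts cache misses, for illustration

    def range_total(self, start: date, end: date) -> int:
        """Return the sum over the inclusive range [start, end]."""
        total = 0
        day = start
        while day <= end:
            if day not in self._cache:
                # Only days missing from the cache trigger the expensive job.
                self._cache[day] = self._compute_day(day)
                self.days_computed += 1
            total += self._cache[day]
            day += timedelta(days=1)
        return total


if __name__ == "__main__":
    agg = IncrementalRangeSum(lambda d: d.day)  # stand-in measure for the demo
    print(agg.range_total(date(2014, 6, 1), date(2014, 6, 7)), agg.days_computed)
    print(agg.range_total(date(2014, 6, 2), date(2014, 6, 8)), agg.days_computed)
```

Shifting a 7-day window by one day triggers a single new daily computation instead of seven. This shortcut is only valid for additive measures; a distinct count over the window cannot be assembled from daily distinct counts alone.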
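For the graph item, the link between graphs and sparse matrix multiplication can be illustrated with a textbook row-wise sparse product; this is not Cubert's novel algorithm, which is not described here. Storing an adjacency matrix as a dictionary of rows (a hypothetical representation) and squaring it counts the 2-hop paths between every pair of nodes, while touching only nonzero entries.

```python
from collections import defaultdict
from typing import Dict

SparseMatrix = Dict[int, Dict[int, int]]  # row -> {col: value}, zeros omitted


def spmm(a: SparseMatrix, b: SparseMatrix) -> SparseMatrix:
    """Row-wise sparse matrix product C = A * B, touching only nonzero entries."""
    c: SparseMatrix = {}
    for i, row in a.items():
        acc: Dict[int, int] = defaultdict(int)
        for k, a_ik in row.items():
            # Row k of B contributes only if A[i][k] is nonzero.
            for j, b_kj in b.get(k, {}).items():
                acc[j] += a_ik * b_kj
        nonzero = {j: v for j, v in acc.items() if v != 0}
        if nonzero:
            c[i] = nonzero
    return c


if __name__ == "__main__":
    # Directed edges: 0->1, 0->2, 1->3, 2->3, 3->0
    adj = {0: {1: 1, 2: 1}, 1: {3: 1}, 2: {3: 1}, 3: {0: 1}}
    print(spmm(adj, adj))  # entry [i][j] = number of 2-hop paths from i to j
```

The work is proportional to the number of nonzero products rather than to the square of the node count, which is what makes this formulation attractive for large, sparse graphs.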