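Apache Flink: An Efficient, Distributed, General-Purpose Data Processing Platform
Apache Flink is an efficient, distributed, general-purpose data processing platform.
Apache Flink is an open-source system for declarative data analysis. It combines the efficiency, flexible programming model, and scalability of distributed MapReduce-like platforms with the query optimization techniques found in parallel databases.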
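DataSet<String> input = env.readTextFile(inputPath);

input.flatMap(new FlatMapFunction<String, Tuple2<String, Long>>() {
        @Override
        public void flatMap(String value, Collector<Tuple2<String, Long>> out) {
            // Emit a (word, 1) pair for every space-separated token.
            for (String s : value.split(" ")) {
                out.collect(new Tuple2<String, Long>(s, 1L));
            }
        }
    })
    .groupBy(0)    // group by the word (tuple field 0)
    .sum(1)        // sum the counts (tuple field 1)
    .writeAsText(outputPath);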
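To make the shape of that dataflow concrete, here is a minimal sketch of the same flatMap, group-by-word, and sum steps using only the Java standard library. It does not use Flink at all; the class name WordCountSketch and the method countWords are hypothetical names chosen for illustration, and the sketch runs on a single machine over an in-memory list rather than a file.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Plain-Java illustration only; WordCountSketch is a hypothetical name, not part of Flink.
final class WordCountSketch {

    // Mirrors flatMap -> groupBy(0) -> sum(1): split lines into words, group by word, count.
    static Map<String, Long> countWords(List<String> lines) {
        return lines.stream()
                // flatMap step: one line becomes many words
                .flatMap(line -> Arrays.stream(line.split(" ")))
                // groupBy step keyed on the word; counting() plays the role of summing the 1L values
                .collect(Collectors.groupingBy(
                        word -> word,
                        TreeMap::new,
                        Collectors.counting()));
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("to be or not to be", "be quick");
        // prints {be=3, not=1, or=1, quick=1, to=2}
        System.out.println(countWords(lines));
    }
}
```

The TreeMap is used only so that the printed result has a stable, sorted order; the Flink program above writes its result to outputPath instead.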
System Stack
The Apache Flink stack consists of:
- Programming APIs for different languages (Java, Scala) and paradigms (record-oriented, graph-oriented).
- A program optimizer that decides how to execute the program for good performance. Among other things, it chooses data movement and caching strategies.
- A distributed runtime that executes programs in parallel across many machines.
Flink runs independently from Hadoop, but integrates seamlessly with YARN (Hadoop's next-generation scheduler). Various file systems (including the Hadoop Distributed File System) can act as data sources.
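As a rough illustration of what "data movement" means for a grouped computation running in parallel, the toy model below routes each word to a partition by the hash of its key, counts every partition on its own thread, and merges the partial results. This is a sketch under stated assumptions, not Flink's runtime or optimizer: threads stand in for machines, and the names PartitionedCountSketch, partitionByKey, and parallelCount are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Toy model only; not Flink's runtime. All names here are hypothetical.
final class PartitionedCountSketch {

    // Route each word to a partition by key hash, so equal keys land on the same worker.
    static List<List<String>> partitionByKey(List<String> words, int parallelism) {
        List<List<String>> partitions = new ArrayList<>();
        for (int i = 0; i < parallelism; i++) {
            partitions.add(new ArrayList<>());
        }
        for (String word : words) {
            int target = Math.floorMod(word.hashCode(), parallelism);
            partitions.get(target).add(word);
        }
        return partitions;
    }

    // Count each partition on its own thread, then merge the partial results.
    static Map<String, Long> parallelCount(List<String> words, int parallelism) {
        ExecutorService pool = Executors.newFixedThreadPool(parallelism);
        try {
            List<Future<Map<String, Long>>> partials = new ArrayList<>();
            for (List<String> partition : partitionByKey(words, parallelism)) {
                partials.add(pool.submit(() -> {
                    Map<String, Long> local = new HashMap<>();
                    for (String word : partition) {
                        local.merge(word, 1L, Long::sum);
                    }
                    return local;
                }));
            }
            Map<String, Long> result = new TreeMap<>();
            for (Future<Map<String, Long>> partial : partials) {
                // Keys never overlap across partitions, so a plain putAll is enough.
                result.putAll(partial.get());
            }
            return result;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("parallel count failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("to", "be", "or", "not", "to", "be");
        // prints {be=2, not=1, or=1, to=2}
        System.out.println(parallelCount(words, 3));
    }
}
```

Partitioning by key before counting is one possible strategy; a real optimizer may pick a different one, for example pre-aggregating locally before moving any data.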
