Apache Spark 1.4.1 released: an open source cluster computing system

Posted by jopen 9 years ago | 20K views | Apache Spark

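Apache Spark 1.4.1 has been released. This is a maintenance release based on the Spark 1.4 branch, and it includes stability fixes across the DataFrame API, Spark Streaming, PySpark, Spark SQL, and MLlib. All users are strongly encouraged to upgrade to the latest version. 85 developers contributed to this release.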

Spark 1.4.1 is now available on the download page.

Changes

Data Sources and DataFrames

SPARK-8804: Order of UTF8String is not consistent with String if there is any non-ascii character in it
SPARK-8406: Race condition when writing Parquet files
SPARK-8329: DataSource options parser no longer accepts ‘_’
SPARK-8368: ClassNotFoundException in closure for map
SPARK-8470: MissingRequirementError for ScalaReflection on user classes
SPARK-8358: DataFrame explode with alias and * fails

MLlib

SPARK-8151: Pipeline components should correctly implement copy
SPARK-8468: Some metrics in RegressionEvaluator should have negative sign
SPARK-8736: GBTRegressionModel shouldn’t threshold predictions
SPARK-8563: IndexedRowMatrix.computeSVD() yields the U with wrong numCols

PySpark

SPARK-8202: Infinite loop during external sort
SPARK-8573: Trigger exceptions when invalid operators are used
SPARK-8766: Support non ASCII characters in columns

SparkR

SPARK-8506: Support for Spark packages when initializing SparkR
SPARK-8085: Support for user defined schemas when reading from data sources

For more details, see the release notes.

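Apache Spark is an open source cluster computing environment similar to Hadoop, but the two differ in ways that make Spark perform better on certain workloads. Specifically, Spark supports in-memory distributed datasets, which let it optimize iterative workloads in addition to serving interactive queries.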
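To make the point about iterative workloads concrete, here is a minimal sketch in plain Python. It is not Spark's API: `load_dataset`, `RAW_LINES`, and `mean_by_gradient_descent` are hypothetical names invented for this illustration. It only shows why keeping a dataset in memory helps when an algorithm passes over the same data many times.

```python
# Toy illustration only: plain Python, not Spark's API.
# It contrasts re-loading a dataset on every iteration with
# loading it once and reusing the in-memory copy.

RAW_LINES = ["1.0", "2.0", "3.0", "4.0"]  # stand-in for a file on slow storage
load_count = 0  # counts how many times the "expensive" load runs


def load_dataset():
    """Pretend to read and parse the dataset from slow storage."""
    global load_count
    load_count += 1
    return [float(line) for line in RAW_LINES]


def mean_by_gradient_descent(get_data, iterations=50, step=0.1):
    """Estimate the mean iteratively by descending the squared-error gradient."""
    estimate = 0.0
    for _ in range(iterations):
        data = get_data()  # one full pass over the dataset per iteration
        gradient = sum(estimate - x for x in data) / len(data)
        estimate -= step * gradient
    return estimate


# Without caching: every iteration pays the load cost again.
load_count = 0
uncached_result = mean_by_gradient_descent(load_dataset)
uncached_loads = load_count

# With caching: load once, then reuse the in-memory copy.
load_count = 0
cached = load_dataset()
cached_result = mean_by_gradient_descent(lambda: cached)
cached_loads = load_count

print(uncached_loads, cached_loads)  # prints: 50 1
```

Both runs compute the same estimate. The only difference is how often the data is loaded, which is the cost an in-memory dataset is meant to avoid.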

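Spark is implemented in Scala and uses Scala as its application framework. Unlike Hadoop, Spark is tightly integrated with Scala, so Scala code can manipulate distributed datasets as easily as local collection objects.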
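The idea of handling a distributed dataset like a local collection can be sketched as follows. This is a hedged toy in plain Python rather than Scala or Spark itself: `PartitionedDataset` is a hypothetical class made up for this example, and its partitions are ordinary in-process lists, not data spread across a cluster.

```python
from functools import reduce


class PartitionedDataset:
    """Hypothetical toy: a dataset split into partitions (plain lists)."""

    def __init__(self, partitions):
        self.partitions = [list(p) for p in partitions]

    def map(self, fn):
        # Apply fn to every element, partition by partition.
        return PartitionedDataset([[fn(x) for x in p] for p in self.partitions])

    def filter(self, pred):
        # Keep only the elements that satisfy pred, partition by partition.
        return PartitionedDataset(
            [[x for x in p if pred(x)] for p in self.partitions]
        )

    def reduce(self, fn):
        # Combine within each non-empty partition, then across partitions.
        # Like functools.reduce, this raises TypeError on an empty dataset.
        partials = [reduce(fn, p) for p in self.partitions if p]
        return reduce(fn, partials)


# The same computation on a local list ...
local = [1, 2, 3, 4, 5, 6]
local_total = sum(x * x for x in local if x % 2 == 0)

# ... and on the partitioned toy dataset, with collection-style calls.
dataset = PartitionedDataset([[1, 2], [3, 4], [5, 6]])
distributed_total = (
    dataset.filter(lambda x: x % 2 == 0)
    .map(lambda x: x * x)
    .reduce(lambda a, b: a + b)
)

print(local_total, distributed_total)  # prints: 56 56
```

The calling code never mentions partitions. That is the property the paragraph above describes: where the data lives is hidden behind familiar collection operations.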

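Although Spark was created to support iterative jobs on distributed datasets, it is in fact a complement to Hadoop and can run in parallel on the Hadoop file system. This is supported through a third-party cluster framework called Mesos. Spark was developed by the AMP Lab (Algorithms, Machines, and People Lab) at the University of California, Berkeley, and can be used to build large-scale, low-latency data analytics applications.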


Article URL: http://www.baiduhome.net/news/view/1525837
