Apache Spark 1.6.1 Released: Cluster Computing Framework

Apache Spark 1.6.1 has been released. Apache Spark is an open-source cluster computing framework similar to Hadoop, but with some useful differences that make Spark superior for certain workloads: Spark keeps distributed datasets in memory, so in addition to supporting interactive queries, it can also optimize iterative workloads.
Spark is implemented in Scala and uses Scala as its application framework. Unlike Hadoop, Spark is tightly integrated with Scala, which lets distributed datasets be manipulated as easily as local collection objects.
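As an illustrative sketch of that point (not taken from the release notes): with Spark's RDD API, a distributed dataset supports the same `map`/`filter`/`reduce` combinator style as local Scala collections, and `cache()` keeps it in memory for iterative reuse. The `local[*]` master and the sample data below are assumptions chosen so the snippet is self-contained; a real deployment would point at a cluster.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object CollectionStyleExample {
  def main(args: Array[String]): Unit = {
    // Local-mode context for illustration only (assumption, not from the article);
    // a real cluster would use a different master URL.
    val conf = new SparkConf().setAppName("collection-style").setMaster("local[*]")
    val sc   = new SparkContext(conf)

    // Distribute a local collection as an RDD and cache it in memory,
    // so repeated (iterative) passes over it avoid recomputation.
    val nums = sc.parallelize(1 to 1000).cache()

    // The same combinator style as a local Scala collection:
    val sumOfEvenSquares = nums
      .filter(_ % 2 == 0)
      .map(n => n.toLong * n)
      .reduce(_ + _)
    println(s"sum of even squares: $sumOfEvenSquares")

    sc.stop()
  }
}
```

Compiling and running this requires the Spark libraries on the classpath (e.g. via `spark-submit`); it is a sketch of the API style, not part of the 1.6.1 announcement.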
Although Spark was created to support iterative jobs on distributed datasets, it is in fact a complement to Hadoop and can run in parallel on the Hadoop file system; this is enabled through the third-party cluster framework Mesos. Spark was developed at the AMP Lab (Algorithms, Machines, and People Lab) at UC Berkeley and can be used to build large-scale, low-latency data analytics applications.
New Features
[SPARK-10359] - Enumerate Spark's dependencies in a file and diff against it for new pull requests
Bug Fixes
[SPARK-7615] - MLLIB Word2Vec wordVectors divided by Euclidean Norm equals to zero
[SPARK-9844] - File appender race condition during SparkWorker shutdown
[SPARK-10524] - Decision tree binary classification with ordered categorical features: incorrect centroid
[SPARK-10847] - Pyspark - DataFrame - Optional Metadata with `None` triggers cryptic failure
[SPARK-11394] - PostgreDialect cannot handle BYTE types
[SPARK-11624] - Spark SQL CLI will set sessionstate twice
[SPARK-11972] - [Spark SQL] the value of 'hiveconf' parameter in CLI can't be got after enter spark-sql session
[SPARK-12006] - GaussianMixture.train crashes if an initial model is not None
[SPARK-12010] - Spark JDBC requires support for column-name-free INSERT syntax
[SPARK-12016] - word2vec load model can't use findSynonyms to get words
[SPARK-12026] - ChiSqTest gets slower and slower over time when number of features is large
[SPARK-12268] - pyspark shell uses execfile which breaks python3 compatibility
[SPARK-12300] - Fix schema inferance on local collections
[SPARK-12316] - Stack overflow with endless call of `Delegation token thread` when application end.
[SPARK-12327] - lint-r checks fail with commented code
Download: http://spark.apache.org/downloads.html
Source: http://www.oschina.net//news/71432/apache-spark-1-6-1