Apache Kudu v0.9.0 Released, a Data Storage System
To respond to the trends identified earlier, there were two possible approaches: continuously update the existing Hadoop tools, or design and build a new component from scratch. The goals were:
- High performance for both data scans and random access, simplifying users' complex hybrid architectures;
- High CPU efficiency, maximizing the performance of modern processors;
- High I/O performance, making full use of modern persistent storage media;
- Support for updating data in place, avoiding extra data processing and data movement.
To achieve these goals, we first built prototypes on top of existing open-source projects, but we ultimately concluded that fundamental architectural changes were needed, and those changes were significant enough to justify building an entirely new system. After years of effort, we can now share the result: Kudu, a new data storage system.
Changelog
Incompatible Changes
- The `KuduTableInputFormat` command has changed the way in which it handles scan predicates, including how it serializes predicates to the job configuration object. The new configuration key is `kudu.mapreduce.encoded.predicate`. Clients using the `TableInputFormatConfigurator` are not affected.
- The `kudu-spark` sub-project has been renamed to follow naming conventions for Scala. The new name is `kudu-spark_2.10`.
- Default table partitioning has been removed. All tables must now be created with explicit partitioning. Existing tables are unaffected. See the schema design guide for more details.
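The idea behind explicit partitioning can be illustrated with a small sketch. This is not the Kudu API; it is a toy model in which rows are routed to a fixed number of hash buckets (standing in for tablets), a choice that must now be made explicitly when the table is created:

```python
import zlib

NUM_BUCKETS = 4  # hypothetical bucket count, chosen explicitly at table creation

def bucket_for(key, num_buckets=NUM_BUCKETS):
    """Route a primary key to a hash bucket (a stand-in for a Kudu tablet).

    crc32 is used instead of Python's built-in hash() so that routing is
    stable across interpreter runs.
    """
    return zlib.crc32(str(key).encode("utf-8")) % num_buckets

# Distribute a few rows across the buckets.
tablets = {i: [] for i in range(NUM_BUCKETS)}
for key in ["row-1", "row-2", "row-3", "row-4", "row-5"]:
    tablets[bucket_for(key)].append(key)

# Every row lands in exactly one bucket, and the routing is deterministic.
print(sum(len(rows) for rows in tablets.values()))  # 5
```

Because each key maps to exactly one bucket, clients and the servers agree on where a row lives without any per-row lookup table.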
New Features
- KUDU-1002 Added support for `UPSERT` operations, whereby a row is inserted if it does not already exist, but updated if it does. Support for `UPSERT` is included in the Java, C++, and Python APIs, but not in Impala.
- KUDU-1306 Scan token API for creating partition-aware scan descriptors. This API simplifies executing parallel scans for clients and query engines.
- Gerrit 2848 Added a Kudu datasource for Spark. This datasource uses the Kudu client directly instead of using the MapReduce API. Predicate pushdowns for `spark-sql` and Spark filters are included, as well as parallel retrieval for multiple tablets and column projections. See an example of Kudu integration with Spark.
- Gerrit 2992 Added the ability to update and insert from Spark using a Kudu datasource.
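The `UPSERT` semantics described above (insert if absent, update if present) can be sketched in a few lines. This is not the Kudu client API; a plain Python dict keyed by primary key stands in for a table:

```python
def upsert(table, key, row):
    """Insert the row if the key is absent, otherwise update it in place.

    `table` is a dict mapping primary key -> row dict; `row` may be a
    partial row, in which case only the given columns are updated.
    """
    existing = table.get(key)
    if existing is None:
        table[key] = dict(row)   # behaves like an INSERT
    else:
        existing.update(row)     # behaves like an UPDATE of the given columns

table = {}
upsert(table, 1, {"host": "a", "metric": 10})  # key absent: inserts the row
upsert(table, 1, {"metric": 20})               # key present: updates "metric" only
print(table[1])  # {'host': 'a', 'metric': 20}
```

The point of the primitive is that the caller does not need to know in advance whether the row exists, avoiding a read-before-write round trip.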
Improvements
- KUDU-1415 Added statistics to the Java client, such as the number of bytes written and the number of operations applied.
- KUDU-1451 Improved tablet server restart time when the tablet server needs to clean up a large number of previously deleted tablets. Tablets are now cleaned up after they are deleted.
Bug Fixes
- KUDU-678 Fixed a leak that happened during DiskRowSet compactions where tiny blocks were still written to disk even if there were no REDO records. With the default block manager, it usually resulted in block containers with thousands of tiny blocks.
- KUDU-1437 Fixed a data corruption issue that occurred after compacting sequences of negative INT32 values in a column that was configured with RLE encoding.
Other Noteworthy Changes
All Kudu clients have longer default timeout values, as listed below.
Java
- The default operation timeout and the default admin operation timeout are now set to 30 seconds instead of 10.
- The default socket read timeout is now 10 seconds instead of 5.
C++
- The default admin timeout is now 30 seconds instead of 10.
- The default RPC timeout is now 10 seconds instead of 5.
- The default scan timeout is now 30 seconds instead of 15.
- Some default settings related to I/O behavior during flushes and compactions have been changed: the default for `flush_threshold_mb` has been increased from 64MB to 1000MB, and the default for `cfile_do_on_finish` has been changed from `close` to `flush`. Experiments using YCSB indicate that these values will provide better throughput for write-heavy applications on typical server hardware.
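Both settings are tablet server flags, so (assuming a standard gflags-style startup line) they could also be set explicitly; the values shown here are simply the new 0.9.0 defaults:

```shell
# Hypothetical tablet server startup fragment; both flags are shown with
# the new 0.9.0 default values, so passing them explicitly changes nothing.
kudu-tserver \
  --flush_threshold_mb=1000 \
  --cfile_do_on_finish=flush
```

Sites that had tuned the old 64MB flush threshold for memory-constrained machines may want to revisit that choice against the new default.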