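Apache Hive v2.1.0 Released
Hive is an open-source data warehouse built on Hadoop for storing and processing massive amounts of structured data. Facebook open-sourced it in August 2008. It provides HQL, a query language with SQL-like syntax, as its data access interface. Hive has the following advantages and disadvantages: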
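Advantages:
- Hive uses an SQL-like query syntax that stays as close to the SQL standard as possible, which greatly flattens the learning curve for traditional data analysts;
- It exposes JDBC and ODBC interfaces, which makes it easier for developers to build applications;
- With MapReduce (MR) as the compute engine and HDFS as the storage system, it offers compute power and scalability designed for very large data sets;
- Unified metadata management (backed by Derby, MySQL, etc.) that can be shared with Pig, Presto, and others;
Disadvantages:
- HQL has limited expressive power, and some complex computations are hard to express in it;
- Because Hive generates the MapReduce jobs automatically, HQL queries are hard to tune;
- Granularity is coarse and fine-grained control is limited.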
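To make the point about automatically generated MapReduce jobs more concrete, the minimal Python sketch below mimics, in plain in-memory code, the map, shuffle, and reduce phases that a simple aggregation such as `SELECT dept, COUNT(*) FROM emp GROUP BY dept` conceptually breaks down into. The table `emp`, its column `dept`, and every function name here are hypothetical; this only illustrates the idea and is not the plan Hive actually produces.

```python
from collections import defaultdict

# Hypothetical rows of an `emp` table: (name, dept)
EMP = [("ann", "sales"), ("bob", "eng"), ("cid", "eng"), ("dee", "sales"), ("eve", "ops")]


def map_phase(rows):
    """Emit a (dept, 1) pair for every input row."""
    for _name, dept in rows:
        yield dept, 1


def shuffle_phase(pairs):
    """Group the emitted values by key, as a shuffle step would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Sum the values of each key to obtain COUNT(*) per dept."""
    return {key: sum(values) for key, values in groups.items()}


def count_by_dept(rows):
    """Chain the three phases for the GROUP BY query in the lead-in."""
    return reduce_phase(shuffle_phase(map_phase(rows)))


if __name__ == "__main__":
    print(count_by_dept(EMP))
```

Because the user writes only the one-line query while the engine decides how the phases are laid out, tuning has to happen indirectly, which is the difficulty the list above refers to.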
Hive runtime architecture
Changelog
- [HIVE-9774] - Print yarn application id to console [Spark Branch]
- [HIVE-10280] - LLAP: Handle errors while sending source state updates to the daemons
- [HIVE-11107] - Support for Performance regression test suite with TPCDS
- [HIVE-11417] - Create shims for the row by row read path that is backed by VectorizedRowBatch
- [HIVE-11526] - LLAP: implement LLAP UI as a separate service - part 1
- [HIVE-11766] - LLAP: Remove MiniLlapCluster from shim layer after hadoop-1 removal
- [HIVE-11927] - Implement/Enable constant related optimization rules in Calcite: enable HiveReduceExpressionsRule to fold constants
- [HIVE-12049] - HiveServer2: Provide an option to write serialized thrift objects in final tasks
Bug Fixes
- [HIVE-1608] - use sequencefile as the default for storing intermediate results
- [HIVE-4662] - first_value can't have more than one order by column
- [HIVE-8343] - Return value from BlockingQueue.offer() is not checked in DynamicPartitionPruner
- [HIVE-9144] - Beeline + Kerberos shouldn't prompt for unused username + password
- [HIVE-9457] - Fix obsolete parameter name in HiveConf description of hive.hashtable.initialCapacity
- [HIVE-9499] - hive.limit.query.max.table.partition makes queries fail on non-partitioned tables
- [HIVE-9534] - incorrect result set for query that projects a windowed aggregate
- [HIVE-9862] - Vectorized execution corrupts timestamp values
- [HIVE-10171] - Create a storage-api module
- [HIVE-10187] - Avro backed tables don't handle cyclical or recursive records
- [HIVE-10632] - Make sure TXN_COMPONENTS gets cleaned up if table is dropped before compaction.
- [HIVE-10729] - Query failed when select complex columns from joinned table (tez map join only)
- [HIVE-11097] - HiveInputFormat uses String.startsWith to compare splitPath and PathToAliases
- [HIVE-11388] - Allow ACID Compactor components to run in multiple metastores
- [HIVE-11427] - Location of temporary table for CREATE TABLE SELECT broken by HIVE-7079
- [HIVE-11484] - Fix ObjectInspector for Char and VarChar
- [HIVE-11550] - ACID queries pollute HiveConf
- [HIVE-11675] - make use of file footer PPD API in ETL strategy or separate strategy
- [HIVE-11716] - Reading ACID table from non-acid session should raise an error
- [HIVE-11806] - Create test for HIVE-11174
Improvements
- [HIVE-4570] - More information to user on GetOperationStatus in Hive Server2 when query is still executing
- [HIVE-4924] - JDBC: Support query timeout for jdbc
- [HIVE-5370] - format_number udf should take user specifed format as argument
- [HIVE-6535] - JDBC: provide an async API to execute query and fetch results
- [HIVE-10115] - HS2 running on a Kerberized cluster should offer Kerberos(GSSAPI) and Delegation token(DIGEST) when alternate authentication is enabled
- [HIVE-10249] - ACID: show locks should show who the lock is waiting for
- [HIVE-10468] - Create scripts to do metastore upgrade tests on jenkins for Oracle DB.
- [HIVE-10982] - Customizable the value of java.sql.statement.setFetchSize in Hive JDBC Driver
- [HIVE-11424] - Rule to transform OR clauses into IN clauses in CBO
- [HIVE-11483] - Add encoding and decoding for query string config
- [HIVE-11487] - Add getNumPartitionsByFilter api in metastore api
- [HIVE-11752] - Pre-materializing complex CTE queries
- [HIVE-11793] - SHOW LOCKS with DbTxnManager ignores filter options
- [HIVE-11956] - SHOW LOCKS should indicate what acquired the lock
- [HIVE-12431] - Support timeout for compile lock
- [HIVE-12439] - CompactionTxnHandler.markCleaned() and TxnHandler.openTxns() misc improvements
- [HIVE-12467] - Add number of dynamic partitions to error message
- [HIVE-12481] - Occasionally "Request is a replay" will be thrown from HS2
- [HIVE-12515] - Clean the SparkCounters related code after remove counter based stats collection[Spark Branch]
- [HIVE-12541] - SymbolicTextInputFormat should supports the path with regex
- [HIVE-12545] - Add sessionId and queryId logging support for methods like getCatalogs in HiveSessionImpl class
- [HIVE-12595] - [REFACTOR] Make physical compiler more type safe
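One of the improvements listed above, HIVE-11424, adds a CBO rule that transforms OR clauses into IN clauses. The Python sketch below only illustrates that idea on a toy predicate representation. The function name `or_to_in` and the list-of-pairs input format are assumptions made for this example and do not reflect Hive's actual implementation of the rule.

```python
def or_to_in(disjuncts):
    """Rewrite `col = v1 OR col = v2 ...` as `col IN (v1, v2, ...)`.

    `disjuncts` is a list of (column, literal) equality tests joined by OR.
    The rewrite applies only when every test targets the same column;
    otherwise the original OR chain is returned unchanged.
    """
    if not disjuncts:
        raise ValueError("at least one predicate is required")
    columns = {col for col, _ in disjuncts}
    if len(columns) == 1 and len(disjuncts) > 1:
        col = disjuncts[0][0]
        values = ", ".join(repr(v) for _, v in disjuncts)
        return f"{col} IN ({values})"
    # Mixed columns (or a single test): keep the predicate as it was.
    return " OR ".join(f"{col} = {v!r}" for col, v in disjuncts)


if __name__ == "__main__":
    print(or_to_in([("a", 1), ("a", 2), ("a", 3)]))
    print(or_to_in([("a", 1), ("b", 2)]))
```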
New Features
- [HIVE-12270] - Add DBTokenStore support to HS2 delegation token
- [HIVE-12634] - Add command to kill an ACID transaction
- [HIVE-12730] - MetadataUpdater: provide a mechanism to edit the basic statistics of a table (or a partition)
- [HIVE-12878] - Support Vectorization for TEXTFILE and other formats
- [HIVE-12994] - Implement support for NULLS FIRST/NULLS LAST
- [HIVE-13029] - NVDIMM support for LLAP Cache
- [HIVE-13095] - Support view column authorization
- [HIVE-13125] - Support masking and filtering of rows/columns
- [HIVE-13307] - LLAP: Slider package should contain permanent functions
- [HIVE-13418] - HiveServer2 HTTP mode should support X-Forwarded-Host header for authorization/audits
- [HIVE-13475] - Allow aggregate functions in over clause
- [HIVE-13736] - View's input/output formats are TEXT by default
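Among the new features, HIVE-12994 adds support for NULLS FIRST/NULLS LAST, which lets a query state where NULL values land in a sorted result, for example `ORDER BY score ASC NULLS LAST`. The Python sketch below imitates that ordering on in-memory rows; the column name `score` and the helper `order_by` are hypothetical and serve only to show the semantics, not how Hive implements them.

```python
def order_by(rows, column, descending=False, nulls_first=True):
    """Sort dict rows by `column`, placing None values first or last."""
    nulls = [r for r in rows if r[column] is None]
    non_nulls = sorted(
        (r for r in rows if r[column] is not None),
        key=lambda r: r[column],
        reverse=descending,
    )
    return nulls + non_nulls if nulls_first else non_nulls + nulls


if __name__ == "__main__":
    rows = [
        {"id": 1, "score": 3},
        {"id": 2, "score": None},
        {"id": 3, "score": 1},
    ]
    # Roughly: ORDER BY score ASC NULLS LAST
    print([r["id"] for r in order_by(rows, "score", nulls_first=False)])
```

Keeping the NULL rows in a separate list avoids comparing `None` with numbers, which Python 3 does not allow.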
For the full changelog, see: ReleaseNote
Download