Installing, Deploying, and Configuring Hive, with Data Tests, on a Hadoop 2.3 / HBase 0.98 / Hive 0.13 Stack
Introduction:
Hive is a data warehouse tool built on top of Hadoop. It maps structured data files to database tables and offers a simple SQL-like query capability, translating SQL statements into MapReduce jobs for execution. Its main advantage is a low learning curve: simple MapReduce statistics can be produced with SQL-like statements instead of writing dedicated MapReduce applications, which makes it a good fit for statistical analysis in a data warehouse.
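For example, a simple HiveQL aggregation like the one below (written against the test table created later in this post, which has a single string column named key) is compiled into one or more MapReduce jobs and run on the cluster:
hive> select key, count(*) from test group by key;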
1. Use Cases
Hive is built on Hadoop, a static batch-processing system with relatively high latency and significant overhead for job submission and scheduling. Hive therefore cannot deliver low-latency queries over large data sets; even a query over a few hundred MB of data typically takes minutes to complete.
For that reason, Hive is not suited to applications that need low latency, such as online transaction processing (OLTP). Hive queries strictly follow the Hadoop MapReduce execution model: Hive compiles the user's HiveQL statements into MapReduce jobs and submits them to the Hadoop cluster; Hadoop monitors the job execution and returns the results to the user. Hive was not designed for online transaction processing and does not provide real-time queries or row-level updates. Its best use is batch processing over large data sets, such as web log analysis.
2. Download and Install
Hadoop must already be installed; for the preparation steps, see: http://blog.itpub.net/26230597/viewspace-1257609/
Download:
wget http://mirror.bit.edu.cn/apache/hive/hive-0.13.1/apache-hive-0.13.1-bin.tar.gz
Extract and install:
tar zxvf apache-hive-0.13.1-bin.tar.gz -C /home/hadoop/src/
PS: Hive only needs to be installed on a single node. In this example it is installed on the VM that hosts the Hadoop name node, i.e. it shares that machine with the name node.
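Note that the tarball unpacks into a directory named apache-hive-0.13.1-bin, while every path later in this post refers to /home/hadoop/src/hive-0.13.1, so a rename along these lines is assumed:
mv /home/hadoop/src/apache-hive-0.13.1-bin /home/hadoop/src/hive-0.13.1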
3. Configure Hive Environment Variables
vim ~/.bash_profile
export HIVE_HOME=/home/hadoop/src/hive-0.13.1
export PATH=$PATH:$HIVE_HOME/bin
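A quick sanity check after reloading the profile (a minimal sketch; it only confirms that the variables and the hive launcher are visible to the shell):
source ~/.bash_profile
echo $HIVE_HOME
which hive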
4. Configure Hadoop and HBase Parameters
vim hive-env.sh
# Set HADOOP_HOME to point to a specific hadoop install directory
HADOOP_HOME=/home/hadoop/src/hadoop-2.3.0/
# Hive Configuration Directory can be controlled by:
export HIVE_CONF_DIR=/home/hadoop/src/hive-0.13.1/conf
# Folder containing extra ibraries required for hive compilation/execution can be controlled by:
export HIVE_AUX_JARS_PATH=/home/hadoop/src/hive-0.13.1/lib
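Before moving on, it can help to confirm that the three paths referenced above actually exist; a small sketch:
for d in /home/hadoop/src/hadoop-2.3.0 /home/hadoop/src/hive-0.13.1/conf /home/hadoop/src/hive-0.13.1/lib; do [ -d "$d" ] && echo "OK $d" || echo "MISSING $d"; done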
5. Verify the Installation
Start the Hive command-line interface; if the hive prompt appears, the installation was successful.
[hadoop@name01 lib]$ hive --service cli
15/01/09 00:20:32 WARN conf.HiveConf: DEPRECATED: hive.metastore.ds.retry.* no longer has any effect. Use hive.hmshandler.retry.* instead
Logging initialized using configuration in jar:file:/home/hadoop/src/hive-0.13.1/lib/hive-common-0.13.1.jar!/hive-log4j.properties
Create a table: run a CREATE statement, and if it returns OK the command executed successfully, which also confirms that Hive is installed correctly.
hive> create table test(key string);
OK
Time taken: 8.749 seconds
hive>
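The new table can also be listed and inspected from the same CLI session (both statements should complete with OK):
hive> show tables;
hive> describe test;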
6. Verify Usability
Start the Hive metastore service in the background:
[hadoop@name01 root]$hive --service metastore &
Check the Hive process running in the background:
[hadoop@name01 root]$ ps -eaf|grep hive
hadoop 4025 2460 1 22:52 pts/0 00:00:19 /usr/lib/jvm/jdk1.7.0_60/bin/java -Xmx256m -Djava.net.preferIPv4Stack=true -Dhadoop.log.dir=/home/hadoop/src/hadoop-2.3.0/logs -Dhadoop.log.file=hadoop.log -Dhadoop.home.dir=/home/hadoop/src/hadoop-2.3.0 -Dhadoop.id.str=hadoop -Dhadoop.root.logger=INFO,console -Djava.library.path=/home/hadoop/src/hadoop-2.3.0/lib/native -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true -Xmx512m -Dhadoop.security.logger=INFO,NullAppender org.apache.hadoop.util.RunJar /home/hadoop/src/hive-0.13.1/lib/hive-service-0.13.1.jar org.apache.hadoop.hive.metastore.HiveMetaStore
hadoop 4575 4547 0 23:14 pts/1 00:00:00 grep hive
[hadoop@name01 root]$
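Since the metastore serves Thrift requests on port 9083 (as the startup log later in this post confirms), the listener can also be checked at the network level; a minimal sketch, assuming net-tools is installed:
[hadoop@name01 root]$ netstat -lnt | grep 9083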
6.1 In the Hive CLI, create a table with 2 columns, with fields delimited by ',':
hive> create table tim_test(id int,name string) row format delimited fields terminated by ',';
OK
Time taken: 0.145 seconds
hive>
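If desired, the column layout of the new table can be double-checked before loading any data:
hive> describe tim_test;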
6.2 Prepare the txt file to be loaded into the table and fill it with values:
[hadoop@name01 hive-0.13.1]$ more tim_hive_test.txt
123,xinhua
456,dingxilu
789,fanyulu
903,fahuazhengroad
[hadoop@name01 hive-0.13.1]$
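For reference, the same file can be created in one step with a heredoc; this sketch simply reproduces the values shown above:
cat > /home/hadoop/src/hive-0.13.1/tim_hive_test.txt <<'EOF'
123,xinhua
456,dingxilu
789,fanyulu
903,fahuazhengroad
EOF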
6.3 Open another xshell session, log in to the server, and start the Hive metastore:
[hadoop@name01 root]$ hive --service metastore
Starting Hive Metastore Server
6.4 Open another xshell session and load the data from the Hive client:
[hadoop@name01 hive-0.13.1]$ hive
Logging initialized using configuration in jar:file:/home/hadoop/src/hive-0.13.1/lib/hive-common-0.13.1.jar!/hive-log4j.properties
hive> load data local inpath '/home/hadoop/src/hive-0.13.1/tim_hive_test.txt' into table tim_test;
Copying data from file:/home/hadoop/src/hive-0.13.1/tim_hive_test.txt
Copying file: file:/home/hadoop/src/hive-0.13.1/tim_hive_test.txt
Loading data to table default.tim_test
[Warning] could not update stats.
OK
Time taken: 7.208 seconds
hive>
6.5 Verify that the data was loaded: the dfs listing shows tim_test:
hive> dfs -ls /home/hadoop/hive/warehouse;
Found 2 items
drwxr-xr-x - hadoop supergroup 0 2015-01-12 01:47 /home/hadoop/hive/warehouse/hive_hbase_mapping_table_1
drwxr-xr-x - hadoop supergroup 0 2015-01-12 02:11 /home/hadoop/hive/warehouse/tim_test
hive>
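Beyond the dfs listing, the rows can be read back directly from the Hive CLI; a plain SELECT * on a non-partitioned table like this is normally served by a simple fetch task rather than a MapReduce job, so it returns quickly:
hive> select * from tim_test;
The output should show the four comma-delimited rows from tim_hive_test.txt split into the id and name columns.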
7. Errors Encountered During Installation and Deployment
Error 1:
[hadoop@name01 conf]$ hive --service metastore
Starting Hive Metastore Server
javax.jdo.JDOFatalInternalException: Error creating transactional connection factory
Caused by: org.datanucleus.exceptions.NucleusException: Attempt to invoke the "BONECP" plugin to create a ConnectionPool gave an error : The specified datastore driver ("com.mysql.jdbc.Driver") was not found in the CLASSPATH. Please check your CLASSPATH specification, and the name of the driver.
The MySQL JDBC driver jar is missing from the classpath; copy it into Hive's lib directory and the error goes away.
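A sketch of that fix, assuming the MySQL Connector/J jar has already been downloaded to the current directory (the version number here is illustrative):
cp mysql-connector-java-5.1.32-bin.jar /home/hadoop/src/hive-0.13.1/lib/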
Error 2:
[hadoop@name01 conf]$ hive --service metastore
Starting Hive Metastore Server
javax.jdo.JDOFatalDataStoreException: Unable to open a test connection to the given database. JDBC url = jdbc:mysql://192.168.52.130:3306/hive_remote?createDatabaseIfNotExist=true, username = root. Terminating connection pool (set lazyInit to true if you expect to start your database after your app). Original Exception: ------
java.sql.SQLException: null, message from server: "Host '192.168.52.128' is not allowed to connect to this MySQL server"
First attempt: add the hadoop user to the mysql group on the MySQL server:
[root@data02 mysql]# gpasswd -a hadoop mysql
Adding user hadoop to group mysql
[root@data02 mysql]#
The connection test from the Hive node shows the host is still rejected:
^C[hadoop@name01 conf]$ telnet 192.168.52.130 3306
Trying 192.168.52.130...
Connected to 192.168.52.130.
Escape character is '^]'.
G
Host '192.168.52.128' is not allowed to connect to this MySQL serverConnection closed by foreign host.
[hadoop@name01 conf]$
Solution: modify the MySQL account so that the connecting host is allowed:
mysql> update user set user = 'hadoop' where user = 'root' and host='%';
Query OK, 1 row affected (0.04 sec)
Rows matched: 1 Changed: 1 Warnings: 0
mysql> flush privileges;
Query OK, 0 rows affected (0.09 sec)
mysql>
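For comparison, a more conventional way to clear a "Host ... is not allowed to connect" error is to grant the connecting host access explicitly; the sketch below assumes the metastore connects to the hive_remote database from 192.168.52.128, and both the user name and the password are placeholders:
mysql> grant all privileges on hive_remote.* to 'hive_user'@'192.168.52.128' identified by 'your_password';
mysql> flush privileges;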
Error 3:
[hadoop@name01 conf]$ hive --service metastore
Starting Hive Metastore Server
javax.jdo.JDOException: Exception thrown calling table.exists() for hive_remote.`SEQUENCE_TABLE`
at org.datanucleus.api.jdo.NucleusJDOHelper.getJDOExceptionForNucleusException(NucleusJDOHelper.java:596)
at org.datanucleus.api.jdo.JDOPersistenceManager.jdoMakePersistent(JDOPersistenceManager.java:732)
at org.datanucleus.api.jdo.JDOPersistenceManager.makePersistent(JDOPersistenceManager.java:752)
……
NestedThrowablesStackTrace:
com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: Specified key was too long; max key length is 767 bytes
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
Solution: on the remote MySQL server, change the database character set from utf8mb4 to utf8:
mysql> alter database hive_remote /*!40100 DEFAULT CHARACTER SET utf8 */;
Query OK, 1 row affected (0.03 sec)
mysql>
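The change can be verified afterwards; the CREATE DATABASE statement shown should now report utf8 as the default character set:
mysql> show create database hive_remote;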
Then set up the Hive client on data01 by copying the installation over:
scp -r hive-0.13.1/ data01:/home/hadoop/src/
Error 4:
Start the metastore again and check the log:
[hadoop@name01 conf]$ hive --service metastore
Starting Hive Metastore Server
It hangs at this point, so check the log:
[hadoop@name01 hadoop]$ tail -f hive.log
2015-01-09 03:46:27,692 INFO [main]: metastore.ObjectStore (ObjectStore.java:setConf(229)) - Initialized ObjectStore
2015-01-09 03:46:27,892 WARN [main]: metastore.ObjectStore (ObjectStore.java:checkSchema(6295)) - Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 0.13.0
2015-01-09 03:46:30,574 INFO [main]: metastore.HiveMetaStore (HiveMetaStore.java:createDefaultRoles(551)) - Added admin role in metastore
2015-01-09 03:46:30,582 INFO [main]: metastore.HiveMetaStore (HiveMetaStore.java:createDefaultRoles(560)) - Added public role in metastore
2015-01-09 03:46:31,168 INFO [main]: metastore.HiveMetaStore (HiveMetaStore.java:addAdminUsers(588)) - No user is added in admin role, since config is empty
2015-01-09 03:46:31,473 INFO [main]: metastore.HiveMetaStore (HiveMetaStore.java:startMetaStore(5178)) - Starting DB backed MetaStore Server
2015-01-09 03:46:31,481 INFO [main]: metastore.HiveMetaStore (HiveMetaStore.java:startMetaStore(5190)) - Started the new metaserver on port [9083]...
2015-01-09 03:46:31,481 INFO [main]: metastore.HiveMetaStore (HiveMetaStore.java:startMetaStore(5192)) - Options.minWorkerThreads = 200
2015-01-09 03:46:31,482 INFO [main]: metastore.HiveMetaStore (HiveMetaStore.java:startMetaStore(5194)) - Options.maxWorkerThreads = 100000
2015-01-09 03:46:31,482 INFO [main]: metastore.HiveMetaStore (HiveMetaStore.java:startMetaStore(5196)) - TCP keepalive = true
Fix: add the following to hive-site.xml:
<property>
<name>hive.metastore.uris</name>
<value>thrift://192.168.52.128:9083</value>
</property>
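For reference, the metastore side of hive-site.xml also has to carry the JDBC connection settings that appear in the error messages above; a minimal sketch using the values from those messages (the password is a placeholder):
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://192.168.52.130:3306/hive_remote?createDatabaseIfNotExist=true</value>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>root</value>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>your_password</value>
</property>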
Error 5:
2015-01-09 04:01:43,053 INFO [main]: metastore.ObjectStore (ObjectStore.java:setConf(229)) - Initialized ObjectStore
2015-01-09 04:01:43,540 INFO [main]: metastore.HiveMetaStore (HiveMetaStore.java:createDefaultRoles(551)) - Added admin role in metastore
2015-01-09 04:01:43,546 INFO [main]: metastore.HiveMetaStore (HiveMetaStore.java:createDefaultRoles(560)) - Added public role in metastore
2015-01-09 04:01:43,684 INFO [main]: metastore.HiveMetaStore (HiveMetaStore.java:addAdminUsers(588)) - No user is added in admin role, since config is empty
2015-01-09 04:01:44,041 INFO [main]: metastore.HiveMetaStore (HiveMetaStore.java:startMetaStore(5178)) - Starting DB backed MetaStore Server
2015-01-09 04:01:44,054 INFO [main]: metastore.HiveMetaStore (HiveMetaStore.java:startMetaStore(5190)) - Started the new metaserver on port [9083]...
2015-01-09 04:01:44,054 INFO [main]: metastore.HiveMetaStore (HiveMetaStore.java:startMetaStore(5192)) - Options.minWorkerThreads = 200
2015-01-09 04:01:44,054 INFO [main]: metastore.HiveMetaStore (HiveMetaStore.java:startMetaStore(5194)) - Options.maxWorkerThreads = 100000
2015-01-09 04:01:44,054 INFO [main]: metastore.HiveMetaStore (HiveMetaStore.java:startMetaStore(5196)) - TCP keepalive = true
2015-01-09 04:24:13,917 INFO [Thread-3]: metastore.HiveMetaStore (HiveMetaStore.java:run(5073)) - Shutting down hive metastore.
Resolution:
I searched for a long time but could not pin down the cause behind "No user is added in admin role, since config is empty". If you have run into the same situation, please leave a comment so we can compare notes.
Original post: http://blog.itpub.net/26230597/viewspace-1400379/
Original author: 黃杉 (mchdba)