Compiling Hadoop 2.2.0 from Source
I. Environment
Virtualization software: VMware Workstation 10
Virtual machine configuration:
RHEL Server release 6.5 (Santiago) 2.6.32-431.el6.x86_64
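CPU: 4 cores, memory: 4 GB, disk: 50 GB
II. Prerequisites:
1: Use the RHEL 6.5 ISO file as the yum repository
2: hadoop-2.2.0-src.tar.gz
3: Install JDK 1.6.0_43
4: Install and configure apache-maven 3.0.5 (apache-maven-3.0.5-bin.tar.gz)
(BUILDING.txt in the source tree requires Maven 3.0. Hadoop has been built with Maven since version 2.0; earlier versions used Ant.)
Extract the archive and set the environment variables (see Appendix 1)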
mvn -version
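5: Install and configure apache-ant-1.9.3-bin.zip (download the binary release; Ant is needed to build and install findbugs)
Extract the archive and set the environment variables (see Appendix 1)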
ant -version
6: Download and install cmake (cmake-2.8.12.1) with the following commands:
tar -zxvf cmake-2.8.12.1.tar.gz
cd cmake-2.8.12.1
./bootstrap
make
make install
Check that the installation succeeded
cmake --version (if the version number is displayed, the installation is correct)
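The version checks for Maven, Ant and cmake can be folded into one helper. The following is a minimal sketch, not part of the original procedure; the function name missing_tools is hypothetical. It prints every listed command that is not found on PATH.

```shell
# Hypothetical helper: print the name of every listed command that is
# not on PATH. Returns 0 only when all of them are found.
missing_tools() {
    local tool status=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
            status=1
        fi
    done
    return "$status"
}

# Example call for the tools installed so far:
#   missing_tools mvn ant cmake
```

An empty result means every tool was found; any name printed still needs to be installed or added to PATH.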
7: Download, install and configure findbugs-2.0.2-source.zip
http://sourceforge.jp/projects/sfnet_findbugs/releases/
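Build and install it with ant. If it is not built and installed, the Hadoop build reports:
hadoop-common-project/hadoop-common/${env.FINDBUGS_HOME}/src/xsl/default.xsl doesn't exist. -> [Help 1]
Change into the extracted directory and run the ant command directly
If it is not installed, the build fails with the following error: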
Failed to execute goal org.apache.maven.plugins:maven-antrun-plugin:1.6:run (site) on project hadoop-common: An Ant BuildException has occured
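Since the error message above names src/xsl/default.xsl under FINDBUGS_HOME, that file is a reasonable thing to test for before starting the Hadoop build. A minimal sketch, assuming that the presence of this one file is a good enough indicator; the function name findbugs_home_ok is hypothetical.

```shell
# Hypothetical helper: return 0 if the given directory contains the
# stylesheet that the Hadoop build looks for under FINDBUGS_HOME.
findbugs_home_ok() {
    [ -n "$1" ] && [ -f "$1/src/xsl/default.xsl" ]
}

# Example call:
#   findbugs_home_ok "$FINDBUGS_HOME" || echo "FINDBUGS_HOME is not usable"
```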
8: Install zlib-devel
By default, zlib-devel is not installed on the system
yum install zlib-devel
If it is not installed, the build fails with the following error:
Failed to execute goal org.apache.maven.plugins:maven-antrun-plugin:1.6:run (make) on project hadoop-common
9: Install protobuf-2.5.0
yum install gcc-c++ (if it is not installed, cmake configure fails)
./configure
make
make check
make install
Check that the installation succeeded
protoc --version (if the version number is displayed, the installation is correct)
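Displaying a version number is a weaker check than confirming it is the version this guide installs. The sketch below assumes that protoc --version prints a line of the form "libprotoc 2.5.0" and that the Hadoop 2.2.0 build expects exactly 2.5.0; the function names are hypothetical.

```shell
# Hypothetical helpers. The argument is the text printed by
# `protoc --version`, assumed to look like "libprotoc 2.5.0".
protoc_version_of() {
    echo "$1" | awk '{ print $2 }'
}

# Return 0 only when the reported version is exactly 2.5.0.
protoc_version_ok() {
    [ "$(protoc_version_of "$1")" = "2.5.0" ]
}

# Example call:
#   protoc_version_ok "$(protoc --version)" || echo "unexpected protoc version"
```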
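III. Compiling the Hadoop 2.2 source
1: Change into the extracted hadoop-2.2.0 source directory
2: Run the mvn command to build. This step needs a network connection, and the build time depends on your network speed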
mvn clean package -Pdist,native -DskipTests -Dtar
2.1. Create binary distribution without native code and without documentation:
$ mvn package -Pdist -DskipTests -Dtar
2.2. Create binary distribution with native code and with documentation:
$ mvn package -Pdist,native,docs -DskipTests -Dtar
2.3. Create source distribution:
$ mvn package -Psrc -DskipTests
2.4. Create source and binary distributions with native code and documentation:
$ mvn package -Pdist,native,docs,src -DskipTests -Dtar
2.5. Create a local staging version of the website (in /tmp/hadoop-site):
$ mvn clean site; mvn site:stage -DstagingDirectory=/tmp/hadoop-site
3: After the build, the distribution is under the hadoop-2.2.0-src/hadoop-dist/target/ directory
hadoop-2.2.0-src/hadoop-dist/target/hadoop-2.2.0/
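A quick way to confirm that the build produced a usable distribution is to test for the output directory before copying it. This is a minimal sketch; the function name dist_dir is hypothetical, and treating the presence of bin and sbin subdirectories as the success criterion is an assumption based on the PATH entries used later in this guide.

```shell
# Hypothetical helper: print the distribution directory if it exists.
#   $1 = root of the extracted source tree (e.g. hadoop-2.2.0-src)
#   $2 = Hadoop version (e.g. 2.2.0)
dist_dir() {
    local dir="$1/hadoop-dist/target/hadoop-$2"
    # Assumption: a complete distribution has bin/ and sbin/ directories.
    if [ -d "$dir/bin" ] && [ -d "$dir/sbin" ]; then
        echo "$dir"
    else
        return 1
    fi
}

# Example call:
#   dist_dir hadoop-2.2.0-src 2.2.0 || echo "distribution not found"
```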
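IV. Installing Hadoop in single-node pseudo-distributed mode
1: Configure passwordless SSH
ssh-keygen -t dsa (or run ssh-keygen -t rsa -P ""; with -P "" a single Enter completes the command, without it you must press Enter three times. With an RSA key, the public key file is ~/.ssh/id_rsa.pub instead of ~/.ssh/id_dsa.pub)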
cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
If a password is still required, run the following command
chmod 600 ~/.ssh/authorized_keys
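Running the cat command above more than once appends the same key repeatedly. The following sketch does the same job idempotently and sets the permissions in one step. The function name authorize_key and its second parameter are hypothetical; the directory is a parameter only so the function can be tried on a scratch directory first, and 700 on the .ssh directory is an added assumption, not part of the original steps.

```shell
# Hypothetical helper: append a public key to authorized_keys only if
# it is not already present, then tighten permissions.
#   $1 = public key file (e.g. ~/.ssh/id_dsa.pub)
#   $2 = SSH directory   (normally ~/.ssh)
authorize_key() {
    local pub="$1" ssh_dir="$2"
    local auth="$ssh_dir/authorized_keys"
    mkdir -p "$ssh_dir"
    touch "$auth"
    # -F: fixed string, -x: match the whole line, -q: no output
    if ! grep -qxF -e "$(cat "$pub")" "$auth"; then
        cat "$pub" >> "$auth"
    fi
    chmod 700 "$ssh_dir"
    chmod 600 "$auth"
}

# Example call:
#   authorize_key ~/.ssh/id_dsa.pub ~/.ssh
```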
2: Copy the compiled hadoop-2.2.0-src/hadoop-dist/target/hadoop-2.2.0 to the /data/hadoop/ directory
3: Create a symbolic link: ln -s hadoop-2.2.0 hadoop2
4: Add the following variables to the user's .bash_profile
export HADOOP_HOME=/data/hadoop/hadoop2
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
export HADOOP_MAPRED_HOME=${HADOOP_HOME}
export HADOOP_COMMON_HOME=${HADOOP_HOME}
export HADOOP_HDFS_HOME=${HADOOP_HOME}
export YARN_HOME=${HADOOP_HOME}
5: Create the data.dir and namenode.dir directories (under /data/hadoop, matching the paths in hdfs-site.xml in Appendix 2)
mkdir hdfs
mkdir namenode
chmod -R 755 hdfs
6: Edit hadoop-env.sh
export JAVA_HOME=/usr/java/jdk1.6.0_43
export HADOOP_HOME=/data/hadoop/hadoop2
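7: Edit core-site.xml, hdfs-site.xml, mapred-site.xml and yarn-site.xml; see Appendix 2 for their contents
8: Format the HDFS file system
Run the command hadoop namenode -format (Hadoop 2.x reports this form as deprecated; hdfs namenode -format is the equivalent)
9: Start HDFS and YARN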
start-dfs.sh
start-yarn.sh
10: Verify that startup succeeded
jps
If the following 6 processes are listed, startup succeeded
53244 ResourceManager
53083 SecondaryNameNode
52928 DataNode
53640 Jps
52810 NameNode
53348 NodeManager
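Comparing the jps listing by eye is easy to get wrong, since NameNode is also a substring of SecondaryNameNode. The sketch below reads jps output on standard input and prints any of the five Hadoop daemons shown above that is missing. The function name missing_daemons is hypothetical.

```shell
# Hypothetical helper: read `jps` output on stdin and print the name of
# each expected Hadoop daemon that does not appear in it.
# Returns 0 only when all five daemons are present.
missing_daemons() {
    local out daemon status=0
    out=$(cat)
    for daemon in NameNode SecondaryNameNode DataNode ResourceManager NodeManager; do
        # Compare the second column exactly, so that SecondaryNameNode
        # is not mistaken for NameNode.
        if ! printf '%s\n' "$out" |
            awk -v d="$daemon" '$2 == d { found = 1 } END { exit !found }'; then
            echo "$daemon"
            status=1
        fi
    done
    return "$status"
}

# Example call:
#   jps | missing_daemons
```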
V. Running the bundled wordcount example
hadoop fs -mkdir /tmp
hadoop fs -mkdir /tmp/input
hadoop fs -put /usr/hadoop/test.txt /tmp/input
cd /data/hadoop/hadoop2/share/hadoop/mapreduce
hadoop jar hadoop-mapreduce-examples-2.2.0.jar wordcount /tmp/input /tmp/output
If the job runs correctly, Hadoop is installed and configured correctly
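To check the job result and not just its exit status, the counts can be recomputed locally from the same input file and compared with the job output. This is a rough sketch under two assumptions that are not stated in the original: that the example splits words on whitespace only, and that the output is tab-separated "word count" lines sorted by word. The function name local_wordcount is hypothetical.

```shell
# Hypothetical helper: count whitespace-separated words in a local file
# and print "word<TAB>count" lines sorted by word.
local_wordcount() {
    tr -s '[:space:]' '\n' < "$1" |   # one word per line
        awk 'NF' |                    # drop empty lines
        LC_ALL=C sort |               # byte-order sort
        uniq -c |                     # count repeated words
        awk '{ print $2 "\t" $1 }'    # reorder to word, count
}

# Example call:
#   local_wordcount /usr/hadoop/test.txt
```

The result can then be compared with the files the job wrote under /tmp/output in HDFS.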
VI. Appendix 1:
Environment variables (set in /etc/profile; after editing, run source /etc/profile to apply the changes)
#java set
export JAVA_HOME=/usr/java/jdk1.6.0_43
export JRE_HOME=/usr/java/jdk1.6.0_43/jre
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export PATH=$PATH:$JAVA_HOME/bin
#maven set
export M2_HOME=/home/soft/maven
export PATH=$PATH:$M2_HOME/bin
#ant
export ANT_HOME=/home/soft/apache-ant-1.9.3
export PATH=$PATH:$ANT_HOME/bin
#findbugs
export FINDBUGS_HOME=/home/soft/findbugs-2.0.2
export PATH=$PATH:$FINDBUGS_HOME/bin
Appendix 2: Contents of core-site.xml, hdfs-site.xml, mapred-site.xml and yarn-site.xml
core-site.xml:
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://vdata.kt:8020</value>
</property>
</configuration>
hdfs-site.xml:
<configuration>
<property>
<name>dfs.name.dir</name>
<value>/data/hadoop/namenode</value>
</property>
<property>
<name>dfs.data.dir</name>
<value>/data/hadoop/hdfs</value>
</property>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.permissions</name>
<value>false</value>
</property>
</configuration>
mapred-site.xml:
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
yarn-site.xml:
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
</configuration>
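After editing the four files, it can help to read a value back and confirm it is what was intended. The sketch below is a simple line-based lookup, not a real XML parser: it assumes each name and value element sits on a single line, as in the listings above. The function name get_property is hypothetical.

```shell
# Hypothetical helper: print the <value> that follows a given <name>
# in a Hadoop *-site.xml file.
#   $1 = configuration file, $2 = property name
# Assumption: every <name> and <value> element fits on one line.
get_property() {
    awk -v want="$2" '
        /<name>/ {
            name = $0
            sub(/.*<name>/, "", name)
            sub(/<\/name>.*/, "", name)
        }
        /<value>/ && name == want {
            value = $0
            sub(/.*<value>/, "", value)
            sub(/<\/value>.*/, "", value)
            print value
            exit
        }
    ' "$1"
}

# Example call:
#   get_property core-site.xml fs.defaultFS
```

With the core-site.xml shown above, the example call would print hdfs://vdata.kt:8020; a property that is not in the file produces no output.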