使用hadoop平臺進行小型網站日志分析

jopen 9年前發布 | 13K 次閱讀 Hadoop 分布式/云計算/大數據

0.上傳日志文件到linux中,通過flume將文件收集到hdfs中。
執行命令/home/cloud/flume/bin/flume-ng agent -n a4 -c conf -f /home/cloud/flume/conf/a4.conf -Dflume.root.logger=DEBUG,console


1.建立hive表
create external table bbslog (ip string,logtime string,url string) partitioned by (logdate string) row format delimited fields terminated by '\t' location '/cleaned';


2.創建shell腳本
touch daily.sh
添加執行權限
chmod +x daily.sh
daily.sh:
CURRENT=date +%Y%m%d

對數據進行清理,保存到cleaned文件夾,按照當前日期進行保存

/home/cloud/hadoop/bin/hadoop jar /home/cloud/cleaner.jar /flume/$CURRENT /cleaned/$CURRENT

修改hive表,添加當前日期的分區

/home/cloud/hive/bin/hive -e "alter table bbslog add partition (logdate=$CURRENT) location 'cleaned/$CURRENT'"

使用hive進行分析,根據業務需求而定

統計pv并計入每日的pv表

/home/cloud/hive/bin/hive -e "create table pv_$CURRENT row format delimited fields terminated by '\t' as select count(*) from bbslog where logdate=$CURRENT;"

統計點擊次數過20的潛在用戶

/home/cloud/hive/bin/hive -e "create table vip _$CURRENT row format delimited fields terminated by '\t' as select $CURRENT,ip,count(*) as hits from bbslog where logdate=$CURRENT group by ip having hits > 20 order by hits desc"

查詢uv

/home/cloud/hive/bin/hive -e "create table uv_$CURRENT row format delimited fields terminated by '\t' as select count(distinct ip) from bbslog where logdate=$CURRENT"

查詢每天的注冊人數

/home/cloud/hive/bin/hive -e "create table reg_$CURRENT row format delimited fields terminated by '\t' as select count(*) from bbslog where logdate=$CURRENT AND instr(url,'member.php?mod=register')>0"

將hive表中的數據導入mysql

/home/cloud/sqoop/bin/sqoop export --connect jdbc:mysql://cloud3:3306/jchubby --username root --password JChubby123 --export-dir "/user/hive/warehouse/vip_$CURRENT" --table vip --fields-terminated-by '\t' 

 本文由用戶 jopen 自行上傳分享,僅供網友學習交流。所有權歸原作者,若您的權利被侵害,請聯系管理員。
 轉載本站原創文章,請注明出處,并保留原始鏈接、圖片水印。
 本站是一個以用戶分享為主的開源技術平臺,歡迎各類分享!