Happy: A Jython Wrapper for Hadoop
Hadoop + Python = Happy
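Happy makes the Hadoop framework convenient to use from Jython. It wraps Hadoop's complex invocation process so that Map-Reduce development becomes much simpler. In Happy, a Map-Reduce job is defined in a subclass of happy.HappyJob. After creating an instance of that class, the user sets the job's input and output parameters and calls run() to start the map-reduce processing. Happy then serializes the user's job instance and copies the job, together with its dependent libraries, to the Hadoop cluster for execution. At present, Happy has been adopted by the data-integration site freebase.com for data mining and analysis on that site.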
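The job lifecycle described above (define map and reduce on a job object, then call run()) can be illustrated without a cluster. The sketch below is a hypothetical, single-process stand-in: `LocalTask`, `LocalJob`, `LocalWordCount` and `run_locally` are illustrative names, not part of Happy. Only the `map(records, task)`, `reduce(key, values, task)` and `task.collect(key, value)` shapes mirror the example in this article; Hadoop's real serialization, distribution and shuffle are not modeled.

```python
from collections import defaultdict


class LocalTask:
    """Hypothetical stand-in for the task object Happy passes to map/reduce."""

    def __init__(self):
        self.output = []

    def collect(self, key, value):
        # Record one emitted key/value pair.
        self.output.append((key, value))


class LocalJob:
    """Hypothetical base class mirroring the map/reduce method signatures."""

    def map(self, records, task):
        raise NotImplementedError

    def reduce(self, key, values, task):
        raise NotImplementedError


def run_locally(job, records):
    """Run map, group values by key (the 'shuffle'), then reduce, in-process."""
    map_task = LocalTask()
    job.map(iter(records), map_task)

    # Shuffle/sort: gather every mapped value under its key.
    groups = defaultdict(list)
    for key, value in map_task.output:
        groups[key].append(value)

    # Reduce each key once, visiting keys in sorted order as Hadoop does.
    reduce_task = LocalTask()
    for key in sorted(groups):
        job.reduce(key, iter(groups[key]), reduce_task)
    return reduce_task.output


class LocalWordCount(LocalJob):
    def map(self, records, task):
        # Each record is (offset, line); emit one "1" per word.
        for _, line in records:
            for word in line.split():
                task.collect(word, "1")

    def reduce(self, key, values, task):
        # Count how many "1"s were emitted for this word.
        task.collect(key, str(sum(1 for _ in values)))


if __name__ == "__main__":
    lines = [(0, "happy hadoop"), (13, "happy jython")]
    print(run_locally(LocalWordCount(), lines))
    # [('hadoop', '1'), ('happy', '2'), ('jython', '1')]
```

A harness like this can be handy for checking map and reduce logic quickly before submitting a job to a real cluster.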
```python
import sys, happy, happy.log

happy.log.setLevel("debug")
log = happy.log.getLogger("wordcount")

class WordCount(happy.HappyJob):
    def __init__(self, inputpath, outputpath):
        happy.HappyJob.__init__(self)
        # Job configuration: input path(s), output path and input format.
        self.inputpaths = inputpath
        self.outputpath = outputpath
        self.inputformat = "text"

    def map(self, records, task):
        # With text input, each value is one line; emit (word, "1") per word.
        for _, value in records:
            for word in value.split():
                task.collect(word, "1")

    def reduce(self, key, values, task):
        # values holds every "1" emitted for this word; count them.
        count = 0
        for _ in values:
            count += 1
        task.collect(key, str(count))
        log.debug(key + ":" + str(count))
        # Accumulate job-wide statistics that run() hands back to the driver.
        happy.results["words"] = happy.results.setdefault("words", 0) + count
        happy.results["unique"] = happy.results.setdefault("unique", 0) + 1

if __name__ == "__main__":
    if len(sys.argv) < 3:
        print "Usage: <inputpath> <outputpath>"
        sys.exit(-1)
    wc = WordCount(sys.argv[1], sys.argv[2])
    # run() submits the job to the Hadoop cluster and returns the collected results.
    results = wc.run()
    print str(sum(results["words"])) + " total words"
    print str(sum(results["unique"])) + " unique words"
```
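In the driver, `results["words"]` is passed to `sum()`, which suggests that `run()` returns, for each key written into `happy.results`, a collection of per-task values rather than a single number. The article does not say how Happy performs this merge. The sketch below simply assumes one results dict per reduce task, merged into lists, to show why the driver needs to sum; `merge_task_results` is a hypothetical helper, not a Happy function.

```python
def merge_task_results(per_task_results):
    """Hypothetical merge: collect each key's per-task values into a list.

    Assumes every reduce task produced its own dict, as each task's
    happy.results would be in the WordCount example.
    """
    merged = {}
    for task_results in per_task_results:
        for key, value in task_results.items():
            merged.setdefault(key, []).append(value)
    return merged


if __name__ == "__main__":
    # Two hypothetical reduce tasks, each covering its own share of the words.
    task_a = {"words": 5, "unique": 3}
    task_b = {"words": 2, "unique": 2}
    results = merge_task_results([task_a, task_b])
    print(str(sum(results["words"])) + " total words")    # 7 total words
    print(str(sum(results["unique"])) + " unique words")  # 5 unique words
```

Under this assumption, summing the lists recovers the job-wide totals no matter how many reduce tasks Hadoop ran.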