MapReduce word count in Python

Posted by en9, 9 years ago | 2K views | Python

#!/usr/bin/env python
# Generate word.txt: 100000 random lowercase words of 1 to 5 letters, one per line.
import random

# 'abc..z'
alphaStr = "".join(map(chr, range(97, 123)))
maxIter = 100000

with open("word.txt", "w") as fp:
    for i in range(maxIter):
        word = ""
        length = random.randint(1, 5)  # renamed from len, which shadowed the built-in
        for j in range(length):
            word += alphaStr[random.randint(0, 25)]
        fp.write(word + '\n')

Usage: cat word.txt | ./wordcount_mapper.py | ./wordcount_reducer.py
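The pipeline above calls wordcount_mapper.py, which this post does not show. The following is only a minimal sketch of what such a mapper might look like, assuming it emits one tab-separated "word, 1" record per whitespace-separated word, which is the format the reducer parses. The names map_lines and run_mapper are hypothetical and not from the original post.

```python
import sys


def map_lines(lines):
    """Yield one 'word<TAB>1' record for every whitespace-separated word."""
    for line in lines:
        for word in line.split():
            yield "%s\t%d" % (word, 1)


def run_mapper(lines, out=sys.stdout):
    """Write each mapped record on its own line."""
    for record in map_lines(lines):
        out.write(record + "\n")


if __name__ == "__main__":
    # Demo on in-memory lines. In an actual wordcount_mapper.py this call
    # would presumably be run_mapper(sys.stdin) so the script can sit in the pipe.
    run_mapper(["abc de\n", "abc\n"])
```

Keeping the mapping logic in a generator separate from the I/O makes it easy to check without a shell pipeline.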

Word count reducer in Python

filename: wordcount_reducer.py

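#!/usr/bin/env python
from operator import itemgetter
import sys

# Sum the counts for each word read from stdin ("word<TAB>count" per line).
wordcount = {}
for line in sys.stdin:
    try:
        word, count = line.strip().split('\t', 1)
        count = int(count)
        wordcount[word] = wordcount.get(word, 0) + count
    except ValueError:
        # Skip lines without a tab or with a non-integer count.
        pass

# Print the totals sorted by word.
sorted_wordcount = sorted(wordcount.items(), key=itemgetter(0))
for word, count in sorted_wordcount:
    print("%s\t%s" % (word, count))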
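The reducer above holds every distinct word in a dictionary until the input ends. As an alternative, and only under the assumption that the records arrive already sorted by word (for example with a sort step placed between mapper and reducer, which the original pipeline does not include), the counts could be summed one group at a time with itertools.groupby. The function names parse_records and reduce_sorted are hypothetical.

```python
from itertools import groupby


def parse_records(lines):
    """Yield (word, count) pairs, skipping lines that are not 'word<TAB>integer'."""
    for line in lines:
        try:
            word, count = line.rstrip("\n").split("\t", 1)
            yield word, int(count)
        except ValueError:
            continue


def reduce_sorted(lines):
    """Sum counts per word; assumes equal words arrive on adjacent lines."""
    for word, group in groupby(parse_records(lines), key=lambda pair: pair[0]):
        yield word, sum(count for _, count in group)


if __name__ == "__main__":
    # Demo on a small, already sorted sample.
    sample = ["ab\t1\n", "ab\t1\n", "c\t1\n"]
    for word, total in reduce_sorted(sample):
        print("%s\t%d" % (word, total))
```

If the input is not sorted, the same word shows up in several groups and is reported more than once, so the dictionary version remains the safer choice for the unsorted pipeline shown earlier.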
