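simplebayes: an open-source, persistable naive Bayes classification library for Python
simplebayes is an open-source naive Bayes classification library for Python that can persist its training data to disk.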
This work is heavily inspired by the Python "redisbayes" module, found here: [https://github.com/jart/redisbayes] and [https://pypi.python.org/pypi/redisbayes]. I've elected to write this to alleviate the network/time requirements when using the Bayesian classifier to classify large sets of text, or when attempting to train with very large sets of sample data.
Installation
    sudo pip install simplebayes

Basic Usage
    import simplebayes

    bayes = simplebayes.SimpleBayes()

    bayes.train('good', 'sunshine drugs love sex lobster sloth')
    bayes.train('bad', 'fear death horror government zombie')

    assert bayes.classify('sloths are so cute i love them') == 'good'
    assert bayes.classify('i would fear a zombie and love the government') == 'bad'

    # print() with parentheses works on both Python 2 and Python 3
    print(bayes.score('i fear zombies and love the government'))
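
To make the calls above less of a black box, here is a minimal sketch of what a naive Bayes text classifier of this kind might do internally. It is not the simplebayes source: the class name `TinyBayes`, the add-one smoothing, and the log-probability scoring are all assumptions chosen for illustration, so its scores should not be expected to match those returned by `bayes.score()`.

```python
import math
from collections import Counter, defaultdict


class TinyBayes:
    """Hypothetical illustration only; not the simplebayes implementation."""

    def __init__(self, tokenizer=None):
        # Assumed default: lowercase the text and split on whitespace.
        self.tokenizer = tokenizer or (lambda text: text.lower().split())
        self.token_counts = defaultdict(Counter)  # category -> token -> count
        self.doc_counts = Counter()  # category -> number of training samples

    def train(self, category, text):
        self.doc_counts[category] += 1
        self.token_counts[category].update(self.tokenizer(text))

    def score(self, text):
        """Return a log-probability score per category (higher is better)."""
        vocabulary = {t for counts in self.token_counts.values() for t in counts}
        total_docs = sum(self.doc_counts.values())
        scores = {}
        for category, counts in self.token_counts.items():
            total_tokens = sum(counts.values())
            # Log prior: share of training samples that belong to this category.
            log_prob = math.log(self.doc_counts[category] / total_docs)
            for token in self.tokenizer(text):
                # Add-one smoothing so an unseen token does not zero the product.
                log_prob += math.log(
                    (counts[token] + 1) / (total_tokens + len(vocabulary))
                )
            scores[category] = log_prob
        return scores

    def classify(self, text):
        scores = self.score(text)
        return max(scores, key=scores.get) if scores else None


tiny = TinyBayes()
tiny.train('good', 'sunshine drugs love sex lobster sloth')
tiny.train('bad', 'fear death horror government zombie')
print(tiny.classify('i love the sunshine'))
print(tiny.classify('fear the zombie'))
```

Summing logarithms instead of multiplying raw probabilities avoids floating-point underflow on long texts, which is why the sketch scores in log space.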

Cache Usage
    import simplebayes

    bayes = simplebayes.SimpleBayes(cache_path='/my/cache/')
    # Cache file is '/my/cache/_simplebayes.pickle'
    # Default cache_path is '/tmp/'

    if not bayes.cache_train():
        # Unable to load cache data, so we're training it
        bayes.train('good', 'sunshine drugs love sex lobster sloth')
        bayes.train('bad', 'fear death horror government zombie')

        # Saving the cache so next time the training won't be needed
        bayes.persist_cache()

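
The load-or-train pattern above can be sketched without the library, which may help when reasoning about what a pickle-backed cache does. This is an assumption-laden illustration, not how simplebayes stores its data: the helper names `load_or_train` and `train_from_scratch`, the file name `_example.pickle`, and the dictionary layout are all hypothetical.

```python
import os
import pickle
import tempfile


def load_or_train(cache_file, train_fn):
    """Return (data, loaded_from_cache).

    Tries to read pickled training data from cache_file. If the file is
    missing or unreadable, calls train_fn() and writes the result back.
    """
    try:
        with open(cache_file, "rb") as handle:
            return pickle.load(handle), True
    except (OSError, pickle.UnpicklingError, EOFError):
        data = train_fn()
        with open(cache_file, "wb") as handle:
            pickle.dump(data, handle)
        return data, False


def train_from_scratch():
    # Hypothetical stand-in for an expensive training run.
    return {"good": {"love": 1, "sloth": 1}, "bad": {"fear": 1, "zombie": 1}}


cache_dir = tempfile.mkdtemp()
cache_file = os.path.join(cache_dir, "_example.pickle")

# First call finds no cache, trains, and saves; second call loads from disk.
first, first_from_cache = load_or_train(cache_file, train_from_scratch)
second, second_from_cache = load_or_train(cache_file, train_from_scratch)
print(first_from_cache, second_from_cache)
```

Because unpickling can execute arbitrary code, a cache directory like this should only contain files you wrote yourself; a shared location such as `/tmp/` deserves some care on multi-user machines.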
Tokenizer Override
    import simplebayes

    def my_tokenizer(sample):
        return sample.split()

    bayes = simplebayes.SimpleBayes(tokenizer=my_tokenizer)
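
A whitespace-only tokenizer like the one above treats "Zombie!" and "zombie" as different tokens. As a hedged example of a slightly more forgiving override, the sketch below lowercases the input and keeps only runs of letters, digits, and apostrophes. The function name `normalizing_tokenizer` and the regular expression are illustrative choices, not part of simplebayes; the function would be passed via the same `tokenizer=` argument shown above.

```python
import re

# Matches runs of lowercase ASCII letters, digits, and apostrophes.
TOKEN_PATTERN = re.compile(r"[a-z0-9']+")


def normalizing_tokenizer(sample):
    """Lowercase the sample and drop punctuation before splitting into tokens."""
    return TOKEN_PATTERN.findall(sample.lower())


print(normalizing_tokenizer("I fear zombies, and LOVE the government!"))
```

Note that this pattern only handles ASCII text; non-English training data would need a different pattern or a dedicated tokenizer.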