simplebayes: an open-source, persistable naive Bayes classification library for Python

Posted by jopen 9 years ago | 30K views | Tags: algorithms, simplebayes

simplebayes is an open-source naive Bayes classification library for Python that can persist its training data.

This work is heavily inspired by the python "redisbayes" module found here:
[https://github.com/jart/redisbayes] and [https://pypi.python.org/pypi/redisbayes]

I've elected to write this to alleviate the network/time requirements when using the Bayesian classifier to classify large sets of text, or when attempting to train with very large sets of sample data.

Installation

sudo pip install simplebayes

Basic Usage

import simplebayes
bayes = simplebayes.SimpleBayes()

bayes.train('good', 'sunshine drugs love sex lobster sloth')
bayes.train('bad', 'fear death horror government zombie')

assert bayes.classify('sloths are so cute i love them') == 'good'
assert bayes.classify('i would fear a zombie and love the government') == 'bad'

print(bayes.score('i fear zombies and love the government'))
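To give a feel for what train, score and classify do conceptually, here is a minimal, self-contained sketch of a naive Bayes text classifier. It is not the simplebayes implementation: the class name TinyBayes, the whitespace tokenizer, the log-probability scores and the Laplace smoothing are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


class TinyBayes:
    """Hypothetical toy classifier; NOT the simplebayes implementation."""

    def __init__(self, tokenizer=None):
        # Assumed default tokenizer: lowercase, then split on whitespace.
        self.tokenizer = tokenizer or (lambda text: text.lower().split())
        self.token_counts = defaultdict(Counter)  # category -> token -> count
        self.doc_counts = Counter()               # category -> number of samples

    def train(self, category, text):
        self.token_counts[category].update(self.tokenizer(text))
        self.doc_counts[category] += 1

    def score(self, text):
        """Return a dict mapping each category to a log-probability score."""
        tokens = self.tokenizer(text)
        vocab = {t for counts in self.token_counts.values() for t in counts}
        total_docs = sum(self.doc_counts.values())
        scores = {}
        for category, counts in self.token_counts.items():
            total_tokens = sum(counts.values())
            # Log prior of the category ...
            log_prob = math.log(self.doc_counts[category] / total_docs)
            # ... plus the Laplace-smoothed log likelihood of every token.
            for token in tokens:
                log_prob += math.log(
                    (counts[token] + 1) / (total_tokens + len(vocab))
                )
            scores[category] = log_prob
        return scores

    def classify(self, text):
        scores = self.score(text)
        return max(scores, key=scores.get) if scores else None


toy = TinyBayes()
toy.train('good', 'sunshine drugs love sex lobster sloth')
toy.train('bad', 'fear death horror government zombie')
print(toy.classify('i would fear a zombie and love the government'))
```

Working in log space avoids numeric underflow when many small probabilities are multiplied, and the add-one smoothing keeps a single unseen word from zeroing out a whole category.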

Cache Usage

import simplebayes
bayes = simplebayes.SimpleBayes(cache_path='/my/cache/')
# Cache file is '/my/cache/_simplebayes.pickle'
# Default cache_path is '/tmp/'

if not bayes.cache_train():
    # Unable to load cache data, so we're training it
    bayes.train('good', 'sunshine drugs love sex lobster sloth')
    bayes.train('bad', 'fear death horror government zombie')

    # Saving the cache so next time the training won't be needed
    bayes.persist_cache()
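The load-or-train pattern above can be sketched with the standard library alone. This is only an illustration of the general idea behind a pickle-backed cache, not the library's internals: the helper load_or_train, the stand-in train_counts function and the file name _example.pickle are all hypothetical.

```python
import os
import pickle
import tempfile


def load_or_train(cache_file, train_fn):
    """Return (data, loaded_from_cache). Hypothetical helper."""
    if os.path.exists(cache_file):
        # Cache hit: skip training entirely.
        with open(cache_file, 'rb') as handle:
            return pickle.load(handle), True
    # Cache miss: train, then persist the result for the next run.
    data = train_fn()
    with open(cache_file, 'wb') as handle:
        pickle.dump(data, handle)
    return data, False


def train_counts():
    # Stand-in for an expensive training step: token counts per category.
    return {
        'good': {'sunshine': 1, 'love': 1, 'sloth': 1},
        'bad': {'fear': 1, 'zombie': 1, 'government': 1},
    }


cache_dir = tempfile.mkdtemp()
cache_file = os.path.join(cache_dir, '_example.pickle')  # hypothetical name

first, first_from_cache = load_or_train(cache_file, train_counts)
second, second_from_cache = load_or_train(cache_file, train_counts)
print(first_from_cache, second_from_cache)
```

The first call trains and writes the file; the second call finds it and loads the same data back. Because unpickling can execute arbitrary code, a cache file like this should only be loaded from a location you trust.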

Tokenizer Override

import simplebayes

def my_tokenizer(sample):
    return sample.split()

bayes = simplebayes.SimpleBayes(tokenizer=my_tokenizer)
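A plain split() keeps case and punctuation, so 'Zombie!' and 'zombie' would count as different tokens. As a sketch of a slightly stricter alternative, the hypothetical function below lowercases the text and keeps only runs of letters, digits and apostrophes; the name punctuation_aware_tokenizer and the regular expression are assumptions for illustration, not part of simplebayes.

```python
import re


def punctuation_aware_tokenizer(sample):
    """Hypothetical tokenizer: lowercase, drop punctuation, return a list."""
    # Keep runs of ASCII letters, digits and apostrophes; discard the rest.
    return re.findall(r"[a-z0-9']+", sample.lower())


print(punctuation_aware_tokenizer("Sloths are SO cute, I love them!"))
```

Following the override pattern shown above, it would be passed as simplebayes.SimpleBayes(tokenizer=punctuation_aware_tokenizer).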

Project home page: http://www.baiduhome.net/lib/view/home/1427787192959
