TextBlob: A Python Text Processing Toolkit

Posted by jopen 11 years ago | 37K views | TextBlob, Python development

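TextBlob is an interesting Python text processing toolkit. It is essentially a wrapper around two other Python toolkits, NLTK and Pattern (in the project's own words: "TextBlob stands on the giant shoulders of NLTK and pattern, and plays nicely with both"), and it provides interfaces for many text processing tasks, including part-of-speech tagging, noun phrase extraction, sentiment analysis, text classification, and spelling correction. It even offers translation and language detection, although these rely on the Google API, which limits the number of calls. TextBlob is still a relatively young project and worth following if you are interested.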

from textblob import TextBlob

text = '''
The titular threat of The Blob has always struck me as the ultimate movie
monster: an insatiably hungry, amoeba-like mass able to penetrate
virtually any safeguard, capable of--as a doomed doctor chillingly
describes it--"assimilating flesh on contact.
Snide comparisons to gelatin be damned, it's a concept with the most
devastating of potential consequences, not unlike the grey goo scenario
proposed by technological theorists fearful of
artificial intelligence run rampant.
'''

blob = TextBlob(text)

# Part-of-speech tags, as (word, tag) pairs
blob.tags           # [('The', 'DT'), ('titular', 'JJ'),
                    #  ('threat', 'NN'), ('of', 'IN'), ...]

# Noun phrases found in the text
blob.noun_phrases   # WordList(['titular threat', 'blob',
                    #            'ultimate movie monster',
                    #            'amoeba-like mass', ...])

# Sentiment polarity of each sentence, from -1.0 (negative) to 1.0 (positive)
for sentence in blob.sentences:
    print(sentence.sentiment.polarity)
# 0.060
# -0.341

# Translation calls the Google Translate API, so it needs network access and
# is rate-limited; later TextBlob releases deprecate this method.
blob.translate(to="es")  # 'La amenaza titular de The Blob...'

Features:

Noun phrase extraction

Part-of-speech tagging

Sentiment analysis

Classification (Naive Bayes, Decision Tree)

Language translation and detection powered by Google Translate

Tokenization (splitting text into words and sentences)

Word and phrase frequencies

Parsing

n-grams

Word inflection (pluralization and singularization) and lemmatization

Spelling correction

Add new models or languages through extensions

WordNet integration
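
Three of the features listed above (tokenization, word frequencies, and n-grams) are simple enough to illustrate in plain Python. The sketch below is not TextBlob's implementation, and the function names `tokenize`, `word_counts`, and `ngrams` are chosen here purely for illustration. It only shows the idea behind what TextBlob exposes as `blob.words`, `blob.word_counts`, and `blob.ngrams(n=3)`. The regular-expression tokenizer is a deliberately crude assumption; TextBlob's own tokenizer handles punctuation and contractions more carefully.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens.

    A token is a run of ASCII letters, digits, or apostrophes; all other
    characters (spaces, punctuation) act as separators.
    """
    return re.findall(r"[a-z0-9']+", text.lower())


def word_counts(tokens):
    """Map each token to the number of times it occurs."""
    return Counter(tokens)


def ngrams(tokens, n=3):
    """Return every run of n consecutive tokens, in order of appearance.

    If there are fewer than n tokens, the result is an empty list.
    """
    return [tokens[i:i + n] for i in range(len(tokens) - n + 1)]


if __name__ == "__main__":
    tokens = tokenize("Now is better than never. Now is the time.")
    print(tokens)
    print(word_counts(tokens).most_common(2))
    print(ngrams(tokens, n=3))
```

With TextBlob installed, the analogous calls would be `TextBlob(text).words`, `TextBlob(text).word_counts`, and `TextBlob(text).ngrams(n=3)`.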

Official homepage:

http://textblob.readthedocs.org/en/dev/

GitHub repository:

https://github.com/sloria/textblob
