Python文本處理工具包:TextBlob
TextBlob是一個很有意思的Python文本處理工具包,它其實是基于上面兩個Python工具包NLKT和Pattern做了封裝(TextBlob stands on the giant shoulders of NLTK and pattern, and plays nicely with both),同時提供了很多文本處理功能的接口,包括詞性標注,名詞短語提取,情感分析,文本分類,拼寫檢查等,甚至包括翻譯和語言檢測,不過這個是基于Google的API的,有調用次數限制。TextBlob相對比較年輕,有興趣的同學可以關注。
from textblob import TextBlobtext = ''' The titular threat of The Blob has always struck me as the ultimate movie monster: an insatiably hungry, amoeba-like mass able to penetrate virtually any safeguard, capable of--as a doomed doctor chillingly describes it--"assimilating flesh on contact. Snide comparisons to gelatin be damned, it's a concept with the most devastating of potential consequences, not unlike the grey goo scenario proposed by technological theorists fearful of artificial intelligence run rampant. '''
blob = TextBlob(text) blob.tags # [(u'The', u'DT'), (u'titular', u'JJ'),
# (u'threat', u'NN'), (u'of', u'IN'), ...]
blob.noun_phrases # WordList(['titular threat', 'blob',
# 'ultimate movie monster', # 'amoeba-like mass', ...])
for sentence in blob.sentences: print(sentence.sentiment.polarity)
0.060
-0.341
blob.translate(to="es") # 'La amenaza titular de The Blob...'</pre>
特性:
- Noun phrase extraction
- Part-of-speech tagging
- Sentiment analysis
- Classification (Naive Bayes, Decision Tree)
- Language translation and detection powered by Google Translate
- Tokenization (splitting text into words and sentences)
- Word and phrase frequencies
- Parsing
- n-grams
- Word inflection (pluralization and singularization) and lemmatization
- Spelling correction
- Add new models or languages through extensions
- WordNet integration
</ul> </div>
</span>官方主頁:http://textblob.readthedocs.org/en/dev/
Github代碼頁:https://github.com/sloria/textblob本文由用戶 jopen 自行上傳分享,僅供網友學習交流。所有權歸原作者,若您的權利被侵害,請聯系管理員。轉載本站原創文章,請注明出處,并保留原始鏈接、圖片水印。本站是一個以用戶分享為主的開源技術平臺,歡迎各類分享!