TextBlob: A Python Text Processing Toolkit

Posted by jopen 10 years ago | 37K views | Tags: TextBlob, Python development

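TextBlob is an interesting Python text processing toolkit. It is essentially a wrapper around two other Python toolkits, NLTK and Pattern ("TextBlob stands on the giant shoulders of NLTK and pattern, and plays nicely with both"), and it exposes interfaces for many text processing tasks, including part-of-speech tagging, noun phrase extraction, sentiment analysis, text classification, and spelling correction. It even covers translation and language detection, although those rely on Google's API and are subject to call limits. TextBlob is a relatively young project and worth following if you are interested.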

from textblob import TextBlob

text = '''
The titular threat of The Blob has always struck me as the ultimate movie
monster: an insatiably hungry, amoeba-like mass able to penetrate
virtually any safeguard, capable of--as a doomed doctor chillingly
describes it--"assimilating flesh on contact.
Snide comparisons to gelatin be damned, it's a concept with the most
devastating of potential consequences, not unlike the grey goo scenario
proposed by technological theorists fearful of
artificial intelligence run rampant.
'''

blob = TextBlob(text)

# Part-of-speech tags as (word, tag) pairs
blob.tags           # [('The', 'DT'), ('titular', 'JJ'),
                    #  ('threat', 'NN'), ('of', 'IN'), ...]

# Noun phrases found in the text
blob.noun_phrases   # WordList(['titular threat', 'blob',
                    #            'ultimate movie monster',
                    #            'amoeba-like mass', ...])

# Sentiment polarity of each sentence, from -1.0 (negative) to 1.0 (positive)
for sentence in blob.sentences:
    print(sentence.sentiment.polarity)
# 0.060
# -0.341

# Translation calls the Google Translate API (network access required, call
# limits apply). Note: translate() was deprecated and later removed in newer
# TextBlob releases, so this line may fail there.
blob.translate(to="es")  # 'La amenaza titular de The Blob...'

Features:

  • Noun phrase extraction
  • Part-of-speech tagging
  • Sentiment analysis
  • Classification (Naive Bayes, Decision Tree)
  • Language translation and detection powered by Google Translate
  • Tokenization (splitting text into words and sentences)
  • Word and phrase frequencies
  • Parsing
  • n-grams
  • Word inflection (pluralization and singularization) and lemmatization
  • Spelling correction
  • Add new models or languages through extensions
  • WordNet integration
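To make two of the listed features concrete, here is a plain-Python sketch of what "word frequencies" and "n-grams" compute. It does not use TextBlob and is not how TextBlob implements them. The regex tokenizer and the helper names `tokenize` and `ngrams` are assumptions chosen for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (hypothetical helper)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def ngrams(tokens, n=3):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


tokens = tokenize("Now is better than never. Now is the time.")
word_counts = Counter(tokens)    # token -> number of occurrences
trigrams = ngrams(tokens, n=3)   # sliding window of three tokens

print(word_counts["now"])   # 2
print(trigrams[0])          # ('now', 'is', 'better')
```

In TextBlob itself, the corresponding entry points are `blob.word_counts` and `blob.ngrams(n=3)`, which rely on the library's own tokenizer rather than the simple regex used here.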
Official site: http://textblob.readthedocs.org/en/dev/
GitHub repository: https://github.com/sloria/textblob
