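Sentiment Analysis on Text: TextBlob
TextBlob is an open-source text-processing library written in Python (2 and 3). It can perform many natural language processing tasks, such as part-of-speech tagging, noun phrase extraction, sentiment analysis, and text translation. You can read about all of TextBlob's features in the official documentation.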
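Why should I care about TextBlob?
I am learning TextBlob for the following reasons:
- I want to build applications that need text processing. Adding text processing to an application lets it understand people's behavior better, which makes it feel more human. Text processing is hard to get right. TextBlob stands on the shoulders of a giant, NLTK, a leading platform for building Python programs that work with natural language.
- I want to learn how to do text processing in Python.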
    from textblob import TextBlob

    text = '''
    The titular threat of The Blob has always struck me as the ultimate movie
    monster: an insatiably hungry, amoeba-like mass able to penetrate
    virtually any safeguard, capable of--as a doomed doctor chillingly
    describes it--"assimilating flesh on contact.
    Snide comparisons to gelatin be damned, it's a concept with the most
    devastating of potential consequences, not unlike the grey goo scenario
    proposed by technological theorists fearful of
    artificial intelligence run rampant.
    '''

    blob = TextBlob(text)

    # Part-of-speech tags as (word, tag) pairs
    blob.tags           # [(u'The', u'DT'), (u'titular', u'JJ'),
                        #  (u'threat', u'NN'), (u'of', u'IN'), ...]

    # Noun phrases found in the text
    blob.noun_phrases   # WordList(['titular threat', 'blob',
                        #           'ultimate movie monster',
                        #           'amoeba-like mass', ...])

    # Sentiment polarity of each sentence
    for sentence in blob.sentences:
        print(sentence.sentiment.polarity)
    # 0.060
    # -0.341

    # Translation goes through Google Translate and needs network access;
    # translate() was deprecated and later removed in newer TextBlob releases.
    blob.translate(to="es")  # 'La amenaza titular de The Blob...'
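The polarity scores printed above are floats that, according to the TextBlob documentation, fall in the range [-1.0, 1.0]. Raw scores are often easier to use once they are turned into coarse labels. The following is a minimal sketch of that step in plain Python. The function name `label_polarity` and the `threshold` value of 0.1 are assumptions made for illustration; they are not part of TextBlob.

```python
def label_polarity(score, threshold=0.1):
    """Map a polarity score in [-1.0, 1.0] to a coarse sentiment label.

    `threshold` is an arbitrary cutoff chosen for this sketch: scores
    whose absolute value does not exceed it are treated as neutral.
    """
    if not -1.0 <= score <= 1.0:
        raise ValueError("polarity is expected to lie in [-1.0, 1.0]")
    if score > threshold:
        return "positive"
    if score < -threshold:
        return "negative"
    return "neutral"


# The two sentence polarities reported in the example above
for score in (0.060, -0.341):
    print(score, label_polarity(score))
```

With this cutoff, the first sentence is labeled neutral and the second negative. In a real application the score would come from `sentence.sentiment.polarity`, and the threshold would need tuning against your own data.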
Features
- Noun phrase extraction
- Part-of-speech tagging
- Sentiment analysis
- Classification (Naive Bayes, Decision Tree)
- Language translation and detection powered by Google Translate
- Tokenization (splitting text into words and sentences)
- Word and phrase frequencies
- Parsing
- n-grams
- Word inflection (pluralization and singularization) and lemmatization
- Spelling correction
- Add new models or languages through extensions
- WordNet integration
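To make two of the listed features more concrete, here is a plain-Python sketch of what n-grams and word frequencies compute. This is not TextBlob's implementation: the tokenizer is a naive regular expression, and the helper names `tokenize`, `ngrams`, and `word_frequencies` are hypothetical. In TextBlob itself the corresponding entry points are `blob.ngrams(n=3)` and `blob.word_counts`.

```python
import re
from collections import Counter


def tokenize(text):
    """Naive tokenizer: lowercase the text and keep runs of letters/apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def ngrams(tokens, n=3):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def word_frequencies(tokens):
    """Count how many times each token occurs."""
    return Counter(tokens)


tokens = tokenize("Now is better than never. Now is good.")
print(ngrams(tokens, n=3))
print(word_frequencies(tokens).most_common(2))
```

A real tokenizer handles punctuation, contractions, and sentence boundaries far more carefully than this regular expression, which is one reason to rely on a library for anything beyond a quick experiment.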