How Stanford University Teaches “Sentiment Analysis”

Posted by jopen 11 years ago | 255K reads | Tags: sentiment analysis, machine learning

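I. Overview of Natural Language Processing: What Is Natural Language Processing (NLP)?

1）Related Technologies and Applications

Question Answering (QA): a computing system that can understand complex questions and return answers with sufficient accuracy, confidence, and speed; IBM's Watson is a representative example.

Information Extraction (IE): converts unstructured or semi-structured natural-language text into structured data, for example automatically creating calendar entries from the content of an email.

Sentiment Analysis (SA): also called orientation analysis or opinion mining, it is the process of analyzing, processing, summarizing, and reasoning over subjective text that carries sentiment, for example analyzing a large body of web text to determine how users feel about attributes of a “digital camera” such as zoom, price, size, weight, flash, and ease of use.

Machine Translation (MT): converts text from one language into another, for example Chinese-English machine translation.

… …

2）Current State of the Art

Mostly solved: part-of-speech tagging, named entity recognition, spam detection

Making good progress: sentiment analysis, coreference resolution, word sense disambiguation, parsing, machine translation, information extraction

Still challenging: question answering, paraphrase, summarization, conversational agents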

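3）The Main Difficulty in NLP: Ambiguity

Lexical ambiguity

Word segmentation: “嚴守一把手機關了” can be segmented as “嚴守一/ 把/ 手機/ 關/ 了” (“Yan Shouyi turned off the phone”) or as “嚴守/ 一把手/ 機關/ 了” (a nonsensical reading built from “strictly guard”, “top leader”, and “government office”).

Part-of-speech tagging: “計劃” (“plan”) takes different parts of speech in different contexts. It is a verb in “我/ 計劃/v 考/ 研/” (“I plan to take the graduate entrance exam”) and a noun in “我/ 完成/ 了/ 計劃/n” (“I completed the plan”).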
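The segmentation ambiguity above can be reproduced with a few lines of code. The following is a minimal sketch, not a real segmenter: it enumerates every way of splitting a string into words from a vocabulary, and both the function name `segmentations` and the toy vocabulary `VOCAB` are assumptions made up for this illustration.

```python
def segmentations(text, vocab):
    """Return every way to split `text` into words that all appear in `vocab`."""
    if not text:
        return [[]]  # one way to segment the empty string: no words
    results = []
    for end in range(1, len(text) + 1):
        word = text[:end]
        if word in vocab:
            # Keep this word and segment the remainder recursively.
            for rest in segmentations(text[end:], vocab):
                results.append([word] + rest)
    return results


# Toy vocabulary (an assumption for illustration, not a real lexicon).
VOCAB = {"嚴守一", "嚴守", "一把手", "把", "手機", "機關", "關", "了"}

for seg in segmentations("嚴守一把手機關了", VOCAB):
    print("/ ".join(seg))
```

With this vocabulary the sentence has exactly two valid segmentations, so a dictionary alone cannot decide between them; a statistical model or context is needed to choose.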

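Syntactic ambiguity

The string “咬死了獵人的狗” is parsed differently in the following two sentences:

“那只狼咬死了獵人的狗” (“That wolf bit the hunter's dog to death”)

“咬死了獵人的狗失蹤了” (“The dog that bit the hunter to death has gone missing”)

Semantic ambiguity

Machine translation: the sentence “At last, a computer that understands you like your mother” can be read in several ways:

The computer understands you as well as your mother does.

The computer understands that you like your mother.

The computer understands you as well as it understands your mother.

Ambiguity in NLP applications

Pinyin-to-character conversion: in the pinyin string “ji qi fan yi ji qi ying yong ji qi le ren men ji qi nong hou de xing qu”, each occurrence of “ji qi” has to be converted to the correct word (here 機器, 及其, 激起, and 極其 in turn).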

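4）Why Is Natural Language Understanding So Hard?

User-generated content is full of non-standard language: colloquialisms, idioms, dialect, and so on

Word segmentation problems

New words appear constantly

Common sense and contextual knowledge

All kinds of entity names

… …

To address these difficulties we need substantial linguistic knowledge, knowledge-base resources, and a way of combining them. The most widely used approach today is the probabilistic model, also called the statistical model or the “empiricist” model. It is built from large corpora of real text: statistics are collected for language units at every level, and statistics for higher-level units are then computed from those of lower-level units using statistical and inference techniques. The opposing “rationalist” model is the deterministic language model based on Chomsky's formal languages. It rests on the assumption that grammar rules are innate in the human brain and that language is derived from this innate faculty, so building a language model means hand-writing a set of rules that simulates the faculty.

This course focuses on statistical NLP techniques such as Viterbi, Bayes and maximum entropy classifiers, and N-gram language models.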

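II. Sentiment Analysis

1）What is Sentiment Analysis?

Sentiment analysis, also known as orientation analysis, opinion extraction, opinion mining, sentiment mining, or subjectivity analysis, is the process of analyzing, processing, summarizing, and reasoning over subjective text that carries sentiment, for example analyzing review text to determine how users feel about attributes of a “digital camera” such as zoom, price, size, weight, flash, and ease of use.

More examples:

Identifying positive and negative opinions of a film from movie reviews.

Google Product Search identifies users' opinions on individual product attributes and selects representative reviews to show to users.

Bing Shopping identifies users' opinions on individual product attributes.

Twitter sentiment versus Gallup Poll of Consumer Confidence: sentiment mined from Twitter (a microblog) agrees closely with the results of traditional surveys and polls (for consumer confidence and political elections, the correlation reaches 80%). See Brendan O’Connor, Ramnath Balasubramanyan, Bryan R. Routledge, and Noah A. Smith. 2010. From Tweets to Polls: Linking Text Sentiment to Public Opinion Time Series. In ICWSM-2010. (Note: in the time-series plot, the dip in public mood from 2008 to early 2009 was caused by the financial crisis, with a gradual recovery starting in May 2009.)

Twitter sentiment: predicting stock movements from the sentiment of Twitter users. In May 2012, Derwent Capital Markets, the world's first hedge fund based on social media, finally launched after repeated delays. It monitors public mood on Twitter in real time to guide investment. As the fund's founder Paul Hawtin put it: “Investors have long accepted that financial markets are driven by fear and greed, but we have never had the technology or data to quantify human emotion.” Investors long puzzled by the irrational behavior of financial markets finally have a window into the collective psyche: the vast stream of tweets posted on Twitter every day. According to a report published in August, the fund was already profitable in its first month of trading, with a return of 1.85% against an average of only 0.76% for other hedge funds. Similar work predicts box-office takings, election results, and so on. In each case public mood is compared with real-world events, a correspondence is found, and it is then used for prediction. For example, the “Calm” mood index shifted by 3 days tracks the Dow Jones Industrial Average (DJIA) remarkably closely. See Johan Bollen, Huina Mao, Xiaojun Zeng. 2011. Twitter mood predicts the stock market. Journal of Computational Science 2:1, 1-8.

Target Sentiment on Twitter (Twitter Sentiment App): classifies the sentiment of tweets that contain a given query. Companies can use it to learn what users think of the company and its products, to guide improvements to products and services, and to identify competitors' strengths and weaknesses. Users can decide whether to buy a particular product based on what other users, or even friends and family, say about it. See Alec Go, Richa Bhayani, Lei Huang. 2009. Twitter Sentiment Classification using Distant Supervision.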

Why does sentiment analysis matter? The following applications make it concrete:

Movie: is this review positive or negative?

Products: what do people think about the new iPhone?

Public sentiment: how is consumer confidence? Is despair increasing?

Politics: what do people think about this candidate or issue?

Prediction: predict election outcomes or market trends from sentiment

The main goal of sentiment analysis is to identify a user's opinion of, or attitude toward, a thing or a person (attitudes: enduring, affectively colored beliefs, dispositions towards objects or persons). The components involved are:

Holder (source) of attitude: the opinion holder

Target (aspect) of attitude: the object being evaluated

Type of attitude: the kind of opinion expressed

From a set of types: Like, love, hate, value, desire, etc.

Or (more commonly) simple weighted polarity: positive, negative, neutral, together with strength

Text containing the attitude: the text expressing the opinion, usually a sentence or an entire document

Finer-grained analysis also covers evaluated attributes, sentiment (polarity) words, opinion collocations, and so on.

The sentiment analysis tasks we typically face fall into the following categories:

Simplest task: Is the attitude of this text positive or negative?

More complex: Rank the attitude of this text from 1 to 5

Advanced: Detect the target, source, or complex attitude types

The following sections use the simplest task as the running example.

2）A Baseline Algorithm

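This section uses sentiment analysis of movie reviews to present a simple, practical sentiment analysis system. The task is “Polarity detection: Is an IMDB movie review positive or negative?”, and the dataset is “Polarity Data 2.0: http://www.cs.cornell.edu/people/pabo/movie-review-data”. The authors treat sentiment analysis as a classification task and break it into the following subtasks:

Tokenization: extract the body text, filter out times, phone numbers, and the like, preserve strings that begin with a capital letter, preserve emoticons, and split the text into tokens.

Feature Extraction: intuitively one might expect adjectives alone to determine the sentiment of a text, but the experiments of Pang and Lee show that using all words (unigrams) as features gives better sentiment classification.

Negation needs special handling. The sentences “I didn’t like this movie” and “I really like this movie” differ by a single unigram yet have opposite meanings. To handle this, Das and Chen (2001) proposed the rule “Add NOT_ to every word between negation and following punctuation”, which turns “didn’t like this movie , but I” into “didn’t NOT_like NOT_this NOT_movie , but I”.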
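The NOT_ rule quoted above is easy to sketch in code. The version below is only an illustration under stated assumptions: the input is already tokenized on whitespace, and the negation and punctuation patterns (`NEGATION`, `PUNCT`) and the function name `mark_negation` are choices made for this example rather than part of the cited paper.

```python
import re

# Assumed negation cues: "not", "no", "never", and any token ending in n't
# (straight or curly apostrophe), e.g. "didn't".
NEGATION = re.compile(r"^(?:not|no|never|\w*n['’]t)$", re.IGNORECASE)
# Assumed clause-ending punctuation that closes the negation scope.
PUNCT = re.compile(r"^[.,:;!?]$")


def mark_negation(tokens):
    """Prefix NOT_ to every token between a negation cue and the next punctuation."""
    out, negated = [], False
    for tok in tokens:
        if PUNCT.match(tok):
            negated = False          # punctuation ends the negated span
            out.append(tok)
        elif negated:
            out.append("NOT_" + tok)
        else:
            out.append(tok)
            if NEGATION.match(tok):
                negated = True       # start marking from the next token
    return out


print(mark_negation("didn't like this movie , but I".split()))
# ["didn't", 'NOT_like', 'NOT_this', 'NOT_movie', ',', 'but', 'I']
```

After this step “like” and “NOT_like” are distinct unigram features, so a bag-of-words classifier can learn opposite weights for them.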

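In addition, when extracting features, the intuition is that “Word occurrence may matter more than word frequency”: the most relevant sentiment words often appear only once in a passage, so term frequency helps little and can even hurt. A multivariate Bernoulli event space is therefore used in place of the multinomial event space, and experiments confirm the intuition. The paper accordingly uses binary features (whether a word occurs) in place of the usual frequency features. log(freq(w)) is another option worth trying for damping the effect of frequency.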
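The difference between frequency features and binary (occurrence) features can be shown on a single toy review. This is a minimal sketch; the function names and the example sentence are made up for illustration.

```python
from collections import Counter


def count_features(tokens):
    """Frequency features: how many times each word occurs."""
    return dict(Counter(tokens))


def binary_features(tokens):
    """Occurrence features: 1 for every word that appears at least once."""
    return {tok: 1 for tok in set(tokens)}


tokens = "great great great plot but bad acting".split()
print(count_features(tokens))   # "great" is counted three times
print(binary_features(tokens))  # every word, including "great", has value 1
```

With binary features the repeated “great” no longer outweighs the single “bad”, which is the effect the occurrence-based representation is meant to achieve.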

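Classification using different classifiers: for example Naïve Bayes, MaxEnt, or SVM. Taking the naïve Bayes classifier as an example, training estimates a prior probability for each class and a likelihood for each word given the class from the labeled reviews.

Prediction assigns a review to the class that maximizes the class prior multiplied by the likelihoods of the words in the review.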
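Naïve Bayes training and prediction can be sketched as follows. This is an illustrative version, not necessarily the exact formulation used in the course: it assumes add-one (Laplace) smoothing, counts each word at most once per review (binary features), and works in log space to avoid underflow. The four training reviews are made up.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs):
    """docs: list of (tokens, label). Returns (log_prior, log_likelihood, vocab)."""
    vocab = {tok for tokens, _ in docs for tok in tokens}
    doc_counts = Counter(label for _, label in docs)
    word_counts = defaultdict(Counter)
    for tokens, label in docs:
        word_counts[label].update(set(tokens))  # each word counts once per review
    log_prior = {c: math.log(n / len(docs)) for c, n in doc_counts.items()}
    log_likelihood = {}
    for c in doc_counts:
        total = sum(word_counts[c].values())
        # Add-one smoothing so unseen words do not get zero probability.
        log_likelihood[c] = {
            w: math.log((word_counts[c][w] + 1) / (total + len(vocab)))
            for w in vocab
        }
    return log_prior, log_likelihood, vocab


def predict_nb(tokens, log_prior, log_likelihood, vocab):
    """Return the class with the highest log posterior score."""
    scores = {}
    for c in log_prior:
        scores[c] = log_prior[c] + sum(
            log_likelihood[c][w] for w in set(tokens) if w in vocab
        )
    return max(scores, key=scores.get)


# Toy training data (made up for illustration).
train = [
    ("a great and moving film".split(), "pos"),
    ("great acting , great plot".split(), "pos"),
    ("a boring and predictable film".split(), "neg"),
    ("boring plot , terrible acting".split(), "neg"),
]
model = train_nb(train)
print(predict_nb("great film".split(), *model))
print(predict_nb("boring terrible".split(), *model))
```

Words that never appeared in training are simply skipped at prediction time, which is a common simplification.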

Experiments show that MaxEnt and SVM outperform Naïve Bayes.

Finally, a review of individual cases shows what makes sentiment classification of movie reviews hard:

Subtlety of expression: “If you are reading this because it is your darling fragrance, please wear it at home exclusively, and tape the windows shut.” and “She runs the gamut of emotions from A to B”.

Thwarted expectations: the reviewer first describes high expectations (often with lavish praise) and then expresses disappointment, as in “This film should be brilliant. It sounds like a great plot, the actors are first grade, and the supporting cast is good as well, and Stallone is attempting to deliver a good performance. However, it can’t hold up.” and “Well as usual Keanu Reeves is nothing special, but surprisingly, the very talented Laurence Fishbourne is not so good either, I was surprised.”

3）Sentiment Lexicons

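Sentiment analysis models rely heavily on sentiment lexicons for extracting features or rules. The following are popular, mature, openly available sentiment lexicons:

GI (The General Inquirer): provides comprehensive information for each entry, such as part of speech, antonyms, and positive or negative polarity.

See Philip J. Stone, Dexter C Dunphy, Marshall S. Smith, Daniel M. Ogilvie. 1966. The General Inquirer: A Computer Approach to Content Analysis. MIT Press

LIWC (Linguistic Inquiry and Word Count): describes the patterns of different categories of sentiment words with a large number of regular expressions; its category scheme is broadly consistent with that of GI (The General Inquirer).

See Pennebaker, J.W., Booth, R.J., & Francis, M.E. (2007). Linguistic Inquiry and Word Count: LIWC 2007. Austin, TX

MPQA Subjectivity Cues Lexicon: contains 2718 positive words and 4912 negative words.

See Theresa Wilson, Janyce Wiebe, and Paul Hoffmann (2005). Recognizing Contextual Polarity in Phrase-Level Sentiment Analysis. Proc. of HLT-EMNLP-2005.

Riloff and Wiebe (2003). Learning extraction patterns for subjective expressions. EMNLP-2003.

Bing Liu Opinion Lexicon: contains 2006 positive words and 4783 negative words. Notably, besides standard vocabulary it also includes misspellings, morphological variants, slang, and social-media markup. See Minqing Hu and Bing Liu. Mining and Summarizing Customer Reviews. ACM SIGKDD-2004.

SentiWordNet: classifies the entries in WordNet by sentiment and annotates each entry with weights for the positive and negative classes.

See Stefano Baccianella, Andrea Esuli, and Fabrizio Sebastiani. 2010. SENTIWORDNET 3.0: An Enhanced Lexical Resource for Sentiment Analysis and Opinion Mining. LREC-2010

These are all usable lexicon resources, but how do we choose a suitable one? One approach is to compare how the same entry is classified across lexicons and so measure how much the lexicons disagree.

For entries that are classified inconsistently across lexicons, we can do at least two things. First, review these entries and correct them with a small amount of manual effort. Second, collect them as entries whose polarity is genuinely ambiguous.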
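Measuring disagreement between two lexicons comes down to comparing the labels of the words they share. The sketch below assumes each lexicon has been loaded into a plain dictionary mapping a word to "pos" or "neg"; the two tiny lexicons and the function name `disagreement` are hypothetical and do not reflect the contents of any real resource.

```python
def disagreement(lex_a, lex_b):
    """Return (conflicting words, disagreement rate) over the shared vocabulary."""
    shared = lex_a.keys() & lex_b.keys()
    conflicts = sorted(w for w in shared if lex_a[w] != lex_b[w])
    rate = len(conflicts) / len(shared) if shared else 0.0
    return conflicts, rate


# Hypothetical toy lexicons (not taken from any real resource).
lexicon_a = {"good": "pos", "bad": "neg", "cheap": "neg", "proud": "pos"}
lexicon_b = {"good": "pos", "bad": "neg", "cheap": "pos", "awful": "neg"}

conflicts, rate = disagreement(lexicon_a, lexicon_b)
print(conflicts, rate)
```

The conflicting words are exactly the candidates for manual review or for a list of polarity-ambiguous entries.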

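Given a word, how do we estimate how likely it is to appear in text of a particular sentiment class? Taking IMDB reviews grouped by rating as an example, the simplest approach is to count how often the word occurs in the text at each rating (number of stars), for example the distribution of Count(“bad”) across ratings.

The scaled likelihood of a word under each rating class makes words comparable with one another, and the orientation of each word can be judged from it.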
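One common way to define the scaled likelihood is P(w|c) / P(w), where P(w|c) is the count of word w in class c divided by the total number of tokens in class c, and P(w) is the overall relative frequency of w. The sketch below follows that definition; the counts are invented for illustration and are not IMDB statistics, and estimating P(w) from the pooled counts is an assumption of this example.

```python
def scaled_likelihood(word, counts, totals):
    """Return {class: P(word | class) / P(word)} from raw counts.

    counts[c][w] is the count of w in class c; totals[c] is the token total of c.
    """
    overall = sum(counts[c].get(word, 0) for c in totals) / sum(totals.values())
    if overall == 0:
        return {c: 0.0 for c in totals}  # word never seen in any class
    return {c: (counts[c].get(word, 0) / totals[c]) / overall for c in totals}


# Invented counts for three star ratings (illustration only).
counts = {1: {"bad": 30}, 5: {"bad": 10}, 10: {"bad": 5}}
totals = {1: 1000, 5: 1000, 10: 2500}

for rating, value in scaled_likelihood("bad", counts, totals).items():
    print(rating, round(value, 2))
```

A value above 1 means the word is over-represented at that rating and a value below 1 means it is under-represented, so a downward trend across ratings suggests a negative word.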

A natural further question is whether negation words (such as not, n’t, no, never) are more likely to appear in text with negative sentiment. Christopher Potts (2011) answered this experimentally: “More negation in negative sentiment”.

4）Learning Sentiment Lexicons

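Grateful as we are for the many public sentiment lexicons, it is still worth understanding how they are built. First, a new sentiment analysis problem or task will often require building or extending a lexicon to fit the actual requirements. Second, mature lexicon-construction methods can be applied in other domains, since many of these methods transfer.

A common way to build a sentiment lexicon is semi-supervised bootstrapping, which has two main steps: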

Use a small amount of information（Seed）

A few labeled examples

A few hand-built patterns

To bootstrap a lexicon

The following papers illustrate in detail how sentiment lexicons are built:

1. Hatzivassiloglou & McKeown: see Vasileios Hatzivassiloglou and Kathleen R. McKeown. 1997. Predicting the Semantic Orientation of Adjectives. ACL, 174–181. The method rests on a linguistic observation: “Adjectives conjoined by ‘and’ have same polarity; Adjectives conjoined by ‘but’ do not”, as in these examples:

Fair and legitimate, corrupt and brutal

*fair and brutal, *corrupt and legitimate

fair but brutal

Hatzivassiloglou & McKeown (1997) proposed a bootstrapping-based learning method with four main steps:

Step 1: Label seed set of 1336 adjectives (all >20 in 21 million word WSJ corpus)

The initial seed set contains 657 positive words (such as adequate central clever famous intelligent remarkable reputed sensitive slender thriving…) and 679 negative words (such as contagious drunken ignorant lanky listless primitive strident troublesome unresolved unsuspecting…)

Step 2: Expand seed set to conjoined adjectives

Step 3: Supervised classifier assigns “polarity similarity” to each word pair, resulting in graph

Step 4: Clustering for partitioning the graph into two

The output is a new sentiment lexicon, as follows (entries shown in bold in the original are those mined automatically):

Positive: bold decisive disturbing generous good honest important large mature patient peaceful positive proud sound stimulating straightforward strange talented vigorous witty…

Negative: ambiguous cautious cynical evasive harmful hypocritical inefficient insecure irrational irresponsible minor outspoken pleasant reckless risky selfish tedious unsupported vulnerable wasteful…

2. Turney Algorithm: see Turney (2002): Thumbs Up or Thumbs Down? Semantic Orientation Applied to Unsupervised Classification of Reviews. The steps are as follows:

Step 1: Extract a phrasal lexicon from reviews, using rules to select the phrases.

Step 2: Learn polarity of each phrase. How should the polarity of a phrase be assessed? The intuition is: “Positive phrases co-occur more with ‘excellent’, Negative phrases co-occur more with ‘poor’”. The problem then becomes how to measure co-occurrence between terms, for which pointwise mutual information (PMI) is used; it is a common measure of how strongly two specific events are associated.
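For reference, PMI and the phrase polarity built on it can be written as below. The notation is a sketch: hits(·) stands for a co-occurrence or search hit count, and NEAR for the proximity operator used when counting co-occurrences.

```latex
\mathrm{PMI}(x, y) = \log_2 \frac{P(x, y)}{P(x)\,P(y)}

\mathrm{Polarity}(\mathit{phrase})
  = \mathrm{PMI}(\mathit{phrase}, \text{``excellent''})
  - \mathrm{PMI}(\mathit{phrase}, \text{``poor''})
  = \log_2 \frac{\mathrm{hits}(\mathit{phrase}\ \mathrm{NEAR}\ \text{``excellent''})\,\mathrm{hits}(\text{``poor''})}
                {\mathrm{hits}(\mathit{phrase}\ \mathrm{NEAR}\ \text{``poor''})\,\mathrm{hits}(\text{``excellent''})}
```

The second equality holds because the probability of the phrase itself appears in both PMI terms and cancels in the difference.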

Step 3: Rate a review by the average polarity of its phrases

On a dataset of 410 reviews from Epinions, of which 170 (41%) are negative and 240 (59%) are positive, the Turney algorithm achieves 74% accuracy (the baseline of labeling every review positive gives 59%).
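Steps 2 and 3 can be sketched in code once co-occurrence counts are available. In the sketch below the hit counts are invented, the function names are made up, and the small constant `eps` is a smoothing assumption added to avoid division by zero.

```python
import math


def phrase_polarity(near_excellent, near_poor, hits_excellent, hits_poor, eps=0.01):
    """PMI(phrase, 'excellent') - PMI(phrase, 'poor'), computed from hit counts."""
    return math.log2(
        ((near_excellent + eps) * hits_poor)
        / ((near_poor + eps) * hits_excellent)
    )


def rate_review(polarities):
    """Label a review by the average polarity of its phrases (list must be non-empty)."""
    average = sum(polarities) / len(polarities)
    return ("positive" if average > 0 else "negative"), average


# Invented hit counts for two phrases from one review (illustration only).
HITS_EXCELLENT, HITS_POOR = 1000, 1000
polarities = [
    phrase_polarity(80, 10, HITS_EXCELLENT, HITS_POOR),  # co-occurs mostly with "excellent"
    phrase_polarity(5, 20, HITS_EXCELLENT, HITS_POOR),   # co-occurs mostly with "poor"
]
print(rate_review(polarities))
```

Averaging means that a few strongly polar phrases can outweigh several weakly polar ones, which is why the review label depends on the magnitudes and not just the signs.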

3. Using WordNet to learn polarity: see S.M. Kim and E. Hovy. 2004. Determining the sentiment of opinions. COLING 2004, and M. Hu and B. Liu. Mining and summarizing customer reviews. In Proceedings of KDD, 2004. The method works as follows:

Create positive (“good”) and negative seed-words (“terrible”)

Find Synonyms and Antonyms

Positive Set: Add synonyms of positive words (“well”) and antonyms of negative words

Negative Set: Add synonyms of negative words (“awful”) and antonyms of positive words (“evil”)

Repeat, following chains of synonyms

Filter
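The expansion loop above can be sketched with a toy thesaurus standing in for WordNet. Everything in the example is an assumption made for illustration: the `synonyms` and `antonyms` dictionaries are hand-written, the number of rounds is arbitrary, and the final filter (dropping words that land in both sets) is just one possible choice, since the steps above do not say how filtering is done.

```python
def expand_lexicon(pos_seeds, neg_seeds, synonyms, antonyms, rounds=2):
    """Grow positive and negative word sets by following synonym and antonym links."""
    pos, neg = set(pos_seeds), set(neg_seeds)
    for _ in range(rounds):
        new_pos = {s for w in pos for s in synonyms.get(w, ())}
        new_pos |= {a for w in neg for a in antonyms.get(w, ())}
        new_neg = {s for w in neg for s in synonyms.get(w, ())}
        new_neg |= {a for w in pos for a in antonyms.get(w, ())}
        pos |= new_pos
        neg |= new_neg
    # Assumed filter: discard words that ended up in both sets.
    conflict = pos & neg
    return pos - conflict, neg - conflict


# Hand-written toy thesaurus (a stand-in for WordNet, illustration only).
synonyms = {"good": ["well", "fine"], "fine": ["nice"], "terrible": ["awful", "dreadful"]}
antonyms = {"good": ["evil", "bad"], "terrible": ["wonderful"]}

positive, negative = expand_lexicon({"good"}, {"terrible"}, synonyms, antonyms)
print(sorted(positive))
print(sorted(negative))
```

The second round picks up “nice” through the chain good, fine, nice, which illustrates the “Repeat, following chains of synonyms” step.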

The methods above all adapt well to new domains and are robust. The basic idea can be summarized as “Use seeds and semi-supervised learning to induce lexicons”, that is:

Start with a seed set of words (‘good’, ‘poor’)

Find other words that have similar polarity:

Using “and” and “but”

Using words that occur nearby in the same document

Using WordNet synonyms and antonyms

Use seeds and semi-supervised learning to induce lexicons

5）Other Sentiment Tasks

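The sections above covered document-level and sentence-level sentiment analysis. In practice, however, a single document (review) often mentions several aspects, attributes, or targets (referred to collectively as attributes below) and may express a different orientation toward each, as in “The food was great but the service was awful”. Evaluated attributes are usually extracted with a frequent phrases + rules approach: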

Find all highly frequent phrases across reviews (“fish tacos”)

Filter by rules like “occurs right after sentiment word”：“…great fish tacos” means fish tacos a likely aspect
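The two steps above can be sketched as a frequency count followed by a rule-based filter. This is only an illustration under simplifying assumptions: phrases are restricted to bigrams, tokenization is a lowercase whitespace split, and the sentiment word list, the threshold `min_count`, and the reviews are all made up.

```python
from collections import Counter

SENTIMENT_WORDS = {"great", "good", "awful", "terrible"}  # toy lexicon (assumption)


def candidate_aspects(reviews, min_count=2):
    """Return frequent bigrams that occur right after a sentiment word at least once."""
    counts = Counter()
    after_sentiment = Counter()
    for review in reviews:
        toks = review.lower().split()
        for i in range(len(toks) - 1):
            bigram = (toks[i], toks[i + 1])
            counts[bigram] += 1
            # Rule: the phrase directly follows a sentiment word.
            if i > 0 and toks[i - 1] in SENTIMENT_WORDS:
                after_sentiment[bigram] += 1
    return [
        " ".join(bigram)
        for bigram, n in counts.most_common()
        if n >= min_count and after_sentiment[bigram] > 0
    ]


reviews = [
    "great fish tacos and cold beer",
    "the fish tacos were awful",
    "good fish tacos here",
    "we had to wait",
]
print(candidate_aspects(reviews))  # ['fish tacos']
```

Raising `min_count` trades recall for precision; on real review data the threshold would need tuning.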

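Another common problem is a missing attribute, or more precisely an attribute that does not appear in the sentence. This happens often, and the surrounding context is then needed: in a review of a particular film, for instance, the missing attribute is almost always the film title or an actor. A classifier can be trained on sentences whose attributes are known and then used to predict the attribute for sentences where it is missing.

Blair-Goldensohn et al. proposed a general set of aspect-based summarization models.

See S. Blair-Goldensohn, K. Hannan, R. McDonald, T. Neylon, G. Reis, and J. Reynar. 2008. Building a Sentiment Summarizer for Local Service Reviews. WWW Workshop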

Other tasks related to sentiment analysis include:

Emotion: individual emotion

Detecting annoyed callers to dialogue system

Detecting confused/frustrated versus confident students

Mood: individual mood

Finding traumatized or depressed writers

Interpersonal stances: the manner of speaking adopted in an interaction with another person

Detection of flirtation or friendliness in conversations

Personality traits: personality

Detection of extroverts

Source: 大數據文摘 (Big Data Digest)
