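Mahout Recommender Algorithm API Explained
1. Introduction to Mahout's recommender algorithms
In terms of data-processing capacity, Mahout's recommender algorithms fall into 2 categories:
- Single-machine, in-memory implementations
- Distributed implementations based on Hadoop
1). Single-machine, in-memory implementations
These are algorithms that run on a single machine, implemented by the cf.taste project. The familiar UserCF and ItemCF both support single-machine in-memory execution, and their parameters can be configured flexibly. For a basic example of the single-machine algorithms, see the article: Building a Mahout Project with Maven
The limitation of the single-machine in-memory algorithms is that they are bound by the resources of one machine. Medium-sized data, on the order of 1 GB or 10 GB, can still be processed, but data beyond 100 GB is an impossible task for a single machine.
2). Distributed implementations based on Hadoop
These implementations parallelize the single-machine in-memory algorithms and spread the work across multiple machines. Mahout provides a Hadoop-based parallel implementation of ItemCF. For the Hadoop-based distributed implementation, see the article:
Mahout Distributed Program Development: Item-Based Collaborative Filtering (ItemCF)
The difficulty with distributed parallel algorithms is how to parallelize a single-machine algorithm. For a single-machine algorithm we only need to think about the algorithm itself, data structures, memory and CPU. A distributed algorithm has many additional concerns, such as merging data across nodes, sorting data, the efficiency of network communication, recomputation after a node failure, and distributed data storage.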
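2. Evaluation criteria: recall and precision
Mahout provides 2 metrics for evaluating a recommender, precision and recall. Both are classic measures from search engines.

                 Relevant   Not relevant
Retrieved           A            C
Not retrieved       B            D

- A: retrieved and relevant (found, and wanted)
- B: not retrieved, but relevant (not found, yet actually wanted)
- C: retrieved, but not relevant (found, but useless)
- D: not retrieved, and not relevant (not found, and useless)
Retrieving as many of the relevant items as possible is the pursuit of recall, that is A/(A+B); the larger the better.
Having as many relevant items and as few irrelevant items as possible among those retrieved is the pursuit of precision, that is A/(A+C); the larger the better.
On large data sets these two metrics constrain each other. Retrieving more data lowers precision, and retrieving more precisely returns less data.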
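As a minimal sketch of the two formulas above, precision and recall can be computed directly from the counts A, B and C. The class and method names here are hypothetical and are not part of Mahout, and the counts in main are made up for illustration.

```java
// Hypothetical helper, not part of Mahout: computes precision and recall
// from the A/B/C counts defined in the table above.
class PrecisionRecall {

    /** precision = A / (A + C); NaN when nothing was retrieved. */
    static double precision(int a, int c) {
        return (a + c) == 0 ? Double.NaN : (double) a / (a + c);
    }

    /** recall = A / (A + B); NaN when nothing is relevant. */
    static double recall(int a, int b) {
        return (a + b) == 0 ? Double.NaN : (double) a / (a + b);
    }

    public static void main(String[] args) {
        int a = 3; // retrieved and relevant (illustrative value)
        int b = 1; // relevant but not retrieved (illustrative value)
        int c = 2; // retrieved but not relevant (illustrative value)
        System.out.println("precision = " + precision(a, c)); // 3 / 5
        System.out.println("recall    = " + recall(a, b));    // 3 / 4
    }
}
```

Returning NaN for an empty denominator is a design choice of this sketch: it keeps "no items retrieved" distinguishable from "retrieved only irrelevant items", which would give 0.0.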
3. The Recommender API
1). System environment:
- Win7 64bit
- Java 1.6.0_45
- Maven 3
- Eclipse Juno Service Release 2
- Mahout 0.8
- Hadoop 1.1.2
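2). The Recommender interface file:
org.apache.mahout.cf.taste.recommender.Recommender.java
Methods of the interface:
- recommend(long userID, int howMany): returns recommendations, recommending howMany items to userID
- recommend(long userID, int howMany, IDRescorer rescorer): returns recommendations, recommending howMany items to userID; the results can be re-ranked by the rescorer.
- estimatePreference(long userID, long itemID): estimates the user's rating for an item when no rating exists
- setPreference(long userID, long itemID, float value): sets the rating for a user and item
- removePreference(long userID, long itemID): removes the user's rating for an item
- getDataModel(): returns the underlying recommendation data
From the Recommender interface, I would guess that the core algorithm is implemented in the estimatePreference() method of each subclass.
3). Implementations of the Recommender interface, found through the inheritance hierarchy:
Recommender implementation classes:
- GenericUserBasedRecommender: user-based recommender algorithm
- GenericItemBasedRecommender: item-based recommender algorithm
- KnnItemBasedRecommender: item-based KNN recommender algorithm
- SlopeOneRecommender: Slope One recommender algorithm
- SVDRecommender: SVD recommender algorithm
- TreeClusteringRecommender: tree-cluster recommender algorithm
The implementation of each algorithm is described below.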
4. Test program: RecommenderTest.java
Test data set: item.csv
1,101,5.0
1,102,3.0
1,103,2.5
2,101,2.0
2,102,2.5
2,103,5.0
2,104,2.0
3,101,2.5
3,104,4.0
3,105,4.5
3,107,5.0
4,101,5.0
4,103,3.0
4,104,4.5
4,106,4.0
5,101,4.0
5,102,3.0
5,103,2.0
5,104,4.0
5,105,3.5
5,106,4.0
Test program: org.conan.mymahout.recommendation.job.RecommenderTest.java
package org.conan.mymahout.recommendation.job;

import java.io.IOException;
import java.util.List;

import org.apache.mahout.cf.taste.common.TasteException;
import org.apache.mahout.cf.taste.eval.RecommenderBuilder;
import org.apache.mahout.cf.taste.impl.common.LongPrimitiveIterator;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.recommender.RecommendedItem;
import org.apache.mahout.common.RandomUtils;

public class RecommenderTest {

    final static int NEIGHBORHOOD_NUM = 2;
    final static int RECOMMENDER_NUM = 3;

    public static void main(String[] args) throws TasteException, IOException {
        RandomUtils.useTestSeed();
        String file = "datafile/item.csv";
        DataModel dataModel = RecommendFactory.buildDataModel(file);
        slopeOne(dataModel);
    }

    public static void userCF(DataModel dataModel) throws TasteException {}
    public static void itemCF(DataModel dataModel) throws TasteException {}
    public static void slopeOne(DataModel dataModel) throws TasteException {}

    ...
Each algorithm is tested in its own method, such as userCF(), itemCF(), slopeOne(), and so on.
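5. User-based collaborative filtering: UserCF
User-based collaborative filtering measures the similarity between users from the ratings that different users give to items, and makes recommendations based on that similarity. Put simply: recommend to a user the items liked by other users with similar interests.
Example:
The basic idea of user-based CF is quite simple: find neighboring users based on users' preferences for items, then recommend what the neighbors like to the current user. Computationally, each user's preferences over all items are treated as a vector, and these vectors are used to compute the similarity between users. Once the K nearest neighbors are found, the current user's preference for items he or she has not yet rated is predicted from the neighbors' similarity weights and their preferences, producing a ranked list of items as the recommendation. Figure 2 gives an example: for user A, the historical preferences yield only one neighbor, user C, so item D, which user C likes, is recommended to user A.
The figure and explanatory text above are taken from: https://www.ibm.com/developerworks/cn/web/1103_zhaoct_recommstudy2/
Algorithm API: org.apache.mahout.cf.taste.impl.recommender.GenericUserBasedRecommender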
@Override
public float estimatePreference(long userID, long itemID) throws TasteException {
    DataModel model = getDataModel();
    Float actualPref = model.getPreferenceValue(userID, itemID);
    if (actualPref != null) {
        return actualPref;
    }
    long[] theNeighborhood = neighborhood.getUserNeighborhood(userID);
    return doEstimatePreference(userID, theNeighborhood, itemID);
}

protected float doEstimatePreference(long theUserID, long[] theNeighborhood, long itemID) throws TasteException {
    if (theNeighborhood.length == 0) {
        return Float.NaN;
    }
    DataModel dataModel = getDataModel();
    double preference = 0.0;
    double totalSimilarity = 0.0;
    int count = 0;
    for (long userID : theNeighborhood) {
        if (userID != theUserID) {
            // See GenericItemBasedRecommender.doEstimatePreference() too
            Float pref = dataModel.getPreferenceValue(userID, itemID);
            if (pref != null) {
                double theSimilarity = similarity.userSimilarity(theUserID, userID);
                if (!Double.isNaN(theSimilarity)) {
                    preference += theSimilarity * pref;
                    totalSimilarity += theSimilarity;
                    count++;
                }
            }
        }
    }
    // Throw out the estimate if it was based on no data points, of course, but also if based on
    // just one. This is a bit of a band-aid on the 'stock' item-based algorithm for the moment.
    // The reason is that in this case the estimate is, simply, the user's rating for one item
    // that happened to have a defined similarity. The similarity score doesn't matter, and that
    // seems like a bad situation.
    if (count <= 1) {
        return Float.NaN;
    }
    float estimate = (float) (preference / totalSimilarity);
    if (capper != null) {
        estimate = capper.capEstimate(estimate);
    }
    return estimate;
}
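Stripped of the Mahout types, the estimate computed by doEstimatePreference() is a similarity-weighted average of the neighbors' ratings. The following is a minimal standalone sketch of that arithmetic only. The class name, the array-based inputs and the sample numbers are assumptions made for illustration, and the capper step is left out.

```java
// Hypothetical standalone sketch; the class and method are not part of Mahout.
class WeightedNeighborEstimate {

    /**
     * similarities[i]: similarity between the target user and neighbor i.
     * neighborPrefs[i]: neighbor i's rating of the target item, NaN if unrated.
     */
    static float estimate(double[] similarities, double[] neighborPrefs) {
        double preference = 0.0;
        double totalSimilarity = 0.0;
        int count = 0;
        for (int i = 0; i < similarities.length; i++) {
            // Skip neighbors with no rating or an undefined similarity.
            if (Double.isNaN(neighborPrefs[i]) || Double.isNaN(similarities[i])) {
                continue;
            }
            preference += similarities[i] * neighborPrefs[i];
            totalSimilarity += similarities[i];
            count++;
        }
        // Same guard as the excerpt above: a single data point is not enough.
        if (count <= 1) {
            return Float.NaN;
        }
        return (float) (preference / totalSimilarity);
    }

    public static void main(String[] args) {
        double[] sims = {0.5, 0.25};  // illustrative similarities
        double[] prefs = {4.0, 2.0};  // illustrative neighbor ratings
        // (0.5 * 4.0 + 0.25 * 2.0) / (0.5 + 0.25) = 2.5 / 0.75
        System.out.println(estimate(sims, prefs));
    }
}
```

With only one usable neighbor the sketch returns NaN, mirroring the count <= 1 check in the excerpt.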
Test program:
public static void userCF(DataModel dataModel) throws TasteException {
    UserSimilarity userSimilarity = RecommendFactory.userSimilarity(RecommendFactory.SIMILARITY.EUCLIDEAN, dataModel);
    UserNeighborhood userNeighborhood = RecommendFactory.userNeighborhood(RecommendFactory.NEIGHBORHOOD.NEAREST, userSimilarity, dataModel, NEIGHBORHOOD_NUM);
    RecommenderBuilder recommenderBuilder = RecommendFactory.userRecommender(userSimilarity, userNeighborhood, true);

    RecommendFactory.evaluate(RecommendFactory.EVALUATOR.AVERAGE_ABSOLUTE_DIFFERENCE, recommenderBuilder, null, dataModel, 0.7);
    RecommendFactory.statsEvaluator(recommenderBuilder, null, dataModel, 2);

    LongPrimitiveIterator iter = dataModel.getUserIDs();
    while (iter.hasNext()) {
        long uid = iter.nextLong();
        List<RecommendedItem> list = recommenderBuilder.buildRecommender(dataModel).recommend(uid, RECOMMENDER_NUM);
        RecommendFactory.showItems(uid, list, true);
    }
}
Program output:
AVERAGE_ABSOLUTE_DIFFERENCE Evaluater Score:1.0
Recommender IR Evaluator: [Precision:0.5,Recall:0.5]
uid:1,(104,4.333333)(106,4.000000)
uid:2,(105,4.049678)
uid:3,(103,3.512787)(102,2.747869)
uid:4,(102,3.000000)
For a reimplementation of UserCF in R, see the article: Analyzing Mahout's User-Based Collaborative Filtering Algorithm (UserCF) with R
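6. Item-based collaborative filtering: ItemCF
Item-based collaborative filtering measures the similarity between items from the ratings that users give to different items, and makes recommendations based on that similarity. Put simply: recommend to a user items similar to the ones he or she liked before.
Example:
The principle of item-based CF is similar to that of user-based CF, except that neighbors are computed from the items themselves rather than from the users' point of view. That is, similar items are found based on users' preferences for items, and then similar items are recommended to a user according to his or her historical preferences. Computationally, all users' preferences for a given item are treated as a vector used to compute the similarity between items. Once an item's similar items are known, the current user's preference for items he or she has not yet rated is predicted from the user's historical preferences, producing a ranked list of items as the recommendation. Figure 3 gives an example: for item A, according to all users' historical preferences, the users who like item A also like item C, so item A and item C are considered similar. Since user C likes item A, we can infer that user C may also like item C.
The figure and explanatory text above are taken from: https://www.ibm.com/developerworks/cn/web/1103_zhaoct_recommstudy2/
Algorithm API: org.apache.mahout.cf.taste.impl.recommender.GenericItemBasedRecommender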
@Override
public float estimatePreference(long userID, long itemID) throws TasteException {
    PreferenceArray preferencesFromUser = getDataModel().getPreferencesFromUser(userID);
    Float actualPref = getPreferenceForItem(preferencesFromUser, itemID);
    if (actualPref != null) {
        return actualPref;
    }
    return doEstimatePreference(userID, preferencesFromUser, itemID);
}

protected float doEstimatePreference(long userID, PreferenceArray preferencesFromUser, long itemID)
    throws TasteException {
    double preference = 0.0;
    double totalSimilarity = 0.0;
    int count = 0;
    double[] similarities = similarity.itemSimilarities(itemID, preferencesFromUser.getIDs());
    for (int i = 0; i < similarities.length; i++) {
        double theSimilarity = similarities[i];
        if (!Double.isNaN(theSimilarity)) {
            // Weights can be negative!
            preference += theSimilarity * preferencesFromUser.getValue(i);
            totalSimilarity += theSimilarity;
            count++;
        }
    }
    // Throw out the estimate if it was based on no data points, of course, but also if based on
    // just one. This is a bit of a band-aid on the 'stock' item-based algorithm for the moment.
    // The reason is that in this case the estimate is, simply, the user's rating for one item
    // that happened to have a defined similarity. The similarity score doesn't matter, and that
    // seems like a bad situation.
    if (count <= 1) {
        return Float.NaN;
    }
    float estimate = (float) (preference / totalSimilarity);
    if (capper != null) {
        estimate = capper.capEstimate(estimate);
    }
    return estimate;
}
Test program:
public static void itemCF(DataModel dataModel) throws TasteException {
    ItemSimilarity itemSimilarity = RecommendFactory.itemSimilarity(RecommendFactory.SIMILARITY.EUCLIDEAN, dataModel);
    RecommenderBuilder recommenderBuilder = RecommendFactory.itemRecommender(itemSimilarity, true);

    RecommendFactory.evaluate(RecommendFactory.EVALUATOR.AVERAGE_ABSOLUTE_DIFFERENCE, recommenderBuilder, null, dataModel, 0.7);
    RecommendFactory.statsEvaluator(recommenderBuilder, null, dataModel, 2);

    LongPrimitiveIterator iter = dataModel.getUserIDs();
    while (iter.hasNext()) {
        long uid = iter.nextLong();
        List<RecommendedItem> list = recommenderBuilder.buildRecommender(dataModel).recommend(uid, RECOMMENDER_NUM);
        RecommendFactory.showItems(uid, list, true);
    }
}
Program output:
AVERAGE_ABSOLUTE_DIFFERENCE Evaluater Score:0.8676552772521973
Recommender IR Evaluator: [Precision:0.5,Recall:1.0]
uid:1,(105,3.823529)(104,3.722222)(106,3.478261)
uid:2,(106,2.984848)(105,2.537037)(107,2.000000)
uid:3,(106,3.648649)(102,3.380000)(103,3.312500)
uid:4,(107,4.722222)(105,4.313953)(102,4.025000)
uid:5,(107,3.736842)
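7. The Slope One algorithm
This algorithm has been marked @Deprecated in mahout-0.8.
Slope One is a simple and efficient collaborative filtering algorithm. It predicts ratings from average rating differences. SlopeOne paper download (PDF)
1). Example:
Users X, Y and Z rate items A and B as shown in the table below. What is Z's rating for B?
Slope One assumes that the average difference can stand in for the rating difference between two items for an unknown individual. The average difference of item A relative to item B is ((5 - 4) + (4 - 2)) / 2 = 1.5, so Z's rating for B is 3 - 1.5 = 1.5.
Slope One treats the relationship between users' ratings as a simple linear relationship:
Y = mX + b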
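The arithmetic of this example can be sketched in a few lines of Java. The ratings used here are inferred from the numbers in the calculation above (two users rating A as 5 and 4 and B as 4 and 2, and Z rating A as 3), since the original table is not reproduced; the class and method names are hypothetical. In Slope One the slope m is taken to be 1, so only the offset b, the average difference, has to be estimated.

```java
// Hypothetical standalone sketch of the basic Slope One step; not Mahout code.
class SlopeOneBasic {

    /** Average of (a[i] - b[i]) over the users who rated both items. */
    static double averageDiff(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] - b[i];
        }
        return sum / a.length;
    }

    /** Predicts a rating of B from a known rating of A and the average diff (A - B). */
    static double predictB(double ratingA, double avgDiffAminusB) {
        return ratingA - avgDiffAminusB;
    }

    public static void main(String[] args) {
        double[] ratingsA = {5, 4}; // ratings of A by the two co-rating users (inferred)
        double[] ratingsB = {4, 2}; // ratings of B by the same users (inferred)
        double diff = averageDiff(ratingsA, ratingsB);
        System.out.println("average diff A-B = " + diff);               // 1.5
        System.out.println("Z's predicted B  = " + predictB(3, diff));  // 1.5
    }
}
```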
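2). Weighted average calculation:
Users X, Y and Z rate items A, B and C as shown in the table below. What is Z's rating for A?
- 1. Compute the average difference between A and B: ((5-3)+(3-4))/2=0.5
- 2. Compute the average difference between A and C: (5-2)/1=3
- 3. Z's rating for A, derived through A and B: 2+0.5=2.5
- 4. Z's rating for A, derived through A and C: 5+3=8
- 5. Compute Z's rating for A as a weighted average: 2 users rated both A and B, and 1 user rated both A and C, so the weights are 2 and 1 respectively: (2*2.5+1*8)/(2+1)=13/3=4.33
In this simple way we can quickly compute a rating and complete the recommendation process!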
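The weighted step can be sketched as follows, taking the average differences and co-rating counts from steps 1 and 2 as inputs. The class name and the array layout are assumptions made for illustration; this is not the Mahout implementation, only the same weighting rule applied to the numbers above.

```java
// Hypothetical standalone sketch of weighted Slope One; not Mahout code.
class WeightedSlopeOne {

    /**
     * userRatings[i]: the user's rating of known item i.
     * avgDiffs[i]:    average difference (target item - item i).
     * counts[i]:      number of users who rated both the target item and item i.
     */
    static double predict(double[] userRatings, double[] avgDiffs, int[] counts) {
        double total = 0.0;
        double weightSum = 0.0;
        for (int i = 0; i < userRatings.length; i++) {
            // Each per-item prediction is weighted by its co-rating count.
            total += counts[i] * (userRatings[i] + avgDiffs[i]);
            weightSum += counts[i];
        }
        return weightSum == 0.0 ? Double.NaN : total / weightSum;
    }

    public static void main(String[] args) {
        double[] zRatings = {2, 5};     // Z's ratings of B and C
        double[] diffs = {0.5, 3.0};    // average diffs A-B and A-C
        int[] counts = {2, 1};          // users who co-rated (A,B) and (A,C)
        // (2 * 2.5 + 1 * 8) / (2 + 1) = 13 / 3, about 4.33
        System.out.println(predict(zRatings, diffs, counts));
    }
}
```

Weighting by the co-rating count gives more influence to differences backed by more users, which is why the prediction through B (two users) outweighs the one through C (one user).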
Algorithm API: org.apache.mahout.cf.taste.impl.recommender.slopeone.SlopeOneRecommender
@Override
public float estimatePreference(long userID, long itemID) throws TasteException {
    DataModel model = getDataModel();
    Float actualPref = model.getPreferenceValue(userID, itemID);
    if (actualPref != null) {
        return actualPref;
    }
    return doEstimatePreference(userID, itemID);
}

private float doEstimatePreference(long userID, long itemID) throws TasteException {
    double count = 0.0;
    double totalPreference = 0.0;
    PreferenceArray prefs = getDataModel().getPreferencesFromUser(userID);
    RunningAverage[] averages = diffStorage.getDiffs(userID, itemID, prefs);
    int size = prefs.length();
    for (int i = 0; i < size; i++) {
        RunningAverage averageDiff = averages[i];
        if (averageDiff != null) {
            double averageDiffValue = averageDiff.getAverage();
            if (weighted) {
                double weight = averageDiff.getCount();
                if (stdDevWeighted) {
                    double stdev = ((RunningAverageAndStdDev) averageDiff).getStandardDeviation();
                    if (!Double.isNaN(stdev)) {
                        weight /= 1.0 + stdev;
                    }
                    // If stdev is NaN, then it is because count is 1. Because we're weighting by count,
                    // the weight is already relatively low. We effectively assume stdev is 0.0 here and
                    // that is reasonable enough. Otherwise, dividing by NaN would yield a weight of NaN
                    // and disqualify this pref entirely
                    // (Thanks Daemmon)
                }
                totalPreference += weight * (prefs.getValue(i) + averageDiffValue);
                count += weight;
            } else {
                totalPreference += prefs.getValue(i) + averageDiffValue;
                count += 1.0;
            }
        }
    }
    if (count <= 0.0) {
        RunningAverage itemAverage = diffStorage.getAverageItemPref(itemID);
        return itemAverage == null ? Float.NaN : (float) itemAverage.getAverage();
    } else {
        return (float) (totalPreference / count);
    }
}
Test program:
public static void slopeOne(DataModel dataModel) throws TasteException {
    RecommenderBuilder recommenderBuilder = RecommendFactory.slopeOneRecommender();

    RecommendFactory.evaluate(RecommendFactory.EVALUATOR.AVERAGE_ABSOLUTE_DIFFERENCE, recommenderBuilder, null, dataModel, 0.7);
    RecommendFactory.statsEvaluator(recommenderBuilder, null, dataModel, 2);

    LongPrimitiveIterator iter = dataModel.getUserIDs();
    while (iter.hasNext()) {
        long uid = iter.nextLong();
        List<RecommendedItem> list = recommenderBuilder.buildRecommender(dataModel).recommend(uid, RECOMMENDER_NUM);
        RecommendFactory.showItems(uid, list, true);
    }
}
Program output:
AVERAGE_ABSOLUTE_DIFFERENCE Evaluater Score:1.3333333333333333
Recommender IR Evaluator: [Precision:0.25,Recall:0.5]
uid:1,(105,5.750000)(104,5.250000)(106,4.500000)
uid:2,(105,2.286115)(106,1.500000)
uid:3,(106,2.000000)(102,1.666667)(103,1.625000)
uid:4,(105,4.976859)(102,3.509071)
8. KNN linear interpolation item-based recommender algorithm
This algorithm has been marked @Deprecated in mahout-0.8.
The algorithm comes from the paper:
This algorithm is based on the paper by Robert M. Bell and Yehuda Koren in ICDM '07.
(TODO: unfinished)
Algorithm API: org.apache.mahout.cf.taste.impl.recommender.knn.KnnItemBasedRecommender
@Override
protected float doEstimatePreference(long theUserID, PreferenceArray preferencesFromUser, long itemID)
    throws TasteException {

    DataModel dataModel = getDataModel();
    int size = preferencesFromUser.length();
    FastIDSet possibleItemIDs = new FastIDSet(size);
    for (int i = 0; i < size; i++) {
        possibleItemIDs.add(preferencesFromUser.getItemID(i));
    }
    possibleItemIDs.remove(itemID);

    List<RecommendedItem> mostSimilar = mostSimilarItems(itemID, possibleItemIDs.iterator(), neighborhoodSize, null);
    long[] theNeighborhood = new long[mostSimilar.size() + 1];
    theNeighborhood[0] = -1;

    List<Long> usersRatedNeighborhood = Lists.newArrayList();
    int nOffset = 0;
    for (RecommendedItem rec : mostSimilar) {
        theNeighborhood[nOffset++] = rec.getItemID();
    }

    if (!mostSimilar.isEmpty()) {
        theNeighborhood[mostSimilar.size()] = itemID;
        for (int i = 0; i < theNeighborhood.length; i++) {
            PreferenceArray usersNeighborhood = dataModel.getPreferencesForItem(theNeighborhood[i]);
            int size1 = usersRatedNeighborhood.isEmpty() ? usersNeighborhood.length() : usersRatedNeighborhood.size();
            for (int j = 0; j < size1; j++) {
                if (i == 0) {
                    usersRatedNeighborhood.add(usersNeighborhood.getUserID(j));
                } else {
                    if (j >= usersRatedNeighborhood.size()) {
                        break;
                    }
                    long index = usersRatedNeighborhood.get(j);
                    if (!usersNeighborhood.hasPrefWithUserID(index) || index == theUserID) {
                        usersRatedNeighborhood.remove(index);
                        j--;
                    }
                }
            }
        }
    }

    double[] weights = null;
    if (!mostSimilar.isEmpty()) {
        weights = getInterpolations(itemID, theNeighborhood, usersRatedNeighborhood);
    }

    int i = 0;
    double preference = 0.0;
    double totalSimilarity = 0.0;
    for (long jitem : theNeighborhood) {
        Float pref = dataModel.getPreferenceValue(theUserID, jitem);
        if (pref != null) {
            double weight = weights[i];
            preference += pref * weight;
            totalSimilarity += weight;
        }
        i++;
    }
    return totalSimilarity == 0.0 ? Float.NaN : (float) (preference / totalSimilarity);
}
Test program:
public static void itemKNN(DataModel dataModel) throws TasteException {
    ItemSimilarity itemSimilarity = RecommendFactory.itemSimilarity(RecommendFactory.SIMILARITY.EUCLIDEAN, dataModel);
    RecommenderBuilder recommenderBuilder = RecommendFactory.itemKNNRecommender(itemSimilarity, new NonNegativeQuadraticOptimizer(), 10);

    RecommendFactory.evaluate(RecommendFactory.EVALUATOR.AVERAGE_ABSOLUTE_DIFFERENCE, recommenderBuilder, null, dataModel, 0.7);
    RecommendFactory.statsEvaluator(recommenderBuilder, null, dataModel, 2);

    LongPrimitiveIterator iter = dataModel.getUserIDs();
    while (iter.hasNext()) {
        long uid = iter.nextLong();
        List<RecommendedItem> list = recommenderBuilder.buildRecommender(dataModel).recommend(uid, RECOMMENDER_NUM);
        RecommendFactory.showItems(uid, list, true);
    }
}
Program output:
AVERAGE_ABSOLUTE_DIFFERENCE Evaluater Score:1.5
Recommender IR Evaluator: [Precision:0.5,Recall:1.0]
uid:1,(107,5.000000)(104,3.501168)(106,3.498198)
uid:2,(105,2.878995)(106,2.878086)(107,2.000000)
uid:3,(103,3.667444)(102,3.667161)(106,3.667019)
uid:4,(107,4.750247)(102,4.122755)(105,4.122709)
uid:5,(107,3.833621)
9. SVD recommender algorithm
(TODO: unfinished)
Algorithm API: org.apache.mahout.cf.taste.impl.recommender.svd.SVDRecommender
@Override
public float estimatePreference(long userID, long itemID) throws TasteException {
    double[] userFeatures = factorization.getUserFeatures(userID);
    double[] itemFeatures = factorization.getItemFeatures(itemID);
    double estimate = 0;
    for (int feature = 0; feature < userFeatures.length; feature++) {
        estimate += userFeatures[feature] * itemFeatures[feature];
    }
    return (float) estimate;
}
Test program:
public static void svd(DataModel dataModel) throws TasteException {
    RecommenderBuilder recommenderBuilder = RecommendFactory.svdRecommender(new ALSWRFactorizer(dataModel, 10, 0.05, 10));

    RecommendFactory.evaluate(RecommendFactory.EVALUATOR.AVERAGE_ABSOLUTE_DIFFERENCE, recommenderBuilder, null, dataModel, 0.7);
    RecommendFactory.statsEvaluator(recommenderBuilder, null, dataModel, 2);

    LongPrimitiveIterator iter = dataModel.getUserIDs();
    while (iter.hasNext()) {
        long uid = iter.nextLong();
        List<RecommendedItem> list = recommenderBuilder.buildRecommender(dataModel).recommend(uid, RECOMMENDER_NUM);
        RecommendFactory.showItems(uid, list, true);
    }
}
Program output:
AVERAGE_ABSOLUTE_DIFFERENCE Evaluater Score:0.09990564982096355
Recommender IR Evaluator: [Precision:0.5,Recall:1.0]
uid:1,(104,4.032909)(105,3.390885)(107,1.858541)
uid:2,(105,3.761718)(106,2.951908)(107,1.561116)
uid:3,(103,5.593422)(102,2.458930)(106,-0.091259)
uid:4,(105,4.068329)(102,3.534025)(107,0.206257)
uid:5,(107,0.105169)
10. Tree cluster-based recommender algorithm
This algorithm has been marked @Deprecated in mahout-0.8.
(TODO: unfinished)
Algorithm API: org.apache.mahout.cf.taste.impl.recommender.TreeClusteringRecommender
@Override
public float estimatePreference(long userID, long itemID) throws TasteException {
    DataModel model = getDataModel();
    Float actualPref = model.getPreferenceValue(userID, itemID);
    if (actualPref != null) {
        return actualPref;
    }
    buildClusters();
    List<RecommendedItem> topRecsForUser = topRecsByUserID.get(userID);
    if (topRecsForUser != null) {
        for (RecommendedItem item : topRecsForUser) {
            if (itemID == item.getItemID()) {
                return item.getValue();
            }
        }
    }
    // Hmm, we have no idea. The item is not in the user's cluster
    return Float.NaN;
}
Test program:
public static void treeCluster(DataModel dataModel) throws TasteException {
    UserSimilarity userSimilarity = RecommendFactory.userSimilarity(RecommendFactory.SIMILARITY.LOGLIKELIHOOD, dataModel);
    ClusterSimilarity clusterSimilarity = RecommendFactory.clusterSimilarity(RecommendFactory.SIMILARITY.FARTHEST_NEIGHBOR_CLUSTER, userSimilarity);
    RecommenderBuilder recommenderBuilder = RecommendFactory.treeClusterRecommender(clusterSimilarity, 10);

    RecommendFactory.evaluate(RecommendFactory.EVALUATOR.AVERAGE_ABSOLUTE_DIFFERENCE, recommenderBuilder, null, dataModel, 0.7);
    RecommendFactory.statsEvaluator(recommenderBuilder, null, dataModel, 2);

    LongPrimitiveIterator iter = dataModel.getUserIDs();
    while (iter.hasNext()) {
        long uid = iter.nextLong();
        List<RecommendedItem> list = recommenderBuilder.buildRecommender(dataModel).recommend(uid, RECOMMENDER_NUM);
        RecommendFactory.showItems(uid, list, true);
    }
}
Program output:
AVERAGE_ABSOLUTE_DIFFERENCE Evaluater Score:NaN
Recommender IR Evaluator: [Precision:NaN,Recall:0.0]
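11. Summary of Mahout's recommender algorithms
Algorithms and their applicable scenarios:
Algorithm scoring results:
Comparing the scores of the algorithms above: itemCF, itemKNN and SVD have the best Precision and Recall values, and itemCF and SVD have the lowest AVERAGE_ABSOLUTE_DIFFERENCE. So, from the algorithm's point of view, we now know which algorithms are more accurate and which retrieve more of the data set.
Some other factors:
- 1. These 3 metrics alone do not guarantee that itemCF and SVD will always produce better results
- 2. We did not tune the parameters of any of the algorithms
- 3. Data volume and data distribution also affect the algorithms' scores
Download the program source code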