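Counting user PV and UV in MongoDB with MapReduce
This post uses Spring Data to work with MongoDB and uses map-reduce to compute the PV (page views) and UV (unique visitors) of user visits.
The full code is at https://github.com/WangErXiao/spring-data
How to use Spring Data with MongoDB is not covered here; this post only covers MongoDB map-reduce.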
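```java
import com.mongodb.DBCollection;
import com.mongodb.DBCursor;
import com.mongodb.DBObject;
import com.mongodb.MapReduceOutput;
import org.springframework.data.mongodb.core.query.Criteria;
import org.springframework.data.mongodb.core.query.Query;
import org.springframework.stereotype.Component;

import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;

// MongoBaseDao, UserDao, UserVisitRecord and UserStaticModel are the project's own classes (see the repository above).
@Component
public class UserDaoImpl extends MongoBaseDao implements UserDao {

    public void insertRecord(UserVisitRecord record) {
        getMongoTemplate().insert(record);
    }

    // Computes PV and UV for one day and stores the result as a UserStaticModel document.
    public void statisUserPvUv(String date) {
        String map = "function() { "
                + " if(this.date=='" + date + "'){ "
                + "   emit(this.date ,{uv:1,pv:1,userIds:this.userId}) "
                + " }"
                + " } ";
        String reduce = "function(key, values) { "
                + " var temp = new Array(); "
                + " var userIds= new Array(); "
                + " for(i = 0; i < values.length; i++) { "
                + "   userIds=userIds.concat(values[i].userIds);"
                + " } "
                + " userIds.sort(); "
                + " for(i = 0; i < userIds.length; i++) {"
                + "   if( userIds[i] == userIds[i+1]) { continue;}"
                + "   temp[temp.length]=userIds[i];"
                + " } "
                + " return {uv:temp.length,pv:userIds.length,userIds:userIds};"
                + " }";
        // Write the map-reduce output to the temporary collection "tmp".
        MapReduceOutput mapReduceOutput = getMongoTemplate().getCollection("userVisitRecord")
                .mapReduce(map, reduce, "tmp", null);
        DBCollection resultColl = mapReduceOutput.getOutputCollection();
        DBCursor cursor = null;
        try {
            cursor = resultColl.find();
            while (cursor.hasNext()) {
                DBObject value = (DBObject) cursor.next().get("value");
                if (value == null) {
                    continue;
                }
                UserStaticModel userStaticModel = new UserStaticModel();
                // JavaScript numbers come back as doubles.
                userStaticModel.setUv(Math.round(((Number) value.get("uv")).doubleValue()));
                userStaticModel.setPv(Math.round(((Number) value.get("pv")).doubleValue()));
                // With a single record for the day, reduce never runs and userIds is a bare String, not a list.
                Object rawIds = value.get("userIds");
                List<String> userIds = new ArrayList<>();
                if (rawIds instanceof List) {
                    for (Object id : (List<?>) rawIds) {
                        userIds.add(String.valueOf(id));
                    }
                } else if (rawIds != null) {
                    userIds.add(String.valueOf(rawIds));
                }
                // Deduplicate the user ids before storing them.
                userStaticModel.setUserIds(new ArrayList<>(new HashSet<>(userIds)));
                userStaticModel.setDate(date);
                getMongoTemplate().insert(userStaticModel);
            }
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            if (cursor != null) {
                cursor.close();
            }
            // Drop the temporary output collection.
            resultColl.drop();
        }
    }

    public UserStaticModel findStatic(String date) {
        Query query = new Query();
        query.addCriteria(Criteria.where("date").is(date));
        return getMongoTemplate().findOne(query, UserStaticModel.class);
    }
}
```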
In this code, the statisUserPvUv method computes the PV and UV of user visits for a given day.
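As a point of reference, here is what statisUserPvUv should produce for one day, computed directly in plain Java without map-reduce. `Visit` is a hypothetical stand-in for a userVisitRecord document that keeps only the `date` and `userId` fields the map function reads; this is a sketch for checking results, not part of the original project.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class PvUvReference {

    // Hypothetical stand-in for a userVisitRecord document: only the fields the map function reads.
    public static final class Visit {
        final String date;
        final String userId;

        public Visit(String date, String userId) {
            this.date = date;
            this.userId = userId;
        }
    }

    // Returns {pv, uv} for one date: pv counts every visit, uv counts distinct userIds.
    public static long[] pvUv(List<Visit> visits, String date) {
        long pv = 0;
        Set<String> users = new HashSet<>();
        for (Visit v : visits) {
            if (v.date.equals(date)) {
                pv++;
                users.add(v.userId);
            }
        }
        return new long[] {pv, users.size()};
    }
}
```

Running this and the map-reduce job over the same test data is a quick way to confirm that the stored pv and uv values are right.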
The map and reduce functions are:
```java
String map = "function() { "
        + " if(this.date=='" + date + "'){ "
        + "   emit(this.date ,{uv:1,pv:1,userIds:this.userId}) "
        + " }"
        + " } ";
String reduce = "function(key, values) { "
        + " var temp = new Array(); "
        + " var userIds= new Array(); "
        + " for(i = 0; i < values.length; i++) { "
        + "   userIds=userIds.concat(values[i].userIds);"
        + " } "
        + " userIds.sort(); "
        + " for(i = 0; i < userIds.length; i++) {"
        + "   if( userIds[i] == userIds[i+1]) { continue;}"
        + "   temp[temp.length]=userIds[i];"
        + " } "
        + " return {uv:temp.length,pv:userIds.length,userIds:userIds};"
        + " }";
```
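The reduce function may be easier to follow in Java. The sketch below mirrors its logic under one assumption: each value's `userIds` is modeled as a `List<String>`, whereas a value coming straight from map carries a bare string in MongoDB (JavaScript's `concat` appends it as a single element either way). `Stat` is a hypothetical class, not part of the project.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class ReduceSketch {

    // Hypothetical plain-Java stand-in for the value object {uv, pv, userIds}.
    public static final class Stat {
        public final long uv;
        public final long pv;
        public final List<String> userIds;

        public Stat(long uv, long pv, List<String> userIds) {
            this.uv = uv;
            this.pv = pv;
            this.userIds = userIds;
        }
    }

    // Mirrors the JavaScript reduce: concatenate, sort, count distinct ids as uv and all ids as pv.
    public static Stat reduce(List<Stat> values) {
        List<String> userIds = new ArrayList<>();
        for (Stat v : values) {
            userIds.addAll(v.userIds); // like userIds = userIds.concat(values[i].userIds)
        }
        Collections.sort(userIds);
        long distinct = 0;
        for (int i = 0; i < userIds.size(); i++) {
            // After sorting, duplicates are adjacent: count each id only at its last occurrence.
            if (i + 1 < userIds.size() && userIds.get(i).equals(userIds.get(i + 1))) {
                continue;
            }
            distinct++;
        }
        // The full list (with duplicates) is returned so a later reduce pass can still count pv.
        return new Stat(distinct, userIds.size(), userIds);
    }
}
```

Keeping the non-deduplicated list in the return value is what lets pv stay correct when reduce output is reduced again; the deduplication only happens when counting uv.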
At this point many readers will wonder why the map function emits `emit(this.date ,{uv:1,pv:1,userIds:this.userId})` instead of simply `emit(this.date ,{userId:this.userId})`.
That is what I wrote at first, and it leads to the following problems:
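- When a day has only one record, that record never goes through reduce and comes out as-is: the value you get contains only the userId string and nothing else, so there is no pv or uv. That is why the value emitted by map should already be initialized as {pv:1,uv:1,userIds:this.userId}. Even then, userIds in that case is a bare string rather than an array, so the Java code that reads the result has to handle both.
- When a day has a lot of records (more than 100 emits for the key), MongoDB, rather annoyingly, takes the result of reducing those 100 values and feeds it back into reduce as if it had been emitted again. MongoDB may call reduce more than once for the same key, so the object emitted by map and the object returned by reduce must have the same structure. Every time a key passes another 100 emits, the partial result is re-emitted; its pv and uv are meaningless at that point, and only its userIds are used when reduce continues counting.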
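To see why both pitfalls matter, here is a hypothetical in-memory model of how one key might be processed: a single emitted value is returned without calling reduce, and larger sets are reduced in batches whose results are fed back into reduce. The batch size is a parameter of this model that mirrors the figure of 100 above; it is not a MongoDB setting, and real batching happens inside the server. `Stat` and `run` are illustrative names, and `userIds` is modeled as a list rather than a bare string.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.TreeSet;

public class ReReduceDemo {

    // Hypothetical stand-in for the {uv, pv, userIds} value object.
    public static final class Stat {
        public final long uv;
        public final long pv;
        public final List<String> userIds;

        public Stat(long uv, long pv, List<String> userIds) {
            this.uv = uv;
            this.pv = pv;
            this.userIds = userIds;
        }
    }

    // Condensed version of the post's reduce: same counts, distinct ids via a TreeSet.
    static Stat reduce(List<Stat> values) {
        List<String> userIds = new ArrayList<>();
        for (Stat v : values) {
            userIds.addAll(v.userIds);
        }
        long uv = new TreeSet<>(userIds).size();
        return new Stat(uv, userIds.size(), userIds);
    }

    // Models one key: map emits one {uv:1, pv:1, userIds:[id]} per visit, then reduce runs in batches.
    public static Stat run(List<String> userIdsForKey, int batchSize) {
        if (userIdsForKey.isEmpty() || batchSize < 2) {
            throw new IllegalArgumentException("need at least one id and batchSize >= 2");
        }
        List<Stat> current = new ArrayList<>();
        for (String id : userIdsForKey) {
            current.add(new Stat(1, 1, Collections.singletonList(id)));
        }
        if (current.size() == 1) {
            return current.get(0); // pitfall 1: reduce is skipped, so map must already set pv and uv
        }
        while (current.size() > 1) {
            List<Stat> next = new ArrayList<>();
            for (int i = 0; i < current.size(); i += batchSize) {
                next.add(reduce(current.subList(i, Math.min(i + batchSize, current.size()))));
            }
            current = next; // pitfall 2: partial results go back into reduce, so shapes must match
        }
        return current.get(0);
    }
}
```

With 250 visits and a batch size of 100, the model reduces three batches and then reduces their three results together; pv and uv still come out right because every pass recounts from userIds rather than adding up earlier pv and uv values.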
These two points are the nastiest traps in MongoDB map-reduce. Keep them in mind and everything else works fine.
If you repost this, please credit the source: http://my.oschina.net/robinyao/blog/467591