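Elasticsearch 2.20 Getting Started: Aggregations
Source: http://my.oschina.net/secisland/blog/614127
Aggregations provide the ability to group documents and compute statistics over them. They are similar to GROUP BY in a relational database. In Elasticsearch, a single query can also take the buckets produced by one aggregation and aggregate them again, which is very useful: one request returns the results of several aggregations, avoiding repeated round trips and reducing the load on the network and the server.
Data preparation: we insert a few documents:
Request: POST localhost:9200/customer/external/?pretty
Request body: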
{"name": "secisland","age":25,"state":"open","gender":"woman","balance":87 }
{"name": "zhangsan","age":32,"state":"close","gender":"man","balance":95 }
{"name": "zhangsan1","age":33,"state":"close","gender":"man","balance":91 }
{"name": "lisi","age":34,"state":"open","gender":"woman","balance":99 }
{"name": "wangwu","age":46,"state":"close","gender":"woman","balance":78 }
Five documents are inserted as test data, one POST request per document.
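The data preparation step above can be sketched in Python as follows. This is only a minimal illustration using the standard library: the helper name build_index_requests is hypothetical, and the URL assumes a node on localhost:9200 with the customer index and external type used in this article. The requests are built but not sent, so the sketch runs without a cluster.

```python
import json
from urllib.request import Request

# Sample documents copied from the article.
DOCS = [
    {"name": "secisland", "age": 25, "state": "open", "gender": "woman", "balance": 87},
    {"name": "zhangsan", "age": 32, "state": "close", "gender": "man", "balance": 95},
    {"name": "zhangsan1", "age": 33, "state": "close", "gender": "man", "balance": 91},
    {"name": "lisi", "age": 34, "state": "open", "gender": "woman", "balance": 99},
    {"name": "wangwu", "age": 46, "state": "close", "gender": "woman", "balance": 78},
]

# Assumed endpoint: index "customer", type "external", as in the article.
INDEX_URL = "http://localhost:9200/customer/external/?pretty"


def build_index_requests(docs, url=INDEX_URL):
    """Return one POST request per document (hypothetical helper)."""
    return [
        Request(
            url,
            data=json.dumps(doc).encode("utf-8"),
            headers={"Content-Type": "application/json"},
            method="POST",
        )
        for doc in docs
    ]


requests_to_send = build_index_requests(DOCS)

# To actually index the documents against a running node:
#   from urllib.request import urlopen
#   for req in requests_to_send:
#       urlopen(req).read()
```

Each document is sent in its own request because this endpoint accepts a single document body per call.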
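With the data in place, we can test aggregations:
Example: group all customers by state, then return the top 10 states (the default), sorted by document count in descending order (also the default):
Request: POST http://localhost:9200/customer/_search?pretty
Request body: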
{
  "size": 0,
  "aggs": {
    "group_by_state": {
      "terms": { "field": "state" }
    }
  }
}
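Setting size to 0 suppresses the search hits so that only the aggregation results are returned. The query is similar to the following GROUP BY statement in a relational database:
SELECT state, COUNT(*) FROM customer GROUP BY state ORDER BY COUNT(*) DESC
Response: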
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : { "total" : 5, "successful" : 5, "failed" : 0 },
  "hits" : { "total" : 5, "max_score" : 0.0, "hits" : [ ] },
  "aggregations" : {
    "group_by_state" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        { "key" : "close", "doc_count" : 3 },
        { "key" : "open", "doc_count" : 2 }
      ]
    }
  }
}
From the response we can see that there are 3 customers in the close state and 2 customers in the open state.
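To make the bucket counts easy to check by hand, here is a small pure-Python sketch that mimics what the terms aggregation computes on the five sample documents. It is not how Elasticsearch implements the aggregation; the function name terms_buckets is hypothetical, and the logic is simply count, sort by count descending, and keep the top 10.

```python
from collections import Counter

# Sample documents copied from the article.
DOCS = [
    {"name": "secisland", "age": 25, "state": "open", "gender": "woman", "balance": 87},
    {"name": "zhangsan", "age": 32, "state": "close", "gender": "man", "balance": 95},
    {"name": "zhangsan1", "age": 33, "state": "close", "gender": "man", "balance": 91},
    {"name": "lisi", "age": 34, "state": "open", "gender": "woman", "balance": 99},
    {"name": "wangwu", "age": 46, "state": "close", "gender": "woman", "balance": 78},
]


def terms_buckets(docs, field, size=10):
    """Count documents per distinct value of `field` (hypothetical helper)."""
    counts = Counter(doc[field] for doc in docs)
    # most_common() sorts by count, highest first, like the default terms order.
    return [{"key": key, "doc_count": n} for key, n in counts.most_common(size)]


buckets = terms_buckets(DOCS, "state")
print(buckets)
# [{'key': 'close', 'doc_count': 3}, {'key': 'open', 'doc_count': 2}]
```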
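Next, we build on the previous example and add one more feature: while counting the documents per state, we also compute the average balance for each state.
The request is the same as before, but the body changes as follows: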
{
  "size": 0,
  "aggs": {
    "group_by_state": {
      "terms": { "field": "state" },
      "aggs": {
        "average_balance": { "avg": { "field": "balance" } }
      }
    }
  }
}
The query returns the following result:
{
  "took" : 16,
  "timed_out" : false,
  "_shards" : { "total" : 5, "successful" : 5, "failed" : 0 },
  "hits" : { "total" : 5, "max_score" : 0.0, "hits" : [ ] },
  "aggregations" : {
    "group_by_state" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        { "key" : "close", "doc_count" : 3, "average_balance" : { "value" : 88.0 } },
        { "key" : "open", "doc_count" : 2, "average_balance" : { "value" : 93.0 } }
      ]
    }
  }
}
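Note how the average_balance aggregation is nested inside the group_by_state aggregation. This is a common aggregation pattern: inside any aggregation you can aggregate again on any field to get the result you need.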
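The averages in the response can be verified with a few lines of Python. The sketch below groups the sample documents by state and averages the balance inside each group, which is the same two-level idea as nesting avg under terms. The helper name average_by is hypothetical, and this is an illustration of the arithmetic rather than of Elasticsearch internals.

```python
from collections import defaultdict

# Sample documents copied from the article.
DOCS = [
    {"name": "secisland", "age": 25, "state": "open", "gender": "woman", "balance": 87},
    {"name": "zhangsan", "age": 32, "state": "close", "gender": "man", "balance": 95},
    {"name": "zhangsan1", "age": 33, "state": "close", "gender": "man", "balance": 91},
    {"name": "lisi", "age": 34, "state": "open", "gender": "woman", "balance": 99},
    {"name": "wangwu", "age": 46, "state": "close", "gender": "woman", "balance": 78},
]


def average_by(docs, group_field, value_field):
    """Average `value_field` within each `group_field` bucket (hypothetical helper)."""
    groups = defaultdict(list)
    for doc in docs:
        groups[doc[group_field]].append(doc[value_field])
    return {key: sum(values) / len(values) for key, values in groups.items()}


averages = average_by(DOCS, "state", "balance")
print(averages)
# {'open': 93.0, 'close': 88.0}
```

For the close state this is (95 + 91 + 78) / 3 = 88.0, and for the open state it is (87 + 99) / 2 = 93.0, matching the response.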
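Now look at the next example, which takes the result above and sorts the buckets by average account balance in descending order:
The request is the same as before:
Request body: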
{
  "size": 0,
  "aggs": {
    "group_by_state": {
      "terms": {
        "field": "state",
        "order": { "average_balance": "desc" }
      },
      "aggs": {
        "average_balance": { "avg": { "field": "balance" } }
      }
    }
  }
}
The query returns:
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : { "total" : 5, "successful" : 5, "failed" : 0 },
  "hits" : { "total" : 5, "max_score" : 0.0, "hits" : [ ] },
  "aggregations" : {
    "group_by_state" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        { "key" : "open", "doc_count" : 2, "average_balance" : { "value" : 93.0 } },
        { "key" : "close", "doc_count" : 3, "average_balance" : { "value" : 88.0 } }
      ]
    }
  }
}
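On the client side, a response like the one above usually has to be unpacked before it is useful. The following Python sketch flattens the buckets into plain tuples and checks that they arrive in descending order of average balance. The RESPONSE dictionary is a trimmed copy of the response shown above, and the helper name bucket_rows is hypothetical.

```python
# Trimmed copy of the response above, keeping only the aggregation part.
RESPONSE = {
    "aggregations": {
        "group_by_state": {
            "buckets": [
                {"key": "open", "doc_count": 2, "average_balance": {"value": 93.0}},
                {"key": "close", "doc_count": 3, "average_balance": {"value": 88.0}},
            ]
        }
    }
}


def bucket_rows(response, agg_name, metric_name):
    """Flatten buckets into (key, doc_count, metric value) tuples (hypothetical helper)."""
    buckets = response["aggregations"][agg_name]["buckets"]
    return [(b["key"], b["doc_count"], b[metric_name]["value"]) for b in buckets]


rows = bucket_rows(RESPONSE, "group_by_state", "average_balance")

# Each row's average should be >= the next one when the order is "desc".
is_descending = all(a[2] >= b[2] for a, b in zip(rows, rows[1:]))

for key, count, average in rows:
    print(key, count, average)
```

Note that the bucket with fewer documents (open) now comes first, because the order is driven by the nested metric rather than by doc_count.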
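This article is an original work by secisland (賽克藍德). Please credit the author and the source when reposting.
The next example is more complex. It shows how to group by age bracket (20-29, 30-39, and 40-49), then by gender, and finally obtain the average account balance for each gender within each age bracket: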
{
  "size": 0,
  "aggs": {
    "group_by_age": {
      "range": {
        "field": "age",
        "ranges": [
          { "from": 20, "to": 30 },
          { "from": 30, "to": 40 },
          { "from": 40, "to": 50 }
        ]
      },
      "aggs": {
        "group_by_gender": {
          "terms": { "field": "gender" },
          "aggs": {
            "average_balance": { "avg": { "field": "balance" } }
          }
        }
      }
    }
  }
}
The query returns:
{
  "took" : 15,
  "timed_out" : false,
  "_shards" : { "total" : 5, "successful" : 5, "failed" : 0 },
  "hits" : { "total" : 5, "max_score" : 0.0, "hits" : [ ] },
  "aggregations" : {
    "group_by_age" : {
      "buckets" : [
        {
          "key" : "20.0-30.0",
          "from" : 20.0, "from_as_string" : "20.0",
          "to" : 30.0, "to_as_string" : "30.0",
          "doc_count" : 1,
          "group_by_gender" : {
            "doc_count_error_upper_bound" : 0,
            "sum_other_doc_count" : 0,
            "buckets" : [
              { "key" : "woman", "doc_count" : 1, "average_balance" : { "value" : 87.0 } }
            ]
          }
        },
        {
          "key" : "30.0-40.0",
          "from" : 30.0, "from_as_string" : "30.0",
          "to" : 40.0, "to_as_string" : "40.0",
          "doc_count" : 3,
          "group_by_gender" : {
            "doc_count_error_upper_bound" : 0,
            "sum_other_doc_count" : 0,
            "buckets" : [
              { "key" : "man", "doc_count" : 2, "average_balance" : { "value" : 93.0 } },
              { "key" : "woman", "doc_count" : 1, "average_balance" : { "value" : 99.0 } }
            ]
          }
        },
        {
          "key" : "40.0-50.0",
          "from" : 40.0, "from_as_string" : "40.0",
          "to" : 50.0, "to_as_string" : "50.0",
          "doc_count" : 1,
          "group_by_gender" : {
            "doc_count_error_upper_bound" : 0,
            "sum_other_doc_count" : 0,
            "buckets" : [
              { "key" : "woman", "doc_count" : 1, "average_balance" : { "value" : 78.0 } }
            ]
          }
        }
      ]
    }
  }
}
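The three-level result above can be reproduced on the sample data with a short Python sketch. It assumes the usual range semantics, where from is inclusive and to is exclusive, so an age of 30 would fall into the 30-40 bucket. The function name range_gender_average is hypothetical and the code only illustrates the bucketing logic.

```python
from collections import defaultdict

# Sample documents copied from the article.
DOCS = [
    {"name": "secisland", "age": 25, "state": "open", "gender": "woman", "balance": 87},
    {"name": "zhangsan", "age": 32, "state": "close", "gender": "man", "balance": 95},
    {"name": "zhangsan1", "age": 33, "state": "close", "gender": "man", "balance": 91},
    {"name": "lisi", "age": 34, "state": "open", "gender": "woman", "balance": 99},
    {"name": "wangwu", "age": 46, "state": "close", "gender": "woman", "balance": 78},
]

# Same brackets as the "ranges" array in the request body.
RANGES = [(20, 30), (30, 40), (40, 50)]


def range_gender_average(docs, ranges):
    """Bucket by age range, then by gender, then average the balance (hypothetical helper)."""
    result = {}
    for low, high in ranges:
        by_gender = defaultdict(list)
        for doc in docs:
            # "from" is inclusive, "to" is exclusive.
            if low <= doc["age"] < high:
                by_gender[doc["gender"]].append(doc["balance"])
        result[f"{low}-{high}"] = {
            gender: sum(values) / len(values) for gender, values in by_gender.items()
        }
    return result


nested = range_gender_average(DOCS, RANGES)
print(nested)
# {'20-30': {'woman': 87.0}, '30-40': {'man': 93.0, 'woman': 99.0}, '40-50': {'woman': 78.0}}
```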
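As the examples above show, Elasticsearch's aggregation capabilities are very powerful.
secisland (賽克藍德) will continue to analyze the features of the latest Elasticsearch release step by step. Stay tuned, and feel free to follow the secisland public account.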