Elasticsearch 聚合计算
聚合框架用于收集搜索查询选择的所有数据。该框架由许多构建块组成,有助于构建复杂的数据摘要
下面的 JSON 对象使用聚合函数的一般请求正文格式
"aggregations" : { "<aggregation_name>" : { "<aggregation_type>" : { <aggregation_body> } [,"meta" : { [<meta_data_body>] } ]? [,"aggregations" : { [<sub_aggregation>]+ } ]? } }
Elasticsearch 提供了大量的聚合函数,它们都有各自不同的目的
矩阵聚合 ( Metrics )
这些聚合函数可以根据聚合文档的字段值计算度量值,而且有时可以从脚本生成一些值
数字矩阵既可以是单值,也可以是平均聚合或多值统计等
平均数聚合 ( avg )
该聚合函数用于计算文档中出现的任何数字字段的平均值
例如
POST http://localhost:9200/user_admin/_search?pretty
请求正文
{ "aggs":{ "avg_money":{"avg":{"field":"money"}} } }
返回响应结果
{ "took" : 160, "timed_out" : false, "_shards" : { "total" : 5, "successful" : 5, "skipped" : 0, "failed" : 0 }, "hits" : { "total" : 3, "max_score" : 1.0, "hits" : [ { "_index" : "user_admin", "_type" : "user", "_id" : "2", "_score" : 1.0, "_source" : { "nickname" : "雅少", "description" : "虚怀若谷", "street" : "四川大学", "city" : "Chengdu", "state" : "Sichuan", "zip" : "610044", "location" : [ 104.094537, 30.640174 ], "money" : 68023, "tags" : [ "Python", "HTML" ], "vitality" : "7.8" } }, { "_index" : "user_admin", "_type" : "user", "_id" : "1", "_score" : 1.0, "_source" : { "nickname" : "语飞", "description" : "简单教程,简单编程", "street" : "东四十条", "city" : "Beijing", "state" : "Beijing", "zip" : "100007", "location" : [ 116.432727, 39.937732 ], "money" : 5201814, "tags" : [ "PHP", "Python" ], "vitality" : "9.0" } }, { "_index" : "user_admin", "_type" : "user", "_id" : "3", "_score" : 1.0, "_source" : { "nickname" : "歌者", "description" : "程序设计也是设计,研发新菜也是研发", "street" : "五道口", "city" : "Beijing", "state" : "Beijing", "zip" : "100083", "location" : [ 116.346346, 39.999333 ], "money" : 71128, "tags" : [ "Java", "Scala" ], "vitality" : "6.9" } } ] }, "aggregations" : { "avg_money" : { "value" : 1780321.6666666667 } } }
如果一个或多个聚合文档中不存在此值,默认情况下它们会被忽略
我们可以在聚合中添加缺失字段来设置缺失字段的默认值
{ "aggs":{ "avg_money":{ "avg":{ "field":"money" "missing":0 } } } }
基数聚合 ( cardinality )
基数聚合 ( cardinality ) 用于计算特定字段的不同值的计数
例如
POST http://localhost:9200/user*/_search?pretty
请求正文
{ "aggs":{ "distinct_nickname_count":{"cardinality":{"field":"nickname"}} } }
响应内容
{ "error" : { "root_cause" : [ { "type" : "illegal_argument_exception", "reason" : "Fielddata is disabled on text fields by default. Set fielddata=true on [nickname] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory. Alternatively use a keyword field instead." } ], "type" : "search_phase_execution_exception", "reason" : "all shards failed", "phase" : "query", "grouped" : true, "failed_shards" : [ { "shard" : 0, "index" : "user", "node" : "4zwAMlTzRCaioBeOE9PaNw", "reason" : { "type" : "illegal_argument_exception", "reason" : "Fielddata is disabled on text fields by default. Set fielddata=true on [nickname] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory. Alternatively use a keyword field instead." } } ], "caused_by" : { "type" : "illegal_argument_exception", "reason" : "Fielddata is disabled on text fields by default. Set fielddata=true on [nickname] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory. Alternatively use a keyword field instead.", "caused_by" : { "type" : "illegal_argument_exception", "reason" : "Fielddata is disabled on text fields by default. Set fielddata=true on [nickname] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory. Alternatively use a keyword field instead." } } }, "status" : 400 }
很明显响应出错了,提示 Fielddata
要单独加载
好吧,那我们先运行下面的请求来修改下
PUT http://localhost:9200/user*/_mapping/user
请求正文
{ "properties": { "nickname": { "type": "text", "fielddata": true } } }
响应内容
{"acknowledged":true}
然后重新发起刚刚报错的请求,响应如下
{ "took" : 186, "timed_out" : false, "_shards" : { "total" : 10, "successful" : 10, "skipped" : 0, "failed" : 0 }, "hits" : { "total" : 5, "max_score" : 1.0, "hits" : [ { "_index" : "user", "_type" : "user", "_id" : "2", "_score" : 1.0, "_source" : { "nickname" : "枫晚", "description" : "停车坐爰枫林晚", "street" : "苏州大学", "city" : "Suzhou", "state" : "Jiangsu", "zip" : "215006", "location" : [ 120.65426, 31.30797 ], "money" : 10235, "tags" : [ "Java", "Android" ], "vitality" : "3.5" } }, { "_index" : "user_admin", "_type" : "user", "_id" : "2", "_score" : 1.0, "_source" : { "nickname" : "雅少", "description" : "虚怀若谷", "street" : "四川大学", "city" : "Chengdu", "state" : "Sichuan", "zip" : "610044", "location" : [ 104.094537, 30.640174 ], "money" : 68023, "tags" : [ "Python", "HTML" ], "vitality" : "7.8" } }, { "_index" : "user", "_type" : "user", "_id" : "1", "_score" : 1.0, "_source" : { "nickname" : "question", "description" : "问题少年也是少年", "street" : "张江高科技园区", "city" : "Shanghai", "state" : "Shanghai", "zip" : "201204", "location" : [ 121.60632, 31.199305 ], "money" : 13648, "tags" : [ "VUE", "HTML" ], "vitality" : "8.8" } }, { "_index" : "user_admin", "_type" : "user", "_id" : "1", "_score" : 1.0, "_source" : { "nickname" : "语飞", "description" : "简单教程,简单编程", "street" : "东四十条", "city" : "Beijing", "state" : "Beijing", "zip" : "100007", "location" : [ 116.432727, 39.937732 ], "money" : 5201814, "tags" : [ "PHP", "Python" ], "vitality" : "9.0" } }, { "_index" : "user_admin", "_type" : "user", "_id" : "3", "_score" : 1.0, "_source" : { "nickname" : "歌者", "description" : "程序设计也是设计,研发新菜也是研发", "street" : "五道口", "city" : "Beijing", "state" : "Beijing", "zip" : "100083", "location" : [ 116.346346, 39.999333 ], "money" : 71128, "tags" : [ "Java", "Scala" ], "vitality" : "6.9" } } ] }, "aggregations" : { "distinct_nickname_count" : { "value" : 9 } } }
扩展统计聚合 ( extended_stats )
此聚合用于生成有关聚合文档中特定数字字段的所有统计信息
例如
POST http://localhost:9200/user_admin/user/_search?pretty
请求正文
{ "aggs" : { "money_stats" : { "extended_stats" : { "field" : "money" } } } }
响应内容
{ "took" : 12, "timed_out" : false, "_shards" : { "total" : 5, "successful" : 5, "skipped" : 0, "failed" : 0 }, "hits" : { "total" : 3, "max_score" : 1.0, "hits" : [ { "_index" : "user_admin", "_type" : "user", "_id" : "2", "_score" : 1.0, "_source" : { "nickname" : "雅少", "description" : "虚怀若谷", "street" : "四川大学", "city" : "Chengdu", "state" : "Sichuan", "zip" : "610044", "location" : [ 104.094537, 30.640174 ], "money" : 68023, "tags" : [ "Python", "HTML" ], "vitality" : "7.8" } }, { "_index" : "user_admin", "_type" : "user", "_id" : "1", "_score" : 1.0, "_source" : { "nickname" : "语飞", "description" : "简单教程,简单编程", "street" : "东四十条", "city" : "Beijing", "state" : "Beijing", "zip" : "100007", "location" : [ 116.432727, 39.937732 ], "money" : 5201814, "tags" : [ "PHP", "Python" ], "vitality" : "9.0" } }, { "_index" : "user_admin", "_type" : "user", "_id" : "3", "_score" : 1.0, "_source" : { "nickname" : "歌者", "description" : "程序设计也是设计,研发新菜也是研发", "street" : "五道口", "city" : "Beijing", "state" : "Beijing", "zip" : "100083", "location" : [ 116.346346, 39.999333 ], "money" : 71128, "tags" : [ "Java", "Scala" ], "vitality" : "6.9" } } ] }, "aggregations" : { "money_stats" : { "count" : 3, "min" : 68023.0, "max" : 5201814.0, "avg" : 1780321.6666666667, "sum" : 5340965.0, "sum_of_squares" : 2.7068555211509E13, "variance" : 5.853306500366889E12, "std_deviation" : 2419360.762756743, "std_deviation_bounds" : { "upper" : 6619043.192180153, "lower" : -3058399.858846819 } } } }
最大值聚合 ( max )
最大值聚合用于查找聚合文档中特定数字字段的最大值
例如
POST http://localhost:9200/user*/_search
请求正文
{ "aggs" : { "max_money" : { "max" : { "field" : "money" } } } }
响应内容
{ "took" : 22, "timed_out" : false, "_shards" : { "total" : 5, "successful" : 5, "skipped" : 0, "failed" : 0 }, "hits" : { "total" : 3, "max_score" : 1.0, "hits" : [ { "_index" : "user_admin", "_type" : "user", "_id" : "2", "_score" : 1.0, "_source" : { "nickname" : "雅少", "description" : "虚怀若谷", "street" : "四川大学", "city" : "Chengdu", "state" : "Sichuan", "zip" : "610044", "location" : [ 104.094537, 30.640174 ], "money" : 68023, "tags" : [ "Python", "HTML" ], "vitality" : "7.8" } }, { "_index" : "user_admin", "_type" : "user", "_id" : "1", "_score" : 1.0, "_source" : { "nickname" : "语飞", "description" : "简单教程,简单编程", "street" : "东四十条", "city" : "Beijing", "state" : "Beijing", "zip" : "100007", "location" : [ 116.432727, 39.937732 ], "money" : 5201814, "tags" : [ "PHP", "Python" ], "vitality" : "9.0" } }, { "_index" : "user_admin", "_type" : "user", "_id" : "3", "_score" : 1.0, "_source" : { "nickname" : "歌者", "description" : "程序设计也是设计,研发新菜也是研发", "street" : "五道口", "city" : "Beijing", "state" : "Beijing", "zip" : "100083", "location" : [ 116.346346, 39.999333 ], "money" : 71128, "tags" : [ "Java", "Scala" ], "vitality" : "6.9" } } ] }, "aggregations" : { "max_money" : { "value" : 5201814.0 } } }
最小值聚合 ( min )
最小值聚合用于查找聚合文档中特定数字字段的最小值
例如
POST http://localhost:9200/user*/_search?pretty
请求正文
{ "aggs" : { "min_money" : { "min" : { "field" : "money" } } } }
响应内容
{ "took" : 20, "timed_out" : false, "_shards" : { "total" : 10, "successful" : 10, "skipped" : 0, "failed" : 0 }, "hits" : { "total" : 5, "max_score" : 1.0, "hits" : [ { "_index" : "user", "_type" : "user", "_id" : "2", "_score" : 1.0, "_source" : { "nickname" : "枫晚", "description" : "停车坐爰枫林晚", "street" : "苏州大学", "city" : "Suzhou", "state" : "Jiangsu", "zip" : "215006", "location" : [ 120.65426, 31.30797 ], "money" : 10235, "tags" : [ "Java", "Android" ], "vitality" : "3.5" } }, { "_index" : "user_admin", "_type" : "user", "_id" : "2", "_score" : 1.0, "_source" : { "nickname" : "雅少", "description" : "虚怀若谷", "street" : "四川大学", "city" : "Chengdu", "state" : "Sichuan", "zip" : "610044", "location" : [ 104.094537, 30.640174 ], "money" : 68023, "tags" : [ "Python", "HTML" ], "vitality" : "7.8" } }, { "_index" : "user", "_type" : "user", "_id" : "1", "_score" : 1.0, "_source" : { "nickname" : "question", "description" : "问题少年也是少年", "street" : "张江高科技园区", "city" : "Shanghai", "state" : "Shanghai", "zip" : "201204", "location" : [ 121.60632, 31.199305 ], "money" : 13648, "tags" : [ "VUE", "HTML" ], "vitality" : "8.8" } }, { "_index" : "user_admin", "_type" : "user", "_id" : "1", "_score" : 1.0, "_source" : { "nickname" : "语飞", "description" : "简单教程,简单编程", "street" : "东四十条", "city" : "Beijing", "state" : "Beijing", "zip" : "100007", "location" : [ 116.432727, 39.937732 ], "money" : 5201814, "tags" : [ "PHP", "Python" ], "vitality" : "9.0" } }, { "_index" : "user_admin", "_type" : "user", "_id" : "3", "_score" : 1.0, "_source" : { "nickname" : "歌者", "description" : "程序设计也是设计,研发新菜也是研发", "street" : "五道口", "city" : "Beijing", "state" : "Beijing", "zip" : "100083", "location" : [ 116.346346, 39.999333 ], "money" : 71128, "tags" : [ "Java", "Scala" ], "vitality" : "6.9" } } ] }, "aggregations" : { "min_money" : { "value" : 10235.0 } } }
求和聚合 ( sum )
求和聚合 ( sum ) 用于计算聚合文档中特定数字字段的总和
例如
POST http://localhost:9200/user*/_search?pretty
请求正文
{ "aggs" : { "total_money" : { "sum" : { "field" : "money" } } } }
返回响应
{ "took" : 10, "timed_out" : false, "_shards" : { "total" : 10, "successful" : 10, "skipped" : 0, "failed" : 0 }, "hits" : { "total" : 5, "max_score" : 1.0, "hits" : [ { "_index" : "user", "_type" : "user", "_id" : "2", "_score" : 1.0, "_source" : { "nickname" : "枫晚", "description" : "停车坐爰枫林晚", "street" : "苏州大学", "city" : "Suzhou", "state" : "Jiangsu", "zip" : "215006", "location" : [ 120.65426, 31.30797 ], "money" : 10235, "tags" : [ "Java", "Android" ], "vitality" : "3.5" } }, { "_index" : "user_admin", "_type" : "user", "_id" : "2", "_score" : 1.0, "_source" : { "nickname" : "雅少", "description" : "虚怀若谷", "street" : "四川大学", "city" : "Chengdu", "state" : "Sichuan", "zip" : "610044", "location" : [ 104.094537, 30.640174 ], "money" : 68023, "tags" : [ "Python", "HTML" ], "vitality" : "7.8" } }, { "_index" : "user", "_type" : "user", "_id" : "1", "_score" : 1.0, "_source" : { "nickname" : "question", "description" : "问题少年也是少年", "street" : "张江高科技园区", "city" : "Shanghai", "state" : "Shanghai", "zip" : "201204", "location" : [ 121.60632, 31.199305 ], "money" : 13648, "tags" : [ "VUE", "HTML" ], "vitality" : "8.8" } }, { "_index" : "user_admin", "_type" : "user", "_id" : "1", "_score" : 1.0, "_source" : { "nickname" : "语飞", "description" : "简单教程,简单编程", "street" : "东四十条", "city" : "Beijing", "state" : "Beijing", "zip" : "100007", "location" : [ 116.432727, 39.937732 ], "money" : 5201814, "tags" : [ "PHP", "Python" ], "vitality" : "9.0" } }, { "_index" : "user_admin", "_type" : "user", "_id" : "3", "_score" : 1.0, "_source" : { "nickname" : "歌者", "description" : "程序设计也是设计,研发新菜也是研发", "street" : "五道口", "city" : "Beijing", "state" : "Beijing", "zip" : "100083", "location" : [ 116.346346, 39.999333 ], "money" : 71128, "tags" : [ "Java", "Scala" ], "vitality" : "6.9" } } ] }, "aggregations" : { "total_money" : { "value" : 5364848.0 } } }
此外,还存在一些其它聚合函数用于计算地理位置,如地理边界聚合和地理质心聚合
批量聚合 ( Bucket )
这些聚合包含了许多具有统一标准的不同类型的桶聚合,它们用于确定文档是否应该属于某个桶。
下面我们将会罗列这些桶聚合
子聚合
批量聚合会生成一组文档,这些文档将映射到父桶中
参数 type
用于定义父索引
例如,假如我们有一个品牌及其不同的模型,然后模型类型将包含以下 _parent
字段
{ "model" : { "_parent" : { "type" : "brand" } } }
还有很多其它的特殊的批量集合,在某些特定的情况下很好用,我们罗列如下
- Date Histogram 聚合
- Date Range 聚合
- Filter 聚合
- Filters 聚合
- Geo Distance 聚合
- GeoHash grid 聚合
- Global 聚合
- Histogram 聚合
- IPv4 Range 聚合
- Missing 聚合
- Nested 聚合
- Range 聚合
- Reverse nested 聚合
- Sampler 聚合
- Significant Terms 聚合
- Terms 聚合
聚合元数据
可以在请求时使用 meta
参数添加关于聚合的一些数据,然后就可以在响应时获取到这些数据
POST http://localhost:9200/user*/report/_search?pretty
请求正文
{ "aggs" : { "min_money" : { "avg" : { "field" : "money" } , "meta" :{"dsc" :"Lowest Moneys"} } } }
响应内容
{ "took" : 30, "timed_out" : false, "_shards" : { "total" : 10, "successful" : 10, "skipped" : 0, "failed" : 0 }, "hits" : { "total" : 0, "max_score" : null, "hits" : [ ] }, "aggregations" : { "min_money" : { "meta" : { "dsc" : "Lowest Moneys" }, "value" : null } } }