
Elasticsearch.Nest Tutorial Series 8, Aggregations: Writing Aggregations

source link: https://blog.zhuliang.ltd/2020/01/backend/Elasticsearch-Nest-Writing-Aggregations.html

Created: 2020-01-22 13:18:01 | Updated: 2020-01-23 12:15:44


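You can loosely think of aggregations in ES as the counterpart of aggregate functions in SQL Server (such as SUM and COUNT). Aggregations can be nested, and they let you find the maximum, minimum, and average of a field, sum a field, and build more complex summaries of your data.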

ES also introduces the concept of buckets. A bucket is roughly the equivalent of a group in SQL Server (GROUP BY); in other words, what SQL calls GROUP BY, ES calls "bucketing".
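To make the GROUP BY analogy and the nesting concrete, here is a minimal sketch of a terms (bucket) aggregation with an average nested inside each bucket. It is an illustration under assumptions, not part of the original article: it assumes NEST 7.x, a node at http://localhost:9200, an index named project, and a name field that has a keyword sub-field (as the default dynamic mapping would create). The aggregation name by_name is made up for this example.

```csharp
using System;
using Nest;

public class Project
{
    public string Name { get; set; }
    public int Quantity { get; set; }
}

public static class Program
{
    public static void Main()
    {
        // Assumption: a local single-node cluster and an index called "project".
        var settings = new ConnectionSettings(new Uri("http://localhost:9200"))
            .DefaultIndex("project");
        var client = new ElasticClient(settings);

        var response = client.Search<Project>(s => s
            .Size(0) // only the aggregation results are needed
            .Aggregations(aggs => aggs
                // Bucket aggregation: one bucket per distinct name (like GROUP BY name).
                // Assumption: "name" has a "keyword" sub-field.
                .Terms("by_name", t => t
                    .Field(p => p.Name.Suffix("keyword"))
                    // Nested metric aggregation, computed separately inside each bucket.
                    .Aggregations(child => child
                        .Average("average_quantity", avg => avg.Field(p => p.Quantity))
                    )
                )
            )
        );

        if (!response.IsValid)
        {
            Console.WriteLine(response.DebugInformation);
            return;
        }

        var byName = response.Aggregations.Terms("by_name");
        if (byName == null)
        {
            return;
        }

        foreach (var bucket in byName.Buckets)
        {
            var average = bucket.Average("average_quantity");
            Console.WriteLine($"{bucket.Key}: {bucket.DocCount} docs, average quantity {average?.Value}");
        }
    }
}
```

Each bucket exposes its own sub-aggregations, which is why the average is read from the bucket rather than from the top-level response.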

For more on aggregations in Elasticsearch, see the Elasticsearch aggregations documentation.

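Nest gives you 3 ways to use aggregations:

  • With lambda expressions.
  • With the built-in request object AggregationDictionary.
  • By combining aggregations with binary operators, which simplifies the use of AggregationDictionary.

Assume the following Project class: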

public class Project
{
    public string Name { get; set; }
    public int Quantity { get; set; }
}

All three approaches send the following request:

POST /project/_search?typed_keys=true
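{
    "aggs": { // keyword "aggregations"; "aggs" is the short form
        "average_quantity": { // name of the aggregation
            "avg": {  // aggregation type, comparable to an aggregate function in SQL Server
                "field": "quantity"  // aggregation body: which field to aggregate on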
            }
        },
        "max_quantity": {
            "max": {
                "field": "quantity"
            }
        },
        "min_quantity": {
            "min": {
                "field": "quantity"
            }
        }
    }
}

The lambda approach

Lambda expressions are a concise way to write aggregations:

var searchResponse = _client.Search<Project>(s => s
    .Aggregations(aggs => aggs
        .Average("average_quantity", avg => avg.Field(p => p.Quantity))
        .Max("max_quantity", max => max.Field(p => p.Quantity))
        .Min("min_quantity", min => min.Field(p => p.Quantity))
    )
);

The response looks like this:

{
    "took": 2,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 6,
            "relation": "eq"
        },
        "max_score": 1.0,
        "hits": [
            {
                "_index": "project",
                "_type": "_doc",
                "_id": "1",
                "_score": 1.0,
                "_source": {
                    "name": "Emma",
                    "quantity": 1
                }
            },
            {
                "_index": "project",
                "_type": "_doc",
                "_id": "2",
                "_score": 1.0,
                "_source": {
                    "name": "Tran",
                    "quantity": 2
                }
            },
            {
                "_index": "project",
                "_type": "_doc",
                "_id": "3",
                "_score": 1.0,
                "_source": {
                    "name": "Lucy",
                    "quantity": 3
                }
            },
            {
                "_index": "project",
                "_type": "_doc",
                "_id": "4",
                "_score": 1.0,
                "_source": {
                    "name": "Geo",
                    "quantity": 4
                }
            },
            {
                "_index": "project",
                "_type": "_doc",
                "_id": "5",
                "_score": 1.0,
                "_source": {
                    "name": "Luby",
                    "quantity": 5
                }
            },
            {
                "_index": "project",
                "_type": "_doc",
                "_id": "6",
                "_score": 1.0,
                "_source": {
                    "name": "Han",
                    "quantity": 6
                }
            }
        ]
    },
    "aggregations": {
        "avg#average_quantity": {
            "value": 3.5
        },
        "max#max_quantity": {
            "value": 6.0
        },
        "min#min_quantity": {
            "value": 1.0
        }
    }
}

Aggregation queries usually do not need the _source contents, so when you run one you can set size=0 on the query and only the aggregation results are returned:

var searchResponse = _client.Search<Project>(s => s
    .Size(0)  // explicitly set to 0
    .Aggregations(aggs => aggs
        .Average("average_quantity", avg => avg.Field(p => p.Quantity))
        .Max("max_quantity", max => max.Field(p => p.Quantity))
        .Min("min_quantity", min => min.Field(p => p.Quantity))
    )
);

With this change, the response becomes:

{
    "took": 3,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 6,
            "relation": "eq"
        },
        "max_score": null,
        "hits": []
    },
    "aggregations": {
        "avg#average_quantity": {
            "value": 3.5
        },
        "max#max_quantity": {
            "value": 6.0
        },
        "min#min_quantity": {
            "value": 1.0
        }
    }
}

Using the built-in AggregationDictionary object

The following code has the same effect as the lambda version:

var searchRequest = new SearchRequest<Project>
{
    Size = 0,
    Aggregations = new AggregationDictionary
    {
        {"average_quantity", new AverageAggregation("average_quantity", "quantity")},
        {"max_quantity", new MaxAggregation("max_quantity", "quantity")},
        {"min_quantity", new MinAggregation("min_quantity", "quantity")},
    }
};
var searchResponse = _client.Search<Project>(searchRequest);

  • This approach is less readable.

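Simplifying AggregationDictionary with binary operators

Combining aggregations with the && operator makes the code more readable. The following code is equivalent to the one above: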

var searchRequest = new SearchRequest<Project>
{
    Size = 0,
    Aggregations = new AverageAggregation("average_quantity", "quantity")
        && new MaxAggregation("max_quantity", "quantity")
        && new MinAggregation("min_quantity", "quantity")
};
var searchResponse = _client.Search<Project>(searchRequest);

Reading the results

The .Aggregations property of the response model gives you access to the aggregation results.
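As an illustration of reading results through .Aggregations, here is a minimal sketch that runs the three metric aggregations from this article and reads each one back by name. It assumes NEST 7.x, a node at http://localhost:9200, and an index named project; those connection details are assumptions for the example, not something the original specifies.

```csharp
using System;
using Nest;

public class Project
{
    public string Name { get; set; }
    public int Quantity { get; set; }
}

public static class Program
{
    public static void Main()
    {
        // Assumption: a local single-node cluster and an index called "project".
        var settings = new ConnectionSettings(new Uri("http://localhost:9200"))
            .DefaultIndex("project");
        var client = new ElasticClient(settings);

        var searchResponse = client.Search<Project>(s => s
            .Size(0)
            .Aggregations(aggs => aggs
                .Average("average_quantity", avg => avg.Field(p => p.Quantity))
                .Max("max_quantity", max => max.Field(p => p.Quantity))
                .Min("min_quantity", min => min.Field(p => p.Quantity))
            )
        );

        if (!searchResponse.IsValid)
        {
            Console.WriteLine(searchResponse.DebugInformation);
            return;
        }

        // Each helper looks the aggregation up by the name used in the request.
        // The result is null when no aggregation with that name came back.
        ValueAggregate average = searchResponse.Aggregations.Average("average_quantity");
        ValueAggregate max = searchResponse.Aggregations.Max("max_quantity");
        ValueAggregate min = searchResponse.Aggregations.Min("min_quantity");

        // Value is a nullable double; it is null when no documents matched.
        Console.WriteLine($"average: {average?.Value}");
        Console.WriteLine($"max: {max?.Value}");
        Console.WriteLine($"min: {min?.Value}");
    }
}
```

With the six sample documents shown in the responses above, the values read back should correspond to the 3.5, 6.0, and 1.0 in the aggregations section of the JSON.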

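Reserved keywords

When naming aggregations, avoid the reserved keywords that NEST relies on when parsing aggregation responses. They include, but are not limited to:

  • "score"
  • "value_as_string"
  • "keys"
  • "max_score"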
