

Elasticsearch: customizing the relevance score algorithm with function_score
source link: https://wakzz.cn/2018/10/31/elasticsearch/%E7%94%A8function_score%E8%87%AA%E5%AE%9A%E4%B9%89%E7%9B%B8%E5%85%B3%E5%BA%A6%E5%88%86%E6%95%B0%E7%AE%97%E6%B3%95/

祈雨的博客 (Qiyu's blog)
2018-10-31
Reposted from CSDN. Original article: "Customizing the relevance score algorithm with function_score in ElasticSearch"
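- Search the fields tile and content for docs matching java spark
- Docs with a higher follower_num should score higher (the more people follow a post, the higher its score)
The function_score function:
- We can define a custom function_score function
- It combines the value of a field we choose with the score that Elasticsearch computes on its own
- The chosen field then boosts the score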
Add a follower count to every post:
POST /forum/article/_bulk
{ "update": { "_id": "1"} }
{ "doc" : {"follower_num" : 5} }
{ "update": { "_id": "2"} }
{ "doc" : {"follower_num" : 10} }
{ "update": { "_id": "3"} }
{ "doc" : {"follower_num" : 25} }
{ "update": { "_id": "4"} }
{ "doc" : {"follower_num" : 3} }
{ "update": { "_id": "5"} }
{ "doc" : {"follower_num" : 60} }
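The bulk body above is newline-delimited JSON: for each post, an action line followed by a partial-document line. As a minimal sketch, the same body could be generated in Python from a mapping of post IDs to follower counts. The helper name `build_bulk_update_body` and the `follower_counts` mapping are illustrative assumptions, not part of the original article, and the code only builds the body without contacting Elasticsearch.

```python
import json

def build_bulk_update_body(follower_counts):
    """Build an NDJSON body for the _bulk API from {doc_id: follower_num}.

    Hypothetical helper for illustration; it only builds the request body.
    """
    lines = []
    for doc_id, follower_num in follower_counts.items():
        # Action line: which document to update.
        lines.append(json.dumps({"update": {"_id": doc_id}}))
        # Partial document: the field to add or overwrite.
        lines.append(json.dumps({"doc": {"follower_num": follower_num}}))
    # The bulk API expects every line, including the last, to end with "\n".
    return "\n".join(lines) + "\n"

# Same follower counts as in the request above.
follower_counts = {"1": 5, "2": 10, "3": 25, "4": 3, "5": 60}
body = build_bulk_update_body(follower_counts)
print(body, end="")
```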
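- Combine the score obtained from searching the posts with follower_num, so that follower_num boosts a post's score to some degree
- The more people follow a post, the higher its score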
GET /forum/article/_search
{
  "query": {
    "function_score": {
      "query": {
        "multi_match": {
          "query": "java spark",
          "fields": ["tile", "content"]
        }
      },
      "field_value_factor": {
        "field": "follower_num",
        "modifier": "log1p",
        "factor": 0.5
      },
      "boost_mode": "sum",
      "max_boost": 2
    }
  }
}
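To make the placement of each parameter easier to see, here is a sketch that assembles the same request body as a Python dictionary. The function name `build_follower_boost_query` and its parameter names are hypothetical; the body is only serialized to JSON and is not sent to a cluster.

```python
import json

def build_follower_boost_query(text, fields, boost_field,
                               modifier="log1p", factor=0.5,
                               boost_mode="sum", max_boost=2):
    """Return a function_score search body mirroring the request above."""
    return {
        "query": {
            "function_score": {
                # The base query produces the ordinary relevance score.
                "query": {"multi_match": {"query": text, "fields": fields}},
                # The function derives a boost value from a numeric field.
                "field_value_factor": {
                    "field": boost_field,
                    "modifier": modifier,
                    "factor": factor,
                },
                # How the boost value is combined with the relevance score.
                "boost_mode": boost_mode,
                # Upper bound on the boost value.
                "max_boost": max_boost,
            }
        }
    }

body = build_follower_boost_query("java spark", ["tile", "content"], "follower_num")
print(json.dumps(body, indent=2))
```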
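- If field_value_factor specifies only field, every doc's score is multiplied directly by follower_num. A doc whose follower_num is 0 then ends up with a score of 0, which works poorly.
- For that reason a log1p modifier is usually added. The formula becomes new_score = old_score * log(1 + follower_num), which produces more reasonable scores.
- Adding factor adjusts the score further: new_score = old_score * log(1 + factor * follower_num)
- The two formulas above assume the default boost_mode, multiply. The query above uses sum, so the function value is added to old_score instead of multiplied.
- boost_mode decides how the query score is combined with the value computed from the field: multiply, sum, min, max, replace
- max_boost caps the value computed by the function so that it does not exceed the specified value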
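The arithmetic described above can be sketched in Python. This is a simplified model under stated assumptions, not Elasticsearch's implementation: it assumes the `log1p` modifier uses the base-10 logarithm (as the Elasticsearch documentation describes it), that `max_boost` caps the function value before it is combined with the query score, and it uses a made-up query score of 1.0 for every post. The function names `field_value_boost` and `combine` are hypothetical.

```python
import math

def field_value_boost(value, factor=0.5, max_boost=2.0):
    """Boost value of field_value_factor with the log1p modifier (simplified)."""
    # log1p modifier: log10(1 + factor * value); base 10 is an assumption.
    boost = math.log10(1 + factor * value)
    # max_boost caps the function value, not the final score.
    return min(boost, max_boost)

def combine(query_score, boost, boost_mode="sum"):
    """Combine the query score with the boost value according to boost_mode."""
    if boost_mode == "multiply":
        return query_score * boost
    if boost_mode == "sum":
        return query_score + boost
    if boost_mode == "min":
        return min(query_score, boost)
    if boost_mode == "max":
        return max(query_score, boost)
    if boost_mode == "replace":
        return boost
    raise ValueError(f"unsupported boost_mode: {boost_mode}")

# follower_num values from the bulk update; the query score 1.0 is made up.
follower_nums = {"1": 5, "2": 10, "3": 25, "4": 3, "5": 60}
scores = {
    doc_id: combine(1.0, field_value_boost(n), "sum")
    for doc_id, n in follower_nums.items()
}
for doc_id, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(doc_id, round(score, 3))
```

In this model the post with 60 followers ranks first. A post with follower_num 0 would get a boost of 0, so under `sum` it keeps its query score, while under `multiply` its score would still drop to 0.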