ElasticSearch-query-match-phrase: Phrase Query
2025-01-22 08:19:30

ElasticSearch version: 6.5.0 (official documentation)

Phrase query


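A phrase query (match_phrase query) first analyzes the query text into terms; the analyzer can be specified.

A document must also satisfy two conditions:

  1. Every term produced by the analysis appears in the field.
  2. The terms appear in the field in the same order as in the query.

By default, match_phrase matches the query phrase exactly: the terms and their order must be identical (punctuation aside).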
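
The two conditions above can be sketched in a few lines of Python. This is only an illustration of the matching rule, not how Elasticsearch implements it: `tokenize` is a hypothetical stand-in for a real analyzer (it lowercases and splits on non-word characters), and `phrase_match` is a name made up for this sketch.

```python
import re

def tokenize(text):
    """Hypothetical stand-in for an analyzer: lowercase word tokens, punctuation dropped."""
    return re.findall(r"\w+", text.lower())

def phrase_match(field_text, query_text):
    """Return True if the query terms occur in the field consecutively and in order."""
    field = tokenize(field_text)
    query = tokenize(query_text)
    n = len(query)
    if n == 0:
        return False
    # Slide a window of the query length over the field tokens.
    return any(field[i:i + n] == query for i in range(len(field) - n + 1))

docs = [
    "java spark are very related",   # terms adjacent and in order
    "spark are java very related",   # both terms present, wrong order
    "Java, Spark: related!",         # punctuation and case are ignored
]
results = [phrase_match(d, "java spark") for d in docs]
print(results)  # [True, False, True]
```

A real analyzer may also drop stop words or split text differently, which changes the term positions this check relies on.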


1. Test data

Insert a new batch of test documents:

POST /bookdatas/bookType
{
  "author": "cyx",
  "price": 11,
  "publish": "工业出版社",
  "name": "cyx",
  "type": "大学教材",
  "info": "java is good language, and also spark is very good."
}

POST /bookdatas/bookType
{
  "author": "cyx",
  "price": 11,
  "publish": "工业出版社",
  "name": "cyx",
  "type": "大学教材",
  "info": "java spark are very related, because scala is spark's programming language and scala is also based on jvm like java"
}

POST /bookdatas/bookType
{
  "author": "cyx",
  "price": 11,
  "publish": "工业出版社",
  "name": "cyx",
  "type": "大学教材",
  "info": "java are spark very related, because scala is spark's programming language and scala is also based on jvm like java"
}

POST /bookdatas/bookType
{
  "author": "cyx1",
  "price": 12,
  "publish": "工业出版社",
  "name": "cyx1",
  "type": "大学教材",
  "info": "spark are java very related, because scala is spark's programming language and scala is also based on jvm like java"
}

2. DSL

GET /bookdatas/bookType/_search
{
  "query": {
    "match_phrase": {
      "info": {
        "query": "java spark",
        "analyzer": "ik_max_word"
      }
    }
  },
  "_source": "info"
}

Result set

"hits" : [
  {
    "_index" : "bookdatas",
    "_type" : "bookType",
    "_id" : "94YAD3MBUNaKM7SA-jQU",
    "_score" : 15.382665,
    "_source" : {
      "info" : "java spark are very related, because scala is spark's programming language and scala is also based on jvm like java"
    }
  },
  {
    "_index" : "bookdatas",
    "_type" : "bookType",
    "_id" : "-IYBD3MBUNaKM7SAAzQJ",
    "_score" : 12.4879465,
    "_source" : {
      "info" : "java are spark very related, because scala is spark's programming language and scala is also based on jvm like java"
    }
  }
]

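The info field of each returned document contains both terms, "java" and "spark". The second hit reads "java are spark"; it presumably still matches because the analyzer applied to the field drops the stop word "are", which leaves the two terms adjacent.

The query also accepts an analyzer parameter to specify which analyzer to use.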


2.1. slop

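As noted above, match_phrase matches exactly by default, which is rather strict for everyday use.

For example, the first test document is not retrieved, because the terms java and spark are too far apart in it.

The slop parameter tells a match_phrase query how many positions apart the query terms may be while the document still counts as a match.

Take the first test document as an example: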

[Figure: the tokens of the first test document (java | is | good | language | and | also | spark) above successive rows showing the query terms java and spark.]
GET /bookdatas/_search
{
  "query": {
    "match_phrase": {
      "info": {
        "query": "java spark",
        "analyzer": "ik_max_word",
        "slop": 7
      }
    }
  }
}
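
To get a feel for how large slop has to be, here is a rough Python model of the position arithmetic. It is a simplified sketch, not Lucene's actual scorer: `tokenize` is a hypothetical analyzer stand-in that keeps stop words, `min_slop` and `sloppy_match` are names invented here, and repeated query terms are not handled specially. For each query term it compares the term's position in the field with its position in the query and takes the spread of those offsets as the slop needed.

```python
import re
from itertools import product

def tokenize(text):
    # Hypothetical analyzer stand-in: lowercase word tokens, stop words kept.
    return re.findall(r"\w+", text.lower())

def min_slop(field_text, query_text):
    """Smallest slop at which the phrase would match, or None if a term is missing."""
    field = tokenize(field_text)
    query = tokenize(query_text)
    # Every position at which each query term occurs in the field.
    positions = [[i for i, tok in enumerate(field) if tok == term] for term in query]
    if not query or any(not p for p in positions):
        return None
    best = None
    for combo in product(*positions):
        # Offset between a term's position in the field and in the query.
        offsets = [d - q for q, d in enumerate(combo)]
        needed = max(offsets) - min(offsets)
        if best is None or needed < best:
            best = needed
    return best

def sloppy_match(field_text, query_text, slop=0):
    needed = min_slop(field_text, query_text)
    return needed is not None and needed <= slop

doc = "java is good language, and also spark is very good."
print(min_slop(doc, "java spark"))           # 5
print(sloppy_match(doc, "java spark"))       # False
print(sloppy_match(doc, "java spark", 7))    # True
print(min_slop("spark java", "java spark"))  # 2
```

Under this model the first test document needs a slop of at least 5, so the value 7 used in the query is enough. The last line shows that two adjacent terms in reversed order cost a slop of 2. With an analyzer that removes stop words the positions, and therefore the numbers, would differ.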

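3. Difference between match and match_phrase

A match query splits the input into terms and looks each one up in the inverted index; with the default operator (or), a document is returned as soon as it matches any one of those terms.

When the search text should not be treated as independent terms, use a match_phrase query instead; it treats the given phrase as one complete search condition.

When searching with match_phrase, every document in the result set must contain the phrase you specified.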
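
The contrast can be illustrated with a small Python sketch. It only mimics the matching rules described above and ignores scoring; `tokenize`, `match_or` and `match_phrase` are hypothetical helpers written for this example, with `tokenize` standing in for a real analyzer.

```python
import re

def tokenize(text):
    # Hypothetical analyzer stand-in: lowercase word tokens.
    return re.findall(r"\w+", text.lower())

def match_or(field_text, query_text):
    """Sketch of match with the default "or" operator: one shared term is enough."""
    return bool(set(tokenize(field_text)) & set(tokenize(query_text)))

def match_phrase(field_text, query_text):
    """Sketch of match_phrase with slop 0: all terms, adjacent and in order."""
    field, query = tokenize(field_text), tokenize(query_text)
    n = len(query)
    return n > 0 and any(field[i:i + n] == query for i in range(len(field) - n + 1))

docs = [
    "java spark are very related",
    "spark are java very related",
    "scala is also based on jvm like java",
    "scala is a programming language",
]
# One (match, match_phrase) pair per document for the query "java spark".
table = [(match_or(d, "java spark"), match_phrase(d, "java spark")) for d in docs]
for row, d in zip(table, docs):
    print(row, d)
```

Only the first document passes both checks; the second and third are hits for match alone because they contain at least one of the terms, and the fourth contains neither.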


4. Reference