Let's Learn Elasticsearch Together (Part 7)


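Preface

I'm currently publishing a series of Elasticsearch tutorials. It will run to quite a few posts, so if you like it, give it a follow ❤️ ~


Picking up where the previous post left off, this one covers the conditional query operations that were left over from last time~


To make things easier to follow, all examples in this post reuse the index from the previous post. This post leans toward hands-on practice. Enough talk, let's get started~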


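Combining multiple conditions

bool

Elasticsearch uses bool to combine multiple query conditions. A bool query supports the following parameters:

  • must: a returned document must satisfy this condition
  • must_not: a returned document must not satisfy this condition
  • should: a returned document should satisfy this condition. should clauses are used to adjust the score of the results. Note that if the bool query has no must (or filter) clause, a document has to match at least one should clause. If a must clause is present, the should clauses do not have to match and serve only to adjust the score
  • filter: a returned document must satisfy this condition, but filter takes no part in scoring. It is used for filtering only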
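The interplay between must and should is easy to get wrong, so here is a minimal sketch of the selection rule in plain Python. It is a toy model under stated assumptions, not how Elasticsearch is implemented: clauses are ordinary predicates, scoring is ignored, and the helper names (`bool_matches`, `has_word`) and the document `k_doc` are made up for illustration.

```python
from typing import Callable, Dict, Optional, Sequence

Doc = Dict[str, object]
Clause = Callable[[Doc], bool]


def bool_matches(
    doc: Doc,
    must: Sequence[Clause] = (),
    must_not: Sequence[Clause] = (),
    should: Sequence[Clause] = (),
    filter_: Sequence[Clause] = (),
    minimum_should_match: Optional[int] = None,
) -> bool:
    """Toy model: is `doc` selected by a bool query? Scoring is ignored."""
    if minimum_should_match is None:
        # Default rule: at least one should clause has to match only when
        # there are should clauses and no must/filter clauses.
        minimum_should_match = 1 if should and not must and not filter_ else 0
    if not all(clause(doc) for clause in must):
        return False
    if not all(clause(doc) for clause in filter_):
        return False
    if any(clause(doc) for clause in must_not):
        return False
    matched_should = sum(1 for clause in should if clause(doc))
    return matched_should >= minimum_should_match


def has_word(word: str) -> Clause:
    """Hypothetical stand-in for a match clause on the name field."""
    return lambda doc: word in str(doc.get("name", "")).lower().split()


apple_doc = {"name": "I eat apple so haochi1~", "num": 1}
k_doc = {"name": "k", "num": 5}  # hypothetical document, for illustration only

# With a must clause, the should clause is optional.
print(bool_matches(apple_doc, must=[has_word("apple")], should=[has_word("k")]))
# Without must/filter, at least one should clause has to match.
print(bool_matches(apple_doc, should=[has_word("k")]))
print(bool_matches(k_doc, should=[has_word("k")]))
```

In a real query the same threshold can be set explicitly with the `minimum_should_match` parameter of the bool query.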


Here is an example of how to use it:

GET class_1/_search
{
  "query": {
    "bool": {
      "must": [
        {"match": {
          "name": "apple"
        }}
      ],
      "must_not": [
        {"term": {
          "num": {
            "value": "5"
          }
        }}
      ],
      "should": [
        {"match": {
          "name": "k"
        }}
      ],
      "filter": [
        {"range": {
          "num": {
            "gte": 0,
            "lte": 10
          }
        }}
      ]
    }
  }
}


Response:

{
  "took" : 9,
  "timed_out" : false,
  "_shards" : {
    "total" : 3,
    "successful" : 3,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 3,
      "relation" : "eq"
    },
    "max_score" : 0.752627,
    "hits" : [
      {
        "_index" : "class_1",
        "_type" : "_doc",
        "_id" : "b8fcCoYB090miyjed7YE",
        "_score" : 0.752627,
        "_source" : {
          "name" : "I eat apple so haochi1~",
          "num" : 1
        }
      },
      {
        "_index" : "class_1",
        "_type" : "_doc",
        "_id" : "ccfcCoYB090miyjed7YE",
        "_score" : 0.752627,
        "_source" : {
          "name" : "I eat apple so haochi3~",
          "num" : 1
        }
      },
      {
        "_index" : "class_1",
        "_type" : "_doc",
        "_id" : "cMfcCoYB090miyjed7YE",
        "_score" : 0.7389809,
        "_source" : {
          "name" : "I eat apple so zhen haochi2~",
          "num" : 1
        }
      }
    ]
  }
}


constant_score

A constant_score query assigns a fixed score through boost. Typically, constant_score is used in place of a bool query that contains only a filter.


Here is how to use it:

GET class_1/_search
{
  "query": {
    "constant_score": {
      "filter": {
        "term": {
          "num": 6
        }
      },
      "boost": 1.2
    }
  }
}


Response:

{
  "took" : 7,
  "timed_out" : false,
  "_shards" : {
    "total" : 3,
    "successful" : 3,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 1.2,
    "hits" : [
      {
        "_index" : "class_1",
        "_type" : "_doc",
        "_id" : "h2Fg-4UBECmbBdQA6VLg",
        "_score" : 1.2,
        "_source" : {
          "name" : "b",
          "num" : 6
        }
      },
      {
        "_index" : "class_1",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 1.2,
        "_source" : {
          "name" : "l",
          "num" : 6
        }
      }
    ]
  }
}


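Query validation & analysis

Validation

Elasticsearch exposes the /_validate/query endpoint to validate a query. Note that it checks whether the query itself is valid; it does not execute the search.


Example: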

GET class_1/_validate/query?explain
{
  "query": {
    "bool": {
      "must": [
        {"match": {
          "name": "apple"
        }}
      ]
    }
  }
}


Normal response:

{
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "failed" : 0
  },
  "valid" : true,
  "explanations" : [
    {
      "index" : "class_1",
      "valid" : true,
      "explanation" : "+name:apple"
    }
  ]
}


Now change the name field to name1 and validate again:

{
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "failed" : 0
  },
  "valid" : true,
  "explanations" : [
    {
      "index" : "class_1",
      "valid" : true,
      "explanation" : """+MatchNoDocsQuery("unmapped fields [name1]")"""
    }
  ]
}

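Note that valid is still true here, so no error is raised. The problem only shows up in the explanation: the query was rewritten to MatchNoDocsQuery because name1 is not a mapped field.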
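Because valid stays true for an unmapped field, a check that looks only at that flag would miss the problem. A minimal sketch of a stricter check follows. It works on the response as a plain Python dict (copied from the output above); the helper name `no_docs_explanations` is made up for illustration, and matching on the string "MatchNoDocsQuery" is a heuristic based on this output, not a documented contract.

```python
from typing import Dict, List


def no_docs_explanations(validate_response: Dict) -> List[str]:
    """Return the explanations that were rewritten to MatchNoDocsQuery."""
    return [
        entry["explanation"]
        for entry in validate_response.get("explanations", [])
        if "MatchNoDocsQuery" in entry.get("explanation", "")
    ]


# The response shown above for the name1 query, as a Python dict.
response = {
    "_shards": {"total": 1, "successful": 1, "failed": 0},
    "valid": True,
    "explanations": [
        {
            "index": "class_1",
            "valid": True,
            "explanation": '+MatchNoDocsQuery("unmapped fields [name1]")',
        }
    ],
}

suspicious = no_docs_explanations(response)
if response["valid"] and suspicious:
    print("query is valid but can never match:", suspicious)
```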


Analysis

Adding the explain parameter, as in /_validate/query?explain, shows how Elasticsearch interprets a query.


Example:

GET class_1/_validate/query?explain
{
  "query": {
    "bool": {
      "must": [
        {"match": {
          "name": "apple so"
        }}
      ]
    }
  }
}


Response:

{
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "failed" : 0
  },
  "valid" : true,
  "explanations" : [
    {
      "index" : "class_1",
      "valid" : true,
      "explanation" : "+(name:apple name:so)"
    }
  ]
}


The line "explanation" : "+(name:apple name:so)" shows that the query text apple so was analyzed into two terms, name:apple and name:so.


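Sorting

Default sorting

In the earlier examples, results are sorted by _score in descending order by default, so better matches come first. Computing _score costs query performance, though, which is easy to understand.


When _score is not needed, build the query with filter or constant_score instead.


filter example: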

GET class_1/_search
{
  "query": {
    "bool": {
      "filter": [
        {"term": {
          "num": 1
        }}
      ]
    }
  }
}


Response:

{
  "took" : 5,
  "timed_out" : false,
  "_shards" : {
    "total" : 3,
    "successful" : 3,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 3,
      "relation" : "eq"
    },
    "max_score" : 0.0,
    "hits" : [
      {
        "_index" : "class_1",
        "_type" : "_doc",
        "_id" : "b8fcCoYB090miyjed7YE",
        "_score" : 0.0,
        "_source" : {
          "name" : "I eat apple so haochi1~",
          "num" : 1
        }
      },
      {
        "_index" : "class_1",
        "_type" : "_doc",
        "_id" : "ccfcCoYB090miyjed7YE",
        "_score" : 0.0,
        "_source" : {
          "name" : "I eat apple so haochi3~",
          "num" : 1
        }
      },
      {
        "_index" : "class_1",
        "_type" : "_doc",
        "_id" : "cMfcCoYB090miyjed7YE",
        "_score" : 0.0,
        "_source" : {
          "name" : "I eat apple so zhen haochi2~",
          "num" : 1
        }
      }
    ]
  }
}


The _score of every hit is 0.0, which shows that no scoring took place.

constant_score example:

GET class_1/_search
{
  "query": {
    "constant_score": {
      "filter": {
        "term": {
          "num": 1
        }
      },
      "boost": 1.2
    }
  }
}


Response:

{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
    "total" : 3,
    "successful" : 3,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 3,
      "relation" : "eq"
    },
    "max_score" : 1.2,
    "hits" : [
      {
        "_index" : "class_1",
        "_type" : "_doc",
        "_id" : "b8fcCoYB090miyjed7YE",
        "_score" : 1.2,
        "_source" : {
          "name" : "I eat apple so haochi1~",
          "num" : 1
        }
      },
      {
        "_index" : "class_1",
        "_type" : "_doc",
        "_id" : "ccfcCoYB090miyjed7YE",
        "_score" : 1.2,
        "_source" : {
          "name" : "I eat apple so haochi3~",
          "num" : 1
        }
      },
      {
        "_index" : "class_1",
        "_type" : "_doc",
        "_id" : "cMfcCoYB090miyjed7YE",
        "_score" : 1.2,
        "_source" : {
          "name" : "I eat apple so zhen haochi2~",
          "num" : 1
        }
      }
    ]
  }
}

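Every hit gets the score specified by the boost property.


Custom sorting

Custom sorting covers most use cases. Elasticsearch uses the sort parameter to define the sort order. For a field, the default order is ascending; descending order has to be requested explicitly: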


  • Ascending
{"sort":["num"]}


  • Descending (desc means descending order)
{"sort":[{"num":{"order":"desc"}}]}


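Tips

  • Elasticsearch uses doc values, a columnar storage format, to implement sorting on fields
  • text fields do not have doc values, so sorting on a text field is not possible by default
  • Setting fielddata=true on a text field enables sorting on it, but this is not recommended: sorting on a text field is extremely expensive and rarely what you actually need


Single-field sorting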

GET class_1/_search
{
    "sort": [
        "num"
    ]
}


Response:

{
  "took" : 6,
  "timed_out" : false,
  "_shards" : {
    "total" : 3,
    "successful" : 3,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 11,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [
      {
        "_index" : "class_1",
        "_type" : "_doc",
        "_id" : "b8fcCoYB090miyjed7YE",
        "_score" : null,
        "_source" : {
          "name" : "I eat apple so haochi1~",
          "num" : 1
        },
        "sort" : [
          1
        ]
      },
      {
        "_index" : "class_1",
        "_type" : "_doc",
        "_id" : "ccfcCoYB090miyjed7YE",
        "_score" : null,
        "_source" : {
          "name" : "I eat apple so haochi3~",
          "num" : 1
        },
        "sort" : [
          1
        ]
      },
      {
        "_index" : "class_1",
        "_type" : "_doc",
        "_id" : "cMfcCoYB090miyjed7YE",
        "_score" : null,
        "_source" : {
          "name" : "I eat apple so zhen haochi2~",
          "num" : 1
        },
        "sort" : [
          1
        ]
      },
      {
        "_index" : "class_1",
        "_type" : "_doc",
        "_id" : "h2Fg-4UBECmbBdQA6VLg",
        "_score" : null,
        "_source" : {
          "name" : "b",
          "num" : 6
        },
        "sort" : [
          6
        ]
      },
      {
        "_index" : "class_1",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : null,
        "_source" : {
          "name" : "l",
          "num" : 6
        },
        "sort" : [
          6
        ]
      },
      {
        "_index" : "class_1",
        "_type" : "_doc",
        "_id" : "3",
        "_score" : null,
        "_source" : {
          "num" : 9,
          "name" : "e",
          "age" : 9,
          "desc" : [
            "hhhh"
          ]
        },
        "sort" : [
          9
        ]
      },
      {
        "_index" : "class_1",
        "_type" : "_doc",
        "_id" : "4",
        "_score" : null,
        "_source" : {
          "name" : "f",
          "age" : 10,
          "num" : 10
        },
        "sort" : [
          10
        ]
      },
      {
        "_index" : "class_1",
        "_type" : "_doc",
        "_id" : "RWlfBIUBDuA8yW5cu9wu",
        "_score" : null,
        "_source" : {
          "name" : "一年级",
          "num" : 20
        },
        "sort" : [
          20
        ]
      },
      {
        "_index" : "class_1",
        "_type" : "_doc",
        "_id" : "iGFt-4UBECmbBdQAnVJe",
        "_score" : null,
        "_source" : {
          "name" : "g",
          "age" : 8
        },
        "sort" : [
          9223372036854775807
        ]
      },
      {
        "_index" : "class_1",
        "_type" : "_doc",
        "_id" : "iWFt-4UBECmbBdQAnVJg",
        "_score" : null,
        "_source" : {
          "name" : "h",
          "age" : 9
        },
        "sort" : [
          9223372036854775807
        ]
      }
    ]
  }
}

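The results are sorted by num in ascending order, the default. The two documents without a num field come last, with the placeholder sort value 9223372036854775807.


Now the descending case: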

GET class_1/_search
{
    "sort": [
        {"num": {"order":"desc"}}
    ]
}


Response:

{
  "took" : 15,
  "timed_out" : false,
  "_shards" : {
    "total" : 3,
    "successful" : 3,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 11,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [
      {
        "_index" : "class_1",
        "_type" : "_doc",
        "_id" : "RWlfBIUBDuA8yW5cu9wu",
        "_score" : null,
        "_source" : {
          "name" : "一年级",
          "num" : 20
        },
        "sort" : [
          20
        ]
      },
      {
        "_index" : "class_1",
        "_type" : "_doc",
        "_id" : "4",
        "_score" : null,
        "_source" : {
          "name" : "f",
          "age" : 10,
          "num" : 10
        },
        "sort" : [
          10
        ]
      },
      {
        "_index" : "class_1",
        "_type" : "_doc",
        "_id" : "3",
        "_score" : null,
        "_source" : {
          "num" : 9,
          "name" : "e",
          "age" : 9,
          "desc" : [
            "hhhh"
          ]
        },
        "sort" : [
          9
        ]
      },
      {
        "_index" : "class_1",
        "_type" : "_doc",
        "_id" : "h2Fg-4UBECmbBdQA6VLg",
        "_score" : null,
        "_source" : {
          "name" : "b",
          "num" : 6
        },
        "sort" : [
          6
        ]
      },
      {
        "_index" : "class_1",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : null,
        "_source" : {
          "name" : "l",
          "num" : 6
        },
        "sort" : [
          6
        ]
      },
      {
        "_index" : "class_1",
        "_type" : "_doc",
        "_id" : "b8fcCoYB090miyjed7YE",
        "_score" : null,
        "_source" : {
          "name" : "I eat apple so haochi1~",
          "num" : 1
        },
        "sort" : [
          1
        ]
      },
      {
        "_index" : "class_1",
        "_type" : "_doc",
        "_id" : "ccfcCoYB090miyjed7YE",
        "_score" : null,
        "_source" : {
          "name" : "I eat apple so haochi3~",
          "num" : 1
        },
        "sort" : [
          1
        ]
      },
      {
        "_index" : "class_1",
        "_type" : "_doc",
        "_id" : "cMfcCoYB090miyjed7YE",
        "_score" : null,
        "_source" : {
          "name" : "I eat apple so zhen haochi2~",
          "num" : 1
        },
        "sort" : [
          1
        ]
      },
      {
        "_index" : "class_1",
        "_type" : "_doc",
        "_id" : "iGFt-4UBECmbBdQAnVJe",
        "_score" : null,
        "_source" : {
          "name" : "g",
          "age" : 8
        },
        "sort" : [
          -9223372036854775808
        ]
      },
      {
        "_index" : "class_1",
        "_type" : "_doc",
        "_id" : "iWFt-4UBECmbBdQAnVJg",
        "_score" : null,
        "_source" : {
          "name" : "h",
          "age" : 9
        },
        "sort" : [
          -9223372036854775808
        ]
      }
    ]
  }
}

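Now the results are in descending order. The documents without a num field still come last, this time with the placeholder sort value -9223372036854775808.


Multiple fields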

GET class_1/_search
{
    "sort": [
        "num", "age"
    ]
}
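No response is shown for the multi-field case, so here is a rough sketch of the ordering one would expect, emulated in plain Python. It is a toy model, not Elasticsearch code: it assumes ascending order on every field and that documents missing a field sort last, which matches the single-field responses above. The helper name `sort_key` is made up, and the sample documents are trimmed copies of documents from those responses.

```python
from typing import Dict, List, Sequence, Tuple


def sort_key(doc: Dict, fields: Sequence[str]) -> Tuple:
    """Ascending key over several fields; a missing field sorts last."""
    key = []
    for field in fields:
        value = doc.get(field)
        # (True, 0) for a missing value sorts after (False, <any value>).
        key.append((value is None, 0 if value is None else value))
    return tuple(key)


docs: List[Dict] = [
    {"name": "h", "age": 9},
    {"name": "f", "age": 10, "num": 10},
    {"name": "g", "age": 8},
    {"name": "b", "num": 6},
    {"name": "e", "age": 9, "num": 9},
]

# Compare by num first; age only breaks ties (here: the docs without num).
ordered = sorted(docs, key=lambda d: sort_key(d, ["num", "age"]))
print([d["name"] for d in ordered])
```

The second field only matters when the first one ties, which is why g (age 8) ends up ahead of h (age 9) among the documents that have no num.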


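scroll pagination

Remember the from+size pagination from earlier? By default, Elasticsearch caps from+size pagination at 10000 documents. When we want to fetch larger amounts of data in bulk, from+size becomes very expensive.


In many real-world scenarios, however, the data volume is huge, for example when querying system logs. Elasticsearch provides the _search/scroll endpoint for scrolling through results. It works roughly like taking a snapshot of the data (covering every shard) at the moment the initial query runs and keeping it for a while. Once the configured expiry time has passed, the snapshot is deleted when Elasticsearch is idle.


Note that because the query runs against a snapshot, changes made to the data after the snapshot was created are not visible in this scroll.


Initialize the snapshot & keep it for 10 minutes

Query example: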

GET class_1/_search?scroll=10m
{
  "query": {
    "match_phrase": {
      "name": "apple"
    }
  },
  "size": 2
}


Response:

{
  "_scroll_id" : "DnF1ZXJ5VGhlbkZldGNoAwAAAAAAAAXoFjEwWkdOMkxLUTVPZEMzM01ZdHhPc1EAAAAAAAACABZjUy1CemQwQVFfU3BUeGs2OGk0R1Z3AAAAAAAAAgEWY1MtQnpkMEFRX1NwVHhrNjhpNEdWdw==",
  "took" : 6,
  "timed_out" : false,
  "_shards" : {
    "total" : 3,
    "successful" : 3,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 3,
      "relation" : "eq"
    },
    "max_score" : 0.752627,
    "hits" : [
      {
        "_index" : "class_1",
        "_type" : "_doc",
        "_id" : "b8fcCoYB090miyjed7YE",
        "_score" : 0.752627,
        "_source" : {
          "name" : "I eat apple so haochi1~",
          "num" : 1
        }
      },
      {
        "_index" : "class_1",
        "_type" : "_doc",
        "_id" : "ccfcCoYB090miyjed7YE",
        "_score" : 0.752627,
        "_source" : {
          "name" : "I eat apple so haochi3~",
          "num" : 1
        }
      }
    ]
  }
}

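As shown above, 2 documents are returned along with a snapshot ID (_scroll_id), which can be used for subsequent scroll requests:


Scroll using the snapshot ID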

GET /_search/scroll
{
 "scroll": "10m", 
 "scroll_id" : "DnF1ZXJ5VGhlbkZldGNoAwAAAAAAAAXoFjEwWkdOMkxLUTVPZEMzM01ZdHhPc1EAAAAAAAACABZjUy1CemQwQVFfU3BUeGs2OGk0R1Z3AAAAAAAAAgEWY1MtQnpkMEFRX1NwVHhrNjhpNEdWdw=="
}


Response:

{
  "_scroll_id" : "DnF1ZXJ5VGhlbkZldGNoAwAAAAAAAAXoFjEwWkdOMkxLUTVPZEMzM01ZdHhPc1EAAAAAAAACABZjUy1CemQwQVFfU3BUeGs2OGk0R1Z3AAAAAAAAAgEWY1MtQnpkMEFRX1NwVHhrNjhpNEdWdw==",
  "took" : 6,
  "timed_out" : false,
  "_shards" : {
    "total" : 3,
    "successful" : 3,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 3,
      "relation" : "eq"
    },
    "max_score" : 0.752627,
    "hits" : [
      {
        "_index" : "class_1",
        "_type" : "_doc",
        "_id" : "cMfcCoYB090miyjed7YE",
        "_score" : 0.7389809,
        "_source" : {
          "name" : "I eat apple so zhen haochi2~",
          "num" : 1
        }
      }
    ]
  }
}


Scroll once more:

{
  "_scroll_id" : "DnF1ZXJ5VGhlbkZldGNoAwAAAAAAAAXoFjEwWkdOMkxLUTVPZEMzM01ZdHhPc1EAAAAAAAACABZjUy1CemQwQVFfU3BUeGs2OGk0R1Z3AAAAAAAAAgEWY1MtQnpkMEFRX1NwVHhrNjhpNEdWdw==",
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 3,
    "successful" : 3,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 3,
      "relation" : "eq"
    },
    "max_score" : 0.752627,
    "hits" : [ ]
  }
}


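Some readers may wonder how the scrolling actually advances, since every request here used the same scroll_id. (The scroll_id can change between requests, so always pass the most recently returned one.) Looking at the results, it is not hard to see:

  • First, a 10-minute snapshot was created with a page size of 2, and the initial request returned 2 documents
  • Scrolling with the scroll_id then returned 1 document, because the snapshot holds only 3 documents in total and 2 were already returned by the initial request
  • Scrolling again returned an empty result, because all the data had been read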
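The three steps above amount to a simple loop: keep requesting the next page until a page comes back empty. A minimal sketch of that loop follows. It is deliberately independent of any client library: `scroll_all` takes the first page plus a callable that fetches the next page for a given scroll ID, and `make_fake_backend` is a made-up in-memory stand-in for the two HTTP requests shown above, so all names here are assumptions for illustration.

```python
from typing import Callable, Dict, Iterator, List


def scroll_all(first_page: Dict, next_page: Callable[[str], Dict]) -> Iterator[Dict]:
    """Yield every hit, scrolling until a page comes back empty."""
    page = first_page
    while page["hits"]["hits"]:
        yield from page["hits"]["hits"]
        # Always pass the scroll ID from the most recent response.
        page = next_page(page["_scroll_id"])


def make_fake_backend(docs: List[Dict], size: int) -> Callable[..., Dict]:
    """Hypothetical in-memory stand-in that serves `size` docs per call."""
    state = {"pos": 0}

    def fetch(_scroll_id: str = "") -> Dict:
        batch = docs[state["pos"]:state["pos"] + size]
        state["pos"] += size
        return {"_scroll_id": "fake-scroll-id", "hits": {"hits": batch}}

    return fetch


# 3 matching documents, page size 2: pages of 2, then 1, then empty.
fetch = make_fake_backend([{"_id": "a"}, {"_id": "b"}, {"_id": "c"}], size=2)
hits = list(scroll_all(fetch(), fetch))
print([hit["_id"] for hit in hits])
```

Against a real cluster, the first page would come from the initial search with the scroll parameter and `next_page` would issue the /_search/scroll request shown earlier.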


Closing

That's it for this post. Be sure to practice a lot. Next time we move on to advanced queries ~
