Elasticsearch DSL Operations


2023-12-25


Creating an index

curl -XPUT http://IP:9200/ads_user_profile -H 'Content-Type: application/json' -H 'Authorization: Basic ==' -d'
{
  "settings": {
    "number_of_shards": 6,
    "number_of_replicas": 0
  },
  "mappings": {
    "properties": {
      "one_id": {
        "type": "keyword"
      },
      "user_groups": {
        "type": "nested",
        "properties": {
          "code": { "type": "keyword" },
          "name": { "type": "keyword" }
        }
      }
    }
  }
}
'

Deleting all documents in an index

POST operator_other_index/_delete_by_query?wait_for_completion=false
{
  "query": {
    "match_all": {}
  }
}

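Querying by IP

GET my-index/_search
{
  "query": {
    "term": {
      "ip_addr": "192.168.0.0/16"
    }
  }
}

A short explanation of this search, which assumes ip_addr is mapped as an ip field: an IPv4 address consists of 4 bytes, and each byte has 8 bits. The /16 suffix means the first 16 bits, that is 192.168, are fixed. Any IP address from 192.168.0.0 to 192.168.255.255 therefore falls within this range.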
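The CIDR arithmetic described above can be checked locally. The following is a minimal sketch using Python's standard ipaddress module; it does not talk to Elasticsearch, and the helper name in_block is made up for this illustration.

```python
import ipaddress

# The CIDR block used in the term query above.
network = ipaddress.ip_network("192.168.0.0/16")

first = network.network_address    # lowest address in the block
last = network.broadcast_address   # highest address in the block


def in_block(ip: str) -> bool:
    """Return True when ip falls inside the /16 block (hypothetical helper)."""
    return ipaddress.ip_address(ip) in network


print(first, last)
print(in_block("192.168.2.100"), in_block("192.169.0.1"))
```

The first two bytes are fixed by the /16 prefix, so only the last two bytes vary across the block.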

Querying by IP range

GET operator_other_index/_search
{
  "query": {
    "range": {
      "start_ip": {
        "gte": "192.168.2.100",
        "lte": "192.168.2.102"
      }
    }
  },
  "_source": [
    "start_ip",
    "end_ip"
  ]
}

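Finding duplicate values

Setting min_doc_count to 2 keeps only the values that occur in more than one document; with a value of 1, every value is returned along with its count.

GET test.project/_search
{
  "size": 0,
  "aggs": {
    "field": {
      "terms": {
        "field": "id.keyword",
        "size": 3000,
        "min_doc_count": 2
      }
    }
  }
}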
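To see how min_doc_count changes which buckets a terms aggregation returns, here is a small in-memory sketch in Python. It only imitates the counting and filtering step on a toy list; the function name terms_buckets and the sample values are invented for this illustration.

```python
from collections import Counter


def terms_buckets(values, min_doc_count=1):
    """Imitate a terms aggregation: count values, keep those seen often enough."""
    counts = Counter(values)
    # Order by count descending, then by key, similar to the default bucket order.
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(key, count) for key, count in ordered if count >= min_doc_count]


ids = ["a", "b", "a", "c", "b", "a"]   # toy data, not from a real index
all_buckets = terms_buckets(ids, min_doc_count=1)
duplicates = terms_buckets(ids, min_doc_count=2)

print(all_buckets)
print(duplicates)
```

With a threshold of 1 every distinct value gets a bucket, while a threshold of 2 leaves only the values that appear more than once.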

Counting distinct values

Note that the cardinality aggregation returns an approximate count.

GET test.project/_search
{
  "size": 0,
  "aggs": {
    "count": {
      "cardinality": {
        "field": "id.keyword"
      }
    }
  }
}

Fuzzy (LIKE-style) matching on a single field

GET operator_other_index/_search
{
    "query":{
        "wildcard":{
            "certificate_code":"*824607*"
        }
    }
}

Fuzzy (LIKE-style) matching on multiple fields

GET operator_other_index/_search
{
    "query":{
        "bool":{
            "should":[
                {
                    "wildcard":{
                        "name":"*张*"
                    }
                },
                {
                    "wildcard":{
                        "emergency_contact_name":"*张*"
                    }
                },
                {
                    "wildcard":{
                        "certificate_type":"*张*"
                    }
                }
            ]
        }
    }
}
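When the list of fields grows, the repeated wildcard clauses can be generated instead of written by hand. The sketch below builds the same bool/should body as a Python dict; the function name like_any is hypothetical, and the code assumes the search text contains no * or ? characters that would need escaping.

```python
import json


def like_any(fields, text):
    """Build a bool/should body with one wildcard clause per field.

    Assumes text has no wildcard characters of its own (no escaping is done).
    """
    pattern = f"*{text}*"
    clauses = [{"wildcard": {field: pattern}} for field in fields]
    return {"query": {"bool": {"should": clauses}}}


body = like_any(["name", "emergency_contact_name", "certificate_type"], "张")

# ensure_ascii=False keeps the Chinese search text readable in the output.
print(json.dumps(body, ensure_ascii=False, indent=2))
```

The resulting dict can be serialized to JSON and sent as the body of the _search request shown above.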

Returning only specified fields

Return only the start_ip and end_ip fields:

GET operator_other_index/_search
{
  "_source": [
    "start_ip",
    "end_ip"
  ]
}

Multi-field search with multi_match

multi_match documentation: https://www.elastic.co/guide/cn/elasticsearch/guide/current/multi-match-query.html

GET operator_other_index/_search
{
  "query": {
    "multi_match": {
      "query": "互联网数据中心编码",
      "fields": ["data_center_service_code","computer_room_address","name","user_type","credit_code","address","emergency_contact_name","certificate_code","certificate_type","mobile_phone","phone"]
    }
  }
}

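Searching all fields

When fields is omitted, multi_match falls back to the index.query.default_field index setting, which defaults to *.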

GET operator_other_index/_search
{
  "query": {
    "multi_match": {
      "query": "互联网数据中心编码"
    }
  }
}

Searching all fields with a full match via minimum_should_match

Reference: https://blog.csdn.net/qq_22985751/article/details/90704189

Setting this to 100% means every term of the search text must match.

GET operator_other_index/_search
{
  "query": {
    "multi_match": {
      "query": "申伟",
      "fields": ["data_center_service_code","computer_room_address","name","user_type","credit_code","address","emergency_contact_name","certificate_code","certificate_type","mobile_phone","phone"],
      "minimum_should_match": "100%"
    }
  }
}

Grouping by a keyword field (group by)

Java code: https://www.cnblogs.com/xionggeclub/p/7975982.html

GET log_lnk_data_flow_index/_search
{
  "size": 0,
  "aggs": {
    "group_by_keyword": {
      "terms": {
        "field": "task_keyword",
        "size": 40000,
        "order": {
          "_count": "asc"
        }
      }
    }
  }
}

Updating a document by ID

The document must already exist. If it does not, the request fails with an error like the following:

{
  "error" : {
    "root_cause" : [
      {
        "type" : "document_missing_exception",
        "reason" : "[_doc][2]: document missing",
        "index_uuid" : "KhAqJx5SR7uJIVZkdO0LIw",
        "shard" : "0",
        "index" : "index1"
      }
    ],
    "type" : "document_missing_exception",
    "reason" : "[_doc][2]: document missing",
    "index_uuid" : "KhAqJx5SR7uJIVZkdO0LIw",
    "shard" : "0",
    "index" : "index1"
  },
  "status" : 404
}

POST /customer/_update/1?pretty
{
  "doc": {
    "name": "Jane Doe",
    "age": 20
  }
}

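Upsert

With upsert, if the specified document does not exist, the content of upsert is used to initialize it; if the document exists, the partial update given by doc or script is applied.

The request below targets the document with ID 3 in index1. If that document does not exist, the content of upsert is indexed as a new document, with counter set to 1. If it exists, the content of doc is applied as a partial update, adding or changing the name field to new_name.

POST index1/_update/3
{
  "doc": {
    "name": "new_name"
  },
  "upsert": {
    "counter": 1
  }
}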
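The two branches of an update with upsert can be modeled with a plain dictionary. This is an in-memory sketch of the semantics described above, not Elasticsearch client code; the function name update and the store dict are invented for the illustration.

```python
def update(store, doc_id, doc=None, upsert=None):
    """Model of _update: merge doc into an existing document, else insert upsert."""
    if doc_id not in store:
        if upsert is None:
            # Mirrors the document_missing_exception of a plain update.
            raise KeyError(f"[{doc_id}]: document missing")
        store[doc_id] = dict(upsert)   # doc is NOT applied on this branch
        return "created"
    store[doc_id].update(doc or {})    # partial update: only listed fields change
    return "updated"


store = {}
first = update(store, "3", doc={"name": "new_name"}, upsert={"counter": 1})
after_first = dict(store["3"])
second = update(store, "3", doc={"name": "new_name"}, upsert={"counter": 1})
after_second = dict(store["3"])

print(first, after_first)
print(second, after_second)
```

Note that the first call stores only the upsert content. The name field appears only once the same request runs against the now existing document.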

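Script example

If the document exists, its num field is incremented by 1; if it does not exist, a new document is created with the fields under upsert.

POST indexname/_update/id
{
  "script": "ctx._source.num+=1",
  "upsert": {
    "field1": "value1",
    "field2": "value2"
  }
}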

Aggregating on a nested field

Reference: https://www.cnblogs.com/niulang/p/16455158.html

user_groups is a nested field (type=nested), and user_groups.name is one of its sub-properties, i.e. [{user_groups.name}, {user_groups.name}, {user_groups.name}].

POST /ads_user_profile/_search?scroll=2m
{
  "timeout": "6000s",
  "aggregations": {
    "test": {
      "nested": {
        "path": "user_groups"
      },
      "aggregations": {
        "tag_bucket": {
          "terms": {
            "field": "user_groups.name"
          }
        }
      }
    }
  },
  // the query clause is optional
  "query": {
    "term": {
      "one_id": {
        "value": "153180716"
      }
    }
  }
}

Response

{
  "test" : {
    "doc_count" : 98840193,
    "tag_bucket" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 12336025,
      "buckets" : [
        {
          "key" : "测试新增",
          "doc_count" : 12764541
        },
        {
          "key" : "标签年龄段不等于0-18",
          "doc_count" : 12555174
        }
      ]
    }
  }
}

Aggregating on property a of a nested field where property b equals x, excluding sub-objects that do not match

Reference: https://blog.csdn.net/qq_23030337/article/details/123005664

The existing data looks like this.

More than 5 million documents like the following:
[
  {
    "first_login_time" : "1652407073",
    "user_groups" : [
      {
        "code" : "A001",
        "name" : "测试统计字段"
      },
      {
        "code" : "B001",
        "name" : "测试统计字段2"
      }
    ]
  },......
]

One document like the following:
[
  {
    "first_login_time" : "1652407073",
    "user_groups" : [
      {
        "code" : "A002",
        "name" : "测试统计字段"
      },
      {
        "code" : "B001",
        "name" : "测试统计字段2"
      }
    ]
  },......
]

Goal:

Count only the sub-objects whose name is 测试统计字段 and bucket them by their code value. Sub-objects with any other name are left out of the counts.

Query:

POST index_name/_search?scroll=2m
{
  "timeout": "6000s",
  "aggregations": {
    "test": {
      "nested": {
        "path": "user_groups"
      },
      "aggregations": {
        "tag_bucket": {
          // count only sub-objects with user_groups.name = 测试统计字段; skip the rest
          "filter": {
            "term": {
              "user_groups.name": "测试统计字段"
            }
          },
          "aggregations": {
            // user-chosen aggregation name
            "group_count": {
              // bucket by user_groups.code and count
              "terms": {
                "field": "user_groups.code"
              }
            }
          }
        }
      }
    }
  },
  "query": {
    "nested": {
      // top-level condition: documents with a sub-object where user_groups.name = 测试统计字段
      "query": {
        "term": {
          "user_groups.name": {
            "value": "测试统计字段",
            "boost": 1
          }
        }
      },
      "path": "user_groups",
      "ignore_unmapped": false,
      "score_mode": "none",
      "boost": 1
    }
  }
}

Query result

{
  "_scroll_id" : "FGluY2x1ZGVfY29udGV4dF91dWlkDXF1ZXJ5QW5kRmV0Y2gBFjJ6X2lpRXVWUy1DWlVubDJVUGYxVmcAAAAAAxL1ABYxeWhadXI5dVJNcUVsX290ZGRrUEtn",
  "took" : 3205,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "aggregations" : {
    "test" : {
      "doc_count" : 60977543,
      "tag_bucket" : {
        "doc_count" : 5995453,
        "group_count" : {
          "doc_count_error_upper_bound" : 0,
          "sum_other_doc_count" : 0,
          "buckets" : [
            {
              "key" : "A001",
              "doc_count" : 5995452
            },
            {
              "key" : "A002",
              "doc_count" : 1
            }
          ]
        }
      }
    }
  }
}
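The three aggregation levels (nested, then filter, then terms) can be traced on a tiny in-memory sample. The sketch below is plain Python over a toy list shaped like the sample data, with three documents standing in for the millions in the index; the function name count_codes is made up for this illustration.

```python
from collections import Counter

# Toy stand-in for the index: two documents with code A001, one with A002.
docs = [
    {"user_groups": [{"code": "A001", "name": "测试统计字段"},
                     {"code": "B001", "name": "测试统计字段2"}]},
    {"user_groups": [{"code": "A001", "name": "测试统计字段"},
                     {"code": "B001", "name": "测试统计字段2"}]},
    {"user_groups": [{"code": "A002", "name": "测试统计字段"},
                     {"code": "B001", "name": "测试统计字段2"}]},
]


def count_codes(documents, name):
    """Flatten nested objects, keep those with the given name, bucket by code."""
    nested = [g for d in documents for g in d.get("user_groups", [])]   # "nested" level
    matching = [g for g in nested if g["name"] == name]                 # "filter" level
    buckets = Counter(g["code"] for g in matching)                      # "terms" level
    return len(nested), len(matching), buckets


nested_total, filtered_total, buckets = count_codes(docs, "测试统计字段")
print(nested_total, filtered_total, dict(buckets))
```

The first number corresponds to test.doc_count (all nested objects), the second to tag_bucket.doc_count (objects passing the filter), and B001 never appears in the buckets because its sub-objects carry a different name.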

Reference: https://www.xjx100.cn/news/132050.html?action=onClick

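Date query syntax

A brief introduction follows.

Relative times:

now: the current time.
now-1h: 1 hour before the current time.
now-2d: 2 days before the current time.
now-1w: 1 week before the current time.
now-1M: 1 month before the current time.
now-1y: 1 year before the current time.

Absolute times:

2021-09-01: a date with no time component.
2021-09-01T10:00:00: a date and time.

Query examples

{"range": {"timestamp": {"gte": "now-1d", "lt": "now"}}}: matches data from the past day, up to but not including the current time.
{"range": {"timestamp": {"gte": "2021-09-01", "lt": "2021-09-02"}}}: matches data within the given date range, excluding the end date.

{"range": {"timestamp": {"time_zone": "+08:00", "gte": "now-1h", "lte": "now"}}}: matches data from the past hour and specifies a time zone. time_zone is used to convert date values in the query to UTC; it does not change the value of now itself.

{"bool": {"filter": {"range": {"timestamp": {"gte": "now-1h/h", "lte": "now/h"}}}}}: rounds both bounds to whole hours. With gte, now-1h/h rounds down to the start of the previous hour; with lte, now/h rounds up to the end of the current hour.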

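The effect of the /h rounding in the last example is easier to see with concrete timestamps. The following Python sketch reproduces the rounding with the standard datetime module for a fixed, made-up value of now; the helper names floor_hour and ceil_hour are hypothetical, and time zones are ignored.

```python
from datetime import datetime, timedelta


def floor_hour(t):
    """Round down to the start of the hour (how gte treats /h)."""
    return t.replace(minute=0, second=0, microsecond=0)


def ceil_hour(t):
    """Round up to the last millisecond of the hour (how lte treats /h)."""
    return floor_hour(t) + timedelta(hours=1) - timedelta(milliseconds=1)


now = datetime(2021, 9, 1, 10, 42, 30)        # assumed current time for the example
gte = floor_hour(now - timedelta(hours=1))    # "gte": "now-1h/h"
lte = ceil_hour(now)                          # "lte": "now/h"

print(gte.isoformat(), lte.isoformat())
```

For this value of now, the range runs from the start of the 09:00 hour to the end of the 10:00 hour, so it spans two full clock hours rather than the last 60 minutes.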