Hands-On with the join Type in Elasticsearch 6

Summary: Elasticsearch 6 introduced the join field type. This article works through it hands-on to get familiar with how it behaves.

Visit my GitHub

All of Xinchen's original articles, with accompanying source code, are categorized and collected here: https://github.com/zq2599/blog_demos

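Overview

  • Elasticsearch in Action (Chinese edition: 《Elasticsearch实战》) is a classic Elasticsearch tutorial. Its demo source code is at https://github.com/dakrone/elasticsearch-in-action , and the latest branch is 6.x. While using that source, I noticed that the static mapping for the _doc type adds a field of type join, as shown below: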
"mappings" : {
    "_doc" : {
      "_source" : {
        "enabled" : true
      },
      "properties" : {
        "relationship_type": {
          "type": "join",
          "relations" : {
            "group": "event"
          }
        },
        ...
  • join is a field type introduced in Elasticsearch 6. Let's learn it through hands-on practice.
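
To make the relations setting concrete, here is a minimal Python sketch (not part of the book's demo) that holds the mapping fragment above as a dict and lists the relation names a document may use. The helper name relation_names is hypothetical, chosen only for illustration.

```python
# The join field from mapping.json, reduced to the part shown above.
JOIN_MAPPING = {
    "relationship_type": {
        "type": "join",
        "relations": {"group": "event"},  # parent name -> child name
    }
}


def relation_names(mapping, field):
    """Return (parent_names, child_names) declared by a join field."""
    relations = mapping[field]["relations"]
    parents = set(relations)
    children = set()
    for child in relations.values():
        # A parent may declare one child (a string) or several (a list).
        children.update([child] if isinstance(child, str) else child)
    return parents, children


parents, children = relation_names(JOIN_MAPPING, "relationship_type")
print("parents:", parents, "children:", children)
```

In this demo there is exactly one relation: "group" is the parent name and "event" is the child name.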

Environment

  1. Operating system: Ubuntu 18.04.2 LTS
  2. Elasticsearch: 6.7.1
  3. Kibana: 6.7.1

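Downloading the Elasticsearch in Action demo source

  • This article uses two files from that source: mapping.json, which creates the static mapping, and populate.sh, which creates the documents. They are located at: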
  1. https://github.com/dakrone/elasticsearch-in-action/blob/6.x/mapping.json
  2. https://github.com/dakrone/elasticsearch-in-action/blob/6.x/populate.sh
  • To use them, download both files into the same directory and run ./populate.sh 192.168.1.101:9200 , where 192.168.1.101:9200 is the HTTP address and port of the Elasticsearch 6 instance.

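The official description

  • The official documentation describes the join type as follows:

[Image: screenshot of the official documentation on the join type]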

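  • My understanding:
  1. The join type establishes parent/child relationships between documents within the same index.
  2. Each relationship is declared as a pair of names, one for the parent and one for the child.
  • Next, let's see how the Elasticsearch in Action demo uses this field.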

The Elasticsearch in Action demo

  • Part of the demo's document creation script is shown below:
curl -s -XPOST "$ADDRESS/get-together/_doc/1" -H'Content-Type: application/json' -d'{
  "relationship_type": "group",
  "name": "Denver Clojure",
  "organizer": ["Daniel", "Lee"],
  "description": "Group of Clojure enthusiasts from Denver who want to hack on code together and learn more about Clojure",
  "created_on": "2012-06-15",
  "tags": ["clojure", "denver", "functional programming", "jvm", "java"],
  "members": ["Lee", "Daniel", "Mike"],
  "location_group": "Denver, Colorado, USA"
}'

curl -s -XPOST "$ADDRESS/get-together/_doc/100?routing=1" -H'Content-Type: application/json' -d'{
  "relationship_type": {
    "name": "event",
    "parent": "1"
  },
  "host": ["Lee", "Troy"],
  "title": "Liberator and Immutant",
  "description": "We will discuss two different frameworks in Clojure for doing different things. Liberator is a ring-compatible web framework based on Erlang Webmachine. Immutant is an all-in-one enterprise application based on JBoss.",
  "attendees": ["Lee", "Troy", "Daniel", "Tom"],
  "date": "2013-09-05T18:00",
  "location_event": {
    "name": "Stoneys Full Steam Tavern",
    "geolocation": "39.752337,-105.00083"
  },
  "reviews": 4
}'
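  • As shown above, the document with id 1 has the string "group" as its relationship_type value. The document with id 100 has an object instead of a string: parent set to "1" means its parent document has id 1, and name set to "event" means it is the child side of the group/event relation.
  • Note: the URL for the second document carries a routing parameter so that the parent and child are stored on the same shard. This needs particular attention whenever the join type is used.
  • Next, make sure the populate.sh script mentioned earlier has been run, so that the get-together index and its documents exist in the Elasticsearch environment. The exercises below are run in Kibana's Dev Tools: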
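
The difference between indexing a parent and a child can be summarized in a small Python sketch. It only builds the request path and the join field value and sends nothing; the function name index_request is hypothetical. It assumes a single parent/child level as in this demo, so the routing value is simply the parent's id.

```python
from urllib.parse import quote


def index_request(index, doc_id, relation, parent_id=None):
    """Build (path, join_value) for indexing one document.

    Assumption: one parent/child level, so a child is routed by its
    parent's id, which is where the parent itself is stored by default.
    """
    if parent_id is None:
        # Parent document: the join field is just the relation name.
        return f"/{index}/_doc/{doc_id}", relation
    # Child document: the join field names the relation and the parent,
    # and the routing parameter pins the child to the parent's shard.
    path = f"/{index}/_doc/{doc_id}?routing={quote(str(parent_id))}"
    return path, {"name": relation, "parent": str(parent_id)}


print(index_request("get-together", 1, "group"))
print(index_request("get-together", 100, "event", parent_id=1))
```

The two printed tuples correspond to the two curl commands above: the first has no routing and a plain string value, the second has routing=1 and an object value.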

Find all documents whose parent type is "group" (the results are child documents)

  • Run the following request:
GET get-together/_search
{
  "query": {
    "has_parent": {
      "parent_type": "group",
      "query": {
        "match_all": {}
      }
    }
  }
}
  • This returns all child documents whose parent type is "group":
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 2,
    "successful" : 2,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 15,
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "get-together",
        "_type" : "_doc",
        "_id" : "106",
        "_score" : 1.0,
        "_routing" : "3",
        "_source" : {
          "relationship_type" : {
            "name" : "event",
            "parent" : "3"
          },
          "host" : "Mik",
          "title" : "Social management and monitoring tools",
          "description" : "Shay Banon will be there to answer questions and we can talk about management tools.",
          "attendees" : [
            "Shay",
            "Mik",
            "John",
            "Chris"
          ],
          "date" : "2013-03-06T18:00",
          "location_event" : {
            "name" : "Quid Inc",
            "geolocation" : "37.798442,-122.399801"
          },
          "reviews" : 5
        }
      },
      {
        "_index" : "get-together",
        "_type" : "_doc",
        "_id" : "107",
        "_score" : 1.0,
        "_routing" : "3",
        "_source" : {
          "relationship_type" : {
            "name" : "event",
            "parent" : "3"
          },
          "host" : "Mik",
          "title" : "Logging and Elasticsearch",
          "description" : "Get a deep dive for what Elasticsearch is and how it can be used for logging with Logstash as well as Kibana!",
          "attendees" : [
            "Shay",
            "Rashid",
            "Erik",
            "Grant",
            "Mik"
          ],
          "date" : "2013-04-08T18:00",
          "location_event" : {
            "name" : "Salesforce headquarters",
            "geolocation" : "37.793592,-122.397033"
          },
          "reviews" : 3
        }
      },
     ...

Find all documents that have a child of type "event" (the results are parent documents)

  • Run the following request:
GET get-together/_search
{
  "query": {
    "has_child": {
      "type": "event",
      "query": {
        "match_all": {}
      }
    }
  }
}
  • This returns all parent documents that have at least one child of type "event":
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 2,
    "successful" : 2,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 5,
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "get-together",
        "_type" : "_doc",
        "_id" : "3",
        "_score" : 1.0,
        "_source" : {
          "relationship_type" : "group",
          "name" : "Elasticsearch San Francisco",
          "organizer" : "Mik",
          "description" : "Elasticsearch group for ES users of all knowledge levels",
          "created_on" : "2012-08-07",
          "tags" : [
            "elasticsearch",
            "big data",
            "lucene",
            "open source"
          ],
          "members" : [
            "Lee",
            "Igor"
          ],
          "location_group" : "San Francisco, California, USA"
        }
      },
      {
        "_index" : "get-together",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 1.0,
        "_source" : {
          "relationship_type" : "group",
          "name" : "Denver Clojure",
          "organizer" : [
            "Daniel",
            "Lee"
          ],
          "description" : "Group of Clojure enthusiasts from Denver who want to hack on code together and learn more about Clojure",
          "created_on" : "2012-06-15",
          "tags" : [
            "clojure",
            "denver",
            "functional programming",
            "jvm",
            "java"
          ],
          "members" : [
            "Lee",
            "Daniel",
            "Mike"
          ],
          "location_group" : "Denver, Colorado, USA"
        }
      },
     ...

Find the child documents whose parent id is 1

  • Run the following request:
GET get-together/_search
{
  "query": {
    "parent_id": {
      "type": "event",
      "id": "1"
    }
  }
}
  • This returns all child documents whose parent id is 1:
{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 2,
    "successful" : 2,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 3,
    "max_score" : 1.3291359,
    "hits" : [
      {
        "_index" : "get-together",
        "_type" : "_doc",
        "_id" : "100",
        "_score" : 1.3291359,
        "_routing" : "1",
        "_source" : {
          "relationship_type" : {
            "name" : "event",
            "parent" : "1"
          },
          "host" : [
            "Lee",
            "Troy"
          ],
          "title" : "Liberator and Immutant",
          "description" : "We will discuss two different frameworks in Clojure for doing different things. Liberator is a ring-compatible web framework based on Erlang Webmachine. Immutant is an all-in-one enterprise application based on JBoss.",
          "attendees" : [
            "Lee",
            "Troy",
            "Daniel",
            "Tom"
          ],
          "date" : "2013-09-05T18:00",
          "location_event" : {
            "name" : "Stoneys Full Steam Tavern",
            "geolocation" : "39.752337,-105.00083"
          },
          "reviews" : 4
        }
      },
      ...
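
The three requests so far differ mainly in which relation name they take and which side of the relation they return. A minimal Python sketch of the request bodies puts them side by side; the function names are illustrative and do not come from any client library.

```python
import json

MATCH_ALL = {"match_all": {}}


def has_parent(parent_type, query=None):
    """Takes the PARENT relation name; the hits are child documents."""
    return {"query": {"has_parent": {"parent_type": parent_type,
                                     "query": query or MATCH_ALL}}}


def has_child(child_type, query=None):
    """Takes the CHILD relation name; the hits are parent documents."""
    return {"query": {"has_child": {"type": child_type,
                                    "query": query or MATCH_ALL}}}


def parent_id(child_type, parent):
    """Takes the CHILD relation name plus one parent id; hits are children."""
    return {"query": {"parent_id": {"type": child_type, "id": str(parent)}}}


for body in (has_parent("group"), has_child("event"), parent_id("event", 1)):
    print(json.dumps(body, indent=2))
```

Each printed body matches the corresponding Dev Tools request above and could be passed as the body of a search against get-together.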

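Trimming the response with script_fields

  • The queries above return the entire _source. When not all of it is needed, script_fields can trim the response.
  • The following request finds all child documents whose parent id is 1 and returns only three fields: the parent document id, the child document id, and the child document's title: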
GET get-together/_search
{
  "query": {
    "parent_id": {
      "type": "event",
      "id": "1"
    }
  },
  "script_fields": {
    "group_id": {
      "script": {
        "source": "doc['relationship_type#group']"
      }
    },
    "event_id": {
      "script": {
        "source": "doc['_id']"
      }
    },
    "title": {
      "script": "params['_source']['title']"
    }
  }
}
  • The result is as follows:
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 2,
    "successful" : 2,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 3,
    "max_score" : 1.3291359,
    "hits" : [
      {
        "_index" : "get-together",
        "_type" : "_doc",
        "_id" : "100",
        "_score" : 1.3291359,
        "_routing" : "1",
        "fields" : {
          "event_id" : [
            "100"
          ],
          "title" : [
            "Liberator and Immutant"
          ],
          "group_id" : [
            "1"
          ]
        }
      },
      {
        "_index" : "get-together",
        "_type" : "_doc",
        "_id" : "101",
        "_score" : 1.3291359,
        "_routing" : "1",
        "fields" : {
          "event_id" : [
            "101"
          ],
          "title" : [
            "Sunday, Surly Sunday"
          ],
          "group_id" : [
            "1"
          ]
        }
      },
      {
        "_index" : "get-together",
        "_type" : "_doc",
        "_id" : "102",
        "_score" : 1.3291359,
        "_routing" : "1",
        "fields" : {
          "event_id" : [
            "102"
          ],
          "title" : [
            "10 Clojure coding techniques you should know, and project openbike"
          ],
          "group_id" : [
            "1"
          ]
        }
      }
    ]
  }
}
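
Note that every script field comes back as a list under fields, even when it holds a single value. A client would typically unwrap these; the following Python sketch shows one way to do that, using two of the hits above as sample data. The function name flatten_hits is hypothetical.

```python
# Two hits copied from the response above, reduced to the relevant keys.
SAMPLE_RESPONSE = {
    "hits": {
        "hits": [
            {"_id": "100", "fields": {"event_id": ["100"],
                                      "title": ["Liberator and Immutant"],
                                      "group_id": ["1"]}},
            {"_id": "101", "fields": {"event_id": ["101"],
                                      "title": ["Sunday, Surly Sunday"],
                                      "group_id": ["1"]}},
        ]
    }
}


def flatten_hits(response):
    """Turn each hit's 'fields' into a flat dict, unwrapping 1-item lists."""
    rows = []
    for hit in response["hits"]["hits"]:
        fields = hit.get("fields", {})
        rows.append({
            name: values[0] if len(values) == 1 else values
            for name, values in fields.items()
        })
    return rows


for row in flatten_hits(SAMPLE_RESPONSE):
    print(row)
```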

Aggregation

  • The following request runs a terms bucket aggregation over all child documents whose parent type is group:
GET get-together/_search
{
  "query": {
    "has_parent": {
      "parent_type": "group",
      "query": {
        "match_all": {}
      }
    }
  },
  "aggs": {
    "parents": {
      "terms": {
        "field": "relationship_type#group"
      }
    }
  }
}
  • The result is shown below; the buckets are keyed by parent document id:
"aggregations" : {
    "parents" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "1",
          "doc_count" : 3
        },
        {
          "key" : "2",
          "doc_count" : 3
        },
        {
          "key" : "3",
          "doc_count" : 3
        },
        {
          "key" : "4",
          "doc_count" : 3
        },
        {
          "key" : "5",
          "doc_count" : 3
        }
      ]
    }
  }
}
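
As a quick sanity check on these buckets, the sketch below (Python, with a hypothetical helper name children_per_parent) turns them into a map from parent id to child count. The sample data is copied from the aggregation result above.

```python
# The "aggregations" section from the response above.
SAMPLE_AGGS = {
    "parents": {
        "doc_count_error_upper_bound": 0,
        "sum_other_doc_count": 0,
        "buckets": [
            {"key": "1", "doc_count": 3},
            {"key": "2", "doc_count": 3},
            {"key": "3", "doc_count": 3},
            {"key": "4", "doc_count": 3},
            {"key": "5", "doc_count": 3},
        ],
    }
}


def children_per_parent(aggregations, agg_name="parents"):
    """Map each parent id (bucket key) to its number of child documents."""
    return {b["key"]: b["doc_count"] for b in aggregations[agg_name]["buckets"]}


counts = children_per_parent(SAMPLE_AGGS)
print(counts)
print("total children:", sum(counts.values()))
```

The total across the five buckets is 15, which agrees with the hit count reported by the has_parent search earlier in the article.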
  • That covers the main hands-on content for the join type. I hope it helps you understand this new type.

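Follow my blog on the Alibaba Cloud Developer Community: 程序员欣宸 (Programmer Xinchen)

On the road of learning you are not alone; Xinchen's original articles are with you all the way...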