ELK Stack - Elasticsearch Study Notes (Part 1) - Alibaba Cloud Developer Community

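To make working with Elasticsearch easier, Kibana is installed alongside it, and in practice the operations in these notes were carried out through Kibana.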

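Installing Sense

Sense is a Kibana app whose interactive console lets you submit requests to Elasticsearch directly from your browser. In the online version of the book, many of the code examples include a View in Sense link; clicking it opens and runs that code in the Sense console. You do not have to install Sense, but without it you lose much of the interactivity with the book and the fun of experimenting with the code on your own local cluster.

In older versions, Sense is downloaded and installed by running an install command from the Kibana directory. From 5.0 onward, the same console is provided by Kibana's Dev Tools.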

Sending requests from the command line

GET request

curl -XGET 'http://localhost:9200/_count?pretty' -d '
{
    "query": {
        "match_all": {}
    }
}
'

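A request like this is made up of the following parts:

The HTTP method or verb: GET, POST, PUT, HEAD, or DELETE.
The protocol, hostname, and port of any node in the cluster.
The path of the request.
An optional query string; for example, appending ?pretty to any request makes Elasticsearch pretty-print the JSON response so that it is easier to read.
A JSON-encoded request body, if the request needs one.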

Response

{
    "count" : 0,
    "_shards" : {
        "total" : 5,
        "successful" : 5,
        "failed" : 0
    }
}
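
The request parts listed above can also be assembled programmatically. The following is a minimal Python sketch using only the standard library; the helper name `build_request` is hypothetical, and the `Content-Type` header is an addition that the curl command above does not send. Nothing is transmitted here: the function only builds the request object.

```python
import json
import urllib.request


def build_request(verb, base_url, path, query_string="", body=None):
    """Assemble verb, host, path, query string and body into a Request."""
    url = base_url.rstrip("/") + "/" + path.lstrip("/")
    if query_string:
        url += "?" + query_string
    # The body, if any, is sent as JSON-encoded bytes.
    data = json.dumps(body).encode("utf-8") if body is not None else None
    # Assumption: declare the body as JSON (not present in the curl example).
    headers = {"Content-Type": "application/json"} if data is not None else {}
    return urllib.request.Request(url, data=data, headers=headers, method=verb)


# Same request as the curl example: count all documents in the cluster.
count_request = build_request(
    "GET",
    "http://localhost:9200",
    "/_count",
    "pretty",
    {"query": {"match_all": {}}},
)
print(count_request.get_method(), count_request.full_url)
```

Passing `count_request` to `urllib.request.urlopen` would send it, assuming a node is listening on localhost:9200.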

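Building an employee directory

Imagine that we are building a new employee directory for the HR department of a company called megacorp. The directory should support real-time collaboration, so it has to meet the following requirements:

Data can contain multi-valued tags, numbers, and full text.
All the data of any employee can be retrieved.
Structured search is supported, for example finding employees over the age of 30.
Simple full-text search and more complex phrase search are supported.
Matching keywords are highlighted in the returned documents.
There is a dashboard for analytics and management over the data.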

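Roughly speaking, Elasticsearch concepts map onto relational database concepts as follows:

Relational DB ⇒ Databases ⇒ Tables ⇒ Rows ⇒ Columns
Elasticsearch ⇒ Indices ⇒ Types ⇒ Documents ⇒ Fields

So, to create the employee directory, we do the following:

Index a document for each employee; each document contains all the information about a single employee.
Each document is of type employee.
That type lives in the megacorp index.
That index is stored in our Elasticsearch cluster.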

Adding documents to the index

curl -XPUT 'http://localhost:9200/megacorp/employee/1?pretty' -d '
{
  "first_name" : "John",
  "last_name" : "Smith",
  "age" : 25,
  "about" : "I love to go rock climbing",
  "interests": [ "sports", "music" ]
}'
curl -XPUT 'http://localhost:9200/megacorp/employee/2?pretty' -d '
{
    "first_name" : "Jane",
    "last_name" : "Smith",
    "age" : 32,
    "about" : "I like to collect rock albums",
    "interests": [ "music" ]
}
'
curl -XPUT 'http://localhost:9200/megacorp/employee/3?pretty' -d '
{
    "first_name" : "Douglas",
    "last_name" : "Fir",
    "age" : 35,
    "about": "I like to build cabinets",
    "interests": [ "forestry" ]
}
'

{
  "_index" : "megacorp",
  "_type" : "employee",
  "_id" : "1",
  "_version" : 1,
  "result" : "created",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "created" : true
}
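
The response above is the one returned for the first request. The path /megacorp/employee/1 contains three pieces of information:

megacorp: the index name
employee: the type name
1: the ID of this particular employee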

Retrieving a document

curl -XGET 'localhost:9200/megacorp/employee/1?pretty'

The response contains some metadata about the document, and John Smith's original JSON document appears in the _source field:

{
    "_index" : "megacorp",
    "_type" : "employee",
    "_id" : "1",
    "_version" : 1,
    "found" : true,
    "_source" : {
        "first_name" : "John",
        "last_name" : "Smith",
        "age" : 25,
        "about" : "I love to go rock climbing",
        "interests": [ "sports", "music" ]
    }
}

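We retrieved the document by changing the HTTP verb from PUT to GET. In the same way, we could use DELETE to delete the document, or HEAD to check whether the document exists. To replace an existing document with a new version, simply PUT it again.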
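
The relationship between verbs and single-document operations can be summarized in a small lookup. This is only an illustrative Python sketch; the function name `document_request` and the action names are hypothetical labels, not part of any Elasticsearch client.

```python
def document_request(action, index, doc_type, doc_id):
    """Return the (HTTP verb, path) pair for a single-document operation."""
    verbs = {
        "index": "PUT",      # create the document, or replace it if it exists
        "get": "GET",        # fetch the document together with its metadata
        "exists": "HEAD",    # check whether the document exists
        "delete": "DELETE",  # remove the document
    }
    return verbs[action], f"/{index}/{doc_type}/{doc_id}"


for action in ("index", "get", "exists", "delete"):
    print(action, document_request(action, "megacorp", "employee", 1))
```

All four operations address the same path; only the verb changes.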

Simple search

Search for all employees:

curl -XGET 'localhost:9200/megacorp/employee/_search?pretty'

Response

{
  "took" : 9,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 3,
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "megacorp",
        "_type" : "employee",
        "_id" : "2",
        "_score" : 1.0,
        "_source" : {
          "first_name" : "Jane",
          "last_name" : "Smith",
          "age" : 32,
          "about" : "I like to collect rock albums",
          "interests" : [
            "music"
          ]
        }
      },
      {
        "_index" : "megacorp",
        "_type" : "employee",
        "_id" : "1",
        "_score" : 1.0,
        "_source" : {
          "first_name" : "John",
          "last_name" : "Smith",
          "age" : 25,
          "about" : "I love to go rock climbing",
          "interests" : [
            "sports",
            "music"
          ]
        }
      },
      {
        "_index" : "megacorp",
        "_type" : "employee",
        "_id" : "3",
        "_score" : 1.0,
        "_source" : {
          "first_name" : "Douglas",
          "last_name" : "Fir",
          "age" : 35,
          "about" : "I like to build cabinets",
          "interests" : [
            "forestry"
          ]
        }
      }
    ]
  }
}

Query string search

Shorthand (Kibana console syntax):

GET /megacorp/employee/_search?q=last_name:Smith

Full curl form (this one finds the employees whose last_name is Fir):

curl -XGET 'localhost:9200/megacorp/employee/_search?q=last_name:Fir&pretty'
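
When the search URL is built in code rather than typed by hand, the query string should be URL-encoded. A minimal Python sketch follows; the helper name `query_string_search_url` is hypothetical, and the base URL assumes a local node.

```python
from urllib.parse import urlencode


def query_string_search_url(base_url, index, doc_type, field, value, pretty=True):
    """Build a query-string search URL of the form .../_search?q=field:value."""
    params = {"q": f"{field}:{value}"}
    if pretty:
        params["pretty"] = "true"
    # urlencode percent-encodes reserved characters, so ':' becomes '%3A'.
    return f"{base_url}/{index}/{doc_type}/_search?{urlencode(params)}"


url = query_string_search_url(
    "http://localhost:9200", "megacorp", "employee", "last_name", "Smith"
)
print(url)
```

The encoded form `q=last_name%3ASmith` is equivalent to `q=last_name:Smith` once the server decodes it.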

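Searching with the Query DSL

Query string search is handy for ad hoc searches from the command line, but it has its limitations (see the chapter on search limitations). Elasticsearch provides a richer, more flexible query language called the Query DSL, which lets you build more complex and powerful searches.

Query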

curl -XGET '172.18.118.222:9200/megacorp/employee/_search?pretty' -d '
{
    "query" : {
        "match" : {
            "last_name" : "Smith"
        }
    }
}
'

Query result

{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 2,
    "max_score" : 0.2876821,
    "hits" : [
      {
        "_index" : "megacorp",
        "_type" : "employee",
        "_id" : "2",
        "_score" : 0.2876821,
        "_source" : {
          "first_name" : "Jane",
          "last_name" : "Smith",
          "age" : 32,
          "about" : "I like to collect rock albums",
          "interests" : [
            "music"
          ]
        }
      },
      {
        "_index" : "megacorp",
        "_type" : "employee",
        "_id" : "1",
        "_score" : 0.2876821,
        "_source" : {
          "first_name" : "John",
          "last_name" : "Smith",
          "age" : 25,
          "about" : "I love to go rock climbing",
          "interests" : [
            "sports",
            "music"
          ]
        }
      }
    ]
  }
}

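More complicated searches

Next, let's make the search a little more complicated. We still want to find all employees with the last name Smith, but now we also restrict the results to those older than 30. The query changes slightly to accommodate a filter, which expresses the structured part of the search: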

curl -XGET 'localhost:9200/megacorp/employee/_search?pretty' -d '
{
    "query" : {
        "filtered":{
            "filter":{
                "range":{
                    "age":{"gt": "30"}
                }
            },
            "query":{
                "match":{
                    "last_name":"Smith"
                }
            }
        }
    }
}
'

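The range filter in this query selects all documents with an age greater than 30; gt stands for greater than.

On Elasticsearch 5.0 and later, however, this request fails with the following error:

no [query] registered for [filtered]

Fix: the filtered query was deprecated and then removed in ES 5.0. Use a bool query with must and filter clauses instead: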

curl -XPOST '172.18.118.222:9200/megacorp/employee/_search?pretty' -d '
{
  "query": {
    "bool": {
      "filter": {
        "range": {
          "age": {
            "gt": 20
          }
        }
      },
      "must": {
        "match": {
          "last_name": "Smith"
        }
      }
    }
  }
}
'

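Result (the request above uses "gt": 20 rather than 30, which is why John Smith, aged 25, is also returned; with "gt": 30 only Jane Smith would match):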

{
  "took" : 9,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 2,
    "max_score" : 0.2876821,
    "hits" : [
      {
        "_index" : "megacorp",
        "_type" : "employee",
        "_id" : "2",
        "_score" : 0.2876821,
        "_source" : {
          "first_name" : "Jane",
          "last_name" : "Smith",
          "age" : 32,
          "about" : "I like to collect rock albums",
          "interests" : [
            "music"
          ]
        }
      },
      {
        "_index" : "megacorp",
        "_type" : "employee",
        "_id" : "1",
        "_score" : 0.2876821,
        "_source" : {
          "first_name" : "John",
          "last_name" : "Smith",
          "age" : 25,
          "about" : "I love to go rock climbing",
          "interests" : [
            "sports",
            "music"
          ]
        }
      }
    ]
  }
}
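
The rewrite from filtered to bool described above is mechanical: the inner query becomes the must clause and the inner filter becomes the filter clause. A minimal Python sketch of that rewrite follows; the function name `filtered_to_bool` is hypothetical, and it only handles a filtered query at the top level of the request body.

```python
def filtered_to_bool(body):
    """Rewrite a top-level `filtered` query as the equivalent `bool` query."""
    query = body.get("query", {})
    if "filtered" not in query:
        return body  # nothing to rewrite
    filtered = query["filtered"]
    bool_query = {}
    if "query" in filtered:
        bool_query["must"] = filtered["query"]      # scored part
    if "filter" in filtered:
        bool_query["filter"] = filtered["filter"]   # non-scoring part
    rewritten = dict(body)  # shallow copy, so the input is left untouched
    rewritten["query"] = {"bool": bool_query}
    return rewritten


old_body = {
    "query": {
        "filtered": {
            "filter": {"range": {"age": {"gt": 30}}},
            "query": {"match": {"last_name": "Smith"}},
        }
    }
}
new_body = filtered_to_bool(old_body)
print(new_body)
```

The sketch keeps the age threshold at 30, matching the stated requirement of employees older than 30.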

Full-text search

curl -XPOST '172.18.118.222:9200/megacorp/employee/_search?pretty' -d '
{
    "query" : {
        "match" : {
          "about" : "rock climbing"
        }
    }
}
'

Result

{
  "took" : 25,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 2,
    "max_score" : 0.53484553,
    "hits" : [
      {
        "_index" : "megacorp",
        "_type" : "employee",
        "_id" : "1",
        "_score" : 0.53484553,
        "_source" : {
          "first_name" : "John",
          "last_name" : "Smith",
          "age" : 25,
          "about" : "I love to go rock climbing",
          "interests" : [
            "sports",
            "music"
          ]
        }
      },
      {
        "_index" : "megacorp",
        "_type" : "employee",
        "_id" : "2",
        "_score" : 0.26742277,
        "_source" : {
          "first_name" : "Jane",
          "last_name" : "Smith",
          "age" : 32,
          "about" : "I like to collect rock albums",
          "interests" : [
            "music"
          ]
        }
      }
    ]
  }
}

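As before, we used a match query, this time to search the about field for rock climbing, and we get two matching documents.

By default, Elasticsearch sorts the results by relevance. The first hit, John Smith, has an about field that explicitly mentions rock climbing. Jane Smith's about field mentions rock but not climbing, so her _score is lower than John's.

Phrase search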

curl -XPOST '172.18.118.222:9200/megacorp/employee/_search?pretty' -d '
{
    "query" : {
        "match_phrase" : {
          "about" : "rock climbing"
        }
    }
}
'

Result

{
  "took" : 23,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 0.53484553,
    "hits" : [
      {
        "_index" : "megacorp",
        "_type" : "employee",
        "_id" : "1",
        "_score" : 0.53484553,
        "_source" : {
          "first_name" : "John",
          "last_name" : "Smith",
          "age" : 25,
          "about" : "I love to go rock climbing",
          "interests" : [
            "sports",
            "music"
          ]
        }
      }
    ]
  }
}
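
The difference between the match and match_phrase results above can be illustrated with a toy model in Python. This is a deliberately simplified sketch, not how Elasticsearch works internally: it splits on whitespace instead of running a real analyzer, and it does no relevance scoring. The function names are hypothetical.

```python
def tokenize(text):
    """Lowercase and split on whitespace (a stand-in for a real analyzer)."""
    return text.lower().split()


def matches(text, query):
    """match-like: at least one query term occurs in the text."""
    terms = set(tokenize(text))
    return any(term in terms for term in tokenize(query))


def matches_phrase(text, query):
    """match_phrase-like: the query terms occur consecutively, in order."""
    tokens, phrase = tokenize(text), tokenize(query)
    return any(
        tokens[i:i + len(phrase)] == phrase
        for i in range(len(tokens) - len(phrase) + 1)
    )


# The about fields of the three employees, keyed by document ID.
about = {
    "1": "I love to go rock climbing",
    "2": "I like to collect rock albums",
    "3": "I like to build cabinets",
}
match_ids = [i for i, text in about.items() if matches(text, "rock climbing")]
phrase_ids = [i for i, text in about.items() if matches_phrase(text, "rock climbing")]
print(match_ids, phrase_ids)
```

In this model, documents 1 and 2 satisfy the match-like test, while only document 1 contains the two words next to each other.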

Highlighting our searches

curl -XPOST '172.18.118.222:9200/megacorp/employee/_search?pretty' -d '
{
    "query" : {
        "match_phrase" : {
          "about" : "rock climbing"
        }
    },
    "highlight": {
        "fields" : {
          "about" : {}
        }
    }
}
'

Result

{
  "took" : 7,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 0.53484553,
    "hits" : [
      {
        "_index" : "megacorp",
        "_type" : "employee",
        "_id" : "1",
        "_score" : 0.53484553,
        "_source" : {
          "first_name" : "John",
          "last_name" : "Smith",
          "age" : 25,
          "about" : "I love to go rock climbing",
          "interests" : [
            "sports",
            "music"
          ]
        },
        "highlight" : {
          "about" : [
            "I love to go <em>rock</em> <em>climbing</em>"
          ]
        }
      }
    ]
  }
}
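
The highlight fragment in the response wraps each matched term in <em> tags. The effect can be imitated locally with a regular expression; this Python sketch is only an approximation of the output format, and the function and parameter names are hypothetical.

```python
import re


def highlight(text, terms, pre_tag="<em>", post_tag="</em>"):
    """Wrap every whole-word occurrence of the given terms in tags."""
    pattern = r"\b(" + "|".join(re.escape(term) for term in terms) + r")\b"
    return re.sub(
        pattern,
        lambda match: pre_tag + match.group(1) + post_tag,
        text,
        flags=re.IGNORECASE,
    )


snippet = highlight("I love to go rock climbing", ["rock", "climbing"])
print(snippet)
```

For John Smith's about field, this produces the same string as the highlight section of the response above.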

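Analytics

Finally, we have one last requirement: letting managers run analytics over the employee directory. Elasticsearch calls this functionality aggregations, and it lets you compute sophisticated statistics over your data. It is similar to GROUP BY in SQL, but much more powerful.

For example, let's find the most popular interests among our employees: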

curl -XPOST '172.18.118.222:9200/megacorp/employee/_search?pretty' -d '
{
    "aggs": {
      "all_interests": {
        "terms": { "field": "interests" }
      }
    }
}
'

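On Elasticsearch 5.x and later, this request fails with the following error:

Fielddata is disabled on text fields by default. Set fielddata=true on [interests] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory. Alternatively use a keyword field instead

In other words, interests is a text field, and fielddata is turned off for text fields by default. Fielddata consumes a lot of heap memory, especially when loading text fields, and once it has been loaded it stays in memory. Instead of enabling it, the request below aggregates on the interests.keyword sub-field: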

curl -XPOST '172.18.118.222:9200/megacorp/employee/_search?pretty' -d '
{
    "aggs": {
      "all_interests": {
        "terms": { "field": "interests.keyword" }
      }
    }
}
'

Result (excerpt)

"aggregations" : {
    "all_interests" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "music",
          "doc_count" : 2
        },
        {
          "key" : "forestry",
          "doc_count" : 1
        },
        {
          "key" : "sports",
          "doc_count" : 1
        }
      ]
    }
  }
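
To see what the terms aggregation computes, the same buckets can be reproduced locally over the three sample documents. This Python sketch is an illustration only; the function name `terms_buckets` is hypothetical, and the ordering rule (document count descending, then key ascending) is chosen here to mirror the response above.

```python
from collections import Counter

# The three employees indexed earlier (only the relevant fields).
employees = [
    {"first_name": "John", "age": 25, "interests": ["sports", "music"]},
    {"first_name": "Jane", "age": 32, "interests": ["music"]},
    {"first_name": "Douglas", "age": 35, "interests": ["forestry"]},
]


def terms_buckets(docs, field):
    """Count how many documents contain each distinct value of a field."""
    counts = Counter()
    for doc in docs:
        # A document counts once per value, even if the value repeats.
        counts.update(set(doc.get(field, [])))
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [{"key": key, "doc_count": count} for key, count in ordered]


buckets = terms_buckets(employees, "interests")
print(buckets)
```

Each bucket corresponds to one distinct interest, which is the GROUP BY analogy in concrete form.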

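Aggregations combined with a query

Aggregations are computed over the documents that match the query. For example, to find the popular interests of employees named Smith, add a match query to the same request:

curl -XGET '172.18.118.222:9200/megacorp/employee/_search?pretty' -d '
{
    "query": {
        "match": {
            "last_name": "Smith"
        }
    },
    "aggs": {
        "all_interests": {
            "terms": {
                "field": "interests.keyword"
            }
        }
    }
}
'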

Result

{
  "took" : 11,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 2,
    "max_score" : 0.2876821,
    "hits" : [
      {
        "_index" : "megacorp",
        "_type" : "employee",
        "_id" : "2",
        "_score" : 0.2876821,
        "_source" : {
          "first_name" : "Jane",
          "last_name" : "Smith",
          "age" : 32,
          "about" : "I like to collect rock albums",
          "interests" : [
            "music"
          ]
        }
      },
      {
        "_index" : "megacorp",
        "_type" : "employee",
        "_id" : "1",
        "_score" : 0.2876821,
        "_source" : {
          "first_name" : "John",
          "last_name" : "Smith",
          "age" : 25,
          "about" : "I love to go rock climbing",
          "interests" : [
            "sports",
            "music"
          ]
        }
      }
    ]
  },
  "aggregations" : {
    "all_interests" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "music",
          "doc_count" : 2
        },
        {
          "key" : "sports",
          "doc_count" : 1
        }
      ]
    }
  }
}

Aggregations can also be nested to produce hierarchical rollups. For example, we can compute the average age of the employees who share each interest:

curl -XGET '172.18.118.222:9200/megacorp/employee/_search?pretty' -d '
{
    "aggs" : {
        "all_interests" : {
            "terms" : { "field" : "interests.keyword" },
            "aggs" : {
                "avg_age" : {
                    "avg" : { "field" : "age" }
                }
            }
        }
    }
}
'

Result (excerpt)

 "aggregations" : {
    "all_interests" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "music",
          "doc_count" : 2,
          "avg_age" : {
            "value" : 28.5
          }
        },
        {
          "key" : "forestry",
          "doc_count" : 1,
          "avg_age" : {
            "value" : 35.0
          }
        },
        {
          "key" : "sports",
          "doc_count" : 1,
          "avg_age" : {
            "value" : 25.0
          }
        }
      ]
    }
  }
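
The avg_age values in the nested aggregation can be checked by hand. The following Python sketch groups the three sample employees by interest and averages their ages; the function name `average_age_by_interest` is hypothetical.

```python
from collections import defaultdict

# The three employees indexed earlier (only the relevant fields).
employees = [
    {"first_name": "John", "age": 25, "interests": ["sports", "music"]},
    {"first_name": "Jane", "age": 32, "interests": ["music"]},
    {"first_name": "Douglas", "age": 35, "interests": ["forestry"]},
]


def average_age_by_interest(docs):
    """Group documents by interest, then average the age within each group."""
    ages = defaultdict(list)
    for doc in docs:
        for interest in set(doc["interests"]):
            ages[interest].append(doc["age"])
    return {interest: sum(values) / len(values) for interest, values in ages.items()}


avg_age = average_age_by_interest(employees)
print(avg_age)
```

The music bucket contains John (25) and Jane (32), which gives (25 + 32) / 2 = 28.5, in agreement with the response.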

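The distributed nature of Elasticsearch

Elasticsearch tries hard to hide the complexity of a distributed system, and many operations happen automatically:

Partitioning your documents into different containers, or shards, which can be stored on a single node or on multiple nodes.
Balancing the indexing and search load across the nodes in the cluster.
Replicating your data automatically to provide redundant copies, so that hardware failures do not cause data loss.
Routing requests between nodes automatically to help you find the data you are looking for.
Seamlessly scaling out or recovering your cluster.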

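An empty cluster

A node is a running instance of Elasticsearch, and a cluster consists of one or more nodes with the same cluster.name that work together to share their data and workload. As nodes are added to or removed from the cluster, the cluster reorganizes itself to spread the data evenly.
One node in the cluster is elected to be the master node, which is in charge of managing cluster-wide changes such as creating or deleting an index, or adding a node to or removing a node from the cluster. The master node does not need to be involved in document-level changes or searches, which means that having just one master node does not become a bottleneck as traffic grows. Any node can become the master. Our example cluster has only one node, so that node performs the master role.
As users, we can talk to any node in the cluster, including the master node. Every node knows where each document lives and can forward our request directly to the nodes that hold the data we are interested in. Whichever node we talk to manages the process of gathering the responses from the nodes holding the data and returns the final response to the client. All of this is managed transparently by Elasticsearch.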

Cluster health

Many statistics can be monitored in an Elasticsearch cluster, but the single most important one is cluster health, whose status is reported as green, yellow, or red:

GET /_cluster/health

{
    "cluster_name": "elasticsearch",
    "status": "green",
    "timed_out": false,
    "number_of_nodes": 1,
    "number_of_data_nodes": 1,
    "active_primary_shards": 0,
    "active_shards": 0,
    "relocating_shards": 0,
    "initializing_shards": 0,
    "unassigned_shards": 0
}

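status is the field we should pay the most attention to.

Status	Meaning
green	All primary and replica shards are active.
yellow	All primary shards are active, but not all replica shards are.
red	Not all primary shards are active.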
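
The three statuses in the table follow a simple rule, which can be written down as a small Python function. This is only a sketch of the rule as stated in the table; the function name `cluster_status` and the shard representation (a dict with `primary` and `active` flags) are hypothetical.

```python
def cluster_status(shards):
    """Derive green / yellow / red from a list of shard descriptions.

    Each shard is a dict with two booleans: 'primary' and 'active'.
    """
    if any(shard["primary"] and not shard["active"] for shard in shards):
        return "red"      # at least one primary shard is unavailable
    if any(not shard["primary"] and not shard["active"] for shard in shards):
        return "yellow"   # all primaries are fine, some replica is not
    return "green"        # every primary and replica shard is active


example = [
    {"primary": True, "active": True},
    {"primary": False, "active": False},
]
print(cluster_status(example))
```

An empty cluster has no shards at all, so no rule is violated and the status is green, which matches the health response above.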

Adding an index

PUT /blogs
{
    "settings" : {
        "number_of_shards" : 3,
        "number_of_replicas" : 1
    }
}
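
The two settings above determine how many shard copies the index needs: number_of_replicas applies to each primary shard. A short Python sketch of the arithmetic follows; the function name `shard_copies` is hypothetical.

```python
def shard_copies(number_of_shards, number_of_replicas):
    """Count primary shards, replica shards and their total for one index."""
    primaries = number_of_shards
    # Every primary shard gets `number_of_replicas` replica copies.
    replicas = number_of_shards * number_of_replicas
    return {"primaries": primaries, "replicas": replicas, "total": primaries + replicas}


blogs = shard_copies(3, 1)
print(blogs)
```

For the blogs index this gives 3 primaries and 3 replicas, 6 shards in total. On the single-node cluster used in these notes the replicas cannot be assigned, because a replica is never placed on the same node as its primary, so the cluster health would be expected to report yellow rather than green.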
