Reroute Unassigned Shards: the fix when a primary shard is unassigned is to reroute it


Red Cluster!

Excerpted from: http://blog.kiyanpro.com/2016/03/06/elasticsearch/reroute-unassigned-shards/

There are 3 cluster states:

  1. green: All primary and replica shards are active
  2. yellow: All primary shards are active, but not all replica shards are active
  3. red: Not all primary shards are active
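
The three colors follow mechanically from the shard states. Here is a minimal sketch in Python, assuming each shard copy is given as a hypothetical (prirep, state) pair in the style of the cat shards output ("p" for primary, "r" for replica) and that STARTED and RELOCATING copies count as active. The function name is illustrative, not part of any Elasticsearch client.

```python
ACTIVE_STATES = {"STARTED", "RELOCATING"}  # assumption: these copies count as active


def cluster_health(shards):
    """Derive the health color from an iterable of (prirep, state) pairs."""
    shards = list(shards)
    primaries_ok = all(state in ACTIVE_STATES for prirep, state in shards if prirep == "p")
    replicas_ok = all(state in ACTIVE_STATES for prirep, state in shards if prirep == "r")
    if not primaries_ok:
        return "red"      # at least one primary is not active
    if not replicas_ok:
        return "yellow"   # primaries are fine, some replica is not
    return "green"


print(cluster_health([("p", "STARTED"), ("r", "UNASSIGNED")]))  # yellow
```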

When cluster health is red, at least one primary shard is not active: searches that touch the affected index return partial results and writes routed to the missing shard fail, which is very bad indeed. I will share with you how to deal with one common situation: when the cluster is red due to unassigned shards.

Steps

The general idea is pretty simple: find the shards that are unassigned, then manually assign them to a node with the reroute API. Let’s see how to do that step by step, and then combine the steps into a simple configurable script.

Step 1: Check Unassigned Shards

To get cluster information, we usually use the cat APIs. The GET /_cat/shards endpoint shows a detailed view of which nodes contain which shards[1].

Cat shards

# cat shards verbose
curl  "http://your.elasticsearch.host.com:9200/_cat/shards?v"
 
# cat shards index
curl  "http://your.elasticsearch.host.com:9200/_cat/shards/wiki2"
# example return
# wiki2 0 p STARTED 197 3.2mb 192.168.56.10 Stiletto
# wiki2 1 p STARTED 205 5.9mb 192.168.56.30 Frankie Raye
# wiki2 2 p STARTED 275 7.8mb 192.168.56.20 Commander Kraken

By piping cat shards to fgrep, we can get all unassigned shards.

Get unassigned shards

# cat shards with fgrep
curl  "http://your.elasticsearch.host.com:9200/_cat/shards" | fgrep UNASSIGNED
# example return
# wiki1 0 r UNASSIGNED ALLOCATION_FAILED
# wiki1 1 r UNASSIGNED ALLOCATION_FAILED
# wiki1 2 r UNASSIGNED ALLOCATION_FAILED
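
The same filter can be expressed in Python when the result needs further processing. This is a sketch only: it assumes the plain-text body of GET /_cat/shards has already been fetched and that its first four whitespace-separated columns are index, shard, prirep and state, as in the examples above. The function name is hypothetical.

```python
def unassigned_shards(cat_output):
    """Return (index, shard, prirep) for every UNASSIGNED row of cat shards text."""
    found = []
    for line in cat_output.splitlines():
        cols = line.split()
        # Column 4 holds the state; header rows and short lines are skipped.
        if len(cols) >= 4 and cols[3] == "UNASSIGNED":
            found.append((cols[0], int(cols[1]), cols[2]))
    return found


sample = """\
wiki1 0 r UNASSIGNED ALLOCATION_FAILED
wiki1 1 r UNASSIGNED ALLOCATION_FAILED
wiki2 0 p STARTED 197 3.2mb 192.168.56.10 Stiletto
"""
print(unassigned_shards(sample))
# [('wiki1', 0, 'r'), ('wiki1', 1, 'r')]
```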

 

If you don’t want to deal with shell scripting, you can also find these unassigned shards using another endpoint, POST /_flush/synced[2]. This endpoint is not purely informational: it lets an administrator initiate a synced flush manually. That is particularly useful for a planned (rolling) cluster restart, where you can stop indexing and don’t want to wait the default 5 minutes for idle indices to be sync-flushed automatically. It returns a JSON response. Note that synced flush was deprecated in Elasticsearch 7.6 and removed in 8.0, so this approach only applies to older clusters.

_flush/synced

curl -XPOST  "http://your.elasticsearch.host.com:9200/twitter/_flush/synced"

If the response contains failed shards, we can iterate through the failures array of each index to collect them.

Example response with failed shards

{
  "_shards": {
    "total": 4,
    "successful": 1,
    "failed": 1
  },
  "twitter": {
    "total": 4,
    "successful": 3,
    "failed": 1,
    "failures": [
      {
        "shard": 1,
        "reason": "unexpected error",
        "routing": {
          "state": "STARTED",
          "primary": false,
          "node": "SZNr2J_ORxKTLUCydGX4zA",
          "relocating_node": null,
          "shard": 1,
          "index": "twitter"
        }
      }
    ]
  }
}
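
Walking the failures array could look like the following Python sketch. It assumes the response has the shape shown above and has already been decoded into a dict; the function name is hypothetical. A reported failure is not necessarily an unassigned shard (the example's routing state is STARTED), so the result is best treated as a list of candidates to check.

```python
import json


def failed_shards(response):
    """Collect (index, shard, reason) for every failure in a synced-flush response."""
    failed = []
    for index, summary in response.items():
        if index == "_shards":
            continue  # overall totals, not an index entry
        for failure in summary.get("failures", []):
            failed.append((index, failure["shard"], failure.get("reason")))
    return failed


body = json.loads("""
{
  "_shards": {"total": 4, "successful": 1, "failed": 1},
  "twitter": {
    "total": 4, "successful": 3, "failed": 1,
    "failures": [{"shard": 1, "reason": "unexpected error"}]
  }
}
""")
print(failed_shards(body))
# [('twitter', 1, 'unexpected error')]
```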

 

Step 2: Reroute

The reroute API lets you explicitly execute cluster reroute allocation commands[3]. For example, an unassigned shard can be explicitly allocated to a specific node.

Reroute example

curl -XPOST 'localhost:9200/_cluster/reroute' -d '{
  "commands": [
    {
      "move": {
        "index": "test", "shard": 0,
        "from_node": "node1", "to_node": "node2"
      }
    },
    {
      "allocate": {
        "index": "test", "shard": 1, "node": "node3"
      }
    }
  ]
}'

There are 3 kinds of commands you can use:

move: Move a started shard from one node to another node. Accepts index and shard for index name and shard number, from_node for the node to move the shard from, and to_node for the node to move the shard to.

cancel: Cancel allocation of a shard (or recovery). Accepts index and shard for index name and shard number, and node for the node to cancel the shard allocation on. It also accepts an allow_primary flag to explicitly permit cancelling the allocation of a primary shard. This can be used to force resynchronization of existing replicas from the primary shard by cancelling them and allowing them to be reinitialized through the standard reallocation process.

allocate: Allocate an unassigned shard to a node. Accepts index and shard for index name and shard number, and node for the node to allocate the shard to. It also accepts an allow_primary flag to explicitly permit allocating a primary shard (this might result in data loss). In Elasticsearch 5.0 and later, this command was replaced by allocate_replica, allocate_stale_primary, and allocate_empty_primary.
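
The request body for these commands can be assembled programmatically. The helpers below are a hypothetical Python sketch that mirrors the three command shapes described above (the command names follow the Elasticsearch version this article targets); they only build the JSON and do not contact a cluster.

```python
import json


def move(index, shard, from_node, to_node):
    """Build a move command for a started shard."""
    return {"move": {"index": index, "shard": shard,
                     "from_node": from_node, "to_node": to_node}}


def cancel(index, shard, node, allow_primary=False):
    """Build a cancel command for an allocation or recovery on `node`."""
    return {"cancel": {"index": index, "shard": shard, "node": node,
                       "allow_primary": allow_primary}}


def allocate(index, shard, node, allow_primary=False):
    """Build an allocate command; allow_primary=True may lose data."""
    return {"allocate": {"index": index, "shard": shard, "node": node,
                         "allow_primary": allow_primary}}


def reroute_body(*commands):
    """Serialize commands into the body expected by POST /_cluster/reroute."""
    return json.dumps({"commands": list(commands)}, indent=2)


print(reroute_body(move("test", 0, "node1", "node2"), allocate("test", 1, "node3")))
```

Keeping allow_primary off by default is a deliberate choice in this sketch, since forcing a primary allocation is the one option here that can discard data.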

Combining Step 2 with the unassigned shards from Step 1, we can reroute all unassigned shards one by one and so recover the cluster from the red state faster.

Example Solutions

Python

Below is a Python script I wrote using POST /_flush/synced and POST /_cluster/reroute.
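
A minimal sketch of that approach (not the author's script) might look like this. It assumes the synced-flush response has already been fetched and decoded into a dict, that target nodes are chosen round-robin from a caller-supplied list (an assumption made for illustration), and that the cluster accepts the allocate command described in Step 2. The function names and the base URL are hypothetical.

```python
import json
import urllib.request


def plan_commands(flush_response, nodes, allow_primary=False):
    """Build one allocate command per failure in a synced-flush response."""
    commands = []
    for index, summary in flush_response.items():
        if index == "_shards":
            continue  # overall totals, not an index entry
        for failure in summary.get("failures", []):
            node = nodes[len(commands) % len(nodes)]  # round-robin (assumption)
            commands.append({"allocate": {
                "index": index,
                "shard": failure["shard"],
                "node": node,
                "allow_primary": allow_primary,
            }})
    return commands


def reroute(base_url, command):
    """POST a single command to /_cluster/reroute and return the JSON reply."""
    request = urllib.request.Request(
        base_url + "/_cluster/reroute",
        data=json.dumps({"commands": [command]}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as reply:
        return json.load(reply)
```

With a decoded response in hand, the shards can then be rerouted one by one, for example `for command in plan_commands(flush, ["node1", "node2"]): reroute("http://localhost:9200", command)`. Whether each failed copy is really unassigned is worth confirming with cat shards first.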

Shell Script

Below is a shell script I found elsewhere in a blog post[4].

# Allocates every unassigned shard of one index to one node.
INDEX="t37"        # index name
NODE="datanode15"  # node name
for shard in $(curl -s -XGET "http://localhost:9200/_cat/shards/$INDEX" | grep UNASSIGNED | awk '{print $2}'); do
  # Double quotes around the body so that $INDEX, $shard and $NODE are expanded.
  curl -XPOST 'localhost:9200/_cluster/reroute' -d "{
    \"commands\": [ {
      \"allocate\": {
        \"index\": \"$INDEX\",
        \"shard\": $shard,
        \"node\": \"$NODE\",
        \"allow_primary\": true
      }
    } ]
  }"
  sleep 5
done

EDIT: Based on Vincent’s comment I updated the shell script.

Possible Unassigned Shard Reasons

FYI, these are the possible reasons for a shard to be in an unassigned state[1]:

 

Name                     Comment
INDEX_CREATED            Unassigned as a result of an API creation of an index
CLUSTER_RECOVERED        Unassigned as a result of a full cluster recovery
INDEX_REOPENED           Unassigned as a result of opening a closed index
DANGLING_INDEX_IMPORTED  Unassigned as a result of importing a dangling index
NEW_INDEX_RESTORED       Unassigned as a result of restoring into a new index
EXISTING_INDEX_RESTORED  Unassigned as a result of restoring into a closed index
REPLICA_ADDED            Unassigned as a result of explicit addition of a replica
ALLOCATION_FAILED        Unassigned as a result of a failed allocation of the shard
NODE_LEFT                Unassigned as a result of the node hosting it leaving the cluster
REROUTE_CANCELLED        Unassigned as a result of explicit cancel reroute command
REINITIALIZED            When a shard moves from started back to initializing, for example, with shadow replicas
REALLOCATED_REPLICA      A better replica location is identified and causes the existing replica allocation to be cancelled
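
Before rerouting, it can help to see which of these reasons dominate. The Python sketch below tallies them, assuming the reason appears as the fifth whitespace-separated column of the cat shards text, as in the example output of Step 1 (where it is not shown by default, it can be requested with h=index,shard,prirep,state,unassigned.reason). The function name is hypothetical.

```python
from collections import Counter


def unassigned_reasons(cat_output):
    """Count UNASSIGNED rows of cat shards text by their reason column."""
    counts = Counter()
    for line in cat_output.splitlines():
        cols = line.split()
        if len(cols) >= 5 and cols[3] == "UNASSIGNED":
            counts[cols[4]] += 1
    return counts


sample = """\
wiki1 0 p STARTED
wiki1 0 r UNASSIGNED ALLOCATION_FAILED
wiki1 1 r UNASSIGNED ALLOCATION_FAILED
wiki3 0 r UNASSIGNED NODE_LEFT
"""
print(unassigned_reasons(sample).most_common())
# [('ALLOCATION_FAILED', 2), ('NODE_LEFT', 1)]
```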

References

  1. Elasticsearch documentation: Cat Shards
  2. Elasticsearch documentation: Synced Flush
  3. Elasticsearch documentation: Cluster Reroute
  4. How to fix your elasticsearch cluster stuck in initializing shards mode?

This article is reposted from 张昺华-sky's cnblogs blog. Original link: http://www.cnblogs.com/bonelee/p/7459408.html. To repost it, please contact the original author.
