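Preface
ES queries are generally fast, but from+size pagination can only reach the first 10,000 results by default (the index.max_result_window setting), and it gets slower the deeper you page.
Scroll queries can retrieve large volumes of data, and the results stay stable while you page through them because they are served from a snapshot taken at the first request. This makes scroll very useful for back-end batch processing.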
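To make the 10,000 figure concrete, here is a small sketch of the arithmetic behind the from+size limit. It assumes the default index.max_result_window of 10,000; the function name page_is_reachable is made up for illustration and is not part of any Elasticsearch client.

```python
# Default value of the index.max_result_window setting in Elasticsearch.
MAX_RESULT_WINDOW = 10_000


def page_is_reachable(page: int, size: int, max_window: int = MAX_RESULT_WINDOW) -> bool:
    """Return True if a 1-based page can be fetched with from+size paging.

    Elasticsearch rejects a search when from + size exceeds the window.
    """
    offset = (page - 1) * size  # the value that would be sent as "from"
    return offset + size <= max_window


if __name__ == "__main__":
    # With 200 documents per page, page 50 ends exactly at the window,
    # while page 51 would go past it.
    print(page_is_reachable(50, 200))
    print(page_is_reachable(51, 200))
```

With 200 documents per page, anything past page 50 needs a different mechanism, which is where scroll comes in.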
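Querying
Initial request
The first request is almost identical to an ordinary _search request: just append ?scroll=1m to the URL. The value tells Elasticsearch how long to keep the search context alive and is written with its usual time units (for example 30s, 1m, 1h), so 1m means one minute.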
GET bbs/_search?scroll=1m
{
  "size": 200
}
The response looks like a normal search response, with one additional field: _scroll_id.
{
  "_scroll_id" : "DnF1ZXJ5VGhlbkZldGNoBQAAAAAAAFwRFnpvalZqbDZEVEFPQWptS0wyZjYzTXcAAAAAAABcEhZ6b2pWamw2RFRBT0FqbUtMMmY2M013AAAAAAAAXBMWem9qVmpsNkRUQU9Bam1LTDJmNjNNdwAAAAAAAFwUFnpvalZqbDZEVEFPQWptS0wyZjYzTXcAAAAAAABcFRZ6b2pWamw2RFRBT0FqbUtMMmY2M013",
  "took" : 6,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 5001,
    "max_score" : 1.0,
    "hits" : [
      {
        (omitted ...)
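As a rough illustration of the first request, the sketch below sends the initial search and pulls out the two pieces needed later: the _scroll_id and the first batch of hits. The send callable is a hypothetical transport (method, path, JSON body) -> parsed JSON response, introduced here only so the example does not depend on a particular HTTP or Elasticsearch client; start_scroll is likewise an illustrative name.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

# Hypothetical transport: (method, path, json_body) -> parsed JSON response.
Send = Callable[[str, str, Optional[Dict[str, Any]]], Dict[str, Any]]


def start_scroll(
    send: Send, index: str, size: int = 200, keep_alive: str = "1m"
) -> Tuple[str, List[Dict[str, Any]]]:
    """Open a scroll on `index` and return (scroll_id, first batch of hits)."""
    response = send("GET", f"/{index}/_search?scroll={keep_alive}", {"size": size})
    return response["_scroll_id"], response["hits"]["hits"]
```

In a real program, send would wrap whatever HTTP library is in use and return the decoded JSON body.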
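Scroll requests
Once you have the _scroll_id from the first request, you can use it to keep scrolling for as long as the keep-alive has not expired. The scroll value sent with each scroll request resets the keep-alive.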
GET /_search/scroll
{
  "scroll": "10m",
  "scroll_id": "DnF1ZXJ5VGhlbkZldGNoBQAAAAAAAFsTFnpvalZqbDZEVEFPQWptS0wyZjYzTXcAAAAAAABbFxZ6b2pWamw2RFRBT0FqbUtMMmY2M013AAAAAAAAWxQWem9qVmpsNkRUQU9Bam1LTDJmNjNNdwAAAAAAAFsVFnpvalZqbDZEVEFPQWptS0wyZjYzTXcAAAAAAABbFhZ6b2pWamw2RFRBT0FqbUtMMmY2M013"
}
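A scroll response has the same structure as the response to the first request. The returned _scroll_id is often the same as well, but it can change between requests, so always pass the most recently returned _scroll_id.
Once every result has been returned, the response still has the same structure, but the hits array is empty. Use this to detect that all the data has been read.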
{
  "_scroll_id" : "DnF1ZXJ5VGhlbkZldGNoBQAAAAAAAF17FnpvalZqbDZEVEFPQWptS0wyZjYzTXcAAAAAAABdeRZ6b2pWamw2RFRBT0FqbUtMMmY2M013AAAAAAAAXXwWem9qVmpsNkRUQU9Bam1LTDJmNjNNdwAAAAAAAF16FnpvalZqbDZEVEFPQWptS0wyZjYzTXcAAAAAAABdfRZ6b2pWamw2RFRBT0FqbUtMMmY2M013",
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 5001,
    "max_score" : 1.0,
    "hits" : [ ]
  }
}
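Putting the two requests together, a scroll loop might look like the sketch below: issue the initial search, then keep calling /_search/scroll with the latest _scroll_id until hits comes back empty. As an assumption for illustration, send is a hypothetical transport (method, path, JSON body) -> parsed JSON response, and scroll_all is a made-up helper name, not an Elasticsearch client API.

```python
from typing import Any, Callable, Dict, Iterator, Optional

# Hypothetical transport: (method, path, json_body) -> parsed JSON response.
Send = Callable[[str, str, Optional[Dict[str, Any]]], Dict[str, Any]]


def scroll_all(
    send: Send, index: str, size: int = 200, keep_alive: str = "1m"
) -> Iterator[Dict[str, Any]]:
    """Yield every hit of a match-all scroll over `index`, batch by batch."""
    # Initial request: opens the search context and returns the first batch.
    response = send("GET", f"/{index}/_search?scroll={keep_alive}", {"size": size})
    # An empty hits array means the scroll is exhausted.
    while response["hits"]["hits"]:
        yield from response["hits"]["hits"]
        # Always use the most recently returned _scroll_id.
        response = send(
            "GET",
            "/_search/scroll",
            {"scroll": keep_alive, "scroll_id": response["_scroll_id"]},
        )
```

Because this is a generator, the caller can process documents as they arrive instead of holding all of them in memory.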
If the _scroll_id has already expired, the request fails with an error (HTTP status 404, search_context_missing_exception):
{
  "error" : {
    "root_cause" : [
      { "type" : "search_context_missing_exception", "reason" : "No search context found for id [24047]" },
      { "type" : "search_context_missing_exception", "reason" : "No search context found for id [24051]" },
      { "type" : "search_context_missing_exception", "reason" : "No search context found for id [24048]" },
      { "type" : "search_context_missing_exception", "reason" : "No search context found for id [24049]" },
      { "type" : "search_context_missing_exception", "reason" : "No search context found for id [24050]" }
    ],
    "type" : "search_phase_execution_exception",
    "reason" : "all shards failed",
    "phase" : "query",
    "grouped" : true,
    "failed_shards" : [
      { "shard" : -1, "index" : null, "reason" : { "type" : "search_context_missing_exception", "reason" : "No search context found for id [24047]" } },
      { "shard" : -1, "index" : null, "reason" : { "type" : "search_context_missing_exception", "reason" : "No search context found for id [24051]" } },
      { "shard" : -1, "index" : null, "reason" : { "type" : "search_context_missing_exception", "reason" : "No search context found for id [24048]" } },
      { "shard" : -1, "index" : null, "reason" : { "type" : "search_context_missing_exception", "reason" : "No search context found for id [24049]" } },
      { "shard" : -1, "index" : null, "reason" : { "type" : "search_context_missing_exception", "reason" : "No search context found for id [24050]" } }
    ],
    "caused_by" : {
      "type" : "search_context_missing_exception",
      "reason" : "No search context found for id [24050]"
    }
  },
  "status" : 404
}
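A client usually wants to tell this "scroll expired" case apart from other failures, because the only recovery is to start a new scroll from the beginning. The sketch below checks the status code and the root_cause types of an error body shaped like the one above; is_scroll_expired is an illustrative helper name, and treating any search_context_missing_exception as an expiry is an assumption of this example.

```python
from typing import Any, Dict


def is_scroll_expired(status: int, body: Dict[str, Any]) -> bool:
    """Return True if the response looks like an expired or unknown scroll."""
    if status != 404:
        return False
    error = body.get("error")
    # Some error responses carry a plain string instead of an object.
    if not isinstance(error, dict):
        return False
    return any(
        cause.get("type") == "search_context_missing_exception"
        for cause in error.get("root_cause", [])
    )
```

When this returns True, continuing with the same _scroll_id will not help; the scroll has to be reopened with a fresh initial request.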
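Clearing the scroll
We can clear a scroll explicitly as soon as we are done with it. This frees the search context right away instead of waiting for the keep-alive to expire, which takes load off ES.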
DELETE /_search/scroll
{
  "scroll_id" : "DXF1ZXJ5QW5kRmV0Y2gBAAAAAAAAAD4WYm9laVYtZndUQlNsdDcwakFMNjU1QQ=="
}
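One way to make sure the scroll is always cleared, even when processing fails halfway, is to put the DELETE in a finally block. The sketch below assumes a hypothetical send transport (method, path, JSON body) -> parsed JSON response; run_then_clear and work are illustrative names, not part of any client library.

```python
from typing import Any, Callable, Dict, Optional, TypeVar

T = TypeVar("T")

# Hypothetical transport: (method, path, json_body) -> parsed JSON response.
Send = Callable[[str, str, Optional[Dict[str, Any]]], Dict[str, Any]]


def run_then_clear(send: Send, scroll_id: str, work: Callable[[], T]) -> T:
    """Run `work`, then clear the scroll whether or not `work` raised."""
    try:
        return work()
    finally:
        # Release the search context held by this scroll.
        send("DELETE", "/_search/scroll", {"scroll_id": scroll_id})
```

If the _scroll_id changes during scrolling, the id passed here should be the most recent one.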
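Summary
Advantages
- Can retrieve large volumes of data
- Pagination is stable, so no document is returned twice
- Not bound by the 10,000-result limit of from+size paging
Disadvantages
- Cannot jump to an arbitrary page; batches must be read in order
- Requests cannot be retried: once a batch has been returned, the scroll has moved on and that batch cannot be fetched again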