Overview
Continuing the Elasticsearch (ES) course taught by instructor 中华石杉; this is part 56.
Course link: https://www.roncoo.com/view/55
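Official documentation
In short, the path_hierarchy tokenizer tokenizes data that has a multi-level hierarchy, such as file system paths.
See "Path Hierarchy Tokenizer Examples" in the official Elasticsearch reference.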
Example
Simulation: setting up file system data
PUT /filesystem
{
  "settings": {
    "analysis": {
      "analyzer": {
        "paths": {
          "tokenizer": "path_hierarchy"
        }
      }
    }
  }
}
Testing the path_hierarchy tokenizer
POST filesystem/_analyze
{
  "tokenizer": "path_hierarchy",
  "text": "/home/elasticsearch/image"
}
Response:
{
  "tokens": [
    {
      "token": "/home",
      "start_offset": 0,
      "end_offset": 5,
      "type": "word",
      "position": 0
    },
    {
      "token": "/home/elasticsearch",
      "start_offset": 0,
      "end_offset": 19,
      "type": "word",
      "position": 0
    },
    {
      "token": "/home/elasticsearch/image",
      "start_offset": 0,
      "end_offset": 25,
      "type": "word",
      "position": 0
    }
  ]
}
In other words, the path_hierarchy tokenizer splits the path /a/b/c/d into one token per level of the hierarchy: /a, /a/b, /a/b/c, and /a/b/c/d.
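To make the splitting rule concrete, here is a minimal Python sketch that imitates the tokenizer's output for simple paths. It is not Elasticsearch's implementation; the function name path_hierarchy_tokens is made up for this illustration, and edge cases such as trailing or repeated delimiters are ignored.

```python
def path_hierarchy_tokens(path, delimiter="/"):
    """Imitate path_hierarchy for simple paths (illustrative sketch only)."""
    tokens = []
    # Start searching at index 1 so a leading delimiter stays in the first token.
    idx = path.find(delimiter, 1)
    while idx != -1:
        # Every token starts at offset 0 and ends just before the next delimiter.
        tokens.append({"token": path[:idx], "start_offset": 0, "end_offset": idx})
        idx = path.find(delimiter, idx + 1)
    # The full path is always the last token.
    tokens.append({"token": path, "start_offset": 0, "end_offset": len(path)})
    return tokens


for t in path_hierarchy_tokens("/home/elasticsearch/image"):
    print(t["token"], t["start_offset"], t["end_offset"])
```

For /home/elasticsearch/image this prints the same three tokens and offsets (5, 19, 25) as the _analyze response above.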
Requirement 1: find a file whose contents include ES and that is located in the /workspace/projects/helloworld directory
Manually specify the field types, then index a sample document
# Define the field types
PUT /filesystem/_mapping/file
{
  "properties": {
    "name": {
      "type": "keyword"
    },
    "path": {
      "type": "keyword",
      "fields": {
        "tree": {
          "type": "text",
          "analyzer": "paths"
        }
      }
    }
  }
}

# View the mapping
GET /filesystem/_mapping

# Index a document
PUT /filesystem/file/1
{
  "name": "README.txt",
  "path": "/workspace/projects/helloworld",
  "contents": "小工匠跟石杉老师学习ES"
}
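The mapping indexes the same path value twice: as a single keyword term in path, and through the paths analyzer in the path.tree sub-field. A rough Python sketch of the terms each field would hold for document 1 follows. The helper index_terms is hypothetical and only approximates the analysis for simple paths.

```python
def index_terms(path, delimiter="/"):
    """Return the terms each field would hold for one path value (simplified)."""
    tree_terms = []
    # Collect every ancestor prefix, keeping the leading delimiter attached.
    idx = path.find(delimiter, 1)
    while idx != -1:
        tree_terms.append(path[:idx])
        idx = path.find(delimiter, idx + 1)
    tree_terms.append(path)
    # "path" is a keyword field: the whole value is one term.
    # "path.tree" uses the path_hierarchy-based analyzer: one term per level.
    return {"path": [path], "path.tree": tree_terms}


terms = index_terms("/workspace/projects/helloworld")
print(terms["path"])
print(terms["path.tree"])
```

So an exact path lookup can go against path, while "anything under this directory" lookups can go against path.tree.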
Query DSL for this requirement:
# File search: find a file whose contents include ES and that is located in the /workspace/projects/helloworld directory
GET /filesystem/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "contents": "ES"
          }
        }
      ],
      "filter": {
        "term": {
          "path": "/workspace/projects/helloworld"
        }
      }
    }
  }
}
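Response: the only hit should be document 1, since it is the only document indexed so far and its path matches the term filter exactly.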
Requirement 2: find all files under the /workspace directory whose contents include ES
Index a few more documents
PUT /filesystem/file/2
{
  "name": "README.txt",
  "path": "/workspace/projects",
  "contents": "小工匠跟石杉老师学习ES"
}

PUT /filesystem/file/3
{
  "name": "README.txt",
  "path": "/workspace/xxxxx",
  "contents": "小工匠跟石杉老师学习ES"
}

PUT /filesystem/file/4
{
  "name": "README.txt",
  "path": "/home/artisan",
  "contents": "小工匠跟石杉老师学习ES"
}

PUT /filesystem/file/5
{
  "name": "README.txt",
  "path": "/workspace",
  "contents": "小工匠跟石杉老师学习ES"
}
Query DSL for this requirement; the key part is the term filter on the sub-field, "path.tree": "/workspace":
GET filesystem/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "contents": "ES"
          }
        }
      ],
      "filter": {
        "term": {
          "path.tree": "/workspace"
        }
      }
    }
  }
}
Response:
{
  "took": 8,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 4,
    "max_score": 0.2876821,
    "hits": [
      {
        "_index": "filesystem",
        "_type": "file",
        "_id": "5",
        "_score": 0.2876821,
        "_source": {
          "name": "README.txt",
          "path": "/workspace",
          "contents": "小工匠跟石杉老师学习ES"
        }
      },
      {
        "_index": "filesystem",
        "_type": "file",
        "_id": "1",
        "_score": 0.2876821,
        "_source": {
          "name": "README.txt",
          "path": "/workspace/projects/helloworld",
          "contents": "小工匠跟石杉老师学习ES"
        }
      },
      {
        "_index": "filesystem",
        "_type": "file",
        "_id": "3",
        "_score": 0.2876821,
        "_source": {
          "name": "README.txt",
          "path": "/workspace/xxxxx",
          "contents": "小工匠跟石杉老师学习ES"
        }
      },
      {
        "_index": "filesystem",
        "_type": "file",
        "_id": "2",
        "_score": 0.18232156,
        "_source": {
          "name": "README.txt",
          "path": "/workspace/projects",
          "contents": "小工匠跟石杉老师学习ES"
        }
      }
    ]
  }
}
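As you can see, the document with id=4 (path /home/artisan) does not satisfy the requirement and is not returned, which is the expected result.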
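Why documents 1, 2, 3 and 5 match while 4 does not can be checked with a small Python simulation of the two term filters. This is only a sketch of the matching logic, not of how Elasticsearch executes the query; the names tree_terms, term_filter_tree and term_filter_keyword are invented for the illustration, and the match on contents is left out because all five documents share the same contents.

```python
def tree_terms(path, delimiter="/"):
    """Ancestor prefixes of a path, mimicking path_hierarchy (simplified)."""
    terms = set()
    idx = path.find(delimiter, 1)
    while idx != -1:
        terms.add(path[:idx])
        idx = path.find(delimiter, idx + 1)
    terms.add(path)
    return terms


# The five sample documents from this post: id -> path
DOCS = {
    1: "/workspace/projects/helloworld",
    2: "/workspace/projects",
    3: "/workspace/xxxxx",
    4: "/home/artisan",
    5: "/workspace",
}


def term_filter_tree(term):
    """Term filter on path.tree: the term must equal one of the path's prefixes."""
    return sorted(i for i, p in DOCS.items() if term in tree_terms(p))


def term_filter_keyword(term):
    """Term filter on the keyword field path: the whole value must be equal."""
    return sorted(i for i, p in DOCS.items() if p == term)


print(term_filter_tree("/workspace"))     # [1, 2, 3, 5]
print(term_filter_keyword("/workspace"))  # [5]
```

Document 4 only produces the terms /home and /home/artisan, so the term /workspace never matches it. The same filter against the plain keyword field path would return document 5 alone, which is why requirement 2 needs the path.tree sub-field.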