Search Engines: Elasticsearch (Part 1) https://developer.aliyun.com/article/1469573
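Specifying field types
To retrieve these rules, send a GET request for the index; the response contains the detailed mapping.
Viewing the default information
If a document field has no type specified, ES assigns a default field type for us.
Extension: checking index status with commands. The GET _cat/ endpoints expose a lot of information about the current state of ES.
Modifying: submit with PUT again and the existing document is overwritten. There is also a newer approach.
The old way
The current way
Deleting an index
Deletion uses the DELETE command; whether an index or a single document is deleted depends on the request path.
The RESTful style is what ES recommends.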
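8. Basic document operations (important)
Basic operations
Adding data
PUT /kuangshen/user/1
{
  "name": "狂神说",
  "age": 23,
  "desc": "一顿操作猛如虎,一看工资2500",
  "tags": ["技术宅", "温暖", "直男"]
}
Getting data: GET
Updating data: PUT (overwrites the whole document)
POST _update: this is the recommended way to update, because only the submitted fields change.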
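The difference between the two update styles can be sketched with plain Java maps. This is only an in-memory analogy, not the Elasticsearch client: the names `UpdateSemantics`, `putReplace`, and `postUpdate` are hypothetical, and the sketch assumes that PUT replaces the stored document while `_update` with a `doc` body merges the given fields into it (a shallow merge here, for brevity).

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class UpdateSemantics {
    // PUT-style indexing: the request body replaces the stored document entirely.
    public static Map<String, Object> putReplace(Map<String, Object> stored, Map<String, Object> body) {
        return new LinkedHashMap<>(body);
    }

    // POST _update-style: fields from "doc" are merged into the stored document.
    public static Map<String, Object> postUpdate(Map<String, Object> stored, Map<String, Object> doc) {
        Map<String, Object> merged = new LinkedHashMap<>(stored);
        merged.putAll(doc);
        return merged;
    }

    public static void main(String[] args) {
        Map<String, Object> stored = new LinkedHashMap<>();
        stored.put("name", "狂神说");
        stored.put("age", 23);

        Map<String, Object> change = new LinkedHashMap<>();
        change.put("name", "狂神说Java");

        // Only "name" survives: the "age" field is lost on a full overwrite.
        System.out.println(putReplace(stored, change));
        // Both fields survive: "name" is changed, "age" is kept.
        System.out.println(postUpdate(stored, change));
    }
}
```

This is why a PUT that omits a field silently drops it, while a partial update leaves untouched fields in place.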
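Simple search
GET kuangshen/user/1
Simple conditional queries: basic queries can be built on top of the default mapping rules.
Complex search operations (sorting, pagination, highlighting, fuzzy queries, exact queries)
Trimming the output when you do not want every field
When we later operate ES from Java, the methods and objects correspond to the keys used in these requests.
Sorting
Paginated queries
The offset still starts at 0, just like the indices in every data structure you have learned.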
/search/{current}/{pagesize}
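A minimal sketch of how such a route could be mapped onto the zero-based offset. It assumes `{current}` is a 1-based page number, which the text does not state explicitly; `PageOffset` and `fromOffset` are hypothetical names.

```java
public class PageOffset {
    // Converts a 1-based page number into the zero-based "from" offset of a search request.
    public static int fromOffset(int current, int pageSize) {
        if (pageSize < 1) {
            throw new IllegalArgumentException("pageSize must be >= 1");
        }
        int page = Math.max(current, 1); // anything below 1 is treated as the first page
        return (page - 1) * pageSize;
    }

    public static void main(String[] args) {
        System.out.println(fromOffset(1, 10)); // 0
        System.out.println(fromOffset(3, 10)); // 20
    }
}
```

The returned value would be passed as `from`, with `pageSize` passed as `size`.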
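Boolean queries
must (and): every condition has to match, like where id = 1 and name = xxx
should (or): matching any one of the conditions is enough, like where id = 1 or name = xxx
must_not (not): documents that match the condition are excluded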
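The three clause types can be illustrated with plain Java predicates. This is a simplified in-memory analogy rather than the Elasticsearch client: `BoolQuerySketch`, `User`, `idIs`, `nameIs`, and `matches` are hypothetical names, scoring and `filter` clauses are ignored, and the sketch assumes that `should` is only mandatory when no `must` clause is present.

```java
import java.util.List;
import java.util.function.Predicate;

public class BoolQuerySketch {
    public static final class User {
        public final int id;
        public final String name;
        public User(int id, String name) { this.id = id; this.name = name; }
    }

    public static Predicate<User> idIs(int id) { return u -> u.id == id; }
    public static Predicate<User> nameIs(String name) { return u -> u.name.equals(name); }

    public static boolean matches(User doc,
                                  List<Predicate<User>> must,
                                  List<Predicate<User>> should,
                                  List<Predicate<User>> mustNot) {
        boolean allMust = must.stream().allMatch(p -> p.test(doc));         // AND
        boolean noneMustNot = mustNot.stream().noneMatch(p -> p.test(doc)); // NOT
        // OR: with no must clauses, at least one should clause has to match.
        boolean shouldOk = should.isEmpty() || !must.isEmpty()
                || should.stream().anyMatch(p -> p.test(doc));
        return allMust && noneMustNot && shouldOk;
    }

    public static void main(String[] args) {
        User u = new User(1, "other");
        List<Predicate<User>> none = List.of();
        List<Predicate<User>> both = List.of(idIs(1), nameIs("xxx"));

        System.out.println(matches(u, both, none, none)); // false: the name condition fails under must
        System.out.println(matches(u, none, both, none)); // true: the id condition is enough under should
        System.out.println(matches(u, none, none, both)); // false: the id condition matches, so must_not excludes it
    }
}
```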
Filtering with filter
- gt: greater than
- gte: greater than or equal to
- lt: less than
- lte: less than or equal to
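Matching multiple conditions
Exact queries
A term query looks up the specified term directly in the inverted index, exactly as given.
About analysis:
- term: looks up the exact term, without analysis
- match: runs the query text through an analyzer first, then searches with the resulting terms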
Two field types matter here: text (analyzed into terms) and keyword (stored as a single term, not analyzed)
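A toy model of why term and match behave differently on the two field types. Everything here is a hypothetical illustration: `TermVsMatchSketch` and its methods are not Elasticsearch APIs, and the whitespace tokenizer is only a stand-in for a real analyzer, not the actual behavior of the standard analyzer.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

public class TermVsMatchSketch {
    // keyword field: the whole value is indexed as one single term.
    public static Set<String> analyzeKeyword(String value) {
        return Set.of(value);
    }

    // text field: toy analyzer that lowercases and splits on whitespace.
    public static Set<String> analyzeText(String value) {
        return new LinkedHashSet<>(Arrays.asList(value.toLowerCase().trim().split("\\s+")));
    }

    // term query: the input is looked up as-is among the indexed terms.
    public static boolean term(Set<String> indexedTerms, String input) {
        return indexedTerms.contains(input);
    }

    // match query on a text field: the input is analyzed first; any resulting term may hit.
    public static boolean match(Set<String> indexedTerms, String input) {
        for (String t : analyzeText(input)) {
            if (indexedTerms.contains(t)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String value = "Java is fun";
        Set<String> asText = analyzeText(value);       // terms: java, is, fun
        Set<String> asKeyword = analyzeKeyword(value); // one term: "Java is fun"

        System.out.println(term(asKeyword, "Java is fun")); // true: whole value is the term
        System.out.println(term(asText, "Java is fun"));    // false: no such single term was indexed
        System.out.println(term(asText, "java"));           // true: an indexed term
        System.out.println(match(asText, "Learning Java")); // true: "java" hits after analysis
    }
}
```

In this model a term query only succeeds against a text field when the input happens to equal one of the indexed terms, which is why exact lookups are usually run against keyword fields.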
Exact queries matching multiple values
Highlight queries
Commands used
PUT /test1/type1/1
{
  "name": "小冷",
  "age": 3
}

PUT /test2
{
  "mappings": {
    "properties": {
      "name": { "type": "text" },
      "age": { "type": "long" },
      "birthDay": { "type": "date" }
    }
  }
}

GET test2

PUT /test3/_doc/1
{
  "name": "",
  "age": 8,
  "birth": "2004-02-08"
}

POST /test3/_doc/1/_update
{
  "doc": { "name": "小冷" }
}

GET test3

PUT /lhy/user/1
{
  "name": "狂神说",
  "age": 23,
  "desc": "一顿操作猛如虎,一看工资2500",
  "tags": ["技术宅", "温暖", "直男"]
}

PUT /lhy/user/2
{
  "name": "法外狂徒张三",
  "age": 30,
  "desc": "罗老师手下的得力干将",
  "tags": ["身体好", "懂法律", "难判刑"]
}

# PUT again with the same id overwrites the whole document
PUT /lhy/user/2
{
  "name": "法外狂徒张三",
  "age": 19,
  "desc": "罗老师手下的得力干将",
  "tags": ["身体好", "懂法律", "难判刑"]
}

# partial update: only the name changes
POST /lhy/user/2/_update
{
  "doc": { "name": "张三" }
}

PUT /lhy/user/3
{
  "name": "狂神说前端",
  "age": 23,
  "desc": "前端特效大杀手",
  "tags": ["游戏强", "抗压强", "007"]
}

GET /lhy/user/2

GET lhy/user/_search?q=name:狂神说

# match query with sorting and pagination
GET lhy/user/_search
{
  "query": {
    "match": { "name": "狂神" }
  },
  "sort": [
    { "age": { "order": "asc" } }
  ],
  "from": 0,
  "size": 1
}

# boolean query: should (or)
GET lhy/user/_search
{
  "query": {
    "bool": {
      "should": [
        { "match": { "name": "狂神说" } },
        { "match": { "age": 23 } }
      ]
    }
  }
}

# must_not: equivalent to not
GET lhy/user/_search
{
  "query": {
    "bool": {
      "must_not": [
        { "match": { "name": "狂神说" } }
      ]
    }
  }
}

# filter: narrow the results by range
GET lhy/user/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "name": "狂神说" } }
      ],
      "filter": [
        { "range": { "age": { "lt": 20 } } }
      ]
    }
  }
}

# matching multiple conditions: terms separated by spaces
GET lhy/user/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "tags": "技术 男 身体 007" } }
      ]
    }
  }
}

# exact queries and the details of the text and keyword types
PUT testdb
{
  "mappings": {
    "properties": {
      "name": { "type": "text" },
      "desc": { "type": "keyword" }
    }
  }
}

PUT testdb/_doc/1
{
  "name": "小冷学java",
  "desc": "java真的是个好玩的语言"
}

PUT testdb/_doc/2
{
  "name": "小冷学java",
  "desc": "java真的是个好玩的语言2"
}

GET _analyze
{
  "analyzer": "keyword",
  "text": "小冷"
}

GET _analyze
{
  "analyzer": "standard",
  "text": "小冷"
}

GET testdb/_search
{
  "query": {
    "term": { "desc": "java真的是个好玩的语言" }
  }
}

# highlight query
GET lhy/user/_search
{
  "query": {
    "match": { "name": "狂神" }
  },
  "highlight": {
    "fields": {
      "name": {}
    }
  }
}
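Integrating with Spring Boot
Start from the official documentation.
- Find the native dependency
<dependency>
    <groupId>org.elasticsearch.client</groupId>
    <artifactId>elasticsearch-rest-high-level-client</artifactId>
    <version>7.6.2</version>
</dependency>
- Find the client object
- Analyze the methods in this class
Configuring the basic project
Caveat: make sure the version of the imported dependency matches the version of your ES server.
Objects provided in the source code
Although three classes appear here, they are static inner classes, and there is really only one core class.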
/**
 * Elasticsearch rest client infrastructure configurations.
 *
 * @author Brian Clozel
 * @author Stephane Nicoll
 */
class RestClientConfigurations {

    @Configuration(proxyBeanMethods = false)
    static class RestClientBuilderConfiguration {

        // RestClientBuilder
        @Bean
        @ConditionalOnMissingBean
        RestClientBuilder elasticsearchRestClientBuilder(RestClientProperties properties,
                ObjectProvider<RestClientBuilderCustomizer> builderCustomizers) {
            HttpHost[] hosts = properties.getUris().stream().map(HttpHost::create).toArray(HttpHost[]::new);
            RestClientBuilder builder = RestClient.builder(hosts);
            PropertyMapper map = PropertyMapper.get();
            map.from(properties::getUsername).whenHasText().to((username) -> {
                CredentialsProvider credentialsProvider = new BasicCredentialsProvider();
                Credentials credentials = new UsernamePasswordCredentials(properties.getUsername(),
                        properties.getPassword());
                credentialsProvider.setCredentials(AuthScope.ANY, credentials);
                builder.setHttpClientConfigCallback(
                        (httpClientBuilder) -> httpClientBuilder.setDefaultCredentialsProvider(credentialsProvider));
            });
            builder.setRequestConfigCallback((requestConfigBuilder) -> {
                map.from(properties::getConnectionTimeout).whenNonNull().asInt(Duration::toMillis)
                        .to(requestConfigBuilder::setConnectTimeout);
                map.from(properties::getReadTimeout).whenNonNull().asInt(Duration::toMillis)
                        .to(requestConfigBuilder::setSocketTimeout);
                return requestConfigBuilder;
            });
            builderCustomizers.orderedStream().forEach((customizer) -> customizer.customize(builder));
            return builder;
        }
    }

    @Configuration(proxyBeanMethods = false)
    @ConditionalOnClass(RestHighLevelClient.class)
    static class RestHighLevelClientConfiguration {

        // RestHighLevelClient: the high-level client, which is the one covered here and used in the project later
        @Bean
        @ConditionalOnMissingBean
        RestHighLevelClient elasticsearchRestHighLevelClient(RestClientBuilder restClientBuilder) {
            return new RestHighLevelClient(restClientBuilder);
        }

        @Bean
        @ConditionalOnMissingBean
        RestClient elasticsearchRestClient(RestClientBuilder builder,
                ObjectProvider<RestHighLevelClient> restHighLevelClient) {
            RestHighLevelClient client = restHighLevelClient.getIfUnique();
            if (client != null) {
                return client.getLowLevelClient();
            }
            return builder.build();
        }
    }

    @Configuration(proxyBeanMethods = false)
    static class RestClientFallbackConfiguration {

        // RestClient: the plain low-level client
        @Bean
        @ConditionalOnMissingBean
        RestClient elasticsearchRestClient(RestClientBuilder builder) {
            return builder.build();
        }
    }
}
Testing the API in detail
- Create an index
- Check whether an index exists
- Delete an index
- Create a document
- CRUD operations on documents
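import com.alibaba.fastjson.JSON;
import org.elasticsearch.action.admin.indices.delete.DeleteIndexRequest;
import org.elasticsearch.action.bulk.BulkRequest;
import org.elasticsearch.action.bulk.BulkResponse;
import org.elasticsearch.action.delete.DeleteRequest;
import org.elasticsearch.action.delete.DeleteResponse;
import org.elasticsearch.action.get.GetRequest;
import org.elasticsearch.action.get.GetResponse;
import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.action.index.IndexResponse;
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.action.support.master.AcknowledgedResponse;
import org.elasticsearch.action.update.UpdateRequest;
import org.elasticsearch.action.update.UpdateResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.client.indices.CreateIndexRequest;
import org.elasticsearch.client.indices.CreateIndexResponse;
import org.elasticsearch.client.indices.GetIndexRequest;
import org.elasticsearch.common.unit.TimeValue;
import org.elasticsearch.common.xcontent.XContentType;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.index.query.TermQueryBuilder;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.builder.SearchSourceBuilder;
import org.elasticsearch.search.fetch.subphase.FetchSourceContext;
import org.junit.jupiter.api.Test;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.beans.factory.annotation.Qualifier;
import org.springframework.boot.test.context.SpringBootTest;

import java.io.IOException;
import java.util.ArrayList;
import java.util.concurrent.TimeUnit;
// User is the project's own POJO (name, age); import it from your own package.

@SpringBootTest
class KuangshenEsApiApplicationTests {

    // Work with the client in an object-oriented way
    @Autowired
    @Qualifier("restHighLevelClient")
    private RestHighLevelClient client;

    // Test index creation. Request: PUT kuang_index
    @Test
    void testCreateIndex() throws IOException {
        // 1. Build the create-index request
        CreateIndexRequest request = new CreateIndexRequest("kuang_index");
        // 2. Execute it through the IndicesClient and get the response
        CreateIndexResponse createIndexResponse = client.indices().create(request, RequestOptions.DEFAULT);
        System.out.println(createIndexResponse);
    }

    // Test fetching an index to check whether it exists
    @Test
    void testExistIndex() throws IOException {
        GetIndexRequest request = new GetIndexRequest("kuang_index2");
        boolean exists = client.indices().exists(request, RequestOptions.DEFAULT);
        System.out.println(exists);
    }

    // Test index deletion
    @Test
    void testDeleteIndex() throws IOException {
        DeleteIndexRequest request = new DeleteIndexRequest("kuang_index");
        AcknowledgedResponse delete = client.indices().delete(request, RequestOptions.DEFAULT);
        System.out.println(delete.isAcknowledged());
    }

    // Test adding a document
    @Test
    void testAddDocument() throws IOException {
        // Create the object
        User user = new User("狂神说", 3);
        // Create the request
        IndexRequest request = new IndexRequest("kuang_index");
        // Rule: PUT /kuang_index/_doc/1
        request.id("1");
        request.timeout(TimeValue.timeValueSeconds(1)); // equivalent to request.timeout("1s")
        // Put the data into the request as JSON
        request.source(JSON.toJSONString(user), XContentType.JSON);
        // Send the request and get the response
        IndexResponse indexResponse = client.index(request, RequestOptions.DEFAULT);
        System.out.println(indexResponse.toString());
        System.out.println(indexResponse.status()); // same status as the console command returns: CREATED
    }

    // Fetch a document to check whether it exists. GET /index/_doc/1
    @Test
    void testIsExists() throws IOException {
        GetRequest getRequest = new GetRequest("kuang_index", "1");
        // Do not fetch the _source content
        getRequest.fetchSourceContext(new FetchSourceContext(false));
        getRequest.storedFields("_none_");
        boolean exists = client.exists(getRequest, RequestOptions.DEFAULT);
        System.out.println(exists);
    }

    // Get the document content
    @Test
    void testGetDocument() throws IOException {
        GetRequest getRequest = new GetRequest("kuang_index", "1");
        GetResponse getResponse = client.get(getRequest, RequestOptions.DEFAULT);
        System.out.println(getResponse.getSourceAsString()); // print the document content
        System.out.println(getResponse); // the full response, same as the console command returns
    }

    // Update the document
    @Test
    void testUpdateRequest() throws IOException {
        UpdateRequest updateRequest = new UpdateRequest("kuang_index", "1");
        updateRequest.timeout("1s");
        User user = new User("狂神说Java", 18);
        updateRequest.doc(JSON.toJSONString(user), XContentType.JSON);
        UpdateResponse updateResponse = client.update(updateRequest, RequestOptions.DEFAULT);
        System.out.println(updateResponse.status());
    }

    // Delete the document
    @Test
    void testDeleteRequest() throws IOException {
        DeleteRequest request = new DeleteRequest("kuang_index", "1");
        request.timeout("1s");
        DeleteResponse deleteResponse = client.delete(request, RequestOptions.DEFAULT);
        System.out.println(deleteResponse.status());
    }

    // Special case: real projects usually insert data in bulk
    @Test
    void testBulkRequest() throws IOException {
        BulkRequest bulkRequest = new BulkRequest();
        bulkRequest.timeout("10s");

        ArrayList<User> userList = new ArrayList<>();
        userList.add(new User("kuangshen1", 3));
        userList.add(new User("kuangshen2", 3));
        userList.add(new User("kuangshen3", 3));
        userList.add(new User("qinjiang1", 3));
        userList.add(new User("qinjiang1", 3));
        userList.add(new User("qinjiang1", 3));

        // Build the bulk request
        for (int i = 0; i < userList.size(); i++) {
            // For bulk updates or bulk deletes, swap in the corresponding request type here
            bulkRequest.add(
                    new IndexRequest("kuang_index")
                            .id("" + (i + 1))
                            .source(JSON.toJSONString(userList.get(i)), XContentType.JSON));
        }
        BulkResponse bulkResponse = client.bulk(bulkRequest, RequestOptions.DEFAULT);
        System.out.println(bulkResponse.hasFailures()); // false means every item succeeded
    }

    // Searching
    // SearchRequest: the search request
    // SearchSourceBuilder: builds the search conditions
    // HighlightBuilder: builds highlighting
    // TermQueryBuilder: exact query
    // MatchAllQueryBuilder: matches everything
    // xxxQueryBuilder: each corresponds to a console command shown earlier
    @Test
    void testSearch() throws IOException {
        SearchRequest searchRequest = new SearchRequest("kuang_index");
        // Build the search conditions
        SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
        // Highlighting, if needed, is configured with sourceBuilder.highlighter(...)

        // Query conditions can be built with the QueryBuilders utility:
        // QueryBuilders.termQuery: exact match
        // QueryBuilders.matchAllQuery(): match everything
        TermQueryBuilder termQueryBuilder = QueryBuilders.termQuery("name", "qinjiang1");
        // MatchAllQueryBuilder matchAllQueryBuilder = QueryBuilders.matchAllQuery();
        sourceBuilder.query(termQueryBuilder);
        sourceBuilder.timeout(new TimeValue(60, TimeUnit.SECONDS));

        searchRequest.source(sourceBuilder);
        SearchResponse searchResponse = client.search(searchRequest, RequestOptions.DEFAULT);
        System.out.println(JSON.toJSONString(searchResponse.getHits()));
        System.out.println("=================================");
        for (SearchHit documentFields : searchResponse.getHits().getHits()) {
            System.out.println(documentFields.getSourceAsMap());
        }
    }
}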
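Hands-on project
Create a new Spring Boot project for the JD (jd.com) search example
Then set the port, turn off the thymeleaf cache, and visit the index page to check that it works.
Crawler
Where does the data come from? A database, a message queue, or a crawler can all serve as the data source.
Here we use a crawler to fetch the data.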
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

import java.io.IOException;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;

public List<content> parseJD(String keywords) throws IOException {
    // https://search.jd.com/Search?keyword=java
    // Requires network access
    String url = "https://search.jd.com/Search?keyword=" + keywords + "&enc=utf-8";
    // Parse the page (the Document returned by jsoup corresponds to the browser's Document object in JavaScript)
    Document document = Jsoup.parse(new URL(url), 30000);
    // Everything you can do with the DOM in JavaScript is available here as well
    Element element = document.getElementById("J_goodsList");
    // System.out.println(element.html());
    // Get all li tags
    Elements li_elements = document.getElementsByTag("li");
    ArrayList<content> goodsList = new ArrayList<>();
    for (Element el : li_elements) {
        // Only li elements with class "gl-item" are product entries
        if (el.attr("class").equalsIgnoreCase("gl-item")) {
            String img = el.getElementsByTag("img").eq(0).attr("data-lazy-img");
            String price = el.getElementsByClass("p-price").eq(0).text();
            String title = el.getElementsByClass("p-name").eq(0).text();
            content content = new content();
            content.setTitle(title);
            content.setImg(img);
            content.setPrice(price);
            goodsList.add(content);
        }
    }
    return goodsList;
}
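The crawler concatenates the keyword into the URL as-is, which can produce a malformed query string for keywords containing spaces, `&`, or non-ASCII characters. A minimal sketch of encoding the keyword first, using only the JDK (Java 10 or later for this `URLEncoder.encode` overload); `SearchUrlBuilder` and `buildSearchUrl` are hypothetical names, and the URL pattern is simply the one used in the code above.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class SearchUrlBuilder {
    // Builds the search URL with the keyword percent-encoded as UTF-8.
    public static String buildSearchUrl(String keywords) {
        String encoded = URLEncoder.encode(keywords, StandardCharsets.UTF_8);
        return "https://search.jd.com/Search?keyword=" + encoded + "&enc=utf-8";
    }

    public static void main(String[] args) {
        System.out.println(buildSearchUrl("java"));
        System.out.println(buildSearchUrl("spring boot")); // the space is encoded as '+'
        System.out.println(buildSearchUrl("a&b"));         // '&' is encoded as %26
    }
}
```

The resulting string could be passed to `new URL(...)` in place of the hand-built one.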
Keyword highlighting
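import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.common.text.Text;
import org.elasticsearch.common.unit.TimeValue;
import org.elasticsearch.index.query.MatchBoolPrefixQueryBuilder;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.builder.SearchSourceBuilder;
import org.elasticsearch.search.fetch.subphase.highlight.HighlightBuilder;
import org.elasticsearch.search.fetch.subphase.highlight.HighlightField;

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.TimeUnit;

// Fetch data with search highlighting (client is the injected RestHighLevelClient)
public List<Map<String, Object>> getContentHighContent(String keywords, int pageNo, int pageSize) throws IOException {
    if (pageNo < 1) {
        pageNo = 1;
    }
    // Conditional search
    SearchRequest searchRequest = new SearchRequest("jd_goods");
    SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
    // from is a zero-based document offset, so convert the 1-based page number
    searchSourceBuilder.from((pageNo - 1) * pageSize);
    searchSourceBuilder.size(pageSize);
    // match_bool_prefix query: the keywords are analyzed and the last term is treated as a prefix
    MatchBoolPrefixQueryBuilder queryBuilder = QueryBuilders.matchBoolPrefixQuery("title", keywords);
    searchSourceBuilder.query(queryBuilder);
    searchSourceBuilder.timeout(new TimeValue(60, TimeUnit.SECONDS));
    // Configure highlighting
    HighlightBuilder highlightBuilder = new HighlightBuilder();
    highlightBuilder.field("title");
    highlightBuilder.requireFieldMatch(false); // highlight the field even if the query did not match on it
    highlightBuilder.preTags("<span style='color:red '>");
    highlightBuilder.postTags("</span>");
    searchSourceBuilder.highlighter(highlightBuilder);
    // Execute the search
    searchRequest.source(searchSourceBuilder);
    SearchResponse searchResponse = client.search(searchRequest, RequestOptions.DEFAULT);
    ArrayList<Map<String, Object>> list = new ArrayList<>();
    for (SearchHit documentFields : searchResponse.getHits().getHits()) {
        // Parse the highlighted fields
        Map<String, HighlightField> highlightFields = documentFields.getHighlightFields();
        HighlightField title = highlightFields.get("title");
        Map<String, Object> sourceAsMap = documentFields.getSourceAsMap();
        if (title != null) {
            // Take all highlighted fragments of the title and join them into one string
            Text[] texts = title.fragments();
            StringBuilder name = new StringBuilder();
            for (Text text : texts) {
                name.append(text);
            }
            // Replace the original title with the highlighted one
            sourceAsMap.put("title", name.toString());
        }
        list.add(sourceAsMap);
    }
    return list;
}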
Finally, on the front end we let Vue render the returned title as HTML, so the highlight tags take effect:
<!-- Title -->
<p class="productTitle">
    <a v-html="result.title"></a>
</p>