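Preface
Elasticsearch study notes, part 1: getting to know ES and how its versions correspond; installing elasticsearch, kibana, the head plugin, and the elasticsearch-ik analyzer.
Elasticsearch study notes, part 2: building an elasticsearch service with springboot and TransportClient.
The earlier posts covered installing and operating elasticsearch. This post integrates springboot with elasticsearch to build a full-text search box with keyword highlighting. The front end uses thymeleaf + vue for a few simple tricks. The code is on GitHub at https://github.com/fengfanli/springboot-elasticsearch, in the elasticsearch-jingdongSearch module. The data is crawled from JD.com (京东).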
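I. Project demo
The project is very simple: a single page, shown below. You type into the text box and then search.
The search results come from elasticsearch and are displayed as shown below, with the search keyword highlighted.
The data in elasticsearch is crawled from JD. The crawling is very simple and is done by the class com.feng.es.utils.HtmlParseUtil.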
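II. pom.xml dependencies
1. Versions
springboot 2.2.5.RELEASE
elasticsearch 6.4.2
kibana 6.4.2
The elasticsearch version used by the spring-boot-starter-data-elasticsearch package is also set to 6.4.2 in properties.
I put this hands-on project in the same repository as the two earlier study posts, three modules in total.
2. Added dependencies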
<properties>
    <java.version>1.8</java.version>
    <!-- Define the es version ourselves so it matches the local installation -->
    <elasticsearch.version>6.4.2</elasticsearch.version>
</properties>
<dependencies>
    <!-- jsoup, for parsing web pages -->
    <dependency>
        <groupId>org.jsoup</groupId>
        <artifactId>jsoup</artifactId>
        <version>1.10.2</version>
    </dependency>
    <!-- elasticsearch -->
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-data-elasticsearch</artifactId>
    </dependency>
    <!-- thymeleaf -->
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-thymeleaf</artifactId>
    </dependency>
</dependencies>
3. pom of the parent project
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <parent>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-parent</artifactId>
        <version>2.2.5.RELEASE</version>
        <relativePath/> <!-- lookup parent from repository -->
    </parent>
    <groupId>com.feng.es</groupId>
    <artifactId>springboot-elasticsearch</artifactId>
    <packaging>pom</packaging>
    <version>1.0-SNAPSHOT</version>
    <modules>
        <module>elasticsearch-transport</module>
        <module>elasticsearch-rest</module>
        <module>elasticsearch-jingdongSearch</module>
    </modules>
    <dependencies>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-web</artifactId>
        </dependency>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-devtools</artifactId>
            <scope>runtime</scope>
            <optional>true</optional>
        </dependency>
        <dependency>
            <groupId>org.projectlombok</groupId>
            <artifactId>lombok</artifactId>
            <optional>true</optional>
        </dependency>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-test</artifactId>
            <scope>test</scope>
        </dependency>
        <dependency>
            <groupId>org.projectlombok</groupId>
            <artifactId>lombok</artifactId>
            <version>1.16.16</version>
        </dependency>
        <!--alibaba fastjson-->
        <dependency>
            <groupId>com.alibaba</groupId>
            <artifactId>fastjson</artifactId>
            <version>1.2.38</version>
        </dependency>
        <dependency>
            <groupId>org.apache.commons</groupId>
            <artifactId>commons-lang3</artifactId>
            <version>3.4</version>
        </dependency>
    </dependencies>
</project>
4. pom of the child project elasticsearch-jingdongSearch
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <parent>
        <artifactId>springboot-elasticsearch</artifactId>
        <groupId>com.feng.es</groupId>
        <version>1.0-SNAPSHOT</version>
    </parent>
    <modelVersion>4.0.0</modelVersion>
    <groupId>com.feng.es</groupId>
    <artifactId>elasticsearch-jingdongSearch</artifactId>
    <properties>
        <java.version>1.8</java.version>
        <!-- Define the es version ourselves so it matches the local installation -->
        <elasticsearch.version>6.4.2</elasticsearch.version>
    </properties>
    <dependencies>
        <!-- jsoup, for parsing web pages -->
        <dependency>
            <groupId>org.jsoup</groupId>
            <artifactId>jsoup</artifactId>
            <version>1.10.2</version>
        </dependency>
        <!-- elasticsearch -->
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-data-elasticsearch</artifactId>
        </dependency>
        <!-- thymeleaf -->
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-thymeleaf</artifactId>
        </dependency>
    </dependencies>
</project>
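II. Directory structure
1. Project directory
A brief explanation:
Content: the POJO class, an entity that holds the product image; the image img is stored as a URL
ESConfig: the elasticsearch configuration class
controller: the controllers
service: the business layer
utils: the key class, which crawls JD data into elasticsearch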
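2. Create the ES index
Open the elasticsearch-head Chrome extension.
Create an index named goods_index to hold the data crawled from JD.
If a prompt like the one shown below appears, the index was created successfully.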
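The same index can also be created without the head plugin, because elasticsearch creates an index when it receives an HTTP PUT on the index name. The following is a minimal sketch using only the JDK, not the method used in this post; the class name GoodsIndexCreator and the address localhost:9200 are assumptions for illustration.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

// Hypothetical helper, not part of the original project.
final class GoodsIndexCreator {

    // Builds the index address, e.g. http://localhost:9200/goods_index
    static URI indexUri(String host, int port, String indexName) {
        return URI.create("http://" + host + ":" + port + "/" + indexName);
    }

    // Sends "PUT /<index>" without a body, so the index gets default settings.
    // Returns the HTTP status code reported by elasticsearch.
    static int createIndex(URI indexUri) throws IOException {
        HttpURLConnection connection = (HttpURLConnection) indexUri.toURL().openConnection();
        try {
            connection.setRequestMethod("PUT");
            return connection.getResponseCode();
        } finally {
            connection.disconnect();
        }
    }
}
```

Calling GoodsIndexCreator.createIndex(GoodsIndexCreator.indexUri("localhost", 9200, "goods_index")) against a running node should return 200 when the index is created; a non-2xx status means creation failed, for example because the index already exists.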
III. Crawling JD data into elasticsearch
1. Find the data source
2. Analyze the data source
3. Write to elasticsearch
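IV. Code by directory
1. The POJO class Content
package com.feng.es.bean;

import lombok.AllArgsConstructor;
import lombok.Data;
import lombok.NoArgsConstructor;

// Product entity; img holds the image URL
@Data
@NoArgsConstructor
@AllArgsConstructor
public class Content {
    private String img;
    private String price;
    private String title;
}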
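2. The ESConfig configuration class
import lombok.Data;
import org.apache.http.HttpHost;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.springframework.boot.context.properties.ConfigurationProperties;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Data
@Configuration
@ConfigurationProperties(prefix = "elasticsearch")
public class ESConfig {
    // Bound from the elasticsearch.hostname / port / scheme properties
    private String hostname;
    private String port;
    private String scheme;

    @Bean
    public RestHighLevelClient restHighLevelClient() {
        return new RestHighLevelClient(
                RestClient.builder(new HttpHost(hostname, Integer.parseInt(port), scheme)));
    }
}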
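ESConfig binds its three fields from properties under the elasticsearch prefix. A minimal application.properties sketch might look like the following; the host, port 9200, and http scheme are assumed values for a local node, and server.port is set to 9090 only because the test URLs later in this post use that port.

```properties
# Assumed values for a local single-node elasticsearch; adjust to your environment
elasticsearch.hostname=localhost
elasticsearch.port=9200
elasticsearch.scheme=http
# The endpoint URLs in this post use port 9090
server.port=9090
```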
3. The ContentService business interface
package com.feng.es.service;

import java.io.IOException;
import java.util.List;
import java.util.Map;

public interface ContentService {
    // Crawl the results for the keywords and store them in es
    Boolean parseContent(String keywords) throws IOException;

    // Search
    List<Map<String, Object>> searchPage(String keyword, int pageNo, int pageSize);

    // Search with highlighting
    List<Map<String, Object>> searchPageHighlight(String keyword, int pageNo, int pageSize);
}
4. The ContentService business implementation
package com.feng.es.service.impl;
import com.alibaba.fastjson.JSON;
import com.feng.es.bean.Content;
import com.feng.es.service.ContentService;
import com.feng.es.utils.HtmlParseUtil;
import org.elasticsearch.action.bulk.BulkRequest;
import org.elasticsearch.action.bulk.BulkResponse;
import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.common.text.Text;
import org.elasticsearch.common.unit.TimeValue;
import org.elasticsearch.common.xcontent.XContentType;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.index.query.TermsQueryBuilder;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.builder.SearchSourceBuilder;
import org.elasticsearch.search.fetch.subphase.highlight.HighlightBuilder;
import org.elasticsearch.search.fetch.subphase.highlight.HighlightField;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.beans.factory.annotation.Qualifier;
import org.springframework.stereotype.Service;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.TimeUnit;
@Service
public class ContentServiceImpl implements ContentService {

    @Autowired
    @Qualifier("restHighLevelClient")
    private RestHighLevelClient client;

    /**
     * Searches JD for the keywords and stores the resulting data in elasticsearch.
     * Author: fengfanli, date: 18:21 2021/1/18
     *
     * @param keywords the search keywords
     * @return true if the bulk request had no failures
     */
    @Override
    public Boolean parseContent(String keywords) throws IOException {
        List<Content> contents = HtmlParseUtil.parseJD(keywords);
        // Put the crawled data into es with a single bulk request
        BulkRequest bulkRequest = new BulkRequest();
        bulkRequest.timeout("2m");
        for (Content content : contents) {
            bulkRequest.add(new IndexRequest("goods_index")
                    .type("_doc")
                    .source(JSON.toJSONString(content), XContentType.JSON));
        }
        BulkResponse bulk = client.bulk(bulkRequest, RequestOptions.DEFAULT);
        return !bulk.hasFailures(); // hasFailures() returning false means success
    }
    /**
     * Fetches the stored data to implement the search feature.
     * Author: fengfanli, date: 18:22 2021/1/18
     *
     * @param keyword  the search keyword
     * @param pageNo   1-based page number
     * @param pageSize number of hits per page
     * @return the source of each hit as a map
     */
    @Override
    public List<Map<String, Object>> searchPage(String keyword, int pageNo, int pageSize) {
        if (pageNo <= 1) {
            pageNo = 1;
        }
        // Conditional search
        SearchRequest searchRequest = new SearchRequest("goods_index");
        SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
        // Pagination: from() takes the offset of the first hit, not the page number
        searchSourceBuilder.from((pageNo - 1) * pageSize);
        searchSourceBuilder.size(pageSize);
        // Exact (term-level) match
        TermsQueryBuilder termsQueryBuilder = QueryBuilders.termsQuery("title", keyword);
        searchSourceBuilder.query(termsQueryBuilder);
        searchSourceBuilder.timeout(new TimeValue(60, TimeUnit.SECONDS));
        // Run the search
        searchRequest.source(searchSourceBuilder);
        ArrayList<Map<String, Object>> list = new ArrayList<>();
        try {
            SearchResponse searchResponse = client.search(searchRequest, RequestOptions.DEFAULT);
            // Parse the result
            SearchHit[] hits = searchResponse.getHits().getHits();
            for (SearchHit documentFields : hits) {
                list.add(documentFields.getSourceAsMap());
            }
        } catch (IOException e) {
            e.printStackTrace();
            System.out.println(e.getLocalizedMessage());
        }
        return list;
    }
    /**
     * Fetches the stored data to implement the search feature, with the keyword highlighted.
     *
     * @param keyword  the search keyword
     * @param pageNo   1-based page number
     * @param pageSize number of hits per page
     * @return the source of each hit as a map, with title replaced by its highlighted form
     */
    @Override
    public List<Map<String, Object>> searchPageHighlight(String keyword, int pageNo, int pageSize) {
        if (pageNo <= 1) {
            pageNo = 1;
        }
        // Conditional search
        SearchRequest searchRequest = new SearchRequest("goods_index");
        SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
        // Pagination: from() takes the offset of the first hit, not the page number
        searchSourceBuilder.from((pageNo - 1) * pageSize);
        searchSourceBuilder.size(pageSize);
        // Exact (term-level) match
        TermsQueryBuilder termsQueryBuilder = QueryBuilders.termsQuery("title", keyword);
        searchSourceBuilder.query(termsQueryBuilder);
        searchSourceBuilder.timeout(new TimeValue(60, TimeUnit.SECONDS));
        // Highlighting
        HighlightBuilder highlightBuilder = new HighlightBuilder();
        highlightBuilder.field("title");
        // Do not require the query to match this field for it to be highlighted
        highlightBuilder.requireFieldMatch(false);
        highlightBuilder.preTags("<span style= 'color:red'>");
        highlightBuilder.postTags("</span>");
        searchSourceBuilder.highlighter(highlightBuilder);
        // Run the search
        searchRequest.source(searchSourceBuilder);
        ArrayList<Map<String, Object>> list = new ArrayList<>();
        try {
            SearchResponse searchResponse = client.search(searchRequest, RequestOptions.DEFAULT);
            // Parse the result
            SearchHit[] hits = searchResponse.getHits().getHits();
            for (SearchHit hit : hits) {
                Map<String, HighlightField> highlightFields = hit.getHighlightFields(); // highlighted fields
                HighlightField title = highlightFields.get("title");
                Map<String, Object> sourceAsMap = hit.getSourceAsMap(); // original document
                // Replace the original title with the highlighted one
                if (title != null) {
                    // A highlight exists: join its fragments into the new title
                    Text[] fragments = title.fragments();
                    StringBuilder newTitle = new StringBuilder();
                    for (Text text : fragments) {
                        newTitle.append(text);
                    }
                    sourceAsMap.put("title", newTitle.toString());
                }
                list.add(sourceAsMap);
            }
        } catch (IOException e) {
            System.out.println(e.getLocalizedMessage());
        }
        return list;
    }
}
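A note on the pagination used by both search methods: elasticsearch's from parameter is the zero-based offset of the first hit to return, not a page number, so a 1-based pageNo has to be converted before it is passed to searchSourceBuilder.from(...). A minimal sketch of that conversion follows; the class name Paging is hypothetical and not part of the project.

```java
// Hypothetical helper illustrating the page-number-to-offset conversion.
final class Paging {

    // Converts a 1-based page number into the zero-based offset used by from().
    // Page numbers below 1 are treated as page 1, as in the service methods.
    static int offset(int pageNo, int pageSize) {
        int safePage = Math.max(pageNo, 1);
        return (safePage - 1) * pageSize;
    }
}
```

With a page size of 10, page 1 maps to offset 0 and page 2 to offset 10, so consecutive pages neither overlap nor skip hits.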
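5. IndexController, the view controller
import org.springframework.stereotype.Controller;
import org.springframework.web.bind.annotation.GetMapping;

@Controller
public class IndexController {
    // Serves the index template for both "/" and "/index"
    @GetMapping({"/", "/index"})
    public String index() {
        return "index";
    }
}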
6. ContentController, the data controller
package com.feng.es.controller;

import com.feng.es.service.ContentService;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Controller;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.ResponseBody;

import java.io.IOException;
import java.util.List;
import java.util.Map;

@Controller
public class ContentController {

    @Autowired
    private ContentService contentService;

    /**
     * Adds data to es.
     *
     * @param keyword the keyword to search for on JD
     * @return true if the data was stored without failures
     * @throws IOException if crawling or the bulk request fails
     */
    @ResponseBody
    @GetMapping("/parse/{keyword}")
    public Boolean parse(@PathVariable("keyword") String keyword) throws IOException {
        return contentService.parseContent(keyword);
    }

    /**
     * Search.
     *
     * @param keyword  the search keyword
     * @param pageNo   1-based page number
     * @param pageSize number of hits per page
     * @return the matching documents
     */
    @ResponseBody
    @GetMapping("/search/{keyword}/{pageNo}/{pageSize}")
    public List<Map<String, Object>> search(@PathVariable("keyword") String keyword,
                                            @PathVariable("pageNo") Integer pageNo,
                                            @PathVariable("pageSize") Integer pageSize) {
        return contentService.searchPage(keyword, pageNo, pageSize);
    }

    /**
     * Search with highlighting.
     *
     * @param keyword  the search keyword
     * @param pageNo   1-based page number
     * @param pageSize number of hits per page
     * @return the matching documents with the keyword highlighted in title
     */
    @ResponseBody
    @GetMapping("/searchHight/{keyword}/{pageNo}/{pageSize}")
    public List<Map<String, Object>> searchHighlight(@PathVariable("keyword") String keyword,
                                                     @PathVariable("pageNo") Integer pageNo,
                                                     @PathVariable("pageSize") Integer pageSize) {
        return contentService.searchPageHighlight(keyword, pageNo, pageSize);
    }
}
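V. Testing and analyzing the controller endpoints
http://localhost:9090/parse/java
This endpoint searches JD for the keyword java and stores the resulting data in elasticsearch.
You can view the data in the elasticsearch-head plugin; it is all java-related.
You can add data for several more keywords.
http://localhost:9090/search/java/1/10
Searches for the keyword java with pagination; 1 and 10 are the page number and page size.
http://localhost:9090/searchHight/java/1/10
Searches for the keyword java with pagination (1 and 10 are the page number and page size) and highlights the keyword. The keyword in the returned data is wrapped in HTML, so it can be rendered directly in vue.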
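To make the shape of a highlighted title concrete, here is a small plain-Java sketch that wraps a keyword in the same pre and post tags the service configures. It only imitates the output format: the real highlighting is done by elasticsearch on analyzed terms, so actual fragments may differ. The class name HighlightPreview is hypothetical.

```java
// Hypothetical illustration of what a highlighted title looks like.
final class HighlightPreview {

    // Same tags as passed to HighlightBuilder.preTags / postTags in the service
    static final String PRE_TAG = "<span style= 'color:red'>";
    static final String POST_TAG = "</span>";

    // Wraps every literal occurrence of keyword in the highlight tags.
    static String wrap(String title, String keyword) {
        if (keyword.isEmpty()) {
            return title;
        }
        return title.replace(keyword, PRE_TAG + keyword + POST_TAG);
    }
}
```

Because the title now contains markup, the vue template has to bind it with v-html rather than text interpolation, which would show the tags as escaped text.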