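Installing the dynamic-synonym plugin in Elasticsearch
Today I'll walk through installing the dynamic-synonym plugin in Elasticsearch (ES). First, download the plugin source that matches your ES version from GitHub. Most versions on GitHub only support a local dictionary file or a remote text dictionary; on Gitee you can find source code that uses a database as the dictionary. The general idea is to adjust a few configuration parameters, create a table to hold the synonyms, drop the packaged plugin jar into the es-plugins directory, and restart ES. But!!! It did not run for me at first and I hit a lot of problems [sob], because I run ES in a Docker container and the container kept reporting Java permission errors. I had to piece the fix together from posts scattered around the web, and I was really happy when it finally worked!!!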
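Here is the overall plan:
- Download the source and modify the dynamic-synonym configuration
- Add the MySQL code
- Create a dynamic-synonym table
- Modify the java.policy file of the ES container in Docker [very important]
- Put the packaged jar into the {es-root}/es-plugins directory
- Restart the ES container with Docker
- Create a dynamic-synonym index in ES to test
The plugin code I have already configured is given at the end of the article!!! You can jump straight to section 4 or section 5, depending on what you need.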
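1. Download the source and modify the configuration
There are a lot of forks of this plugin on GitHub, far too many to read through. I recommend downloading the original source directly: https://github.com/bells/elasticsearch-analysis-dynamic-synonym. After downloading, switch to the branch that matches your ES version. My ES version is 7.12.1, so the pom file needs the following changes.
1.1 Modify pom.xml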
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>com.bellszhu.elasticsearch</groupId>
<artifactId>elasticsearch-analysis-dynamic-synonym</artifactId>
<version>7.12.1</version>
<packaging>jar</packaging>
<name>elasticsearch-dynamic-synonym</name>
<description>Analysis-plugin for synonym</description>
<properties>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
<elasticsearch.version>${project.version}</elasticsearch.version>
<maven.compiler.target>1.8</maven.compiler.target>
<elasticsearch.plugin.name>analysis-dynamic-synonym</elasticsearch.plugin.name>
<elasticsearch.assembly.descriptor>${project.basedir}/src/main/assemblies/plugin.xml
</elasticsearch.assembly.descriptor>
<elasticsearch.plugin.classname>com.bellszhu.elasticsearch.plugin.DynamicSynonymPlugin
</elasticsearch.plugin.classname>
<elasticsearch.plugin.jvm>true</elasticsearch.plugin.jvm>
</properties>
<licenses>
<license>
<name>The Apache Software License, Version 2.0</name>
<url>http://www.apache.org/licenses/LICENSE-2.0.txt</url>
<distribution>repo</distribution>
</license>
</licenses>
<parent>
<groupId>org.sonatype.oss</groupId>
<artifactId>oss-parent</artifactId>
<version>9</version>
</parent>
<scm>
<connection>scm:git:git@github.com:bells/elasticsearch-analysis-dynamic-synonym.git</connection>
<developerConnection>scm:git:git@github.com:bells/elasticsearch-analysis-dynamic-synonym.git
</developerConnection>
<url>https://github.com/bells/elasticsearch-analysis-dynamic-synonym</url>
</scm>
<dependencies>
<dependency>
<groupId>org.elasticsearch</groupId>
<artifactId>elasticsearch</artifactId>
<version>${elasticsearch.version}</version>
</dependency>
<dependency>
<groupId>org.codelibs.elasticsearch.module</groupId>
<artifactId>analysis-common</artifactId>
<version>7.10.2</version>
</dependency>
<dependency>
<groupId>junit</groupId>
<artifactId>junit</artifactId>
<version>4.13.1</version>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.apache.httpcomponents</groupId>
<artifactId>httpclient</artifactId>
<version>4.5.13</version>
</dependency>
<dependency>
<groupId>mysql</groupId>
<artifactId>mysql-connector-java</artifactId>
<version>8.0.22</version>
</dependency>
<dependency>
<groupId>org.apache.logging.log4j</groupId>
<artifactId>log4j-core</artifactId>
<version>2.13.2</version>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>org.apache.logging.log4j</groupId>
<artifactId>log4j-api</artifactId>
<version>2.11.1</version>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>org.codelibs</groupId>
<artifactId>elasticsearch-cluster-runner</artifactId>
<version>7.10.2.0</version>
<scope>test</scope>
</dependency>
</dependencies>
<build>
<plugins>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-compiler-plugin</artifactId>
<version>2.3.2</version>
<configuration>
<source>${maven.compiler.target}</source>
<target>${maven.compiler.target}</target>
</configuration>
</plugin>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-surefire-plugin</artifactId>
<version>2.11</version>
<configuration>
<includes>
<include>**/*Tests.java</include>
</includes>
</configuration>
</plugin>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-source-plugin</artifactId>
<version>2.1.2</version>
<executions>
<execution>
<id>attach-sources</id>
<goals>
<goal>jar</goal>
</goals>
</execution>
</executions>
</plugin>
<plugin>
<artifactId>maven-assembly-plugin</artifactId>
<configuration>
<appendAssemblyId>false</appendAssemblyId>
<outputDirectory>${project.build.directory}/releases/</outputDirectory>
<descriptors>
<descriptor>${basedir}/src/main/assemblies/plugin.xml</descriptor>
</descriptors>
<archive>
<manifest>
<mainClass>fully.qualified.MainClass</mainClass>
</manifest>
</archive>
</configuration>
<executions>
<execution>
<phase>package</phase>
<goals>
<goal>single</goal>
</goals>
</execution>
</executions>
</plugin>
</plugins>
</build>
</project>
When connecting to MySQL, pay attention to the MySQL driver jar: the JDBC URL differs between driver versions.
2. Add the MySQL code
2.1 Add the MySqlRemoteSynonymFile class
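To make the driver-version note concrete, here is a minimal sketch of how the driver class name and the URL can differ between Connector/J 5.x and 8.x. The class and method names (`JdbcSettingsSketch`, `driverClass`, `jdbcUrl`) are hypothetical and only for illustration; the `serverTimezone=GMT` parameter mirrors the URL used later in this article and may not be needed with every driver or server setup.

```java
/** Illustrative helper (hypothetical name) that picks JDBC settings by Connector/J major version. */
class JdbcSettingsSketch {

    /** Connector/J 8.x renamed the driver class; 5.x uses the older name. */
    static String driverClass(int connectorMajorVersion) {
        return connectorMajorVersion >= 8
                ? "com.mysql.cj.jdbc.Driver"   // Connector/J 8.x
                : "com.mysql.jdbc.Driver";     // Connector/J 5.x
    }

    /** Builds a JDBC URL; for 8.x an explicit serverTimezone is appended (assumption: GMT). */
    static String jdbcUrl(int connectorMajorVersion, String host, int port, String database) {
        String base = "jdbc:mysql://" + host + ":" + port + "/" + database;
        return connectorMajorVersion >= 8 ? base + "?serverTimezone=GMT" : base;
    }
}
```

With the 8.0.22 driver declared in the pom above, `driverClass(8)` and `jdbcUrl(8, ...)` give the values that go into the properties file in section 3.2.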
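// Imports assumed for ES 7.12.1; adjust the package names to your source branch.
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.io.Reader;
import java.io.StringReader;
import java.nio.file.Path;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.sql.Timestamp;
import java.time.LocalDateTime;
import java.util.ArrayList;
import java.util.Properties;
import org.apache.logging.log4j.LogManager;
import org.apache.logging.log4j.Logger;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.synonym.SynonymMap;
import org.elasticsearch.common.io.PathUtils;
import org.elasticsearch.env.Environment;
import com.bellszhu.elasticsearch.plugin.DynamicSynonymPlugin;

public class MySqlRemoteSynonymFile implements SynonymFile{
/**
* Name of the database configuration file
*/
private final static String DB_PROPERTIES = "jdbc-reload.properties";
private static Logger logger = LogManager.getLogger("dynamic-synonym");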
private String format;
private boolean expand;
private boolean lenient;
private Analyzer analyzer;
private Environment env;
// Database configuration
private String location;
/**
* Database URL
*/
private static final String JDBC_URL = "jdbc.url";
/**
* Database driver
*/
private static final String JDBC_DRIVER = "jdbc.driver";
/**
* Database user name
*/
private static final String JDBC_USER = "jdbc.user";
/**
* Database password
*/
private static final String JDBC_PASSWORD = "jdbc.password";
/**
* Synonym version (last modification time) known to the current node
*/
private LocalDateTime thisSynonymVersion = LocalDateTime.now();
private static Connection connection = null;
private Statement statement = null;
private Properties props;
private Path conf_dir;
MySqlRemoteSynonymFile(Environment env, Analyzer analyzer,
boolean expand, boolean lenient, String format, String location) {
this.analyzer = analyzer;
this.expand = expand;
this.format = format;
this.lenient = lenient;
this.env = env;
this.location = location;
this.props = new Properties();
// Resolve the directory that contains the plugin jar, then its config/ subdirectory
Path filePath = PathUtils.get(new File(DynamicSynonymPlugin.class.getProtectionDomain().getCodeSource()
.getLocation().getPath())
.getParent(), "config")
.toAbsolutePath();
this.conf_dir = filePath.resolve(DB_PROPERTIES);
// Load the JDBC settings; try-with-resources closes the stream
try (InputStream input = new FileInputStream(conf_dir.toFile())) {
props.load(input);
} catch (FileNotFoundException e) {
logger.error("Database config file jdbc-reload.properties not found", e);
} catch (IOException e) {
logger.error("Failed to load database config file jdbc-reload.properties", e);
}
isNeedReloadSynonymMap();
}
/**
* Loads the synonym dictionary into a SynonymMap.
* @return SynonymMap
*/
@Override
public SynonymMap reloadSynonymMap() {
try {
logger.info("start reload local synonym from {}.", location);
Reader rulesReader = getReader();
SynonymMap.Builder parser = RemoteSynonymFile.getSynonymParser(rulesReader, format, expand, lenient, analyzer);
return parser.build();
} catch (Exception e) {
logger.error("reload local synonym {} error! cause: {}", location, e.getMessage());
throw new IllegalArgumentException(
"could not reload local synonyms file to build synonyms", e);
}
}
/**
* Checks whether the synonyms need to be reloaded.
* @return true or false
*/
@Override
public boolean isNeedReloadSynonymMap() {
try {
LocalDateTime mysqlLastModify = getMySqlSynonymLastModify();
// A null version means the query failed or returned no timestamp: keep the current synonyms
if (mysqlLastModify != null && !thisSynonymVersion.isEqual(mysqlLastModify)) {
thisSynonymVersion = mysqlLastModify;
return true;
}
} catch (Exception e) {
logger.error(e);
}
return false;
}
/**
* Reads the synonym version (the latest last_modify timestamp) from MySQL.
* Used to decide whether the synonyms need to be reloaded.
*
* @return the latest modification time, or null if it could not be read
*/
public LocalDateTime getMySqlSynonymLastModify() {
ResultSet resultSet = null;
LocalDateTime mysqlSynonymLastModify = null;
try {
if (statement == null) {
statement = getConnection(props);
}
resultSet = statement.executeQuery(props.getProperty("jdbc.reload.swith.synonym.last_modify"));
while (resultSet.next()) {
Timestamp lastModify = resultSet.getTimestamp("last_modify");
// MAX(last_modify) is NULL when the table is empty
if (lastModify != null) {
mysqlSynonymLastModify = lastModify.toLocalDateTime();
}
}
} catch (SQLException e) {
logger.error("failed to query the synonym last_modify time", e);
} finally {
try {
if (resultSet != null) {
resultSet.close();
}
} catch (SQLException e) {
logger.error("failed to close the result set", e);
}
}
return mysqlSynonymLastModify;
}
/**
* Queries the synonyms stored in the database.
* @return one entry per row of the synonym table
*/
public ArrayList<String> getDbData() {
ArrayList<String> arrayList = new ArrayList<>();
ResultSet resultSet = null;
try {
if (statement == null) {
statement = getConnection(props);
}
logger.info("querying the synonym list, SQL: {}", props.getProperty("jdbc.reload.synonym.sql"));
resultSet = statement.executeQuery(props.getProperty("jdbc.reload.synonym.sql"));
while (resultSet.next()) {
String theWord = resultSet.getString("words");
arrayList.add(theWord);
}
} catch (SQLException e) {
logger.error(e);
} finally {
try {
if (resultSet != null) {
resultSet.close();
}
} catch (SQLException e) {
logger.error("failed to close the result set", e);
}
}
return arrayList;
}
/**
* Builds the synonym rules text from the database rows.
* @return Reader
*/
@Override
public Reader getReader() {
StringBuilder sb = new StringBuilder();
try {
ArrayList<String> dbData = getDbData();
for (String dbDatum : dbData) {
logger.info("loading synonyms: {}", dbDatum);
// Each row holds one group of synonyms, separated by ASCII commas; emit one row per line
sb.append(dbDatum)
.append(System.lineSeparator());
}
} catch (Exception e) {
logger.error("failed to load synonyms", e);
}
return new StringReader(sb.toString());
}
/**
* Obtains a Statement on the shared database connection.
* @param props JDBC settings
* @throws SQLException if the connection cannot be obtained
*/
private static Statement getConnection(Properties props) throws SQLException {
try {
Class.forName(props.getProperty(JDBC_DRIVER));
} catch (ClassNotFoundException e) {
logger.error("failed to load the JDBC driver", e);
}
if (connection == null) {
connection = DriverManager.getConnection(
props.getProperty(JDBC_URL),
props.getProperty(JDBC_USER),
props.getProperty(JDBC_PASSWORD));
}
return connection.createStatement();
}
}
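The reload decision in the class above boils down to comparing a locally remembered timestamp with `MAX(last_modify)` from the table. The following is a minimal standalone sketch of that comparison, not the plugin's own code; the class name `SynonymVersionSketch` and the method `needsReload` are hypothetical. It also treats a `null` timestamp (for example, an empty table or a failed query) as "nothing to reload", which is an assumption about the desired behavior.

```java
import java.time.LocalDateTime;

/** Standalone sketch of the version check; names are illustrative only. */
class SynonymVersionSketch {

    // Version (last modification time) of the synonyms currently loaded on this node.
    private LocalDateTime currentVersion;

    SynonymVersionSketch(LocalDateTime initialVersion) {
        this.currentVersion = initialVersion;
    }

    /**
     * Returns true and remembers the new version when the database timestamp
     * differs from the one this node already loaded.
     */
    boolean needsReload(LocalDateTime dbLastModify) {
        if (dbLastModify == null) {
            return false; // no timestamp available: keep the current synonyms
        }
        if (dbLastModify.equals(currentVersion)) {
            return false; // nothing changed since the last load
        }
        currentVersion = dbLastModify;
        return true;
    }
}
```

Because the remembered version is updated inside the check, a changed row triggers exactly one reload per node until the table changes again.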
2.2 Add the MySQL source to getSynonymFile
The modified resource-loading code in DynamicSynonymTokenFilterFactory:
SynonymFile getSynonymFile(Analyzer analyzer) {
try {
SynonymFile synonymFile;
if ("MySql".equals(location)) {
synonymFile = new MySqlRemoteSynonymFile(environment, analyzer, expand, lenient, format, location);
} else if (location.startsWith("http://") || location.startsWith("https://")) {
synonymFile = new RemoteSynonymFile(
environment, analyzer, expand, lenient, format, location);
} else {
synonymFile = new LocalSynonymFile(
environment, analyzer, expand, lenient, format, location);
}
if (scheduledFuture == null) {
scheduledFuture = pool.scheduleAtFixedRate(new Monitor(synonymFile),
interval, interval, TimeUnit.SECONDS);
}
return synonymFile;
} catch (Exception e) {
logger.error("failed to get synonyms: " + location, e);
throw new IllegalArgumentException("failed to get synonyms : " + location, e);
}
}
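The method above hands the chosen synonym source to a monitor that is polled at a fixed interval. As a rough sketch of that polling pattern (not the plugin's actual Monitor class), the hypothetical `ReloadMonitorSketch` below checks a condition and runs a reload action. It catches runtime exceptions inside `run()`, because `scheduleAtFixedRate` suppresses all later executions once a task throws.

```java
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

/** Sketch of a periodic reload check; class and parameter names are illustrative only. */
class ReloadMonitorSketch implements Runnable {

    private final BooleanSupplier needsReload; // e.g. a call like synonymFile.isNeedReloadSynonymMap()
    private final Runnable reload;             // e.g. rebuilding the synonym map

    ReloadMonitorSketch(BooleanSupplier needsReload, Runnable reload) {
        this.needsReload = needsReload;
        this.reload = reload;
    }

    @Override
    public void run() {
        try {
            if (needsReload.getAsBoolean()) {
                reload.run();
            }
        } catch (RuntimeException e) {
            // Swallow and report: an exception escaping run() would cancel the schedule.
            System.err.println("synonym reload check failed: " + e.getMessage());
        }
    }

    /** Schedules the monitor with the same initial delay and period, in seconds. */
    static ScheduledFuture<?> schedule(ScheduledExecutorService pool, Runnable monitor, long intervalSeconds) {
        return pool.scheduleAtFixedRate(monitor, intervalSeconds, intervalSeconds, TimeUnit.SECONDS);
    }
}
```

A shorter interval picks up table changes sooner at the cost of more frequent `MAX(last_modify)` queries.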
3. Create the dynamic-synonym table
3.1 Create the database and table
My database is named word and the table is named synonym.
/*
Navicat Premium Data Transfer
Source Server : localhost
Source Server Type : MySQL
Source Server Version : 50717
Source Host : localhost:3306
Source Schema : auth
Target Server Type : MySQL
Target Server Version : 50717
File Encoding : 65001
Date: 05/01/2022 17:01:31
*/
SET NAMES utf8mb4;
SET FOREIGN_KEY_CHECKS = 0;
-- ----------------------------
-- Table structure for synonym
-- ----------------------------
DROP TABLE IF EXISTS `synonym`;
CREATE TABLE `synonym` (
`id` int(11) NOT NULL AUTO_INCREMENT COMMENT 'primary key',
`words` text CHARACTER SET utf8 COLLATE utf8_bin NULL COMMENT 'synonyms',
`last_modify` timestamp(0) NULL DEFAULT CURRENT_TIMESTAMP(0) ON UPDATE CURRENT_TIMESTAMP(0) COMMENT 'last update time',
PRIMARY KEY (`id`) USING BTREE
) ENGINE = InnoDB AUTO_INCREMENT = 2 CHARACTER SET = utf8 COLLATE = utf8_bin ROW_FORMAT = Dynamic;
-- ----------------------------
-- Records of synonym
-- ----------------------------
INSERT INTO `synonym` VALUES (1, '西红柿,番茄,洋柿子', '2022-01-05 16:48:24');
SET FOREIGN_KEY_CHECKS = 1;
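Each row in the `words` column is one comma-separated synonym group, and the plugin code in section 2 joins the rows into a text with one group per line before parsing. The sketch below illustrates that transformation on plain strings, without a database. The class `SynonymRowsSketch` and its methods are hypothetical, and skipping blank or null rows is my own assumption rather than something the original code does.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

/** Sketch: turn table rows into the line-per-rule text a synonym parser reads. */
class SynonymRowsSketch {

    /** Joins non-blank rows, one synonym group per line. */
    static Reader toRulesReader(List<String> rows) {
        StringBuilder sb = new StringBuilder();
        for (String row : rows) {
            if (row == null || row.trim().isEmpty()) {
                continue; // assumption: ignore empty rows instead of emitting blank rules
            }
            sb.append(row.trim()).append('\n');
        }
        return new StringReader(sb.toString());
    }

    /** Reads the rules back line by line, e.g. to inspect what would be parsed. */
    static List<String> readLines(Reader reader) {
        List<String> lines = new ArrayList<>();
        try (BufferedReader br = new BufferedReader(reader)) {
            String line;
            while ((line = br.readLine()) != null) {
                lines.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }
}
```

For the sample record inserted above, the reader would contain a single line holding the three comma-separated words.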
3.2 Edit the database connection configuration file
Add a config/jdbc-reload.properties file in the project, at the same level as the src directory.
# permission java.net.SocketPermission "*", "connect,resolve";
# CHCP 65001
jdbc.url=jdbc:mysql://192.168.255.132:3306/word?serverTimezone=GMT
jdbc.user=root
jdbc.driver=com.mysql.cj.jdbc.Driver
jdbc.password=123456
# Query the synonym dictionary
jdbc.reload.synonym.sql=select words from synonym
# Query the last update time
jdbc.reload.swith.synonym.last_modify=SELECT MAX(last_modify) last_modify FROM synonym
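A missing or misspelled key in this file only shows up later as a confusing error inside ES, so it can help to validate the file before packaging. Below is a small sketch that parses properties text and reports which of the keys used in this article are absent. The class `JdbcConfigSketch` and its methods are hypothetical helpers, not part of the plugin.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

/** Sketch: check that jdbc-reload.properties defines every key the plugin code reads. */
class JdbcConfigSketch {

    // Keys read by the MySQL synonym code in section 2 (spelling kept as in the article).
    static final String[] REQUIRED_KEYS = {
        "jdbc.url",
        "jdbc.user",
        "jdbc.driver",
        "jdbc.password",
        "jdbc.reload.synonym.sql",
        "jdbc.reload.swith.synonym.last_modify"
    };

    /** Parses properties from text, e.g. the content of the config file. */
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    /** Returns the required keys that are missing or blank, in declaration order. */
    static List<String> missingKeys(Properties props) {
        List<String> missing = new ArrayList<>();
        for (String key : REQUIRED_KEYS) {
            String value = props.getProperty(key);
            if (value == null || value.trim().isEmpty()) {
                missing.add(key);
            }
        }
        return missing;
    }
}
```

An empty result from `missingKeys` means every key the code looks up has a value; it does not verify that the URL or credentials actually work.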
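4. Modify the java.policy file of the ES container in Docker [very important]
I deploy ES in a Docker container. If ES is installed directly on Windows or CentOS, you need to edit the java.policy file of the JDK that ES depends on, that is, the system JDK. With Docker we do not edit the host JDK's java.policy, because the containerized ES does not use the host JDK at all: it runs on the JDK bundled inside the container.
4.1 Find java.policy
First enter the container with docker exec -it es /bin/bash, then cd into /usr/share/elasticsearch/jdk/conf/security/ and find the java.policy file.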
[root@localhost ~]# docker exec -it es /bin/bash
[root@ee5fd3f35131 elasticsearch]# cd /usr/share/elasticsearch/jdk/conf/security/
[root@ee5fd3f35131 security]# ls
java.policy java.security policy
[root@ee5fd3f35131 security]# vi java.policy
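4.2 Modify the java.policy file
Below is the full content of the file. Note that the final permission java.security.AllPermission; line in the default grant block gives all permissions to every code source, so use this configuration only in an environment you trust.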
//
// This system policy file grants a set of default permissions to all domains
// and can be configured to grant additional permissions to modules and other
// code sources. The code source URL scheme for modules linked into a
// run-time image is "jrt".
//
// For example, to grant permission to read the "foo" property to the module
// "com.greetings", the grant entry is:
//
// grant codeBase "jrt:/com.greetings" {
// permission java.util.PropertyPermission "foo", "read";
// };
//
grant codeBase "file:${{java.ext.dirs}}/*" {
permission java.security.AllPermission;
};
// default permissions granted to all domains
grant {
// allows anyone to listen on dynamic ports
permission java.net.SocketPermission "localhost:0", "listen";
// "standard" properies that can be read by anyone
permission java.util.PropertyPermission "java.version", "read";
permission java.util.PropertyPermission "java.vendor", "read";
permission java.util.PropertyPermission "java.vendor.url", "read";
permission java.util.PropertyPermission "java.class.version", "read";
permission java.util.PropertyPermission "os.name", "read";
permission java.util.PropertyPermission "os.version", "read";
permission java.util.PropertyPermission "os.arch", "read";
permission java.util.PropertyPermission "file.separator", "read";
permission java.util.PropertyPermission "path.separator", "read";
permission java.util.PropertyPermission "line.separator", "read";
permission java.util.PropertyPermission
"java.specification.version", "read";
permission java.util.PropertyPermission "java.specification.vendor", "read";
permission java.util.PropertyPermission "java.specification.name", "read";
permission java.util.PropertyPermission
"java.vm.specification.version", "read";
permission java.util.PropertyPermission
"java.vm.specification.vendor", "read";
permission java.util.PropertyPermission
"java.vm.specification.name", "read";
permission java.util.PropertyPermission "java.vm.version", "read";
permission java.util.PropertyPermission "java.vm.vendor", "read";
permission java.util.PropertyPermission "java.vm.name", "read";
permission java.net.SocketPermission "*", "connect,resolve";
permission java.lang.RuntimePermission "setContextClassLoader";
permission java.lang.RuntimePermission "accessDeclaredMembers";
permission java.lang.RuntimePermission "createClassLoader";
permission java.security.AllPermission;
};
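5. Put the packaged jar into the {es-root}/es-plugins directory
5.1 Check your ES version before packaging
5.2 After packaging, unzip the archive and upload it to the plugins directory of ES on the server
I deploy with a Docker container here. On a local Windows install, just find the plugins directory and put the files there.
6. Restart the ES container with Docker
If ES is installed directly on the system, restart it from the elasticsearch/bin directory. I use a container deployment here.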
docker restart es
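After the container restarts, check the Docker console output for problems. If you see permission errors, the java.policy file is most likely not configured correctly. If you see database errors, create a small Java project locally and try the same connection to see whether it works.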
docker logs -f es
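For the "try the connection from a local Java project" step, a minimal sketch like the following may be enough. It uses only `java.sql` from the JDK, so the MySQL Connector/J jar still has to be on the classpath for a real check. The class name `DbConnectionCheck` and the 5-second validity timeout are my own choices, and the URL, user, and password are expected as command-line arguments.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

/** Sketch: verify that a JDBC URL and credentials work outside of Elasticsearch. */
class DbConnectionCheck {

    /** Returns true if a connection can be opened and validated within 5 seconds. */
    static boolean canConnect(String url, String user, String password) {
        try (Connection connection = DriverManager.getConnection(url, user, password)) {
            return connection.isValid(5);
        } catch (SQLException e) {
            // Typical causes: driver jar missing, wrong host or port, bad credentials.
            System.err.println("connection failed: " + e.getMessage());
            return false;
        }
    }

    public static void main(String[] args) {
        if (args.length != 3) {
            System.err.println("usage: DbConnectionCheck <jdbc-url> <user> <password>");
            return;
        }
        System.out.println(canConnect(args[0], args[1], args[2]) ? "OK" : "FAILED");
    }
}
```

If this succeeds locally but ES still fails, the problem is more likely the container's network access or the java.policy permissions than the JDBC settings themselves.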
7. Create a dynamic-synonym index in ES to test
PUT synonyms_index
{
"settings": {
"number_of_shards": 1,
"number_of_replicas": 1,
"analysis": {
"analyzer": {
"synonym": {
"type":"custom",
"tokenizer": "ik_smart",
"filter": ["synonym_custom"]
}
},
"filter": {
"synonym_custom": {
"type": "dynamic_synonym",
"synonyms_path": "MySql"
}
}
}
},
"mappings": {
"properties": {
"name": {
"type": "text",
"analyzer": "synonym"
}
}
}
}
GET /synonyms_index/_analyze
{
"text": "西红柿",
"analyzer": "synonym"
}
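The same _analyze check can also be issued from Java, which is handy for an automated smoke test after a restart. This is a rough sketch using the JDK 11+ `java.net.http` client. The class `AnalyzeRequestSketch` is hypothetical, the base URL is an assumption you pass in (for example http://localhost:9200 for an unsecured local node), and the JSON escaping only covers backslashes and double quotes.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

/** Sketch: call the _analyze API of an index and return the raw JSON response. */
class AnalyzeRequestSketch {

    /** Builds the request body; escaping handles only backslash and double quote. */
    static String body(String analyzer, String text) {
        return "{\"analyzer\":\"" + escape(analyzer) + "\",\"text\":\"" + escape(text) + "\"}";
    }

    static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    /** Sends POST {baseUrl}/{index}/_analyze and returns the response body as text. */
    static String analyze(String baseUrl, String index, String analyzer, String text)
            throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(URI.create(baseUrl + "/" + index + "/_analyze"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body(analyzer, text)))
                .build();
        HttpResponse<String> response =
                HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString());
        return response.body();
    }
}
```

If the plugin loaded the table correctly, the returned token list for a word from the synonym table should also contain the other words of its group.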
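If the response contains the synonyms, everything is working. Time to celebrate!!! When you are done testing, delete the test index:
DELETE synonyms_index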
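8. Conclusion
8.1 Source code
This project took me about a full day. To save you the time, you can directly download the source code I have already configured.
8.2 Summary
After a day of digging, I finally have a rough understanding of how ES plugins run, which lays the groundwork for later features such as autocomplete, search tuning, ad recommendation, and aggregation queries.
I will add more posts once I build those features. Finally, thank you all for your support.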