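Overview
Continuing to learn Elasticsearch with instructor 中华石杉; this is part 30.
Course URL: https://www.roncoo.com/view/55
Plain-Language Elasticsearch 28: Installing and Using the IK Chinese Analyzer
Plain-Language Elasticsearch 29: IK Analyzer Configuration Files and Custom Dictionaries
The two posts above covered how to install IK and its basic usage. With a file-based custom dictionary, however, every change requires a restart and has to be applied node by node, which is inconvenient.
Main drawbacks:
- After every addition, ES must be restarted before the change takes effect, which is tedious
- ES is distributed; with hundreds of nodes…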
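Hot-update approaches
Two approaches are commonly used:
- Modify the IK analyzer source so that it automatically reloads new words from MySQL at a fixed interval
- Use the hot-update mechanism that IK supports natively: deploy a web server that exposes an HTTP endpoint and signals dictionary changes through the Last-Modified and ETag HTTP response headers
The first approach, modifying the IK source, is the one recommended here. The second is reportedly discouraged even by the IK community on GitHub because it is not very stable.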
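To make the second approach a little more concrete: as I understand it, IK polls the configured URL and reloads the remote dictionary when either the Last-Modified or the ETag header differs from the value it saw on the previous poll. The sketch below shows only that comparison. The class RemoteDictState and its method are hypothetical names for illustration, not IK's own code.

```java
import java.util.Objects;

/** Hypothetical helper: remembers the headers seen on the previous poll. */
final class RemoteDictState {
    private String lastModified;
    private String etag;

    /**
     * Returns true when either header differs from the previous poll,
     * then stores the new values for the next comparison.
     */
    boolean changed(String newLastModified, String newEtag) {
        boolean differs = !Objects.equals(lastModified, newLastModified)
                || !Objects.equals(etag, newEtag);
        lastModified = newLastModified;
        etag = newEtag;
        return differs;
    }
}
```

The first poll always reports a change because nothing has been recorded yet, which matches the need to load the dictionary once at startup.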
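Since we are going to modify the source, let's get started by downloading it from IK's GitHub.
IK GitHub: download the source code
https://github.com/medcl/elasticsearch-analysis-ik/releases/tag/v6.4.1
Find the IK release that matches your ES version and download its source. I am using ES 6.4.1 here.
Import the Maven project
Import it as a Maven project. I won't go into detail because it is straightforward. Once the import finishes, you have a standard Maven project in front of you.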
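Modify the source
The overall idea in brief: start a background thread that scans the tables defined in MySQL and loads their data.
Start the scanning thread in Dictionary#initial
// Step 1: start a new thread that reloads the dictionaries
new Thread(new HotDictReloadThread()).start();
HotDictReloadThread
An infinite loop that calls Dictionary.getSingleton().reLoadMainDict() to reload the dictionaries
package org.wltea.analyzer.dic;

import org.apache.logging.log4j.Logger;
import org.elasticsearch.common.logging.ESLoggerFactory;

/**
 * Background task that keeps reloading the dictionaries from MySQL.
 */
public class HotDictReloadThread implements Runnable {

    private static final Logger logger = ESLoggerFactory.getLogger(HotDictReloadThread.class.getName());

    @Override
    public void run() {
        // Loop forever. The pause between rounds comes from the Thread.sleep
        // calls inside the MySQL loading methods that reLoadMainDict() triggers.
        while (true) {
            logger.info("[==========]reload hot dict from mysql......");
            Dictionary.getSingleton().reLoadMainDict();
        }
    }
}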
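One thing to keep in mind with the loop above: the only pause is the Thread.sleep inside the loading methods, and it sits inside their try block, so a failed load (for example when MySQL is unreachable) skips the pause and the loop retries immediately. A possible alternative, sketched here under my own assumptions, is to let a ScheduledExecutorService own the interval. The class ReloadScheduler is a hypothetical name and not part of IK.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Hypothetical alternative to the while(true) loop in HotDictReloadThread. */
final class ReloadScheduler {

    /**
     * Runs the reload task repeatedly on a daemon thread, waiting intervalMillis
     * after each run finishes, whether or not that run succeeded.
     */
    static ScheduledExecutorService start(Runnable reload, long intervalMillis) {
        ScheduledExecutorService pool = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "ik-hot-dict-reload");
            t.setDaemon(true); // do not block JVM shutdown
            return t;
        });
        pool.scheduleWithFixedDelay(() -> {
            try {
                reload.run();
            } catch (RuntimeException e) {
                // An exception escaping the task would cancel all future runs,
                // so it is swallowed here; real code would log it.
            }
        }, 0, intervalMillis, TimeUnit.MILLISECONDS);
        return pool;
    }
}
```

In Dictionary#initial this would be called roughly as ReloadScheduler.start(() -> Dictionary.getSingleton().reLoadMainDict(), 1000), and the Thread.sleep calls in the loading methods would then be removed.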
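Now let's see what reLoadMainDict does
It does two things: it loads the main dictionary and it loads the stopword dictionary. So we only need to put our custom MySQL logic into those two methods.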
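Configuration file: jdbc-reload.properties
jdbc.url=jdbc:mysql://localhost:3306/ik?serverTimezone=GMT
jdbc.user=root
jdbc.password=root
jdbc.reload.sql=select word from hot_words
jdbc.reload.stopword.sql=select stopword as word from hot_stopwords
jdbc.reload.interval=1000
jdbc.reload.interval is the reload interval in milliseconds, so this polls about once per second. Because both MySQL loading methods (main dictionary and stopwords) sleep for this interval, a full reload round takes roughly twice as long.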
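The loading code reads jdbc.reload.interval with Integer.valueOf, which throws if the key is missing or is not a number. As a minimal sketch of a more forgiving read, the helper below falls back to a default. ReloadConfig, its methods, and the 1000 ms default are my own assumptions for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

/** Hypothetical helper for reading jdbc-reload.properties values safely. */
final class ReloadConfig {
    static final long DEFAULT_INTERVAL_MILLIS = 1000L;

    /** Returns jdbc.reload.interval in milliseconds, or the default if it is absent or invalid. */
    static long intervalMillis(Properties props) {
        String raw = props.getProperty("jdbc.reload.interval");
        if (raw == null) {
            return DEFAULT_INTERVAL_MILLIS;
        }
        try {
            long value = Long.parseLong(raw.trim());
            return value > 0 ? value : DEFAULT_INTERVAL_MILLIS;
        } catch (NumberFormatException e) {
            return DEFAULT_INTERVAL_MILLIS;
        }
    }

    /** Parses properties from text; used here only to keep the example self-contained. */
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }
}
```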
Dictionary#loadMainDict: load the main dictionary from MySQL
// Step 2: load the dictionary from MySQL
this.loadMySQLExtDict();
Load the custom DB configuration file and query MySQL through JDBC. It is that simple.
// Requires java.sql.Connection, DriverManager, ResultSet and Statement
// to be imported in Dictionary.java.
private static Properties prop = new Properties();

static {
    try {
        // Driver class for MySQL Connector/J 8.x; for 5.x use com.mysql.jdbc.Driver
        Class.forName("com.mysql.cj.jdbc.Driver");
    } catch (ClassNotFoundException e) {
        logger.error("error", e);
    }
}

/**
 * Load the hot-update main dictionary from MySQL.
 */
private void loadMySQLExtDict() {
    try {
        Path file = PathUtils.get(getDictRoot(), "jdbc-reload.properties");
        // Close the file stream once the properties are loaded
        try (FileInputStream in = new FileInputStream(file.toFile())) {
            prop.load(in);
        }

        logger.info("[==========]jdbc-reload.properties");
        for (Object key : prop.keySet()) {
            logger.info("[==========]" + key + "=" + prop.getProperty(String.valueOf(key)));
        }

        logger.info("[==========]query hot dict from mysql, " + prop.getProperty("jdbc.reload.sql") + "......");

        // try-with-resources closes the result set, statement and connection in reverse order
        try (Connection conn = DriverManager.getConnection(
                     prop.getProperty("jdbc.url"),
                     prop.getProperty("jdbc.user"),
                     prop.getProperty("jdbc.password"));
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(prop.getProperty("jdbc.reload.sql"))) {
            while (rs.next()) {
                String theWord = rs.getString("word");
                if (theWord == null) {
                    continue; // the word column is nullable
                }
                logger.info("[==========]hot word from mysql: " + theWord);
                _MainDict.fillSegment(theWord.trim().toCharArray());
            }
        }

        // Pause before the next reload round
        Thread.sleep(Integer.valueOf(String.valueOf(prop.get("jdbc.reload.interval"))));
    } catch (Exception e) {
        logger.error("error", e);
    }
}
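The word column in the table definition used later in this post is nullable, and words typed in by hand often carry stray whitespace. If rs.getString("word") returns null, calling trim() on it throws, and the catch block then abandons the rest of that load. A minimal sketch of normalizing the values before they reach fillSegment follows. HotWords and toSegments are hypothetical names, not part of IK.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that cleans raw words read from the database. */
final class HotWords {

    /** Trims each raw value and drops null or blank entries. */
    static List<char[]> toSegments(List<String> rawWords) {
        List<char[]> segments = new ArrayList<>();
        for (String raw : rawWords) {
            if (raw == null) {
                continue; // NULL column value
            }
            String word = raw.trim();
            if (word.isEmpty()) {
                continue; // whitespace-only value
            }
            segments.add(word.toCharArray());
        }
        return segments;
    }
}
```

Each returned char[] could then be passed to _MainDict.fillSegment or _StopWords.fillSegment.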
Dictionary#loadStopWordDict: load the stopword dictionary from MySQL
// Step 3: load the stopwords from MySQL
this.loadMySQLStopwordDict();
/**
 * Load the hot-update stopword dictionary from MySQL.
 */
private void loadMySQLStopwordDict() {
    try {
        Path file = PathUtils.get(getDictRoot(), "jdbc-reload.properties");
        // Close the file stream once the properties are loaded
        try (FileInputStream in = new FileInputStream(file.toFile())) {
            prop.load(in);
        }

        logger.info("[==========]jdbc-reload.properties");
        for (Object key : prop.keySet()) {
            logger.info("[==========]" + key + "=" + prop.getProperty(String.valueOf(key)));
        }

        logger.info("[==========]query hot stopword dict from mysql, " + prop.getProperty("jdbc.reload.stopword.sql") + "......");

        // try-with-resources closes the result set, statement and connection in reverse order
        try (Connection conn = DriverManager.getConnection(
                     prop.getProperty("jdbc.url"),
                     prop.getProperty("jdbc.user"),
                     prop.getProperty("jdbc.password"));
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(prop.getProperty("jdbc.reload.stopword.sql"))) {
            while (rs.next()) {
                // The query aliases the stopword column as "word"
                String theWord = rs.getString("word");
                if (theWord == null) {
                    continue; // the stopword column is nullable
                }
                logger.info("[==========]hot stopword from mysql: " + theWord);
                _StopWords.fillSegment(theWord.trim().toCharArray());
            }
        }

        // Pause before the next reload round
        Thread.sleep(Integer.valueOf(String.valueOf(prop.get("jdbc.reload.interval"))));
    } catch (Exception e) {
        logger.error("error", e);
    }
}
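Build
Right-click the project, choose Run As > Maven Build, and use the goals clean package
After a successful build, collect the generated zip file
Unzip it into the IK plugin directory of ES
Add the MySQL driver jar
My local MySQL is version 8.0.11
Put the driver jar into the ik directory
MySQL table definitions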
/*
Navicat MySQL Data Transfer

Source Server         : localhost_root
Source Server Version : 80011
Source Host           : localhost:3306
Source Database       : ik

Target Server Type    : MYSQL
Target Server Version : 80011
File Encoding         : 65001

Date: 2019-08-20 23:35:18
*/

SET FOREIGN_KEY_CHECKS=0;

-- ----------------------------
-- Table structure for `hot_stopwords`
-- ----------------------------
DROP TABLE IF EXISTS `hot_stopwords`;
CREATE TABLE `hot_stopwords` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `stopword` longtext COLLATE utf8mb4_general_ci,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_general_ci;

-- ----------------------------
-- Records of hot_stopwords
-- ----------------------------

-- ----------------------------
-- Table structure for `hot_words`
-- ----------------------------
DROP TABLE IF EXISTS `hot_words`;
CREATE TABLE `hot_words` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `word` longtext COLLATE utf8mb4_general_ci,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=2 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_general_ci;
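Restart ES
Startup log
Success
Verifying hot reload
Hot-reloading the main dictionary
First, look at IK's default configuration file, which we have not modified.
Use ik_max_word to see how IK tokenizes "盘他"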
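The check above goes through the ES _analyze API, with ik_max_word as the analyzer and the phrase as the text. Here is a minimal Java sketch of that request. It assumes ES is reachable at a base URL such as http://localhost:9200 with no authentication. AnalyzeClient is a hypothetical name, and its JSON escaping handles only quotes and backslashes.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.Scanner;

/** Hypothetical minimal client for the _analyze API. */
final class AnalyzeClient {

    /** Builds the JSON request body; escapes only backslashes and double quotes. */
    static String body(String analyzer, String text) {
        return "{\"analyzer\":\"" + escape(analyzer) + "\",\"text\":\"" + escape(text) + "\"}";
    }

    private static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    /** Posts the body to baseUrl + "/_analyze" and returns the raw JSON response. */
    static String analyze(String baseUrl, String analyzer, String text) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(baseUrl + "/_analyze").openConnection();
        conn.setRequestMethod("POST");
        conn.setRequestProperty("Content-Type", "application/json");
        conn.setDoOutput(true);
        try (OutputStream out = conn.getOutputStream()) {
            out.write(body(analyzer, text).getBytes(StandardCharsets.UTF_8));
        }
        // Read the whole response as one UTF-8 string
        try (InputStream in = conn.getInputStream();
             Scanner scanner = new Scanner(in, "UTF-8").useDelimiter("\\A")) {
            return scanner.hasNext() ? scanner.next() : "";
        }
    }
}
```

Calling AnalyzeClient.analyze("http://localhost:9200", "ik_max_word", "盘他") before and after inserting the word into MySQL should show the change in the returned tokens.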
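Insert a row
INSERT INTO `hot_words` VALUES ('1', '盘他');
Check the ES log, elasticsearch.log
The log shows the word was loaded, so let's run the analysis again
"盘他" is no longer split up by IK. Success.
Hot-reloading the stopword dictionary
We treat "啥" as a stopword and add it to the stopword table in MySQL
INSERT INTO `hot_stopwords` VALUES ('1', '啥');
Check the ES log, elasticsearch.log
Run the analysis test again
"啥" is no longer emitted by IK as a token. Success.
Problems encountered and solutions
Problem: java.security.AccessControlException: access denied ("java.lang.RuntimePermission" "setContextClassLoader")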
[2019-08-20T22:32:43,444][INFO ][o.e.n.Node ] [aQ19O09] starting ...
[2019-08-20T22:32:46,133][INFO ][o.e.t.TransportService ] [aQ19O09] publish_address {127.0.0.1:9300}, bound_addresses {127.0.0.1:9300}, {[::1]:9300}
[2019-08-20T22:32:49,435][INFO ][o.e.c.s.MasterService ] [aQ19O09] zen-disco-elected-as-master ([0] nodes joined)[, ], reason: new_master {aQ19O09}{aQ19O095TZmH9VHKNHC1qw}{PjHRPar4TV2JQ-iy-bWIoA}{127.0.0.1}{127.0.0.1:9300}{ml.machine_memory=10614976512, xpack.installed=true, ml.max_open_jobs=20, ml.enabled=true}
[2019-08-20T22:32:49,442][INFO ][o.e.c.s.ClusterApplierService] [aQ19O09] new_master {aQ19O09}{aQ19O095TZmH9VHKNHC1qw}{PjHRPar4TV2JQ-iy-bWIoA}{127.0.0.1}{127.0.0.1:9300}{ml.machine_memory=10614976512, xpack.installed=true, ml.max_open_jobs=20, ml.enabled=true}, reason: apply cluster state (from master [master {aQ19O09}{aQ19O095TZmH9VHKNHC1qw}{PjHRPar4TV2JQ-iy-bWIoA}{127.0.0.1}{127.0.0.1:9300}{ml.machine_memory=10614976512, xpack.installed=true, ml.max_open_jobs=20, ml.enabled=true} committed version [1] source [zen-disco-elected-as-master ([0] nodes joined)[, ]]])
[2019-08-20T22:32:49,685][ERROR][o.e.b.ElasticsearchUncaughtExceptionHandler] [] fatal error in thread [elasticsearch[aQ19O09][generic][T#4]], exiting
java.lang.ExceptionInInitializerError: null
    at java.lang.Class.forName0(Native Method) ~[?:1.8.0_161]
    at java.lang.Class.forName(Class.java:264) ~[?:1.8.0_161]
    at com.mysql.cj.jdbc.NonRegisteringDriver.<clinit>(NonRegisteringDriver.java:106) ~[?:?]
    at java.lang.Class.forName0(Native Method) ~[?:1.8.0_161]
    at java.lang.Class.forName(Class.java:264) ~[?:1.8.0_161]
    at org.wltea.analyzer.dic.Dictionary.<clinit>(Dictionary.java:117) ~[?:?]
    at org.wltea.analyzer.cfg.Configuration.<init>(Configuration.java:40) ~[?:?]
    at org.elasticsearch.index.analysis.IkTokenizerFactory.<init>(IkTokenizerFactory.java:15) ~[?:?]
    at org.elasticsearch.index.analysis.IkTokenizerFactory.getIkSmartTokenizerFactory(IkTokenizerFactory.java:23) ~[?:?]
    at org.elasticsearch.index.analysis.AnalysisRegistry.buildMapping(AnalysisRegistry.java:377) ~[elasticsearch-6.4.1.jar:6.4.1]
    at org.elasticsearch.index.analysis.AnalysisRegistry.buildTokenizerFactories(AnalysisRegistry.java:191) ~[elasticsearch-6.4.1.jar:6.4.1]
    at org.elasticsearch.index.analysis.AnalysisRegistry.build(AnalysisRegistry.java:158) ~[elasticsearch-6.4.1.jar:6.4.1]
    at org.elasticsearch.index.IndexService.<init>(IndexService.java:162) ~[elasticsearch-6.4.1.jar:6.4.1]
    at org.elasticsearch.index.IndexModule.newIndexService(IndexModule.java:383) ~[elasticsearch-6.4.1.jar:6.4.1]
    at org.elasticsearch.indices.IndicesService.createIndexService(IndicesService.java:475) ~[elasticsearch-6.4.1.jar:6.4.1]
    at org.elasticsearch.indices.IndicesService.verifyIndexMetadata(IndicesService.java:547) ~[elasticsearch-6.4.1.jar:6.4.1]
    at org.elasticsearch.gateway.Gateway.performStateRecovery(Gateway.java:127) ~[elasticsearch-6.4.1.jar:6.4.1]
    at org.elasticsearch.gateway.GatewayService$1.doRun(GatewayService.java:223) ~[elasticsearch-6.4.1.jar:6.4.1]
    at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:723) ~[elasticsearch-6.4.1.jar:6.4.1]
    at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[elasticsearch-6.4.1.jar:6.4.1]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_161]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_161]
    at java.lang.Thread.run(Thread.java:748) [?:1.8.0_161]
Caused by: java.security.AccessControlException: access denied ("java.lang.RuntimePermission" "setContextClassLoader")
    at java.security.AccessControlContext.checkPermission(AccessControlContext.java:472) ~[?:1.8.0_161]
    at java.security.AccessController.checkPermission(AccessController.java:884) ~[?:1.8.0_161]
    at java.lang.SecurityManager.checkPermission(SecurityManager.java:549) ~[?:1.8.0_161]
    at java.lang.Thread.setContextClassLoader(Thread.java:1474) ~[?:1.8.0_161]
    at com.mysql.cj.jdbc.AbandonedConnectionCleanupThread$1.newThread(AbandonedConnectionCleanupThread.java:56) ~[?:?]
    at java.util.concurrent.ThreadPoolExecutor$Worker.<init>(ThreadPoolExecutor.java:619) ~[?:1.8.0_161]
    at java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:932) ~[?:1.8.0_161]
    at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1367) ~[?:1.8.0_161]
    at java.util.concurrent.Executors$DelegatedExecutorService.execute(Executors.java:668) ~[?:1.8.0_161]
    at com.mysql.cj.jdbc.AbandonedConnectionCleanupThread.<clinit>(AbandonedConnectionCleanupThread.java:60) ~[?:?]
    ... 23 more
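Solution
The exception is caused by Java security permissions.
Find the JDK that ES runs on. Here I am using 1.8.0_161:
java version "1.8.0_161"
Java(TM) SE Runtime Environment (build 1.8.0_161-b12)
Java HotSpot(TM) 64-Bit Server VM (build 25.161-b12, mixed mode)
Go to the JDK installation directory and open the jre\lib\security directory
On my machine that is E:\Program Files\Java\jdk1.8.0_161\jre\lib\security. Open java.policy, add permission java.security.AllPermission; as the last line inside the grant block, then restart ES. That resolves the problem.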
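For reference, the edited grant block in java.policy would look roughly like the fragment below. The existing entries are elided because they differ between JDK builds. Keep in mind that AllPermission lifts the security restrictions for all code running on that JDK. The exception itself names only the RuntimePermission "setContextClassLoader", so a narrower entry may be enough, but I have not verified that and the JDBC driver may need further permissions.

```
grant {
    // ... the existing permission entries stay as they are ...

    // Added as the last entry of the grant block:
    permission java.security.AllPermission;
};
```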
Prebuilt package
If you find all this too much trouble, you can use the zip package I have already built; click here