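For article search I currently use MySQL's fuzzy LIKE query to match keywords in titles.
I used a full-text index before, but it was too slow, so I later stopped matching against article content altogether.
Then I came across Chinese word segmentation, which looked like exactly what I needed. The better-known full-text search engines that work with PHP are Solr (written in Java) and Sphinx (written in C++).
Solr needs a Java environment to run, which I would rather avoid, so I ruled it out first.
That leaves Sphinx as the better choice.
However, Sphinx does not support Chinese word segmentation, so most results on Baidu recommend Coreseek (built on the Sphinx core) combined with the mmseg segmenter to get Chinese segmentation plus full-text search.
There is one problem: Coreseek is no longer maintained.
The official website can no longer be reached: www.coreseek.cn/
The latest version I could find is Coreseek 4.1.
I could not get Coreseek 4.1 to compile and install on CentOS 7.8 on Alibaba Cloud, so I still recommend Coreseek 3.2 (based on Sphinx 0.9) here, even though it is rather old.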
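Download link:
I mainly use the archive marked with a red box in the screenshot (the coreseek-3.2.14 package used in the steps below).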
I: Install the build environment
yum -y install gcc gcc-c++ autoconf python python-devel libiconv libtool
Skip this step if these packages are already installed.
II: Install mmseg3
My packages are in the /usr/local/download directory.
cd /usr/local/download/coreseek-3.2.14
cd mmseg-3.2.14
# Make the configure script executable
chmod +x ./configure
# Install into /usr/local/mmseg3
./configure --prefix=/usr/local/mmseg3
make && make install
1: Possible errors
(1):config.status: error: cannot find input file: src/Makefile.in
Fix:
yum -y install libtool
aclocal
libtoolize --force
automake --add-missing
autoconf
autoheader
make clean
./configure --prefix=/usr/local/mmseg3
make && make install
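The error above means that src/Makefile.in, a file generated by automake, is missing from the mmseg source tree. A minimal sketch of a guard that checks for the file before running the regeneration steps; the function name needs_autotools_regen is a hypothetical helper, not part of mmseg or Coreseek, and the demo runs against a throwaway directory:

```shell
# Hypothetical helper: succeeds (exit 0) when <srcdir>/src/Makefile.in is
# missing, i.e. when the autotools files should be regenerated first.
needs_autotools_regen() {
    [ ! -f "$1/src/Makefile.in" ]
}

# Demonstration on a temporary directory, so no real source tree is touched.
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/src"

if needs_autotools_regen "$demo_dir"; then
    regen_before="yes"
else
    regen_before="no"
fi

# Simulate the file that automake generates.
: > "$demo_dir/src/Makefile.in"

if needs_autotools_regen "$demo_dir"; then
    regen_after="yes"
else
    regen_after="no"
fi

echo "before: $regen_before, after: $regen_after"
rm -rf "$demo_dir"
```

In the real tree the check would be called as needs_autotools_regen /usr/local/download/coreseek-3.2.14/mmseg-3.2.14, and the aclocal ... autoheader sequence would only run when it succeeds.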
2: Output when configure succeeds
------------------------------------------------------------------------
Configuration:

  Source code location:       .
  Compiler:                   gcc
  Compiler flags:             -g -O2
  Host System Type:           x86_64-redhat-linux-gnu
  Install path:               /usr/local/mmseg3

  See config.h for further configuration information.
------------------------------------------------------------------------
III: Install Coreseek
1: Install dependencies
yum -y install expat expat-devel
2: Enter the source directory and run configure
# Enter the directory
cd csft-3.2.14
# Make the configure script executable
chmod +x ./configure
# Run configure; adjust the paths to match your own installation.
# /usr/local/mariadb is my MySQL install directory.
./configure --prefix=/usr/local/coreseek --without-unixodbc --with-mmseg --with-mmseg-includes=/usr/local/mmseg3/include/mmseg/ --with-mmseg-libs=/usr/local/mmseg3/lib/ --with-mysql=/usr/local/mariadb
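Note that my MySQL (MariaDB) was compiled from source, with everything (configuration files, database files) installed under a single directory (/usr/local/mariadb). If your database was installed from a yum repository, the configure command above may not work as written.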
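For a yum-installed database the headers and libraries usually live in separate directories. One possible approach, sketched here under assumptions, is to ask mysql_config (shipped with the MySQL/MariaDB development package) for the paths and pass them to configure through Sphinx's --with-mysql-includes and --with-mysql-libs options. The helper names first_include_dir and first_lib_dir are hypothetical, and the sample strings only imitate typical mysql_config output; I have not verified this route against Coreseek 3.2 on a yum install.

```shell
# Hypothetical helpers: pull the first -I / -L directory out of a flags string
# such as the one printed by `mysql_config --include` or `mysql_config --libs`.
first_include_dir() {
    for flag in $1; do
        case "$flag" in
            -I*) printf '%s\n' "${flag#-I}"; return 0 ;;
        esac
    done
    return 1
}

first_lib_dir() {
    for flag in $1; do
        case "$flag" in
            -L*) printf '%s\n' "${flag#-L}"; return 0 ;;
        esac
    done
    return 1
}

# Sample strings (assumed, for illustration only).
sample_include="-I/usr/include/mysql"
sample_libs="-L/usr/lib64/mysql -lmysqlclient -lpthread"

inc_dir=$(first_include_dir "$sample_include")
lib_dir=$(first_lib_dir "$sample_libs")
echo "--with-mysql-includes=$inc_dir --with-mysql-libs=$lib_dir"

# On a real machine the strings would come from mysql_config, if it is installed.
if command -v mysql_config >/dev/null 2>&1; then
    echo "includes: $(first_include_dir "$(mysql_config --include)")"
    echo "libs:     $(first_lib_dir "$(mysql_config --libs)")"
fi
```

The two printed options would then replace --with-mysql=/usr/local/mariadb in the configure command.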
Output when configure succeeds:
generating configuration files
------------------------------

configure: creating ./config.status
config.status: creating Makefile
config.status: creating src/Makefile
config.status: creating libstemmer_c/Makefile
config.status: creating sphinx.conf.dist
config.status: creating sphinx-min.conf.dist
config.status: creating config/config.h
config.status: executing depfiles commands

configuration done
------------------
Build and install:
make && make install
3: Possible errors
make[2]: *** [sphinxexpr.o] Error 1
make[2]: Leaving directory `/usr/local/download/coreseek-3.2.14/csft-3.2.14/src'
make[1]: *** [all] Error 2
make[1]: Leaving directory `/usr/local/download/coreseek-3.2.14/csft-3.2.14/src'
make: *** [all-recursive] Error 1
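Fix:
The compiler output already hints at the fix: in sphinxexpr.cpp, replace the calls to "ExprEval" with "this->ExprEval" (many lines are affected), then re-run ./configure ... and build and install again: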
make && make install
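The replacement can be scripted with sed. A hedged sketch on a throwaway file: the two sample lines below only imitate the shape of the code in sphinxexpr.cpp, and the pattern "= ExprEval" is an assumption about how the failing calls look, so compare it with the line numbers in your own compiler output before applying it to the real file.

```shell
# Work on a temporary file so no real source is modified by this demo.
demo_src=$(mktemp)
cat > "$demo_src" <<'EOF'
T ExprEval ( const ISphExpr * pArg, const CSphMatch & tMatch );
T val = ExprEval ( this->m_pArg, tMatch );
EOF

# Prefix only the call sites (assumed to follow "= ") with this->,
# leaving declarations untouched. -i.bak keeps a backup copy.
sed -i.bak 's/= ExprEval/= this->ExprEval/g' "$demo_src"

# Count the lines that now contain the qualified call and keep the result.
fixed_calls=$(grep -c 'this->ExprEval' "$demo_src")
result=$(cat "$demo_src")
printf '%s\n' "$result"

rm -f "$demo_src" "$demo_src.bak"
```

Against the real tree, the same sed expression would be run on src/sphinxexpr.cpp inside the csft-3.2.14 directory.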
Output on a successful install:
make[2]: Leaving directory `/usr/local/download/coreseek-3.2.14/csft-3.2.14/src'
make[1]: Leaving directory `/usr/local/download/coreseek-3.2.14/csft-3.2.14/src'
Making all in test
make[1]: Entering directory `/usr/local/download/coreseek-3.2.14/csft-3.2.14/test'
make[1]: Nothing to be done for `all'.
make[1]: Leaving directory `/usr/local/download/coreseek-3.2.14/csft-3.2.14/test'
make[1]: Entering directory `/usr/local/download/coreseek-3.2.14/csft-3.2.14'
make[1]: Nothing to be done for `all-am'.
make[1]: Leaving directory `/usr/local/download/coreseek-3.2.14/csft-3.2.14'
At this point the build and installation are complete.
IV: Fixing startup errors
Start Coreseek with the following command:
/usr/local/coreseek/bin/searchd
Error:
/usr/local/coreseek/bin/searchd: error while loading shared libraries: libmariadb.so.3: cannot open shared object file: No such file or directory
Fix:
ln -s /usr/local/mariadb/lib/libmariadb.so.3 /usr/lib64/libmariadb.so.3
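Before creating symlinks one at a time, it can help to list every shared library the binary cannot resolve. A small sketch built around ldd; the helper name missing_libs is hypothetical, and the sample text only imitates ldd output:

```shell
# Hypothetical helper: read ldd output on stdin and print the names of
# the libraries reported as "not found".
missing_libs() {
    awk '/not found/ { print $1 }'
}

# Sample input (assumed, for illustration only).
sample_ldd='  libmariadb.so.3 => not found
  libc.so.6 => /lib64/libc.so.6 (0x00007f0000000000)'

missing=$(printf '%s\n' "$sample_ldd" | missing_libs)
echo "unresolved: $missing"
```

On the real machine the input would come from ldd /usr/local/coreseek/bin/searchd piped into missing_libs.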
Start it again:
/usr/local/coreseek/bin/searchd
Error:
Coreseek Fulltext 3.2 [ Sphinx 0.9.9-release (r2117)]
Copyright (c) 2007-2011,
Beijing Choice Software Technologies Inc (http://www.coreseek.com)

FATAL: no readable config file (looked in /usr/local/coreseek/etc/csft.conf, ./csft.conf).
The configuration file is missing. Fix:
cp /usr/local/coreseek/etc/sphinx-min.conf.dist /usr/local/coreseek/etc/csft.conf
Start it again:
/usr/local/coreseek/bin/searchd
Error:
Coreseek Fulltext 3.2 [ Sphinx 0.9.9-release (r2117)]
Copyright (c) 2007-2011,
Beijing Choice Software Technologies Inc (http://www.coreseek.com)

using config file '/usr/local/coreseek/etc/csft.conf'...
listening on all interfaces, port=9312
WARNING: index 'test1': preload: failed to open /usr/local/coreseek/var/data/test1.sph: No such file or directory; NOT SERVING
FATAL: no valid indexes to serve
In other words, the index files cannot be found.
Let's configure the csft.conf file:
#
# Minimal Sphinx configuration sample (clean, simple, functional)
#

source src1
{
    type                = mysql

    # Your database connection details
    sql_host            = localhost
    sql_user            = mysql
    sql_pass            =
    sql_db              = test
    sql_port            = 3306  # optional, default is 3306

    sql_query           = \
        SELECT id, group_id, UNIX_TIMESTAMP(date_added) AS date_added, title, content \
        FROM documents

    sql_attr_uint       = group_id
    sql_attr_timestamp  = date_added

    sql_query_info      = SELECT * FROM documents WHERE id=$id
}


index test1
{
    source              = src1
    # Make sure this path exists; create it beforehand if it does not
    path                = /usr/local/coreseek/var/data/test1
    docinfo             = extern
    charset_type        = sbcs
}


indexer
{
    mem_limit           = 32M
}


searchd
{
    port                = 9312
    # Make sure this path exists; create it beforehand if it does not
    log                 = /usr/local/coreseek/var/log/searchd.log
    # Make sure this path exists; create it beforehand if it does not
    query_log           = /usr/local/coreseek/var/log/query.log
    read_timeout        = 5
    max_children        = 30
    # Make sure this path exists; create it beforehand if it does not
    pid_file            = /usr/local/coreseek/var/log/searchd.pid
    max_matches         = 1000
    seamless_rotate     = 1
    preopen_indexes     = 0
    unlink_old          = 1
}
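The comments in the configuration ask for the data and log directories to exist beforehand. A minimal sketch that creates them; make_coreseek_dirs and the prefix variable are illustrative names, and the demo uses a temporary directory in place of /usr/local/coreseek:

```shell
# Hypothetical helper: create the directories that path, log, query_log
# and pid_file in csft.conf point into.
make_coreseek_dirs() {
    prefix=$1
    mkdir -p "$prefix/var/data" "$prefix/var/log"
}

# Demo against a temporary prefix instead of /usr/local/coreseek.
demo_prefix=$(mktemp -d)
make_coreseek_dirs "$demo_prefix"

if [ -d "$demo_prefix/var/data" ] && [ -d "$demo_prefix/var/log" ]; then
    dirs_ok="yes"
else
    dirs_ok="no"
fi
echo "directories created: $dirs_ok"
rm -rf "$demo_prefix"
```

For the installation described here the call would be make_coreseek_dirs /usr/local/coreseek. mkdir -p is harmless when the directories already exist.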
Next, import example.sql from /usr/local/coreseek/etc (under the install directory) into the database:
# Switch to the test database
MariaDB [(none)]> use test;
Database changed
# Import the SQL file
MariaDB [test]> source /usr/local/coreseek/etc/example.sql
Query OK, 0 rows affected, 1 warning (0.018 sec)

Query OK, 0 rows affected (0.011 sec)

Query OK, 4 rows affected (0.003 sec)
Records: 4  Duplicates: 0  Warnings: 0

Query OK, 0 rows affected, 1 warning (0.002 sec)

Query OK, 0 rows affected (0.010 sec)

Query OK, 10 rows affected (0.001 sec)
Records: 10  Duplicates: 0  Warnings: 0
Create the index:
/usr/local/coreseek/bin/indexer -c /usr/local/coreseek/etc/csft.conf --all --rotate
Output on success:
Coreseek Fulltext 3.2 [ Sphinx 0.9.9-release (r2117)]
Copyright (c) 2007-2011,
Beijing Choice Software Technologies Inc (http://www.coreseek.com)

using config file '/usr/local/coreseek/etc/csft.conf'...
indexing index 'test1'...
collected 4 docs, 0.0 MB
sorted 0.0 Mhits, 100.0% done
total 4 docs, 193 bytes
total 0.003 sec, 56581 bytes/sec, 1172.67 docs/sec
total 2 reads, 0.000 sec, 0.1 kb/call avg, 0.0 msec/call avg
total 7 writes, 0.000 sec, 0.1 kb/call avg, 0.0 msec/call avg
WARNING: failed to scanf pid from pid_file '/usr/local/coreseek/var/log/searchd.pid'.
WARNING: indices NOT rotated.
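The last two warnings mean the pid file is missing: searchd is not running yet, so there is nothing to rotate.
The fix is not to create the file by hand. Reboot the server, then start Coreseek again.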
V: Common Coreseek commands
1: Start
/usr/local/coreseek/bin/searchd
2: Stop
/usr/local/coreseek/bin/searchd --stop
3: Create the index
/usr/local/coreseek/bin/indexer -c /usr/local/coreseek/etc/csft.conf --all --rotate
4: Search test
/usr/local/coreseek/bin/search -c /usr/local/coreseek/etc/csft.conf -a abc
5: If you build the index while Coreseek is running, add the --rotate option so the new index takes effect as soon as it is built
/usr/local/coreseek/bin/indexer -c /usr/local/coreseek/etc/csft.conf --all --rotate
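Since --rotate only makes sense while searchd is running, a wrapper can decide whether to pass it. A sketch under assumptions: the helper names searchd_running and indexer_options are hypothetical, and the demo uses a temporary pid file rather than the real one named by pid_file in the configuration.

```shell
# Hypothetical helper: succeeds when the pid file exists and the process
# recorded in it is alive (kill -0 only checks, it sends no signal).
searchd_running() {
    [ -f "$1" ] || return 1
    pid=$(cat "$1")
    [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null
}

# Hypothetical helper: print the indexer options to use.
indexer_options() {
    if searchd_running "$1"; then
        echo "--all --rotate"
    else
        echo "--all"
    fi
}

# Demo with a temporary pid file holding this shell's own PID.
demo_pid=$(mktemp)
echo "$$" > "$demo_pid"
opts_running=$(indexer_options "$demo_pid")

# Remove the pid file to simulate a stopped searchd.
rm -f "$demo_pid"
opts_stopped=$(indexer_options "$demo_pid")

echo "running: $opts_running / stopped: $opts_stopped"
```

With the paths from this article, the call would look like /usr/local/coreseek/bin/indexer -c /usr/local/coreseek/etc/csft.conf $(indexer_options /usr/local/coreseek/var/log/searchd.pid). Note that kill -0 needs permission to signal the process, so the check should run as root or as the user that owns searchd.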
For other usage, refer to the Sphinx documentation.


