[PHP Essay Contest] Using Sphinx from PHP for real-time full-text indexing of InnoDB tables

wangxiaoxu


Blog category: mysql

Original post: http://cloudbbs.org/forum.php?mod=viewthread&tid=7992

Contents:
Before you start
I. Installation
II. Configuration
III. Starting sphinxsearch
IV. Testing
V. Downloading sphinxapi.php
VI. Using Sphinx from PHP
VII. Scheduled index updates
VIII. Closing remarks

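Environment: Ubuntu 12.04, MySQL 5.5, PHP 5.3, nginx 1.1
Before you start:
1.
Sphinx provides full-text indexing, but an index built by indexer does not grow automatically as new rows are added, so it has to be rebuilt. Rebuilding everything each time is expensive, so a small delta (incremental) index is used to get near-real-time search, and a scheduled job refreshes the main index.
The scheme is:
rebuild the delta index once a minute and the main index once a day.

2.
In the Sphinx configuration file, source and index definitions can inherit from one another (see the configuration below).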

I. Installation
sudo apt-get install sphinxsearch

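II. Configuration
1. Preliminary work

Before configuring Sphinx, add a table to the database to support incremental indexing.
The table has two columns: one identifies the index a row belongs to, the other records the last id that has been indexed.
The delta index uses the second column to decide which rows to index.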

CREATE TABLE IF NOT EXISTS `search_counter` (
`counterid` int(11) unsigned NOT NULL AUTO_INCREMENT,
`max_doc_id` int(11) unsigned NOT NULL,
PRIMARY KEY (`counterid`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 AUTO_INCREMENT=1 ;
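
The role of this counter can be sketched without a database. The following is a minimal simulation, assuming PHP 7 or later; the function names rebuildMain and rebuildDelta and the in-memory $rows array are hypothetical stand-ins for the indexer runs and the infos table, not part of Sphinx.

```php
<?php
// Hypothetical in-memory stand-ins for the infos table and the search_counter row.
// The functions return the document ids each index would cover.

/** Full rebuild: store max(id) in the counter, then index every row. */
function rebuildMain(array $rows, int &$maxDocId): array
{
    $ids = array_column($rows, 'id');
    $maxDocId = $ids ? max($ids) : 0;   // the counter row remembers the last indexed id
    return $ids;
}

/** Delta rebuild: index only rows whose id is above the stored counter. */
function rebuildDelta(array $rows, int $maxDocId): array
{
    $ids = [];
    foreach ($rows as $row) {
        if ($row['id'] > $maxDocId) {
            $ids[] = $row['id'];
        }
    }
    return $ids;
}

$rows = [['id' => 1], ['id' => 2], ['id' => 3]];
$maxDocId = 0;
$main = rebuildMain($rows, $maxDocId);     // main covers 1..3, counter becomes 3

$rows[] = ['id' => 4];                     // new rows arrive after the full rebuild
$rows[] = ['id' => 5];
$delta = rebuildDelta($rows, $maxDocId);   // delta covers 4 and 5
```

Searching both indexes together then covers every row, while only the small delta has to be rebuilt frequently.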

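2. Configuring Sphinx

The directory /etc/sphinxsearch/ contains a sample configuration file, sphinx.conf.sample; consult it for any parameter you do not understand.
In the same directory, create a new file named sphinx.conf
with the following content: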
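# Comments tagged (manual) or (web) mark settings I am unsure about or do not fully understand.
# Data source
source info_src
{
        # Database connection
        type                    = mysql
        sql_host                = localhost
        sql_user                = mysql_name
        sql_pass                = mysql_pwd
        sql_db                  = mysql_db
        sql_port                = 3306 # optional, default is 3306

        # My understanding: sql_query_pre statements run before the main indexing query
        sql_query_pre           = set names utf8
        # Record the highest id at the time the index is built
        sql_query_pre           = replace into search_counter select 1,max(id) from infos

        # Main indexing query: the first column must be the unique document id; the other two columns are the fields to index.
        sql_query               = SELECT id, title, content FROM infos
}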

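# Index definition
index info_ind
{
        # Data source
        source                  = info_src
        # Where the index files are stored
        path                    = /var/lib/sphinxsearch/data/info
        # Storage mode for document attribute values: extern, inline or none (manual)
        docinfo                 = extern
        # Whether to lock cached index data in memory (manual)
        mlock                   = 0
        # Morphology preprocessors, e.g. stem_en, stem_ru, soundex, libstemmer_german. Default is none. (manual)
        morphology              = none
        # Minimum length of an indexed word
        min_word_len            = 1
        charset_type            = utf-8

        # Character table; according to material found online, Chinese cannot be searched unless this is set (web)
        charset_table           = 0..9, A..Z->a..z, _, a..z, U+410..U+42F->U+430..U+44F, U+430..U+44F
        # Whether to strip HTML markup at indexing time. Default is 0 (no stripping)
        html_strip              = 1
        # N-gram length used for indexing (manual)
        ngram_len               = 1
        # Characters to index as n-grams for CJK text. Default is empty (manual)
        ngram_chars             = U+3000..U+2FA1F
}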

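# Data source for the delta index, inheriting from info_src
source infoadd_src:info_src
{
        # Override the inherited pre-queries. Otherwise the REPLACE in info_src would also run
        # for the delta, advance max_doc_id to the current max(id), and leave the delta empty.
        sql_query_pre           = set names utf8
        # Override the parent's main query; settings that stay the same need not be repeated.
        sql_query               = SELECT id, title, content FROM infos WHERE id > (SELECT max_doc_id FROM search_counter WHERE counterid=1)
}

# Delta index, inheriting from info_ind
index infoadd_ind:info_ind
{
        # Overrides
        source                  = infoadd_src
        path                    = /var/lib/sphinxsearch/data/infoadd
}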

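# indexer is the program shipped with Sphinx that connects to the data source and builds the indexes.
# indexer settings
indexer
{
        # Memory limit for indexer's indexing buffers; K and M suffixes are allowed, G is not. Maximum 2047M. (manual)
        mem_limit               = 128M
        # Maximum write buffer size, default 1M. These buffers are allocated in addition to mem_limit (manual)
        # write_buffer          = 1M
}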

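# searchd is the Sphinx daemon that serves search queries against the indexes.
# searchd settings
searchd
{
        # Host, port, host:port, or UNIX socket path that searchd listens on (manual)
        # The default config has two listeners: 9312 speaks the native API protocol (used by sphinxapi.php),
        # 9306 speaks the MySQL 4.1 protocol (SphinxQL).
        listen                  = 9312
        listen                  = 9306:mysql41
        log                     = /var/log/sphinxsearch/searchd.log
        query_log               = /var/log/sphinxsearch/query.log
        # Timeout, in seconds, for reading a client request
        read_timeout            = 5
        # Maximum time, in seconds, to wait between requests on a persistent connection
        client_timeout          = 300
        # Maximum number of child processes (concurrent searches)
        max_children            = 30
        pid_file                = /var/run/sphinxsearch/searchd.pid
        # Maximum number of matches kept per query
        max_matches             = 1000
        # Rotate indexes seamlessly, so searchd does not stall while new index data is precached (manual)
        seamless_rotate         = 1
        # Whether to open all indexes at startup rather than on each query (manual)
        preopen_indexes         = 1
        # Whether to delete the old index files after a successful rotation (manual)
        unlink_old              = 1
        # Size of the shared pool used for MVA attribute updates (manual)
        mva_updates_pool        = 1M
        # Maximum size of a client query packet or an agent response packet. Default is 8M (manual)
        max_packet_size         = 8M
        # Maximum number of filters per query. Default is 256 (manual)
        max_filters             = 256
        # Maximum number of values per filter. Default is 4096 (manual)
        max_filter_values       = 4096
        # max allowed per-batch query count (aka multi-query count) (manual)
        max_batch_queries       = 32
        # Multi-processing mode: none, fork, prefork or threads. Default is fork.
        workers                 = threads # for RT to work

}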

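3. Enabling sphinxsearch
Edit /etc/default/sphinxsearch and change START=no to START=yes.

III. Starting sphinxsearch

1. Build the indexes
sudo indexer --rotate info_ind
sudo indexer --rotate infoadd_ind

indexer options:
--quiet   quiet mode, only errors are printed
--rotate  after building, signal searchd to switch to the new index files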
2. Start the daemon

sudo /etc/init.d/sphinxsearch start

On a successful start you will see output like this:
Starting sphinxsearch: Sphinx 2.0.4-release (r3135)
Copyright (c) 2001-2012, Andrew Aksyonoff
Copyright (c) 2008-2012, Sphinx Technologies Inc (http://sphinxsearch.com)

using config file '/etc/sphinxsearch/sphinx.conf'...
WARNING: compat_sphinxql_magics=1 is deprecated; please update your application and config
listening on all interfaces, port=9312
listening on all interfaces, port=9306
precaching index 'info_ind'
precaching index 'infoadd_ind'
precached 2 indexes in 0.002 sec
sphinxsearch.

IV. Testing
search -i info_ind -e '3' -l 3

-i    index to search
-e    use extended matching mode
-l 3  show only the first 3 results
'3'   the search term
root@ubuntu:/$ search -i info_ind -e '3' -l 3

Sphinx 2.0.4-release (r3135)
Copyright (c) 2001-2012, Andrew Aksyonoff
Copyright (c) 2008-2012, Sphinx Technologies Inc (http://sphinxsearch.com)

using config file '/etc/sphinxsearch/sphinx.conf'...
index 'info_ind': query '3 ': returned 134 matches of 134 total in 0.016 sec

displaying matches:
1. document=35948, weight=1687
2. document=17678, weight=1664
3. document=17767, weight=1664

words:
1. '3': 134 documents, 148 hits

There are 134 matches in total.
That completes the Sphinx configuration.

V. Downloading sphinxapi.php
1. Find the version number
root@ubuntu:/$ searchd -h
Sphinx 2.0.4-release (r3135)
Copyright (c) 2001-2012, Andrew Aksyonoff
Copyright (c) 2008-2012, Sphinx Technologies Inc (http://sphinxsearch.com)

Usage: searchd [OPTIONS]

Options are:

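... (remaining output omitted)

So the version is Sphinx 2.0.4.

2. Download Sphinx 2.0.4-release from the official site
http://sphinxsearch.com/downloads/archive/
Find the tar.gz of the Sphinx 2.0.4 release and download it.
After unpacking, sphinxapi.php is in the api directory.

VI. Using Sphinx from PHP
Here is the code.
PHP code: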
<?php
    require('include/sphinxapi.php');
    $sphinx = new SphinxClient();
    $host = 'localhost';
    $port = 9312;
    $sphinx->SetServer($host,$port);
    // Use extended matching mode
    $sphinx->SetMatchMode(SPH_MATCH_EXTENDED);
    // Return only the first 3 results
    $sphinx->SetLimits(0, 3);
    // Indexes to search: main plus delta
    $index = 'info_ind,infoadd_ind';
    // Search title and content for '吗'
    $word = '吗';
    // Search title and content for '吃' or '今天'
    //$word = '吃|今天';
    // Search only title for '吃' or '今天'
    //$word = '@title 吃|今天';

    $result = $sphinx->Query($word,$index);
    print_r($result);
?>
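The result array has a key named matches. When the search has hits, it holds the matched documents keyed by document id; when there are no hits, the matches key is absent.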
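
Since the matches key may be missing, and Query() returns false when the request fails, it can help to wrap the id extraction in one place. Below is a minimal sketch assuming PHP 7 or later; the helper name sphinxMatchIds is hypothetical, and the sample arrays only imitate the shape of a real result (the document ids and weights are borrowed from the command-line test above).

```php
<?php
/**
 * Return the matched document ids from a SphinxClient::Query() result,
 * in the order Sphinx ranked them. A failed query (false) or a result
 * without a 'matches' key yields an empty list.
 */
function sphinxMatchIds($result): array
{
    if (!is_array($result) || empty($result['matches'])) {
        return [];
    }
    $ids = [];
    foreach ($result['matches'] as $key => $match) {
        // Default mode: the array keys are the document ids.
        // With SetArrayResult(true) the list is 0-indexed and each entry has its own 'id'.
        $ids[] = (is_array($match) && isset($match['id'])) ? (int)$match['id'] : (int)$key;
    }
    return $ids;
}

// Sample data imitating the two result shapes.
$hashResult = ['total' => 2, 'matches' => [
    35948 => ['weight' => 1687, 'attrs' => []],
    17678 => ['weight' => 1664, 'attrs' => []],
]];
$arrayResult = ['total' => 1, 'matches' => [
    ['id' => 35948, 'weight' => 1687, 'attrs' => []],
]];

$idsHash   = sphinxMatchIds($hashResult);
$idsArray  = sphinxMatchIds($arrayResult);
$idsNone   = sphinxMatchIds(['total' => 0]);
$idsFailed = sphinxMatchIds(false);
```

When the helper returns an empty list for a failed query, $sphinx->GetLastError() gives the reason.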

A complete example follows (the Mysql class used here is my own wrapper):

PHP code:
<?php
    require('config/configure.inc.php');
    require('include/sphinxapi.php');
    require('include/mysql.class.php');
    $sphinx = new SphinxClient();
    $host = 'localhost';
    $port = 9312;
    $sphinx->SetServer($host,$port);
    $sphinx->SetMatchMode(SPH_MATCH_EXTENDED);
//    $sphinx->SetArrayResult(TRUE);
    $sphinx->SetLimits(0, 3);
    $index = 'info_ind,infoadd_ind';
    $word = '@title 工商年检|转让五千万';
    $result = $sphinx->Query($word,$index);
//    print_r($result);
    // Query() returns false on failure, and 'matches' is absent when nothing matched
    if($result !== false && !empty($result['matches'])){
      $db = new Mysql();
      $db->connect();
      // The keys of 'matches' are the integer document ids returned by Sphinx
      $sql = 'select id,title,content from infos where id in ('.implode(',', array_keys($result['matches'])).')';
//      echo $sql;
      // 'id' is assumed to name the column the wrapper uses to key the returned rows
      $infos = $db->search($sql,'id');
      print_r($infos);
    }
?>
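
One thing the lookup above does not preserve is ranking: MySQL returns the rows of an IN (...) query in its own order, not in the relevance order Sphinx produced. The sketch below, assuming PHP 7.1 or later, shows one way to build the query defensively and put the rows back in rank order; buildInfoQuery and orderByRank are hypothetical helper names, and the sample rows and titles are made up.

```php
<?php
/** Build the lookup query for a list of document ids; null when the list is empty. */
function buildInfoQuery(array $ids): ?string
{
    // Cast to int so nothing but numbers can reach the SQL string.
    $ids = array_values(array_unique(array_map('intval', $ids)));
    if (!$ids) {
        return null;   // "IN ()" is a syntax error in MySQL
    }
    return 'SELECT id, title, content FROM infos WHERE id IN (' . implode(',', $ids) . ')';
}

/** Re-order fetched rows so they follow the ranking Sphinx returned. */
function orderByRank(array $rows, array $rankedIds): array
{
    $byId = [];
    foreach ($rows as $row) {
        $byId[(int)$row['id']] = $row;
    }
    $ordered = [];
    foreach ($rankedIds as $id) {
        if (isset($byId[$id])) {   // skip ids that no longer exist in MySQL
            $ordered[] = $byId[$id];
        }
    }
    return $ordered;
}

$rankedIds = [35948, 17678, 17767];
$sql = buildInfoQuery($rankedIds);

// Rows as MySQL might return them (primary-key order).
$rows = [
    ['id' => 17678, 'title' => 'b'],
    ['id' => 17767, 'title' => 'c'],
    ['id' => 35948, 'title' => 'a'],
];
$ordered = orderByRank($rows, $rankedIds);
```

An alternative is to let MySQL do the ordering with ORDER BY FIELD(id, ...), at the cost of a longer query string.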
VII. Scheduled index updates
The sphinxsearch package already installs a daily rebuild job. Edit /etc/cron.d/sphinxsearch,
comment out the existing job by putting a # at the start of its line, and add the jobs below.
# Rebuild all indexes daily and notify searchd.
# Default daily rebuild, now disabled
#@daily      root . /etc/default/sphinxsearch && if [ "$START" = "yes" ] && [ -x /usr/bin/indexer ]; then /usr/bin/indexer --quiet --rotate --all; fi
# Rebuild the delta index every minute
* * * * * root . /etc/default/sphinxsearch && if [ "$START" = "yes" ] && [ -x /usr/bin/indexer ]; then /usr/bin/indexer --quiet --rotate infoadd_ind; fi

# Rebuild the main index at 03:40 every day, then rebuild the delta index
40 03 * * * root . /etc/default/sphinxsearch && if [ "$START" = "yes" ] && [ -x /usr/bin/indexer ]; then /usr/bin/indexer --quiet --rotate info_ind && /usr/bin/indexer --quiet --rotate infoadd_ind; fi

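VIII. Closing remarks

In a real production environment, Chinese search needs word segmentation, which is not covered here. The segmenter I use is scws.
This article came about by chance: I had just finished setting up Sphinx, writing as I configured and tested, and the SAE essay contest happened to be running,
so here is my modest contribution.

I have also applied for the regular developer certification and would appreciate a recommendation. Application address:
http://qianqianqian.sinaapp.com

The fishing game and the music box there are both half-finished, but I still wanted to try my luck.
Recommendations welcome!

Recommendation link:
http://sae.sina.com.cn/?m=home&a=devlevel&level=normal_level&voteme=fbejCB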

2015-04-20 10:51