Coreseek real-time index updates: delta (incremental) indexes

abc123456789cba

Blog categories: PHP, Operations

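Coreseek offers two options for real-time index updates:

1. Use disk-based indexes, partition them manually, and periodically rebuild the smaller partition (called the "delta"). By keeping the rebuilt part as small as possible, the average indexing lag can be brought down to 30~60 seconds. In the 0.9.x versions this was the only available approach. On a huge document collection it may still be the most efficient one.

2. Version 1.x (starting with 1.10-beta) added support for real-time indexes (RT indexes for short), which update full-text data on the fly. Updates to an RT index can appear in search results within 1~2 milliseconds (0.001-0.002 seconds). However, RT indexes are not very efficient for bulk-indexing large volumes of data.

This post focuses on delta indexes.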

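The basic idea is to set up two data sources and two indexes: a main index over the data that is rarely or never updated, and a delta index over newly added documents.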
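
To make the split concrete, here is a minimal sketch that uses Python's built-in sqlite3 as a stand-in for MySQL. The table and column names (st_info, sph_counter, counter_id, max_doc_id) follow the configuration shown later in this post; the sph_counter schema is an assumption, since the post does not show how that table was created.

```python
import sqlite3


def split_main_delta(conn):
    """Return (main_ids, delta_ids) according to the stored counter."""
    main_ids = [row[0] for row in conn.execute(
        "SELECT id FROM st_info WHERE id <= "
        "(SELECT max_doc_id FROM sph_counter WHERE counter_id = 1) ORDER BY id")]
    delta_ids = [row[0] for row in conn.execute(
        "SELECT id FROM st_info WHERE id > "
        "(SELECT max_doc_id FROM sph_counter WHERE counter_id = 1) ORDER BY id")]
    return main_ids, delta_ids


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE st_info (id INTEGER PRIMARY KEY, title TEXT)")
# Assumed schema for the counter table (not shown in the post).
conn.execute(
    "CREATE TABLE sph_counter ("
    "counter_id INTEGER PRIMARY KEY, max_doc_id INTEGER NOT NULL)")

# Five documents exist when the main index is built.
conn.executemany("INSERT INTO st_info (id, title) VALUES (?, ?)",
                 [(i, f"doc {i}") for i in range(1, 6)])
# Same statement as the main source's sql_query_pre: remember the highest id.
conn.execute("REPLACE INTO sph_counter SELECT 1, MAX(id) FROM st_info")

# Two more documents arrive after the main index was built.
conn.executemany("INSERT INTO st_info (id, title) VALUES (?, ?)",
                 [(6, "doc 6"), (7, "doc 7")])

main_ids, delta_ids = split_main_delta(conn)
print("main :", main_ids)
print("delta:", delta_ids)
```

Rows at or below the stored max_doc_id belong to the main index, and everything above it falls into the delta.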

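After defining the main and delta indexes in the config file, you cannot simply run indexer --config d:\coreseek\csft.conf --all, add data to the database, and then run indexer --config d:\coreseek\csft.conf main delta --rotate (I actually did this twice). The correct steps are: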

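1. Build the main index: indexer -c d:\coreseek\csft.conf --all

2. Add data

3. Build the delta index: indexer -c d:\coreseek\csft.conf delta --rotate

4. Merge the indexes: indexer -c d:\coreseek\csft.conf --merge main delta --rotate (to keep the same document from being matched more than once, add --merge-dst-range deleted 0 0; this filter keeps only the main-index documents whose deleted attribute is 0, so the index needs a deleted attribute for it to work)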
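
The steps above can be sketched as a small Python wrapper. This is only an illustration: build_commands and run are hypothetical helper names, the config path is the one used in this post, and the script defaults to a dry run that prints the commands instead of executing them.

```python
import subprocess


def build_commands(conf, main="main", delta="delta"):
    """Build the indexer command lines for the three indexing steps."""
    base = ["indexer", "-c", conf]
    return {
        "build_all": base + ["--all"],
        "build_delta": base + [delta, "--rotate"],
        "merge": base + ["--merge", main, delta, "--rotate"],
    }


def run(cmd, dry_run=True):
    """Print the command; execute it only when dry_run is False."""
    print(" ".join(cmd))
    if not dry_run:
        subprocess.run(cmd, check=True)


commands = build_commands(r"d:\coreseek\csft.conf")

run(commands["build_all"])    # step 1: build the main index
# step 2: new rows are inserted into the database here
run(commands["build_delta"])  # step 3: build the delta index
run(commands["merge"])        # step 4: merge the delta into main
```

Pass dry_run=False to actually invoke indexer. Note that --rotate is meant for the case where searchd is already running and serving the indexes.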

The configuration file for the delta setup is as follows:

# Delta (incremental) indexing
source main  
{  
    type                    = mysql  
    sql_host                = localhost  
    sql_user                = root  
    sql_pass                = 123456  
    sql_db                  = hottopic  
    sql_port                = 3306  
    sql_query_pre           = SET NAMES utf8  
    sql_query_pre       = replace into sph_counter select 1,max(id) from st_info  
    sql_query_range     = select 1,max(id) from st_info  
    sql_range_step          = 1000  
  
    sql_query               = SELECT id, pubDate, title, description,nav_id,rss_id FROM st_info where id>=$start and id <=$end and \
                id <=(select max_doc_id from sph_counter where counter_id=1)
    sql_attr_uint           = nav_id            
    sql_attr_uint       = rss_id  
    sql_attr_timestamp      = pubDate   
}  
  
source delta : main  
{  
    sql_query_pre           = SET NAMES utf8  
    sql_query           = SELECT id, pubDate, title, description,nav_id,rss_id FROM st_info where id>=$start and id <=$end and \
                id >(select max_doc_id from sph_counter where counter_id=1)
    sql_query_post_index    = replace into sph_counter select 1,max(id) from st_info  
}  
  
# Index definitions
index main  
{  
    source              = main              
    path                = D:/coreseek/coreseek-4.1-win32/var/data/mysqlInfoSPHMain   
    docinfo             = extern  
    mlock               = 0  
    morphology          = none  
    min_word_len        = 1  
    html_strip          = 0  
    stopwords       =  
  
    charset_dictpath    =  D:/coreseek/coreseek-4.1-win32/etc      
    charset_type        = zh_cn.utf-8  
}  
  
index delta : main  
{  
    source      = delta  
    path                = D:/coreseek/coreseek-4.1-win32/var/data/mysqlInfoSPHDelta  
     
}  
  
# Global indexer settings
indexer  
{  
    mem_limit            = 128M  
}  
  
# searchd service settings
searchd  
{  
    listen          = 127.0.0.1:9312  
    read_timeout        = 5  
    max_children        = 30  
    max_matches         = 1000  
    seamless_rotate     = 0  
    preopen_indexes     = 0  
    unlink_old          = 1  
    pid_file            = D:/coreseek/coreseek-4.1-win32/var/log/searchd_mysqlInfoSph.pid  
    log             = D:/coreseek/coreseek-4.1-win32/var/log/searchd_mysqlInfoSph.log  
    query_log           = D:/coreseek/coreseek-4.1-win32/var/log/query_mysqlInfoSph.log  
    binlog_path         =            
    compat_sphinxql_magics  = 0  
}  

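A pitfall to watch for: suppose I built the main index over 500,000 rows the day before yesterday. Yesterday I added 100,000 rows, built the delta index, and merged it into main. Today I added another 100,000 rows, built the delta index again, and merged it as well. The main index was never rebuilt during those two days. Here is the problem: yesterday's delta covered 100,000 rows, but today's covers 200,000, and 100,000 of those are already in the main index. The delta source selects every row whose id is above max_doc_id in sph_counter, so as long as that value is not advanced the delta keeps growing, which is quite alarming. Possible solutions:

1. Rebuild the main index once a day.

2. If you do not want to rebuild the main index, use sql_query_post_index when building the delta index to update the max id value. This worked for me when I typed the commands by hand on Windows (I do not know how it behaves when run from a script).

3. If you do not want to rebuild the main index, have a script connect to the database and update the value directly when merging the indexes (see: http://banu.blog.163.com/blog/static/2314648201092911412539)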
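
The growth described above can be reproduced with a small simulation. This is a sketch under simplifying assumptions: ids are contiguous and only ever increase, and delta_sizes is a hypothetical helper, not part of Coreseek. It compares leaving max_doc_id untouched with advancing it after each delta build, as solutions 2 and 3 do.

```python
def delta_sizes(main_rows, daily_new_rows, advance_counter):
    """Simulate how many rows each day's delta build has to index."""
    max_id = main_rows   # highest id currently in st_info
    counter = main_rows  # sph_counter.max_doc_id right after the main build
    sizes = []
    for new_rows in daily_new_rows:
        max_id += new_rows
        sizes.append(max_id - counter)  # rows with id > max_doc_id
        if advance_counter:
            counter = max_id            # counter moved up after this build
    return sizes


stale = delta_sizes(500_000, [100_000, 100_000], advance_counter=False)
fresh = delta_sizes(500_000, [100_000, 100_000], advance_counter=True)
print("counter never advanced:", stale)  # [100000, 200000]
print("counter advanced      :", fresh)  # [100000, 100000]
```

With the counter left alone, the second delta re-indexes the first day's rows; with the counter advanced, each delta only covers that day's additions.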

2013-10-28 22:36