Installing and Testing the Chinese Search Engine coreseek

sharkl

Blog category: MySQL

Tags: search engine, MySQL, SQL, Windows, full-text search

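First, the installation. See the installation guide on the coreseek website:

http://www.coreseek.cn/products-install/install_on_windows/
Item 3 on that page, http://www.coreseek.cn/uploads/csft/3.2/coreseek-3.2.13-win32.zip, can simply be downloaded and unzipped. There is no installer to run and no other extension package to install.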

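Once it is unpacked, it needs to be configured. I will use MySQL as the example: assume the MySQL user name is root, the password is samyou, and the data to be indexed is the userinfo table in the test database.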

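Go to ..\coreseek-3.2.13-win32\etc under the coreseek directory and find the config file csft_mysql.conf (the default config file, csft.conf, targets an XML data source; here we use a MySQL data source). Modify it as follows: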

# source definition
source mysql
{
    type             = mysql

    sql_host         = localhost
    sql_user         = root
    sql_pass         = samyou
    sql_db           = test
    sql_port         = 3306
    sql_query_pre    = SET NAMES utf8

    sql_query        = SELECT id, name,tel,email,duty FROM userinfo
        # the first column of sql_query (id) must be an integer
        # the remaining columns (name, tel, email, duty) are string/text fields and are full-text indexed
    sql_attr_uint        = id            # the value read from SQL must be an integer
    sql_attr_timestamp   = date_added    # the value read from SQL must be an integer; used as a time attribute (note: date_added is not selected by the sql_query above)

    sql_query_info   = SELECT * FROM userinfo WHERE id=$id    # for command-line queries: fetches the original row from the database
}

# index definition
index mysql
{
    source           = mysql    # name of the corresponding source
    path             = var/data/mysql
    docinfo          = extern
    mlock            = 0
    morphology       = none
    min_word_len     = 1
    html_strip       = 0
    #charset_dictpath = /usr/local/mmseg3/etc/    # setting for BSD and Linux; must end with /
    charset_dictpath = D:\coreseekthings\coreseek\coreseek-3.2.13-win32\etc/    # setting for Windows; must end with /, and note that an absolute path is required here
    charset_type     = zh_cn.utf-8
    ngram_len        = 0
}

# global indexer definition
indexer
{
    mem_limit        = 128M
}

# searchd service definition
searchd
{
    listen           = 9312
    read_timeout     = 5
    max_children     = 30
    max_matches      = 1000
    seamless_rotate  = 0
    preopen_indexes  = 0
    unlink_old       = 1
    pid_file         = var/log/searchd_mysql.pid
    log              = var/log/searchd_mysql.log
    query_log        = var/log/query_mysql.log
}

The parts marked in red above are the places I changed. Save the config file once the changes are made.
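The Java client later in this post needs the port that searchd listens on. As a rough sketch, the port could be read from the config text instead of being typed in twice. ListenPort is a hypothetical helper of my own, not part of coreseek, and it assumes the plain "listen = 9312" form used above; any other form of the listen setting is simply ignored.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.OptionalInt;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper (not part of coreseek): pulls the searchd port out of config text. */
final class ListenPort {
    // Matches a whole line such as "listen = 9312", optionally followed by a # comment.
    private static final Pattern LISTEN =
            Pattern.compile("^\\s*listen\\s*=\\s*(\\d{1,5})\\s*(#.*)?$", Pattern.MULTILINE);

    /** Returns the first plain numeric listen port, or empty if none is found. */
    static OptionalInt find(String configText) {
        Matcher m = LISTEN.matcher(configText);
        return m.find() ? OptionalInt.of(Integer.parseInt(m.group(1))) : OptionalInt.empty();
    }

    public static void main(String[] args) throws Exception {
        // Usage: java ListenPort etc\csft_mysql.conf
        String text = new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.UTF_8);
        System.out.println(find(text));
    }
}
```

A commented-out line such as "# listen = 9312" does not match, because the pattern only allows whitespace before the word listen.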

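With the configuration done, you can test from a Windows cmd window by following the steps at http://www.coreseek.cn/products/products-install/install_on_windows/, but replace csft.conf in those steps with the csft_mysql.conf file we are using. For example, to test keywords, run bin\search -c etc\csft_mysql.conf -a keyword1 keyword2 ...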

Also, the cmd window does not support UTF-8, so Chinese cannot be tested in it (tomorrow I will try testing Chinese from Java).
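To see why the console encoding matters, here is a small sketch that prints the bytes of a Chinese keyword in UTF-8, which is what the index above is configured for, next to its bytes in GBK. That the cmd window uses a GBK-family code page is my assumption (it is the usual case on Simplified Chinese Windows), and KeywordBytes is just an illustrative name.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

/** Illustrative helper: shows how the same keyword turns into different bytes per charset. */
final class KeywordBytes {
    /** Renders the bytes of a keyword in the given charset as space-separated hex. */
    static String hex(String keyword, Charset charset) {
        StringBuilder sb = new StringBuilder();
        for (byte b : keyword.getBytes(charset)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // U+4E2D U+6587, the Chinese word for "Chinese"; escaped so the source file encoding does not matter
        String keyword = "\u4e2d\u6587";
        System.out.println("UTF-8: " + hex(keyword, StandardCharsets.UTF_8));
        // GBK is an optional charset in the JDK, so check before using it
        if (Charset.isSupported("GBK")) {
            System.out.println("GBK:   " + hex(keyword, Charset.forName("GBK")));
        }
    }
}
```

The two byte sequences differ, so a keyword typed into a console that is not using UTF-8 would not be looked up as the same bytes that were indexed.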

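P.S. coreseek speeds up retrieval by searching an index, so after the data source is configured the index has to be built first. The command bin\indexer -c etc\csft_mysql.conf --all used in the test does exactly that: it builds the index for the data source.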
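If the index needs rebuilding from a Java program rather than by hand, the same indexer command can be launched with ProcessBuilder. This is only a sketch: IndexerCommand is a hypothetical class of my own, the coreseek directory is passed in as an argument, and I am assuming the relative paths in the config (var/data/mysql, var/log/...) resolve against the working directory, which is why the process is started from the coreseek directory.

```java
import java.io.File;
import java.util.Arrays;
import java.util.List;

/** Hypothetical launcher for the indexer command used in this post. */
final class IndexerCommand {
    /** Builds the argument list for: bin\indexer -c etc\<configName> --all */
    static List<String> indexAll(File coreseekHome, String configName) {
        File exe = new File(new File(coreseekHome, "bin"), "indexer");
        File conf = new File(new File(coreseekHome, "etc"), configName);
        return Arrays.asList(exe.getPath(), "-c", conf.getPath(), "--all");
    }

    public static void main(String[] args) throws Exception {
        // args[0] is the coreseek-3.2.13-win32 directory
        File home = new File(args[0]).getAbsoluteFile();
        ProcessBuilder pb = new ProcessBuilder(indexAll(home, "csft_mysql.conf"));
        pb.directory(home); // run from the coreseek directory, as in the cmd session
        pb.inheritIO();     // show the indexer's own output in this console
        int exit = pb.start().waitFor();
        System.out.println("indexer exited with code " + exit);
    }
}
```

The executable and config paths are made absolute on purpose, so the launch does not depend on the directory the Java program itself was started from.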

Now let's use it from Java.

First start the service: in a cmd window, change to the coreseek install directory and run bin\searchd -c etc\csft_mysql.conf.
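Before pointing a client at the service, it can help to confirm that something is actually accepting connections on the configured port. A minimal sketch using only the JDK follows; PortCheck is a name I made up, and a successful connection only shows that the port is open, not that the process behind it is searchd.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

/** Hypothetical helper: checks whether a TCP port accepts connections. */
final class PortCheck {
    /** Returns true if a TCP connection to host:port succeeds within timeoutMillis. */
    static boolean isListening(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            return false; // refused, timed out, or unreachable
        }
    }

    public static void main(String[] args) {
        // 9312 is the listen port from csft_mysql.conf
        System.out.println("searchd reachable: " + isListening("localhost", 9312, 2000));
    }
}
```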

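In the \coreseek-3.2.13-win32\api\java folder under that directory, run mk.cmd to build the Java API jar, then run mkdoc.cmd to generate the Java API documentation, and add the generated jar to your Java project.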

Here is the code I borrowed, adapted, and got working:

import java.util.Date;

import org.sphx.api.SphinxClient;
import org.sphx.api.SphinxMatch;
import org.sphx.api.SphinxResult;
import org.sphx.api.SphinxWordInfo;

public class CoreSeekMain
{
    public static void main(String[] args)
    {
        String query = "samyou090";   // keyword to search for
        String host = "localhost";
        int port = 9312;              // the listen port from csft_mysql.conf
        int mode = SphinxClient.SPH_MATCH_EXTENDED;
        String index = "*";           // search all indexes
        int offset = 0;
        int limit = 50;
        SphinxClient cl = new SphinxClient();
        try
        {
            // set the searchd host and port
            cl.SetServer ( host, port );
            cl.SetWeights ( new int[] { 100, 1 } );
            // set the match mode
            cl.SetMatchMode ( mode );
            // return at most `limit` matches, starting at `offset`
            cl.SetLimits ( offset, limit );
            SphinxResult res = cl.Query ( query, index );
            if ( res==null )
            {
                System.err.println ( "Error: " + cl.GetLastError() );
                System.exit ( 1 );
            }
            if ( cl.GetLastWarning()!=null && cl.GetLastWarning().length()>0 )
                System.out.println ( "WARNING: " + cl.GetLastWarning() + "\n" );

            /* print out the query summary and per-word statistics */
            System.out.println ( "Query '" + query + "' retrieved " + res.total + " of " + res.totalFound + " matches in " + res.time + " sec." );
            System.out.println ( "Query stats:" );
            for ( int i=0; i<res.words.length; i++ )
            {
                SphinxWordInfo wordInfo = res.words[i];
                System.out.println ( "\t'" + wordInfo.word + "' found " + wordInfo.hits + " times in " + wordInfo.docs + " documents" );
            }

            /* print out the matches */
            System.out.println ( "\nMatches: " + res.matches.length );
            for ( int i=0; i<res.matches.length; i++ )
            {
                SphinxMatch info = res.matches[i];
                System.out.print ( (i+1) + ". id=" + info.docId + ", weight=" + info.weight );

                if ( res.attrNames==null || res.attrTypes==null )
                    continue;

                for ( int a=0; a<res.attrNames.length; a++ )
                {
                    System.out.print ( ", " + res.attrNames[a] + "=" );

                    if ( ( res.attrTypes[a] & SphinxClient.SPH_ATTR_MULTI )!=0 )
                    {
                        /* multi-value attribute: print as a comma-separated list */
                        System.out.print ( "(" );
                        long[] attrM = (long[]) info.attrValues.get(a);
                        if ( attrM!=null )
                        {
                            for ( int j=0; j<attrM.length; j++ )
                            {
                                if ( j!=0 )
                                    System.out.print ( "," );
                                System.out.print ( attrM[j] );
                            }
                        }
                        System.out.print ( ")" );
                    }
                    else
                    {
                        switch ( res.attrTypes[a] )
                        {
                            case SphinxClient.SPH_ATTR_INTEGER:
                            case SphinxClient.SPH_ATTR_ORDINAL:
                            case SphinxClient.SPH_ATTR_FLOAT:
                            case SphinxClient.SPH_ATTR_BIGINT:
                                /* longs or floats; print as is */
                                System.out.print ( info.attrValues.get(a) );
                                break;

                            case SphinxClient.SPH_ATTR_TIMESTAMP:
                                /* stored as seconds since the epoch; Date expects milliseconds */
                                Long iStamp = (Long) info.attrValues.get(a);
                                Date date = new Date ( iStamp.longValue()*1000 );
                                System.out.print ( date.toString() );
                                break;

                            default:
                                System.out.print ( "(unknown-attr-type=" + res.attrTypes[a] + ")" );
                        }
                    }
                }

                System.out.println();
            }
        }
        catch (Exception e)
        {
            e.printStackTrace();
        }
    }
}
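The code above always asks for the first 50 matches. To page through results, the offset passed to SetLimits has to move with the page number. Below is a rough sketch of that arithmetic. Paging is a hypothetical helper of my own, and I am assuming that max_matches (1000 in the config above) caps how deep into the result set a query can reach.

```java
/** Hypothetical helper: page arithmetic for the offset and limit passed to SetLimits. */
final class Paging {
    /** Zero-based offset of the first result on a 1-based page. */
    static int offsetFor(int page, int pageSize) {
        if (page < 1 || pageSize < 1) {
            throw new IllegalArgumentException("page and pageSize must be >= 1");
        }
        return (page - 1) * pageSize;
    }

    /** True if the whole page lies within the first maxMatches results. */
    static boolean fits(int page, int pageSize, int maxMatches) {
        return offsetFor(page, pageSize) + pageSize <= maxMatches;
    }

    public static void main(String[] args) {
        int pageSize = 50;      // same as limit in CoreSeekMain
        int maxMatches = 1000;  // max_matches from csft_mysql.conf
        for (int page : new int[] { 1, 2, 20, 21 }) {
            System.out.println ( "page " + page + ": offset=" + offsetFor(page, pageSize)
                    + ", fits=" + fits(page, pageSize, maxMatches) );
        }
    }
}
```

In CoreSeekMain the call would then become cl.SetLimits ( Paging.offsetFor(page, limit), limit ), with page coming from wherever the caller tracks it.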