Zoie 3.3 + Lucene 3.5: real-time indexing and search

leiyongping88

Blog category: Lucene

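Table 1.1 Lucene release history

Version	Release date	Milestone
0.01	2000-03-30	First open-source release on SourceForge
0.04	2000-04-19	Includes the grammar-based tokenizer StandardTokenizer, among others
1.0	2000-10	Bug fixes, performance improvements
1.01b	2001-06-02	Last release on SourceForge; bug fixes, prefix query support
1.2 rc1	2001-10-02	First release under Apache Jakarta
1.2 rc2	2001-10-19	Source code released; bug fixes
1.2 rc3	2002-01-27	Bug fixes
1.2 rc4	2002-02-14	Bug fixes
1.2 rc5	2002-05-14	Added MultiFieldQueryParser and others; bug fixes
1.2 rc6	2002-06-13	QueryParser changed to support the ? wildcard
1.3 rc1	2003-03-24	QueryParser changed to support range queries; bug fixes
1.3 rc2	2003-10-22	Added CachingWrapperFilter, PerFieldAnalyzerWrapper, and others; bug fixes
1.3 rc3	2003-11-25	Support for minMergeDocs; bug fixes
1.3 final	2003-12-26	Bug fixes
1.4 rc1	2004-03-29	Changed the .tis file format; added ParallelMultiSearcher and others
1.4 rc2	2004-03-30	Bug fixes
1.4 rc3	2004-05-11	Bug fixes
1.4 final	2004-07-01	Bug fixes; some API implementations updated
1.4.1	2004-08-02	Bug fixes
1.4.2	2004-10-01	Bug fixes; IndexSearcher optimized
1.4.3	2004-12-07	Bug fixes
1.9 rc1	2006-02-21	Added MMapDirectory and others; bug fixes
1.9 final	2006-02-27	Compatible with versions after 1.4.3; bug fixes
1.9.1	2006-03-02	Bug fixes
2.0.0	2006-06-01	Bug fixes, performance improvements; no longer compatible with 1.4.3
2.1.0	2007-02-17	Added FieldSelector and others; bug fixes, performance improvements
2.2.0	2007-06-19	Added BoostingTermQuery and others; bug fixes, performance improvements
2.3.0	2008-01-23	Added SpanQueryFilter and others; bug fixes, performance improvements
2.3.1	2008-02-22	Bug fixes
2.3.2	2008-05-06	Bug fixes
2.4.0	2008-10-08	Added QueryAutoStopWordAnalyzer and others; bug fixes, performance improvements
2.4.1	2009-03-09	Bug fixes
2.9.0	2009-09-25	Added FieldCacheRangeFilter and others; bug fixes, performance improvements
2.9.1	2009-11-06	Bug fixes
2.9.2	2010-02-26	Bug fixes, performance improvements
2.9.3	2010-06-18	Bug fixes, performance improvements
2.9.4	2010-12-03	Bug fixes, performance improvements
3.0.0	2009-11-25	Added AttributeFactory and others; bug fixes, performance improvements
3.0.1	2010-02-26	Bug fixes, performance improvements
3.0.2	2010-06-18	Bug fixes, performance improvements
3.0.3	2010-12-03	Bug fixes, performance improvements
3.1.0	2011-03-31	Added ReusableAnalyzerBase and others; bug fixes, performance improvements
3.2.0	2011-06-03	Added TieredMergePolicy and others; bug fixes, performance improvements
3.3.0	2011-07-01	Added TwoPhaseCommitTool and others; bug fixes, performance improvements
3.4.0	2011-09-14	Added FixedBitSet and others; bug fixes, performance improvements
3.5.0	2011-11-27	Added IndexSearcher.searchAfter and others; bug fixes, performance improvements
3.6.0	2012-04-12	Added FieldValueFilter and others; bug fixes, performance improvements
3.6.1	2012-07-22	Bug fixes, performance improvements
3.6.2	2012-12-25	Bug fixes
4.0.0-alpha	2012-07-03	Added RegexpQuery and others; bug fixes, performance improvements
4.0.0-beta	2012-08-13	Added BloomFilteringPostingsFormat and others; bug fixes, performance improvements
4.0.0	2012-10-12	Added BlockPostingsFormat and others; bug fixes
4.1.0	2013-01-22	Added AnalyzingSuggester, FuzzySuggester, and others; performance improvements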
4.2.0
4.3.0
4.4.0

The latest Zoie release, zoie-core-3.3.0, supports only Lucene 3.5; Lucene 3.6 and later are not currently supported.

Spring configuration file:

Indexing part:

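import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// commons-lang 2.x is assumed here; use org.apache.commons.lang3 if the project is on 3.x
import org.apache.commons.lang.StringUtils;
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.lucene.analysis.PerFieldAnalyzerWrapper;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.MultiReader;
import org.apache.lucene.search.DefaultSimilarity;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.MatchAllDocsQuery;
import org.apache.lucene.search.Sort;
import org.apache.lucene.search.SortField;
import org.apache.lucene.search.TopDocs;
import org.wltea.analyzer.lucene.IKAnalyzer;

import proj.zoie.api.DataConsumer.DataEvent;
import proj.zoie.api.ZoieIndexReader;
import proj.zoie.api.indexing.ZoieIndexableInterpreter;
import proj.zoie.impl.indexing.DefaultIndexReaderDecorator;
import proj.zoie.impl.indexing.ZoieConfig;
import proj.zoie.impl.indexing.ZoieSystem;

// Project-specific classes (UserInfo, UserInfoDao, ServiceFactory, MyUserInfoDataInterpreter,
// AppSimilarity, DateUtil, SearchEngineCore) are not shown in this post.
public class UserIndexJob {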
    private Log log = LogFactory.getLog(UserIndexJob.class);

    public static final long MAX_INCREMENT_INDEX_NUMBER = 1300000; // maximum number of records for an incremental update

    public String userIndexPath;

    private IdCompator idCompator = new IdCompator(); // compares users by ID

    private long _currentVersion = 0L;

    @SuppressWarnings("rawtypes")
    public ZoieSystem zoieSystem;

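    // Batch size: how many items must be queued before indexing is triggered
    // (the number of items held in memory)
    private int zoieBatchSize;

    // Batch delay: how long to wait before indexing is triggered
    // (maximum delay, in milliseconds)
    private int zoieBatchDelay;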

    private float docboost = 1.0f;

    private int rows = 20000;

    public String lastUpdateTime;


    @SuppressWarnings({ "unchecked", "rawtypes" })
    public void init(){
        try {
            System.out.println("UserIndexJob init start");
            // Index directory; mkdirs() also creates any missing parent directories
            File idxDir = new File(userIndexPath);
            if (!idxDir.exists()) {
                idxDir.mkdirs();
            }
            // Interpreter that converts incoming data into indexable documents
            ZoieIndexableInterpreter interpreter = new MyUserInfoDataInterpreter();
            // Decorator applied to the index readers that Zoie hands out
            DefaultIndexReaderDecorator decorator = new DefaultIndexReaderDecorator();

            PerFieldAnalyzerWrapper analyzer = new PerFieldAnalyzerWrapper(new IKAnalyzer());
            ZoieConfig zoieConfig = new ZoieConfig();
            zoieConfig.setBatchDelay(zoieBatchDelay);
            zoieConfig.setBatchSize(zoieBatchSize);
            zoieConfig.setAnalyzer(analyzer); // analyzer used for tokenizing
            zoieConfig.setSimilarity(new DefaultSimilarity()); // similarity used for scoring
            zoieConfig.setRtIndexing(true); // enable real-time indexing
            zoieSystem = new ZoieSystem(idxDir, interpreter, decorator, zoieConfig);
            zoieSystem.start();
            zoieSystem.getAdminMBean().flushToDiskIndex();
            System.out.println("UserIndexJob init end");
            setLastUpdateTime(); // restore the last update time from the index
        } catch (Exception e) {
            log.error("UserIndexJob init failed", e);
        }
    }



    @SuppressWarnings("unchecked")
    public void doUpdateIndexData(){
        try {
            System.out.println("index job start");
            // Incremental indexing: check whether any records have been updated
            UserInfoDao userInfoDao = ServiceFactory.getBean(UserInfoDao.class);
            String maxUpdateTime = userInfoDao.getUserLastModify();
            if (StringUtils.isNotBlank(lastUpdateTime) && lastUpdateTime.equals(maxUpdateTime)) { // nothing has changed
                System.out.println("======user index data no update!========");
                return;
            }
            Thread.sleep(1000);
            long maxId = 0;
            long total = 0;
            int start = 0;
            if (StringUtils.isNotBlank(lastUpdateTime)) { // incremental indexing
                // number of records changed since the last run
                long increNum = userInfoDao.getIncrementIndexResNumber(lastUpdateTime, maxUpdateTime);
                if (increNum > MAX_INCREMENT_INDEX_NUMBER) { // too many changes: switch to a full rebuild
                    lastUpdateTime = null; // null means full indexing
                }
            }

            if(StringUtils.isBlank(lastUpdateTime)){
                maxId = userInfoDao.getMaxID();
            }

            long begin = System.currentTimeMillis();
            while (true) {
                List<UserInfo> resList = userInfoDao.findUserIndexInfo(lastUpdateTime,maxUpdateTime,start, rows);
                if (resList.size() > 0) { // index this batch
                    List<DataEvent<Document>> dataEventList = transform(resList);
                    if (null != dataEventList && !dataEventList.isEmpty()) {
                        zoieSystem.consume(dataEventList);
                    }
                }
                total += resList.size();
                System.out.println("=========user index increment num:" + total);
                if (resList.size() < rows) {
                    break;
                }

                if (null == lastUpdateTime) { // full indexing
                    if (resList.size() > 0) {
                        Collections.sort(resList, idCompator); // sort by ID, highest first
                        if (resList.get(0).getUserId() >= maxId) { // highest ID reached: full indexing is done
                            break;
                        }
                    }
                }
                start = start + rows;
            }
            long end = System.currentTimeMillis();
            lastUpdateTime = maxUpdateTime;
            System.out.println("============user index increment total num:" + total
                    + ",elapsed time " + ((end - begin) / 1000) + " seconds");
        } catch (Exception e) {
            log.error(e.getMessage(), e);
        }
    }

    private List<DataEvent<Document>> transform(List<UserInfo> resList) {
        List<DataEvent<Document>> dataEventList = new ArrayList<DataEvent<Document>>();
        if (resList != null && !resList.isEmpty()) {
            for (UserInfo user : resList) {
                Document doc = new Document();
                doc.setBoost(docboost);
                doc.add(new Field("userId", user.getUserId().toString(), Field.Store.YES,Field.Index.NOT_ANALYZED));
                doc.add(new Field("headImg", StringUtils.isNotBlank(user.getHeadImg())?user.getHeadImg() : "", Field.Store.YES,Field.Index.NOT_ANALYZED));
                doc.add(new Field("nickName", StringUtils.isNotBlank(user.getNickName())?user.getNickName():"", Field.Store.YES,Field.Index.ANALYZED));
                doc.add(new Field("loginName", StringUtils.isNotBlank(user.getLoginName())?user.getLoginName():"", Field.Store.YES,Field.Index.NOT_ANALYZED));
                doc.add(new Field("checkinCount", String.valueOf(user.getCheckinCount()), Field.Store.YES,Field.Index.NOT_ANALYZED));
                doc.add(new Field("favoriteCount", String.valueOf(user.getFavoriteCount()), Field.Store.YES,Field.Index.NOT_ANALYZED));
                doc.add(new Field("sex", String.valueOf(user.getSex()), Field.Store.YES,Field.Index.NOT_ANALYZED));
                doc.add(new Field("updatedTime", StringUtils.isNotBlank(user.getUpdatedTime())?user.getUpdatedTime():"", Field.Store.YES,Field.Index.NOT_ANALYZED));
                dataEventList.add(new DataEvent<Document>(doc, "1.0"));
            }
        }
        return dataEventList;
    }


    @SuppressWarnings("unchecked")
    private void setLastUpdateTime() {
        List<ZoieIndexReader<IndexReader>> zoieReaderList = null;
        MultiReader multiReader = null;
        IndexSearcher indexSearcher = null;
        try {
            zoieReaderList = zoieSystem.getIndexReaders();
            multiReader = new MultiReader(zoieReaderList.toArray(new IndexReader[zoieReaderList.size()]), false);
            indexSearcher = new IndexSearcher(multiReader);
            indexSearcher.setSimilarity(new AppSimilarity());
            Sort sort = new Sort(new SortField("updatedTime",SortField.STRING, true));
            TopDocs topDocs = indexSearcher.search(new MatchAllDocsQuery(), 1, sort);
            if (topDocs.totalHits == 0) {
                System.out.println("======search user index path no results======");
                return;
            }
            Document doc = indexSearcher.doc(topDocs.scoreDocs[0].doc);
            lastUpdateTime = DateUtil.formateDate(doc.get("updatedTime"), 1);
            System.out.println("lastUpdateTime:"+lastUpdateTime);
        } catch (IOException e) {
            e.printStackTrace();
        }finally{
            try {
                if(null !=indexSearcher){
                    indexSearcher.close();
                    indexSearcher = null;
                }
                if (null != multiReader) {
                    multiReader.close();
                    multiReader = null;
                }
                if (null != zoieReaderList && !zoieReaderList.isEmpty()) {
                    zoieSystem.returnIndexReaders(zoieReaderList);
                    zoieReaderList = null;
                }
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }

    public void destroy(){
        zoieSystem.shutdown(); // flushes the in-memory index to the disk index
        System.out.println("destroy method");
    }

    // Orders users by ID, highest first
    private static class IdCompator implements Comparator<UserInfo> {
        public int compare(UserInfo o1, UserInfo o2) {
            return o2.getUserId().compareTo(o1.getUserId());
        }
    }


    public String getZoieVersion() {
        return Long.toString(_currentVersion);
    }

    public String getMinZoieVersion() {
        return Long.toString(0L);
    }

    public String nextZoieVersion() {
        return Long.toString(++_currentVersion);
    }


    public String getUserIndexPath() {
        return userIndexPath;
    }

    public void setUserIndexPath(String userIndexPath) {
        if(StringUtils.isNotBlank(userIndexPath)){
            this.userIndexPath =SearchEngineCore.getResourcePath(userIndexPath)+ File.separator + "userIndex";
        }else{
            this.userIndexPath = SearchEngineCore.getIndexpath("VSOYOU_USER_INDEX_PATH")+ File.separator + "userIndex";
        }
    }

    @SuppressWarnings("rawtypes")
    public ZoieSystem getZoieSystem() {
        return zoieSystem;
    }

    @SuppressWarnings("rawtypes")
    public void setZoieSystem(ZoieSystem zoieSystem) {
        this.zoieSystem = zoieSystem;
    }

    public int getZoieBatchSize() {
        return zoieBatchSize;
    }

    public void setZoieBatchSize(int zoieBatchSize) {
        this.zoieBatchSize = zoieBatchSize;
    }

    public int getZoieBatchDelay() {
        return zoieBatchDelay;
    }

    public void setZoieBatchDelay(int zoieBatchDelay) {
        this.zoieBatchDelay = zoieBatchDelay;
    }
}
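
The decision made at the top of doUpdateIndexData (skip, incremental update, or full rebuild) and its page-by-page fetch loop can be pulled out of the job so they can be tested without Zoie or a database. The following is only a sketch under stated assumptions: `IndexPlan`, `Mode`, `choose`, and `drainPages` are hypothetical names that do not appear in the original code, the fetcher stands in for `userInfoDao.findUserIndexInfo`, and Java 8 functional interfaces are used even though the post itself targets an older JDK. The extra `maxId` stop condition of the full rebuild is left out.

```java
import java.util.List;
import java.util.function.BiFunction;
import java.util.function.Consumer;

// Hypothetical helper, not part of the original post or of Zoie.
final class IndexPlan {
    enum Mode { SKIP, INCREMENTAL, FULL }

    private IndexPlan() {}

    /**
     * Mirrors the checks in doUpdateIndexData:
     * same timestamp as the last run means nothing to do;
     * no previous timestamp or too many changed rows means a full rebuild;
     * anything else is an incremental update.
     */
    static Mode choose(String lastUpdateTime, String maxUpdateTime,
                       long changedRows, long maxIncremental) {
        boolean hasLast = lastUpdateTime != null && !lastUpdateTime.trim().isEmpty();
        if (hasLast && lastUpdateTime.equals(maxUpdateTime)) {
            return Mode.SKIP;
        }
        if (!hasLast || changedRows > maxIncremental) {
            return Mode.FULL;
        }
        return Mode.INCREMENTAL;
    }

    /**
     * Calls fetcher(offset, pageSize) starting at offset 0, passes every
     * non-empty page to the consumer, and stops at the first page that is
     * shorter than pageSize. Returns the total number of rows seen.
     */
    static <T> long drainPages(BiFunction<Integer, Integer, List<T>> fetcher,
                               int pageSize, Consumer<List<T>> consumer) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        long total = 0;
        int offset = 0;
        while (true) {
            List<T> page = fetcher.apply(offset, pageSize);
            if (!page.isEmpty()) {
                consumer.accept(page);
            }
            total += page.size();
            if (page.size() < pageSize) {
                return total;
            }
            offset += pageSize;
        }
    }
}
```

In the job above, the fetcher would wrap `userInfoDao.findUserIndexInfo(lastUpdateTime, maxUpdateTime, offset, rows)` and the consumer would call `transform` followed by `zoieSystem.consume`.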

Real-time search part:

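import java.io.IOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// commons-lang 2.x is assumed here; use org.apache.commons.lang3 if the project is on 3.x
import org.apache.commons.lang.StringUtils;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.MultiReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.DefaultSimilarity;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.PrefixQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.Sort;
import org.apache.lucene.search.SortField;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.util.Version;
import org.wltea.analyzer.lucene.IKAnalyzer;

import proj.zoie.api.ZoieIndexReader;

// Project-specific classes (UserInfo, Const, SearchUtil, QuerySort) are not shown in this post.
public class UserSearch {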
    private UserIndexJob userIndexJob;

    @SuppressWarnings("unchecked")
    public Map<String, Object> seachUser(String searchWord, int page, int pageSize) {
        Map<String, Object> map = new HashMap<String, Object>();
        map.put(Const.HEADIMG_DOMAIN_KEY,Const.HEADIMG_DOMAIN_VALUE);
        List<ZoieIndexReader<IndexReader>> zoieReaderList = null;
        MultiReader multiReader = null;
        IndexSearcher indexSearcher = null;
        TopDocs topDocs = null;
        try {
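            if (searchWord == null || page < 1 || pageSize < 1) { // nothing sensible to search for
                map.put("list", null);
                return map;
            }
            searchWord = SearchUtil.wmlEncode(searchWord);
            searchWord = SearchUtil.traditionalToSimple(searchWord).trim(); // Traditional to Simplified Chinese
            if (searchWord.length() == 0) {
                map.put("list", null);
                return map;
            }

            zoieReaderList = userIndexJob.zoieSystem.getIndexReaders();
            multiReader = new MultiReader(zoieReaderList.toArray(new IndexReader[zoieReaderList.size()]), false);
            indexSearcher = new IndexSearcher(multiReader);
            indexSearcher.setSimilarity(new DefaultSimilarity());

            BooleanQuery allQuery = new BooleanQuery();
            QueryParser parser = new QueryParser(Version.LUCENE_35,"nickName",new IKAnalyzer());
            // Escape query-syntax characters so raw user input cannot cause a ParseException
            Query query = parser.parse(QueryParser.escape(searchWord));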
            query.setBoost(100.0f);
            allQuery.add(query, BooleanClause.Occur.SHOULD);

            QuerySort keywordQuerySort = getKeywordQuerySort(searchWord);
            keywordQuerySort.query.setBoost(50.0f);
            allQuery.add(keywordQuerySort.query, BooleanClause.Occur.SHOULD);

            topDocs = indexSearcher.search(allQuery, page*pageSize, keywordQuerySort.sort);
            if(topDocs == null || topDocs.totalHits ==0){
                map.put("list", null);
                return map;
            }
            map.put("pageCount", getPageCount(topDocs.totalHits,pageSize));
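            ScoreDoc[] scoreDocs = topDocs.scoreDocs; // hits returned by the search
            // index of the first record on the requested page
            int begin = (page - 1)*pageSize ;
            // index one past the last record on the requested page
            int end = Math.min(begin + pageSize, scoreDocs.length);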
            List<UserInfo> userInfos = addHits2List(indexSearcher,scoreDocs,begin,end);
            map.put("list", userInfos);

        } catch (Exception e) {
            e.printStackTrace();
        }finally {
            try {
                if (null != indexSearcher) {
                    indexSearcher.close();
                    indexSearcher = null;
                }
                if (null != multiReader) {
                    multiReader.close();
                    multiReader = null;
                }
                if (null != zoieReaderList && !zoieReaderList.isEmpty()) {
                    userIndexJob.zoieSystem.returnIndexReaders(zoieReaderList);
                    zoieReaderList = null;
                }
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
        return map;

    }

    private static List<UserInfo> addHits2List(IndexSearcher indexSearcher,ScoreDoc[] scoreDocs, int begin, int end) {
        List<UserInfo> userInfos = new ArrayList<UserInfo>();
        try {
            for (int i = begin; i < end; i++) {
                int docID = scoreDocs[i].doc;
                Document doc = indexSearcher.doc(docID);
                UserInfo userInfo = new UserInfo();
                userInfo.setCheckinCount(Integer.valueOf(doc.get("checkinCount")));
                userInfo.setFavoriteCount(Integer.valueOf(doc.get("favoriteCount")));
                userInfo.setHeadImg(doc.get("headImg"));

                if(StringUtils.isNotBlank(doc.get("nickName"))){
                    userInfo.setNickName(doc.get("nickName"));
                }else{
                    userInfo.setNickName(doc.get("loginName"));
                }
                userInfo.setLoginName(doc.get("loginName"));
                userInfo.setSex(Integer.valueOf(doc.get("sex")));
                userInfo.setUserId(Long.valueOf(doc.get("userId")));
                userInfos.add(userInfo);
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
        return userInfos;
    }

    private static QuerySort getKeywordQuerySort(String searchWord) {
        QuerySort querySort = new QuerySort();
        querySort.query = new PrefixQuery(new Term("loginName", searchWord));

        // Sort by score first, then by check-in count descending, then by favorite count descending
        querySort.sort = new Sort(new SortField[] {
                new SortField(null,SortField.SCORE,false),
                new SortField("checkinCount", SortField.INT, true),
                new SortField("favoriteCount", SortField.INT, true)
            });
        return querySort;
    }

    private int getPageCount(int rowCount, int pageSize) {
        int pageCount = 1;
        if ((rowCount % pageSize) == 0) {
            pageCount = rowCount / pageSize;
        } else {
            pageCount = rowCount / pageSize + 1;
        }
        if (pageCount == 0) {
            pageCount = 1;
        }
        return pageCount;
    }

    public UserIndexJob getUserIndexJob() {
        return userIndexJob;
    }

    public void setUserIndexJob(UserIndexJob userIndexJob) {
        this.userIndexJob = userIndexJob;
    }

}
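
The paging arithmetic in seachUser and getPageCount is easy to get wrong at the edges (empty results, a page past the end, very large page numbers). Below is a small JDK-only sketch of the same calculations; `Paging`, `pageCount`, and `window` are hypothetical names introduced for illustration and are not part of the original code. It assumes `hitCount` is the length of `topDocs.scoreDocs` and is never negative.

```java
// Hypothetical helper, not part of the original post.
final class Paging {
    private Paging() {}

    /** Pages needed for rowCount rows; never less than 1, like getPageCount above. */
    static int pageCount(int rowCount, int pageSize) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        if (rowCount <= 0) {
            return 1;
        }
        // Ceiling division written so that it cannot overflow int
        return (rowCount - 1) / pageSize + 1;
    }

    /**
     * Returns {begin, end} (end exclusive) into a hit array of length hitCount
     * for a 1-based page number. A page past the end yields an empty window.
     */
    static int[] window(int page, int pageSize, int hitCount) {
        if (page < 1 || pageSize < 1) {
            throw new IllegalArgumentException("page and pageSize must be at least 1");
        }
        long begin = (long) (page - 1) * pageSize; // long avoids int overflow
        int from = (int) Math.min(begin, (long) hitCount);
        int to = (int) Math.min(begin + pageSize, (long) hitCount);
        return new int[] { from, to };
    }
}
```

With such a helper, the loop in addHits2List would run from `window[0]` to `window[1]`, which is what `begin` and `end` already express in the search method.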

2013-08-30 17:59
Category: Enterprise architecture