Extending APDPlat Search: Integrating Solr

Author: yangshangchuan (Beijing)
Posted: 2014-02-01, last modified: 2014-02-01

APDPlat makes full use of Compass's OSEM and ORM integration features to provide built-in search that is easy to use yet powerful. The built-in search is simple and elegant in design and also supports real-time search: users only annotate the model to specify which fields should be searchable (associations between models can be searched too), and they get search capability without writing any code. The platform automatically handles supporting functions such as index maintenance, query parsing, and result highlighting.

However, APDPlat's built-in search runs only on a single machine. It does not support distribution, so it is suitable only for small and medium-scale scenarios. To support large-scale distributed search and real-time analysis, APDPlat can use ElasticSearch, the successor to Compass (see the companion post "Extending APDPlat Search: Integrating ElasticSearch"), and it also has another option: Solr.

Solr provides a Java client API (SolrJ), which we can use to interact with the Solr server. First we add the SolrJ dependency to pom.xml:

```xml
<dependency>
    <groupId>org.apache.solr</groupId>
    <artifactId>solr-solrj</artifactId>
    <version>${solrj.version}</version>
</dependency>
```

Next, let's look at an example of integrating APDPlat with Solr.

APDPlat provides an extensible log handling interface. Users can write their own plugins and specify in the configuration file which plugins to enable. The log handling interface is as follows:

```java
/**
 * Log handling interface:
 * logs can be stored in a dedicated log database (not the business database)
 * logs can be passed to message queues such as activemq, rabbitmq, zeromq
 * logs can be passed to log aggregation systems such as kafka, flume, chukwa, scribe
 * logs can be passed to search servers such as elasticsearch, solr
 * @author 杨尚川
 */
public interface LogHandler {
    public <T extends Model> void handle(List<T> list);
}
```

To have the Solr search server index the log data, we first construct an HttpSolrServer instance, then build a list of SolrInputDocument objects from the list of log objects to be indexed, then call the add and commit methods of HttpSolrServer to submit the SolrInputDocument list to the Solr server for indexing, and finally parse the response to determine whether the operation succeeded.

Constructing the HttpSolrServer instance requires several configuration values. By default they live in config.properties and can be overridden in config.local.properties, as shown below:

```properties
# Solr server configuration
solr.host=192.168.0.100
solr.port=8983
solr.core=apdplat_for_log
solr.max.retries=1
# the two timeouts are passed to HttpSolrServer, which expects milliseconds
solr.connection.timeout=5
solr.allow.compression=true
solr.socket.read.timeout=3000
solr.max.connections.per.host=100
solr.max.total.connections=300
```

When configuring the Solr server, the core (apdplat_for_log here) should be configured as schemaless. Otherwise we would have to declare every field of every log object to be indexed, which is far too tedious. Because the apdplat_for_log core is schemaless, objects of all sorts of types unknown in advance can be indexed into the same core. When indexing, we add a type field whose value is the class name of the object, so that different kinds of objects can be told apart at search time.

Let's see how the HttpSolrServer is constructed:

```java
private static final String host = PropertyHolder.getProperty("solr.host");
private static final String port = PropertyHolder.getProperty("solr.port");
private static final String core = PropertyHolder.getProperty("solr.core");
private static final int maxRetries = PropertyHolder.getIntProperty("solr.max.retries");
private static final int connectionTimeout = PropertyHolder.getIntProperty("solr.connection.timeout");
private static final boolean allowCompression = PropertyHolder.getBooleanProperty("solr.allow.compression");
private static final int socketReadTimeout = PropertyHolder.getIntProperty("solr.socket.read.timeout");
private static final int maxConnectionsPerHost = PropertyHolder.getIntProperty("solr.max.connections.per.host");
private static final int maxTotalConnections = PropertyHolder.getIntProperty("solr.max.total.connections");
private static SolrServer solrServer;

public SolrLogHandler(){
    LOG.info("solr.host: "+host);
    LOG.info("solr.port: "+port);
    LOG.info("solr.core: "+core);
    LOG.info("solr.max.retries: "+maxRetries);
    LOG.info("solr.connection.timeout: "+connectionTimeout);
    LOG.info("solr.allow.compression: "+allowCompression);
    LOG.info("solr.socket.read.timeout: "+socketReadTimeout);
    LOG.info("solr.max.connections.per.host: "+maxConnectionsPerHost);
    LOG.info("solr.max.total.connections: "+maxTotalConnections);

    String url = "http://"+host+":"+port+"/solr/"+core+"/";
    LOG.info("Initializing Solr server connection: "+url);
    HttpSolrServer httpSolrServer = new HttpSolrServer(url);
    httpSolrServer.setMaxRetries(maxRetries);
    httpSolrServer.setConnectionTimeout(connectionTimeout);
    httpSolrServer.setAllowCompression(allowCompression);
    httpSolrServer.setSoTimeout(socketReadTimeout);
    httpSolrServer.setDefaultMaxConnectionsPerHost(maxConnectionsPerHost);
    httpSolrServer.setMaxTotalConnections(maxTotalConnections);
    solrServer = httpSolrServer;
}
```

Note the URL here, which includes the core name:

```java
String url = "http://"+host+":"+port+"/solr/"+core+"/";
```

Next, the list of log objects has to be converted into a list of SolrInputDocument objects:

```java
public <T extends Model> List<SolrInputDocument> getSolrInputDocuments(List<T> list){
    int j = 1;
    // build the batch index request
    List<SolrInputDocument> docs = new ArrayList<>(list.size());
    LOG.info("Start building Solr documents");
    for(T model : list){
        try{
            String simpleName = model.getClass().getSimpleName();
            LOG.debug((j++)+". simpleName: ["+simpleName+"]");
            SolrInputDocument doc = new SolrInputDocument();
            Field[] fields = model.getClass().getDeclaredFields();
            int len = fields.length;
            for(int i = 0; i < len; i++){
                Field field = fields[i];
                String name = field.getName();
                field.setAccessible(true);
                Object value = field.get(model);
                // beware of null pointer exceptions: the LogHandler thread would exit silently!
                if(value == null){
                    LOG.debug("Ignoring null field: "+name);
                    continue;
                }
                LOG.debug("name: "+name+" value: "+value);
                doc.addField(name, value);
            }
            // log type (class name)
            doc.addField("type", simpleName);
            // add the primary key
            UUID uuid = UUID.randomUUID();
            doc.addField("id", uuid.toString());
            docs.add(doc);
        }catch(IllegalAccessException | IllegalArgumentException | SecurityException e){
            LOG.error("Failed to build index request ["+model.getMetaData()+"]\n"+model, e);
        }
    }
    LOG.info("Finished building Solr documents");
    return docs;
}
```

Here we generate a random primary key with UUID, add a type field whose value is the class name, and use reflection to obtain the field names and field values of the log object. Once the document list is ready, we can submit the index request:

```java
solrServer.add(docs);
UpdateResponse updateResponse = solrServer.commit();
```

Then we process the response to determine whether the index operation succeeded:

```java
int status = updateResponse.getStatus();
if(status==0){
    LOG.info("Successfully submitted "+docs.size()+" documents to core: "+core);
}else{
    LOG.info("Index submission failed, status: "+status);
}
```

Below is the complete implementation of SolrLogHandler:

```java
import java.io.IOException;
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.UpdateResponse;
import org.apache.solr.common.SolrInputDocument;
import org.springframework.stereotype.Service;
// The APDPlat imports (APDPlatLogger, PropertyHolder, ConvertUtils, Model, LogHandler)
// are omitted here; their packages depend on the APDPlat version in use.

/**
 * Log handling plugin:
 * saves logs to Solr
 * for high-performance real-time search and analysis
 * with support for large-scale distributed search
 *
 * @author 杨尚川
 */
@Service
public class SolrLogHandler implements LogHandler{
    private static final APDPlatLogger LOG = new APDPlatLogger(SolrLogHandler.class);

    private static final String host = PropertyHolder.getProperty("solr.host");
    private static final String port = PropertyHolder.getProperty("solr.port");
    private static final String core = PropertyHolder.getProperty("solr.core");
    private static final int maxRetries = PropertyHolder.getIntProperty("solr.max.retries");
    private static final int connectionTimeout = PropertyHolder.getIntProperty("solr.connection.timeout");
    private static final boolean allowCompression = PropertyHolder.getBooleanProperty("solr.allow.compression");
    private static final int socketReadTimeout = PropertyHolder.getIntProperty("solr.socket.read.timeout");
    private static final int maxConnectionsPerHost = PropertyHolder.getIntProperty("solr.max.connections.per.host");
    private static final int maxTotalConnections = PropertyHolder.getIntProperty("solr.max.total.connections");
    private static SolrServer solrServer;

    public SolrLogHandler(){
        LOG.info("solr.host: "+host);
        LOG.info("solr.port: "+port);
        LOG.info("solr.core: "+core);
        LOG.info("solr.max.retries: "+maxRetries);
        LOG.info("solr.connection.timeout: "+connectionTimeout);
        LOG.info("solr.allow.compression: "+allowCompression);
        LOG.info("solr.socket.read.timeout: "+socketReadTimeout);
        LOG.info("solr.max.connections.per.host: "+maxConnectionsPerHost);
        LOG.info("solr.max.total.connections: "+maxTotalConnections);

        String url = "http://"+host+":"+port+"/solr/"+core+"/";
        LOG.info("Initializing Solr server connection: "+url);
        HttpSolrServer httpSolrServer = new HttpSolrServer(url);
        httpSolrServer.setMaxRetries(maxRetries);
        httpSolrServer.setConnectionTimeout(connectionTimeout);
        httpSolrServer.setAllowCompression(allowCompression);
        httpSolrServer.setSoTimeout(socketReadTimeout);
        httpSolrServer.setDefaultMaxConnectionsPerHost(maxConnectionsPerHost);
        httpSolrServer.setMaxTotalConnections(maxTotalConnections);
        solrServer = httpSolrServer;
    }

    @Override
    public <T extends Model> void handle(List<T> list) {
        LOG.info("Start indexing "+list.size()+" log objects to the Solr server");
        long start = System.currentTimeMillis();
        index(list);
        long cost = System.currentTimeMillis() - start;
        LOG.info("Time taken: "+ConvertUtils.getTimeDes(cost));
    }

    /**
     * Batch index,
     * batch commit
     *
     * @param <T> generic type parameter
     * @param list batch of models
     */
    public <T extends Model> void index(List<T> list){
        List<SolrInputDocument> docs = getSolrInputDocuments(list);
        // submit the index documents in a batch
        try{
            LOG.info("Start batch submission of index documents");
            solrServer.add(docs);
            UpdateResponse updateResponse = solrServer.commit();
            int status = updateResponse.getStatus();
            if(status==0){
                LOG.info("Successfully submitted "+docs.size()+" documents to core: "+core);
            }else{
                LOG.info("Index submission failed, status: "+status);
            }
            LOG.info("ResponseHeader:\n"+updateResponse.getResponseHeader().toString());
            LOG.info("Response:\n"+updateResponse.getResponse().toString());
            // help memory get released sooner
            docs.clear();
        }catch(IOException | SolrServerException e){
            LOG.error("Batch index submission failed", e);
        }
    }

    /**
     * Converts a list of objects into a list of Solr documents
     * @param <T> object type
     * @param list list of objects
     * @return list of Solr documents
     */
    public <T extends Model> List<SolrInputDocument> getSolrInputDocuments(List<T> list){
        int j = 1;
        // build the batch index request
        List<SolrInputDocument> docs = new ArrayList<>(list.size());
        LOG.info("Start building Solr documents");
        for(T model : list){
            try{
                String simpleName = model.getClass().getSimpleName();
                LOG.debug((j++)+". simpleName: ["+simpleName+"]");
                SolrInputDocument doc = new SolrInputDocument();
                Field[] fields = model.getClass().getDeclaredFields();
                int len = fields.length;
                for(int i = 0; i < len; i++){
                    Field field = fields[i];
                    String name = field.getName();
                    field.setAccessible(true);
                    Object value = field.get(model);
                    // beware of null pointer exceptions: the LogHandler thread would exit silently!
                    if(value == null){
                        LOG.debug("Ignoring null field: "+name);
                        continue;
                    }
                    LOG.debug("name: "+name+" value: "+value);
                    doc.addField(name, value);
                }
                // log type (class name)
                doc.addField("type", simpleName);
                // add the primary key
                UUID uuid = UUID.randomUUID();
                doc.addField("id", uuid.toString());
                docs.add(doc);
            }catch(IllegalAccessException | IllegalArgumentException | SecurityException e){
                LOG.error("Failed to build index request ["+model.getMetaData()+"]\n"+model, e);
            }
        }
        LOG.info("Finished building Solr documents");
        return docs;
    }
}
```

Finally, in the configuration file config.local.properties we set log.handlers to solrLogHandler, the Spring bean name of the SolrLogHandler class. The class carries Spring's @Service annotation, so its bean name defaults to the class name with a lowercase first letter:

```properties
log.handlers=solrLogHandler
```

APDPlat is hosted on GitHub.
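As a footnote to the getSolrInputDocuments method above: the reflection step can be tried in isolation, without a Solr server or the SolrJ jar. The following is only a minimal sketch under stated assumptions, not APDPlat code. A plain Map stands in for SolrInputDocument, and the class ReflectionDocSketch, the model LoginLog, and the helper toDocument are hypothetical names made up for the illustration.

```java
import java.lang.reflect.Field;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.UUID;

public class ReflectionDocSketch {

    // Hypothetical log model, used only for this illustration.
    public static class LoginLog {
        private final String username;
        private final String ip; // may be null

        public LoginLog(String username, String ip) {
            this.username = username;
            this.ip = ip;
        }
    }

    // Copies every non-null declared field of the model into a map
    // (the map is a stand-in for SolrInputDocument), then adds the
    // "type" and "id" fields the same way the post does.
    public static Map<String, Object> toDocument(Object model) {
        Map<String, Object> doc = new LinkedHashMap<>();
        for (Field field : model.getClass().getDeclaredFields()) {
            field.setAccessible(true);
            Object value;
            try {
                value = field.get(model);
            } catch (IllegalAccessException e) {
                throw new IllegalStateException("Cannot read field " + field.getName(), e);
            }
            if (value == null) {
                continue; // skip null fields instead of indexing them
            }
            doc.put(field.getName(), value);
        }
        doc.put("type", model.getClass().getSimpleName());
        doc.put("id", UUID.randomUUID().toString());
        return doc;
    }

    public static void main(String[] args) {
        // The null ip is skipped; type and a random id are appended.
        System.out.println(toDocument(new LoginLog("alice", null)));
    }
}
```

One point worth keeping in mind when reusing this pattern: getDeclaredFields returns only the fields declared on the class itself, so fields inherited from a superclass are not copied into the document unless the superclass chain is walked as well.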