Hibernate Search

gaolixu

Blog category: Hibernate

Tags: Hibernate, Lucene, full-text search engine, JUnit

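Hibernate Search is a subproject of Hibernate. It brings full-text search to database-backed projects and, through "transparent" configuration (configuration that does not disturb the existing system), exposes a standard full-text search interface. This chapter covers that material.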

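The concept of full-text search

Before getting into the main text, it is worth introducing the concept of full-text search. Put simply, Google is a full-text search engine. Full-text search lets a user type in a few keywords and find the information they need in the data layer. Compared with a database "LIKE" clause, it puts little or no load on the database, because the lookup is served entirely from index files, which makes it very efficient. A full-text search engine can also do far more than "LIKE" can. In full-text search terminology, what the user types in is called a keyword. The search system pre-processes a large body of information into a structure organized around such keywords: it breaks articles into paragraphs and words, then classifies the article data by keyword. The processed output is called the index file. An index file is usually much smaller than the actual data, yet it loses very little of the information. When the user enters a keyword, the engine can quickly locate the relevant text.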
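The idea of an index file organized by keyword can be sketched in plain Java. This is only a minimal illustration, not how Lucene stores its index: the class name TinyInvertedIndex and its methods are hypothetical, and the splitting rule (lowercase, break on anything that is not a letter or digit) is an assumption chosen for brevity.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Toy inverted index: maps each lowercased word to the ids of the texts that contain it. */
class TinyInvertedIndex {
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    /** Splits the text on non-letter, non-digit characters and records the id under every word. */
    void add(int id, String text) {
        for (String word : text.toLowerCase().split("[^\\p{L}\\p{N}]+")) {
            if (!word.isEmpty()) {
                postings.computeIfAbsent(word, k -> new TreeSet<>()).add(id);
            }
        }
    }

    /** Returns the ids of all texts containing the keyword, without rescanning the texts. */
    Set<Integer> search(String keyword) {
        return postings.getOrDefault(keyword.toLowerCase(), Collections.emptySet());
    }
}
```

A lookup is a single map access, whereas a "LIKE" style scan has to read every stored text. That difference is the efficiency argument made above.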

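What is Lucene

Lucene is an open-source full-text search engine and is now a project of the Apache Software Foundation. It is a very popular and powerful engine in the Java community. It can search not only ordinary text data but also PDF, HTML, and Microsoft Word files, provided their text is extracted first. One reason for Lucene's success is its open architecture: almost every part of the framework can be extended. The text analyzer can be customized, the index storage can be customized, the query engine offers several alternatives, and you can write your own if you wish. It also exposes a powerful API that is convenient to use. Besides unstructured search (the user enters a keyword and the engine matches any entry in which some field contains it), Lucene supports structured search, where the user specifies the model class, the field name, and the search condition. Lucene is not the focus of this chapter, but because it is the core of Hibernate Search you should know its basic concepts. Some of them are introduced below:

Document: In Lucene, a Document is one unit of search. For example, when searching a user table, each user record is a Document.
Field: Every Document contains one or more Fields; each Field is a key-value pair.
Analyzer: The analyzer (tokenizer). This is the heart of a full-text search engine. How to break an article into keywords without losing information is a discipline of its own. Lucene ships several Analyzers and offers an open interface so that the community can contribute new ones.
Index: The search data generated by the system; it stores the Documents.
IndexSearcher: Searches the contents of the Index and returns the results.
IndexWriter: Invokes the Analyzer and produces the Index from the analyzed content.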
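How these roles fit together can be sketched with plain Java stand-ins. None of the classes below are Lucene classes: ToyDocument, ToyAnalyzer, and ToyIndex are hypothetical names, and the whitespace-splitting analyzer is an assumption. The sketch only shows the division of labour: a document holds fields, the analyzer turns a field value into tokens, the writer role records each field/token pair, and the searcher role looks one pair up (a structured search on a named field).

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

/** Stand-in for a Document: a set of named fields (key-value pairs). */
class ToyDocument {
    final Map<String, String> fields = new LinkedHashMap<>();

    ToyDocument field(String name, String value) {
        fields.put(name, value);
        return this;
    }
}

/** Stand-in for an Analyzer: lowercases the text and splits it on whitespace. */
class ToyAnalyzer {
    String[] tokens(String text) {
        return text.toLowerCase().trim().split("\\s+");
    }
}

/** Stand-in for the Index, combining the writer and searcher roles. */
class ToyIndex {
    private final ToyAnalyzer analyzer = new ToyAnalyzer();
    // Key "field:token" -> documents whose field contains that token.
    private final Map<String, Set<ToyDocument>> entries = new LinkedHashMap<>();

    /** Writer role: analyze every field and record the document under each token. */
    void write(ToyDocument doc) {
        for (Map.Entry<String, String> field : doc.fields.entrySet()) {
            for (String token : analyzer.tokens(field.getValue())) {
                if (!token.isEmpty()) {
                    entries.computeIfAbsent(field.getKey() + ":" + token,
                            k -> new LinkedHashSet<>()).add(doc);
                }
            }
        }
    }

    /** Searcher role: look up a single field/keyword pair. */
    Set<ToyDocument> search(String fieldName, String keyword) {
        return entries.getOrDefault(fieldName + ":" + keyword.toLowerCase(),
                Collections.emptySet());
    }
}
```

Searching the "name" field for "kevin" finds a document whose name is "Kevin", while searching the "empNo" field for the same keyword finds nothing, because each token is recorded per field.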

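How Lucene, Hibernate Search, and Hibernate relate

Using Lucene directly in a project raises several problems. The project is database-backed, so whenever data in the database changes, Lucene has to be triggered by hand to update the index files and keep them consistent with the actual data. That means inserting a piece of Lucene code into every DAO method, which violates the Open/Closed Principle (OCP); this concern belongs in a separate layer. There is also the question of how model classes map onto the full-text search engine. That mapping has to be maintained by hand, which raises the cost of using Lucene considerably. Hibernate Search was created to solve these usability problems.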
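The alternative to putting indexing code into every DAO method can be sketched as a listener. The example is a plain-Java toy under stated assumptions: ToyDao and ToyNameIndex are hypothetical classes, and the "index" is just a map from lowercased name to id. Hibernate Search relies on Hibernate's own event mechanism for this kind of hook; the sketch shows the general shape only, not its implementation.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiConsumer;

/** Toy DAO that knows nothing about indexing; it only notifies listeners after a save. */
class ToyDao {
    private final Map<Integer, String> table = new HashMap<>();
    private final List<BiConsumer<Integer, String>> saveListeners = new ArrayList<>();

    /** Registers a callback that receives (id, name) after each save. */
    void onSave(BiConsumer<Integer, String> listener) {
        saveListeners.add(listener);
    }

    void save(int id, String name) {
        table.put(id, name);
        for (BiConsumer<Integer, String> listener : saveListeners) {
            listener.accept(id, name);
        }
    }
}

/** Toy index that stays in sync because it listens to the DAO, not because the DAO calls it. */
class ToyNameIndex {
    final Map<String, Integer> idByName = new HashMap<>();

    void attachTo(ToyDao dao) {
        dao.onSave((id, name) -> idByName.put(name.toLowerCase(), id));
    }
}
```

The DAO stays closed to modification: adding or removing the index is a matter of attaching or not attaching the listener.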

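So how do Lucene, Hibernate Search, and Hibernate relate to one another? The original post illustrates this with a diagram:

As the diagram shows, Hibernate plus Hibernate Search sits between the full-text index directory and the actual database. Hibernate handles everything related to the database, while Hibernate Search automatically updates the full-text index directory according to the actual data in the database. Hibernate Search also maps the model-layer classes onto the structure of the Lucene index automatically. Theory is always dry, so the rest of this post shows concrete usage.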

Installing Hibernate Search

To use Hibernate Search in a project, add the following dependency to the Maven pom.xml:

<dependency>
	<groupId>org.hibernate</groupId>
	<artifactId>hibernate-search</artifactId>
	<version>3.0.0.GA</version>
</dependency>

1. Create the POJOs

package com.yehui;

import javax.persistence.CascadeType;
import javax.persistence.Column;
import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.GenerationType;
import javax.persistence.Id;
import javax.persistence.JoinColumn;
import javax.persistence.ManyToOne;
import javax.persistence.Table;

import org.hibernate.search.annotations.DocumentId;
import org.hibernate.search.annotations.Field;
import org.hibernate.search.annotations.Index;
import org.hibernate.search.annotations.Indexed;
import org.hibernate.search.annotations.IndexedEmbedded;
import org.hibernate.search.annotations.Store;

/**
 * Employee generated by MyEclipse Persistence Tools
 */
@Entity
@Table(name = "employee", catalog = "hise", uniqueConstraints = {})
@Indexed(index = "indexes/employee")
public class Employee implements java.io.Serializable {
    private static final long serialVersionUID = 7794235365739814541L;
    private Integer empId;
    private String empName;
    private Department dept;
    private String empNo;
    private Double empSalary;

    // Constructors

    /** default constructor */
    public Employee() {
    }

    /** minimal constructor */
    public Employee(Integer empId) {
        this.empId = empId;
    }

    /** full constructor */
    public Employee(Integer empId, String empName,
            String empNo, Double empSalary) {
        this.empId = empId;
        this.empName = empName;
        this.empNo = empNo;
        this.empSalary = empSalary;
    }

    // Property accessors

    // The primary key doubles as the id of the Lucene Document.
    @Id
    @GeneratedValue(strategy = GenerationType.AUTO)
    @Column(name = "emp_id", unique = true, nullable = false, insertable = true, updatable = true)
    @DocumentId
    public Integer getEmpId() {
        return this.empId;
    }

    public void setEmpId(Integer empId) {
        this.empId = empId;
    }

    // Indexed as the tokenized Field "name"; the original value is also stored in the index.
    @Column(name = "emp_name", unique = false, nullable = true, insertable = true, updatable = true, length = 30)
    @Field(name = "name", index = Index.TOKENIZED, store = Store.YES)
    public String getEmpName() {
        return this.empName;
    }

    public void setEmpName(String empName) {
        this.empName = empName;
    }

    // Indexed as a single untokenized term under the default Field name "empNo".
    @Column(name = "emp_no", unique = false, nullable = true, insertable = true, updatable = true, length = 30)
    @Field(index = Index.UN_TOKENIZED)
    public String getEmpNo() {
        return this.empNo;
    }

    public void setEmpNo(String empNo) {
        this.empNo = empNo;
    }

    // Not annotated with @Field, so the salary is not indexed.
    @Column(name = "emp_salary", unique = false, nullable = true, insertable = true, updatable = true, precision = 7)
    public Double getEmpSalary() {
        return this.empSalary;
    }

    public void setEmpSalary(Double empSalary) {
        this.empSalary = empSalary;
    }

    // The department's indexed Fields are embedded in the employee Document with the prefix "dept_".
    @ManyToOne(cascade = CascadeType.ALL)
    @JoinColumn(name = "dept_id")
    @IndexedEmbedded(prefix = "dept_", depth = 1)
    public Department getDept() {
        return dept;
    }

    public void setDept(Department dept) {
        this.dept = dept;
    }

}

package com.yehui;

import java.util.List;

import javax.persistence.Column;
import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.GenerationType;
import javax.persistence.Id;
import javax.persistence.OneToMany;
import javax.persistence.Table;

import org.hibernate.search.annotations.ContainedIn;
import org.hibernate.search.annotations.DocumentId;
import org.hibernate.search.annotations.Field;
import org.hibernate.search.annotations.Index;
import org.hibernate.search.annotations.Indexed;
import org.hibernate.search.annotations.Store;

/**
 * Department generated by MyEclipse Persistence Tools
 */
@Entity
@Table(name = "department", catalog = "hise", uniqueConstraints = {})
@Indexed(index = "indexes/department")
public class Department implements java.io.Serializable {
    private static final long serialVersionUID = 7891065193118612907L;
    private Integer deptId;
    private String deptNo;
    private String deptName;
    private List<Employee> empList;

    // Constructors

    /** default constructor */
    public Department() {
    }

    /** minimal constructor */
    public Department(Integer deptId) {
        this.deptId = deptId;
    }

    /** full constructor */
    public Department(Integer deptId, String deptNo, String deptName) {
        this.deptId = deptId;
        this.deptNo = deptNo;
        this.deptName = deptName;
    }

    // Property accessors

    // Inverse side of Employee.dept; @ContainedIn marks the employees that embed this department.
    @OneToMany(mappedBy = "dept")
    @ContainedIn
    public List<Employee> getEmpList() {
        return empList;
    }

    public void setEmpList(List<Employee> empList) {
        this.empList = empList;
    }

    @Id
    @GeneratedValue(strategy = GenerationType.AUTO)
    @Column(name = "dept_id", unique = true, nullable = false, insertable = true, updatable = true)
    @DocumentId
    public Integer getDeptId() {
        return this.deptId;
    }

    public void setDeptId(Integer deptId) {
        this.deptId = deptId;
    }

    @Column(name = "dept_no", unique = false, nullable = true, insertable = true, updatable = true, length = 30)
    public String getDeptNo() {
        return this.deptNo;
    }

    public void setDeptNo(String deptNo) {
        this.deptNo = deptNo;
    }

    // Indexed as the tokenized Field "name"; the original value is also stored in the index.
    @Column(name = "dept_name", unique = false, nullable = true, insertable = true, updatable = true, length = 30)
    @Field(name = "name", index = Index.TOKENIZED, store = Store.YES)
    public String getDeptName() {
        return this.deptName;
    }

    public void setDeptName(String deptName) {
        this.deptName = deptName;
    }
}

        Readers unfamiliar with Hibernate's mapping annotations can download the Hibernate Annotations Reference from the official Hibernate website; a Chinese translation is available from http://wiki.redsaga.com/. You can of course also use hbm.xml files directly.
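        Hibernate Search has two main annotations:
          @Indexed       marks a class whose instances should be indexed
            index        path of the index files
          @Field         placed on a property getter; marks the property as an indexed Field
            index        whether and how the value is indexed, as in Lucene
            store        whether the value is stored in the index, as in Lucene
            name         name of the Field; defaults to the property name
            analyzer     analyzer to use

        In addition, @IndexedEmbedded and @ContainedIn are used to index across associated classes.
          @IndexedEmbedded has two attributes: prefix sets the prefix used for the associated object's Fields, and depth sets how deep the association is followed.
          In the two classes above, a Department can be found by its department name through the Field name. In Employee, the prefix for the department association is dept_, so all employees of a department can be found by department name through the Field dept_name.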
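The way the prefix produces the Field name dept_name can be sketched in plain Java. This is an illustration of the naming rule described above, not Hibernate Search code: EmbeddedFieldNames and flatten are hypothetical names, and representing a Document as a map of Field names to values is an assumption.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrates how an embedded object's Field names are combined with a prefix. */
class EmbeddedFieldNames {

    /** Copies the owner's Fields, then adds the embedded Fields with the prefix prepended to each name. */
    static Map<String, String> flatten(Map<String, String> ownerFields,
                                       String prefix,
                                       Map<String, String> embeddedFields) {
        Map<String, String> result = new LinkedHashMap<>(ownerFields);
        for (Map.Entry<String, String> field : embeddedFields.entrySet()) {
            result.put(prefix + field.getKey(), field.getValue());
        }
        return result;
    }
}
```

With employee Fields name and empNo, department Field name, and prefix dept_, the resulting employee Document carries name, empNo, and dept_name. Without the prefix, the department's name would collide with the employee's own name Field.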

2. Configuration file
<?xml version='1.0' encoding='UTF-8'?>
<!DOCTYPE hibernate-configuration PUBLIC
"-//Hibernate/Hibernate Configuration DTD 3.0//EN"
"http://hibernate.sourceforge.net/hibernate-configuration-3.0.dtd">

<hibernate-configuration>

<session-factory>
    <property name="hibernate.dialect">
        org.hibernate.dialect.MySQLDialect
    </property>
    <property name="hibernate.connection.url">
        jdbc:mysql://localhost:3306/hise
    </property>
    <property name="hibernate.connection.username">root</property>
    <property name="hibernate.connection.password">123456</property>
    <property name="hibernate.connection.driver_class">
        com.mysql.jdbc.Driver
    </property>

    <property name="hibernate.search.default.directory_provider">
        org.hibernate.search.store.FSDirectoryProvider
    </property>
    <property name="hibernate.search.default.indexBase">e:/index</property>

    <mapping class="com.yehui.Employee" />
    <mapping class="com.yehui.Department" />
</session-factory>

</hibernate-configuration>

If you use JPA, the configuration file is:
<?xml version="1.0" encoding="UTF-8"?>
<persistence xmlns="http://java.sun.com/xml/ns/persistence"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://java.sun.com/xml/ns/persistence
    http://java.sun.com/xml/ns/persistence/persistence_1_0.xsd" version="1.0">

    <persistence-unit name="jpaPU" transaction-type="RESOURCE_LOCAL">
        <provider>org.hibernate.ejb.HibernatePersistence</provider>
        <class>com.yehui.Department</class>
        <class>com.yehui.Employee</class>
        <properties>
            <property name="hibernate.connection.driver_class"
                value="com.mysql.jdbc.Driver" />
            <property name="hibernate.connection.url"
                value="jdbc:mysql://localhost:3306/hise" />
            <property name="hibernate.connection.username" value="root" />
            <property name="hibernate.connection.password"
                value="123456" />
            <property name="hibernate.search.default.directory_provider"
                value="org.hibernate.search.store.FSDirectoryProvider"/>
            <property name="hibernate.search.default.indexBase"
                value="e:/index"/>
        </properties>
    </persistence-unit>

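</persistence>

The main change is two added properties. hibernate.search.default.directory_provider selects the Directory provider, that is, whether the index files are kept on disk (org.hibernate.search.store.FSDirectoryProvider) or in memory (org.hibernate.search.store.RAMDirectoryProvider). When they are kept on disk, hibernate.search.default.indexBase sets the path under which the indexes are stored.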
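The choice between the two providers can also be expressed as a small helper that builds the two properties. This is a sketch using only java.util.Properties: the class SearchSettings and its methods are hypothetical, and only the property names and provider class names come from the configuration above. How the resulting Properties object is passed to Hibernate (for example through the Configuration object or the JPA bootstrap map) is left out here and would need to be checked against the version in use.

```java
import java.util.Properties;

/** Builds the Hibernate Search properties discussed above for disk or in-memory indexes. */
class SearchSettings {
    static final String PROVIDER_KEY = "hibernate.search.default.directory_provider";
    static final String INDEX_BASE_KEY = "hibernate.search.default.indexBase";

    /** Index files on disk under the given base directory. */
    static Properties onDisk(String indexBase) {
        Properties props = new Properties();
        props.setProperty(PROVIDER_KEY, "org.hibernate.search.store.FSDirectoryProvider");
        props.setProperty(INDEX_BASE_KEY, indexBase);
        return props;
    }

    /** Index kept in memory; no base directory is needed, which suits short-lived tests. */
    static Properties inMemory() {
        Properties props = new Properties();
        props.setProperty(PROVIDER_KEY, "org.hibernate.search.store.RAMDirectoryProvider");
        return props;
    }
}
```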

3. Test code

package com.yehui;

import static org.junit.Assert.assertNotNull;
import static org.junit.Assert.assertTrue;

import java.util.List;

import org.apache.lucene.analysis.StopAnalyzer;
import org.apache.lucene.queryParser.QueryParser;
import org.hibernate.Query;
import org.hibernate.Session;
import org.hibernate.SessionFactory;
import org.hibernate.Transaction;
import org.hibernate.cfg.AnnotationConfiguration;
import org.hibernate.search.FullTextSession;
import org.hibernate.search.Search;
import org.junit.After;
import org.junit.AfterClass;
import org.junit.Before;
import org.junit.BeforeClass;
import org.junit.Test;

public class SearchResultsHibernate {
    private static SessionFactory sf = null;
    private static Session session = null;
    private static Transaction tx = null;

    @BeforeClass
    public static void setupBeforeClass() throws Exception {
        sf = new AnnotationConfiguration().configure("hibernate.cfg.xml").buildSessionFactory();
        assertNotNull(sf);
    }

    // Each test runs in its own session and transaction.
    @Before
    public void setUp() throws Exception {
        session = sf.openSession();
        // beginTransaction() already starts the transaction; no extra tx.begin() is needed.
        tx = session.beginTransaction();
    }

    // The index is updated when the transaction commits.
    @After
    public void tearDown() throws Exception {
        tx.commit();
        session.close();
    }

    @AfterClass
    public static void tearDownAfterClass() throws Exception {
        if (sf != null)
            sf.close();
    }

    // Stores one employee; the department is saved through CascadeType.ALL.
    @Test
    public void testAddDept() throws Exception {
        Department dept = new Department();
        dept.setDeptName("Market");
        dept.setDeptNo("6000");

        Employee emp = new Employee();
        emp.setDept(dept);
        emp.setEmpName("Kevin");
        emp.setEmpNo("KGP1213");
        emp.setEmpSalary(8000d);

        session.save(emp);
    }

    @Test
    @SuppressWarnings("unchecked")
    public void testFindAll() throws Exception {
        Query query = session.createQuery("from Department");
        List<Department> deptList = query.list();

        assertTrue(deptList.size() > 0);
    }

    // Searches employees by their own "name" Field.
    // Assumes an employee named Kevin has already been saved and indexed (see testAddDept).
    @Test
    public void testIndex() throws Exception {
        FullTextSession fullTextSession = Search.createFullTextSession(session);
        assertNotNull(fullTextSession);

        QueryParser parser = new QueryParser("name", new StopAnalyzer());
        org.apache.lucene.search.Query luceneQuery = parser
                .parse("name:Kevin");
        Query hibQuery = fullTextSession.createFullTextQuery(luceneQuery,
                Employee.class);

        List list = hibQuery.list();
        assertTrue(list.size() > 0);
    }

    // Searches employees by the embedded department Field "dept_name".
    @Test
    public void testIndex2() throws Exception {
        FullTextSession fullTextSession = Search.createFullTextSession(session);
        assertNotNull(fullTextSession);

        QueryParser parser = new QueryParser("dept_name", new StopAnalyzer());
        org.apache.lucene.search.Query luceneQuery = parser
                .parse("dept_name:Market");
        Query hibQuery = fullTextSession.createFullTextQuery(luceneQuery,
                Employee.class);

        List list = hibQuery.list();
        assertTrue(list.size() > 0);
    }
}


2011-02-21 14:44