
Hibernate Search 3.3.0.Final: Documentation Translation and Study Notes


Reposted from: http://hpi-ls.blog.163.com/blog/static/2021474820101129105312604/

 

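At first I was only reading for myself and had no plan to translate, so the translation starts at Chapter 4; from there on, nearly all of the main chapters are covered. Each English passage from the documentation is followed by my rendering of it, one to one.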

5. Tuning Lucene indexing performance

ch4

4.3. Analysis

4.4. Bridges

4.4.1. Built-in bridges

4.4.2. Custom bridges

4.5. Providing your own id

4.6. Programmatic API

Chapter 5. Querying

5.1. Building queries

5.1.1. Building a Lucene query using the Lucene API

5.1.2. Building a Lucene query with the Hibernate Search query DSL

5.1.3. Building a Hibernate Search query

Chapter 6. Manual index changes

6.1. Adding instances to the index

6.2. Deleting instances from the index

6.3. Rebuilding the whole index

6.3.1. Using flushToIndexes()

6.3.2. Using a MassIndexer

Chapter 7. Index Optimization

7.1. Automatic optimization

7.2. Manual optimization

7.3. Adjusting optimization


1

You can think of those two batch modes (no scope vs transactional) as the equivalent of the (infamous) autocommit vs transactional behavior. From a performance perspective, the in-transaction mode is recommended. The scoping choice is made transparently: Hibernate Search detects the presence of a transaction and adjusts the scoping.

 

2

The good news is that Hibernate Search is enabled out of the box when detected on the classpath by Hibernate Core. If, for some reason, you need to disable it, set hibernate.search.autoregister_listeners to false. Note that there is no performance penalty when the listeners are enabled but no entities are annotated as indexed.

 

3

By default, every time an object is inserted, updated or deleted through Hibernate, Hibernate Search updates the corresponding Lucene index. It is sometimes desirable to disable this feature, either because your index is read-only or because index updates are done in batches (see Section 6.3, “Rebuilding the whole index”).

 

To disable event based indexing, set

hibernate.search.indexing_strategy = manual

 

4

 

The different reader strategies are described in Reader strategy. The out-of-the-box strategies are:

* shared: share index readers across several queries. This strategy is the most efficient.

* not-shared: create an index reader for each individual query

The default reader strategy is shared. This can be adjusted:

hibernate.search.reader.strategy = not-shared

 

5. Tuning Lucene indexing performance

hibernate.search.[default|<indexname>].exclusive_index_use: Set to true when no other process will need to write to the same index. This enables Hibernate Search to work in exclusive mode on the index and improves performance when writing changes to the index. Default value: false (releases locks as soon as possible).

 

When your architecture permits it, always set hibernate.search.default.exclusive_index_use=true as it greatly improves efficiency in index writing.
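The settings from the notes above can be gathered in one place. The following is only a minimal sketch: it assumes the values are collected in a plain java.util.Properties object, and the class name SearchSettings is hypothetical. How the properties reach Hibernate (hibernate.properties, hibernate.cfg.xml or a Configuration object) depends on your setup.

```java
import java.util.Properties;

/** Hypothetical helper that gathers the Hibernate Search settings discussed above. */
final class SearchSettings {

    static Properties build() {
        Properties props = new Properties();
        // Disable event-based indexing; the index is then updated manually or in batch.
        props.setProperty("hibernate.search.indexing_strategy", "manual");
        // "shared" is the default reader strategy; "not-shared" opens a reader per query.
        props.setProperty("hibernate.search.reader.strategy", "shared");
        // Only safe when no other process writes to the same index.
        props.setProperty("hibernate.search.default.exclusive_index_use", "true");
        return props;
    }

    public static void main(String[] args) {
        build().forEach((key, value) -> System.out.println(key + " = " + value));
    }
}
```

Keeping the keys together like this makes it easy to see which tuning choices are in effect for a given deployment.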

 

6. LockFactory configuration

Lucene Directories have default locking strategies which work well for most cases, but it's possible to specify, for each index managed by Hibernate Search, which LockingFactory you want to use.

 

ch4

Hibernate Search mappings must be configured with annotations; XML configuration is not currently provided.

7. @Indexed

Foremost we must declare a persistent class as indexable. This is done by annotating the class with @Indexed (all entities not annotated with @Indexed will be ignored by the indexing process):

Entities without the @Indexed annotation are ignored, i.e. they are not indexed.

You can optionally specify the index attribute of the @Indexed annotation to change the default name of the index. For more information see Section 3.2, “Directory configuration”.

You can use the index attribute to change the default index name.

8. @Field

For each property (or attribute) of your entity, you have the ability to describe how it will be indexed. The default (no annotation present) means that the property is ignored by the indexing process. @Field declares a property as indexed and lets you configure several aspects of the indexing process by setting one or more of the following attributes:

You can use @Field to describe each property of an entity class. A property without @Field is ignored. The following attributes refine @Field:

name: describes under which name the property should be stored in the Lucene Document. The default value is the property name (following the JavaBeans convention).

name: the name under which the property is stored in the Lucene Document; defaults to the property name.

store: describes whether or not the property is stored in the Lucene index. You can store the value with Store.YES (consuming more space in the index but allowing projection, see Section 5.1.3.5, “Projection”), store it in a compressed way with Store.COMPRESS (this consumes more CPU), or avoid any storage with Store.NO (this is the default value). When a property is stored, you can retrieve its original value from the Lucene Document. This is not related to whether the element is indexed or not.

store: whether the entity's field is stored in the Lucene index.

Store.YES: stored in the index; takes more space but allows projection.

Store.COMPRESS: stored compressed; costs more CPU.

Store.NO: not stored (the default).

When a field is stored, its original value can be retrieved from the Lucene Document; this is independent of whether the element is indexed.

index: describes how the element is indexed and the type of information stored. The different values are Index.NO (no indexing, i.e. cannot be found by a query), Index.TOKENIZED (use an analyzer to process the property), Index.UN_TOKENIZED (no analyzer pre-processing), Index.NO_NORMS (do not store the normalization data). The default value is TOKENIZED.

index: how the entity's field is indexed and what information is stored.

Index.NO: not indexed, so it cannot be found by a query.

Index.TOKENIZED: processed by an analyzer (tokenized) and then indexed.

Index.UN_TOKENIZED: indexed without tokenizing.

Index.NO_NORMS: the normalization data is not stored.

Note: text fields are usually tokenized; date fields are not.

Fields used for sorting must not be tokenized.

termVector: used for similarity searches.

 

4.3. Analysis

The default analyzer class used to index tokenized fields is configurable through the hibernate.search.analyzer property. The default value for this property is org.apache.lucene.analysis.standard.StandardAnalyzer.

Using different analyzers within the same entity class is not recommended.

 

4.4. Bridges

In Lucene all index fields have to be represented as strings. All entity properties annotated with @Field have to be converted to strings to be indexed. The reason we have not mentioned it so far is that, for most of your properties, Hibernate Search does the translation job for you thanks to a set of built-in bridges. However, in some cases you need more fine-grained control over the translation process.

In Lucene every field is represented as a string, so every property annotated with @Field is converted to a string before it is indexed. We could ignore this conversion so far because Hibernate Search's built-in bridges do the work. Sometimes, though, you need finer-grained control over the conversion.

4.4.1. Built-in bridges

The built-in bridges cover: null, String, Date, numeric types, URI/URL and Class.

Hibernate Search comes bundled with a set of built-in bridges between a Java property type and its full text representation.

Hibernate Search bundles bridges between Java property types and their text representation.

java.lang.String

Strings are indexed as is.

short, Short, integer, Integer, long, Long, float, Float, double, Double, BigInteger, BigDecimal

Numbers are converted into their string representation. Note that numbers cannot be compared by Lucene (i.e. used in range queries) out of the box: they have to be padded.

Using a range query is debatable and has drawbacks; an alternative approach is to use a Filter query, which will filter the result query to the appropriate range.

Hibernate Search will support a padding mechanism.

Numbers are converted to strings. Range searches on such numeric strings are discouraged and have drawbacks; a filter can be used instead to restrict results to a range.
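Why padding matters becomes clear once numbers are compared as strings. The sketch below uses only the Java standard library; the class PaddingDemo and its pad method are hypothetical names introduced for illustration and are not part of Hibernate Search.

```java
/** Shows why unpadded numbers misbehave when compared as strings. */
final class PaddingDemo {

    /** Left-pads a non-negative integer with zeros to the given width. */
    static String pad(int value, int width) {
        if (value < 0) {
            throw new IllegalArgumentException("negative values are not handled in this sketch");
        }
        String raw = Integer.toString(value);
        if (raw.length() > width) {
            throw new IllegalArgumentException("number too big to pad: " + value);
        }
        StringBuilder padded = new StringBuilder();
        for (int i = raw.length(); i < width; i++) {
            padded.append('0');
        }
        return padded.append(raw).toString();
    }

    public static void main(String[] args) {
        // As plain strings, "10" sorts before "9" because '1' < '9'.
        System.out.println("10".compareTo("9") < 0);             // true
        // Padded to the same width, string order matches numeric order.
        System.out.println(pad(10, 5).compareTo(pad(9, 5)) > 0); // true
    }
}
```

A range over unpadded strings such as "9" to "10" would therefore be empty or wrong, which is the drawback the passage above refers to.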

 

java.util.Date

Dates are stored as yyyyMMddHHmmssSSS in GMT time (20061107210300012 for Nov 7th of 2006 4:03PM and 12ms EST). You shouldn't really bother with the internal format. What is important is that when using a DateRange Query, you should know that the dates have to be expressed in GMT time.

Usually, storing the date up to the millisecond is not necessary. @DateBridge defines the appropriate resolution you are willing to store in the index ( @DateBridge(resolution=Resolution.DAY) ). The date pattern will then be truncated accordingly.

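@Field(index=Index.UN_TOKENIZED)
@DateBridge(resolution=Resolution.MINUTE)
private Date date;

Storing dates down to the millisecond is usually pointless. @DateBridge addresses this: choose a resolution such as DAY or MINUTE for date-range searches, and the stored date is truncated accordingly (the unneeded precision is discarded).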
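To get a feel for the stored format, the sketch below formats a date in GMT at full and at minute resolution using only java.text.SimpleDateFormat. It assumes that truncation simply drops the trailing fields of the yyyyMMddHHmmssSSS pattern; the class DateResolutionDemo is hypothetical and does not call Hibernate Search.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

/** Illustrates how a date looks as a GMT string at different resolutions. */
final class DateResolutionDemo {

    private static String format(Date date, String pattern) {
        // SimpleDateFormat is not thread-safe, so a new instance is created per call.
        SimpleDateFormat formatter = new SimpleDateFormat(pattern);
        formatter.setTimeZone(TimeZone.getTimeZone("GMT"));
        return formatter.format(date);
    }

    /** Full precision: year down to milliseconds. */
    static String toFullResolution(Date date) {
        return format(date, "yyyyMMddHHmmssSSS");
    }

    /** Minute precision: seconds and milliseconds are dropped. */
    static String toMinuteResolution(Date date) {
        return format(date, "yyyyMMddHHmm");
    }

    public static void main(String[] args) {
        Date now = new Date();
        System.out.println(toFullResolution(now));
        System.out.println(toMinuteResolution(now));
    }
}
```

Because the strings are expressed in GMT, 4:03 PM EST on Nov 7th 2006 appears as 21:03 on the same day; range boundaries have to be converted the same way.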

java.net.URI, java.net.URL

URI and URL are converted to their string representation

java.lang.Class

Classes are converted to their fully qualified class name. The thread context classloader is used when the class is rehydrated.

4.4.2. Custom bridges

Sometimes, the built-in bridges of Hibernate Search do not cover some of your property types, or the String representation used by the bridge does not meet your requirements. The following paragraphs describe several solutions to this problem.

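Sometimes the built-in bridges cannot convert one of your entity's field types, or their conversion does not meet your needs. The following paragraphs describe several ways to solve this.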

4.4.2.1. StringBridge

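(Question to self: when an attachment is uploaded, could a field be set to the attachment's text so that attachments become searchable?)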

The simplest custom solution is to give Hibernate Search an implementation of your expected Object to String bridge. To do so you need to implement the org.hibernate.search.bridge.StringBridge interface. All implementations have to be thread-safe as they are used concurrently.

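The simplest custom solution is to implement your own Object-to-String bridge by implementing the org.hibernate.search.bridge.StringBridge interface. Every implementation must be thread-safe because it is used concurrently.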

Example 4.15. Custom StringBridge implementation

/**
 * Padding Integer bridge.
 * All numbers will be padded with 0 to match 5 digits
 *
 * @author Emmanuel Bernard
 */
public class PaddedIntegerBridge implements StringBridge {
 
    private static final int PADDING = 5;
 
    public String objectToString(Object object) {
        String rawInteger = ( (Integer) object ).toString();
        if (rawInteger.length() > PADDING) 
            throw new IllegalArgumentException( "Try to pad on a number too big" );
        StringBuilder paddedInteger = new StringBuilder( );
        for ( int padIndex = rawInteger.length() ; padIndex < PADDING ; padIndex++ ) {
            paddedInteger.append('0');
        }
        return paddedInteger.append( rawInteger ).toString();
    }
}                

Given the string bridge defined in Example 4.15, “Custom StringBridge implementation”, any property or field can use this bridge thanks to the @FieldBridge annotation:

The custom string bridge above can be applied to any field through the @FieldBridge annotation, for example:

@FieldBridge(impl = PaddedIntegerBridge.class)
private Integer length;

 

4.4.2.1.1. Parameterized bridge

Parameters can also be passed to the bridge implementation making it more flexible. Example 4.16, “Passing parameters to your bridge implementation” implements a ParameterizedBridge interface and parameters are passed through the @FieldBridge annotation.

Passing parameters makes a bridge more flexible: implement the ParameterizedBridge interface and pass the parameters through the @FieldBridge annotation. For example:

Example 4.16. Passing parameters to your bridge implementation

public class PaddedIntegerBridge implements StringBridge, ParameterizedBridge {
 
    public static String PADDING_PROPERTY = "padding";
    private int padding = 5; //default
 
    public void setParameterValues(Map parameters) {
        Object padding = parameters.get( PADDING_PROPERTY );
        // @Parameter values are declared as Strings, so parse instead of casting
        if (padding != null) this.padding = Integer.parseInt( padding.toString() );
    }
 
    public String objectToString(Object object) {
        String rawInteger = ( (Integer) object ).toString();
        if (rawInteger.length() > padding) 
            throw new IllegalArgumentException( "Try to pad on a number too big" );
        StringBuilder paddedInteger = new StringBuilder( );
        for ( int padIndex = rawInteger.length() ; padIndex < padding ; padIndex++ ) {
            paddedInteger.append('0');
        }
        return paddedInteger.append( rawInteger ).toString();
    }
}
 
//property
@FieldBridge(impl = PaddedIntegerBridge.class,
             params = @Parameter(name="padding", value="10")
            )
private Integer length;                

The ParameterizedBridge interface can be implemented by StringBridge, TwoWayStringBridge, FieldBridge implementations.

All implementations have to be thread-safe, but the parameters are set during initialization and no special care is required at this stage.

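ParameterizedBridge can be implemented by StringBridge, TwoWayStringBridge and FieldBridge implementations. All of them must be thread-safe, but since the parameters are set during initialization, that step needs no special care.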

4.4.2.1.2. Type aware bridge
4.4.2.1.3. Two-way bridge

If you expect to use your bridge implementation on an id property (ie annotated with @DocumentId ), you need to use a slightly extended version of StringBridge named TwoWayStringBridge. Hibernate Search needs to read the string representation of the identifier and generate the object out of it. There is no difference in the way the @FieldBridge annotation is used.

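To use your bridge on an id property (one annotated with @DocumentId), use TwoWayStringBridge, a slightly extended version of StringBridge. Hibernate Search needs to read the identifier's string representation and rebuild the object from it. The @FieldBridge annotation is used in exactly the same way.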

Example 4.17. Implementing a TwoWayStringBridge usable for id properties

public class PaddedIntegerBridge implements TwoWayStringBridge, ParameterizedBridge {
 
    public static String PADDING_PROPERTY = "padding";
    private int padding = 5; //default
 
    public void setParameterValues(Map parameters) {
        Object padding = parameters.get( PADDING_PROPERTY );
        // @Parameter values are declared as Strings, so parse instead of casting
        if (padding != null) this.padding = Integer.parseInt( padding.toString() );
    }
 
    public String objectToString(Object object) {
        String rawInteger = ( (Integer) object ).toString();
        if (rawInteger.length() > padding) 
            throw new IllegalArgumentException( "Try to pad on a number too big" );
        StringBuilder paddedInteger = new StringBuilder( );
        for ( int padIndex = rawInteger.length() ; padIndex < padding ; padIndex++ ) {
            paddedInteger.append('0');
        }
        return paddedInteger.append( rawInteger ).toString();
    }
 
    public Object stringToObject(String stringValue) {
        return Integer.valueOf(stringValue);
    }
}
 
 
//id property
@DocumentId
@FieldBridge(impl = PaddedIntegerBridge.class,
             params = @Parameter(name="padding", value="10") )
private Integer id;
                

Important

It is important for the two-way process to be idempotent (ie object = stringToObject( objectToString( object ) ) ).

Note: the two conversions must round-trip, i.e. converting an object to a string and back must yield the original object:

object = stringToObject( objectToString( object ) )
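The round-trip requirement can be checked in isolation. The sketch below reimplements the two conversions in a stand-alone class so that it runs without Hibernate Search on the classpath; PaddedIntegerCodec and roundTrips are hypothetical names, and rejecting negative numbers is an assumption of this sketch, not something the documentation states.

```java
/** Stand-alone sketch of a two-way padded-integer conversion (no Hibernate Search types). */
final class PaddedIntegerCodec {

    private final int padding;

    PaddedIntegerCodec(int padding) {
        this.padding = padding;
    }

    String objectToString(Integer value) {
        if (value < 0) {
            // "000-5" could not be parsed back, so negatives are rejected here.
            throw new IllegalArgumentException("negative values would break the round trip");
        }
        String raw = value.toString();
        if (raw.length() > padding) {
            throw new IllegalArgumentException("Try to pad on a number too big");
        }
        StringBuilder padded = new StringBuilder();
        for (int i = raw.length(); i < padding; i++) {
            padded.append('0');
        }
        return padded.append(raw).toString();
    }

    Integer stringToObject(String stringValue) {
        // Leading zeros are accepted by Integer.valueOf, e.g. "0000000042" gives 42.
        return Integer.valueOf(stringValue);
    }

    /** True when object equals stringToObject( objectToString( object ) ). */
    boolean roundTrips(Integer value) {
        return value.equals(stringToObject(objectToString(value)));
    }

    public static void main(String[] args) {
        PaddedIntegerCodec codec = new PaddedIntegerCodec(10);
        System.out.println(codec.objectToString(42)); // 0000000042
        System.out.println(codec.roundTrips(42));     // true
    }
}
```

A check like roundTrips is a cheap unit test to write for any bridge that will be used on an id property.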

 

4.4.2.2. FieldBridge

Some use cases require more than a simple object to string translation when mapping a property to a Lucene index. To give you the greatest possible flexibility you can also implement a bridge as a FieldBridge. This interface gives you a property value and lets you map it the way you want in your Lucene Document. You can, for example, store a property in two different document fields. The interface is very similar in concept to the Hibernate UserTypes.

Mapping a field to the Lucene index sometimes takes more than a simple object-to-string conversion. For maximum flexibility you can implement the bridge as a FieldBridge. The interface hands you the property value and lets you map it into the Lucene Document however you like; for example, you can store one field in two different document fields. Conceptually the interface closely resembles Hibernate's UserType.

 

Example 4.18. Implementing the FieldBridge interface

/**
 * Store the date in 3 different fields - year, month, day - to ease Range Query per
 * year, month or day (eg get all the elements of December for the last 5 years).
 * @author Emmanuel Bernard
 */
public class DateSplitBridge implements FieldBridge {
    private final static TimeZone GMT = TimeZone.getTimeZone("GMT");
 
    public void set(String name, Object value, Document document, 
                    LuceneOptions luceneOptions) {
        Date date = (Date) value;
        Calendar cal = GregorianCalendar.getInstance(GMT);
        cal.setTime(date);
        int year = cal.get(Calendar.YEAR);
        int month = cal.get(Calendar.MONTH) + 1;
        int day = cal.get(Calendar.DAY_OF_MONTH);
  
        // set year
        luceneOptions.addFieldToDocument(
            name + ".year",
            String.valueOf( year ),
            document );
  
        // set month and pad it if needed
        luceneOptions.addFieldToDocument(
            name + ".month",
            ( month < 10 ? "0" : "" ) + String.valueOf( month ),
            document );
  
        // set day and pad it if needed
        luceneOptions.addFieldToDocument(
            name + ".day",
            ( day < 10 ? "0" : "" ) + String.valueOf( day ),
            document );
    }
}
 
//property
@FieldBridge(impl = DateSplitBridge.class)
private Date date;                
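The splitting and padding logic of such a bridge can be tried out without Lucene. The following sketch uses only the standard library; DateSplitDemo, split and twoDigits are hypothetical names. Note the parentheses around the conditional: in Java, `n < 10 ? "0" : "" + n` parses as `n < 10 ? "0" : ("" + n)` and would return just "0" for single-digit values.

```java
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.TimeZone;

/** Splits a date into zero-padded year, month and day strings in GMT. */
final class DateSplitDemo {

    /** Pads a value in the range 0..99 to two digits. */
    static String twoDigits(int n) {
        // The parentheses matter: + binds tighter than the conditional operator.
        return (n < 10 ? "0" : "") + n;
    }

    /** Returns { year, month, day } for the given date, evaluated in GMT. */
    static String[] split(Date date) {
        Calendar cal = new GregorianCalendar(TimeZone.getTimeZone("GMT"));
        cal.setTime(date);
        int year = cal.get(Calendar.YEAR);
        int month = cal.get(Calendar.MONTH) + 1; // Calendar months are zero-based
        int day = cal.get(Calendar.DAY_OF_MONTH);
        return new String[] { String.valueOf(year), twoDigits(month), twoDigits(day) };
    }

    public static void main(String[] args) {
        String[] parts = split(new Date());
        System.out.println(parts[0] + " / " + parts[1] + " / " + parts[2]);
    }
}
```

Keeping each part at a fixed width means the three fields can later be range-queried as strings.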

 

4.4.2.3. ClassBridge

It is sometimes useful to combine more than one property of a given entity and index this combination in a specific way into the Lucene index. The @ClassBridge and @ClassBridges annotations can be defined at class level (as opposed to the property level). In this case the custom field bridge implementation receives the entity instance as the value parameter instead of a particular property. Though not shown in Example 4.19, “Implementing a class bridge”, @ClassBridge supports the termVector attribute discussed in Section 4.1.1, “Basic mapping”.

A class bridge is useful for combining several fields of an entity and indexing the combination in a particular way. @ClassBridge and @ClassBridges are declared at class level (as opposed to property level). The custom bridge implementation then receives the entity instance as its value parameter rather than a single property. Although Example 4.19 does not show it, @ClassBridge supports the termVector attribute.

 

Example 4.19. Implementing a class bridge

(This could be used to handle attachments.)

@Entity
@Indexed
@ClassBridge(name="branchnetwork",
             index=Index.TOKENIZED,
             store=Store.YES,
             impl = CatFieldsClassBridge.class,
             params = @Parameter( name="sepChar", value=" " ) )
public class Department {
    private int id;
    private String network;
    private String branchHead;
    private String branch;
    private Integer maxEmployees;
    ...
}
 
public class CatFieldsClassBridge implements FieldBridge, ParameterizedBridge {
    private String sepChar;
 
    public void setParameterValues(Map parameters) {
        this.sepChar = (String) parameters.get( "sepChar" );
    }
 
    public void set(String name, Object value, Document document, LuceneOptions luceneOptions) {
        // In this particular class the name of the new field was passed
        // from the name field of the ClassBridge Annotation. This is not
        // a requirement. It just works that way in this instance. The
        // actual name could be supplied by hard coding it below.
        Department dep = (Department) value;
        String fieldValue1 = dep.getBranch();
        if ( fieldValue1 == null ) {
            fieldValue1 = "";
        }
        String fieldValue2 = dep.getNetwork();
        if ( fieldValue2 == null ) {
            fieldValue2 = "";
        }
        String fieldValue = fieldValue1 + sepChar + fieldValue2;
        Field field = new Field( name, fieldValue, luceneOptions.getStore(), luceneOptions.getIndex(), luceneOptions.getTermVector() );
        field.setBoost( luceneOptions.getBoost() );
        document.add( field );
   }
}

 

In this example, the particular CatFieldsClassBridge is applied to the department instance; the field bridge then concatenates branch and network and indexes the concatenation.

In the example above, CatFieldsClassBridge is applied to a Department instance; the field bridge concatenates branch and network and indexes the result.

4.5. Providing your own id

(Notes for this section are still being compiled.)

4.6. Programmatic API

This feature is experimental and its API may change.

 

 

 

Chapter 5. Querying

Preparing and executing a query consists of four simple steps:

* Creating a FullTextSession

* Creating a Lucene query either via the Hibernate Search query DSL (recommended) or by utilizing the Lucene query API

* Wrapping the Lucene query using an org.hibernate.Query

* Executing the search by calling for example list() or scroll()

To access the querying facilities, you have to use a FullTextSession. This Search specific session wraps a regular org.hibernate.Session in order to provide query and indexing capabilities.

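Preparing and executing a query takes four steps:

1. Create a FullTextSession.

2. Create a Lucene query, either through the Hibernate Search query DSL (recommended) or with the Lucene query API.

3. Wrap the Lucene query in an org.hibernate.Query.

4. Execute the query, for example by calling list() or scroll().

To use the query facilities you must use a FullTextSession. This search-specific session wraps a regular org.hibernate.Session to provide query and indexing capabilities.

DSL: domain-specific language.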

Example 5.1. Creating a FullTextSession

Session session = sessionFactory.openSession();
...
FullTextSession fullTextSession = Search.getFullTextSession(session);

Once you have a FullTextSession you have two options to build the full-text query: the Hibernate Search query DSL or the native Lucene query.

Once you have a FullTextSession, you can build the full-text query in two ways: with the Hibernate Search query DSL or with a native Lucene query.

If you use the Hibernate Search query DSL, it will look like this:

final QueryBuilder b = fullTextSession.getSearchFactory()
    .buildQueryBuilder().forEntity( Myth.class ).get();
 
org.apache.lucene.search.Query luceneQuery =
    b.keyword()
        .onField("history").boostedTo(3)
        .matching("storm")
        .createQuery();
 
org.hibernate.Query fullTextQuery = fullTextSession.createFullTextQuery( luceneQuery );


List result = fullTextQuery.list(); //return a list of managed objects    

You can alternatively write your Lucene query either using the Lucene query parser or Lucene programmatic API.

Example 5.2. Creating a Lucene query via the QueryParser

SearchFactory searchFactory = fullTextSession.getSearchFactory();
org.apache.lucene.queryParser.QueryParser parser = 
    new QueryParser("title", searchFactory.getAnalyzer(Myth.class) );
// declared outside the try block so that it is still in scope below
org.apache.lucene.search.Query luceneQuery = null;
try {
    luceneQuery = parser.parse( "history:storm^3" );
}
catch (ParseException e) {
    //handle parsing failure
}

org.hibernate.Query fullTextQuery = fullTextSession.createFullTextQuery(luceneQuery);


List result = fullTextQuery.list(); //return a list of managed objects    

Note

The Hibernate query built on top of the Lucene query is a regular org.hibernate.Query, which means you are in the same paradigm as the other Hibernate query facilities (HQL, Native or Criteria). The regular list() , uniqueResult(), iterate() and scroll() methods can be used.

Note: the Hibernate query built on top of the Lucene query is a regular org.hibernate.Query, so Hibernate Search queries are used in the usual way: list(), uniqueResult(), iterate() and scroll() are all available.

5.1. Building queries

5.1.1. Building a Lucene query using the Lucene API

5.1.2. Building a Lucene query with the Hibernate Search query DSL

Writing full-text queries with the Lucene programmatic API is quite complex. It's even more complex to understand the code once written. Besides the inherent API complexity, you have to remember to convert your parameters to their string equivalent as well as make sure to apply the correct analyzer to the right field (an n-gram analyzer will for example use several n-grams as the tokens for a given word and should be searched as such).

Writing full-text queries with the Lucene API is quite complex, and the code is even harder to understand once written. Beyond the API's inherent complexity, you must remember to convert parameters to their string equivalents and to apply the right analyzer to each field.

The Hibernate Search query DSL makes use of a style of API called a fluent API. This API has a few key characteristics:

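* it has meaningful method names, making a succession of operations read almost like English

* it limits the options offered to what makes sense in a given context (thanks to strong typing and IDE autocompletion)

* it often uses the chaining method pattern

* it's easy to use and even easier to read

The Hibernate Search query DSL follows the fluent API style, which has these key characteristics:

1. Its method names are meaningful, so a chain of operations reads almost like English.

2. It limits the available options to those that make sense in the current context (thanks to strong typing and IDE autocompletion).

3. It often uses method chaining.

4. It is easy to use and easy to read.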

Let's see how to use the API. You first need to create a query builder that is attached to a given indexed entity type. This QueryBuilder will know what analyzer to use and what field bridge to apply. You can create several QueryBuilders (one for each entity type involved in the root of your query). You get the QueryBuilder from the SearchFactory.

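Let's see how to use the API. First create a QueryBuilder bound to an indexed entity type; it knows which analyzer and which field bridges to apply. You can create several QueryBuilders (one per entity type at the root of your query). The QueryBuilder is obtained from the SearchFactory.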

QueryBuilder mythQB = searchFactory.buildQueryBuilder().forEntity( Myth.class ).get();

You can also override the analyzer used for a given field or fields. This is rarely needed and should be avoided unless you know what you are doing.

You can also override the analyzer for a field, but this is rarely needed and not recommended.

QueryBuilder mythQB = searchFactory.buildQueryBuilder()
    .forEntity( Myth.class )
        .overridesForField("history","stem_analyzer_definition")
    .get();

Using the query builder, you can then build queries. It is important to realize that the end result of a QueryBuilder is a Lucene query. For this reason you can easily mix and match queries generated via Lucene's query parser, or Query objects you have assembled with the Lucene programmatic API, and use them with the Hibernate Search DSL, just in case the DSL is missing some features.

With the QueryBuilder you build queries; note that what a QueryBuilder produces is a Lucene query. You can therefore freely mix in queries generated by Lucene's query parser or Query objects assembled with the Lucene programmatic API, in case the DSL lacks a feature you need.

5.1.2.1. Keyword queries

Let's start with the most basic use case - searching for a specific word:

Let's start with the most basic use case: searching for a specific keyword.

Query luceneQuery = mythQB.keyword().onField("history").matching("storm").createQuery();

keyword() means that you are trying to find a specific word. onField() specifies in which Lucene field to look. matching() tells what to look for. And finally createQuery() creates the Lucene query object. A lot is going on with this line of code.

* The value storm is passed through the history FieldBridge: it does not matter here but you will see that it's quite handy when dealing with numbers or dates.

* The field bridge value is then passed to the analyzer used to index the field history. This ensures that the query uses the same term transformation as the indexing (lower case, n-gram, stemming and so on). If the analyzing process generates several terms for a given word, a boolean query is used with the SHOULD logic (roughly an OR logic).

keyword(): you are trying to find a specific keyword.

onField(): specifies which Lucene field to search.

matching(): what to look for.

createQuery(): finally creates the Lucene query object. A lot happens in this one line of code:

* The keyword "storm" is passed through the FieldBridge of the history field. It makes no difference here, but it is very handy when dealing with numbers or dates.

* The field bridge value is then passed to the analyzer used to index the history field, so that querying and indexing apply the same analysis. If the keyword is split into several terms, a boolean query with SHOULD logic (roughly OR) is used.

Let's see how you can search a property that is not of type string.

下面让我们看看如何搜索不是字符串类型的属性:

(Part of the original is omitted here, including two examples.)

Note

If for some reason you do not want a specific field to use the field bridge or the analyzer, you can call the ignoreAnalyzer() or ignoreFieldBridge() functions.

Note: to keep a particular field from using its field bridge or analyzer, call ignoreFieldBridge() or ignoreAnalyzer().

To search for multiple possible words in the same field, simply add them all in the matching clause.

To search one field for several possible keywords, just pass them all to matching(). For example, to search the history field for "storm" or "lightning":

//search document with storm or lightning in their history
Query luceneQuery =
    mythQB.keyword().onField("history").matching("storm lightning").createQuery();

 

To search the same word on multiple fields, use the onFields method.

To search for one keyword across several fields, use onFields():

Query luceneQuery = mythQB
    .keyword()
    .onFields("history","description","name")
    .matching("storm")
    .createQuery();

 

Sometimes one field should be treated differently from another field even when searching for the same term; you can use the andField() method for that.

When the searched fields need different treatment, use andField():

Query luceneQuery = mythQB.keyword()
    .onField("history")
    .andField("name")
      .boostedTo(5)
    .andField("description")
    .matching("storm")
    .createQuery();

In the previous example, only field name is boosted to 5.

In this example, only the name field is boosted to 5.

5.1.2.2. Fuzzy queries (presumably only usable for English-language search)

To execute a fuzzy query (based on the Levenshtein distance algorithm), start like a keyword query and add the fuzzy flag.

Fuzzy queries: to run a fuzzy query (based on the edit-distance algorithm), start a keyword query and add the fuzzy flag:

Query luceneQuery = mythQB
    .keyword()
      .fuzzy()
        .withThreshold( .8f )
        .withPrefixLength( 1 )
    .onField("history")
    .matching("starm")
    .createQuery();

threshold is the limit above which two terms are considered matching. It's a decimal between 0 and 1 and defaults to 0.5. prefixLength is the length of the prefix ignored by the "fuzziness": while it defaults to 0, a non-zero value is recommended for indexes containing a huge amount of distinct terms.

threshold is the limit above which two terms count as a match; it is a number between 0 and 1 and defaults to 0.5. prefixLength is the length of the prefix that fuzziness ignores; it defaults to 0, but a non-zero value is recommended for indexes with a very large number of distinct terms.
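To make the threshold more concrete, here is a small stand-alone sketch that computes the Levenshtein distance between "starm" and "storm" and turns it into a similarity between 0 and 1. The normalization used (one minus the distance divided by the shorter length) is an assumption chosen for illustration; Lucene's exact formula, and how it treats the prefix, may differ. FuzzyDemo and its methods are hypothetical names.

```java
/** Edit distance and a simple normalized similarity, for illustration only. */
final class FuzzyDemo {

    /** Classic dynamic-programming Levenshtein distance using two rows. */
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from the empty prefix of a
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** Assumed normalization: 1 - distance / length of the shorter string. */
    static double similarity(String a, String b) {
        int shorter = Math.min(a.length(), b.length());
        if (shorter == 0) {
            return a.length() == b.length() ? 1.0 : 0.0;
        }
        return 1.0 - (double) levenshtein(a, b) / shorter;
    }

    public static void main(String[] args) {
        System.out.println(levenshtein("starm", "storm")); // 1
        System.out.println(similarity("starm", "storm"));  // roughly 0.8
    }
}
```

Under this assumed formula a single substitution in a five-letter word sits right around a threshold of 0.8, which shows how quickly a high threshold rules out short, misspelled terms.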

5.1.2.3. Wildcard queries

You can also execute wildcard queries (queries where some of parts of the word are unknown). ? represents a single character and * represents any character sequence. Note that for performance purposes, it is recommended that the query does not start with either ? or *.

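You can run wildcard searches (when only part of a word is known): "?" stands for a single character and "*" for any sequence of characters. Note: for performance reasons, do not start a query with a wildcard.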

Note

Wildcard queries do not apply the analyzer on the matching terms. Otherwise the risk of * or ? being mangled is too high.
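The meaning of the two wildcards can be illustrated by translating a pattern into a regular expression. This is only a sketch of the matching semantics using java.util.regex; it says nothing about how Lucene evaluates wildcard queries internally, and WildcardDemo is a hypothetical name.

```java
import java.util.regex.Pattern;

/** Translates ? and * wildcards into a regular expression, for illustration only. */
final class WildcardDemo {

    static Pattern toRegex(String wildcard) {
        StringBuilder regex = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '?') {
                regex.append('.');   // exactly one character
            } else if (c == '*') {
                regex.append(".*");  // any sequence, including the empty one
            } else {
                regex.append(Pattern.quote(String.valueOf(c))); // literal character
            }
        }
        return Pattern.compile(regex.toString(), Pattern.DOTALL);
    }

    /** True when the whole term matches the wildcard pattern. */
    static boolean matches(String wildcard, String term) {
        return toRegex(wildcard).matcher(term).matches();
    }

    public static void main(String[] args) {
        System.out.println(matches("sto?m", "storm"));  // true
        System.out.println(matches("st*", "storm"));    // true
        System.out.println(matches("sto?m", "stoorm")); // false
    }
}
```

Since the pattern is compared with indexed terms as they are, it should be written in the form the analyzer produced at indexing time (for example lower case).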

5.1.2.4. Phrase queries

So far we have been looking for words or sets of words, you can also search exact or approximate sentences. Use phrase() to do so.

Phrase search: use phrase() to search for exact or approximate sentences:

Query luceneQuery = mythQB
    .phrase()
    .onField("history")
    .matching("Thou shalt not kill")
    .createQuery();

You can search approximate sentences by adding a slop factor. The slop factor represents the number of other words permitted in the sentence: this works like a within or near operator.

Approximate sentences can also be searched by adding a slop factor, which allows other words to appear within the sentence.

5.1.2.5. Range queries

After looking at all these query examples for searching for a given word, it is time to introduce range queries (on numbers, dates, strings etc). A range query searches for a value in between given boundaries (included or not) or for a value below or above a given boundary (included or not).

Now for range queries (on numbers, dates, strings and so on). A range query searches between two boundaries, or for values above or below a given boundary. For example:

//look for 0 <= starred < 3
Query luceneQuery = mythQB
    .range()
    .onField("starred")
    .from(0).to(3).excludeLimit()
    .createQuery();

//look for myths strictly BC
Date beforeChrist = ...;
Query luceneQuery = mythQB
    .range()
    .onField("creationDate")
    .below(beforeChrist).excludeLimit()
    .createQuery();

5.1.2.6. Combining queries

Finally, you can aggregate (combine) queries to create more complex queries. The following aggregation operators are available:

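* SHOULD: the query should contain the matching elements of the subquery

* MUST: the query must contain the matching elements of the subquery

* MUST NOT: the query must not contain the matching elements of the subquery

Finally, queries can be combined into more complex ones. The following operators are available:

* SHOULD: the result should contain the elements matched by the subquery.

* MUST: the result must contain the elements matched by the subquery.

* MUST NOT: the result must not contain the elements matched by the subquery.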

The subqueries can be any Lucene query including a boolean query itself. Let's look at a few examples:

//look for popular modern myths that are not urban
Date twentiethCentury = ...;
Query luceneQuery = mythQB
    .bool()
      .must( mythQB.keyword().onField("description").matching("urban").createQuery() )
        .not()
      .must( mythQB.range().onField("starred").above(4).createQuery() )
      .must( mythQB
        .range()
        .onField("creationDate")
        .above(twentiethCentury)
        .createQuery() )
    .createQuery();
 
//look for popular myths that are preferably urban
Query luceneQuery = mythQB
    .bool()
      .should( mythQB.keyword().onField("description").matching("urban").createQuery() )
      .must( mythQB.range().onField("starred").above(4).createQuery() )
    .createQuery();
 
//look for all myths except religious ones
Query luceneQuery = mythQB
    .all()
      .except( mythQB
        .keyword()
        .onField( "description_stem" )
        .matching( "religion" )
        .createQuery() 
      )
    .createQuery();
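How the three operators decide whether a document matches can be modelled with plain predicates. The class below is a hypothetical, scoring-free simplification and not the Hibernate Search or Lucene API: it assumes that a document must satisfy every MUST clause and no MUST NOT clause, and that SHOULD clauses only decide matching when there is no MUST clause (otherwise they would merely influence ranking, which this sketch ignores).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical, scoring-free model of MUST / SHOULD / MUST NOT semantics. */
final class BoolSketch<T> {

    private final List<Predicate<T>> must = new ArrayList<>();
    private final List<Predicate<T>> should = new ArrayList<>();
    private final List<Predicate<T>> mustNot = new ArrayList<>();

    BoolSketch<T> must(Predicate<T> clause)    { must.add(clause); return this; }
    BoolSketch<T> should(Predicate<T> clause)  { should.add(clause); return this; }
    BoolSketch<T> mustNot(Predicate<T> clause) { mustNot.add(clause); return this; }

    boolean matches(T doc) {
        for (Predicate<T> clause : mustNot) {
            if (clause.test(doc)) return false;  // excluded outright
        }
        for (Predicate<T> clause : must) {
            if (!clause.test(doc)) return false; // a required clause failed
        }
        if (must.isEmpty()) {
            // Without MUST clauses, at least one SHOULD clause has to match.
            for (Predicate<T> clause : should) {
                if (clause.test(doc)) return true;
            }
            return false;
        }
        return true;
    }

    public static void main(String[] args) {
        BoolSketch<String> notUrban = new BoolSketch<String>()
            .must(d -> d.contains("myth"))
            .mustNot(d -> d.contains("urban"));
        System.out.println(notUrban.matches("modern myth")); // true
        System.out.println(notUrban.matches("urban myth"));  // false
    }
}
```

In the "preferably urban" example above, the should clause would under these assumptions not exclude non-urban myths; it would only rank urban ones higher.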

5.1.2.7. Query options

We have already seen several query options in the previous examples, but let's summarize again the options for query types and fields:

* boostedTo (on query type and on field): boost the whole query or the specific field to a given factor

* withConstantScore (on query): all results matching the query have a constant score equal to the boost

* filteredBy(Filter) (on query): filter query results using the Filter instance

* ignoreAnalyzer (on field): ignore the analyzer when processing this field

* ignoreFieldBridge (on field): ignore field bridge when processing this field

The previous examples already used several query options. To summarize the options available on queries and on fields:

* boostedTo (on query type and on field): boosts the whole query or a specific field by the given factor.

* withConstantScore (on query): every matching result gets a constant score equal to the boost.

* filteredBy (on query): filters the query results with a Filter.

* ignoreAnalyzer (on field): skips the analyzer when processing the field.

* ignoreFieldBridge (on field): skips the field bridge when processing the field.

Let's check out an example using some of these options:

Query luceneQuery = mythQB
    .bool()
      .should( mythQB.keyword().onField("description").matching("urban").createQuery() )
      .should( mythQB
        .keyword()
        .onField("name")
          .boostedTo(3)
          .ignoreAnalyzer()
        .matching("urban").createQuery() )
      .must( mythQB
        .range()
          .boostedTo(5).withConstantScore()
        .onField("starred").above(4).createQuery() )
    .createQuery();

As you can see, the Hibernate Search query DSL is an easy-to-use and easy-to-read query API. Because it accepts and produces Lucene queries, you can easily incorporate query types not (yet) supported by the DSL. Please give us feedback!

5.1.3. Building a Hibernate Search query

So far we have only covered how to create your Lucene query (see Section 5.1, “Building queries”). However, this is only the first step in the chain of actions. Let's now see how to build the Hibernate Search query from the Lucene query.

5.1.3.1. Generality

Once the Lucene query is built, it needs to be wrapped into a Hibernate Query. Unless specified otherwise, the query will be executed against all indexed entities, potentially returning all types of indexed classes.

Example 5.4. Wrapping a Lucene query into a Hibernate Query

FullTextSession fullTextSession = Search.getFullTextSession( session );


org.hibernate.Query fullTextQuery = fullTextSession.createFullTextQuery( luceneQuery );

 

It is advised, from a performance point of view, to restrict the returned types:

Example 5.5. Filtering the search result by entity type

fullTextQuery = fullTextSession.createFullTextQuery( luceneQuery, Customer.class );
// or
fullTextQuery = fullTextSession.createFullTextQuery( luceneQuery, Item.class, Actor.class );

In Example 5.5, “Filtering the search result by entity type”, the first example returns only matching Customers, while the second returns matching Actors and Items. The type restriction is fully polymorphic, which means that if there are two indexed subclasses Salesman and Customer of the base class Person, it is possible to specify just Person.class in order to filter on result types.
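The polymorphic restriction described above can be pictured with a small plain-Java sketch. TypeFilterModel, its nested classes and the restrict method are hypothetical names for illustration only; Hibernate Search applies the restriction inside the index query rather than by filtering loaded objects like this.

```java
import java.util.ArrayList;
import java.util.List;

/** Plain-Java model of a polymorphic type restriction; not Hibernate Search code. */
final class TypeFilterModel {
    // Hypothetical domain classes named after the ones mentioned in the text.
    static class Person {}
    static class Salesman extends Person {}
    static class Customer extends Person {}
    static class Item {}

    /**
     * Keeps the hits that are instances of one of the requested types,
     * subclasses included. With no type given, every hit is kept.
     */
    static List<Object> restrict(List<Object> hits, Class<?>... types) {
        if (types.length == 0) {
            return new ArrayList<>(hits);
        }
        List<Object> kept = new ArrayList<>();
        for (Object hit : hits) {
            for (Class<?> type : types) {
                if (type.isInstance(hit)) {
                    kept.add(hit);
                    break; // avoid adding the same hit twice
                }
            }
        }
        return kept;
    }
}
```

Passing Person.class keeps both Salesman and Customer hits, which mirrors the effect of passing Person.class to createFullTextQuery.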

5.1.3.2. Pagination

For performance reasons it is recommended to restrict the number of objects returned per query. In fact, navigating from one page to another is a very common use case anyway. Pagination is defined exactly as you would define it in a plain HQL or Criteria query.

Example 5.6. Defining pagination for a search query

org.hibernate.Query fullTextQuery = 
    fullTextSession.createFullTextQuery( luceneQuery, Customer.class );
fullTextQuery.setFirstResult(15); //skip the first 15 results (the offset is zero-based)
fullTextQuery.setMaxResults(10); //return at most 10 elements

 

Tip

It is still possible to get the total number of matching elements regardless of the pagination via fullTextQuery.getResultSize(). This method is declared on org.hibernate.search.FullTextQuery, so the query has to be typed (or cast) as such rather than as org.hibernate.Query.
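The arithmetic behind page navigation is easy to get wrong by one, so here is a minimal sketch. The Paging class and its methods are hypothetical helpers, not part of Hibernate; it assumes pages are numbered from 1 in the user interface, while the offset passed to setFirstResult is zero-based.

```java
/** Hypothetical pagination helper; plain Java with no Hibernate dependency. */
final class Paging {
    /** Zero-based offset for a 1-based page number, suitable for setFirstResult. */
    static int firstResult(int page, int pageSize) {
        if (page < 1 || pageSize < 1) {
            throw new IllegalArgumentException("page and pageSize must be >= 1");
        }
        return (page - 1) * pageSize;
    }

    /** Number of pages needed to show totalHits results (ceiling division). */
    static int pageCount(int totalHits, int pageSize) {
        if (totalHits < 0 || pageSize < 1) {
            throw new IllegalArgumentException("totalHits must be >= 0 and pageSize >= 1");
        }
        return (totalHits + pageSize - 1) / pageSize;
    }
}
```

A caller would pass Paging.firstResult(page, 10) to setFirstResult, 10 to setMaxResults, and feed the value of getResultSize() into Paging.pageCount to render the page links.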

5.1.3.4. Fetching strategy

When you restrict the return types to one class, Hibernate Search loads the objects using a single query. It also respects the static fetching strategy defined in your domain model.

It is often useful, however, to refine the fetching strategy for a specific use case.

Example 5.8. Specifying FetchMode on a query

Criteria criteria = 


    s.createCriteria( Book.class ).setFetchMode( "authors", FetchMode.JOIN );


s.createFullTextQuery( luceneQuery ).setCriteriaQuery( criteria );

 

In this example, the query will return all Books matching the luceneQuery. The authors collection will be loaded from the same query using an SQL outer join.

When defining a criteria query, it is not necessary to restrict the returned entity types when creating the Hibernate Search query from the full text session: the type is guessed from the criteria query itself.

5.1.3.5. Projection

For some use cases, returning the domain object (including its associations) is overkill: only a small subset of the properties is necessary. Hibernate Search allows you to return just that subset of properties:

Example 5.9. Using projection instead of returning the full domain object

org.hibernate.search.FullTextQuery query = 
    s.createFullTextQuery( luceneQuery, Book.class );
query.setProjection( "id", "summary", "body", "mainAuthor.name" );
List results = query.list();
//each result is an Object[] holding the projected values in the requested order
Object[] firstResult = (Object[]) results.get(0);
Integer id = (Integer) firstResult[0];
String summary = (String) firstResult[1];
String body = (String) firstResult[2];
String authorName = (String) firstResult[3];
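Because a projection returns untyped Object[] rows, it can help to convert each row into a small typed holder as early as possible. The sketch below is one way to do that; BookRow is a hypothetical class, and the column order and types (Integer id, three String values) are assumptions taken from the projection example above.

```java
/** Hypothetical typed holder for one projected row: id, summary, body, mainAuthor.name. */
final class BookRow {
    final Integer id;
    final String summary;
    final String body;
    final String authorName;

    private BookRow(Integer id, String summary, String body, String authorName) {
        this.id = id;
        this.summary = summary;
        this.body = body;
        this.authorName = authorName;
    }

    /** Converts one projection row; the column order must match the setProjection call. */
    static BookRow from(Object[] row) {
        if (row.length != 4) {
            throw new IllegalArgumentException("expected 4 projected values, got " + row.length);
        }
        return new BookRow((Integer) row[0], (String) row[1], (String) row[2], (String) row[3]);
    }
}
```

Keeping the casts in one place means a change to the projected field list only has to be mirrored in BookRow.from.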

5.1.3.6. Limiting the time of a query

You can limit the time a query takes in Hibernate Search in two ways:

·             raise an exception when the time limit is reached

·             limit the number of results retrieved to those found when the time limit is reached (experimental)

The two approaches are not compatible with each other.
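The difference between the two strategies can be shown with a plain-Java model of a result collector that checks a deadline before accepting each hit. Everything here (TimeLimitModel, TimeoutReached, Result, the injected clock) is hypothetical and only illustrates the behavior; it is not how Hibernate Search implements the limit.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongSupplier;

/** Plain-Java model of the two time-limit strategies; not Hibernate Search code. */
final class TimeLimitModel {
    /** Stand-in exception for the strict strategy. */
    static final class TimeoutReached extends RuntimeException {
        TimeoutReached(String message) { super(message); }
    }

    /** Outcome of the lenient strategy: the hits found so far and a "partial" flag. */
    static final class Result {
        final List<String> hits;
        final boolean partial;
        Result(List<String> hits, boolean partial) {
            this.hits = hits;
            this.partial = partial;
        }
    }

    /** Strategy 1: fail with an exception once the deadline has passed. */
    static List<String> collectOrFail(List<String> candidates, long limitMillis, LongSupplier clock) {
        long deadline = clock.getAsLong() + limitMillis;
        List<String> hits = new ArrayList<>();
        for (String candidate : candidates) {
            if (clock.getAsLong() > deadline) {
                throw new TimeoutReached("limit of " + limitMillis + " ms exceeded");
            }
            hits.add(candidate);
        }
        return hits;
    }

    /** Strategy 2: stop collecting at the deadline and return what was found. */
    static Result collectBestEffort(List<String> candidates, long limitMillis, LongSupplier clock) {
        long deadline = clock.getAsLong() + limitMillis;
        List<String> hits = new ArrayList<>();
        for (String candidate : candidates) {
            if (clock.getAsLong() > deadline) {
                return new Result(hits, true);
            }
            hits.add(candidate);
        }
        return new Result(hits, false);
    }
}
```

The clock is injected as a LongSupplier so the behavior can be exercised deterministically; in real use it would be System::currentTimeMillis.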

 

Chapter 6. Manual index changes

As Hibernate Core applies changes to the database, Hibernate Search detects these changes and updates the index automatically (unless the EventListeners are disabled). Sometimes changes are made to the database without using Hibernate, for example when a backup is restored or your data is otherwise affected. For these cases Hibernate Search exposes the manual index APIs to explicitly update or remove a single entity from the index, rebuild the index for the whole database, or remove all references to a specific type.

All these methods affect the Lucene index only; no changes are applied to the database.

6.1. Adding instances to the index

Using FullTextSession.index(T entity) you can directly add a specific object instance to the index or update it there. If the entity was already indexed, the index will be updated. Changes to the index are only applied at transaction commit.

Example 6.1. Indexing an entity via FullTextSession.index(T entity)

FullTextSession fullTextSession = Search.getFullTextSession(session);

Transaction tx = fullTextSession.beginTransaction();

Object customer = fullTextSession.load( Customer.class, 8 );

fullTextSession.index(customer);

tx.commit(); //index only updated at commit time

 

If you want to add all instances of a type, or of all indexed types, the recommended approach is to use a MassIndexer: see Section 6.3.2, “Using a MassIndexer” for more details.

 

6.2. Deleting instances from the index

It is equally possible to remove an entity, or all entities of a given type, from a Lucene index without physically removing them from the database. This operation is named purging and is also done through the FullTextSession.

Example 6.2. Purging a specific instance of an entity from the index

FullTextSession fullTextSession = Search.getFullTextSession(session);

Transaction tx = fullTextSession.beginTransaction();

for (Customer customer : customers) {

    fullTextSession.purge( Customer.class, customer.getId() );

}

tx.commit(); //index is updated at commit time

 

Purging will remove the entity with the given id from the Lucene index but will not touch the database.

If you need to remove all entities of a given type, you can use the purgeAll method. This operation removes all entities of the type passed as a parameter, as well as those of all its subtypes.

Example 6.3. Purging all instances of an entity from the index

FullTextSession fullTextSession = Search.getFullTextSession(session);

Transaction tx = fullTextSession.beginTransaction();

fullTextSession.purgeAll( Customer.class );

//optionally optimize the index

//fullTextSession.getSearchFactory().optimize( Customer.class );

tx.commit(); //index changes are applied at commit time   

 

It is recommended to optimize the index after such an operation.

Note

All manual indexing methods (index, purge and purgeAll) only affect the index, not the database. Nevertheless, they are transactional, and as such they won't be applied until the transaction is successfully committed or you make use of flushToIndexes.

6.3. Rebuilding the whole index

If you change the entity mapping to the index, chances are that the whole index needs to be updated. For example, if you decide to index an existing field using a different analyzer, you'll need to rebuild the index for the affected types. Likewise, if the database is replaced (restored from a backup, or imported from a legacy system), you'll want to rebuild the index from the existing data. Hibernate Search provides two main strategies to choose from:

·             Using FullTextSession.flushToIndexes() periodically, while using FullTextSession.index() on all entities.

·             Use a MassIndexer.

6.3.1. Using flushToIndexes()

This strategy consists of removing the existing index and then adding all entities back to it using FullTextSession.purgeAll() and FullTextSession.index(). However, there are some memory and efficiency constraints. For maximum efficiency, Hibernate Search batches index operations and executes them at commit time. If you expect to index a lot of data, you need to be careful about memory consumption, since all documents are kept in a queue until the transaction commits. You can potentially face an OutOfMemoryError if you don't empty the queue periodically; to do this, use fullTextSession.flushToIndexes(). Every time fullTextSession.flushToIndexes() is called (or the transaction is committed), the batch queue is processed and all pending index changes are applied. Be aware that, once flushed, the changes cannot be rolled back.

Example 6.4. Index rebuilding using index() and flushToIndexes()

fullTextSession.setFlushMode(FlushMode.MANUAL);

fullTextSession.setCacheMode(CacheMode.IGNORE);

transaction = fullTextSession.beginTransaction();

//Scrollable results will avoid loading too many objects in memory

ScrollableResults results = fullTextSession.createCriteria( Email.class )

    .setFetchSize(BATCH_SIZE)

    .scroll( ScrollMode.FORWARD_ONLY );

int index = 0;

while( results.next() ) {

    index++;

    fullTextSession.index( results.get(0) ); //index each element

    if (index % BATCH_SIZE == 0) {

        fullTextSession.flushToIndexes(); //apply changes to indexes

        fullTextSession.clear(); //free memory since the queue is processed

    }

}

transaction.commit();

 

Note

hibernate.search.worker.batch_size has been deprecated in favor of this explicit API, which provides better control.

Try to use a batch size that guarantees that your application will not run out of memory: with a bigger batch size, objects are fetched faster from the database but more memory is needed.
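The memory trade-off can be illustrated with a plain-Java model of the work queue. BatchQueueModel and all of its methods are hypothetical names; the model only tracks how many documents are queued at once and what a rollback can still undo, and it does not use any Hibernate Search class.

```java
import java.util.ArrayList;
import java.util.List;

/** Plain-Java model of queued index work with periodic flushing; not Hibernate Search code. */
final class BatchQueueModel {
    private final List<String> pending = new ArrayList<>(); // queued, not yet written
    private final List<String> applied = new ArrayList<>(); // written to the "index"
    private int peakPending = 0;

    /** Queues one document, like index() inside a transaction. */
    void index(String document) {
        pending.add(document);
        peakPending = Math.max(peakPending, pending.size());
    }

    /** Applies everything queued so far and empties the queue. */
    void flushToIndexes() {
        applied.addAll(pending);
        pending.clear();
    }

    /** Commit applies whatever is still queued. */
    void commit() {
        flushToIndexes();
    }

    /** Rollback can only discard queued work; flushed work stays applied. */
    void rollback() {
        pending.clear();
    }

    int peakPending() { return peakPending; }
    int appliedCount() { return applied.size(); }

    /** Re-indexes the given number of documents, flushing every batchSize documents. */
    static BatchQueueModel rebuild(int documents, int batchSize) {
        BatchQueueModel model = new BatchQueueModel();
        for (int i = 1; i <= documents; i++) {
            model.index("doc-" + i);
            if (i % batchSize == 0) {
                model.flushToIndexes();
            }
        }
        model.commit();
        return model;
    }
}
```

In this model the largest queue size never exceeds the batch size as long as the batch size is smaller than the number of documents, which is the property that keeps memory bounded in the real loop of Example 6.4.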

6.3.2. Using a MassIndexer

Hibernate Search's MassIndexer uses several parallel threads to rebuild the index; you can optionally select which entities need to be reloaded or have it reindex all entities. This approach is optimized for best performance but requires putting the application in maintenance mode: querying the index is not recommended while a MassIndexer is busy.

Example 6.5. Index rebuilding using a MassIndexer

fullTextSession.createIndexer().startAndWait();

 

This will rebuild the index, deleting it and then reloading all entities from the database. Although it is simple to use, some tweaking is recommended to speed up the process: several parameters are configurable.

Warning

While a MassIndexer is running, the content of the index is undefined, so make sure that nobody queries the index during the rebuild. If somebody does query the index, it will not be corrupted, but most results will likely be missing.

Example 6.6. Using a tuned MassIndexer

fullTextSession

 .createIndexer( User.class )

 .batchSizeToLoadObjects( 25 )

 .cacheMode( CacheMode.NORMAL )

 .threadsToLoadObjects( 5 )

 .threadsForSubsequentFetching( 20 )

 .startAndWait();

 

This will rebuild the index of all User instances (and subtypes). It creates 5 parallel threads to load the User instances in batches of 25 objects per query; the loaded User instances are then pipelined to 20 parallel threads, which load the attached lazy collections of User that contain information needed for the index.

It is recommended to leave cacheMode at CacheMode.IGNORE (the default), as in most reindexing situations the cache is useless additional overhead. Depending on your data, enabling some other CacheMode might be useful: it might increase performance if the main entity relates to enum-like data included in the index.

Tip

The "sweet spot" for the number of threads that achieves the best performance is highly dependent on your overall architecture, database design and even data values. To find the best number of threads for your application, it is recommended to use a profiler: all internal thread groups have meaningful names so that they can be easily identified with most tools.

Note

The MassIndexer was designed for speed and is unaware of transactions, so there is no need to begin or commit one. Also, because it is not transactional, it is not recommended to let users use the system while it is processing: people are unlikely to find results, and the system load might be too high anyway.

Other parameters which also affect indexing time and memory consumption are:

·             hibernate.search.[default|<indexname>].exclusive_index_use

·             hibernate.search.[default|<indexname>].indexwriter.batch.max_buffered_docs

·             hibernate.search.[default|<indexname>].indexwriter.batch.max_field_length

·             hibernate.search.[default|<indexname>].indexwriter.batch.max_merge_docs

·             hibernate.search.[default|<indexname>].indexwriter.batch.merge_factor

·             hibernate.search.[default|<indexname>].indexwriter.batch.ram_buffer_size

·             hibernate.search.[default|<indexname>].indexwriter.batch.term_index_interval

All .indexwriter parameters are Lucene specific and Hibernate Search is just passing these parameters through - see Section 3.10, “Tuning Lucene indexing performance” for more details.

Chapter 7. Index Optimization

From time to time, the Lucene index needs to be optimized. The process is essentially a defragmentation. Until an optimization is triggered, Lucene only marks deleted documents as such; no physical deletions are applied. During the optimization process the deletions are applied, which also affects the number of files in the Lucene Directory.

Optimizing the Lucene index speeds up searches but has no effect on the indexation (update) performance. During an optimization, searches can be performed, but will most likely be slowed down. All index updates will be stopped. It is recommended to schedule optimization:

·             on an idle system or when the searches are less frequent

·             after a lot of index modifications

When using a MassIndexer (see Section 6.3.2, “Using a MassIndexer”), it will optimize the involved indexes by default at the start and at the end of processing; you can change this behavior with MassIndexer.optimizeAfterPurge and MassIndexer.optimizeOnFinish respectively.

 

7.1. Automatic optimization

Hibernate Search can automatically optimize an index after:

·             a certain amount of operations (insertion, deletion)

·             or a certain amount of transactions

The configuration for automatic index optimization can be defined on a global level or per index:

Example 7.1. Defining automatic optimization parameters

hibernate.search.default.optimizer.operation_limit.max = 1000

hibernate.search.default.optimizer.transaction_limit.max = 100

hibernate.search.Animal.optimizer.transaction_limit.max = 50

 

An optimization will be triggered on the Animal index as soon as either:

·             the number of additions and deletions reaches 1000

·             the number of transactions reaches 50 (hibernate.search.Animal.optimizer.transaction_limit.max having priority over hibernate.search.default.optimizer.transaction_limit.max)

If none of these parameters are defined, no optimization is processed automatically.
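The lookup order and the trigger rule from Example 7.1 can be sketched in plain Java. OptimizerModel is a hypothetical class that only reads the property names shown above; resetting the counters after a trigger is an assumption of this model, not something the text states.

```java
import java.util.Properties;

/** Plain-Java model of the automatic optimization thresholds; not Hibernate Search code. */
final class OptimizerModel {
    private final int operationLimit;   // 0 means "not configured"
    private final int transactionLimit; // 0 means "not configured"
    private int operations;
    private int transactions;

    OptimizerModel(Properties config, String indexName) {
        this.operationLimit = lookup(config, indexName, "optimizer.operation_limit.max");
        this.transactionLimit = lookup(config, indexName, "optimizer.transaction_limit.max");
    }

    /** An index-specific value takes priority over the "default" value. */
    static int lookup(Properties config, String indexName, String suffix) {
        String value = config.getProperty("hibernate.search." + indexName + "." + suffix);
        if (value == null) {
            value = config.getProperty("hibernate.search.default." + suffix);
        }
        return value == null ? 0 : Integer.parseInt(value.trim());
    }

    /**
     * Records one committed transaction containing the given number of
     * additions and deletions. Returns true when either limit is reached.
     */
    boolean transactionCommitted(int operationsInTransaction) {
        operations += operationsInTransaction;
        transactions++;
        boolean trigger = (operationLimit > 0 && operations >= operationLimit)
                || (transactionLimit > 0 && transactions >= transactionLimit);
        if (trigger) {
            // Assumption of this model: counting starts over after an optimization.
            operations = 0;
            transactions = 0;
        }
        return trigger;
    }
}
```

With the three properties of Example 7.1, a model for the Animal index triggers on the 50th transaction or once 1000 operations have accumulated, whichever comes first, while any other index falls back to the default limit of 100 transactions.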

7.2. Manual optimization

You can programmatically optimize (defragment) a Lucene index from Hibernate Search through the SearchFactory:

Example 7.2. Programmatic index optimization

FullTextSession fullTextSession = Search.getFullTextSession(regularSession);

SearchFactory searchFactory = fullTextSession.getSearchFactory();

 

searchFactory.optimize(Order.class);

// or

searchFactory.optimize();

 

The first example optimizes the Lucene index holding Orders; the second optimizes all indexes.

Note

searchFactory.optimize() has no effect on a JMS backend. You must apply the optimize operation on the Master node.

7.3. Adjusting optimization

Apache Lucene has a few parameters to influence how optimization is performed. Hibernate Search exposes those parameters.

Further index optimization parameters include:

·             hibernate.search.[default|<indexname>].indexwriter.[batch|transaction].max_buffered_docs

·             hibernate.search.[default|<indexname>].indexwriter.[batch|transaction].max_field_length

·             hibernate.search.[default|<indexname>].indexwriter.[batch|transaction].max_merge_docs

·             hibernate.search.[default|<indexname>].indexwriter.[batch|transaction].merge_factor

·             hibernate.search.[default|<indexname>].indexwriter.[batch|transaction].ram_buffer_size

·             hibernate.search.[default|<indexname>].indexwriter.[batch|transaction].term_index_interval

See Section 3.10, “Tuning Lucene indexing performance” for more details.
