
Extending search in Lucene 3.6.0

 

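Custom sorting

IndexSearcher.java: the sort order can be computed at search time, for example ranking stored restaurants by how near or far they are from a given location.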
  /** Expert: Low-level search implementation with arbitrary sorting.  Finds
   * the top <code>n</code> hits for <code>query</code>, applying
   * <code>filter</code> if non-null, and sorting the hits by the criteria in
   * <code>sort</code>.
   *
   * <p>Applications should usually call {@link
   * Searcher#search(Query,Filter,int,Sort)} instead.
   * 
   * @throws BooleanQuery.TooManyClauses
   */
  @Override
  public TopFieldDocs search(Weight weight, Filter filter,
      final int nDocs, Sort sort) throws IOException {
    return search(weight, filter, nDocs, sort, true);
  }



SortField.java
  /** Creates a sort with a custom comparison function.
   * @param field Name of field to sort by; cannot be <code>null</code>.
   * @param comparator Returns a comparator for sorting hits.
   */
  public SortField(String field, FieldComparatorSource comparator) {
    initFieldType(field, CUSTOM);
    this.comparatorSource = comparator;
  }

FieldComparatorSource.java
/**
 * Provides a {@link FieldComparator} for custom field sorting.
 *
 * @lucene.experimental
 *
 */
public abstract class FieldComparatorSource implements Serializable {

  /**
   * Creates a comparator for the field in the given index.
   * 
   * @param fieldname
   *          Name of the field to create comparator for.
   * @return FieldComparator.
   * @throws IOException
   *           If an error occurs reading the index.
   */
  public abstract FieldComparator<?> newComparator(String fieldname, int numHits, int sortPos, boolean reversed)
      throws IOException;
}
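
The restaurant example above comes down to computing a distance per document and comparing those distances. The following is a minimal stand-alone sketch of that comparison logic, not Lucene code: the class name DistanceSortSketch, the xs/ys arrays (stand-ins for per-document coordinates that a real FieldComparator would load from the index), and the reference point are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

/** Stand-alone sketch (hypothetical names): orders document IDs by distance from a point. */
final class DistanceSortSketch {

    /** Euclidean distance from (originX, originY) to the point stored for one document. */
    static double distance(int[] xs, int[] ys, int doc, int originX, int originY) {
        int dx = xs[doc] - originX;
        int dy = ys[doc] - originY;
        return Math.sqrt((double) dx * dx + (double) dy * dy);
    }

    /** Returns doc IDs 0..n-1, nearest first, or farthest first when reversed is true. */
    static List<Integer> sortByDistance(final int[] xs, final int[] ys,
                                        final int originX, final int originY,
                                        boolean reversed) {
        List<Integer> docs = new ArrayList<Integer>();
        for (int doc = 0; doc < xs.length; doc++) {
            docs.add(doc);
        }
        // Compare two documents by their distance from the reference point.
        Comparator<Integer> byDistance = new Comparator<Integer>() {
            public int compare(Integer a, Integer b) {
                return Double.compare(distance(xs, ys, a, originX, originY),
                                      distance(xs, ys, b, originX, originY));
            }
        };
        Collections.sort(docs, reversed ? Collections.reverseOrder(byDistance) : byDistance);
        return docs;
    }

    public static void main(String[] args) {
        int[] xs = {1, 10, 4};   // assumed x coordinate of docs 0, 1, 2
        int[] ys = {1, 10, 4};   // assumed y coordinate of docs 0, 1, 2
        System.out.println(sortByDistance(xs, ys, 0, 0, false)); // nearest first
        System.out.println(sortByDistance(xs, ys, 0, 0, true));  // farthest first
    }
}
```

In a real comparator the distance would be computed per hit inside the FieldComparator returned by newComparator, and the reversed flag corresponds to asking for the farthest rather than the nearest restaurants.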


Further computation or processing of query results
Collector.java
* <p><b>NOTE:</b> The doc that is passed to the collect
 * method is relative to the current reader. If your
 * collector needs to resolve this to the docID space of the
 * Multi*Reader, you must re-base it by recording the
 * docBase from the most recent setNextReader call.  Here's
 * a simple example showing how to collect docIDs into a
 * BitSet:</p>
 * 
 * <pre>
 * Searcher searcher = new IndexSearcher(indexReader);
 * final BitSet bits = new BitSet(indexReader.maxDoc());
 * searcher.search(query, new Collector() {
 *   private int docBase;
 * 
 *   <em>// ignore scorer</em>
 *   public void setScorer(Scorer scorer) {
 *   }
 *
 *   <em>// accept docs out of order (for a BitSet it doesn't matter)</em>
 *   public boolean acceptsDocsOutOfOrder() {
 *     return true;
 *   }
 * 
 *   public void collect(int doc) {
 *     bits.set(doc + docBase);
 *   }
 * 
 *   public void setNextReader(IndexReader reader, int docBase) {
 *     this.docBase = docBase;
 *   }
 * });
 * </pre>
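
The re-basing described in the Javadoc above can be tried without an index. The sketch below is plain Java with hypothetical names (DocBaseSketch, startSegment); it only mimics the setNextReader/collect call sequence to show how segment-relative doc IDs map into one top-level BitSet.

```java
import java.util.BitSet;

/** Stand-alone sketch (hypothetical names) of re-basing segment-relative doc IDs. */
final class DocBaseSketch {
    private final BitSet bits;
    private int docBase;

    DocBaseSketch(int maxDoc) {
        this.bits = new BitSet(maxDoc);
    }

    /** Plays the role of setNextReader: remember where the next segment starts. */
    void startSegment(int docBase) {
        this.docBase = docBase;
    }

    /** Plays the role of collect: doc is relative to the current segment. */
    void collect(int doc) {
        bits.set(docBase + doc);
    }

    BitSet hits() {
        return bits;
    }

    public static void main(String[] args) {
        DocBaseSketch collector = new DocBaseSketch(8);
        collector.startSegment(0);   // first segment holds top-level docs 0..4
        collector.collect(1);
        collector.startSegment(5);   // second segment starts at top-level doc 5
        collector.collect(1);
        collector.collect(2);
        System.out.println(collector.hits());
    }
}
```

Without adding docBase, the hit in the first segment and the first hit in the second segment would both land on bit 1 and be indistinguishable.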

Extending QueryParser
1. Disabling fuzzy and wildcard queries
  /**
   * Builds a new FuzzyQuery instance
   * @param term Term
   * @param minimumSimilarity minimum similarity
   * @param prefixLength prefix length
   * @return new FuzzyQuery Instance
   */
  protected Query newFuzzyQuery(Term term, float minimumSimilarity, int prefixLength) {
    // FuzzyQuery doesn't yet allow constant score rewrite
    return new FuzzyQuery(term,minimumSimilarity,prefixLength);  // to disable fuzzy queries, remove this line and throw an exception instead
  }

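Custom filters: some properties of a search result change often, so a document may need to be filtered out during one period and not during another. If such a property is added to the index as a field, results can become inaccurate once the indexed value is stale. A better approach is to define a filter that applies specific rules at search time. Take best-selling books: a book may be a best seller for a while and then stop being one, so indexing an "is best seller" field is a poor fit. Instead, a custom Filter can compute whether each doc should be filtered out.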
  

/** 
 *  Abstract base class for restricting which documents may
 *  be returned during searching.
 */
public abstract class Filter implements java.io.Serializable {
  
  /**
   * Creates a {@link DocIdSet} enumerating the documents that should be
   * permitted in search results. <b>NOTE:</b> null can be
   * returned if no documents are accepted by this Filter.
   * <p>
   * Note: This method will be called once per segment in
   * the index during searching.  The returned {@link DocIdSet}
   * must refer to document IDs for that segment, not for
   * the top-level reader.
   * 
   * @param reader a {@link IndexReader} instance opened on the index currently
   *         searched on. Note, it is likely that the provided reader does not
   *         represent the whole underlying index i.e. if the index has more than
   *         one segment the given reader only represents a single segment.
   *          
   * @return a DocIdSet that provides the documents which should be permitted or
   *         prohibited in search results. <b>NOTE:</b> null can be returned if
   *         no documents will be accepted by this Filter.
   * 
   * @see DocIdBitSet
   */
  public abstract DocIdSet getDocIdSet(IndexReader reader) throws IOException;
}

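A DocIdSet can be thought of as a set of bits whose positions correspond to doc IDs: if a doc's bit is set to 1, the doc can appear in the search results; otherwise it cannot.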
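
The best-seller case can be sketched as building one such bit set per segment. This is plain Java rather than a Filter subclass, and every name in it is an assumption: HotBookFilterSketch, the isbnByDoc array (standing in for a per-document field value read from the segment), and the hotIsbns set (standing in for a best-seller list kept outside the index).

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.HashSet;
import java.util.Set;

/** Stand-alone sketch (hypothetical names) of computing the permitted-docs bit set. */
final class HotBookFilterSketch {

    /**
     * Returns one bit per document of a segment; a bit is set when that
     * document's ISBN is currently on the best-seller list.
     */
    static BitSet permittedDocs(String[] isbnByDoc, Set<String> hotIsbns) {
        BitSet bits = new BitSet(isbnByDoc.length);
        for (int doc = 0; doc < isbnByDoc.length; doc++) {
            if (hotIsbns.contains(isbnByDoc[doc])) {
                bits.set(doc);   // this doc may appear in the results
            }
        }
        return bits;
    }

    public static void main(String[] args) {
        String[] isbnByDoc = {"isbn-a", "isbn-b", "isbn-c"};   // assumed field values
        Set<String> hotIsbns = new HashSet<String>(Arrays.asList("isbn-b"));
        System.out.println(permittedDocs(isbnByDoc, hotIsbns));
    }
}
```

Because the best-seller list is consulted at search time, changing it takes effect without reindexing any document.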

FilteredQuery.java: a query with a filter applied; the query and the filter are combined into the final query expression that is executed.

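Performance issues:
1. Lucene can internally rewrite a range query (RangeQuery, TermRangeQuery in 3.6) into a BooleanQuery, that is, an OR expression with one clause per matching term.

If the range expands to more than 1024 terms (the default maxClauseCount), a TooManyClauses exception is thrown.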

  /** Thrown when an attempt is made to add more than {@link
   * #getMaxClauseCount()} clauses. This typically happens if
   * a PrefixQuery, FuzzyQuery, WildcardQuery, or TermRangeQuery 
   * is expanded to many terms during search. 
   */
  public static class TooManyClauses extends RuntimeException {
    public TooManyClauses() {
      super("maxClauseCount is set to " + maxClauseCount);
    }
  }
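
To see why a wide range trips the limit, it helps to count how many indexed terms fall inside it, since each one would become an OR clause. The sketch below is a plain-Java simulation with hypothetical names (RangeExpansionSketch, checkRange); it does not call Lucene and throws an IllegalStateException where Lucene would throw TooManyClauses.

```java
import java.util.TreeSet;

/** Stand-alone simulation (hypothetical names) of range expansion hitting the clause limit. */
final class RangeExpansionSketch {
    /** The default limit cited in the text. */
    static final int MAX_CLAUSE_COUNT = 1024;

    /** Counts the indexed terms in [lower, upper]; each would become one OR clause. */
    static int clauseCount(TreeSet<String> indexedTerms, String lower, String upper) {
        return indexedTerms.subSet(lower, true, upper, true).size();
    }

    /** Fails when the range would expand to more clauses than the limit allows. */
    static void checkRange(TreeSet<String> indexedTerms, String lower, String upper) {
        int clauses = clauseCount(indexedTerms, lower, upper);
        if (clauses > MAX_CLAUSE_COUNT) {
            throw new IllegalStateException(
                "range expands to " + clauses + " clauses, limit is " + MAX_CLAUSE_COUNT);
        }
    }

    public static void main(String[] args) {
        TreeSet<String> terms = new TreeSet<String>();
        for (int i = 0; i < 2000; i++) {
            terms.add(String.format("%04d", i));   // assumed zero-padded terms 0000..1999
        }
        checkRange(terms, "0000", "1023");   // exactly 1024 terms: still allowed
        checkRange(terms, "0000", "1024");   // 1025 terms: throws
    }
}
```

The number that matters is the count of distinct terms in the index that fall inside the range, not the numeric width of the range itself.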
 