
Configuring IKAnalyzer 3.2 with Solr 1.4


Personal tech blog: http://demi-panda.com

 

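Among the open-source search frameworks, I started with Lucene, and over the past couple of days I have been looking at Solr. I downloaded the latest version and set it up, and ran into some problems along the way. Some I solved and some are still open; here I share the ones I have solved.
   1. Download Solr 1.4: http://apache.freelamp.com/lucene/solr/  (note: the latest Solr release is always available here)
   2. Download IKAnalyzer3.2.3Stable: http://code.google.com/p/ik-analyzer/downloads/list   (note: the latest IKAnalyzer release is always available here; it can also be downloaded directly from the attachment)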

   3. I do not know whether versions before 1.4 need to extend BaseTokenizerFactory, but with 1.4 you must extend BaseTokenizerFactory:

package com.analysis.util;

import java.io.Reader;

import org.apache.lucene.analysis.Tokenizer;
import org.apache.solr.analysis.BaseTokenizerFactory;
import org.wltea.analyzer.lucene.IKAnalyzer;

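/**
 * Chinese word segmentation: a Solr tokenizer factory backed by IKAnalyzer.
 * @author Denghaiping
 * @date 2010-8-14
 */
public class ChineseTokenizerFactory extends BaseTokenizerFactory
{
 /**
  * Overrides the parent class method.
  * Note: this implementation never reads the attributes set on the
  * <tokenizer> element in schema.xml (such as isMaxWordLength); it always
  * builds an IKAnalyzer with its default settings.
  */
 public Tokenizer create(Reader input) {
  // IKAnalyzer.tokenStream() is declared to return a TokenStream; the cast
  // relies on IKAnalyzer actually returning a Tokenizer instance.
  return (Tokenizer)new IKAnalyzer().tokenStream("text", input);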
 }
 
}
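5. Then edit schema.xml. The modified parts (shown in bold in the original post) are the two tokenizer elements that reference ChineseTokenizerFactory.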

  <fieldType name="text" class="solr.TextField" positionIncrementGap="100">

      <analyzer type="index">
       <!-- Default configuration
        <tokenizer class="solr.WhitespaceTokenizerFactory"/> -->
       
        <!-- Add IKAnalyzer word segmentation -->
        <tokenizer class="com.analysis.util.ChineseTokenizerFactory" isMaxWordLength="false"/>
    
    
        <!-- in this example, we will only use synonyms at query time
        <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
        -->
        <!-- Case insensitive stop word removal.
          add enablePositionIncrements=true in both the index and query
          analyzers to leave a 'gap' for more accurate phrase queries.
        -->
        <filter class="solr.StopFilterFactory"
                ignoreCase="true"
                words="stopwords.txt"
                enablePositionIncrements="true"
                />
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
      </analyzer>
      <analyzer type="query">
       <!-- Default configuration
        <tokenizer class="solr.WhitespaceTokenizerFactory"/> -->
       
        <!-- Add IKAnalyzer -->
        <tokenizer class="com.analysis.util.ChineseTokenizerFactory" isMaxWordLength="true"/>
    
    
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
        <filter class="solr.StopFilterFactory"
                ignoreCase="true"
                words="stopwords.txt"
                enablePositionIncrements="true"
                />
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
      </analyzer>
    </fieldType>
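
After editing schema.xml it can be useful to confirm that both analyzers of the "text" field type really point at the new factory, and that no active WhitespaceTokenizerFactory element was left behind. The following is only a minimal sketch using the JDK's built-in DOM parser; the class name SchemaTokenizerCheck and the method tokenizersFor are hypothetical names chosen for this illustration and are not part of Solr or IKAnalyzer.

```java
import java.io.FileInputStream;
import java.io.InputStream;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper (not part of Solr): reports which tokenizer class
// each <analyzer> of a given <fieldType> in schema.xml is configured with.
public class SchemaTokenizerCheck {

    /**
     * Returns a map from analyzer type ("index", "query", or "" when the
     * type attribute is absent) to the class attribute of the first
     * <tokenizer> element inside that analyzer. Tokenizers that are
     * commented out are XML comment nodes, so they are not reported.
     */
    public static Map<String, String> tokenizersFor(InputStream schemaXml, String fieldTypeName)
            throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(schemaXml);
        Map<String, String> result = new LinkedHashMap<String, String>();
        NodeList fieldTypes = doc.getElementsByTagName("fieldType");
        for (int i = 0; i < fieldTypes.getLength(); i++) {
            Element fieldType = (Element) fieldTypes.item(i);
            if (!fieldTypeName.equals(fieldType.getAttribute("name"))) {
                continue; // not the field type we were asked about
            }
            NodeList analyzers = fieldType.getElementsByTagName("analyzer");
            for (int j = 0; j < analyzers.getLength(); j++) {
                Element analyzer = (Element) analyzers.item(j);
                NodeList tokenizers = analyzer.getElementsByTagName("tokenizer");
                if (tokenizers.getLength() > 0) {
                    Element tokenizer = (Element) tokenizers.item(0);
                    result.put(analyzer.getAttribute("type"), tokenizer.getAttribute("class"));
                }
            }
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 1) {
            System.err.println("usage: java SchemaTokenizerCheck <path to schema.xml>");
            return;
        }
        InputStream in = new FileInputStream(args[0]);
        try {
            System.out.println(tokenizersFor(in, "text"));
        } finally {
            in.close();
        }
    }
}
```

Run against the edited schema.xml, both the index and the query entry should show com.analysis.util.ChineseTokenizerFactory. This only checks the configuration text; it does not prove that Solr can load the class.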

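 6. Package the class into a jar and put it into solr.war together with the IK jar. If you do not want to build it yourself, download the prebuilt package from the attachment. Alternatively, put the IK jar and the jar you built directly into apache-tomcat-6.0.26\webapps\solr\WEB-INF\lib.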
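
If Solr reports that it cannot find the tokenizer class after deployment, a quick check is whether the jars in WEB-INF\lib actually contain the two classes used above. The sketch below is one possible way to do that with the JDK alone; LibDirCheck and jarsContaining are hypothetical names introduced for this illustration, and the lib directory path is whatever your own Tomcat installation uses.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.jar.JarFile;

// Hypothetical helper (not part of Solr or Tomcat): finds which jars in a
// lib directory contain a given class.
public class LibDirCheck {

    /**
     * Returns the file names of all *.jar files directly inside libDir that
     * contain an entry for the given fully qualified class name.
     */
    public static List<String> jarsContaining(Path libDir, String className) throws IOException {
        // com.analysis.util.Foo is stored as com/analysis/util/Foo.class
        String entryName = className.replace('.', '/') + ".class";
        List<String> found = new ArrayList<String>();
        try (DirectoryStream<Path> jars = Files.newDirectoryStream(libDir, "*.jar")) {
            for (Path jar : jars) {
                try (JarFile jarFile = new JarFile(jar.toFile())) {
                    if (jarFile.getEntry(entryName) != null) {
                        found.add(jar.getFileName().toString());
                    }
                }
            }
        }
        return found;
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 1) {
            System.err.println("usage: java LibDirCheck <path to WEB-INF/lib>");
            return;
        }
        Path libDir = Paths.get(args[0]);
        // Both class names are taken from the factory source shown earlier.
        System.out.println("ChineseTokenizerFactory found in: "
                + jarsContaining(libDir, "com.analysis.util.ChineseTokenizerFactory"));
        System.out.println("IKAnalyzer found in: "
                + jarsContaining(libDir, "org.wltea.analyzer.lucene.IKAnalyzer"));
    }
}
```

An empty list for either class suggests the corresponding jar was not copied into the lib directory, or that the class was packaged under a different package path.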
