Apache SOLR and Carrot2 integration strategies 1

 

Deploy carrot2-webapp

1. Download the source code

#git clone https://github.com/carrot2/carrot2.git

 

2. Compile

#cd carrot2

#ant webapp

 

3. Deploy

#cp tmp/webapp/carrot2-webapp.war  /path/to/tomcat/webapps

 

4. Configure Carrot2 (once Tomcat has unpacked the WAR into webapps/carrot2-webapp)

#cd /path/to/tomcat/webapps/carrot2-webapp/WEB-INF/suites

#mv  suite-webapp.xml    suite-webapp.xml.old

#cp   source-solr.xml     suite-webapp.xml

Then edit the new suite-webapp.xml so that it reads:

<component-suite>
  <sources>
    <source component-class="org.carrot2.source.solr.SolrDocumentSource" id="solr"
            attribute-sets-resource="source-solr-attributes.xml">
      <label>Solr</label>
      <title>Solr Search Engine</title>
      <icon-path>icons/solr.png</icon-path>
      <mnemonic>s</mnemonic>
      <description>Solr document source queries an instance of Apache Solr search engine.</description>
      <example-queries>
        <example-query>test</example-query>
        <example-query>solr</example-query>
      </example-queries>
    </source>
  </sources>

  <include suite="algorithm-lingo.xml"></include>

</component-suite>

 

Next, edit source-solr-attributes.xml so that it points at the Solr clustering handler and names the fields to cluster:

<attribute-sets default="overridden-attributes">
  <attribute-set id="overridden-attributes">
    <value-set>
      <label>overridden-attributes</label>
      <attribute key="SolrDocumentSource.serviceUrlBase">
        <value type="java.lang.String" value="http://192.168.10.204:8983/inokarticle/clustering"/>
      </attribute>
      <attribute key="SolrDocumentSource.solrSummaryFieldName">
        <value type="java.lang.String" value="content"/>
      </attribute>
      <attribute key="SolrDocumentSource.solrTitleFieldName">
        <value type="java.lang.String" value="content"/>
      </attribute>
    </value-set>
  </attribute-set>
</attribute-sets>
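To double-check the edited file, the values can be read back from the command line. The helper below is only a sketch: `attr_value` is a hypothetical name (not part of Carrot2), and it assumes that each `<value>` element sits on the line directly after its `<attribute key=...>` line, as in the listing above.

```shell
#!/bin/sh
# attr_value FILE KEY
# Print the value="..." of the <value> element on the line that follows
# <attribute key="KEY"> in a Carrot2 attribute-set file.
# Assumption: the file is laid out one element per line, as shown above.
attr_value() {
  sed -n "/key=\"$2\"/{n;s/.* value=\"\([^\"]*\)\".*/\1/p;}" "$1"
}
```

Called as `attr_value source-solr-attributes.xml SolrDocumentSource.serviceUrlBase`, it should print the URL the webapp will query, which makes a mistyped host or handler path easy to spot before restarting Tomcat.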

  

5. Edit algorithm-lingo-attributes.xml and algorithm-lingo.xml

 

 ----------------------------------------------------

Integrate with Solr

1. Configure solrconfig.xml

a. Load the required jars

  <lib dir="../contrib/clustering/lib/" regex=".*\.jar" />
  <lib dir="../dist/" regex="solr-clustering-\d.*\.jar" />
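Relative `dir` values are resolved against the core's instance directory, so a wrong `../` depth leaves the clustering classes off the classpath. A quick way to check is to count the jars each entry would pick up. The function below is a minimal sketch: `count_jars` is a hypothetical helper, and it takes a shell glob as a rough stand-in for the Java regex in the `<lib>` element.

```shell
#!/bin/sh
# count_jars DIR GLOB
# Print how many regular files directly inside DIR have a name matching GLOB.
# GLOB is a shell pattern that only approximates the regex used by <lib>.
# A missing directory is reported as 0 rather than as an error.
count_jars() {
  find "$1" -maxdepth 1 -type f -name "$2" 2>/dev/null | wc -l | tr -d ' '
}
```

Run from the core's instance directory, `count_jars ../contrib/clustering/lib '*.jar'` and `count_jars ../dist 'solr-clustering-[0-9]*.jar'` should both print a number greater than zero.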

 

b. Add the clustering search component and the /clustering request handler

 

<searchComponent name="clustering"
                   enable="true"
                   class="solr.clustering.ClusteringComponent" >
    <lst name="engine">
      <str name="name">lingo</str>
      <str name="carrot.algorithm">org.carrot2.clustering.lingo.LingoClusteringAlgorithm</str>
      <str name="carrot.resourcesDir">clustering/carrot2</str>
      <str name="MultilingualClustering.defaultLanguage">CHINESE_SIMPLIFIED</str>
      <str name="PreprocessingPipeline.tokenizerFactory">org.carrot2.text.linguistic.DefaultTokenizerFactory</str>

      </lst>

  </searchComponent>
<requestHandler name="/clustering"
                  startup="lazy"
                  enable="true"
                  class="solr.SearchHandler">
    <lst name="defaults">
      <bool name="clustering">true</bool>
      <str name="clustering.engine">lingo</str>
      <bool name="clustering.results">true</bool>
      <!-- Field name with the logical "title" of each document (optional) -->
      <str name="carrot.title">content</str>
      <!-- Field name with the logical "URL" of each document (optional) -->
      <str name="carrot.url">id</str>
      <!-- Field name with the logical "content" of each document (optional) -->
      <str name="carrot.snippet">content</str>
      <!-- Apply the highlighter to the title/content and use the result for clustering. -->
      <bool name="carrot.produceSummary">true</bool>
      <!-- the maximum number of labels per cluster -->
      <int name="carrot.numDescriptions">5</int>
      <!-- produce sub clusters -->
      <bool name="carrot.outputSubClusters">true</bool>
      <str name="MultilingualClustering.defaultLanguage">CHINESE_SIMPLIFIED</str>

      <!-- Configure the remaining request handler parameters. -->
      <str name="defType">edismax</str>
      <str name="q.alt">*:*</str>
      <str name="rows">10</str>
      <str name="fl">*,score</str>
    </lst>
    <arr name="last-components">
      <str>clustering</str>
    </arr>
  </requestHandler>
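After Solr has been restarted with this configuration, the handler can be smoke-tested from the command line. The sketch below only assembles the request URL; `clustering_url` is a hypothetical helper name, and the address in the commented-out call is the one used for serviceUrlBase earlier, so adjust it to your own host and core.

```shell
#!/bin/sh
# clustering_url HANDLER_URL QUERY [ROWS]
# Print a GET URL for the /clustering handler. QUERY must already be URL-safe
# (for example *:* or content:solr); ROWS defaults to 10, like the handler.
clustering_url() {
  printf '%s?q=%s&rows=%s&wt=json&indent=true\n' "$1" "$2" "${3:-10}"
}

# Needs a running Solr, so the actual request is left commented out:
# curl -s "$(clustering_url 'http://192.168.10.204:8983/inokarticle/clustering' '*:*' 50)"
```

If the component is wired up correctly, the JSON response should contain a `clusters` section next to the usual search results. For queries with spaces or Chinese text, `curl -G --data-urlencode 'q=...'` against the handler URL avoids encoding the query by hand.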

 

2. Custom Chinese tokenizer for clustering

a. Modify the relevant Carrot2 source code and recompile

b. Copy the resulting jars and the lexicon to the Solr webapp's lib directory

 

For details, see Apache SOLR and Carrot2 integration strategies 2.

References

http://wiki.apache.org/solr/ClusteringComponent

http://www.cnblogs.com/cy163/archive/2010/05/07/1730172.html

http://carrot2.github.io/solr-integration-strategies/carrot2-3.8.0/index.html

http://download.carrot2.org/head/manual/index.html#section.advanced-topics.building-from-source-code

http://www.cnblogs.com/shm10/p/3700604.html