Understanding Solr Configuration Parameters

huanglz19871030

Blog category: Search engines (Lucene/Solr)

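• The dataDir parameter

Overrides the default index data directory (./data). If replication is in use, the value should match the replication configuration. If the path is not absolute, it is resolved relative to the current working directory of the servlet container.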
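The path rule for dataDir can be illustrated with a small sketch. This is not Solr code: the function name and the example directories are made up for illustration, and it only mimics the "absolute path wins, relative path is appended to the working directory" behaviour described above.

```python
from pathlib import PurePosixPath


def resolve_data_dir(data_dir, working_dir):
    """Return the directory that would hold the index data.

    Hypothetical helper: PurePosixPath's "/" operator keeps an absolute
    right-hand path unchanged and otherwise appends it to the left-hand path.
    """
    return str(PurePosixPath(working_dir) / data_dir)


# Example directories are assumptions, not values from a real installation.
relative_case = resolve_data_dir("./data", "/opt/tomcat")
absolute_case = resolve_data_dir("/var/solr/data", "/opt/tomcat")
```

In the relative case the index ends up under the container's working directory, which is why starting the container from a different directory can make Solr look for the index in a different place.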

• The mainIndex section

  <mainIndex>

    <!-- lucene options specific to the main on-disk lucene index -->

    <useCompoundFile>false</useCompoundFile>

    <mergeFactor>10</mergeFactor>

    <maxBufferedDocs>1000</maxBufferedDocs>

    <maxMergeDocs>2147483647</maxMergeDocs>

    <maxFieldLength>10000</maxFieldLength>

  </mainIndex>

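[mergeFactor] Sets how many segments of equal size accumulate before they are merged. With a value of 10, each time 1000 (maxBufferedDocs) documents have been added to the index (they may still be buffered in memory), a new segment is written to disk. When the 10th segment of that size is created, the 10 segments are merged into a single segment of 10,000 (10 × 1000) documents. Likewise, when the 10th segment of 10,000 documents is created, those segments are merged into a still larger one. This merging does not go on indefinitely, because the parameter below caps it.

[maxMergeDocs] The upper limit on the number of documents a single segment may hold.

[maxFieldLength] The maximum length of each field, measured in indexed terms; terms beyond the limit are not indexed.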
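The merge cascade described for mergeFactor can be sketched as a small simulation. This is a simplified model written for illustration, not Lucene's merge policy: it assumes segments are flushed only when the buffer is full, that only equal-sized trailing segments are merged, and that a merge is skipped when the result would exceed maxMergeDocs. The function name is hypothetical.

```python
def simulate_indexing(total_docs, max_buffered_docs=1000, merge_factor=10,
                      max_merge_docs=2147483647):
    """Return (segment sizes on disk, docs still buffered in memory)."""
    segments = []  # document count of each on-disk segment, oldest first
    full_flushes, still_buffered = divmod(total_docs, max_buffered_docs)
    for _ in range(full_flushes):
        # A full buffer is written out as a new segment.
        segments.append(max_buffered_docs)
        # Cascade: keep merging while the last merge_factor segments match.
        while len(segments) >= merge_factor:
            tail = segments[-merge_factor:]
            merged = sum(tail)
            if len(set(tail)) != 1 or merged > max_merge_docs:
                break
            segments[-merge_factor:] = [merged]
    return segments, still_buffered
```

With the defaults from the snippet above, 10,000 documents leave one segment of 10,000, and 100,000 documents leave one segment of 100,000. Lowering max_merge_docs to 50,000 in the same run leaves ten segments of 10,000 that are never merged further.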

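• The Update Handler section

This section holds the low-level settings for how updates are handled internally (not to be confused with the higher-level Request Handler configuration, which governs how update requests sent by clients are processed).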

<updateHandler class="solr.DirectUpdateHandler2">

    <!-- Limit the number of deletions Solr will buffer during doc updating.

        Setting this lower can help bound memory use during indexing.

-->

    <maxPendingDeletes>100000</maxPendingDeletes>

    <!-- autocommit pending docs if certain criteria are met.  Future versions may expand the available

     criteria -->

    <autoCommit>

      <maxDocs>10000</maxDocs> <!-- maximum uncommitted docs before autocommit triggered -->

      <maxTime>86000</maxTime> <!-- maximum time (in MS) after adding a doc before an autocommit is triggered -->

    </autoCommit>

• "Update" Related Event Listeners

Listeners can be registered for specific update-related events ("postCommit" and "postOptimize"). A listener can trigger arbitrary code; the typical use is taking index snapshots.

...

    <!-- The RunExecutableListener executes an external command.

         exe  - the name of the executable to run

         dir  -  dir to use as the current working directory. default="."

         wait - the calling thread waits until the executable returns.

                default="true"

         args - the arguments to pass to the program.  default=nothing

         env  - environment variables to set.  default=nothing

-->

    <!-- A postCommit event is fired after every commit

-->

    <listener event="postCommit" class="solr.RunExecutableListener">

      <str name="exe">snapshooter</str>

      <str name="dir">solr/bin</str>

      <bool name="wait">true</bool>

      <!--

      <arr name="args"> <str>arg1</str> <str>arg2</str> </arr>

      <arr name="env"> <str>MYVAR=val1</str> </arr>

-->

    </listener>

  </updateHandler>
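The two autoCommit criteria in the updateHandler block (maxDocs and maxTime) can be read as "commit when either limit is reached". A minimal sketch of that rule follows. It is an illustration under that assumption, not Solr's DirectUpdateHandler2; the class and method names are hypothetical, and time is passed in explicitly so the logic is easy to follow.

```python
class AutoCommitPolicy:
    """Tracks uncommitted docs and decides when an autocommit is due."""

    def __init__(self, max_docs=10000, max_time_ms=86000):
        self.max_docs = max_docs
        self.max_time_ms = max_time_ms
        self.pending = 0              # uncommitted documents
        self.first_pending_at = None  # when the oldest uncommitted doc arrived

    def add_doc(self, now_ms):
        """Record one added document; return True if a commit is now due."""
        if self.pending == 0:
            self.first_pending_at = now_ms
        self.pending += 1
        return self.should_commit(now_ms)

    def should_commit(self, now_ms):
        if self.pending == 0:
            return False
        too_many = self.pending >= self.max_docs
        too_old = now_ms - self.first_pending_at >= self.max_time_ms
        return too_many or too_old

    def commit(self):
        """Reset the counters once the commit has happened."""
        self.pending = 0
        self.first_pending_at = None
```

With the values from the snippet, a burst of 10,000 documents triggers a commit at once, while a single stray document is committed after at most 86 seconds.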

• The Query section

Controls everything related to queries.

<query>

    <!-- Maximum number of clauses in a boolean query... can affect range

         or wildcard queries that expand to big boolean queries.

         An exception is thrown if exceeded.

-->

    <maxBooleanClauses>1024</maxBooleanClauses>
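Why a wildcard or range query can hit maxBooleanClauses is easier to see with a toy expansion. The sketch below is an assumption-laden illustration, not Lucene code: it treats a prefix query as one clause per matching indexed term and raises once the limit is passed. The function and exception names are defined here for the example only.

```python
class TooManyClauses(Exception):
    """Raised when an expanded query exceeds the clause limit (illustrative)."""


def expand_prefix(prefix, indexed_terms, max_clauses=1024):
    """Expand a prefix query into one clause per matching indexed term."""
    clauses = [term for term in sorted(indexed_terms) if term.startswith(prefix)]
    if len(clauses) > max_clauses:
        raise TooManyClauses(
            f"{len(clauses)} clauses exceed the limit of {max_clauses}"
        )
    return clauses
```

A short prefix over a field with many distinct terms is the usual way to reach the limit, so raising the limit or narrowing the prefix are the two obvious remedies.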

• The Caching section

Revisit the settings in this section whenever the index grows or changes.

<!-- Cache used by SolrIndexSearcher for filters (DocSets),

         unordered sets of *all* documents that match a query.

         When a new searcher is opened, its caches may be prepopulated

         or "autowarmed" using data from caches in the old searcher.

         autowarmCount is the number of items to prepopulate.  For LRUCache,

         the autowarmed items will be the most recently accessed items.

       Parameters:

         class - the SolrCache implementation (currently only LRUCache)

         size - the maximum number of entries in the cache

         initialSize - the initial capacity (number of entries) of

           the cache.  (see java.util.HashMap)

         autowarmCount - the number of entries to prepopulate from

           an old cache.

-->

    <filterCache

      class="solr.LRUCache"

      size="512"

      initialSize="512"

      autowarmCount="256"/>

   <!-- queryResultCache caches results of searches - ordered lists of

         document ids (DocList) based on a query, a sort, and the range

         of documents requested.  -->

    <queryResultCache

      class="solr.LRUCache"

      size="512"

      initialSize="512"

      autowarmCount="256"/>

  <!-- documentCache caches Lucene Document objects (the stored fields for each document).

       Since Lucene internal document ids are transient, this cache will not be autowarmed.  -->

    <documentCache

      class="solr.LRUCache"

      size="512"

      initialSize="512"

      autowarmCount="0"/>
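How size and autowarmCount interact can be sketched with a small LRU cache. This is a simplified model for illustration, not Solr's LRUCache: the class, the warm method, and the regenerate callback are hypothetical names, and regeneration is reduced to a function call where Solr would re-run the entry against the new searcher.

```python
from collections import OrderedDict


class LRUCache:
    """Bounded cache that evicts the least recently used entry."""

    def __init__(self, size=512, autowarm_count=256):
        self.size = size
        self.autowarm_count = autowarm_count
        self._entries = OrderedDict()  # least recently used entry first

    def get(self, key):
        if key not in self._entries:
            return None
        self._entries.move_to_end(key)  # mark as most recently used
        return self._entries[key]

    def put(self, key, value):
        self._entries[key] = value
        self._entries.move_to_end(key)
        while len(self._entries) > self.size:
            self._entries.popitem(last=False)  # drop the oldest entry

    def warm(self, old_cache, regenerate):
        """Prepopulate from the most recently used entries of old_cache."""
        if self.autowarm_count <= 0:
            return  # e.g. documentCache, which is never autowarmed
        recent = list(old_cache._entries.items())[-self.autowarm_count:]
        for key, old_value in recent:
            self.put(key, regenerate(key, old_value))
```

A larger autowarmCount gives a new searcher a warmer cache at the cost of a longer warm-up before it can serve requests.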

    <!-- Example of a generic cache.  These caches may be accessed by name

         through SolrIndexSearcher.getCache().cacheLookup(), and cacheInsert().

         The purpose is to enable easy caching of user/application level data.

         The regenerator argument should be specified as an implementation

         of solr.search.CacheRegenerator if autowarming is desired.  -->

    <!--

    <cache name="myUserCache"

      class="solr.LRUCache"

      size="4096"

      initialSize="1024"

      autowarmCount="1024"

      regenerator="org.mycompany.mypackage.MyRegenerator"

/>

-->

    <!-- An optimization that attempts to use a filter to satisfy a search.

         If the requested sort does not include a score, then the filterCache

         will be checked for a filter matching the query.  If found, the filter

         will be used as the source of document ids, and then the sort will be

         applied to that.

-->

    <useFilterForSortedQuery>true</useFilterForSortedQuery>

    <!-- An optimization for use with the queryResultCache.  When a search

         is requested, a superset of the requested number of document ids

         are collected.  For example, if a search for a particular query

         requests matching documents 10 through 19, and queryResultWindowSize is 50,

         then documents 0 through 49 will be collected and cached. Any further

         requests in that range can be satisfied via the cache.

-->

    <queryResultWindowSize>50</queryResultWindowSize>
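The window arithmetic behind queryResultWindowSize can be written out in a few lines. The sketch assumes that collection always starts at document 0 and that the number collected is rounded up to the next multiple of the window size; the function name is hypothetical and the code is an illustration, not taken from SolrIndexSearcher.

```python
def docs_to_collect(start, rows, window_size=50):
    """Return how many leading document ids get collected and cached."""
    requested_end = start + rows  # one past the last requested document
    # Ceiling division, with at least one full window collected.
    windows = max(1, -(-requested_end // window_size))
    return windows * window_size
```

For the example in the comment above (documents 10 through 19, window size 50), this gives 50, so follow-up requests for pages within the first 50 results can be served from the queryResultCache.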

    <!-- This entry enables an int hash representation for filters (DocSets)

         when the number of items in the set is less than maxSize. For smaller

         sets, this representation is more memory efficient, more efficient to

         iterate over, and faster to take intersections.

-->

    <HashDocSet maxSize="3000" loadFactor="0.75"/>

    <!-- boolToFilterOptimizer converts boolean clauses with zero boost into

         cached filters if the number of docs selected by the clause exceeds the

         threshold (represented as a fraction of the total index)

-->

    <boolTofilterOptimizer enabled="true" cacheSize="32" threshold=".05"/>

    <!-- Lazy field loading will attempt to read only parts of documents on disk that are

         requested.  Enabling should be faster if you aren't retrieving all stored fields.

-->

    <enableLazyFieldLoading>false</enableLazyFieldLoading>

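• "Query" Related Event Listeners

Listeners for specific query-related events are defined here. A listener can run whatever code is needed, for example firing common queries to warm the caches.

[newSearcher] Fired when a new searcher is being prepared while a registered searcher already exists. The listener in the example that follows is of this kind: it takes a list of queries and sends them to the new searcher in order to warm it.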

<!-- a newSearcher event is fired whenever a new searcher is being

         prepared and there is a current searcher handling requests

         (aka registered).

-->

    <!-- QuerySenderListener takes an array of NamedList and

         executes a local query request for each NamedList in sequence.

-->

    <!--

    <listener event="newSearcher" class="solr.QuerySenderListener">

      <arr name="queries">

        <lst> <str name="q">solr</str>

              <str name="start">0</str>

              <str name="rows">10</str>

        </lst>

        <lst> <str name="q">rocks</str>

              <str name="start">0</str>

              <str name="rows">10</str>

        </lst>

      </arr>

    </listener>

-->

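[firstSearcher]

Fired when a new searcher is being prepared and no registered searcher exists yet. The example that follows is exactly this case: the listener takes a list of queries and sends them to the searcher being started in order to warm it. (Note that auto-warming is only possible when a registered searcher already exists.)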

<!-- a firstSearcher event is fired whenever a new searcher is being

         prepared but there is no current registered searcher to handle

         requests or to gain prewarming data from.

-->

    <!--

    <listener event="firstSearcher" class="solr.QuerySenderListener">

      <arr name="queries">

        <lst> <str name="q">fast_warm</str>

              <str name="start">0</str>

              <str name="rows">10</str>

        </lst>

      </arr>

    </listener>

-->

  </query>
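The difference between the two events comes down to whether a registered searcher already exists. A minimal sketch of that dispatch follows; it is not Solr's implementation. RecordingSearcher is a stand-in for a real searcher, and the listeners dictionary is a hypothetical representation of the query lists configured in the listener elements.

```python
class RecordingSearcher:
    """Stand-in searcher that only records the queries it was asked to run."""

    def __init__(self):
        self.executed = []

    def search(self, params):
        self.executed.append(params)


def warm_new_searcher(new_searcher, registered_searcher, listeners):
    """Send the warming queries for the matching event to new_searcher."""
    if registered_searcher is not None:
        event = "newSearcher"    # a current searcher is handling requests
    else:
        event = "firstSearcher"  # nothing registered yet, e.g. at startup
    for params in listeners.get(event, []):
        new_searcher.search(params)
    return event


# Query lists mirroring the commented-out examples above.
listeners = {
    "newSearcher": [
        {"q": "solr", "start": "0", "rows": "10"},
        {"q": "rocks", "start": "0", "rows": "10"},
    ],
    "firstSearcher": [{"q": "fast_warm", "start": "0", "rows": "10"}],
}
```

Because nothing can be autowarmed from an old cache at startup, the firstSearcher queries are the only warming a fresh instance gets in this model.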

2012-07-11 11:35