
Solr master-slave replication

Article URL: http://quentinxxz.iteye.com/blog/2102592
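    The index replication described here transfers data over HTTP and was introduced in Solr 1.4. For the ssh/rsync-based replication introduced in Solr 1.1, see CollectionDistribution. Note that SolrCloud in Solr 4.0 implements replication with a push model, so this style of replication is no longer necessary there.
 (So there appear to be three kinds of replication: the ssh/rsync-based one from Solr 1.1, which needs extra configuration; the one from Solr 1.4, which this article covers; and the SolrCloud one from Solr 4.0.)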
 
Reference
http://wiki.apache.org/solr/SolrReplication
 
 

Features

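  • Replication needs no external configuration scripts
  • Configuration lives in solrconfig.xml only
  • Configuration files are replicated as well
  • The same configuration works across platforms
  • No dependence on the OS
  • Tightly integrated with Solr; the admin page offers fine-grained control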

Configuration

The new Java-based replication feature is implemented as a RequestHandler, so no external configuration files are needed.

Master

<requestHandler name="/replication" class="solr.ReplicationHandler" >
    <lst name="master">
        <!--Replicate on 'startup' and 'commit'. 'optimize' is also a valid value for replicateAfter. -->
        <str name="replicateAfter">startup</str>
        <str name="replicateAfter">commit</str>
        <!--Create a backup after 'optimize'. Other values can be 'commit', 'startup'. It is possible to have multiple entries of this config string.  Note that this is just for backup, replication does not require this. -->
        <!-- <str name="backupAfter">optimize</str> -->
        <!--If configuration files need to be replicated give the names here, separated by comma -->
        <str name="confFiles">schema.xml,stopwords.txt,elevate.xml</str>
        <!--The default value of reservation is 10 secs.See the documentation below . Normally , you should not need to specify this -->
        <str name="commitReserveDuration">00:00:10</str>
    </lst>
    <!-- keep only 1 backup.  Using this parameter precludes using the "numberToKeep" request parameter. (Solr3.6 / Solr4.0)-->
    <!-- (For this to work in conjunction with "backupAfter" with Solr 3.6.0, see bug fix https://issues.apache.org/jira/browse/SOLR-3361 )-->
    <str name="maxNumberOfBackups">1</str> 
</requestHandler>
 

Slave

 

 

<requestHandler name="/replication" class="solr.ReplicationHandler" >
    <lst name="slave">
        <!--fully qualified url to the master core. It is possible to pass on this as a request param for the fetchindex command-->
        <str name="masterUrl">http://master_host:port/solr/corename</str>
        <!--Interval in which the slave should poll master .Format is HH:mm:ss . If this is absent slave does not poll automatically.
         But a fetchindex can be triggered from the admin or the http API -->
        <str name="pollInterval">00:00:20</str>
        <!-- THE FOLLOWING PARAMETERS ARE USUALLY NOT REQUIRED-->
        <!--to use compression while transferring the index files. The possible values are internal|external
         if the value is 'external' make sure that your master Solr has the settings to honour the accept-encoding header.
         see here for details http://wiki.apache.org/solr/SolrHttpCompression
         If it is 'internal' everything will be taken care of automatically.
         USE THIS ONLY IF YOUR BANDWIDTH IS LOW . THIS CAN ACTUALLY SLOWDOWN REPLICATION IN A LAN-->
        <str name="compression">internal</str>
        <!--The following values are used when the slave connects to the master to download the index files.
         Default values implicitly set as 5000ms and 10000ms respectively. The user DOES NOT need to specify
         these unless the bandwidth is extremely low or if there is an extremely high latency-->
        <str name="httpConnTimeout">5000</str>
        <str name="httpReadTimeout">10000</str>
        <!-- If HTTP Basic authentication is enabled on the master, then the slave can be configured with the following -->
        <str name="httpBasicAuthUser">username</str>
        <str name="httpBasicAuthPassword">password</str>
     </lst>
</requestHandler>

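Note: if you are not using multiple cores, you can omit the "corename" part of masterUrl. To check that the URL is correct, open it in a browser and see whether it returns an OK response.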
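The pollInterval values in this article (00:00:20, 00:00:60) use the HH:mm:ss format described in the slave configuration comments. As a minimal sketch of how such a value maps to a number of seconds, the Python helper below sums the three fields. The function name `interval_seconds` is hypothetical, and summing the fields without range checks is an assumption. This is not Solr's own parsing code.

```python
import re

# Three colon-separated groups of digits, e.g. "00:00:20".
_INTERVAL = re.compile(r"^(\d+):(\d+):(\d+)$")


def interval_seconds(text):
    """Convert an HH:mm:ss interval string into a number of seconds.

    Assumption: the fields are simply summed, so "00:00:60" is read as
    60 seconds rather than rejected.
    """
    match = _INTERVAL.match(text.strip())
    if match is None:
        raise ValueError(f"not an HH:mm:ss interval: {text!r}")
    hours, minutes, seconds = (int(group) for group in match.groups())
    return hours * 3600 + minutes * 60 + seconds
```

Under that assumption, "00:00:60" and "00:01:00" describe the same interval.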

Setting up a Repeater

 
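If slaves download the index from a remote data center, the downloads may consume too much network bandwidth. To avoid this kind of performance degradation, you can configure one or more slaves as repeaters. Simply put, a repeater is both a slave and a master.
  • To configure a server as a repeater, put both the master and the slave configuration lists inside the ReplicationHandler requestHandler in solrconfig.xml.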

  • Be sure to have replicateAfter 'commit' setup on repeater even if replicateAfter is set to optimize on the main master. This is because on a repeater (or any slave), only a commit is called after index is downloaded. Optimize is never called on slaves.
  • Optionally, the repeater can be configured to fetch compressed files from the master to reduce index download time.
Example repeater configuration:
<requestHandler name="/replication" class="solr.ReplicationHandler">
    <lst name="master">
      <str name="replicateAfter">commit</str>
      <str name="confFiles">schema.xml,stopwords.txt,synonyms.txt</str>
    </lst>
    <lst name="slave">
      <str name="masterUrl">http://master.solr.company.com:8080/solr</str>
      <str name="pollInterval">00:00:60</str>
    </lst>
  </requestHandler> 

Enable/disable master/slave in a node

If a server needs to be switched from slave to master, or you want to use the same solrconfig.xml for both master and slave, configure it as follows:

 

<requestHandler name="/replication" class="solr.ReplicationHandler" >
  <lst name="master">
    <str name="enable">${enable.master:false}</str>
    <str name="replicateAfter">commit</str>
    <str name="confFiles">schema.xml,stopwords.txt</str>
 </lst>
 <lst name="slave">
    <str name="enable">${enable.slave:false}</str>
   <str name="masterUrl">http://master_host:8983/solr</str>
   <str name="pollInterval">00:00:60</str>
 </lst>
</requestHandler>
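Note: make sure false is the default value for both enable.slave and enable.master.
 
Note: if everything is deployed correctly but you do not see the core name link, it may be a permissions problem.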
 
Master mode

 

#solrcore.properties in master
enable.master=true
enable.slave=false

Slave mode

 

#solrcore.properties in slave
enable.master=false
enable.slave=true
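
The `${enable.master:false}` placeholders above take their value from a property such as the ones in solrcore.properties and fall back to the text after the colon when the property is not set. The Python sketch below imitates that `${name:default}` lookup to show why the same solrconfig.xml can serve both roles. The function `resolve` and its regular expression are hypothetical illustrations, not Solr's implementation.

```python
import re

# Matches ${name} or ${name:default}.
_PLACEHOLDER = re.compile(r"\$\{([^:}]+)(?::([^}]*))?\}")


def resolve(text, properties):
    """Replace ${name:default} placeholders using a dict of properties."""
    def replace(match):
        name, default = match.group(1), match.group(2)
        if name in properties:
            return properties[name]
        if default is not None:
            return default
        # Assumption: a placeholder with no value and no default is an error.
        raise KeyError(name)

    return _PLACEHOLDER.sub(replace, text)


# Properties as they would appear in solrcore.properties on the master.
master_properties = {"enable.master": "true", "enable.slave": "false"}
print(resolve("${enable.master:false}", master_properties))
print(resolve("${enable.slave:false}", {}))
```

With no properties set, both placeholders resolve to false, which is why the note above asks for false as the default.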

 

Replication with MultiCore

Add the request handler to the solrconfig.xml of each core (e.g. /usr/share/tomcat6/solr/CORENAME/conf/solrconfig.xml). You can use ${solr.core.name} to avoid hard-coding the core name.

The master configuration is unchanged:

<requestHandler name="/replication" class="solr.ReplicationHandler" >
  <lst name="master">
    <str name="replicateAfter">commit</str>
    <str name="confFiles">schema.xml,mapping-ISOLatin1Accent.txt,protwords.txt,stopwords.txt,synonyms.txt,elevate.xml</str>
  </lst>
</requestHandler>

The slave needs the core name:

<requestHandler name="/replication" class="solr.ReplicationHandler" >
  <lst name="slave">
    <str name="masterUrl">http://${MASTER_CORE_URL}/${solr.core.name}</str>
    <str name="pollInterval">${POLL_TIME}</str>
  </lst>
</requestHandler>

 

How does it work?

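This feature relies on Lucene's IndexDeletionPolicy. Through this API, Lucene exposes IndexCommits as callbacks for each commit/optimize. An IndexCommit exposes the files associated with that commit, which makes it possible to determine which files need to be replicated.

In keeping with Solr tradition, all operations are performed through a RESTful API.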
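
Since every operation is an HTTP request to the /replication handler, a command is just a URL with a `command` parameter. The Python sketch below builds such URLs. The helper `replication_url` is hypothetical; `fetchindex` and its `masterUrl` request parameter are mentioned in the slave configuration comments above, while `indexversion` is taken from the Solr wiki page listed under Reference and should be checked against your Solr version.

```python
from urllib.parse import urlencode


def replication_url(core_url, command, **params):
    """Build the URL for a ReplicationHandler command on a given core."""
    query = urlencode({"command": command, **params})
    return f"{core_url.rstrip('/')}/replication?{query}"


# Ask the master for its current index version.
print(replication_url("http://master_host:8983/solr/corename", "indexversion"))

# Ask a slave to pull the index now, overriding the configured masterUrl.
print(replication_url("http://slave_host:8983/solr", "fetchindex",
                      masterUrl="http://master_host:8983/solr"))
```

The host names and port are the placeholders used in the configuration examples, not real endpoints.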
 
 

How does the slave replicate?

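The master is unaware of the slaves. Each slave polls the master continuously (at the interval set by the 'pollInterval' parameter) to check the master's current index version. If the slave finds that the master has a newer version of the index, it starts the replication process.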

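  • The slave issues a filelist command to get the list of files. Along with the file names, the command returns metadata (size, lastmodified, alias).
  • The slave checks its local index and then downloads the missing files (the command name is "filecontent"). This uses a custom format (akin to HTTP chunked encoding) to download the full or partial content of each file. If the connection breaks, the download resumes from the point of the last failure. At any point, up to 5 attempts are made; if all of them fail, the whole replication is abandoned.
  • Files are downloaded into a temporary directory, so if the slave or the master crashes during replication nothing is lost; the current replication simply stops.
  • Once the download completes, all new files are moved into the slave's live index directory, and their timestamps match those on the master.
  • The slave's ReplicationHandler issues a 'commit' command and the new index is loaded.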
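
The resume-and-retry behaviour in the second step can be sketched in Python as follows. This is an illustration of the idea only: `read_chunk` is a hypothetical callable standing in for a filecontent request, and resetting the failure counter after each successful chunk is an assumption about what "5 attempts at any point" means.

```python
class FetchAborted(Exception):
    """Raised when the download gives up after repeated failures."""


def download_with_resume(read_chunk, total_size, max_attempts=5):
    """Download total_size bytes, resuming from the last good offset.

    read_chunk(offset) is a hypothetical callable that returns the next
    bytes starting at offset, or raises ConnectionError when the
    connection breaks.
    """
    data = bytearray()
    failures = 0
    while len(data) < total_size:
        try:
            chunk = read_chunk(len(data))
            if not chunk:
                raise ConnectionError("empty response")
        except ConnectionError:
            failures += 1
            if failures >= max_attempts:
                # Abandon the whole replication, as described above.
                raise FetchAborted(f"gave up at offset {len(data)}")
            continue  # retry from the same offset
        data.extend(chunk)
        failures = 0  # progress was made, so start counting afresh
    return bytes(data)
```

Because the bytes received so far are kept and the next request starts at `len(data)`, a dropped connection costs only the chunk that was in flight.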

 

How are configuration files replicated?

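  • Files to be replicated must be listed explicitly in the 'confFiles' parameter.
  • Only files in the 'conf' directory of the Solr instance are replicated.
  • Configuration files are replicated only along with the index. Even if a file changes on the master, it is replicated only after a new commit/optimize on the master.
  • Unlike index files, configuration files have no usable timestamps, so they are compared by checksum. If schema.xml has the same checksum on the master and on the slave, the two copies are treated as identical.
  • Configuration files are also downloaded to a temporary directory before being moved to their destination. The old files are renamed, and ReplicationHandler does not clean them up automatically.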

  • If a replication includes a newer version of a conf file, the corresponding Solr core is reloaded instead of a 'commit' command being issued.
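
The checksum comparison described above can be sketched in Python as follows. The function names are hypothetical, file contents are passed in as bytes to keep the example self-contained, and Adler-32 is an assumed choice of checksum; the text above only says that a checksum is used.

```python
import zlib


def checksum(content):
    """Checksum of a file's bytes (Adler-32 is an assumption here)."""
    return zlib.adler32(content)


def changed_conf_files(conf_files, master_conf, slave_conf):
    """Return the confFiles entries whose content differs on the slave.

    conf_files is the comma-separated confFiles value; master_conf and
    slave_conf map file names to file contents (bytes).
    """
    changed = []
    for name in (part.strip() for part in conf_files.split(",")):
        if name not in master_conf:
            continue  # nothing to replicate for this entry
        local = slave_conf.get(name)
        if local is None or checksum(local) != checksum(master_conf[name]):
            changed.append(name)
    return changed
```

A non-empty result corresponds to the last bullet: the slave would reload the core rather than only issue a commit.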

What if I add documents to the slave or if slave index gets corrupted?

If docs are added to the slave, then the slave is no longer in sync with the master. However, it does nothing to get back in sync until the master has a newer index. When a commit happens on the master, the index version of the master becomes different from that of the slave. The slave fetches the list of files and finds that some of the files (same name) are present in the local index with a different size/timestamp. This means that the master and slave have incompatible indexes. The slave then copies all the files from the master (there may be scope to optimize this, but this is a rare case and may not be worth it) to a new index dir and asks the core to load the fresh index from the new directory.
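
The decision described above can be sketched in Python: a file with the same name but a different size or timestamp marks the indexes as incompatible and forces a full copy, otherwise only the missing files are fetched. The function `plan_fetch` and the dict-based file lists are hypothetical stand-ins for the filelist response, not Solr's data structures.

```python
def plan_fetch(master_files, local_files):
    """Decide between a full copy and an incremental download.

    Both arguments map a file name to a (size, timestamp) tuple.
    Returns a (mode, file_names) pair.
    """
    conflicting = [
        name for name, meta in master_files.items()
        if name in local_files and local_files[name] != meta
    ]
    if conflicting:
        # Incompatible indexes: copy everything into a new index dir.
        return "full", sorted(master_files)
    missing = sorted(name for name in master_files if name not in local_files)
    return "incremental", missing
```

For example, if the slave's own commit rewrote a segment file that the master also has under the same name, `plan_fetch` returns "full" together with the master's complete file list.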


 