Solr 4.x Full Indexing - Importing and Indexing Database Data - Josh_Persistence

Josh_Persistence

Blog category:

Solr

Tags: solr4, solr4 incremental indexing, solr4 data import with incremental indexing

This post uses Solr 4.6.1 as the example.

I. Preparation

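1. Copy solr-dataimporthandler-4.6.1.jar into the lib directory of the Solr webapp in Tomcat

From the dist directory of the downloaded Solr distribution, e.g. C:\Software\solr-4.6.1\dist, copy solr-dataimporthandler-4.6.1.jar to C:\Software\apache-tomcat-7.0.50\webapps\solr\WEB-INF\lib.

Also make sure the MySQL JDBC driver jar is in Tomcat's lib directory (Tomcat\lib).

2. Assume that Solr Home (C:\solr-tomcat\solr) already contains a Solr core named "item" (C:\solr-tomcat\solr\item). You can of course use one of your existing cores or create a new one instead.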
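Because Solr only finds the JDBC driver if the jar really is on Tomcat's classpath, a quick check before starting Tomcat is to try loading the driver class with the same jars. The helper below is a hypothetical sketch (DriverCheck is not part of Solr or Tomcat); it assumes MySQL Connector/J 5.x, whose driver class is com.mysql.jdbc.Driver (Connector/J 8.x renamed it to com.mysql.cj.jdbc.Driver).

```java
// Hypothetical helper (not part of Solr): checks whether a JDBC driver class is loadable.
final class DriverCheck {

    /** Returns true if the named class can be found by this class's class loader. */
    static boolean isOnClasspath(String className) {
        try {
            // initialize=false: only locate the class, do not run its static initializer
            Class.forName(className, false, DriverCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumed driver class of MySQL Connector/J 5.x
        String driver = "com.mysql.jdbc.Driver";
        System.out.println(driver + " on classpath: " + isOnClasspath(driver));
    }
}
```

Run it with Tomcat's lib directory on the classpath, for example `java -cp "C:\Software\apache-tomcat-7.0.50\lib\*;." DriverCheck`, so that it sees the same jars Tomcat will.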

II. Configure the DataImportHandler in solrconfig.xml

Edit solrconfig.xml in C:\solr-tomcat\solr\item\conf and add the following:

<requestHandler name="/dataimport"
                class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">C:\solr-tomcat\solr\item\conf\data-config.xml</str>
  </lst>
</requestHandler>

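Note: the path C:\solr-tomcat\solr\item\conf\data-config.xml must not have any whitespace around it (in particular, no leading space). Otherwise Solr fails with a "resource not found" exception such as: Can't find resource ' C:\solr-tomcat\solr\item\conf\data-config.xml ' in classpath or 'C:\solr-tomcat\solr\item\conf'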

III. In the same directory as solrconfig.xml, i.e. C:\solr-tomcat\solr\item\conf, add the data-config.xml referenced above

Define the data source and the entity in it.

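Notes: 1. The URL jdbc:mysql://localhost/solr is written without a port number, so MySQL Connector/J connects to MySQL's default port 3306; if your MySQL server listens on a different port, add it explicitly (jdbc:mysql://localhost:<port>/solr).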

2. The database driver jar must be in Tomcat's lib directory.

3. Replace the url, user name and password with the values for your own database.

4. In jdbc:mysql://localhost/solr, solr is the database name.

5. solr_item is the table name.
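For orientation, here is a sketch of what a minimal data-config.xml along these lines could contain: one JdbcDataSource and one entity whose query reads the table. It assumes the database solr, the table solr_item and the columns id, NAME and POPULRITY used in this example; the entity name item and the credentials root/secret are placeholders, and DataConfigSketch is a hypothetical helper that simply prints the XML.

```java
// Hypothetical helper (not part of Solr): renders a minimal DIH data-config.xml.
final class DataConfigSketch {

    /** Escapes characters that are not allowed verbatim inside an XML attribute value. */
    static String esc(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    static String render(String jdbcUrl, String user, String password,
                         String entity, String table, String... columns) {
        StringBuilder sb = new StringBuilder();
        sb.append("<dataConfig>\n");
        // JdbcDataSource reads rows through the JDBC driver in Tomcat\lib
        sb.append("  <dataSource type=\"JdbcDataSource\" driver=\"com.mysql.jdbc.Driver\"")
          .append(" url=\"").append(esc(jdbcUrl)).append("\"")
          .append(" user=\"").append(esc(user)).append("\"")
          .append(" password=\"").append(esc(password)).append("\"/>\n");
        sb.append("  <document>\n");
        sb.append("    <entity name=\"").append(esc(entity)).append("\"")
          .append(" query=\"select ").append(esc(String.join(", ", columns)))
          .append(" from ").append(esc(table)).append("\">\n");
        for (String c : columns) {
            // column = database column, name = Solr field (kept identical here)
            sb.append("      <field column=\"").append(esc(c))
              .append("\" name=\"").append(esc(c)).append("\"/>\n");
        }
        sb.append("    </entity>\n  </document>\n</dataConfig>\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(render("jdbc:mysql://localhost/solr", "root", "secret",
                "item", "solr_item", "id", "NAME", "POPULRITY"));
    }
}
```

The printed output would be saved as C:\solr-tomcat\solr\item\conf\data-config.xml; schema.xml still needs matching field definitions (step IV).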

IV. In schema.xml, define every field that does not exist yet (in this example all fields except id are new, so they all need to be defined):

<field name="POPULRITY" type="string" indexed="true" stored="true" omitNorms="true"/>  
<field name="NAME" type="string" indexed="true" stored="true" omitNorms="true"/>
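Solr field names are case-sensitive, so NAME and name are different fields. A tiny hypothetical check (SchemaCheck is not part of Solr) that lists which entity columns still lack a schema field could look like this; the existing schema field names are passed in by hand here rather than parsed from schema.xml.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical helper (not part of Solr): which entity columns have no schema field yet?
final class SchemaCheck {

    /** Returns the entity columns, in their original order, that are not in schemaFields. */
    static Set<String> missingFields(Set<String> schemaFields, String... entityColumns) {
        Set<String> missing = new LinkedHashSet<>(Arrays.asList(entityColumns));
        missing.removeAll(schemaFields); // exact, case-sensitive comparison
        return missing;
    }

    public static void main(String[] args) {
        // Assumed: schema.xml defines only id so far, as in this example
        Set<String> schema = new LinkedHashSet<>(Arrays.asList("id"));
        System.out.println(missingFields(schema, "id", "NAME", "POPULRITY"));
    }
}
```

For this example it reports NAME and POPULRITY, the two fields defined above.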

V. Open the Solr admin page, select the item core and click Dataimport; the DataImportHandler you just configured now shows up there.

Select the entity, tick commit and optimize, then click Execute to start the import. You can choose either full-import or delta-import as the command.
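The options on the Dataimport page map to request parameters of the /dataimport handler: command, entity, clean, commit and optimize. The hypothetical helper below (DihUrl is not part of Solr) assembles such a URL, assuming the core URL http://localhost:9898/solr/item used later in this post. With clean=true, which is also the default for full-import, the existing documents are deleted before re-indexing.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper (not part of Solr): builds a DataImportHandler request URL.
final class DihUrl {

    /** entity may be null to import all entities defined in data-config.xml. */
    static String build(String coreUrl, String command, String entity,
                        boolean clean, boolean commit, boolean optimize) {
        StringBuilder sb = new StringBuilder(coreUrl)
                .append("/dataimport?command=").append(enc(command));
        if (entity != null) {
            sb.append("&entity=").append(enc(entity));
        }
        sb.append("&clean=").append(clean)       // delete existing documents first
          .append("&commit=").append(commit)     // commit when the import finishes
          .append("&optimize=").append(optimize); // optimize the index afterwards
        return sb.toString();
    }

    private static String enc(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always supported
        }
    }
}
```

For example, `DihUrl.build("http://localhost:9898/solr/item", "full-import", "item", true, true, true)` corresponds to ticking the entity, clean, commit and optimize in the UI.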

VI. You can also trigger full-import or delta-import from code. Sample code follows.

1. Full import: note that a full import deletes the entire existing index and rebuilds it, so it can be slow when the data volume is large; use it with care.

Code snippets:

1) The AbstractSolrServer class creates the connection to Solr

package com.wsheng.solr;


import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;

/**
 * @author Josh Wang(Sheng)
 *
 * @email  josh_wang23@hotmail.com
 */
public class AbstractSolrServer {
	
	/**TODO: Using Spring injection*/
	public static SolrServer server;
	
	static {
		server = new HttpSolrServer("http://localhost:9898/solr/item"); // "item" is the Solr core name
	}
}

2) The SolrImport class wraps Solr's import commands

package com.wsheng.solr.command;

import com.wsheng.solr.AbstractSolrServer;
import com.wsheng.solr.util.HttpUtils;

/**
 * The API also should be moved to ISolrActionService
 * 
 * @author Josh Wang(Sheng)
 *
 * @email  josh_wang23@hotmail.com
 */
public class SolrImport extends AbstractSolrServer {

	// URL of the DataImportHandler registered as /dataimport in solrconfig.xml
	private static final String CONFIGURATION_URL = "http://localhost:9898/solr/item/dataimport";
	
	private static final String FULL_IMPORT_URL = "http://localhost:9898/solr/item/dataimport?command=full-import";
	
	private static final String DELTA_IMPORT_URL = "http://localhost:9898/solr/item/dataimport?command=delta-import";
	
	
	public static String verify() throws Exception {
		return HttpUtils.handleSolrReq(CONFIGURATION_URL);
	}
	
	// DIH runs imports in a background thread: these calls return once Solr
	// has accepted the command, not when indexing has finished
	public static void fullImport() throws Exception {
		String result = HttpUtils.handleSolrReq(FULL_IMPORT_URL);
		System.out.println("Full import requested: " + result);
	}
	
	public static void deltaImport() throws Exception {
		String result = HttpUtils.handleSolrReq(DELTA_IMPORT_URL);
		System.out.println("Delta import requested: " + result);
	}
	
}

3) The HttpUtils class uses the Jersey client to send Solr's HTTP requests, such as the import commands

package com.wsheng.solr.util;

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

import com.sun.jersey.api.client.Client;
import com.sun.jersey.api.client.ClientResponse;
import com.sun.jersey.api.client.WebResource;


/**
 * @author Josh Wang(Sheng)
 * @email swang6@email.com
 */
public class HttpUtils {
    
    private static Log logger = LogFactory.getLog(HttpUtils.class);
    
    public static String handleSolrReq(String url) throws Exception {
            
        Client client = Client.create();
 
        WebResource webResource = client.resource(url);
 
        ClientResponse response = webResource.accept("application/json").get(ClientResponse.class);
 
        if (response.getStatus() != 200) {
            logger.error("Failed...." + response.getStatus());
            System.out.println("Failed...." + response.getStatus() + "  " + response.getEntity(String.class));
            throw new RuntimeException("Failed : HTTP error code : " + response.getStatus());
        }
 
        String result = response.getEntity(String.class);
              
        return result;
    }
    
    
}
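If pulling in Jersey just to send GET requests is not desirable, a JDK-only variant is possible. The sketch below (JdkHttpUtils is hypothetical, not the author's class) uses java.net.HttpURLConnection with the same contract: return the body on HTTP 200, fail otherwise. Note that Solr 4 selects the response format from the wt request parameter (XML by default) rather than from the Accept header, so append &wt=json to the URL if JSON is wanted.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;

// Hypothetical JDK-only alternative to the Jersey-based HttpUtils.
final class JdkHttpUtils {

    /** Sends a GET request and returns the body; throws IOException on any status other than 200. */
    static String handleSolrReq(String url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
        conn.setRequestMethod("GET");
        conn.setConnectTimeout(5000); // ms to establish the connection
        conn.setReadTimeout(30000);   // ms to wait for response data
        try {
            int status = conn.getResponseCode();
            if (status != 200) {
                throw new IOException("Failed : HTTP error code : " + status);
            }
            try (InputStream in = conn.getInputStream()) {
                return readAll(in);
            }
        } finally {
            conn.disconnect();
        }
    }

    /** Reads the whole stream and decodes it as UTF-8 (Solr's response encoding). */
    private static String readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }
}
```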

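4) A JUnit test method that triggers the full import (query() and queryByManu() are query helpers of the same test class that are not shown here):

	@Test
	public void fullImport() {
		try {
			// full import deletes the existing index and rebuilds it from the database
			SolrImport.fullImport();
			// DIH imports asynchronously, so these queries may run
			// before the import has finished
			this.query();
			System.out.println("=============");
			this.queryByManu();
		} catch (Exception e) {
			e.printStackTrace();
		}
	}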

Running this unit test triggers the full import, which deletes the entire index and rebuilds it.
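Because the full-import command only starts the import, a test that queries right afterwards may see a half-built index. One option is to poll the handler with command=status until it reports idle. The sketch below is hypothetical (DihStatus is not part of SolrJ); it assumes the status reads busy as soon as the import has been accepted, as the DIH documentation describes, and it parses both the default XML response (`<str name="status">idle</str>`) and a wt=json response (`"status":"idle"`).

```java
import java.util.function.Supplier;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not part of SolrJ): waits until DIH reports "idle".
final class DihStatus {

    // wt=json: "status":"idle" (the numeric responseHeader status is unquoted, so it never matches)
    private static final Pattern JSON_STATUS = Pattern.compile("\"status\"\\s*:\\s*\"(\\w+)\"");
    // default XML: <str name="status">idle</str> (the responseHeader uses <int name="status">)
    private static final Pattern XML_STATUS = Pattern.compile("<str name=\"status\">(\\w+)</str>");

    /** Extracts the DIH status ("idle", "busy") from a status response, or null if none is found. */
    static String parseStatus(String body) {
        Matcher m = JSON_STATUS.matcher(body);
        if (m.find()) {
            return m.group(1);
        }
        m = XML_STATUS.matcher(body);
        return m.find() ? m.group(1) : null;
    }

    /** Calls fetchStatus up to maxPolls times, pausing pollMillis between calls, until it reports idle. */
    static void waitUntilIdle(Supplier<String> fetchStatus, long pollMillis, int maxPolls) {
        for (int i = 0; i < maxPolls; i++) {
            if ("idle".equals(parseStatus(fetchStatus.get()))) {
                return;
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("Interrupted while waiting for DIH", e);
            }
        }
        throw new IllegalStateException("DIH still not idle after " + maxPolls + " polls");
    }
}
```

In the unit test, the supplier could call HttpUtils.handleSolrReq("http://localhost:9898/solr/item/dataimport?command=status") (wrapping its checked exception in a RuntimeException), with waitUntilIdle placed between SolrImport.fullImport() and the queries.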

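PS: After each full-import run, you can check how the files in the core's index directory under Solr Home have changed; in this example, look at the modification times of the files in C:\solr-tomcat\solr\item\data\index.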
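The same check can be scripted: take a snapshot of file names and modification times before and after a full import and compare them. IndexDirWatch below is a hypothetical sketch using java.nio.file; the index path is the one from this example.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.attribute.FileTime;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: lists the files of a Lucene index directory with their modification times.
final class IndexDirWatch {

    /** Returns file name -> last-modified time for the regular files in dir, sorted by name. */
    static Map<String, FileTime> snapshot(Path dir) throws IOException {
        Map<String, FileTime> result = new TreeMap<>();
        try (DirectoryStream<Path> files = Files.newDirectoryStream(dir)) {
            for (Path f : files) {
                if (Files.isRegularFile(f)) { // skip subdirectories
                    result.put(f.getFileName().toString(), Files.getLastModifiedTime(f));
                }
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        Path index = Paths.get("C:\\solr-tomcat\\solr\\item\\data\\index");
        for (Map.Entry<String, FileTime> e : snapshot(index).entrySet()) {
            System.out.println(e.getValue() + "  " + e.getKey());
        }
    }
}
```

Comparing two snapshots (for example with Map.equals) shows whether the import rewrote the index files.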

2. For incremental indexing, see my other post: Solr 4.x Scheduled and Real-Time Incremental Indexing - Updating, Deleting and Adding Index Entries


Posted: 2014-02-13 16:16, category: Enterprise Architecture
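Related posts

Integrating Solr 5.x with Tomcat 8.x (and adding a new core)

Setting up Solr 5.x (with Solr's bundled Jetty server) and mmseg4j Chinese word segmentation

Field, CopyField and DynamicField in Solr

Solr Cache explained

Integrating Solr 5.x with mmseg4j (using Solr's bundled Jetty server)

Forward and backward maximum matching in Chinese word segmentation

SolrCloud: distributed indexing and integration with ZooKeeper

Using Chinese stop words (的, 地, 得) with mmseg4j in Solr

Solr Facet Field (Group by field)

Solr: Field, CopyField, DynamicField

Integrating mmseg4j-1.9.1 Chinese word segmentation into Solr 4.7.0

Printing Solr (4.7.0) logs in Tomcat 7 - configuring logging is a necessary step when setting up Solr

Adding a Solr instance (core) in Solr 4.x (4.7.0)

Setting up Solr 4.x (4.7.0) in Tomcat 7

Solr 4.x Scheduled and Real-Time Incremental Indexing - Updating, Deleting and Adding Index Entries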
