Creating a Solr Index: the XML Format

 

Solr receives commands, and possibly document data, through HTTP POST. One way to send an HTTP POST is with the Unix command-line program curl (also available on Windows through Cygwin: http://www.cygwin.com), and that's what we'll use in the examples here. An alternative cross-platform option that ships with Solr is post.jar, located in Solr's example/exampledocs directory. To get some basic help on how to use it, run the following command:

>> java -jar example/exampledocs/post.jar -help

You'll see in a bit that you can post name-value pair options as HTML form data. However, post.jar doesn't support that, so you'll be forced to specify the URL and put the options in the query string. (Opening post.jar reveals a single class, SimplePostTool, which forwards the index-creation request. The Solr server URL is hard-coded as public static final String DEFAULT_POST_URL = "http://localhost:8983/solr/update", so the tool can't be used against a Solr service you've deployed elsewhere.)

 

There are several ways to tell Solr to index data, and all of them are through HTTP POST:

•     Send the data as the entire POST payload. curl does this with --data-binary (or a similar option) and an appropriate content-type header for whatever the format is.
•     Send some name-value pairs akin to an HTML form submission. With curl, such pairs are preceded by -F. If you're giving data to Solr to be indexed, as opposed to having it look for the data in a database, then there are a few ways to do that:
     ° Put the data into the stream.body parameter. If it's small, perhaps less than a megabyte, then this approach is fine. The limit is configured with the multipartUploadLimitInKB setting in solrconfig.xml, defaulting to 2MB. If you're tempted to increase this limit, you should reconsider your approach.
     ° Refer to the data through either a local file on the Solr server using the stream.file parameter, or a URL that Solr will fetch through the stream.url parameter. These choices are a feature that Solr calls remote streaming.
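The three approaches above can be sketched as small request builders. This is an illustrative helper, not a Solr or curl API; only the endpoint path and the stream.body/stream.file/stream.url parameter names come from the text above.

```python
from urllib.parse import urlencode

# Assumed endpoint; adjust to match your deployment.
SOLR_UPDATE = "http://localhost:8983/solr/update"

def raw_post(xml_bytes):
    """Option 1: the data is the entire POST payload (curl --data-binary)."""
    headers = {"Content-type": "text/xml; charset=utf-8"}
    return SOLR_UPDATE, headers, xml_bytes

def form_post(xml_text):
    """Option 2a: the data rides in the stream.body form field (curl -F)."""
    return SOLR_UPDATE, {"stream.body": xml_text}

def remote_stream(path=None, url=None):
    """Option 2b: refer to the data by file path or URL (remote streaming)."""
    params = {"stream.file": path} if path else {"stream.url": url}
    return SOLR_UPDATE + "?" + urlencode(params)
```

For example, remote_stream(path="/tmp/artists.xml") yields the update URL with the path percent-encoded into the query string; an HTTP library such as urllib.request would then perform the actual POST.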

 

Here is an example of the first choice. Let's say we have a Solr Update-XML file named artists.xml in the current directory. We can post it to Solr using the following command line:

>> curl http://localhost:8983/solr/update -H 'Content-type:text/xml; charset=utf-8' --data-binary @artists.xml

 

If it succeeds, then you'll have output that looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader">
    <int name="status">0</int><int name="QTime">128</int>
</lst>
</response>

 

To use the stream.body feature for the preceding example, you would do this:

curl http://localhost:8983/solr/update -F stream.body=@artists.xml
 

In both cases, the @ character instructs curl to read the data from the named file rather than treating @artists.xml as a literal value. If the XML is short, then you can just as easily specify it literally on the command line:

curl http://localhost:8983/solr/update -F stream.body=' <commit />'
 

Notice the leading space in the value. This was intentional. In this example, curl would otherwise treat @ and < to mean things we don't want. In this case, it might be more appropriate to use --form-string instead of -F. However, it's more typing, and I'm feeling lazy.

 

Remote streaming
In the preceding examples, we've given Solr the data to index in the HTTP message. Alternatively, the POST request can give Solr a pointer to the data in the form of either a file path accessible to Solr or an HTTP URL to it.

 

Just as before, the originating request does not return a response until Solr has finished processing it. If the file is of a decent size or is already at some known URL, then you may find remote streaming faster and/or more convenient, depending on your situation.

 

Here is an example of Solr accessing a local file:

curl http://localhost:8983/solr/update -F stream.file=/tmp/artists.xml
 

To use a URL, the parameter would change to stream.url, and we'd specify a URL instead. In both cases we're passing a name-value parameter (such as stream.file and the path), not the actual data.

 

Solr's Update-XML format

Using an XML formatted message, you can supply documents to be indexed, tell Solr to commit changes, to optimize the index, and to delete documents. Here is a sample XML file you can HTTP POST to Solr that adds (or replaces) a couple documents:

<add overwrite="true">
  <doc boost="2.0">
    <field name="id">5432a</field>
    <field name="type">...</field>
    <field name="a_name" boost="0.5"></field>
    <!-- the date/time syntax MUST look just like this -->
    <field name="begin_date">2007-12-31T09:40:00Z</field>
  </doc>
  <doc>
    <field name="id">myid</field>
    <field name="type">...</field>
    <field name="begin_date">2007-12-31T09:40:00Z</field>
  </doc>
  <!-- more doc elements here as needed -->
</add>
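If you generate Update-XML from code rather than by hand, the standard library's XML support is enough. Here is a minimal sketch using Python's xml.etree, with field names and boost values borrowed from the sample above; the make_add helper itself is hypothetical, not part of any Solr client library.

```python
import xml.etree.ElementTree as ET

def make_add(docs, overwrite=True):
    """Build an <add> message; docs is a list of (boost, fields) pairs,
    where boost may be None and fields maps field names to text values."""
    add = ET.Element("add", overwrite="true" if overwrite else "false")
    for boost, fields in docs:
        doc = ET.SubElement(add, "doc")
        if boost is not None:
            doc.set("boost", str(boost))
        for name, value in fields.items():
            ET.SubElement(doc, "field", name=name).text = value
    return ET.tostring(add, encoding="unicode")

message = make_add([
    (2.0, {"id": "5432a", "begin_date": "2007-12-31T09:40:00Z"}),
    (None, {"id": "myid", "begin_date": "2007-12-31T09:40:00Z"}),
])
```

A side benefit of building the message this way is that ElementTree escapes any &, <, or > characters that appear in field values.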

 

The overwrite attribute defaults to true to guarantee the uniqueness of values in the field that you have designated as the unique field in the schema, assuming you have such a field. If you were to add another document that has the same value for the unique field, then this document would overwrite the previous document. You will not get an error.

 

The boost attribute affects the scores of matching documents in order to affect ranking in score-sorted search results. Providing a boost value, whether at the document or field level, is optional. The default value is 1.0, which is effectively a non-boost. Technically, documents are not boosted, only fields are. The effective boost value of a field is that specified for the document multiplied by that specified for the field.
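As a concrete instance of that multiplication rule, the a_name field in the sample above carries a field boost of 0.5 inside a document boosted by 2.0:

```python
# Effective boost = document boost * field boost (per the rule above).
doc_boost = 2.0     # from <doc boost="2.0">
field_boost = 0.5   # from <field name="a_name" boost="0.5">
effective = doc_boost * field_boost
# effective == 1.0: the two boosts cancel out to a non-boost
```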

 

Deleting documents

You can delete a document by its unique field. Here we delete two documents:

<delete><id>Artist:11604</id><id>Artist:11603</id></delete>

 To more flexibly specify which documents to delete, you can alternatively use a Lucene/Solr query:

<delete><query>timestamp:[* TO NOW-12HOUR]</query></delete>
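Both delete forms can be built mechanically too. This sketch is a hypothetical helper in the same spirit as before, using only the element names shown above:

```python
import xml.etree.ElementTree as ET

def make_delete(ids=(), queries=()):
    """Build a <delete> message from unique-field ids and/or queries."""
    root = ET.Element("delete")
    for doc_id in ids:
        ET.SubElement(root, "id").text = doc_id
    for query in queries:
        ET.SubElement(root, "query").text = query
    return ET.tostring(root, encoding="unicode")
```

Calling make_delete(ids=["Artist:11604", "Artist:11603"]) reproduces the first example verbatim.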
 

Commit

Data sent to Solr is not immediately searchable, nor do deletions take immediate effect. Like a database, changes must be committed first. The easiest way to do this is to add a commit=true request parameter to a Solr update URL. The request to Solr could be the same request that contains data to be indexed then committed or an empty request—it doesn't matter. For example, you can visit this URL to issue a commit on our mbreleases core: http://localhost:8983/solr/update?commit=true. You can also commit changes using the XML syntax by simply sending this to Solr:
<commit />
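From code, issuing that commit is just a query parameter appended to the update URL; a small sketch (the endpoint is assumed, as before):

```python
from urllib.parse import urlencode

def commit_url(base="http://localhost:8983/solr/update"):
    """Build the URL that asks Solr to commit pending adds and deletes."""
    return base + "?" + urlencode({"commit": "true"})

# commit_url() == "http://localhost:8983/solr/update?commit=true"
```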
