Creating a Solr Index: the XML Format

 

Solr receives commands, and possibly document data, through HTTP POST. One way to send an HTTP POST is with the Unix command-line program curl (also available on Windows through Cygwin: http://www.cygwin.com), and that's what we'll use in the examples here. An alternative cross-platform option that ships with Solr is post.jar, located in Solr's example/exampledocs directory. To get some basic help on how to use it, run the following command:

>> java -jar example/exampledocs/post.jar -help

You'll see in a bit that you can post name-value pair options as HTML form data. However, post.jar doesn't support that, so you'll be forced to specify the URL and put the options in the query string. (Opening post.jar reveals a single class, SimplePostTool, which forwards the index-creation request. The Solr server URL is hard-coded as public static final String DEFAULT_POST_URL = "http://localhost:8983/solr/update", so the tool can't be used against a Solr service you've deployed elsewhere.)

 

There are several ways to tell Solr to index data, and all of them are through HTTP POST:

•     Send the data as the entire POST payload. curl does this with --data-binary (or a similar option) and an appropriate content-type header for whatever the format is.
•     Send some name-value pairs akin to an HTML form submission. With curl, such pairs are preceded by -F. If you're giving data to Solr to be indexed, as opposed to having it look for the data in a database, then there are a few ways to do that:
     ° Put the data into the stream.body parameter. If it's small, perhaps less than a megabyte, then this approach is fine. The limit is configured with the multipartUploadLimitInKB setting in solrconfig.xml, defaulting to 2MB. If you're tempted to increase this limit, you should reconsider your approach.
     ° Refer to the data through either a local file on the Solr server using the stream.file parameter, or a URL that Solr will fetch through the stream.url parameter. These choices are a feature that Solr calls remote streaming.
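The three approaches above can be sketched as small request builders. This is an illustrative helper, not a Solr or curl API; only the endpoint path and the stream.body/stream.file/stream.url parameter names come from the text above.

```python
from urllib.parse import urlencode

# Assumed endpoint; adjust to match your deployment.
SOLR_UPDATE = "http://localhost:8983/solr/update"

def raw_post(xml_bytes):
    """Option 1: the data is the entire POST payload (curl --data-binary)."""
    headers = {"Content-type": "text/xml; charset=utf-8"}
    return SOLR_UPDATE, headers, xml_bytes

def form_post(xml_text):
    """Option 2a: the data rides in the stream.body form field (curl -F)."""
    return SOLR_UPDATE, {"stream.body": xml_text}

def remote_stream(path=None, url=None):
    """Option 2b: refer to the data by file path or URL (remote streaming)."""
    params = {"stream.file": path} if path else {"stream.url": url}
    return SOLR_UPDATE + "?" + urlencode(params)
```

For example, remote_stream(path="/tmp/artists.xml") yields the update URL with the path percent-encoded into the query string; an HTTP library such as urllib.request would then perform the actual POST.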

 

Here is an example of the first choice. Let's say we have a Solr Update-XML file named artists.xml in the current directory. We can post it to Solr using the following command line:

>> curl http://localhost:8983/solr/update -H 'Content-type:text/xml; charset=utf-8' --data-binary @artists.xml

 

If it succeeds, then you'll have output that looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader">
    <int name="status">0</int><int name="QTime">128</int>
</lst>
</response>

 

To use the stream.body feature for the preceding example, you would do this:

curl http://localhost:8983/solr/update -F stream.body=@artists.xml
 

In both cases, the @ character instructs curl to read the data from the named file rather than treating @artists.xml as a literal value. If the XML is short, then you can just as easily specify it literally on the command line:

curl http://localhost:8983/solr/update -F stream.body=' <commit />'
 

Notice the leading space in the value. This was intentional. In this example, curl would otherwise treat @ and < to mean things we don't want. In this case, it might be more appropriate to use --form-string instead of -F. However, it's more typing, and I'm feeling lazy.

 

Remote streaming
In the preceding examples, we've given Solr the data to index in the HTTP message. Alternatively, the POST request can give Solr a pointer to the data in the form of either a file path accessible to Solr or an HTTP URL to it.

 

Just as before, the originating request does not return a response until Solr has finished processing it. If the file is of a decent size or is already at some known URL, then you may find remote streaming faster and/or more convenient, depending on your situation.

 

Here is an example of Solr accessing a local file:

curl http://localhost:8983/solr/update -F stream.file=/tmp/artists.xml
 

To use a URL, the parameter would change to stream.url, and we'd specify a URL instead. In both cases we're passing a name-value parameter (such as stream.file and the path), not the actual data.

 

Solr's Update-XML format

Using an XML formatted message, you can supply documents to be indexed, tell Solr to commit changes, to optimize the index, and to delete documents. Here is a sample XML file you can HTTP POST to Solr that adds (or replaces) a couple documents:

<add overwrite="true">
  <doc boost="2.0">
    <field name="id">5432a</field>
    <field name="type">...</field>
    <field name="a_name" boost="0.5"></field>
    <!-- the date/time syntax MUST look just like this -->
    <field name="begin_date">2007-12-31T09:40:00Z</field>
  </doc>
  <doc>
    <field name="id">myid</field>
    <field name="type">...</field>
    <field name="begin_date">2007-12-31T09:40:00Z</field>
  </doc>
  <!-- more doc elements here as needed -->
</add>
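If you generate Update-XML from code rather than by hand, the standard library's XML support is enough. Here is a minimal sketch using Python's xml.etree, with field names and boost values borrowed from the sample above; the make_add helper itself is hypothetical, not part of any Solr client library.

```python
import xml.etree.ElementTree as ET

def make_add(docs, overwrite=True):
    """Build an <add> message; docs is a list of (boost, fields) pairs,
    where boost may be None and fields maps field names to text values."""
    add = ET.Element("add", overwrite="true" if overwrite else "false")
    for boost, fields in docs:
        doc = ET.SubElement(add, "doc")
        if boost is not None:
            doc.set("boost", str(boost))
        for name, value in fields.items():
            ET.SubElement(doc, "field", name=name).text = value
    return ET.tostring(add, encoding="unicode")

message = make_add([
    (2.0, {"id": "5432a", "begin_date": "2007-12-31T09:40:00Z"}),
    (None, {"id": "myid", "begin_date": "2007-12-31T09:40:00Z"}),
])
```

A side benefit of building the message this way is that ElementTree escapes any &, <, or > characters that appear in field values.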

 

The overwrite attribute defaults to true to guarantee the uniqueness of values in the field that you have designated as the unique field in the schema, assuming you have such a field. If you were to add another document that has the same value for the unique field, then this document would overwrite the previous document. You will not get an error.

 

The boost attribute affects the scores of matching documents in order to affect ranking in score-sorted search results. Providing a boost value, whether at the document or field level, is optional. The default value is 1.0, which is effectively a non-boost. Technically, documents are not boosted, only fields are. The effective boost value of a field is that specified for the document multiplied by that specified for the field.
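As a concrete instance of that multiplication rule, the a_name field in the sample above carries a field boost of 0.5 inside a document boosted by 2.0:

```python
# Effective boost = document boost * field boost (per the rule above).
doc_boost = 2.0     # from <doc boost="2.0">
field_boost = 0.5   # from <field name="a_name" boost="0.5">
effective = doc_boost * field_boost
# effective == 1.0: the two boosts cancel out to a non-boost
```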

 

Deleting documents

You can delete a document by its unique field. Here we delete two documents:

<delete><id>Artist:11604</id><id>Artist:11603</id></delete>

 To more flexibly specify which documents to delete, you can alternatively use a Lucene/Solr query:

<delete><query>timestamp:[* TO NOW-12HOUR]</query></delete>
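Both delete forms can be built mechanically too. This sketch is a hypothetical helper in the same spirit as before, using only the element names shown above:

```python
import xml.etree.ElementTree as ET

def make_delete(ids=(), queries=()):
    """Build a <delete> message from unique-field ids and/or queries."""
    root = ET.Element("delete")
    for doc_id in ids:
        ET.SubElement(root, "id").text = doc_id
    for query in queries:
        ET.SubElement(root, "query").text = query
    return ET.tostring(root, encoding="unicode")
```

Calling make_delete(ids=["Artist:11604", "Artist:11603"]) reproduces the first example verbatim.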
 

Commit

Data sent to Solr is not immediately searchable, nor do deletions take immediate effect. Like a database, changes must be committed first. The easiest way to do this is to add a commit=true request parameter to a Solr update URL. The request to Solr could be the same request that contains data to be indexed then committed or an empty request—it doesn't matter. For example, you can visit this URL to issue a commit on our mbreleases core: http://localhost:8983/solr/update?commit=true. You can also commit changes using the XML syntax by simply sending this to Solr:
<commit />
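From code, issuing that commit is just a query parameter appended to the update URL; a small sketch (the endpoint is assumed, as before):

```python
from urllib.parse import urlencode

def commit_url(base="http://localhost:8983/solr/update"):
    """Build the URL that asks Solr to commit pending adds and deletes."""
    return base + "?" + urlencode({"commit": "true"})

# commit_url() == "http://localhost:8983/solr/update?commit=true"
```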
