Basic usage of Lucene 2.0.0

laotu5i0

Blog category:

java

lucene Bean Spring Apache

The first step, naturally, is to build the index.
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.List;
import org.apache.lucene.demo.html.HTMLParser; // assumed: the HTMLParser from the Lucene demo jar
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;

public void creatIndex() {
    try {
        List<Article> listArticle = getArticleDao().search(null, null, null, null,
                null, null, null, null, Boolean.TRUE);
        for (Article article : listArticle) {
            Document doc = new Document();
            // My setup builds the index in append mode, so to avoid duplicate
            // entries each article is deleted first and then added again.
            deleteAllIndex(article);
            Field fieldId = new Field("id", article.getId().toString(),
                    Field.Store.COMPRESS, Field.Index.TOKENIZED,
                    Field.TermVector.YES);
            Field fieldTitles = new Field("title", article.getTitle(),
                    Field.Store.COMPRESS, Field.Index.TOKENIZED,
                    Field.TermVector.YES);
            // None of my analyzers strips HTML, so HTMLParser converts the
            // HTML into plain text before it is indexed.
            String contentHtml = article.getContent();
            Reader read = new StringReader(contentHtml);
            HTMLParser htmlParser = new HTMLParser(read);
            BufferedReader breader = new BufferedReader(htmlParser.getReader());
            // Read to the end of the stream; stopping at the first empty line
            // would silently drop everything after it.
            StringBuilder htmlContent = new StringBuilder();
            String tempContent;
            while ((tempContent = breader.readLine()) != null) {
                htmlContent.append(tempContent).append(' ');
            }
            // I searched for a long time but found no way to store a whole object
            // in a Field. I meant to write one myself but had no time, so the
            // object is split up and stored field by field.
            Field fieldContents = new Field("content", htmlContent.toString(),
                    Field.Store.COMPRESS, Field.Index.TOKENIZED, Field.TermVector.YES);
            Field fieldTime = new Field("time", article.getUpdateTime().toString(),
                    Field.Store.YES, Field.Index.TOKENIZED, Field.TermVector.YES);
            Field fieldAuthor = new Field("author", article.getAuthor(),
                    Field.Store.COMPRESS, Field.Index.TOKENIZED, Field.TermVector.YES);
            Field fieldCategory = new Field("category", article.getCategory().getOutsideName(),
                    Field.Store.COMPRESS, Field.Index.TOKENIZED, Field.TermVector.YES);
            String path = "/" + article.getCategory().getCategoryUrl() + "/" + article.getId() + ".html";
            Field fieldPath = new Field("path", path, Field.Store.COMPRESS,
                    Field.Index.TOKENIZED, Field.TermVector.YES);
            doc.add(fieldId);
            doc.add(fieldPath);
            doc.add(fieldCategory);
            doc.add(fieldTime);
            doc.add(fieldAuthor);
            doc.add(fieldContents);
            doc.add(fieldTitles);
            indexWriter.addDocument(doc);
        }
        indexWriter.optimize();
        indexWriter.close();
    } catch (IOException e) {
        e.printStackTrace();
    }
}
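
The loop that collects the parsed text from the reader can be pulled out into a small helper. The following is only a sketch using the plain JDK, with no Lucene classes involved; `ReaderText` and `readAll` are hypothetical names that do not appear in the original code. It reads until the end of the stream, so an empty line in the middle of the content does not cut the text short.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical helper, not part of Lucene or of the original post.
final class ReaderText {
    private ReaderText() {}

    /** Reads the whole stream and joins the non-empty lines with a single space. */
    static String readAll(Reader source) {
        BufferedReader reader = new BufferedReader(source);
        StringBuilder text = new StringBuilder();
        try {
            String line;
            // readLine() returns null only at the end of the stream.
            while ((line = reader.readLine()) != null) {
                String trimmed = line.trim();
                if (trimmed.isEmpty()) {
                    continue; // skip blank lines instead of stopping at them
                }
                if (text.length() > 0) {
                    text.append(' '); // keep words on adjacent lines apart
                }
                text.append(trimmed);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        // prints "first line second line"
        System.out.println(readAll(new StringReader("first line\n\nsecond line")));
    }
}
```

In the indexing method, the reader returned by `htmlParser.getReader()` would be passed in place of the `StringReader` used here.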

At this point the index has been built. The next step is searching.
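import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.queryParser.ParseException;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.highlight.Highlighter;
import org.apache.lucene.search.highlight.QueryScorer;
import org.apache.lucene.search.highlight.SimpleFragmenter;
import org.apache.lucene.search.highlight.SimpleHTMLFormatter;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public List<Document> searchDoc(String type, String queryString) {
    List<Document> fileList = new ArrayList<Document>();
    // The analyzer is handed to QueryParser and reused for the highlighting
    // below; it is the same StandardAnalyzer that the index was built with.
    Analyzer analyzer = new StandardAnalyzer();
    IndexSearcher searcher = null;
    try {
        Directory fsDir = FSDirectory.getDirectory(getPathIndex(), false);
        searcher = new IndexSearcher(fsDir);
        QueryParser queryParse = new QueryParser(type, analyzer);
        Query query = queryParse.parse(queryString); // parse once, reuse below
        Hits hits = searcher.search(query);
        // Format the highlighted field; here matches are simply shown in bold red.
        SimpleHTMLFormatter sHtmlF = new SimpleHTMLFormatter("<b><font color='red'>", "</font></b>");
        Highlighter highlighter = new Highlighter(sHtmlF, new QueryScorer(query));
        highlighter.setTextFragmenter(new SimpleFragmenter(100));
        for (int i = 0; i < hits.length(); i++) {
            Document doc = hits.doc(i);
            String value = doc.get(type);
            if (value != null) {
                TokenStream tokenStream = analyzer.tokenStream(type, new StringReader(value));
                String fragment = highlighter.getBestFragment(tokenStream, value);
                // getBestFragment returns null when nothing in the text matches;
                // in that case the original field is left untouched.
                if (fragment != null) {
                    Field tempField = new Field(type, fragment, Field.Store.NO,
                            Field.Index.TOKENIZED, Field.TermVector.YES);
                    doc.removeField(type);
                    doc.add(tempField);
                }
            }
            // Note that the list holds Document objects; the caller still has to
            // extract the field values, and I will not write that code here.
            fileList.add(doc);
        }
    } catch (IOException e) {
        e.printStackTrace();
    } catch (ParseException e) {
        e.printStackTrace();
    } finally {
        if (searcher != null) {
            try {
                searcher.close();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }
    return fileList;
}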
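
To make the effect of the formatter tags concrete, here is a deliberately naive sketch in plain JDK code. It is not Lucene's Highlighter: it does no analysis, scoring, or fragmenting, and only wraps every case-insensitive occurrence of one literal term in a pair of tags. The class name `NaiveHighlighter` is made up for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustration only: a literal-term highlighter, not a stand-in for Lucene's Highlighter.
final class NaiveHighlighter {
    private final String preTag;
    private final String postTag;

    NaiveHighlighter(String preTag, String postTag) {
        this.preTag = preTag;
        this.postTag = postTag;
    }

    /** Wraps each case-insensitive occurrence of term in preTag/postTag. */
    String highlight(String text, String term) {
        if (text == null || term == null || term.isEmpty()) {
            return text; // nothing to highlight
        }
        // Pattern.quote treats the term as literal text, not as a regex.
        Matcher matcher = Pattern.compile(Pattern.quote(term), Pattern.CASE_INSENSITIVE).matcher(text);
        StringBuffer out = new StringBuffer();
        while (matcher.find()) {
            // quoteReplacement keeps '$' and '\' in the tags or the match from being interpreted.
            matcher.appendReplacement(out, Matcher.quoteReplacement(preTag + matcher.group() + postTag));
        }
        matcher.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        NaiveHighlighter highlighter = new NaiveHighlighter("<b><font color='red'>", "</font></b>");
        // prints: Basic usage of <b><font color='red'>Lucene</font></b> 2.0.0
        System.out.println(highlighter.highlight("Basic usage of Lucene 2.0.0", "lucene"));
    }
}
```

The real Highlighter works on the analyzed token stream and returns only the best-scoring fragment, which is why the search method passes it both a `TokenStream` and the raw text.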

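OK, that takes care of indexing and searching. At the time I told our front-end programmer: "Done. Here are two methods, just call them."
You might think I was off the hook after that, but I wasn't. I had only added the few stored fields that were strictly necessary, and that guy's requirements were demanding, so many more fields were added in the end. Later he also wanted multi-condition queries (there should be tutorials for this online; I eventually implemented it with Compass, and the principle is the same).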
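
One possible way to approach multi-condition queries, sketched here under assumptions and not taken from the Compass-based solution mentioned above, is to build a single query string in Lucene's query syntax, where `field:"value"` restricts a clause to one field and `AND` requires all clauses to match. The class `MultiFieldQueryText` and its method `allOf` are hypothetical names; only quotes and backslashes inside the phrase are escaped, which is an assumption about what the values may contain.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper that only builds a query string; it does not call Lucene.
final class MultiFieldQueryText {
    private MultiFieldQueryText() {}

    /** Joins all non-blank conditions as field:"value" clauses combined with AND. */
    static String allOf(Map<String, String> conditions) {
        StringBuilder query = new StringBuilder();
        for (Map.Entry<String, String> condition : conditions.entrySet()) {
            String value = condition.getValue();
            if (value == null || value.trim().isEmpty()) {
                continue; // a condition the user left empty is ignored
            }
            if (query.length() > 0) {
                query.append(" AND ");
            }
            query.append(condition.getKey())
                 .append(":\"")
                 .append(escape(value.trim()))
                 .append('"');
        }
        return query.toString();
    }

    // Escapes backslashes first, then double quotes, so the phrase stays well formed.
    private static String escape(String value) {
        return value.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    public static void main(String[] args) {
        Map<String, String> conditions = new LinkedHashMap<String, String>();
        conditions.put("title", "lucene");
        conditions.put("author", "laotu5i0");
        conditions.put("category", ""); // empty, so it is skipped
        // prints: title:"lucene" AND author:"laotu5i0"
        System.out.println(allOf(conditions));
    }
}
```

A string built this way could then be passed as `queryString` to a method like `searchDoc`, since explicit field prefixes in the query take precedence over the parser's default field.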

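It looks like I left something out here. Did you spot it? No? I bet you did.
I configure everything with Spring, so the code above never constructs the indexWriter itself. The configuration file is below. Note that the third constructor argument is IndexWriter's create flag: true creates a new index and overwrites any existing one, while false appends to an existing index.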
<bean id="indexWriter" class="org.apache.lucene.index.IndexWriter">
  <constructor-arg index="0" type="java.io.File">
   <bean class="java.io.File">
    <constructor-arg value="E:/Projects/netSchool/indexDatas" />
   </bean>
  </constructor-arg>
  <constructor-arg index="1" >
   <bean class="org.apache.lucene.analysis.standard.StandardAnalyzer" />
  </constructor-arg>
  <constructor-arg index="2" type="boolean" value="true"/>
</bean>


2009-11-09 16:13