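Preface
Anyone working in search has probably come across Lucene. It is open source and easy to get started with, and the official API is enough for writing small demos. It achieves fast retrieval by using an inverted index. This post walks through simple implementations of incrementally adding to an index, deleting from it, searching by keyword, and updating it.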
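To make the inverted-index idea concrete, here is a minimal sketch in plain Java. It is a toy model, not Lucene's implementation: the class name `InvertedIndexSketch`, its methods, and the crude tokenizer are all hypothetical and exist only for illustration. It maps each term to the ids of the documents that contain it, so a lookup goes straight to the matching documents instead of scanning every text.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeSet;

public class InvertedIndexSketch {
    // term -> sorted ids of the documents containing that term
    private final Map<String, TreeSet<Integer>> postings = new HashMap<>();

    /** Lower-cases the text and splits it on anything that is not a-z or 0-9 (a crude stand-in for an analyzer). */
    public static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    /** Records every term of the text under the given document id. */
    public void add(int docId, String text) {
        for (String term : tokenize(text)) {
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
        }
    }

    /** Returns the ids of the documents containing the term, in ascending order. */
    public List<Integer> search(String term) {
        TreeSet<Integer> ids = postings.get(term.toLowerCase(Locale.ROOT));
        return ids == null ? Collections.<Integer>emptyList() : new ArrayList<>(ids);
    }

    public static void main(String[] args) {
        InvertedIndexSketch index = new InvertedIndexSketch();
        index.add(1, "hello,man!");
        index.add(2, "goodbye,man!");
        index.add(3, "hello,woman!");
        index.add(4, "goodbye,woman!");
        System.out.println(index.search("man"));   // [1, 2]
        System.out.println(index.search("hello")); // [1, 3]
    }
}
```

Note that "woman" is indexed as its own term, so searching for "man" does not return documents 3 and 4.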
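What I currently find unsatisfying is that, for full-text search over file contents, you have to write the file-reading code yourself (Solr provides this for free). Index creation is also fairly slow and leaves a lot of room for optimization, which will take some careful study.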
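As a rough illustration of the hand-written reading step, the sketch below loads a whole file into a String using only the JDK. The class and method names (`FileTextReader`, `readText`, `writeTempAndRead`) are hypothetical, UTF-8 is an assumed encoding, and reading everything into memory is only reasonable for small files. The returned string is what would then be handed to a Lucene field such as the `content` field used later in this post.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class FileTextReader {
    /** Reads the whole file into memory as UTF-8 text (assumed encoding; small files only). */
    public static String readText(Path file) throws IOException {
        return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
    }

    /** Demo helper: writes the text to a temporary file, reads it back, then deletes the file. */
    public static String writeTempAndRead(String text) throws IOException {
        Path tmp = Files.createTempFile("lucene-demo", ".txt");
        try {
            Files.write(tmp, text.getBytes(StandardCharsets.UTF_8));
            return readText(tmp);
        } finally {
            Files.deleteIfExists(tmp);
        }
    }

    public static void main(String[] args) throws IOException {
        String content = writeTempAndRead("hello,man!");
        // This string could then be indexed, e.g. new TextField("content", content, Store.YES)
        System.out.println(content); // hello,man!
    }
}
```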
Creating an index
The previous post already covered the overall flow of creating an index in Lucene, so here is just a brief recap:
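    // The Lucene 4.x API used in this post takes a File, not a String
    Directory directory = FSDirectory.open(new File("/tmp/testindex"));
    Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_CURRENT);
    IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_CURRENT, analyzer);
    IndexWriter iwriter = new IndexWriter(directory, config);
    Document doc = new Document();
    String text = "This is the text to be indexed.";
    doc.add(new Field("fieldname", text, TextField.TYPE_STORED));
    // Without this call the document never reaches the index
    iwriter.addDocument(doc);
    iwriter.close();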
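1 Create a Directory to obtain the index directory
2 Create the analyzer, then create the IndexWriter object
3 Create a Document object to hold the data and add it to the writer
4 Close the IndexWriter, which commits the changes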
    /**
     * Build the index.
     *
     * @throws Exception
     */
    public static void index() throws Exception {
        String text1 = "hello,man!";
        String text2 = "goodbye,man!";
        String text3 = "hello,woman!";
        String text4 = "goodbye,woman!";

        Date date1 = new Date();
        analyzer = new StandardAnalyzer(Version.LUCENE_CURRENT);
        directory = FSDirectory.open(new File(INDEX_DIR));

        IndexWriterConfig config = new IndexWriterConfig(
                Version.LUCENE_CURRENT, analyzer);
        indexWriter = new IndexWriter(directory, config);

        Document doc1 = new Document();
        doc1.add(new TextField("filename", "text1", Store.YES));
        doc1.add(new TextField("content", text1, Store.YES));
        indexWriter.addDocument(doc1);

        Document doc2 = new Document();
        doc2.add(new TextField("filename", "text2", Store.YES));
        doc2.add(new TextField("content", text2, Store.YES));
        indexWriter.addDocument(doc2);

        Document doc3 = new Document();
        doc3.add(new TextField("filename", "text3", Store.YES));
        doc3.add(new TextField("content", text3, Store.YES));
        indexWriter.addDocument(doc3);

        Document doc4 = new Document();
        doc4.add(new TextField("filename", "text4", Store.YES));
        doc4.add(new TextField("content", text4, Store.YES));
        indexWriter.addDocument(doc4);

        indexWriter.commit();
        indexWriter.close();

        Date date2 = new Date();
        System.out.println("Index creation took: " + (date2.getTime() - date1.getTime()) + "ms\n");
    }
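Adding to the index incrementally
Lucene supports incremental indexing: new documents can be added without affecting what is already indexed, and Lucene merges the index files automatically at an appropriate time.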
    /**
     * Add a document to the existing index.
     *
     * @throws Exception
     */
    public static void insert() throws Exception {
        String text5 = "hello,goodbye,man,woman";

        Date date1 = new Date();
        analyzer = new StandardAnalyzer(Version.LUCENE_CURRENT);
        directory = FSDirectory.open(new File(INDEX_DIR));

        IndexWriterConfig config = new IndexWriterConfig(
                Version.LUCENE_CURRENT, analyzer);
        indexWriter = new IndexWriter(directory, config);

        Document doc1 = new Document();
        doc1.add(new TextField("filename", "text5", Store.YES));
        doc1.add(new TextField("content", text5, Store.YES));
        indexWriter.addDocument(doc1);

        indexWriter.commit();
        indexWriter.close();

        Date date2 = new Date();
        System.out.println("Adding to the index took: " + (date2.getTime() - date1.getTime()) + "ms\n");
    }
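Deleting from the index
Deletion also goes through IndexWriter, via its deleteDocuments method. You can delete by keyword (a Term), which removes every document that matches it. If you only want to delete one specific document, it is best to define a unique ID field and delete by that field.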
    /**
     * Delete from the index.
     *
     * @param str the keyword to delete by
     * @throws Exception
     */
    public static void delete(String str) throws Exception {
        Date date1 = new Date();
        analyzer = new StandardAnalyzer(Version.LUCENE_CURRENT);
        directory = FSDirectory.open(new File(INDEX_DIR));

        IndexWriterConfig config = new IndexWriterConfig(
                Version.LUCENE_CURRENT, analyzer);
        indexWriter = new IndexWriter(directory, config);

        // Removes every document whose "filename" field contains this term
        indexWriter.deleteDocuments(new Term("filename", str));

        indexWriter.close();

        Date date2 = new Date();
        System.out.println("Deleting from the index took: " + (date2.getTime() - date1.getTime()) + "ms\n");
    }
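The unique-ID advice can be sketched as follows, assuming the same Lucene 4.x API as the rest of this post. The class name `DeleteByIdSketch`, the `id` field, and the values `doc-1` and `doc-2` are hypothetical, and an in-memory RAMDirectory stands in for the on-disk index so the example leaves no files behind. The ID is stored in a StringField, which is indexed as a single untokenized term; a value such as `doc-1` in a TextField would typically be split by the analyzer, and the exact Term would then no longer match.

```java
import java.io.IOException;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field.Store;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;

public class DeleteByIdSketch {
    /** Builds a document with an untokenized "id" field and an analyzed "content" field. */
    static Document doc(String id, String content) {
        Document d = new Document();
        d.add(new StringField("id", id, Store.YES));
        d.add(new TextField("content", content, Store.YES));
        return d;
    }

    /** Indexes two documents, deletes one by its id, and returns the number of live documents. */
    public static int run() throws IOException {
        Directory dir = new RAMDirectory();
        Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_CURRENT);
        IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_CURRENT, analyzer);
        IndexWriter writer = new IndexWriter(dir, config);
        try {
            writer.addDocument(doc("doc-1", "hello,man!"));
            writer.addDocument(doc("doc-2", "goodbye,man!"));
            // Matches only the document whose id is exactly "doc-1"
            writer.deleteDocuments(new Term("id", "doc-1"));
            writer.commit();
        } finally {
            writer.close();
        }

        DirectoryReader reader = DirectoryReader.open(dir);
        try {
            return reader.numDocs(); // live (non-deleted) documents
        } finally {
            reader.close();
            dir.close();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(run()); // 1
    }
}
```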
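Updating the index
Lucene has no true in-place update. updateDocument takes a Term (a field name plus a value) that selects what to update, but under the hood it deletes the matching documents and then adds the new one.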
    /**
     * Update the index.
     *
     * @throws Exception
     */
    public static void update() throws Exception {
        String text1 = "update,hello,man!";

        Date date1 = new Date();
        analyzer = new StandardAnalyzer(Version.LUCENE_CURRENT);
        directory = FSDirectory.open(new File(INDEX_DIR));

        IndexWriterConfig config = new IndexWriterConfig(
                Version.LUCENE_CURRENT, analyzer);
        indexWriter = new IndexWriter(directory, config);

        Document doc1 = new Document();
        doc1.add(new TextField("filename", "text1", Store.YES));
        doc1.add(new TextField("content", text1, Store.YES));

        // Deletes the documents matching the term, then adds doc1
        indexWriter.updateDocument(new Term("filename", "text1"), doc1);

        indexWriter.close();

        Date date2 = new Date();
        System.out.println("Updating the index took: " + (date2.getTime() - date1.getTime()) + "ms\n");
    }
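The delete-then-add behaviour has two consequences that are easy to miss, and a toy model in plain Java may make them clearer. This is not Lucene code: the class `UpdateSemanticsSketch` and all of its methods are hypothetical, and documents are modelled as simple maps. The sketch shows that an update keeps the document count unchanged when the term matches exactly one document, and that it behaves like a plain insert when the term matches nothing.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

public class UpdateSemanticsSketch {
    // Each "document" is modelled as a map of field name -> value
    private final List<Map<String, String>> docs = new ArrayList<>();

    public static Map<String, String> doc(String filename, String content) {
        Map<String, String> d = new HashMap<>();
        d.put("filename", filename);
        d.put("content", content);
        return d;
    }

    public void add(Map<String, String> doc) {
        docs.add(doc);
    }

    /** Removes every document whose field equals the value; returns how many were removed. */
    public int delete(String field, String value) {
        int removed = 0;
        for (Iterator<Map<String, String>> it = docs.iterator(); it.hasNext();) {
            if (value.equals(it.next().get(field))) {
                it.remove();
                removed++;
            }
        }
        return removed;
    }

    /** An "update" is a delete of all matches followed by an add of the replacement. */
    public void update(String field, String value, Map<String, String> replacement) {
        delete(field, value);
        add(replacement);
    }

    public int size() {
        return docs.size();
    }

    /** Returns the content of the first document with this filename, or null if there is none. */
    public String contentOf(String filename) {
        for (Map<String, String> d : docs) {
            if (filename.equals(d.get("filename"))) {
                return d.get("content");
            }
        }
        return null;
    }

    public static void main(String[] args) {
        UpdateSemanticsSketch index = new UpdateSemanticsSketch();
        index.add(doc("text1", "hello,man!"));
        index.add(doc("text2", "goodbye,man!"));

        index.update("filename", "text1", doc("text1", "update,hello,man!"));
        System.out.println(index.size());             // 2
        System.out.println(index.contentOf("text1")); // update,hello,man!

        // No document matches "text9", so this update acts as an insert
        index.update("filename", "text9", doc("text9", "new"));
        System.out.println(index.size());             // 3
    }
}
```

Because every match is deleted, the term passed to an update should identify exactly one document, which is another reason to keep a unique ID field.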
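Searching the index by keyword
Lucene offers many kinds of queries, which are not covered in detail here. A search returns a TopDocs object whose scoreDocs array holds the hits, somewhat like a ResultSet. For each hit we load the Document and read the stored values by field name.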
    /**
     * Keyword search.
     *
     * @param str the query string
     * @throws Exception
     */
    public static void search(String str) throws Exception {
        directory = FSDirectory.open(new File(INDEX_DIR));
        analyzer = new StandardAnalyzer(Version.LUCENE_CURRENT);
        DirectoryReader ireader = DirectoryReader.open(directory);
        IndexSearcher isearcher = new IndexSearcher(ireader);

        // Parse the query string against the "content" field
        QueryParser parser = new QueryParser(Version.LUCENE_CURRENT, "content", analyzer);
        Query query = parser.parse(str);

        // Fetch at most 1000 hits, with no filter
        ScoreDoc[] hits = isearcher.search(query, null, 1000).scoreDocs;
        for (int i = 0; i < hits.length; i++) {
            Document hitDoc = isearcher.doc(hits[i].doc);
            System.out.println(hitDoc.get("filename"));
            System.out.println(hitDoc.get("content"));
        }
        ireader.close();
        directory.close();
    }
Full code
    package test;

    import java.io.File;
    import java.util.Date;

    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.TextField;
    import org.apache.lucene.document.Field.Store;
    import org.apache.lucene.index.DirectoryReader;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.IndexWriterConfig;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.queryparser.classic.QueryParser;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.ScoreDoc;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.FSDirectory;
    import org.apache.lucene.util.Version;

    public class TestLucene {
        // Index storage path
        private static String INDEX_DIR = "D:\\luceneIndex";
        private static Analyzer analyzer = null;
        private static Directory directory = null;
        private static IndexWriter indexWriter = null;

        public static void main(String[] args) {
            try {
                // index();
                search("man");
                // insert();
                // delete("text5");
                // update();
            } catch (Exception e) {
                e.printStackTrace();
            }
        }

        /**
         * Update the index.
         *
         * @throws Exception
         */
        public static void update() throws Exception {
            String text1 = "update,hello,man!";

            Date date1 = new Date();
            analyzer = new StandardAnalyzer(Version.LUCENE_CURRENT);
            directory = FSDirectory.open(new File(INDEX_DIR));

            IndexWriterConfig config = new IndexWriterConfig(
                    Version.LUCENE_CURRENT, analyzer);
            indexWriter = new IndexWriter(directory, config);

            Document doc1 = new Document();
            doc1.add(new TextField("filename", "text1", Store.YES));
            doc1.add(new TextField("content", text1, Store.YES));

            indexWriter.updateDocument(new Term("filename", "text1"), doc1);

            indexWriter.close();

            Date date2 = new Date();
            System.out.println("Updating the index took: " + (date2.getTime() - date1.getTime()) + "ms\n");
        }

        /**
         * Delete from the index.
         *
         * @param str the keyword to delete by
         * @throws Exception
         */
        public static void delete(String str) throws Exception {
            Date date1 = new Date();
            analyzer = new StandardAnalyzer(Version.LUCENE_CURRENT);
            directory = FSDirectory.open(new File(INDEX_DIR));

            IndexWriterConfig config = new IndexWriterConfig(
                    Version.LUCENE_CURRENT, analyzer);
            indexWriter = new IndexWriter(directory, config);

            indexWriter.deleteDocuments(new Term("filename", str));

            indexWriter.close();

            Date date2 = new Date();
            System.out.println("Deleting from the index took: " + (date2.getTime() - date1.getTime()) + "ms\n");
        }

        /**
         * Add a document to the existing index.
         *
         * @throws Exception
         */
        public static void insert() throws Exception {
            String text5 = "hello,goodbye,man,woman";

            Date date1 = new Date();
            analyzer = new StandardAnalyzer(Version.LUCENE_CURRENT);
            directory = FSDirectory.open(new File(INDEX_DIR));

            IndexWriterConfig config = new IndexWriterConfig(
                    Version.LUCENE_CURRENT, analyzer);
            indexWriter = new IndexWriter(directory, config);

            Document doc1 = new Document();
            doc1.add(new TextField("filename", "text5", Store.YES));
            doc1.add(new TextField("content", text5, Store.YES));
            indexWriter.addDocument(doc1);

            indexWriter.commit();
            indexWriter.close();

            Date date2 = new Date();
            System.out.println("Adding to the index took: " + (date2.getTime() - date1.getTime()) + "ms\n");
        }

        /**
         * Build the index.
         *
         * @throws Exception
         */
        public static void index() throws Exception {
            String text1 = "hello,man!";
            String text2 = "goodbye,man!";
            String text3 = "hello,woman!";
            String text4 = "goodbye,woman!";

            Date date1 = new Date();
            analyzer = new StandardAnalyzer(Version.LUCENE_CURRENT);
            directory = FSDirectory.open(new File(INDEX_DIR));

            IndexWriterConfig config = new IndexWriterConfig(
                    Version.LUCENE_CURRENT, analyzer);
            indexWriter = new IndexWriter(directory, config);

            Document doc1 = new Document();
            doc1.add(new TextField("filename", "text1", Store.YES));
            doc1.add(new TextField("content", text1, Store.YES));
            indexWriter.addDocument(doc1);

            Document doc2 = new Document();
            doc2.add(new TextField("filename", "text2", Store.YES));
            doc2.add(new TextField("content", text2, Store.YES));
            indexWriter.addDocument(doc2);

            Document doc3 = new Document();
            doc3.add(new TextField("filename", "text3", Store.YES));
            doc3.add(new TextField("content", text3, Store.YES));
            indexWriter.addDocument(doc3);

            Document doc4 = new Document();
            doc4.add(new TextField("filename", "text4", Store.YES));
            doc4.add(new TextField("content", text4, Store.YES));
            indexWriter.addDocument(doc4);

            indexWriter.commit();
            indexWriter.close();

            Date date2 = new Date();
            System.out.println("Index creation took: " + (date2.getTime() - date1.getTime()) + "ms\n");
        }

        /**
         * Keyword search.
         *
         * @param str the query string
         * @throws Exception
         */
        public static void search(String str) throws Exception {
            directory = FSDirectory.open(new File(INDEX_DIR));
            analyzer = new StandardAnalyzer(Version.LUCENE_CURRENT);
            DirectoryReader ireader = DirectoryReader.open(directory);
            IndexSearcher isearcher = new IndexSearcher(ireader);

            QueryParser parser = new QueryParser(Version.LUCENE_CURRENT, "content", analyzer);
            Query query = parser.parse(str);

            ScoreDoc[] hits = isearcher.search(query, null, 1000).scoreDocs;
            for (int i = 0; i < hits.length; i++) {
                Document hitDoc = isearcher.doc(hits[i].doc);
                System.out.println(hitDoc.get("filename"));
                System.out.println(hitDoc.get("content"));
            }
            ireader.close();
            directory.close();
        }
    }
References
http://www.cnblogs.com/xing901022/p/3933675.html