Lucene_demo02_Tokenization

ewf_momo

Blog category: Lucene full-text indexing

Tags: lucene, tokenization

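import java.io.StringReader;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.cjk.CJKAnalyzer;
import org.apache.lucene.analysis.cn.ChineseAnalyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.analysis.tokenattributes.TermAttribute;
import org.apache.lucene.util.Version;
import org.junit.Test;
import org.wltea.analyzer.lucene.IKAnalyzer;

/**
 * Tokenization demo: one English analyzer and three Chinese analyzers.
 */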
public class AnalyzerTest {

	/**
	 * English tokenization with StandardAnalyzer (ships with Lucene).
	 * @throws Exception if the token stream cannot be read
	 */
	@Test
	public void testEN() throws Exception {
		String text = "Creates a searcher searching the index in the named directory";
		Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_30);
		this.testAnalyzer(analyzer, text);
	}

	/**
	 * Chinese tokenization, one token per character: ChineseAnalyzer (ships with Lucene).
	 * @throws Exception if the token stream cannot be read
	 */
	@Test
	public void testCH1() throws Exception {
		String text = "LBJ和韦德能带领热火在2013赛季拿到NBA总冠军吗？";
		Analyzer analyzer = new ChineseAnalyzer();
		this.testAnalyzer(analyzer, text);
	}

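	/**
	 * Chinese tokenization with overlapping two-character tokens (bigrams): CJKAnalyzer (ships with Lucene).
	 * @throws Exception if the token stream cannot be read
	 */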
	@Test
	public void testCH2() throws Exception {
		String text = "LBJ和韦德能带领热火在2013赛季拿到NBA总冠军吗";
		Analyzer analyzer = new CJKAnalyzer(Version.LUCENE_30);
		this.testAnalyzer(analyzer, text);
	}

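	/**
	 * Chinese tokenization with IKAnalyzer (a third-party library, not bundled with Lucene).
	 * @throws Exception if the token stream cannot be read
	 */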
	@Test
	public void testCH3() throws Exception {
		String text = "fasd";
		Analyzer analyzer = new IKAnalyzer();
		this.testAnalyzer(analyzer, text);
	}

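	/**
	 * Prints each token the analyzer produces for the given text, one per line.
	 * @param analyzer the analyzer under test
	 * @param text the text to tokenize
	 * @throws Exception if the token stream cannot be read
	 */
	private void testAnalyzer(Analyzer analyzer, String text) throws Exception {
		// "content" is only a field-name label here; nothing is written to an index.
		TokenStream tokenStream = analyzer.tokenStream("content", new StringReader(text));
		// Register the term attribute once; it is updated in place on every incrementToken() call.
		TermAttribute termAttribute = tokenStream.addAttribute(TermAttribute.class);
		while (tokenStream.incrementToken()) {
			System.out.println(termAttribute.term());
		}
		tokenStream.close();
	}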
}
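
The difference between the single-character analyzer (testCH1) and the bigram analyzer (testCH2) is easier to see without Lucene on the classpath. The following is a simplified sketch in plain Java, not Lucene's actual implementation: the class name CjkNGramSketch and both method names are made up for illustration, the input is assumed to be one unbroken run of Chinese characters from the Basic Multilingual Plane, and handling of Latin letters, digits, and punctuation is left out entirely.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class for illustration only; it is not part of Lucene.
final class CjkNGramSketch {

    // Single-character splitting: every character becomes its own token.
    static List<String> unigrams(String run) {
        List<String> tokens = new ArrayList<>();
        for (int i = 0; i < run.length(); i++) {
            tokens.add(run.substring(i, i + 1));
        }
        return tokens;
    }

    // Bigram splitting: each pair of adjacent characters becomes a token,
    // so neighbouring tokens overlap by one character.
    static List<String> bigrams(String run) {
        List<String> tokens = new ArrayList<>();
        if (run.length() == 1) {
            // Assumption of this sketch: a lone character is kept as one token.
            tokens.add(run);
            return tokens;
        }
        for (int i = 0; i + 1 < run.length(); i++) {
            tokens.add(run.substring(i, i + 2));
        }
        return tokens;
    }

    public static void main(String[] args) {
        String run = "总冠军"; // taken from the test sentence above
        System.out.println(unigrams(run)); // prints [总, 冠, 军]
        System.out.println(bigrams(run));  // prints [总冠, 冠军]
    }
}
```

Bigrams let a query such as 冠军 match as a single term, at the cost of a larger index and some meaningless pairs such as 总冠. A dictionary-based analyzer like IKAnalyzer tries to avoid both problems by emitting real words.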

2013-06-07 23:43