xxj

Backing up an email

Thanks to Doğacan Güney for the reply:

Quote
Nutch supports different analyzers, but it is a bit limited. By
default, documents are analyzed with NutchDocumentAnalyzer. If an
analysis plugin is enabled (such as analysis-fr), and a document is
written in a language specified by an analysis plugin, that document is
analyzed by the plugin instead of NutchDocumentAnalyzer.

For example, analysis-fr is for analyzing French documents so if a
document is in French (probably recognized by language-identifier)
then that document is analyzed by analysis-fr, instead of default
analyzer.

So, you can either define a new analyzer plugin or change
NutchDocumentAnalyzer to process documents with a different analyzer.

(PS: As I said before, this scheme is a bit limited and it is actually
one of the things that we want to improve.)
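
To keep the idea from the reply in front of me, here is a minimal Java sketch of the selection rule it describes: pick an analyzer by the document's detected language, and fall back to the default when no plugin claims that language. Everything in it is an assumption for illustration. `AnalyzerDispatchSketch`, `TextAnalyzer`, `DefaultAnalyzer` and `FrenchAnalyzer` are hypothetical names, not Nutch classes, and the French elision handling is a crude stand-in for whatever analysis-fr really does.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch only: none of these types come from Nutch.
final class AnalyzerDispatchSketch {

    // Stand-in for an analyzer: turns raw text into a list of tokens.
    interface TextAnalyzer {
        List<String> analyze(String text);
    }

    // Plays the role of the default analyzer: lowercase, split on whitespace.
    static final class DefaultAnalyzer implements TextAnalyzer {
        @Override
        public List<String> analyze(String text) {
            List<String> tokens = new ArrayList<>();
            for (String t : text.toLowerCase(Locale.ROOT).split("\\s+")) {
                if (!t.isEmpty()) {
                    tokens.add(t);
                }
            }
            return tokens;
        }
    }

    // Plays the role of a language plugin: additionally strips a short
    // elided prefix such as l' or d' (a deliberately crude heuristic).
    static final class FrenchAnalyzer implements TextAnalyzer {
        private final TextAnalyzer base = new DefaultAnalyzer();

        @Override
        public List<String> analyze(String text) {
            List<String> tokens = new ArrayList<>();
            for (String t : base.analyze(text)) {
                int apos = t.indexOf('\'');
                boolean elided = apos >= 1 && apos <= 2 && apos < t.length() - 1;
                tokens.add(elided ? t.substring(apos + 1) : t);
            }
            return tokens;
        }
    }

    private final Map<String, TextAnalyzer> byLanguage = new HashMap<>();
    private final TextAnalyzer fallback;

    AnalyzerDispatchSketch(TextAnalyzer fallback) {
        this.fallback = fallback;
    }

    // Corresponds to enabling an analysis plugin for one language code.
    void register(String languageCode, TextAnalyzer analyzer) {
        byLanguage.put(languageCode, analyzer);
    }

    // Language-specific analyzer if one is registered, otherwise the default.
    TextAnalyzer select(String languageCode) {
        if (languageCode == null) {
            return fallback;
        }
        return byLanguage.getOrDefault(languageCode, fallback);
    }
}
```

With a French analyzer registered under "fr", `select("fr")` returns it, while `select("de")` or an unknown language returns the default. That is the limitation the reply points at: the choice is made once per document and only by language.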