Original English source: DissectingTheNutchCrawler
When reposting this article, please credit the source: http://blog.csdn.net/pwlazy
Factory classes: Overview
> Class net.nutch.parser.ParserFactory
> used by:
> - net.nutch.db.WebDBInjector
> - net.nutch.fetcher.Fetcher
> - net.nutch.parser.ParserChecker
>
> Class net.nutch.protocol.ProtocolFactory
> used by:
> - net.nutch.fetcher.Fetcher
> - net.nutch.parser.ParserChecker
>
> Class net.nutch.net.URLFilterFactory
> used by:
> - net.nutch.db.WebDBInjector
> - net.nutch.tools.UpdateDatabaseTool
>
> Class net.nutch.plugin.PluginRepository: used by (Parser/Protocol)Factory
Nutch's ParserFactory and ProtocolFactory classes are the key extension points for the crawler. URLFilterFactory additionally provides an extension point for other components, including WebDBInjector and UpdateDatabaseTool. These "Factory" classes can all be reconfigured by editing XML config files. So before we describe the mechanics of any of the Factory classes, we need to take a quick look at Nutch's configuration system.
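Factory classes: overview (translation)
net.nutch.parser.ParserFactory is used by the following classes:
- net.nutch.db.WebDBInjector
- net.nutch.fetcher.Fetcher
- net.nutch.parser.ParserChecker
net.nutch.protocol.ProtocolFactory is used by the following classes:
- net.nutch.fetcher.Fetcher
- net.nutch.parser.ParserChecker
net.nutch.net.URLFilterFactory is used by the following classes:
- net.nutch.db.WebDBInjector
- net.nutch.tools.UpdateDatabaseTool
net.nutch.plugin.PluginRepository: used by (Parser/Protocol)Factory
For the crawler, ParserFactory and ProtocolFactory are the key extension points.
URLFilterFactory additionally provides an extension point for other components (such as WebDBInjector and UpdateDatabaseTool). These factory classes can be reconfigured by editing XML configuration files. So before describing the mechanics of any one factory class, we need to take a quick look at Nutch's configuration system.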
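To make the idea of a factory that is reconfigured through an XML file more concrete, here is a minimal Java sketch. It is not Nutch's code: the class `ConfigDrivenFactorySketch`, the `Handler` interface, the `handler.<scheme>` property names, and the `<property><name>/<value>` layout are all assumptions made for illustration only. It shows the general pattern: the factory knows several implementations, and the config file decides which one is handed out.

```java
import java.io.ByteArrayInputStream;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical sketch: none of these types are Nutch's real classes.
class ConfigDrivenFactorySketch {

    /** Stand-in for a pluggable component such as a protocol handler. */
    interface Handler {
        String describe();
    }

    /** Reads <property><name>..</name><value>..</value></property> pairs (assumed layout). */
    static Map<String, String> loadConfig(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            Map<String, String> props = new HashMap<>();
            NodeList nodes = doc.getElementsByTagName("property");
            for (int i = 0; i < nodes.getLength(); i++) {
                Element p = (Element) nodes.item(i);
                String name = p.getElementsByTagName("name").item(0).getTextContent().trim();
                String value = p.getElementsByTagName("value").item(0).getTextContent().trim();
                props.put(name, value);
            }
            return props;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse config", e);
        }
    }

    /** Factory that picks an implementation by URL scheme, as named in the config. */
    static final class HandlerFactory {
        private final Map<String, String> config;
        private final Map<String, Supplier<Handler>> known = new HashMap<>();

        HandlerFactory(Map<String, String> config) {
            this.config = config;
            // Implementations available to the factory; the config chooses among them.
            known.put("simple-http", () -> () -> "simple HTTP handler");
            known.put("caching-http", () -> () -> "caching HTTP handler");
        }

        Handler forUrl(String url) {
            String scheme = URI.create(url).getScheme();
            String implName = config.get("handler." + scheme);
            Supplier<Handler> supplier = implName == null ? null : known.get(implName);
            if (supplier == null) {
                throw new IllegalArgumentException("No handler configured for: " + url);
            }
            return supplier.get();
        }
    }

    public static void main(String[] args) {
        String xml = "<conf><property><name>handler.http</name>"
                + "<value>caching-http</value></property></conf>";
        HandlerFactory factory = new HandlerFactory(loadConfig(xml));
        // prints: caching HTTP handler
        System.out.println(factory.forUrl("http://example.com/").describe());
    }
}
```

Changing the `<value>` in the XML from `caching-http` to `simple-http` switches the implementation without touching the calling code, which is the property the paragraph above attributes to the Factory classes.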