shareHua

crawler-beans.cxml

 
1. CrawlMetadata: includes identification of the crawler/operator
org.archive.modules.CrawlMetadata:  Basic crawl metadata, as consulted by functional modules and recorded in ARCs/WARCs.
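
As a rough illustration of how this bean might be declared in crawler-beans.cxml, here is a minimal sketch. The bean id `metadata` and the three property names follow the stock Heritrix 3 profile as far as I recall; every value is a placeholder chosen for this example and should be replaced with your own details.

```xml
<!-- Sketch only: all values below are placeholders, not recommended settings. -->
<bean id="metadata" class="org.archive.modules.CrawlMetadata" autowire="byName">
  <!-- URL where site owners can find out who is operating the crawler -->
  <property name="operatorContactUrl" value="http://example.org/crawl-contact"/>
  <!-- Short job name, recorded alongside the crawl output -->
  <property name="jobName" value="example-job"/>
  <!-- Free-text description of the crawl -->
  <property name="description" value="Example crawl for illustration"/>
</bean>
```

The operator contact URL is the part that identifies the crawler/operator, so it is worth pointing it at a page you actually control before starting a crawl.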

org.archive.modules.seeds.TextSeedModule

org.archive.modules.deciderules.DecideRuleSequence

org.archive.modules.CandidateChain

org.archive.modules.FetchChain

org.archive.modules.DispositionChain

org.archive.crawler.framework.CrawlController

org.archive.crawler.frontier.BdbFrontier

org.archive.crawler.util.BdbUriUniqFilter

forceRetire

smallBudget

veryPolite

highPrecedence

<!--    OPTIONAL BUT RECOMMENDED BEANS  -->
actionDirectory

crawlLimiter

checkpointService

statisticsTracker

loggerModule

sheetOverlaysManager

cookieStorage

serverCache

configPathConfigurer