crawler-beans.cxml - shareHua - ITeye Blog


crawler-beans.cxml

Blog category: heritrix3

1. CrawlMetadata: identifies the crawler and its operator.
org.archive.modules.CrawlMetadata: basic crawl metadata, as consulted by functional modules and recorded in ARCs/WARCs.
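
As a rough sketch of how this bean might appear in crawler-beans.cxml: the bean id `metadata` follows the convention of the stock Heritrix 3 profile, and every property value below is a made-up placeholder, not taken from this post.

```xml
<!-- Sketch only: all values are hypothetical placeholders. -->
<bean id="metadata" class="org.archive.modules.CrawlMetadata" autowire="byName">
  <!-- URL where site owners can learn about the crawl and contact the operator -->
  <property name="operatorContactUrl" value="http://example.org/crawl-info"/>
  <!-- Short name and free-text description of the crawl job -->
  <property name="jobName" value="example-job"/>
  <property name="description" value="Example crawl for illustration"/>
</bean>
```

Heritrix typically refuses to start a crawl until the operator contact URL is set to a real value, so this is usually the first bean to edit.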

org.archive.modules.seeds.TextSeedModule

org.archive.modules.deciderules.DecideRuleSequence

org.archive.modules.CandidateChain

org.archive.modules.FetchChain

org.archive.modules.DispositionChain

org.archive.crawler.framework.CrawlController

org.archive.crawler.frontier.BdbFrontier

org.archive.crawler.util.BdbUriUniqFilter
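
The classes listed above are declared as Spring beans in the file. The skeleton below shows only the outline, assuming the bean ids used in the stock Heritrix 3 profile (`seeds`, `scope`, `candidateProcessors`, and so on). The nested properties (seed list, decide rules, processor lists) are deliberately omitted, so this is not a working configuration.

```xml
<!-- Skeleton only: bean ids assume the stock Heritrix 3 profile;
     each bean needs its nested properties filled in before use. -->
<bean id="seeds" class="org.archive.modules.seeds.TextSeedModule"/>
<bean id="scope" class="org.archive.modules.deciderules.DecideRuleSequence"/>
<bean id="candidateProcessors" class="org.archive.modules.CandidateChain"/>
<bean id="fetchProcessors" class="org.archive.modules.FetchChain"/>
<bean id="dispositionProcessors" class="org.archive.modules.DispositionChain"/>
<bean id="crawlController" class="org.archive.crawler.framework.CrawlController"/>
<bean id="frontier" class="org.archive.crawler.frontier.BdbFrontier"/>
<bean id="uriUniqFilter" class="org.archive.crawler.util.BdbUriUniqFilter"/>
```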

forceRetire

smallBudget

veryPolite

highPrecedence
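
The four names above appear to be the example settings sheets from the stock profile: named sets of property overrides that can be applied to selected URIs. A minimal sketch of one of them, assuming it matches the default profile's definition:

```xml
<!-- Sketch, assuming the stock profile's definition: URIs this sheet
     is applied to force their containing queue into 'retired' status. -->
<bean id="forceRetire" class="org.archive.spring.Sheet">
  <property name="map">
    <map>
      <entry key="disposition.forceRetire" value="true"/>
    </map>
  </property>
</bean>
```

A sheet has no effect until it is associated with URIs, which in the stock profile is done through the `sheetOverlaysManager` bean listed below.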


actionDirectory

crawlLimiter

checkpointService

statisticsTracker

loggerModule

sheetOverlaysManager

cookieStorage

serverCache

configPathConfigurer


2012-12-11 14:06