diddyrock

Notes

Inner class of Fetcher, line 323:
metadata.set(Nutch.SEGMENT_NAME_KEY, segmentName);

/** Return the set of anchor texts.  Only a single anchor with a given text
   * is permitted from a given domain. */
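The rule in that Javadoc (keep each anchor text at most once per source domain) can be sketched as below. This is a minimal illustration, not Nutch's `Inlinks` code: it assumes the "domain" is the host of the linking URL, and `SimpleInlink` and `AnchorSelector` are hypothetical names.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for an inlink: the linking URL plus its anchor text.
final class SimpleInlink {
  final String fromUrl;
  final String anchor;

  SimpleInlink(String fromUrl, String anchor) {
    this.fromUrl = fromUrl;
    this.anchor = anchor;
  }
}

final class AnchorSelector {
  // Returns anchor texts in input order, keeping only the first occurrence
  // of each (host, text) pair and skipping empty anchors.
  static List<String> getAnchors(List<SimpleInlink> inlinks) {
    Map<String, Set<String>> hostToAnchors = new HashMap<>();
    List<String> results = new ArrayList<>();
    for (SimpleInlink inlink : inlinks) {
      if (inlink.anchor == null || inlink.anchor.isEmpty()) {
        continue;
      }
      String host = hostOf(inlink.fromUrl);
      Set<String> seen = hostToAnchors.computeIfAbsent(host, h -> new HashSet<>());
      if (seen.add(inlink.anchor)) { // add() is false if this host already used the text
        results.add(inlink.anchor);
      }
    }
    return results;
  }

  // Assumption: the URL host stands in for "domain"; unparseable URLs share the "" bucket.
  private static String hostOf(String url) {
    try {
      String host = new URI(url).getHost();
      return host == null ? "" : host;
    } catch (URISyntaxException e) {
      return "";
    }
  }
}
```

With two "nutch" links from a.com and one from b.org, "nutch" is returned twice: once for each domain.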


IndexerMapReduce.reduce:

else if (CrawlDatum.hasFetchStatus(datum)) {
          // don't index unmodified (empty) pages
          if (datum.getStatus() != CrawlDatum.STATUS_FETCH_NOTMODIFIED)
            fetchDatum = datum;
}
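A simplified sketch of that branch, assuming each value seen for one URL can be reduced to a status. `DatumStatus` and `FetchDatumPicker` are hypothetical names, and the enum does not reproduce Nutch's real `CrawlDatum` status codes; it only shows why a NOTMODIFIED fetch never becomes the datum that gets indexed.

```java
import java.util.List;

// Hypothetical, simplified statuses (not Nutch's actual CrawlDatum constants).
enum DatumStatus {
  DB_FETCHED(false),
  FETCH_SUCCESS(true),
  FETCH_NOTMODIFIED(true);

  final boolean isFetchStatus; // plays the role of CrawlDatum.hasFetchStatus()

  DatumStatus(boolean isFetchStatus) {
    this.isFetchStatus = isFetchStatus;
  }
}

final class FetchDatumPicker {
  // Among the values for one URL, remember a fetch-status datum unless it is
  // NOTMODIFIED (the page came back empty). In this sketch a null result
  // means there is nothing to index for the URL.
  static DatumStatus pickFetchDatum(List<DatumStatus> values) {
    DatumStatus fetchDatum = null;
    for (DatumStatus datum : values) {
      if (datum.isFetchStatus && datum != DatumStatus.FETCH_NOTMODIFIED) {
        fetchDatum = datum;
      }
    }
    return fetchDatum;
  }
}
```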


basicfilter////and

IndexerOutputFormat

createLuceneDoc

now p is in title

Hadoop 0.19 is a real joy to work with!

Put the extra requirements into the HTML parser.