Learning HBase: using a concurrent Mapper

Blog category: hbase

Tags: hbase, MultithreadedTableMapper, concurrent Mapper

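   Hadoop supports running a Mapper with multiple threads, so there is no reason for HBase not to offer the same thing. The class that does this is org.apache.hadoop.hbase.mapreduce.MultithreadedTableMapper.

   Put simply, this class overrides the run method of Mapper: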

/**
   * Run the application's maps using a thread pool.
   */
  @Override
  public void run(Context context) throws IOException, InterruptedException {
    outer = context;
    int numberOfThreads = getNumberOfThreads(context);
    mapClass = getMapperClass(context);
    if (LOG.isDebugEnabled()) {
      LOG.debug("Configuring multithread runner to use " + numberOfThreads +
          " threads");
    }
    executor = Executors.newFixedThreadPool(numberOfThreads);
    for(int i=0; i < numberOfThreads; ++i) {
      MapRunner thread = new MapRunner(context);
      executor.execute(thread);
    }
    executor.shutdown();
    while (!executor.isTerminated()) {
      // wait till all the threads are done
      Thread.sleep(1000);
    }
  }

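The source above is quoted from hbase-0.94.1. It starts a fixed-size thread pool, submits one MapRunner per thread, then shuts the pool down and polls once a second until every runner has finished.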
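To make the pattern in run() easier to see in isolation, here is a minimal sketch in plain Java with no HBase dependency. The class PooledRunnerSketch, the method runAll, and the shared queue are hypothetical stand-ins for the outer context and MapRunner; they are not part of HBase. The sketch also waits with awaitTermination rather than the sleep loop used in the quoted source, which is only a stylistic alternative.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration of the thread-pool pattern; not HBase code.
class PooledRunnerSketch {

    /** Drains the input with a fixed pool of workers and returns how many records were handled. */
    static int runAll(List<String> input, int numberOfThreads) {
        // Shared input that every worker pulls from (stand-in for the outer context).
        Queue<String> records = new ConcurrentLinkedQueue<>(input);
        AtomicInteger processed = new AtomicInteger();

        ExecutorService executor = Executors.newFixedThreadPool(numberOfThreads);
        for (int i = 0; i < numberOfThreads; ++i) {
            executor.execute(() -> {
                // Each worker keeps taking records until the shared input is empty.
                while (records.poll() != null) {
                    processed.incrementAndGet(); // placeholder for the real map() call
                }
            });
        }
        executor.shutdown(); // no new tasks; the submitted workers keep running
        try {
            // Block until all workers are done instead of sleeping and polling.
            while (!executor.awaitTermination(1, TimeUnit.SECONDS)) {
                // still running, wait again
            }
        } catch (InterruptedException e) {
            executor.shutdownNow();
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
        }
        return processed.get();
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList("row1", "row2", "row3", "row4", "row5");
        System.out.println(runAll(rows, 3)); // prints 5
    }
}
```

Every record is handled exactly once no matter how many threads are used, because all workers draw from the same source.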

The class also contains a private inner class, MapRunner. Each MapRunner holds a mapper field, and that mapper is the one we actually want to execute. So how does it get set?

/**
   * Set the application's mapper class.
   * @param <K2> the map output key type
   * @param <V2> the map output value type
   * @param job the job to modify
   * @param cls the class to use as the mapper
   */
  public static <K2,V2>
  void setMapperClass(Job job,
      Class<? extends Mapper<ImmutableBytesWritable, Result,K2,V2>> cls) {
    if (MultithreadedTableMapper.class.isAssignableFrom(cls)) {
      throw new IllegalArgumentException("Can't have recursive " +
          "MultithreadedTableMapper instances.");
    }
    job.getConfiguration().setClass(MAPPER_CLASS,
        cls, Mapper.class);
  }

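   The source above is quoted from hbase-0.94.1.
   As it shows, the Mapper class we want to run concurrently must not be MultithreadedTableMapper itself or a subclass of it; otherwise an IllegalArgumentException is thrown (I hit exactly this exception in my own experiment because my mapper extended MultithreadedTableMapper). Calling this static method before submitting the job sets the real Mapper class.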
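The isAssignableFrom check is easy to misread, so here is a small self-contained sketch of how it behaves. All class names below (BaseMapper, MultithreadedWrapper, MyMapper, MyBadMapper) are hypothetical stand-ins invented for this illustration; only the shape of the check mirrors the quoted setMapperClass.

```java
// Hypothetical illustration of the recursion guard; not HBase code.
class RecursionGuardSketch {

    static class BaseMapper {}
    // Stand-in for MultithreadedTableMapper.
    static class MultithreadedWrapper extends BaseMapper {}
    // An ordinary mapper: accepted by the guard.
    static class MyMapper extends BaseMapper {}
    // A mapper that extends the wrapper: rejected by the guard.
    static class MyBadMapper extends MultithreadedWrapper {}

    /** Same shape as the check in setMapperClass: reject the wrapper and its subclasses. */
    static Class<? extends BaseMapper> checkMapperClass(Class<? extends BaseMapper> cls) {
        if (MultithreadedWrapper.class.isAssignableFrom(cls)) {
            throw new IllegalArgumentException(
                "Can't have recursive MultithreadedWrapper instances.");
        }
        return cls;
    }

    /** Convenience helper: true if the guard lets the class through. */
    static boolean isAccepted(Class<? extends BaseMapper> cls) {
        try {
            checkMapperClass(cls);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isAccepted(MyMapper.class));             // true
        System.out.println(isAccepted(MyBadMapper.class));          // false
        System.out.println(isAccepted(MultithreadedWrapper.class)); // false
    }
}
```

The wrapper class itself is rejected as well as its subclasses, since X.class.isAssignableFrom(X.class) is true.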

   In addition:

/**
   * Set the number of threads in the pool for running maps.
   * @param job the job to modify
   * @param threads the new number of threads
   */
  public static void setNumberOfThreads(Job job, int threads) {
    job.getConfiguration().setInt(NUMBER_OF_THREADS,
        threads);
  }

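   We can call this method to set the number of concurrent threads; the default is 10.

   Also note that the Mapper class passed to TableMapReduceUtil.initTableMapperJob must be MultithreadedTableMapper itself. Our real Mapper is registered separately through setMapperClass.

   Finally, the class implements several other inner classes and methods that help keep the data consistent across threads. Interested readers can go through the source themselves; this post is only meant as a starting point.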
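As a rough idea of what such consistency helpers involve, the sketch below shows one common approach: reads from the shared input are serialized with a lock, and each thread gets its own copy of the record so that later reads cannot overwrite it. This is an assumption about the general technique, written in plain Java; SharedReaderSketch, SharedReader, and sumWithThreads are hypothetical names and do not reproduce the HBase inner classes.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration of a lock-and-copy reader; not HBase code.
class SharedReaderSketch {

    /** Wraps a non-thread-safe iterator so several workers can pull from it safely. */
    static final class SharedReader {
        private final Iterator<int[]> source;

        SharedReader(Iterator<int[]> source) {
            this.source = source;
        }

        /** Returns a private copy of the next record, or null when the input is exhausted. */
        synchronized int[] next() {
            if (!source.hasNext()) {
                return null;
            }
            return source.next().clone(); // copy, so the caller owns its record
        }
    }

    /** Sums every cell of every row using several worker threads. */
    static int sumWithThreads(List<int[]> rows, int numberOfThreads) {
        SharedReader reader = new SharedReader(rows.iterator());
        AtomicInteger total = new AtomicInteger();
        Thread[] workers = new Thread[numberOfThreads];
        for (int i = 0; i < numberOfThreads; ++i) {
            workers[i] = new Thread(() -> {
                int[] row;
                while ((row = reader.next()) != null) {
                    for (int cell : row) {
                        total.addAndGet(cell); // placeholder for the real per-record work
                    }
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // give up waiting, keep the flag set
                break;
            }
        }
        return total.get();
    }

    public static void main(String[] args) {
        List<int[]> rows = Arrays.asList(new int[] {1, 2}, new int[] {3}, new int[] {4, 5, 6});
        System.out.println(sumWithThreads(rows, 4)); // prints 21
    }
}
```

Without the synchronized keyword two threads could both see hasNext() return true and then race on next(); without the copy, a reader that reuses its record object would let one thread's data change under another.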

2012-11-18 23:54