[Original] Row Counting with an HBase 0.98 Coprocessor Endpoint

dujian.gu



Blog category: HBase

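When we run simple row counts or aggregations over the data in an HBase table, pulling the data to the client with MapReduce or the native client API and computing there causes high latency and heavy network I/O. If the computation is pushed to the server side instead, network I/O drops sharply and performance improves considerably. HBase coprocessors are well suited to this.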

HBase coprocessors fall into two categories:

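1. Observer: similar to the observer pattern, it provides hook methods that run around operations such as Get, Put, Delete, and Scan. Observers are further divided into RegionObserver, WALObserver, and MasterObserver.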

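2. Endpoint: invoked through RPC. The client calls a custom service method on each region, the code runs inside the RegionServer next to the data, and only the result is sent back to the client, somewhat like a stored procedure.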
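To make the Endpoint idea concrete, here is a small in-memory sketch in plain Java with no HBase involved. Each hypothetical FakeRegion counts its own rows and returns only a number, and the caller queries all regions concurrently and adds up the partial results. The class and method names are illustrative assumptions, not HBase APIs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/**
 * In-memory sketch of the Endpoint idea: each "region" counts its rows locally
 * and only the resulting number travels back to the caller, which sums them.
 * FakeRegion is a hypothetical stand-in, not an HBase class.
 */
public class EndpointIdeaSketch {

    static final class FakeRegion {
        private final List<String> rowKeys;

        FakeRegion(List<String> rowKeys) {
            this.rowKeys = rowKeys;
        }

        /** "Server side": the rows stay here; only one long is returned. */
        long countRows() {
            return rowKeys.size();
        }
    }

    /** "Client side": call every region concurrently and sum the partial counts. */
    static long countAll(List<FakeRegion> regions) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, regions.size()));
        try {
            List<Future<Long>> partials = new ArrayList<>();
            for (FakeRegion region : regions) {
                Callable<Long> call = region::countRows;
                partials.add(pool.submit(call));
            }
            long total = 0;
            for (Future<Long> partial : partials) {
                total += partial.get();
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<FakeRegion> regions = Arrays.asList(
                new FakeRegion(Arrays.asList("row1", "row2")),
                new FakeRegion(Arrays.asList("row3")),
                new FakeRegion(Arrays.asList("row4", "row5", "row6")));
        System.out.println("total rows = " + countAll(regions)); // prints "total rows = 6"
    }
}
```

In the real client, HTable.coprocessorService roughly plays the role of countAll: it dispatches one call per region and collects one result per region.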

The rest of this post shows how to implement row counting with an Endpoint.

Development environment:

Hadoop 2.6.0

HBase 0.98.4

Implementation:

1. Define the RPC protocol (ExampleProtos.proto)

option java_package = "com.iss.gbg.protobuf.proto.generated";
option java_outer_classname = "ExampleProtos";
option java_generic_services = true;
option java_generate_equals_and_hash = true;
option optimize_for = SPEED;

message CountRequest {
}

message CountResponse {
required int64 count = 1 [default = 0];
}

service RowCountService {
rpc getRowCount(CountRequest)
returns (CountResponse);
}

I won't walk through the protocol definition line by line; for the meaning of each option and statement, see the Protocol Buffers documentation.

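Generate the Java classes with the Protocol Buffers compiler, for example: protoc --java_out=src/main/java ExampleProtos.proto. HBase 0.98 is built against protobuf 2.5.0, so the protoc version should match.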

Server-side implementation:

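package com.iss.gbg.protobuf.hbase;

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CellUtil;
import org.apache.hadoop.hbase.Coprocessor;
import org.apache.hadoop.hbase.CoprocessorEnvironment;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.coprocessor.CoprocessorException;
import org.apache.hadoop.hbase.coprocessor.CoprocessorService;
import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;
import org.apache.hadoop.hbase.filter.FirstKeyOnlyFilter;
import org.apache.hadoop.hbase.protobuf.ResponseConverter;
import org.apache.hadoop.hbase.regionserver.InternalScanner;
import org.apache.hadoop.hbase.util.Bytes;

import com.google.protobuf.RpcCallback;
import com.google.protobuf.RpcController;
import com.google.protobuf.Service;
import com.iss.gbg.protobuf.proto.generated.ExampleProtos;

public class RowCountEndpoint extends ExampleProtos.RowCountService
    implements Coprocessor, CoprocessorService {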
  private RegionCoprocessorEnvironment env;

  public RowCountEndpoint() {
  }

  /**
   * Just returns a reference to this object, which implements the RowCountService interface.
   */
  @Override
  public Service getService() {
    return this;
  }

  /**
   * Returns a count of the rows in the region where this coprocessor is loaded.
   */
  @Override
  public void getRowCount(RpcController controller, ExampleProtos.CountRequest request,
                          RpcCallback<ExampleProtos.CountResponse> done) {
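    // FirstKeyOnlyFilter returns only the first cell of each row, so the scan
    // reads one cell per row instead of the whole row.
    Scan scan = new Scan();
    scan.setFilter(new FirstKeyOnlyFilter());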
    ExampleProtos.CountResponse response = null;
    InternalScanner scanner = null;
    try {
      scanner = env.getRegion().getScanner(scan);
      List<Cell> results = new ArrayList<Cell>();
      boolean hasMore = false;
      // Row key of the previous cell; a new row is counted whenever the key changes.
      byte[] lastRow = null;
      long count = 0;
      do {
        hasMore = scanner.next(results);
        for (Cell kv : results) {
          byte[] currentRow = CellUtil.cloneRow(kv);
          if (lastRow == null || !Bytes.equals(lastRow, currentRow)) {
            lastRow = currentRow;
            count++;
          }
        }
        results.clear();
      } while (hasMore);

      response = ExampleProtos.CountResponse.newBuilder()
          .setCount(count).build();
    } catch (IOException ioe) {
      ResponseConverter.setControllerException(controller, ioe);
    } finally {
      if (scanner != null) {
        try {
          scanner.close();
        } catch (IOException ignored) {}
      }
    }
    done.run(response);
  }


  /**
   * Stores a reference to the coprocessor environment provided by the
   * {@link org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost} from the region where this
   * coprocessor is loaded.  Since this is a coprocessor endpoint, it always expects to be loaded
   * on a table region, so always expects this to be an instance of
   * {@link RegionCoprocessorEnvironment}.
   * @param env the environment provided by the coprocessor host
   * @throws IOException if the provided environment is not an instance of
   * {@code RegionCoprocessorEnvironment}
   */
  @Override
  public void start(CoprocessorEnvironment env) throws IOException {
    if (env instanceof RegionCoprocessorEnvironment) {
      this.env = (RegionCoprocessorEnvironment)env;
    } else {
      throw new CoprocessorException("Must be loaded on a table region!");
    }
  }

  @Override
  public void stop(CoprocessorEnvironment env) throws IOException {
    // nothing to do
  }
}
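The counting loop in getRowCount() relies on cells arriving sorted by row key, so a row is counted whenever its key differs from that of the previous cell. The plain-Java sketch below reproduces that logic outside HBase. SimpleCell is a hypothetical stand-in for org.apache.hadoop.hbase.Cell, and each inner list plays the role of one scanner.next() batch. Because lastRow is kept across batches, a row whose cells are split over two calls (for example when Scan.setBatch limits the cells returned per call) is still counted once, and the count stays correct even without FirstKeyOnlyFilter.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

public class RowDedupSketch {

    /** Hypothetical minimal cell: only the row key and qualifier matter here. */
    static final class SimpleCell {
        final byte[] row;
        final byte[] qualifier;

        SimpleCell(String row, String qualifier) {
            this.row = row.getBytes(StandardCharsets.UTF_8);
            this.qualifier = qualifier.getBytes(StandardCharsets.UTF_8);
        }
    }

    /** Counts distinct rows in row-sorted cells delivered in batches. */
    static long countRows(List<List<SimpleCell>> batches) {
        byte[] lastRow = null; // survives across batches, as in the endpoint
        long count = 0;
        for (List<SimpleCell> batch : batches) {
            for (SimpleCell cell : batch) {
                // Arrays.equals compares contents, as Bytes.equals does in HBase.
                if (lastRow == null || !Arrays.equals(lastRow, cell.row)) {
                    lastRow = cell.row;
                    count++;
                }
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Row r1 is split across the first two batches but counted once.
        List<List<SimpleCell>> batches = Arrays.asList(
                Arrays.asList(new SimpleCell("r1", "a"), new SimpleCell("r1", "b")),
                Arrays.asList(new SimpleCell("r1", "c"), new SimpleCell("r2", "a")),
                Arrays.asList(new SimpleCell("r3", "a")));
        System.out.println("rows = " + countRows(batches)); // prints "rows = 3"
    }
}
```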

Client-side invocation:

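import java.io.IOException;
import java.util.Map;
import java.util.Map.Entry;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.HTableInterface;
import org.apache.hadoop.hbase.client.coprocessor.Batch;
import org.apache.hadoop.hbase.ipc.BlockingRpcCallback;
import org.apache.hadoop.hbase.ipc.ServerRpcController;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import com.google.protobuf.ServiceException;
import com.iss.gbg.protobuf.proto.generated.ExampleProtos;

public class RowCountClient {
	private static final Logger LOG = LoggerFactory.getLogger(RowCountClient.class);

	public static void main(String[] args) {
		// Reads hbase-site.xml (ZooKeeper quorum etc.) from the classpath.
		Configuration conf = HBaseConfiguration.create();
		HTableInterface htable = null;
		try {
			// The original called the author's own helper HBaseServer.getTable(...), which is not shown;
			// a plain HTable is opened here instead.
			htable = new HTable(conf, "test_crawler_data");
			LOG.info("htable obtained successfully");
			final ExampleProtos.CountRequest request = ExampleProtos.CountRequest.getDefaultInstance();

			// null start/end keys: run the endpoint on every region of the table.
			// The returned map holds one partial count per region, keyed by region name.
			Map<byte[], Long> results = htable.coprocessorService(ExampleProtos.RowCountService.class, null, null,
					new Batch.Call<ExampleProtos.RowCountService, Long>() {
				@Override
				public Long call(ExampleProtos.RowCountService counter) throws IOException {
					ServerRpcController controller = new ServerRpcController();
					BlockingRpcCallback<ExampleProtos.CountResponse> rpcCallback =
							new BlockingRpcCallback<ExampleProtos.CountResponse>();
					counter.getRowCount(controller, request, rpcCallback);
					// Blocks until the region server has answered.
					ExampleProtos.CountResponse response = rpcCallback.get();
					// Re-throw an exception raised inside the endpoint on the server.
					if (controller.failedOnException()) {
						throw controller.getFailedOn();
					}
					return (null != response && response.hasCount()) ? response.getCount() : 0L;
				}
			});
			// Add up the per-region counts to get the row count of the whole table.
			long sum = 0;
			if (null != results) {
				for (Entry<byte[], Long> entry : results.entrySet()) {
					sum += entry.getValue().longValue();
				}
			}
			System.out.println("sum=" + sum);
		} catch (IOException e) {
			e.printStackTrace();
		} catch (ServiceException e) {
			e.printStackTrace();
		} catch (Throwable e) {
			// coprocessorService() is declared to throw Throwable.
			e.printStackTrace();
		} finally {
			if (null != htable) {
				try {
					htable.close();
				} catch (IOException e) {
					e.printStackTrace();
				}
			}
		}
	}
}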
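The client above collects one Long per region in a Map and sums it afterwards. In 0.98, HTableInterface also has a coprocessorService overload that takes an extra Batch.Callback&lt;R&gt;, whose update(byte[] region, byte[] row, R result) method is called as each region's result arrives; since those calls can come from several pool threads, an accumulator should be thread-safe. The sketch below is plain Java: RegionResultCallback is a hypothetical local interface that only mirrors the shape of Batch.Callback, and the thread pool merely simulates concurrent region replies.

```java
import java.nio.charset.StandardCharsets;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

public class RowCountAccumulatorSketch {

    /** Hypothetical local interface with the same shape as HBase's Batch.Callback<R>. */
    interface RegionResultCallback<R> {
        void update(byte[] region, byte[] row, R result);
    }

    /** Adds each region's partial count to a shared total; safe to call from many threads. */
    static final class SumCallback implements RegionResultCallback<Long> {
        private final AtomicLong total = new AtomicLong();

        @Override
        public void update(byte[] region, byte[] row, Long result) {
            if (result != null) {
                total.addAndGet(result);
            }
        }

        long total() {
            return total.get();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        final SumCallback callback = new SumCallback();
        ExecutorService pool = Executors.newFixedThreadPool(4);
        long[] partialCounts = {120L, 0L, 37L, 5L}; // made-up per-region counts
        for (int i = 0; i < partialCounts.length; i++) {
            final byte[] regionName = ("region-" + i).getBytes(StandardCharsets.UTF_8);
            final long partial = partialCounts[i];
            // Simulates region replies arriving on different threads.
            pool.submit(() -> callback.update(regionName, null, partial));
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        System.out.println("sum=" + callback.total()); // prints "sum=162"
    }
}
```

With a real HBase client, an object like SumCallback would be passed as the last argument of coprocessorService(Class, byte[], byte[], Batch.Call, Batch.Callback), which returns void instead of a Map.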

Deployment:

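1. Package the code (including the generated ExampleProtos classes) into a jar, copy it into the HBase lib directory on every RegionServer node, and restart HBase.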

2. Attach the endpoint to the target table:

alter 'test_crawler_data','coprocessor'=>'|com.iss.gbg.protobuf.hbase.RowCountEndpoint|1001|'
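The coprocessor attribute value in the alter command follows the format jarPath|className|priority|arguments. The jar path may be left empty when the class is already on the RegionServer classpath, and arguments are comma-separated key=value pairs. A small plain-Java helper that assembles such a value is sketched below; buildSpec is a hypothetical name. In HBase 0.98, HTableDescriptor.addCoprocessor(String, Path, int, Map) produces a value of the same shape when a coprocessor is registered from Java.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Builds a table "coprocessor" attribute value of the form
 *   jarPath|className|priority|key1=value1,key2=value2
 * A null jar path yields an empty first field (class loaded from the RegionServer classpath).
 */
public class CoprocessorSpecSketch {

    static String buildSpec(String jarPath, String className, int priority, Map<String, String> args) {
        StringBuilder kv = new StringBuilder();
        if (args != null) {
            for (Map.Entry<String, String> e : args.entrySet()) {
                if (kv.length() > 0) {
                    kv.append(',');
                }
                kv.append(e.getKey()).append('=').append(e.getValue());
            }
        }
        return (jarPath == null ? "" : jarPath) + "|" + className + "|" + priority + "|" + kv;
    }

    public static void main(String[] args) {
        // Same value as in the alter command: no jar path, priority 1001, no arguments.
        System.out.println(buildSpec(null, "com.iss.gbg.protobuf.hbase.RowCountEndpoint", 1001, null));
        // prints "|com.iss.gbg.protobuf.hbase.RowCountEndpoint|1001|"

        // Hypothetical jar location on HDFS, with one illustrative argument.
        Map<String, String> kvs = new LinkedHashMap<>();
        kvs.put("arg1", "1");
        System.out.println(buildSpec("hdfs:///hbase/coprocessors/rowcount.jar",
                "com.iss.gbg.protobuf.hbase.RowCountEndpoint", 1001, kvs));
    }
}
```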

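The description of test_crawler_data then looks like this (note that the captured output shows org.apache.hadoop.hbase.coprocessor.example.RowCountEndpoint, HBase's own example class, rather than the com.iss.gbg.protobuf.hbase.RowCountEndpoint registered by the alter command above):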

hbase(main):022:0> describe 'test_crawler_data'
DESCRIPTION                                                                                                       ENABLED
'test_crawler_data', {TABLE_ATTRIBUTES => {coprocessor$1 => '|org.apache.hadoop.hbase.coprocessor.example.RowCountEndpoint|1001|'},   true
{NAME => 'extdata', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROW', REPLICATION_SCOPE => '0', COMPRESSION => 'NONE', VERSIONS => '1', MIN_VERSIONS => '0', TTL => 'FOREVER', KEEP_DELETED_CELLS => 'FALSE', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'},
{NAME => 'srcdata', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROW', REPLICATION_SCOPE => '0', COMPRESSION => 'NONE', VERSIONS => '1', MIN_VERSIONS => '0', TTL => 'FOREVER', KEEP_DELETED_CELLS => 'FALSE', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}

Invocation:

Running the client program prints the row count of the specified table.

Please credit the source when reposting: http://dujian-gu.iteye.com/blog/2225032


Posted 2015-07-07 18:13, category: Database