serial no | solution | level | precondition | runs on | flow | advantages | shortcomings | use cases
--- | --- | --- | --- | --- | --- | --- | --- | ---
1 | direct client API | log | - | | transfer data via both clusters | | |
2 | export/import | log | | src, then target | MR generates HDFS sequence files; transfer the files; import with MR | supports a time-range filter | |
3 | copy table | stream | | src | if cluster-to-cluster connectivity is enabled: copy the data (memstore + HFiles) directly to the other cluster; otherwise: same as export/import, except the last step uses hdfs put | | |
4 | replication | wal | sync the WAL with the new cluster | | | | |
5 | bulkload | | | | | | |
6 | snapshot | file | flush before snapshotting if the table is online | src, then target | create a snapshot; clone it to a new table; restore from the new table [cluster-internal] | | |
7 | distcp | file | stop HBase before distcp | src | flush the memstore; distcp the files between both clusters | | cannot copy data for a specified date range | can be used as the final step to transfer the files generated by other solutions
Now I want to retrieve last month's data from a table and back it up to another cluster, but the two clusters cannot connect to each other (so no MR job can span them). So I came up with these steps:
1. subset the table data (last month: 2014-06-01 -> 2014-06-30)
hbase org.apache.hadoop.hbase.mapreduce.CopyTable -Dhbase.client.scanner.caching=1000 -Dmapred.map.tasks.speculative.execution=false --starttime=1401552000000 --endtime=1404057600000 --new.name=new-tableX tableX
Then you MUST flush this table, since some of the data still lies in the memstores and the next step operates directly at the file level:
echo "flush 'new-tableX'" | hbase shell
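The two epoch-millisecond values above can be derived from calendar dates. A small sketch, assuming GNU date and a cluster timezone of UTC+8 (which is what the original values correspond to); note that CopyTable's --endtime is exclusive, so this range covers June 1 through June 29:

```shell
# Derive the --starttime/--endtime millisecond timestamps from dates.
# Assumes GNU date; TZ must match the cluster's timezone (UTC+8 here).
START=$(( $(TZ=Asia/Shanghai date -d '2014-06-01 00:00:00' +%s) * 1000 ))
END=$((   $(TZ=Asia/Shanghai date -d '2014-06-30 00:00:00' +%s) * 1000 ))
echo "--starttime=$START --endtime=$END"
```

To include all of June 30 as well, the end date would be 2014-07-01.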
2. download the table's HFiles from HDFS
hadoop fs -get /hbase/new-tableX new-tableX
(of course you can run this in parallel on multiple nodes by splitting the table's directories into subtasks)
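One way to sketch that "subtasking the dirs" idea: list the region directories under the table and fetch a bounded number of them concurrently on each node. fetch_table_dirs is a hypothetical helper, not an HBase tool; the paths and the concurrency limit are assumptions.

```shell
# Hypothetical helper: download a table's region dirs in parallel,
# keeping at most $3 concurrent "hadoop fs -get" jobs on this node.
fetch_table_dirs() {
  table_dir="$1"; local_dir="$2"; max_jobs="${3:-4}"
  mkdir -p "$local_dir"
  for dir in $(hadoop fs -ls "$table_dir" | awk '{print $NF}' | grep "^$table_dir/"); do
    hadoop fs -get "$dir" "$local_dir/" &
    # Throttle: wait while the background-job count is at the limit.
    while [ "$(jobs -r | wc -l)" -ge "$max_jobs" ]; do sleep 1; done
  done
  wait
}
# e.g. each node fetches its assigned subset of region dirs:
# fetch_table_dirs /hbase/new-tableX new-tableX 4
```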
3. transfer these files to the other cluster in parallel
a. scp part of the files to local nodes A, B, C...
b. from each node, scp its part files to the peer node of the other cluster
(this balances the transfer across nodes on both sides, instead of being limited by a single node's network bandwidth)
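The fan-out in step 3 can be sketched as a tiny helper: each node pushes its share of the part files to its peer in the other cluster. push_parts, the node names, and the paths are illustrative assumptions, not tooling from the original.

```shell
# Hypothetical helper: push every file in $1 to peer node $2 under $3,
# one background scp per file, then wait for all transfers to finish.
push_parts() {
  src_dir="$1"; peer="$2"; dest_dir="$3"
  for f in "$src_dir"/*; do
    scp -r "$f" "$peer:$dest_dir/" &
  done
  wait
}
# e.g. on nodeA: push_parts new-tableX/part-A peerA /data/new-tableX
```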
4. now import the data into the target cluster's HDFS
hadoop fs -put part-files /hbase
(just mkdir the directory first if it does not exist)
5. load these HFiles into meta and assign the regions
hbase hbck -fixMeta
then
hbase hbck -fixAssignments
(run the second command once more to judge whether the table is readable)
6. rename the new table to the original table name [optional]
hbase shell> disable 'tableName'
hbase shell> snapshot 'tableName', 'tableSnapshot'
hbase shell> clone_snapshot 'tableSnapshot', 'newTableName'
hbase shell> delete_snapshot 'tableSnapshot'
hbase shell> drop 'tableName'
The snapshot utility is supported from version 0.94.6 onward; if you run an older version, you can also backport the patch.
some optimized usages in step 1:
- MapReduce retry attempts:
-Dmapred.map.max.attempts=2
- tolerated failure ratio:
-Dmapred.max.map.failures.percent=0.05
- disable HLog writing (this may require refactoring Import.Importer.java)
- decrease the block replication:
-Ddfs.replication=2 or -Ddfs.replication=1
- increase the client write buffer:
-Dhbase.client.write.buffer=10485760
- presplit the new table when it is created in step 1:
{NUMREGIONS => [1], SPLITALGO => 'HexStringSplit'}
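For intuition on HexStringSplit: it carves the hex-encoded key space into N roughly even ranges, so a presplit table spreads CopyTable's writers across regions instead of hotspotting the first one. A small illustration of roughly where the boundaries land (hex_splits is just a sketch, not an HBase API, and the create statement in the comment uses placeholder names):

```shell
# Illustrative: compute HexStringSplit-style boundaries for n regions,
# at roughly i * 2^32 / n, rendered as 8 hex digits.
# The table itself would be created with something like:
#   create 'new-tableX', 'cf', {NUMREGIONS => 16, SPLITALGO => 'HexStringSplit'}
hex_splits() {
  n="$1"; i=1
  while [ "$i" -lt "$n" ]; do
    printf '%08x\n' $(( i * 4294967296 / n ))
    i=$((i + 1))
  done
}
hex_splits 4
```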
[1] hbase - how many regions fit a table when presplitting, or while it keeps running
ref:
CDH: introduction-to-apache-hbase-snapshots
JIRA: snapshot of table (with attached design docs)
Copying part of an HBase table for testing (some tools invoke Java classes from the shell)