`
chenchao051
  • 浏览: 135608 次
  • 性别: Icon_minigender_1
  • 来自: 杭州
社区版块
存档分类
最新评论

HBase Backup Options

阅读更多
If you are thinking about using HBase you will likely want to understand HBase backup options.  I know we did, so let us share what we found.  Please let us know what we missed and what you use for HBase backup!

Export

You could export your tables using the Export (org.apache.hadoop.hbase.mapreduce.Export) MapReduce job that will export the table data into a Sequence File on HDFS.  This was implemented in HBASE-1684 if you want to check out the patch or comments there.  This tool works on one table at a time, so if you need to backup multiple tables, run this on each table.  The exported data can then be imported back into HBase by the Import tool.

Copy Table

If you have another HBase cluster that you want to treat as a backup cluster, you can use the handy CopyTable tool to copy a table at a time.

Distcp

You could use Hadoop’s distcp command to copy the whole /hbase directory from one HDFS cluster to the other.  However, this can leave your data in an inconsistent state, so it should be avoided.  See http://search-hadoop.com/m/wkMgSjVLDb

At this point we should point out that all of the above backup methods are per-table.  Moreover, they don’t work or create a snapshot of the table.   Export and CopyTable are atomic only at the row level.  Furthermore, if you have multiple tables whose tables depend on each other, if they are being modified while you are exporting or copying them, you will end up with inconsistent data – the data in those tables will not be in sync.  See http://search-hadoop.com/m/Q4bU81G116p.

Backup from Mozilla

Because of the above mentioned issues with distcp when running it over a cluster whose data is being modified while distcp is running, developers at Mozilla came up with their own Backup tool.  They’ve described the tool and its use in the popular Migrating HBase in the Trenches post.

Cluster Replication

HBase has a relatively new and not yet widely used whole cluster replication mechanism.  The backup cluster does not have to be identical to the master cluster, which means that the backup cluster could be much less powerful and thus cheaper, while still having enough storage to serve as backup.

Table Snapshot

Ah, the infamous HBASE-50!  This issue saw some great work during GSoC 2010, but it looks like it was never integrated into HBase.  It is unclear whether the contributor simply ran out of steam or time or whether it became apparent that table snapshots are too difficult to implement or simply not doable because of highly distributed nature of HBase.  The JIRA issue does contain patches you can look at, and the author has a now inactive hbase-snapshot repository up on Github.

HDFS Replication

You could also simply crank up the replication factor to the level that makes you feel safe and call that a backup.  This may not guard against data corruption, but it does guard against certain partial hardware failures.

Since so many people seem to be asking about HBase backup options, I hope this serves as a good point-in-time snapshot, a summary of all HBase backup options that are currently on the table.  With time, this will be added to the HBase Book.

Are there other HBase backup options we should have included?

What you use for HBase backup?

原文:http://blog.sematext.com/2011/03/11/hbase-backup-options/
另有一篇博文也可以参考一下:http://blog.csdn.net/jingling_zy/article/details/7554676
分享到:
评论

相关推荐

    HbaseTemplate 操作hbase

    java 利用 sping-data-hadoop HbaseTemplate 操作hbase find get execute 等方法 可以直接运行

    pinpoint的hbase初始化脚本hbase-create.hbase

    搭建pinpoint需要的hbase初始化脚本hbase-create.hbase

    java大数据作业_3HBase

    课后作业 1. 请用java集合的代码描述HBase的表结构 2. 请简述HBase中数据写入最后导致Region分裂的全过程 3. 如果设计一个笔记的表,表中要求有笔记的属性和笔记的内容,怎么做 ...10. 如何添加backup的master

    hbase-2.3.5单机一键部署工具

    Options: deploy.sh build single 构建并启动一个hbase单实例 deploy.sh start single 启动hbase实例 deploy.sh stop single 停止hbase实例 deploy.sh check single 检测hbase实例状态 deploy.sh connect ...

    Hbase中文文档

    14.7. HBase Backup 14.8. Capacity Planning 15. 创建和开发 HBase 15.1. HBase 仓库 15.2. IDEs 15.3. 创建 HBase 12-5-30 HBase 官方文档 3/81 abloz.com/hbase/book.htm 15.4. Publishing a new version of ...

    HBase数据库设计.doc

    1. HBase有哪些基本的特征? 1 HBase特征: 1 2. HBase相对于关系数据库能解决的问题是什么? 2 HBase与关系数据的区别? 2 HBase与RDBMS的区别? 2 3. HBase的数据模式是怎么样的?即有哪些元素?如何存储?等 3 1...

    HBase(hbase-2.4.9-bin.tar.gz)

    HBase(hbase-2.4.9-bin.tar.gz)是一个分布式的、面向列的开源数据库,该技术来源于 Fay Chang 所撰写的Google论文“Bigtable:一个结构化数据的分布式存储系统”。就像Bigtable利用了Google文件系统(File System...

    hbase-sdk是基于hbase-client和hbase-thrift的原生API封装的一款轻量级的HBase ORM框架

    hbase-sdk是基于hbase-client和hbase-thrift的原生API封装的一款轻量级的HBase ORM框架。 针对HBase各版本API(1.x~2.x)间的差异,在其上剥离出了一层统一的抽象。并提供了以类SQL的方式来读写HBase表中的数据。对...

    HBase学习利器:HBase实战

    HBase开发实战,HBase学习利器:HBase实战

    hbase-2.2.2单机一键部署工具

    Options: deploy.sh build single 构建并启动一个hbase单实例 deploy.sh start single 启动hbase实例 deploy.sh stop single 停止hbase实例 deploy.sh check single 检测hbase实例状态 deploy.sh connect ...

    Hbase资源整理集合

    HBase 官方文档.pdf HBase的操作和编程.pdf HBase Cpressr优化与实验 郭磊涛.pdf null【HBase】Data Migratin frm Gri t Clu Cmputing - Natural Sienes .pdf 分布式数据库HBase快照的设计与实现.pdf 【HBase】...

    HBase开启审计日志

    HBase开启审计日志

    hbase资料api

    HBASE

    HBase3.0参考指南

    HBase3.0参考指南 This is the official reference guide for the HBase version it ships with. Herein you will find either the definitive documentation on an HBase topic as of its standing when the ...

    hbase 资源合集 hbase 企业应用开发实战 权威指南 hbase 实战 hbase 应用架构

    hbase 资源合集 hbase 企业应用开发实战 权威指南 hbase 实战 hbase 应用架构

    实验三:熟悉常用的HBase操作

    A.3实验三:熟悉常用的HBase操作 本实验对应第5章的内容。 A.3.1 实验目的 (1)理解HBase在Hadoop体系结构中的角色。(2)熟练使用HBase操作常用的 Shell命令。(3)熟悉HBase操作常用的 Java API。 A.3.2 实验平台 (1...

    Hbase基本用法简介

    Hbase shell 、Hbase api、Hbase 配置

    Hbase权威指南(HBase: The Definitive Guide)

    如果你正在寻找一种具备可伸缩性的存储解决方案来适应几乎没有穷尽的数据的话,这本书将可以向你表明apache hbase完全能够满足你的需求。作为google bigtable架构的开源实现,hbase能够支持数以十亿计的记录数和数以...

    HBase海量数据存储实战视频教程

    从HBase的集群搭建、HBaseshell操作、java编程、架构、原理、涉及的数据结构,并且结合陌陌海量消息存储案例来讲解实战HBase 课程亮点 1,知识体系完备,从小白到大神各阶段读者均能学有所获。 2,生动形象,化繁为...

    Hbase2.2.4.rar

    Hbase2.2.4安装包,HBase是一个分布式的、面向列的开源数据库,该技术来源于 Fay Chang 所撰写的Google论文“Bigtable:一个结构化数据的分布式存储系统”。就像Bigtable利用了Google文件系统(File System)所提供...

Global site tag (gtag.js) - Google Analytics