
hadoop 2.x-HDFS snapshot

 

  I don't want to reinvent the wheel of open-source projects; instead, I just want to explore the implied features and use cases as far as possible, so I will write something down as a summary/memo.

Agenda

 1. what is

 2. how to

 3. hadoop snapshot vs hbase snapshot

 4. demos to use snapshot

 

  1. what is

  A long time ago, the term 'snapshot' was introduced to describe 'the state of something at a point in time', e.g. a memory snapshot, a database snapshot, or even Google's cached page snapshot. They all carry a similar or close meaning: a certain view/image of one thing at a moment in history.

  Akin to those, Hadoop's snapshot uses this 'view' to capture files at a point in time, so its usages look like this:

  a. periodic backups

  b. restoring key data after accidental deletions

  c. isolating important data from production for testing, comparison, etc.

 

  And this snapshot has some notable features:

  - no data is moved or copied, so network bandwidth is not affected

  - it does not generate many extra tasks for the namenode or datanodes to handle, so reliability is also preserved

 

  2. how to

  HDFS files are write-once-read-many, and Hadoop's snapshot relies on this characteristic to function properly. When you create a new snapshot on a dir, the namenode registers that dir as a snapshottable dir to provide protection: all operations, including deletion, move, or creation of files and dirs, only affect the 'metadata' in the namenode, so the actual files and dirs are not changed instantly. After a while, if you want to restore some files/dirs, you can move or copy the snapshotted files or dirs from the '.snapshot' dir to anywhere you want. Once you delete the snapshot you created before, the earlier operations finally take effect for real.
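  The workflow above can be sketched with the stock HDFS CLI (a minimal sketch; the path /user/data, snapshot name s0, and file names are hypothetical, and the commands assume a running Hadoop 2.x cluster):

```shell
# mark a directory as snapshottable (admin command, done once)
hdfs dfsadmin -allowSnapshot /user/data

# create a snapshot named s0; it appears under /user/data/.snapshot/s0
hdfs dfs -createSnapshot /user/data s0

# an accidental deletion later...
hdfs dfs -rm /user/data/important.txt

# ...can be undone by copying the file back out of the snapshot
hdfs dfs -cp /user/data/.snapshot/s0/important.txt /user/data/

# drop the snapshot when it is no longer needed; only now do the
# deferred operations take effect and blocks get reclaimed
hdfs dfs -deleteSnapshot /user/data s0
```

  Note that creating the snapshot is a pure metadata operation on the namenode, which is why it returns almost instantly regardless of the directory's size.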

  For a deeper study of the underlying 'linked data structure', you can check out the paper 'Making Data Structures Persistent'.

 

  3. hadoop snapshot vs hbase snapshot

  Judging by the release timeline of Hadoop and HBase, I think Hadoop's snapshot was derived from HBase's :), so their underlying implementations are similar. Some differences are shown below:

                                 hadoop   hbase   supplement
  copy/move data                 n        n
  gen new files referring to     n        y       hbase will generate many temp
  the original files                              files to point to the real
                                                  hdfs files

  So for an HBase cluster, I think it is unnecessary to back up (snapshot) HDFS again if HBase snapshots are already in use; otherwise it should be done, in the sense that the two snapshot mechanisms overlap to a large extent.
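  For comparison, the HBase side of the table can be exercised from the hbase shell (the table name 'mytable' and snapshot name 'mytable-snap' are hypothetical; note that restore_snapshot requires the table to be disabled first):

```shell
# feed snapshot commands into the interactive hbase shell
hbase shell <<'EOF'
snapshot 'mytable', 'mytable-snap'
list_snapshots
clone_snapshot 'mytable-snap', 'mytable_clone'
disable 'mytable'
restore_snapshot 'mytable-snap'
enable 'mytable'
delete_snapshot 'mytable-snap'
EOF
```

  As the table notes, HBase likewise copies no table data: the snapshot is a set of small reference files pointing at the existing HFiles in HDFS.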

 

  4. demos to use snapshot

  There are some usage demos on the Apache official site [2], but I want to point out that this snapshot is 'read-only' (RO) instead of RW; hence, making changes inside the '.snapshot' dir will cause errors. In addition, if you want to check out the real principles behind the commands, see the details in 'NameNodeRpcServer.java'.
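  Beyond the official demos, the read-only behaviour is easy to observe from the CLI (paths and the snapshot names s0/s1 are hypothetical; this assumes snapshots already exist on /user/data):

```shell
# list every snapshottable directory visible to the current user
hdfs lsSnapshottableDir

# report what changed between two snapshots of the same directory
hdfs snapshotDiff /user/data s0 s1

# any write into .snapshot is rejected, since snapshots are RO
hdfs dfs -put local.txt /user/data/.snapshot/s0/   # fails with an error
```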

 

 

ref:

[1] jira: Support for RW/RO snapshots in HDFS

[2] HDFS Snapshots

hbase - tables replication/snapshot/backup within/cross clusters

hadoop-2.x - new features
