
hadoop 2.x-HDFS snapshot

 

  I don't want to reinvent the wheels of open source. Instead, I just want to explore their implied features and use cases as far as possible, so I will write some summaries or memos here.

Agenda

 1. what it is

 2. how it works

 3. hadoop snapshot vs hbase snapshot

 4. demos of using snapshots

 

  1. what it is

  A long time ago, the term 'snapshot' was introduced to describe 'the state of something at a point in time', e.g. memory snapshots, database snapshots, or even Google's cached page snapshots. They all mean roughly the same thing: a view/image of something as it was at a certain moment in history.

  Likewise, a hadoop snapshot uses this kind of 'view' to freeze the files of a directory at a point in time, so its typical uses are:

  a. periodic backups

  b. restoring key data after mistaken deletions

  c. isolating some important production data for testing, comparison, etc.
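  Use case a. usually means creating a snapshot on a schedule and deleting old ones. Below is a minimal Python sketch of just the retention logic; the 'backup-YYYYmmdd-HHMM' naming scheme and both helper functions are hypothetical, and the actual creation/deletion would still be done with 'hdfs dfs -createSnapshot' and 'hdfs dfs -deleteSnapshot'.

```python
from datetime import datetime, timedelta

# Hypothetical naming scheme for periodic backup snapshots.
NAME_FORMAT = "backup-%Y%m%d-%H%M"


def snapshot_name(when: datetime) -> str:
    """Build a snapshot name that also sorts chronologically as a string."""
    return when.strftime(NAME_FORMAT)


def snapshots_to_delete(existing: list[str], keep: int) -> list[str]:
    """Return the oldest periodic snapshots beyond the newest `keep` ones.

    Names that do not match NAME_FORMAT are ignored, so snapshots created
    by hand are never selected for deletion.
    """
    backups = []
    for name in existing:
        try:
            backups.append((datetime.strptime(name, NAME_FORMAT), name))
        except ValueError:
            continue  # not one of our periodic backups
    backups.sort()  # oldest first
    return [name for _, name in backups[:max(len(backups) - keep, 0)]]


if __name__ == "__main__":
    start = datetime(2014, 1, 1)
    names = [snapshot_name(start + timedelta(days=d)) for d in range(5)]
    names.append("before-upgrade")  # a manual snapshot, left alone
    print(snapshots_to_delete(names, keep=3))
```

  Parsing the timestamp back out of the name (instead of sorting raw strings) is what lets the sketch skip manually named snapshots safely.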

 

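  And this snapshot has some notable features:

  - no data is moved or copied: blocks on the DataNodes are not duplicated, and the snapshot only records each file's block list and size, so network bandwidth is not affected

  - it adds little work for the NameNode and DataNodes: creating a snapshot costs O(1) apart from the inode lookup, and extra NameNode memory is used only for the files/dirs modified after the snapshot, so the cluster's reliability is not affected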
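  The first feature follows from the snapshot recording only the block list and the length of each file. Here is a toy Python model of that idea. The FileMeta class and both helpers are hypothetical, the block IDs are made up, and for simplicity every append adds a whole new block (real HDFS may instead extend the last block, with the snapshot's recorded length limiting what it shows).

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class FileMeta:
    """Hypothetical, simplified NameNode record of a file: block IDs + length."""
    blocks: tuple[int, ...]
    length: int


def take_snapshot(live: FileMeta) -> FileMeta:
    # Record only the block list and the length. The tuple of block IDs is
    # shared with the live file; no block data on any DataNode is touched.
    return FileMeta(blocks=live.blocks, length=live.length)


def append_block(live: FileMeta, block_id: int, nbytes: int) -> FileMeta:
    # Simplification: every append adds a whole new block. The live file gets
    # a new record, while the snapshot keeps the old block list and length.
    return replace(live, blocks=live.blocks + (block_id,), length=live.length + nbytes)


if __name__ == "__main__":
    live = FileMeta(blocks=(101, 102), length=200)
    snap = take_snapshot(live)
    print(snap.blocks is live.blocks)  # True: shared, not copied
    live = append_block(live, 103, 50)
    print(snap)  # FileMeta(blocks=(101, 102), length=200)
    print(live)  # FileMeta(blocks=(101, 102, 103), length=250)
```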

 

  2. how it works

  Hadoop snapshots rely on the write-once, read-many characteristic of HDFS files. Before a snapshot can be taken, an administrator must mark the directory as snapshottable ('hdfs dfsadmin -allowSnapshot <path>'); after that, snapshots of the directory can be created at any time. Creating a snapshot only records the directory's state in the NameNode's metadata, and no blocks are copied. From then on, deletions, renames and creations of files and dirs take effect in the current namespace immediately, but the NameNode records these modifications (in reverse chronological order), and blocks that a snapshot still references are not removed from the DataNodes. So later, if you want to restore some files/dirs, you can copy them from the read-only '.snapshot' dir to anywhere you want. When you delete a snapshot, the blocks that only that snapshot was holding on to are finally released.
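  To make the mechanism above concrete, here is a toy Python model of a single snapshottable directory. It is only a sketch under simplifying assumptions: a flat directory, immutable files, made-up block IDs, and class/method names that are hypothetical rather than HDFS APIs. It loosely follows the Apache guide's description: modifications are kept as per-snapshot diffs, the live view is read directly, and a snapshot view is rebuilt by undoing the diffs.

```python
class SnapshottableDir:
    """Toy model of ONE snapshottable directory with flat, immutable files.

    current:  file name -> tuple of block IDs (the live namespace)
    names[i]: name of the i-th snapshot, oldest first
    diffs[i]: what changed after snapshot names[i] and before the next snapshot
              (or now). The live view is read directly; a snapshot view is
              rebuilt by undoing diffs from newest to oldest.
    """

    def __init__(self) -> None:
        self.current: dict[str, tuple[int, ...]] = {}
        self.names: list[str] = []
        self.diffs: list[dict] = []

    def create_snapshot(self, name: str) -> None:
        # Cheap: open a new, empty diff; no file metadata or block is copied.
        self.names.append(name)
        self.diffs.append({"created": set(), "deleted": {}})

    def create_file(self, fname: str, blocks: tuple[int, ...]) -> None:
        if fname in self.current:
            raise FileExistsError(fname)
        self.current[fname] = blocks
        if self.diffs:
            self.diffs[-1]["created"].add(fname)

    def delete_file(self, fname: str) -> None:
        # The deletion is visible in the live namespace immediately.
        blocks = self.current.pop(fname)
        if not self.diffs:
            return  # no snapshot: the blocks can be freed right away
        diff = self.diffs[-1]
        if fname in diff["created"]:
            diff["created"].discard(fname)  # no snapshot ever saw this file
        else:
            diff["deleted"][fname] = blocks  # a snapshot still needs these blocks

    def snapshot_view(self, name: str) -> dict[str, tuple[int, ...]]:
        # Roughly what you would see under <dir>/.snapshot/<name>.
        view = dict(self.current)
        for diff in reversed(self.diffs[self.names.index(name):]):
            for fname in diff["created"]:
                view.pop(fname, None)
            view.update(diff["deleted"])
        return view

    def referenced_blocks(self) -> set[int]:
        # Blocks the DataNodes must keep: live files plus files held by diffs.
        refs = {b for blocks in self.current.values() for b in blocks}
        for diff in self.diffs:
            for blocks in diff["deleted"].values():
                refs.update(blocks)
        return refs

    def delete_snapshot(self, name: str) -> set[int]:
        """Delete a snapshot and return the block IDs that can now be freed."""
        before = self.referenced_blocks()
        i = self.names.index(name)
        self.names.pop(i)
        later = self.diffs.pop(i)
        if i > 0:
            # Fold this diff into the previous one so older snapshots stay readable.
            earlier = self.diffs[i - 1]
            for fname, blocks in later["deleted"].items():
                if fname in earlier["created"]:
                    earlier["created"].discard(fname)  # created and deleted in between
                else:
                    earlier["deleted"][fname] = blocks
            earlier["created"] |= later["created"]
        return before - self.referenced_blocks()


if __name__ == "__main__":
    d = SnapshottableDir()
    d.create_file("a.log", (1, 2))
    d.create_snapshot("s1")
    d.delete_file("a.log")
    print(d.current)                        # {}: gone from the live view at once
    print(d.snapshot_view("s1"))            # {'a.log': (1, 2)}: still in s1
    print(sorted(d.delete_snapshot("s1")))  # [1, 2]: now the blocks are freed
```

  Recording diffs instead of copying the directory keeps snapshot creation cheap; the price is paid only by later modifications and by reads of old snapshots.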

  For a deeper study of the underlying persistent ('linked') data structures, you can check out the paper 'Making Data Structures Persistent'.
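  The core idea of persistence fits in a few lines. The Python sketch below uses path copying, one of the simplest persistence techniques, on a plain binary search tree: each insert returns a new version and shares every untouched subtree with the old one. It illustrates persistence in general and is not how the NameNode stores snapshot metadata (per the Apache guide, HDFS records modifications as diffs instead); the Node/insert/keys names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Node:
    key: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def insert(root: Optional[Node], key: int) -> Node:
    """Return a new version of the tree that contains `key`.

    Path copying: only the nodes on the search path are rebuilt; every other
    subtree is shared with the previous version, which stays unchanged.
    """
    if root is None:
        return Node(key)
    if key < root.key:
        return Node(root.key, insert(root.left, key), root.right)
    if key > root.key:
        return Node(root.key, root.left, insert(root.right, key))
    return root  # key already present: this subtree is reused as-is


def keys(root: Optional[Node]) -> list[int]:
    """In-order traversal of one version."""
    return [] if root is None else keys(root.left) + [root.key] + keys(root.right)


if __name__ == "__main__":
    v1 = None
    for k in (50, 30, 70):
        v1 = insert(v1, k)
    v2 = insert(v1, 20)          # a new "version"; v1 is untouched
    print(keys(v1), keys(v2))    # [30, 50, 70] [20, 30, 50, 70]
    print(v2.right is v1.right)  # True: the right subtree is shared
```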

 

  3. hadoop snapshot vs hbase snapshot

  Judging from the release history of hadoop and hbase, I guess hadoop's snapshot was inspired by hbase's :), so I think their underlying implementations are similar. Here are some differences between the two:

  feature                                            hadoop   hbase   supplement
  copy/move data                                     n        n
  generate new files referring to original files     n        y       hbase generates many extra files that point to the real hdfs files

  So for an hbase cluster, I think it's unnecessary to back up (snapshot) the underlying hdfs again if you already use hbase snapshots, since the two kinds of snapshots overlap heavily; otherwise, an hdfs snapshot is worth taking.

 

  4. demos of using snapshots

  There are some usage demos on the Apache official site [2], but I want to point out that this snapshot is read-only (RO) rather than read-write (RW), so any change you try to make inside a '.snapshot' dir will fail with an error. In addition, if you want to see how the commands really work, see the details in 'NameNodeRpcServer.java'.
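  Following the Apache guide [2], a typical round trip is: allow snapshots on a directory, create one, browse it under '.snapshot', copy a file back out, and delete the snapshot. The Python sketch below only assembles and prints those shell commands so you can review them before running anything on a real cluster. The helper functions and the '/data', 's1' and 'a.log' paths are hypothetical; the 'hdfs' subcommands themselves are the standard ones.

```python
import posixpath
import shlex

# Hypothetical helpers: they only build argument lists, nothing is executed.


def allow_snapshot(path: str) -> list[str]:
    return ["hdfs", "dfsadmin", "-allowSnapshot", path]


def create_snapshot(path: str, name: str) -> list[str]:
    return ["hdfs", "dfs", "-createSnapshot", path, name]


def snapshot_path(path: str, name: str, relative: str = "") -> str:
    # Snapshot contents are exposed read-only under <path>/.snapshot/<name>/.
    parts = [path, ".snapshot", name] + ([relative] if relative else [])
    return posixpath.join(*parts)


def is_in_snapshot(path: str) -> bool:
    # Any write to a path under '.snapshot' is rejected by the NameNode.
    return ".snapshot" in posixpath.normpath(path).split("/")


def restore_file(path: str, name: str, relative: str, dest: str) -> list[str]:
    # Copy, not move: moving out of .snapshot would modify the snapshot.
    return ["hdfs", "dfs", "-cp", snapshot_path(path, name, relative), dest]


def delete_snapshot(path: str, name: str) -> list[str]:
    return ["hdfs", "dfs", "-deleteSnapshot", path, name]


if __name__ == "__main__":
    steps = [
        allow_snapshot("/data"),
        create_snapshot("/data", "s1"),
        ["hdfs", "dfs", "-ls", snapshot_path("/data", "s1")],
        restore_file("/data", "s1", "a.log", "/data/a.log"),
        delete_snapshot("/data", "s1"),
    ]
    for cmd in steps:
        print(shlex.join(cmd))
```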

 

 

ref:

[1] JIRA: Support for RW/RO snapshots in HDFS

[2] HDFS Snapshots (Apache Hadoop documentation)

[3] hbase - tables replication/snapshot/backup within/cross clusters

[4] hadoop-2.x - new features
