I don't want to reinvent the wheels of open source. Rather, I just want to figure out the implied features and use cases as far as possible, so I will write something here as a summary or memo.
Agenda
1.what is
2.how to
3.hadoop snapshot vs hbase snapshot
4.demos to use snapshot
1.what is
A long time ago, the term 'snapshot' was introduced to describe 'the aspect of something at a point in time', e.g. a memory snapshot, a database snapshot, or even Google's page snapshot. They all share a similar meaning: a certain view/image of one thing at some moment in history.
Likewise, Hadoop's (HDFS) snapshot uses this 'view' to freeze the files at a point in time, so its typical usages look like this:
a. periodic backups
b. restoring key data after mistaken deletions
c. isolating important production data for testing, comparison, etc.
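For usage (a), a scheduled job usually creates one snapshot per run and prunes old ones. The sketch below is hypothetical: it assumes snapshot names follow HDFS's default timestamp pattern ('s' + yyyyMMdd-HHmmss.SSS, e.g. s20130412-151029.033), and the function name and keep-the-newest-N policy are illustrative, not part of Hadoop. It only decides which names to delete.

```python
from datetime import datetime

# Assumption: snapshots follow HDFS's default name pattern, e.g. "s20130412-151029.033".
NAME_FORMAT = "s%Y%m%d-%H%M%S.%f"


def snapshots_to_delete(names, keep):
    """Hypothetical retention rule: keep the `keep` newest timestamped snapshots.

    Returns the names to delete, oldest first. Names that do not match the
    timestamp pattern (e.g. manually named snapshots) are never returned.
    """
    if keep < 0:
        raise ValueError("keep must be >= 0")
    stamped = []
    for name in names:
        try:
            stamped.append((datetime.strptime(name, NAME_FORMAT), name))
        except ValueError:
            continue  # not an auto-named snapshot: leave it alone
    stamped.sort(reverse=True)  # newest first
    return [name for _, name in sorted(stamped[keep:])]


names = ["s20240101-000000.000", "s20240102-000000.000",
         "s20240103-000000.000", "before-upgrade"]
print(snapshots_to_delete(names, keep=2))  # ['s20240101-000000.000']
```

Each returned name could then be removed with 'hdfs dfs -deleteSnapshot <dir> <name>'; skipping non-matching names keeps hand-made snapshots (like one taken before an upgrade) safe from the pruning.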
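There are also some notable features of this snapshot:
-no data is moved or copied (a snapshot only records the block list and file sizes), so network bandwidth is not affected
-it does not create heavy work for the NameNode or DataNodes: creating a snapshot is essentially O(1), and extra NameNode memory is only used for files/dirs modified after the snapshot, so the cluster's reliability is preserved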
2.how to
Benefiting from HDFS's write-once, read-many file model, Hadoop snapshots work without copying any data. First, an administrator makes a dir snapshottable ('hdfs dfsadmin -allowSnapshot <dir>'); only then can snapshots be created on it, and the NameNode protects the snapshotted state: later deletions, moves or creations of files and dirs change the current view immediately, but they are recorded only as 'metadata' (diffs) in the NameNode, and the data blocks still referenced by a snapshot are not removed. So after a while, if you want to restore some files/dirs, you can copy them from the '.snapshot' dir to anywhere you want. When you delete the snapshot created before, the blocks that were kept only for it are released, i.e. the prior deletions finally take full effect.
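The mechanism above can be illustrated with a toy in-memory model. This is a hypothetical simplification, not HDFS code (the class and method names are made up): one snapshottable dir maps file names to block ids; creating a snapshot only appends an empty diff; later creations/deletions are recorded in the newest diff, in the spirit of HDFS recording modifications in reverse chronological order; a snapshot's view is rebuilt by undoing diffs from the current state; and a block may be reclaimed only when neither the current view nor any snapshot references it.

```python
class SnapshottableDir:
    """Toy model of one snapshottable directory (hypothetical, not the HDFS implementation)."""

    def __init__(self):
        self.allowed = False
        self.current = {}  # live view: file name -> tuple of block ids
        self.diffs = []    # oldest first: [snapshot name, created names, deleted {name: blocks}]

    def allow_snapshot(self):
        self.allowed = True

    def create_snapshot(self, name):
        if not self.allowed:
            raise PermissionError("directory is not snapshottable")
        self.diffs.append([name, set(), {}])  # O(1): no file or block is copied

    def create_file(self, name, blocks):
        if name in self.current:
            raise FileExistsError(name)
        self.current[name] = tuple(blocks)
        if self.diffs:
            self.diffs[-1][1].add(name)  # remember it was born after the newest snapshot

    def delete_file(self, name):
        blocks = self.current.pop(name)  # the live view changes immediately
        if self.diffs:
            _, created, deleted = self.diffs[-1]
            if name in created:
                created.discard(name)    # born after the newest snapshot: nothing to keep
            else:
                deleted[name] = blocks   # keep metadata only; blocks stay on DataNodes

    def snapshot_view(self, name):
        view = dict(self.current)
        for snap, created, deleted in reversed(self.diffs):  # undo newest changes first
            for n in created:
                view.pop(n, None)
            view.update(deleted)
            if snap == name:
                return view
        raise KeyError(name)

    def delete_snapshot(self, name):
        idx = [d[0] for d in self.diffs].index(name)
        if idx > 0:
            # Re-express the older neighbour's diff relative to the state after `name`.
            older = self.snapshot_view(self.diffs[idx - 1][0])
            newer = (self.snapshot_view(self.diffs[idx + 1][0])
                     if idx + 1 < len(self.diffs) else dict(self.current))
            self.diffs[idx - 1][1] = {n for n in newer if older.get(n) != newer[n]}
            self.diffs[idx - 1][2] = {n: b for n, b in older.items() if newer.get(n) != b}
        del self.diffs[idx]

    def live_blocks(self):
        """Blocks that must be kept: used by the current view or by any snapshot."""
        views = [self.current] + [self.snapshot_view(d[0]) for d in self.diffs]
        return {b for v in views for blocks in v.values() for b in blocks}


d = SnapshottableDir()
d.allow_snapshot()
d.create_file("a.txt", [1, 2])
d.create_snapshot("s0")
d.delete_file("a.txt")
print(sorted(d.current))        # []  gone from the live view at once
print(d.snapshot_view("s0"))    # {'a.txt': (1, 2)}  still readable via .snapshot/s0
print(sorted(d.live_blocks()))  # [1, 2]  blocks are retained
d.delete_snapshot("s0")
print(sorted(d.live_blocks()))  # []  now the blocks can be reclaimed
```

Rebuilding old views from the current state keeps the common case (reading live data) cheap, while extra memory grows only with the number of changes made after a snapshot.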
For a deeper study of the 'linked data structure' techniques behind this, you can check out the paper 'Making Data Structures Persistent'.
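To get a feeling for what 'persistent' means there, here is a minimal path-copying sketch, one of the simplest persistence techniques (the paper itself develops more space-efficient ones such as fat nodes and node copying). Each insert into an immutable binary search tree copies only the nodes on the root-to-leaf path and shares everything else, so every old root stays a valid, unchanged version, much like a snapshot. This illustrates the general idea only; it is not how HDFS stores its namespace.

```python
from typing import NamedTuple, Optional


class Node(NamedTuple):
    key: str
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def insert(root, key):
    """Return the root of a new version; `root` itself is never modified."""
    if root is None:
        return Node(key)
    if key < root.key:
        new_left = insert(root.left, key)
        # Copy this node only if something below it changed; the right subtree is shared.
        return root if new_left is root.left else root._replace(left=new_left)
    if key > root.key:
        new_right = insert(root.right, key)
        return root if new_right is root.right else root._replace(right=new_right)
    return root  # key already present: the whole version is reused


def keys(root):
    """In-order traversal of one version."""
    return [] if root is None else keys(root.left) + [root.key] + keys(root.right)


v1 = None
for k in ["m", "c", "x"]:
    v1 = insert(v1, k)
v2 = insert(v1, "a")         # a new "version" of the tree
print(keys(v1))              # ['c', 'm', 'x']  old version untouched
print(keys(v2))              # ['a', 'c', 'm', 'x']
print(v2.right is v1.right)  # True  the untouched subtree is shared, not copied
```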
3.hadoop snapshot vs hbase snapshot
Judging by the release history of Hadoop and HBase, I think Hadoop's snapshot was inspired by HBase's one :), so I guess their underlying implementations are similar. Here are some differences between the two snapshots:
| | Hadoop | HBase | supplement |
|---|---|---|---|
| copy/move data | no | no | |
| generate new files referring to the original files | no | yes | HBase generates many small reference files that point to the real HDFS files |
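The HBase side of this table can be sketched too. In this hypothetical toy (the class, method and file names are assumptions, not HBase's API), a snapshot is just a manifest of HFile names; when compaction replaces HFiles, the old ones are moved to an archive dir instead of being deleted while a snapshot still references them, and a reference is resolved by looking in the live data dir first and then in the archive, which is roughly the idea behind HBase's HFileLink.

```python
class ToyRegionStore:
    """Hypothetical toy: HFiles live in `data` or `archive`; snapshots are name manifests."""

    def __init__(self, hfiles):
        self.data = dict(hfiles)  # live HFiles: name -> contents
        self.archive = {}         # compacted-away HFiles still needed by snapshots
        self.snapshots = {}       # snapshot name -> list of HFile names (the manifest)

    def snapshot(self, name):
        self.snapshots[name] = sorted(self.data)  # record names only, copy no contents

    def compact(self, new_name):
        """Merge all live HFiles into one (`new_name` must be a fresh, unique name)."""
        merged = "".join(self.data[n] for n in sorted(self.data))
        self.archive.update(self.data)  # old files are archived, not deleted
        self.data = {new_name: merged}
        self.clean_archive()

    def clean_archive(self):
        # Drop archived files that no snapshot manifest references any more.
        referenced = {n for manifest in self.snapshots.values() for n in manifest}
        self.archive = {n: c for n, c in self.archive.items() if n in referenced}

    def read_link(self, hfile):
        # Resolve a reference: live data dir first, then the archive.
        if hfile in self.data:
            return self.data[hfile]
        return self.archive[hfile]

    def restore(self, name):
        return [self.read_link(n) for n in self.snapshots[name]]

    def delete_snapshot(self, name):
        del self.snapshots[name]
        self.clean_archive()


store = ToyRegionStore({"hf1": "abc", "hf2": "def"})
store.snapshot("snap1")
store.compact("hf3")
print(sorted(store.data), sorted(store.archive))  # ['hf3'] ['hf1', 'hf2']
print(store.restore("snap1"))                     # ['abc', 'def']
store.delete_snapshot("snap1")
print(store.archive)                              # {}
```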
So for an HBase cluster, I think it's unnecessary to back up (snapshot) the underlying HDFS again if you already use HBase snapshots; otherwise you should, since the two snapshots largely overlap.
4.demos to use snapshot
There are some usage demos on the Apache official site [2], but I want to point out that this snapshot is read-only (RO) rather than read-write (RW); hence, trying to make changes inside a '.snapshot' dir will fail with errors. In addition, if you want to check out how the commands really work, see the details in 'NameNodeRpcServer.java'.
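A typical snapshot session using the real 'hdfs' commands can be scripted as below. This is a sketch: it assumes the 'hdfs' CLI is on PATH and that the first step runs as an HDFS superuser; the function name and the directory, snapshot and file names are hypothetical, and in practice these steps happen at different times. By default it only builds the argument lists (dry run) so they can be reviewed first.

```python
import subprocess


def snapshot_session(dir_, snap, lost_file, restore_to, dry_run=True):
    """Build (and optionally run) a typical snapshot workflow with the `hdfs` CLI."""
    steps = [
        ["hdfs", "dfsadmin", "-allowSnapshot", dir_],       # make dir snapshottable (superuser)
        ["hdfs", "dfs", "-createSnapshot", dir_, snap],      # O(1), no data copied
        ["hdfs", "lsSnapshottableDir"],                      # list snapshottable dirs
        ["hdfs", "dfs", "-ls", f"{dir_}/.snapshot/{snap}"],  # browse the read-only view
        ["hdfs", "snapshotDiff", dir_, snap, "."],           # "." means the current state
        # Restore by copying out of the RO snapshot; writing into .snapshot would fail.
        ["hdfs", "dfs", "-cp", f"{dir_}/.snapshot/{snap}/{lost_file}", restore_to],
        ["hdfs", "dfs", "-deleteSnapshot", dir_, snap],      # release the retained blocks
    ]
    if not dry_run:
        for argv in steps:
            subprocess.run(argv, check=True)  # stop at the first failing command
    return steps


for argv in snapshot_session("/data/important", "s0", "report.csv", "/data/important/"):
    print(" ".join(argv))
```

Passing argument lists (instead of one shell string) avoids quoting problems with unusual path names.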
ref:
JIRA: Support for RW/RO snapshots in HDFS
HBase: tables replication/snapshot/backup within/cross clusters