Scanning in HBase

 
In HBASE-5268 I proposed a "prefix delete marker" (a delete marker that would mark a set of columns identified by a column prefix as deleted).

As it turns out, I didn't quite think through how scanning/seeking in HBase works, especially when delete markers are involved. So I thought I'd write about that here.
By the end of the article I hope you will understand why column prefix delete markers that are sorted with the KeyValues they affect cannot work in HBase.

This blog entry describes how deletion works in HBase. One interesting piece of information not mentioned there is that version and column delete markers are sorted inline with the KeyValues that they affect, whereas family delete markers are always sorted to the beginning of their row.

Generally, each column family is represented by a Store, which manages one or more StoreFiles. Scanning is a form of merge-sort performed by a RegionScanner, which merges the results of one or more StoreScanners (one per family), which in turn merge the results of one or more StoreFileScanners (one for each file of that family):


                             RegionScanner
                          /                 \
           StoreScanner                        StoreScanner
            /         \                         /         \
StoreFileScanner  StoreFileScanner  StoreFileScanner  StoreFileScanner
        |                 |                 |                 |
    StoreFile         StoreFile         StoreFile         StoreFile
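The merge step in this hierarchy can be illustrated with a small k-way merge. This is only a sketch under simplifying assumptions: the class `MergeScanSketch`, the `Cursor` helper, and the use of plain strings as keys are all hypothetical and are not HBase's scanner classes. It shows how several individually sorted sources can be combined into one sorted stream with a heap.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sketch of a k-way merge; not HBase's actual scanner code.
class MergeScanSketch {

    // One sorted source plus the element it is currently positioned on.
    static final class Cursor implements Comparable<Cursor> {
        final Iterator<String> source;
        String current;

        Cursor(Iterator<String> source) {
            this.source = source;
            this.current = source.next();
        }

        @Override
        public int compareTo(Cursor other) {
            return current.compareTo(other.current);
        }
    }

    // Merges any number of individually sorted lists into one sorted list.
    static List<String> merge(List<List<String>> sortedInputs) {
        PriorityQueue<Cursor> heap = new PriorityQueue<>();
        for (List<String> input : sortedInputs) {
            if (!input.isEmpty()) {
                heap.add(new Cursor(input.iterator()));
            }
        }
        List<String> merged = new ArrayList<>();
        while (!heap.isEmpty()) {
            Cursor smallest = heap.poll();
            merged.add(smallest.current);
            if (smallest.source.hasNext()) {
                smallest.current = smallest.source.next();
                heap.add(smallest); // re-insert at its new position
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        // Two "store files" of one family, each already sorted by key.
        List<String> fileA = Arrays.asList("row1/col1", "row2/col1");
        List<String> fileB = Arrays.asList("row1/col2", "row3/col1");
        // Prints [row1/col1, row1/col2, row2/col1, row3/col1]
        System.out.println(merge(Arrays.asList(fileA, fileB)));
    }
}
```

The same idea applies one level up: a region-level merge can treat each per-family result stream as just another sorted source.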



Say you performed the following actions (T is time):
put:            row1, family, col1, value1, T
delete family:  row1, family, T+1
put:            row1, family, col1, value2, T+2
delete columns: row1, family, col1, T+3
put:            row1, family, col1, value3, T+4

What we will find in the StoreFile for "family" is this:
family-delete row1, T+1
row1,col1,value3, T+4
column-delete row1,col1, T+3
row1,col1,value2, T+2
row1,col1,value1, T

KeyValues are ordered in reverse chronological order (within their row and column). The family delete marker, however, is always first in the row. That makes sense: a family delete marker potentially affects many columns in the row, so for scanners to be able to scan forward-only, they need to see the family delete markers first.
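The ordering described above can be modeled with a small comparator. This is a simplified sketch, not HBase's real comparator (which works on serialized keys): the `Cell` class, the `Type` enum, and the trick of giving family delete markers an empty column name so they sort first are all assumptions made for illustration. Ties between a put and a delete marker with the same timestamp are not modeled.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical model of the sort order; none of these names are HBase API.
class KeyOrderSketch {

    enum Type { PUT, DELETE_COLUMN, DELETE_FAMILY }

    static final class Cell {
        final Type type;
        final String row;
        final String column; // assumption: "" for a family delete marker
        final long ts;

        Cell(Type type, String row, String column, long ts) {
            this.type = type;
            this.row = row;
            this.column = column;
            this.ts = ts;
        }

        @Override
        public String toString() {
            return type + " " + row + (column.isEmpty() ? "" : "," + column) + " T=" + ts;
        }
    }

    // Row ascending, column ascending ("" first), timestamp descending.
    static final Comparator<Cell> ORDER = (a, b) -> {
        int c = a.row.compareTo(b.row);
        if (c != 0) return c;
        c = a.column.compareTo(b.column);
        if (c != 0) return c;
        return Long.compare(b.ts, a.ts); // newest first
    };

    // The five operations from the example, in the order they were issued.
    static List<Cell> exampleOperations(long t) {
        List<Cell> ops = new ArrayList<>();
        ops.add(new Cell(Type.PUT, "row1", "col1", t));
        ops.add(new Cell(Type.DELETE_FAMILY, "row1", "", t + 1));
        ops.add(new Cell(Type.PUT, "row1", "col1", t + 2));
        ops.add(new Cell(Type.DELETE_COLUMN, "row1", "col1", t + 3));
        ops.add(new Cell(Type.PUT, "row1", "col1", t + 4));
        return ops;
    }

    static List<Cell> sortLikeStoreFile(List<Cell> cells) {
        List<Cell> sorted = new ArrayList<>(cells);
        sorted.sort(ORDER);
        return sorted;
    }

    public static void main(String[] args) {
        for (Cell cell : sortLikeStoreFile(exampleOperations(100))) {
            System.out.println(cell);
        }
    }
}
```

Running `main` lists the family delete marker first, followed by the `col1` entries from newest to oldest, which is the same order as the store file listing in the example.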

That also means two things. First, even if we are only looking for a specific column, we always seek to the beginning of the row to check whether there is a family delete marker with a timestamp greater than or equal to the versions of the column that we are interested in. Only after that does the scanner seek to the column.
Second, even if we are looking for a past version of a column, we have to seek to the "beginning" of the column (i.e. the potential delete marker right before it) before we can scan forward to the version we're interested in.
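To make the forward-only scan concrete, here is a minimal sketch of how a scanner could apply delete markers while walking one row of one family in store file order. It is an illustration under assumptions, not HBase's implementation: `ForwardScanSketch`, `Cell`, and `visibleValues` are hypothetical names, and only family and column delete markers are modeled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of a forward-only scan over ONE row of ONE family.
class ForwardScanSketch {

    enum Type { PUT, DELETE_COLUMN, DELETE_FAMILY }

    static final class Cell {
        final Type type;
        final String column; // assumption: "" for a family delete marker
        final long ts;
        final String value;  // null for delete markers

        Cell(Type type, String column, long ts, String value) {
            this.type = type;
            this.column = column;
            this.ts = ts;
            this.value = value;
        }
    }

    // Expects store file order: family deletes first, then per column, newest first.
    static List<String> visibleValues(List<Cell> sortedRow) {
        long familyDeleteTs = Long.MIN_VALUE; // kept for the whole row
        long columnDeleteTs = Long.MIN_VALUE; // kept only for the current column
        String currentColumn = null;
        List<String> visible = new ArrayList<>();

        for (Cell cell : sortedRow) {
            if (cell.type == Type.DELETE_FAMILY) {
                familyDeleteTs = Math.max(familyDeleteTs, cell.ts);
                continue;
            }
            if (!cell.column.equals(currentColumn)) {
                // New column: the previous column delete marker can be forgotten.
                currentColumn = cell.column;
                columnDeleteTs = Long.MIN_VALUE;
            }
            if (cell.type == Type.DELETE_COLUMN) {
                columnDeleteTs = Math.max(columnDeleteTs, cell.ts);
                continue;
            }
            boolean deleted = cell.ts <= familyDeleteTs || cell.ts <= columnDeleteTs;
            if (!deleted) {
                visible.add(cell.value);
            }
        }
        return visible;
    }

    // The store file contents from the example, with T passed in as t.
    static List<Cell> exampleRow(long t) {
        return Arrays.asList(
            new Cell(Type.DELETE_FAMILY, "", t + 1, null),
            new Cell(Type.PUT, "col1", t + 4, "value3"),
            new Cell(Type.DELETE_COLUMN, "col1", t + 3, null),
            new Cell(Type.PUT, "col1", t + 2, "value2"),
            new Cell(Type.PUT, "col1", t, "value1"));
    }

    public static void main(String[] args) {
        System.out.println(visibleValues(exampleRow(100))); // prints [value3]
    }
}
```

Note that the scanner never looks backwards: every marker that can affect a cell has already been seen by the time the cell is reached, which is exactly what the sort order guarantees.
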
My initial patch for HBASE-5268 would sort the prefix delete markers just like column delete markers.

By now it should be obvious why this does not work.

The beginning of a row is a known point, and so is the "beginning" of a column. The beginning of a column prefix is not. So to find out whether a column is marked for deletion, we would have to start at the beginning of the row and then scan forward to find all prefix delete markers. That clearly is not efficient.

My second attempt placed all prefix delete markers at the beginning of the row. That technically works. But notice that a column delete marker only has to be retained by the scanner for a short period of time (until we have scanned past all versions that it affects). Prefix delete markers would have to be kept in memory until we have scanned past all columns that start with the prefix. In addition, the number of prefix delete markers for a row is not limited in principle.
Family delete markers do not have this problem because (1) the number of column families is limited for other reasons and (2) store files are per family, so all we have to remember for a family in a StoreScanner is a timestamp.
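The difference in retained state can be sketched as follows. This is purely illustrative: prefix delete markers do not exist in HBase, and `PrefixTrackerSketch`, `FamilyDeleteState`, and `PrefixDeleteState` are hypothetical names. The point is only that family deletes need a single timestamp per StoreScanner, while prefix markers sorted to the start of the row would need one entry per distinct prefix, all of which are checked for every cell.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; prefix delete markers are NOT an HBase feature.
class PrefixTrackerSketch {

    // Family deletes: one timestamp is all a StoreScanner has to remember.
    static final class FamilyDeleteState {
        long newestDeleteTs = Long.MIN_VALUE;

        void add(long ts) {
            newestDeleteTs = Math.max(newestDeleteTs, ts);
        }

        boolean isDeleted(long cellTs) {
            return cellTs <= newestDeleteTs;
        }
    }

    // Prefix deletes placed at the row start: every marker must be retained.
    // This sketch never evicts entries, so state grows with the marker count.
    static final class PrefixDeleteState {
        final Map<String, Long> newestDeleteByPrefix = new HashMap<>();

        void add(String prefix, long ts) {
            newestDeleteByPrefix.merge(prefix, ts, Long::max);
        }

        boolean isDeleted(String column, long cellTs) {
            for (Map.Entry<String, Long> marker : newestDeleteByPrefix.entrySet()) {
                if (column.startsWith(marker.getKey()) && cellTs <= marker.getValue()) {
                    return true;
                }
            }
            return false;
        }

        int retainedMarkers() {
            return newestDeleteByPrefix.size();
        }
    }

    public static void main(String[] args) {
        PrefixDeleteState prefixes = new PrefixDeleteState();
        prefixes.add("addr_", 10);
        prefixes.add("tmp_", 20);
        prefixes.add("z_", 30);

        System.out.println(prefixes.retainedMarkers());          // prints 3
        System.out.println(prefixes.isDeleted("tmp_cache", 15)); // prints true
        System.out.println(prefixes.isDeleted("name", 15));      // prints false
    }
}
```

A smarter tracker could drop a prefix once the scanner has moved past the last column that starts with it, but it would still have to hold every marker for the row up to that point.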

Source: http://hadoop-hbase.blogspot.com/2012/01/scanning-in-hbase.html